Introduction
This guide details a robust backup solution for Raspberry Pi systems, featuring Docker container management, service handling, and smart retention policies. The solution uses rsync over SSH to create efficient, incremental backups to a Synology NAS.
Core Features & Technical Analysis
Service and Container Management
# Smart handling of critical services
CRITICAL_SERVICES=(
"mariadb"
"postgresql" # systemd unit name on Debian-based systems such as Raspberry Pi OS
)
# Docker container handling
CRITICAL_CONTAINERS=(
"immich_postgres"
"immich_redis"
"immich_server"
)
Technical Details:
- Proactive service state detection
- Docker container pause/unpause instead of stop/start for faster recovery
- State restoration regardless of backup outcome
- Graceful failure handling with detailed logging
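The state-restoration idea in the list above can be sketched as follows: record which services were actually running before the backup, stop only those, and restart only those afterwards. This is a minimal sketch under stated assumptions, not the script's own code. The helper names `svc_is_active`, `svc_stop`, `svc_start`, `stop_active_services`, and `restore_services` are hypothetical, and the three `svc_*` wrappers assume systemd.

```shell
# Thin wrappers around systemctl (hypothetical names; replace the bodies
# if your services are not managed by systemd).
svc_is_active() { systemctl is-active --quiet "$1"; }
svc_stop()      { systemctl stop "$1"; }
svc_start()     { systemctl start "$1"; }

# Space-separated list of services stopped by this run.
STOPPED_SERVICES=""

# Stop every listed service that is currently active and remember it.
stop_active_services() {
    local svc
    for svc in "$@"; do
        if svc_is_active "$svc" && svc_stop "$svc"; then
            STOPPED_SERVICES="$STOPPED_SERVICES $svc"
        fi
    done
}

# Restart only what stop_active_services stopped, then clear the list so
# a second call (for example from an EXIT trap) does nothing.
restore_services() {
    local svc
    for svc in $STOPPED_SERVICES; do
        svc_start "$svc" || echo "WARNING: could not restart $svc" >&2
    done
    STOPPED_SERVICES=""
}
```

In a backup script, `restore_services` would typically be called from a `trap ... EXIT` handler so that it runs whether the backup succeeds or fails.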
Backup Integrity
# Critical files monitored
CRITICAL_FILES=(
"/etc/passwd"
"/etc/shadow"
"/etc/fstab"
"/boot/config.txt"
)
Verification Process:
- SHA256 checksum generation for critical files
- Remote verification after transfer
- Automatic cleanup of temporary verification files
- Intelligent handling of empty critical files list
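The verification steps above could look roughly like this. The sketch covers only local generation and checking of a checksum manifest. The function names `generate_checksums` and `verify_checksums` are hypothetical, and how the manifest is compared on the NAS after transfer is left out.

```shell
# Write SHA256 checksums for the given files to a manifest.
# Missing files are skipped; an empty file list yields an empty manifest.
generate_checksums() {
    local manifest="$1"
    local f
    shift
    : > "$manifest"
    for f in "$@"; do
        if [ -f "$f" ]; then
            sha256sum "$f" >> "$manifest"
        fi
    done
}

# Verify a manifest. An empty manifest counts as "nothing to verify"
# rather than as an error.
verify_checksums() {
    local manifest="$1"
    if [ ! -s "$manifest" ]; then
        echo "No critical files listed, skipping verification"
        return 0
    fi
    sha256sum -c "$manifest"
}
```

A call such as `generate_checksums /tmp/backup_checksums "${CRITICAL_FILES[@]}"` would produce the manifest in bash, and `verify_checksums /tmp/backup_checksums` returns non-zero if any listed file has changed.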
Resource Management
# Performance optimization
nice -n 19 ionice -c2 -n7 sudo rsync
Benefits:
- System remains responsive during backup
- I/O prioritization prevents resource starvation
- Safe for production environments
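To make the prioritization concrete: `nice -n 19` gives the process the lowest CPU priority, and `ionice -c2 -n7` selects the best-effort I/O class at its lowest level. The wrapper below is a small sketch with a hypothetical name, `run_low_priority`. It assumes it is acceptable to fall back to `nice` alone on systems where `ionice` is missing or not permitted.

```shell
# Run a command with the lowest CPU priority and, where ionice works,
# the lowest best-effort I/O priority (class 2, level 7).
run_low_priority() {
    if ionice -c2 -n7 true >/dev/null 2>&1; then
        nice -n 19 ionice -c2 -n7 "$@"
    else
        nice -n 19 "$@"
    fi
}
```

The exit status of the wrapped command is passed through, so `run_low_priority rsync ...` can be checked exactly like a plain `rsync` call.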
Smart Retention
# Retention configuration
WEEKLY_RETENTION=4 # Last 4 weeks
MONTHLY_RETENTION=6 # Last 6 months
YEARLY_RETENTION=2 # Last 2 years
Implementation Details:
- Hard-link based retention for space efficiency
- Automatic cleanup of expired backups
- Calendar-aware rotation (weekly on Sundays, monthly on 1st, yearly on Jan 1st)
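The calendar rule and the retention count above can be expressed in a few lines. This is a sketch only: the function names `snapshot_types_for` and `expired_snapshots` are hypothetical, GNU `date` is assumed for the `-d` option, and snapshot names are assumed to sort chronologically (as `2024-W03` or `2024-01` do).

```shell
# Print which snapshot types are due on a given date (YYYY-MM-DD):
# weekly on Sundays, monthly on the 1st, yearly on January 1st.
snapshot_types_for() {
    local day="$1" types=""
    # %u is the ISO weekday: 1 = Monday ... 7 = Sunday
    if [ "$(date -d "$day" +%u)" = "7" ]; then types="$types weekly"; fi
    if [ "$(date -d "$day" +%d)" = "01" ]; then types="$types monthly"; fi
    if [ "$(date -d "$day" +%m-%d)" = "01-01" ]; then types="$types yearly"; fi
    # Strip the leading space before printing
    echo "${types# }"
}

# Read snapshot names on stdin (one per line) and print the ones that
# fall outside the newest $1, i.e. the candidates for deletion.
expired_snapshots() {
    sort -r | tail -n +"$(( $1 + 1 ))"
}
```

For example, `snapshot_types_for 2024-01-01` prints `monthly yearly` because that day was a Monday, and piping a directory listing through `expired_snapshots 4` lists everything older than the four newest snapshots.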
Error Handling & Logging
# Comprehensive error states
case "$exit_code" in
0) return 0 ;; # Success
23) log_message "WARNING: Partial transfer, some files were not transferred" ;;
12) log_message "ERROR: rsync protocol data stream error (connection likely lost)" ;;
*) log_message "ERROR: Backup failed with code $exit_code" ;;
esac
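As a fuller illustration of the same idea, the sketch below maps rsync exit codes to a message and a pass/fail result. The function name `classify_rsync_exit` is hypothetical. Treating codes 23 and 24 as warnings rather than failures is a policy assumption that may not suit every setup.

```shell
# Map an rsync exit code to a message.
# Returns 0 for success and warnings, 1 for hard failures.
classify_rsync_exit() {
    case "$1" in
        0)   echo "OK: transfer completed"; return 0 ;;
        23)  echo "WARNING: partial transfer, some files could not be transferred"; return 0 ;;
        24)  echo "WARNING: some source files vanished during the transfer"; return 0 ;;
        12)  echo "ERROR: rsync protocol data stream error (often a dropped connection)"; return 1 ;;
        30)  echo "ERROR: timeout in data send/receive"; return 1 ;;
        255) echo "ERROR: unexplained error, usually a failed SSH connection"; return 1 ;;
        *)   echo "ERROR: rsync failed with code $1"; return 1 ;;
    esac
}
```

A caller could run rsync, capture `$?`, and then use `classify_rsync_exit "$exit_code" | tee -a "$LOG"` to log the result.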
Code language: PHP (php)
Implementation Guide
Initial Setup
- Create SSH Infrastructure:
# Generate backup-specific key
sudo ssh-keygen -t ed25519 -f /root/.ssh/id_ed25519_backup -N ""
# Configure Synology NAS (choose one):
# Option 1: Append to existing keys
echo "PASTE_YOUR_PUBLIC_KEY" >> /var/services/homes/[username]/.ssh/authorized_keys
# Option 2: Create new authorized_keys (caution!)
echo "PASTE_YOUR_PUBLIC_KEY" > /var/services/homes/[username]/.ssh/authorized_keys
- Prepare Backup Location:
# On Synology NAS
mkdir -p /volume1/RPi-archive/[hostname]/current
chmod 700 /volume1/RPi-archive/[hostname]
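Once the key is installed, an SSH client alias on the Pi lets scripts refer to the NAS by a short name and always use the dedicated backup key. The sketch below appends such an alias to a config file. Everything concrete in it is an assumption for illustration: the function name `add_ssh_alias`, the alias `nas-backup`, the host name, and the user are hypothetical placeholders.

```shell
# Append a Host block for the NAS to an SSH client config file.
# Arguments: config file, alias, NAS host name or IP, NAS user, private key.
add_ssh_alias() {
    local config="$1" alias_name="$2" host="$3" user="$4" key="$5"
    # Do nothing if the alias is already defined
    if [ -f "$config" ] && grep -q "^Host $alias_name\$" "$config"; then
        return 0
    fi
    cat >> "$config" <<EOF
Host $alias_name
    HostName $host
    User $user
    IdentityFile $key
    IdentitiesOnly yes
    BatchMode yes
EOF
    chmod 600 "$config"
}
```

For example, `add_ssh_alias /root/.ssh/config nas-backup nas.example.lan backupuser /root/.ssh/id_ed25519_backup` would allow `ssh nas-backup` and `rsync ... nas-backup:/volume1/...` without further options. `BatchMode yes` makes SSH fail instead of prompting for a password, which suits unattended cron runs.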
Script Installation
- Deploy Script:
sudo cp backup-script.sh /usr/local/sbin/rpi-system-backup.sh
sudo chmod 700 /usr/local/sbin/rpi-system-backup.sh
- Configure Automation:
# Add to root's crontab
sudo crontab -e
# Example: Run at 2 AM every Sunday
0 2 * * 0 /usr/local/sbin/rpi-system-backup.sh
Recovery Procedures
Current Backup Restore
# Mount target SD card (replace sdX with the card's device name)
sudo mkdir -p /mnt/restore
sudo mount /dev/sdX2 /mnt/restore
sudo mount /dev/sdX1 /mnt/restore/boot
# Restore using rsync
sudo rsync -aAXv --numeric-ids --delete \
user@nas:/volume1/RPi-archive/hostname/current/ /mnt/restore/
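Because `rsync --delete` removes everything in the target that is not in the backup, it is worth confirming that the SD card is really mounted before restoring. A minimal guard, with the hypothetical name `require_mountpoint`, might look like this:

```shell
# Refuse to continue unless the given directory is a mount point, so that
# "rsync --delete" cannot run against the wrong filesystem.
require_mountpoint() {
    if ! mountpoint -q "$1"; then
        echo "ERROR: $1 is not a mount point, refusing to restore" >&2
        return 1
    fi
}
```

It can be chained in front of the restore, for example `require_mountpoint /mnt/restore && sudo rsync ...`.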
Historical Backup Restore
# From weekly backup
sudo rsync -aAXv --numeric-ids --delete \
user@nas:/volume1/RPi-archive/hostname/weekly/2024-W03/ /mnt/restore/
# From monthly backup
sudo rsync -aAXv --numeric-ids --delete \
user@nas:/volume1/RPi-archive/hostname/monthly/2024-01/ /mnt/restore/
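To find the weekly snapshot that covers a particular day, the directory name can be computed from the date. The sketch below assumes snapshots are labelled by ISO week, as in `2024-W03`, and that GNU `date` is available. The function names `week_label` and `weekly_snapshot_path` are hypothetical.

```shell
# Print the ISO week label (for example 2024-W03) for a date (YYYY-MM-DD).
# %G is the ISO week-based year that belongs to week number %V; plain %Y
# can disagree with it in the days around New Year.
week_label() {
    date -d "$1" +%G-W%V
}

# Build the remote path of the weekly snapshot covering a given date.
# "hostname" is a placeholder, as in the restore commands above.
weekly_snapshot_path() {
    echo "/volume1/RPi-archive/hostname/weekly/$(week_label "$1")/"
}
```

For instance, `week_label 2024-12-30` yields `2025-W01`, because that Monday already belongs to the first ISO week of 2025.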
Known Limitations & Solutions
ACL Handling
Recent testing revealed ACL permission issues. Two solutions:
- Disable ACL transfer (recommended):
RSYNC_OPTS="-aX --delete --timeout=120 --no-specials --copy-unsafe-links --partial --quiet --no-acls"
- Configure NAS ACL support:
sudo synoacltool -enable /volume1/RPi-archive
sudo chmod -R 770 /volume1/RPi-archive
Impact of disabling ACLs:
- Basic permissions (rwx) preserved
- Special access controls lost but can be manually restored
- Suitable for most Pi deployments
Other Considerations
- Network Dependency
  - Backup fails if network unreachable
  - Implement timeout handling
  - Consider local staging
- Storage Requirements
  - Space verification before backup
  - Hard links minimize space usage
  - Automatic cleanup of old backups
- Security
  - Dedicated SSH key
  - Limited NAS user permissions
  - Secure key storage
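For the network dependency in particular, a small retry helper can absorb short outages before the backup gives up. This is a generic sketch. The function name `retry` and the attempt and delay values in the usage note are assumptions, not part of the script.

```shell
# Retry a command up to $1 times, sleeping $2 seconds between attempts.
# Returns 0 as soon as the command succeeds, 1 if every attempt failed.
retry() {
    local attempts="$1" delay="$2" n=1
    shift 2
    while [ "$n" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        echo "Attempt $n/$attempts failed: $*" >&2
        if [ "$n" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        n=$((n + 1))
    done
    return 1
}
```

A reachability check before the backup could then be written as `retry 3 10 ssh -o ConnectTimeout=10 -o BatchMode=yes "$REMOTE_HOST" true`, which tries the NAS three times with ten seconds between attempts.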
Best Practices
Monitoring
# Check backup logs
sudo tail -f /var/log/rpi-system-backup.log
# Verify backup integrity
sudo sha256sum -c /tmp/backup_checksums
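Beyond tailing the log by hand, the outcome of the most recent run can be checked automatically. The sketch below relies on the messages the script writes ("Starting backup process", "Backup process completed successfully", and lines containing "ERROR"). The function name `last_backup_ok` is hypothetical.

```shell
# Return 0 if the most recent run in the log finished successfully:
# no ERROR line and a success line after the last start marker.
last_backup_ok() {
    local log="$1"
    local last_run
    # Keep only the lines from the last run's start marker onwards
    last_run=$(awk '/ - Starting backup process$/ { buf = "" }
                    { buf = buf $0 "\n" }
                    END { printf "%s", buf }' "$log")
    if printf '%s\n' "$last_run" | grep -q "ERROR"; then
        return 1
    fi
    printf '%s\n' "$last_run" | grep -q "Backup process completed successfully"
}
```

This could feed a monitoring check, for example `last_backup_ok /var/log/rpi-system-backup.log || echo "last backup failed"`.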
Testing
- Regular dry runs:
sudo /usr/local/sbin/rpi-system-backup.sh --dry-run
- Periodic test restores
- Service state verification
The Script
#!/bin/bash
set -e
set -u
# Configuration
HOSTNAME=$(hostname)
# SSH destination of the NAS: a Host alias from the SSH client config, or user@host
REMOTE_HOST="root"
LOG="/var/log/rpi-system-backup.log"
LOG_MAX_SIZE=10M
LOG_BACKUP_COUNT=7
LOCK_TIMEOUT=3600
REMOTE_PATH="/volume1/RPi-archive/${HOSTNAME}"
# Backup exclusions
EXCLUDE_LIST=(
"/proc"
"/sys"
"/dev"
"/tmp"
"/run"
"/lost+found"
"/var/log"
"/var/cache/apt/archives"
"/home/*/.cache"
"/media"
"/mnt"
# Docker-specific excludes
"/var/lib/docker/overlay2"
"/var/lib/docker/containers"
"/var/lib/docker/volumes"
"/var/lib/docker/tmp"
"/var/lib/docker/**/*.sock"
# Database and application-specific excludes
"/home/*/unifi/data/data/db" # UniFi database directory
"/home/*/unifi/data/db" # Alternative UniFi database location
# System-specific excludes
"/var/run/systemd/inaccessible"
"/var/run/user/*/systemd/inaccessible"
"/var/run/**/*.sock"
"/opt/pivpn/scripts"
"/usr/local/src/pivpn/scripts"
# Swap file excludes
"/swapfile"
"/var/swap"
)
# Services to pause during backup
# Add your critical services here, for example:
# CRITICAL_SERVICES=(
# "mariadb"
# "postgresql"
# )
CRITICAL_SERVICES=()
# Docker containers to pause
# Add your critical containers here, for example:
#CRITICAL_CONTAINERS=(
# media containers
# "immich_postgres"
# "immich_redis"
# "immich_server"
# network containers
# "netmngr-app-1"
# "unifi-controller"
# "adguardhome"
# "grafana"
# "prometheus"
# "cadvisor"
# "uptime-kuma"
# "ddclient"
# "portainer"
# )
CRITICAL_CONTAINERS=()
# Retention configuration (only weekly is needed)
WEEKLY_RETENTION=10
# Performance settings
DRY_RUN=false
NICE_LEVEL=19
IO_CLASS="idle" # Not referenced below; rsync runs with ionice -c2 -n7 (best-effort, lowest level)
# Lock file
LOCK_FILE="/tmp/rpi_system_backup.lock"
# Logging function (rotation is handled separately by rotate_logs)
log_message() {
local timestamp
timestamp=$(date '+%Y-%m-%d %H:%M:%S')
echo "$timestamp - $1" | tee -a "$LOG"
}
log_debug() {
if [ "$DRY_RUN" = true ]; then
log_message "DEBUG: $1"
fi
}
# Log rotation check
rotate_logs() {
if [ -f "$LOG" ]; then
local size
# GNU stat (Linux) reports the file size in bytes with -c%s
size=$(stat -c%s "$LOG" 2>/dev/null || echo 0)
local max_size
max_size=$(numfmt --from=iec "$LOG_MAX_SIZE")
if [ "${size:-0}" -gt "$max_size" ]; then
for i in $(seq $((LOG_BACKUP_COUNT-1)) -1 1); do
[ -f "${LOG}.$i" ] && mv "${LOG}.$i" "${LOG}.$((i+1))"
done
mv "$LOG" "${LOG}.1"
touch "$LOG"
fi
fi
}
# Lock management
check_lock() {
if [ -f "$LOCK_FILE" ]; then
local pid
pid=$(cat "$LOCK_FILE" 2>/dev/null || echo "")
if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
local lock_age
lock_age=$(($(date +%s) - $(stat -c %Y "$LOCK_FILE")))
if [ "$lock_age" -gt "$LOCK_TIMEOUT" ]; then
log_message "Lock file is stale (age: ${lock_age}s). Removing."
rm -f "$LOCK_FILE"
else
log_message "Another backup process is running (PID: $pid)"
exit 1
fi
fi
fi
echo $$ > "$LOCK_FILE"
}
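# Cleanup function
cleanup() {
# A failed restart must not prevent the lock from being released
handle_services start || log_message "WARNING: Some services or containers could not be restarted"
rm -f "$LOCK_FILE"
log_message "Cleanup completed"
}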
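# Handle services
# Services stopped by this run; only these are started again afterwards
STOPPED_SERVICES=()
handle_services() {
local action=$1
local failed=0
log_message "Running '${action}' for critical services and containers"
# System services: stop only what is running, start only what was stopped
if [ "$action" = "stop" ]; then
for service in "${CRITICAL_SERVICES[@]}"; do
if systemctl is-active --quiet "$service"; then
if systemctl stop "$service"; then
STOPPED_SERVICES+=("$service")
else
log_message "Failed to stop $service"
failed=1
fi
fi
done
else
for service in "${STOPPED_SERVICES[@]}"; do
if ! systemctl start "$service"; then
log_message "Failed to start $service"
failed=1
fi
done
STOPPED_SERVICES=()
fi
# Docker containers: pause on "stop", unpause on "start"
if command -v docker >/dev/null 2>&1; then
for container in "${CRITICAL_CONTAINERS[@]}"; do
local running paused
# docker inspect fails for unknown containers; treat those as not running
running=$(docker inspect --format '{{.State.Running}}' "$container" 2>/dev/null || echo "false")
paused=$(docker inspect --format '{{.State.Paused}}' "$container" 2>/dev/null || echo "false")
if [ "$running" != "true" ]; then
continue
fi
if [ "$action" = "stop" ] && [ "$paused" != "true" ]; then
docker pause "$container" || failed=1
elif [ "$action" != "stop" ] && [ "$paused" = "true" ]; then
docker unpause "$container" || failed=1
fi
done
fi
return $failed
}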
# Check available space
check_space() {
log_message "Checking free space..."
local required_space
required_space=$(du -sx --exclude=/proc --exclude=/sys --exclude=/dev / 2>/dev/null | awk '{print $1}')
local available_space
if ! available_space=$(ssh "$REMOTE_HOST" "df -k '${REMOTE_PATH}' | tail -1 | awk '{print \$4}'"); then
log_message "ERROR: Failed to check remote space"
return 1
fi
if [ -z "$available_space" ] || [ -z "$required_space" ]; then
log_message "ERROR: Failed to determine space requirements"
return 1
fi
if [ "$available_space" -lt "$required_space" ]; then
log_message "ERROR: Insufficient space. Required: ${required_space}KB, Available: ${available_space}KB"
return 1
fi
log_message "Space check passed (${available_space}KB available)"
return 0
}
# Initialize backup structure
initialize_backup_structure() {
log_message "Initializing backup directory structure..."
# NOTE: 'Erik:users' is a site-specific owner on the NAS; replace it with your own NAS user and group
local init_command="mkdir -p '${REMOTE_PATH}/current' '${REMOTE_PATH}/weekly' && chown Erik:users '${REMOTE_PATH}/current' '${REMOTE_PATH}/weekly' && chmod 755 '${REMOTE_PATH}/current' '${REMOTE_PATH}/weekly'"
log_debug "Executing: $init_command"
if ! ssh "$REMOTE_HOST" "$init_command" 2>&1; then
log_message "ERROR: Failed to initialize backup structure"
return 1
fi
return 0
}
# Generate exclude parameters
generate_exclude_params() {
local exclude_params=""
for item in "${EXCLUDE_LIST[@]}"; do
exclude_params+="--exclude=${item} "
done
echo "$exclude_params"
}
# Perform backup
perform_backup() {
log_message "Starting backup process..."
if ! initialize_backup_structure; then
return 1
fi
local exclude_params
exclude_params=$(generate_exclude_params)
local rsync_opts=(
-aAX # Archive mode with extra flags (ACLs, xattrs) for a system backup
--delete # Delete extraneous files
--numeric-ids # Don't map uid/gid values
--partial # Keep partially transferred files
--timeout=300 # Increase timeout to 5 minutes
--rsync-path="/usr/bin/rsync" # Explicit rsync path on remote
--ignore-errors # Delete even if there are I/O errors
--force # Force deletion of dirs even if not empty
--delete-missing-args # Delete files that are missing on sender
)
# Add link-dest if previous backup exists
if ssh "$REMOTE_HOST" "[ -d '${REMOTE_PATH}/weekly' ] && [ ! -z \"\$(ls -A '${REMOTE_PATH}/weekly')\" ]"; then
local latest_weekly
latest_weekly=$(ssh "$REMOTE_HOST" "ls -1t '${REMOTE_PATH}/weekly' | head -n1")
if [ -n "$latest_weekly" ]; then
log_debug "Found previous backup: weekly/${latest_weekly}"
rsync_opts+=("--link-dest=${REMOTE_PATH}/weekly/${latest_weekly}")
fi
fi
# Add verbose options for dry run
if [ "$DRY_RUN" = true ]; then
rsync_opts+=(
"--dry-run"
"--verbose"
"--itemize-changes"
)
fi
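# Execute rsync directly from arrays (no eval), so each option and path stays one word
local exclude_args=()
for item in "${EXCLUDE_LIST[@]}"; do
exclude_args+=("--exclude=${item}")
done
log_message "Executing rsync: rsync ${rsync_opts[*]} ${exclude_args[*]} / ${REMOTE_HOST}:${REMOTE_PATH}/current/"
if ! nice -n "$NICE_LEVEL" ionice -c2 -n7 rsync "${rsync_opts[@]}" "${exclude_args[@]}" / "${REMOTE_HOST}:${REMOTE_PATH}/current/" 2> >(tee -a "$LOG" >&2); then
log_message "ERROR: rsync failed"
return 1
fi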
log_message "rsync completed successfully"
return 0
}
# Create snapshot of the current backup
create_snapshot() {
local target_dir="$1"
local retention="$2"
local snapshot_type="$3"
log_message "Creating ${snapshot_type} snapshot in ${target_dir}"
# Run rsync on the remote host itself, with sudo to obtain root privileges.
# --inplace is deliberately not used: it would overwrite files that are
# hard-linked into older snapshots instead of breaking the link first.
if ! ssh "$REMOTE_HOST" "sudo rsync -a \
-H \
-x \
--delete \
--numeric-ids \
--partial \
--modify-window=1 \
--link-dest='${REMOTE_PATH}/current' \
'${REMOTE_PATH}/current/' \
'${target_dir}/'"
then
log_message "ERROR: Failed to create ${snapshot_type} snapshot"
return 1
fi
# Cleanup old snapshots (also with sudo)
local cleanup_command="cd \"\$(dirname '${target_dir}')\" && ls -1t | tail -n +$((retention + 1)) | xargs -r sudo rm -rf"
log_debug "Executing cleanup command: $cleanup_command"
if ! ssh "$REMOTE_HOST" "$cleanup_command" 2>&1; then
log_message "ERROR: Failed to cleanup old ${snapshot_type} snapshots"
return 1
fi
return 0
}
# Main execution
main() {
log_message "Starting backup process"
rotate_logs
check_lock
trap cleanup EXIT INT TERM
if ! check_space; then
exit 1
fi
if ! handle_services stop; then
log_message "Failed to stop services"
handle_services start
exit 1
fi
if perform_backup; then
# %G is the ISO year that matches ISO week %V (plain %Y mislabels weeks around New Year)
local week
week=$(date +%G-W%V)
local weekly_target="${REMOTE_PATH}/weekly/${week}"
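# A dry run must not create or prune snapshots on the NAS
if [ "$DRY_RUN" = true ]; then
log_message "Dry run: skipping snapshot creation"
elif ! create_snapshot "$weekly_target" "$WEEKLY_RETENTION" "weekly"; then
log_message "ERROR: Snapshot creation failed"
exit 1
fi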
log_message "Backup process completed successfully"
else
log_message "ERROR: Backup process failed"
exit 1
fi
}
# Parse command line arguments
while [[ $# -gt 0 ]]; do
case $1 in
--dry-run)
DRY_RUN=true
shift
;;
*)
log_message "Unknown parameter: $1"
exit 1
;;
esac
done
# Execute main
if [ "$DRY_RUN" = true ]; then
log_message "Performing dry run..."
fi
main
exit 0