homelab-docs/BACKUP-STRATEGY.md
Hutson 56b82df497 Complete Phase 2 documentation: Add HARDWARE, SERVICES, MONITORING, MAINTENANCE
2025-12-23 00:34:21 -05:00


Backup Strategy

🚨 Current Status: CRITICAL GAPS IDENTIFIED

This document outlines the backup strategy for the homelab infrastructure. As of 2025-12-22, there are significant gaps in backup coverage that need to be addressed.

Executive Summary

What We Have

  • Syncthing: File synchronization across 5+ devices
  • ZFS on TrueNAS: Copy-on-write filesystem with snapshot capability (no snapshot schedule documented; actual state unverified)
  • Proxmox: Built-in backup capabilities (no backup jobs documented; actual state unverified)

What We DON'T Have 🚨

  • No documented VM/CT backups
  • No ZFS snapshot schedule
  • No offsite backups
  • No disaster recovery plan
  • No tested restore procedures
  • No configuration backups

Risk Level: HIGH - A catastrophic failure could result in significant data loss.


Current State Analysis

Syncthing (File Synchronization)

What it is: Real-time file sync across devices
What it is NOT: A backup solution

| Folder | Devices | Size | Protected? |
|---|---|---|---|
| documents | Mac Mini, MacBook, TrueNAS, Windows PC, Phone | 11 GB | ⚠️ Sync only |
| downloads | Mac Mini, TrueNAS | 38 GB | ⚠️ Sync only |
| pictures | Mac Mini, MacBook, TrueNAS, Phone | Unknown | ⚠️ Sync only |
| notes | Mac Mini, MacBook, TrueNAS, Phone | Unknown | ⚠️ Sync only |
| config | Mac Mini, MacBook, TrueNAS | Unknown | ⚠️ Sync only |

Limitations:

  • Accidental deletion → deleted everywhere
  • Ransomware/corruption → spreads everywhere
  • No point-in-time recovery
  • No version history (unless file versioning enabled - not documented)

Verdict: Syncthing provides redundancy and availability, NOT backup protection.

ZFS on TrueNAS (Potential Backup Target)

Current Status: Unknown - snapshots may or may not be configured

Needs Investigation:

# Check if snapshots exist
ssh truenas 'zfs list -t snapshot'

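# Check if automated snapshot tasks are configured
# (TrueNAS keeps periodic snapshot tasks in its middleware rather than in /etc/cron.d;
# an empty list means none are configured)
ssh truenas 'midclt call pool.snapshottask.query'

# Or check the snapshot schedule in the TrueNAS UI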

If configured, ZFS snapshots provide:

  • Point-in-time recovery
  • Protection against accidental deletion
  • Fast rollback capability
  • ⚠️ Still single location (no offsite protection)

Proxmox VM/CT Backups

Current Status: Unknown - no backup jobs documented

Needs Investigation:

# Check backup configuration
ssh pve 'pvesh get /cluster/backup'

# Check if any backups exist
ssh pve 'ls -lh /var/lib/vz/dump/'
ssh pve2 'ls -lh /var/lib/vz/dump/'

Critical VMs Needing Backup:

| VM/CT | VMID | Priority | Notes |
|---|---|---|---|
| TrueNAS | 100 | 🔴 CRITICAL | All storage lives here |
| Saltbox | 101 | 🟡 HIGH | Media stack, complex config |
| homeassistant | 110 | 🟡 HIGH | Home automation config |
| gitea-vm | 300 | 🟡 HIGH | Git repositories |
| pihole | 200 | 🟢 MEDIUM | DNS config (easy to rebuild) |
| traefik | 202 | 🟢 MEDIUM | Reverse proxy config |
| trading-vm | 301 | 🟡 HIGH | AI trading platform |
| lmdev1 | 111 | 🟢 LOW | Development (ephemeral) |
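
To turn the priority list into a quick audit, the sketch below reports which VMIDs have no backup archive in a directory listing. It assumes the default vzdump file naming (`vzdump-<qemu|lxc>-<VMID>-<timestamp>.<ext>`); the function name `missing_backups` is made up for this example.

```shell
#!/bin/sh
# Sketch: print every wanted VMID that has no vzdump archive in the listing on stdin.
# Assumes default vzdump names such as vzdump-qemu-100-2025_12_21-02_00_01.vma.zst

# Usage: missing_backups "100 101 110" < file-listing
missing_backups() (
  wanted="$1"
  listing="$(cat)"   # file names, one per line
  for vmid in $wanted; do
    # The trailing "-" keeps VMID 10 from matching VMID 100.
    if ! printf '%s\n' "$listing" | grep -Eq "^vzdump-(qemu|lxc)-${vmid}-"; then
      echo "$vmid"
    fi
  done
)

# Example against a live node (not run here):
#   ssh pve 'ls /var/lib/vz/dump/' | missing_backups "100 101 110 300 301"
```

An empty result means every listed guest has at least one archive. It says nothing about how old that archive is or whether it restores cleanly.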

Tier 1: Local Snapshots (IMPLEMENT IMMEDIATELY)

ZFS Snapshots on TrueNAS

Schedule automatic snapshots for all datasets:

| Dataset | Frequency | Retention |
|---|---|---|
| vault/documents | Every 15 min | 1 hour |
| vault/documents | Hourly | 24 hours |
| vault/documents | Daily | 30 days |
| vault/documents | Weekly | 12 weeks |
| vault/documents | Monthly | 12 months |

Implementation:

# Via TrueNAS UI: Storage → Snapshots → Add
# Or take a one-off manual snapshot via CLI (this does not create a recurring schedule):
ssh truenas 'zfs snapshot vault/documents@daily-$(date +%Y%m%d)'
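
Snapshot tasks created in the TrueNAS UI handle retention themselves, but manually named snapshots like the CLI one above accumulate until something destroys them. As a rough sketch of the retention idea, the function below (the name `snapshots_to_prune` is hypothetical) takes snapshot names sorted oldest-first and prints those beyond the newest N for a given prefix.

```shell
#!/bin/sh
# Sketch: list snapshots with a given name prefix that fall outside a keep-last-N window.
# Input: snapshot names on stdin, oldest first, e.g. from
#   zfs list -H -t snapshot -o name -s creation vault/documents

# Usage: snapshots_to_prune PREFIX KEEP
snapshots_to_prune() (
  prefix="$1"
  keep="$2"
  matching="$(grep -F "@${prefix}" || true)"
  [ -n "$matching" ] || exit 0
  total="$(printf '%s\n' "$matching" | wc -l)"
  drop=$((total - keep))
  [ "$drop" -gt 0 ] || exit 0
  # The oldest entries come first, so the first $drop lines are the ones to remove.
  printf '%s\n' "$matching" | head -n "$drop"
)
```

The function only prints names. Review the output before handing it to `zfs destroy`, since destroying a snapshot cannot be undone.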

Proxmox VM Backups

Configure weekly backups to local storage:

# Create backup job via Proxmox UI:
# Datacenter → Backup → Add
# - Schedule: Weekly (Sunday 2 AM)
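# - Storage: local-zfs or nvme-mirror1 (the storage must allow the "VZDump backup file" content type;
#   ZFS pool storages such as a default local-zfs hold only disk images and containers)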
# - Mode: Snapshot (fast)
# - Retention: 4 backups

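Or via CLI (--vmid selects the guests to include; --storage must name a storage that accepts backups):

ssh pve 'pvesh create /cluster/backup --schedule "sun 02:00" --vmid 100,101,110,300 --storage local-zfs --mode snapshot --prune-backups keep-last=4'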

Tier 2: Offsite Backups (CRITICAL GAP)

Option A: Cloud Storage (Recommended)

Use rclone or restic to sync critical data to cloud:

| Provider | Cost | Pros | Cons |
|---|---|---|---|
| Backblaze B2 | $6/TB/mo | Cheap, reliable | Egress fees |
| AWS S3 Glacier | $4/TB/mo | Very cheap storage | Slow retrieval |
| Wasabi | $6.99/TB/mo | No egress fees | Minimum 90-day retention |

Implementation Example (Backblaze B2):

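# Install on TrueNAS (pkg is the FreeBSD package manager, so this applies to TrueNAS CORE only;
# TrueNAS also provides Cloud Sync Tasks in its UI, which support Backblaze B2 without extra packages)
ssh truenas 'pkg install rclone restic'

# Configure B2 (run on TrueNAS)
rclone config  # Follow prompts for B2

# Daily backup of critical folders at 03:00 (crontab entry on TrueNAS)
# Note: sync mirrors deletions to the destination, so keep old copies with bucket versioning or rclone's --backup-dir
0 3 * * * rclone sync /mnt/vault/documents b2:homelab-backup/documents --transfers 4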
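
restic is named above as the alternative to rclone. Unlike a plain sync it keeps point-in-time snapshots, which addresses the deletion and ransomware concerns raised for Syncthing. The wrapper below is a minimal sketch, not a tested setup. The repository path `b2:homelab-backup:restic`, the helper names `run` and `restic_backup`, and the retention numbers (borrowed from the local snapshot table) are all assumptions. restic reads B2 credentials from `B2_ACCOUNT_ID` and `B2_ACCOUNT_KEY`, and the repository must be created once with `restic init`.

```shell
#!/bin/sh
# Sketch: versioned offsite backup with restic.
# Assumed repository location; override by exporting RESTIC_REPO.
RESTIC_REPO="${RESTIC_REPO:-b2:homelab-backup:restic}"

# Set DRY_RUN=1 to print commands instead of executing them.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: restic_backup /mnt/vault/documents
restic_backup() {
  # Take a new snapshot of the given path.
  run restic -r "$RESTIC_REPO" backup "$1" || return 1
  # Drop snapshots outside the retention window and reclaim their space.
  run restic -r "$RESTIC_REPO" forget --keep-daily 30 --keep-weekly 12 --keep-monthly 12 --prune
}
```

Running `DRY_RUN=1 restic_backup /mnt/vault/documents` first shows the two commands without touching the repository.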

Option B: Offsite TrueNAS Replication

  • Set up second TrueNAS at friend/family member's house
  • Use ZFS replication to sync snapshots
  • Requires: Static IP or Tailscale, trust
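
TrueNAS can manage replication through its own Replication Tasks, so a script is not required. To show what the sync step does underneath, this sketch builds an incremental `zfs send` / `zfs receive` pipeline between two snapshots. The remote host `offsite-nas`, the target dataset `backup/documents`, and the function name `replication_cmd` are hypothetical.

```shell
#!/bin/sh
# Sketch: print the incremental replication pipeline for one dataset.
# Only the changes between the previous and current snapshot are sent.

# Usage: replication_cmd DATASET PREV_SNAP CURR_SNAP REMOTE_HOST TARGET_DATASET
replication_cmd() (
  dataset="$1"; prev="$2"; curr="$3"; remote="$4"; target="$5"
  # -i sends the delta from PREV to CURR; receive -u leaves the target unmounted.
  printf 'zfs send -i %s@%s %s@%s | ssh %s zfs receive -u %s\n' \
    "$dataset" "$prev" "$dataset" "$curr" "$remote" "$target"
)

# Example (prints the command rather than running it):
#   replication_cmd vault/documents daily-20251221 daily-20251222 offsite-nas backup/documents
```

The first replication has no previous snapshot, so it needs a full `zfs send` without `-i`.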

Option C: USB Drive Rotation

  • Weekly backup to external USB drive
  • Rotate 2-3 drives (one always offsite)
  • Manual but simple

Tier 3: Configuration Backups

Proxmox Configuration

# Backup /etc/pve (configs are already in cluster filesystem)
# But also backup to external location:
ssh pve 'tar czf /tmp/pve-config-$(date +%Y%m%d).tar.gz /etc/pve /etc/network/interfaces /etc/systemd/system/*.service'

# Copy to safe location (quote the remote glob so the local shell, e.g. zsh on macOS, does not try to expand it)
scp 'pve:/tmp/pve-config-*.tar.gz' ~/Backups/proxmox/

VM-Specific Configs

  • Traefik configs: /etc/traefik/ on CT 202
  • Saltbox configs: /srv/git/saltbox/ on VM 101
  • Home Assistant: /config/ on VM 110

Script to backup all configs:

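#!/bin/bash
# Save as ~/bin/backup-homelab-configs.sh
# Streams a tar archive of each config directory over SSH into a dated local folder.

# Stop on the first failed command so a partial run is not reported as success
set -euo pipefail

DATE=$(date +%Y%m%d)
BACKUP_DIR="$HOME/Backups/homelab-configs/$DATE"

mkdir -p "$BACKUP_DIR"

# Proxmox configs
ssh pve 'tar czf - /etc/pve /etc/network' > "$BACKUP_DIR/pve-config.tar.gz"
ssh pve2 'tar czf - /etc/pve /etc/network' > "$BACKUP_DIR/pve2-config.tar.gz"

# Traefik (CT 202)
ssh pve 'pct exec 202 -- tar czf - /etc/traefik' > "$BACKUP_DIR/traefik-config.tar.gz"

# Saltbox
ssh saltbox 'tar czf - /srv/git/saltbox' > "$BACKUP_DIR/saltbox-config.tar.gz"

# Home Assistant (VM 110)
# qm guest exec wraps command output in JSON, so it cannot stream a binary archive.
# This assumes an SSH host alias "homeassistant" exists, as one does for saltbox.
ssh homeassistant 'tar czf - /config' > "$BACKUP_DIR/homeassistant-config.tar.gz"

echo "Configs backed up to $BACKUP_DIR"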

Disaster Recovery Scenarios

Scenario 1: Single VM Failure

Impact: Medium
Recovery Time: 30-60 minutes

  1. Restore from Proxmox backup (qmrestore for VMs, pct restore for containers):
    ssh pve 'qmrestore /path/to/backup.vma.zst VMID'
    ssh pve 'pct restore VMID /path/to/backup.tar.zst'
    
  2. Start VM and verify
  3. Update IP if needed
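
The restore step needs the path of a concrete archive. A small helper can pick the newest one for a VMID. This is a sketch that assumes default vzdump file names, whose embedded timestamps sort chronologically as text; `latest_backup` is a name invented here.

```shell
#!/bin/sh
# Sketch: print the file name of the newest vzdump archive for one VMID.

# Usage: latest_backup /var/lib/vz/dump 110
latest_backup() (
  dir="$1"; vmid="$2"
  # Match archives only (.vma or .tar, optionally compressed), not .log or .notes files.
  ls "$dir" 2>/dev/null \
    | grep -E "^vzdump-(qemu|lxc)-${vmid}-.*\.(vma|tar)(\.(zst|gz|lzo))?$" \
    | sort | tail -n 1
)
```

On the Proxmox node this could feed the restore directly, for example `qmrestore "/var/lib/vz/dump/$(latest_backup /var/lib/vz/dump 110)" 110`. An empty result means no archive exists for that guest.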

Scenario 2: TrueNAS Failure

Impact: CATASTROPHIC (all storage lost)
Recovery Time: Unknown - NO PLAN

Current State: 🚨 NO RECOVERY PLAN
Needed:

  • Offsite backup of critical datasets
  • Documented ZFS pool creation steps
  • Share configuration export

Scenario 3: Complete PVE Server Failure

Impact: SEVERE
Recovery Time: 4-8 hours

Current State: ⚠️ PARTIALLY RECOVERABLE
Needed:

  • VM backups stored on TrueNAS or PVE2
  • Proxmox reinstall procedure
  • Network config documentation

Scenario 4: Complete Site Disaster (Fire/Flood)

Impact: TOTAL LOSS
Recovery Time: Unknown

Current State: 🚨 NO RECOVERY PLAN
Needed:

  • Offsite backups (cloud or physical)
  • Critical data prioritization
  • Restore procedures

Action Plan

Immediate (Next 7 Days)

  • Audit existing backups: Check if ZFS snapshots or Proxmox backups exist

    ssh truenas 'zfs list -t snapshot'
    ssh pve 'ls -lh /var/lib/vz/dump/'
    
  • Enable ZFS snapshots: Configure via TrueNAS UI for critical datasets

  • Configure Proxmox backup jobs: Weekly backups of critical VMs (100, 101, 110, 300)

  • Test restore: Pick one VM, back it up, restore it to verify process works

Short-term (Next 30 Days)

  • Set up offsite backup: Choose provider (Backblaze B2 recommended)

  • Install backup tools: rclone or restic on TrueNAS

  • Configure daily cloud sync: Critical folders to cloud storage

  • Document restore procedures: Step-by-step guides for each scenario

Long-term (Next 90 Days)

  • Implement monitoring: Alerts for backup failures

  • Quarterly restore test: Verify backups actually work

  • Backup rotation policy: Automate old backup cleanup

  • Configuration backup automation: Weekly cron job
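
For the rotation and automation items, one possible shape is a cleanup function that keeps only the newest N dated folders produced by the config backup script (`~/Backups/homelab-configs/YYYYMMDD`). This is a sketch under that layout assumption; `prune_config_backups` is a hypothetical name.

```shell
#!/bin/sh
# Sketch: delete all but the newest KEEP dated backup folders under ROOT.

# Usage: prune_config_backups ~/Backups/homelab-configs 8
prune_config_backups() (
  root="$1"; keep="$2"
  # Only folders named like 20251222 are considered; such names sort chronologically.
  dirs="$(ls "$root" 2>/dev/null | grep -E '^[0-9]{8}$' | sort)"
  [ -n "$dirs" ] || exit 0
  total="$(printf '%s\n' "$dirs" | wc -l)"
  drop=$((total - keep))
  [ "$drop" -gt 0 ] || exit 0
  printf '%s\n' "$dirs" | head -n "$drop" | while read -r d; do
    # ${root:?} aborts if root is empty, so this can never expand to "rm -rf /<name>".
    rm -rf "${root:?}/$d"
    echo "removed $d"
  done
)
```

A weekly crontab entry such as `0 4 * * 0 $HOME/bin/backup-homelab-configs.sh` would cover the automation item, with the prune call appended to that script.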


Monitoring & Validation

Backup Health Checks

# Check last ZFS snapshot
ssh truenas 'zfs list -t snapshot -o name,creation -s creation | tail -5'

# Check Proxmox backup status
ssh pve 'pvesh get /cluster/backup-info/not-backed-up'

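# Check cloud sync status (if using rclone): report local files that are missing or different in the bucket
ssh truenas 'rclone check /mnt/vault/documents b2:homelab-backup/documents --one-way'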

Alerts to Set Up

  • Email alert if no snapshot created in 24 hours
  • Email alert if Proxmox backup fails
  • Email alert if cloud sync fails
  • Weekly backup status report
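
The "no snapshot in 24 hours" alert needs a check that something can run on a schedule. The sketch below reads `zfs list -H -p -t snapshot -o name,creation` output (tab-separated, with creation time as epoch seconds) and prints each dataset whose newest snapshot is older than a threshold. The function name `stale_datasets` is an assumption, and mail delivery is left out.

```shell
#!/bin/sh
# Sketch: print datasets whose newest snapshot is older than MAX_AGE seconds.
# Input on stdin: "<dataset>@<snapshot><TAB><creation epoch>" per line.

# Usage: stale_datasets NOW_EPOCH MAX_AGE_SECONDS
stale_datasets() (
  now="$1"; max_age="$2"
  awk -F '\t' -v now="$now" -v max="$max_age" '
    {
      split($1, parts, "@")          # parts[1] is the dataset name
      created = $2 + 0               # force numeric comparison
      if (!(parts[1] in newest) || created > newest[parts[1]])
        newest[parts[1]] = created
    }
    END {
      for (ds in newest)
        if (now - newest[ds] > max) print ds
    }
  ' | sort
)

# Example against the live system (not run here):
#   ssh truenas 'zfs list -H -p -t snapshot -o name,creation' | stale_datasets "$(date +%s)" 86400
```

Datasets with no snapshots at all never appear in the input, so they need a separate check against the list of datasets that should be protected.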

Cost Estimate

Monthly Backup Costs:

| Component | Cost | Notes |
|---|---|---|
| Local storage (already owned) | $0 | Using existing TrueNAS |
| Proxmox backups (local) | $0 | Using existing storage |
| Cloud backup (1 TB) | $6-10/mo | Backblaze B2 or Wasabi |
| Total | ~$10/mo | Minimal cost for peace of mind |
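
The 1 TB figure is a budget ceiling, and usage-billed storage may cost less at the sizes actually known. As a rough worked example, the helper below (hypothetical name `monthly_cost`, assuming decimal units where 1 TB = 1000 GB) converts a size in GB and a per-TB price into a monthly cost.

```shell
#!/bin/sh
# Sketch: monthly storage cost for SIZE_GB at PRICE_PER_TB dollars per month.

# Usage: monthly_cost SIZE_GB PRICE_PER_TB
monthly_cost() {
  awk -v gb="$1" -v price="$2" 'BEGIN { printf "%.2f\n", gb / 1000 * price }'
}

# documents (11 GB) + downloads (38 GB) = 49 GB at the $6/TB/mo rate listed for Backblaze B2
monthly_cost 49 6   # prints 0.29
```

The pictures, notes, and config folders are listed with unknown sizes, so the real total depends on measuring them first.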

One-time:

  • External USB drives (3x 4TB): ~$300 (optional, for rotation backup)


Last Updated: 2025-12-22
Status: 🚨 CRITICAL GAPS - IMMEDIATE ACTION REQUIRED