# Backup Strategy

## 🚨 Current Status: CRITICAL GAPS IDENTIFIED

This document outlines the backup strategy for the homelab infrastructure. **As of 2025-12-22, there are significant gaps in backup coverage that need to be addressed.**

## Executive Summary

### What We Have ✅

- **Syncthing**: File synchronization across 5+ devices
- **ZFS on TrueNAS**: Copy-on-write filesystem with snapshot capability (not yet configured)
- **Proxmox**: Built-in backup capabilities (not yet configured)

### What We DON'T Have 🚨

- ❌ No documented VM/CT backups
- ❌ No ZFS snapshot schedule
- ❌ No offsite backups
- ❌ No disaster recovery plan
- ❌ No tested restore procedures
- ❌ No configuration backups

**Risk Level**: HIGH - A catastrophic failure could result in significant data loss.

---

## Current State Analysis

### Syncthing (File Synchronization)

**What it is**: Real-time file sync across devices

**What it is NOT**: A backup solution

| Folder | Devices | Size | Protected? |
|--------|---------|------|------------|
| documents | Mac Mini, MacBook, TrueNAS, Windows PC, Phone | 11 GB | ⚠️ Sync only |
| downloads | Mac Mini, TrueNAS | 38 GB | ⚠️ Sync only |
| pictures | Mac Mini, MacBook, TrueNAS, Phone | Unknown | ⚠️ Sync only |
| notes | Mac Mini, MacBook, TrueNAS, Phone | Unknown | ⚠️ Sync only |
| config | Mac Mini, MacBook, TrueNAS | Unknown | ⚠️ Sync only |

**Limitations**:
- ❌ Accidental deletion → deleted everywhere
- ❌ Ransomware/corruption → spreads everywhere
- ❌ No point-in-time recovery
- ❌ No version history (unless file versioning enabled - not documented)

**Verdict**: Syncthing provides redundancy and availability, NOT backup protection.
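Since file versioning is listed as "not documented", one way to settle the question is to read it straight from Syncthing's `config.xml`. The following is a minimal sketch, not a definitive tool: the function name `syncthing_versioning_report` is hypothetical, the config path differs per platform, and the parsing assumes the one-element-per-line layout Syncthing normally writes.

```bash
#!/usr/bin/env bash
# Print "<folder id>: <versioning type>" for each folder in a Syncthing config.xml.
# Assumption: each <folder ...> and <versioning ...> tag sits on its own line.
syncthing_versioning_report() {
  local config="$1"
  awk '
    /<folder / {
      # Extract the value of the id="..." attribute.
      id = ""
      if (match($0, /id="[^"]*"/)) id = substr($0, RSTART + 4, RLENGTH - 5)
      vtype = "none"
    }
    /<versioning/ {
      # A missing or empty type attribute means versioning is disabled.
      if (match($0, /type="[^"]*"/)) {
        t = substr($0, RSTART + 6, RLENGTH - 7)
        if (t != "") vtype = t
      }
    }
    /<\/folder>/ { if (id != "") print id ": " vtype }
  ' "$config"
}

# Example (path is an assumption; adjust for the device being checked):
# syncthing_versioning_report "$HOME/.config/syncthing/config.xml"
```

Any folder reported as `none` has no version history at all, which confirms the "Sync only" rating in the table above for that folder.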
### ZFS on TrueNAS (Potential Backup Target)

**Current Status**: ❓ Unknown - snapshots may or may not be configured

**Needs Investigation**:
```bash
# Check if snapshots exist
ssh truenas 'zfs list -t snapshot'

# Check if automated snapshots are configured
# (this file only exists if the zfs-auto-snapshot package is in use)
ssh truenas 'cat /etc/cron.d/zfs-auto-snapshot' || echo "Not configured"

# Check snapshot schedule via TrueNAS API/UI
# (TrueNAS keeps periodic snapshot tasks in its middleware, not in cron)
ssh truenas 'midclt call pool.snapshottask.query'
```

**If configured**, ZFS snapshots provide:
- ✅ Point-in-time recovery
- ✅ Protection against accidental deletion
- ✅ Fast rollback capability
- ⚠️ Still single location (no offsite protection)

### Proxmox VM/CT Backups

**Current Status**: ❓ Unknown - no backup jobs documented

**Needs Investigation**:
```bash
# Check backup configuration
ssh pve 'pvesh get /cluster/backup'

# Check if any backups exist
ssh pve 'ls -lh /var/lib/vz/dump/'
ssh pve2 'ls -lh /var/lib/vz/dump/'
```

**Critical VMs Needing Backup**:

| VM/CT | VMID | Priority | Notes |
|-------|------|----------|-------|
| TrueNAS | 100 | 🔴 CRITICAL | All storage lives here |
| Saltbox | 101 | 🟡 HIGH | Media stack, complex config |
| homeassistant | 110 | 🟡 HIGH | Home automation config |
| gitea-vm | 300 | 🟡 HIGH | Git repositories |
| pihole | 200 | 🟢 MEDIUM | DNS config (easy to rebuild) |
| traefik | 202 | 🟢 MEDIUM | Reverse proxy config |
| trading-vm | 301 | 🟡 HIGH | AI trading platform |
| lmdev1 | 111 | 🟢 LOW | Development (ephemeral) |

---

## Recommended Backup Strategy

### Tier 1: Local Snapshots (IMPLEMENT IMMEDIATELY)

**ZFS Snapshots on TrueNAS**

Schedule automatic snapshots for all datasets. The schedule below uses `vault/documents` as the example:

| Dataset | Frequency | Retention |
|---------|-----------|-----------|
| vault/documents | Every 15 min | 1 hour |
| vault/documents | Hourly | 24 hours |
| vault/documents | Daily | 30 days |
| vault/documents | Weekly | 12 weeks |
| vault/documents | Monthly | 12 months |

**Implementation**:
```bash
# Via TrueNAS UI: Storage → Snapshots → Add
# (recurring schedules are set up as Periodic Snapshot Tasks in the UI)
# Or via CLI (takes a single manual snapshot, not a schedule):
ssh truenas 'zfs snapshot vault/documents@daily-$(date +%Y%m%d)'
```

**Proxmox VM Backups**

Configure weekly backups to local storage:
```bash
# Create backup job via Proxmox UI:
# Datacenter → Backup → Add
# - Schedule: Weekly (Sunday 2 AM)
# - Storage: local-zfs or nvme-mirror1
# - Mode: Snapshot (fast)
# - Retention: 4 backups
```

**Or via CLI**:
```bash
# The target storage must allow the "backup" content type. A default local-zfs
# (zfspool) storage does not, so substitute a directory, NFS, or PBS storage if needed.
# The VM IDs are the critical set from the action plan; adjust as required.
ssh pve 'pvesh create /cluster/backup --schedule "sun 02:00" --storage local-zfs --mode snapshot --vmid 100,101,110,300 --prune-backups keep-last=4'
```

### Tier 2: Offsite Backups (CRITICAL GAP)

**Option A: Cloud Storage (Recommended)**

Use **rclone** or **restic** to sync critical data to cloud:

| Provider | Cost | Pros | Cons |
|----------|------|------|------|
| Backblaze B2 | $6/TB/mo | Cheap, reliable | Egress fees |
| AWS S3 Glacier | $4/TB/mo | Very cheap storage | Slow retrieval |
| Wasabi | $6.99/TB/mo | No egress fees | Minimum 90-day retention |

**Implementation Example (Backblaze B2)**:
```bash
# Install on TrueNAS (pkg applies to FreeBSD-based TrueNAS CORE; the built-in
# Cloud Sync Tasks are an alternative that avoids installing packages on the appliance)
ssh truenas 'pkg install rclone restic'

# Configure B2
rclone config  # Follow prompts for B2

# Crontab entry: daily backup of critical folders at 03:00
0 3 * * * rclone sync /mnt/vault/documents b2:homelab-backup/documents --transfers 4
```

Note that `rclone sync` mirrors deletions to the destination, so on its own it shares Syncthing's weakness. Keep file versioning enabled on the B2 bucket, or use restic when point-in-time history is required.

**Option B: Offsite TrueNAS Replication**
- Set up second TrueNAS at friend/family member's house
- Use ZFS replication to sync snapshots
- Requires: Static IP or Tailscale, trust

**Option C: USB Drive Rotation**
- Weekly backup to external USB drive
- Rotate 2-3 drives (one always offsite)
- Manual but simple

### Tier 3: Configuration Backups

**Proxmox Configuration**
```bash
# Backup /etc/pve (configs are already in cluster filesystem)
# But also backup to external location:
ssh pve 'tar czf /tmp/pve-config-$(date +%Y%m%d).tar.gz /etc/pve /etc/network/interfaces /etc/systemd/system/*.service'

# Copy to safe location (quoted so the local shell does not expand the glob)
scp 'pve:/tmp/pve-config-*.tar.gz' ~/Backups/proxmox/
```

**VM-Specific Configs**
- Traefik configs: `/etc/traefik/` on CT 202
- Saltbox configs: `/srv/git/saltbox/` on VM 101
- Home Assistant: `/config/` on VM 110

**Script to backup all configs**:
```bash
#!/bin/bash
# Save as ~/bin/backup-homelab-configs.sh
set -euo pipefail

DATE=$(date +%Y%m%d)
BACKUP_DIR="$HOME/Backups/homelab-configs/$DATE"
mkdir -p "$BACKUP_DIR"

# Proxmox configs
ssh pve 'tar czf - /etc/pve /etc/network' > "$BACKUP_DIR/pve-config.tar.gz"
ssh pve2 'tar czf - /etc/pve /etc/network' > "$BACKUP_DIR/pve2-config.tar.gz"

# Traefik (container 202, reached through pct on the host)
ssh pve 'pct exec 202 -- tar czf - /etc/traefik' > "$BACKUP_DIR/traefik-config.tar.gz"

# Saltbox
ssh saltbox 'tar czf - /srv/git/saltbox' > "$BACKUP_DIR/saltbox-config.tar.gz"

# Home Assistant
# `qm guest exec` returns a JSON result rather than a raw byte stream, so it
# cannot produce a valid archive. This line assumes SSH access to the guest
# under the hypothetical host alias "homeassistant".
ssh homeassistant 'tar czf - /config' > "$BACKUP_DIR/homeassistant-config.tar.gz"

echo "Configs backed up to $BACKUP_DIR"
```

---

## Disaster Recovery Scenarios

### Scenario 1: Single VM Failure

**Impact**: Medium
**Recovery Time**: 30-60 minutes

1. Restore from Proxmox backup:
   ```bash
   ssh pve 'qmrestore /path/to/backup.vma.zst VMID'
   # For containers, the argument order is reversed:
   # ssh pve 'pct restore VMID /path/to/backup.tar.zst'
   ```
2. Start VM and verify
3. Update IP if needed

### Scenario 2: TrueNAS Failure

**Impact**: CATASTROPHIC (all storage lost)
**Recovery Time**: Unknown - NO PLAN

**Current State**: 🚨 NO RECOVERY PLAN

**Needed**:
- Offsite backup of critical datasets
- Documented ZFS pool creation steps
- Share configuration export

### Scenario 3: Complete PVE Server Failure

**Impact**: SEVERE
**Recovery Time**: 4-8 hours

**Current State**: ⚠️ PARTIALLY RECOVERABLE

**Needed**:
- VM backups stored on TrueNAS or PVE2
- Proxmox reinstall procedure
- Network config documentation

### Scenario 4: Complete Site Disaster (Fire/Flood)

**Impact**: TOTAL LOSS
**Recovery Time**: Unknown

**Current State**: 🚨 NO RECOVERY PLAN

**Needed**:
- Offsite backups (cloud or physical)
- Critical data prioritization
- Restore procedures

---

## Action Plan

### Immediate (Next 7 Days)

- [ ] **Audit existing backups**: Check if ZFS snapshots or Proxmox backups exist
  ```bash
  ssh truenas 'zfs list -t snapshot'
  ssh pve 'ls -lh /var/lib/vz/dump/'
  ```
- [ ] **Enable ZFS snapshots**: Configure via TrueNAS UI for critical datasets
- [ ] **Configure Proxmox backup jobs**: Weekly backups of critical VMs (100, 101, 110, 300)
- [ ] **Test restore**: Pick one VM, back it up, restore it to verify process works

### Short-term (Next 30 Days)

- [ ] **Set up offsite backup**: Choose provider (Backblaze B2 recommended)
- [ ] **Install backup tools**: rclone or restic on TrueNAS
- [ ] **Configure daily cloud sync**: Critical folders to cloud storage
- [ ] **Document restore procedures**: Step-by-step guides for each scenario

### Long-term (Next 90 Days)

- [ ] **Implement monitoring**: Alerts for backup failures
- [ ] **Quarterly restore test**: Verify backups actually work
- [ ] **Backup rotation policy**: Automate old backup cleanup
- [ ] **Configuration backup automation**: Weekly cron job

---

## Monitoring & Validation

### Backup Health Checks

```bash
# Check last ZFS snapshot
ssh truenas 'zfs list -t snapshot -o name,creation -s creation | tail -5'

# Check Proxmox backup status (lists guests not covered by any backup job)
ssh pve 'pvesh get /cluster/backup-info/not-backed-up'

# Check cloud sync status (if using rclone); prints the number of files in the bucket
ssh truenas 'rclone ls b2:homelab-backup | wc -l'
```

### Alerts to Set Up

- Email alert if no snapshot created in 24 hours
- Email alert if Proxmox backup fails
- Email alert if cloud sync fails
- Weekly backup status report

---

## Cost Estimate

**Monthly Backup Costs**:

| Component | Cost | Notes |
|-----------|------|-------|
| Local storage (already owned) | $0 | Using existing TrueNAS |
| Proxmox backups (local) | $0 | Using existing storage |
| Cloud backup (1 TB) | $6-10/mo | Backblaze B2 or Wasabi |
| **Total** | **~$10/mo** | Minimal cost for peace of mind |

**One-time**:

| Component | Cost | Notes |
|-----------|------|-------|
| External USB drives (3x 4TB) | ~$300 | Optional, for rotation backup |

---

## Related Documentation

- [STORAGE.md](STORAGE.md) - ZFS pool layouts and capacity
- [VMS.md](VMS.md) - VM inventory and prioritization
- [DISASTER-RECOVERY.md](#) - Recovery procedures (coming soon)

---

**Last Updated**: 2025-12-22
**Status**: 🚨 CRITICAL GAPS - IMMEDIATE ACTION REQUIRED
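
As a sketch of the "no snapshot created in 24 hours" alert listed under Alerts to Set Up, the check below reads `zfs list -Hp -t snapshot -o name,creation` output (tab-separated name and creation time in epoch seconds) and succeeds only when the newest snapshot is recent enough. The function name `snapshot_is_fresh` is hypothetical, and delivering the email is left out because the mail setup is not documented here.

```bash
#!/usr/bin/env bash
# Usage: <zfs list output> | snapshot_is_fresh MAX_AGE_SECONDS NOW_EPOCH
# Returns 0 if the newest snapshot is at most MAX_AGE_SECONDS old, 1 otherwise
# (including when no snapshots are listed at all).
snapshot_is_fresh() {
  local max_age="$1" now="$2"
  awk -F '\t' -v max="$max_age" -v now="$now" '
    $2 > newest { newest = $2 }   # track the most recent creation time
    END { exit (NR > 0 && now - newest <= max) ? 0 : 1 }
  '
}

# Example wiring (host alias "truenas" as used elsewhere in this document):
# ssh truenas 'zfs list -Hp -t snapshot -o name,creation' \
#   | snapshot_is_fresh 86400 "$(date +%s)" \
#   || echo "ALERT: no ZFS snapshot in the last 24 hours"
```

Passing the current time as an argument keeps the function easy to test and avoids depending on the clock of the machine that runs the check.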