Backup Strategy
🚨 Current Status: CRITICAL GAPS IDENTIFIED
This document outlines the backup strategy for the homelab infrastructure. As of 2025-12-22, there are significant gaps in backup coverage that need to be addressed.
Executive Summary
What We Have ✅
- Syncthing: File synchronization across 5+ devices
- ZFS on TrueNAS: Copy-on-write filesystem with snapshot capability (not yet configured)
- Proxmox: Built-in backup capabilities (not yet configured)
What We DON'T Have 🚨
- ❌ No documented VM/CT backups
- ❌ No ZFS snapshot schedule
- ❌ No offsite backups
- ❌ No disaster recovery plan
- ❌ No tested restore procedures
- ❌ No configuration backups
Risk Level: HIGH - A catastrophic failure could result in significant data loss.
Current State Analysis
Syncthing (File Synchronization)
What it is: Real-time file synchronization across devices
What it is NOT: A backup solution
| Folder | Devices | Size | Protected? |
|---|---|---|---|
| documents | Mac Mini, MacBook, TrueNAS, Windows PC, Phone | 11 GB | ⚠️ Sync only |
| downloads | Mac Mini, TrueNAS | 38 GB | ⚠️ Sync only |
| pictures | Mac Mini, MacBook, TrueNAS, Phone | Unknown | ⚠️ Sync only |
| notes | Mac Mini, MacBook, TrueNAS, Phone | Unknown | ⚠️ Sync only |
| config | Mac Mini, MacBook, TrueNAS | Unknown | ⚠️ Sync only |
Limitations:
- ❌ Accidental deletion → deleted everywhere
- ❌ Ransomware/corruption → spreads everywhere
- ❌ No point-in-time recovery
- ❌ No version history (unless file versioning is enabled - not documented)
Verdict: Syncthing provides redundancy and availability, NOT backup protection.
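File versioning, if enabled per folder, would soften the deletion and corruption risks noted above. A quick way to check is to look for a versioning block in each device's Syncthing config; the path below is an assumed macOS location for the Mac Mini and varies by platform and Syncthing version:
# Assumed macOS config path - Linux devices commonly use ~/.config/syncthing/ or ~/.local/state/syncthing/
grep -A3 '<versioning' "$HOME/Library/Application Support/Syncthing/config.xml"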
ZFS on TrueNAS (Potential Backup Target)
Current Status: ❓ Unknown - snapshots may or may not be configured
Needs Investigation:
# Check if snapshots exist
ssh truenas 'zfs list -t snapshot'
# Check if automated snapshots are configured
ssh truenas 'cat /etc/cron.d/zfs-auto-snapshot' || echo "Not configured"
# Check snapshot schedule via TrueNAS API/UI
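One hedged way to do the API check from the shell is the TrueNAS middleware client (available on CORE and SCALE; output format differs by version):
# List configured periodic snapshot tasks via the TrueNAS middleware
ssh truenas 'midclt call pool.snapshottask.query'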
If configured, ZFS snapshots provide:
- ✅ Point-in-time recovery
- ✅ Protection against accidental deletion
- ✅ Fast rollback capability
- ⚠️ Still single location (no offsite protection)
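As a concrete example of the point-in-time recovery this enables, a deleted file can be copied straight out of a snapshot's hidden .zfs directory (the snapshot and file names below are placeholders):
# Snapshots are browsable read-only under <mountpoint>/.zfs/snapshot/
ssh truenas 'ls /mnt/vault/documents/.zfs/snapshot/'
ssh truenas 'cp /mnt/vault/documents/.zfs/snapshot/daily-20251220/report.pdf /mnt/vault/documents/'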
Proxmox VM/CT Backups
Current Status: ❓ Unknown - no backup jobs documented
Needs Investigation:
# Check backup configuration
ssh pve 'pvesh get /cluster/backup'
# Check if any backups exist
ssh pve 'ls -lh /var/lib/vz/dump/'
ssh pve2 'ls -lh /var/lib/vz/dump/'
Critical VMs Needing Backup:
| VM/CT | VMID | Priority | Notes |
|---|---|---|---|
| TrueNAS | 100 | 🔴 CRITICAL | All storage lives here |
| Saltbox | 101 | 🟡 HIGH | Media stack, complex config |
| homeassistant | 110 | 🟡 HIGH | Home automation config |
| gitea-vm | 300 | 🟡 HIGH | Git repositories |
| pihole | 200 | 🟢 MEDIUM | DNS config (easy to rebuild) |
| traefik | 202 | 🟢 MEDIUM | Reverse proxy config |
| trading-vm | 301 | 🟡 HIGH | AI trading platform |
| lmdev1 | 111 | 🟢 LOW | Development (ephemeral) |
Recommended Backup Strategy
Tier 1: Local Snapshots (IMPLEMENT IMMEDIATELY)
ZFS Snapshots on TrueNAS
Schedule automatic snapshots for all datasets. The tiered schedule below is shown for vault/documents; apply the same pattern to the other critical datasets:
| Dataset | Frequency | Retention |
|---|---|---|
| vault/documents | Every 15 min | 1 hour |
| vault/documents | Hourly | 24 hours |
| vault/documents | Daily | 30 days |
| vault/documents | Weekly | 12 weeks |
| vault/documents | Monthly | 12 months |
Implementation:
# Via TrueNAS UI: Storage → Snapshots → Add
# Or via CLI:
ssh truenas 'zfs snapshot vault/documents@daily-$(date +%Y%m%d)'
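If the UI scheduler is not used, a minimal cron-driven sketch for one dataset could look like this (dataset name, retention count, and a Linux userland on TrueNAS SCALE are assumptions):
#!/bin/sh
# Hypothetical helper, e.g. /root/bin/zfs-daily-snapshot.sh, run once a day from cron
DATASET="vault/documents"
KEEP=30   # number of daily snapshots to retain
zfs snapshot "${DATASET}@daily-$(date +%Y%m%d)"
# Destroy everything except the newest $KEEP daily snapshots (GNU head syntax)
zfs list -H -t snapshot -o name -s creation \
  | grep "^${DATASET}@daily-" \
  | head -n -"${KEEP}" \
  | xargs -r -n1 zfs destroy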
Proxmox VM Backups
Configure weekly backups to local storage:
# Create backup job via Proxmox UI:
# Datacenter → Backup → Add
# - Schedule: Weekly (Sunday 2 AM)
# - Storage: local-zfs or nvme-mirror1
# - Mode: Snapshot (fast)
# - Retention: 4 backups
Or via CLI:
ssh pve 'pvesh create /cluster/backup --schedule "sun 02:00" --storage local-zfs --mode snapshot --prune-backups keep-last=4'
Tier 2: Offsite Backups (CRITICAL GAP)
Option A: Cloud Storage (Recommended)
Use rclone or restic to sync critical data to cloud storage:
| Provider | Cost | Pros | Cons |
|---|---|---|---|
| Backblaze B2 | $6/TB/mo | Cheap, reliable | Egress fees |
| AWS S3 Glacier | $4/TB/mo | Very cheap storage | Slow retrieval |
| Wasabi | $6.99/TB/mo | No egress fees | Minimum 90-day retention |
Implementation Example (Backblaze B2):
# Install on TrueNAS
ssh truenas 'pkg install rclone restic'
# Configure B2
rclone config # Follow prompts for B2
# Daily backup of critical folders (crontab entry on TrueNAS)
0 3 * * * rclone sync /mnt/vault/documents b2:homelab-backup/documents --transfers 4
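Because rclone sync mirrors deletions just like Syncthing does, restic (also mentioned above) is worth considering instead: it keeps encrypted, versioned snapshots in the bucket. A minimal sketch, assuming the bucket name homelab-backup and B2 credentials exported in the environment:
# One-time: initialise an encrypted repository in the B2 bucket
export B2_ACCOUNT_ID=... B2_ACCOUNT_KEY=...
restic -r b2:homelab-backup:documents init
# Daily: back up, then expire old snapshots to match the retention policy
restic -r b2:homelab-backup:documents backup /mnt/vault/documents
restic -r b2:homelab-backup:documents forget --keep-daily 30 --keep-weekly 12 --prune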
Option B: Offsite TrueNAS Replication
- Set up second TrueNAS at friend/family member's house
- Use ZFS replication to sync snapshots (see the sketch below)
- Requires: Static IP or Tailscale, plus a trusted host at the remote site
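The replication itself can be as simple as piping zfs send over SSH; the remote pool name and the offsite SSH alias below are assumptions, and TrueNAS can also manage this through its built-in replication tasks:
# Run on the primary TrueNAS; 'offsite' is an assumed SSH alias for the remote box
# Initial full replication of one snapshot:
zfs send -R vault/documents@daily-20251220 | ssh offsite zfs receive -F backup/documents
# Later runs send only the delta between two snapshots:
zfs send -R -i @daily-20251220 vault/documents@daily-20251221 | ssh offsite zfs receive -F backup/documents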
Option C: USB Drive Rotation
- Weekly backup to an external USB drive (see the sketch below)
- Rotate 2-3 drives (one always offsite)
- Manual but simple
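A minimal sketch of the weekly copy, assuming the current drive is mounted at /mnt/usb-backup on TrueNAS:
# Plain archive copy; deletions are deliberately not propagated to the USB copy
ssh truenas 'rsync -a /mnt/vault/documents/ /mnt/usb-backup/documents/'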
Tier 3: Configuration Backups
Proxmox Configuration
# Backup /etc/pve (configs are already in cluster filesystem)
# But also backup to external location:
ssh pve 'tar czf /tmp/pve-config-$(date +%Y%m%d).tar.gz /etc/pve /etc/network/interfaces /etc/systemd/system/*.service'
# Copy to safe location
scp pve:/tmp/pve-config-*.tar.gz ~/Backups/proxmox/
VM-Specific Configs
- Traefik configs: /etc/traefik/ on CT 202
- Saltbox configs: /srv/git/saltbox/ on VM 101
- Home Assistant: /config/ on VM 110
Script to backup all configs:
#!/bin/bash
# Save as ~/bin/backup-homelab-configs.sh
set -euo pipefail
DATE=$(date +%Y%m%d)
BACKUP_DIR=~/Backups/homelab-configs/$DATE
mkdir -p "$BACKUP_DIR"
# Proxmox configs (paths go inside the remote command so tar sees them)
ssh pve 'tar czf - /etc/pve /etc/network' > "$BACKUP_DIR/pve-config.tar.gz"
ssh pve2 'tar czf - /etc/pve /etc/network' > "$BACKUP_DIR/pve2-config.tar.gz"
# Traefik (container, via pct exec on the host)
ssh pve 'pct exec 202 -- tar czf - /etc/traefik' > "$BACKUP_DIR/traefik-config.tar.gz"
# Saltbox
ssh saltbox 'tar czf - /srv/git/saltbox' > "$BACKUP_DIR/saltbox-config.tar.gz"
# Home Assistant - qm guest exec wraps output in JSON and cannot stream a tarball,
# so this assumes direct SSH access to the VM (hostname alias is an assumption)
ssh homeassistant 'tar czf - /config' > "$BACKUP_DIR/homeassistant-config.tar.gz"
echo "Configs backed up to $BACKUP_DIR"
Disaster Recovery Scenarios
Scenario 1: Single VM Failure
Impact: Medium
Recovery Time: 30-60 minutes
- Restore from Proxmox backup: ssh pve 'qmrestore /path/to/backup.vma.zst VMID'
- Start VM and verify
- Update IP if needed
Scenario 2: TrueNAS Failure
Impact: CATASTROPHIC (all storage lost)
Recovery Time: Unknown - NO PLAN
Current State: 🚨 NO RECOVERY PLAN
Needed:
- Offsite backup of critical datasets
- Documented ZFS pool creation steps
- Share configuration export (see the sketch below)
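A hedged sketch of grabbing the TrueNAS configuration database; the path below is the usual location but should be verified on this install, and the UI export (System → General → Save Config) remains the supported route:
# Copy the TrueNAS config DB to the workstation's backup folder
scp truenas:/data/freenas-v1.db ~/Backups/truenas/truenas-config-$(date +%Y%m%d).db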
Scenario 3: Complete PVE Server Failure
Impact: SEVERE
Recovery Time: 4-8 hours
Current State: ⚠️ PARTIALLY RECOVERABLE
Needed:
- VM backups stored on TrueNAS or PVE2
- Proxmox reinstall procedure
- Network config documentation
Scenario 4: Complete Site Disaster (Fire/Flood)
Impact: TOTAL LOSS
Recovery Time: Unknown
Current State: 🚨 NO RECOVERY PLAN
Needed:
- Offsite backups (cloud or physical)
- Critical data prioritization
- Restore procedures
Action Plan
Immediate (Next 7 Days)
- Audit existing backups: Check if ZFS snapshots or Proxmox backups exist
  ssh truenas 'zfs list -t snapshot'
  ssh pve 'ls -lh /var/lib/vz/dump/'
- Enable ZFS snapshots: Configure via TrueNAS UI for critical datasets
- Configure Proxmox backup jobs: Weekly backups of critical VMs (100, 101, 110, 300)
- Test restore: Pick one VM, back it up, restore it to verify the process works (see the sketch below)
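A hedged test-restore sketch, assuming lmdev1 (111) is a QEMU VM rather than a container and that the local storage accepts backup content; substitute the archive name vzdump actually produces:
# Back up a low-priority VM, restore it to a spare VMID, and boot it to verify
ssh pve 'vzdump 111 --mode snapshot --compress zstd --storage local'
ssh pve 'qmrestore /var/lib/vz/dump/vzdump-qemu-111-<timestamp>.vma.zst 999 --storage local-zfs'
ssh pve 'qm start 999'
# Clean up the test VM afterwards
ssh pve 'qm stop 999 && qm destroy 999'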
Short-term (Next 30 Days)
- Set up offsite backup: Choose provider (Backblaze B2 recommended)
- Install backup tools: rclone or restic on TrueNAS
- Configure daily cloud sync: Critical folders to cloud storage
- Document restore procedures: Step-by-step guides for each scenario
Long-term (Next 90 Days)
- Implement monitoring: Alerts for backup failures
- Quarterly restore test: Verify backups actually work
- Backup rotation policy: Automate old backup cleanup
- Configuration backup automation: Weekly cron job
Monitoring & Validation
Backup Health Checks
# Check last ZFS snapshot
ssh truenas 'zfs list -t snapshot -o name,creation -s creation | tail -5'
# Check Proxmox backup status
ssh pve 'pvesh get /cluster/backup-info/not-backed-up'
# Check cloud sync status (if using rclone)
ssh truenas 'rclone ls b2:homelab-backup | wc -l'
Alerts to Set Up
- Email alert if no snapshot created in 24 hours
- Email alert if Proxmox backup fails
- Email alert if cloud sync fails
- Weekly backup status report
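A minimal sketch of the first alert, assuming a working mail command (or any other notification hook) on the machine running the check:
#!/bin/sh
# Hypothetical check, run hourly from cron: alert if the newest TrueNAS snapshot is older than 24h
LATEST=$(ssh truenas "zfs list -Hp -t snapshot -o creation -s creation | tail -1")
AGE=$(( $(date +%s) - LATEST ))
if [ "$AGE" -gt 86400 ]; then
  echo "Newest TrueNAS snapshot is ${AGE}s old" | mail -s "Backup alert: snapshots stale" admin@example.com
fi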
Cost Estimate
Monthly Backup Costs:
| Component | Cost | Notes |
|---|---|---|
| Local storage (already owned) | $0 | Using existing TrueNAS |
| Proxmox backups (local) | $0 | Using existing storage |
| Cloud backup (1 TB) | $6-10/mo | Backblaze B2 or Wasabi |
| Total | ~$10/mo | Minimal cost for peace of mind |
One-time:
- External USB drives (3x 4 TB): ~$300 (optional, for rotation backups)
Related Documentation
- STORAGE.md - ZFS pool layouts and capacity
- VMS.md - VM inventory and prioritization
- DISASTER-RECOVERY.md - Recovery procedures (coming soon)
Last Updated: 2025-12-22
Status: 🚨 CRITICAL GAPS - IMMEDIATE ACTION REQUIRED