Compare commits

...

25 Commits

Author SHA1 Message Date
Hutson
38a7a2c52e Auto-sync: 20260123-015626 2026-01-23 01:56:27 -05:00
Hutson
52d8f2f133 Add central configuration reference section
Reference ~/.secrets, ~/.hosts, and ~/.ssh/config for centralized
credentials and host management. Includes homelab-specific variables
for Syncthing, Home Assistant, n8n, and Cloudflare.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-20 15:13:16 -05:00
Hutson
80b6ab43d3 Auto-sync: 20260120-145048 2026-01-20 14:50:49 -05:00
Hutson
6932ee1ca9 Auto-sync: 20260116-161159 2026-01-16 16:12:19 -05:00
Hutson
42cfdd8552 Auto-sync: 20260116-155016 2026-01-16 15:50:17 -05:00
Hutson
d54447949e Add Oura Ring integration and automations documentation
- Document HACS and Oura Ring v2 integration setup
- Add OAuth credentials for Oura developer portal
- Document 9 Oura automations:
  - Sleep/wake detection (HR-based thermostat control)
  - Health alerts (low readiness, SpO2, fever detection)
  - Sleep comfort (temperature-based thermostat adjustment)
  - Activity reminders (sedentary alert)
- Add Nest thermostat to integrations list
- Mark completed TODOs

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-16 15:25:21 -05:00
Hutson
4535969566 Auto-sync: 20260116-152013 2026-01-16 15:20:14 -05:00
Hutson
8c1cbf3dac Auto-sync: 20260116-150510 2026-01-16 15:05:12 -05:00
Hutson
d38de8bfb1 Auto-sync: 20260115-110247 2026-01-15 11:02:48 -05:00
Hutson
db7ac68312 Auto-sync: 20260114-183121 2026-01-14 18:31:23 -05:00
Hutson
bd3ed4e4ef Auto-sync: 20260114-002941 2026-01-14 00:29:42 -05:00
Hutson
e7c8d7f86f Auto-sync: 20260113-134342 2026-01-13 13:43:43 -05:00
Hutson
1dcb7ff9e5 Auto-sync: 20260113-093539 2026-01-13 09:35:40 -05:00
Hutson
f234fe96cb Auto-sync: 20260113-015009 2026-01-13 01:50:10 -05:00
Hutson
1abd618b52 Auto-sync: 20260113-013507 2026-01-13 01:35:08 -05:00
Hutson
35fba5a6ae Auto-sync: 20260113-012006 2026-01-13 01:20:07 -05:00
Hutson
eb698f0c38 Auto-sync: 20260111-164757 2026-01-11 16:47:58 -05:00
Hutson
d66ed5c55a Auto-sync: 20260111-161755 2026-01-11 16:17:56 -05:00
Hutson
5ac698db0d Auto-sync: 20260107-000953 2026-01-07 00:09:54 -05:00
Hutson
7eacc846e6 Auto-sync: 20260105-213809 2026-01-05 21:38:10 -05:00
Hutson
b832cc9e57 Auto-sync: 20260105-212307 2026-01-05 21:23:08 -05:00
Hutson
54a71124ae Auto-sync: 20260105-172251 2026-01-05 17:22:52 -05:00
Hutson
eddd98c57f Auto-sync: 20260105-122831 2026-01-05 12:28:33 -05:00
Hutson
56b82df497 Complete Phase 2 documentation: Add HARDWARE, SERVICES, MONITORING, MAINTENANCE
Phase 2 documentation implementation:
- Created HARDWARE.md: Complete hardware inventory (servers, GPUs, storage, network cards)
- Created SERVICES.md: Service inventory with URLs, credentials, health checks (25+ services)
- Created MONITORING.md: Health monitoring recommendations, alert setup, implementation plan
- Created MAINTENANCE.md: Regular procedures, update schedules, testing checklists
- Updated README.md: Added all Phase 2 documentation links
- Updated CLAUDE.md: Cleaned up to quick reference only (1340→377 lines)

All detailed content now in specialized documentation files with cross-references.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-23 00:34:21 -05:00
Hutson
23e9df68c9 Update Happy Coder docs with complete setup flow and troubleshooting
- Expand Mobile Access Setup with full authentication steps
  (HAPPY_SERVER_URL, happy auth login, happy connect claude, local claude login)
- Fix launchd path: ~/Library/LaunchAgents/ not /Library/LaunchDaemons/
- Add Common Issues troubleshooting table with fixes for:
  - Invalid API key (Claude not logged in locally)
  - Failed to start daemon (stale lock files)
  - Sessions not showing (missing HAPPY_SERVER_URL)
  - Slow responses (Cloudflare proxy enabled)
- Update DNS note: Cloudflare proxy disabled for WebSocket performance
- Add .zshrc to Files & Configuration table

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-21 13:28:30 -05:00
33 changed files with 9873 additions and 1022 deletions


@@ -0,0 +1,5 @@
# This directory is a Syncthing folder marker.
# Do not delete.
folderID: homelab
created: 2025-12-23T00:39:52-05:00

AUTOMATION-WELCOME-HOME.md (new file, +190)

@@ -0,0 +1,190 @@
# Welcome Home Automation
## Overview
Automatically turns on lights when you arrive home after sunset, creating a warm welcome.
## Status
- **Created:** 2026-01-14
- **State:** Active (enabled)
- **Entity ID:** `automation.welcome_home`
- **Last Triggered:** Never (newly created)
## How It Works
### Trigger
- Activates when **person.hutson** enters **zone.home** (100m radius)
- GPS tracking via device_tracker.honor (Honor phone)
### Conditions
The automation only runs when it's dark:
- After sunset (with 30-minute early start) **OR**
- Before sunrise
This prevents lights from turning on during daytime arrivals.
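To sanity-check the dark-hours condition, the `sun.sun` entity can be inspected directly; a quick sketch (assumes `HA_TOKEN` is exported the same way as in the examples further down):
```bash
# What the sun-based condition sees right now
curl -s -H "Authorization: Bearer $HA_TOKEN" \
  "http://10.10.10.210:8123/api/states/sun.sun" | \
  python3 -c "import json, sys; s = json.load(sys.stdin); print(s['state'], s['attributes']['next_setting'], s['attributes']['next_rising'])"
```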
### Actions
When triggered, the following lights turn on:
| Light | Brightness | Purpose |
|-------|------------|---------|
| **Living Room** | 75% | Main ambient lighting |
| **Living Room Lamp** | 60% | Softer accent light |
| **Kitchen** | 80% | Task lighting for entry |
## Climate Control Note
No climate/heating entities were found in your Home Assistant setup. To add heating control in the future:
1. Integrate your thermostat/HVAC with Home Assistant
2. Add a climate action to this automation (see customization below)
## Customization
### Adjust Trigger Distance
The home zone has a 100m radius. To change this:
```bash
# In Home Assistant UI: Settings → Areas → Zones → Home
# Or via API:
curl -X PUT \
-H "Authorization: Bearer $HA_TOKEN" \
-H "Content-Type: application/json" \
-d '{"latitude": 35.6542655, "longitude": -78.7417665, "radius": 150}' \
"http://10.10.10.210:8123/api/config/zone/zone.home"
```
### Add More Lights
To add additional lights (e.g., Office, Front Porch):
```bash
HA_TOKEN="your-token-here"
# Get current config
curl -s -H "Authorization: Bearer $HA_TOKEN" \
"http://10.10.10.210:8123/api/config/automation/config/welcome_home" > automation.json
# Edit automation.json to add more light actions
# Then update:
curl -X POST \
-H "Authorization: Bearer $HA_TOKEN" \
-H "Content-Type: application/json" \
-d @automation.json \
"http://10.10.10.210:8123/api/config/automation/config/welcome_home"
```
### Add Climate Control (when available)
Add this action to the automation:
```json
{
  "service": "climate.set_temperature",
  "target": {
    "entity_id": "climate.thermostat"
  },
  "data": {
    "temperature": 72,
    "hvac_mode": "heat"
  }
}
```
### Use a Scene Instead
To activate a predefined scene instead of individual lights:
```json
{
  "service": "scene.turn_on",
  "target": {
    "entity_id": "scene.living_room_relax"
  }
}
```
Available scenes include:
- `scene.living_room_relax`
- `scene.living_room_dimmed`
- `scene.all_nightlight`
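To list every scene entity the instance currently exposes, one option (assumes `HA_TOKEN` is exported as in the testing examples below):
```bash
# List all scene entities currently defined in Home Assistant
curl -s -H "Authorization: Bearer $HA_TOKEN" \
  "http://10.10.10.210:8123/api/states" | \
  python3 -c "import json, sys; [print(s['entity_id']) for s in json.load(sys.stdin) if s['entity_id'].startswith('scene.')]"
```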
## Testing
### Manual Trigger
```bash
HA_TOKEN="eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiIwZThjZmJjMzVlNDA0NzYwOTMzMjg3MTQ5ZjkwOGU2NyIsImlhdCI6MTc2NTk5MjQ4OCwiZXhwIjoyMDgxMzUyNDg4fQ.r743tsb3E5NNlrwEEu9glkZdiI4j_3SKIT1n5PGUytY"
curl -X POST \
-H "Authorization: Bearer $HA_TOKEN" \
-H "Content-Type: application/json" \
-d '{"entity_id": "automation.welcome_home"}' \
"http://10.10.10.210:8123/api/services/automation/trigger"
```
### Check Last Triggered
```bash
curl -s -H "Authorization: Bearer $HA_TOKEN" \
"http://10.10.10.210:8123/api/states/automation.welcome_home" | \
python3 -c "import json, sys; print(json.load(sys.stdin)['attributes']['last_triggered'])"
```
## Disable/Enable
### Disable
```bash
curl -X POST \
-H "Authorization: Bearer $HA_TOKEN" \
-H "Content-Type: application/json" \
-d '{"entity_id": "automation.welcome_home"}' \
"http://10.10.10.210:8123/api/services/automation/turn_off"
```
### Enable
```bash
curl -X POST \
-H "Authorization: Bearer $HA_TOKEN" \
-H "Content-Type: application/json" \
-d '{"entity_id": "automation.welcome_home"}' \
"http://10.10.10.210:8123/api/services/automation/turn_on"
```
## Monitoring
### View in Home Assistant UI
1. Go to http://10.10.10.210:8123
2. Settings → Automations & Scenes → Automations
3. Find "Welcome Home"
### Check Automation State
The automation is currently: **ON**
### Troubleshooting
If the automation doesn't trigger:
1. Check person.hutson GPS accuracy (should be < 50 m) - see the check below
2. Verify zone.home coordinates match your actual home location
3. Confirm the arrival happened during dark hours (the sun condition blocks daytime triggers)
4. Review Home Assistant logs for errors
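A shell sketch of the first two checks (assumes `HA_TOKEN` is exported as in the examples above):
```bash
# Dump the presence, zone, and tracker entities the trigger depends on
for e in person.hutson zone.home device_tracker.honor; do
  echo "=== $e ==="
  curl -s -H "Authorization: Bearer $HA_TOKEN" \
    "http://10.10.10.210:8123/api/states/$e" | python3 -m json.tool
done
```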
## Related Documentation
- [Home Assistant API](./HOMEASSISTANT.md)
- [Personal Assistant Integration](../personal-assistant/CLAUDE.md)
- [Smart Home Control](../personal-assistant/docs/services-matrix.md)
## Future Enhancements
Potential improvements:
- Add motion sensor override (don't trigger if motion already detected)
- Integrate with calendar (different scenes for work vs personal time)
- Add climate control when thermostat is integrated
- Create "leaving home" automation to turn off lights
- Add notification to phone when automation triggers
- Adjust brightness based on time of day
- Add office lights during work hours
---
*Created: 2026-01-14*
*Last Updated: 2026-01-14*

BACKUP-STRATEGY.md (new file, +358)

@@ -0,0 +1,358 @@
# Backup Strategy
## 🚨 Current Status: CRITICAL GAPS IDENTIFIED
This document outlines the backup strategy for the homelab infrastructure. **As of 2025-12-22, there are significant gaps in backup coverage that need to be addressed.**
## Executive Summary
### What We Have ✅
- **Syncthing**: File synchronization across 5+ devices
- **ZFS on TrueNAS**: Copy-on-write filesystem with snapshot capability (not yet configured)
- **Proxmox**: Built-in backup capabilities (not yet configured)
### What We DON'T Have 🚨
- ❌ No documented VM/CT backups
- ❌ No ZFS snapshot schedule
- ❌ No offsite backups
- ❌ No disaster recovery plan
- ❌ No tested restore procedures
- ❌ No configuration backups
**Risk Level**: HIGH - A catastrophic failure could result in significant data loss.
---
## Current State Analysis
### Syncthing (File Synchronization)
**What it is**: Real-time file sync across devices
**What it is NOT**: A backup solution
| Folder | Devices | Size | Protected? |
|--------|---------|------|------------|
| documents | Mac Mini, MacBook, TrueNAS, Windows PC, Phone | 11 GB | ⚠️ Sync only |
| downloads | Mac Mini, TrueNAS | 38 GB | ⚠️ Sync only |
| pictures | Mac Mini, MacBook, TrueNAS, Phone | Unknown | ⚠️ Sync only |
| notes | Mac Mini, MacBook, TrueNAS, Phone | Unknown | ⚠️ Sync only |
| config | Mac Mini, MacBook, TrueNAS | Unknown | ⚠️ Sync only |
**Limitations**:
- ❌ Accidental deletion → deleted everywhere
- ❌ Ransomware/corruption → spreads everywhere
- ❌ No point-in-time recovery
- ❌ No version history (unless file versioning enabled - not documented)
**Verdict**: Syncthing provides redundancy and availability, NOT backup protection.
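Whether per-folder file versioning is enabled can be confirmed from the Syncthing REST API; a sketch, assuming the Mac Mini instance and its API key exported as `SYNCTHING_API_KEY` (see the Syncthing API endpoints table elsewhere in these docs):
```bash
# Print the versioning type configured for each folder ("none" means no version history)
curl -s -H "X-API-Key: $SYNCTHING_API_KEY" "http://127.0.0.1:8384/rest/config/folders" | \
  python3 -c "import json, sys; [print(f['id'], '->', f['versioning'].get('type') or 'none') for f in json.load(sys.stdin)]"
```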
### ZFS on TrueNAS (Potential Backup Target)
**Current Status**: ❓ Unknown - snapshots may or may not be configured
**Needs Investigation**:
```bash
# Check if snapshots exist
ssh truenas 'zfs list -t snapshot'
# Check if automated snapshots are configured
ssh truenas 'cat /etc/cron.d/zfs-auto-snapshot' || echo "Not configured"
# Check snapshot schedule via TrueNAS API/UI
```
**If configured**, ZFS snapshots provide:
- ✅ Point-in-time recovery
- ✅ Protection against accidental deletion
- ✅ Fast rollback capability
- ⚠️ Still single location (no offsite protection)
### Proxmox VM/CT Backups
**Current Status**: ❓ Unknown - no backup jobs documented
**Needs Investigation**:
```bash
# Check backup configuration
ssh pve 'pvesh get /cluster/backup'
# Check if any backups exist
ssh pve 'ls -lh /var/lib/vz/dump/'
ssh pve2 'ls -lh /var/lib/vz/dump/'
```
**Critical VMs Needing Backup**:
| VM/CT | VMID | Priority | Notes |
|-------|------|----------|-------|
| TrueNAS | 100 | 🔴 CRITICAL | All storage lives here |
| Saltbox | 101 | 🟡 HIGH | Media stack, complex config |
| homeassistant | 110 | 🟡 HIGH | Home automation config |
| gitea-vm | 300 | 🟡 HIGH | Git repositories |
| pihole | 200 | 🟢 MEDIUM | DNS config (easy to rebuild) |
| traefik | 202 | 🟢 MEDIUM | Reverse proxy config |
| trading-vm | 301 | 🟡 HIGH | AI trading platform |
| lmdev1 | 111 | 🟢 LOW | Development (ephemeral) |
---
## Recommended Backup Strategy
### Tier 1: Local Snapshots (IMPLEMENT IMMEDIATELY)
**ZFS Snapshots on TrueNAS**
Schedule automatic snapshots for all datasets:
| Dataset | Frequency | Retention |
|---------|-----------|-----------|
| vault/documents | Every 15 min | 1 hour |
| vault/documents | Hourly | 24 hours |
| vault/documents | Daily | 30 days |
| vault/documents | Weekly | 12 weeks |
| vault/documents | Monthly | 12 months |
**Implementation**:
```bash
# Via TrueNAS UI: Storage → Snapshots → Add
# Or via CLI:
ssh truenas 'zfs snapshot vault/documents@daily-$(date +%Y%m%d)'
```
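The one-off snapshot above does not implement the retention tiers in the table; periodic snapshot tasks in the TrueNAS UI are the intended mechanism, but a rough cron-based sketch of two of the tiers on TrueNAS looks like this (dataset, retention counts, and GNU `head -n -N` behaviour are assumptions):
```bash
# Hourly and daily snapshots of vault/documents, with crude pruning of old hourly snapshots
0 * * * * zfs snapshot vault/documents@hourly-$(date +\%Y\%m\%d\%H)
0 0 * * * zfs snapshot vault/documents@daily-$(date +\%Y\%m\%d)
30 0 * * * zfs list -H -t snapshot -o name -s creation -r vault/documents | grep '@hourly-' | head -n -24 | xargs -r -n1 zfs destroy
```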
**Proxmox VM Backups**
Configure weekly backups to local storage:
```bash
# Create backup job via Proxmox UI:
# Datacenter → Backup → Add
# - Schedule: Weekly (Sunday 2 AM)
# - Storage: local-zfs or nvme-mirror1
# - Mode: Snapshot (fast)
# - Retention: 4 backups
```
**Or via CLI**:
```bash
ssh pve 'pvesh create /cluster/backup --schedule "sun 02:00" --storage local-zfs --mode snapshot --prune-backups keep-last=4'
```
### Tier 2: Offsite Backups (CRITICAL GAP)
**Option A: Cloud Storage (Recommended)**
Use **rclone** or **restic** to sync critical data to cloud:
| Provider | Cost | Pros | Cons |
|----------|------|------|------|
| Backblaze B2 | $6/TB/mo | Cheap, reliable | Egress fees |
| AWS S3 Glacier | $4/TB/mo | Very cheap storage | Slow retrieval |
| Wasabi | $6.99/TB/mo | No egress fees | Minimum 90-day retention |
**Implementation Example (Backblaze B2)**:
```bash
# Install on TrueNAS
ssh truenas 'pkg install rclone restic'
# Configure B2
rclone config # Follow prompts for B2
# Daily backup of critical folders (add this line to crontab, e.g. via crontab -e)
0 3 * * * rclone sync /mnt/vault/documents b2:homelab-backup/documents --transfers 4
```
**Option B: Offsite TrueNAS Replication**
- Set up second TrueNAS at friend/family member's house
- Use ZFS replication to sync snapshots (see the sketch below)
- Requires: Static IP or Tailscale, trust
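A minimal replication sketch for Option B, assuming a receiving host reachable as `offsite-truenas` and a destination pool named `backup` (both placeholders):
```bash
# Run on the primary TrueNAS; the first run sends the full dataset,
# later runs should use incremental sends (zfs send -R -i <prev> <new>)
SNAP="vault/documents@offsite-$(date +%Y%m%d)"
zfs snapshot -r "$SNAP"
zfs send -R "$SNAP" | ssh offsite-truenas zfs receive -uF backup/documents
```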
**Option C: USB Drive Rotation**
- Weekly backup to external USB drive
- Rotate 2-3 drives (one always offsite)
- Manual but simple
### Tier 3: Configuration Backups
**Proxmox Configuration**
```bash
# Backup /etc/pve (configs are already in cluster filesystem)
# But also backup to external location:
ssh pve 'tar czf /tmp/pve-config-$(date +%Y%m%d).tar.gz /etc/pve /etc/network/interfaces /etc/systemd/system/*.service'
# Copy to safe location
scp pve:/tmp/pve-config-*.tar.gz ~/Backups/proxmox/
```
**VM-Specific Configs**
- Traefik configs: `/etc/traefik/` on CT 202
- Saltbox configs: `/srv/git/saltbox/` on VM 101
- Home Assistant: `/config/` on VM 110
**Script to backup all configs**:
```bash
#!/bin/bash
# Save as ~/bin/backup-homelab-configs.sh
DATE=$(date +%Y%m%d)
BACKUP_DIR=~/Backups/homelab-configs/$DATE
mkdir -p "$BACKUP_DIR"
# Proxmox configs
ssh pve 'tar czf - /etc/pve /etc/network' > "$BACKUP_DIR/pve-config.tar.gz"
ssh pve2 'tar czf - /etc/pve /etc/network' > "$BACKUP_DIR/pve2-config.tar.gz"
# Traefik (LXC 202)
ssh pve 'pct exec 202 -- tar czf - /etc/traefik' > "$BACKUP_DIR/traefik-config.tar.gz"
# Saltbox (VM 101)
ssh saltbox 'tar czf - /srv/git/saltbox' > "$BACKUP_DIR/saltbox-config.tar.gz"
# Home Assistant (VM 110) - qm guest exec wraps stdout in JSON, so SSH into HA OS directly
ssh root@10.10.10.210 'tar czf - /config' > "$BACKUP_DIR/homeassistant-config.tar.gz"
echo "Configs backed up to $BACKUP_DIR"
```
---
## Disaster Recovery Scenarios
### Scenario 1: Single VM Failure
**Impact**: Medium
**Recovery Time**: 30-60 minutes
1. Restore from Proxmox backup:
```bash
ssh pve 'qmrestore /path/to/backup.vma.zst VMID'
```
2. Start VM and verify
3. Update IP if needed
### Scenario 2: TrueNAS Failure
**Impact**: CATASTROPHIC (all storage lost)
**Recovery Time**: Unknown - NO PLAN
**Current State**: 🚨 NO RECOVERY PLAN
**Needed**:
- Offsite backup of critical datasets
- Documented ZFS pool creation steps
- Share configuration export
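Even before offsite backups exist, the pool layout can be captured now so a rebuild document can be written from it; a sketch:
```bash
# Capture the current pool and dataset layout for a future rebuild procedure
ssh truenas 'zpool status vault' > truenas-pool-layout.txt
ssh truenas 'zfs list -r -o name,used,avail,mountpoint vault' > truenas-datasets.txt
ssh truenas 'zfs get -r -s local all vault' > truenas-dataset-properties.txt
```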
### Scenario 3: Complete PVE Server Failure
**Impact**: SEVERE
**Recovery Time**: 4-8 hours
**Current State**: ⚠️ PARTIALLY RECOVERABLE
**Needed**:
- VM backups stored on TrueNAS or PVE2
- Proxmox reinstall procedure
- Network config documentation
### Scenario 4: Complete Site Disaster (Fire/Flood)
**Impact**: TOTAL LOSS
**Recovery Time**: Unknown
**Current State**: 🚨 NO RECOVERY PLAN
**Needed**:
- Offsite backups (cloud or physical)
- Critical data prioritization
- Restore procedures
---
## Action Plan
### Immediate (Next 7 Days)
- [ ] **Audit existing backups**: Check if ZFS snapshots or Proxmox backups exist
```bash
ssh truenas 'zfs list -t snapshot'
ssh pve 'ls -lh /var/lib/vz/dump/'
```
- [ ] **Enable ZFS snapshots**: Configure via TrueNAS UI for critical datasets
- [ ] **Configure Proxmox backup jobs**: Weekly backups of critical VMs (100, 101, 110, 300)
- [ ] **Test restore**: Pick one VM, back it up, restore it to verify process works
### Short-term (Next 30 Days)
- [ ] **Set up offsite backup**: Choose provider (Backblaze B2 recommended)
- [ ] **Install backup tools**: rclone or restic on TrueNAS
- [ ] **Configure daily cloud sync**: Critical folders to cloud storage
- [ ] **Document restore procedures**: Step-by-step guides for each scenario
### Long-term (Next 90 Days)
- [ ] **Implement monitoring**: Alerts for backup failures
- [ ] **Quarterly restore test**: Verify backups actually work
- [ ] **Backup rotation policy**: Automate old backup cleanup
- [ ] **Configuration backup automation**: Weekly cron job
---
## Monitoring & Validation
### Backup Health Checks
```bash
# Check last ZFS snapshot
ssh truenas 'zfs list -t snapshot -o name,creation -s creation | tail -5'
# Check Proxmox backup status
ssh pve 'pvesh get /cluster/backup-info/not-backed-up'
# Check cloud sync status (if using rclone)
ssh truenas 'rclone ls b2:homelab-backup | wc -l'
```
### Alerts to Set Up
- Email alert if no snapshot created in 24 hours
- Email alert if Proxmox backup fails
- Email alert if cloud sync fails
- Weekly backup status report
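A minimal cron-able sketch of the first alert, assuming outgoing mail works and using the same placeholder address as the MAINTENANCE.md examples:
```bash
#!/bin/bash
# Alert if the newest snapshot under 'vault' is older than 24 hours
LAST=$(ssh truenas 'zfs list -H -p -r -t snapshot -o creation -s creation vault | tail -1')
NOW=$(date +%s)
if [ -z "$LAST" ] || [ $((NOW - LAST)) -gt 86400 ]; then
  echo "No ZFS snapshot created on vault in the last 24 hours" | \
    mail -s "Backup alert: stale snapshots" hutson@example.com
fi
```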
---
## Cost Estimate
**Monthly Backup Costs**:
| Component | Cost | Notes |
|-----------|------|-------|
| Local storage (already owned) | $0 | Using existing TrueNAS |
| Proxmox backups (local) | $0 | Using existing storage |
| Cloud backup (1 TB) | $6-10/mo | Backblaze B2 or Wasabi |
| **Total** | **~$10/mo** | Minimal cost for peace of mind |
**One-time**:
- External USB drives (3× 4 TB): ~$300 (optional, for rotation backups)
---
## Related Documentation
- [STORAGE.md](STORAGE.md) - ZFS pool layouts and capacity
- [VMS.md](VMS.md) - VM inventory and prioritization
- [DISASTER-RECOVERY.md](#) - Recovery procedures (coming soon)
---
**Last Updated**: 2025-12-22
**Status**: 🚨 CRITICAL GAPS - IMMEDIATE ACTION REQUIRED


@@ -36,12 +36,12 @@ Investigated UPS power limit issues across both Proxmox servers.
[Unit]
Description=Disable KSM (Kernel Same-page Merging)
After=multi-user.target
[Service]
Type=oneshot
ExecStart=/bin/sh -c "echo 0 > /sys/kernel/mm/ksm/run"
RemainAfterExit=yes
[Install]
WantedBy=multi-user.target
```
@@ -108,12 +108,12 @@ curl -X POST -H "X-API-Key: xxx" http://localhost:20910/rest/system/restart
[Unit]
Description=Set CPU governor to powersave with balance_power EPP
After=multi-user.target
[Service]
Type=oneshot
ExecStart=/bin/bash -c 'for gov in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo powersave > "$gov"; done; for epp in /sys/devices/system/cpu/cpu*/cpufreq/energy_performance_preference; do echo balance_power > "$epp"; done'
RemainAfterExit=yes
[Install]
WantedBy=multi-user.target
```
@@ -127,12 +127,12 @@ curl -X POST -H "X-API-Key: xxx" http://localhost:20910/rest/system/restart
[Unit]
Description=Set CPU governor to schedutil for power savings
After=multi-user.target
[Service]
Type=oneshot
ExecStart=/bin/bash -c 'for gov in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo schedutil > "$gov"; done'
RemainAfterExit=yes
[Install]
WantedBy=multi-user.target
```
@@ -194,4 +194,4 @@ Not useful when:
- `general_profit` is negative
### What is Memory Ballooning?
Guest-cooperative memory management. Hypervisor can request VMs to give back unused RAM. Independent from KSMD. Both are Proxmox/KVM memory optimization features but serve different purposes.

CLAUDE.md (1336 lines changed)

File diff suppressed because it is too large

GATEWAY.md (new file, +339)

@@ -0,0 +1,339 @@
# UniFi Gateway (UCG-Fiber)
Documentation for the UniFi Cloud Gateway Fiber (10.10.10.1) - the primary network gateway and router.
## Overview
| Property | Value |
|----------|-------|
| **Device** | UniFi Cloud Gateway Fiber (UCG-Fiber) |
| **IP Address** | 10.10.10.1 |
| **SSH User** | root |
| **SSH Auth** | SSH key (`~/.ssh/id_ed25519`) |
| **Host Aliases** | `ucg-fiber`, `gateway` |
| **Firmware** | v4.4.9 (as of 2026-01-02) |
| **UniFi Core** | 4.4.19 |
| **RAM** | 2.9 GB (shared with UniFi apps) |
---
## SSH Access
SSH key authentication is configured. Use host aliases:
```bash
# Quick access
ssh ucg-fiber 'hostname'
ssh gateway 'free -m'
# Or use IP directly
ssh root@10.10.10.1 'uptime'
```
**Note**: SSH key may need re-deployment after firmware updates if UniFi clears authorized_keys.
---
## Monitoring Services
Two custom monitoring services run on the gateway to prevent and diagnose issues.
### Internet Watchdog Service
**Purpose**: Auto-reboots gateway if internet connectivity is lost for 5+ minutes
**Location**: `/data/scripts/internet-watchdog.sh`
**How it works** (sketch below):
1. Pings 1.1.1.1, 8.8.8.8, 208.67.222.222 every 60 seconds
2. If all three fail, increments failure counter
3. After 5 consecutive failures (~5 minutes), triggers reboot
4. Logs all activity to `/var/log/internet-watchdog.log`
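The deployed script lives at `/data/scripts/internet-watchdog.sh` and is not reproduced here; a minimal sketch of the same logic (targets, threshold, and log format as described above):
```bash
#!/bin/sh
# Illustrative watchdog loop: reboot after 5 consecutive minutes with no reachable target
FAILS=0
LOG=/var/log/internet-watchdog.log
while true; do
  if ping -c1 -W2 1.1.1.1 >/dev/null 2>&1 || ping -c1 -W2 8.8.8.8 >/dev/null 2>&1 || ping -c1 -W2 208.67.222.222 >/dev/null 2>&1; then
    [ "$FAILS" -gt 0 ] && echo "$(date '+%F %T') - Internet restored after $FAILS failures" >> "$LOG"
    FAILS=0
  else
    FAILS=$((FAILS + 1))
    echo "$(date '+%F %T') - Internet check failed ($FAILS/5)" >> "$LOG"
    if [ "$FAILS" -ge 5 ]; then
      echo "$(date '+%F %T') - Rebooting gateway" >> "$LOG"
      reboot
    fi
  fi
  sleep 60
done
```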
**Commands**:
```bash
# Check service status
ssh ucg-fiber 'systemctl status internet-watchdog'
# View recent logs
ssh ucg-fiber 'tail -50 /var/log/internet-watchdog.log'
# Stop temporarily (if troubleshooting)
ssh ucg-fiber 'systemctl stop internet-watchdog'
# Restart
ssh ucg-fiber 'systemctl restart internet-watchdog'
```
**Log Format**:
```
2026-01-02 22:45:01 - Watchdog started
2026-01-02 22:46:01 - Internet check failed (1/5)
2026-01-02 22:47:01 - Internet restored after 1 failures
```
---
### Memory Monitor Service
**Purpose**: Logs memory usage and top processes every 10 minutes for diagnostics
**Location**: `/data/scripts/memory-monitor.sh`
**Log File**: `/data/logs/memory-history.log`
**How it works**:
1. Every 10 minutes, logs current memory usage (`free -m`)
2. Logs top 12 memory-consuming processes
3. Auto-rotates log when it exceeds 10MB (keeps one .old file)
**Commands**:
```bash
# Check service status
ssh ucg-fiber 'systemctl status memory-monitor'
# View recent memory history
ssh ucg-fiber 'tail -100 /data/logs/memory-history.log'
# Check current memory usage
ssh ucg-fiber 'free -m'
# See top memory consumers right now
ssh ucg-fiber 'ps -eo pid,rss,comm --sort=-rss | head -12'
```
**Log Format**:
```
========== 2026-01-02 22:30:00 ==========
--- MEMORY ---
total used free shared buff/cache available
Mem: 2892 1890 102 456 899 1002
Swap: 512 88 424
--- TOP MEMORY PROCESSES ---
PID RSS COMMAND
1234 327456 unifi-protect
2345 252108 mongod
3456 236544 java
...
```
---
## Known Memory Consumers
| Process | Typical Memory | Purpose |
|---------|----------------|---------|
| unifi-protect | ~320 MB | Camera/NVR management |
| mongod | ~250 MB | UniFi configuration database |
| java (controller) | ~230 MB | UniFi Network controller |
| postgres | ~180 MB | PostgreSQL database |
| unifi-core | ~150 MB | UniFi OS core |
| tailscaled | ~80 MB | Tailscale VPN |
**Total available**: ~2.9 GB
**Typical usage**: ~1.8-2.0 GB (leaves ~1 GB free)
**Warning threshold**: <500 MB free
**Critical**: <200 MB free or swap >50% used
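A one-liner to spot-check against these thresholds:
```bash
# Available memory and swap used, in MB
ssh ucg-fiber "free -m | awk '/^Mem:/ {print \"available_mb\", \$7} /^Swap:/ {print \"swap_used_mb\", \$3}'"
```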
---
## Disabled Services
The following services were disabled to reduce memory usage:
| Service | Memory Saved | Reason Disabled |
|---------|--------------|-----------------|
| UniFi Connect | ~200 MB | Not needed (cameras use Protect) |
To re-enable if needed:
```bash
ssh ucg-fiber 'systemctl enable unifi-connect && systemctl start unifi-connect'
```
---
## Common Issues
### Gateway Freeze / Network Loss
**Symptoms**:
- All devices lose internet
- Cannot ping 10.10.10.1
- Physical reboot required
**Root Cause**: Memory exhaustion causing soft lockup
**Prevention**:
1. Internet watchdog auto-reboots after 5 min outage
2. Memory monitor logs help identify runaway processes
3. UniFi Connect disabled to free ~200 MB
**Post-Incident Analysis**:
```bash
# Check memory history for spike before freeze
ssh ucg-fiber 'grep -B5 "Swap:" /data/logs/memory-history.log | tail -50'
# Check watchdog logs
ssh ucg-fiber 'cat /var/log/internet-watchdog.log'
# Check system logs for errors
ssh ucg-fiber 'dmesg | tail -100'
ssh ucg-fiber 'journalctl -p err --since "1 hour ago"'
```
---
### High Memory Usage
**Check current state**:
```bash
ssh ucg-fiber 'free -m && echo "---" && ps -eo pid,rss,comm --sort=-rss | head -15'
```
**If swap is heavily used**:
```bash
# Check swap usage
ssh ucg-fiber 'cat /proc/swaps'
# See what's in swap
ssh ucg-fiber 'for pid in $(ls /proc | grep -E "^[0-9]+$"); do
swap=$(grep VmSwap /proc/$pid/status 2>/dev/null | awk "{print \$2}");
[ "$swap" -gt 10000 ] 2>/dev/null && echo "$pid: ${swap}kB - $(cat /proc/$pid/comm)";
done | sort -t: -k2 -rn | head -10'
```
**Consider reboot if**:
- Available memory <200 MB
- Swap usage >300 MB
- System becoming unresponsive
---
### Tailscale Issues
**Check Tailscale status**:
```bash
ssh ucg-fiber 'tailscale status'
```
**Common errors and fixes**:
| Error | Fix |
|-------|-----|
| `DNS resolution failed` | Check upstream DNS (Pi-hole at 10.10.10.10) |
| `TLS handshake failed` | Usually temporary; Tailscale auto-reconnects |
| `Not connected` | `ssh ucg-fiber 'tailscale up'` |
---
## Firmware Updates
**Check current version**:
```bash
ssh ucg-fiber 'ubnt-systool version'
```
**Update process**:
1. Check UniFi site for latest stable firmware
2. Download via UI or CLI
3. Schedule update during low-usage time
**After update**:
- Verify SSH key still works
- Check custom services still running
- Verify Tailscale reconnects
**Re-deploy SSH key if needed**:
```bash
ssh-copy-id -i ~/.ssh/id_ed25519 root@10.10.10.1
```
---
## Service Locations
| File | Purpose |
|------|---------|
| `/data/scripts/internet-watchdog.sh` | Watchdog script |
| `/data/scripts/memory-monitor.sh` | Memory monitor script |
| `/etc/systemd/system/internet-watchdog.service` | Watchdog systemd unit |
| `/etc/systemd/system/memory-monitor.service` | Memory monitor systemd unit |
| `/var/log/internet-watchdog.log` | Watchdog log |
| `/data/logs/memory-history.log` | Memory history log |
**Note**: `/data/` persists across firmware updates. `/var/log/` may not.
---
## Quick Reference Commands
```bash
# System status
ssh ucg-fiber 'uptime && free -m'
# Check both monitoring services
ssh ucg-fiber 'systemctl status internet-watchdog memory-monitor'
# Memory history (last hour)
ssh ucg-fiber 'tail -60 /data/logs/memory-history.log'
# Watchdog activity
ssh ucg-fiber 'tail -20 /var/log/internet-watchdog.log'
# Network devices (ARP table)
ssh ucg-fiber 'cat /proc/net/arp'
# Tailscale status
ssh ucg-fiber 'tailscale status'
# System logs
ssh ucg-fiber 'journalctl -p warning --since "1 hour ago" | head -50'
```
---
## Backup Considerations
Custom services in `/data/scripts/` persist across firmware updates but may need:
- Systemd services re-enabled after major updates
- Script permissions re-applied if wiped
**Backup critical files**:
```bash
# Copy scripts locally for reference
scp ucg-fiber:/data/scripts/*.sh ~/Projects/homelab/data/scripts/
```
---
## Related Documentation
- [SSH-ACCESS.md](SSH-ACCESS.md) - SSH configuration and host aliases
- [NETWORK.md](NETWORK.md) - Network architecture
- [MONITORING.md](MONITORING.md) - Overall monitoring strategy
- [HOMEASSISTANT.md](HOMEASSISTANT.md) - Home Assistant integration
---
## Incident History
### 2025-12-27 to 2025-12-29: Gateway Freeze
**Timeline**:
- Dec 7: Firmware update to v4.4.9
- Dec 24: Last healthy system logs
- Dec 27-29: "No internet detected" errors in logs
- Dec 29+: Complete silence (gateway frozen)
- Jan 2: Physical reboot restored access
**Root Cause**: Memory exhaustion causing soft lockup (no crash dump saved)
**Resolution**:
- Deployed internet-watchdog service
- Deployed memory-monitor service
- Disabled UniFi Connect (~200 MB saved)
- Configured SSH key auth
---
**Last Updated**: 2026-01-02

HARDWARE.md (new file, +455)

@@ -0,0 +1,455 @@
# Hardware Inventory
Complete hardware specifications for all homelab equipment.
## Servers
### PVE (10.10.10.120) - Primary Proxmox Server
#### CPU
- **Model**: AMD Ryzen Threadripper PRO 3975WX
- **Cores**: 32 cores / 64 threads
- **Base Clock**: 3.5 GHz
- **Boost Clock**: 4.2 GHz
- **TDP**: 280W
- **Architecture**: Zen 2 (7nm)
- **Socket**: sTRX4
- **Features**: ECC support, PCIe 4.0
#### RAM
- **Capacity**: 128 GB
- **Type**: DDR4 ECC Registered
- **Speed**: Unknown (needs investigation)
- **Channels**: 8-channel (quad-channel per socket)
- **Idle Power**: ~30-40W
#### Storage
**OS/VM Storage:**
| Pool | Devices | Type | Capacity | Purpose |
|------|---------|------|----------|---------|
| `nvme-mirror1` | 2x Sabrent Rocket Q NVMe | ZFS Mirror | 3.6 TB usable | High-performance VM storage |
| `nvme-mirror2` | 2x Kingston SFYRD 2TB NVMe | ZFS Mirror | 1.8 TB usable | Additional fast VM storage |
| `rpool` | 2x Samsung 870 QVO 4TB SSD | ZFS Mirror | 3.6 TB usable | Proxmox OS, containers, backups |
**Total Storage**: ~9 TB usable
#### GPUs
| Model | Slot | VRAM | TDP | Purpose | Passed To |
|-------|------|------|-----|---------|-----------|
| NVIDIA Quadro P2000 | PCIe slot 1 | 5 GB GDDR5 | 75W | Plex transcoding | Host |
| NVIDIA TITAN RTX | PCIe slot 2 | 24 GB GDDR6 | 280W | AI workloads | Saltbox (101), lmdev1 (111) |
**Total GPU Power**: 75W + 280W = 355W (under load)
#### Network Cards
| Interface | Model | Speed | Purpose | Bridge |
|-----------|-------|-------|---------|--------|
| enp1s0 | Intel I210 (onboard) | 1 Gb | Management | vmbr0 |
| enp35s0f0 | Intel X520 (dual-port SFP+) | 10 Gb | High-speed LXC | vmbr1 |
| enp35s0f1 | Intel X520 (dual-port SFP+) | 10 Gb | High-speed VM | vmbr2 |
**10Gb Transceivers**: Intel FTLX8571D3BCV (SFP+ 10GBASE-SR, 850nm, multimode)
#### Storage Controllers
| Model | Interface | Purpose |
|-------|-----------|---------|
| LSI SAS2308 HBA | PCIe 3.0 x8 | Passed to TrueNAS VM for EMC enclosure |
| Samsung NVMe controller | PCIe | Passed to TrueNAS VM for ZFS caching |
#### Motherboard
- **Model**: Unknown - needs investigation
- **Chipset**: AMD TRX40
- **Form Factor**: ATX/EATX
- **PCIe Slots**: Multiple PCIe 4.0 slots
- **Features**: IOMMU support, ECC memory
#### Power Supply
- **Model**: Unknown
- **Wattage**: Likely 1000W+ (needs investigation)
- **Type**: ATX, 80+ certification unknown
#### Cooling
- **CPU Cooler**: Unknown - likely large tower or AIO
- **Case Fans**: Unknown quantity
- **Note**: CPU temps 70-80°C under load (healthy)
---
### PVE2 (10.10.10.102) - Secondary Proxmox Server
#### CPU
- **Model**: AMD Ryzen Threadripper PRO 3975WX
- **Specs**: Same as PVE (32C/64T, 280W TDP)
#### RAM
- **Capacity**: 128 GB DDR4 ECC
- **Same specs as PVE**
#### Storage
| Pool | Devices | Type | Capacity | Purpose |
|------|---------|------|----------|---------|
| `nvme-mirror3` | 2x NVMe (model unknown) | ZFS Mirror | Unknown | High-performance VM storage |
| `local-zfs2` | 2x WD Red 6TB HDD | ZFS Mirror | ~6 TB usable | Bulk/archival storage (spins down) |
**HDD Spindown**: Configured for 30-min idle spindown (saves ~10-16W)
#### GPUs
| Model | Slot | VRAM | TDP | Purpose | Passed To |
|-------|------|------|-----|---------|-----------|
| NVIDIA RTX A6000 | PCIe slot 1 | 48 GB GDDR6 | 300W | AI trading workloads | trading-vm (301) |
#### Network Cards
| Interface | Model | Speed | Purpose |
|-----------|-------|-------|---------|
| nic1 | Unknown (onboard) | 1 Gb | Management |
**Note**: MTU set to 9000 for jumbo frames
#### Motherboard
- **Model**: Unknown
- **Chipset**: AMD TRX40
- **Similar to PVE**
---
## Network Equipment
### UniFi Cloud Gateway Fiber (UCG-Fiber)
- **Model**: UniFi Cloud Gateway Fiber
- **IP**: 10.10.10.1
- **Ports**: Multiple 1Gb + SFP+ uplink
- **Features**: Router, firewall, VPN, IDS/IPS
- **MTU**: 9216 (supports jumbo frames)
- **Tailscale**: Installed for VPN failover
### Switches
**Details needed** - investigate current switch setup:
- 10Gb switch for high-speed connections?
- 1Gb switch for general devices?
- PoE capabilities?
```bash
# Check what's connected to 10Gb interfaces
ssh pve 'ip link show enp35s0f0'
ssh pve 'ip link show enp35s0f1'
```
---
## Storage Hardware
### EMC Storage Enclosure
**See [EMC-ENCLOSURE.md](EMC-ENCLOSURE.md) for complete details**
- **Model**: EMC KTN-STL4 (or similar)
- **Form Factor**: 4U rackmount
- **Drive Bays**: 25x 3.5" SAS/SATA
- **Controllers**: Dual LCC (Link Control Cards)
- **Connection**: SAS via LSI SAS2308 HBA
- **Passed to**: TrueNAS VM (VMID 100)
**Current Status**:
- LCC A: Active (working)
- LCC B: Failed (replacement ordered)
**Drive Inventory**: Unknown - needs audit
```bash
# Get drive list from TrueNAS
ssh truenas 'smartctl --scan'
ssh truenas 'lsblk'
```
### NVMe Drives
| Model | Quantity | Capacity | Location | Pool |
|-------|----------|----------|----------|------|
| Sabrent Rocket Q | 2 | Unknown | PVE | nvme-mirror1 |
| Kingston SFYRD | 2 | 2 TB each | PVE | nvme-mirror2 |
| Unknown model | 2 | Unknown | PVE2 | nvme-mirror3 |
| Samsung (model unknown) | 1 | Unknown | TrueNAS (passed) | ZFS cache |
### SSDs
| Model | Quantity | Capacity | Location | Pool |
|-------|----------|----------|----------|------|
| Samsung 870 QVO | 2 | 4 TB each | PVE | rpool |
### HDDs
| Model | Quantity | Capacity | Location | Pool |
|-------|----------|----------|----------|------|
| WD Red | 2 | 6 TB each | PVE2 | local-zfs2 |
| Unknown (in EMC) | Unknown | Unknown | TrueNAS | vault |
---
## UPS
### Current UPS
| Specification | Value |
|---------------|-------|
| **Model** | CyberPower OR2200PFCRT2U |
| **Capacity** | 2200VA / 1320W |
| **Form Factor** | 2U rackmount |
| **Input** | NEMA 5-15P (rewired from 5-20P) |
| **Outlets** | 2x 5-20R + 6x 5-15R |
| **Output** | PFC Sinewave |
| **Runtime** | ~15-20 min @ 33% load |
| **Interface** | USB (connected to PVE) |
**See [UPS.md](UPS.md) for configuration details**
---
## Client Devices
### Mac Mini (Hutson's Workstation)
- **Model**: Unknown generation
- **CPU**: Unknown
- **RAM**: Unknown
- **Storage**: Unknown
- **Network**: 1Gb Ethernet (en0) - MTU 9000
- **Tailscale IP**: 100.108.89.58
- **Local IP**: 10.10.10.125 (static)
- **Purpose**: Primary workstation, Happy Coder daemon host
### MacBook (Mobile)
- **Model**: Unknown
- **Network**: Wi-Fi + Ethernet adapter
- **Tailscale IP**: Unknown
- **Purpose**: Mobile work, development
### Windows PC
- **Model**: Unknown
- **CPU**: Unknown
- **Network**: 1Gb Ethernet
- **IP**: 10.10.10.150
- **Purpose**: Gaming, Windows development, Syncthing node
### Phone (Android)
- **Model**: Unknown
- **IP**: 10.10.10.54 (when on Wi-Fi)
- **Purpose**: Syncthing mobile node, Happy Coder client
---
## Rack Layout (If Applicable)
**Needs documentation** - Current rack configuration unknown
Suggested format:
```
U42: Blank panel
U41: UPS (CyberPower 2U)
U40: UPS (CyberPower 2U)
U39: Switch (10Gb)
U38-U35: EMC Storage Enclosure (4U)
U34: PVE Server
U33: PVE2 Server
...
```
---
## Power Consumption
### Measured Power Draw
| Component | Idle | Typical | Peak | Notes |
|-----------|------|---------|------|-------|
| PVE Server | 250-350W | 500W | 750W | CPU + GPUs + storage |
| PVE2 Server | 200-300W | 400W | 600W | CPU + GPU + storage |
| Network Gear | ~50W | ~50W | ~50W | Router + switches |
| **Total** | **500-700W** | **~950W** | **~1400W** | Exceeds UPS under peak load |
**UPS Capacity**: 1320W
**Typical Load**: 33-50% (safe margin)
**Peak Load**: Can exceed UPS capacity temporarily (acceptable)
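The table values are estimates; actual draw can be spot-checked through the UPS via NUT (UPS name `cyberpower` as used in the MAINTENANCE.md health check):
```bash
# Current load and charge reported by the UPS attached to PVE
ssh pve 'upsc cyberpower@localhost | grep -E "ups.load|ups.realpower|battery.charge"'
```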
### Power Optimizations Applied
**See [POWER-MANAGEMENT.md](POWER-MANAGEMENT.md) for details**
- KSMD disabled: ~60-80W saved
- CPU governors: ~60-120W saved
- Syncthing rescans: ~60-80W saved
- HDD spindown: ~10-16W saved when idle
- **Total savings**: ~150-300W
---
## Thermal Management
### CPU Cooling
**PVE & PVE2**:
- CPU cooler: Unknown model
- Thermal paste: Unknown, likely needs refresh if temps >85°C
- Target temp: 70-80°C under load
- Max safe: 90°C Tctl (Threadripper PRO spec)
### GPU Cooling
All GPUs run on their stock (factory) coolers; no custom cooling is fitted:
- TITAN RTX: 2-3W idle, 280W load
- RTX A6000: 11W idle, 300W load
- Quadro P2000: 25W constant (Plex active)
### Case Airflow
**Unknown** - needs investigation:
- Case model?
- Fan configuration?
- Positive or negative pressure?
---
## Cable Management
### Network Cables
| Connection | Type | Length | Speed |
|------------|------|--------|-------|
| PVE → Switch (10Gb) | OM3 fiber | Unknown | 10Gb |
| PVE2 → Router | Cat6 | Unknown | 1Gb |
| Mac Mini → Switch | Cat6 | Unknown | 1Gb |
| TrueNAS → EMC | SAS cable | Unknown | 6Gb/s |
### Power Cables
**Critical**: All servers on UPS battery-backed outlets
---
## Maintenance Schedule
### Annual Maintenance
- [ ] Clean dust from servers (every 6-12 months)
- [ ] Check thermal paste on CPUs (every 2-3 years)
- [ ] Test UPS battery runtime (annually)
- [ ] Verify all fans operational
- [ ] Check for bulging capacitors on PSUs
### Drive Health
```bash
# Check SMART status on all drives
ssh pve 'smartctl -a /dev/nvme0'
ssh pve2 'smartctl -a /dev/sda'
ssh truenas 'smartctl --scan | while read dev type; do echo "=== $dev ==="; smartctl -a $dev | grep -E "Model|Serial|Health|Reallocated|Current_Pending"; done'
```
### Temperature Monitoring
```bash
# Check all temps (needs lm-sensors installed)
ssh pve 'sensors'
ssh pve2 'sensors'
```
---
## Warranty & Purchase Info
**Needs documentation**:
- When were servers purchased?
- Where were components bought?
- Any warranties still active?
- Replacement part sources?
---
## Upgrade Path
### Short-term Upgrades (< 6 months)
- [ ] 20A circuit for UPS (restore original 5-20P plug)
- [ ] Document missing hardware specs
- [ ] Label all cables
- [ ] Create rack diagram
### Medium-term Upgrades (6-12 months)
- [ ] Additional 10Gb NIC for PVE2?
- [ ] More NVMe storage?
- [ ] Upgrade network switches?
- [ ] Replace EMC enclosure with newer model?
### Long-term Upgrades (1-2 years)
- [ ] CPU upgrade to newer Threadripper?
- [ ] RAM expansion to 256GB?
- [ ] Additional GPU for AI workloads?
- [ ] Migrate to PCIe 5.0 storage?
---
## Investigation Needed
High-priority items to document:
- [ ] Get exact motherboard model (both servers)
- [ ] Get PSU model and wattage
- [ ] CPU cooler models
- [ ] Network switch models and configuration
- [ ] Complete drive inventory in EMC enclosure
- [ ] RAM speed and timings
- [ ] Case models
- [ ] Exact NVMe models for all drives
**Commands to gather info**:
```bash
# Motherboard
ssh pve 'dmidecode -t baseboard'
# CPU details
ssh pve 'lscpu'
# RAM details
ssh pve 'dmidecode -t memory | grep -E "Size|Speed|Manufacturer"'
# Storage devices
ssh pve 'lsblk -o NAME,SIZE,TYPE,TRAN,MODEL'
# Network cards
ssh pve 'lspci | grep -i network'
# GPU details
ssh pve 'lspci | grep -i vga'
ssh pve 'nvidia-smi -L' # If nvidia-smi available
```
---
## Related Documentation
- [VMS.md](VMS.md) - VM resource allocation
- [STORAGE.md](STORAGE.md) - Storage pools and usage
- [POWER-MANAGEMENT.md](POWER-MANAGEMENT.md) - Power optimizations
- [UPS.md](UPS.md) - UPS configuration
- [NETWORK.md](NETWORK.md) - Network configuration
- [EMC-ENCLOSURE.md](EMC-ENCLOSURE.md) - Storage enclosure details
---
**Last Updated**: 2025-12-22
**Status**: ⚠️ Incomplete - many specs need investigation


@@ -130,16 +130,232 @@ curl -s -H "Authorization: Bearer $HA_TOKEN" \
- **Philips Hue** - Lights
- **Sonos** - Speakers
- **Nest** - Thermostat (climate.thermostat)
- **Motion Sensors** - Various locations
- **NUT (Network UPS Tools)** - UPS monitoring (added 2025-12-21)
- **VeSync** - Levoit humidifier control (added 2026-01-14)
- **HomeKit Controller** - Homebridge bridge for Govee sensors (added 2026-01-14)
- **Oura Ring v2** - Sleep/health tracking via HACS (added 2026-01-16)
- **HACS** - Home Assistant Community Store for custom integrations
### NUT / UPS Integration
Monitors the CyberPower OR2200PFCRT2U UPS connected to PVE.
**Connection:**
- Host: 10.10.10.120
- Port: 3493
- Username: upsmon
- Password: upsmon123
**Entities:**
| Entity ID | Description |
|-----------|-------------|
| `sensor.cyberpower_battery_charge` | Battery percentage |
| `sensor.cyberpower_load` | Current load % |
| `sensor.cyberpower_input_voltage` | Input voltage |
| `sensor.cyberpower_output_voltage` | Output voltage |
| `sensor.cyberpower_status` | Status (Online, On Battery, etc.) |
| `sensor.cyberpower_status_data` | Raw status (OL, OB, LB, CHRG) |
**Dashboard Card Example:**
```yaml
type: entities
title: UPS Status
entities:
  - entity: sensor.cyberpower_status
    name: Status
  - entity: sensor.cyberpower_battery_charge
    name: Battery
  - entity: sensor.cyberpower_load
    name: Load
  - entity: sensor.cyberpower_input_voltage
    name: Input Voltage
```
### VeSync / Levoit LV600S Integration
Controls the Levoit LV600S humidifier via VeSync cloud API.
**Account:** vesync@htsn.io
**Entities:**
| Entity ID | Description |
|-----------|-------------|
| `humidifier.lv600s` | Main humidifier on/off control |
| `sensor.lv600s_humidity` | Built-in humidity sensor (reads high near mist) |
| `number.lv600s_mist_level` | Mist intensity (1-9) |
| `switch.lv600s_display` | Display on/off |
| `binary_sensor.lv600s_low_water` | Low water warning |
| `binary_sensor.lv600s_water_tank_lifted` | Tank removed detection |
### Oura Ring Integration (HACS)
Monitors sleep, activity, and health metrics from Oura Ring via HACS custom integration.
**Installation:** HACS → Integrations → Oura Ring v2
**OAuth Credentials (Oura Developer Portal):**
- Client ID: `e925a2a0-7767-4390-8b80-3a385a5b3ddc`
- Client Secret: `xFSFSfUPihet1foWQRLAMUQbL9-kChqT_CjtHHpAxZs`
- Redirect URI: `https://my.home-assistant.io/redirect/oauth`
**Key Entities:**
| Entity ID | Description |
|-----------|-------------|
| `sensor.oura_ring_readiness_score` | Daily readiness (0-100) |
| `sensor.oura_ring_sleep_score` | Sleep quality (0-100) |
| `sensor.oura_ring_current_heart_rate` | Current HR (bpm) |
| `sensor.oura_ring_average_sleep_heart_rate` | Average HR during sleep |
| `sensor.oura_ring_lowest_sleep_heart_rate` | Lowest HR during sleep |
| `sensor.oura_ring_temperature_deviation` | Body temp deviation (°C) |
| `sensor.oura_ring_spo2_average` | Blood oxygen (%) |
| `sensor.oura_ring_steps` | Daily step count |
| `sensor.oura_ring_activity_score` | Activity score (0-100) |
**Troubleshooting:**
- If sensors show "unavailable", check config entry state: `setup_retry` usually means API returned no data
- Force sync the Oura app on your phone, then reload the integration
- The integration polls Oura's API periodically; data updates after ring syncs to cloud
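A quick freshness check for the key sensor (assumes `HA_TOKEN` is exported):
```bash
# Current readiness value and when it was last updated
curl -s -H "Authorization: Bearer $HA_TOKEN" \
  "http://10.10.10.210:8123/api/states/sensor.oura_ring_readiness_score" | \
  python3 -c "import json, sys; s = json.load(sys.stdin); print(s['state'], s['last_updated'])"
```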
### HomeKit Controller / Homebridge Integration
Connects to Homebridge running on Mac Mini to access BLE devices (Govee sensors).
**Homebridge Details:**
- Host: Mac Mini (localhost)
- Port: 51826
- PIN: 031-45-154
- Config: `~/.homebridge/config.json`
- Logs: `~/.homebridge/homebridge.log`
- LaunchAgent: `~/Library/LaunchAgents/com.homebridge.server.plist`
**Govee H5074 Entities:**
| Entity ID | Description |
|-----------|-------------|
| `sensor.goveeh5074_5059_humidity` | Room humidity (accurate reading) |
| `sensor.goveeh5074_5059_temperature` | Room temperature |
| `sensor.goveeh5074_5059_battery` | Sensor battery level |
**Homebridge Management:**
```bash
# Check status
launchctl list | grep homebridge
# View logs
tail -f ~/.homebridge/homebridge.log
# Restart Homebridge
launchctl stop com.homebridge.server
launchctl start com.homebridge.server
# Stop Homebridge
launchctl unload ~/Library/LaunchAgents/com.homebridge.server.plist
# Start Homebridge
launchctl load ~/Library/LaunchAgents/com.homebridge.server.plist
```
## Automations
TODO: Document key automations
### Guitar Room Humidity Control
Maintains 45-47% humidity for guitar storage (Lowden recommends 49% ±2%).
**Automations:**
| Automation | Trigger | Action |
|------------|---------|--------|
| `guitar_room_humidity_low_turn_on_humidifier` | Govee H5074 < 45% | Turn ON humidifier, set mist to 6 |
| `guitar_room_humidity_reached_turn_off_humidifier` | Govee H5074 > 47% | Turn OFF humidifier |
**Why two thresholds (hysteresis):**
- Prevents rapid on/off cycling
- 45% turn-on, 47% turn-off creates a 2% buffer
- Target range: 45-47% (conservatively below Lowden's 49% spec)
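To confirm both halves of the hysteresis pair are enabled and firing, a sketch (assumes `HA_TOKEN` is exported):
```bash
# State and last trigger time for both humidity automations
for a in guitar_room_humidity_low_turn_on_humidifier guitar_room_humidity_reached_turn_off_humidifier; do
  curl -s -H "Authorization: Bearer $HA_TOKEN" \
    "http://10.10.10.210:8123/api/states/automation.$a" | \
    python3 -c "import json, sys; s = json.load(sys.stdin); print(s['entity_id'], s['state'], s['attributes'].get('last_triggered'))"
done
```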
### Oura Ring Health & Sleep Automations
Uses Oura Ring biometrics for smart thermostat control and health alerts.
**Sleep/Wake Detection:**
| Automation | Trigger | Conditions | Action |
|------------|---------|------------|--------|
| `oura_sleep_detected_bedtime_mode` | HR < 55 bpm | Home, after 10pm | Thermostat → 66°F, front door light off, Telegram notify |
| `oura_wake_up_detected_morning_mode` | HR > 65 bpm | Home, 5-11am, thermostat < 68°F | Thermostat → 69°F, Telegram notify |
**Health Alerts:**
| Automation | Trigger | Action |
|------------|---------|--------|
| `oura_low_readiness_alert` | 8am daily, readiness < 70 | Telegram: suggest rest day |
| `oura_spo2_health_alert` | SpO2 < 94% | Urgent Telegram: health warning |
| `oura_fever_detection_alert` | Temp deviation > 1°C | Telegram: possible illness alert |
| `oura_sedentary_reminder` | 2pm weekdays, steps < 500 | Telegram: reminder to move |
**Sleep Comfort & Recovery:**
| Automation | Trigger | Conditions | Action |
|------------|---------|------------|--------|
| `oura_poor_sleep_recovery_mode` | 7am daily | Home, sleep score < 70 | Thermostat → 71°F (warmer for recovery) |
| `oura_sleep_temp_adjustment_too_hot` | Temp deviation > +0.5°C | Home, 10pm-6am, HR < 60 | Thermostat → 64°F |
| `oura_sleep_temp_adjustment_too_cold` | Temp deviation < -0.3°C | Home, 10pm-6am, HR < 60 | Thermostat → 68°F |
**Notification Setup:**
All notifications use `rest_command.notify_telegram` - ensure this is configured in `configuration.yaml`:
```yaml
rest_command:
  notify_telegram:
    url: "https://api.telegram.org/bot<TOKEN>/sendMessage"
    method: POST
    content_type: "application/json"
    payload: '{"chat_id": "<CHAT_ID>", "text": "{{ message }}"}'
```
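Once configured, the command can be exercised end to end with a service call (assumes `HA_TOKEN` is exported):
```bash
# Send a test message through rest_command.notify_telegram
curl -X POST \
  -H "Authorization: Bearer $HA_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"message": "Test notification from Home Assistant"}' \
  "http://10.10.10.210:8123/api/services/rest_command/notify_telegram"
```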
## SSH Access (Terminal & SSH Add-on)
The Terminal & SSH add-on provides remote shell access to Home Assistant OS.
**Connection:**
```bash
ssh root@10.10.10.210 -p 22
```
**Authentication:** SSH key from Mac Mini (`~/.ssh/id_ed25519.pub`)
**Hostname:** `core-ssh`
**Features:**
- Direct shell access to Home Assistant OS
- Access to Home Assistant CLI (`ha` command)
- File system access for debugging
## MCP Server Integration
Home Assistant has a built-in Model Context Protocol (MCP) Server integration for AI assistant connectivity.
**Status:** Enabled (configured with "Assist" service)
**Endpoint:** `http://10.10.10.210:8123/api/mcp`
**Claude Code Configuration:** Added to `~/.cursor/mcp.json`:
```json
{
  "homeassistant": {
    "type": "http",
    "url": "http://10.10.10.210:8123/api/mcp",
    "headers": {
      "Authorization": "Bearer <HA_API_TOKEN>"
    }
  }
}
```
**Note:** The MCP server uses the Assist API to expose entities and services to AI clients.
## TODO
- [ ] Set static IP (currently DHCP at .210, should be .110)
- [ ] Add API token to this document
- [ ] Document installed integrations
- [ ] Document automations
- [x] Add API token to this document
- [x] Document installed integrations
- [x] Document automations
- [ ] Set up Traefik reverse proxy (ha.htsn.io)
- [x] Install Terminal & SSH add-on
- [x] Enable MCP Server integration


@@ -45,7 +45,7 @@
| 10.10.10.1 | router | Gateway/Firewall |
| 10.10.10.102 | pve2 | Proxmox Server 2 |
| 10.10.10.120 | pve | Proxmox Server 1 (Primary) |
| 10.10.10.123 | mac-mini | Mac Mini (Syncthing node) |
| 10.10.10.125 | mac-mini | Mac Mini (Syncthing node) |
| 10.10.10.150 | windows-pc | Windows PC (Syncthing node) |
| 10.10.10.147 | macbook | MacBook Pro (Syncthing node) |
| 10.10.10.200 | truenas | TrueNAS (Storage/Syncthing hub) |


@@ -45,6 +45,7 @@ This document tracks all IP addresses in the homelab infrastructure.
|------|------|------------|---------|--------|
| 300 | gitea-vm | 10.10.10.220 | Git server | Running |
| 301 | trading-vm | 10.10.10.221 | AI trading platform (RTX A6000) | Running |
| 302 | docker-host2 | 10.10.10.207 | Docker services (n8n, future apps) | Running |
## Workstations & Personal Devices
@@ -69,6 +70,10 @@ This document tracks all IP addresses in the homelab infrastructure.
| CopyParty | cp.htsn.io | 10.10.10.201:3923 | Traefik-Primary |
| LMDev | lmdev.htsn.io | 10.10.10.111 | Traefik-Primary |
| Excalidraw | excalidraw.htsn.io | 10.10.10.206:8080 | Traefik-Primary |
| MetaMCP | metamcp.htsn.io | 10.10.10.207:12008 | Traefik-Primary |
| n8n | n8n.htsn.io | 10.10.10.207:5678 | Traefik-Primary |
| PA API | pa.htsn.io | 10.10.10.207:8401 | Traefik-Primary (Tailscale only) |
| Crafty Controller | mc.htsn.io | 10.10.10.207:8443 | Traefik-Primary |
| Plex | plex.htsn.io | 10.10.10.100:32400 | Traefik-Saltbox |
| Sonarr | sonarr.htsn.io | 10.10.10.100:8989 | Traefik-Saltbox |
| Radarr | radarr.htsn.io | 10.10.10.100:7878 | Traefik-Saltbox |
@@ -92,6 +97,7 @@ This document tracks all IP addresses in the homelab infrastructure.
- .200 - TrueNAS
- .201 - CopyParty
- .206 - Docker-host
- .207 - Docker-host2
- .220 - Gitea
- .221 - Trading VM
- .250 - Traefik-Primary
@@ -110,7 +116,7 @@ This document tracks all IP addresses in the homelab infrastructure.
- 10.10.10.148 - 10.10.10.149 (2 IPs)
- 10.10.10.151 - 10.10.10.199 (49 IPs)
- 10.10.10.202 - 10.10.10.205 (4 IPs)
- 10.10.10.207 - 10.10.10.219 (13 IPs)
- 10.10.10.208 - 10.10.10.219 (12 IPs)
- 10.10.10.222 - 10.10.10.249 (28 IPs)
- 10.10.10.251 - 10.10.10.254 (4 IPs)
@@ -123,6 +129,19 @@ This document tracks all IP addresses in the homelab infrastructure.
| Portainer Agent | 9001 | Remote management from other Portainer |
| Gotenberg | 3000 | PDF generation API |
## Docker Host 2 Services (10.10.10.207) - PVE2
| Service | Port | Purpose |
|---------|------|---------|
| PA API | 8401 | Personal Assistant API (pa.htsn.io) - Tailscale only |
| MetaMCP | 12008 | MCP Aggregator/Gateway (metamcp.htsn.io) |
| n8n | 5678 | Workflow automation |
| Crafty Controller | 8443 | Minecraft server management (mc.htsn.io) |
| Minecraft Java | 25565 | Minecraft Java Edition server |
| Minecraft Bedrock | 19132/udp | Minecraft Bedrock Edition (Geyser) |
| Trading Redis | 6379 | Redis for trading platform |
| Trading TimescaleDB | 5433 | TimescaleDB for trading platform |
## Syncthing API Endpoints
| Device | IP | Port | API Key |
@@ -132,6 +151,16 @@ This document tracks all IP addresses in the homelab infrastructure.
| Android Phone | 10.10.10.54 | 8384 | Xxz3jDT4akUJe6psfwZsbZwG2LhfZuDM |
| TrueNAS | 10.10.10.200 | 8384 | (check TrueNAS config) |
## Mac Mini Services (10.10.10.125)
| Service | Port | Purpose |
|---------|------|---------|
| MCP Bridge | 8400 | HTTP bridge for MCP tool execution (PA API backend) |
| Beeper Desktop | 23373 | Message aggregation (Telegram, iMessage, SMS) |
| Proton Bridge IMAP | 1143 | Personal email access |
| Proton Bridge SMTP | 1025 | Personal email sending |
| Syncthing | 8384 | File sync API |
## Notes
- **MTU 9000** (jumbo frames) enabled on storage networks

MAINTENANCE.md (new file, +618)

@@ -0,0 +1,618 @@
# Maintenance Procedures and Schedules
Regular maintenance procedures for homelab infrastructure to ensure reliability and performance.
## Overview
| Frequency | Tasks | Estimated Time |
|-----------|-------|----------------|
| **Daily** | Quick health check | 2-5 min |
| **Weekly** | Service status, logs review | 15-30 min |
| **Monthly** | Updates, backups verification | 1-2 hours |
| **Quarterly** | Full system audit, testing | 2-4 hours |
| **Annual** | Hardware maintenance, planning | 4-8 hours |
---
## Daily Maintenance (Automated)
### Quick Health Check Script
Save as `~/bin/homelab-health-check.sh`:
```bash
#!/bin/bash
# Daily homelab health check
echo "=== Homelab Health Check ==="
echo "Date: $(date)"
echo ""
echo "=== Server Status ==="
ssh pve 'uptime' 2>/dev/null || echo "PVE: UNREACHABLE"
ssh pve2 'uptime' 2>/dev/null || echo "PVE2: UNREACHABLE"
echo ""
echo "=== CPU Temperatures ==="
ssh pve 'for f in /sys/class/hwmon/hwmon*/temp*_input; do label=$(cat ${f%_input}_label 2>/dev/null); if [ "$label" = "Tctl" ]; then echo "PVE: $(($(cat $f)/1000))°C"; fi; done'
ssh pve2 'for f in /sys/class/hwmon/hwmon*/temp*_input; do label=$(cat ${f%_input}_label 2>/dev/null); if [ "$label" = "Tctl" ]; then echo "PVE2: $(($(cat $f)/1000))°C"; fi; done'
echo ""
echo "=== UPS Status ==="
ssh pve 'upsc cyberpower@localhost | grep -E "battery.charge:|battery.runtime:|ups.load:|ups.status:"'
echo ""
echo "=== ZFS Pools ==="
ssh pve 'zpool status -x' 2>/dev/null
ssh pve2 'zpool status -x' 2>/dev/null
ssh truenas 'zpool status -x vault'
echo ""
echo "=== Disk Space ==="
ssh pve 'df -h | grep -E "Filesystem|/dev/(nvme|sd)"'
ssh truenas 'df -h /mnt/vault'
echo ""
echo "=== VM Status ==="
ssh pve 'qm list | grep running | wc -l' | xargs echo "PVE VMs running:"
ssh pve2 'qm list | grep running | wc -l' | xargs echo "PVE2 VMs running:"
echo ""
echo "=== Syncthing Connections ==="
curl -s -H "X-API-Key: oSQSrPnMnrEXuHqjWrRdrvq3TSXesAT5" \
"http://127.0.0.1:8384/rest/system/connections" | \
python3 -c "import sys,json; d=json.load(sys.stdin)['connections']; \
[print(f\"{v.get('name',k[:7])}: {'UP' if v['connected'] else 'DOWN'}\") for k,v in d.items()]"
echo ""
echo "=== Check Complete ==="
```
**Run daily via cron**:
```bash
# Add to crontab
0 9 * * * ~/bin/homelab-health-check.sh | mail -s "Homelab Health Check" hutson@example.com
```
---
## Weekly Maintenance
### Service Status Review
**Check all critical services**:
```bash
# Proxmox services
ssh pve 'systemctl status pve-cluster pvedaemon pveproxy'
ssh pve2 'systemctl status pve-cluster pvedaemon pveproxy'
# NUT (UPS monitoring)
ssh pve 'systemctl status nut-server nut-monitor'
ssh pve2 'systemctl status nut-monitor'
# Container services
ssh pve 'pct exec 200 -- systemctl status pihole-FTL' # Pi-hole
ssh pve 'pct exec 202 -- systemctl status traefik' # Traefik
# VM services (via QEMU agent)
ssh pve 'qm guest exec 100 -- bash -c "systemctl status nfs-server smbd"' # TrueNAS
```
### Log Review
**Check for errors in critical logs**:
```bash
# Proxmox system logs
ssh pve 'journalctl -p err -b | tail -50'
ssh pve2 'journalctl -p err -b | tail -50'
# VM logs (if QEMU agent available)
ssh pve 'qm guest exec 100 -- bash -c "journalctl -p err --since today"'
# Traefik access logs
ssh pve 'pct exec 202 -- tail -100 /var/log/traefik/access.log'
```
### Syncthing Sync Status
**Check for sync errors**:
```bash
# Check all folder errors
for folder in documents downloads desktop movies pictures notes config; do
echo "=== $folder ==="
curl -s -H "X-API-Key: oSQSrPnMnrEXuHqjWrRdrvq3TSXesAT5" \
"http://127.0.0.1:8384/rest/folder/errors?folder=$folder" | jq
done
```
**See**: [SYNCTHING.md](SYNCTHING.md)
---
## Monthly Maintenance
### System Updates
#### Proxmox Updates
**Check for updates**:
```bash
ssh pve 'apt update && apt list --upgradable'
ssh pve2 'apt update && apt list --upgradable'
```
**Apply updates**:
```bash
# PVE
ssh pve 'apt update && apt dist-upgrade -y'
# PVE2
ssh pve2 'apt update && apt dist-upgrade -y'
# Reboot if kernel updated
ssh pve 'reboot'
ssh pve2 'reboot'
```
**⚠️ Important**:
- Check [Proxmox release notes](https://pve.proxmox.com/wiki/Roadmap) before major updates
- Test on PVE2 first if possible
- Ensure all VMs are backed up before updating (see the one-liner below)
- Monitor VMs after reboot - some may need manual restart
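A one-off backup of the critical set before updating, per BACKUP-STRATEGY.md (run each VMID on the node that hosts it):
```bash
# Snapshot-mode backups of the critical VMs to local storage
ssh pve 'vzdump 100 101 110 300 --mode snapshot --storage local-zfs --compress zstd'
```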
#### Container Updates (LXC)
```bash
# Update all containers
ssh pve 'for ctid in 200 202 205; do pct exec $ctid -- bash -c "apt update && apt upgrade -y"; done'
```
#### VM Updates
**Update VMs individually via SSH**:
```bash
# Ubuntu/Debian VMs
ssh truenas 'apt update && apt upgrade -y'
ssh docker-host 'apt update && apt upgrade -y'
ssh fs-dev 'apt update && apt upgrade -y'
# Check if reboot required
ssh truenas '[ -f /var/run/reboot-required ] && echo "Reboot required"'
```
### ZFS Scrubs
**Schedule**: Run monthly on all pools
**PVE**:
```bash
# Start scrub on all pools
ssh pve 'zpool scrub nvme-mirror1'
ssh pve 'zpool scrub nvme-mirror2'
ssh pve 'zpool scrub rpool'
# Check scrub status
ssh pve 'zpool status | grep -A2 scrub'
```
**PVE2**:
```bash
ssh pve2 'zpool scrub nvme-mirror3'
ssh pve2 'zpool scrub local-zfs2'
ssh pve2 'zpool status | grep -A2 scrub'
```
**TrueNAS**:
```bash
# Scrub via TrueNAS web UI or SSH
ssh truenas 'zpool scrub vault'
ssh truenas 'zpool status vault | grep -A2 scrub'
```
**Automate scrubs**:
```bash
# Add to crontab (run on 1st of month at 2 AM)
0 2 1 * * /sbin/zpool scrub nvme-mirror1
0 2 1 * * /sbin/zpool scrub nvme-mirror2
0 2 1 * * /sbin/zpool scrub rpool
```
**See**: [STORAGE.md](STORAGE.md) for pool details
### SMART Tests
**Run extended SMART tests monthly**:
```bash
# TrueNAS drives (via QEMU agent)
ssh pve 'qm guest exec 100 -- bash -c "smartctl --scan | while read dev type; do smartctl -t long \$dev; done"'
# Check results after 4-8 hours
ssh pve 'qm guest exec 100 -- bash -c "smartctl --scan | while read dev type; do echo \"=== \$dev ===\"; smartctl -a \$dev | grep -E \"Model|Serial|test result|Reallocated|Current_Pending\"; done"'
# PVE drives
ssh pve 'for dev in /dev/nvme0 /dev/nvme1 /dev/sda /dev/sdb; do [ -e "$dev" ] && smartctl -t long $dev; done'
# PVE2 drives
ssh pve2 'for dev in /dev/nvme0 /dev/nvme1 /dev/sda /dev/sdb; do [ -e "$dev" ] && smartctl -t long $dev; done'
```
**Automate SMART tests**:
```bash
# Add to crontab (run on 15th of month at 3 AM)
0 3 15 * * /usr/sbin/smartctl -t long /dev/nvme0
0 3 15 * * /usr/sbin/smartctl -t long /dev/sda
```
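A cron-able follow-up sketch that mails a warning when any PVE drive reports a health check other than PASSED (device list and mail address are illustrative):
```bash
# Mail a warning if smartctl -H does not report PASSED on any PVE drive
out=$(ssh pve 'for dev in /dev/nvme0 /dev/nvme1 /dev/sda /dev/sdb; do
  [ -e "$dev" ] || continue
  smartctl -H "$dev" | grep -q PASSED || echo "SMART health not PASSED on $dev"
done')
[ -n "$out" ] && echo "$out" | mail -s "SMART warning on PVE" hutson@example.com
```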
### Certificate Renewal Verification
**Check SSL certificate expiry**:
```bash
# Check Traefik certificates
ssh pve 'pct exec 202 -- cat /etc/traefik/acme.json | jq ".letsencrypt.Certificates[] | {domain: .domain.main, expires: .Dates.NotAfter}"'
# Check specific service
echo | openssl s_client -servername git.htsn.io -connect git.htsn.io:443 2>/dev/null | openssl x509 -noout -dates
```
**Certificates should auto-renew 30 days before expiry via Traefik**
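A small loop to spot-check several public hostnames against that 30-day window (the domain list is illustrative):
```bash
# openssl -checkend returns non-zero if the cert expires within the given seconds
for host in git.htsn.io plex.htsn.io mc.htsn.io; do
  if echo | openssl s_client -servername "$host" -connect "$host:443" 2>/dev/null | \
     openssl x509 -noout -checkend $((30*24*3600)) >/dev/null 2>&1; then
    echo "$host: OK (more than 30 days remaining)"
  else
    echo "$host: WARNING - expires within 30 days (or unreachable)"
  fi
done
```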
**See**: [TRAEFIK.md](TRAEFIK.md) for certificate management
### Backup Verification
**⚠️ TODO**: No backup strategy currently in place
**See**: [BACKUP-STRATEGY.md](BACKUP-STRATEGY.md) for implementation plan
---
## Quarterly Maintenance
### Full System Audit
**Check all systems comprehensively**:
1. **ZFS Pool Health**:
```bash
ssh pve 'zpool status -v'
ssh pve2 'zpool status -v'
ssh truenas 'zpool status -v vault'
```
Look for: errors, degraded vdevs, resilver operations
2. **SMART Health**:
```bash
# Run SMART health check script
~/bin/smart-health-check.sh
```
Look for: reallocated sectors, pending sectors, failures
3. **Disk Space Trends**:
```bash
# Check growth rate
ssh pve 'zpool list -o name,size,allocated,free,fragmentation'
ssh truenas 'df -h /mnt/vault'
```
Plan for expansion if >80% full
4. **VM Resource Usage**:
```bash
# Check if VMs need more/less resources
ssh pve 'qm list'
ssh pve 'pvesh get /nodes/pve/status'
```
5. **Network Performance**:
```bash
# Test bandwidth between critical nodes
iperf3 -s # On one host
iperf3 -c 10.10.10.120 # From another
```
6. **Temperature Monitoring**:
```bash
# Check max temps over past quarter
# TODO: Set up Prometheus/Grafana for historical data
ssh pve 'sensors'
ssh pve2 'sensors'
```
### Service Dependency Testing
**Test critical paths**:
1. **Power failure recovery** (if safe to test):
- See [UPS.md](UPS.md) for full procedure
- Verify VM startup order works
- Confirm all services come back online
2. **Failover testing**:
- Tailscale subnet routing (PVE → UCG-Fiber)
- NUT monitoring (PVE server → PVE2 client)
3. **Backup restoration** (when backups implemented):
- Test restoring a VM from backup
- Test restoring files from Syncthing versioning
### Documentation Review
- [ ] Update IP assignments in [IP-ASSIGNMENTS.md](IP-ASSIGNMENTS.md)
- [ ] Review and update service URLs in [SERVICES.md](SERVICES.md)
- [ ] Check for missing hardware specs in [HARDWARE.md](HARDWARE.md)
- [ ] Update any changed procedures in this document
---
## Annual Maintenance
### Hardware Maintenance
**Physical cleaning**:
```bash
# Shut down servers (coordinate with users)
ssh pve 'shutdown -h now'
ssh pve2 'shutdown -h now'
# Clean dust from:
# - CPU heatsinks
# - GPU fans
# - Case fans
# - PSU vents
# - Storage enclosure fans
# Check for:
# - Bulging capacitors on PSU/motherboard
# - Loose cables
# - Fan noise/vibration
```
**Thermal paste inspection** (every 2-3 years):
- Check CPU temps vs baseline
- If temps >85°C under load, consider reapplying paste
- Threadripper PRO: Tctl max safe = 90°C
**See**: [HARDWARE.md](HARDWARE.md) for component details
### UPS Battery Test
**Runtime test**:
```bash
# Check battery health
ssh pve 'upsc cyberpower@localhost | grep battery'
# Perform runtime test (coordinate power loss)
# 1. Note current runtime estimate
# 2. Unplug UPS from wall
# 3. Let battery drain to 20%
# 4. Note actual runtime vs estimate
# 5. Plug back in before shutdown triggers
# Battery replacement if:
# - Runtime < 10 min at typical load
# - Battery age > 3-5 years
# - Battery charge < 100% when on AC for 24h
```
**See**: [UPS.md](UPS.md) for full UPS details
### Drive Replacement Planning
**Check drive age and health**:
```bash
# Get drive hours and health
ssh truenas 'smartctl --scan | while read dev type; do
echo "=== $dev ===";
smartctl -a $dev | grep -E "Model|Serial|Power_On_Hours|Reallocated|Pending";
done'
```
**Replace drives if**:
- Reallocated sectors > 0
- Pending sectors > 0
- SMART pre-fail warnings
- Age > 5 years for HDDs (3-5 years for SSDs/NVMe)
- Hours > 50,000 for consumer drives
**Budget for replacements**:
- HDDs: WD Red 6TB (~$150/drive)
- NVMe: Samsung/Kingston 2TB (~$150-200/drive)
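A sketch that prints the key replacement indicators per TrueNAS drive (uses ATA/SAS attribute names; NVMe drives report these fields differently):
```bash
# Summarize power-on hours and reallocated/pending sectors for each drive
ssh truenas 'smartctl --scan | while read dev type; do
  attrs=$(smartctl -A "$dev")
  hours=$(echo "$attrs" | awk "/Power_On_Hours/ {print \$10}")
  realloc=$(echo "$attrs" | awk "/Reallocated_Sector_Ct/ {print \$10}")
  pending=$(echo "$attrs" | awk "/Current_Pending_Sector/ {print \$10}")
  echo "$dev: hours=${hours:-n/a} reallocated=${realloc:-0} pending=${pending:-0}"
done'
```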
### Capacity Planning
**Review growth trends**:
```bash
# Storage growth (compare to last year)
ssh pve 'zpool list'
ssh truenas 'df -h /mnt/vault'
# Network bandwidth (if monitoring in place)
# Review Grafana dashboards
# Power consumption
ssh pve 'upsc cyberpower@localhost ups.load'
```
**Plan expansions**:
- Storage: Add drives if >70% full
- RAM: Check if VMs hitting limits
- Network: Upgrade if bandwidth saturation
- UPS: Upgrade if load >80%
### License and Subscription Review
**Proxmox subscription** (if applicable):
- Community (free) or Enterprise subscription?
- Check for updates to pricing/features
**Service subscriptions**:
- Domain registration (htsn.io)
- Cloudflare plan (currently free)
- Let's Encrypt (free, no action needed)
---
## Update Schedules
### Proxmox
| Component | Frequency | Notes |
|-----------|-----------|-------|
| Security patches | Weekly | Via `apt upgrade` |
| Minor updates | Monthly | Test on PVE2 first |
| Major versions | Quarterly | Read release notes, plan downtime |
| Kernel updates | Monthly | Requires reboot |
**Update procedure**:
1. Check [Proxmox release notes](https://pve.proxmox.com/wiki/Roadmap)
2. Backup VM configs: `vzdump --dumpdir /tmp`
3. Update: `apt update && apt dist-upgrade`
4. Reboot if kernel changed: `reboot`
5. Verify VMs auto-started: `qm list`
### Containers (LXC)
| Container | Update Frequency | Package Manager |
|-----------|------------------|-----------------|
| Pi-hole (200) | Weekly | `apt` |
| Traefik (202) | Monthly | `apt` |
| FindShyt (205) | As needed | `apt` |
**Update command**:
```bash
ssh pve 'pct exec CTID -- bash -c "apt update && apt upgrade -y"'
```
### VMs
| VM | Update Frequency | Notes |
|----|------------------|-------|
| TrueNAS | Monthly | Via web UI or `apt` |
| Saltbox | Weekly | Managed by Saltbox updates |
| HomeAssistant | Monthly | Via HA supervisor |
| Docker-host | Weekly | `apt` + Docker images |
| Trading-VM | As needed | Via SSH |
| Gitea-VM | Monthly | Via web UI + `apt` |
**Docker image updates**:
```bash
ssh docker-host 'docker-compose pull && docker-compose up -d'
```
### Firmware Updates
| Component | Check Frequency | Update Method |
|-----------|----------------|---------------|
| Motherboard BIOS | Annually | Manual flash (high risk) |
| GPU firmware | Rarely | `nvidia-smi` or manual |
| SSD/NVMe firmware | Quarterly | Vendor tools |
| HBA firmware | Annually | LSI tools |
| UPS firmware | Annually | PowerPanel or manual |
**⚠️ Warning**: BIOS/firmware updates carry risk. Only update if:
- Critical security issue
- Needed for hardware compatibility
- Fixing known bug affecting you
---
## Testing Checklists
### Pre-Update Checklist
Before ANY system update:
- [ ] Check current system state: `uptime`, `qm list`, `zpool status`
- [ ] Verify backups are current (when backup system in place)
- [ ] Check for critical VMs/services that can't have downtime
- [ ] Review update changelog/release notes
- [ ] Test on non-critical system first (PVE2 or test VM)
- [ ] Plan rollback strategy if update fails
- [ ] Notify users if downtime expected
### Post-Update Checklist
After system update:
- [ ] Verify system booted correctly: `uptime`
- [ ] Check all VMs/CTs started: `qm list`, `pct list`
- [ ] Test critical services:
- [ ] Pi-hole DNS: `nslookup google.com 10.10.10.10`
- [ ] Traefik routing: `curl -I https://plex.htsn.io`
- [ ] NFS/SMB shares: Test mount from VM
- [ ] Syncthing sync: Check all devices connected
- [ ] Review logs for errors: `journalctl -p err -b`
- [ ] Check temperatures: `sensors`
- [ ] Verify UPS monitoring: `upsc cyberpower@localhost`
### Disaster Recovery Test
**Quarterly test** (when backup system in place):
- [ ] Simulate VM failure: Restore from backup
- [ ] Simulate storage failure: Import pool on different system
- [ ] Simulate network failure: Verify Tailscale failover
- [ ] Simulate power failure: Test UPS shutdown procedure (if safe)
- [ ] Document recovery time and issues
---
## Log Rotation
**System logs** are automatically rotated by systemd-journald and logrotate.
**Check log sizes**:
```bash
# Journalctl size
ssh pve 'journalctl --disk-usage'
# Traefik logs
ssh pve 'pct exec 202 -- du -sh /var/log/traefik/'
```
**Configure retention**:
```bash
# Limit journald to 500MB
ssh pve 'echo "SystemMaxUse=500M" >> /etc/systemd/journald.conf'
ssh pve 'systemctl restart systemd-journald'
```
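If the journal has already grown past the new limit, it can be trimmed immediately:
```bash
# One-off cleanup of existing journal files down to 500MB
ssh pve 'journalctl --vacuum-size=500M'
```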
**Traefik log rotation** (already configured):
```bash
# /etc/logrotate.d/traefik on CT 202
/var/log/traefik/*.log {
    daily
    rotate 7
    compress
    delaycompress
    missingok
    notifempty
}
```
---
## Monitoring Integration
**TODO**: Set up automated monitoring for these procedures
**When monitoring is implemented** (see [MONITORING.md](MONITORING.md)):
- ZFS scrub completion/errors
- SMART test failures
- Certificate expiry warnings (<30 days)
- Update availability notifications
- Disk space thresholds (>80%)
- Temperature warnings (>85°C)
---
## Related Documentation
- [MONITORING.md](MONITORING.md) - Automated health checks and alerts
- [BACKUP-STRATEGY.md](BACKUP-STRATEGY.md) - Backup implementation plan
- [UPS.md](UPS.md) - Power failure procedures
- [STORAGE.md](STORAGE.md) - ZFS pool management
- [HARDWARE.md](HARDWARE.md) - Hardware specifications
- [SERVICES.md](SERVICES.md) - Service inventory
---
**Last Updated**: 2025-12-22
**Status**: ⚠️ Manual procedures only - monitoring automation needed

MINECRAFT.md Normal file

@@ -0,0 +1,711 @@
# Minecraft Servers
Minecraft servers running on docker-host2 via Crafty Controller 4.
---
## Servers Overview
| Server | Address | Port | Version | Status |
|--------|---------|------|---------|--------|
| **Hutworld** | hutworld.htsn.io | 25565 | Paper 1.21.11 | Running |
| **Backrooms** | backrooms.htsn.io | 25566 | Paper 1.21.4 | Running |
### Web Map
| Setting | Value |
|---------|-------|
| **URL** | https://map.htsn.io |
| **Username** | hutworld |
| **Password** | Suwanna123 |
| **Plugin** | BlueMap 5.15 |
| **Port** | 8100 (exposed via Docker) |
---
## Quick Reference
### Hutworld (Main Server)
| Setting | Value |
|---------|-------|
| **Web GUI** | https://mc.htsn.io |
| **Game Server (Java)** | hutworld.htsn.io:25565 |
| **Game Server (Bedrock)** | hutworld.htsn.io:19132 |
| **Host** | docker-host2 (10.10.10.207) |
| **Server Type** | Paper 1.21.11 |
| **World Name** | hutworld |
| **Memory** | 4GB min / 8GB max |
### Backrooms (Horror/Exploration)
| Setting | Value |
|---------|-------|
| **Web GUI** | https://mc.htsn.io |
| **Game Server (Java)** | backrooms.htsn.io:25566 |
| **Host** | docker-host2 (10.10.10.207) |
| **Server Type** | Paper 1.21.4 |
| **World Name** | backrooms |
| **Memory** | 512MB min / 1.5GB max |
| **Datapack** | The Backrooms v2.2.0 |
**Backrooms Features:**
- 50+ custom dimensions based on Backrooms lore
- Use `/execute in backrooms:level0 run tp @s ~ ~ ~` to travel to Level 0
- Horror-themed exploration gameplay
- No client mods required (datapack only)
---
## Crafty Controller Access
| Setting | Value |
|---------|-------|
| **URL** | https://mc.htsn.io |
| **Username** | admin |
| **Password** | See `/crafty/data/config/default-creds.txt` on docker-host2 |
**Get password:**
```bash
ssh docker-host2 'cat ~/crafty/data/config/default-creds.txt'
```
---
## Current Status
### Completed
- [x] Crafty Controller 4.4.7 deployed on docker-host2
- [x] Traefik reverse proxy configured (mc.htsn.io → 10.10.10.207:8443)
- [x] DNS A record created for hutworld.htsn.io (non-proxied, points to public IP)
- [x] Port forwarding configured via UniFi API:
- TCP/UDP 25565 → 10.10.10.207 (Java Edition)
- UDP 19132 → 10.10.10.207 (Bedrock via Geyser)
- [x] Server files transferred from Windows PC (D:\Minecraft\mcss\servers\hutworld)
- [x] Server imported into Crafty and running
- [x] Paper upgraded from 1.21.5 to 1.21.11
- [x] Plugins updated (GSit 3.1.1, LuckPerms 5.5.22)
- [x] Orphaned plugin data cleaned up
- [x] LuckPerms database restored with original permissions
- [x] Automated backups to TrueNAS configured (every 2 hours)
### Pending
- [ ] Install SilkSpawners plugin (allows mining spawners with Silk Touch)
- [ ] Change Crafty admin password to something memorable
- [ ] Test external connectivity from outside network
---
## Import Instructions
To import the hutworld server in Crafty:
1. Go to **Servers** → Click **+ Create New Server**
2. Select **Import Server** tab
3. Fill in:
- **Server Name:** `Hutworld`
- **Import Path:** `/crafty/import/hutworld`
- **Server JAR:** `paper.jar`
- **Min RAM:** `2048` (2GB)
- **Max RAM:** `6144` (6GB)
- **Server Port:** `25565`
4. Click **Import Server**
5. Go to server → Click **Start**
---
## Server Configuration
### World Data
| World | Description |
|-------|-------------|
| hutworld | Main overworld |
| hutworld_nether | Nether dimension |
| hutworld_the_end | End dimension |
### Installed Plugins
| Plugin | Version | Purpose |
|--------|---------|---------|
| EssentialsX | 2.20.1 | Core server commands |
| EssentialsXChat | 2.20.1 | Chat formatting |
| EssentialsXSpawn | 2.20.1 | Spawn management |
| Geyser-Spigot | Latest | Bedrock Edition support |
| floodgate | Latest | Bedrock authentication |
| GSit | 3.1.1 | Sit/lay/crawl animations |
| LuckPerms | 5.5.22 | Permissions management |
| PluginPortal | 2.2.2 | Plugin management |
| Vault | 1.7.3 | Economy/permissions API |
| ViaVersion | Latest | Multi-version support |
| ViaBackwards | 5.2.1 | Older client support |
| randomtp | Latest | Random teleportation |
| BlueMap | 5.15 | 3D web map with player tracking |
| WorldEdit | 7.3.10 | World editing and terraforming |
**Removed plugins** (cleaned up 2026-01-03):
- GriefPrevention, Multiverse-Core, Multiverse-Portals, ProtocolLib, WorldGuard (disabled/orphaned)
---
## Docker Configuration
**Location:** `~/crafty/docker-compose.yml` on docker-host2
```yaml
services:
  crafty:
    image: registry.gitlab.com/crafty-controller/crafty-4:4.4.7
    container_name: crafty
    restart: unless-stopped
    environment:
      - TZ=America/New_York
    ports:
      - "8443:8443"        # Web GUI (HTTPS)
      - "8123:8123"        # Crafty HTTP
      - "25565:25565"      # Minecraft Java
      - "25566:25566"      # Additional server
      - "19132:19132/udp"  # Minecraft Bedrock (Geyser)
      - "8100:8100"        # BlueMap web server
    volumes:
      - ./data/backups:/crafty/backups
      - ./data/logs:/crafty/logs
      - ./data/servers:/crafty/servers
      - ./data/config:/crafty/app/config
      - ./data/import:/crafty/import
```
---
## Traefik Configuration
**File:** `/etc/traefik/conf.d/crafty.yaml` on CT 202 (10.10.10.250)
```yaml
http:
  routers:
    crafty-secure:
      entryPoints:
        - websecure
      rule: "Host(`mc.htsn.io`)"
      service: crafty
      tls:
        certResolver: cloudflare
      priority: 50
  services:
    crafty:
      loadBalancer:
        servers:
          - url: "https://10.10.10.207:8443"
        serversTransport: crafty-transport@file
  serversTransports:
    crafty-transport:
      insecureSkipVerify: true
```
---
## Port Forwarding (UniFi)
Configured via UniFi controller on UCG-Fiber (10.10.10.1):
| Rule Name | Port | Protocol | Destination | Status |
|-----------|------|----------|-------------|--------|
| Minecraft Java | 25565 | TCP/UDP | 10.10.10.207:25565 | Active |
| Minecraft Bedrock | 19132 | UDP | 10.10.10.207:19132 | Active |
| Minecraft Backrooms | 25566 | TCP/UDP | 10.10.10.207:25566 | Active |
---
## DNS Records (Cloudflare)
| Record | Type | Value | Proxied |
|--------|------|-------|---------|
| mc.htsn.io | CNAME | htsn.io | Yes (for web GUI) |
| hutworld.htsn.io | A | 70.237.94.174 | No (direct for game traffic) |
| backrooms.htsn.io | A | 70.237.94.174 | No (direct for game traffic) |
**Note:** Game traffic (25565, 25566, 19132) cannot be proxied through Cloudflare - only HTTP/HTTPS works with Cloudflare proxy.
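To confirm the non-proxied records resolve straight to the public IP (and not to Cloudflare edge addresses):
```bash
# Both should return the public IP from the table above
dig +short hutworld.htsn.io backrooms.htsn.io
```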
---
## LuckPerms Web Editor
After server is running:
1. Open Crafty console for Hutworld server
2. Run command: `/lp editor`
3. A unique URL will be generated (cloud-hosted by LuckPerms)
4. Open the URL in browser to manage permissions
The editor is hosted by LuckPerms, so no additional port forwarding is needed.
---
## Backup Configuration
### Automated Backups to TrueNAS
Backups run automatically every 2 hours and are stored on TrueNAS for both servers.
| Setting | Value |
|---------|-------|
| **Destination** | TrueNAS (10.10.10.200) |
| **Path** | `/mnt/vault/users/backups/minecraft/` |
| **Frequency** | Every 2 hours (12 backups per day) |
| **Retention** | 30 backups per server (~2.5 days of history) |
| **Hutworld Size** | ~2-7 GB per backup |
| **Backrooms Size** | ~100-150 MB per backup |
| **Script** | `/home/hutson/minecraft-backup-all.sh` on docker-host2 |
| **Log** | `/home/hutson/minecraft-backup.log` on docker-host2 |
### Backup Scripts
**Main Script:** `~/minecraft-backup-all.sh` on docker-host2 (backs up both servers)
**Legacy Script:** `~/minecraft-backup.sh` on docker-host2 (Hutworld only)
```bash
#!/bin/bash
# Minecraft Server Backup Script
# Backs up Crafty server data to TrueNAS
BACKUP_SRC="$HOME/crafty/data/servers/19f604a9-f037-442d-9283-0761c73cfd60"
BACKUP_DEST="hutson@10.10.10.200:/mnt/vault/users/backups/minecraft"
DATE=$(date +%Y-%m-%d_%H%M)
BACKUP_NAME="hutworld-$DATE.tar.gz"
LOCAL_BACKUP="/tmp/$BACKUP_NAME"
# Create compressed backup (exclude large unnecessary files)
tar -czf "$LOCAL_BACKUP" \
--exclude="*.jar" \
--exclude="cache" \
--exclude="libraries" \
--exclude=".paper-remapped" \
-C "$HOME/crafty/data/servers" \
19f604a9-f037-442d-9283-0761c73cfd60
# Transfer to TrueNAS
sshpass -p 'GrilledCh33s3#' scp -o StrictHostKeyChecking=no "$LOCAL_BACKUP" "$BACKUP_DEST/"
# Clean up local temp file
rm -f "$LOCAL_BACKUP"
# Keep only last 30 backups on TrueNAS
sshpass -p 'GrilledCh33s3#' ssh -o StrictHostKeyChecking=no hutson@10.10.10.200 '
cd /mnt/vault/users/backups/minecraft
ls -t hutworld-*.tar.gz 2>/dev/null | tail -n +31 | xargs -r rm -f
'
```
### Cron Schedule
```bash
# View current schedule
ssh docker-host2 'crontab -l | grep minecraft'
# Output: 0 */2 * * * /home/hutson/minecraft-backup-all.sh >> /home/hutson/minecraft-backup.log 2>&1
```
### Manual Backup Commands
```bash
# Run backup manually
ssh docker-host2 '~/minecraft-backup.sh'
# Check backup log
ssh docker-host2 'tail -20 ~/minecraft-backup.log'
# List backups on TrueNAS
sshpass -p 'GrilledCh33s3#' ssh -o StrictHostKeyChecking=no hutson@10.10.10.200 \
'ls -lh /mnt/vault/users/backups/minecraft/'
```
### Restore from Backup
```bash
# 1. Stop the server in Crafty web UI
# 2. Copy backup from TrueNAS
sshpass -p 'GrilledCh33s3#' scp -o StrictHostKeyChecking=no \
hutson@10.10.10.200:/mnt/vault/users/backups/minecraft/hutworld-YYYY-MM-DD_HHMM.tar.gz \
/tmp/
# 3. Extract to server directory (backup existing first)
ssh docker-host2 'cd ~/crafty/data/servers && \
mv 19f604a9-f037-442d-9283-0761c73cfd60 19f604a9-f037-442d-9283-0761c73cfd60.old && \
tar -xzf /tmp/hutworld-YYYY-MM-DD_HHMM.tar.gz'
# 4. Start server in Crafty web UI
```
---
## Admin Commands
### Give Mob Spawner (1.21+ Syntax)
In Minecraft 1.21+, the NBT syntax changed. Use `minecraft:give` to bypass Essentials:
```
minecraft:give <player> spawner[block_entity_data={id:"minecraft:mob_spawner",SpawnData:{entity:{id:"minecraft:<mob_type>"}}}]
```
**Examples:**
```bash
# Magma cube spawner
minecraft:give suwann spawner[block_entity_data={id:"minecraft:mob_spawner",SpawnData:{entity:{id:"minecraft:magma_cube"}}}]
# Zombie spawner
minecraft:give suwann spawner[block_entity_data={id:"minecraft:mob_spawner",SpawnData:{entity:{id:"minecraft:zombie"}}}]
# Skeleton spawner
minecraft:give suwann spawner[block_entity_data={id:"minecraft:mob_spawner",SpawnData:{entity:{id:"minecraft:skeleton"}}}]
# Blaze spawner
minecraft:give suwann spawner[block_entity_data={id:"minecraft:mob_spawner",SpawnData:{entity:{id:"minecraft:blaze"}}}]
```
**Note:** Must use `minecraft:give` prefix to use vanilla command instead of Essentials `/give`.
### RCON Access
For remote console access to the server:
| Setting | Value |
|---------|-------|
| **Host** | 10.10.10.207 |
| **Port** | 25575 |
| **Password** | HutworldRCON2026 |
Example using mcrcon:
```bash
mcrcon -H 10.10.10.207 -P 25575 -p HutworldRCON2026
```
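mcrcon can also run a single command and exit, which is handy for scripts:
```bash
# One-shot example: list online players
mcrcon -H 10.10.10.207 -P 25575 -p HutworldRCON2026 "list"
```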
### BlueMap Commands
```bash
# Start full world render
/bluemap render
# Pause rendering
/bluemap pause
# Resume rendering
/bluemap resume
# Check render status
/bluemap status
# Reload BlueMap config
/bluemap reload
```
---
## Common Tasks
### Start/Stop Server
Via Crafty web UI at https://mc.htsn.io, or:
```bash
# Check Crafty container status
ssh docker-host2 'docker ps | grep crafty'
# Restart Crafty container
ssh docker-host2 'cd ~/crafty && docker compose restart'
# View Crafty logs
ssh docker-host2 'docker logs -f crafty'
```
### Backup Server
See [Backup Configuration](#backup-configuration) for full details.
```bash
# Run backup manually
ssh docker-host2 '~/minecraft-backup.sh'
# Check recent backups
sshpass -p 'GrilledCh33s3#' ssh -o StrictHostKeyChecking=no hutson@10.10.10.200 \
'ls -lht /mnt/vault/users/backups/minecraft/ | head -5'
```
### Update Plugins
1. Download new plugin JAR
2. Upload via Crafty Files tab, or:
```bash
scp plugin.jar docker-host2:~/crafty/data/servers/hutworld/plugins/
```
3. Restart server in Crafty
### Check Server Logs
Via Crafty web UI (Logs tab), or:
```bash
ssh docker-host2 'tail -f ~/crafty/data/servers/hutworld/logs/latest.log'
```
---
## Troubleshooting
### Plugin Permission Issues (IMPORTANT)
**Root Cause**: Crafty Docker container requires all files to be owned by `<user>:root` (not `<user>:<user>`) for permissions to work correctly.
**Permanent Fix**:
```bash
# Fix all permissions immediately
ssh docker-host2 'sudo chown -R hutson:root ~/crafty/data/servers/ && \
sudo find ~/crafty/data/servers/ -type d -exec chmod 2775 {} \; && \
sudo find ~/crafty/data/servers/ -type f -exec chmod 664 {} \;'
```
**Prevention**:
1. **Always upload plugins through Crafty web UI** - this ensures correct permissions
2. **Or use the import directory**: Copy to `~/crafty/data/import/` then restart container
3. **Never directly copy files** to the servers directory
**Check for permission issues**:
```bash
# Use the permission check script (recommended)
ssh docker-host2 '~/check-crafty-permissions.sh'
# Or manually check for wrong group ownership
ssh docker-host2 'find ~/crafty/data/servers -type f ! -group root -ls'
ssh docker-host2 'find ~/crafty/data/servers -type d ! -group root -ls'
```
**Permission Check Script**: Located at `~/check-crafty-permissions.sh` on docker-host2
- Automatically detects permission issues
- Offers to fix them with one command
- Ignores temporary files that are expected to have different permissions
### Crafty Shows Server Offline or "Another Instance Running"
**Cause**: This happens when the server was started manually (not through Crafty) or when Crafty loses track of the server process.
**Fix**:
```bash
# 1. Kill any orphaned server processes
ssh docker-host2 'docker exec crafty pkill -f "paper.jar"'
# 2. Restart Crafty container to clear state
ssh docker-host2 'cd ~/crafty && docker compose restart'
# 3. Wait 30-60 seconds - Crafty will auto-start the server
```
**Prevention**:
- Always use Crafty web UI to start/stop servers
- Never manually start the server with java command
- If you must restart, use the container restart method above
### Server won't start
```bash
# Check Crafty container logs
ssh docker-host2 'docker logs crafty --tail 50'
# Check server logs
ssh docker-host2 'cat ~/crafty/data/servers/hutworld/logs/latest.log | tail -100'
# Check Java version in container
ssh docker-host2 'docker exec crafty java -version'
```
### Can't connect externally
1. Verify port forwarding is active:
```bash
ssh root@10.10.10.1 'iptables -t nat -L -n | grep 25565'
```
2. Test from external network:
```bash
nc -zv hutworld.htsn.io 25565
```
3. Check if server is listening:
```bash
ssh docker-host2 'netstat -tlnp | grep 25565'
```
### Bedrock players can't connect
1. Verify Geyser plugin is installed and enabled
2. Check Geyser config: `~/crafty/data/servers/hutworld/plugins/Geyser-Spigot/config.yml`
3. Ensure UDP 19132 is forwarded and not blocked
### Corrupted plugin JARs (ZipException)
If you see `java.util.zip.ZipException: zip END header not found`:
1. **Check all plugins for corruption:**
```bash
ssh docker-host2 'cd ~/crafty/data/servers/19f604a9-f037-442d-9283-0761c73cfd60/plugins && \
for jar in *.jar; do unzip -t "$jar" > /dev/null 2>&1 && echo "OK: $jar" || echo "CORRUPT: $jar"; done'
```
2. **Re-download corrupted plugins from Hangar/Modrinth/SpigotMC**
3. **Restart server**
### Session lock errors
If server fails with `session.lock: already locked`:
```bash
# Kill stale Java processes and remove locks
ssh docker-host2 'docker exec crafty bash -c "pkill -f paper.jar; rm -f /crafty/servers/*/hutworld*/session.lock"'
```
### Permission denied errors in Docker
If world files show `AccessDeniedException`:
```bash
# Fix permissions (crafty user is UID 1000)
ssh docker-host2 'docker exec crafty bash -c "chown -R 1000:0 /crafty/servers/19f604a9-f037-442d-9283-0761c73cfd60/ && chmod -R u+rwX /crafty/servers/19f604a9-f037-442d-9283-0761c73cfd60/"'
```
### LuckPerms missing users/permissions
If LuckPerms shows a fresh database (missing users like Suwan):
1. **Check if original database exists:**
```bash
ssh docker-host2 'ls -la ~/crafty/data/import/hutworld/plugins/LuckPerms/*.db'
```
2. **Restore from import backup:**
```bash
# Stop server in Crafty UI first
ssh docker-host2 'cp ~/crafty/data/import/hutworld/plugins/LuckPerms/luckperms-h2-v2.mv.db \
~/crafty/data/servers/19f604a9-f037-442d-9283-0761c73cfd60/plugins/LuckPerms/'
```
3. **Or restore from TrueNAS backup:**
```bash
# List available backups
sshpass -p 'GrilledCh33s3#' ssh -o StrictHostKeyChecking=no hutson@10.10.10.200 \
'ls -lt /mnt/vault/users/backups/minecraft/'
# Extract LuckPerms database from backup
sshpass -p 'GrilledCh33s3#' scp hutson@10.10.10.200:/mnt/vault/users/backups/minecraft/hutworld-YYYY-MM-DD_HHMM.tar.gz /tmp/
tar -xzf /tmp/hutworld-*.tar.gz -C /tmp --strip-components=2 \
'*/plugins/LuckPerms/luckperms-h2-v2.mv.db'
```
4. **Restart server in Crafty UI**
---
## Migration History
### 2026-01-04: Backup System (Updated 2026-01-13)
- Configured automated backups to TrueNAS
- **Updated frequency:** Every 2 hours (was 6 hours)
- **Updated retention:** 30 backups (~2.5 days) (was 14 backups)
- Created backup script with compression and cleanup
- Storage: `/mnt/vault/users/backups/minecraft/`
### 2026-01-03: Server Fixes & Updates
**Updates:**
- Upgraded Paper from 1.21.5 to 1.21.11 (build 69)
- Updated GSit from 2.3.2 to 3.1.1
- Fixed corrupted LuckPerms JAR (re-downloaded 5.5.22)
- Restored original LuckPerms database with user permissions
**Cleanup:**
- Removed disabled plugins: Dynmap, Graves
- Removed orphaned data folders: GriefPreventionData, SilkSpawners_v2, Graves, ViaRewind
**Fixes:**
- Fixed memory allocation (was attempting 2TB, set to 2GB min / 4GB max)
- Fixed file permissions for Docker container access
### 2026-01-03: Initial Migration
**Source:** Windows PC (10.10.10.150) - D:\Minecraft\mcss\servers\hutworld
**Steps completed:**
1. Compressed hutworld folder on Windows (2.4GB zip)
2. Transferred via SCP to docker-host2
3. Unzipped to ~/crafty/data/import/hutworld
4. Downloaded Paper 1.21.5 JAR (later upgraded to 1.21.11)
5. Imported server into Crafty Controller
6. Configured port forwarding (updated existing 25565 rule, added 19132)
7. Created DNS record for hutworld.htsn.io
**Original MCSS config preserved:** `mcss_server_config.json`
---
## Related Documentation
- [IP Assignments](IP-ASSIGNMENTS.md) - Network configuration
- [Traefik](TRAEFIK.md) - Reverse proxy setup
- [VMs](VMS.md) - docker-host2 details
- [Gateway](GATEWAY.md) - UCG-Fiber configuration
---
## Resources
- [Crafty Controller Docs](https://docs.craftycontrol.com/)
- [Paper MC](https://papermc.io/)
- [Geyser MC](https://geysermc.org/)
- [LuckPerms](https://luckperms.net/)
---
**Last Updated:** 2026-01-11
---
## Migration History (Hutworld)
### 2026-01-13: Server Infrastructure Upgrades ✅
- **RAM Upgraded:** Increased from 2GB/4GB to 4GB/8GB (min/max)
- **Storage Expanded:** VM disk increased from 32GB to 64GB (33% used)
- **RCON Enabled:** Remote console access configured on port 25575 - TESTED & WORKING
- **WorldEdit Installed:** Version 7.3.10 for world editing capabilities
- **Auto-Start Configured:** Server auto-starts with Crafty container
- **Docker Cleanup:** Freed 1.1GB by removing unused images and containers
- **Container Fixed:** Recreated with proper port mappings for RCON access
### 2026-01-11: BlueMap Web Map Added
- Installed BlueMap 5.15 plugin (supports MC 1.21.11)
- Exposed port 8100 in docker-compose.yml for BlueMap web server
- Configured Traefik routing: map.htsn.io → 10.10.10.207:8100
- Added basic auth password protection via Traefik middleware
- Fixed corrupted ViaVersion/ViaBackwards plugins (re-downloaded from Hangar)
- Fixed Docker file permission issues (chown to UID 1000)
- Documented 1.21+ spawner give command syntax
---
## Migration History (Backrooms)
### 2026-01-05: Backrooms Server Created
- Created new Backrooms server in Crafty Controller
- Installed Paper 1.21.4 build 232 (recommended version for datapack)
- Installed The Backrooms datapack v2.2.0 from Modrinth
- DNS record created for backrooms.htsn.io
- Memory configured for 512MB-1.5GB (VM memory constrained)
- Server running on port 25566
- **Pending:** Port forwarding for external access

MONITORING.md Normal file

@@ -0,0 +1,711 @@
# Monitoring and Alerting
Documentation for system monitoring, health checks, and alerting across the homelab.
## Current Monitoring Status
| Component | Monitored? | Method | Alerts | Notes |
|-----------|------------|--------|--------|-------|
| **Gateway** | ✅ Yes | Custom services | ✅ Auto-reboot | Internet watchdog + memory monitor |
| **UPS** | ✅ Yes | NUT + Home Assistant | ❌ No | Battery, load, runtime tracked |
| **Syncthing** | ✅ Partial | API (manual checks) | ❌ No | Connection status available |
| **Server temps** | ✅ Partial | Manual checks | ❌ No | Via `sensors` command |
| **VM status** | ✅ Partial | Proxmox UI | ❌ No | Manual monitoring |
| **ZFS health** | ❌ No | Manual `zpool status` | ❌ No | No automated checks |
| **Disk health (SMART)** | ❌ No | Manual `smartctl` | ❌ No | No automated checks |
| **Network** | ✅ Partial | Gateway watchdog | ✅ Auto-reboot | Connectivity check every 60s |
| **Services** | ❌ No | - | ❌ No | No health checks |
| **Backups** | ❌ No | - | ❌ No | No verification |
| **Claude Code** | ✅ Yes | Prometheus + Grafana | ✅ Yes | Token usage, burn rate, cost tracking |
**Overall Status**: ⚠️ **PARTIAL** - Gateway and Claude Code monitoring are active; everything else is still mostly manual
---
## Existing Monitoring
### UPS Monitoring (NUT)
**Status**: ✅ **Active and working**
**What's monitored**:
- Battery charge percentage
- Runtime remaining (seconds)
- Load percentage
- Input/output voltage
- UPS status (OL/OB/LB)
**Access**:
```bash
# Full UPS status
ssh pve 'upsc cyberpower@localhost'
# Key metrics
ssh pve 'upsc cyberpower@localhost | grep -E "battery.charge:|battery.runtime:|ups.load:|ups.status:"'
```
**Home Assistant Integration**:
- Sensors: `sensor.cyberpower_*`
- Can be used for automation/alerts
- Currently: No alerts configured
**See**: [UPS.md](UPS.md)
---
### Gateway Monitoring
**Status**: ✅ **Active with auto-recovery**
Two custom systemd services monitor the UCG-Fiber gateway (10.10.10.1):
**1. Internet Watchdog** (`internet-watchdog.service`)
- Pings external DNS (1.1.1.1, 8.8.8.8, 208.67.222.222) every 60 seconds
- Auto-reboots gateway after 5 consecutive failures (~5 minutes)
- Logs to `/var/log/internet-watchdog.log`
**2. Memory Monitor** (`memory-monitor.service`)
- Logs memory usage and top processes every 10 minutes
- Logs to `/data/logs/memory-history.log`
- Auto-rotates when log exceeds 10MB
**Quick Commands**:
```bash
# Check service status
ssh ucg-fiber 'systemctl status internet-watchdog memory-monitor'
# View watchdog activity
ssh ucg-fiber 'tail -20 /var/log/internet-watchdog.log'
# View memory history
ssh ucg-fiber 'tail -100 /data/logs/memory-history.log'
# Current memory usage
ssh ucg-fiber 'free -m && ps -eo pid,rss,comm --sort=-rss | head -12'
```
**See**: [GATEWAY.md](GATEWAY.md)
---
### Claude Code Token Monitoring
**Status**: ✅ **Active with alerts**
Monitors Claude Code token usage across all machines to track subscription consumption and prevent hitting weekly limits.
**Architecture**:
```
Claude Code (MacBook/Mac Mini)
    ▼ (OTLP HTTP push every 60s)
OTEL Collector (docker-host:4318)
    ▼ (Prometheus exporter on :8889)
Prometheus (docker-host:9090) ─── scrapes ───► otel-collector:8889
    ├──► Grafana Dashboard
    └──► Alertmanager (burn rate alerts)
```
**Note**: Uses Prometheus exporter instead of Remote Write because Claude Code sends Delta temporality metrics, which Remote Write doesn't support.
**Monitored Devices**:
All Claude Code sessions on any device automatically push metrics via OTLP.
**What's monitored**:
- Token usage (input/output/cache) over time
- Burn rate (tokens/hour)
- Cost tracking (USD)
- Usage by model (Opus, Sonnet, Haiku)
- Session count
- Per-device breakdown
**Dashboard**: https://grafana.htsn.io/d/claude-code-usage/claude-code-token-usage
**Alerts Configured**:
| Alert | Threshold | Severity |
|-------|-----------|----------|
| High Burn Rate | >100k tokens/hour for 15min | Warning |
| Weekly Limit Risk | Projected >5M tokens/week | Critical |
| No Metrics | Scrape fails for 5min | Info |
**Configuration Files**:
- Shell config: `~/.zshrc` (on each Mac - synced via Syncthing)
- OTEL Collector: `/opt/monitoring/otel-collector/config.yaml` (docker-host)
- Alert rules: `/opt/monitoring/prometheus/rules/claude-code.yml` (docker-host)
**Shell Environment Setup** (in `~/.zshrc`):
```bash
# Claude Code OpenTelemetry Metrics (push to OTEL Collector)
export CLAUDE_CODE_ENABLE_TELEMETRY=1
export OTEL_METRICS_EXPORTER=otlp
export OTEL_EXPORTER_OTLP_ENDPOINT="http://10.10.10.206:4318"
export OTEL_EXPORTER_OTLP_PROTOCOL="http/protobuf"
export OTEL_METRIC_EXPORT_INTERVAL=60000
```
**Note**: These can be set either in shell environment (`~/.zshrc`) or in `~/.claude/settings.json` under the `env` block. Both methods work.
**OTEL Collector Config** (`/opt/monitoring/otel-collector/config.yaml`):
```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
processors:
  batch:
    timeout: 10s
exporters:
  prometheus:
    endpoint: 0.0.0.0:8889
    resource_to_telemetry_conversion:
      enabled: true
service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]
```
**Prometheus Scrape Config** (add to `/opt/monitoring/prometheus/prometheus.yml`):
```yaml
  - job_name: "claude-code"
    static_configs:
      - targets: ["otel-collector:8889"]
        labels:
          group: "claude-code"
```
**Useful PromQL Queries**:
```promql
# Total tokens by model
sum(claude_code_token_usage_tokens_total) by (model)
# Burn rate (tokens/hour)
sum(rate(claude_code_token_usage_tokens_total[1h])) * 3600
# Total cost by model
sum(claude_code_cost_usage_USD_total) by (model)
# Usage by type (input, output, cacheRead, cacheCreation)
sum(claude_code_token_usage_tokens_total) by (type)
# Projected weekly usage (rough estimate)
sum(increase(claude_code_token_usage_tokens_total[24h])) * 7
```
**Important Notes**:
- After changing `~/.zshrc`, start a new terminal/shell session before running Claude Code
- Metrics only flow while Claude Code is running
- Weekly subscription resets Monday 1am (America/New_York)
- Verify env vars are set: `env | grep OTEL`
**Added**: 2026-01-16
---
### Syncthing Monitoring
**Status**: ⚠️ **Partial** - API available, no automated monitoring
**What's available**:
- Device connection status
- Folder sync status
- Sync errors
- Bandwidth usage
**Manual Checks**:
```bash
# Check connections (Mac Mini)
curl -s -H "X-API-Key: oSQSrPnMnrEXuHqjWrRdrvq3TSXesAT5" \
"http://127.0.0.1:8384/rest/system/connections" | \
python3 -c "import sys,json; d=json.load(sys.stdin)['connections']; \
[print(f\"{v.get('name',k[:7])}: {'UP' if v['connected'] else 'DOWN'}\") for k,v in d.items()]"
# Check folder status
curl -s -H "X-API-Key: oSQSrPnMnrEXuHqjWrRdrvq3TSXesAT5" \
"http://127.0.0.1:8384/rest/db/status?folder=documents" | jq
# Check errors
curl -s -H "X-API-Key: oSQSrPnMnrEXuHqjWrRdrvq3TSXesAT5" \
"http://127.0.0.1:8384/rest/folder/errors?folder=documents" | jq
```
**Needs**: Automated monitoring script + alerts
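A minimal sketch of such a script (run on the Mac Mini, same API key as above); wire a non-zero exit into cron + mail or a Home Assistant notification:
```bash
#!/bin/bash
# Exit non-zero and list any Syncthing devices that are currently disconnected
API_KEY="oSQSrPnMnrEXuHqjWrRdrvq3TSXesAT5"
down=$(curl -s -H "X-API-Key: $API_KEY" "http://127.0.0.1:8384/rest/system/connections" | \
  python3 -c "import sys,json; d=json.load(sys.stdin)['connections']; \
print('\n'.join(v.get('name',k[:7]) for k,v in d.items() if not v['connected']))")
if [ -n "$down" ]; then
  echo "Syncthing devices down:"
  echo "$down"
  exit 1
fi
echo "All Syncthing devices connected"
```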
**See**: [SYNCTHING.md](SYNCTHING.md)
---
### Temperature Monitoring
**Status**: ⚠️ **Manual only**
**Current Method**:
```bash
# CPU temperature (Threadripper Tctl)
ssh pve 'for f in /sys/class/hwmon/hwmon*/temp*_input; do \
label=$(cat ${f%_input}_label 2>/dev/null); \
if [ "$label" = "Tctl" ]; then echo "PVE Tctl: $(($(cat $f)/1000))°C"; fi; done'
ssh pve2 'for f in /sys/class/hwmon/hwmon*/temp*_input; do \
label=$(cat ${f%_input}_label 2>/dev/null); \
if [ "$label" = "Tctl" ]; then echo "PVE2 Tctl: $(($(cat $f)/1000))°C"; fi; done'
```
**Thresholds**:
- Healthy: 70-80°C under load
- Warning: >85°C
- Critical: >90°C (throttling)
**Needs**: Automated monitoring + alert if >85°C
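A cron-able sketch using the same Tctl readout as above (mail address is illustrative):
```bash
# Alert if either server's Tctl exceeds 85°C
for host in pve pve2; do
  temp=$(ssh "$host" 'for f in /sys/class/hwmon/hwmon*/temp*_input; do
    label=$(cat ${f%_input}_label 2>/dev/null)
    [ "$label" = "Tctl" ] && echo $(($(cat $f)/1000))
  done' | head -1)
  if [ -n "$temp" ] && [ "$temp" -gt 85 ]; then
    echo "$host Tctl at ${temp}°C" | mail -s "Temperature warning: $host" hutson@example.com
  fi
done
```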
---
### Proxmox VM Monitoring
**Status**: ⚠️ **Manual only**
**Current Access**:
- Proxmox Web UI: Node → Summary
- CLI: `ssh pve 'qm list'`
**Metrics Available** (via Proxmox):
- CPU usage per VM
- RAM usage per VM
- Disk I/O
- Network I/O
- VM uptime
**Needs**: API-based monitoring + alerts for VM down
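Until API-based monitoring exists, a simple sketch over `qm list` can catch stopped VMs (wrap in cron + mail for alerts):
```bash
# Print any VM that is not in the "running" state
for host in pve pve2; do
  stopped=$(ssh "$host" 'qm list' | awk 'NR>1 && $3 != "running" {print $1, $2, $3}')
  if [ -n "$stopped" ]; then
    echo "$host has non-running VMs:"
    echo "$stopped"
  fi
done
```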
---
## Recommended Monitoring Stack
### Option 1: Prometheus + Grafana (Recommended)
**Why**:
- Industry standard
- Extensive integrations
- Beautiful dashboards
- Flexible alerting
**Architecture**:
```
Grafana (dashboard) → Prometheus (metrics DB) → Exporters (data collection)
Alertmanager (alerts)
```
**Required Exporters**:
| Exporter | Monitors | Install On |
|----------|----------|------------|
| node_exporter | CPU, RAM, disk, network | PVE, PVE2, TrueNAS, all VMs |
| zfs_exporter | ZFS pool health | PVE, PVE2, TrueNAS |
| smartmon_exporter | Drive SMART data | PVE, PVE2, TrueNAS |
| nut_exporter | UPS metrics | PVE |
| proxmox_exporter | VM/CT stats | PVE, PVE2 |
| cadvisor | Docker containers | Saltbox, docker-host |
**Deployment**:
```bash
# Create monitoring VM
ssh pve 'qm create 210 --name monitoring --memory 4096 --cores 2 \
--net0 virtio,bridge=vmbr0'
# Install Prometheus + Grafana (via Docker)
# /opt/monitoring/docker-compose.yml
```
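A minimal starting point for that compose file (a sketch only; image tags, ports, and data paths are assumptions to adjust, and exporters, Alertmanager, and TLS are omitted):
```bash
# Sketch of /opt/monitoring/docker-compose.yml on the monitoring VM
mkdir -p /opt/monitoring && cat > /opt/monitoring/docker-compose.yml << 'EOF'
services:
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus:/etc/prometheus
      - prometheus-data:/prometheus
    restart: unless-stopped
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    volumes:
      - grafana-data:/var/lib/grafana
    restart: unless-stopped
volumes:
  prometheus-data:
  grafana-data:
EOF
```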
**Estimated Setup Time**: 4-6 hours
---
### Option 2: Uptime Kuma (Simpler Alternative)
**Why**:
- Lightweight
- Easy to set up
- Web-based dashboard
- Built-in alerts (email, Slack, etc.)
**What it monitors**:
- HTTP/HTTPS endpoints
- Ping (ICMP)
- Ports (TCP)
- Docker containers
**Deployment**:
```bash
ssh docker-host 'mkdir -p /opt/uptime-kuma'
cat > docker-compose.yml << 'EOF'
version: "3.8"
services:
uptime-kuma:
image: louislam/uptime-kuma:latest
ports:
- "3001:3001"
volumes:
- ./data:/app/data
restart: unless-stopped
EOF
# Access: http://10.10.10.206:3001
# Add Traefik config for uptime.htsn.io
```
**Estimated Setup Time**: 1-2 hours
---
### Option 3: Netdata (Real-time Monitoring)
**Why**:
- Real-time metrics (1-second granularity)
- Auto-discovers services
- Low overhead
- Beautiful web UI
**Deployment**:
```bash
# Install on each server
ssh pve 'bash <(curl -Ss https://my-netdata.io/kickstart.sh)'
ssh pve2 'bash <(curl -Ss https://my-netdata.io/kickstart.sh)'
# Access:
# http://10.10.10.120:19999 (PVE)
# http://10.10.10.102:19999 (PVE2)
```
**Parent-Child Setup** (optional):
- Configure PVE as parent
- Stream metrics from PVE2 → PVE
- Single dashboard for both servers
**Estimated Setup Time**: 1 hour
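A rough outline of the parent-child streaming setup above (hedged sketch; section names and paths assume a default kickstart install, so check the Netdata streaming docs before applying):
```bash
# On PVE2 (child), /etc/netdata/stream.conf needs roughly:
#   [stream]
#       enabled = yes
#       destination = 10.10.10.120:19999
#       api key = <any UUID, e.g. from uuidgen>
# On PVE (parent), the same file needs a section named after that UUID:
#   [<same UUID>]
#       enabled = yes
# Then restart both agents:
ssh pve2 'systemctl restart netdata'
ssh pve 'systemctl restart netdata'
```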
---
## Critical Metrics to Monitor
### Server Health
| Metric | Threshold | Action |
|--------|-----------|--------|
| **CPU usage** | >90% for 5 min | Alert |
| **CPU temp** | >85°C | Alert |
| **CPU temp** | >90°C | Critical alert |
| **RAM usage** | >95% | Alert |
| **Disk space** | >80% | Warning |
| **Disk space** | >90% | Alert |
| **Load average** | >CPU count | Alert |
### Storage Health
| Metric | Threshold | Action |
|--------|-----------|--------|
| **ZFS pool errors** | >0 | Alert immediately |
| **ZFS pool degraded** | Any degraded vdev | Critical alert |
| **ZFS scrub failed** | Last scrub error | Alert |
| **SMART reallocated sectors** | >0 | Warning |
| **SMART pending sectors** | >0 | Alert |
| **SMART failure** | Pre-fail | Critical - replace drive |
### UPS
| Metric | Threshold | Action |
|--------|-----------|--------|
| **Battery charge** | <20% | Warning |
| **Battery charge** | <10% | Alert |
| **On battery** | >5 min | Alert |
| **Runtime** | <5 min | Critical |
### Network
| Metric | Threshold | Action |
|--------|-----------|--------|
| **Device unreachable** | >2 min down | Alert |
| **High packet loss** | >5% | Warning |
| **Bandwidth saturation** | >90% | Warning |
### VMs/Services
| Metric | Threshold | Action |
|--------|-----------|--------|
| **VM stopped** | Critical VM down | Alert immediately |
| **Service unreachable** | HTTP 5xx or timeout | Alert |
| **Backup failed** | Any backup failure | Alert |
| **Certificate expiry** | <30 days | Warning |
| **Certificate expiry** | <7 days | Alert |
---
## Alert Destinations
### Email Alerts
**Recommended**: Set up SMTP relay for email alerts
**Options**:
1. Gmail SMTP (free, rate-limited)
2. SendGrid (free tier: 100 emails/day)
3. Mailgun (free tier available)
4. Self-hosted mail server (complex)
**Configuration Example** (Prometheus Alertmanager):
```yaml
# /etc/alertmanager/alertmanager.yml
receivers:
  - name: 'email'
    email_configs:
      - to: 'hutson@example.com'
        from: 'alerts@htsn.io'
        smarthost: 'smtp.gmail.com:587'
        auth_username: 'alerts@htsn.io'
        auth_password: 'app-password-here'
```
---
### Push Notifications
**Options**:
- **Pushover**: $5 one-time, reliable
- **Pushbullet**: Free tier available
- **Telegram Bot**: Free
- **Discord Webhook**: Free
- **Slack**: Free tier available
**Recommended**: Pushover or Telegram for mobile alerts
---
### Home Assistant Alerts
Since Home Assistant is already running, use it for alerts:
**Automation Example**:
```yaml
automation:
  - alias: "UPS Low Battery Alert"
    trigger:
      - platform: numeric_state
        entity_id: sensor.cyberpower_battery_charge
        below: 20
    action:
      - service: notify.mobile_app
        data:
          message: "⚠️ UPS battery at {{ states('sensor.cyberpower_battery_charge') }}%"
  - alias: "Server High Temperature"
    trigger:
      - platform: template
        value_template: "{{ states('sensor.pve_cpu_temp') | float(0) > 85 }}"
    action:
      - service: notify.mobile_app
        data:
          message: "🔥 PVE CPU temperature: {{ states('sensor.pve_cpu_temp') }}°C"
```
**Needs**: Sensors for CPU temp, disk space, etc. in Home Assistant
---
## Monitoring Scripts
### Daily Health Check
Save as `~/bin/homelab-health-check.sh`:
```bash
#!/bin/bash
# Daily homelab health check
echo "=== Homelab Health Check ==="
echo "Date: $(date)"
echo ""
echo "=== Server Status ==="
ssh pve 'uptime' 2>/dev/null || echo "PVE: UNREACHABLE"
ssh pve2 'uptime' 2>/dev/null || echo "PVE2: UNREACHABLE"
echo ""
echo "=== CPU Temperatures ==="
ssh pve 'for f in /sys/class/hwmon/hwmon*/temp*_input; do label=$(cat ${f%_input}_label 2>/dev/null); if [ "$label" = "Tctl" ]; then echo "PVE: $(($(cat $f)/1000))°C"; fi; done'
ssh pve2 'for f in /sys/class/hwmon/hwmon*/temp*_input; do label=$(cat ${f%_input}_label 2>/dev/null); if [ "$label" = "Tctl" ]; then echo "PVE2: $(($(cat $f)/1000))°C"; fi; done'
echo ""
echo "=== UPS Status ==="
ssh pve 'upsc cyberpower@localhost | grep -E "battery.charge:|battery.runtime:|ups.load:|ups.status:"'
echo ""
echo "=== ZFS Pools ==="
ssh pve 'zpool status -x' 2>/dev/null
ssh pve2 'zpool status -x' 2>/dev/null
ssh truenas 'zpool status -x vault'
echo ""
echo "=== Disk Space ==="
ssh pve 'df -h | grep -E "Filesystem|/dev/(nvme|sd)"'
ssh truenas 'df -h /mnt/vault'
echo ""
echo "=== VM Status ==="
ssh pve 'qm list | grep running | wc -l' | xargs echo "PVE VMs running:"
ssh pve2 'qm list | grep running | wc -l' | xargs echo "PVE2 VMs running:"
echo ""
echo "=== Syncthing Connections ==="
curl -s -H "X-API-Key: oSQSrPnMnrEXuHqjWrRdrvq3TSXesAT5" \
"http://127.0.0.1:8384/rest/system/connections" | \
python3 -c "import sys,json; d=json.load(sys.stdin)['connections']; \
[print(f\"{v.get('name',k[:7])}: {'UP' if v['connected'] else 'DOWN'}\") for k,v in d.items()]"
echo ""
echo "=== Check Complete ==="
```
**Run daily**:
```cron
0 9 * * * ~/bin/homelab-health-check.sh | mail -s "Homelab Health Check" hutson@example.com
```
---
### ZFS Scrub Checker
```bash
#!/bin/bash
# Check last ZFS scrub status
echo "=== ZFS Scrub Status ==="
for host in pve pve2; do
echo "--- $host ---"
ssh $host 'zpool status | grep -A1 scrub'
echo ""
done
echo "--- TrueNAS ---"
ssh truenas 'zpool status vault | grep -A1 scrub'
```
---
### SMART Health Checker
```bash
#!/bin/bash
# Check SMART health on all drives
echo "=== SMART Health Check ==="
echo "--- TrueNAS Drives ---"
ssh truenas 'smartctl --scan | while read dev type; do
echo "=== $dev ===";
smartctl -H $dev | grep -E "SMART overall|PASSED|FAILED";
done'
echo "--- PVE Drives ---"
ssh pve 'for dev in /dev/nvme* /dev/sd*; do
[ -e "$dev" ] && echo "=== $dev ===" && smartctl -H $dev | grep -E "SMART|PASSED|FAILED";
done'
```
---
## Dashboard Recommendations
### Grafana Dashboard Layout
**Page 1: Overview**
- Server uptime
- CPU usage (all servers)
- RAM usage (all servers)
- Disk space (all pools)
- Network traffic
- UPS status
**Page 2: Storage**
- ZFS pool health
- SMART status for all drives
- I/O latency
- Scrub progress
- Disk temperatures
**Page 3: VMs**
- VM status (up/down)
- VM resource usage
- VM disk I/O
- VM network traffic
**Page 4: Services**
- Service health checks
- HTTP response times
- Certificate expiry dates
- Syncthing sync status
---
## Implementation Plan
### Phase 1: Basic Monitoring (Week 1)
- [ ] Install Uptime Kuma or Netdata
- [ ] Add HTTP checks for all services
- [ ] Configure UPS alerts in Home Assistant
- [ ] Set up daily health check email
**Estimated Time**: 4-6 hours
---
### Phase 2: Advanced Monitoring (Week 2-3)
- [ ] Install Prometheus + Grafana
- [ ] Deploy node_exporter on all servers
- [ ] Deploy zfs_exporter
- [ ] Deploy smartmon_exporter
- [ ] Create Grafana dashboards
**Estimated Time**: 8-12 hours
---
### Phase 3: Alerting (Week 4)
- [ ] Configure Alertmanager
- [ ] Set up email/push notifications
- [ ] Create alert rules for all critical metrics
- [ ] Test all alert paths
- [ ] Document alert procedures
**Estimated Time**: 4-6 hours
---
## Related Documentation
- [GATEWAY.md](GATEWAY.md) - Gateway monitoring and troubleshooting
- [UPS.md](UPS.md) - UPS monitoring details
- [STORAGE.md](STORAGE.md) - ZFS health checks
- [SERVICES.md](SERVICES.md) - Service inventory
- [HOMEASSISTANT.md](HOMEASSISTANT.md) - Home Assistant automations
- [MAINTENANCE.md](MAINTENANCE.md) - Regular maintenance checks
---
**Last Updated**: 2026-01-02
**Status**: ⚠️ **Partial monitoring - Gateway active, other systems need implementation**

N8N-INTEGRATIONS.md Normal file

@@ -0,0 +1,382 @@
# n8n Homelab Integrations - Quick Start Guide
n8n is running on your homelab network (10.10.10.207) and can access all local services. This guide sets up useful automations.
---
## Network Access Verified
n8n can connect to:
- **Home Assistant** (10.10.10.110:8123)
- **Prometheus** (10.10.10.206:9090)
- **Grafana** (10.10.10.206:3001)
- **Syncthing** (10.10.10.200:8384)
- **PiHole** (10.10.10.10)
- **Gitea** (10.10.10.220:3000)
- **Proxmox** (10.10.10.120:8006, 10.10.10.102:8006)
- **TrueNAS** (10.10.10.200)
- **All external APIs** (via internet)
---
## Initial Setup (First-Time)
1. Open **https://n8n.htsn.io**
2. Complete the setup wizard:
- **Owner Email:** hutson@htsn.io
- **Owner Name:** Hutson
- **Password:** (choose secure password)
3. Skip data sharing (optional)
---
## Credentials to Add in n8n
Go to **Settings → Credentials** and add:
### 1. Home Assistant
| Field | Value |
|-------|-------|
| **Credential Type** | Home Assistant API |
| **Host** | `http://10.10.10.110:8123` |
| **Access Token** | (get from Home Assistant) |
**Get Token:** Home Assistant → Profile → Long-Lived Access Tokens → Create Token
---
### 2. Prometheus
| Field | Value |
|-------|-------|
| **Credential Type** | HTTP Request (Generic) |
| **URL** | `http://10.10.10.206:9090` |
| **Authentication** | None |
---
### 3. Grafana
| Field | Value |
|-------|-------|
| **Credential Type** | Grafana API |
| **URL** | `http://10.10.10.206:3001` |
| **API Key** | (create in Grafana) |
**Get API Key:** Grafana → Administration → Service Accounts → Create → Add Token
---
### 4. Syncthing
| Field | Value |
|-------|-------|
| **Credential Type** | HTTP Request (Generic) |
| **URL** | `http://10.10.10.200:8384` |
| **Header Name** | `X-API-Key` |
| **Header Value** | `VFJ7XZPJoWvkYj6fKzpQxc9u3XC8KUBs` |
---
### 5. Telegram Bot
| Field | Value |
|-------|-------|
| **Credential Type** | Telegram API |
| **Access Token** | `8450212653:AAHoVBlNUuA0vtrVPMNUfSgJh_gmFMxlrBg` |
**Your Chat ID:** `1004084736`
---
### 6. Proxmox
| Field | Value |
|-------|-------|
| **Credential Type** | HTTP Request (Generic) |
| **URL** | `http://10.10.10.120:8006` |
| **Authentication** | API Token |
| **Token** | (use monitoring@pve token if needed) |
---
## Starter Workflows
### Workflow 1: Homelab Health Check (Every Hour)
**Nodes:**
1. **Schedule Trigger** (every hour)
2. **HTTP Request** → Prometheus query for down hosts
- URL: `http://10.10.10.206:9090/api/v1/query`
- Query param: `query=up{job=~"node.*"} == 0`
3. **If** → Check if any hosts are down
4. **Telegram** → Send alert if hosts down
**PromQL Query:**
```
up{job=~"node.*"} == 0
```
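The same query can be tested by hand before wiring it into the workflow:
```bash
# Ask Prometheus directly for node exporters that are down
curl -s 'http://10.10.10.206:9090/api/v1/query' \
  --data-urlencode 'query=up{job=~"node.*"} == 0' | jq '.data.result'
```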
---
### Workflow 2: Daily Backup Status
**Nodes:**
1. **Schedule Trigger** (8am daily)
2. **HTTP Request** → Query Syncthing sync status
- URL: `http://10.10.10.200:8384/rest/db/status?folder=backup`
- Header: `X-API-Key: VFJ7XZPJoWvkYj6fKzpQxc9u3XC8KUBs`
3. **Function** → Check if folder is syncing
4. **Telegram** → Send daily status report
---
### Workflow 3: High CPU Alert
**Nodes:**
1. **Schedule Trigger** (every 5 minutes)
2. **HTTP Request** → Prometheus CPU query
- URL: `http://10.10.10.206:9090/api/v1/query`
- Query: `100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)`
3. **If** → CPU > 90%
4. **Telegram** → Send alert
---
### Workflow 4: UPS Power Event
**Webhook Trigger Setup:**
1. Create webhook trigger in n8n
2. Get webhook URL: `https://n8n.htsn.io/webhook/ups-alert`
3. Configure NUT to call webhook on power events
**Nodes:**
1. **Webhook Trigger** → Receive UPS event
2. **Switch** → Route by event type (on battery, low battery, online)
3. **Telegram** → Send appropriate alert
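A minimal sketch for step 3 of the webhook setup (the script path and NOTIFYFLAG choices are assumptions; upsmon exports `UPSNAME`/`NOTIFYTYPE` and passes the message as `$1`):
```bash
#!/bin/bash
# e.g. /usr/local/bin/ups-webhook.sh on PVE, referenced from /etc/nut/upsmon.conf:
#   NOTIFYCMD /usr/local/bin/ups-webhook.sh
#   NOTIFYFLAG ONBATT  SYSLOG+EXEC
#   NOTIFYFLAG ONLINE  SYSLOG+EXEC
#   NOTIFYFLAG LOWBATT SYSLOG+EXEC
curl -s -X POST "https://n8n.htsn.io/webhook/ups-alert" \
  -H "Content-Type: application/json" \
  -d "{\"ups\": \"${UPSNAME:-unknown}\", \"event\": \"${NOTIFYTYPE:-unknown}\", \"message\": \"$1\"}"
```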
---
### Workflow 5: Gitea → Deploy on Push
**Nodes:**
1. **Webhook Trigger** → Gitea push event
2. **If** → Check if branch is `main`
3. **SSH** → Connect to target server
4. **Execute Command** → `git pull && docker-compose up -d`
5. **Telegram** → Notify deployment complete
---
### Workflow 6: Syncthing Folder Behind Alert
**Nodes:**
1. **Schedule Trigger** (every 30 minutes)
2. **HTTP Request** → Get all folder statuses
- URL: `http://10.10.10.200:8384/rest/stats/folder`
3. **Function** → Check if any folder has errors or is significantly behind
4. **If** → Errors found
5. **Telegram** → Alert with folder name and status
---
### Workflow 7: Grafana Alert Forwarder
**Purpose:** Forward Grafana alerts to Telegram
**Nodes:**
1. **Webhook Trigger** → Grafana webhook
2. **Function** → Parse alert data
3. **Telegram** → Format and send alert
**Grafana Setup:**
- Contact Point → Add webhook: `https://n8n.htsn.io/webhook/grafana-alerts`
---
### Workflow 8: Daily Homelab Summary
**Nodes:**
1. **Schedule Trigger** (9am daily)
2. **Multiple HTTP Requests in parallel:**
- Prometheus: System uptime
- Prometheus: Average CPU usage (24h)
- Prometheus: Disk usage
- Syncthing: Sync status (all folders)
- PiHole: Queries blocked (24h)
3. **Function** → Format data as summary
4. **Telegram** → Send daily report
**Example Output:**
```
🏠 Homelab Daily Summary
✅ All systems operational
⏱️ Uptime: 14 days
📊 Avg CPU: 12%
💾 Disk: 45% used
🔄 Syncthing: All folders in sync
🛡️ PiHole: 2,341 queries blocked
Last updated: 2025-12-27 09:00
```
---
### Workflow 9: VM State Change Monitor
**Nodes:**
1. **Schedule Trigger** (every 1 minute)
2. **HTTP Request** → Query Proxmox API for VM list
3. **Function** → Compare with previous state (use Set node)
4. **If** → VM state changed
5. **Telegram** → Notify VM started/stopped
---
### Workflow 10: Internet Speed Test Alert
**Nodes:**
1. **Schedule Trigger** (every 6 hours)
2. **HTTP Request** → Prometheus speedtest exporter
3. **If** → Download speed < 500 Mbps
4. **Telegram** → Alert about slow internet
---
## Advanced Integration Ideas
### Home Assistant Automations
- Turn on lights when server room temperature > 80°F
- Trigger workflows from HA button press
- Send sensor data to external services
### Proxmox Automation
- Auto-snapshot VMs before updates
- Clone VMs for testing
- Monitor resource usage and rebalance
### Media Management
- Notify when new Plex content added
- Auto-organize downloads
- Send weekly watch statistics
### Backup Monitoring
- Verify all Syncthing folders synced
- Alert on ZFS scrub errors
- Monitor snapshot ages
### Security
- Alert on failed SSH attempts (from logs)
- Monitor SSL certificate expiration
- Track unusual network traffic patterns
---
## n8n Best Practices
1. **Error Handling:** Always add error workflows to catch failures
2. **Rate Limiting:** Don't query APIs too frequently
3. **Credentials:** Never hardcode - always use credential store
4. **Testing:** Use manual trigger during development
5. **Logging:** Add Set nodes to track workflow state
6. **Backups:** Export workflows regularly (Settings → Export)
---
## Useful PromQL Queries for n8n
**CPU Usage:**
```promql
100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
```
**Memory Usage:**
```promql
(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100
```
**Disk Usage:**
```promql
(node_filesystem_size_bytes{mountpoint="/"} - node_filesystem_avail_bytes{mountpoint="/"}) / node_filesystem_size_bytes{mountpoint="/"} * 100
```
**Hosts Down:**
```promql
up{job=~"node.*"} == 0
```
**Syncthing Disconnected:**
```promql
up{job=~"syncthing.*"} == 0
```
---
## Webhook URLs
After creating webhooks in n8n, you'll get URLs like:
- `https://n8n.htsn.io/webhook/your-webhook-name`
These can be called from:
- Grafana alerts
- Home Assistant automations
- Gitea webhooks
- Custom scripts
- UPS monitoring (NUT)
---
## Testing Credentials
Test each credential after adding:
1. Create simple workflow with manual trigger
2. Add HTTP Request node with credential
3. Execute and check response
4. Verify data returned correctly
---
## Troubleshooting
**Can't reach local service:**
- Verify service IP and port
- Check if service requires HTTPS
- Test with `curl` from docker-host2 first
**Webhook not triggering:**
- Check n8n is accessible: `curl https://n8n.htsn.io/webhook/test`
- Verify webhook URL in external service
- Check n8n execution logs
**Workflow fails silently:**
- Enable "Execute on Error" workflow
- Check workflow execution list
- Add Function nodes to log data
**API authentication fails:**
- Verify credential is saved
- Check API token hasn't expired
- Test with curl manually first
---
## Next Steps
1. **Add Credentials** - Start with Telegram and Prometheus
2. **Create Test Workflow** - Simple hourly health check
3. **Test Telegram** - Verify messages arrive
4. **Build Gradually** - Add one workflow at a time
5. **Export Backups** - Save workflows regularly
---
## Resources
- **n8n Docs:** https://docs.n8n.io
- **Community Workflows:** https://n8n.io/workflows
- **Your n8n:** https://n8n.htsn.io
- **Your API Docs:** [N8N.md](N8N.md)
**Last Updated:** 2025-12-27

346
N8N.md Normal file
View File

@@ -0,0 +1,346 @@
# n8n - Workflow Automation
n8n is an extendable workflow automation tool deployed on docker-host2 for automating tasks across your homelab and external services.
---
## Quick Reference
| Setting | Value |
|---------|-------|
| **URL** | https://n8n.htsn.io |
| **Local IP** | 10.10.10.207:5678 |
| **Server** | docker-host2 (PVE2 VMID 302) |
| **Database** | PostgreSQL (containerized) |
| **API Endpoint** | http://10.10.10.207:5678/api/v1/ |
---
## Claude Code Integration (MCP)
### n8n-MCP Server
The n8n-MCP server gives Claude Code deep knowledge of all 545+ n8n nodes, enabling it to build complete workflows from natural language descriptions.
**Installation:** Already configured in `~/Library/Application Support/Claude/claude_desktop_config.json`
```json
{
"mcpServers": {
"n8n-nodes": {
"command": "npx",
"args": ["-y", "@czlonkowski/n8n-mcp"]
}
}
}
```
**What This Enables:**
- ✅ Build n8n workflows from natural language
- ✅ Get detailed help with node parameters and options
- ✅ Best practices for n8n node usage
- ✅ Debug workflow issues with full node context
**Example Prompts:**
```
"Create an n8n workflow to monitor Prometheus and send Telegram alerts"
"Build a workflow that triggers when Syncthing has errors"
"What's the best n8n node to parse JSON responses?"
```
**How It Works:**
- MCP server provides offline documentation for all n8n nodes
- No connection to your n8n instance required
- Claude builds workflows that you can then import into https://n8n.htsn.io
**Resources:**
- [n8n-MCP GitHub](https://github.com/czlonkowski/n8n-mcp)
- [MCP Documentation](https://docs.n8n.io/advanced-ai/accessing-n8n-mcp-server/)
---
## API Access
### API Key
```
X-N8N-API-KEY: eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiI3NTdiMDA5YS1hMjM2LTQ5MzUtODkwNS0xZDY1MjYzZWE2OWYiLCJpc3MiOiJuOG4iLCJhdWQiOiJwdWJsaWMtYXBpIiwiaWF0IjoxNzY2ODEwMzA3fQ.RIZAbpDa7LiUPWk48qOscJ9-d9gRAA0afMDX_V3oSVo
```
### API Examples
**List Workflows:**
```bash
curl -H "X-N8N-API-KEY: eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiI3NTdiMDA5YS1hMjM2LTQ5MzUtODkwNS0xZDY1MjYzZWE2OWYiLCJpc3MiOiJuOG4iLCJhdWQiOiJwdWJsaWMtYXBpIiwiaWF0IjoxNzY2ODEwMzA3fQ.RIZAbpDa7LiUPWk48qOscJ9-d9gRAA0afMDX_V3oSVo" \
http://10.10.10.207:5678/api/v1/workflows
```
**Get Workflow by ID:**
```bash
curl -H "X-N8N-API-KEY: YOUR_API_KEY" \
http://10.10.10.207:5678/api/v1/workflows/{id}
```
**Trigger Workflow:**
```bash
curl -X POST \
-H "X-N8N-API-KEY: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{"data": {"key": "value"}}' \
http://10.10.10.207:5678/api/v1/workflows/{id}/execute
```
**API Documentation:** https://docs.n8n.io/api/
---
## Deployment Details
### Docker Compose
**Location:** `/opt/n8n/docker-compose.yml` on docker-host2
**Services:**
- `n8n` - Main application (port 5678)
- `postgres` - Database backend
**Volumes:**
- `n8n_data` - Workflow data, credentials, settings
- `postgres_data` - Database storage
### Environment Configuration
```yaml
N8N_HOST: n8n.htsn.io
N8N_PORT: 5678
N8N_PROTOCOL: https
NODE_ENV: production
WEBHOOK_URL: https://n8n.htsn.io/
GENERIC_TIMEZONE: America/Los_Angeles
DB_TYPE: postgresdb
DB_POSTGRESDB_HOST: postgres
DB_POSTGRESDB_DATABASE: n8n
DB_POSTGRESDB_USER: n8n
DB_POSTGRESDB_PASSWORD: n8n_secure_password_2024
```
### Resource Limits
- **Memory**: 512MB-1GB (soft/hard)
- **CPU**: Shared (4 vCPUs on host)
---
## Common Tasks
### Restart n8n
```bash
ssh docker-host2 'cd /opt/n8n && docker compose restart n8n'
```
### View Logs
```bash
ssh docker-host2 'docker logs -f n8n'
```
### Backup Workflows
Workflows are stored in PostgreSQL. To backup:
```bash
ssh docker-host2 'docker exec n8n-postgres pg_dump -U n8n n8n > /tmp/n8n-backup-$(date +%Y%m%d).sql'
```
### Update n8n
```bash
ssh docker-host2 'cd /opt/n8n && docker compose pull n8n && docker compose up -d n8n'
```
---
## Traefik Configuration
**File:** `/etc/traefik/conf.d/n8n.yaml` on CT 202
```yaml
http:
routers:
n8n-secure:
entryPoints:
- websecure
rule: "Host(`n8n.htsn.io`)"
service: n8n
tls:
certResolver: cloudflare
priority: 50
n8n-redirect:
entryPoints:
- web
rule: "Host(`n8n.htsn.io`)"
middlewares:
- n8n-https-redirect
service: n8n
priority: 50
services:
n8n:
loadBalancer:
servers:
- url: "http://10.10.10.207:5678"
middlewares:
n8n-https-redirect:
redirectScheme:
scheme: https
permanent: true
```
---
## Monitoring
### Prometheus
n8n exposes metrics at `http://10.10.10.207:5678/metrics` (if enabled)
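To enable the endpoint, a sketch assuming the `N8N_METRICS` environment variable (verify against the n8n docs for your version):
```bash
# Add N8N_METRICS: "true" to the environment block in /opt/n8n/docker-compose.yml,
# then recreate the container and check the endpoint
ssh docker-host2 'cd /opt/n8n && docker compose up -d n8n'
curl -s http://10.10.10.207:5678/metrics | head
```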
### Grafana
n8n metrics can be visualized in Grafana dashboards
### Uptime Monitoring
Add to Pulse: https://pulse.htsn.io
- Monitor: https://n8n.htsn.io
- Check interval: 60s
---
## Troubleshooting
### n8n won't start
```bash
ssh docker-host2 'docker logs n8n | tail -50'
ssh docker-host2 'docker logs n8n-postgres | tail -50'
```
### Database connection issues
```bash
# Check postgres health
ssh docker-host2 'docker exec n8n-postgres pg_isready -U n8n'
# Restart postgres
ssh docker-host2 'cd /opt/n8n && docker compose restart postgres'
```
### SSL/HTTPS issues
```bash
# Check Traefik config
ssh root@10.10.10.250 'cat /etc/traefik/conf.d/n8n.yaml'
# Reload Traefik
ssh root@10.10.10.250 'systemctl reload traefik'
```
### API not responding
```bash
# Test API locally
curl -H "X-N8N-API-KEY: YOUR_KEY" http://10.10.10.207:5678/api/v1/workflows
# Check if n8n container is healthy
ssh docker-host2 'docker ps | grep n8n'
```
### Remove "This message was sent automatically by n8n" signature from Telegram messages
**Problem:** n8n Telegram node adds attribution signature to all messages by default.
**Solution:** Use the correct parameter name `appendAttribution` (camelCase, not snake_case) in `additionalFields`:
```bash
# Get workflow
curl -H "X-N8N-API-KEY: $(cat /tmp/n8n-key.txt)" \
http://10.10.10.207:5678/api/v1/workflows/WORKFLOW_ID > workflow.json
# Update all Telegram nodes (using jq)
cat workflow.json | jq '.nodes = (.nodes | map(
if .type == "n8n-nodes-base.telegram" then
.parameters.additionalFields.appendAttribution = false
else
.
end
))' | jq '{name, nodes, connections, settings, staticData}' > workflow-fixed.json
# Upload updated workflow
curl -X PUT \
-H "X-N8N-API-KEY: $(cat /tmp/n8n-key.txt)" \
-H 'Content-Type: application/json' \
-d @workflow-fixed.json \
http://10.10.10.207:5678/api/v1/workflows/WORKFLOW_ID
# Restart n8n to reload workflow
ssh docker-host2 'cd /opt/n8n && docker compose restart n8n'
```
**Important Notes:**
- Parameter must be `appendAttribution` (camelCase), not `append_attribution` or `append_n8n_attribution`
- Must restart n8n after updating workflow for changes to take effect
- This applies to all Telegram message nodes in the workflow
**Fixed:** 2026-01-23
---
## Integration Examples
### Homelab Automation Ideas
1. **Backup Notifications** - Send Telegram alerts when backups complete
2. **Server Monitoring** - Query Prometheus and alert on high CPU/memory
3. **Media Management** - Trigger Sonarr/Radarr downloads
4. **Home Assistant Integration** - Automate smart home workflows
5. **Git Webhooks** - Deploy changes from Gitea automatically
6. **Syncthing Monitoring** - Alert when sync folders get behind
7. **UPS Alerts** - Notify on power events from NUT
---
## Security Notes
- API key provides full access to all workflows and data
- Store API key securely (added to this doc for homelab reference)
- n8n credentials are encrypted at rest in PostgreSQL
- HTTPS enforced via Traefik
- No public internet exposure (only via Tailscale)
---
## Quick Start
**New to n8n?** Start here: **[N8N-INTEGRATIONS.md](N8N-INTEGRATIONS.md)** ⭐
This guide includes:
- ✅ Network access verification
- ✅ Credential setup for all homelab services
- ✅ 10 ready-to-use starter workflows
- ✅ Home Assistant, Prometheus, Syncthing, Telegram integrations
- ✅ Troubleshooting tips
---
## Related Documentation
- [n8n Homelab Integrations Guide](N8N-INTEGRATIONS.md) - **START HERE**
- [docker-host2 VM details](VMS.md)
- [Traefik reverse proxy](TRAEFIK.md)
- [IP Assignments](IP-ASSIGNMENTS.md)
- [Pulse Setup](PULSE-SETUP.md)
**Last Updated:** 2025-12-26

339
PA-API.md Normal file
View File

@@ -0,0 +1,339 @@
# Personal Assistant API
Backend API for the Personal Assistant system - provides Claude-powered voice/text interface to all PA capabilities (calendar, tasks, messages, smart home, etc.).
---
## Quick Reference
| Setting | Value |
|---------|-------|
| **Domain** | pa.htsn.io |
| **Local IP** | 10.10.10.207:8401 |
| **Server** | docker-host2 (PVE2 VMID 302) |
| **Compose** | `/opt/pa-api/docker-compose.yml` |
| **Access** | Tailscale only (not publicly exposed) |
| **GitHub** | Private repo: `pa-api` |
---
## Architecture
```
     Android/Telegram
    ┌─────────────────┐
    │     PA API      │  ← Claude SDK, model routing
    │  docker-host2   │
    │      :8401      │
    └────────┬────────┘
        ┌────┴────┐
        │         │
        ▼         ▼
   ┌───────┐  ┌──────────┐
   │ Rube  │  │MCP Bridge│  ← Mac Mini (Beeper, Proton, etc.)
   │ Exa   │  │  :8400   │
   │ etc.  │  └──────────┘
   └───────┘
```
**PA API handles:**
- Claude SDK integration (no CLI startup delay)
- Model routing (Haiku/Sonnet/Opus)
- Session management
- Direct API tools (Exa, Ref, Rube, Airtable)
**MCP Bridge handles:**
- Tools requiring Mac Mini (Beeper, Proton Bridge, filesystem)
- Runs on Mac Mini at 10.10.10.125:8400
---
## API Endpoints
| Endpoint | Method | Purpose |
|----------|--------|---------|
| `/chat` | POST | Main query endpoint (streaming SSE) |
| `/health` | GET | Health check |
### POST /chat
**Request:**
```json
{
"message": "What's on my calendar today?",
"session_id": "abc123"
}
```
**Response (Server-Sent Events):**
```
data: {"type": "model", "name": "sonnet"}
data: {"type": "chunk", "text": "You have "}
data: {"type": "chunk", "text": "3 meetings today..."}
data: {"type": "done", "full_text": "You have 3 meetings today..."}
```
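To watch the stream from a shell (a sketch, assuming direct access to the service on docker-host2):
```bash
# -N disables curl buffering so SSE events print as they arrive
curl -N -X POST http://10.10.10.207:8401/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello", "session_id": "abc123"}'
```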
### Model Routing
| Query Type | Model | Examples |
|------------|-------|----------|
| Simple facts | Haiku | "How old is X?", "What's 15% of 80?" |
| PA queries | Sonnet | "What's on my calendar?", "Add task" |
| Complex reasoning | Opus | "Help me plan my week" |
**Override:** Say "Use Opus" to force model selection (sticky per session).
---
## Deployment
### Docker Compose
Location: `/opt/pa-api/docker-compose.yml`
```yaml
version: '3.8'
services:
pa-api:
image: pa-api:latest
build: .
container_name: pa-api
restart: unless-stopped
ports:
- "8401:8401"
environment:
- ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
- MCP_BRIDGE_URL=http://10.10.10.125:8400
- EXA_API_KEY=${EXA_API_KEY}
# Add other API keys as needed
volumes:
- ./data:/app/data
networks:
- pa-network
networks:
pa-network:
driver: bridge
```
### Environment Variables
| Variable | Purpose |
|----------|---------|
| `ANTHROPIC_API_KEY` | Claude API access |
| `MCP_BRIDGE_URL` | Mac Mini bridge endpoint |
| `EXA_API_KEY` | Exa web search |
| `AIRTABLE_API_KEY` | Airtable access |
Store in `/opt/pa-api/.env` (not committed to git).
---
## Traefik Configuration
File: `/etc/traefik/conf.d/pa-api.yaml` (on CT 202)
```yaml
http:
routers:
pa-api:
rule: "Host(`pa.htsn.io`)"
entryPoints:
- websecure
service: pa-api
tls:
certResolver: cloudflare
services:
pa-api:
loadBalancer:
servers:
- url: "http://10.10.10.207:8401"
```
**Note:** This service is Tailscale-only. The Traefik route exists for convenience but should not be exposed publicly via Cloudflare.
---
## Common Tasks
### Start/Stop Service
```bash
# SSH to docker-host2
ssh docker-host2
# Start
cd /opt/pa-api && docker-compose up -d
# Stop
cd /opt/pa-api && docker-compose down
# View logs
docker logs -f pa-api
# Restart
docker-compose restart pa-api
```
### Update Service
```bash
ssh docker-host2
cd /opt/pa-api
git pull
docker-compose build
docker-compose up -d
```
### Health Check
```bash
# From any machine on network
curl http://10.10.10.207:8401/health
# Test chat endpoint
curl -X POST http://10.10.10.207:8401/chat \
-H "Content-Type: application/json" \
-d '{"message": "Hello", "session_id": "test"}'
```
---
## MCP Bridge (Mac Mini)
The MCP Bridge runs on Mac Mini and exposes MCP tools as HTTP endpoints.
| Setting | Value |
|---------|-------|
| **Location** | Mac Mini (10.10.10.125) |
| **Port** | 8400 |
| **Purpose** | Execute MCP tools (Beeper, Proton, TickTick, HA, etc.) |
### Bridge Endpoints
| Endpoint | Method | Purpose |
|----------|--------|---------|
| `/tools` | GET | List available tools |
| `/execute` | POST | Execute a tool |
| `/health` | GET | Health check |
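A sketch of exercising the bridge by hand; the `/execute` request body and tool name below are assumptions, not the documented schema:
```bash
# List available tools
curl -s http://10.10.10.125:8400/tools | jq
# Execute a tool (hypothetical request body - check the bridge code for the real schema)
curl -s -X POST http://10.10.10.125:8400/execute \
  -H "Content-Type: application/json" \
  -d '{"tool": "homeassistant.get_state", "arguments": {"entity_id": "light.office"}}' | jq
```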
### Start MCP Bridge
```bash
# SSH to Mac Mini
ssh macmini
# Start bridge (managed by launchd)
launchctl load ~/Library/LaunchAgents/com.hutson.mcp-bridge.plist
# Check status
curl http://localhost:8400/health
```
---
## Integration Points
### Related Services
| Service | Relationship |
|---------|--------------|
| n8n | Telegram bot uses n8n → Claude CLI (separate path) |
| MetaMCP | PA API does NOT use MetaMCP (direct MCP Bridge) |
| Home Assistant | Controlled via MCP Bridge |
| Claude-Mem | Shared memory database for context |
### Clients
| Client | Connection |
|--------|------------|
| Android App | HTTPS via Tailscale → pa.htsn.io |
| (Future) Web UI | Same endpoint |
---
## Monitoring
### Health Checks
```bash
# PA API
curl -s http://10.10.10.207:8401/health | jq
# MCP Bridge
curl -s http://10.10.10.125:8400/health | jq
```
### Logs
```bash
# PA API logs
ssh docker-host2 'docker logs -f pa-api --tail 100'
# MCP Bridge logs (Mac Mini)
ssh macmini 'tail -f ~/Library/Logs/mcp-bridge.log'
```
---
## Troubleshooting
### PA API Not Responding
1. Check container status:
```bash
ssh docker-host2 'docker ps | grep pa-api'
```
2. Check logs for errors:
```bash
ssh docker-host2 'docker logs pa-api --tail 50'
```
3. Verify network:
```bash
curl http://10.10.10.207:8401/health
```
### MCP Bridge Not Responding
1. Check if Mac Mini is reachable:
```bash
ping 10.10.10.125
```
2. Check bridge process:
```bash
ssh macmini 'pgrep -f mcp-bridge'
```
3. Restart bridge:
```bash
ssh macmini 'launchctl unload ~/Library/LaunchAgents/com.hutson.mcp-bridge.plist'
ssh macmini 'launchctl load ~/Library/LaunchAgents/com.hutson.mcp-bridge.plist'
```
### Model Routing Issues
- Check Claude API key is valid
- Verify Haiku classifier is responding
- Check session storage for stuck model overrides
---
## Related Documentation
- [IP-ASSIGNMENTS.md](IP-ASSIGNMENTS.md) - Service IP mapping
- [VMS.md](VMS.md) - docker-host2 VM details
- [TRAEFIK.md](TRAEFIK.md) - Reverse proxy configuration
- [Personal Assistant Project](~/Projects/personal-assistant/CLAUDE.md) - PA system overview
- [Services Matrix](~/Projects/personal-assistant/docs/services-matrix.md) - All MCP tools
---
**Last Updated**: 2026-01-07

509
POWER-MANAGEMENT.md Normal file
View File

@@ -0,0 +1,509 @@
# Power Management and Optimization
Documentation of power optimizations applied to reduce idle power consumption and heat generation.
## Overview
Combined estimated power draw: **~1000-1350W under load**, **500-700W idle**
Through various optimizations, we've reduced idle power consumption by approximately **150-300W** compared to default settings.
---
## Power Draw Estimates
### PVE (10.10.10.120)
| Component | Idle | Load | TDP |
|-----------|------|------|-----|
| Threadripper PRO 3975WX | 150-200W | 400-500W | 280W |
| NVIDIA TITAN RTX | 2-3W | 250W | 280W |
| NVIDIA Quadro P2000 | 25W | 70W | 75W |
| RAM (128 GB DDR4) | 30-40W | 30-40W | - |
| Storage (NVMe + SSD) | 20-30W | 40-50W | - |
| HBAs, fans, misc | 20-30W | 20-30W | - |
| **Total** | **250-350W** | **800-940W** | - |
### PVE2 (10.10.10.102)
| Component | Idle | Load | TDP |
|-----------|------|------|-----|
| Threadripper PRO 3975WX | 150-200W | 400-500W | 280W |
| NVIDIA RTX A6000 | 11W | 280W | 300W |
| RAM (128 GB DDR4) | 30-40W | 30-40W | - |
| Storage (NVMe + HDD) | 20-30W | 40-50W | - |
| Fans, misc | 15-20W | 15-20W | - |
| **Total** | **226-330W** | **765-890W** | - |
### Combined
| Metric | Idle | Load |
|--------|------|------|
| Servers | 476-680W | 1565-1830W |
| Network gear | ~50W | ~50W |
| **Total** | **~530-730W** | **~1615-1880W** |
| **UPS Load** | 40-55% | 120-140% ⚠️ |
**Note**: UPS capacity is 1320W. Under heavy load, servers can exceed UPS capacity, which is acceptable since high load is rare.
---
## Optimizations Applied
### 1. KSMD Disabled (2024-12-17)
**KSM** (Kernel Same-page Merging) scans memory to deduplicate identical pages across VMs.
**Problem**:
- KSMD was consuming 44-57% CPU continuously on PVE
- Caused CPU temp to rise from 74°C to 83°C
- **Net loss**: more power was spent scanning than was saved by deduplication
**Solution**: Disabled KSM permanently
**Configuration**:
**Systemd service**: `/etc/systemd/system/disable-ksm.service`
```ini
[Unit]
Description=Disable KSM (Kernel Same-page Merging)
After=multi-user.target
[Service]
Type=oneshot
ExecStart=/bin/sh -c 'echo 0 > /sys/kernel/mm/ksm/run'
RemainAfterExit=yes
[Install]
WantedBy=multi-user.target
```
**Enable and start**:
```bash
systemctl daemon-reload
systemctl enable --now disable-ksm
systemctl mask ksmtuned # Prevent re-enabling
```
**Verify**:
```bash
# KSM should be disabled (run=0)
cat /sys/kernel/mm/ksm/run # Should output: 0
# ksmd should show 0% CPU
ps aux | grep ksmd
```
**Savings**: ~60-80W, plus a significant temperature reduction (prevents the 74°C → 83°C climb)
**⚠️ Important**: Proxmox updates sometimes re-enable KSM. If CPU is unexpectedly hot, check:
```bash
cat /sys/kernel/mm/ksm/run
# If 1, disable it:
echo 0 > /sys/kernel/mm/ksm/run
systemctl mask ksmtuned
```
---
### 2. CPU Governor Optimization (2024-12-16)
Default CPU governor keeps cores at max frequency even when idle, wasting power.
#### PVE: `amd-pstate-epp` Driver
**Driver**: `amd-pstate-epp` (modern AMD P-state driver)
**Governor**: `powersave`
**EPP**: `balance_power`
**Configuration**:
**Systemd service**: `/etc/systemd/system/cpu-powersave.service`
```ini
[Unit]
Description=Set CPU governor to powersave with balance_power EPP
After=multi-user.target
[Service]
Type=oneshot
ExecStart=/bin/sh -c 'for cpu in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo powersave > $cpu; done'
ExecStart=/bin/sh -c 'for cpu in /sys/devices/system/cpu/cpu*/cpufreq/energy_performance_preference; do echo balance_power > $cpu; done'
RemainAfterExit=yes
[Install]
WantedBy=multi-user.target
```
**Enable**:
```bash
systemctl daemon-reload
systemctl enable --now cpu-powersave
```
**Verify**:
```bash
# Check governor
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
# Output: powersave
# Check EPP
cat /sys/devices/system/cpu/cpu0/cpufreq/energy_performance_preference
# Output: balance_power
# Check current frequency (should be low when idle)
grep MHz /proc/cpuinfo | head -5
# Should show ~1700-2200 MHz idle, up to 4000 MHz under load
```
#### PVE2: `acpi-cpufreq` Driver
**Driver**: `acpi-cpufreq` (older ACPI driver)
**Governor**: `schedutil` (adaptive, better than powersave for this driver)
**Configuration**:
**Systemd service**: `/etc/systemd/system/cpu-powersave.service`
```ini
[Unit]
Description=Set CPU governor to schedutil
After=multi-user.target
[Service]
Type=oneshot
ExecStart=/bin/sh -c 'for cpu in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo schedutil > $cpu; done'
RemainAfterExit=yes
[Install]
WantedBy=multi-user.target
```
**Enable**:
```bash
systemctl daemon-reload
systemctl enable --now cpu-powersave
```
**Verify**:
```bash
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
# Output: schedutil
grep MHz /proc/cpuinfo | head -5
# Should show ~1700-2200 MHz idle
```
**Savings**: ~60-120W combined (CPUs now idle at 1.7-2.2 GHz instead of 4 GHz)
**Performance impact**: Minimal - CPU still boosts to max frequency under load
---
### 3. GPU Power States (2024-12-16)
GPUs automatically enter low-power states when idle. Verified optimal.
| GPU | Location | Idle Power | P-State | Notes |
|-----|----------|------------|---------|-------|
| RTX A6000 | PVE2 | 11W | P8 | Excellent idle power |
| TITAN RTX | PVE | 2-3W | P8 | Excellent idle power |
| Quadro P2000 | PVE | 25W | P0 | Plex keeps it active |
**Check GPU power state**:
```bash
# Via nvidia-smi (if installed in VM)
ssh lmdev1 'nvidia-smi --query-gpu=name,power.draw,pstate --format=csv'
# Expected output:
# name, power.draw [W], pstate
# NVIDIA TITAN RTX, 2.50 W, P8
# Via lspci (from Proxmox host - shows link speed, not power)
ssh pve 'lspci | grep -i nvidia'
```
**P-States**:
- **P0**: Maximum performance
- **P8**: Minimum power (idle)
**No action needed** - GPUs automatically manage power states.
**Savings**: N/A (already optimal)
---
### 4. Syncthing Rescan Intervals (2024-12-16)
Aggressive 60-second rescans were keeping TrueNAS VM at 86% CPU constantly.
**Changed**:
- Large folders: 60s → **3600s** (1 hour)
- Affected: downloads (38GB), documents (11GB), desktop (7.2GB), movies, pictures, notes, config
**Configuration**: Via Syncthing UI on each device
- Settings → Folders → [Folder Name] → Advanced → Rescan Interval
**Savings**: ~60-80W (TrueNAS CPU usage dropped from 86% to <10%)
**Trade-off**: Changes take up to 1 hour to detect instead of 1 minute
- Still acceptable for most use cases
- Manual rescan available if needed: `curl -X POST "http://localhost:8384/rest/db/scan?folder=FOLDER" -H "X-API-Key: API_KEY"`
---
### 5. ksmtuned Disabled (2024-12-16)
**ksmtuned** is the daemon that tunes KSM parameters. Even with KSM disabled, the tuning daemon was still running.
**Solution**: Stopped and disabled on both servers
```bash
systemctl stop ksmtuned
systemctl disable ksmtuned
systemctl mask ksmtuned # Prevent re-enabling
```
**Savings**: ~2-5W
---
### 6. HDD Spindown on PVE2 (2024-12-16)
**Problem**: `local-zfs2` pool (2x WD Red 6TB HDD) had only 768 KB used but drives spinning 24/7
**Solution**: Configure 30-minute spindown timeout
**Udev rule**: `/etc/udev/rules.d/69-hdd-spindown.rules`
```udev
# Spin down WD Red 6TB drives after 30 minutes idle
ACTION=="add|change", KERNEL=="sd[a-z]", ATTRS{model}=="WDC WD60EFRX-68L*", RUN+="/sbin/hdparm -S 241 /dev/%k"
```
**hdparm value**: 241 = 30 minutes
- Formula: values 1-240 mean `value * 5 seconds`; values 241-251 mean `(value - 240) * 30 minutes`
- So 241 = (241 - 240) * 30 = 30 minutes
**Apply rule**:
```bash
udevadm control --reload-rules
udevadm trigger
# Verify drives have spindown set
hdparm -I /dev/sda | grep -i standby
hdparm -I /dev/sdb | grep -i standby
```
**Check if drives are spun down**:
```bash
hdparm -C /dev/sda
# Output: drive state is: standby (spun down)
# or: drive state is: active/idle (spinning)
```
**Savings**: ~10-16W when spun down (8W per drive)
**Trade-off**: 5-10 second delay when accessing pool after spindown
---
## Potential Optimizations (Not Yet Applied)
### PCIe ASPM (Active State Power Management)
**Benefit**: Reduce power of idle PCIe devices
**Risk**: May cause stability issues with some devices
**Estimated savings**: 5-15W
**Test**:
```bash
# Check current ASPM state
lspci -vv | grep -i aspm
# Enable ASPM (test first)
# Add to kernel cmdline: pcie_aspm=force
# Edit /etc/default/grub:
GRUB_CMDLINE_LINUX_DEFAULT="quiet pcie_aspm=force"
# Update grub
update-grub
reboot
```
### NMI Watchdog Disable
**Benefit**: Reduce CPU wakeups
**Risk**: Harder to debug kernel hangs
**Estimated savings**: 1-3W
**Test**:
```bash
# Disable NMI watchdog
echo 0 > /proc/sys/kernel/nmi_watchdog
# Make permanent (add to kernel cmdline)
# Edit /etc/default/grub:
GRUB_CMDLINE_LINUX_DEFAULT="quiet nmi_watchdog=0"
update-grub
reboot
```
---
## Monitoring
### CPU Frequency
```bash
# Current frequency on all cores
ssh pve 'grep MHz /proc/cpuinfo | head -10'
# Governor
ssh pve 'cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor'
# Available governors
ssh pve 'cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors'
```
### CPU Temperature
```bash
# PVE
ssh pve 'for f in /sys/class/hwmon/hwmon*/temp*_input; do label=$(cat ${f%_input}_label 2>/dev/null); if [ "$label" = "Tctl" ]; then echo "PVE Tctl: $(($(cat $f)/1000))°C"; fi; done'
# PVE2
ssh pve2 'for f in /sys/class/hwmon/hwmon*/temp*_input; do label=$(cat ${f%_input}_label 2>/dev/null); if [ "$label" = "Tctl" ]; then echo "PVE2 Tctl: $(($(cat $f)/1000))°C"; fi; done'
```
**Healthy temps**: 70-80°C under load
**Warning**: >85°C
**Throttle**: 90°C (Tctl max for Threadripper PRO)
### GPU Power Draw
```bash
# If nvidia-smi installed in VM
ssh lmdev1 'nvidia-smi --query-gpu=name,power.draw,power.limit,pstate --format=csv'
# Sample output:
# name, power.draw [W], power.limit [W], pstate
# NVIDIA TITAN RTX, 2.50 W, 280.00 W, P8
```
### Power Consumption (UPS)
```bash
# Check UPS load percentage
ssh pve 'upsc cyberpower@localhost ups.load'
# Battery runtime (seconds)
ssh pve 'upsc cyberpower@localhost battery.runtime'
# Full UPS status
ssh pve 'upsc cyberpower@localhost'
```
See [UPS.md](UPS.md) for more UPS monitoring details.
### ZFS ARC Memory Usage
```bash
# PVE
ssh pve 'arc_summary | grep -A5 "ARC size"'
# TrueNAS
ssh truenas 'arc_summary | grep -A5 "ARC size"'
```
**ARC** (Adaptive Replacement Cache) uses RAM for ZFS caching. Adjust if needed:
```bash
# Limit ARC to 32 GB (example)
# Edit /etc/modprobe.d/zfs.conf:
options zfs zfs_arc_max=34359738368
# Apply (reboot required)
update-initramfs -u
reboot
```
---
## Troubleshooting
### CPU Not Downclocking
```bash
# Check current governor
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
# Should be: powersave (PVE) or schedutil (PVE2)
# If not, systemd service may have failed
# Check service status
systemctl status cpu-powersave
# Manually set governor (temporary)
echo powersave | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
# Check frequency
grep MHz /proc/cpuinfo | head -5
```
### High Idle Power After Update
**Common causes**:
1. **KSM re-enabled** after Proxmox update
- Check: `cat /sys/kernel/mm/ksm/run`
- Fix: `echo 0 > /sys/kernel/mm/ksm/run && systemctl mask ksmtuned`
2. **CPU governor reset** to default
- Check: `cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor`
- Fix: `systemctl restart cpu-powersave`
3. **GPU stuck in high-performance mode**
- Check: `nvidia-smi --query-gpu=pstate --format=csv`
- Fix: Restart VM or power cycle GPU
### HDDs Won't Spin Down
```bash
# Check spindown setting
hdparm -I /dev/sda | grep -i standby
# Set spindown manually (temporary)
hdparm -S 241 /dev/sda
# Check if drive is idle (ZFS may keep it active)
zpool iostat -v 1 5 # Watch for activity
# Check what's accessing the drive
lsof | grep /mnt/pool
```
---
## Power Optimization Summary
| Optimization | Savings | Applied | Notes |
|--------------|---------|---------|-------|
| **KSMD disabled** | 60-80W | ✅ | Also reduces CPU temp significantly |
| **CPU governor** | 60-120W | ✅ | PVE: powersave+balance_power, PVE2: schedutil |
| **GPU power states** | 0W | ✅ | Already optimal (automatic) |
| **Syncthing rescans** | 60-80W | ✅ | Reduced TrueNAS CPU usage |
| **ksmtuned disabled** | 2-5W | ✅ | Minor but easy win |
| **HDD spindown** | 10-16W | ✅ | Only when drives idle |
| PCIe ASPM | 5-15W | ❌ | Not yet tested |
| NMI watchdog | 1-3W | ❌ | Not yet tested |
| **Total savings** | **~150-300W** | - | Significant reduction |
---
## Related Documentation
- [UPS.md](UPS.md) - UPS capacity and power monitoring
- [STORAGE.md](STORAGE.md) - HDD spindown configuration
- [VMS.md](VMS.md) - VM resource allocation
---
**Last Updated**: 2025-12-22

69
PULSE-SETUP.md Normal file
View File

@@ -0,0 +1,69 @@
# Add n8n and docker-host2 to Pulse Monitoring
Pulse automatically monitors based on Prometheus targets, but you can also add custom HTTP monitors.
## Quick Steps
1. Open **https://pulse.htsn.io** in your browser
2. Login if required
3. Click **"+ Add Monitor"** or **"New Monitor"**
---
## Monitor: n8n
| Field | Value |
|-------|-------|
| **Name** | n8n Workflow Automation |
| **URL** | https://n8n.htsn.io |
| **Check Interval** | 60 seconds |
| **Monitor Type** | HTTP/HTTPS |
| **Expected Status** | 200 |
| **Timeout** | 10 seconds |
| **Alert After** | 2 failed checks |
---
## Monitor: docker-host2
| Field | Value |
|-------|-------|
| **Name** | docker-host2 (node_exporter) |
| **URL** | http://10.10.10.207:9100/metrics |
| **Check Interval** | 60 seconds |
| **Monitor Type** | HTTP |
| **Expected Status** | 200 |
| **Expected Content** | `node_exporter` |
| **Timeout** | 5 seconds |
| **Alert After** | 2 failed checks |
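Before saving the monitor, you can confirm the URL and expected-content match from any machine on the network:
```bash
curl -s http://10.10.10.207:9100/metrics | grep node_exporter_build_info
# Should return the HELP/TYPE lines and the node_exporter_build_info metric itself
```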
---
## Optional: docker-host2 SSH
| Field | Value |
|-------|-------|
| **Name** | docker-host2 SSH |
| **Host** | 10.10.10.207 |
| **Port** | 22 |
| **Monitor Type** | TCP Port |
| **Check Interval** | 60 seconds |
| **Timeout** | 5 seconds |
---
## Verification
After adding monitors, you should see:
- ✅ Green status for both monitors
- Response time graphs
- Uptime percentage
- Alert history (should be empty)
Access Pulse dashboard: **https://pulse.htsn.io**
---
**Note:** Pulse may already be monitoring these services via Prometheus integration. Check existing monitors before adding duplicates.
**Last Updated:** 2025-12-27

102
QUICK-REF-WELCOME-HOME.md Normal file
View File

@@ -0,0 +1,102 @@
# Welcome Home Automation - Quick Reference
## Quick Test (Manual Trigger)
```bash
HA_TOKEN="eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiIwZThjZmJjMzVlNDA0NzYwOTMzMjg3MTQ5ZjkwOGU2NyIsImlhdCI6MTc2NTk5MjQ4OCwiZXhwIjoyMDgxMzUyNDg4fQ.r743tsb3E5NNlrwEEu9glkZdiI4j_3SKIT1n5PGUytY"
# Test the automation now (ignores conditions)
curl -X POST \
-H "Authorization: Bearer $HA_TOKEN" \
-H "Content-Type: application/json" \
-d '{"entity_id": "automation.welcome_home"}' \
"http://10.10.10.210:8123/api/services/automation/trigger"
```
## Current Configuration
**Lights that turn on:**
- Living Room (75%)
- Living Room Lamp (60%)
- Kitchen (80%)
**When:** After sunset (30 min early) OR before sunrise
**Trigger:** Entering home zone (100m radius)
## Quick Modifications
### Add Office Light
```bash
# Get current config
curl -s -H "Authorization: Bearer $HA_TOKEN" \
"http://10.10.10.210:8123/api/config/automation/config/welcome_home" > /tmp/welcome.json
# Edit /tmp/welcome.json and add to "actions" array:
# {
# "target": {"entity_id": "light.office"},
# "data": {"brightness_pct": 70},
# "action": "light.turn_on"
# }
# Update automation
curl -X POST \
-H "Authorization: Bearer $HA_TOKEN" \
-H "Content-Type: application/json" \
-d @/tmp/welcome.json \
"http://10.10.10.210:8123/api/config/automation/config/welcome_home"
```
### Change to Scene Instead
Replace all light actions with a single scene:
```json
{
"actions": [
{
"service": "scene.turn_on",
"target": {
"entity_id": "scene.living_room_relax"
}
}
]
}
```
## Status Check
```bash
# Check if automation is enabled
curl -s -H "Authorization: Bearer $HA_TOKEN" \
"http://10.10.10.210:8123/api/states/automation.welcome_home" | \
python3 -c "import json, sys; data=json.load(sys.stdin); print(f\"State: {data['state']}\"); print(f\"Last triggered: {data['attributes']['last_triggered']}\")"
# Check current location
curl -s -H "Authorization: Bearer $HA_TOKEN" \
"http://10.10.10.210:8123/api/states/person.hutson" | \
python3 -c "import json, sys; data=json.load(sys.stdin); print(f\"Location: {data['state']}\"); print(f\"GPS: {data['attributes']['latitude']}, {data['attributes']['longitude']}\"); print(f\"Accuracy: {data['attributes']['gps_accuracy']}m\")"
```
## Toggle On/Off
```bash
# Disable
curl -X POST -H "Authorization: Bearer $HA_TOKEN" \
-H "Content-Type: application/json" \
-d '{"entity_id": "automation.welcome_home"}' \
"http://10.10.10.210:8123/api/services/automation/turn_off"
# Enable
curl -X POST -H "Authorization: Bearer $HA_TOKEN" \
-H "Content-Type: application/json" \
-d '{"entity_id": "automation.welcome_home"}' \
"http://10.10.10.210:8123/api/services/automation/turn_on"
```
## Web UI
http://10.10.10.210:8123 → Settings → Automations & Scenes → "Welcome Home"
---
*Entity ID: automation.welcome_home*

151
README.md Normal file
View File

@@ -0,0 +1,151 @@
# Homelab Documentation
Documentation for Hutson's home infrastructure - two Proxmox servers running VMs and containers for home automation, media, development, and AI workloads.
## 🚀 Quick Start
**New to this homelab?** Start here:
1. [CLAUDE.md](CLAUDE.md) - Quick reference guide for common tasks
2. [SSH-ACCESS.md](SSH-ACCESS.md) - How to connect to all systems
3. [IP-ASSIGNMENTS.md](IP-ASSIGNMENTS.md) - What's at what IP address
4. [SERVICES.md](SERVICES.md) - What services are running
**Claude Code Session?** Read [CLAUDE.md](CLAUDE.md) first - it's your command center.
## 📚 Documentation Index
### Infrastructure
| Document | Description |
|----------|-------------|
| [GATEWAY.md](GATEWAY.md) | UniFi gateway monitoring, watchdog services, troubleshooting |
| [VMS.md](VMS.md) | Complete VM/LXC inventory, specs, GPU passthrough |
| [HARDWARE.md](HARDWARE.md) | Server specs, GPUs, network cards, HBAs |
| [STORAGE.md](STORAGE.md) | ZFS pools, NFS/SMB shares, capacity planning |
| [NETWORK.md](NETWORK.md) | Bridges, VLANs, MTU config, Tailscale VPN |
| [POWER-MANAGEMENT.md](POWER-MANAGEMENT.md) | CPU governors, GPU power states, optimizations |
| [UPS.md](UPS.md) | UPS configuration, NUT monitoring, power failure handling |
### Services & Applications
| Document | Description |
|----------|-------------|
| [SERVICES.md](SERVICES.md) | Complete service inventory with URLs and credentials |
| [TRAEFIK.md](TRAEFIK.md) | Reverse proxy setup, adding services, SSL certificates |
| [HOMEASSISTANT.md](HOMEASSISTANT.md) | Home Assistant API, automations, integrations |
| [PA-API.md](PA-API.md) | Personal Assistant API, MCP Bridge, Claude integration |
| [SYNCTHING.md](SYNCTHING.md) | File sync across all devices, API access, troubleshooting |
| [SALTBOX.md](#) | Media automation stack (Plex, *arr apps) (coming soon) |
### Access & Security
| Document | Description |
|----------|-------------|
| [SSH-ACCESS.md](SSH-ACCESS.md) | SSH keys, host aliases, password auth, QEMU agent |
| [IP-ASSIGNMENTS.md](IP-ASSIGNMENTS.md) | Complete IP address assignments for all devices |
| [SECURITY.md](#) | Firewall, access control, certificates (coming soon) |
### Operations
| Document | Description |
|----------|-------------|
| [BACKUP-STRATEGY.md](BACKUP-STRATEGY.md) | 🚨 Backup strategy, disaster recovery (CRITICAL) |
| [MAINTENANCE.md](MAINTENANCE.md) | Regular procedures, update schedules, testing checklists |
| [MONITORING.md](MONITORING.md) | Health monitoring, alerts, dashboard recommendations |
| [DISASTER-RECOVERY.md](#) | Recovery procedures (coming soon) |
### Reference
| Document | Description |
|----------|-------------|
| [EMC-ENCLOSURE.md](EMC-ENCLOSURE.md) | Storage enclosure SES commands, LCC troubleshooting |
| [SHELL-ALIASES.md](SHELL-ALIASES.md) | ZSH aliases for Claude Code sessions |
## 🖥️ System Overview
### Servers
- **PVE** (10.10.10.120) - Primary Proxmox server
- AMD Threadripper PRO 3975WX (32-core)
- 128 GB RAM
- NVIDIA Quadro P2000 + TITAN RTX
- **PVE2** (10.10.10.102) - Secondary Proxmox server
- AMD Threadripper PRO 3975WX (32-core)
- 128 GB RAM
- NVIDIA RTX A6000
### Key Services
| Service | Location | URL |
|---------|----------|-----|
| **Proxmox** | PVE | https://pve.htsn.io |
| **TrueNAS** | VM 100 | https://truenas.htsn.io |
| **Plex** | Saltbox VM | https://plex.htsn.io |
| **Home Assistant** | VM 110 | https://homeassistant.htsn.io |
| **Gitea** | VM 300 | https://git.htsn.io |
| **PA API** | docker-host2 | https://pa.htsn.io (Tailscale) |
| **Pi-hole** | CT 200 | http://10.10.10.10/admin |
| **Traefik** | CT 202 | http://10.10.10.250:8080 |
[See IP-ASSIGNMENTS.md for complete list](IP-ASSIGNMENTS.md)
## 🔥 Emergency Procedures
### Power Failure
1. UPS provides ~15 min runtime at typical load
2. At 2 min remaining, NUT triggers graceful VM shutdown
3. When power returns, servers auto-boot and start VMs in order
See [UPS.md](UPS.md) for details.
### Service Down
```bash
# Quick health check (run from Mac Mini)
ssh pve 'qm list' # Check VMs on PVE
ssh pve2 'qm list' # Check VMs on PVE2
ssh pve 'pct list' # Check containers
# Syncthing status
curl -s -H "X-API-Key: oSQSrPnMnrEXuHqjWrRdrvq3TSXesAT5" \
"http://127.0.0.1:8384/rest/system/connections"
# Restart a VM
ssh pve 'qm stop VMID && qm start VMID'
```
See [CLAUDE.md](CLAUDE.md) for complete troubleshooting runbooks.
## 📞 Getting Help
**Claude Code Assistant**: Start a session in this directory - all context is available in CLAUDE.md
**Key Contacts**:
- Homelab Owner: Hutson
- Git Repo: https://git.htsn.io/hutson/homelab-docs
- Local Path: `~/Projects/homelab`
## 🔄 Recent Changes
See [CHANGELOG.md](#) (coming soon) or the Changelog section in [CLAUDE.md](CLAUDE.md).
## 📝 Contributing
When updating docs:
1. Keep CLAUDE.md as quick reference only
2. Move detailed content to specialized docs
3. Update cross-references
4. Test all commands before committing
5. Add entries to changelog
```bash
cd ~/Projects/homelab
git add -A
git commit -m "Update documentation: <description>"
git push
```
---
**Last Updated**: 2026-01-02

591
SERVICES.md Normal file
View File

@@ -0,0 +1,591 @@
# Services Inventory
Complete inventory of all services running across the homelab infrastructure.
## Overview
| Category | Services | Location | Access |
|----------|----------|----------|--------|
| **Infrastructure** | Proxmox, TrueNAS, Pi-hole, Traefik | VMs/CTs | Web UI + SSH |
| **Media** | Plex, *arr apps, downloaders | Saltbox VM | Web UI |
| **Development** | Gitea, Docker services | VMs | Web UI |
| **Home Automation** | Home Assistant, Happy Coder | VMs | Web UI + API |
| **Monitoring** | UPS (NUT), Syncthing, Pulse | Various | API |
**Total Services**: 25+ running services
---
## Service URLs Quick Reference
| Service | URL | Authentication | Purpose |
|---------|-----|----------------|---------|
| **Proxmox** | https://pve.htsn.io:8006 | Username + 2FA | VM management |
| **TrueNAS** | https://truenas.htsn.io | Username/password | NAS management |
| **Plex** | https://plex.htsn.io | Plex account | Media streaming |
| **Home Assistant** | https://homeassistant.htsn.io | Username/password | Home automation |
| **Gitea** | https://git.htsn.io | Username/password | Git repositories |
| **Excalidraw** | https://excalidraw.htsn.io | None (public) | Whiteboard |
| **Happy Coder** | https://happy.htsn.io | QR code auth | Remote Claude sessions |
| **Pi-hole** | http://10.10.10.10/admin | Password | DNS/ad blocking |
| **Traefik** | http://10.10.10.250:8080 | None (internal) | Reverse proxy dashboard |
| **Pulse** | https://pulse.htsn.io | Unknown | Monitoring dashboard |
| **Copyparty** | https://copyparty.htsn.io | Unknown | File sharing |
| **FindShyt** | https://findshyt.htsn.io | Unknown | Custom app |
---
## Infrastructure Services
### Proxmox VE (PVE & PVE2)
**Purpose**: Virtualization platform, VM/CT host
**Location**: Physical servers (10.10.10.120, 10.10.10.102)
**Access**: https://pve.htsn.io:8006, SSH
**Version**: Unknown (check: `pveversion`)
**Key Features**:
- Web-based management
- VM and LXC container support
- ZFS storage pools
- Clustering (2-node)
- API access
**Common Operations**:
```bash
# List VMs
ssh pve 'qm list'
# Create VM
ssh pve 'qm create VMID --name myvm ...'
# Backup VM
ssh pve 'vzdump VMID --dumpdir /var/lib/vz/dump'
```
**See**: [VMS.md](VMS.md)
---
### TrueNAS SCALE (VM 100)
**Purpose**: Central file storage, NFS/SMB shares
**Location**: VM on PVE (10.10.10.200)
**Access**: https://truenas.htsn.io, SSH
**Version**: TrueNAS SCALE (check version in UI)
**Key Features**:
- ZFS storage management
- NFS exports
- SMB shares
- Syncthing hub
- Snapshot management
**Storage Pools**:
- `vault`: Main data pool on EMC enclosure
**Shares** (needs documentation):
- NFS exports for Saltbox media
- SMB shares for Windows access
- Syncthing sync folders
**See**: [STORAGE.md](STORAGE.md)
---
### Pi-hole (CT 200)
**Purpose**: Network-wide DNS server and ad blocker
**Location**: LXC on PVE (10.10.10.10)
**Access**: http://10.10.10.10/admin
**Version**: Unknown
**Configuration**:
- **Upstream DNS**: Cloudflare (1.1.1.1)
- **Blocklists**: Unknown count
- **Queries**: All network DNS traffic
- **DHCP**: Disabled (router handles DHCP)
**Stats** (example):
```bash
ssh pihole 'pihole -c -e' # Stats
ssh pihole 'pihole status' # Status
```
**Common Tasks**:
- Update blocklists: `ssh pihole 'pihole -g'`
- Whitelist domain: `ssh pihole 'pihole -w example.com'`
- View logs: `ssh pihole 'pihole -t'`
---
### Traefik (CT 202)
**Purpose**: Reverse proxy for all public-facing services
**Location**: LXC on PVE (10.10.10.250)
**Access**: http://10.10.10.250:8080/dashboard/
**Version**: Unknown (check: `traefik version`)
**Managed Services**:
- All *.htsn.io domains (except Saltbox services)
- SSL/TLS certificates via Let's Encrypt
- HTTP → HTTPS redirects
**See**: [TRAEFIK.md](TRAEFIK.md) for complete configuration
---
## Media Services (Saltbox VM)
All media services run in Docker on the Saltbox VM (10.10.10.100).
### Plex Media Server
**Purpose**: Media streaming platform
**URL**: https://plex.htsn.io
**Access**: Plex account
**Features**:
- Hardware transcoding (TITAN RTX)
- Libraries: Movies, TV, Music
- Remote access enabled
- Managed by Saltbox
**Media Storage**:
- Source: TrueNAS NFS mounts
- Location: `/mnt/unionfs/`
**Common Tasks**:
```bash
# View Plex status
ssh saltbox 'docker logs -f plex'
# Restart Plex
ssh saltbox 'docker restart plex'
# Scan library
# (via Plex UI: Settings → Library → Scan)
```
---
### *arr Apps (Media Automation)
Running on Saltbox VM, managed via Traefik-Saltbox.
| Service | Purpose | URL | Notes |
|---------|---------|-----|-------|
| **Sonarr** | TV show automation | sonarr.htsn.io | Monitors, downloads, organizes TV |
| **Radarr** | Movie automation | radarr.htsn.io | Monitors, downloads, organizes movies |
| **Lidarr** | Music automation | lidarr.htsn.io | Monitors, downloads, organizes music |
| **Overseerr** | Request management | overseerr.htsn.io | User requests for media |
| **Bazarr** | Subtitle management | bazarr.htsn.io | Downloads subtitles |
**Downloaders**:
| Service | Purpose | URL |
|---------|---------|-----|
| **SABnzbd** | Usenet downloader | sabnzbd.htsn.io |
| **NZBGet** | Usenet downloader | nzbget.htsn.io |
| **qBittorrent** | Torrent client | qbittorrent.htsn.io |
**Indexers**:
| Service | Purpose | URL |
|---------|---------|-----|
| **Jackett** | Torrent indexer proxy | jackett.htsn.io |
| **NZBHydra2** | Usenet indexer proxy | nzbhydra2.htsn.io |
---
### Supporting Media Services
| Service | Purpose | URL |
|---------|---------|-----|
| **Tautulli** | Plex statistics | tautulli.htsn.io |
| **Organizr** | Service dashboard | organizr.htsn.io |
| **Authelia** | SSO authentication | auth.htsn.io |
---
## Development Services
### Gitea (VM 300)
**Purpose**: Self-hosted Git server
**Location**: VM on PVE2 (10.10.10.220)
**URL**: https://git.htsn.io
**Access**: Username/password
**Repositories**:
- homelab-docs (this documentation)
- Personal projects
- Private repos
**Common Tasks**:
```bash
# SSH to Gitea VM
ssh gitea-vm
# View logs
ssh gitea-vm 'journalctl -u gitea -f'
# Backup
ssh gitea-vm 'gitea dump -c /etc/gitea/app.ini'
```
**See**: Gitea documentation for API usage
---
### Docker Services (docker-host VM)
Running on VM 206 (10.10.10.206).
| Service | URL | Purpose | Port |
|---------|-----|---------|------|
| **Excalidraw** | https://excalidraw.htsn.io | Whiteboard/diagramming | 8080 |
| **Happy Server** | https://happy.htsn.io | Happy Coder relay | 3002 |
| **Pulse** | https://pulse.htsn.io | Monitoring dashboard | 7655 |
**Docker Compose files**: `/opt/{excalidraw,happy-server,pulse}/docker-compose.yml`
**Managing services**:
```bash
ssh docker-host 'docker ps'
ssh docker-host 'cd /opt/excalidraw && sudo docker-compose logs -f'
ssh docker-host 'cd /opt/excalidraw && sudo docker-compose restart'
```
---
## Home Automation
### Home Assistant (VM 110)
**Purpose**: Smart home automation platform
**Location**: VM on PVE (10.10.10.110)
**URL**: https://homeassistant.htsn.io
**Access**: Username/password
**Integrations**:
- UPS monitoring (NUT sensors)
- Unknown other integrations (needs documentation)
**Sensors**:
- `sensor.cyberpower_battery_charge`
- `sensor.cyberpower_load`
- `sensor.cyberpower_battery_runtime`
- `sensor.cyberpower_status`
**See**: [HOMEASSISTANT.md](HOMEASSISTANT.md)
---
### Happy Coder Relay (docker-host)
**Purpose**: Self-hosted relay server for Happy Coder mobile app
**Location**: docker-host (10.10.10.206)
**URL**: https://happy.htsn.io
**Access**: QR code authentication
**Stack**:
- Happy Server (Node.js)
- PostgreSQL (user/session data)
- Redis (real-time events)
- MinIO (file/image storage)
**Clients**:
- Mac Mini (Happy daemon)
- Mobile app (iOS/Android)
**Credentials**:
- Master Secret: `3ccbfd03a028d3c278da7d2cf36d99b94cd4b1fecabc49ab006e8e89bc7707ac`
- PostgreSQL: `happy` / `happypass`
- MinIO: `happyadmin` / `happyadmin123`
---
## File Sync & Storage
### Syncthing
**Purpose**: File synchronization across all devices
**Devices**:
- Mac Mini (10.10.10.125) - Hub
- MacBook - Mobile sync
- TrueNAS (10.10.10.200) - Central storage
- Windows PC (10.10.10.150) - Windows sync
- Phone (10.10.10.54) - Mobile sync
**API Keys**:
- Mac Mini: `oSQSrPnMnrEXuHqjWrRdrvq3TSXesAT5`
- MacBook: `qYkNdVLwy9qZZZ6MqnJr7tHX7KKdxGMJ`
- Phone: `Xxz3jDT4akUJe6psfwZsbZwG2LhfZuDM`
**Synced Folders**:
- documents (~11 GB)
- downloads (~38 GB)
- pictures
- notes
- desktop (~7.2 GB)
- config
- movies
**See**: [SYNCTHING.md](SYNCTHING.md)
---
### Copyparty (VM 201)
**Purpose**: Simple HTTP file sharing
**Location**: VM on PVE (10.10.10.201)
**URL**: https://copyparty.htsn.io
**Access**: Unknown
**Features**:
- Web-based file upload/download
- Lightweight
---
## Trading & AI Services
### AI Trading Platform (trading-vm)
**Purpose**: Algorithmic trading with AI models
**Location**: VM 301 on PVE2 (10.10.10.221)
**URL**: https://aitrade.htsn.io (if accessible)
**GPU**: RTX A6000 (48GB VRAM)
**Components**:
- Trading algorithms
- AI models for market prediction
- Real-time data feeds
- Backtesting infrastructure
**Access**: SSH only (no web UI documented)
---
### LM Dev (lmdev1)
**Purpose**: AI/LLM development environment
**Location**: VM 111 on PVE (10.10.10.111)
**URL**: https://lmdev.htsn.io (if accessible)
**GPU**: TITAN RTX (shared with Saltbox)
**Installed**:
- CUDA toolkit
- Python 3.11+
- PyTorch, TensorFlow
- Hugging Face transformers
---
## Monitoring & Utilities
### UPS Monitoring (NUT)
**Purpose**: Monitor UPS status and trigger shutdowns
**Location**: PVE (master), PVE2 (slave)
**Access**: Command-line (`upsc`)
**Key Commands**:
```bash
ssh pve 'upsc cyberpower@localhost'
ssh pve 'upsc cyberpower@localhost ups.load'
ssh pve 'upsc cyberpower@localhost battery.runtime'
```
**Home Assistant Integration**: UPS sensors exposed
**See**: [UPS.md](UPS.md)
---
### Pulse Monitoring
**Purpose**: Unknown monitoring dashboard
**Location**: docker-host (10.10.10.206:7655)
**URL**: https://pulse.htsn.io
**Access**: Unknown
**Needs documentation**:
- What does it monitor?
- How to configure?
- Authentication?
---
### Tailscale VPN
**Purpose**: Secure remote access to homelab
**Subnet Routers**:
- PVE (100.113.177.80) - Primary
- UCG-Fiber (100.94.246.32) - Failover
**Devices on Tailscale**:
- Mac Mini: 100.108.89.58
- PVE: 100.113.177.80
- TrueNAS: 100.100.94.71
- Pi-hole: 100.112.59.128
**See**: [NETWORK.md](NETWORK.md)
---
## Custom Applications
### FindShyt (CT 205)
**Purpose**: Unknown custom application
**Location**: LXC on PVE (10.10.10.8)
**URL**: https://findshyt.htsn.io
**Access**: Unknown
**Needs documentation**:
- What is this app?
- How to use it?
- Tech stack?
---
## Service Dependencies
### Critical Dependencies
```
TrueNAS
├── Plex (media files via NFS)
├── *arr apps (downloads via NFS)
├── Syncthing (central storage hub)
└── Backups (if configured)
Traefik (CT 202)
├── All *.htsn.io services
└── SSL certificate management
Pi-hole
└── DNS for entire network
Router
└── Gateway for all services
```
### Startup Order
**See [VMS.md](VMS.md)** for VM boot order configuration:
1. TrueNAS (storage first)
2. Saltbox (depends on TrueNAS NFS)
3. Other VMs
4. Containers
---
## Service Port Reference
### Well-Known Ports
| Port | Service | Protocol | Purpose |
|------|---------|----------|---------|
| 22 | SSH | TCP | Remote access |
| 53 | Pi-hole | UDP | DNS queries |
| 80 | Traefik | TCP | HTTP (redirects to 443) |
| 443 | Traefik | TCP | HTTPS |
| 3000 | Gitea | TCP | Git HTTP/S |
| 8006 | Proxmox | TCP | Web UI |
| 32400 | Plex | TCP | Plex Media Server |
| 8384 | Syncthing | TCP | Web UI |
| 22000 | Syncthing | TCP | Sync protocol |
### Internal Ports
| Port | Service | Purpose |
|------|---------|---------|
| 3002 | Happy Server | Relay backend |
| 5432 | PostgreSQL | Happy Server DB |
| 6379 | Redis | Happy Server cache |
| 7655 | Pulse | Monitoring |
| 8080 | Excalidraw | Whiteboard |
| 8080 | Traefik | Dashboard |
| 9000 | MinIO | Object storage |
---
## Service Health Checks
### Quick Health Check Script
```bash
#!/bin/bash
# Check all critical services
echo "=== Infrastructure ==="
curl -Is https://pve.htsn.io:8006 | head -1
curl -Is https://truenas.htsn.io | head -1
curl -I http://10.10.10.10/admin 2>/dev/null | head -1
echo ""
echo "=== Media Services ==="
curl -Is https://plex.htsn.io | head -1
curl -Is https://sonarr.htsn.io | head -1
curl -Is https://radarr.htsn.io | head -1
echo ""
echo "=== Development ==="
curl -Is https://git.htsn.io | head -1
curl -Is https://excalidraw.htsn.io | head -1
echo ""
echo "=== Home Automation ==="
curl -Is https://homeassistant.htsn.io | head -1
curl -Is https://happy.htsn.io/health | head -1
```
### Service-Specific Checks
```bash
# Proxmox VMs
ssh pve 'qm list | grep running'
# Docker services
ssh docker-host 'docker ps --format "{{.Names}}: {{.Status}}"'
# Syncthing
curl -H "X-API-Key: oSQSrPnMnrEXuHqjWrRdrvq3TSXesAT5" \
"http://127.0.0.1:8384/rest/system/status"
# UPS
ssh pve 'upsc cyberpower@localhost ups.status'
```
---
## Service Credentials
**Location**: See individual service documentation
| Service | Credentials Location | Notes |
|---------|---------------------|-------|
| Proxmox | Proxmox UI | Username + 2FA |
| TrueNAS | TrueNAS UI | Root password |
| Plex | Plex account | Managed externally |
| Gitea | Gitea DB | Self-managed |
| Pi-hole | `/etc/pihole/setupVars.conf` | Admin password |
| Happy Server | [CLAUDE.md](CLAUDE.md) | Master secret, DB passwords |
**⚠️ Security Note**: Never commit credentials to Git. Use proper secrets management.
---
## Related Documentation
- [VMS.md](VMS.md) - VM/service locations
- [TRAEFIK.md](TRAEFIK.md) - Reverse proxy config
- [IP-ASSIGNMENTS.md](IP-ASSIGNMENTS.md) - Service IP addresses
- [NETWORK.md](NETWORK.md) - Network configuration
- [MONITORING.md](MONITORING.md) - Monitoring setup (coming soon)
---
**Last Updated**: 2025-12-22
**Status**: ⚠️ Incomplete - many services need documentation (passwords, features, usage)

475
SSH-ACCESS.md Normal file
View File

@@ -0,0 +1,475 @@
# SSH Access
Documentation for SSH access to all homelab systems, including key authentication, password authentication for special cases, and QEMU guest agent usage.
## Overview
Most systems use **SSH key authentication** with the `~/.ssh/homelab` key. A few special cases require **password authentication** (router, Windows PC) due to platform limitations.
**SSH Password**: `GrilledCh33s3#` (for systems without key auth)
---
## SSH Key Authentication (Primary Method)
### SSH Key Configuration
SSH keys are configured in `~/.ssh/config` on both Mac Mini and MacBook.
**Key file**: `~/.ssh/homelab` (Ed25519 key)
**Key deployed to**: All Proxmox hosts, VMs, and LXCs (13 total hosts)
### Host Aliases
Use these convenient aliases instead of IP addresses:
| Host Alias | IP | User | Type | Notes |
|------------|-----|------|------|-------|
| `ucg-fiber` / `gateway` | 10.10.10.1 | root | UniFi Gateway | Router/firewall |
| `pve` | 10.10.10.120 | root | Proxmox | Primary server |
| `pve2` | 10.10.10.102 | root | Proxmox | Secondary server |
| `truenas` | 10.10.10.200 | root | VM | NAS/storage |
| `saltbox` | 10.10.10.100 | hutson | VM | Media automation |
| `lmdev1` | 10.10.10.111 | hutson | VM | AI/LLM development |
| `docker-host` | 10.10.10.206 | hutson | VM | Docker services (PVE) |
| `docker-host2` | 10.10.10.207 | hutson | VM | Docker services (PVE2) - MetaMCP, n8n |
| `fs-dev` | 10.10.10.5 | hutson | VM | Development |
| `copyparty` | 10.10.10.201 | hutson | VM | File sharing |
| `gitea-vm` | 10.10.10.220 | hutson | VM | Git server |
| `trading-vm` | 10.10.10.221 | hutson | VM | AI trading platform |
| `pihole` | 10.10.10.10 | root | LXC | DNS/Ad blocking |
| `traefik` | 10.10.10.250 | root | LXC | Reverse proxy |
| `findshyt` | 10.10.10.8 | root | LXC | Custom app |
### Usage Examples
```bash
# List VMs on PVE
ssh pve 'qm list'
# Check ZFS pool on TrueNAS
ssh truenas 'zpool status vault'
# List Docker containers on Saltbox
ssh saltbox 'docker ps'
# Check Pi-hole status
ssh pihole 'pihole status'
# View Traefik config
ssh pve 'pct exec 202 -- cat /etc/traefik/traefik.yaml'
```
### SSH Config File
**Location**: `~/.ssh/config`
**Example entries**:
```sshconfig
# Proxmox Servers
Host pve
HostName 10.10.10.120
User root
IdentityFile ~/.ssh/homelab
Host pve2
HostName 10.10.10.102
User root
IdentityFile ~/.ssh/homelab
# Post-quantum KEX causes MTU issues - use classic
KexAlgorithms curve25519-sha256
# VMs
Host truenas
HostName 10.10.10.200
User root
IdentityFile ~/.ssh/homelab
Host saltbox
HostName 10.10.10.100
User hutson
IdentityFile ~/.ssh/homelab
Host lmdev1
HostName 10.10.10.111
User hutson
IdentityFile ~/.ssh/homelab
Host docker-host
HostName 10.10.10.206
User hutson
IdentityFile ~/.ssh/homelab
Host docker-host2
HostName 10.10.10.207
User hutson
IdentityFile ~/.ssh/homelab
Host fs-dev
HostName 10.10.10.5
User hutson
IdentityFile ~/.ssh/homelab
Host copyparty
HostName 10.10.10.201
User hutson
IdentityFile ~/.ssh/homelab
Host gitea-vm
HostName 10.10.10.220
User hutson
IdentityFile ~/.ssh/homelab
Host trading-vm
HostName 10.10.10.221
User hutson
IdentityFile ~/.ssh/homelab
# LXC Containers
Host pihole
HostName 10.10.10.10
User root
IdentityFile ~/.ssh/homelab
Host traefik
HostName 10.10.10.250
User root
IdentityFile ~/.ssh/homelab
Host findshyt
HostName 10.10.10.8
User root
IdentityFile ~/.ssh/homelab
```
---
## Password Authentication (Special Cases)
Some systems don't support SSH key auth or have other limitations.
### UniFi Router (10.10.10.1) - NOW USES KEY AUTH
**Host alias**: `ucg-fiber` or `gateway`
**Status**: SSH key authentication now works (as of 2026-01-02)
**Commands**:
```bash
# Run command on router (using SSH key)
ssh ucg-fiber 'hostname'
# Get ARP table (all device IPs)
ssh ucg-fiber 'cat /proc/net/arp'
# Check Tailscale status
ssh ucg-fiber 'tailscale status'
# Check memory usage
ssh ucg-fiber 'free -m'
```
**Note**: Key may need to be re-deployed after firmware updates if UniFi clears authorized_keys.
### Windows PC (10.10.10.150)
**OS**: Windows with OpenSSH server
**User**: `claude`
**Password**: `GrilledCh33s3#`
**Shell**: PowerShell (not bash)
**Commands**:
```bash
# Run PowerShell command
sshpass -p 'GrilledCh33s3#' ssh claude@10.10.10.150 'Get-Process | Select -First 5'
# Check Syncthing status
sshpass -p 'GrilledCh33s3#' ssh claude@10.10.10.150 'Get-Process -Name syncthing -ErrorAction SilentlyContinue'
# Restart Syncthing
sshpass -p 'GrilledCh33s3#' ssh claude@10.10.10.150 'Stop-Process -Name syncthing -Force; Start-ScheduledTask -TaskName "Syncthing"'
```
**⚠️ Important**: Use `;` (semicolon) to chain PowerShell commands, NOT `&&` (bash syntax).
**Why not key auth?**: Could be configured, but password auth works and is simpler for Windows.
---
## QEMU Guest Agent
Most VMs have the QEMU guest agent installed, allowing command execution without SSH.
### VMs with QEMU Agent
| VMID | VM Name | Use Case |
|------|---------|----------|
| 100 | truenas | Execute commands, check ZFS |
| 101 | saltbox | Execute commands, Docker mgmt |
| 105 | fs-dev | Execute commands |
| 111 | lmdev1 | Execute commands |
| 201 | copyparty | Execute commands |
| 206 | docker-host | Execute commands |
| 300 | gitea-vm | Execute commands |
| 301 | trading-vm | Execute commands |
### VM WITHOUT QEMU Agent
**VMID 110 (homeassistant)**: No QEMU agent installed
- Access via web UI only
- Or install SSH server manually if needed
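**Check agent availability first** (quick sketch, assuming VMID 110):
```bash
# Ping the guest agent - fails with an error if the agent isn't installed/running
ssh pve 'qm agent 110 ping' && echo "agent OK" || echo "no QEMU agent"
```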
### Usage Examples
**Basic syntax**:
```bash
ssh pve 'qm guest exec VMID -- bash -c "COMMAND"'
```
**Examples**:
```bash
# Check ZFS pool on TrueNAS (without SSH)
ssh pve 'qm guest exec 100 -- bash -c "zpool status vault"'
# Get VM IP addresses
ssh pve 'qm guest exec 100 -- bash -c "ip addr"'
# Check Docker containers on Saltbox
ssh pve 'qm guest exec 101 -- bash -c "docker ps"'
# Run multi-line command
ssh pve 'qm guest exec 100 -- bash -c "df -h; free -h; uptime"'
```
**When to use QEMU agent vs SSH**:
- ✅ Use **SSH** for interactive sessions, file editing, complex tasks
- ✅ Use **QEMU agent** for one-off commands, when SSH is down, or VM has no network
- ⚠️ QEMU agent is slower for multiple commands (use SSH instead)
---
## Troubleshooting SSH Issues
### Connection Refused
```bash
# Check if SSH service is running
ssh pve 'systemctl status sshd'
# Check if port 22 is open
nc -zv 10.10.10.XXX 22
# Check firewall
ssh pve 'iptables -L -n | grep 22'
```
### Permission Denied (Public Key)
```bash
# Verify key file exists
ls -la ~/.ssh/homelab
# Check key permissions (should be 600)
chmod 600 ~/.ssh/homelab
# Test SSH key auth verbosely
ssh -vvv -i ~/.ssh/homelab root@10.10.10.120
# Check authorized_keys on remote (via QEMU agent if SSH broken)
ssh pve 'qm guest exec VMID -- bash -c "cat ~/.ssh/authorized_keys"'
```
### Slow SSH Connection (PVE2 Issue)
**Problem**: SSH to PVE2 hangs for 30+ seconds before connecting
**Cause**: MTU mismatch (vmbr0=9000, nic1=1500) causing post-quantum KEX packet fragmentation
**Fix**: Use classic KEX algorithm instead
**In `~/.ssh/config`**:
```sshconfig
Host pve2
    HostName 10.10.10.102
    User root
    IdentityFile ~/.ssh/homelab
    # Avoid mlkem768x25519-sha256 (post-quantum KEX fragments packets at MTU 1500)
    KexAlgorithms curve25519-sha256
```
**Permanent fix**: Set `nic1` MTU to 9000 in `/etc/network/interfaces` on PVE2
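**Sketch of the permanent fix** (assumes the interface really is named `nic1` in `/etc/network/interfaces` on PVE2 - verify the name before editing):
```bash
# Compare current MTUs on PVE2
ssh pve2 'ip link show | grep -E "vmbr0|nic1"'
# Apply MTU 9000 immediately (non-persistent)
ssh pve2 'ip link set nic1 mtu 9000'
# Persist it by adding "mtu 9000" to the nic1 stanza in /etc/network/interfaces
ssh pve2 'grep -A3 "iface nic1" /etc/network/interfaces'
```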
---
## Adding SSH Keys to New Systems
### Linux (VMs/LXCs)
```bash
# Copy public key to new host
ssh-copy-id -i ~/.ssh/homelab user@hostname
# Or manually:
ssh user@hostname 'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys' < ~/.ssh/homelab.pub
ssh user@hostname 'chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys'
```
### LXC Containers (Root User)
```bash
# Via pct exec from Proxmox host
ssh pve 'pct exec CTID -- bash -c "mkdir -p /root/.ssh"'
ssh pve 'pct exec CTID -- bash -c "echo \"$(cat ~/.ssh/homelab.pub)\" >> /root/.ssh/authorized_keys"'
ssh pve 'pct exec CTID -- bash -c "chmod 700 /root/.ssh && chmod 600 /root/.ssh/authorized_keys"'
# Also enable PermitRootLogin in sshd_config
ssh pve 'pct exec CTID -- bash -c "sed -i \"s/^#*PermitRootLogin.*/PermitRootLogin prohibit-password/\" /etc/ssh/sshd_config"'
ssh pve 'pct exec CTID -- bash -c "systemctl restart sshd"'
```
### VMs (via QEMU Agent)
```bash
# Add key via QEMU agent (if SSH not working)
ssh pve 'qm guest exec VMID -- bash -c "mkdir -p ~/.ssh"'
ssh pve 'qm guest exec VMID -- bash -c "echo \"$(cat ~/.ssh/homelab.pub)\" >> ~/.ssh/authorized_keys"'
ssh pve 'qm guest exec VMID -- bash -c "chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys"'
```
---
## SSH Key Management
### Rotate SSH Keys (Future)
When rotating SSH keys:
1. Generate new key pair:
```bash
ssh-keygen -t ed25519 -f ~/.ssh/homelab-new -C "homelab-new"
```
2. Deploy new key to all hosts (keep old key for now):
```bash
for host in pve pve2 truenas saltbox lmdev1 docker-host fs-dev copyparty gitea-vm trading-vm pihole traefik findshyt; do
ssh-copy-id -i ~/.ssh/homelab-new $host
done
```
3. Update `~/.ssh/config` to use new key:
```sshconfig
IdentityFile ~/.ssh/homelab-new
```
4. Test all connections:
```bash
for host in pve pve2 truenas saltbox lmdev1 docker-host fs-dev copyparty gitea-vm trading-vm pihole traefik findshyt; do
echo "Testing $host..."
ssh $host 'hostname'
done
```
5. Remove old key from all hosts once confirmed working
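**Sketch for step 5** (identifies the old key by its key body; review each host's `authorized_keys` afterwards):
```bash
# Key body (second field of the .pub file) uniquely identifies the old key
OLD_KEY=$(awk '{print $2}' ~/.ssh/homelab.pub)
for host in pve pve2 truenas saltbox lmdev1 docker-host fs-dev copyparty gitea-vm trading-vm pihole traefik findshyt; do
  echo "Removing old key from $host..."
  # Connect with the new key and delete any authorized_keys line containing the old key body
  ssh -i ~/.ssh/homelab-new $host "sed -i '\|$OLD_KEY|d' ~/.ssh/authorized_keys"
done
```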
---
## Quick Reference
### Common SSH Operations
```bash
# Execute command on remote host
ssh host 'command'
# Execute multiple commands
ssh host 'command1 && command2'
# Copy file to remote
scp file host:/path/
# Copy file from remote
scp host:/path/file ./
# Execute command on Proxmox VM (via QEMU agent)
ssh pve 'qm guest exec VMID -- bash -c "command"'
# Execute command on LXC
ssh pve 'pct exec CTID -- command'
# Interactive shell
ssh host
# SSH with X11 forwarding
ssh -X host
```
### Troubleshooting Commands
```bash
# Test SSH with verbose output
ssh -vvv host
# Check SSH service status (remote)
ssh host 'systemctl status sshd'
# Check SSH config (local)
ssh -G host
# Test port connectivity
nc -zv hostname 22
```
---
## Security Best Practices
### Current Security Posture
✅ **Good**:
- SSH keys used instead of passwords (where possible)
- Keys use Ed25519 (modern, secure algorithm)
- Root login disabled on VMs (use sudo instead)
- SSH keys have proper permissions (600)
⚠️ **Could Improve**:
- [ ] Disable password authentication on all hosts (force key-only)
- [ ] Use SSH certificate authority instead of individual keys
- [ ] Set up SSH bastion host (jump server)
- [ ] Enable 2FA for SSH (via PAM + Google Authenticator)
- [ ] Implement SSH key rotation policy (annually)
### Hardening SSH (Future)
For additional security, consider:
```sshconfig
# /etc/ssh/sshd_config (on remote hosts)
PermitRootLogin prohibit-password # No root password login
PasswordAuthentication no # Disable password auth entirely
PubkeyAuthentication yes # Only allow key auth
AuthorizedKeysFile .ssh/authorized_keys
MaxAuthTries 3 # Limit auth attempts
MaxSessions 10 # Limit concurrent sessions
ClientAliveInterval 300 # Timeout idle sessions
ClientAliveCountMax 2 # Drop after 2 keepalives
```
**Apply after editing**:
```bash
systemctl restart sshd
```
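A safer variant is to syntax-check the file before restarting, so a typo can't lock out remote access (not part of the original setup, but cheap insurance):
```bash
# Only restart if sshd_config parses cleanly
sshd -t && systemctl restart sshd
```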
---
## Related Documentation
- [VMS.md](VMS.md) - Complete VM/CT inventory
- [NETWORK.md](NETWORK.md) - Network configuration
- [IP-ASSIGNMENTS.md](IP-ASSIGNMENTS.md) - IP addresses for all hosts
- [SECURITY.md](#) - Security policies (coming soon)
---
**Last Updated**: 2025-12-22

STORAGE.md Normal file

@@ -0,0 +1,510 @@
# Storage Architecture
Documentation of all storage pools, datasets, shares, and capacity planning across the homelab.
## Overview
### Storage Distribution
| Location | Type | Capacity | Purpose |
|----------|------|----------|---------|
| **PVE** | NVMe + SSD mirrors | ~9 TB usable | VM storage, fast IO |
| **PVE2** | NVMe + HDD mirrors | ~6+ TB usable | VM storage, bulk data |
| **TrueNAS** | ZFS pool + EMC enclosure | ~12+ TB usable | Central file storage, NFS/SMB |
---
## PVE (10.10.10.120) Storage Pools
### nvme-mirror1 (Primary Fast Storage)
- **Type**: ZFS mirror
- **Devices**: 2x Sabrent Rocket Q NVMe
- **Capacity**: 3.6 TB usable
- **Purpose**: High-performance VM storage
- **Used By**:
- Critical VMs requiring fast IO
- Database workloads
- Development environments
**Check status**:
```bash
ssh pve 'zpool status nvme-mirror1'
ssh pve 'zpool list nvme-mirror1'
```
### nvme-mirror2 (Secondary Fast Storage)
- **Type**: ZFS mirror
- **Devices**: 2x Kingston SFYRD 2TB NVMe
- **Capacity**: 1.8 TB usable
- **Purpose**: Additional fast VM storage
- **Used By**: TBD
**Check status**:
```bash
ssh pve 'zpool status nvme-mirror2'
ssh pve 'zpool list nvme-mirror2'
```
### rpool (Root Pool)
- **Type**: ZFS mirror
- **Devices**: 2x Samsung 870 QVO 4TB SSD
- **Capacity**: 3.6 TB usable
- **Purpose**: Proxmox OS, container storage, VM backups
- **Used By**:
- Proxmox root filesystem
- LXC containers
- Local VM backups
**Check status**:
```bash
ssh pve 'zpool status rpool'
ssh pve 'df -h /var/lib/vz'
```
### Storage Pool Usage Summary (PVE)
**Get current usage**:
```bash
ssh pve 'zpool list'
ssh pve 'pvesm status'
```
---
## PVE2 (10.10.10.102) Storage Pools
### nvme-mirror3 (Fast Storage)
- **Type**: ZFS mirror
- **Devices**: 2x NVMe (model unknown)
- **Capacity**: Unknown (needs investigation)
- **Purpose**: High-performance VM storage
- **Used By**: Trading VM (301), other VMs
**Check status**:
```bash
ssh pve2 'zpool status nvme-mirror3'
ssh pve2 'zpool list nvme-mirror3'
```
### local-zfs2 (Bulk Storage)
- **Type**: ZFS mirror
- **Devices**: 2x WD Red 6TB HDD
- **Capacity**: ~6 TB usable
- **Purpose**: Bulk/archival storage
- **Power Management**: 30-minute spindown configured
- Saves ~10-16W when idle
- Udev rule: `/etc/udev/rules.d/69-hdd-spindown.rules`
- Command: `hdparm -S 241` (30 min)
**Notes**:
- Pool had only 768 KB used as of 2024-12-16
- Drives configured to spin down after 30 min idle
- Good for archival, NOT for active workloads
**Check status**:
```bash
ssh pve2 'zpool status local-zfs2'
ssh pve2 'zpool list local-zfs2'
# Check if drives are spun down
ssh pve2 'hdparm -C /dev/sdX' # Shows active/standby
```
---
## TrueNAS (VM 100 @ 10.10.10.200) - Central Storage
### ZFS Pool: vault
**Primary storage pool** for all shared data.
**Devices**: ❓ Needs investigation
- EMC storage enclosure with multiple drives
- SAS connection via LSI SAS2308 HBA (passed through to VM)
**Capacity**: ❓ Needs investigation
**Check pool status**:
```bash
ssh truenas 'zpool status vault'
ssh truenas 'zpool list vault'
# Get detailed capacity
ssh truenas 'zfs list -o name,used,avail,refer,mountpoint'
```
### Datasets (Known)
Based on Syncthing configuration, likely datasets:
| Dataset | Purpose | Synced Devices | Notes |
|---------|---------|----------------|-------|
| vault/documents | Personal documents | Mac Mini, MacBook, Windows PC, Phone | ~11 GB |
| vault/downloads | Downloads folder | Mac Mini, TrueNAS | ~38 GB |
| vault/pictures | Photos | Mac Mini, MacBook, Phone | Unknown size |
| vault/notes | Note files | Mac Mini, MacBook, Phone | Unknown size |
| vault/desktop | Desktop sync | Unknown | 7.2 GB |
| vault/movies | Movie library | Unknown | Unknown size |
| vault/config | Config files | Mac Mini, MacBook | Unknown size |
**Get complete dataset list**:
```bash
ssh truenas 'zfs list -r vault'
```
### NFS/SMB Shares
**Status**: ❓ Not documented
**Needs investigation**:
```bash
# List NFS exports
ssh truenas 'showmount -e localhost'
# List SMB shares
ssh truenas 'smbclient -L localhost -N'
# Via TrueNAS API/UI
# Sharing → Unix Shares (NFS)
# Sharing → Windows Shares (SMB)
```
**Expected shares**:
- Media libraries for Plex (on Saltbox VM)
- Document storage
- VM backups?
- ISO storage?
### EMC Storage Enclosure
**Model**: EMC KTN-STL4 (or similar)
**Connection**: SAS via LSI SAS2308 HBA (passthrough to TrueNAS VM)
**Drives**: ❓ Unknown count and capacity
**See [EMC-ENCLOSURE.md](EMC-ENCLOSURE.md)** for:
- SES commands
- Fan control
- LCC (Link Control Card) troubleshooting
- Maintenance procedures
**Check enclosure status**:
```bash
ssh truenas 'sg_ses --page=0x02 /dev/sgX' # Element descriptor
ssh truenas 'smartctl --scan' # List all drives
```
---
## Storage Network Architecture
### Internal Storage Network (10.10.20.0/24)
**Purpose**: Dedicated network for NFS/iSCSI traffic to reduce congestion on the main network.
**Bridge**: vmbr3 on PVE (virtual bridge, no physical NIC)
**Subnet**: 10.10.20.0/24
**DHCP**: No
**Gateway**: No (internal only, no internet)
**Connected VMs**:
- TrueNAS VM (secondary NIC)
- Saltbox VM (secondary NIC) - for NFS mounts
- Other VMs needing storage access
**Configuration**:
```bash
# On TrueNAS VM - check second NIC
ssh truenas 'ip addr show enp6s19'
# On Saltbox - check NFS mounts
ssh saltbox 'mount | grep nfs'
```
**Benefits**:
- Separates storage traffic from general network
- Prevents NFS/SMB from saturating main network
- Better performance for storage-heavy workloads
---
## Storage Capacity Planning
### Current Usage (Estimate)
**Needs actual audit**:
```bash
# PVE pools
ssh pve 'zpool list -o name,size,alloc,free'
# PVE2 pools
ssh pve2 'zpool list -o name,size,alloc,free'
# TrueNAS vault pool
ssh truenas 'zpool list vault'
# Get detailed breakdown
ssh truenas 'zfs list -r vault -o name,used,avail'
```
### Growth Rate
**Needs tracking** - recommend monthly snapshots of capacity:
```bash
#!/bin/bash
# Save as ~/bin/storage-capacity-report.sh
DATE=$(date +%Y-%m-%d)
REPORT=~/Backups/storage-reports/capacity-$DATE.txt
mkdir -p ~/Backups/storage-reports
echo "Storage Capacity Report - $DATE" > $REPORT
echo "================================" >> $REPORT
echo "" >> $REPORT
echo "PVE Pools:" >> $REPORT
ssh pve 'zpool list' >> $REPORT
echo "" >> $REPORT
echo "PVE2 Pools:" >> $REPORT
ssh pve2 'zpool list' >> $REPORT
echo "" >> $REPORT
echo "TrueNAS Pools:" >> $REPORT
ssh truenas 'zpool list' >> $REPORT
echo "" >> $REPORT
echo "TrueNAS Datasets:" >> $REPORT
ssh truenas 'zfs list -r vault -o name,used,avail' >> $REPORT
echo "Report saved to $REPORT"
```
**Run monthly via cron**:
```cron
0 9 1 * * ~/bin/storage-capacity-report.sh
```
### Expansion Planning
**When to expand**:
- Pool reaches 80% capacity
- Performance degrades
- New workloads require more space
**Expansion options**:
1. Add drives to existing pools (if mirrors, add mirror vdev)
2. Add new NVMe drives to PVE/PVE2
3. Expand EMC enclosure (add more drives)
4. Add second EMC enclosure
**Cost estimates**: TBD
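**Illustration of option 1** - adding a mirror vdev to an existing pool (device paths below are placeholders; use `/dev/disk/by-id/` names in practice):
```bash
# Hypothetical example: grow nvme-mirror1 by adding a second mirrored pair
ssh pve 'zpool add nvme-mirror1 mirror /dev/disk/by-id/nvme-NEW1 /dev/disk/by-id/nvme-NEW2'
# Confirm the new vdev and capacity
ssh pve 'zpool status nvme-mirror1 && zpool list nvme-mirror1'
```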
---
## ZFS Health Monitoring
### Daily Health Checks
```bash
# Check for errors on all pools
ssh pve 'zpool status -x' # Shows only unhealthy pools
ssh pve2 'zpool status -x'
ssh truenas 'zpool status -x'
# Check scrub status
ssh pve 'zpool status | grep scrub'
ssh pve2 'zpool status | grep scrub'
ssh truenas 'zpool status | grep scrub'
```
### Scrub Schedule
**Recommended**: Monthly scrub on all pools
**Configure scrub**:
```bash
# Via Proxmox UI: Node → Disks → ZFS → Select pool → Scrub
# Or via cron:
0 2 1 * * /sbin/zpool scrub nvme-mirror1
0 2 1 * * /sbin/zpool scrub rpool
```
**On TrueNAS**:
- Configure via UI: Storage → Pools → Scrub Tasks
- Recommended: 1st of every month at 2 AM
### SMART Monitoring
**Check drive health**:
```bash
# PVE
ssh pve 'smartctl -a /dev/nvme0'
ssh pve 'smartctl -a /dev/sda'
# TrueNAS
ssh truenas 'smartctl --scan'
ssh truenas 'smartctl -a /dev/sdX' # For each drive
```
**Configure SMART tests**:
- TrueNAS UI: Tasks → S.M.A.R.T. Tests
- Recommended: Weekly short test, monthly long test
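**Run a test ad hoc** (sketch; `/dev/sdX` is a placeholder, find real devices with `smartctl --scan`):
```bash
# Kick off a short self-test on one drive
ssh truenas 'smartctl -t short /dev/sdX'
# Short tests usually finish within a few minutes; then check the result
ssh truenas 'smartctl -a /dev/sdX | grep -i -A5 self-test'
```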
### Alerts
**Set up email alerts for**:
- ZFS pool errors
- SMART test failures
- Pool capacity > 80%
- Scrub failures
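**Minimal alert sketch** (assumes outbound mail via `mail`/`msmtp` is configured on the machine running it; the recipient and 80% threshold are placeholders):
```bash
#!/bin/bash
# zfs-alert.sh - email if any pool is unhealthy or above 80% capacity
ALERT_TO="you@example.com"   # placeholder recipient
for host in pve pve2 truenas; do
  # zpool status -x prints "all pools are healthy" when nothing is wrong
  STATUS=$(ssh "$host" 'zpool status -x')
  if [ "$STATUS" != "all pools are healthy" ]; then
    echo "$STATUS" | mail -s "ZFS health alert on $host" "$ALERT_TO"
  fi
  # Flag any pool over 80% full (capacity prints like "81%")
  FULL=$(ssh "$host" 'zpool list -H -o name,capacity' | awk '{gsub("%","",$2); if ($2+0 > 80) print}')
  if [ -n "$FULL" ]; then
    echo "$FULL" | mail -s "ZFS capacity warning on $host" "$ALERT_TO"
  fi
done
```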
---
## Storage Performance Tuning
### ZFS ARC (Cache)
**Check ARC usage**:
```bash
ssh pve 'arc_summary'
ssh truenas 'arc_summary'
```
**Tuning** (if needed):
- PVE/PVE2: Set max ARC in `/etc/modprobe.d/zfs.conf`
- TrueNAS: Configure via UI (System → Advanced → Tunables)
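**Example ARC cap on PVE** (the 16 GiB value is illustrative only, not a measured recommendation):
```bash
# Persist the cap (overwrites zfs.conf; append instead if it already has options)
ssh pve 'echo "options zfs zfs_arc_max=17179869184" > /etc/modprobe.d/zfs.conf'
# Apply immediately without a reboot
ssh pve 'echo 17179869184 > /sys/module/zfs/parameters/zfs_arc_max'
# On ZFS-root systems also refresh the initramfs so the setting survives reboots
ssh pve 'update-initramfs -u'
# Verify
ssh pve 'arc_summary | grep -i "max size"'
```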
### NFS Performance
**Mount options** (on clients like Saltbox):
```
rsize=131072,wsize=131072,hard,timeo=600,retrans=2,vers=3
```
**Verify NFS mounts**:
```bash
ssh saltbox 'mount | grep nfs'
```
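**Manual mount example** (hypothetical export path and storage-network IP; confirm the real export with `showmount -e` first):
```bash
# Mount a TrueNAS export on a client using the tuned options above
sudo mount -t nfs -o rsize=131072,wsize=131072,hard,timeo=600,retrans=2,vers=3 \
  10.10.20.200:/mnt/vault/media /mnt/media
```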
### Record Size Optimization
**Different workloads need different record sizes**:
- VMs: 64K (default, good for VMs)
- Databases: 8K or 16K
- Media files: 1M (large sequential reads)
**Set record size** (on TrueNAS datasets):
```bash
ssh truenas 'zfs set recordsize=1M vault/movies'
```
---
## Disaster Recovery
### Pool Recovery
**If a pool fails to import**:
```bash
# Try importing with different name
zpool import -f -N poolname newpoolname
# Check pool with readonly
zpool import -f -o readonly=on poolname
# Force import (last resort)
zpool import -f -F poolname
```
### Drive Replacement
**When a drive fails**:
```bash
# Identify failed drive
zpool status poolname
# Replace drive
zpool replace poolname old-device new-device
# Monitor resilver
watch zpool status poolname
```
### Data Recovery
**If pool is completely lost**:
1. Restore from offsite backup (see [BACKUP-STRATEGY.md](BACKUP-STRATEGY.md))
2. Recreate pool structure
3. Restore data
**Critical**: This is why we need offsite backups!
---
## Quick Reference
### Common Commands
```bash
# Pool status
zpool status [poolname]
zpool list
# Dataset usage
zfs list
zfs list -r vault
# Check pool health (only unhealthy)
zpool status -x
# Scrub pool
zpool scrub poolname
# Get pool IO stats
zpool iostat -v 1
# Snapshot management
zfs snapshot poolname/dataset@snapname
zfs list -t snapshot
zfs rollback poolname/dataset@snapname
zfs destroy poolname/dataset@snapname
```
### Storage Locations by Use Case
| Use Case | Recommended Storage | Why |
|----------|---------------------|-----|
| VM OS disk | nvme-mirror1 (PVE) | Fastest IO |
| Database | nvme-mirror1/2 | Low latency |
| Media files | TrueNAS vault | Large capacity |
| Development | nvme-mirror2 | Fast, mid-tier |
| Containers | rpool | Good performance |
| Backups | TrueNAS or rpool | Large capacity |
| Archive | local-zfs2 (PVE2) | Cheap, can spin down |
---
## Investigation Needed
- [ ] Get complete TrueNAS dataset list
- [ ] Document NFS/SMB share configuration
- [ ] Inventory EMC enclosure drives (count, capacity, model)
- [ ] Document current pool usage percentages
- [ ] Set up monthly capacity reports
- [ ] Configure ZFS scrub schedules
- [ ] Set up storage health alerts
---
## Related Documentation
- [BACKUP-STRATEGY.md](BACKUP-STRATEGY.md) - Backup and snapshot strategy
- [EMC-ENCLOSURE.md](EMC-ENCLOSURE.md) - Storage enclosure maintenance
- [VMS.md](VMS.md) - VM storage assignments
- [NETWORK.md](NETWORK.md) - Storage network configuration
---
**Last Updated**: 2025-12-22


@@ -63,6 +63,20 @@ curl -sk "https://10.10.10.54:8384/rest/system/status" -H "X-API-Key: $API_KEY"
curl -sk "https://100.106.175.37:8384/rest/system/status" -H "X-API-Key: $API_KEY"
```
### TrueNAS (Docker Container)
```bash
API_KEY="LNWnrRmeyrw4dbngSmJMYN4a5Z2VnhSE"
# Access via Tailscale (port 20910, not 8384)
curl -s "http://100.100.94.71:20910/rest/system/status" -H "X-API-Key: $API_KEY"
# Or via local network
curl -s "http://10.10.10.200:20910/rest/system/status" -H "X-API-Key: $API_KEY"
```
**Note:** TrueNAS Syncthing runs in Docker with:
- Config: `/mnt/.ix-apps/app_mounts/syncthing/config`
- Data: `/mnt/vault/shares/syncthing` → mounted as `/data` in container
- Container name: `ix-syncthing-syncthing-1`
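**Quick container checks** (sketch, using the container name above; assumes root Docker CLI access on TrueNAS SCALE):
```bash
# Confirm the container is up and inspect recent logs
ssh truenas 'docker ps --filter name=ix-syncthing-syncthing-1'
ssh truenas 'docker logs --tail 20 ix-syncthing-syncthing-1'
# Restart it if the API stops responding
ssh truenas 'docker restart ix-syncthing-syncthing-1'
```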
## Common Commands
### Check Status

TAILSCALE.md Normal file

@@ -0,0 +1,296 @@
# Tailscale VPN Configuration
## Overview
Tailscale provides secure remote access to the homelab via a mesh VPN. This document covers the configuration, subnet routing, and critical gotchas learned from troubleshooting.
---
## Network Architecture
```
   Remote Clients (MacBook, Phone)
                │  Tailscale Mesh (100.x.x.x)
        ┌───────┴────────┐
        │                │
        ▼                ▼
PVE (Subnet Router)   UCG-Fiber (Gateway)
  100.113.177.80        100.94.246.32
        │                │
        │ 10.10.10.0/24  │
        └───────┬────────┘
          ┌─────┴─────┐
          │           │
       PiHole       TrueNAS
    10.10.10.10   10.10.10.200
```
---
## Device Configuration
| Device | Tailscale IP | Role | Accept Routes | Advertise Routes |
|--------|--------------|------|---------------|------------------|
| **PVE** | 100.113.177.80 | Subnet Router (Primary) | **NO** | 10.10.10.0/24, 10.10.20.0/24 |
| **UCG-Fiber** | 100.94.246.32 | Gateway (backup) | **NO** | (disabled) |
| **PiHole** | 100.112.59.128 | DNS Server | **NO** | None |
| **TrueNAS** | 100.100.94.71 | NAS | Yes | None |
| **Mac-Mini** | 100.108.89.58 | Desktop | Yes | None |
| **MacBook** | 100.88.161.1 | Laptop | Yes | None |
| **Phone** | 100.106.175.37 | Mobile | Yes | None |
---
## Critical Configuration Rules
### 1. Devices on the Advertised Subnet MUST Have `--accept-routes=false`
**Problem:** If a device is directly connected to 10.10.10.0/24 AND has `--accept-routes=true`, Tailscale will route local subnet traffic through the mesh instead of the local interface.
**Symptom:** Device can't reach neighbors on the same subnet; `ip route get 10.10.10.X` shows `dev tailscale0` instead of the local interface.
**Fix:**
```bash
# On any device directly connected to 10.10.10.0/24
tailscale set --accept-routes=false
```
**Affected devices:**
- UCG-Fiber (gateway) - directly on 10.10.10.0/24
- PiHole - directly on 10.10.10.0/24
- PVE - directly on 10.10.10.0/24 (but is the subnet router, so different)
### 2. Only ONE Device Should Be Primary Subnet Router
**Problem:** Multiple devices advertising the same subnet can cause routing conflicts or failover issues.
**Current Setup:**
- **PVE** is the primary subnet router for both 10.10.10.0/24 and 10.10.20.0/24
- **UCG-Fiber** has subnet advertisement DISABLED (was causing relay-only connections)
**To change subnet router:**
1. Go to https://login.tailscale.com/admin/machines
2. Disable route on old device, enable on new device
3. Or set primary if both advertise
### 3. VPNs on Tailscale Devices Can Break Connectivity
**Problem:** A full-tunnel VPN (like ProtonVPN with `AllowedIPs = 0.0.0.0/0`) will route Tailscale's DERP/STUN traffic through the VPN, breaking NAT traversal.
**Symptom:** Device shows relay-only connections with asymmetric traffic (high TX, near-zero RX).
**Fix:** Use split-tunnel configuration that excludes Tailscale traffic. See [PiHole ProtonVPN Configuration](#pihole-protonvpn-split-tunnel) below.
---
## DNS Configuration
### Tailscale Admin DNS Settings
- **Nameserver:** 10.10.10.10 (PiHole via subnet route)
- **Fallback:** None configured
### How DNS Works
1. Remote client enables "Use Tailscale DNS"
2. DNS queries go to 10.10.10.10
3. Traffic routes through PVE (subnet router) to PiHole
4. PiHole resolves via Unbound (recursive) through ProtonVPN
---
## Subnet Routing
### Current Primary Routes
```
PVE advertises:
- 10.10.10.0/24 (LAN)
- 10.10.20.0/24 (Storage network)
```
### Verifying Routes
```bash
# From MacBook - check who's advertising routes
tailscale status --json | python3 -c "
import sys, json
data = json.load(sys.stdin)
for peer in data.get('Peer', {}).values():
    routes = peer.get('PrimaryRoutes', [])
    if routes:
        print(f\"{peer.get('HostName')}: {routes}\")"
```
### Testing Subnet Connectivity
```bash
# Test from remote client
ping 10.10.10.10 # PiHole
ping 10.10.10.120 # PVE
ping 10.10.10.1 # Gateway
dig @10.10.10.10 google.com # DNS
```
---
## PiHole ProtonVPN Split-Tunnel
PiHole runs a WireGuard tunnel to ProtonVPN for encrypted upstream DNS queries. The configuration uses policy-based routing to ONLY route Unbound's DNS traffic through the VPN.
### Configuration File: `/etc/wireguard/piehole.conf`
```ini
[Interface]
PrivateKey = <key>
Address = 10.2.0.2/32
# CRITICAL: Disable automatic routing - we handle it manually
Table = off
# Policy routing: only route Unbound DNS through VPN
PostUp = ip route add default dev %i table 51820
PostUp = ip rule add fwmark 0x51820 table 51820 priority 100
PostUp = iptables -t mangle -N UNBOUND_VPN 2>/dev/null || true
PostUp = iptables -t mangle -F UNBOUND_VPN
PostUp = iptables -t mangle -A UNBOUND_VPN -d 10.0.0.0/8 -j RETURN
PostUp = iptables -t mangle -A UNBOUND_VPN -d 127.0.0.0/8 -j RETURN
PostUp = iptables -t mangle -A UNBOUND_VPN -d 100.64.0.0/10 -j RETURN
PostUp = iptables -t mangle -A UNBOUND_VPN -d 192.168.0.0/16 -j RETURN
PostUp = iptables -t mangle -A UNBOUND_VPN -d 172.16.0.0/12 -j RETURN
PostUp = iptables -t mangle -A UNBOUND_VPN -j MARK --set-mark 0x51820
PostUp = iptables -t mangle -A OUTPUT -p udp --dport 53 -m owner --uid-owner unbound -j UNBOUND_VPN
PostUp = iptables -t mangle -A OUTPUT -p tcp --dport 53 -m owner --uid-owner unbound -j UNBOUND_VPN
PostUp = iptables -t nat -A POSTROUTING -o %i -j MASQUERADE
PostDown = iptables -t mangle -D OUTPUT -p udp --dport 53 -m owner --uid-owner unbound -j UNBOUND_VPN
PostDown = iptables -t mangle -D OUTPUT -p tcp --dport 53 -m owner --uid-owner unbound -j UNBOUND_VPN
PostDown = iptables -t mangle -F UNBOUND_VPN
PostDown = iptables -t mangle -X UNBOUND_VPN
PostDown = ip rule del fwmark 0x51820 table 51820 priority 100
PostDown = ip route del default dev %i table 51820
PostDown = iptables -t nat -D POSTROUTING -o %i -j MASQUERADE
[Peer]
PublicKey = <ProtonVPN-key>
AllowedIPs = 0.0.0.0/0, ::/0
Endpoint = 149.102.242.1:51820
PersistentKeepalive = 25
```
**Key Points:**
- `Table = off` prevents wg-quick from adding default routes
- Only traffic from the `unbound` user to port 53 gets marked and routed through VPN
- Local, private, and Tailscale (100.64.0.0/10) traffic is excluded
---
## Troubleshooting
### Symptom: Can't reach subnet (10.10.10.x) from remote
**Check 1:** Is PVE online and advertising routes?
```bash
tailscale status | grep pve
# Should show "active" not "offline"
```
**Check 2:** Is PVE the primary subnet router?
```bash
tailscale status --json | python3 -c "..." # See above
```
**Check 3:** Can PVE reach the target on local network?
```bash
ssh pve 'ping -c 1 10.10.10.10'
```
### Symptom: Device shows "relay" with asymmetric traffic (high TX, low RX)
**Cause:** Usually a VPN or firewall blocking Tailscale's UDP traffic.
**Check:** Run netcheck on the affected device:
```bash
tailscale netcheck
```
Look for:
- Wrong external IP (indicates VPN routing issue)
- Missing DERP latencies
- `MappingVariesByDestIP: true` with no direct connections
### Symptom: Local devices can't reach each other
**Cause:** `--accept-routes=true` on a device that's directly on the subnet.
**Fix:**
```bash
# Check current setting
tailscale debug prefs | grep -i route
# Disable accept-routes
tailscale set --accept-routes=false
```
### Symptom: Gateway can ping Tailscale IPs but not local IPs
**Check routing:**
```bash
ip route get 10.10.10.120
# If it shows "dev tailscale0" instead of "dev br0", that's the problem
```
**Fix:** `tailscale set --accept-routes=false` on the gateway
---
## Maintenance Commands
### Restart Tailscale
```bash
# On Linux
systemctl restart tailscaled
# Check status
tailscale status
```
### Re-advertise Routes (PVE)
```bash
tailscale set --advertise-routes=10.10.10.0/24,10.10.20.0/24
```
### Check Connection Type
```bash
# Shows direct vs relay for each peer
tailscale status
# Detailed ping with path info
tailscale ping <tailscale-ip>
```
### Force Re-connection
```bash
tailscale down && tailscale up
```
---
## Known Issues
### UCG-Fiber Relay-Only Connections
The UniFi gateway sometimes fails to establish direct Tailscale connections, falling back to relay. This appears related to memory pressure or the gateway's NAT implementation. Current workaround: use PVE as the subnet router instead.
### Gateway Memory Pressure
The UCG-Fiber has limited RAM (~3GB) and can become unstable under load. The internet-watchdog service will auto-reboot if connectivity is lost. See [GATEWAY.md](GATEWAY.md).
---
## Change History
### 2026-01-05
- Switched subnet router from UCG-Fiber to PVE
- Fixed PiHole ProtonVPN from full-tunnel to split-tunnel (DNS-only)
- Disabled `--accept-routes` on UCG-Fiber and PiHole
- Documented critical configuration rules
---
**Last Updated:** 2026-01-05

TRAEFIK.md Normal file

@@ -0,0 +1,676 @@
# Traefik Reverse Proxy
Documentation for Traefik reverse proxy setup, SSL certificates, and deploying new public services.
## Overview
There are **TWO separate Traefik instances** handling different services. Understanding which one to use is critical.
| Instance | Location | IP | Purpose | Managed By |
|----------|----------|-----|---------|------------|
| **Traefik-Primary** | CT 202 | **10.10.10.250** | General services | Manual config files |
| **Traefik-Saltbox** | VM 101 (Docker) | **10.10.10.100** | Saltbox services only | Saltbox Ansible |
---
## ⚠️ CRITICAL RULE: Which Traefik to Use
### When Adding ANY New Service:
**USE Traefik-Primary (CT 202 @ 10.10.10.250)** - For ALL new services
**DO NOT touch Traefik-Saltbox** - Unless you're modifying Saltbox itself
### Why This Matters:
- **Traefik-Saltbox** has complex Saltbox-managed configs (Ansible-generated)
- Messing with it breaks Plex, Sonarr, Radarr, and all media services
- Each Traefik has its own Let's Encrypt certificates
- Mixing them causes certificate conflicts and routing issues
---
## Traefik-Primary (CT 202) - For New Services
### Configuration
**Location**: Container 202 on PVE (10.10.10.250)
**Config Directory**: `/etc/traefik/`
**Main Config**: `/etc/traefik/traefik.yaml`
**Dynamic Configs**: `/etc/traefik/conf.d/*.yaml`
### Access Traefik Config
```bash
# From Mac Mini:
ssh pve 'pct exec 202 -- cat /etc/traefik/traefik.yaml'
ssh pve 'pct exec 202 -- ls /etc/traefik/conf.d/'
# Edit a service config:
ssh pve 'pct exec 202 -- vi /etc/traefik/conf.d/myservice.yaml'
# View logs:
ssh pve 'pct exec 202 -- tail -f /var/log/traefik/traefik.log'
```
### Services Using Traefik-Primary
| Service | Domain | Backend |
|---------|--------|---------|
| Excalidraw | excalidraw.htsn.io | 10.10.10.206:8080 (docker-host) |
| FindShyt | findshyt.htsn.io | 10.10.10.205 (CT 205) |
| Gitea | git.htsn.io | 10.10.10.220:3000 |
| Home Assistant | homeassistant.htsn.io | 10.10.10.110 |
| LM Dev | lmdev.htsn.io | 10.10.10.111 |
| MetaMCP | metamcp.htsn.io | 10.10.10.207:12008 (docker-host2) |
| Pi-hole | pihole.htsn.io | 10.10.10.10 |
| TrueNAS | truenas.htsn.io | 10.10.10.200 |
| Proxmox | pve.htsn.io | 10.10.10.120 |
| Copyparty | copyparty.htsn.io | 10.10.10.201 |
| AI Trade | aitrade.htsn.io | (trading server) |
| Pulse | pulse.htsn.io | 10.10.10.206:7655 (monitoring) |
| Happy | happy.htsn.io | 10.10.10.206:3002 (Happy Coder relay) |
| BlueMap | map.htsn.io | 10.10.10.207:8100 (Minecraft web map, password protected) |
| Notes Redirect | notes.htsn.io | 10.10.10.207:8765 (HTTP→obsidian:// redirect) |
| Todo Redirect | todo.htsn.io | 10.10.10.207:8765 (HTTP→ticktick:// redirect) |
---
## Traefik-Saltbox (VM 101) - DO NOT MODIFY
### Configuration
**Location**: `/opt/traefik/` inside Saltbox VM
**Managed By**: Saltbox Ansible playbooks (automatic)
**Docker Mount**: `/opt/traefik` → `/etc/traefik` in container
### Services Using Traefik-Saltbox
- Plex (plex.htsn.io)
- Sonarr, Radarr, Lidarr
- SABnzbd, NZBGet, qBittorrent
- Overseerr, Tautulli, Organizr
- Jackett, NZBHydra2
- Authelia (SSO authentication)
- All other Saltbox-managed containers
### View Saltbox Traefik (Read-Only)
```bash
# View config (don't edit!)
ssh pve 'qm guest exec 101 -- bash -c "docker exec traefik cat /etc/traefik/traefik.yml"'
# View logs
ssh saltbox 'docker logs -f traefik'
```
**⚠️ WARNING**: Editing Saltbox Traefik configs manually will be overwritten by Ansible and may break media services.
---
## Adding a New Public Service - Complete Workflow
Follow these steps to deploy a new service and make it accessible at `servicename.htsn.io`.
### Step 0: Deploy Your Service
First, deploy your service on the appropriate host.
#### Option A: Docker on docker-host (10.10.10.206)
```bash
ssh hutson@10.10.10.206
sudo mkdir -p /opt/myservice
cat > /opt/myservice/docker-compose.yml << 'EOF'
version: "3.8"
services:
  myservice:
    image: myimage:latest
    ports:
      - "8080:80"
    restart: unless-stopped
EOF
cd /opt/myservice && sudo docker-compose up -d
```
#### Option B: New LXC Container on PVE
```bash
ssh pve 'pct create CTID local:vztmpl/ubuntu-22.04-standard_22.04-1_amd64.tar.zst \
--hostname myservice --memory 2048 --cores 2 \
--net0 name=eth0,bridge=vmbr0,ip=10.10.10.XXX/24,gw=10.10.10.1 \
--rootfs local-zfs:8 --unprivileged 1 --start 1'
```
#### Option C: New VM on PVE
```bash
ssh pve 'qm create VMID --name myservice --memory 2048 --cores 2 \
--net0 virtio,bridge=vmbr0 --scsihw virtio-scsi-pci'
```
### Step 1: Create Traefik Config File
Use this template for new services on **Traefik-Primary (CT 202)**:
#### Basic Template
```yaml
# /etc/traefik/conf.d/myservice.yaml
http:
  routers:
    # HTTPS router
    myservice-secure:
      entryPoints:
        - websecure
      rule: "Host(`myservice.htsn.io`)"
      service: myservice
      tls:
        certResolver: cloudflare  # Use 'cloudflare' for proxied domains, 'letsencrypt' for DNS-only
      priority: 50
    # HTTP → HTTPS redirect
    myservice-redirect:
      entryPoints:
        - web
      rule: "Host(`myservice.htsn.io`)"
      middlewares:
        - myservice-https-redirect
      service: myservice
      priority: 50
  services:
    myservice:
      loadBalancer:
        servers:
          - url: "http://10.10.10.XXX:PORT"
  middlewares:
    myservice-https-redirect:
      redirectScheme:
        scheme: https
        permanent: true
```
#### Deploy the Config
```bash
# Create file on CT 202
ssh pve 'pct exec 202 -- bash -c "cat > /etc/traefik/conf.d/myservice.yaml << '\''EOF'\''
<paste config here>
EOF"'
# Traefik auto-reloads (watches conf.d directory)
# Check logs:
ssh pve 'pct exec 202 -- tail -f /var/log/traefik/traefik.log'
```
### Step 2: Add Cloudflare DNS Entry
#### Cloudflare Credentials
| Field | Value |
|-------|-------|
| Email | cloudflare@htsn.io |
| API Key | 849ebefd163d2ccdec25e49b3e1b3fe2cdadc |
| Zone ID (htsn.io) | c0f5a80448c608af35d39aa820a5f3af |
| Public IP | 70.237.94.174 |
#### Method 1: Manual (Cloudflare Dashboard)
1. Go to https://dash.cloudflare.com/
2. Select `htsn.io` domain
3. DNS → Add Record
4. Type: `A`, Name: `myservice`, IPv4: `70.237.94.174`, Proxied: ☑️
#### Method 2: Automated (CLI)
Save this as `~/bin/add-cloudflare-dns.sh`:
```bash
#!/bin/bash
# Add DNS record to Cloudflare for htsn.io
SUBDOMAIN="$1"
CF_EMAIL="cloudflare@htsn.io"
CF_API_KEY="849ebefd163d2ccdec25e49b3e1b3fe2cdadc"
ZONE_ID="c0f5a80448c608af35d39aa820a5f3af"
PUBLIC_IP="70.237.94.174"
if [ -z "$SUBDOMAIN" ]; then
    echo "Usage: $0 <subdomain>"
    echo "Example: $0 myservice   # Creates myservice.htsn.io"
    exit 1
fi
curl -X POST "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/dns_records" \
-H "X-Auth-Email: $CF_EMAIL" \
-H "X-Auth-Key: $CF_API_KEY" \
-H "Content-Type: application/json" \
--data "{
\"type\":\"A\",
\"name\":\"$SUBDOMAIN\",
\"content\":\"$PUBLIC_IP\",
\"ttl\":1,
\"proxied\":true
}" | jq .
```
**Usage**:
```bash
chmod +x ~/bin/add-cloudflare-dns.sh
~/bin/add-cloudflare-dns.sh myservice # Creates myservice.htsn.io
```
### Step 3: Testing
```bash
# Check if DNS resolves
dig myservice.htsn.io
# Should return: 70.237.94.174 (or Cloudflare IPs if proxied)
# Test HTTP redirect
curl -I http://myservice.htsn.io
# Expected: 301 redirect to https://
# Test HTTPS
curl -I https://myservice.htsn.io
# Expected: 200 OK
# Check Traefik dashboard (if enabled)
# http://10.10.10.250:8080/dashboard/
```
### Step 4: Update Documentation
After deploying, update:
1. **IP-ASSIGNMENTS.md** - Add to Services & Reverse Proxy Mapping table
2. **This file (TRAEFIK.md)** - Add to "Services Using Traefik-Primary" list
3. **CLAUDE.md** - Update quick reference if needed
---
## SSL Certificates
Traefik has **two certificate resolvers** configured:
| Resolver | Use When | Challenge Type | Notes |
|----------|----------|----------------|-------|
| `letsencrypt` | Cloudflare DNS-only (gray cloud ☁️) | HTTP-01 | Requires port 80 reachable |
| `cloudflare` | Cloudflare Proxied (orange cloud 🟠) | DNS-01 | Works with Cloudflare proxy |
### ⚠️ Important: HTTP Challenge vs DNS Challenge
**If Cloudflare proxy is enabled** (orange cloud), HTTP challenge **FAILS** because Cloudflare redirects HTTP→HTTPS before the challenge reaches your server.
**Solution**: Use `cloudflare` resolver (DNS-01 challenge) instead.
### Certificate Resolver Configuration
**Cloudflare API credentials** are configured in `/etc/systemd/system/traefik.service`:
```ini
Environment="CF_API_EMAIL=cloudflare@htsn.io"
Environment="CF_API_KEY=849ebefd163d2ccdec25e49b3e1b3fe2cdadc"
```
### Certificate Storage
| Resolver | Storage File |
|----------|--------------|
| HTTP challenge (`letsencrypt`) | `/etc/traefik/acme.json` |
| DNS challenge (`cloudflare`) | `/etc/traefik/acme-cf.json` |
**Permissions**: Must be `600` (read/write owner only)
```bash
# Check permissions
ssh pve 'pct exec 202 -- ls -la /etc/traefik/acme*.json'
# Fix if needed
ssh pve 'pct exec 202 -- chmod 600 /etc/traefik/acme.json'
ssh pve 'pct exec 202 -- chmod 600 /etc/traefik/acme-cf.json'
```
### Certificate Renewal
- **Automatic** via Traefik
- Checks every 24 hours
- Renews 30 days before expiry
- No manual intervention needed
### Troubleshooting Certificates
#### Certificate Fails to Issue
```bash
# Check Traefik logs
ssh pve 'pct exec 202 -- tail -f /var/log/traefik/traefik.log | grep -i error'
# Verify Cloudflare API access
curl -X GET "https://api.cloudflare.com/client/v4/user/tokens/verify" \
-H "X-Auth-Email: cloudflare@htsn.io" \
-H "X-Auth-Key: 849ebefd163d2ccdec25e49b3e1b3fe2cdadc"
# Check acme.json permissions
ssh pve 'pct exec 202 -- ls -la /etc/traefik/acme*.json'
```
#### Force Certificate Renewal
```bash
# Delete certificate (Traefik will re-request)
ssh pve 'pct exec 202 -- rm /etc/traefik/acme-cf.json'
ssh pve 'pct exec 202 -- touch /etc/traefik/acme-cf.json'
ssh pve 'pct exec 202 -- chmod 600 /etc/traefik/acme-cf.json'
ssh pve 'pct exec 202 -- systemctl restart traefik'
# Watch logs
ssh pve 'pct exec 202 -- tail -f /var/log/traefik/traefik.log'
```
---
## Quick Deployment - One-Liner
For fast deployment, use this all-in-one command:
```bash
# === DEPLOY SERVICE (example: myservice on docker-host port 8080) ===
# 1. Create Traefik config
ssh pve 'pct exec 202 -- bash -c "cat > /etc/traefik/conf.d/myservice.yaml << EOF
http:
  routers:
    myservice-secure:
      entryPoints: [websecure]
      rule: Host(\\\`myservice.htsn.io\\\`)
      service: myservice
      tls: {certResolver: cloudflare}
  services:
    myservice:
      loadBalancer:
        servers:
          - url: http://10.10.10.206:8080
EOF"'
# 2. Add Cloudflare DNS
curl -s -X POST "https://api.cloudflare.com/client/v4/zones/c0f5a80448c608af35d39aa820a5f3af/dns_records" \
-H "X-Auth-Email: cloudflare@htsn.io" \
-H "X-Auth-Key: 849ebefd163d2ccdec25e49b3e1b3fe2cdadc" \
-H "Content-Type: application/json" \
--data '{"type":"A","name":"myservice","content":"70.237.94.174","proxied":true}'
# 3. Test (wait a few seconds for DNS propagation)
curl -I https://myservice.htsn.io
```
---
## Docker Service with Traefik Labels (Alternative)
If deploying a service via Docker on `docker-host` (VM 206), you can use Traefik labels instead of config files.
**Requirements**:
- Traefik must have access to Docker socket
- Service must be on same Docker network as Traefik
**Example docker-compose.yml**:
```yaml
version: "3.8"
services:
  myservice:
    image: myimage:latest
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.myservice.rule=Host(`myservice.htsn.io`)"
      - "traefik.http.routers.myservice.entrypoints=websecure"
      - "traefik.http.routers.myservice.tls.certresolver=letsencrypt"
      - "traefik.http.services.myservice.loadbalancer.server.port=8080"
    networks:
      - traefik

networks:
  traefik:
    external: true
```
**Note**: This method is NOT currently used on Traefik-Primary (CT 202), as it doesn't have Docker socket access. Config files are preferred.
---
## Cloudflare API Reference
### API Credentials
| Field | Value |
|-------|-------|
| Email | cloudflare@htsn.io |
| API Key | 849ebefd163d2ccdec25e49b3e1b3fe2cdadc |
| Zone ID | c0f5a80448c608af35d39aa820a5f3af |
### Common API Operations
Set credentials:
```bash
CF_EMAIL="cloudflare@htsn.io"
CF_API_KEY="849ebefd163d2ccdec25e49b3e1b3fe2cdadc"
ZONE_ID="c0f5a80448c608af35d39aa820a5f3af"
```
**List all DNS records**:
```bash
curl -X GET "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/dns_records" \
-H "X-Auth-Email: $CF_EMAIL" \
-H "X-Auth-Key: $CF_API_KEY" | jq
```
**Add A record**:
```bash
curl -X POST "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/dns_records" \
-H "X-Auth-Email: $CF_EMAIL" \
-H "X-Auth-Key: $CF_API_KEY" \
-H "Content-Type: application/json" \
--data '{
"type":"A",
"name":"subdomain",
"content":"70.237.94.174",
"proxied":true
}'
```
**Delete record**:
```bash
curl -X DELETE "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/dns_records/$RECORD_ID" \
-H "X-Auth-Email: $CF_EMAIL" \
-H "X-Auth-Key: $CF_API_KEY"
```
**Update record** (toggle proxy):
```bash
curl -X PATCH "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/dns_records/$RECORD_ID" \
-H "X-Auth-Email: $CF_EMAIL" \
-H "X-Auth-Key: $CF_API_KEY" \
-H "Content-Type: application/json" \
--data '{"proxied":false}'
```
---
## Troubleshooting
### Service Not Accessible
```bash
# 1. Check if DNS resolves
dig myservice.htsn.io
# 2. Check if backend is reachable
curl -I http://10.10.10.XXX:PORT
# 3. Check Traefik logs
ssh pve 'pct exec 202 -- tail -f /var/log/traefik/traefik.log'
# 4. Check Traefik config is valid
ssh pve 'pct exec 202 -- cat /etc/traefik/conf.d/myservice.yaml'
# 5. Restart Traefik (if needed)
ssh pve 'pct exec 202 -- systemctl restart traefik'
```
### Certificate Issues
```bash
# Check certificate status in acme.json
ssh pve 'pct exec 202 -- cat /etc/traefik/acme-cf.json | jq'
# Check certificate expiry
echo | openssl s_client -servername myservice.htsn.io -connect myservice.htsn.io:443 2>/dev/null | openssl x509 -noout -dates
```
### 502 Bad Gateway
**Cause**: Backend service is down or unreachable
```bash
# Check if backend is running
ssh backend-host 'systemctl status myservice'
# Check if port is open
nc -zv 10.10.10.XXX PORT
# Check firewall
ssh backend-host 'iptables -L -n | grep PORT'
```
### 404 Not Found
**Cause**: Traefik can't match the request to a router
```bash
# Check router rule matches domain
ssh pve 'pct exec 202 -- cat /etc/traefik/conf.d/myservice.yaml | grep rule'
# Should be: rule: "Host(`myservice.htsn.io`)"
# Check DNS is pointing to correct IP
dig myservice.htsn.io
# Restart Traefik to reload config
ssh pve 'pct exec 202 -- systemctl restart traefik'
```
---
## Advanced Configuration Examples
### WebSocket Support
For services that use WebSockets (like Home Assistant):
```yaml
http:
  routers:
    myservice-secure:
      entryPoints:
        - websecure
      rule: "Host(`myservice.htsn.io`)"
      service: myservice
      tls:
        certResolver: cloudflare
  services:
    myservice:
      loadBalancer:
        servers:
          - url: "http://10.10.10.XXX:PORT"
# No special config needed - WebSockets work by default in Traefik v2+
```
### Custom Headers
Add custom headers (e.g., security headers):
```yaml
http:
  routers:
    myservice-secure:
      middlewares:
        - myservice-headers
  middlewares:
    myservice-headers:
      headers:
        customResponseHeaders:
          X-Frame-Options: "DENY"
          X-Content-Type-Options: "nosniff"
          Referrer-Policy: "strict-origin-when-cross-origin"
```
### Basic Authentication
Protect a service with basic auth:
```yaml
http:
  routers:
    myservice-secure:
      middlewares:
        - myservice-auth
  middlewares:
    myservice-auth:
      basicAuth:
        users:
          - "user:$apr1$..."  # Generate with: htpasswd -nb user password
```
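**Generating the hash** (sketch; username and password are placeholders - in this file-provider config the `$` signs do not need doubling, unlike docker-compose labels):
```bash
# Produce a user:hash pair suitable for the basicAuth users list
htpasswd -nb admin 'S0mePassw0rd!'
# Output looks like: admin:$apr1$...
```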
---
## Maintenance
### Monthly Checks
```bash
# Check Traefik status
ssh pve 'pct exec 202 -- systemctl status traefik'
# Review logs for errors
ssh pve 'pct exec 202 -- grep -i error /var/log/traefik/traefik.log | tail -20'
# Check certificate expiry dates
ssh pve 'pct exec 202 -- cat /etc/traefik/acme-cf.json | jq ".cloudflare.Certificates[] | {domain: .domain.main, expiry: .certificate}"'
# Verify all services responding
for domain in plex.htsn.io git.htsn.io truenas.htsn.io; do
echo "Testing $domain..."
curl -sI https://$domain | head -1
done
```
### Backup Traefik Config
```bash
# Backup all configs
ssh pve 'pct exec 202 -- tar czf /tmp/traefik-backup-$(date +%Y%m%d).tar.gz /etc/traefik'
# Copy to safe location
scp "pve:/var/lib/lxc/202/rootfs/tmp/traefik-backup-*.tar.gz" ~/Backups/traefik/
```
---
## Related Documentation
- [IP-ASSIGNMENTS.md](IP-ASSIGNMENTS.md) - Service IP addresses
- [CLOUDFLARE.md](#) - Cloudflare DNS management (coming soon)
- [SERVICES.md](#) - Complete service inventory (coming soon)
---
**Last Updated**: 2025-12-22

UPS.md Normal file

@@ -0,0 +1,605 @@
# UPS and Power Management
Documentation for UPS (Uninterruptible Power Supply) configuration, NUT (Network UPS Tools) monitoring, and power failure procedures.
## Hardware
### Current UPS
| Specification | Value |
|---------------|-------|
| **Model** | CyberPower OR2200PFCRT2U |
| **Capacity** | 2200VA / 1320W |
| **Form Factor** | 2U rackmount |
| **Output** | PFC Sinewave (compatible with active PFC PSUs) |
| **Outlets** | 2x NEMA 5-20R + 6x NEMA 5-15R (all battery + surge) |
| **Input Plug** | ⚠️ Originally NEMA 5-20P (20A), **rewired to 5-15P (15A)** |
| **Runtime** | ~15-20 min at typical load (~33% / 440W) |
| **Installed** | 2025-12-21 |
| **Status** | Active |
### ⚠️ Temporary Wiring Modification
**Issue**: UPS came with NEMA 5-20P plug (20A) but server rack is on 15A circuit
**Solution**: Temporarily rewired plug from 5-20P → 5-15P for compatibility
**Risk**: UPS can output 1320W but circuit limited to 1800W max (15A × 120V)
**Current draw**: ~1000-1350W total (safe margin)
**Backlog**: Upgrade to 20A circuit, restore original 5-20P plug
### Previous UPS
| Model | Capacity | Issue | Replaced |
|-------|----------|-------|----------|
| WattBox WB-1100-IPVMB-6 | 1100VA / 660W | Insufficient for dual Threadripper setup | 2025-12-21 |
**Why replaced**: Combined server load of 1000-1350W exceeded 660W capacity.
---
## Power Draw Estimates
### Typical Load
| Component | Idle | Load | Notes |
|-----------|------|------|-------|
| PVE Server | 250-350W | 500-750W | CPU + TITAN RTX + P2000 + storage |
| PVE2 Server | 200-300W | 450-600W | CPU + RTX A6000 + storage |
| Network gear | ~50W | ~50W | Router, switches |
| **Total** | **500-700W** | **1000-1400W** | Varies by workload |
**UPS Load**: ~33-50% typical, 70-80% under heavy load
### Runtime Calculation
At **440W load** (33%): ~15-20 min runtime (tested 2025-12-21)
At **660W load** (50%): ~10-12 min estimated
At **1000W load** (75%): ~6-8 min estimated
**NUT shutdown trigger**: 120 seconds (2 min) remaining runtime
---
## NUT (Network UPS Tools) Configuration
### Architecture
```
UPS (USB) ──> PVE (NUT Server/Master) ──> PVE2 (NUT Client/Slave)
                  └──> Home Assistant (monitoring only)
```
**Master**: PVE (10.10.10.120) - UPS connected via USB, runs NUT server
**Slave**: PVE2 (10.10.10.102) - Monitors PVE's NUT server, shuts down when triggered
### NUT Server Configuration (PVE)
#### 1. UPS Driver Config: `/etc/nut/ups.conf`
```ini
[cyberpower]
driver = usbhid-ups
port = auto
desc = "CyberPower OR2200PFCRT2U"
override.battery.charge.low = 20
override.battery.runtime.low = 120
```
**Key settings**:
- `driver = usbhid-ups`: USB HID UPS driver (generic for CyberPower)
- `port = auto`: Auto-detect USB device
- `override.battery.runtime.low = 120`: Trigger shutdown at 120 seconds (2 min) remaining
#### 2. NUT Server Config: `/etc/nut/upsd.conf`
```ini
LISTEN 127.0.0.1 3493
LISTEN 10.10.10.120 3493
```
**Listens on**:
- Localhost (for local monitoring)
- LAN IP (for PVE2 to connect)
#### 3. User Config: `/etc/nut/upsd.users`
```ini
[admin]
password = upsadmin123
actions = SET
instcmds = ALL
[upsmon]
password = upsmon123
upsmon master
```
**Users**:
- `admin`: Full control, can run commands
- `upsmon`: Monitoring only (used by PVE2)
#### 4. Monitor Config: `/etc/nut/upsmon.conf`
```ini
MONITOR cyberpower@localhost 1 upsmon upsmon123 master
MINSUPPLIES 1
SHUTDOWNCMD "/usr/local/bin/ups-shutdown.sh"
NOTIFYCMD /usr/sbin/upssched
POLLFREQ 5
POLLFREQALERT 5
HOSTSYNC 15
DEADTIME 15
POWERDOWNFLAG /etc/killpower
NOTIFYMSG ONLINE "UPS %s on line power"
NOTIFYMSG ONBATT "UPS %s on battery"
NOTIFYMSG LOWBATT "UPS %s battery is low"
NOTIFYMSG FSD "UPS %s: forced shutdown in progress"
NOTIFYMSG COMMOK "Communications with UPS %s established"
NOTIFYMSG COMMBAD "Communications with UPS %s lost"
NOTIFYMSG SHUTDOWN "Auto logout and shutdown proceeding"
NOTIFYMSG REPLBATT "UPS %s battery needs to be replaced"
NOTIFYMSG NOCOMM "UPS %s is unavailable"
NOTIFYMSG NOPARENT "upsmon parent process died - shutdown impossible"
NOTIFYFLAG ONLINE SYSLOG+WALL
NOTIFYFLAG ONBATT SYSLOG+WALL
NOTIFYFLAG LOWBATT SYSLOG+WALL
NOTIFYFLAG FSD SYSLOG+WALL
NOTIFYFLAG COMMOK SYSLOG+WALL
NOTIFYFLAG COMMBAD SYSLOG+WALL
NOTIFYFLAG SHUTDOWN SYSLOG+WALL
NOTIFYFLAG REPLBATT SYSLOG+WALL
NOTIFYFLAG NOCOMM SYSLOG+WALL
NOTIFYFLAG NOPARENT SYSLOG
```
**Key settings**:
- `MONITOR cyberpower@localhost 1 upsmon upsmon123 master`: Monitor local UPS
- `SHUTDOWNCMD "/usr/local/bin/ups-shutdown.sh"`: Custom shutdown script
- `POLLFREQ 5`: Check UPS every 5 seconds
#### 5. USB Permissions: `/etc/udev/rules.d/99-nut-ups.rules`
```udev
SUBSYSTEM=="usb", ATTR{idVendor}=="0764", ATTR{idProduct}=="0501", MODE="0660", GROUP="nut"
```
**Purpose**: Ensure NUT can access USB UPS device
**Apply rule**:
```bash
udevadm control --reload-rules
udevadm trigger
```
### NUT Client Configuration (PVE2)
#### Monitor Config: `/etc/nut/upsmon.conf`
```ini
MONITOR cyberpower@10.10.10.120 1 upsmon upsmon123 slave
MINSUPPLIES 1
SHUTDOWNCMD "/usr/local/bin/ups-shutdown.sh"
POLLFREQ 5
POLLFREQALERT 5
HOSTSYNC 15
DEADTIME 15
POWERDOWNFLAG /etc/killpower
# Same NOTIFYMSG and NOTIFYFLAG as PVE
```
**Key difference**: `slave` instead of `master` - monitors remote UPS on PVE
---
## Custom Shutdown Script
### `/usr/local/bin/ups-shutdown.sh` (Same on both PVE and PVE2)
```bash
#!/bin/bash
# Graceful VM/CT shutdown when UPS battery low
LOG="/var/log/ups-shutdown.log"
log() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') - $1" | tee -a "$LOG"
}

log "=== UPS Shutdown Triggered ==="
log "Battery low - initiating graceful shutdown of VMs/CTs"

# Get list of running VMs (skip TrueNAS for now)
VMS=$(qm list | awk '$3=="running" && $1!=100 {print $1}')
for VMID in $VMS; do
    log "Stopping VM $VMID..."
    qm shutdown $VMID
done

# Get list of running containers
CTS=$(pct list | awk '$2=="running" {print $1}')
for CTID in $CTS; do
    log "Stopping CT $CTID..."
    pct shutdown $CTID
done

# Wait for VMs/CTs to stop
log "Waiting 60 seconds for VMs/CTs to shut down..."
sleep 60

# Now stop TrueNAS (storage - must be last)
if qm status 100 | grep -q running; then
    log "Stopping TrueNAS (VM 100) last..."
    qm shutdown 100
    sleep 30
fi
log "All VMs/CTs stopped. Host will remain running until UPS dies."
log "=== UPS Shutdown Complete ==="
```
**Make executable**:
```bash
chmod +x /usr/local/bin/ups-shutdown.sh
```
**Script behavior**:
1. Stops all VMs (except TrueNAS)
2. Stops all containers
3. Waits 60 seconds
4. Stops TrueNAS last (storage must be cleanly unmounted)
5. **Does NOT shut down Proxmox hosts** - intentionally left running
**Why not shut down hosts?**
- BIOS configured to "Restore on AC Power Loss"
- When power returns, servers auto-boot and start VMs in order
- Avoids need for manual intervention
---
## Power Failure Behavior
### When Power Fails
1. **UPS switches to battery** (`OB DISCHRG` status)
2. **NUT monitors runtime** - polls every 5 seconds
3. **At 120 seconds (2 min) remaining**:
- NUT triggers `/usr/local/bin/ups-shutdown.sh` on both servers
- Script gracefully stops all VMs/CTs
- TrueNAS stopped last (storage integrity)
4. **Hosts remain running** until UPS battery depletes
5. **UPS battery dies** → Hosts lose power (ungraceful but safe - VMs already stopped)
### When Power Returns
1. **UPS charges battery**, power returns to servers
2. **BIOS "Restore on AC Power Loss"** boots both servers
3. **Proxmox starts** and auto-starts VMs in configured order:
| Order | Wait | VMs/CTs | Reason |
|-------|------|---------|--------|
| 1 | 30s | TrueNAS (VM 100) | Storage must start first |
| 2 | 60s | Saltbox (VM 101) | Depends on TrueNAS NFS |
| 3 | 10s | fs-dev, homeassistant, lmdev1, copyparty, docker-host | General VMs |
| 4 | 5s | pihole, traefik, findshyt | Containers |
PVE2 VMs: order=1, wait=10s
**Total recovery time**: ~7 minutes from power restoration to fully operational (tested 2025-12-21)
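**How the startup order is set** (sketch using VMIDs/CTIDs from [VMS.md](VMS.md); values mirror the table above):
```bash
# TrueNAS first, 30 s settle time before the next guest starts
ssh pve 'qm set 100 --onboot 1 --startup order=1,up=30'
# Saltbox second, 60 s before the general VMs start
ssh pve 'qm set 101 --onboot 1 --startup order=2,up=60'
# Containers last (example: Traefik CT 202)
ssh pve 'pct set 202 --onboot 1 --startup order=4,up=5'
```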
---
## UPS Status Codes
| Code | Meaning | Action |
|------|---------|--------|
| `OL` | Online (AC power) | Normal operation |
| `OB` | On Battery | Power outage - monitor runtime |
| `LB` | Low Battery | <2 min remaining - shutdown imminent |
| `CHRG` | Charging | Battery charging after power restored |
| `DISCHRG` | Discharging | On battery, draining |
| `FSD` | Forced Shutdown | NUT triggered shutdown |
---
## Monitoring & Commands
### Check UPS Status
```bash
# Full status
ssh pve 'upsc cyberpower@localhost'
# Key metrics only
ssh pve 'upsc cyberpower@localhost | grep -E "battery.charge:|battery.runtime:|ups.load:|ups.status:"'
# Example output:
# battery.charge: 100
# battery.runtime: 1234 (seconds remaining)
# ups.load: 33 (% load)
# ups.status: OL (online)
```
### Control UPS Beeper
```bash
# Mute beeper (temporary - until next power event)
ssh pve 'upscmd -u admin -p upsadmin123 cyberpower@localhost beeper.mute'
# Disable beeper (permanent)
ssh pve 'upscmd -u admin -p upsadmin123 cyberpower@localhost beeper.disable'
# Enable beeper
ssh pve 'upscmd -u admin -p upsadmin123 cyberpower@localhost beeper.enable'
```
### Test Shutdown Procedure
**Simulate low battery** (careful - this will shut down VMs!):
```bash
# Set a very high low battery threshold to trigger shutdown
ssh pve 'upsrw -s battery.runtime.low=300 -u admin -p upsadmin123 cyberpower@localhost'
# Watch it trigger (when runtime drops below 300 seconds)
ssh pve 'tail -f /var/log/ups-shutdown.log'
# Reset to normal
ssh pve 'upsrw -s battery.runtime.low=120 -u admin -p upsadmin123 cyberpower@localhost'
```
**Better test**: Run shutdown script manually without actually triggering NUT:
```bash
ssh pve '/usr/local/bin/ups-shutdown.sh'
```
---
## Home Assistant Integration
UPS metrics are exposed to Home Assistant via NUT integration.
### Available Sensors
| Entity ID | Description |
|-----------|-------------|
| `sensor.cyberpower_battery_charge` | Battery % (0-100) |
| `sensor.cyberpower_battery_runtime` | Seconds remaining on battery |
| `sensor.cyberpower_load` | Load % (0-100) |
| `sensor.cyberpower_input_voltage` | Input voltage (V AC) |
| `sensor.cyberpower_output_voltage` | Output voltage (V AC) |
| `sensor.cyberpower_status` | Status text (OL, OB, LB, etc.) |
### Configuration
**Home Assistant**: See [HOMEASSISTANT.md](HOMEASSISTANT.md) for integration setup.
### Example Automations
**Send notification when on battery**:
```yaml
automation:
  - alias: "UPS On Battery Alert"
    trigger:
      - platform: state
        entity_id: sensor.cyberpower_status
        to: "OB"
    action:
      - service: notify.mobile_app
        data:
          message: "⚠️ Power outage! UPS on battery. Runtime: {{ states('sensor.cyberpower_battery_runtime') }}s"
```
**Alert when battery low**:
```yaml
automation:
  - alias: "UPS Low Battery Alert"
    trigger:
      - platform: numeric_state
        entity_id: sensor.cyberpower_battery_runtime
        below: 300
    action:
      - service: notify.mobile_app
        data:
          message: "🚨 UPS battery low! {{ states('sensor.cyberpower_battery_runtime') }}s remaining"
```
---
## Testing Results
### Full Power Failure Test (2025-12-21)
Complete end-to-end test of power failure and recovery:
| Event | Time | Duration | Notes |
|-------|------|----------|-------|
| **Power pulled** | 22:30 | - | UPS on battery, ~15 min runtime at 33% load |
| **Low battery trigger** | 22:40:38 | +10:38 | Runtime < 120s, shutdown script ran |
| **All VMs stopped** | 22:41:36 | +0:58 | Graceful shutdown completed |
| **UPS died** | 22:46:29 | +4:53 | Hosts lost power at 0% battery |
| **Power restored** | ~22:47 | - | Plugged back in |
| **PVE online** | 22:49:11 | +2:11 | BIOS boot, Proxmox started |
| **PVE2 online** | 22:50:47 | +3:47 | BIOS boot, Proxmox started |
| **All VMs running** | 22:53:39 | +6:39 | Auto-started in correct order |
| **Total recovery** | - | **~7 min** | From power return to fully operational |
**Results**:
✅ VMs shut down gracefully
✅ Hosts remained running until UPS died (as intended)
✅ Auto-boot on power restoration worked
✅ VMs started in correct order with appropriate delays
✅ No data corruption or issues
**Runtime calculation**:
- Load: ~33% (440W estimated)
- Total runtime on battery: ~16 minutes (22:30 → 22:46:29)
- Matches manufacturer estimate for 33% load
---
## Proxmox Cluster Quorum Fix
### Problem
With a 2-node cluster, if one node goes down, the other loses quorum and can't manage VMs.
During UPS testing, this would prevent the remaining node from starting VMs after power restoration.
### Solution
Modified `/etc/pve/corosync.conf` to enable 2-node mode:
```
quorum {
  provider: corosync_votequorum
  two_node: 1
}
```
**Effect**:
- Either node can operate independently if the other is down
- No more waiting for quorum when one server is offline
- Both nodes visible in single Proxmox interface when both up
**Applied**: 2025-12-21
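**Verify the setting took effect** (quick check, not part of the original change):
```bash
# Quorum flags should include "2Node" once corosync has reloaded
ssh pve 'pvecm status | grep -E "Expected votes|Flags"'
```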
---
## Maintenance
### Monthly Checks
```bash
# Check UPS status
ssh pve 'upsc cyberpower@localhost'
# Check NUT server running
ssh pve 'systemctl status nut-server'
ssh pve 'systemctl status nut-monitor'
# Check NUT client running (PVE2)
ssh pve2 'systemctl status nut-monitor'
# Verify PVE2 can see UPS
ssh pve2 'upsc cyberpower@10.10.10.120'
# Check logs for errors
ssh pve 'journalctl -u nut-server -n 50'
ssh pve 'journalctl -u nut-monitor -n 50'
```
### Battery Health
**Check battery stats**:
```bash
ssh pve 'upsc cyberpower@localhost | grep battery'
# Key metrics:
# battery.charge: 100 (should be near 100 when on AC)
# battery.runtime: 1200+ (seconds at current load)
# battery.voltage: ~24V (normal for 24V battery system)
```
**Battery replacement**: When runtime significantly decreases or UPS reports `REPLBATT`:
```bash
ssh pve 'upsc cyberpower@localhost | grep battery.mfr.date'
```
CyberPower batteries typically last 3-5 years.
### Firmware Updates
Check CyberPower website for firmware updates:
https://www.cyberpowersystems.com/support/firmware/
---
## Troubleshooting
### UPS Not Detected
```bash
# Check USB connection
ssh pve 'lsusb | grep Cyber'
# Expected:
# Bus 001 Device 003: ID 0764:0501 Cyber Power System, Inc. CP1500 AVR UPS
# Restart NUT driver
ssh pve 'systemctl restart nut-driver'
ssh pve 'systemctl status nut-driver'
```
### PVE2 Can't Connect
```bash
# Verify NUT server listening
ssh pve 'netstat -tuln | grep 3493'
# Should show:
# tcp 0 0 10.10.10.120:3493 0.0.0.0:* LISTEN
# Test connection from PVE2
ssh pve2 'telnet 10.10.10.120 3493'
# Check firewall (should allow port 3493)
ssh pve 'iptables -L -n | grep 3493'
```
### Shutdown Script Not Running
```bash
# Check script permissions
ssh pve 'ls -la /usr/local/bin/ups-shutdown.sh'
# Should be: -rwxr-xr-x (executable)
# Check logs
ssh pve 'cat /var/log/ups-shutdown.log'
# Test script manually
ssh pve '/usr/local/bin/ups-shutdown.sh'
```
### UPS Status Shows UNKNOWN
```bash
# Driver may not be compatible
ssh pve 'upsc cyberpower@localhost ups.status'
# Try different driver (in /etc/nut/ups.conf)
# driver = usbhid-ups
# or
# driver = blazer_usb
# Restart after change
ssh pve 'systemctl restart nut-driver nut-server'
```
---
## Future Improvements
- [ ] Add email alerts for UPS events (power fail, low battery)
- [ ] Log runtime statistics to track battery degradation (see the sketch after this list)
- [ ] Set up Grafana dashboard for UPS metrics
- [ ] Test battery runtime at different load levels
- [ ] Upgrade to 20A circuit, restore original 5-20P plug
- [ ] Consider adding network management card for out-of-band UPS access
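One possible approach to the runtime-statistics item, sketched under the assumption that it would run hourly from cron on `pve` (the script name and log path are placeholders, not deployed yet):
```bash
#!/bin/bash
# Hypothetical battery-trend logger: append charge, runtime, and load so
# degradation can be graphed later (e.g. in Grafana).
LOG=/var/log/ups-battery-trend.csv
[ -f "$LOG" ] || echo "timestamp,charge_pct,runtime_s,load_pct" > "$LOG"
echo "$(date -Is),$(upsc cyberpower@localhost battery.charge 2>/dev/null),$(upsc cyberpower@localhost battery.runtime 2>/dev/null),$(upsc cyberpower@localhost ups.load 2>/dev/null)" >> "$LOG"
```
A cron entry such as `0 * * * * /usr/local/bin/ups-battery-trend.sh` (path hypothetical) would give one sample per hour, which is enough to spot a downward runtime trend over months.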
---
## Related Documentation
- [POWER-MANAGEMENT.md](POWER-MANAGEMENT.md) - Overall power optimization
- [VMS.md](VMS.md) - VM startup order configuration
- [HOMEASSISTANT.md](HOMEASSISTANT.md) - UPS sensor integration
---
**Last Updated**: 2025-12-22

VMS.md Normal file

@@ -0,0 +1,580 @@
# VMs and Containers
Complete inventory of all virtual machines and LXC containers across both Proxmox servers.
## Overview
| Server | VMs | LXCs | Total |
|--------|-----|------|-------|
| **PVE** (10.10.10.120) | 7 | 3 | 10 |
| **PVE2** (10.10.10.102) | 3 | 0 | 3 |
| **Total** | **10** | **3** | **13** |
---
## PVE (10.10.10.120) - Primary Server
### Virtual Machines
| VMID | Name | IP | vCPUs | RAM | Storage | Purpose | GPU/Passthrough | QEMU Agent |
|------|------|-----|-------|-----|---------|---------|-----------------|------------|
| **100** | truenas | 10.10.10.200 | 8 | 32GB | nvme-mirror1 | NAS, central file storage | LSI SAS2308 HBA, Samsung NVMe | ✅ Yes |
| **101** | saltbox | 10.10.10.100 | 16 | 16GB | nvme-mirror1 | Media automation (Plex, *arr) | TITAN RTX | ✅ Yes |
| **105** | fs-dev | 10.10.10.5 | 10 | 8GB | rpool | Development environment | - | ✅ Yes |
| **110** | homeassistant | 10.10.10.110 | 2 | 2GB | rpool | Home automation platform | - | ❌ No |
| **111** | lmdev1 | 10.10.10.111 | 8 | 32GB | nvme-mirror1 | AI/LLM development | TITAN RTX | ✅ Yes |
| **201** | copyparty | 10.10.10.201 | 2 | 2GB | rpool | File sharing service | - | ✅ Yes |
| **206** | docker-host | 10.10.10.206 | 2 | 4GB | rpool | Docker services (Excalidraw, Happy, Pulse) | - | ✅ Yes |
### LXC Containers
| CTID | Name | IP | RAM | Storage | Purpose |
|------|------|-----|-----|---------|---------|
| **200** | pihole | 10.10.10.10 | - | rpool | DNS, ad blocking |
| **202** | traefik | 10.10.10.250 | - | rpool | Reverse proxy (primary) |
| **205** | findshyt | 10.10.10.8 | - | rpool | Custom app |
---
## PVE2 (10.10.10.102) - Secondary Server
### Virtual Machines
| VMID | Name | IP | vCPUs | RAM | Storage | Purpose | GPU/Passthrough | QEMU Agent |
|------|------|-----|-------|-----|---------|---------|-----------------|------------|
| **300** | gitea-vm | 10.10.10.220 | 2 | 4GB | nvme-mirror3 | Git server (Gitea) | - | ✅ Yes |
| **301** | trading-vm | 10.10.10.221 | 16 | 32GB | nvme-mirror3 | AI trading platform | RTX A6000 | ✅ Yes |
| **302** | docker-host2 | 10.10.10.207 | 4 | 8GB | nvme-mirror3 | Docker host (n8n, automation) | - | ✅ Yes |
### LXC Containers
None on PVE2.
---
## VM Details
### 100 - TrueNAS (Storage Server)
**Purpose**: Central NAS for all file storage, NFS/SMB shares, and media libraries
**Specs**:
- **OS**: TrueNAS SCALE
- **vCPUs**: 8
- **RAM**: 32 GB
- **Storage**: nvme-mirror1 (OS), EMC storage enclosure (data pool via HBA passthrough)
- **Network**:
- Primary: 10 Gb (vmbr2)
- Secondary: Internal storage network (vmbr3 @ 10.10.20.x)
**Hardware Passthrough**:
- LSI SAS2308 HBA (for EMC enclosure drives)
- Samsung NVMe (for ZFS caching)
**ZFS Pools**:
- `vault`: Main storage pool on EMC drives
- Boot pool on passed-through NVMe
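A quick pool health check from the CLI (a sketch, assuming SSH access as `hutson@10.10.10.200` as used by the backup scripts; `vault` is the pool name documented above):
```bash
# "-x" prints only pools with problems; a healthy system reports "all pools are healthy"
ssh hutson@10.10.10.200 'zpool status -x'
ssh hutson@10.10.10.200 'zpool list vault'
```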
**See**: [STORAGE.md](STORAGE.md), [EMC-ENCLOSURE.md](EMC-ENCLOSURE.md)
---
### 101 - Saltbox (Media Automation)
**Purpose**: Media server stack - Plex, Sonarr, Radarr, SABnzbd, Overseerr, etc.
**Specs**:
- **OS**: Ubuntu 22.04
- **vCPUs**: 16
- **RAM**: 16 GB
- **Storage**: nvme-mirror1
- **Network**: 10 Gb (vmbr2)
**GPU Passthrough**:
- NVIDIA TITAN RTX (for Plex hardware transcoding)
**Services**:
- Plex Media Server (plex.htsn.io)
- Sonarr, Radarr, Lidarr (TV/movie/music automation)
- SABnzbd, NZBGet (downloaders)
- Overseerr (request management)
- Tautulli (Plex stats)
- Organizr (dashboard)
- Authelia (SSO authentication)
- Traefik (reverse proxy - separate from CT 202)
**Managed By**: Saltbox Ansible playbooks
**See**: [SALTBOX.md](#) (coming soon)
---
### 105 - fs-dev (Development Environment)
**Purpose**: General development work, testing, prototyping
**Specs**:
- **OS**: Ubuntu 22.04
- **vCPUs**: 10
- **RAM**: 8 GB
- **Storage**: rpool
- **Network**: 1 Gb (vmbr0)
---
### 110 - Home Assistant (Home Automation)
**Purpose**: Smart home automation platform
**Specs**:
- **OS**: Home Assistant OS
- **vCPUs**: 2
- **RAM**: 2 GB
- **Storage**: rpool
- **Network**: 1 Gb (vmbr0)
**Access**:
- Web UI: https://homeassistant.htsn.io
- API: See [HOMEASSISTANT.md](HOMEASSISTANT.md)
**Special Notes**:
- ❌ No QEMU agent (Home Assistant OS doesn't support it)
- No SSH server by default (access via web terminal)
---
### 111 - lmdev1 (AI/LLM Development)
**Purpose**: AI model development, fine-tuning, inference
**Specs**:
- **OS**: Ubuntu 22.04
- **vCPUs**: 8
- **RAM**: 32 GB
- **Storage**: nvme-mirror1
- **Network**: 1 Gb (vmbr0)
**GPU Passthrough**:
- NVIDIA TITAN RTX (shared with Saltbox, but can be dedicated if needed)
**Installed**:
- CUDA toolkit
- Python 3.11+
- PyTorch, TensorFlow
- Hugging Face transformers
---
### 201 - Copyparty (File Sharing)
**Purpose**: Simple HTTP file sharing server
**Specs**:
- **OS**: Ubuntu 22.04
- **vCPUs**: 2
- **RAM**: 2 GB
- **Storage**: rpool
- **Network**: 1 Gb (vmbr0)
**Access**: https://copyparty.htsn.io
---
### 206 - docker-host (Docker Services)
**Purpose**: General-purpose Docker host for miscellaneous services
**Specs**:
- **OS**: Ubuntu 22.04
- **vCPUs**: 2
- **RAM**: 4 GB
- **Storage**: rpool
- **Network**: 1 Gb (vmbr0)
- **CPU**: `host` passthrough (for x86-64-v3 support)
**Services Running**:
- Excalidraw (excalidraw.htsn.io) - Whiteboard
- Happy Coder relay server (happy.htsn.io) - Self-hosted relay for Happy Coder mobile app
- Pulse (pulse.htsn.io) - Monitoring dashboard
**Docker Compose Files**: `/opt/*/docker-compose.yml`
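To see what is actually running, a sketch (assumes an SSH alias `docker-host`; substitute `hutson@10.10.10.206` if no alias is configured):
```bash
# List containers for every compose project under /opt
ssh docker-host 'for d in /opt/*/; do echo "== $d"; docker compose -f "$d/docker-compose.yml" ps; done'
```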
---
### 300 - gitea-vm (Git Server)
**Purpose**: Self-hosted Git server
**Specs**:
- **OS**: Ubuntu 22.04
- **vCPUs**: 2
- **RAM**: 4 GB
- **Storage**: nvme-mirror3 (PVE2)
- **Network**: 1 Gb (vmbr0)
**Access**: https://git.htsn.io
**Repositories**:
- homelab-docs (this documentation)
- Personal projects
- Private repos
---
### 301 - trading-vm (AI Trading Platform)
**Purpose**: Algorithmic trading system with AI models
**Specs**:
- **OS**: Ubuntu 22.04
- **vCPUs**: 16
- **RAM**: 32 GB
- **Storage**: nvme-mirror3 (PVE2)
- **Network**: 1 Gb (vmbr0)
**GPU Passthrough**:
- NVIDIA RTX A6000 (300W TDP, 48GB VRAM)
**Software**:
- Trading algorithms
- AI models for market prediction
- Real-time data feeds
- Backtesting infrastructure
---
## LXC Container Details
### 200 - Pi-hole (DNS & Ad Blocking)
**Purpose**: Network-wide DNS server and ad blocker
**Type**: LXC (unprivileged)
**OS**: Ubuntu 22.04
**IP**: 10.10.10.10
**Storage**: rpool
**Access**:
- Web UI: http://10.10.10.10/admin
- Public URL: https://pihole.htsn.io
**Configuration**:
- Upstream DNS: Cloudflare (1.1.1.1)
- DHCP: Disabled (router handles DHCP)
- Interface: All interfaces
**Usage**: Set router DNS to 10.10.10.10 for network-wide ad blocking
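A quick way to confirm both resolution and blocking from any LAN client (assumes `dig` is installed; the blocked domain is only an example):
```bash
# Normal lookup should return a real address
dig @10.10.10.10 +short example.com
# A blocklisted domain should return 0.0.0.0 (or no answer, depending on Pi-hole's blocking mode)
dig @10.10.10.10 +short doubleclick.net
```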
---
### 202 - Traefik (Reverse Proxy)
**Purpose**: Primary reverse proxy for all public-facing services
**Type**: LXC (unprivileged)
**OS**: Ubuntu 22.04
**IP**: 10.10.10.250
**Storage**: rpool
**Configuration**: `/etc/traefik/`
**Dynamic Configs**: `/etc/traefik/conf.d/*.yaml`
**See**: [TRAEFIK.md](TRAEFIK.md) for complete documentation
**⚠️ Important**: This is the PRIMARY Traefik instance. Do NOT confuse with Saltbox's Traefik (VM 101).
---
### 205 - FindShyt (Custom App)
**Purpose**: Custom application (details TBD)
**Type**: LXC (unprivileged)
**OS**: Ubuntu 22.04
**IP**: 10.10.10.8
**Storage**: rpool
**Access**: https://findshyt.htsn.io
---
## VM Startup Order & Dependencies
### Power-On Sequence
When servers boot (after power failure or restart), VMs/CTs start in this order:
#### PVE (10.10.10.120)
| Order | Wait | VMID | Name | Reason |
|-------|------|------|------|--------|
| **1** | 30s | 100 | TrueNAS | ⚠️ Storage must start first - other VMs depend on NFS |
| **2** | 60s | 101 | Saltbox | Depends on TrueNAS NFS mounts for media |
| **3** | 10s | 105, 110, 111, 201, 206 | Other VMs | General VMs, no critical dependencies |
| **4** | 5s | 200, 202, 205 | Containers | Lightweight, start quickly |
**Configure startup order** (already set):
```bash
# View current config
ssh pve 'qm config 100 | grep -E "startup|onboot"'
# Set startup order (example)
ssh pve 'qm set 100 --onboot 1 --startup order=1,up=30'
ssh pve 'qm set 101 --onboot 1 --startup order=2,up=60'
```
#### PVE2 (10.10.10.102)
| Order | Wait | VMID | Name |
|-------|------|------|------|
| **1** | 10s | 300, 301, 302 | All VMs |
**Less critical** - no dependencies between PVE2 VMs.
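If the same onboot/startup pattern is wanted on PVE2, a matching sketch (delays are illustrative, mirroring the single startup group in the table above):
```bash
# Put all PVE2 VMs in one startup group with a short delay
for id in 300 301 302; do ssh pve2 "qm set $id --onboot 1 --startup order=1,up=10"; done
```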
---
## Resource Allocation Summary
### Total Allocated (PVE)
| Resource | Allocated | Physical | % Used |
|----------|-----------|----------|--------|
| **vCPUs** | 56 | 64 (32 cores × 2 threads) | 88% |
| **RAM** | 98 GB | 128 GB | 77% |
**Note**: vCPU overcommit is acceptable (VMs rarely use all cores simultaneously)
### Total Allocated (PVE2)
| Resource | Allocated | Physical | % Used |
|----------|-----------|----------|--------|
| **vCPUs** | 18 | 64 | 28% |
| **RAM** | 36 GB | 128 GB | 28% |
**PVE2** has significant headroom for additional VMs.
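These figures were tallied by hand; a rough way to recompute them directly on a host (sketch, run on `pve` or `pve2`; assumes single-socket VM configs, since it ignores any `sockets:` multiplier):
```bash
# Sum cores and memory across all VMs defined on this node
for id in $(qm list | awk 'NR>1 {print $1}'); do
  qm config "$id" | grep -E '^(cores|memory):'
done | awk -F': ' '/cores/ {c+=$2} /memory/ {m+=$2} END {printf "vCPUs: %d  RAM: %.0f GB\n", c, m/1024}'
```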
---
## Adding a New VM
### Quick Template
```bash
# Create VM
ssh pve 'qm create VMID \
  --name myvm \
  --memory 4096 \
  --cores 2 \
  --net0 virtio,bridge=vmbr0 \
  --scsihw virtio-scsi-pci \
  --scsi0 nvme-mirror1:32 \
  --boot order=scsi0 \
  --ostype l26 \
  --agent enabled=1'
# Attach ISO for installation
ssh pve 'qm set VMID --ide2 local:iso/ubuntu-22.04.iso,media=cdrom'
# Start VM
ssh pve 'qm start VMID'
# Access console
ssh pve 'qm vncproxy VMID' # Then connect with VNC client
# Or via Proxmox web UI
```
### Cloud-Init Template (Faster)
Use cloud-init for automated VM deployment:
```bash
# Download cloud image
ssh pve 'wget https://cloud-images.ubuntu.com/releases/22.04/release/ubuntu-22.04-server-cloudimg-amd64.img -O /var/lib/vz/template/iso/ubuntu-22.04-cloud.img'
# Create VM
ssh pve 'qm create VMID --name myvm --memory 4096 --cores 2 --net0 virtio,bridge=vmbr0'
# Import disk
ssh pve 'qm importdisk VMID /var/lib/vz/template/iso/ubuntu-22.04-cloud.img nvme-mirror1'
# Attach disk
ssh pve 'qm set VMID --scsi0 nvme-mirror1:vm-VMID-disk-0'
# Add cloud-init drive
ssh pve 'qm set VMID --ide2 nvme-mirror1:cloudinit'
# Set boot disk
ssh pve 'qm set VMID --boot order=scsi0'
# Configure cloud-init (user, SSH key, network)
ssh pve 'qm set VMID --ciuser hutson --sshkeys ~/.ssh/homelab.pub --ipconfig0 ip=10.10.10.XXX/24,gw=10.10.10.1'
# Enable QEMU agent
ssh pve 'qm set VMID --agent enabled=1'
# Resize disk (cloud images are small by default)
ssh pve 'qm resize VMID scsi0 +30G'
# Start VM
ssh pve 'qm start VMID'
```
**Cloud-init VMs boot ready-to-use** with SSH keys, static IP, and user configured.
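After the first boot it's worth confirming the guest agent and cloud-init both finished (VMID and IP are placeholders, as above):
```bash
# The QEMU guest agent responds once the VM has fully booted
ssh pve 'qm agent VMID ping'
# cloud-init reports "status: done" when provisioning completed
ssh hutson@10.10.10.XXX 'cloud-init status'
```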
---
## Adding a New LXC Container
```bash
# Download template (if not already downloaded)
ssh pve 'pveam update'
ssh pve 'pveam available | grep ubuntu'
ssh pve 'pveam download local ubuntu-22.04-standard_22.04-1_amd64.tar.zst'
# Create container
ssh pve 'pct create CTID local:vztmpl/ubuntu-22.04-standard_22.04-1_amd64.tar.zst \
  --hostname mycontainer \
  --memory 2048 \
  --cores 2 \
  --net0 name=eth0,bridge=vmbr0,ip=10.10.10.XXX/24,gw=10.10.10.1 \
  --rootfs local-zfs:8 \
  --unprivileged 1 \
  --features nesting=1 \
  --start 1'
# Set root password
ssh pve 'pct exec CTID -- passwd'
# Add SSH key
ssh pve 'pct exec CTID -- mkdir -p /root/.ssh'
ssh pve 'pct exec CTID -- bash -c "echo \"$(cat ~/.ssh/homelab.pub)\" >> /root/.ssh/authorized_keys"'
ssh pve 'pct exec CTID -- chmod 700 /root/.ssh && chmod 600 /root/.ssh/authorized_keys'
```
---
## GPU Passthrough Configuration
### Current GPU Assignments
| GPU | Location | Passed To | VMID | Purpose |
|-----|----------|-----------|------|---------|
| **NVIDIA Quadro P2000** | PVE | - | - | Proxmox host (Plex transcoding via driver) |
| **NVIDIA TITAN RTX** | PVE | saltbox, lmdev1 | 101, 111 | Media transcoding + AI dev (shared) |
| **NVIDIA RTX A6000** | PVE2 | trading-vm | 301 | AI trading (dedicated) |
### How to Pass GPU to VM
1. **Identify GPU PCI ID**:
```bash
ssh pve 'lspci | grep -i nvidia'
# Example output:
# 81:00.0 VGA compatible controller: NVIDIA Corporation TU102 [TITAN RTX] (rev a1)
# 81:00.1 Audio device: NVIDIA Corporation TU102 High Definition Audio Controller (rev a1)
```
2. **Pass GPU to VM** (include both VGA and Audio):
```bash
ssh pve 'qm set VMID -hostpci0 81:00.0,pcie=1'
# If multi-function device (GPU + Audio), use:
ssh pve 'qm set VMID -hostpci0 81:00,pcie=1'
```
3. **Configure VM for GPU**:
```bash
# Set machine type to q35
ssh pve 'qm set VMID --machine q35'
# Set BIOS to OVMF (UEFI)
ssh pve 'qm set VMID --bios ovmf'
# Add EFI disk
ssh pve 'qm set VMID --efidisk0 nvme-mirror1:1,format=raw,efitype=4m,pre-enrolled-keys=1'
```
4. **Reboot VM** and install NVIDIA drivers inside the VM
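To confirm the passthrough is wired up correctly (PCI ID from step 1; assumes NVIDIA drivers are already installed in the guest):
```bash
# On the host: the GPU should be bound to vfio-pci while the VM is running
ssh pve 'lspci -nnk -s 81:00.0 | grep "Kernel driver in use"'
# Inside the VM: the card should enumerate normally
nvidia-smi
```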
**See**: [GPU-PASSTHROUGH.md](#) (coming soon) for detailed guide
---
## Backup Priority
See [BACKUP-STRATEGY.md](BACKUP-STRATEGY.md) for complete backup plan.
### Critical VMs (Must Backup)
| Priority | VMID | Name | Reason |
|----------|------|------|--------|
| 🔴 **CRITICAL** | 100 | truenas | All storage lives here - catastrophic if lost |
| 🟡 **HIGH** | 101 | saltbox | Complex media stack config |
| 🟡 **HIGH** | 110 | homeassistant | Home automation config |
| 🟡 **HIGH** | 300 | gitea-vm | Git repositories (code, docs) |
| 🟡 **HIGH** | 301 | trading-vm | Trading algorithms and AI models |
### Medium Priority
| VMID | Name | Notes |
|------|------|-------|
| 200 | pihole | Easy to rebuild, but DNS config valuable |
| 202 | traefik | Config files backed up separately |
### Low Priority (Ephemeral/Rebuildable)
| VMID | Name | Notes |
|------|------|-------|
| 105 | fs-dev | Development - code is in Git |
| 111 | lmdev1 | Ephemeral development |
| 201 | copyparty | Simple app, easy to redeploy |
| 206 | docker-host | Docker Compose files backed up separately |
---
## Quick Reference Commands
```bash
# List all VMs
ssh pve 'qm list'
ssh pve2 'qm list'
# List all containers
ssh pve 'pct list'
# Start/stop VM
ssh pve 'qm start VMID'
ssh pve 'qm stop VMID'
ssh pve 'qm shutdown VMID' # Graceful
# Start/stop container
ssh pve 'pct start CTID'
ssh pve 'pct stop CTID'
ssh pve 'pct shutdown CTID' # Graceful
# VM console
ssh pve 'qm terminal VMID'
# Container console
ssh pve 'pct enter CTID'
# Clone VM
ssh pve 'qm clone VMID NEW_VMID --name newvm'
# Delete VM
ssh pve 'qm destroy VMID'
# Delete container
ssh pve 'pct destroy CTID'
```
---
## Related Documentation
- [STORAGE.md](STORAGE.md) - Storage pool assignments
- [SSH-ACCESS.md](SSH-ACCESS.md) - How to access VMs
- [BACKUP-STRATEGY.md](BACKUP-STRATEGY.md) - VM backup strategy
- [POWER-MANAGEMENT.md](POWER-MANAGEMENT.md) - VM resource optimization
- [NETWORK.md](NETWORK.md) - Which bridge to use for new VMs
---
**Last Updated**: 2025-12-22


@@ -0,0 +1 @@
{"web":{"client_id":"693027753314-hdjfnvfnarlcnehba6u8plbehv78rfh9.apps.googleusercontent.com","project_id":"spheric-method-482514-f8","auth_uri":"https://accounts.google.com/o/oauth2/auth","token_uri":"https://oauth2.googleapis.com/token","auth_provider_x509_cert_url":"https://www.googleapis.com/oauth2/v1/certs","client_secret":"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","redirect_uris":["https://my.home-assistant.io/redirect/oauth"]}}


@@ -0,0 +1,41 @@
#!/bin/bash
# Internet Watchdog - Reboots if internet is unreachable for 5 minutes
LOG_FILE="/var/log/internet-watchdog.log"
FAIL_COUNT=0
MAX_FAILS=5
CHECK_INTERVAL=60
log() {
  echo "$(date "+%Y-%m-%d %H:%M:%S") - $1" >> "$LOG_FILE"
}
check_internet() {
  for endpoint in 1.1.1.1 8.8.8.8 208.67.222.222; do
    if ping -c 1 -W 5 "$endpoint" > /dev/null 2>&1; then
      return 0
    fi
  done
  return 1
}
log "Watchdog started"
while true; do
  if check_internet; then
    if [ $FAIL_COUNT -gt 0 ]; then
      log "Internet restored after $FAIL_COUNT failures"
    fi
    FAIL_COUNT=0
  else
    FAIL_COUNT=$((FAIL_COUNT + 1))
    log "Internet check failed ($FAIL_COUNT/$MAX_FAILS)"
    if [ $FAIL_COUNT -ge $MAX_FAILS ]; then
      log "CRITICAL: $MAX_FAILS consecutive failures - REBOOTING"
      sync
      sleep 2
      reboot
    fi
  fi
  sleep $CHECK_INTERVAL
done


@@ -0,0 +1,23 @@
#!/bin/bash
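# Memory History Logger
# Appends memory usage and the top memory consumers to a rotating log every 10 minutes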
LOG_DIR="/data/logs"
LOG_FILE="$LOG_DIR/memory-history.log"
mkdir -p "$LOG_DIR"
while true; do
  # Rotate if over 10MB
  if [ -f "$LOG_FILE" ]; then
    SIZE=$(wc -c < "$LOG_FILE" 2>/dev/null || echo 0)
    if [ "$SIZE" -gt 10485760 ]; then
      mv "$LOG_FILE" "$LOG_FILE.old"
    fi
  fi
  echo "========== $(date +%Y-%m-%d\ %H:%M:%S) ==========" >> "$LOG_FILE"
  echo "--- MEMORY ---" >> "$LOG_FILE"
  free -m >> "$LOG_FILE"
  echo "--- TOP MEMORY PROCESSES ---" >> "$LOG_FILE"
  ps -eo pid,rss,comm --sort=-rss | head -12 >> "$LOG_FILE"
  echo "" >> "$LOG_FILE"
  sleep 600
done


@@ -0,0 +1,114 @@
#!/bin/bash
# Crafty Permission Checker Script
# Checks for permission issues that could break plugin functionality
echo "Crafty Permission Check - $(date)"
echo "================================"
# Base directory
CRAFTY_DIR="/home/hutson/crafty/data/servers"
# Check if running on docker-host2
if [ "$(hostname)" != "docker-host2" ]; then
echo "⚠️ This script should be run on docker-host2"
echo " Use: ssh docker-host2 '~/check-crafty-permissions.sh'"
exit 1
fi
# Function to check permissions
check_permissions() {
  local issues_found=0
  # Check for files not owned by root group
  echo -e "\n📁 Checking file ownership..."
  wrong_group=$(find "$CRAFTY_DIR" -type f ! -group root 2>/dev/null)
  if [ ! -z "$wrong_group" ]; then
    echo "❌ Files with incorrect group (should be 'root'):"
    echo "$wrong_group" | head -10
    issues_found=$((issues_found + 1))
  else
    echo "✅ All files have correct group ownership (root)"
  fi
  # Check for directories not owned by root group
  echo -e "\n📁 Checking directory ownership..."
  wrong_dir_group=$(find "$CRAFTY_DIR" -type d ! -group root 2>/dev/null)
  if [ ! -z "$wrong_dir_group" ]; then
    echo "❌ Directories with incorrect group (should be 'root'):"
    echo "$wrong_dir_group" | head -10
    issues_found=$((issues_found + 1))
  else
    echo "✅ All directories have correct group ownership (root)"
  fi
  # Check for directories without setgid bit
  echo -e "\n🔒 Checking setgid bit on directories..."
  no_setgid=$(find "$CRAFTY_DIR" -type d ! -perm -g+s 2>/dev/null)
  if [ ! -z "$no_setgid" ]; then
    echo "⚠️ Directories without setgid bit (may cause future issues):"
    echo "$no_setgid" | head -10
    issues_found=$((issues_found + 1))
  else
    echo "✅ All directories have setgid bit set"
  fi
  # Check for files that crafty user can't read (excluding temp files)
  echo -e "\n📖 Checking read permissions..."
  unreadable=$(find "$CRAFTY_DIR" -type f ! -perm -g+r ! -name "*.tmp" 2>/dev/null)
  if [ ! -z "$unreadable" ]; then
    echo "❌ Files that crafty user can't read:"
    echo "$unreadable" | head -10
    issues_found=$((issues_found + 1))
  else
    echo "✅ All files are readable by crafty user"
  fi
  return $issues_found
}
# Function to fix permissions
fix_permissions() {
  echo -e "\n🔧 Fixing permissions..."
  # Fix ownership
  sudo chown -R hutson:root "$CRAFTY_DIR"
  # Fix directory permissions (2775 = rwxrwsr-x)
  sudo find "$CRAFTY_DIR" -type d -exec chmod 2775 {} \;
  # Fix file permissions (664 = rw-rw-r--)
  sudo find "$CRAFTY_DIR" -type f -exec chmod 664 {} \;
  echo "✅ Permissions fixed!"
}
# Main execution
echo "Checking Crafty server permissions..."
check_permissions
result=$?
if [ $result -gt 0 ]; then
  echo -e "\n⚠ Found $result permission issue(s)!"
  echo -n "Would you like to fix them automatically? (y/n): "
  read -r response
  if [[ "$response" =~ ^[Yy]$ ]]; then
    fix_permissions
    echo -e "\n🔄 Re-checking permissions..."
    check_permissions
    if [ $? -eq 0 ]; then
      echo -e "\n✅ All permission issues resolved!"
    else
      echo -e "\n❌ Some issues remain. You may need to restart the Crafty container."
    fi
  else
    echo -e "\nTo fix manually, run:"
    echo "sudo chown -R hutson:root $CRAFTY_DIR"
    echo "sudo find $CRAFTY_DIR -type d -exec chmod 2775 {} \;"
    echo "sudo find $CRAFTY_DIR -type f -exec chmod 664 {} \;"
  fi
else
  echo -e "\n✅ No permission issues found!"
fi
echo -e "\n================================"
echo "Check complete - $(date)"


@@ -0,0 +1,77 @@
#!/bin/bash
# Minecraft Servers Backup Script (All Servers)
# Backs up both Hutworld and Backrooms servers to TrueNAS
BACKUP_DEST="hutson@10.10.10.200:/mnt/vault/users/backups/minecraft"
DATE=$(date +%Y-%m-%d_%H%M)
echo "[$(date)] Starting Minecraft servers backup..."
# Backup Hutworld server
HUTWORLD_SRC="$HOME/crafty/data/servers/19f604a9-f037-442d-9283-0761c73cfd60"
HUTWORLD_BACKUP="/tmp/hutworld-$DATE.tar.gz"
echo "[$(date)] Backing up Hutworld server..."
tar -czf "$HUTWORLD_BACKUP" \
--exclude="*.jar" \
--exclude="cache" \
--exclude="libraries" \
--exclude=".paper-remapped" \
-C "$HOME/crafty/data/servers" \
19f604a9-f037-442d-9283-0761c73cfd60
echo "[$(date)] Hutworld backup created: $(ls -lh $HUTWORLD_BACKUP | awk '{print $5}')"
# Transfer Hutworld backup to TrueNAS
sshpass -p 'GrilledCh33s3#' scp -o StrictHostKeyChecking=no "$HUTWORLD_BACKUP" "$BACKUP_DEST/"
if [ $? -eq 0 ]; then
  echo "[$(date)] Hutworld backup transferred successfully"
  rm "$HUTWORLD_BACKUP"
else
  echo "[$(date)] ERROR: Failed to transfer Hutworld backup"
fi
# Backup Backrooms server
BACKROOMS_SRC="$HOME/crafty/data/servers/64079d6c-acb0-48c4-9b21-23e0fa354522"
BACKROOMS_BACKUP="/tmp/backrooms-$DATE.tar.gz"
echo "[$(date)] Backing up Backrooms server..."
tar -czf "$BACKROOMS_BACKUP" \
--exclude="*.jar" \
--exclude="cache" \
--exclude="libraries" \
--exclude=".paper-remapped" \
-C "$HOME/crafty/data/servers" \
64079d6c-acb0-48c4-9b21-23e0fa354522
echo "[$(date)] Backrooms backup created: $(ls -lh $BACKROOMS_BACKUP | awk '{print $5}')"
# Transfer Backrooms backup to TrueNAS
sshpass -p 'GrilledCh33s3#' scp -o StrictHostKeyChecking=no "$BACKROOMS_BACKUP" "$BACKUP_DEST/"
if [ $? -eq 0 ]; then
  echo "[$(date)] Backrooms backup transferred successfully"
  rm "$BACKROOMS_BACKUP"
else
  echo "[$(date)] ERROR: Failed to transfer Backrooms backup"
fi
# Clean up old backups (keep last 30 of each server)
echo "[$(date)] Cleaning up old backups..."
sshpass -p 'GrilledCh33s3#' ssh -o StrictHostKeyChecking=no hutson@10.10.10.200 '
cd /mnt/vault/users/backups/minecraft
# Keep only last 30 Hutworld backups
ls -t hutworld-*.tar.gz 2>/dev/null | tail -n +31 | xargs -r rm -f
# Keep only last 30 Backrooms backups
ls -t backrooms-*.tar.gz 2>/dev/null | tail -n +31 | xargs -r rm -f
echo "Current backups:"
echo "Hutworld: $(ls -1 hutworld-*.tar.gz 2>/dev/null | wc -l) backups"
echo "Backrooms: $(ls -1 backrooms-*.tar.gz 2>/dev/null | wc -l) backups"
echo "Total size: $(du -sh . | cut -f1)"
'
echo "[$(date)] All backups complete!"