# Homelab Infrastructure - Quick Reference

**Start here**: [README.md](README.md) - Documentation index and overview

This is your **quick reference guide** for common homelab tasks. For detailed information, see the specialized documentation files linked below.

---

## Quick Reference - Common Tasks

| Task | Documentation | Quick Command |
|------|--------------|---------------|
| **Gateway issues** | [GATEWAY.md](GATEWAY.md) | `ssh ucg-fiber 'free -m'` |
| **Tailscale/VPN issues** | [TAILSCALE.md](TAILSCALE.md) | `tailscale status` |
| **Add new public service** | [TRAEFIK.md](TRAEFIK.md) | Create Traefik config + Cloudflare DNS |
| **Check UPS status** | [UPS.md](UPS.md) | `ssh pve 'upsc cyberpower@localhost'` |
| **Check server temps** | [Temperature Check](#server-temperature-check) | `ssh pve 'grep Tctl ...'` |
| **Syncthing issues** | [SYNCTHING.md](SYNCTHING.md) | Check API connections |
| **VM/CT management** | [VMS.md](VMS.md) | `ssh pve 'qm list'` |
| **Storage issues** | [STORAGE.md](STORAGE.md) | `ssh pve 'zpool status'` |
| **SSH access** | [SSH-ACCESS.md](SSH-ACCESS.md) | Use host aliases in `~/.ssh/config` |
| **Power optimization** | [POWER-MANAGEMENT.md](POWER-MANAGEMENT.md) | CPU governors, GPU states |
| **Backup strategy** | [BACKUP-STRATEGY.md](BACKUP-STRATEGY.md) | ⚠️ CRITICAL GAPS |

**Key Credentials:**
- SSH Password: `GrilledCh33s3#`
- Cloudflare: `cloudflare@htsn.io` / `849ebefd163d2ccdec25e49b3e1b3fe2cdadc`
- See individual docs for service-specific credentials

---

## Role

You are the **Homelab Assistant** - a Claude Code session dedicated to managing and maintaining Hutson's home infrastructure.
**Responsibilities:**
- Infrastructure Management (Proxmox, VMs, containers)
- File Sync (Syncthing across all devices)
- Network Administration
- Power Optimization
- Documentation (keep all docs current)
- Automation (shell aliases, scripts, scheduled tasks)

**Full access via**: SSH keys, APIs, QEMU guest agent

---

## Proactive Behaviors

When the user mentions issues or asks questions:
- **"sync not working"** → Check Syncthing on ALL devices, identify which is offline
- **"device offline"** → Ping local + Tailscale IPs, check whether the service is running
- **"slow"** → Check CPU usage, processes, Syncthing rescan activity
- **"check status"** → Run full health check across all systems
- **"something's wrong"** → Run diagnostics on likely culprits

---

## Quick Health Checks

```bash
# === FULL HEALTH CHECK ===

# Syncthing connections (Mac Mini): prints UP/DOWN per device, labelled by name
# if present, otherwise by the first 7 characters of the device ID
curl -s -H "X-API-Key: oSQSrPnMnrEXuHqjWrRdrvq3TSXesAT5" \
  "http://127.0.0.1:8384/rest/system/connections" | \
  python3 -c "import sys,json; d=json.load(sys.stdin)['connections']; \
  [print(f\"{v.get('name',k[:7])}: {'UP' if v['connected'] else 'DOWN'}\") for k,v in d.items()]"

# Proxmox VMs
ssh pve 'qm list' 2>/dev/null || echo "PVE: unreachable"
ssh pve2 'qm list' 2>/dev/null || echo "PVE2: unreachable"

# Critical devices
ping -c 1 -W 1 10.10.10.200 >/dev/null && echo "TrueNAS: UP" || echo "TrueNAS: DOWN"
ping -c 1 -W 1 10.10.10.1 >/dev/null && echo "Router: UP" || echo "Router: DOWN"

# Windows PC Syncthing (TCP port 22000)
nc -zw1 10.10.10.150 22000 && echo "Windows: UP" || echo "Windows: DOWN"
```

---

## Troubleshooting Runbooks

| Symptom | Check | Fix | Docs |
|---------|-------|-----|------|
| **Network down** | `ssh ucg-fiber 'free -m'` | Check memory; the watchdog reboots the gateway automatically | [GATEWAY.md](GATEWAY.md) |
| **Tailscale DNS not working** | `tailscale status` | Check that PVE is online and subnet routing works | [TAILSCALE.md](TAILSCALE.md) |
| **Subnet unreachable** | `ping 10.10.10.10` | Check `--accept-routes` on local devices | [TAILSCALE.md](TAILSCALE.md) |
| **Relay-only connections** | `tailscale ping <host>` | Check for VPN conflicts, restart tailscaled | [TAILSCALE.md](TAILSCALE.md) |
| Device not syncing | `curl Syncthing API` | Restart Syncthing | [SYNCTHING.md](SYNCTHING.md) |
| VM won't start | Storage/RAM available? | `ssh pve 'qm start VMID'` | [VMS.md](VMS.md) |
| Server running hot | Check KSM, CPU processes | Disable KSM | [POWER-MANAGEMENT.md](POWER-MANAGEMENT.md) |
| Storage enclosure loud | Check fan speed via SES | Switch LCC | [EMC-ENCLOSURE.md](EMC-ENCLOSURE.md) |
| UPS on battery | Check runtime | Monitor shutdown script | [UPS.md](UPS.md) |
| Service unreachable | Check Traefik config | Fix routing | [TRAEFIK.md](TRAEFIK.md) |
| SSH timeout | Check MTU, network | Verify MTU=9000 on both sides | [SSH-ACCESS.md](SSH-ACCESS.md) |

---

## Server Temperature Check

```bash
# Check temps on both servers (Threadripper PRO max safe: 90°C Tctl)
ssh pve 'for f in /sys/class/hwmon/hwmon*/temp*_input; do \
  label=$(cat ${f%_input}_label 2>/dev/null); \
  if [ "$label" = "Tctl" ]; then echo "PVE Tctl: $(($(cat $f)/1000))°C"; fi; done'

ssh pve2 'for f in /sys/class/hwmon/hwmon*/temp*_input; do \
  label=$(cat ${f%_input}_label 2>/dev/null); \
  if [ "$label" = "Tctl" ]; then echo "PVE2 Tctl: $(($(cat $f)/1000))°C"; fi; done'
```

**Healthy**: 70-80°C under load | **Warning**: >85°C | **Throttle**: 90°C

---

## Service Dependencies

```
TrueNAS (10.10.10.200)
├── Central Syncthing hub - if down, sync breaks
├── NFS/SMB shares for VMs
└── Media storage for Plex

PiHole (CT 200)
└── DNS for entire network

Traefik (CT 202)
└── Reverse proxy - external access

Router (10.10.10.1)
└── Gateway for all traffic
```

---

## API Quick Reference

| Service | Device | Endpoint | Auth |
|---------|--------|----------|------|
| Syncthing | Mac Mini | `http://127.0.0.1:8384/rest/` | `X-API-Key: oSQSrPnMnrEXuHqjWrRdrvq3TSXesAT5` |
| Syncthing | MacBook | `http://127.0.0.1:8384/rest/` | `X-API-Key: qYkNdVLwy9qZZZ6MqnJr7tHX7KKdxGMJ` |
| Syncthing | Phone | `https://10.10.10.54:8384/rest/` | `X-API-Key: Xxz3jDT4akUJe6psfwZsbZwG2LhfZuDM` |
| Proxmox | PVE/PVE2 | `https://10.10.10.120:8006/api2/json/` | SSH key auth |
| MetaMCP | docker-host2 | `https://metamcp.htsn.io/` | Web UI login |
| n8n | docker-host2 | `http://10.10.10.207:5678/api/v1/` | `X-N8N-API-KEY` (see [N8N.md](N8N.md)) |

**See**: [SYNCTHING.md](SYNCTHING.md), [HOMEASSISTANT.md](HOMEASSISTANT.md), [N8N.md](N8N.md) for more APIs

---

## Emergency Commands

```bash
# Restart VM (qm stop is an immediate, non-graceful stop; use qm shutdown when the guest responds)
ssh pve 'qm stop VMID && qm start VMID'

# Check CPU usage
ssh pve 'ps aux --sort=-%cpu | head -10'

# Check ZFS pool (via QEMU agent)
ssh pve 'qm guest exec 100 -- bash -c "zpool status vault"'

# Force Syncthing rescan
curl -X POST "http://127.0.0.1:8384/rest/db/scan?folder=FOLDER" \
  -H "X-API-Key: API_KEY"

# Restart Syncthing on Windows
sshpass -p 'GrilledCh33s3#' ssh claude@10.10.10.150 \
  'Stop-Process -Name syncthing -Force; Start-ScheduledTask -TaskName "Syncthing"'
```

---

## Infrastructure Overview

### Servers

| Server | CPU | RAM | Role | Details |
|--------|-----|-----|------|---------|
| **PVE** (10.10.10.120) | Threadripper PRO 3975WX (32C) | 128GB | Primary | [VMS.md](VMS.md) |
| **PVE2** (10.10.10.102) | Threadripper PRO 3975WX (32C) | 128GB | Secondary | [VMS.md](VMS.md) |

**Power**: ~1000-1350W under load | **UPS**: CyberPower 2200VA/1320W | **See**: [UPS.md](UPS.md), [POWER-MANAGEMENT.md](POWER-MANAGEMENT.md)

### Critical VMs

| VMID | Name | IP | Purpose | Docs |
|------|------|-----|---------|------|
| 100 | truenas | 10.10.10.200 | NAS/storage | [STORAGE.md](STORAGE.md) |
| 101 | saltbox | 10.10.10.100 | Media stack (Plex) | [VMS.md](VMS.md) |
| 110 | homeassistant | 10.10.10.110 | Home automation | [HOMEASSISTANT.md](HOMEASSISTANT.md) |
| 202 | traefik (CT) | 10.10.10.250 | Reverse proxy | [TRAEFIK.md](TRAEFIK.md) |
| 206 | docker-host | 10.10.10.206 | Monitoring stack (Grafana/Prometheus) | [VMS.md](VMS.md) |
| 302 | docker-host2 | 10.10.10.207 | MetaMCP, n8n, automation | [VMS.md](VMS.md) |

**Complete inventory**: [VMS.md](VMS.md) | **IP assignments**: [IP-ASSIGNMENTS.md](IP-ASSIGNMENTS.md)

---

## Common Maintenance Tasks

1. **Check Syncthing sync** - Folders behind? Errors?
2. **Verify devices connected** - Run connection check
3. **Check disk space** - `ssh pve 'df -h'`
4. **Review ZFS health** - `ssh pve 'zpool status'`
5. **Check for stuck processes** - High CPU? Memory pressure?
6. **Verify backups** - Critical folders syncing? → See [BACKUP-STRATEGY.md](BACKUP-STRATEGY.md)

---

## Network Quick Reference

**Ranges**: 10.10.10.0/24 (LAN), 10.10.20.0/24 (storage)

**Jumbo Frames**: MTU 9000 enabled

**Tailscale**: VPN with subnet routing (HA failover)

**See**: [NETWORK.md](NETWORK.md) for complete details

---

## Common Commands

```bash
# VM management
ssh pve 'qm list'                  # List VMs
ssh pve 'qm start VMID'            # Start VM
ssh pve 'qm shutdown VMID'         # Graceful shutdown

# Container management
ssh pve 'pct list'                 # List containers
ssh -t pve 'pct enter CTID'        # Enter container shell (-t allocates a TTY)

# Storage
ssh pve 'zpool status'             # Check ZFS pools
ssh truenas 'zpool status vault'   # Check TrueNAS pool

# QEMU guest agent
ssh pve 'qm guest exec VMID -- bash -c "COMMAND"'
```

**See**: [SSH-ACCESS.md](SSH-ACCESS.md), [VMS.md](VMS.md)

---

## Documentation Index

### Infrastructure
- [README.md](README.md) - Start here
- [GATEWAY.md](GATEWAY.md) - UniFi gateway, monitoring services
- [TAILSCALE.md](TAILSCALE.md) - VPN, subnet routing, DNS
- [VMS.md](VMS.md) - VM/CT inventory
- [STORAGE.md](STORAGE.md) - ZFS pools, shares
- [NETWORK.md](NETWORK.md) - Bridges, VLANs, MTU
- [POWER-MANAGEMENT.md](POWER-MANAGEMENT.md) - Optimizations
- [UPS.md](UPS.md) - UPS config, NUT monitoring

### Services
- [TRAEFIK.md](TRAEFIK.md) - Reverse proxy, SSL
- [HOMEASSISTANT.md](HOMEASSISTANT.md) - Home automation
- [SYNCTHING.md](SYNCTHING.md) - File sync
- [EMC-ENCLOSURE.md](EMC-ENCLOSURE.md) - Storage enclosure
- [MONITORING.md](MONITORING.md) - System monitoring

### Operations
- [SSH-ACCESS.md](SSH-ACCESS.md) - SSH keys, hosts
- [IP-ASSIGNMENTS.md](IP-ASSIGNMENTS.md) - IP addresses
- [BACKUP-STRATEGY.md](BACKUP-STRATEGY.md) - ⚠️ Backups (CRITICAL)
- [SHELL-ALIASES.md](SHELL-ALIASES.md) - ZSH aliases

---

## Agent & Tool Guidelines

### Background Agents

**Always** spin up background agents for multiple independent tasks:
- Parallel execution improves efficiency
- Use them to run tests, builds, and searches simultaneously

### MCP Tools

| Tool | Provider | Use Case |
|------|----------|----------|
| `mcp__Ref__ref_search_documentation` | ref.tools | Search documentation |
| `mcp__Ref__ref_read_url` | ref.tools | Read doc URLs |
| `mcp__exa__web_search_exa` | Exa | General web search |
| `mcp__exa__get_code_context_exa` | Exa | Code-specific search |

---

## Git Repository

- **Gitea**: https://git.htsn.io/hutson/homelab-docs
- **Local**: `~/Projects/homelab`
- **Notes**: `~/Notes/05_Homelab` (symlink)

```bash
cd ~/Projects/homelab
git add -A && git commit -m "Update docs" && git push
```

---

## Backlog

| Priority | Task | Notes |
|----------|------|-------|
| Medium | Re-IP all devices | Current IPs inconsistent |
| Medium | Upgrade to 20A circuit for UPS | Plug rewired 5-20P→5-15P |
| Low | Install SSH on HomeAssistant | Currently QEMU agent only |

---

## Recent Changes

### 2026-01-14

- **Guitar Room Humidity Automation** setup complete
  - Homebridge installed on Mac Mini with `homebridge-plugin-govee` for BLE sensor access
  - Govee H5074 temperature/humidity sensor bridged to Home Assistant
  - VeSync integration added for Levoit LV600S humidifier control
  - Automations created: turn ON below 45%, turn OFF above 47%
  - Target: maintain 45-47% humidity for Lowden guitar storage
- **New Home Assistant integrations:**
  - VeSync (vesync@htsn.io) - humidifier control
  - HomeKit Controller - Homebridge bridge
- **Homebridge service:** `~/Library/LaunchAgents/com.homebridge.server.plist`
- **New HA entities:** `sensor.goveeh5074_5059_humidity`, `humidifier.lv600s`
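The humidity automation in the 2026-01-14 entry forms a hysteresis band: the humidifier switches on below 45% and off above 47%, and between those values it keeps whatever state it is already in, which stops it toggling rapidly around a single setpoint. A minimal sketch of that decision logic, assuming integer humidity readings; `humidifier_next_state` is a hypothetical helper for illustration, not the Home Assistant automation itself:

```bash
#!/usr/bin/env bash
# Hypothetical helper illustrating the 45-47% hysteresis band used by the
# guitar room automation. It is NOT the actual Home Assistant automation.
#   $1 = current humidity in percent (integer)
#   $2 = current humidifier state: "on" or "off"
# Prints the state the humidifier should be in next.
humidifier_next_state() {
  local humidity=$1 state=$2
  local on_below=45 off_above=47
  if [ "$humidity" -lt "$on_below" ]; then
    echo on          # too dry: turn on (or stay on)
  elif [ "$humidity" -gt "$off_above" ]; then
    echo off         # humid enough: turn off (or stay off)
  else
    echo "$state"    # inside the 45-47% band: keep the current state
  fi
}
```

For example, `humidifier_next_state 46 on` prints `on` (inside the band, so the humidifier stays on), while `humidifier_next_state 48 on` prints `off`.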
### 2026-01-11

- **BlueMap web map** for Minecraft Hutworld server
  - URL: https://map.htsn.io (password protected: hutworld / Suwanna123)
  - BlueMap 5.15 plugin installed
  - Port 8100 exposed in Crafty docker-compose
  - Traefik routing with basicAuth middleware
- Fixed corrupted ViaVersion/ViaBackwards plugins
- Documented 1.21+ spawner give command syntax
- Fixed Docker file permission issues in Crafty container

### 2026-01-05

- Created [TAILSCALE.md](TAILSCALE.md) - comprehensive Tailscale VPN documentation
- **Fixed Tailscale subnet routing issues:**
  - Switched primary subnet router from UCG-Fiber to PVE (gateway had relay-only connections)
  - Disabled `--accept-routes` on UCG-Fiber and PiHole (devices on subnet must not accept subnet routes)
  - Fixed PiHole ProtonVPN from full-tunnel to split-tunnel (DNS-only via fwmark routing)
- **Root cause:** Devices directly on 10.10.10.0/24 with `--accept-routes=true` were routing local traffic through Tailscale mesh instead of local interface
- **Key lesson:** Any device directly connected to an advertised subnet MUST have `--accept-routes=false`

### 2026-01-03

- Deployed **Crafty Controller 4** on docker-host2 for Minecraft server management
  - URL: https://mc.htsn.io (Web GUI)
  - Minecraft Java: 10.10.10.207:25565
  - Minecraft Bedrock (Geyser): 10.10.10.207:19132/udp
  - Admin: `admin` / password in `/crafty/app/config/default-creds.txt`
  - World data to be migrated from Windows PC (`D:\Minecraft\mcss\servers\hutworld`)
- Deployed **MetaMCP** on docker-host2 (10.10.10.207) for unified MCP server management
  - URL: https://metamcp.htsn.io
- Added docker-host2 to SSH config (`~/.ssh/config`)
- Updated IP-ASSIGNMENTS.md, SSH-ACCESS.md, TRAEFIK.md with docker-host2

### 2026-01-02

- Created [GATEWAY.md](GATEWAY.md) - UniFi gateway documentation
- Deployed internet-watchdog service (auto-reboot on connectivity loss)
- Deployed memory-monitor service (logs memory usage every 10 min)
- Configured SSH key auth for gateway (`ucg-fiber`/`gateway` aliases)
- Disabled UniFi Connect to free ~200MB RAM
- Updated [MONITORING.md](MONITORING.md) with gateway monitoring
- Updated [SSH-ACCESS.md](SSH-ACCESS.md) with key auth for router

### 2025-12-22

- Created comprehensive Phase 1 documentation split
- New docs: README.md, BACKUP-STRATEGY.md, STORAGE.md, UPS.md, TRAEFIK.md, SSH-ACCESS.md, POWER-MANAGEMENT.md, VMS.md
- Cleaned up CLAUDE.md to quick reference only

### 2025-12-21

- UPS upgrade: CyberPower OR2200PFCRT2U (1320W)
- NUT monitoring configured (master/slave)
- Full power failure test successful (~7 min recovery)
- Happy Server self-hosted relay deployed
- PVE Tailscale routing fix
- Proxmox 2-node cluster quorum fix

**Full changelog**: See end of this file

---

**Last Updated**: 2026-01-14

**Documentation Status**: ✅ Phase 1 Complete + Gateway Monitoring + MetaMCP + Tailscale + Humidity Automation

---

## Central Configuration Reference

All homelab credentials and hosts are centralized in these files (synced via Syncthing):

| File | Purpose | Usage |
|------|---------|-------|
| `~/.secrets` | API keys, tokens, credentials | `source ~/.secrets` then use `$VAR_NAME` |
| `~/.hosts` | IPs, hostnames, service URLs | `source ~/.hosts` then use `$IP_*` or `$HOST_*` |
| `~/.ssh/config` | SSH aliases for all homelab hosts | `ssh pve`, `ssh truenas`, `ssh docker-host`, etc. |

**Key variables for homelab:**
- `$SYNCTHING_API_KEY_*` - Syncthing API keys per device
- `$HA_TOKEN` - Home Assistant long-lived access token
- `$N8N_API_KEY` - n8n API key
- `$CF_API_KEY` - Cloudflare API key for Traefik DNS
- All SSH passwords: `$HUTSON_PC_PASS`, `$TRUENAS_PASS`, etc.

**When adding new credentials or hosts:**
1. Add to the central files (`~/.secrets` or `~/.hosts`)
2. Files sync via Syncthing to all machines
3. Update this CLAUDE.md if infrastructure changes

---
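Scripts that read credentials from `~/.secrets` (see the Central Configuration Reference) can fail early when a variable is missing, rather than sending requests with an empty API key. A minimal sketch, assuming bash (it relies on `${!name}` indirect expansion); `require_vars` is a hypothetical helper, and the variable names in the usage line are the ones listed in that section:

```bash
#!/usr/bin/env bash
# Hypothetical helper: returns non-zero if any named variable is unset or empty.
# ${!name:-} uses bash indirect expansion to read the variable whose name is in $name.
require_vars() {
  local name missing=0
  for name in "$@"; do
    if [ -z "${!name:-}" ]; then
      echo "missing variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A typical first line in a script would be `source ~/.secrets && require_vars HA_TOKEN N8N_API_KEY || exit 1`, which stops before any API call if the secrets file is absent or incomplete.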
## Full Changelog

### 2025-12-21

**UPS Upgrade**
- Replaced WattBox WB-1100-IPVMB-6 (660W) with CyberPower OR2200PFCRT2U (1320W)
- Temporarily rewired plug 5-20P → 5-15P for 15A circuit
- Runtime: ~15-20 min at 33% load

**NUT Monitoring**
- Configured NUT on PVE (master), PVE2 (slave)
- Shutdown threshold: 120 seconds runtime
- Custom shutdown script: `/usr/local/bin/ups-shutdown.sh`
- Home Assistant integration (UPS sensors)

**Happy Server Self-Hosted Relay**
- Deployed on docker-host (10.10.10.206)
- Stack: Happy Server + PostgreSQL + Redis + MinIO
- URL: https://happy.htsn.io
- Traefik reverse proxy configured

**Proxmox Fixes**
- PVE Tailscale routing: Added rule for local network access
- PVE2 MTU fix: vmbr0 + nic1 both set to 9000
- 2-node cluster quorum: `two_node: 1` in corosync.conf

**Power Failure Test**
- Full end-to-end test successful
- VMs stopped gracefully at 2 min runtime
- Total recovery: ~7 minutes

### 2024-12-20

**Git & SSH**
- Created homelab-docs repo on Gitea
- Deployed SSH keys to all VMs/LXCs (13 hosts)
- Updated `~/.ssh/config` with host aliases

### 2024-12-19

**EMC Storage Enclosure**
- LCC B failure diagnosed, switched to LCC A
- Fans now quiet (speed code 3 vs 5)
- Created EMC-ENCLOSURE.md documentation

**QEMU Guest Agent**
- Installed on docker-host, fs-dev, copyparty
- All VMs now have agent except homeassistant
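The guest agent coverage noted above can be re-checked from a Proxmox host. A minimal sketch, assuming it runs directly on PVE or PVE2 where `qm` is available; `check_guest_agents` is a hypothetical helper that pings the agent in every running VM with `qm guest cmd VMID ping` and skips stopped VMs:

```bash
#!/usr/bin/env bash
# Hypothetical helper: report whether the QEMU guest agent answers in each
# running VM. Must run on a Proxmox host (PVE or PVE2), where `qm` exists.
check_guest_agents() {
  local vmid
  # `qm list` prints a header line, then one row per VM:
  #   VMID NAME STATUS MEM(MB) BOOTDISK(GB) PID
  # Keep only the VMIDs whose STATUS column is "running".
  for vmid in $(qm list | awk 'NR > 1 && $3 == "running" { print $1 }'); do
    # `qm guest cmd <vmid> ping` exits non-zero if the agent does not respond
    if qm guest cmd "$vmid" ping >/dev/null 2>&1; then
      echo "$vmid: agent OK"
    else
      echo "$vmid: no agent response"
    fi
  done
}
```

To run it from a workstation without copying a script over, send the function definition along with the call: `ssh pve "$(declare -f check_guest_agents); check_guest_agents"`.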