homelab-docs/CLAUDE.md
Hutson 52d8f2f133 Add central configuration reference section
Reference ~/.secrets, ~/.hosts, and ~/.ssh/config for centralized
credentials and host management. Includes homelab-specific variables
for Syncthing, Home Assistant, n8n, and Cloudflare.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-20 15:13:16 -05:00

Homelab Infrastructure - Quick Reference

Start here: README.md - Documentation index and overview

This is your quick reference guide for common homelab tasks. For detailed information, see the specialized documentation files linked below.


Quick Reference - Common Tasks

| Task | Documentation | Quick Command |
|------|---------------|---------------|
| Gateway issues | GATEWAY.md | ssh ucg-fiber 'free -m' |
| Tailscale/VPN issues | TAILSCALE.md | tailscale status |
| Add new public service | TRAEFIK.md | Create Traefik config + Cloudflare DNS |
| Check UPS status | UPS.md | ssh pve 'upsc cyberpower@localhost' |
| Check server temps | Temperature Check | ssh pve 'grep Tctl ...' |
| Syncthing issues | SYNCTHING.md | Check API connections |
| VM/CT management | VMS.md | ssh pve 'qm list' |
| Storage issues | STORAGE.md | ssh pve 'zpool status' |
| SSH access | SSH-ACCESS.md | Use host aliases in ~/.ssh/config |
| Power optimization | POWER-MANAGEMENT.md | CPU governors, GPU states |
| Backup strategy | BACKUP-STRATEGY.md | ⚠️ CRITICAL GAPS |

Key Credentials:

  • SSH Password: GrilledCh33s3#
  • Cloudflare: cloudflare@htsn.io / 849ebefd163d2ccdec25e49b3e1b3fe2cdadc
  • See individual docs for service-specific credentials

Role

You are the Homelab Assistant - a Claude Code session dedicated to managing and maintaining Hutson's home infrastructure.

Responsibilities:

  • Infrastructure Management (Proxmox, VMs, containers)
  • File Sync (Syncthing across all devices)
  • Network Administration
  • Power Optimization
  • Documentation (keep all docs current)
  • Automation (shell aliases, scripts, scheduled tasks)

Full access via: SSH keys, APIs, QEMU guest agent


Proactive Behaviors

When the user mentions issues or asks questions:

  • "sync not working" → Check Syncthing on ALL devices, identify which is offline
  • "device offline" → Ping local + Tailscale IPs, check if service running
  • "slow" → Check CPU usage, processes, Syncthing rescan activity
  • "check status" → Run full health check across all systems
  • "something's wrong" → Run diagnostics on likely culprits
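
The phrase-to-action mapping above could be sketched as a small lookup. The function name `triage` and its output strings are illustrative only and are not part of the existing tooling.

```shell
# Hypothetical helper: map a user phrase to the first diagnostic step
# listed above. Matching is a case-insensitive substring match.
triage() {
  local phrase
  phrase=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$phrase" in
    *"sync not working"*) echo "check Syncthing on all devices" ;;
    *"device offline"*)   echo "ping local and Tailscale IPs" ;;
    *"check status"*)     echo "run full health check" ;;
    *slow*)               echo "check CPU, processes, Syncthing rescans" ;;
    *)                    echo "run diagnostics on likely culprits" ;;
  esac
}
```

Anything that matches no known phrase falls through to the generic diagnostics step, mirroring the "something's wrong" entry.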

Quick Health Checks

# === FULL HEALTH CHECK ===

# Syncthing connections (Mac Mini)
curl -s -H "X-API-Key: oSQSrPnMnrEXuHqjWrRdrvq3TSXesAT5" \
  "http://127.0.0.1:8384/rest/system/connections" | \
  python3 -c "import sys,json; d=json.load(sys.stdin)['connections']; \
  [print(f\"{v.get('name',k[:7])}: {'UP' if v['connected'] else 'DOWN'}\") for k,v in d.items()]"

# Proxmox VMs
ssh pve 'qm list' 2>/dev/null || echo "PVE: unreachable"
ssh pve2 'qm list' 2>/dev/null || echo "PVE2: unreachable"

# Critical devices
# Note: -W is in seconds on Linux but in milliseconds on macOS (use -W 1000 on a Mac)
ping -c 1 -W 1 10.10.10.200 >/dev/null && echo "TrueNAS: UP" || echo "TrueNAS: DOWN"
ping -c 1 -W 1 10.10.10.1 >/dev/null && echo "Router: UP" || echo "Router: DOWN"

# Windows PC Syncthing
nc -zw1 10.10.10.150 22000 && echo "Windows: UP" || echo "Windows: DOWN"
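
The repeated `&& echo UP || echo DOWN` pattern above could be wrapped in one helper. This is a minimal sketch; `status_line` is a hypothetical name, not an existing script.

```shell
# Hypothetical helper: run a check command quietly and print
# "<name>: UP" if it succeeds or "<name>: DOWN" if it fails.
status_line() {
  local name="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    echo "$name: UP"
  else
    echo "$name: DOWN"
  fi
}

# Usage (same targets as the checks above):
#   status_line TrueNAS ping -c 1 -W 1 10.10.10.200
#   status_line Router  ping -c 1 -W 1 10.10.10.1
#   status_line Windows nc -zw1 10.10.10.150 22000
```

Unlike `a && b || c`, the `if` form cannot print both lines when the first `echo` itself fails.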

Troubleshooting Runbooks

| Symptom | Check | Fix | Docs |
|---------|-------|-----|------|
| Network down | ssh ucg-fiber 'free -m' | Check memory, watchdog reboots auto | GATEWAY.md |
| Tailscale DNS not working | tailscale status | Check PVE online, subnet routing | TAILSCALE.md |
| Subnet unreachable | ping 10.10.10.10 | Check --accept-routes on local devices | TAILSCALE.md |
| Relay-only connections | tailscale ping <ip> | Check for VPN conflicts, restart tailscaled | TAILSCALE.md |
| Device not syncing | curl Syncthing API | Restart Syncthing | SYNCTHING.md |
| VM won't start | Storage/RAM available? | ssh pve 'qm start VMID' | VMS.md |
| Server running hot | Check KSM, CPU processes | Disable KSM | POWER-MANAGEMENT.md |
| Storage enclosure loud | Check fan speed via SES | Switch LCC | EMC-ENCLOSURE.md |
| UPS on battery | Check runtime | Monitor shutdown script | UPS.md |
| Service unreachable | Check Traefik config | Fix routing | TRAEFIK.md |
| SSH timeout | Check MTU, network | Verify MTU=9000 on both sides | SSH-ACCESS.md |

Server Temperature Check

# Check temps on both servers (Threadripper PRO max safe: 90°C Tctl)
ssh pve 'for f in /sys/class/hwmon/hwmon*/temp*_input; do \
  label=$(cat ${f%_input}_label 2>/dev/null); \
  if [ "$label" = "Tctl" ]; then echo "PVE Tctl: $(($(cat $f)/1000))°C"; fi; done'

ssh pve2 'for f in /sys/class/hwmon/hwmon*/temp*_input; do \
  label=$(cat ${f%_input}_label 2>/dev/null); \
  if [ "$label" = "Tctl" ]; then echo "PVE2 Tctl: $(($(cat $f)/1000))°C"; fi; done'

Healthy: 70-80°C under load | Warning: >85°C | Throttle: 90°C
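
The thresholds above can be expressed as a small classifier over the raw hwmon value (millidegrees Celsius). This is a sketch only: `classify_tctl` is a hypothetical name, and treating everything at or below 85°C as OK is an assumption, since the line above defines only the warning and throttle points.

```shell
# Hypothetical classifier for a Tctl reading in millidegrees Celsius,
# as read from /sys/class/hwmon/.../temp*_input.
classify_tctl() {
  local c=$(( $1 / 1000 ))   # integer degrees, same rounding as the check above
  if [ "$c" -ge 90 ]; then
    echo "THROTTLE"
  elif [ "$c" -gt 85 ]; then
    echo "WARNING"
  else
    echo "OK"
  fi
}

# Usage: classify_tctl "$(cat /sys/class/hwmon/hwmon0/temp1_input)"
# (the hwmon index and temp number vary per machine)
```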


Service Dependencies

TrueNAS (10.10.10.200)
├── Central Syncthing hub - if down, sync breaks
├── NFS/SMB shares for VMs
└── Media storage for Plex

PiHole (CT 200)
└── DNS for entire network

Traefik (CT 202)
└── Reverse proxy - external access

Router (10.10.10.1)
└── Gateway for all traffic

API Quick Reference

| Service | Device | Endpoint | Auth |
|---------|--------|----------|------|
| Syncthing | Mac Mini | http://127.0.0.1:8384/rest/ | X-API-Key: oSQSrPnMnrEXuHqjWrRdrvq3TSXesAT5 |
| Syncthing | MacBook | http://127.0.0.1:8384/rest/ | X-API-Key: qYkNdVLwy9qZZZ6MqnJr7tHX7KKdxGMJ |
| Syncthing | Phone | https://10.10.10.54:8384/rest/ | X-API-Key: Xxz3jDT4akUJe6psfwZsbZwG2LhfZuDM |
| Proxmox | PVE/PVE2 | https://10.10.10.120:8006/api2/json/ | SSH key auth |
| MetaMCP | docker-host2 | https://metamcp.htsn.io/ | Web UI login |
| n8n | docker-host2 | http://10.10.10.207:5678/api/v1/ | X-N8N-API-KEY (see N8N.md) |

See: SYNCTHING.md, HOMEASSISTANT.md, N8N.md for more APIs
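
As an illustration of calling the Syncthing endpoints listed above, a thin wrapper might look like the following. `syncthing_get`, `SYNCTHING_URL`, and `SYNCTHING_API_KEY` are hypothetical names chosen for this sketch; the key is assumed to be exported beforehand (for example from ~/.secrets).

```shell
# Hypothetical wrapper around the Syncthing REST API.
# $1 is the path below /rest/, e.g. system/connections.
# Assumes SYNCTHING_API_KEY is set; SYNCTHING_URL defaults to the local instance.
syncthing_get() {
  local path="$1"
  local base="${SYNCTHING_URL:-http://127.0.0.1:8384}"
  curl -s -H "X-API-Key: ${SYNCTHING_API_KEY:?SYNCTHING_API_KEY is not set}" \
    "$base/rest/$path"
}

# Usage:
#   syncthing_get system/connections
#   syncthing_get "db/status?folder=FOLDER"
```

Keeping the key in an environment variable avoids pasting it into each command.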


Emergency Commands

# Restart VM
ssh pve 'qm stop VMID && qm start VMID'

# Check CPU usage
ssh pve 'ps aux --sort=-%cpu | head -10'

# Check ZFS pool (via QEMU agent)
ssh pve 'qm guest exec 100 -- bash -c "zpool status vault"'

# Force Syncthing rescan
curl -X POST "http://127.0.0.1:8384/rest/db/scan?folder=FOLDER" \
  -H "X-API-Key: API_KEY"

# Restart Syncthing on Windows
sshpass -p 'GrilledCh33s3#' ssh claude@10.10.10.150 \
  'Stop-Process -Name syncthing -Force; Start-ScheduledTask -TaskName "Syncthing"'
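
`qm stop` in the restart command above is a hard power-off. Where the guest can shut down cleanly, a gentler variant could be sketched as below. `restart_vm` is a hypothetical helper and the 120-second default timeout is an assumption.

```shell
# Hypothetical helper: graceful restart of a VM on a Proxmox node.
# Usage: restart_vm <ssh-alias> <vmid> [timeout-seconds]
restart_vm() {
  local node="$1" vmid="$2" timeout="${3:-120}"
  case "$vmid" in
    ''|*[!0-9]*) echo "restart_vm: VMID must be numeric" >&2; return 1 ;;
  esac
  # qm shutdown asks the guest to power off; --forceStop 1 falls back to a
  # hard stop if the guest is still running when the timeout expires.
  ssh "$node" "qm shutdown $vmid --timeout $timeout --forceStop 1 && qm start $vmid"
}

# Usage: restart_vm pve 101
```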

Infrastructure Overview

Servers

| Server | CPU | RAM | Role | Details |
|--------|-----|-----|------|---------|
| PVE (10.10.10.120) | Threadripper PRO 3975WX (32C) | 128GB | Primary | VMS.md |
| PVE2 (10.10.10.102) | Threadripper PRO 3975WX (32C) | 128GB | Secondary | VMS.md |

Power: ~1000-1350W under load | UPS: CyberPower 2200VA/1320W | See: UPS.md, POWER-MANAGEMENT.md

Critical VMs

| VMID | Name | IP | Purpose | Docs |
|------|------|----|---------|------|
| 100 | truenas | 10.10.10.200 | NAS/storage | STORAGE.md |
| 101 | saltbox | 10.10.10.100 | Media stack (Plex) | VMS.md |
| 110 | homeassistant | 10.10.10.110 | Home automation | HOMEASSISTANT.md |
| 202 | traefik (CT) | 10.10.10.250 | Reverse proxy | TRAEFIK.md |
| 206 | docker-host | 10.10.10.206 | Monitoring stack (Grafana/Prometheus) | VMS.md |
| 302 | docker-host2 | 10.10.10.207 | MetaMCP, n8n, automation | VMS.md |

Complete inventory: VMS.md | IP assignments: IP-ASSIGNMENTS.md


Common Maintenance Tasks

  1. Check Syncthing sync - Folders behind? Errors?
  2. Verify devices connected - Run connection check
  3. Check disk space - ssh pve 'df -h'
  4. Review ZFS health - ssh pve 'zpool status'
  5. Check for stuck processes - High CPU? Memory pressure?
  6. Verify backups - Critical folders syncing? → See BACKUP-STRATEGY.md
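
For the disk-space step, the `df` output can be filtered so that only nearly full filesystems are shown. This is a minimal sketch; `flag_full_filesystems` is a hypothetical name and the 85% default is an assumed threshold, not a documented one.

```shell
# Hypothetical filter: read `df -P` output on stdin and print every
# filesystem whose usage is at or above the given percentage.
flag_full_filesystems() {
  local threshold="${1:-85}"
  awk -v t="$threshold" 'NR > 1 {
    use = $5
    sub(/%/, "", use)            # "91%" -> "91"
    if (use + 0 >= t + 0) print $6 ": " $5
  }'
}

# Usage: ssh pve 'df -P' | flag_full_filesystems 85
```

`df -P` is used instead of `df -h` because its POSIX output keeps each filesystem on one line, which makes the column positions predictable.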

Network Quick Reference

Ranges: 10.10.10.0/24 (LAN), 10.10.20.0/24 (storage) | Jumbo Frames: MTU 9000 enabled | Tailscale: VPN with subnet routing (HA failover)

See: NETWORK.md for complete details
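
One way to confirm that MTU 9000 works end to end is a ping with the don't-fragment bit set and a payload that exactly fills a jumbo frame. The sketch below assumes Linux iputils `ping` (macOS uses `-D` instead of `-M do`); the function names are hypothetical.

```shell
# ICMP payload that exactly fills one packet at a given MTU:
# MTU minus 20 bytes (IPv4 header) minus 8 bytes (ICMP header).
icmp_payload() {
  echo $(( $1 - 28 ))
}

# Hypothetical end-to-end check (Linux iputils ping): -M do sets the
# don't-fragment bit, so the ping fails if any hop has a smaller MTU.
check_jumbo() {
  ping -M do -c 1 -s "$(icmp_payload 9000)" "$1"
}

# Usage: check_jumbo 10.10.10.200
```

For MTU 9000 the payload is 8972 bytes; if that ping fails while a 1472-byte payload succeeds, some hop is still at MTU 1500.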


Common Commands

# VM management
ssh pve 'qm list'                    # List VMs
ssh pve 'qm start VMID'              # Start VM
ssh pve 'qm shutdown VMID'           # Graceful shutdown

# Container management
ssh pve 'pct list'                   # List containers
ssh pve 'pct enter CTID'             # Enter container shell

# Storage
ssh pve 'zpool status'               # Check ZFS pools
ssh truenas 'zpool status vault'     # Check TrueNAS pool

# QEMU guest agent
ssh pve 'qm guest exec VMID -- bash -c "COMMAND"'

See: SSH-ACCESS.md, VMS.md


Documentation Index

Infrastructure

Services

Operations


Agent & Tool Guidelines

Background Agents

Always spin up background agents for multiple independent tasks:

  • Parallel execution improves efficiency
  • Use for running tests, builds, and searches in parallel

MCP Tools

| Tool | Provider | Use Case |
|------|----------|----------|
| mcp__Ref__ref_search_documentation | ref.tools | Search documentation |
| mcp__Ref__ref_read_url | ref.tools | Read doc URLs |
| mcp__exa__web_search_exa | Exa | General web search |
| mcp__exa__get_code_context_exa | Exa | Code-specific search |

Git Repository

cd ~/Projects/homelab
git add -A && git commit -m "Update docs" && git push

Backlog

| Priority | Task | Notes |
|----------|------|-------|
| Medium | Re-IP all devices | Current IPs inconsistent |
| Medium | Upgrade to 20A circuit for UPS | Plug rewired 5-20P→5-15P |
| Low | Install SSH on HomeAssistant | Currently QEMU agent only |

Recent Changes

2026-01-14

  • Guitar Room Humidity Automation setup complete
    • Homebridge installed on Mac Mini with homebridge-plugin-govee for BLE sensor access
    • Govee H5074 temperature/humidity sensor bridged to Home Assistant
    • VeSync integration added for Levoit LV600S humidifier control
    • Automations created: turn ON below 45%, turn OFF above 47%
    • Target: maintain 45-47% humidity for Lowden guitar storage
  • New Home Assistant integrations:
    • VeSync (vesync@htsn.io) - humidifier control
    • HomeKit Controller - Homebridge bridge
  • Homebridge service: ~/Library/LaunchAgents/com.homebridge.server.plist
  • New HA entities: sensor.goveeh5074_5059_humidity, humidifier.lv600s

2026-01-11

  • BlueMap web map for Minecraft Hutworld server
    • URL: https://map.htsn.io (password protected: hutworld / Suwanna123)
    • BlueMap 5.15 plugin installed
    • Port 8100 exposed in Crafty docker-compose
    • Traefik routing with basicAuth middleware
  • Fixed corrupted ViaVersion/ViaBackwards plugins
  • Documented 1.21+ spawner give command syntax
  • Fixed Docker file permission issues in Crafty container

2026-01-05

  • Created TAILSCALE.md - comprehensive Tailscale VPN documentation
  • Fixed Tailscale subnet routing issues:
    • Switched primary subnet router from UCG-Fiber to PVE (gateway had relay-only connections)
    • Disabled --accept-routes on UCG-Fiber and PiHole (devices on subnet must not accept subnet routes)
    • Fixed PiHole ProtonVPN from full-tunnel to split-tunnel (DNS-only via fwmark routing)
  • Root cause: Devices directly on 10.10.10.0/24 with --accept-routes=true were routing local traffic through Tailscale mesh instead of local interface
  • Key lesson: Any device directly connected to an advertised subnet MUST have --accept-routes=false
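
The key lesson above can be turned into a quick rule of thumb. This is a sketch under assumptions: the function names are hypothetical, the device's LAN IP is passed in by hand, and devices outside the subnet are assumed to want routes accepted.

```shell
# Hypothetical check: is this IPv4 address inside the advertised
# 10.10.10.0/24 subnet? (A prefix match is enough for a /24.)
on_advertised_subnet() {
  case "$1" in
    10.10.10.*) return 0 ;;
    *)          return 1 ;;
  esac
}

# Print the --accept-routes setting a device with this IP should use.
accept_routes_flag() {
  if on_advertised_subnet "$1"; then
    echo "--accept-routes=false"
  else
    echo "--accept-routes=true"
  fi
}

# Usage on the device itself:
#   sudo tailscale set "$(accept_routes_flag 10.10.10.10)"
```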

2026-01-03

  • Deployed Crafty Controller 4 on docker-host2 for Minecraft server management
  • URL: https://mc.htsn.io (Web GUI)
  • Minecraft Java: 10.10.10.207:25565
  • Minecraft Bedrock (Geyser): 10.10.10.207:19132/udp
  • Admin: admin / password in /crafty/app/config/default-creds.txt
  • World data to be migrated from Windows PC (D:\Minecraft\mcss\servers\hutworld)
  • Deployed MetaMCP on docker-host2 (10.10.10.207) for unified MCP server management
  • URL: https://metamcp.htsn.io
  • Added docker-host2 to SSH config (~/.ssh/config)
  • Updated IP-ASSIGNMENTS.md, SSH-ACCESS.md, TRAEFIK.md with docker-host2

2026-01-02

  • Created GATEWAY.md - UniFi gateway documentation
  • Deployed internet-watchdog service (auto-reboot on connectivity loss)
  • Deployed memory-monitor service (logs memory usage every 10 min)
  • Configured SSH key auth for gateway (ucg-fiber/gateway aliases)
  • Disabled UniFi Connect to free ~200MB RAM
  • Updated MONITORING.md with gateway monitoring
  • Updated SSH-ACCESS.md with key auth for router

2025-12-22

  • Created comprehensive Phase 1 documentation split
  • New docs: README.md, BACKUP-STRATEGY.md, STORAGE.md, UPS.md, TRAEFIK.md, SSH-ACCESS.md, POWER-MANAGEMENT.md, VMS.md
  • Cleaned up CLAUDE.md to quick reference only

2025-12-21

  • UPS upgrade: CyberPower OR2200PFCRT2U (1320W)
  • NUT monitoring configured (master/slave)
  • Full power failure test successful (~7 min recovery)
  • Happy Server self-hosted relay deployed
  • PVE Tailscale routing fix
  • Proxmox 2-node cluster quorum fix

Full changelog: See end of this file


Last Updated: 2026-01-14 | Documentation Status: Phase 1 Complete + Gateway Monitoring + MetaMCP + Tailscale + Humidity Automation


Central Configuration Reference

All homelab credentials and hosts are centralized in these files (synced via Syncthing):

| File | Purpose | Usage |
|------|---------|-------|
| ~/.secrets | API keys, tokens, credentials | source ~/.secrets then use $VAR_NAME |
| ~/.hosts | IPs, hostnames, service URLs | source ~/.hosts then use $IP_* or $HOST_* |
| ~/.ssh/config | SSH aliases for all homelab hosts | ssh pve, ssh truenas, ssh docker-host, etc. |

Key variables for homelab:

  • $SYNCTHING_API_KEY_* - Syncthing API keys per device
  • $HA_TOKEN - Home Assistant long-lived access token
  • $N8N_API_KEY - n8n API key
  • $CF_API_KEY - Cloudflare API key for Traefik DNS
  • All SSH passwords: $HUTSON_PC_PASS, $TRUENAS_PASS, etc.

When adding new credentials or hosts:

  1. Add to the central files (~/.secrets or ~/.hosts)
  2. Files sync via Syncthing to all machines
  3. Update this CLAUDE.md if infrastructure changes
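
Scripts that rely on the central files can fail early when a variable is missing. The following is a minimal sketch; `require_vars` is a hypothetical helper, and the variable names in the usage lines are the ones listed above.

```shell
# Hypothetical guard: return non-zero if any named variable is unset or empty,
# printing each missing name to stderr.
require_vars() {
  local name missing=0
  for name in "$@"; do
    # Indirect lookup via eval keeps this POSIX-compatible.
    eval "[ -n \"\${$name:-}\" ]" || { echo "missing: $name" >&2; missing=1; }
  done
  return "$missing"
}

# Usage:
#   source ~/.secrets && source ~/.hosts
#   require_vars HA_TOKEN N8N_API_KEY CF_API_KEY || exit 1
```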

Full Changelog

2025-12-21

UPS Upgrade

  • Replaced WattBox WB-1100-IPVMB-6 (660W) with CyberPower OR2200PFCRT2U (1320W)
  • Temporarily rewired plug 5-20P → 5-15P for 15A circuit
  • Runtime: ~15-20 min at 33% load

NUT Monitoring

  • Configured NUT on PVE (master), PVE2 (slave)
  • Shutdown threshold: 120 seconds runtime
  • Custom shutdown script: /usr/local/bin/ups-shutdown.sh
  • Home Assistant integration (UPS sensors)
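
The 120-second threshold could be evaluated along these lines. This is a sketch of the decision rule only and not the contents of /usr/local/bin/ups-shutdown.sh. `should_shutdown` is a hypothetical name; `ups.status` and `battery.runtime` are standard NUT variables.

```shell
# Hypothetical decision rule: shut down when the UPS is on battery ("OB" in
# ups.status) and the reported runtime is at or below the threshold.
# Usage: should_shutdown "<ups.status>" <battery.runtime seconds> [threshold]
should_shutdown() {
  local ups_status="$1" runtime="$2" threshold="${3:-120}"
  case " $ups_status " in
    *" OB "*) [ "$runtime" -le "$threshold" ] ;;
    *)        return 1 ;;
  esac
}

# Usage on PVE (UPS name taken from the quick reference):
#   s=$(upsc cyberpower@localhost ups.status)
#   r=$(upsc cyberpower@localhost battery.runtime)
#   should_shutdown "$s" "$r" && echo "would start shutdown"
```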

Happy Server Self-Hosted Relay

  • Deployed on docker-host (10.10.10.206)
  • Stack: Happy Server + PostgreSQL + Redis + MinIO
  • URL: https://happy.htsn.io
  • Traefik reverse proxy configured

Proxmox Fixes

  • PVE Tailscale routing: Added rule for local network access
  • PVE2 MTU fix: vmbr0 + nic1 both set to 9000
  • 2-node cluster quorum: two_node: 1 in corosync.conf

Power Failure Test

  • Full end-to-end test successful
  • VMs stopped gracefully at 2 min runtime
  • Total recovery: ~7 minutes

2025-12-20

Git & SSH

  • Created homelab-docs repo on Gitea
  • Deployed SSH keys to all VMs/LXCs (13 hosts)
  • Updated ~/.ssh/config with host aliases

2025-12-19

EMC Storage Enclosure

  • LCC B failure diagnosed, switched to LCC A
  • Fans now quiet (speed code 3 vs 5)
  • Created EMC-ENCLOSURE.md documentation

QEMU Guest Agent

  • Installed on docker-host, fs-dev, copyparty
  • All VMs now have agent except homeassistant