Homelab Infrastructure
Quick Reference - Common Tasks
| Task | Section | Quick Command |
|---|---|---|
| Add new public service | Reverse Proxy | Create Traefik config + Cloudflare DNS |
| Add Cloudflare DNS | Cloudflare API | curl -X POST cloudflare.com/... |
| Check server temps | Temperature Check | ssh pve 'grep Tctl ...' |
| Syncthing issues | Troubleshooting | Check API connections |
| SSL cert issues | Traefik DNS Challenge | Use cloudflare resolver |
Key Credentials (see sections for full details):
- Cloudflare: cloudflare@htsn.io / API key in Cloudflare API section
- SSH Password: GrilledCh33s3#
- Traefik: CT 202 @ 10.10.10.250
Role
You are the Homelab Assistant - a Claude Code session dedicated to managing and maintaining Hutson's home infrastructure. Your responsibilities include:
- Infrastructure Management: Proxmox servers, VMs, containers, networking
- File Sync: Syncthing configuration across all devices (Mac Mini, MacBook, Windows PC, TrueNAS, Android)
- Network Administration: Router config, SSH access, Tailscale, device management
- Power Optimization: CPU governors, GPU power states, service tuning
- Documentation: Keep CLAUDE.md, SYNCTHING.md, and SHELL-ALIASES.md up to date
- Automation: Shell aliases, startup scripts, scheduled tasks
You have full access to all homelab devices via SSH and APIs. Use this context to help troubleshoot, configure, and optimize the infrastructure.
Proactive Behaviors
When the user mentions issues or asks questions, proactively:
- "sync not working" → Check Syncthing status on ALL devices, identify which is offline
- "device offline" → Ping both local and Tailscale IPs, check if service is running
- "slow" → Check CPU usage, running processes, Syncthing rescan activity
- "check status" → Run full health check across all systems
- "something's wrong" → Run diagnostics on likely culprits based on context
Quick Health Checks
Run these to get a quick overview of the homelab:
# === FULL HEALTH CHECK ===
# Syncthing connections (Mac Mini)
curl -s -H "X-API-Key: oSQSrPnMnrEXuHqjWrRdrvq3TSXesAT5" "http://127.0.0.1:8384/rest/system/connections" | python3 -c "import sys,json; d=json.load(sys.stdin)['connections']; [print(f\"{v.get('name',k[:7])}: {'UP' if v['connected'] else 'DOWN'}\") for k,v in d.items()]"
# Proxmox VMs
ssh pve 'qm list' 2>/dev/null || echo "PVE: unreachable"
ssh pve2 'qm list' 2>/dev/null || echo "PVE2: unreachable"
# Ping critical devices
ping -c 1 -W 1 10.10.10.200 >/dev/null && echo "TrueNAS: UP" || echo "TrueNAS: DOWN"
ping -c 1 -W 1 10.10.10.1 >/dev/null && echo "Router: UP" || echo "Router: DOWN"
# Check Windows PC Syncthing (often goes offline)
nc -zw1 10.10.10.150 22000 && echo "Windows Syncthing: UP" || echo "Windows Syncthing: DOWN"
Troubleshooting Runbooks
| Symptom | Check | Fix |
|---|---|---|
| Device not syncing | curl Syncthing API → connections | Check if device online, restart Syncthing |
| Windows PC offline | ping 10.10.10.150 then nc -z 22000 | SSH in, Start-ScheduledTask -TaskName "Syncthing" |
| Phone not syncing | Phone Syncthing app in background? | User must open app, keep screen on |
| High CPU on TrueNAS | Syncthing rescan? KSM? | Check rescan intervals, disable KSM |
| VM won't start | Storage available? RAM free? | ssh pve 'qm start VMID', check logs |
| Tailscale offline | tailscale status | tailscale up or restart service |
| Tailscale no subnet access | Check subnet routers | Verify pve or ucg-fiber advertising routes |
| Sync stuck at X% | Folder errors? Conflicts? | Check rest/folder/errors?folder=NAME |
| Server running hot | Check KSM, check CPU processes | Disable KSM, identify runaway process |
| Storage enclosure loud | Check fan speed via SES | See EMC-ENCLOSURE.md |
| Drives not detected | Check SAS link, LCC status | Switch LCC, rescan SCSI hosts |
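For "Sync stuck at X%", a quick sketch that pulls folder errors straight from the local Syncthing API (Mac Mini key from the API Quick Reference; FOLDER is a placeholder for the folder ID):
# List sync errors for a folder
curl -s -H "X-API-Key: oSQSrPnMnrEXuHqjWrRdrvq3TSXesAT5" \
  "http://127.0.0.1:8384/rest/folder/errors?folder=FOLDER" \
  | python3 -c "import sys,json; [print(e['path'],'-',e['error']) for e in json.load(sys.stdin).get('errors') or []]"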
Server Temperature Check
# Check temps on both servers (Threadripper PRO max safe: 90°C Tctl)
ssh pve 'for f in /sys/class/hwmon/hwmon*/temp*_input; do label=$(cat ${f%_input}_label 2>/dev/null); if [ "$label" = "Tctl" ]; then echo "PVE Tctl: $(($(cat $f)/1000))°C"; fi; done'
ssh pve2 'for f in /sys/class/hwmon/hwmon*/temp*_input; do label=$(cat ${f%_input}_label 2>/dev/null); if [ "$label" = "Tctl" ]; then echo "PVE2 Tctl: $(($(cat $f)/1000))°C"; fi; done'
Healthy temps: 70-80°C under load. Warning: >85°C. Throttle: 90°C.
Service Dependencies
TrueNAS (10.10.10.200)
├── Central Syncthing hub - if down, sync breaks between devices
├── NFS/SMB shares for VMs
└── Media storage for Plex
PiHole (CT 200)
└── DNS for entire network - if down, name resolution fails
Traefik (CT 202)
└── Reverse proxy - if down, external access to services fails
Router (10.10.10.1)
└── Everything - gateway for all traffic
API Quick Reference
| Service | Device | Endpoint | Auth |
|---|---|---|---|
| Syncthing | Mac Mini | http://127.0.0.1:8384/rest/ | X-API-Key: oSQSrPnMnrEXuHqjWrRdrvq3TSXesAT5 |
| Syncthing | MacBook | http://127.0.0.1:8384/rest/ (via SSH) | X-API-Key: qYkNdVLwy9qZZZ6MqnJr7tHX7KKdxGMJ |
| Syncthing | Phone | https://10.10.10.54:8384/rest/ | X-API-Key: Xxz3jDT4akUJe6psfwZsbZwG2LhfZuDM |
| Proxmox | PVE | https://10.10.10.120:8006/api2/json/ | SSH key auth |
| Proxmox | PVE2 | https://10.10.10.102:8006/api2/json/ | SSH key auth |
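Every Syncthing endpoint follows the same pattern; a minimal liveness check against the Mac Mini instance (a sketch using the key from the table above):
# Print Syncthing uptime in seconds
curl -s -H "X-API-Key: oSQSrPnMnrEXuHqjWrRdrvq3TSXesAT5" "http://127.0.0.1:8384/rest/system/status" \
  | python3 -c "import sys,json; print(json.load(sys.stdin)['uptime'])"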
Common Maintenance Tasks
When user asks for maintenance or you notice issues:
- Check Syncthing sync status - Any folders behind? Errors?
- Verify all devices connected - Run connection check
- Check disk space - ssh pve 'df -h' and ssh pve2 'df -h'
- Review ZFS pool health - ssh pve 'zpool status'
- Check for stuck processes - High CPU? Memory pressure?
- Verify backups - Are critical folders syncing?
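A sketch that rolls the disk and pool checks into one pass (assumes the ssh host aliases from the SSH Access section):
# Disk space and pool health on both hosts
for h in pve pve2; do
  echo "=== $h ==="
  ssh $h 'df -h /; zpool status -x'
done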
Emergency Commands
# Restart VM on Proxmox
ssh pve 'qm stop VMID && qm start VMID'
# Check what's using CPU
ssh pve 'ps aux --sort=-%cpu | head -10'
# Check ZFS pool status (via QEMU agent)
ssh pve 'qm guest exec 100 -- bash -c "zpool status vault"'
# Check EMC enclosure fans
ssh pve 'qm guest exec 100 -- bash -c "sg_ses --index=coo,-1 --get=speed_code /dev/sg15"'
# Force Syncthing rescan
curl -X POST "http://127.0.0.1:8384/rest/db/scan?folder=FOLDER" -H "X-API-Key: API_KEY"
# Restart Syncthing on Windows (when stuck)
sshpass -p 'GrilledCh33s3#' ssh claude@10.10.10.150 'Stop-Process -Name syncthing -Force; Start-ScheduledTask -TaskName "Syncthing"'
# Get all device IPs from router
expect -c 'spawn ssh root@10.10.10.1 "cat /proc/net/arp"; expect "Password:"; send "GrilledCh33s3#\r"; expect eof'
Overview
Two Proxmox servers running various VMs and containers for home infrastructure, media, development, and AI workloads.
Servers
PVE (10.10.10.120) - Primary
- CPU: AMD Ryzen Threadripper PRO 3975WX (32-core, 64 threads, 280W TDP)
- RAM: 128 GB
- Storage:
  - nvme-mirror1: 2x Sabrent Rocket Q NVMe (3.6TB usable)
  - nvme-mirror2: 2x Kingston SFYRD 2TB (1.8TB usable)
  - rpool: 2x Samsung 870 QVO 4TB SSD mirror (3.6TB usable)
- GPUs:
- NVIDIA Quadro P2000 (75W TDP) - Plex transcoding
- NVIDIA TITAN RTX (280W TDP) - AI workloads, passed to saltbox/lmdev1
- Role: Primary VM host, TrueNAS, media services
PVE2 (10.10.10.102) - Secondary
- CPU: AMD Ryzen Threadripper PRO 3975WX (32-core, 64 threads, 280W TDP)
- RAM: 128 GB
- Storage:
  - nvme-mirror3: 2x NVMe mirror
  - local-zfs2: 2x WD Red 6TB HDD mirror
- GPUs:
- NVIDIA RTX A6000 (300W TDP) - passed to trading-vm
- Role: Trading platform, development
SSH Access
SSH Key Authentication (All Hosts)
SSH keys are configured in ~/.ssh/config on both Mac Mini and MacBook. Use the ~/.ssh/homelab key.
| Host Alias | IP | User | Type | Notes |
|---|---|---|---|---|
| pve | 10.10.10.120 | root | Proxmox | Primary server |
| pve2 | 10.10.10.102 | root | Proxmox | Secondary server |
| truenas | 10.10.10.200 | root | VM | NAS/storage |
| saltbox | 10.10.10.100 | hutson | VM | Media automation |
| lmdev1 | 10.10.10.111 | hutson | VM | AI/LLM development |
| docker-host | 10.10.10.206 | hutson | VM | Docker services |
| fs-dev | 10.10.10.5 | hutson | VM | Development |
| copyparty | 10.10.10.201 | hutson | VM | File sharing |
| gitea-vm | 10.10.10.220 | hutson | VM | Git server |
| trading-vm | 10.10.10.221 | hutson | VM | AI trading platform |
| pihole | 10.10.10.10 | root | LXC | DNS/Ad blocking |
| traefik | 10.10.10.250 | root | LXC | Reverse proxy |
| findshyt | 10.10.10.8 | root | LXC | Custom app |
Usage examples:
ssh pve 'qm list' # List VMs
ssh truenas 'zpool status vault' # Check ZFS pool
ssh saltbox 'docker ps' # List containers
ssh pihole 'pihole status' # Check Pi-hole
Password Auth (Special Cases)
| Device | IP | User | Auth Method | Notes |
|---|---|---|---|---|
| UniFi Router | 10.10.10.1 | root | expect (keyboard-interactive) | Gateway |
| Windows PC | 10.10.10.150 | claude | sshpass | PowerShell, use ; not && |
| HomeAssistant | 10.10.10.110 | - | QEMU agent only | No SSH server |
Router access (requires expect):
# Run command on router
expect -c 'spawn ssh root@10.10.10.1 "hostname"; expect "Password:"; send "GrilledCh33s3#\r"; expect eof'
# Get ARP table (all device IPs)
expect -c 'spawn ssh root@10.10.10.1 "cat /proc/net/arp"; expect "Password:"; send "GrilledCh33s3#\r"; expect eof'
Windows PC access:
sshpass -p 'GrilledCh33s3#' ssh claude@10.10.10.150 'Get-Process | Select -First 5'
HomeAssistant (no SSH, use QEMU agent):
ssh pve 'qm guest exec 110 -- bash -c "ha core info"'
VMs and Containers
PVE (10.10.10.120)
| VMID | Name | vCPUs | RAM | Purpose | GPU/Passthrough | QEMU Agent |
|---|---|---|---|---|---|---|
| 100 | truenas | 8 | 32GB | NAS, storage | LSI SAS2308 HBA, Samsung NVMe | Yes |
| 101 | saltbox | 16 | 16GB | Media automation | TITAN RTX | Yes |
| 105 | fs-dev | 10 | 8GB | Development | - | Yes |
| 110 | homeassistant | 2 | 2GB | Home automation | - | No |
| 111 | lmdev1 | 8 | 32GB | AI/LLM development | TITAN RTX | Yes |
| 201 | copyparty | 2 | 2GB | File sharing | - | Yes |
| 206 | docker-host | 2 | 4GB | Docker services | - | Yes |
| 200 | pihole (CT) | - | - | DNS/Ad blocking | - | N/A |
| 202 | traefik (CT) | - | - | Reverse proxy | - | N/A |
| 205 | findshyt (CT) | - | - | Custom app | - | N/A |
PVE2 (10.10.10.102)
| VMID | Name | vCPUs | RAM | Purpose | GPU/Passthrough | QEMU Agent |
|---|---|---|---|---|---|---|
| 300 | gitea-vm | 2 | 4GB | Git server | - | Yes |
| 301 | trading-vm | 16 | 32GB | AI trading platform | RTX A6000 | Yes |
QEMU Guest Agent
VMs with QEMU agent can be managed via qm guest exec:
# Execute command in VM
ssh pve 'qm guest exec 100 -- bash -c "zpool status vault"'
# Get VM IP addresses
ssh pve 'qm guest exec 100 -- bash -c "ip addr"'
Only VM 110 (homeassistant) lacks QEMU agent - use its web UI instead.
Power Management
Estimated Power Draw
- PVE: 500-750W (CPU + TITAN RTX + P2000 + storage + HBAs)
- PVE2: 450-600W (CPU + RTX A6000 + storage)
- Combined: ~1000-1350W under load
Optimizations Applied
- KSMD Disabled (2024-12-17 updated)
  - Was consuming 44-57% CPU on PVE with negative general_profit
  - Caused CPU temp to rise from 74°C to 83°C
  - Savings: ~7-10W + significant temp reduction
  - Made permanent via:
    - systemd service: /etc/systemd/system/disable-ksm.service
    - ksmtuned masked: systemctl mask ksmtuned (prevents re-enabling)
  - Note: KSM can get re-enabled by Proxmox updates. If CPU is hot, check:
    cat /sys/kernel/mm/ksm/run   # Should be 0
    ps aux | grep ksmd           # Should show 0% CPU
    # If KSM is running (run=1), disable it:
    echo 0 > /sys/kernel/mm/ksm/run
    systemctl mask ksmtuned
- Syncthing Rescan Intervals (2024-12-16)
  - Changed aggressive 60s rescans to 3600s for large folders
  - Affected: downloads (38GB), documents (11GB), desktop (7.2GB), movies, pictures, notes, config
  - Savings: ~60-80W (TrueNAS VM was at constant 86% CPU)
- CPU Governor Optimization (2024-12-16)
  - PVE: powersave governor + balance_power EPP (amd-pstate-epp driver)
  - PVE2: schedutil governor (acpi-cpufreq driver)
  - Made permanent via systemd service: /etc/systemd/system/cpu-powersave.service
  - Savings: ~60-120W combined (CPUs now idle at 1.7-2.2GHz vs 4GHz)
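To verify the governors survived a reboot (a sketch; standard cpufreq sysfs paths, the EPP file only exists under the amd-pstate-epp driver):
# Governor + EPP on PVE (amd-pstate-epp)
ssh pve 'cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor /sys/devices/system/cpu/cpu0/cpufreq/energy_performance_preference'
# Governor on PVE2 (acpi-cpufreq)
ssh pve2 'cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor'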
- GPU Power States (2024-12-16) - Verified optimal
  - RTX A6000: 11W idle (P8 state)
  - TITAN RTX: 2-3W idle (P8 state)
  - Quadro P2000: 25W (P0 - Plex keeps it active)
- ksmtuned Disabled (2024-12-16)
  - KSM tuning daemon was still running after KSMD disabled
  - Stopped and disabled on both servers
  - Savings: ~2-5W
- HDD Spindown on PVE2 (2024-12-16)
  - local-zfs2 pool (2x WD Red 6TB) had only 768KB used but drives spinning 24/7
  - Set 30-minute spindown via hdparm -S 241
  - Persistent via udev rule: /etc/udev/rules.d/69-hdd-spindown.rules
  - Savings: ~10-16W when spun down
Potential Optimizations
- PCIe ASPM power management
- NMI watchdog disable
Memory Configuration
- Ballooning enabled on most VMs but not actively used
- No memory overcommit (98GB allocated on 128GB physical for PVE)
- KSMD was wasting CPU with no benefit (negative general_profit)
Network
See NETWORK.md for full details.
Network Ranges
| Network | Range | Purpose |
|---|---|---|
| LAN | 10.10.10.0/24 | Primary network, all external access |
| Internal | 10.10.20.0/24 | Inter-VM only (storage, NFS/iSCSI) |
PVE Bridges (10.10.10.120)
| Bridge | NIC | Speed | Purpose | Use For |
|---|---|---|---|---|
| vmbr0 | enp1s0 | 1 Gb | Management | General VMs/CTs |
| vmbr1 | enp35s0f0 | 10 Gb | High-speed LXC | Bandwidth-heavy containers |
| vmbr2 | enp35s0f1 | 10 Gb | High-speed VM | TrueNAS, Saltbox, storage VMs |
| vmbr3 | (none) | Virtual | Internal only | NFS/iSCSI traffic, no internet |
Quick Reference
# Add VM to standard network (1Gb)
qm set VMID --net0 virtio,bridge=vmbr0
# Add VM to high-speed network (10Gb)
qm set VMID --net0 virtio,bridge=vmbr2
# Add secondary NIC for internal storage network
qm set VMID --net1 virtio,bridge=vmbr3
MTU 9000 (Jumbo Frames)
Jumbo frames are enabled across the network for improved throughput on large transfers.
| Device | Interface | MTU | Persistent |
|---|---|---|---|
| Mac Mini | en0 | 9000 | Yes (networksetup) |
| PVE | vmbr0, enp1s0 | 9000 | Yes (/etc/network/interfaces) |
| PVE2 | vmbr0, nic1 | 9000 | Yes (/etc/network/interfaces) |
| TrueNAS | enp6s18, enp6s19 | 9000 | Yes |
| UCG-Fiber | br0 | 9216 | Yes (default) |
Verify MTU:
# Mac Mini
ifconfig en0 | grep mtu
# PVE/PVE2
ssh pve 'ip link show vmbr0 | grep mtu'
ssh pve2 'ip link show vmbr0 | grep mtu'
# Test jumbo frames
ping -c 1 -D -s 8000 10.10.10.120 # 8000 + 8 byte header = 8008 bytes
Important: When setting MTU on Proxmox bridges, ensure BOTH the bridge (vmbr0) AND the underlying physical interface (enp1s0/nic1) have the same MTU, otherwise packets will be dropped.
Tailscale VPN
Tailscale provides secure remote access to the homelab from anywhere.
Subnet Routers (HA Failover)
Two devices advertise the 10.10.10.0/24 subnet for redundancy:
| Device | Tailscale IP | Role | Notes |
|---|---|---|---|
| pve | 100.113.177.80 | Primary | Proxmox host |
| ucg-fiber | 100.94.246.32 | Failover | UniFi router (always on) |
If Proxmox goes down, Tailscale automatically fails over to the router (~10-30 sec).
Router Tailscale Setup (UCG-Fiber)
- Installed via: curl -fsSL https://tailscale.com/install.sh | sh
- Config: tailscale up --advertise-routes=10.10.10.0/24 --accept-routes
- Survives reboots (systemd service)
- Routes must be approved in Tailscale Admin Console
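A minimal failover check from any remote device on the tailnet (a sketch; assumes the tailscale CLI is on PATH - on macOS use the app-bundle path shown under Check Tailscale Status):
# Confirm at least one subnet router is online
tailscale status | grep -E 'pve|ucg-fiber'
# Confirm the advertised subnet is reachable end-to-end
ping -c 1 10.10.10.200 >/dev/null && echo "Subnet route: UP" || echo "Subnet route: DOWN"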
Tailscale IPs Quick Reference
| Device | Tailscale IP | Local IP |
|---|---|---|
| Mac Mini | 100.108.89.58 | 10.10.10.125 |
| PVE | 100.113.177.80 | 10.10.10.120 |
| UCG-Fiber | 100.94.246.32 | 10.10.10.1 |
| TrueNAS | 100.100.94.71 | 10.10.10.200 |
| Pi-hole | 100.112.59.128 | 10.10.10.10 |
Check Tailscale Status
# From Mac Mini
/Applications/Tailscale.app/Contents/MacOS/Tailscale status
# From router
expect -c 'spawn ssh root@10.10.10.1 "tailscale status"; expect "Password:"; send "GrilledCh33s3#\r"; expect eof'
Common Commands
# Check VM status
ssh pve 'qm list'
ssh pve2 'qm list'
# Check container status
ssh pve 'pct list'
# Monitor CPU/power
ssh pve 'top -bn1 | head -20'
# Check ZFS pools
ssh pve 'zpool status'
# Check GPU (if nvidia-smi installed in VM)
ssh pve 'lspci | grep -i nvidia'
Remote Claude Code Sessions (Mac Mini)
Overview
The Mac Mini (hutson-mac-mini.local) runs the Happy Coder daemon, enabling on-demand Claude Code sessions accessible from anywhere via the Happy Coder mobile app. Sessions are created when you need them - no persistent tmux sessions required.
Architecture
Mac Mini (100.108.89.58 via Tailscale)
├── launchd (auto-starts on boot)
│ └── com.hutson.happy-daemon.plist (starts Happy daemon)
├── Happy Coder daemon (manages remote sessions)
└── Tailscale (secure remote access)
How It Works
- Happy daemon runs on Mac Mini (auto-starts on boot)
- Open Happy Coder app on phone/tablet
- Start a new Claude session from the app
- Session runs in any working directory you choose
- Session ends when you're done - no cleanup needed
Quick Commands
# Check daemon status
happy daemon status
# Start a new session manually (from Mac Mini terminal)
cd ~/Projects/homelab && happy claude
# Check active sessions
happy daemon list
Mobile Access Setup (One-time)
- Download the Happy Coder app on your phone/tablet
- On Mac Mini, ensure the self-hosted server is configured:
  echo 'export HAPPY_SERVER_URL="https://happy.htsn.io"' >> ~/.zshrc
  source ~/.zshrc
- Authenticate with the Happy server:
  happy auth login --force   # Opens browser, scan QR with app
- Connect Claude API access:
  happy connect claude       # Links your Anthropic API credentials
- Ensure Claude is logged in locally (critical for spawned sessions):
  claude     # Start Claude Code
  /login     # Authenticate if prompted
- Daemon auto-starts on login via launchd
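After setup, a quick sanity check (a sketch; run in a fresh shell so the ~/.zshrc export is picked up):
echo $HAPPY_SERVER_URL   # Should print https://happy.htsn.io
happy daemon status      # Daemon should report running
happy daemon list        # Should list sessions without auth errors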
Daemon Management
happy daemon start # Start daemon
happy daemon stop # Stop daemon
happy daemon status # Check status
happy daemon list # List active sessions
Remote Access via SSH + Tailscale
From any device on Tailscale network:
# SSH to Mac Mini
ssh hutson@100.108.89.58
# Or via hostname
ssh hutson@mac-mini
# Start Claude in desired directory
cd ~/Projects/homelab && happy claude
Files & Configuration
| File | Purpose |
|---|---|
| ~/Library/LaunchAgents/com.hutson.happy-daemon.plist | User LaunchAgent (starts at login) |
| ~/.happy/ | Happy Coder config, state, and logs |
| ~/.zshrc | Contains HAPPY_SERVER_URL export |
Server: https://happy.htsn.io (self-hosted Happy server on docker-host)
Troubleshooting
# Check if daemon is running
pgrep -f "happy.*daemon"
# Check launchd status
launchctl list | grep happy
# List active sessions
happy daemon list
# Restart daemon
happy daemon stop && happy daemon start
# If Tailscale is disconnected
/Applications/Tailscale.app/Contents/MacOS/Tailscale up
Common Issues:
| Issue | Cause | Fix |
|---|---|---|
| "Invalid API key" in spawned session | Claude not logged in locally | Run claude then /login on Mac Mini |
| "Failed to start daemon" | Stale lock file | rm -f ~/.happy/daemon.state.json.lock ~/.happy/daemon.state.json |
| Sessions not showing on phone | HAPPY_SERVER_URL not set | Add to ~/.zshrc: export HAPPY_SERVER_URL="https://happy.htsn.io" |
| Slow responses | Cloudflare proxy enabled | Disable proxy for happy.htsn.io subdomain |
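The stale-lock and daemon-start failures usually chain together; a recovery sketch (paths from the table above; stopping the daemon first is an assumption about safe ordering):
# Clear a stale lock and restart the daemon cleanly
happy daemon stop 2>/dev/null
rm -f ~/.happy/daemon.state.json.lock ~/.happy/daemon.state.json
happy daemon start
happy daemon status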
Happy Server (Self-Hosted Relay)
Self-hosted Happy Coder relay server for lower latency and no external dependencies.
Architecture
Phone App → https://happy.htsn.io → Traefik → docker-host:3002 → Happy Server
↓
PostgreSQL + Redis + MinIO (local)
Service Details
| Component | Location | Port | Notes |
|---|---|---|---|
| Happy Server | docker-host (10.10.10.206) | 3002 | Main relay service |
| PostgreSQL | docker-host | 5432 (internal) | User/session data |
| Redis | docker-host | 6379 (internal) | Real-time events |
| MinIO | docker-host | 9000 (internal) | File/image storage |
| Traefik | CT 202 | 443 | SSL termination |
Configuration
Docker Compose: /opt/happy-server/docker-compose.yml
Traefik Config: /etc/traefik/conf.d/happy.yaml (on CT 202)
DNS: happy.htsn.io → 70.237.94.174 (Cloudflare DNS-only, NOT proxied for WebSocket performance)
Credentials:
- Master Secret: 3ccbfd03a028d3c278da7d2cf36d99b94cd4b1fecabc49ab006e8e89bc7707ac
- MinIO: happyadmin / happyadmin123
- PostgreSQL: happy / happypass
Quick Commands
# Check status
ssh docker-host 'docker ps --filter "name=happy"'
# View logs
ssh docker-host 'docker logs -f happy-server'
# Restart stack
ssh docker-host 'cd /opt/happy-server && sudo docker-compose restart'
# Health check
curl https://happy.htsn.io/health
# Run migrations (if needed)
ssh docker-host 'docker exec happy-server npx prisma migrate deploy'
Connecting Devices
Phone (Happy App):
- Settings → Relay Server URL
- Enter: https://happy.htsn.io
- Save and reconnect
CLI (Mac/Linux):
export HAPPY_SERVER_URL="https://happy.htsn.io"
happy auth # Re-authenticate with new server
Maintenance
Backup data:
ssh docker-host 'docker exec happy-postgres pg_dump -U happy happy > /tmp/happy-backup.sql'
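Restore is the mirror image (an untested sketch; assumes the backup file from above and an existing happy database):
# Restore from backup
ssh docker-host 'docker exec -i happy-postgres psql -U happy happy < /tmp/happy-backup.sql'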
Update Happy Server:
ssh docker-host 'cd /opt/happy-server && git pull && sudo docker-compose build && sudo docker-compose up -d'
Agent and Tool Guidelines
Background Agents
- Always spin up background agents when doing multiple independent tasks
- Background agents allow parallel execution of tasks that don't depend on each other
- This improves efficiency and reduces total execution time
- Use background agents for tasks like running tests, builds, or searches simultaneously
MCP Tools for Web Searches
ref.tools - Documentation Lookups
- mcp__Ref__ref_search_documentation: Search through documentation for specific topics
- mcp__Ref__ref_read_url: Read and parse content from documentation URLs
Exa MCP - General Web and Code Searches
- mcp__exa__web_search_exa: General web searches for current information
- mcp__exa__get_code_context_exa: Code-related searches and repository lookups
MCP Tools Reference Table
| Tool Name | Provider | Purpose | Use Case |
|---|---|---|---|
| mcp__Ref__ref_search_documentation | ref.tools | Search documentation | Finding specific topics in official docs |
| mcp__Ref__ref_read_url | ref.tools | Read documentation URLs | Parsing and extracting content from doc pages |
| mcp__exa__web_search_exa | Exa MCP | General web search | Current events, general information lookup |
| mcp__exa__get_code_context_exa | Exa MCP | Code-specific search | Finding code examples, repository searches |
Reverse Proxy Architecture (Traefik)
Overview
There are TWO separate Traefik instances handling different services:
| Instance | Location | IP | Purpose | Manages |
|---|---|---|---|---|
| Traefik-Primary | CT 202 | 10.10.10.250 | General services | All non-Saltbox services |
| Traefik-Saltbox | VM 101 (Docker) | 10.10.10.100 | Saltbox services | Plex, *arr apps, media stack |
⚠️ CRITICAL RULE: Which Traefik to Use
When adding ANY new service:
- ✅ Use Traefik-Primary (10.10.10.250) - Unless service lives inside Saltbox VM
- ❌ DO NOT touch Traefik-Saltbox - It manages Saltbox services with their own certificates
Why this matters:
- Traefik-Saltbox has complex Saltbox-managed configs
- Messing with it breaks Plex, Sonarr, Radarr, and all media services
- Each Traefik has its own Let's Encrypt certificates
- Mixing them causes certificate conflicts
Traefik-Primary (CT 202) - For New Services
Location: /etc/traefik/ on Container 202
Config: /etc/traefik/traefik.yaml
Dynamic Configs: /etc/traefik/conf.d/*.yaml
Services using Traefik-Primary (10.10.10.250):
- excalidraw.htsn.io → 10.10.10.206:8080 (docker-host)
- findshyt.htsn.io → 10.10.10.205 (CT 205)
- gitea (git.htsn.io) → 10.10.10.220:3000
- homeassistant → 10.10.10.110
- lmdev → 10.10.10.111
- pihole → 10.10.10.10
- truenas → 10.10.10.200
- proxmox → 10.10.10.120
- copyparty → 10.10.10.201
- aitrade → trading server
- pulse.htsn.io → 10.10.10.206:7655 (Pulse monitoring)
- happy.htsn.io → 10.10.10.206:3002 (Happy Coder relay server)
Access Traefik config:
# From Mac Mini:
ssh pve 'pct exec 202 -- cat /etc/traefik/traefik.yaml'
ssh pve 'pct exec 202 -- ls /etc/traefik/conf.d/'
# Edit a service config:
ssh pve 'pct exec 202 -- vi /etc/traefik/conf.d/myservice.yaml'
Traefik-Saltbox (VM 101) - DO NOT MODIFY
Location: /opt/traefik/ inside Saltbox VM
Managed by: Saltbox Ansible playbooks
Mounts: Docker bind mount from /opt/traefik → /etc/traefik in container
Services using Traefik-Saltbox (10.10.10.100):
- Plex (plex.htsn.io)
- Sonarr, Radarr, Lidarr
- SABnzbd, NZBGet, qBittorrent
- Overseerr, Tautulli, Organizr
- Jackett, NZBHydra2
- Authelia (SSO)
- All other Saltbox-managed containers
View Saltbox Traefik (read-only):
ssh pve 'qm guest exec 101 -- bash -c "docker exec traefik cat /etc/traefik/traefik.yml"'
Adding a New Public Service - Complete Workflow
Follow these steps to deploy a new service and make it publicly accessible at servicename.htsn.io.
Step 0. Deploy Your Service
First, deploy your service on the appropriate host:
Option A: Docker on docker-host (10.10.10.206)
ssh hutson@10.10.10.206
sudo mkdir -p /opt/myservice
cat > /opt/myservice/docker-compose.yml << 'EOF'
version: "3.8"
services:
myservice:
image: myimage:latest
ports:
- "8080:80"
restart: unless-stopped
EOF
cd /opt/myservice && sudo docker-compose up -d
Option B: New LXC Container on PVE
ssh pve 'pct create CTID local:vztmpl/ubuntu-22.04-standard_22.04-1_amd64.tar.zst \
--hostname myservice --memory 2048 --cores 2 \
--net0 name=eth0,bridge=vmbr0,ip=10.10.10.XXX/24,gw=10.10.10.1 \
--rootfs local-zfs:8 --unprivileged 1 --start 1'
Option C: New VM on PVE
ssh pve 'qm create VMID --name myservice --memory 2048 --cores 2 \
--net0 virtio,bridge=vmbr0 --scsihw virtio-scsi-pci'
Step 1. Create Traefik Config File
Use this template for new services on Traefik-Primary (CT 202):
# /etc/traefik/conf.d/myservice.yaml
http:
routers:
# HTTPS router
myservice-secure:
entryPoints:
- websecure
rule: "Host(`myservice.htsn.io`)"
service: myservice
tls:
certResolver: cloudflare # Use 'cloudflare' for proxied domains, 'letsencrypt' for DNS-only
priority: 50
# HTTP → HTTPS redirect
myservice-redirect:
entryPoints:
- web
rule: "Host(`myservice.htsn.io`)"
middlewares:
- myservice-https-redirect
service: myservice
priority: 50
services:
myservice:
loadBalancer:
servers:
- url: "http://10.10.10.XXX:PORT"
middlewares:
myservice-https-redirect:
redirectScheme:
scheme: https
permanent: true
SSL Certificates
Traefik has two certificate resolvers configured:
| Resolver | Use When | Challenge Type | Notes |
|---|---|---|---|
| letsencrypt | Cloudflare DNS-only (gray cloud) | HTTP-01 | Requires port 80 reachable |
| cloudflare | Cloudflare Proxied (orange cloud) | DNS-01 | Works with Cloudflare proxy |
⚠️ Important: If Cloudflare proxy is enabled (orange cloud), HTTP challenge fails because Cloudflare redirects HTTP→HTTPS. Use cloudflare resolver instead.
Cloudflare API credentials are configured in /etc/systemd/system/traefik.service:
Environment="CF_API_EMAIL=cloudflare@htsn.io"
Environment="CF_API_KEY=849ebefd163d2ccdec25e49b3e1b3fe2cdadc"
Certificate storage:
- HTTP challenge certs: /etc/traefik/acme.json
- DNS challenge certs: /etc/traefik/acme-cf.json
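To see which hostnames already have certificates, a sketch (assumes jq is installed in CT 202 and the usual acme.json layout with domain.main entries):
# List issued certificate hostnames
ssh pve 'pct exec 202 -- jq -r "..|.main? // empty" /etc/traefik/acme.json'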
Deploy the config:
# Create file on CT 202
ssh pve 'pct exec 202 -- bash -c "cat > /etc/traefik/conf.d/myservice.yaml << '\''EOF'\''
<paste config here>
EOF"'
# Traefik auto-reloads (watches conf.d directory)
# Check logs:
ssh pve 'pct exec 202 -- tail -f /var/log/traefik/traefik.log'
Step 2. Add Cloudflare DNS Entry
Cloudflare Credentials:
- Email: cloudflare@htsn.io
- API Key: 849ebefd163d2ccdec25e49b3e1b3fe2cdadc
Manual method (via Cloudflare Dashboard):
- Go to https://dash.cloudflare.com/
- Select the htsn.io domain
- DNS → Add Record
- Type: A, Name: myservice, IPv4: 70.237.94.174, Proxied: ☑️
Automated method (CLI script):
Save this as ~/bin/add-cloudflare-dns.sh:
#!/bin/bash
# Add DNS record to Cloudflare for htsn.io
SUBDOMAIN="$1"
CF_EMAIL="cloudflare@htsn.io"
CF_API_KEY="849ebefd163d2ccdec25e49b3e1b3fe2cdadc"
ZONE_ID="c0f5a80448c608af35d39aa820a5f3af" # htsn.io zone
PUBLIC_IP="70.237.94.174" # Update if IP changes: curl -s ifconfig.me
if [ -z "$SUBDOMAIN" ]; then
echo "Usage: $0 <subdomain>"
echo "Example: $0 myservice # Creates myservice.htsn.io"
exit 1
fi
curl -X POST "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/dns_records" \
-H "X-Auth-Email: $CF_EMAIL" \
-H "X-Auth-Key: $CF_API_KEY" \
-H "Content-Type: application/json" \
--data "{
\"type\":\"A\",
\"name\":\"$SUBDOMAIN\",
\"content\":\"$PUBLIC_IP\",
\"ttl\":1,
\"proxied\":true
}" | jq .
Usage:
chmod +x ~/bin/add-cloudflare-dns.sh
~/bin/add-cloudflare-dns.sh excalidraw # Creates excalidraw.htsn.io
Step 3. Testing
# Check if DNS resolves
dig myservice.htsn.io
# Test HTTP redirect
curl -I http://myservice.htsn.io
# Test HTTPS
curl -I https://myservice.htsn.io
# Check Traefik dashboard (if enabled)
# Access: http://10.10.10.250:8080/dashboard/
Step 4. Update Documentation
After deploying, update these files:
- IP-ASSIGNMENTS.md - Add to Services & Reverse Proxy Mapping table
- CLAUDE.md - Add to "Services using Traefik-Primary" list (line ~495)
Quick Reference - One-Liner Commands
# === DEPLOY SERVICE (example: myservice on docker-host port 8080) ===
# 1. Create Traefik config
ssh pve 'pct exec 202 -- bash -c "cat > /etc/traefik/conf.d/myservice.yaml << EOF
http:
routers:
myservice-secure:
entryPoints: [websecure]
rule: Host(\\\`myservice.htsn.io\\\`)
service: myservice
tls: {certResolver: letsencrypt}
services:
myservice:
loadBalancer:
servers:
- url: http://10.10.10.206:8080
EOF"'
# 2. Add Cloudflare DNS
curl -s -X POST "https://api.cloudflare.com/client/v4/zones/c0f5a80448c608af35d39aa820a5f3af/dns_records" \
-H "X-Auth-Email: cloudflare@htsn.io" \
-H "X-Auth-Key: 849ebefd163d2ccdec25e49b3e1b3fe2cdadc" \
-H "Content-Type: application/json" \
--data '{"type":"A","name":"myservice","content":"70.237.94.174","proxied":true}'
# 3. Test (wait a few seconds for DNS propagation)
curl -I https://myservice.htsn.io
Traefik Troubleshooting
# View Traefik logs (CT 202)
ssh pve 'pct exec 202 -- tail -f /var/log/traefik/traefik.log'
# Check if config is valid
ssh pve 'pct exec 202 -- cat /etc/traefik/conf.d/myservice.yaml'
# List all dynamic configs
ssh pve 'pct exec 202 -- ls -la /etc/traefik/conf.d/'
# Check certificate
ssh pve 'pct exec 202 -- cat /etc/traefik/acme.json | jq'
# Restart Traefik (if needed)
ssh pve 'pct exec 202 -- systemctl restart traefik'
Certificate Management
Let's Encrypt certificates are automatically managed by Traefik.
Certificate storage:
- Traefik-Primary: /etc/traefik/acme.json on CT 202
- Traefik-Saltbox: /opt/traefik/acme.json on VM 101
Certificate renewal:
- Automatic via HTTP-01 challenge
- Traefik checks every 24h
- Renews 30 days before expiry
If certificates fail:
# Check acme.json permissions (must be 600)
ssh pve 'pct exec 202 -- ls -la /etc/traefik/acme.json'
# Check Traefik can reach Let's Encrypt
ssh pve 'pct exec 202 -- curl -I https://acme-v02.api.letsencrypt.org/directory'
# Delete bad certificate (Traefik will re-request)
ssh pve 'pct exec 202 -- rm /etc/traefik/acme.json'
ssh pve 'pct exec 202 -- touch /etc/traefik/acme.json'
ssh pve 'pct exec 202 -- chmod 600 /etc/traefik/acme.json'
ssh pve 'pct exec 202 -- systemctl restart traefik'
Docker Service with Traefik Labels (Alternative)
If deploying a service via Docker on docker-host (VM 206), you can use Traefik labels instead of config files:
# docker-compose.yml
services:
myservice:
image: myimage:latest
labels:
- "traefik.enable=true"
- "traefik.http.routers.myservice.rule=Host(`myservice.htsn.io`)"
- "traefik.http.routers.myservice.entrypoints=websecure"
- "traefik.http.routers.myservice.tls.certresolver=letsencrypt"
- "traefik.http.services.myservice.loadbalancer.server.port=8080"
networks:
- traefik
networks:
traefik:
external: true
Note: This requires Traefik to have access to Docker socket and be on same network.
Cloudflare API Access
Credentials (stored in Saltbox config):
- Email: cloudflare@htsn.io
- API Key: 849ebefd163d2ccdec25e49b3e1b3fe2cdadc
- Domain: htsn.io
Retrieve from Saltbox:
ssh pve 'qm guest exec 101 -- bash -c "cat /srv/git/saltbox/accounts.yml | grep -A2 cloudflare"'
Cloudflare API Documentation:
- API Docs: https://developers.cloudflare.com/api/
- DNS Records: https://developers.cloudflare.com/api/operations/dns-records-for-a-zone-create-dns-record
Common API operations:
# Set credentials
CF_EMAIL="cloudflare@htsn.io"
CF_API_KEY="849ebefd163d2ccdec25e49b3e1b3fe2cdadc"
ZONE_ID="c0f5a80448c608af35d39aa820a5f3af"
# List all DNS records
curl -X GET "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/dns_records" \
-H "X-Auth-Email: $CF_EMAIL" \
-H "X-Auth-Key: $CF_API_KEY" | jq
# Add A record
curl -X POST "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/dns_records" \
-H "X-Auth-Email: $CF_EMAIL" \
-H "X-Auth-Key: $CF_API_KEY" \
-H "Content-Type: application/json" \
--data '{"type":"A","name":"subdomain","content":"IP","proxied":true}'
# Delete record
curl -X DELETE "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/dns_records/$RECORD_ID" \
-H "X-Auth-Email: $CF_EMAIL" \
-H "X-Auth-Key: $CF_API_KEY"
Git Repository
This documentation is stored at:
- Gitea: https://git.htsn.io/hutson/homelab-docs
- Local: ~/Projects/homelab
- Notes: ~/Notes/05_Homelab (symlink)
# Clone
git clone git@git.htsn.io:hutson/homelab-docs.git
# Push changes
cd ~/Projects/homelab
git add -A && git commit -m "Update docs" && git push
Related Documentation
| File | Description |
|---|---|
| EMC-ENCLOSURE.md | EMC storage enclosure (SES commands, LCC troubleshooting, maintenance) |
| HOMEASSISTANT.md | Home Assistant API access, automations, integrations |
| NETWORK.md | Network bridges, VLANs, which bridge to use for new VMs |
| IP-ASSIGNMENTS.md | Complete IP address assignments for all devices and services |
| SYNCTHING.md | Syncthing setup, API access, device list, troubleshooting |
| SHELL-ALIASES.md | ZSH aliases for Claude Code (chomelab, ctrading, etc.) |
| configs/ | Symlinks to shared shell configs |
Backlog
Future improvements and maintenance tasks:
| Priority | Task | Notes |
|---|---|---|
| Medium | Re-IP all devices | Current IP scheme is inconsistent. Plan: VMs 10.10.10.100-199, LXCs 10.10.10.200-249, Services 10.10.10.250-254 |
| Low | Install SSH on HomeAssistant | Currently only accessible via QEMU agent |
| Low | Set up SSH key for router | Currently requires expect/password |
Changelog
2025-12-21
Happy Server Self-Hosted Relay
- Deployed self-hosted Happy Coder relay server on docker-host (10.10.10.206)
- Stack includes: Happy Server, PostgreSQL, Redis, MinIO (all containerized)
- Configured Traefik reverse proxy at https://happy.htsn.io
- Added Cloudflare DNS record (proxied)
- Fixed Dockerfile to include Prisma migrations on startup
Docker-host CPU Upgrade
- Changed VM 206 CPU from emulated to host passthrough
- Fixes x86-64-v2 compatibility issues with modern binaries (Sharp, MinIO)
- Requires: ssh pve 'qm set 206 -cpu host' + VM reboot
PVE Tailscale Routing Fix
- Fixed issue where PVE was unreachable via local network (10.10.10.120)
- Root cause: Tailscale routing table 52 was capturing local subnet traffic
- Fix: Added routing rule ip rule add from 10.10.10.120 table main priority 5200
- Made permanent in /etc/network/interfaces under vmbr0
2024-12-20
Git Repository Setup
- Created homelab-docs repo on Gitea (git.htsn.io/hutson/homelab-docs)
- Set up SSH key authentication for git@git.htsn.io
- Created symlink from ~/Notes/05_Homelab → ~/Projects/homelab
- Added Gitea API token for future automation
SSH Key Deployment - All Systems
- Added SSH keys to ALL VMs and LXCs (13 total hosts now accessible via key)
- Updated ~/.ssh/config with complete host aliases
- Fixed permissions: FindShyt LXC .ssh ownership, enabled PermitRootLogin on LXCs
Documentation Updates
- Rewrote SSH Access section with complete host table
- Added Password Auth section for router/Windows/HomeAssistant
- Added Backlog section with re-IP task
- Added Git Repository section with clone/push instructions
2024-12-19
EMC Storage Enclosure - LCC B Failure
- Diagnosed loud fan issue (speed code 5 → 4160 RPM)
- Root cause: Faulty LCC B controller causing false readings
- Resolution: Switched SAS cable to LCC A, fans now quiet (speed code 3 → 2670 RPM)
- Replacement ordered: EMC 303-108-000E ($14.95 eBay)
- Created EMC-ENCLOSURE.md with full documentation
SSH Key Consolidation
- Renamed ~/.ssh/ai_trading_ed25519 → ~/.ssh/homelab
- Updated ~/.ssh/config on MacBook with all homelab hosts
- No more sshpass needed for PVE servers
QEMU Guest Agent Deployment
- Installed on: docker-host (206), fs-dev (105), copyparty (201)
- All PVE VMs now have agent except homeassistant (110)
- Can now use qm guest exec for remote commands
VM Configuration Updates
- docker-host: Fixed SSH key in cloud-init
- fs-dev: Fixed .ssh directory ownership (1000 → 1001)
- copyparty: Changed from DHCP to static IP (10.10.10.201)
Documentation Updates
- Updated CLAUDE.md SSH section (removed sshpass examples)
- Added QEMU Agent column to VM tables
- Added storage enclosure troubleshooting to runbooks