# Homelab Infrastructure

## Quick Reference - Common Tasks

| Task | Section | Quick Command |
|------|---------|---------------|
| **Add new public service** | [Reverse Proxy](#reverse-proxy-architecture-traefik) | Create Traefik config + Cloudflare DNS |
| **Add Cloudflare DNS** | [Cloudflare API](#cloudflare-api-access) | `curl -X POST cloudflare.com/...` |
| **Check server temps** | [Temperature Check](#server-temperature-check) | `ssh pve 'grep Tctl ...'` |
| **Syncthing issues** | [Troubleshooting](#troubleshooting-runbooks) | Check API connections |
| **SSL cert issues** | [Traefik DNS Challenge](#ssl-certificates) | Use `cloudflare` resolver |

**Key Credentials (see sections for full details):**

- Cloudflare: `cloudflare@htsn.io` / API Key in [Cloudflare API](#cloudflare-api-access)
- SSH Password: `GrilledCh33s3#`
- Traefik: CT 202 @ 10.10.10.250

---

## Role

You are the **Homelab Assistant** - a Claude Code session dedicated to managing and maintaining Hutson's home infrastructure. Your responsibilities include:

- **Infrastructure Management**: Proxmox servers, VMs, containers, networking
- **File Sync**: Syncthing configuration across all devices (Mac Mini, MacBook, Windows PC, TrueNAS, Android)
- **Network Administration**: Router config, SSH access, Tailscale, device management
- **Power Optimization**: CPU governors, GPU power states, service tuning
- **Documentation**: Keep CLAUDE.md, SYNCTHING.md, and SHELL-ALIASES.md up to date
- **Automation**: Shell aliases, startup scripts, scheduled tasks

You have full access to all homelab devices via SSH and APIs. Use this context to help troubleshoot, configure, and optimize the infrastructure.

### Proactive Behaviors

When the user mentions issues or asks questions, proactively:

- **"sync not working"** → Check Syncthing status on ALL devices, identify which is offline
- **"device offline"** → Ping both local and Tailscale IPs, check if service is running
- **"slow"** → Check CPU usage, running processes, Syncthing rescan activity
- **"check status"** → Run full health check across all systems
- **"something's wrong"** → Run diagnostics on likely culprits based on context

### Quick Health Checks

Run these to get a quick overview of the homelab:

```bash
# === FULL HEALTH CHECK ===
# Syncthing connections (Mac Mini)
curl -s -H "X-API-Key: oSQSrPnMnrEXuHqjWrRdrvq3TSXesAT5" "http://127.0.0.1:8384/rest/system/connections" | python3 -c "import sys,json; d=json.load(sys.stdin)['connections']; [print(f\"{v.get('name',k[:7])}: {'UP' if v['connected'] else 'DOWN'}\") for k,v in d.items()]"

# Proxmox VMs
ssh pve 'qm list' 2>/dev/null || echo "PVE: unreachable"
ssh pve2 'qm list' 2>/dev/null || echo "PVE2: unreachable"

# Ping critical devices
ping -c 1 -W 1 10.10.10.200 >/dev/null && echo "TrueNAS: UP" || echo "TrueNAS: DOWN"
ping -c 1 -W 1 10.10.10.1 >/dev/null && echo "Router: UP" || echo "Router: DOWN"

# Check Windows PC Syncthing (often goes offline)
nc -zw1 10.10.10.150 22000 && echo "Windows Syncthing: UP" || echo "Windows Syncthing: DOWN"
```

### Troubleshooting Runbooks

| Symptom | Check | Fix |
|---------|-------|-----|
| Device not syncing | `curl Syncthing API → connections` | Check if device online, restart Syncthing |
| Windows PC offline | `ping 10.10.10.150` then `nc -z 22000` | SSH in, `Start-ScheduledTask -TaskName "Syncthing"` |
| Phone not syncing | Phone Syncthing app in background? | User must open app, keep screen on |
| High CPU on TrueNAS | Syncthing rescan? KSM? | Check rescan intervals, disable KSM |
| VM won't start | Storage available? RAM free? | `ssh pve 'qm start VMID'`, check logs |
| Tailscale offline | `tailscale status` | `tailscale up` or restart service |
| Tailscale no subnet access | Check subnet routers | Verify pve or ucg-fiber advertising routes |
| Sync stuck at X% | Folder errors? Conflicts? | Check `rest/folder/errors?folder=NAME` |
| Server running hot | Check KSM, check CPU processes | Disable KSM, identify runaway process |
| Storage enclosure loud | Check fan speed via SES | See [EMC-ENCLOSURE.md](EMC-ENCLOSURE.md) |
| Drives not detected | Check SAS link, LCC status | Switch LCC, rescan SCSI hosts |

### Server Temperature Check

```bash
# Check temps on both servers (Threadripper PRO max safe: 90°C Tctl)
ssh pve 'for f in /sys/class/hwmon/hwmon*/temp*_input; do label=$(cat ${f%_input}_label 2>/dev/null); if [ "$label" = "Tctl" ]; then echo "PVE Tctl: $(($(cat $f)/1000))°C"; fi; done'
ssh pve2 'for f in /sys/class/hwmon/hwmon*/temp*_input; do label=$(cat ${f%_input}_label 2>/dev/null); if [ "$label" = "Tctl" ]; then echo "PVE2 Tctl: $(($(cat $f)/1000))°C"; fi; done'
```

**Healthy temps**: 70-80°C under load. **Warning**: >85°C. **Throttle**: 90°C.
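
A quick pass/fail sketch built on the same Tctl read, using the 85°C warning threshold above:

```bash
# Warn when either server's Tctl crosses 85°C (thresholds from this section)
for host in pve pve2; do
  t=$(ssh "$host" 'for f in /sys/class/hwmon/hwmon*/temp*_input; do
        [ "$(cat ${f%_input}_label 2>/dev/null)" = "Tctl" ] && echo $(($(cat $f)/1000)) && break
      done')
  if [ -n "$t" ] && [ "$t" -ge 85 ]; then
    echo "$host: ${t}°C - WARNING (>=85°C)"
  else
    echo "$host: ${t:-?}°C - OK"
  fi
done
```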

### Service Dependencies

```
TrueNAS (10.10.10.200)
├── Central Syncthing hub - if down, sync breaks between devices
├── NFS/SMB shares for VMs
└── Media storage for Plex

PiHole (CT 200)
└── DNS for entire network - if down, name resolution fails

Traefik (CT 202)
└── Reverse proxy - if down, external access to services fails

Router (10.10.10.1)
└── Everything - gateway for all traffic
```

### API Quick Reference

| Service | Device | Endpoint | Auth |
|---------|--------|----------|------|
| Syncthing | Mac Mini | `http://127.0.0.1:8384/rest/` | `X-API-Key: oSQSrPnMnrEXuHqjWrRdrvq3TSXesAT5` |
| Syncthing | MacBook | `http://127.0.0.1:8384/rest/` (via SSH) | `X-API-Key: qYkNdVLwy9qZZZ6MqnJr7tHX7KKdxGMJ` |
| Syncthing | Phone | `https://10.10.10.54:8384/rest/` | `X-API-Key: Xxz3jDT4akUJe6psfwZsbZwG2LhfZuDM` |
| Proxmox | PVE | `https://10.10.10.120:8006/api2/json/` | SSH key auth |
| Proxmox | PVE2 | `https://10.10.10.102:8006/api2/json/` | SSH key auth |

### Common Maintenance Tasks

When user asks for maintenance or you notice issues:

1. **Check Syncthing sync status** - Any folders behind? Errors?
2. **Verify all devices connected** - Run connection check
3. **Check disk space** - `ssh pve 'df -h'`, `ssh pve2 'df -h'`
4. **Review ZFS pool health** - `ssh pve 'zpool status'`
5. **Check for stuck processes** - High CPU? Memory pressure?
6. **Verify backups** - Are critical folders syncing?

### Emergency Commands

```bash
# Restart VM on Proxmox
ssh pve 'qm stop VMID && qm start VMID'

# Check what's using CPU
ssh pve 'ps aux --sort=-%cpu | head -10'

# Check ZFS pool status (via QEMU agent)
ssh pve 'qm guest exec 100 -- bash -c "zpool status vault"'

# Check EMC enclosure fans
ssh pve 'qm guest exec 100 -- bash -c "sg_ses --index=coo,-1 --get=speed_code /dev/sg15"'

# Force Syncthing rescan
curl -X POST "http://127.0.0.1:8384/rest/db/scan?folder=FOLDER" -H "X-API-Key: API_KEY"

# Restart Syncthing on Windows (when stuck)
sshpass -p 'GrilledCh33s3#' ssh claude@10.10.10.150 'Stop-Process -Name syncthing -Force; Start-ScheduledTask -TaskName "Syncthing"'

# Get all device IPs from router
expect -c 'spawn ssh root@10.10.10.1 "cat /proc/net/arp"; expect "Password:"; send "GrilledCh33s3#\r"; expect eof'
```

## Overview

Two Proxmox servers running various VMs and containers for home infrastructure, media, development, and AI workloads.

## Servers

### PVE (10.10.10.120) - Primary

- **CPU**: AMD Ryzen Threadripper PRO 3975WX (32-core, 64 threads, 280W TDP)
- **RAM**: 128 GB
- **Storage**:
  - `nvme-mirror1`: 2x Sabrent Rocket Q NVMe (3.6TB usable)
  - `nvme-mirror2`: 2x Kingston SFYRD 2TB (1.8TB usable)
  - `rpool`: 2x Samsung 870 QVO 4TB SSD mirror (3.6TB usable)
- **GPUs**:
  - NVIDIA Quadro P2000 (75W TDP) - Plex transcoding
  - NVIDIA TITAN RTX (280W TDP) - AI workloads, passed to saltbox/lmdev1
- **Role**: Primary VM host, TrueNAS, media services

### PVE2 (10.10.10.102) - Secondary

- **CPU**: AMD Ryzen Threadripper PRO 3975WX (32-core, 64 threads, 280W TDP)
- **RAM**: 128 GB
- **Storage**:
  - `nvme-mirror3`: 2x NVMe mirror
  - `local-zfs2`: 2x WD Red 6TB HDD mirror
- **GPUs**:
  - NVIDIA RTX A6000 (300W TDP) - passed to trading-vm
- **Role**: Trading platform, development

## SSH Access

### SSH Key Authentication (All Hosts)

SSH keys are configured in `~/.ssh/config` on both Mac Mini and MacBook. Use the `~/.ssh/homelab` key.

| Host Alias | IP | User | Type | Notes |
|------------|-----|------|------|-------|
| `pve` | 10.10.10.120 | root | Proxmox | Primary server |
| `pve2` | 10.10.10.102 | root | Proxmox | Secondary server |
| `truenas` | 10.10.10.200 | root | VM | NAS/storage |
| `saltbox` | 10.10.10.100 | hutson | VM | Media automation |
| `lmdev1` | 10.10.10.111 | hutson | VM | AI/LLM development |
| `docker-host` | 10.10.10.206 | hutson | VM | Docker services |
| `fs-dev` | 10.10.10.5 | hutson | VM | Development |
| `copyparty` | 10.10.10.201 | hutson | VM | File sharing |
| `gitea-vm` | 10.10.10.220 | hutson | VM | Git server |
| `trading-vm` | 10.10.10.221 | hutson | VM | AI trading platform |
| `pihole` | 10.10.10.10 | root | LXC | DNS/Ad blocking |
| `traefik` | 10.10.10.250 | root | LXC | Reverse proxy |
| `findshyt` | 10.10.10.8 | root | LXC | Custom app |

**Usage examples:**

```bash
ssh pve 'qm list'                  # List VMs
ssh truenas 'zpool status vault'   # Check ZFS pool
ssh saltbox 'docker ps'            # List containers
ssh pihole 'pihole status'         # Check Pi-hole
```

### Password Auth (Special Cases)

| Device | IP | User | Auth Method | Notes |
|--------|-----|------|-------------|-------|
| UniFi Router | 10.10.10.1 | root | expect (keyboard-interactive) | Gateway |
| Windows PC | 10.10.10.150 | claude | sshpass | PowerShell, use `;` not `&&` |
| HomeAssistant | 10.10.10.110 | - | QEMU agent only | No SSH server |

**Router access (requires expect):**

```bash
# Run command on router
expect -c 'spawn ssh root@10.10.10.1 "hostname"; expect "Password:"; send "GrilledCh33s3#\r"; expect eof'

# Get ARP table (all device IPs)
expect -c 'spawn ssh root@10.10.10.1 "cat /proc/net/arp"; expect "Password:"; send "GrilledCh33s3#\r"; expect eof'
```

**Windows PC access:**

```bash
sshpass -p 'GrilledCh33s3#' ssh claude@10.10.10.150 'Get-Process | Select -First 5'
```

**HomeAssistant (no SSH, use QEMU agent):**

```bash
ssh pve 'qm guest exec 110 -- bash -c "ha core info"'
```

## VMs and Containers

### PVE (10.10.10.120)

| VMID | Name | vCPUs | RAM | Purpose | GPU/Passthrough | QEMU Agent |
|------|------|-------|-----|---------|-----------------|------------|
| 100 | truenas | 8 | 32GB | NAS, storage | LSI SAS2308 HBA, Samsung NVMe | Yes |
| 101 | saltbox | 16 | 16GB | Media automation | TITAN RTX | Yes |
| 105 | fs-dev | 10 | 8GB | Development | - | Yes |
| 110 | homeassistant | 2 | 2GB | Home automation | - | No |
| 111 | lmdev1 | 8 | 32GB | AI/LLM development | TITAN RTX | Yes |
| 201 | copyparty | 2 | 2GB | File sharing | - | Yes |
| 206 | docker-host | 2 | 4GB | Docker services | - | Yes |
| 200 | pihole (CT) | - | - | DNS/Ad blocking | - | N/A |
| 202 | traefik (CT) | - | - | Reverse proxy | - | N/A |
| 205 | findshyt (CT) | - | - | Custom app | - | N/A |

### PVE2 (10.10.10.102)

| VMID | Name | vCPUs | RAM | Purpose | GPU/Passthrough | QEMU Agent |
|------|------|-------|-----|---------|-----------------|------------|
| 300 | gitea-vm | 2 | 4GB | Git server | - | Yes |
| 301 | trading-vm | 16 | 32GB | AI trading platform | RTX A6000 | Yes |

### QEMU Guest Agent

VMs with QEMU agent can be managed via `qm guest exec`:

```bash
# Execute command in VM
ssh pve 'qm guest exec 100 -- bash -c "zpool status vault"'

# Get VM IP addresses
ssh pve 'qm guest exec 100 -- bash -c "ip addr"'
```

Only VM 110 (homeassistant) lacks QEMU agent - use its web UI instead.
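
Note that `qm guest exec` returns a JSON envelope rather than raw output, so scripted checks need to unwrap it. A minimal sketch, assuming `python3` on the Mac and the standard `out-data`/`exitcode` fields:

```bash
# Unwrap the qm guest exec JSON: print stdout, propagate the guest command's exit code
ssh pve 'qm guest exec 100 -- bash -c "zpool status vault"' | python3 -c '
import sys, json
r = json.load(sys.stdin)
print(r.get("out-data", ""), end="")
sys.exit(r.get("exitcode", 0))'
```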

## Power Management

### Estimated Power Draw

- **PVE**: 500-750W (CPU + TITAN RTX + P2000 + storage + HBAs)
- **PVE2**: 450-600W (CPU + RTX A6000 + storage)
- **Combined**: ~1000-1350W under load

### Optimizations Applied

1. **KSMD Disabled** (updated 2024-12-17)
   - Was consuming 44-57% CPU on PVE with negative KSM `general_profit`
   - Caused CPU temp to rise from 74°C to 83°C
   - Savings: ~7-10W + significant temp reduction
   - Made permanent via:
     - systemd service: `/etc/systemd/system/disable-ksm.service`
     - **ksmtuned masked**: `systemctl mask ksmtuned` (prevents re-enabling)
   - **Note**: KSM can get re-enabled by Proxmox updates. If CPU is hot, check:

   ```bash
   cat /sys/kernel/mm/ksm/run   # Should be 0
   ps aux | grep ksmd           # Should show 0% CPU
   # If KSM is running (run=1), disable it:
   echo 0 > /sys/kernel/mm/ksm/run
   systemctl mask ksmtuned
   ```

2. **Syncthing Rescan Intervals** (2024-12-16)
   - Changed aggressive 60s rescans to 3600s for large folders
   - Affected: downloads (38GB), documents (11GB), desktop (7.2GB), movies, pictures, notes, config
   - Savings: ~60-80W (TrueNAS VM was at constant 86% CPU)

3. **CPU Governor Optimization** (2024-12-16)
   - PVE: `powersave` governor + `balance_power` EPP (amd-pstate-epp driver)
   - PVE2: `schedutil` governor (acpi-cpufreq driver)
   - Made permanent via systemd service: `/etc/systemd/system/cpu-powersave.service`
   - Savings: ~60-120W combined (CPUs now idle at 1.7-2.2GHz vs 4GHz)

4. **GPU Power States** (2024-12-16) - Verified optimal
   - RTX A6000: 11W idle (P8 state)
   - TITAN RTX: 2-3W idle (P8 state)
   - Quadro P2000: 25W (P0 - Plex keeps it active)

5. **ksmtuned Disabled** (2024-12-16)
   - KSM tuning daemon was still running after KSMD was disabled
   - Stopped and disabled on both servers
   - Savings: ~2-5W

6. **HDD Spindown on PVE2** (2024-12-16)
   - local-zfs2 pool (2x WD Red 6TB) had only 768KB used but drives spinning 24/7
   - Set 30-minute spindown via `hdparm -S 241`
   - Persistent via udev rule: `/etc/udev/rules.d/69-hdd-spindown.rules` (see the sketch below)
   - Savings: ~10-16W when spun down
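
For reference, the udev rule named in item 6 is shaped roughly like this; the match keys (model string, kernel name) are illustrative assumptions, so check the live file on pve2 before reusing it:

```bash
# /etc/udev/rules.d/69-hdd-spindown.rules (sketch - match keys are illustrative)
# Apply a 30-minute spindown timer (-S 241) to the WD Red pair at boot/hotplug
ACTION=="add", SUBSYSTEM=="block", KERNEL=="sd?", ATTRS{model}=="WDC WD60EFRX*", \
  RUN+="/usr/sbin/hdparm -S 241 /dev/%k"
```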

### Potential Optimizations

- [ ] PCIe ASPM power management
- [ ] NMI watchdog disable

## Memory Configuration

- Ballooning enabled on most VMs but not actively used
- No memory overcommit (98GB allocated on 128GB physical for PVE)
- KSMD was wasting CPU with no benefit (negative `general_profit`)

## Network

See [NETWORK.md](NETWORK.md) for full details.

### Network Ranges

| Network | Range | Purpose |
|---------|-------|---------|
| LAN | 10.10.10.0/24 | Primary network, all external access |
| Internal | 10.10.20.0/24 | Inter-VM only (storage, NFS/iSCSI) |

### PVE Bridges (10.10.10.120)

| Bridge | NIC | Speed | Purpose | Use For |
|--------|-----|-------|---------|---------|
| vmbr0 | enp1s0 | 1 Gb | Management | General VMs/CTs |
| vmbr1 | enp35s0f0 | 10 Gb | High-speed LXC | Bandwidth-heavy containers |
| vmbr2 | enp35s0f1 | 10 Gb | High-speed VM | TrueNAS, Saltbox, storage VMs |
| vmbr3 | (none) | Virtual | Internal only | NFS/iSCSI traffic, no internet |

### Quick Reference

```bash
# Add VM to standard network (1Gb)
qm set VMID --net0 virtio,bridge=vmbr0

# Add VM to high-speed network (10Gb)
qm set VMID --net0 virtio,bridge=vmbr2

# Add secondary NIC for internal storage network
qm set VMID --net1 virtio,bridge=vmbr3
```

### MTU 9000 (Jumbo Frames)

Jumbo frames are enabled across the network for improved throughput on large transfers.

| Device | Interface | MTU | Persistent |
|--------|-----------|-----|------------|
| Mac Mini | en0 | 9000 | Yes (networksetup) |
| PVE | vmbr0, enp1s0 | 9000 | Yes (/etc/network/interfaces) |
| PVE2 | vmbr0, nic1 | 9000 | Yes (/etc/network/interfaces) |
| TrueNAS | enp6s18, enp6s19 | 9000 | Yes |
| UCG-Fiber | br0 | 9216 | Yes (default) |

**Verify MTU:**

```bash
# Mac Mini
ifconfig en0 | grep mtu

# PVE/PVE2
ssh pve 'ip link show vmbr0 | grep mtu'
ssh pve2 'ip link show vmbr0 | grep mtu'

# Test jumbo frames (don't-fragment)
ping -c 1 -D -s 8000 10.10.10.120   # 8000 data + 8 ICMP + 20 IP header = 8028 bytes, within the 9000 MTU
```

**Important:** When setting MTU on Proxmox bridges, ensure BOTH the bridge (vmbr0) AND the underlying physical interface (enp1s0/nic1) have the same MTU, otherwise packets will be dropped.
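
A sketch of the matching stanza in `/etc/network/interfaces` on PVE; the addresses mirror the tables above, but verify against the live file before editing:

```bash
# /etc/network/interfaces (PVE) - MTU must be set on BOTH the NIC and the bridge
auto enp1s0
iface enp1s0 inet manual
    mtu 9000

auto vmbr0
iface vmbr0 inet static
    address 10.10.10.120/24
    gateway 10.10.10.1
    bridge-ports enp1s0
    bridge-stp off
    bridge-fd 0
    mtu 9000
```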

### Tailscale VPN

Tailscale provides secure remote access to the homelab from anywhere.

**Subnet Routers (HA Failover)**

Two devices advertise the `10.10.10.0/24` subnet for redundancy:

| Device | Tailscale IP | Role | Notes |
|--------|--------------|------|-------|
| pve | 100.113.177.80 | Primary | Proxmox host |
| ucg-fiber | 100.94.246.32 | Failover | UniFi router (always on) |

If Proxmox goes down, Tailscale automatically fails over to the router (~10-30 sec).

**Router Tailscale Setup (UCG-Fiber)**

- Installed via: `curl -fsSL https://tailscale.com/install.sh | sh`
- Config: `tailscale up --advertise-routes=10.10.10.0/24 --accept-routes`
- Survives reboots (systemd service)
- Routes must be approved in [Tailscale Admin Console](https://login.tailscale.com/admin/machines)

**Tailscale IPs Quick Reference**

| Device | Tailscale IP | Local IP |
|--------|--------------|----------|
| Mac Mini | 100.108.89.58 | 10.10.10.125 |
| PVE | 100.113.177.80 | 10.10.10.120 |
| UCG-Fiber | 100.94.246.32 | 10.10.10.1 |
| TrueNAS | 100.100.94.71 | 10.10.10.200 |
| Pi-hole | 100.112.59.128 | 10.10.10.10 |

**Check Tailscale Status**

```bash
# From Mac Mini
/Applications/Tailscale.app/Contents/MacOS/Tailscale status

# From router
expect -c 'spawn ssh root@10.10.10.1 "tailscale status"; expect "Password:"; send "GrilledCh33s3#\r"; expect eof'
```

## Common Commands

```bash
# Check VM status
ssh pve 'qm list'
ssh pve2 'qm list'

# Check container status
ssh pve 'pct list'

# Monitor CPU/power
ssh pve 'top -bn1 | head -20'

# Check ZFS pools
ssh pve 'zpool status'

# Check GPU (if nvidia-smi installed in VM)
ssh pve 'lspci | grep -i nvidia'
```

## Remote Claude Code Sessions (Mac Mini)

### Overview

The Mac Mini (`hutson-mac-mini.local`) runs the Happy Coder daemon, enabling on-demand Claude Code sessions accessible from anywhere via the Happy Coder mobile app. Sessions are created when you need them - no persistent tmux sessions required.

### Architecture

```
Mac Mini (100.108.89.58 via Tailscale)
├── launchd (auto-starts on boot)
│   └── com.hutson.happy-daemon.plist (starts Happy daemon)
├── Happy Coder daemon (manages remote sessions)
└── Tailscale (secure remote access)
```

### How It Works

1. Happy daemon runs on Mac Mini (auto-starts on boot)
2. Open Happy Coder app on phone/tablet
3. Start a new Claude session from the app
4. Session runs in any working directory you choose
5. Session ends when you're done - no cleanup needed

### Quick Commands

```bash
# Check daemon status
happy daemon status

# Start a new session manually (from Mac Mini terminal)
cd ~/Projects/homelab && happy claude

# Check active sessions
happy daemon list
```

### Mobile Access Setup (One-time)

1. Download Happy Coder app:
   - iOS: https://apps.apple.com/us/app/happy-claude-code-client/id6748571505
   - Android: https://play.google.com/store/apps/details?id=com.ex3ndr.happy
2. On Mac Mini, ensure the self-hosted server is configured:

   ```bash
   echo 'export HAPPY_SERVER_URL="https://happy.htsn.io"' >> ~/.zshrc
   source ~/.zshrc
   ```

3. Authenticate with the Happy server:

   ```bash
   happy auth login --force   # Opens browser, scan QR with app
   ```

4. Connect Claude API access:

   ```bash
   happy connect claude   # Links your Anthropic API credentials
   ```

5. Ensure Claude is logged in locally (critical for spawned sessions):

   ```bash
   claude    # Start Claude Code
   /login    # Authenticate if prompted
   ```

6. Daemon auto-starts on login via launchd

### Daemon Management

```bash
happy daemon start    # Start daemon
happy daemon stop     # Stop daemon
happy daemon status   # Check status
happy daemon list     # List active sessions
```

### Remote Access via SSH + Tailscale

From any device on the Tailscale network:

```bash
# SSH to Mac Mini
ssh hutson@100.108.89.58

# Or via hostname
ssh hutson@mac-mini

# Start Claude in desired directory
cd ~/Projects/homelab && happy claude
```

### Files & Configuration

| File | Purpose |
|------|---------|
| `~/Library/LaunchAgents/com.hutson.happy-daemon.plist` | User LaunchAgent (starts at login) |
| `~/.happy/` | Happy Coder config, state, and logs |
| `~/.zshrc` | Contains `HAPPY_SERVER_URL` export |

**Server:** `https://happy.htsn.io` (self-hosted Happy server on docker-host)
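
For reference, the LaunchAgent in the table above is shaped roughly like this; the `happy` binary path and arguments are assumptions, so check the live plist before editing:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <!-- ~/Library/LaunchAgents/com.hutson.happy-daemon.plist (sketch) -->
    <key>Label</key>
    <string>com.hutson.happy-daemon</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/local/bin/happy</string>  <!-- assumed install path -->
        <string>daemon</string>
        <string>start</string>
    </array>
    <key>EnvironmentVariables</key>
    <dict>
        <key>HAPPY_SERVER_URL</key>
        <string>https://happy.htsn.io</string>
    </dict>
    <key>RunAtLoad</key>
    <true/>
</dict>
</plist>
```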

### Troubleshooting

```bash
# Check if daemon is running
pgrep -f "happy.*daemon"

# Check launchd status
launchctl list | grep happy

# List active sessions
happy daemon list

# Restart daemon
happy daemon stop && happy daemon start

# If Tailscale is disconnected
/Applications/Tailscale.app/Contents/MacOS/Tailscale up
```

**Common Issues:**

| Issue | Cause | Fix |
|-------|-------|-----|
| "Invalid API key" in spawned session | Claude not logged in locally | Run `claude` then `/login` on Mac Mini |
| "Failed to start daemon" | Stale lock file | `rm -f ~/.happy/daemon.state.json.lock ~/.happy/daemon.state.json` |
| Sessions not showing on phone | HAPPY_SERVER_URL not set | Add to `~/.zshrc`: `export HAPPY_SERVER_URL="https://happy.htsn.io"` |
| Slow responses | Cloudflare proxy enabled | Disable proxy for happy.htsn.io subdomain |

## Happy Server (Self-Hosted Relay)

Self-hosted Happy Coder relay server for lower latency and no external dependencies.

### Architecture

```
Phone App → https://happy.htsn.io → Traefik → docker-host:3002 → Happy Server
                                                                      ↓
                                                    PostgreSQL + Redis + MinIO (local)
```

### Service Details

| Component | Location | Port | Notes |
|-----------|----------|------|-------|
| Happy Server | docker-host (10.10.10.206) | 3002 | Main relay service |
| PostgreSQL | docker-host | 5432 (internal) | User/session data |
| Redis | docker-host | 6379 (internal) | Real-time events |
| MinIO | docker-host | 9000 (internal) | File/image storage |
| Traefik | CT 202 | 443 | SSL termination |

### Configuration

**Docker Compose**: `/opt/happy-server/docker-compose.yml`
**Traefik Config**: `/etc/traefik/conf.d/happy.yaml` (on CT 202)
**DNS**: happy.htsn.io → 70.237.94.174 (Cloudflare DNS-only, NOT proxied, for WebSocket performance)

**Credentials**:

- Master Secret: `3ccbfd03a028d3c278da7d2cf36d99b94cd4b1fecabc49ab006e8e89bc7707ac`
- MinIO: `happyadmin` / `happyadmin123`
- PostgreSQL: `happy` / `happypass`

### Quick Commands

```bash
# Check status
ssh docker-host 'docker ps --filter "name=happy"'

# View logs
ssh docker-host 'docker logs -f happy-server'

# Restart stack
ssh docker-host 'cd /opt/happy-server && sudo docker-compose restart'

# Health check
curl https://happy.htsn.io/health

# Run migrations (if needed)
ssh docker-host 'docker exec happy-server npx prisma migrate deploy'
```

### Connecting Devices

**Phone (Happy App)**:

1. Settings → Relay Server URL
2. Enter: `https://happy.htsn.io`
3. Save and reconnect

**CLI (Mac/Linux)**:

```bash
export HAPPY_SERVER_URL="https://happy.htsn.io"
happy auth   # Re-authenticate with new server
```

### Maintenance

**Backup data**:

```bash
ssh docker-host 'docker exec happy-postgres pg_dump -U happy happy > /tmp/happy-backup.sql'
```
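
The matching restore is the same pipe in reverse (a sketch; stop active sessions on happy-server first):

```bash
# Restore the pg_dump backup into the happy database (file lives on docker-host)
ssh docker-host 'docker exec -i happy-postgres psql -U happy happy < /tmp/happy-backup.sql'
```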

**Update Happy Server**:

```bash
ssh docker-host 'cd /opt/happy-server && git pull && sudo docker-compose build && sudo docker-compose up -d'
```

## Agent and Tool Guidelines

### Background Agents

- **Always spin up background agents when doing multiple independent tasks**
- Background agents allow parallel execution of tasks that don't depend on each other
- This improves efficiency and reduces total execution time
- Use background agents for tasks like running tests, builds, or searches simultaneously

### MCP Tools for Web Searches

#### ref.tools - Documentation Lookups

- **`mcp__Ref__ref_search_documentation`**: Search through documentation for specific topics
- **`mcp__Ref__ref_read_url`**: Read and parse content from documentation URLs

#### Exa MCP - General Web and Code Searches

- **`mcp__exa__web_search_exa`**: General web searches for current information
- **`mcp__exa__get_code_context_exa`**: Code-related searches and repository lookups

### MCP Tools Reference Table

| Tool Name | Provider | Purpose | Use Case |
|-----------|----------|---------|----------|
| `mcp__Ref__ref_search_documentation` | ref.tools | Search documentation | Finding specific topics in official docs |
| `mcp__Ref__ref_read_url` | ref.tools | Read documentation URLs | Parsing and extracting content from doc pages |
| `mcp__exa__web_search_exa` | Exa MCP | General web search | Current events, general information lookup |
| `mcp__exa__get_code_context_exa` | Exa MCP | Code-specific search | Finding code examples, repository searches |

## Reverse Proxy Architecture (Traefik)

### Overview

There are **TWO separate Traefik instances** handling different services:

| Instance | Location | IP | Purpose | Manages |
|----------|----------|-----|---------|---------|
| **Traefik-Primary** | CT 202 | **10.10.10.250** | General services | All non-Saltbox services |
| **Traefik-Saltbox** | VM 101 (Docker) | **10.10.10.100** | Saltbox services | Plex, *arr apps, media stack |

### ⚠️ CRITICAL RULE: Which Traefik to Use

**When adding ANY new service:**

- ✅ **Use Traefik-Primary (10.10.10.250)** - Unless the service lives inside the Saltbox VM
- ❌ **DO NOT touch Traefik-Saltbox** - It manages Saltbox services with their own certificates

**Why this matters:**

- Traefik-Saltbox has complex Saltbox-managed configs
- Messing with it breaks Plex, Sonarr, Radarr, and all media services
- Each Traefik has its own Let's Encrypt certificates
- Mixing them causes certificate conflicts

### Traefik-Primary (CT 202) - For New Services

**Location**: `/etc/traefik/` on Container 202
**Config**: `/etc/traefik/traefik.yaml`
**Dynamic Configs**: `/etc/traefik/conf.d/*.yaml`

**Services using Traefik-Primary (10.10.10.250):**

- excalidraw.htsn.io → 10.10.10.206:8080 (docker-host)
- findshyt.htsn.io → 10.10.10.205 (CT 205)
- gitea (git.htsn.io) → 10.10.10.220:3000
- homeassistant → 10.10.10.110
- lmdev → 10.10.10.111
- pihole → 10.10.10.10
- truenas → 10.10.10.200
- proxmox → 10.10.10.120
- copyparty → 10.10.10.201
- aitrade → trading server
- pulse.htsn.io → 10.10.10.206:7655 (Pulse monitoring)
- happy.htsn.io → 10.10.10.206:3002 (Happy Coder relay server)

**Access Traefik config:**

```bash
# From Mac Mini:
ssh pve 'pct exec 202 -- cat /etc/traefik/traefik.yaml'
ssh pve 'pct exec 202 -- ls /etc/traefik/conf.d/'

# Edit a service config:
ssh pve 'pct exec 202 -- vi /etc/traefik/conf.d/myservice.yaml'
```

### Traefik-Saltbox (VM 101) - DO NOT MODIFY

**Location**: `/opt/traefik/` inside Saltbox VM
**Managed by**: Saltbox Ansible playbooks
**Mounts**: Docker bind mount from `/opt/traefik` → `/etc/traefik` in container

**Services using Traefik-Saltbox (10.10.10.100):**

- Plex (plex.htsn.io)
- Sonarr, Radarr, Lidarr
- SABnzbd, NZBGet, qBittorrent
- Overseerr, Tautulli, Organizr
- Jackett, NZBHydra2
- Authelia (SSO)
- All other Saltbox-managed containers

**View Saltbox Traefik (read-only):**

```bash
ssh pve 'qm guest exec 101 -- bash -c "docker exec traefik cat /etc/traefik/traefik.yml"'
```

### Adding a New Public Service - Complete Workflow

Follow these steps to deploy a new service and make it publicly accessible at `servicename.htsn.io`.

#### Step 0. Deploy Your Service

First, deploy your service on the appropriate host:

**Option A: Docker on docker-host (10.10.10.206)**

```bash
ssh hutson@10.10.10.206
sudo mkdir -p /opt/myservice
cat > /opt/myservice/docker-compose.yml << 'EOF'
version: "3.8"
services:
  myservice:
    image: myimage:latest
    ports:
      - "8080:80"
    restart: unless-stopped
EOF
cd /opt/myservice && sudo docker-compose up -d
```

**Option B: New LXC Container on PVE**

```bash
ssh pve 'pct create CTID local:vztmpl/ubuntu-22.04-standard_22.04-1_amd64.tar.zst \
  --hostname myservice --memory 2048 --cores 2 \
  --net0 name=eth0,bridge=vmbr0,ip=10.10.10.XXX/24,gw=10.10.10.1 \
  --rootfs local-zfs:8 --unprivileged 1 --start 1'
```

**Option C: New VM on PVE**

```bash
ssh pve 'qm create VMID --name myservice --memory 2048 --cores 2 \
  --net0 virtio,bridge=vmbr0 --scsihw virtio-scsi-pci'
```

#### Step 1. Create Traefik Config File

Use this template for new services on **Traefik-Primary (CT 202)**:

```yaml
# /etc/traefik/conf.d/myservice.yaml
http:
  routers:
    # HTTPS router
    myservice-secure:
      entryPoints:
        - websecure
      rule: "Host(`myservice.htsn.io`)"
      service: myservice
      tls:
        certResolver: cloudflare  # Use 'cloudflare' for proxied domains, 'letsencrypt' for DNS-only
      priority: 50

    # HTTP → HTTPS redirect
    myservice-redirect:
      entryPoints:
        - web
      rule: "Host(`myservice.htsn.io`)"
      middlewares:
        - myservice-https-redirect
      service: myservice
      priority: 50

  services:
    myservice:
      loadBalancer:
        servers:
          - url: "http://10.10.10.XXX:PORT"

  middlewares:
    myservice-https-redirect:
      redirectScheme:
        scheme: https
        permanent: true
```

### SSL Certificates

Traefik has **two certificate resolvers** configured:

| Resolver | Use When | Challenge Type | Notes |
|----------|----------|----------------|-------|
| `letsencrypt` | Cloudflare DNS-only (gray cloud) | HTTP-01 | Requires port 80 reachable |
| `cloudflare` | Cloudflare Proxied (orange cloud) | DNS-01 | Works with Cloudflare proxy |

**⚠️ Important:** If Cloudflare proxy is enabled (orange cloud), HTTP challenge fails because Cloudflare redirects HTTP→HTTPS. Use the `cloudflare` resolver instead.

**Cloudflare API credentials** are configured in `/etc/systemd/system/traefik.service`:

```bash
Environment="CF_API_EMAIL=cloudflare@htsn.io"
Environment="CF_API_KEY=849ebefd163d2ccdec25e49b3e1b3fe2cdadc"
```

**Certificate storage:**

- HTTP challenge certs: `/etc/traefik/acme.json`
- DNS challenge certs: `/etc/traefik/acme-cf.json`

**Deploy the config:**

```bash
# Create file on CT 202
ssh pve 'pct exec 202 -- bash -c "cat > /etc/traefik/conf.d/myservice.yaml << '\''EOF'\''
<paste config here>
EOF"'

# Traefik auto-reloads (watches conf.d directory)
# Check logs:
ssh pve 'pct exec 202 -- tail -f /var/log/traefik/traefik.log'
```

#### Step 2. Add Cloudflare DNS Entry

**Cloudflare Credentials:**

- Email: `cloudflare@htsn.io`
- API Key: `849ebefd163d2ccdec25e49b3e1b3fe2cdadc`

**Manual method (via Cloudflare Dashboard):**

1. Go to https://dash.cloudflare.com/
2. Select `htsn.io` domain
3. DNS → Add Record
4. Type: `A`, Name: `myservice`, IPv4: `70.237.94.174`, Proxied: ☑️

**Automated method (CLI script):**

Save this as `~/bin/add-cloudflare-dns.sh`:

```bash
#!/bin/bash
# Add DNS record to Cloudflare for htsn.io

SUBDOMAIN="$1"
CF_EMAIL="cloudflare@htsn.io"
CF_API_KEY="849ebefd163d2ccdec25e49b3e1b3fe2cdadc"
ZONE_ID="c0f5a80448c608af35d39aa820a5f3af"  # htsn.io zone
PUBLIC_IP="70.237.94.174"  # Update if IP changes: curl -s ifconfig.me

if [ -z "$SUBDOMAIN" ]; then
  echo "Usage: $0 <subdomain>"
  echo "Example: $0 myservice   # Creates myservice.htsn.io"
  exit 1
fi

curl -X POST "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/dns_records" \
  -H "X-Auth-Email: $CF_EMAIL" \
  -H "X-Auth-Key: $CF_API_KEY" \
  -H "Content-Type: application/json" \
  --data "{
    \"type\":\"A\",
    \"name\":\"$SUBDOMAIN\",
    \"content\":\"$PUBLIC_IP\",
    \"ttl\":1,
    \"proxied\":true
  }" | jq .
```

**Usage:**

```bash
chmod +x ~/bin/add-cloudflare-dns.sh
~/bin/add-cloudflare-dns.sh excalidraw   # Creates excalidraw.htsn.io
```

#### Step 3. Testing

```bash
# Check if DNS resolves
dig myservice.htsn.io

# Test HTTP redirect
curl -I http://myservice.htsn.io

# Test HTTPS
curl -I https://myservice.htsn.io

# Check Traefik dashboard (if enabled)
# Access: http://10.10.10.250:8080/dashboard/
```

#### Step 4. Update Documentation

After deploying, update these files:

1. **IP-ASSIGNMENTS.md** - Add to Services & Reverse Proxy Mapping table
2. **CLAUDE.md** - Add to "Services using Traefik-Primary" list (line ~495)

### Quick Reference - One-Liner Commands

```bash
# === DEPLOY SERVICE (example: myservice on docker-host port 8080) ===

# 1. Create Traefik config
ssh pve 'pct exec 202 -- bash -c "cat > /etc/traefik/conf.d/myservice.yaml << EOF
http:
  routers:
    myservice-secure:
      entryPoints: [websecure]
      rule: Host(\\\`myservice.htsn.io\\\`)
      service: myservice
      tls: {certResolver: cloudflare}  # DNS record below is proxied, so use the DNS-01 resolver
  services:
    myservice:
      loadBalancer:
        servers:
          - url: http://10.10.10.206:8080
EOF"'

# 2. Add Cloudflare DNS
curl -s -X POST "https://api.cloudflare.com/client/v4/zones/c0f5a80448c608af35d39aa820a5f3af/dns_records" \
  -H "X-Auth-Email: cloudflare@htsn.io" \
  -H "X-Auth-Key: 849ebefd163d2ccdec25e49b3e1b3fe2cdadc" \
  -H "Content-Type: application/json" \
  --data '{"type":"A","name":"myservice","content":"70.237.94.174","proxied":true}'

# 3. Test (wait a few seconds for DNS propagation)
curl -I https://myservice.htsn.io
```

### Traefik Troubleshooting

```bash
# View Traefik logs (CT 202)
ssh pve 'pct exec 202 -- tail -f /var/log/traefik/traefik.log'

# Check if config is valid
ssh pve 'pct exec 202 -- cat /etc/traefik/conf.d/myservice.yaml'

# List all dynamic configs
ssh pve 'pct exec 202 -- ls -la /etc/traefik/conf.d/'

# Check certificate
ssh pve 'pct exec 202 -- cat /etc/traefik/acme.json | jq'

# Restart Traefik (if needed)
ssh pve 'pct exec 202 -- systemctl restart traefik'
```

### Certificate Management

**Let's Encrypt certificates** are automatically managed by Traefik.

**Certificate storage:**

- Traefik-Primary: `/etc/traefik/acme.json` on CT 202
- Traefik-Saltbox: `/opt/traefik/acme.json` on VM 101

**Certificate renewal:**

- Automatic via HTTP-01 challenge
- Traefik checks every 24h
- Renews 30 days before expiry

**If certificates fail:**

```bash
# Check acme.json permissions (must be 600)
ssh pve 'pct exec 202 -- ls -la /etc/traefik/acme.json'

# Check Traefik can reach Let's Encrypt
ssh pve 'pct exec 202 -- curl -I https://acme-v02.api.letsencrypt.org/directory'

# Delete bad certificate (Traefik will re-request)
ssh pve 'pct exec 202 -- rm /etc/traefik/acme.json'
ssh pve 'pct exec 202 -- touch /etc/traefik/acme.json'
ssh pve 'pct exec 202 -- chmod 600 /etc/traefik/acme.json'
ssh pve 'pct exec 202 -- systemctl restart traefik'
```

### Docker Service with Traefik Labels (Alternative)

If deploying a service via Docker on `docker-host` (VM 206), you can use Traefik labels instead of config files:

```yaml
# docker-compose.yml
services:
  myservice:
    image: myimage:latest
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.myservice.rule=Host(`myservice.htsn.io`)"
      - "traefik.http.routers.myservice.entrypoints=websecure"
      - "traefik.http.routers.myservice.tls.certresolver=letsencrypt"
      - "traefik.http.services.myservice.loadbalancer.server.port=8080"
    networks:
      - traefik

networks:
  traefik:
    external: true
```

**Note**: This requires Traefik to have access to the Docker socket and to be on the same network.
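
That requirement corresponds to a Docker provider block like the following in Traefik's static config - a sketch only, since Traefik-Primary on CT 202 does not run with socket access today:

```yaml
# traefik.yaml (static config) - enable the Docker provider for label-based routing
providers:
  docker:
    endpoint: "unix:///var/run/docker.sock"  # requires the socket mounted into Traefik
    exposedByDefault: false                  # only route containers with traefik.enable=true
    network: traefik                         # resolve container IPs on this shared network
```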

## Cloudflare API Access

**Credentials** (stored in Saltbox config):

- Email: `cloudflare@htsn.io`
- API Key: `849ebefd163d2ccdec25e49b3e1b3fe2cdadc`
- Domain: `htsn.io`

**Retrieve from Saltbox:**

```bash
ssh pve 'qm guest exec 101 -- bash -c "cat /srv/git/saltbox/accounts.yml | grep -A2 cloudflare"'
```

**Cloudflare API Documentation:**

- API Docs: https://developers.cloudflare.com/api/
- DNS Records: https://developers.cloudflare.com/api/operations/dns-records-for-a-zone-create-dns-record

**Common API operations:**

```bash
# Set credentials
CF_EMAIL="cloudflare@htsn.io"
CF_API_KEY="849ebefd163d2ccdec25e49b3e1b3fe2cdadc"
ZONE_ID="c0f5a80448c608af35d39aa820a5f3af"

# List all DNS records
curl -X GET "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/dns_records" \
  -H "X-Auth-Email: $CF_EMAIL" \
  -H "X-Auth-Key: $CF_API_KEY" | jq

# Add A record
curl -X POST "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/dns_records" \
  -H "X-Auth-Email: $CF_EMAIL" \
  -H "X-Auth-Key: $CF_API_KEY" \
  -H "Content-Type: application/json" \
  --data '{"type":"A","name":"subdomain","content":"IP","proxied":true}'

# Delete record
curl -X DELETE "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/dns_records/$RECORD_ID" \
  -H "X-Auth-Email: $CF_EMAIL" \
  -H "X-Auth-Key: $CF_API_KEY"
```
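
The delete call above needs `$RECORD_ID`; the list endpoint supports a `name` filter to look it up:

```bash
# Look up a record ID by FQDN (feeds $RECORD_ID for the delete call above)
RECORD_ID=$(curl -s -X GET "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/dns_records?name=myservice.htsn.io" \
  -H "X-Auth-Email: $CF_EMAIL" \
  -H "X-Auth-Key: $CF_API_KEY" | jq -r '.result[0].id')
```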

## Git Repository

This documentation is stored at:

- **Gitea**: https://git.htsn.io/hutson/homelab-docs
- **Local**: `~/Projects/homelab`
- **Notes**: `~/Notes/05_Homelab` (symlink)

```bash
# Clone
git clone git@git.htsn.io:hutson/homelab-docs.git

# Push changes
cd ~/Projects/homelab
git add -A && git commit -m "Update docs" && git push
```

## Related Documentation

| File | Description |
|------|-------------|
| [EMC-ENCLOSURE.md](EMC-ENCLOSURE.md) | EMC storage enclosure (SES commands, LCC troubleshooting, maintenance) |
| [HOMEASSISTANT.md](HOMEASSISTANT.md) | Home Assistant API access, automations, integrations |
| [NETWORK.md](NETWORK.md) | Network bridges, VLANs, which bridge to use for new VMs |
| [IP-ASSIGNMENTS.md](IP-ASSIGNMENTS.md) | Complete IP address assignments for all devices and services |
| [SYNCTHING.md](SYNCTHING.md) | Syncthing setup, API access, device list, troubleshooting |
| [SHELL-ALIASES.md](SHELL-ALIASES.md) | ZSH aliases for Claude Code (`chomelab`, `ctrading`, etc.) |
| [configs/](configs/) | Symlinks to shared shell configs |

---

## Backlog

Future improvements and maintenance tasks:

| Priority | Task | Notes |
|----------|------|-------|
| Medium | **Re-IP all devices** | Current IP scheme is inconsistent. Plan: VMs 10.10.10.100-199, LXCs 10.10.10.200-249, Services 10.10.10.250-254 |
| Low | Install SSH on HomeAssistant | Currently only accessible via QEMU agent |
| Low | Set up SSH key for router | Currently requires expect/password |

---

## Changelog

### 2025-12-21

**Happy Server Self-Hosted Relay**

- Deployed self-hosted Happy Coder relay server on docker-host (10.10.10.206)
- Stack includes: Happy Server, PostgreSQL, Redis, MinIO (all containerized)
- Configured Traefik reverse proxy at https://happy.htsn.io
- Added Cloudflare DNS record (proxied)
- Fixed Dockerfile to include Prisma migrations on startup

**Docker-host CPU Upgrade**

- Changed VM 206 CPU from emulated to `host` passthrough
- Fixes x86-64-v2 compatibility issues with modern binaries (Sharp, MinIO)
- Requires: `ssh pve 'qm set 206 -cpu host'` + VM reboot

**PVE Tailscale Routing Fix**

- Fixed issue where PVE was unreachable via local network (10.10.10.120)
- Root cause: Tailscale routing table 52 was capturing local subnet traffic
- Fix: Added routing rule `ip rule add from 10.10.10.120 table main priority 5200`
- Made permanent in `/etc/network/interfaces` under vmbr0

### 2024-12-20

**Git Repository Setup**

- Created homelab-docs repo on Gitea (git.htsn.io/hutson/homelab-docs)
- Set up SSH key authentication for git@git.htsn.io
- Created symlink from ~/Notes/05_Homelab → ~/Projects/homelab
- Added Gitea API token for future automation

**SSH Key Deployment - All Systems**

- Added SSH keys to ALL VMs and LXCs (13 total hosts now accessible via key)
- Updated `~/.ssh/config` with complete host aliases
- Fixed permissions: FindShyt LXC `.ssh` ownership, enabled PermitRootLogin on LXCs
- Hosts now accessible: pve, pve2, truenas, saltbox, lmdev1, docker-host, fs-dev, copyparty, gitea-vm, trading-vm, pihole, traefik, findshyt

**Documentation Updates**

- Rewrote SSH Access section with complete host table
- Added Password Auth section for router/Windows/HomeAssistant
- Added Backlog section with re-IP task
- Added Git Repository section with clone/push instructions

### 2024-12-19

**EMC Storage Enclosure - LCC B Failure**

- Diagnosed loud fan issue (speed code 5 → 4160 RPM)
- Root cause: Faulty LCC B controller causing false readings
- Resolution: Switched SAS cable to LCC A, fans now quiet (speed code 3 → 2670 RPM)
- Replacement ordered: EMC 303-108-000E ($14.95 eBay)
- Created [EMC-ENCLOSURE.md](EMC-ENCLOSURE.md) with full documentation

**SSH Key Consolidation**

- Renamed `~/.ssh/ai_trading_ed25519` → `~/.ssh/homelab`
- Updated `~/.ssh/config` on MacBook with all homelab hosts
- SSH key auth now works for: pve, pve2, docker-host, fs-dev, copyparty, lmdev1, gitea-vm, trading-vm
- No more sshpass needed for PVE servers

**QEMU Guest Agent Deployment**

- Installed on: docker-host (206), fs-dev (105), copyparty (201)
- All PVE VMs now have agent except homeassistant (110)
- Can now use `qm guest exec` for remote commands

**VM Configuration Updates**

- docker-host: Fixed SSH key in cloud-init
- fs-dev: Fixed `.ssh` directory ownership (1000 → 1001)
- copyparty: Changed from DHCP to static IP (10.10.10.201)

**Documentation Updates**

- Updated CLAUDE.md SSH section (removed sshpass examples)
- Added QEMU Agent column to VM tables
- Added storage enclosure troubleshooting to runbooks