From 93821d15571889e6d6db318f4476c17883347895 Mon Sep 17 00:00:00 2001 From: Hutson Date: Sat, 20 Dec 2025 02:31:02 -0500 Subject: [PATCH] Initial commit: Homelab infrastructure documentation MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - CLAUDE.md: Main homelab assistant context and instructions - IP-ASSIGNMENTS.md: Complete IP address assignments - NETWORK.md: Network bridges, VLANs, and configuration - EMC-ENCLOSURE.md: EMC storage enclosure documentation - SYNCTHING.md: Syncthing setup and device list - SHELL-ALIASES.md: ZSH aliases for Claude Code sessions - HOMEASSISTANT.md: Home Assistant API and automations - INFRASTRUCTURE.md: Server hardware and power management - configs/: Shared shell configurations - scripts/: Utility scripts - mcp-central/: MCP server configuration πŸ€– Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 --- .gitignore | 22 + CHANGELOG.md | 197 +++++++ CLAUDE.md | 962 ++++++++++++++++++++++++++++++++ EMC-ENCLOSURE.md | 247 ++++++++ HOMEASSISTANT.md | 145 +++++ INFRASTRUCTURE.md | 330 +++++++++++ IP-ASSIGNMENTS.md | 139 +++++ NETWORK.md | 226 ++++++++ SHELL-ALIASES.md | 147 +++++ SYNCTHING.md | 166 ++++++ configs/claude-aliases.zsh | 1 + configs/ghostty.conf | 5 + mcp-central/.env.example | 16 + mcp-central/README.md | 129 +++++ mcp-central/docker-compose.yml | 58 ++ scripts/fix-immich-raf-files.sh | 159 ++++++ scripts/health-check.sh | 318 +++++++++++ 17 files changed, 3267 insertions(+) create mode 100644 .gitignore create mode 100644 CHANGELOG.md create mode 100644 CLAUDE.md create mode 100644 EMC-ENCLOSURE.md create mode 100644 HOMEASSISTANT.md create mode 100644 INFRASTRUCTURE.md create mode 100644 IP-ASSIGNMENTS.md create mode 100644 NETWORK.md create mode 100644 SHELL-ALIASES.md create mode 100644 SYNCTHING.md create mode 120000 configs/claude-aliases.zsh create mode 100644 configs/ghostty.conf create mode 100644 mcp-central/.env.example create 
mode 100644 mcp-central/README.md create mode 100644 mcp-central/docker-compose.yml create mode 100644 scripts/fix-immich-raf-files.sh create mode 100755 scripts/health-check.sh diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..6a4eda8 --- /dev/null +++ b/.gitignore @@ -0,0 +1,22 @@ +# Secrets and credentials +.env +*.credentials +*-credentials*.txt + +# macOS +.DS_Store +.AppleDouble +.LSOverride + +# Editor/IDE +.obsidian/ +.claude/ +.vscode/ +*.swp +*.swo +*~ + +# Temporary files +*.tmp +*.bak +nul diff --git a/CHANGELOG.md b/CHANGELOG.md new file mode 100644 index 0000000..4c86066 --- /dev/null +++ b/CHANGELOG.md @@ -0,0 +1,197 @@ +# Homelab Changelog + +## 2024-12-16 + +### Power Investigation +Investigated UPS power limit issues across both Proxmox servers. + +#### Findings +1. **KSMD (Kernel Same-page Merging Daemon)** was consuming 50-57% CPU constantly on PVE + - `sleep_millisecs` set to 12ms (extremely aggressive, default is 200ms) + - `general_profit` was **negative** (-320MB) meaning it was wasting CPU + - No memory overcommit situation (98GB allocated on 128GB RAM) + - Diverse workloads (TrueNAS, Windows, Linux) = few duplicate pages to merge + +2. **GPU Power Draw** identified as major consumers: + - RTX A6000 on PVE2: up to 300W TDP + - TITAN RTX on PVE: up to 280W TDP + - Quadro P2000 on PVE: up to 75W TDP + +3. 
**TrueNAS VM** occasionally spiking to 86% CPU (needs investigation) + +#### Changes Made +- [x] **Disabled KSMD on PVE** (10.10.10.120) + ```bash + echo 0 > /sys/kernel/mm/ksm/run + ``` + - Immediate result: KSMD CPU dropped from 51-57% to 0% + - Load average dropped from 1.88 to 1.28 + - Estimated savings: ~7-10W continuous + +#### Additional Changes +- [x] **Made KSMD disable persistent on both hosts** + - Note: KSM is controlled via sysfs, not sysctl + - Created systemd service `/etc/systemd/system/disable-ksm.service`: + ```ini + [Unit] + Description=Disable KSM (Kernel Same-page Merging) + After=multi-user.target + + [Service] + Type=oneshot + ExecStart=/bin/sh -c "echo 0 > /sys/kernel/mm/ksm/run" + RemainAfterExit=yes + + [Install] + WantedBy=multi-user.target + ``` + - Enabled on both PVE and PVE2: `systemctl enable disable-ksm.service` + +### Syncthing Rescan Interval Fix +**Root Cause**: Syncthing on TrueNAS was rescanning 56GB of data every 60 seconds, causing constant 100% CPU usage (~3172 minutes CPU time in 3 days). + +**Folders affected** (changed from 60s to 3600s): +- downloads (38GB) +- documents (11GB) +- desktop (7.2GB) +- config, movies, notes, pictures + +**Fix applied**: +```bash +# Downloaded config from TrueNAS +ssh pve 'qm guest exec 100 -- cat /mnt/.ix-apps/app_mounts/syncthing/config/config/config.xml' + +# Changed all rescanIntervalS="60" to rescanIntervalS="3600" +sed -i 's/rescanIntervalS="60"/rescanIntervalS="3600"/g' config.xml + +# Uploaded and restarted Syncthing +curl -X POST -H "X-API-Key: xxx" http://localhost:20910/rest/system/restart +``` + +**Note**: fsWatcher is enabled, so changes are detected in real-time. The rescan is just a safety net. 
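The same interval change can also be made through Syncthing's REST config API instead of hand-editing `config.xml`; a minimal sketch (the `payload`/`bump_rescan` helper names are ours, and the per-folder config endpoint assumes Syncthing >= 1.12 with the GUI on localhost:8384):

```shell
# payload SECONDS -> JSON body for a folder PATCH
payload() { printf '{"rescanIntervalS": %d}' "$1"; }

# bump_rescan FOLDER_ID SECONDS -- raise one folder's rescan interval in place.
# Expects SYNCTHING_KEY in the environment (the folder's API key).
bump_rescan() {
  curl -s -X PATCH \
    -H "X-API-Key: $SYNCTHING_KEY" \
    -H "Content-Type: application/json" \
    -d "$(payload "$2")" \
    "http://127.0.0.1:8384/rest/config/folders/$1"
}

# Example: bump_rescan downloads 3600
```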
+ +**Estimated savings**: ~60-80W (TrueNAS VM CPU will drop from 86% to ~5-10% at idle) + +### GPU Power State Investigation + +| GPU | VM | Idle Power | P-State | Status | +|-----|-----|-----------|---------|--------| +| RTX A6000 | trading-vm (301) | **11W** | P8 | Optimal | +| TITAN RTX | lmdev1 (111) | **2W** | P8 | Excellent! | +| Quadro P2000 | saltbox (101) | **25W** | P0 | Stuck due to Plex | + +**Findings**: +- RTX A6000: Properly entering P8 (11W idle) - excellent +- TITAN RTX: Only 2W at idle despite ComfyUI/Python processes (436MiB VRAM used) + - Modern GPUs have much better idle power management +- Quadro P2000: Stuck in P0 at 25W because Plex Transcoder holds GPU memory + - Older Quadro cards don't idle as efficiently with processes attached + - Power limit fixed at 75W (not adjustable) + +**Changes made**: +- [x] Installed QEMU guest agent on lmdev1 (VM 111) +- [x] Added SSH key access to lmdev1 (10.10.10.111) +- [x] Updated ~/.ssh/config with lmdev1 entry + +### CPU Governor Optimization + +**Issue**: Both servers using `performance` CPU governor, keeping CPUs at high frequencies (3-4GHz) even when 99% idle. 
+ +**Changes**: + +#### PVE (10.10.10.120) +- **Driver**: `amd-pstate-epp` (modern AMD P-State with Energy Performance Preference) +- **Change**: Governor `performance` β†’ `powersave`, EPP `performance` β†’ `balance_power` +- **Result**: Idle frequencies dropped from ~4GHz to ~1.7GHz +- **Persistence**: Created `/etc/systemd/system/cpu-powersave.service` + ```ini + [Unit] + Description=Set CPU governor to powersave with balance_power EPP + After=multi-user.target + + [Service] + Type=oneshot + ExecStart=/bin/bash -c 'for gov in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo powersave > "$gov"; done; for epp in /sys/devices/system/cpu/cpu*/cpufreq/energy_performance_preference; do echo balance_power > "$epp"; done' + RemainAfterExit=yes + + [Install] + WantedBy=multi-user.target + ``` + +#### PVE2 (10.10.10.102) +- **Driver**: `acpi-cpufreq` (older driver) +- **Change**: Governor `performance` β†’ `schedutil` +- **Result**: Idle frequencies dropped from ~4GHz to ~2.2GHz +- **Persistence**: Created `/etc/systemd/system/cpu-powersave.service` + ```ini + [Unit] + Description=Set CPU governor to schedutil for power savings + After=multi-user.target + + [Service] + Type=oneshot + ExecStart=/bin/bash -c 'for gov in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo schedutil > "$gov"; done' + RemainAfterExit=yes + + [Install] + WantedBy=multi-user.target + ``` + +**Estimated savings**: 30-60W per server (60-120W total) + +### ksmtuned Service Disabled + +**Issue**: The `ksmtuned` (KSM tuning daemon) was still running on both servers even after KSMD was disabled. Consuming ~39 min CPU on PVE and ~12 min CPU on PVE2 over 3 days. + +**Fix**: +```bash +systemctl stop ksmtuned +systemctl disable ksmtuned +``` + +Applied to both PVE and PVE2. + +**Estimated savings**: ~2-5W + +### HDD Spindown on PVE2 + +**Issue**: Two WD Red 6TB drives (local-zfs2 pool) spinning 24/7 despite pool having only 768KB used. Each drive uses 5-8W spinning. 
+ +**Fix**: +```bash +# Set 30-minute spindown timeout +hdparm -S 241 /dev/sda /dev/sdb +``` + +**Persistence**: Created udev rule `/etc/udev/rules.d/69-hdd-spindown.rules`: +``` +ACTION=="add", KERNEL=="sd[a-z]", ATTRS{model}=="WDC WD60EFRX-68L*", RUN+="/usr/sbin/hdparm -S 241 /dev/%k" +``` + +**Estimated savings**: ~10-16W (when drives spin down) + +#### Pending Changes +- [ ] Monitor overall power consumption after all optimizations +- [ ] Consider PCIe ASPM optimization +- [ ] Consider NMI watchdog disable + +### SSH Key Setup +- Added SSH key authentication to both Proxmox servers +- Updated `~/.ssh/config` with entries for `pve` and `pve2` + +--- + +## Notes + +### What is KSMD? +Kernel Same-page Merging Daemon - scans memory for duplicate pages across VMs and merges them. Trades CPU cycles for RAM savings. Useful when: +- Overcommitting memory +- Running many identical VMs + +Not useful when: +- Plenty of RAM headroom (our case) +- Diverse workloads with few duplicate pages +- `general_profit` is negative + +### What is Memory Ballooning? +Guest-cooperative memory management. Hypervisor can request VMs to give back unused RAM. Independent from KSMD. Both are Proxmox/KVM memory optimization features but serve different purposes. 
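The negative `general_profit` reading discussed above can be checked mechanically; a sketch (the `check_ksm` helper is ours; `general_profit` is only exposed on newer kernels, so older ones fall back to a rough 4 KiB-page estimate that ignores metadata cost):

```shell
# check_ksm [SYSFS_DIR] -- report KSM state and whether it looks net-negative.
check_ksm() {
  ksm="${1:-/sys/kernel/mm/ksm}"
  run=$(cat "$ksm/run" 2>/dev/null) || { echo "KSM not available"; return 1; }
  sharing=$(cat "$ksm/pages_sharing")
  if [ -r "$ksm/general_profit" ]; then
    profit=$(cat "$ksm/general_profit")   # bytes; negative = KSM costs more than it saves
  else
    profit=$((sharing * 4096))            # rough fallback: assumes 4 KiB pages
  fi
  echo "run=$run pages_sharing=$sharing profit_bytes=$profit"
  if [ "$profit" -lt 0 ]; then
    echo "verdict: disable (echo 0 > $ksm/run)"
  else
    echo "verdict: keep"
  fi
}
```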
diff --git a/CLAUDE.md b/CLAUDE.md new file mode 100644 index 0000000..cd71c99 --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1,962 @@ +# Homelab Infrastructure + +## Quick Reference - Common Tasks + +| Task | Section | Quick Command | +|------|---------|---------------| +| **Add new public service** | [Reverse Proxy](#reverse-proxy-architecture-traefik) | Create Traefik config + Cloudflare DNS | +| **Add Cloudflare DNS** | [Cloudflare API](#cloudflare-api-access) | `curl -X POST cloudflare.com/...` | +| **Check server temps** | [Temperature Check](#server-temperature-check) | `ssh pve 'grep Tctl ...'` | +| **Syncthing issues** | [Troubleshooting](#troubleshooting-runbooks) | Check API connections | +| **SSL cert issues** | [Traefik DNS Challenge](#ssl-certificates) | Use `cloudflare` resolver | + +**Key Credentials (see sections for full details):** +- Cloudflare: `cloudflare@htsn.io` / API Key in [Cloudflare API](#cloudflare-api-access) +- SSH Password: `GrilledCh33s3#` +- Traefik: CT 202 @ 10.10.10.250 + +--- + +## Role + +You are the **Homelab Assistant** - a Claude Code session dedicated to managing and maintaining Hutson's home infrastructure. Your responsibilities include: + +- **Infrastructure Management**: Proxmox servers, VMs, containers, networking +- **File Sync**: Syncthing configuration across all devices (Mac Mini, MacBook, Windows PC, TrueNAS, Android) +- **Network Administration**: Router config, SSH access, Tailscale, device management +- **Power Optimization**: CPU governors, GPU power states, service tuning +- **Documentation**: Keep CLAUDE.md, SYNCTHING.md, and SHELL-ALIASES.md up to date +- **Automation**: Shell aliases, startup scripts, scheduled tasks + +You have full access to all homelab devices via SSH and APIs. Use this context to help troubleshoot, configure, and optimize the infrastructure. 
+ +### Proactive Behaviors + +When the user mentions issues or asks questions, proactively: +- **"sync not working"** β†’ Check Syncthing status on ALL devices, identify which is offline +- **"device offline"** β†’ Ping both local and Tailscale IPs, check if service is running +- **"slow"** β†’ Check CPU usage, running processes, Syncthing rescan activity +- **"check status"** β†’ Run full health check across all systems +- **"something's wrong"** β†’ Run diagnostics on likely culprits based on context + +### Quick Health Checks + +Run these to get a quick overview of the homelab: + +```bash +# === FULL HEALTH CHECK === +# Syncthing connections (Mac Mini) +curl -s -H "X-API-Key: oSQSrPnMnrEXuHqjWrRdrvq3TSXesAT5" "http://127.0.0.1:8384/rest/system/connections" | python3 -c "import sys,json; d=json.load(sys.stdin)['connections']; [print(f\"{v.get('name',k[:7])}: {'UP' if v['connected'] else 'DOWN'}\") for k,v in d.items()]" + +# Proxmox VMs +ssh pve 'qm list' 2>/dev/null || echo "PVE: unreachable" +ssh pve2 'qm list' 2>/dev/null || echo "PVE2: unreachable" + +# Ping critical devices +ping -c 1 -W 1 10.10.10.200 >/dev/null && echo "TrueNAS: UP" || echo "TrueNAS: DOWN" +ping -c 1 -W 1 10.10.10.1 >/dev/null && echo "Router: UP" || echo "Router: DOWN" + +# Check Windows PC Syncthing (often goes offline) +nc -zw1 10.10.10.150 22000 && echo "Windows Syncthing: UP" || echo "Windows Syncthing: DOWN" +``` + +### Troubleshooting Runbooks + +| Symptom | Check | Fix | +|---------|-------|-----| +| Device not syncing | `curl Syncthing API β†’ connections` | Check if device online, restart Syncthing | +| Windows PC offline | `ping 10.10.10.150` then `nc -z 22000` | SSH in, `Start-ScheduledTask -TaskName "Syncthing"` | +| Phone not syncing | Phone Syncthing app in background? | User must open app, keep screen on | +| High CPU on TrueNAS | Syncthing rescan? KSM? | Check rescan intervals, disable KSM | +| VM won't start | Storage available? RAM free? 
| `ssh pve 'qm start VMID'`, check logs |
+| Tailscale offline | `tailscale status` | `tailscale up` or restart service |
+| Sync stuck at X% | Folder errors? Conflicts? | Check `rest/folder/errors?folder=NAME` |
+| Server running hot | Check KSM, check CPU processes | Disable KSM, identify runaway process |
+| Storage enclosure loud | Check fan speed via SES | See [EMC-ENCLOSURE.md](EMC-ENCLOSURE.md) |
+| Drives not detected | Check SAS link, LCC status | Switch LCC, rescan SCSI hosts |
+
+### Server Temperature Check
+```bash
+# Check temps on both servers (Threadripper PRO max safe: 90°C Tctl)
+ssh pve 'for f in /sys/class/hwmon/hwmon*/temp*_input; do label=$(cat ${f%_input}_label 2>/dev/null); if [ "$label" = "Tctl" ]; then echo "PVE Tctl: $(($(cat $f)/1000))°C"; fi; done'
+ssh pve2 'for f in /sys/class/hwmon/hwmon*/temp*_input; do label=$(cat ${f%_input}_label 2>/dev/null); if [ "$label" = "Tctl" ]; then echo "PVE2 Tctl: $(($(cat $f)/1000))°C"; fi; done'
+```
+**Healthy temps**: 70-80°C under load. **Warning**: >85°C. **Throttle**: 90°C.
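Those thresholds can be wired into a tiny classifier for scripting and alerting; a minimal sketch (the `temp_status` helper name is ours, not an existing tool):

```shell
# temp_status TCTL_CELSIUS -> OK | WARNING | THROTTLE
# Mirrors the thresholds above: warn above 85, throttle point at 90.
temp_status() {
  if [ "$1" -ge 90 ]; then echo THROTTLE
  elif [ "$1" -gt 85 ]; then echo WARNING
  else echo OK
  fi
}

temp_status 78   # -> OK
temp_status 87   # -> WARNING
```

Feed it the Tctl readings extracted by the loops above and page only on WARNING or THROTTLE.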
### Service Dependencies
+
+```
+TrueNAS (10.10.10.200)
+├── Central Syncthing hub - if down, sync breaks between devices
+├── NFS/SMB shares for VMs
+└── Media storage for Plex
+
+PiHole (CT 200)
+└── DNS for entire network - if down, name resolution fails
+
+Traefik (CT 202)
+└── Reverse proxy - if down, external access to services fails
+
+Router (10.10.10.1)
+└── Everything - gateway for all traffic
+```
+
+### API Quick Reference
+
+| Service | Device | Endpoint | Auth |
+|---------|--------|----------|------|
+| Syncthing | Mac Mini | `http://127.0.0.1:8384/rest/` | `X-API-Key: oSQSrPnMnrEXuHqjWrRdrvq3TSXesAT5` |
+| Syncthing | MacBook | `http://127.0.0.1:8384/rest/` (via SSH) | `X-API-Key: qYkNdVLwy9qZZZ6MqnJr7tHX7KKdxGMJ` |
+| Syncthing | Phone | `https://10.10.10.54:8384/rest/` | `X-API-Key: Xxz3jDT4akUJe6psfwZsbZwG2LhfZuDM` |
+| Proxmox | PVE | `https://10.10.10.120:8006/api2/json/` | SSH key auth |
+| Proxmox | PVE2 | `https://10.10.10.102:8006/api2/json/` | SSH key auth |
+
+### Common Maintenance Tasks
+
+When the user asks for maintenance or you notice issues:
+
+1. **Check Syncthing sync status** - Any folders behind? Errors?
+2. **Verify all devices connected** - Run connection check
+3. **Check disk space** - `ssh pve 'df -h'`, `ssh pve2 'df -h'`
+4. **Review ZFS pool health** - `ssh pve 'zpool status'`
+5. **Check for stuck processes** - High CPU? Memory pressure?
+6. **Verify backups** - Are critical folders syncing?
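Steps 1-6 above can be folded into one pass per host; a condensed sketch (the `disk_alert` helper and its 90% threshold are ours; the repo's `scripts/health-check.sh` is the full version, and the host loop is left commented so the snippet stays side-effect-free):

```shell
# disk_alert PCENT -> ok | full ; flags a filesystem at or above 90% use.
disk_alert() {
  p="${1%\%}"                                    # accept "87%" or "87"
  if [ "$p" -ge 90 ]; then echo full; else echo ok; fi
}

# One maintenance pass over both hypervisors (uncomment to run for real):
# for host in pve pve2; do
#   echo "== $host =="
#   ssh "$host" 'zpool status -x'                # "all pools are healthy" when OK
#   ssh "$host" 'ps aux --sort=-%cpu | head -4'  # stuck/runaway processes
#   use=$(ssh "$host" "df -h / | awk 'NR==2{print \$5}'")
#   echo "root fs: $use ($(disk_alert "$use"))"
# done
```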
+ +### Emergency Commands + +```bash +# Restart VM on Proxmox +ssh pve 'qm stop VMID && qm start VMID' + +# Check what's using CPU +ssh pve 'ps aux --sort=-%cpu | head -10' + +# Check ZFS pool status (via QEMU agent) +ssh pve 'qm guest exec 100 -- bash -c "zpool status vault"' + +# Check EMC enclosure fans +ssh pve 'qm guest exec 100 -- bash -c "sg_ses --index=coo,-1 --get=speed_code /dev/sg15"' + +# Force Syncthing rescan +curl -X POST "http://127.0.0.1:8384/rest/db/scan?folder=FOLDER" -H "X-API-Key: API_KEY" + +# Restart Syncthing on Windows (when stuck) +sshpass -p 'GrilledCh33s3#' ssh claude@10.10.10.150 'Stop-Process -Name syncthing -Force; Start-ScheduledTask -TaskName "Syncthing"' + +# Get all device IPs from router +expect -c 'spawn ssh root@10.10.10.1 "cat /proc/net/arp"; expect "Password:"; send "GrilledCh33s3#\r"; expect eof' +``` + +## Overview + +Two Proxmox servers running various VMs and containers for home infrastructure, media, development, and AI workloads. + +## Servers + +### PVE (10.10.10.120) - Primary +- **CPU**: AMD Ryzen Threadripper PRO 3975WX (32-core, 64 threads, 280W TDP) +- **RAM**: 128 GB +- **Storage**: + - `nvme-mirror1`: 2x Sabrent Rocket Q NVMe (3.6TB usable) + - `nvme-mirror2`: 2x Kingston SFYRD 2TB (1.8TB usable) + - `rpool`: 2x Samsung 870 QVO 4TB SSD mirror (3.6TB usable) +- **GPUs**: + - NVIDIA Quadro P2000 (75W TDP) - Plex transcoding + - NVIDIA TITAN RTX (280W TDP) - AI workloads, passed to saltbox/lmdev1 +- **Role**: Primary VM host, TrueNAS, media services + +### PVE2 (10.10.10.102) - Secondary +- **CPU**: AMD Ryzen Threadripper PRO 3975WX (32-core, 64 threads, 280W TDP) +- **RAM**: 128 GB +- **Storage**: + - `nvme-mirror3`: 2x NVMe mirror + - `local-zfs2`: 2x WD Red 6TB HDD mirror +- **GPUs**: + - NVIDIA RTX A6000 (300W TDP) - passed to trading-vm +- **Role**: Trading platform, development + +## SSH Access + +### SSH Key Authentication (All Hosts) + +SSH keys are configured in `~/.ssh/config` on both Mac Mini and 
MacBook. Use the `~/.ssh/homelab` key. + +| Host Alias | IP | User | Type | Notes | +|------------|-----|------|------|-------| +| `pve` | 10.10.10.120 | root | Proxmox | Primary server | +| `pve2` | 10.10.10.102 | root | Proxmox | Secondary server | +| `truenas` | 10.10.10.200 | root | VM | NAS/storage | +| `saltbox` | 10.10.10.100 | hutson | VM | Media automation | +| `lmdev1` | 10.10.10.111 | hutson | VM | AI/LLM development | +| `docker-host` | 10.10.10.206 | hutson | VM | Docker services | +| `fs-dev` | 10.10.10.5 | hutson | VM | Development | +| `copyparty` | 10.10.10.201 | hutson | VM | File sharing | +| `gitea-vm` | 10.10.10.220 | hutson | VM | Git server | +| `trading-vm` | 10.10.10.221 | hutson | VM | AI trading platform | +| `pihole` | 10.10.10.10 | root | LXC | DNS/Ad blocking | +| `traefik` | 10.10.10.250 | root | LXC | Reverse proxy | +| `findshyt` | 10.10.10.8 | root | LXC | Custom app | + +**Usage examples:** +```bash +ssh pve 'qm list' # List VMs +ssh truenas 'zpool status vault' # Check ZFS pool +ssh saltbox 'docker ps' # List containers +ssh pihole 'pihole status' # Check Pi-hole +``` + +### Password Auth (Special Cases) + +| Device | IP | User | Auth Method | Notes | +|--------|-----|------|-------------|-------| +| UniFi Router | 10.10.10.1 | root | expect (keyboard-interactive) | Gateway | +| Windows PC | 10.10.10.150 | claude | sshpass | PowerShell, use `;` not `&&` | +| HomeAssistant | 10.10.10.110 | - | QEMU agent only | No SSH server | + +**Router access (requires expect):** +```bash +# Run command on router +expect -c 'spawn ssh root@10.10.10.1 "hostname"; expect "Password:"; send "GrilledCh33s3#\r"; expect eof' + +# Get ARP table (all device IPs) +expect -c 'spawn ssh root@10.10.10.1 "cat /proc/net/arp"; expect "Password:"; send "GrilledCh33s3#\r"; expect eof' +``` + +**Windows PC access:** +```bash +sshpass -p 'GrilledCh33s3#' ssh claude@10.10.10.150 'Get-Process | Select -First 5' +``` + +**HomeAssistant (no SSH, use QEMU agent):** 
+```bash +ssh pve 'qm guest exec 110 -- bash -c "ha core info"' +``` + +## VMs and Containers + +### PVE (10.10.10.120) +| VMID | Name | vCPUs | RAM | Purpose | GPU/Passthrough | QEMU Agent | +|------|------|-------|-----|---------|-----------------|------------| +| 100 | truenas | 8 | 32GB | NAS, storage | LSI SAS2308 HBA, Samsung NVMe | Yes | +| 101 | saltbox | 16 | 16GB | Media automation | TITAN RTX | Yes | +| 105 | fs-dev | 10 | 8GB | Development | - | Yes | +| 110 | homeassistant | 2 | 2GB | Home automation | - | No | +| 111 | lmdev1 | 8 | 32GB | AI/LLM development | TITAN RTX | Yes | +| 201 | copyparty | 2 | 2GB | File sharing | - | Yes | +| 206 | docker-host | 2 | 4GB | Docker services | - | Yes | +| 200 | pihole (CT) | - | - | DNS/Ad blocking | - | N/A | +| 202 | traefik (CT) | - | - | Reverse proxy | - | N/A | +| 205 | findshyt (CT) | - | - | Custom app | - | N/A | + +### PVE2 (10.10.10.102) +| VMID | Name | vCPUs | RAM | Purpose | GPU/Passthrough | QEMU Agent | +|------|------|-------|-----|---------|-----------------|------------| +| 300 | gitea-vm | 2 | 4GB | Git server | - | Yes | +| 301 | trading-vm | 16 | 32GB | AI trading platform | RTX A6000 | Yes | + +### QEMU Guest Agent +VMs with QEMU agent can be managed via `qm guest exec`: +```bash +# Execute command in VM +ssh pve 'qm guest exec 100 -- bash -c "zpool status vault"' + +# Get VM IP addresses +ssh pve 'qm guest exec 100 -- bash -c "ip addr"' +``` +Only VM 110 (homeassistant) lacks QEMU agent - use its web UI instead. + +## Power Management + +### Estimated Power Draw +- **PVE**: 500-750W (CPU + TITAN RTX + P2000 + storage + HBAs) +- **PVE2**: 450-600W (CPU + RTX A6000 + storage) +- **Combined**: ~1000-1350W under load + +### Optimizations Applied +1. 
**KSMD Disabled** (2024-12-17 updated) + - Was consuming 44-57% CPU on PVE with negative profit + - Caused CPU temp to rise from 74Β°C to 83Β°C + - Savings: ~7-10W + significant temp reduction + - Made permanent via: + - systemd service: `/etc/systemd/system/disable-ksm.service` + - **ksmtuned masked**: `systemctl mask ksmtuned` (prevents re-enabling) + - **Note**: KSM can get re-enabled by Proxmox updates. If CPU is hot, check: + ```bash + cat /sys/kernel/mm/ksm/run # Should be 0 + ps aux | grep ksmd # Should show 0% CPU + # If KSM is running (run=1), disable it: + echo 0 > /sys/kernel/mm/ksm/run + systemctl mask ksmtuned + ``` + +2. **Syncthing Rescan Intervals** (2024-12-16) + - Changed aggressive 60s rescans to 3600s for large folders + - Affected: downloads (38GB), documents (11GB), desktop (7.2GB), movies, pictures, notes, config + - Savings: ~60-80W (TrueNAS VM was at constant 86% CPU) + +3. **CPU Governor Optimization** (2024-12-16) + - PVE: `powersave` governor + `balance_power` EPP (amd-pstate-epp driver) + - PVE2: `schedutil` governor (acpi-cpufreq driver) + - Made permanent via systemd service: `/etc/systemd/system/cpu-powersave.service` + - Savings: ~60-120W combined (CPUs now idle at 1.7-2.2GHz vs 4GHz) + +4. **GPU Power States** (2024-12-16) - Verified optimal + - RTX A6000: 11W idle (P8 state) + - TITAN RTX: 2-3W idle (P8 state) + - Quadro P2000: 25W (P0 - Plex keeps it active) + +5. **ksmtuned Disabled** (2024-12-16) + - KSM tuning daemon was still running after KSMD disabled + - Stopped and disabled on both servers + - Savings: ~2-5W + +6. 
**HDD Spindown on PVE2** (2024-12-16) + - local-zfs2 pool (2x WD Red 6TB) had only 768KB used but drives spinning 24/7 + - Set 30-minute spindown via `hdparm -S 241` + - Persistent via udev rule: `/etc/udev/rules.d/69-hdd-spindown.rules` + - Savings: ~10-16W when spun down + +### Potential Optimizations +- [ ] PCIe ASPM power management +- [ ] NMI watchdog disable + +## Memory Configuration +- Ballooning enabled on most VMs but not actively used +- No memory overcommit (98GB allocated on 128GB physical for PVE) +- KSMD was wasting CPU with no benefit (negative general_profit) + +## Network + +See [NETWORK.md](NETWORK.md) for full details. + +### Network Ranges +| Network | Range | Purpose | +|---------|-------|---------| +| LAN | 10.10.10.0/24 | Primary network, all external access | +| Internal | 10.10.20.0/24 | Inter-VM only (storage, NFS/iSCSI) | + +### PVE Bridges (10.10.10.120) +| Bridge | NIC | Speed | Purpose | Use For | +|--------|-----|-------|---------|---------| +| vmbr0 | enp1s0 | 1 Gb | Management | General VMs/CTs | +| vmbr1 | enp35s0f0 | 10 Gb | High-speed LXC | Bandwidth-heavy containers | +| vmbr2 | enp35s0f1 | 10 Gb | High-speed VM | TrueNAS, Saltbox, storage VMs | +| vmbr3 | (none) | Virtual | Internal only | NFS/iSCSI traffic, no internet | + +### Quick Reference +```bash +# Add VM to standard network (1Gb) +qm set VMID --net0 virtio,bridge=vmbr0 + +# Add VM to high-speed network (10Gb) +qm set VMID --net0 virtio,bridge=vmbr2 + +# Add secondary NIC for internal storage network +qm set VMID --net1 virtio,bridge=vmbr3 +``` + +- MTU 9000 (jumbo frames) on all bridges + +## Common Commands +```bash +# Check VM status +ssh pve 'qm list' +ssh pve2 'qm list' + +# Check container status +ssh pve 'pct list' + +# Monitor CPU/power +ssh pve 'top -bn1 | head -20' + +# Check ZFS pools +ssh pve 'zpool status' + +# Check GPU (if nvidia-smi installed in VM) +ssh pve 'lspci | grep -i nvidia' +``` + +## Remote Claude Code Sessions (Mac Mini) + +### Overview +The 
Mac Mini (`hutson-mac-mini.local`) runs the Happy Coder daemon, enabling on-demand Claude Code sessions accessible from anywhere via the Happy Coder mobile app. Sessions are created when you need them - no persistent tmux sessions required.
+
+### Architecture
+```
+Mac Mini (100.108.89.58 via Tailscale)
+├── launchd (auto-starts on boot)
+│   └── com.hutson.happy-daemon.plist (starts Happy daemon)
+├── Happy Coder daemon (manages remote sessions)
+└── Tailscale (secure remote access)
+```
+
+### How It Works
+1. Happy daemon runs on Mac Mini (auto-starts on boot)
+2. Open Happy Coder app on phone/tablet
+3. Start a new Claude session from the app
+4. Session runs in any working directory you choose
+5. Session ends when you're done - no cleanup needed
+
+### Quick Commands
+```bash
+# Check daemon status
+happy daemon status
+
+# Start a new session manually (from Mac Mini terminal)
+cd ~/Projects/homelab && happy claude
+
+# List active sessions
+happy daemon list
+```
+
+### Mobile Access Setup (One-time)
+1. Download Happy Coder app:
+   - iOS: https://apps.apple.com/us/app/happy-claude-code-client/id6748571505
+   - Android: https://play.google.com/store/apps/details?id=com.ex3ndr.happy
+2. On Mac Mini, run: `happy auth` and scan QR code with the app
+3.
Daemon auto-starts on boot via launchd + +### Daemon Management +```bash +happy daemon start # Start daemon +happy daemon stop # Stop daemon +happy daemon status # Check status +happy daemon list # List active sessions +``` + +### Remote Access via SSH + Tailscale +From any device on Tailscale network: +```bash +# SSH to Mac Mini +ssh hutson@100.108.89.58 + +# Or via hostname +ssh hutson@mac-mini + +# Start Claude in desired directory +cd ~/Projects/homelab && happy claude +``` + +### Files & Configuration +| File | Purpose | +|------|---------| +| `~/Library/LaunchAgents/com.hutson.happy-daemon.plist` | launchd auto-start Happy daemon | +| `~/.happy/` | Happy Coder config and logs | + +### Troubleshooting +```bash +# Check if daemon is running +pgrep -f "happy.*daemon" + +# Check launchd status +launchctl list | grep happy + +# List active sessions +happy daemon list + +# Restart daemon +happy daemon stop && happy daemon start + +# If Tailscale is disconnected +/Applications/Tailscale.app/Contents/MacOS/Tailscale up +``` + +## Agent and Tool Guidelines + +### Background Agents +- **Always spin up background agents when doing multiple independent tasks** +- Background agents allow parallel execution of tasks that don't depend on each other +- This improves efficiency and reduces total execution time +- Use background agents for tasks like running tests, builds, or searches simultaneously + +### MCP Tools for Web Searches + +#### ref.tools - Documentation Lookups +- **`mcp__Ref__ref_search_documentation`**: Search through documentation for specific topics +- **`mcp__Ref__ref_read_url`**: Read and parse content from documentation URLs + +#### Exa MCP - General Web and Code Searches +- **`mcp__exa__web_search_exa`**: General web searches for current information +- **`mcp__exa__get_code_context_exa`**: Code-related searches and repository lookups + +### MCP Tools Reference Table + +| Tool Name | Provider | Purpose | Use Case | 
+|-----------|----------|---------|----------| +| `mcp__Ref__ref_search_documentation` | ref.tools | Search documentation | Finding specific topics in official docs | +| `mcp__Ref__ref_read_url` | ref.tools | Read documentation URLs | Parsing and extracting content from doc pages | +| `mcp__exa__web_search_exa` | Exa MCP | General web search | Current events, general information lookup | +| `mcp__exa__get_code_context_exa` | Exa MCP | Code-specific search | Finding code examples, repository searches | + +## Reverse Proxy Architecture (Traefik) + +### Overview +There are **TWO separate Traefik instances** handling different services: + +| Instance | Location | IP | Purpose | Manages | +|----------|----------|-----|---------|---------| +| **Traefik-Primary** | CT 202 | **10.10.10.250** | General services | All non-Saltbox services | +| **Traefik-Saltbox** | VM 101 (Docker) | **10.10.10.100** | Saltbox services | Plex, *arr apps, media stack | + +### ⚠️ CRITICAL RULE: Which Traefik to Use + +**When adding ANY new service:** +- βœ… **Use Traefik-Primary (10.10.10.250)** - Unless service lives inside Saltbox VM +- ❌ **DO NOT touch Traefik-Saltbox** - It manages Saltbox services with their own certificates + +**Why this matters:** +- Traefik-Saltbox has complex Saltbox-managed configs +- Messing with it breaks Plex, Sonarr, Radarr, and all media services +- Each Traefik has its own Let's Encrypt certificates +- Mixing them causes certificate conflicts + +### Traefik-Primary (CT 202) - For New Services + +**Location**: `/etc/traefik/` on Container 202 +**Config**: `/etc/traefik/traefik.yaml` +**Dynamic Configs**: `/etc/traefik/conf.d/*.yaml` + +**Services using Traefik-Primary (10.10.10.250):** +- excalidraw.htsn.io β†’ 10.10.10.206:8080 (docker-host) +- findshyt.htsn.io β†’ 10.10.10.205 (CT 205) +- gitea (git.htsn.io) β†’ 10.10.10.220:3000 +- homeassistant β†’ 10.10.10.110 +- lmdev β†’ 10.10.10.111 +- pihole β†’ 10.10.10.200 +- truenas β†’ 10.10.10.200 +- proxmox β†’ 
10.10.10.120
+- copyparty → 10.10.10.201
+- aitrade → trading server
+- pulse.htsn.io → 10.10.10.206:7655 (Pulse monitoring)
+
+**Access Traefik config:**
+```bash
+# From Mac Mini:
+ssh pve 'pct exec 202 -- cat /etc/traefik/traefik.yaml'
+ssh pve 'pct exec 202 -- ls /etc/traefik/conf.d/'
+
+# Edit a service config (interactive, so allocate a TTY):
+ssh -t pve 'pct exec 202 -- vi /etc/traefik/conf.d/myservice.yaml'
+```
+
+### Traefik-Saltbox (VM 101) - DO NOT MODIFY
+
+**Location**: `/opt/traefik/` inside Saltbox VM
+**Managed by**: Saltbox Ansible playbooks
+**Mounts**: Docker bind mount from `/opt/traefik` → `/etc/traefik` in container
+
+**Services using Traefik-Saltbox (10.10.10.100):**
+- Plex (plex.htsn.io)
+- Sonarr, Radarr, Lidarr
+- SABnzbd, NZBGet, qBittorrent
+- Overseerr, Tautulli, Organizr
+- Jackett, NZBHydra2
+- Authelia (SSO)
+- All other Saltbox-managed containers
+
+**View Saltbox Traefik (read-only):**
+```bash
+ssh pve 'qm guest exec 101 -- bash -c "docker exec traefik cat /etc/traefik/traefik.yml"'
+```
+
+### Adding a New Public Service - Complete Workflow
+
+Follow these steps to deploy a new service and make it publicly accessible at `servicename.htsn.io`.
+
+#### Step 0. Deploy Your Service
+
+First, deploy your service on the appropriate host:
+
+**Option A: Docker on docker-host (10.10.10.206)**
+```bash
+ssh hutson@10.10.10.206
+sudo mkdir -p /opt/myservice
+cat > /opt/myservice/docker-compose.yml << 'EOF'
+version: "3.8"
+services:
+  myservice:
+    image: myimage:latest
+    ports:
+      - "8080:80"
+    restart: unless-stopped
+EOF
+cd /opt/myservice && sudo docker-compose up -d
+```
+
+**Option B: New LXC Container on PVE**
+```bash
+ssh pve 'pct create CTID local:vztmpl/ubuntu-22.04-standard_22.04-1_amd64.tar.zst \
+  --hostname myservice --memory 2048 --cores 2 \
+  --net0 name=eth0,bridge=vmbr0,ip=10.10.10.XXX/24,gw=10.10.10.1 \
+  --rootfs local-zfs:8 --unprivileged 1 --start 1'
+```
+
+**Option C: New VM on PVE**
+```bash
+ssh pve 'qm create VMID --name myservice --memory 2048 --cores 2 \
+  --net0 virtio,bridge=vmbr0 --scsihw virtio-scsi-pci'
+```
+
+#### Step 1. Create Traefik Config File
+
+Use this template for new services on **Traefik-Primary (CT 202)**:
+
+```yaml
+# /etc/traefik/conf.d/myservice.yaml
+http:
+  routers:
+    # HTTPS router
+    myservice-secure:
+      entryPoints:
+        - websecure
+      rule: "Host(`myservice.htsn.io`)"
+      service: myservice
+      tls:
+        certResolver: cloudflare  # Use 'cloudflare' for proxied domains, 'letsencrypt' for DNS-only
+      priority: 50
+
+    # HTTP → HTTPS redirect
+    myservice-redirect:
+      entryPoints:
+        - web
+      rule: "Host(`myservice.htsn.io`)"
+      middlewares:
+        - myservice-https-redirect
+      service: myservice
+      priority: 50
+
+  services:
+    myservice:
+      loadBalancer:
+        servers:
+          - url: "http://10.10.10.XXX:PORT"
+
+  middlewares:
+    myservice-https-redirect:
+      redirectScheme:
+        scheme: https
+        permanent: true
+```
+
+### SSL Certificates
+
+Traefik has **two certificate resolvers** configured:
+
+| Resolver | Use When | Challenge Type | Notes |
+|----------|----------|----------------|-------|
+| `letsencrypt` | Cloudflare DNS-only (gray cloud) | HTTP-01 | Requires port 80 reachable |
+| `cloudflare` | Cloudflare Proxied (orange cloud) | DNS-01 | Works with Cloudflare proxy |
+
+**⚠️ Important:** If the Cloudflare proxy is enabled (orange cloud), the HTTP challenge fails because Cloudflare redirects HTTP→HTTPS. Use the `cloudflare` resolver instead.
+
+**Cloudflare API credentials** are configured in `/etc/systemd/system/traefik.service`:
+```bash
+Environment="CF_API_EMAIL=cloudflare@htsn.io"
+Environment="CF_API_KEY=849ebefd163d2ccdec25e49b3e1b3fe2cdadc"
+```
+
+**Certificate storage:**
+- HTTP challenge certs: `/etc/traefik/acme.json`
+- DNS challenge certs: `/etc/traefik/acme-cf.json`
+
+**Deploy the config:**
+```bash
+# Create file on CT 202
+ssh pve 'pct exec 202 -- bash -c "cat > /etc/traefik/conf.d/myservice.yaml << '\''EOF'\''
+# (paste the YAML template from Step 1 here)
+EOF"'
+
+# Traefik auto-reloads (watches conf.d directory)
+# Check logs:
+ssh pve 'pct exec 202 -- tail -f /var/log/traefik/traefik.log'
+```
+
+#### Step 2. Add Cloudflare DNS Entry
+
+**Cloudflare Credentials:**
+- Email: `cloudflare@htsn.io`
+- API Key: `849ebefd163d2ccdec25e49b3e1b3fe2cdadc`
+
+**Manual method (via Cloudflare Dashboard):**
+1. Go to https://dash.cloudflare.com/
+2. Select `htsn.io` domain
+3. DNS → Add Record
+4. Type: `A`, Name: `myservice`, IPv4: `70.237.94.174`, Proxied: ☑️
+
+**Automated method (CLI script):**
+
+Save this as `~/bin/add-cloudflare-dns.sh`:
+```bash
+#!/bin/bash
+# Add DNS record to Cloudflare for htsn.io
+
+SUBDOMAIN="$1"
+CF_EMAIL="cloudflare@htsn.io"
+CF_API_KEY="849ebefd163d2ccdec25e49b3e1b3fe2cdadc"
+ZONE_ID="c0f5a80448c608af35d39aa820a5f3af"  # htsn.io zone
+PUBLIC_IP="70.237.94.174"  # Update if IP changes: curl -s ifconfig.me
+
+if [ -z "$SUBDOMAIN" ]; then
+  echo "Usage: $0 <subdomain>"
+  echo "Example: $0 myservice   # Creates myservice.htsn.io"
+  exit 1
+fi
+
+curl -X POST "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/dns_records" \
+  -H "X-Auth-Email: $CF_EMAIL" \
+  -H "X-Auth-Key: $CF_API_KEY" \
+  -H "Content-Type: application/json" \
+  --data "{
+    \"type\":\"A\",
+    \"name\":\"$SUBDOMAIN\",
+    \"content\":\"$PUBLIC_IP\",
+    \"ttl\":1,
+    \"proxied\":true
+  }" | jq .
+```
+
+**Usage:**
+```bash
+chmod +x ~/bin/add-cloudflare-dns.sh
+~/bin/add-cloudflare-dns.sh excalidraw   # Creates excalidraw.htsn.io
+```
+
+#### Step 3. Testing
+
+```bash
+# Check if DNS resolves
+dig myservice.htsn.io
+
+# Test HTTP redirect
+curl -I http://myservice.htsn.io
+
+# Test HTTPS
+curl -I https://myservice.htsn.io
+
+# Check Traefik dashboard (if enabled)
+# Access: http://10.10.10.250:8080/dashboard/
+```
+
+#### Step 4. Update Documentation
+
+After deploying, update these files:
+
+1. **IP-ASSIGNMENTS.md** - Add to Services & Reverse Proxy Mapping table
+2. **CLAUDE.md** - Add to "Services using Traefik-Primary" list (line ~495)
+
+### Quick Reference - One-Liner Commands
+
+```bash
+# === DEPLOY SERVICE (example: myservice on docker-host port 8080) ===
+
+# 1. Create Traefik config
+ssh pve 'pct exec 202 -- bash -c "cat > /etc/traefik/conf.d/myservice.yaml << EOF
+http:
+  routers:
+    myservice-secure:
+      entryPoints: [websecure]
+      rule: Host(\\\`myservice.htsn.io\\\`)
+      service: myservice
+      tls: {certResolver: letsencrypt}
+  services:
+    myservice:
+      loadBalancer:
+        servers:
+          - url: http://10.10.10.206:8080
+EOF"'
+
+# 2. Add Cloudflare DNS
+curl -s -X POST "https://api.cloudflare.com/client/v4/zones/c0f5a80448c608af35d39aa820a5f3af/dns_records" \
+  -H "X-Auth-Email: cloudflare@htsn.io" \
+  -H "X-Auth-Key: 849ebefd163d2ccdec25e49b3e1b3fe2cdadc" \
+  -H "Content-Type: application/json" \
+  --data '{"type":"A","name":"myservice","content":"70.237.94.174","proxied":true}'
+
+# 3. Test (wait a few seconds for DNS propagation)
+curl -I https://myservice.htsn.io
+```
+
+### Traefik Troubleshooting
+
+```bash
+# View Traefik logs (CT 202)
+ssh pve 'pct exec 202 -- tail -f /var/log/traefik/traefik.log'
+
+# Check if config is valid
+ssh pve 'pct exec 202 -- cat /etc/traefik/conf.d/myservice.yaml'
+
+# List all dynamic configs
+ssh pve 'pct exec 202 -- ls -la /etc/traefik/conf.d/'
+
+# Check certificate
+ssh pve 'pct exec 202 -- cat /etc/traefik/acme.json | jq'
+
+# Restart Traefik (if needed)
+ssh pve 'pct exec 202 -- systemctl restart traefik'
+```
+
+### Certificate Management
+
+**Let's Encrypt certificates** are automatically managed by Traefik.
+
+**Certificate storage:**
+- Traefik-Primary: `/etc/traefik/acme.json` on CT 202
+- Traefik-Saltbox: `/opt/traefik/acme.json` on VM 101
+
+**Certificate renewal:**
+- Automatic via HTTP-01 challenge
+- Traefik checks every 24h
+- Renews 30 days before expiry
+
+**If certificates fail:**
+```bash
+# Check acme.json permissions (must be 600)
+ssh pve 'pct exec 202 -- ls -la /etc/traefik/acme.json'
+
+# Check Traefik can reach Let's Encrypt
+ssh pve 'pct exec 202 -- curl -I https://acme-v02.api.letsencrypt.org/directory'
+
+# Delete bad certificate (Traefik will re-request)
+ssh pve 'pct exec 202 -- rm /etc/traefik/acme.json'
+ssh pve 'pct exec 202 -- touch /etc/traefik/acme.json'
+ssh pve 'pct exec 202 -- chmod 600 /etc/traefik/acme.json'
+ssh pve 'pct exec 202 -- systemctl restart traefik'
+```
+
+### Docker Service with Traefik Labels (Alternative)
+
+If deploying a service via Docker on `docker-host` (VM 206), you can use Traefik labels instead of config files:
+
+```yaml
+# docker-compose.yml
+services:
+  myservice:
+    image: myimage:latest
+    labels:
+      - "traefik.enable=true"
+      - "traefik.http.routers.myservice.rule=Host(`myservice.htsn.io`)"
+      - "traefik.http.routers.myservice.entrypoints=websecure"
+      - "traefik.http.routers.myservice.tls.certresolver=letsencrypt"
+      - "traefik.http.services.myservice.loadbalancer.server.port=8080"
+    networks:
+      - traefik
+
+networks:
+  traefik:
+    external: true
+```
+
+**Note**: This requires Traefik to have access to the Docker socket and to be on the same Docker network.
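Whichever deployment style is used, Traefik records the certificates it has obtained in `acme.json`, and listing the hostnames in that file is a quick way to confirm issuance. A minimal sketch; the JSON shape (resolver, then `Certificates`, then `domain.main`) is assumed from Traefik v2's acme.json layout, and the payload is inlined here so the snippet runs anywhere:

```shell
# List hostnames that already have certificates in a Traefik acme.json dump.
# Assumed layout (Traefik v2 style): resolver -> "Certificates" -> "domain" -> "main".
# Plain grep/cut is used so this also works on hosts without jq installed.
list_cert_domains() {
  grep -o '"main":"[^"]*"' | cut -d'"' -f4
}

# Inline sample standing in for the real /etc/traefik/acme.json:
sample='{"cloudflare":{"Certificates":[{"domain":{"main":"pulse.htsn.io"}},{"domain":{"main":"myservice.htsn.io"}}]}}'
printf '%s\n' "$sample" | list_cert_domains
```

Against the live container, the input would come from `ssh pve 'pct exec 202 -- cat /etc/traefik/acme.json'` instead of the inline sample.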
+
+## Cloudflare API Access
+
+**Credentials** (stored in Saltbox config):
+- Email: `cloudflare@htsn.io`
+- API Key: `849ebefd163d2ccdec25e49b3e1b3fe2cdadc`
+- Domain: `htsn.io`
+
+**Retrieve from Saltbox:**
+```bash
+ssh pve 'qm guest exec 101 -- bash -c "cat /srv/git/saltbox/accounts.yml | grep -A2 cloudflare"'
+```
+
+**Cloudflare API Documentation:**
+- API Docs: https://developers.cloudflare.com/api/
+- DNS Records: https://developers.cloudflare.com/api/operations/dns-records-for-a-zone-create-dns-record
+
+**Common API operations:**
+
+```bash
+# Set credentials
+CF_EMAIL="cloudflare@htsn.io"
+CF_API_KEY="849ebefd163d2ccdec25e49b3e1b3fe2cdadc"
+ZONE_ID="c0f5a80448c608af35d39aa820a5f3af"
+
+# List all DNS records
+curl -X GET "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/dns_records" \
+  -H "X-Auth-Email: $CF_EMAIL" \
+  -H "X-Auth-Key: $CF_API_KEY" | jq
+
+# Add A record
+curl -X POST "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/dns_records" \
+  -H "X-Auth-Email: $CF_EMAIL" \
+  -H "X-Auth-Key: $CF_API_KEY" \
+  -H "Content-Type: application/json" \
+  --data '{"type":"A","name":"subdomain","content":"IP","proxied":true}'
+
+# Delete record
+curl -X DELETE "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/dns_records/$RECORD_ID" \
+  -H "X-Auth-Email: $CF_EMAIL" \
+  -H "X-Auth-Key: $CF_API_KEY"
+```
+
+## Related Documentation
+
+| File | Description |
+|------|-------------|
+| [EMC-ENCLOSURE.md](EMC-ENCLOSURE.md) | EMC storage enclosure (SES commands, LCC troubleshooting, maintenance) |
+| [HOMEASSISTANT.md](HOMEASSISTANT.md) | Home Assistant API access, automations, integrations |
+| [NETWORK.md](NETWORK.md) | Network bridges, VLANs, which bridge to use for new VMs |
+| [IP-ASSIGNMENTS.md](IP-ASSIGNMENTS.md) | Complete IP address assignments for all devices and services |
+| [SYNCTHING.md](SYNCTHING.md) | Syncthing setup, API access, device list, troubleshooting |
+| [SHELL-ALIASES.md](SHELL-ALIASES.md) | ZSH aliases for Claude Code (`chomelab`, `ctrading`, etc.) |
+| [configs/](configs/) | Symlinks to shared shell configs |
+
+---
+
+## Backlog
+
+Future improvements and maintenance tasks:
+
+| Priority | Task | Notes |
+|----------|------|-------|
+| Medium | **Re-IP all devices** | Current IP scheme is inconsistent. Plan: VMs 10.10.10.100-199, LXCs 10.10.10.200-249, Services 10.10.10.250-254 |
+| Low | Install SSH on HomeAssistant | Currently only accessible via QEMU agent |
+| Low | Set up SSH key for router | Currently requires expect/password |
+
+---
+
+## Changelog
+
+### 2024-12-20
+
+**SSH Key Deployment - All Systems**
+- Added SSH keys to ALL VMs and LXCs (13 total hosts now accessible via key)
+- Updated `~/.ssh/config` with complete host aliases
+- Fixed permissions: FindShyt LXC `.ssh` ownership, enabled PermitRootLogin on LXCs
+- Hosts now accessible: pve, pve2, truenas, saltbox, lmdev1, docker-host, fs-dev, copyparty, gitea-vm, trading-vm, pihole, traefik, findshyt
+
+**Documentation Updates**
+- Rewrote SSH Access section with complete host table
+- Added Password Auth section for router/Windows/HomeAssistant
+- Added Backlog section with re-IP task
+
+### 2024-12-19
+
+**EMC Storage Enclosure - LCC B Failure**
+- Diagnosed loud fan issue (speed code 5 → 4160 RPM)
+- Root cause: Faulty LCC B controller causing false readings
+- Resolution: Switched SAS cable to LCC A, fans now quiet (speed code 3 → 2670 RPM)
+- Replacement ordered: EMC 303-108-000E ($14.95 eBay)
+- Created [EMC-ENCLOSURE.md](EMC-ENCLOSURE.md) with full documentation
+
+**SSH Key Consolidation**
+- Renamed `~/.ssh/ai_trading_ed25519` → `~/.ssh/homelab`
+- Updated `~/.ssh/config` on MacBook with all homelab hosts
+- SSH key auth now works for: pve, pve2, docker-host, fs-dev, copyparty, lmdev1, gitea-vm, trading-vm
+- No more sshpass needed for PVE servers
+
+**QEMU Guest Agent Deployment**
+- Installed on: docker-host (206), fs-dev (105), copyparty (201)
+- All PVE VMs now have agent except homeassistant (110)
+- Can now use `qm guest exec` for remote commands
+
+**VM Configuration Updates**
+- docker-host: Fixed SSH key in cloud-init
+- fs-dev: Fixed `.ssh` directory ownership (1000 → 1001)
+- copyparty: Changed from DHCP to static IP (10.10.10.201)
+
+**Documentation Updates**
+- Updated CLAUDE.md SSH section (removed sshpass examples)
+- Added QEMU Agent column to VM tables
+- Added storage enclosure troubleshooting to runbooks
diff --git a/EMC-ENCLOSURE.md b/EMC-ENCLOSURE.md
new file mode 100644
index 0000000..f5383a3
--- /dev/null
+++ b/EMC-ENCLOSURE.md
@@ -0,0 +1,247 @@
+# EMC Storage Enclosure Documentation
+
+## Hardware Overview
+
+| Component | Details |
+|-----------|---------|
+| **Model** | EMC ESES Viper DAE (KTN-STL3) |
+| **Capacity** | 15x 3.5" SAS/SATA drive bays |
+| **SES Device** | `/dev/sg15` (on TrueNAS) |
+| **Connection** | SAS to LSI SAS2308 HBA (mpt2sas driver) |
+| **Location** | Connected to PVE (10.10.10.120) via TrueNAS VM |
+
+## Components
+
+### LCC Controllers (Link Control Cards)
+The enclosure has **dual LCC controllers** for redundancy:
+
+| Controller | Slot | Status | Notes |
+|------------|------|--------|-------|
+| **LCC A** | Left | Working | Currently in use |
+| **LCC B** | Right | Faulty | Causes high fan speed, SAS discovery failure |
+
+**Replacement Part**: EMC 303-108-000E VIPER 6G SAS LCC (~$15 on eBay)
+
+### Power Supplies
+Two redundant PSUs with integrated fans.
+
+### Fans
+Multiple cooling fans controlled by enclosure firmware. Fan speeds are **automatically managed** based on temperature - manual override is not supported on EMC ESES enclosures.
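Because speed control is firmware-managed, the useful client-side step is interpreting the reported speed code. A small helper for that; the code-to-RPM mapping reflects the approximate values observed on this unit, not vendor specifications:

```shell
# Translate an EMC ESES fan speed code (as reported by
# `sg_ses --index=coo,-1 --get=speed_code`) into the approximate RPM
# observed on this enclosure. Values are rough observations, not specs.
fan_rpm() {
  case "$1" in
    1) echo "~1500 RPM" ;;
    2) echo "~2000 RPM" ;;
    3) echo "~2670 RPM (normal, quiet)" ;;
    4) echo "~3300 RPM" ;;
    5) echo "~4160 RPM (loud, investigate)" ;;
    6) echo "~4800 RPM" ;;
    7) echo "~5500+ RPM (critical)" ;;
    *) echo "unknown speed code: $1" ;;
  esac
}

fan_rpm 5
```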
+
+**Fan Speed Codes**:
+| Code | Description | RPM (approx) |
+|------|-------------|--------------|
+| 1 | Lowest (normal) | ~1500 |
+| 2 | Normal | ~2000 |
+| 3 | Normal (quiet) | ~2670 |
+| 4 | High | ~3300 |
+| 5 | High (loud) | ~4160 |
+| 6 | Critical | ~4800 |
+| 7 | Critical (highest) | ~5500+ |
+
+## ZFS Pool Using This Enclosure
+
+```
+Pool: vault
+Size: 164TB raidz1
+Drives: 13x HDD in raidz1 + special mirror + NVMe cache/log
+Mount: /mnt/vault on TrueNAS
+```
+
+## SES Commands Reference
+
+All commands run from TrueNAS (VM 100):
+
+```bash
+# Check overall enclosure status
+sg_ses -p 0x02 /dev/sg15
+
+# Check fan speeds
+sg_ses --index=coo,-1 --get=speed_code /dev/sg15
+
+# Check temperatures
+sg_ses -p 0x02 /dev/sg15 | grep -E "(Temperature|Cooling)"
+
+# Check PSU status
+sg_ses -p 0x02 /dev/sg15 | grep -A5 "Power supply"
+
+# Check LCC controller status
+sg_ses -p 0x02 /dev/sg15 | grep -A5 "Enclosure services controller"
+
+# List all SES elements
+sg_ses -p 0x07 /dev/sg15
+
+# Identify enclosure (flash LEDs)
+sg_ses --index=enc,0 --set=ident:1 /dev/sg15
+```
+
+### Running SES Commands via Proxmox
+
+```bash
+# From Mac (via SSH key auth)
+ssh pve 'qm guest exec 100 -- bash -c "sg_ses -p 0x02 /dev/sg15"'
+
+# Quick fan check
+ssh pve 'qm guest exec 100 -- bash -c "sg_ses --index=coo,-1 --get=speed_code /dev/sg15"'
+
+# Quick temp check
+ssh pve 'qm guest exec 100 -- bash -c "sg_ses -p 0x02 /dev/sg15 | grep Temperature"'
+```
+
+## Troubleshooting
+
+### Symptom: Fans Running Loud (Speed 5+)
+
+**Possible Causes**:
+1. **Faulty LCC controller** - Switch to other LCC
+2. **High temperatures** - Check temp sensors
+3. **PSU issue** - Check PSU status via SES
+4. **Failed drive** - Check drive status LEDs
+
+**Diagnosis Steps**:
+```bash
+# 1. Check current fan speed
+ssh pve 'qm guest exec 100 -- bash -c "sg_ses --index=coo,-1 --get=speed_code /dev/sg15"'
+# Normal: 1-3, High: 4-5, Critical: 6-7
+
+# 2. Check temperatures
+ssh pve 'qm guest exec 100 -- bash -c "sg_ses -p 0x02 /dev/sg15 | grep Temperature"'
+# Normal: 25-40C, Warning: 45-50C, Critical: 55C+
+
+# 3. Check for component failures
+ssh pve 'qm guest exec 100 -- bash -c "sg_ses -p 0x02 /dev/sg15 | grep -i fail"'
+
+# 4. If no obvious cause, try switching LCC
+# Power down enclosure, move SAS cable to other LCC port
+```
+
+### Symptom: Drives Not Detected After Enclosure Power Cycle
+
+**Possible Causes**:
+1. Enclosure not fully initialized (wait for green LEDs to stop blinking)
+2. Faulty LCC controller
+3. SAS cable loose
+4. HBA needs rescan
+
+**Diagnosis Steps**:
+```bash
+# 1. Check SAS link status
+cat /sys/class/sas_phy/*/negotiated_linkrate
+
+# 2. Check for expanders (should show enclosure)
+lsscsi -g | grep -i enclo
+
+# 3. Force HBA rescan
+echo "- - -" > /sys/class/scsi_host/host0/scan
+
+# 4. If no expander, check SAS cable and try other LCC port
+```
+
+### Symptom: Pool Won't Import After Enclosure Maintenance
+
+```bash
+# 1. Wait for enclosure to fully initialize (1-2 minutes)
+
+# 2. Rescan for devices
+echo "- - -" > /sys/class/scsi_host/host0/scan
+
+# 3. Import pool
+zpool import vault
+
+# 4. If read-only mount issues occur, reboot TrueNAS
+ssh pve 'qm reboot 100'
+```
+
+## Maintenance Procedures
+
+### Safe Shutdown for Enclosure Maintenance
+
+```bash
+# 1. Stop services using the pool
+ssh pve 'qm guest exec 101 -- bash -c "docker stop \$(docker ps -q)"'
+
+# 2. Shutdown TrueNAS (auto-exports ZFS pool)
+ssh pve 'qm shutdown 100 --timeout 120'
+
+# 3. Wait for TrueNAS to fully stop
+ssh pve 'while qm status 100 | grep -q running; do sleep 5; done'
+
+# 4. Power off enclosure
+# (Physical switch or PDU)
+
+# 5. Perform maintenance
+
+# 6. Power on enclosure, wait for initialization (green LEDs solid)
+
+# 7. Start TrueNAS
+ssh pve 'qm start 100'
+
+# 8. Verify pool imported
+ssh pve 'qm guest exec 100 -- bash -c "zpool status vault"'
+```
+
+### Hot-Swap LCC Controller
+
+LCCs can be hot-swapped while the enclosure is running:
+
+1. Order replacement LCC (EMC 303-108-000E)
+2. Move SAS cable to the working LCC (if not already)
+3. Wait for drives to come online via the new LCC
+4. Remove faulty LCC
+5. Install replacement LCC
+6. Optionally move SAS cable back to original port
+
+## Incident Log
+
+### 2024-12-19: LCC B Failure
+
+**Symptoms**:
+- Fans running at speed code 5 (~4160 RPM) - very loud
+- After enclosure power cycle, drives not detected
+- SAS link UP (4 PHYs at 6.0 Gbit) but no expander discovery
+
+**Root Cause**:
+LCC B controller malfunction causing:
+- False temperature/error readings, leading to high fan speed
+- SAS expander not responding, so drives were not enumerated
+
+**Resolution**:
+1. Moved SAS cable from LCC B to LCC A
+2. Drives immediately appeared
+3. Fan speed dropped to code 3 (2670 RPM) - quiet
+4. Imported vault pool, all data intact
+
+**Replacement Ordered**:
+- Part: EMC 303-108-000E VIPER 6G SAS LCC
+- Source: eBay
+- Price: $14.95 + free shipping
+
+## LED Status Reference
+
+### Drive LEDs
+| LED | Color | Status |
+|-----|-------|--------|
+| Solid Blue | Power | Drive has power |
+| Blinking Blue | Activity | I/O in progress |
+| Solid Amber | Fault | Drive failed |
+| Blinking Amber | Identify | Drive being located |
+
+### LCC LEDs
+| LED | Color | Status |
+|-----|-------|--------|
+| Solid Green | Link | SAS connection active |
+| Blinking Green | Activity | Data transfer |
+| Amber | Fault | LCC issue |
+
+### PSU LEDs
+| LED | Color | Status |
+|-----|-------|--------|
+| Solid Green | OK | Power supply healthy |
+| Off | No Power | No AC input |
+| Amber | Fault | PSU failure |
+
+## Related Documentation
+
+- [CLAUDE.md](CLAUDE.md) - Main homelab documentation
+- [IP-ASSIGNMENTS.md](IP-ASSIGNMENTS.md) - Network configuration
+- TrueNAS Web UI: https://10.10.10.200
diff --git a/HOMEASSISTANT.md b/HOMEASSISTANT.md
new file mode 100644
index 0000000..43a8784
--- /dev/null
+++ b/HOMEASSISTANT.md
@@ -0,0 +1,145 @@
+# Home Assistant
+
+## Overview
+
+| Setting | Value |
+|---------|-------|
+| VM ID | 110 |
+| Host | PVE (10.10.10.120) |
+| IP Address | 10.10.10.210 (DHCP - should be static) |
+| Port | 8123 |
+| Web UI | http://10.10.10.210:8123 |
+| OS | Home Assistant OS 16.3 |
+| Version | 2025.11.3 (update available: 2025.12.3) |
+
+## API Access
+
+Home Assistant uses Long-Lived Access Tokens for API authentication.
+
+### Getting an API Token
+
+1. Go to http://10.10.10.210:8123
+2. Click your profile (bottom left)
+3. Scroll to "Long-Lived Access Tokens"
+4. Click "Create Token"
+5. Name it (e.g., "Claude Code")
+6. Copy the token (only shown once!)
+
+### API Configuration
+
+```
+API_URL: http://10.10.10.210:8123/api
+API_TOKEN: eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiIwZThjZmJjMzVlNDA0NzYwOTMzMjg3MTQ5ZjkwOGU2NyIsImlhdCI6MTc2NTk5MjQ4OCwiZXhwIjoyMDgxMzUyNDg4fQ.r743tsb3E5NNlrwEEu9glkZdiI4j_3SKIT1n5PGUytY
+```
+
+### API Examples
+
+```bash
+# Set these variables
+HA_URL="http://10.10.10.210:8123"
+HA_TOKEN="your-token-here"
+
+# Check API is working
+curl -s -H "Authorization: Bearer $HA_TOKEN" "$HA_URL/api/"
+
+# Get all states
+curl -s -H "Authorization: Bearer $HA_TOKEN" "$HA_URL/api/states" | jq
+
+# Get specific entity state
+curl -s -H "Authorization: Bearer $HA_TOKEN" "$HA_URL/api/states/light.living_room" | jq
+
+# Turn on a light
+curl -X POST -H "Authorization: Bearer $HA_TOKEN" \
+  -H "Content-Type: application/json" \
+  -d '{"entity_id": "light.living_room"}' \
+  "$HA_URL/api/services/light/turn_on"
+
+# Turn off a light
+curl -X POST -H "Authorization: Bearer $HA_TOKEN" \
+  -H "Content-Type: application/json" \
+  -d '{"entity_id": "light.living_room"}' \
+  "$HA_URL/api/services/light/turn_off"
+
+# Call any service (example: toggle a switch)
+curl -X POST -H "Authorization: Bearer $HA_TOKEN" \
+  -H "Content-Type: application/json" \
+  -d '{"entity_id": "switch.my_switch"}' \
+  "$HA_URL/api/services/switch/toggle"
+```
+
+## Common Tasks
+
+### List All Entities
+```bash
+curl -s -H "Authorization: Bearer $HA_TOKEN" "$HA_URL/api/states" | jq '.[].entity_id'
+```
+
+### List Entities by Domain
+```bash
+# All lights
+curl -s -H "Authorization: Bearer $HA_TOKEN" "$HA_URL/api/states" | jq '[.[] | select(.entity_id | startswith("light."))]'
+
+# All switches
+curl -s -H "Authorization: Bearer $HA_TOKEN" "$HA_URL/api/states" | jq '[.[] | select(.entity_id | startswith("switch."))]'
+
+# All sensors
+curl -s -H "Authorization: Bearer $HA_TOKEN" "$HA_URL/api/states" | jq '[.[] | select(.entity_id | startswith("sensor."))]'
+```
+
+### Get Entity History
+```bash
+# Last 24 hours for an entity
+curl -s -H "Authorization: Bearer $HA_TOKEN" \
+  "$HA_URL/api/history/period?filter_entity_id=sensor.temperature" | jq
+```
+
+## Device Summary
+
+**265 total entities**
+
+| Domain | Count | Examples |
+|--------|-------|----------|
+| scene | 87 | Lighting scenes |
+| light | 41 | Kitchen, Living room, Bedroom, Office, Cabinet, etc. |
+| switch | 36 | Automations, Sonos controls, Motion sensors |
+| sensor | 28 | Various sensors |
+| number | 21 | Settings/controls |
+| event | 17 | Event triggers |
+| binary_sensor | 13 | Motion, door sensors |
+| media_player | 8 | Sonos speakers (Bedroom, Living Room, Kitchen, Console) |
+
+### Lights by Room
+- **Kitchen**: Kitchen light
+- **Living Room**: Living room, Living Room Lamp, TV Bias
+- **Bedroom**: Bedroom, Bedside Lamp 1 & 2, Dresser
+- **Office**: Office, Office Floor Lamp, Office Lamp
+- **Guest Room**: Guest Bed Left, Guest Lamp Right
+- **Other**: Cabinet 1 & 2, Pantry, Bathroom, Front Porch, etc.
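The jq-based domain filters above need a live instance to run against; the same counting logic can be sanity-checked offline with plain grep on a canned `/api/states` payload (the entity IDs below are made up for illustration):

```shell
# Count entities of a given domain in a Home Assistant /api/states dump.
# The sample payload and entity IDs are illustrative, not from the real instance.
states='[{"entity_id":"light.kitchen"},{"entity_id":"light.office_lamp"},
{"entity_id":"switch.pantry"},{"entity_id":"media_player.kitchen"}]'

count_domain() {  # usage: ... | count_domain <domain>
  grep -o "\"entity_id\":\"$1\.[^\"]*\"" | wc -l
}

printf '%s\n' "$states" | count_domain light
printf '%s\n' "$states" | count_domain switch
```

The same pipeline works on real output: `curl -s -H "Authorization: Bearer $HA_TOKEN" "$HA_URL/api/states" | count_domain light`.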
+
+### Sonos Speakers
+- Bedroom (with surround)
+- Living Room (with surround)
+- Kitchen
+- Console
+
+### Motion Sensors
+- Kitchen Motion
+- Office Sensor
+
+## Integrations
+
+- **Philips Hue** - Lights
+- **Sonos** - Speakers
+- **Motion Sensors** - Various locations
+
+## Automations
+
+TODO: Document key automations
+
+## TODO
+
+- [ ] Set static IP (currently DHCP at .210, should be .110)
+- [x] Add API token to this document
+- [ ] Document installed integrations
+- [ ] Document automations
+- [ ] Set up Traefik reverse proxy (ha.htsn.io)
diff --git a/INFRASTRUCTURE.md b/INFRASTRUCTURE.md
new file mode 100644
index 0000000..b6a503d
--- /dev/null
+++ b/INFRASTRUCTURE.md
@@ -0,0 +1,330 @@
+# Homelab Infrastructure Documentation
+
+## Network Topology
+
+```
+                                ┌─────────────────┐
+                                │    Internet     │
+                                └────────┬────────┘
+                                         │
+                                ┌────────▼────────┐
+                                │ Router/Firewall │
+                                │   10.10.10.1    │
+                                └────────┬────────┘
+                                         │
+                 ┌───────────────────────┼────────────────────┐
+                 │                       │                    │
+        ┌────────▼────────┐     ┌────────▼────────┐  ┌────────▼────────┐
+        │   Main Switch   │     │  Storage VLAN   │  │    Tailscale    │
+        │   vmbr0/vmbr2   │     │      vmbr3      │  │   100.x.x.x/8   │
+        │  10.10.10.0/24  │     │  (Jumbo 9000)   │  │                 │
+        └────────┬────────┘     └────────┬────────┘  └─────────────────┘
+                 │                       │
+     ┌───────────┼───────────┐           │
+     │           │           │           │
+┌────▼───┐  ┌────▼───┐  ┌────▼───┐       │
+│  PVE   │  │  PVE2  │  │ Other  │       │
+│  .120  │  │  .102  │  │ Devices│       │
+└────┬───┘  └────┬───┘  └────────┘       │
+     │           │                       │
+     └───────────┴───────────────────────┘
+                 │
+         ┌───────▼───────┐
+         │    TrueNAS    │
+         │ (Storage via  │
+         │   HBA/NVMe)   │
+         └───────────────┘
+```
+
+## IP Address Assignments
+
+### Management Network (10.10.10.0/24)
+
+| IP Address | Hostname | Description |
+|------------|----------|-------------|
+| 10.10.10.1 | router | Gateway/Firewall |
+| 10.10.10.102 | pve2 | Proxmox Server 2 |
+| 10.10.10.120 | pve | Proxmox Server 1 (Primary) |
+| 10.10.10.123 | mac-mini | Mac Mini (Syncthing node) |
+| 10.10.10.147 | macbook | MacBook Pro (Syncthing node) |
+| 10.10.10.150 | windows-pc | Windows PC (Syncthing node) |
+| 10.10.10.200 | truenas | TrueNAS (Storage/Syncthing hub) |
+| 10.10.10.220 | gitea-vm | Git Server |
+| 10.10.10.221 | trading-vm | AI Trading Platform |
+
+### Tailscale Network (100.x.x.x)
+
+| IP Address | Hostname | Description |
+|------------|----------|-------------|
+| 100.88.161.110 | macbook | MacBook |
+| 100.106.175.37 | phone | Mobile Device |
+| 100.108.89.58 | mac-mini | Mac Mini |
+
+---
+
+## Server Hardware
+
+### PVE (10.10.10.120) - Primary Virtualization Host
+
+| Component | Specification |
+|-----------|---------------|
+| **CPU** | AMD Ryzen Threadripper PRO 3975WX (32C/64T, 280W TDP) |
+| **RAM** | 128 GB DDR4 ECC |
+| **Boot** | Samsung 870 QVO 4TB (mirrored) |
+| **NVMe Pool 1** | 2x Sabrent Rocket Q NVMe (nvme-mirror1, 3.6TB) |
+| **NVMe Pool 2** | 2x Kingston SFYRD 2TB (nvme-mirror2, 1.8TB) |
+| **GPU 1** | NVIDIA Quadro P2000 (75W) - Plex transcoding |
+| **GPU 2** | NVIDIA TITAN RTX (280W) - AI workloads |
+| **HBA** | LSI SAS2308 - Passed to TrueNAS |
+| **NVMe Controller** | Samsung PM9A1 - Passed to TrueNAS |
+
+### PVE2 (10.10.10.102) - Secondary Virtualization Host
+
+| Component | Specification |
+|-----------|---------------|
+| **CPU** | AMD Ryzen Threadripper PRO 3975WX (32C/64T, 280W TDP) |
+| **RAM** | 128 GB DDR4 ECC |
+| **NVMe Pool** | 2x NVMe (nvme-mirror3) |
+| **HDD Pool** | 2x WD Red 6TB (local-zfs2, mirrored) |
+| **GPU** | NVIDIA RTX A6000 (300W) - AI Trading |
+
+---
+
+## Virtual Machines
+
+### PVE (10.10.10.120)
+
+| VMID | Name | vCPUs | RAM | Storage | Purpose | Passthrough |
+|------|------|-------|-----|---------|---------|-------------|
+| 100 | truenas | 8 | 32GB | rpool | NAS/Storage | LSI SAS2308 HBA, Samsung NVMe |
+| 101 | saltbox | 16 | 16GB | rpool/nvme-mirror1/2 | Media automation | TITAN RTX |
+| 105 | fs-dev | 10 | 8GB | nvme-mirror1 | Development | - |
+| 110 | homeassistant | 2 | 2GB | nvme-mirror2 | Home automation | - |
+| 111 | lmdev1 | 8 | 32GB | nvme-mirror1 | AI/LLM development | TITAN RTX |
+| 201 | copyparty | 2 | 2GB | nvme-mirror1 | File sharing | - |
+| 206 | docker-host | 2 | 4GB | rpool | Docker services | - |
+
+### PVE2 (10.10.10.102)
+
+| VMID | Name | vCPUs | RAM | Storage | Purpose | Passthrough |
+|------|------|-------|-----|---------|---------|-------------|
+| 300 | gitea-vm | 2 | 4GB | nvme-mirror3 | Git server | - |
+| 301 | trading-vm | 16 | 32GB | nvme-mirror3 | AI trading platform | RTX A6000 |
+
+---
+
+## LXC Containers
+
+### PVE (10.10.10.120)
+
+| VMID | Name | Purpose | Status |
+|------|------|---------|--------|
+| 200 | pihole | DNS/Ad blocking | Running |
+| 202 | traefik | Reverse proxy | Running |
+| 205 | findshyt | Custom application | Running |
+| 500 | dev1 | Development | Stopped |
+
+---
+
+## Storage Architecture
+
+```
+PVE (10.10.10.120)
+├── rpool (Samsung 870 QVO 4TB mirror)
+│   ├── Proxmox system
+│   ├── VM 100 (truenas) boot
+│   ├── VM 101 (saltbox) boot
+│   └── VM 206 (docker-host)
+│
+├── nvme-mirror1 (Sabrent Rocket Q mirror, 3.6TB)
+│   ├── VM 101 (saltbox) data
+│   ├── VM 105 (fs-dev)
+│   ├── VM 111 (lmdev1)
+│   └── VM 201 (copyparty)
+│
+└── nvme-mirror2 (Kingston SFYRD mirror, 1.8TB)
+    ├── VM 101 (saltbox) data
+    └── VM 110 (homeassistant)
+
+PVE2 (10.10.10.102)
+├── nvme-mirror3 (NVMe mirror)
+│   ├── VM 300 (gitea-vm)
+│   └── VM 301 (trading-vm)
+│
+└── local-zfs2 (WD Red 6TB mirror)
+    └── Backup/archive storage
+
+TrueNAS (VM 100 on PVE)
+├── HBA Passthrough (LSI SAS2308)
+│   └── [Physical drives managed by TrueNAS]
+│
+└── NVMe Passthrough (Samsung PM9A1)
+    └── [NVMe drives managed by TrueNAS]
+```
+
+---
+
+## Services Map
+
+```
+┌────────────────────────────────────────────────────────────────────┐
+│                          EXTERNAL ACCESS                           │
+├────────────────────────────────────────────────────────────────────┤
+│ Tailscale VPN            ──► All services accessible via 100.x.x.x │
+│ Traefik (CT 202)         ──► Reverse proxy for web services        │
+└──────────────────────────────────┬─────────────────────────────────┘
+                                   │
+                                   ▼
+┌────────────────────────────────────────────────────────────────────┐
+│                           CORE SERVICES                            │
+├────────────────────────────────────────────────────────────────────┤
+│ PiHole (CT 200)          ──► DNS + Ad blocking                     │
+│ TrueNAS (VM 100)         ──► NAS, Syncthing, Storage               │
+│ Gitea (VM 300)           ──► Git repository hosting                │
+│ Home Assistant (VM 110)  ──► Home automation                       │
+└──────────────────────────────────┬─────────────────────────────────┘
+                                   │
+                                   ▼
+┌────────────────────────────────────────────────────────────────────┐
+│                           MEDIA SERVICES                           │
+├────────────────────────────────────────────────────────────────────┤
+│ Saltbox (VM 101)         ──► Plex, *arr stack, media automation    │
+│ CopyParty (VM 201)       ──► File sharing                          │
+└──────────────────────────────────┬─────────────────────────────────┘
+                                   │
+                                   ▼
+┌────────────────────────────────────────────────────────────────────┐
+│                           DEVELOPMENT/AI                           │
+├────────────────────────────────────────────────────────────────────┤
+│ Trading VM (VM 301)      ──► AI trading platform (RTX A6000)       │
+│ LMDev1 (VM 111)          ──► LLM development (TITAN RTX)           │
+│ FS-Dev (VM 105)          ──► General development                   │
+│ Docker Host (VM 206)     ──► Containerized services                │
+└────────────────────────────────────────────────────────────────────┘
+```
+
+---
+
+## Syncthing Topology
+
+```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” + β”‚ TrueNAS β”‚ + β”‚ (Hub/Server) β”‚ + β”‚ Port 20910 β”‚ + β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜ + β”‚ + β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” + β”‚ β”‚ β”‚ + β”Œβ”€β”€β”€β”€β–Όβ”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β–Όβ”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β–Όβ”€β”€β”€β”€β” + β”‚ MacBook β”‚ β”‚ Mac Miniβ”‚ β”‚ Windows β”‚ + β”‚ .147 β”‚ β”‚ .123 β”‚ β”‚ PC .150 β”‚ + β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ + +Synced Folders: +β”œβ”€β”€ antigravity (310MB) +β”œβ”€β”€ bin (23KB) +β”œβ”€β”€ claude-code (257MB) +β”œβ”€β”€ claude-desktop (784MB) +β”œβ”€β”€ config (436KB) +β”œβ”€β”€ cursor (459MB) +β”œβ”€β”€ desktop (7.2GB) +β”œβ”€β”€ documents (11GB) +β”œβ”€β”€ dotconfig (212MB) +β”œβ”€β”€ downloads (38GB) +β”œβ”€β”€ movies (334MB) +β”œβ”€β”€ music (606KB) +β”œβ”€β”€ notes (73KB) +β”œβ”€β”€ pictures (259MB) +└── projects (3.1GB) +``` + +--- + +## Power Consumption + +### Estimated Power Draw + +| Component | Idle | Load | Notes | +|-----------|------|------|-------| +| **PVE CPU** | 50W | 280W | TR PRO 3975WX | +| **PVE2 CPU** | 50W | 280W | TR PRO 3975WX | +| **TITAN RTX** | 20W | 280W | Passthrough to saltbox/lmdev1 | +| **RTX A6000** | 25W | 300W | Passthrough to trading-vm | +| **Quadro P2000** | 10W | 75W | Plex transcoding | +| **Storage (per server)** | 30W | 50W | NVMe + SSD mirrors | +| **Base system (each)** | 50W | 60W | Motherboard, RAM, fans | + +### Total Estimates +- **Idle**: ~400-500W combined +- **Moderate load**: ~700-900W combined +- **Full load**: ~1200-1400W combined + +### Power Optimizations Applied +1. KSMD disabled on both hosts (saved ~10W) +2. Syncthing rescan intervals increased (saved ~60-80W from TrueNAS CPU) +3. 
CPU governor optimization (saved ~60-120W) + - PVE: `powersave` + `balance_power` EPP (amd-pstate-epp) + - PVE2: `schedutil` (acpi-cpufreq) +4. ksmtuned service disabled on both hosts (saved ~2-5W) +5. HDD spindown on PVE2 - 30 min timeout (saved ~10-16W) + - local-zfs2 pool (2x WD Red 6TB) essentially empty + +**Total estimated savings: ~142-231W** + +--- + +## SSH Access + +### Credentials + +| Host | IP Address | Username | Password | Notes | +|------|------------|----------|----------|-------| +| Hutson-PC | 10.10.10.150 | claude | GrilledCh33s3# | Windows PC | +| MacBook | 10.10.10.147 | hutson | GrilledCh33s3# | MacBook Pro | +| TrueNAS | 10.10.10.200 | truenas_admin | GrilledCh33s3# | SSH key configured | + +### SSH Keys + +The Mac Mini has an SSH key configured at `~/.ssh/id_ed25519` for passwordless authentication to Proxmox hosts and other infrastructure. + +For Proxmox servers (PVE and PVE2), SSH access is configured in `~/.ssh/config`: +``` +Host pve + HostName 10.10.10.120 + User root + IdentityFile ~/.ssh/ai_trading_ed25519 + +Host pve2 + HostName 10.10.10.102 + User root + IdentityFile ~/.ssh/ai_trading_ed25519 +``` + +--- + +## Credentials Management + +Sensitive credentials are stored in `/Users/hutson/Projects/homelab/.env` for use with infrastructure management scripts and automation. + +This file contains: +- Service passwords +- API keys +- Database credentials +- Other sensitive configuration values + +**Note**: The `.env` file is git-ignored and should never be committed to version control. + +--- + +## Configuration Backups + +Configuration files are backed up in `/Users/hutson/Projects/homelab/configs/` directory. + +### Current Backups + +| File | Description | +|------|-------------| +| ghostty.conf | Ghostty terminal emulator configuration | + +This directory serves as a centralized location for storing configuration backups from various systems and applications in the homelab environment. 
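Because the `.env` file must stay out of version control, a quick sanity check before committing can catch mistakes early. This is a sketch of a hypothetical helper (not part of the existing scripts); it only assumes `git` is installed:

```bash
#!/bin/bash
# Hypothetical pre-commit check: confirm .env is git-ignored so
# credentials never land in version control.
check_env_ignored() {
    local repo="$1"
    if git -C "$repo" check-ignore -q .env; then
        echo "ok: .env is ignored in $repo"
    else
        echo "WARNING: .env is NOT ignored in $repo" >&2
        return 1
    fi
}
```

Run it against the repo root (e.g. `check_env_ignored ~/Projects/homelab`) after any edit to `.gitignore`.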
diff --git a/IP-ASSIGNMENTS.md b/IP-ASSIGNMENTS.md new file mode 100644 index 0000000..1ff91e7 --- /dev/null +++ b/IP-ASSIGNMENTS.md @@ -0,0 +1,139 @@ +# IP Address Assignments + +This document tracks all IP addresses in the homelab infrastructure. + +## Network Overview + +| Network | Range | Purpose | +|---------|-------|---------| +| Management VLAN | 10.10.10.0/24 | Primary network for all devices | +| Storage VLAN | 10.10.20.0/24 | NFS/iSCSI storage traffic | +| Tailscale | 100.x.x.x | VPN overlay network | + +## Infrastructure Devices + +| IP Address | Device | Type | Notes | +|------------|--------|------|-------| +| 10.10.10.1 | UniFi UCG-Fiber | Router | Gateway for all traffic | +| 10.10.10.120 | PVE | Proxmox Host | Primary server (Threadripper PRO 3975WX) | +| 10.10.10.102 | PVE2 | Proxmox Host | Secondary server (Threadripper PRO 3975WX) | + +## Virtual Machines - PVE (10.10.10.120) + +| VMID | Name | IP Address | Purpose | Status | +|------|------|------------|---------|--------| +| 100 | truenas | 10.10.10.200 | NAS, central Syncthing hub | Running | +| 101 | saltbox | 10.10.10.100 | Media automation, Plex, *arr apps | Running | +| 105 | fs-dev | 10.10.10.5 | Development environment | Running | +| 110 | homeassistant | 10.10.10.110 | Home automation | Running | +| 111 | lmdev1 | 10.10.10.111 | AI/LLM development (TITAN RTX) | Running | +| 201 | copyparty | 10.10.10.201 | File sharing | Running | +| 206 | docker-host | 10.10.10.206 | Docker services (Excalidraw, etc.) 
| Running | + +## Containers (LXC) - PVE (10.10.10.120) + +| CTID | Name | IP Address | Purpose | Status | +|------|------|------------|---------|--------| +| 200 | pihole | 10.10.10.10 | DNS/Ad blocking | Running | +| 202 | traefik | 10.10.10.250 | Reverse proxy (Traefik-Primary) | Running | +| 205 | findshyt | 10.10.10.8 | Custom app | Running | +| 500 | dev1 | DHCP | Development container | Stopped | + +## Virtual Machines - PVE2 (10.10.10.102) + +| VMID | Name | IP Address | Purpose | Status | +|------|------|------------|---------|--------| +| 300 | gitea-vm | 10.10.10.220 | Git server | Running | +| 301 | trading-vm | 10.10.10.221 | AI trading platform (RTX A6000) | Running | + +## Workstations & Personal Devices + +| IP Address | Tailscale IP | Device | User | Notes | +|------------|--------------|--------|------|-------| +| 10.10.10.147 | 100.88.161.1 | MacBook Pro | hutson | Laptop | +| 10.10.10.148 | 100.108.89.58 | Mac Mini | hutson | Persistent Claude sessions | +| 10.10.10.150 | 100.120.97.76 | Hutson-PC (Windows) | claude/micro | Windows workstation | +| 10.10.10.54 | - | Android Phone | hutson | Syncthing mobile | + +## Services & Reverse Proxy Mapping + +| Service | Domain | Backend IP:Port | Traefik Instance | +|---------|--------|-----------------|------------------| +| Traefik-Primary | - | 10.10.10.250 | Self (CT 202) | +| Traefik-Saltbox | - | 10.10.10.100 | Self (VM 101) | +| FindShyt | findshyt.htsn.io | 10.10.10.8:3000 | Traefik-Primary | +| Gitea | git.htsn.io | 10.10.10.220:3000 | Traefik-Primary | +| Home Assistant | ha.htsn.io | 10.10.10.110:8123 | Traefik-Primary | +| TrueNAS | nas.htsn.io | 10.10.10.200 | Traefik-Primary | +| Proxmox | pve.htsn.io | 10.10.10.120:8006 | Traefik-Primary | +| CopyParty | cp.htsn.io | 10.10.10.201:3923 | Traefik-Primary | +| LMDev | lmdev.htsn.io | 10.10.10.111 | Traefik-Primary | +| Excalidraw | excalidraw.htsn.io | 10.10.10.206:8080 | Traefik-Primary | +| Plex | plex.htsn.io | 10.10.10.100:32400 | 
Traefik-Saltbox | +| Sonarr | sonarr.htsn.io | 10.10.10.100:8989 | Traefik-Saltbox | +| Radarr | radarr.htsn.io | 10.10.10.100:7878 | Traefik-Saltbox | + +## Reserved/Available IPs + +### Currently Used (10.10.10.x) +- .1 - Router (gateway) +- .5 - fs-dev +- .8 - FindShyt +- .10 - PiHole (DNS) +- .54 - Android Phone +- .100 - Saltbox (Traefik-Saltbox) +- .102 - PVE2 +- .110 - Home Assistant +- .111 - LMDev1 +- .120 - PVE +- .147 - MacBook Pro +- .148 - Mac Mini +- .150 - Windows PC +- .200 - TrueNAS +- .201 - CopyParty +- .206 - Docker-host +- .220 - Gitea +- .221 - Trading VM +- .250 - Traefik-Primary + +### Available Ranges +- 10.10.10.2 - 10.10.10.4 (3 IPs) +- 10.10.10.6 - 10.10.10.7 (2 IPs) +- 10.10.10.9 (1 IP) +- 10.10.10.11 - 10.10.10.53 (43 IPs) +- 10.10.10.55 - 10.10.10.99 (45 IPs) +- 10.10.10.101 (1 IP) +- 10.10.10.103 - 10.10.10.109 (7 IPs) +- 10.10.10.112 - 10.10.10.119 (8 IPs) +- 10.10.10.121 - 10.10.10.146 (26 IPs) +- 10.10.10.149 (1 IP) +- 10.10.10.151 - 10.10.10.199 (49 IPs) +- 10.10.10.202 - 10.10.10.205 (4 IPs) +- 10.10.10.207 - 10.10.10.219 (13 IPs) +- 10.10.10.222 - 10.10.10.249 (28 IPs) +- 10.10.10.251 - 10.10.10.254 (4 IPs) + +## Docker Host Services (10.10.10.206) + +| Service | Port | Purpose | +|---------|------|---------| +| Excalidraw | 8080 | Whiteboard/diagramming (excalidraw.htsn.io) | +| Portainer CE | 9000, 9443 | Local Docker management UI | +| Portainer Agent | 9001 | Remote management from other Portainer | +| Gotenberg | 3000 | PDF generation API | + +## Syncthing API Endpoints + +| Device | IP | Port | API Key | +|--------|-----|------|---------| +| Mac Mini | 127.0.0.1 | 8384 | oSQSrPnMnrEXuHqjWrRdrvq3TSXesAT5 | +| MacBook | 127.0.0.1 (via SSH) | 8384 | qYkNdVLwy9qZZZ6MqnJr7tHX7KKdxGMJ | +| Android Phone | 10.10.10.54 | 8384 | Xxz3jDT4akUJe6psfwZsbZwG2LhfZuDM | +| TrueNAS | 10.10.10.200 | 8384 | (check TrueNAS config) | + +## Notes + +- **MTU 9000** (jumbo frames) enabled on storage networks +- **Tailscale** provides VPN access 
from anywhere +- **DNS** handled by PiHole at 10.10.10.10 +- All new services should use **Traefik-Primary (10.10.10.250)** unless they're Saltbox services diff --git a/NETWORK.md b/NETWORK.md new file mode 100644 index 0000000..8a9fad3 --- /dev/null +++ b/NETWORK.md @@ -0,0 +1,226 @@ +# Network Architecture + +## Network Ranges + +| Network | Range | Purpose | Gateway | +|---------|-------|---------|---------| +| LAN | 10.10.10.0/24 | Primary network, management, general access | 10.10.10.1 (UniFi Router) | +| Storage/Internal | 10.10.20.0/24 | Inter-VM traffic, NFS/iSCSI, no external access | 10.10.20.1 (vmbr3) | +| Tailscale | 100.x.x.x | VPN overlay for remote access | N/A | + +## PVE (10.10.10.120) - Network Bridges + +### Physical NICs + +| Interface | Speed | Type | MAC Address | Connected To | +|-----------|-------|------|-------------|--------------| +| enp1s0 | 1 Gbps | Onboard NIC | e0:4f:43:e6:41:6c | Switch β†’ UniFi eth5 | +| enp35s0f0 | 10 Gbps | Intel X550 Port 0 | b4:96:91:39:86:98 | Switch β†’ UniFi eth5 | +| enp35s0f1 | 10 Gbps | Intel X550 Port 1 | b4:96:91:39:86:99 | Switch β†’ UniFi eth5 | + +**Note:** All three NICs connect through a switch to the UniFi Gateway's 10Gb SFP+ port (eth5). No direct firewall connection. 
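A quick way to cross-check the physical NIC tables against a live host is to read each interface's MAC and negotiated speed from sysfs. A sketch (interface names come from the tables above; the `SYS` override exists only so the function can be tested off-host):

```bash
#!/bin/bash
# Sketch: print MAC address and negotiated link speed per NIC, for
# cross-checking against the physical NIC tables in this document.
nic_summary() {
    local sys="${SYS:-/sys/class/net}" nic mac speed
    for nic in "$@"; do
        mac=$(cat "$sys/$nic/address" 2>/dev/null || echo "n/a")
        speed=$(cat "$sys/$nic/speed" 2>/dev/null || echo "n/a")
        echo "$nic mac=$mac speed=${speed}Mb/s"
    done
}

# Example on PVE:
#   nic_summary enp1s0 enp35s0f0 enp35s0f1
```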
+ +### Bridge Configuration + +#### vmbr0 - Management Bridge (1Gb) +- **Physical NIC**: enp1s0 (1 Gbps onboard) +- **IP**: 10.10.10.120/24 +- **Gateway**: 10.10.10.1 +- **MTU**: 9000 +- **Purpose**: General VM/CT networking, management access +- **Use for**: Most VMs and containers that need basic internet access + +**VMs/CTs on vmbr0:** +| VMID | Name | IP | +|------|------|-----| +| 105 | fs-dev | 10.10.10.5 | +| 110 | homeassistant | 10.10.10.110 | +| 201 | copyparty | DHCP | +| 206 | docker-host | 10.10.10.206 | +| 200 | pihole (CT) | 10.10.10.10 | +| 205 | findshyt (CT) | 10.10.10.8 | + +--- + +#### vmbr1 - High-Speed LXC Bridge (10Gb) +- **Physical NIC**: enp35s0f0 (10 Gbps Intel X550) +- **IP**: 10.10.10.121/24 +- **Gateway**: 10.10.10.1 +- **MTU**: 9000 +- **Purpose**: High-bandwidth LXC containers and VMs +- **Use for**: Containers/VMs that need high throughput to network + +**VMs/CTs on vmbr1:** +| VMID | Name | IP | +|------|------|-----| +| 111 | lmdev1 | 10.10.10.111 | + +--- + +#### vmbr2 - High-Speed VM Bridge (10Gb) +- **Physical NIC**: enp35s0f1 (10 Gbps Intel X550) +- **IP**: 10.10.10.122/24 +- **Gateway**: (none configured) +- **MTU**: 9000 +- **Purpose**: High-bandwidth VMs, storage traffic +- **Use for**: VMs that need high throughput (TrueNAS, Saltbox) + +**VMs/CTs on vmbr2:** +| VMID | Name | IP | +|------|------|-----| +| 100 | truenas | 10.10.10.200 | +| 101 | saltbox | 10.10.10.100 | +| 202 | traefik (CT) | 10.10.10.250 | + +--- + +#### vmbr3 - Internal-Only Bridge (Virtual) +- **Physical NIC**: None (isolated virtual network) +- **IP**: 10.10.20.1/24 +- **Gateway**: N/A (no external routing) +- **MTU**: 9000 +- **Purpose**: Inter-VM communication without external access +- **Use for**: Storage traffic (NFS/iSCSI), internal APIs, secure VM-to-VM + +**VMs with secondary interface on vmbr3:** +| VMID | Name | Internal IP | Notes | +|------|------|-------------|-------| +| 100 | truenas | (check TrueNAS config) | NFS/iSCSI server | +| 101 
| saltbox | (check VM config) | Media storage access | +| 111 | lmdev1 | (check VM config) | AI model storage | +| 201 | copyparty | 10.10.20.201 | Confirmed via cloud-init | + +--- + +## PVE2 (10.10.10.102) - Network Bridges + +### Physical NICs + +| Interface | Speed | Type | MAC Address | Connected To | +|-----------|-------|------|-------------|--------------| +| nic0 | Unknown | Unused | e0:4f:43:e6:1b:e3 | Not connected | +| nic1 | 10 Gbps | Primary NIC | a0:36:9f:26:b9:bc | **Direct to UCG-Fiber (10Gb negotiated)** | + +**Note:** PVE2 connects directly to the UCG-Fiber. Link negotiates at 10Gb. + +### Bridge Configuration + +#### vmbr0 - Single Bridge (10Gb) +- **Physical NIC**: nic1 (10 Gbps) +- **IP**: 10.10.10.102/24 +- **Gateway**: 10.10.10.1 +- **Purpose**: All VMs on PVE2 + +**VMs on vmbr0:** +| VMID | Name | IP | +|------|------|-----| +| 300 | gitea-vm | 10.10.10.220 | +| 301 | trading-vm | 10.10.10.221 | + +--- + +## Which Bridge to Use? + +| Scenario | Bridge | Reason | +|----------|--------|--------| +| General VM/CT | vmbr0 | Standard networking, 1Gb is sufficient | +| High-bandwidth VM (media, AI) | vmbr1 or vmbr2 | 10Gb for large file transfers | +| Storage-heavy VM (NAS access) | vmbr2 + vmbr3 | 10Gb external + internal storage network | +| Isolated internal service | vmbr3 only | No external access, secure | +| VM needing both external + internal | vmbr0/1/2 + vmbr3 | Dual-homed configuration | + +## Traffic Flow + +``` +Internet + β”‚ + β–Ό +β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” +β”‚ UCG-Fiber (10.10.10.1) β”‚ +β”‚ β”‚ +β”‚ eth5 (10Gb SFP+) switch0 (eth0-eth4, 10Gb) β”‚ +β”‚ β”‚ β”‚ β”‚ +β””β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ + β”‚ β”‚ + β–Ό 
β”‚ +β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ +β”‚ 10Gb Switch β”‚ β”‚ +β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ + β”‚ β”‚ β”‚ β”‚ + β”‚ β”‚ β”‚ β”‚ + β–Ό β–Ό β–Ό β–Ό + enp1s0 enp35s0f0 enp35s0f1 nic1 + (1Gb) (10Gb) (10Gb) (10Gb) + β”‚ β”‚ β”‚ β”‚ + β–Ό β–Ό β–Ό β–Ό + vmbr0 vmbr1 vmbr2 vmbr0 + β”‚ β”‚ β”‚ β”‚ + β”‚ β”‚ β”‚ β”‚ + PVE PVE PVE PVE2 + General lmdev1 TrueNAS, gitea-vm, + VMs Saltbox, trading-vm + Traefik + +Internal Only (no external access): +β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” +β”‚ vmbr3 (10.10.20.0/24) - Virtual β”‚ +β”‚ No physical NIC - inter-VM only β”‚ +β”‚ β”‚ +β”‚ TrueNAS ◄──► Saltbox β”‚ +β”‚ β–² β–² β”‚ +β”‚ β”‚ β”‚ β”‚ +β”‚ └─── lmdev1 β”˜ β”‚ +β”‚ β–² β”‚ +β”‚ β”‚ β”‚ +β”‚ copyparty β”‚ +β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ +``` + +## Determining Physical Connections + +To determine which 10Gb port goes where, check: +1. **Physical cable tracing** - Follow cables from server to switch/firewall +2. **Switch port status** - Check UniFi controller for connected ports +3. **MAC addresses** - Compare `ip link show` MACs with switch ARP table + +```bash +# On PVE - get MAC addresses +ip link show enp35s0f0 | grep ether +ip link show enp35s0f1 | grep ether + +# On router - check ARP +ssh root@10.10.10.1 'cat /proc/net/arp' +``` + +## Adding a New VM to a Specific Network + +```bash +# Add VM to vmbr0 (standard) +qm set VMID --net0 virtio,bridge=vmbr0 + +# Add VM to vmbr2 (10Gb) +qm set VMID --net0 virtio,bridge=vmbr2 + +# Add second NIC for internal network +qm set VMID --net1 virtio,bridge=vmbr3 + +# For containers +pct set CTID --net0 name=eth0,bridge=vmbr0,ip=10.10.10.XXX/24,gw=10.10.10.1 +``` + +## MTU Configuration + +All bridges use **MTU 9000** (jumbo frames) for optimal storage performance. 
+ +If adding a new VM that will access NFS/iSCSI storage, ensure the guest OS also uses MTU 9000: +```bash +# Linux guest +ip link set eth0 mtu 9000 + +# Permanent (netplan) +# /etc/netplan/00-installer-config.yaml +network: + version: 2 + ethernets: + eth0: + mtu: 9000 +``` diff --git a/SHELL-ALIASES.md b/SHELL-ALIASES.md new file mode 100644 index 0000000..61d045c --- /dev/null +++ b/SHELL-ALIASES.md @@ -0,0 +1,147 @@ +# Shell Aliases & Shortcuts + +## Overview +ZSH aliases for quickly launching Claude Code in project directories with `--dangerously-skip-permissions` enabled. Aliases sync across devices via Syncthing. + +## Setup + +### File Locations +``` +~/.config/shell/shared.zsh # Main shared config (sourced by .zshrc) +~/.config/shell/claude-aliases.zsh # Claude Code aliases +~/Projects/homelab/configs/ # Symlinks for reference +``` + +### Installation +Add to `~/.zshrc`: +```bash +source ~/.config/shell/shared.zsh +``` + +## Claude Code Aliases + +### Quick Start (--continue) +Continue the most recent session in each project: + +| Alias | Directory | Command | +|-------|-----------|---------| +| `chomelab` | ~/Projects/homelab | `claude --dangerously-skip-permissions --continue` | +| `ctrading` | ~/Projects/ai-trading-platform | `claude --dangerously-skip-permissions --continue` | +| `cnotes` | ~/Notes | `claude --dangerously-skip-permissions --continue --ide` | +| `chome` | ~ | `claude --dangerously-skip-permissions --continue` | +| `cfindshyt` | ~/Desktop/findshyt-working-folder | `claude --dangerously-skip-permissions --continue` | +| `ciconik` | ~/Projects/iconik-uploader | `claude --dangerously-skip-permissions --continue` | +| `cghostty` | ~/.config/ghostty | `claude --dangerously-skip-permissions --continue` | +| `cprojects` | ~/Projects | `claude --dangerously-skip-permissions --continue` | +| `cclaudeui` | ~/Projects/claude-ui | `claude --dangerously-skip-permissions --continue` | +| `clucid` | ~/Projects/lucidlink-upgrade | `claude
--dangerously-skip-permissions --continue` | +| `cbeeper` | ~/Projects/beeper | `claude --dangerously-skip-permissions --continue` | + +### Resume (--resume) +Show list of sessions to pick from: + +| Alias | Directory | +|-------|-----------| +| `chomelab-r` | ~/Projects/homelab | +| `ctrading-r` | ~/Projects/ai-trading-platform | +| `cnotes-r` | ~/Notes | +| `chome-r` | ~ | +| `ciconik-r` | ~/Projects/iconik-uploader | +| `cbeeper-r` | ~/Projects/beeper | + +### Fresh Start (no flags) +Start a new session without resuming: + +| Alias | Directory | +|-------|-----------| +| `chomelab-new` | ~/Projects/homelab | +| `ctrading-new` | ~/Projects/ai-trading-platform | +| `cnotes-new` | ~/Notes | +| `chome-new` | ~ | + +## Usage Examples + +```bash +# Continue homelab session +chomelab + +# Pick from recent homelab sessions +chomelab-r + +# Start fresh homelab session +chomelab-new + +# Quick AI trading work +ctrading +``` + +## Adding New Aliases + +Edit `~/.config/shell/claude-aliases.zsh`: + +```bash +# Template for new project +alias cproject='cd ~/Projects/new-project && claude --dangerously-skip-permissions --continue' +alias cproject-r='cd ~/Projects/new-project && claude --dangerously-skip-permissions --resume' +alias cproject-new='cd ~/Projects/new-project && claude --dangerously-skip-permissions' +``` + +Changes sync automatically to all devices via Syncthing (~/.config folder). 
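The continue/resume/fresh triplet follows a fixed pattern, so new entries can be generated rather than typed by hand. A hypothetical helper (not currently part of `claude-aliases.zsh`):

```bash
# Hypothetical generator for the alias triplet used throughout this file.
# Usage: make_claude_aliases <name> <dir> >> ~/.config/shell/claude-aliases.zsh
make_claude_aliases() {
    local name="$1" dir="$2"
    local base="claude --dangerously-skip-permissions"
    echo "alias c${name}='cd ${dir} && ${base} --continue'"
    echo "alias c${name}-r='cd ${dir} && ${base} --resume'"
    echo "alias c${name}-new='cd ${dir} && ${base}'"
}
```

For example, `make_claude_aliases lucid '~/Projects/lucidlink-upgrade'` emits the three `clucid*` aliases shown above.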
+ +## Enterprise/Work Aliases (claude-gateway) + +Use `ec` prefix for work Claude account via `claude-gateway`: + +### Quick Start (--continue) +| Alias | Directory | +|-------|-----------| +| `echomelab` | ~/Projects/homelab | +| `ectrading` | ~/Projects/ai-trading-platform | +| `ecnotes` | ~/Notes | +| `echome` | ~ | +| `ecfindshyt` | ~/Desktop/findshyt-working-folder | +| `eciconik` | ~/Projects/iconik-uploader | +| `ecghostty` | ~/.config/ghostty | +| `ecprojects` | ~/Projects | +| `ecclaudeui` | ~/Projects/claude-ui | +| `eclucid` | ~/Projects/lucidlink-upgrade | +| `ecbeeper` | ~/Projects/beeper | + +### Resume & Fresh +- Resume: `echomelab-r`, `ectrading-r`, `ecnotes-r`, `echome-r`, `eciconik-r`, `ecbeeper-r` +- Fresh: `echomelab-new`, `ectrading-new`, `ecnotes-new`, `echome-new` + +## Full Alias File + +Located at: `~/.config/shell/claude-aliases.zsh` + +```bash +# Claude Code Project Aliases + +# Main projects +alias chome='cd ~ && claude --dangerously-skip-permissions --continue' +alias ctrading='cd ~/Projects/ai-trading-platform && claude --dangerously-skip-permissions --continue' +alias ciconik='cd ~/Projects/iconik-uploader && claude --dangerously-skip-permissions --continue' +alias cnotes='cd ~/Notes && claude --dangerously-skip-permissions --continue --ide' +alias chomelab='cd ~/Projects/homelab && claude --dangerously-skip-permissions --continue' +alias cfindshyt='cd ~/Desktop/findshyt-working-folder && claude --dangerously-skip-permissions --continue' +alias cghostty='cd ~/.config/ghostty && claude --dangerously-skip-permissions --continue' +alias cprojects='cd ~/Projects && claude --dangerously-skip-permissions --continue' +alias cclaudeui='cd ~/projects/claude-ui && claude --dangerously-skip-permissions --continue' +alias clucid='cd ~/Projects/lucidlink-upgrade && claude --dangerously-skip-permissions --continue' +alias cbeeper='cd ~/Projects/beeper && claude --dangerously-skip-permissions --continue' + +# Resume variants +alias chome-r='cd ~ && 
claude --dangerously-skip-permissions --resume' +alias ctrading-r='cd ~/Projects/ai-trading-platform && claude --dangerously-skip-permissions --resume' +alias ciconik-r='cd ~/Projects/iconik-uploader && claude --dangerously-skip-permissions --resume' +alias cnotes-r='cd ~/Notes && claude --dangerously-skip-permissions --resume --ide' +alias chomelab-r='cd ~/Projects/homelab && claude --dangerously-skip-permissions --resume' +alias cbeeper-r='cd ~/Projects/beeper && claude --dangerously-skip-permissions --resume' + +# Fresh start +alias chome-new='cd ~ && claude --dangerously-skip-permissions' +alias ctrading-new='cd ~/Projects/ai-trading-platform && claude --dangerously-skip-permissions' +alias cnotes-new='cd ~/Notes && claude --dangerously-skip-permissions --ide' +alias chomelab-new='cd ~/Projects/homelab && claude --dangerously-skip-permissions' +``` diff --git a/SYNCTHING.md b/SYNCTHING.md new file mode 100644 index 0000000..df531ea --- /dev/null +++ b/SYNCTHING.md @@ -0,0 +1,166 @@ +# Syncthing Setup + +## Overview +Syncthing provides real-time file synchronization across all devices. Files sync automatically when devices connect. 
+ +## Devices + +| Device | ID Prefix | Local IP | Tailscale IP | Port | Role | +|--------|-----------|----------|--------------|------|------| +| Mac Mini | L3PJR73 | 10.10.10.123 | 100.108.89.58 | 22000 | Primary workstation | +| MacBook Pro | 3TFMYEI | 10.10.10.147 | 100.88.161.1 | 22000 | Laptop | +| TrueNAS | TPO72EY | 10.10.10.200 | 100.100.94.71 | 20978 | Storage server (central hub) | +| Windows PC | YDCPUQK | 10.10.10.150 | 100.120.97.76 | 22000 | Windows workstation | +| Phone (Android) | XLMZCCH | 10.10.10.54 | 100.106.175.37 | 22000 | Android, Notes only, HTTPS API | + +## Network Configuration + +**IPv4 Only** - All devices configured with explicit IPv4 addresses (no dynamic/IPv6): +- Local network: `10.10.10.0/24` +- Tailscale network: `100.x.x.x` + +Device address format: `tcp4://IP:PORT` (e.g., `tcp4://10.10.10.123:22000`) + +## Synced Folders + +| Folder | Path | Devices | Notes | +|--------|------|---------|-------| +| Downloads | ~/Downloads | Mac Mini, MacBook, TrueNAS, Windows | Large folder, 3600s rescan | +| Notes | ~/Notes | Mac Mini, MacBook, TrueNAS | Documentation | +| Projects | ~/Projects | Mac Mini, MacBook, TrueNAS | Code repositories | +| bin | ~/bin | Mac Mini, MacBook, TrueNAS | Scripts and tools | +| Documents | ~/Documents | Mac Mini, MacBook, TrueNAS | Personal documents | +| Desktop | ~/Desktop | Mac Mini, MacBook, TrueNAS | Desktop files | +| config | ~/.config | Mac Mini, MacBook | Shell configs, app settings | +| Antigravity | ~/.gemini | Mac Mini, MacBook, TrueNAS | Gemini config | + +## API Access + +### Mac Mini +```bash +API_KEY="oSQSrPnMnrEXuHqjWrRdrvq3TSXesAT5" +curl -s "http://127.0.0.1:8384/rest/system/status" -H "X-API-Key: $API_KEY" +``` + +### MacBook Pro +```bash +API_KEY="qYkNdVLwy9qZZZ6MqnJr7tHX7KKdxGMJ" +curl -s "http://127.0.0.1:8384/rest/system/status" -H "X-API-Key: $API_KEY" +``` + +### Windows PC +```bash +API_KEY="KPHGteJv6APPE7zFun33b3qM3Vn5KSA7" +curl -s "http://10.10.10.150:8384/rest/system/status" -H 
"X-API-Key: $API_KEY" +``` + +### Phone (Android) - Uses HTTPS +```bash +API_KEY="Xxz3jDT4akUJe6psfwZsbZwG2LhfZuDM" +# Access via local IP (use -k to skip cert verification) +curl -sk "https://10.10.10.54:8384/rest/system/status" -H "X-API-Key: $API_KEY" +# Or via Tailscale +curl -sk "https://100.106.175.37:8384/rest/system/status" -H "X-API-Key: $API_KEY" +``` + +## Common Commands + +### Check Status +```bash +# Folder status +curl -s "http://127.0.0.1:8384/rest/db/status?folder=downloads" -H "X-API-Key: $API_KEY" + +# Connection status +curl -s "http://127.0.0.1:8384/rest/system/connections" -H "X-API-Key: $API_KEY" + +# Device completion for a folder +curl -s "http://127.0.0.1:8384/rest/db/completion?folder=downloads&device=DEVICE_ID" -H "X-API-Key: $API_KEY" +``` + +### Check Errors +```bash +curl -s "http://127.0.0.1:8384/rest/folder/errors?folder=downloads" -H "X-API-Key: $API_KEY" +``` + +### Rescan Folder +```bash +curl -X POST "http://127.0.0.1:8384/rest/db/scan?folder=downloads" -H "X-API-Key: $API_KEY" +``` + +## Configuration Files + +| Device | Config Path | +|--------|-------------| +| Mac Mini | ~/Library/Application Support/Syncthing/config.xml | +| MacBook Pro | ~/Library/Application Support/Syncthing/config.xml | +| TrueNAS | /mnt/tank/syncthing/config/config.xml | + +## Performance Tuning + +### Speed Optimizations (2024-12-17) + +#### Global Options +| Setting | Value | Effect | +|---------|-------|--------| +| `numConnections` | 4 | Parallel transfers per device | +| `compression` | never | No CPU overhead on fast LAN | +| `setLowPriority` | false | Normal CPU priority | +| `connectionPriorityQuicLan` | 10 | QUIC preferred on LAN | +| `connectionPriorityTcpLan` | 20 | TCP fallback on LAN | +| `connectionPriorityQuicWan` | 30 | QUIC preferred on WAN | +| `connectionPriorityTcpWan` | 40 | TCP fallback on WAN | +| `progressUpdateIntervalS` | -1 | Disabled progress updates (reduces overhead) | +| `maxConcurrentIncomingRequestKiB` | 1048576 | 1GB 
buffer for incoming requests | + +**Applied to**: Mac Mini, MacBook, Windows PC (Phone uses 512MB buffer) + +#### Folder-Level Settings +| Setting | Value | Effect | +|---------|-------|--------| +| `pullerMaxPendingKiB` | 131072-262144 | 128-256MB pending data buffer per folder | + +**Applied to**: downloads, projects, documents, desktop, notes folders + +### Rescan Intervals (set to 3600s for large folders) +Large folders like Downloads use 1-hour rescan intervals to reduce CPU usage: +- File system watcher handles real-time changes +- Hourly rescan catches anything missed + +### Power Optimization +From CLAUDE.md - Syncthing rescan optimization saved ~60-80W on TrueNAS VM. + +## Troubleshooting + +### Device Not Syncing +1. Check connection status: +```bash +curl -s "http://127.0.0.1:8384/rest/system/connections" -H "X-API-Key: $API_KEY" | python3 -c "import sys,json; d=json.load(sys.stdin)['connections']; [print(f'{k[:7]}: {v[\"connected\"]}') for k,v in d.items()]" +``` + +2. Check folder completion: +```bash +curl -s "http://127.0.0.1:8384/rest/db/status?folder=FOLDER" -H "X-API-Key: $API_KEY" +``` + +3. Check for errors: +```bash +curl -s "http://127.0.0.1:8384/rest/folder/errors?folder=FOLDER" -H "X-API-Key: $API_KEY" +``` + +### Many Pending Deletes +If a device shows thousands of "needDeletes", it means files were deleted elsewhere and need to propagate. This is normal after reorganization - let it complete. 
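To watch a pending-delete backlog drain, the `needDeletes` field can be pulled out of the same `/rest/db/status` response used above. A small sketch (assumes `python3` is available, as in the troubleshooting one-liner):

```bash
# Sketch: extract needDeletes from a /rest/db/status JSON response.
# Pipe the status call into it, e.g.:
#   curl -s "http://127.0.0.1:8384/rest/db/status?folder=downloads" \
#     -H "X-API-Key: $API_KEY" | need_deletes
need_deletes() {
    python3 -c 'import sys, json; print(json.load(sys.stdin).get("needDeletes", 0))'
}
```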
+ +### Web UI +Access Syncthing web interface at http://127.0.0.1:8384 + +## SSH Access to Devices + +### MacBook Pro (via Tailscale) +```bash +sshpass -p 'GrilledCh33s3#' ssh -o StrictHostKeyChecking=no hutson@100.88.161.1 +``` + +### Check Syncthing remotely +```bash +sshpass -p 'GrilledCh33s3#' ssh hutson@100.88.161.1 'curl -s "http://127.0.0.1:8384/rest/db/status?folder=downloads" -H "X-API-Key: qYkNdVLwy9qZZZ6MqnJr7tHX7KKdxGMJ"' +``` diff --git a/configs/claude-aliases.zsh b/configs/claude-aliases.zsh new file mode 120000 index 0000000..fafdf12 --- /dev/null +++ b/configs/claude-aliases.zsh @@ -0,0 +1 @@ +/Users/hutson/.config/shell/claude-aliases.zsh \ No newline at end of file diff --git a/configs/ghostty.conf b/configs/ghostty.conf new file mode 100644 index 0000000..8f0633e --- /dev/null +++ b/configs/ghostty.conf @@ -0,0 +1,5 @@ +theme = Gruvbox Dark +font-feature = -liga +font-size = 16 +font-family = "JetBrains Mono" +split-divider-color = #83a598 diff --git a/mcp-central/.env.example b/mcp-central/.env.example new file mode 100644 index 0000000..bd6ae4a --- /dev/null +++ b/mcp-central/.env.example @@ -0,0 +1,16 @@ +# MCP Central Server Environment Variables +# Copy to .env and fill in your values + +# Airtable +AIRTABLE_API_KEY=patIrM3XYParyuHQL.xxxxx + +# Exa +EXA_API_KEY=your_exa_api_key + +# TickTick (if using) +TICKTICK_CLIENT_ID=your_client_id +TICKTICK_CLIENT_SECRET=your_client_secret + +# Slack (if using) +SLACK_BOT_TOKEN=xoxb-xxxxx +SLACK_USER_TOKEN=xoxp-xxxxx diff --git a/mcp-central/README.md b/mcp-central/README.md new file mode 100644 index 0000000..4d35347 --- /dev/null +++ b/mcp-central/README.md @@ -0,0 +1,129 @@ +# Centralized MCP Servers for Homelab + +## Current State of MCP Remote Access + +**The Problem**: Most MCP servers use `stdio` transport (local process communication). +Claude Code clients expect to spawn local processes. + +**The Solution**: Use `mcp-remote` to bridge local clients to remote servers. 
+ +## Architecture + +``` +β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” +β”‚ docker-host (10.10.10.206) β”‚ +β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ +β”‚ β”‚ airtable-mcpβ”‚ β”‚ exa-mcp β”‚ β”‚ ticktick-mcpβ”‚ ... β”‚ +β”‚ β”‚ :3001/sse β”‚ β”‚ :3002/sse β”‚ β”‚ :3003/sse β”‚ β”‚ +β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ +β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ + β–² β–² β–² + β”‚ β”‚ β”‚ + β”Œβ”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β” + β”‚ Tailscale / LAN β”‚ + β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜ + β”‚ β”‚ β”‚ + β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β” + β”‚ MacBook β”‚ β”‚ Mac Mini β”‚ β”‚ Windows PC β”‚ + β”‚ Claude Code β”‚ β”‚ Claude Code β”‚ β”‚ Claude Code β”‚ + β”‚ mcp-remote β”‚ β”‚ mcp-remote β”‚ β”‚ mcp-remote β”‚ + β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ +``` + +## Setup + +### Step 1: Deploy MCP Servers on docker-host + +```bash +ssh hutson@10.10.10.206 +cd /opt/mcp-central +docker-compose up -d +``` + +### Step 2: Configure Claude Code Clients + +Each device needs `mcp-remote` installed and configured. 
+ +**Install mcp-remote:** +```bash +npm install -g mcp-remote +``` + +**Update ~/.claude/settings.json:** +```json +{ + "mcpServers": { + "airtable": { + "command": "npx", + "args": ["mcp-remote", "http://10.10.10.206:3001/sse"] + }, + "exa": { + "command": "npx", + "args": ["mcp-remote", "http://10.10.10.206:3002/sse"] + }, + "ticktick": { + "command": "npx", + "args": ["mcp-remote", "http://10.10.10.206:3003/sse"] + } + } +} +``` + +**For remote access via Tailscale, use Tailscale IP:** +```json +{ + "mcpServers": { + "airtable": { + "command": "npx", + "args": ["mcp-remote", "http://100.x.x.x:3001/sse"] + } + } +} +``` + +## Which Servers Can Be Centralized? + +| Server | Centralizable | Notes | +|--------|--------------|-------| +| Airtable | Yes | Just needs API key | +| Exa | Yes | Just needs API key | +| TickTick | Yes | OAuth token stored server-side | +| Slack | Yes | Bot token stored server-side | +| Ref | Yes | API key only | +| Beeper | No | Needs local Beeper Desktop | +| Google Sheets | Partial | OAuth flow needs user interaction | +| Monarch Money | Partial | Credentials stored server-side | + +## Alternative: Shared Config File + +If full centralization is too complex, you can at least share the config: + +1. Store `settings.json` in a synced folder (e.g., Syncthing `configs/`) +2. Symlink from each device: + ```bash + ln -s ~/Sync/configs/claude-settings.json ~/.claude/settings.json + ``` + +This doesn't centralize the servers, but ensures all devices have the same config. + +## Traefik Integration (Optional) + +Add to Traefik for HTTPS access: + +```yaml +# /etc/traefik/conf.d/mcp.yaml +http: + routers: + mcp-airtable: + rule: "Host(`mcp-airtable.htsn.io`)" + service: mcp-airtable + tls: + certResolver: cloudflare + services: + mcp-airtable: + loadBalancer: + servers: + - url: "http://10.10.10.206:3001" +``` + +Then use: `https://mcp-airtable.htsn.io/sse` in your config (Traefik terminates TLS and forwards to the plain-HTTP backend). 
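Before pointing every client at a centralized server, it helps to confirm the endpoint answers at all. A hypothetical probe (ports as in the compose file; a refused connection or timeout reports failure):

```bash
# Sketch: check whether an MCP SSE endpoint is reachable.
# Usage: probe_mcp http://10.10.10.206:3001/sse
probe_mcp() {
    local url="$1" code
    # SSE streams indefinitely, so cap the request and only read the status code.
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 3 "$url")
    if [ "$code" = "200" ]; then
        echo "OK: $url"
    else
        echo "FAIL (HTTP $code): $url" >&2
        return 1
    fi
}
```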
diff --git a/mcp-central/docker-compose.yml b/mcp-central/docker-compose.yml new file mode 100644 index 0000000..1aa10bd --- /dev/null +++ b/mcp-central/docker-compose.yml @@ -0,0 +1,58 @@ +# Centralized MCP Server Stack +# Deploy on docker-host (10.10.10.206) +# All Claude Code clients connect via HTTP/SSE + +version: "3.8" + +services: + # MCP Gateway - Routes all MCP requests + mcp-gateway: + image: node:20-slim + container_name: mcp-gateway + working_dir: /app + volumes: + - ./gateway:/app + ports: + - "3100:3100" + command: node server.js + restart: unless-stopped + environment: + - PORT=3100 + networks: + - mcp-network + + # Airtable MCP Server + airtable-mcp: + image: node:20-slim + container_name: airtable-mcp + working_dir: /app + command: sh -c "npm install airtable-mcp-server && npx airtable-mcp-server" + environment: + - AIRTABLE_API_KEY=${AIRTABLE_API_KEY} + - MCP_TRANSPORT=sse + - MCP_PORT=3001 + ports: + - "3001:3001" + restart: unless-stopped + networks: + - mcp-network + + # Exa MCP Server + exa-mcp: + image: node:20-slim + container_name: exa-mcp + working_dir: /app + command: sh -c "npm install @anthropic/mcp-server-exa && npx @anthropic/mcp-server-exa" + environment: + - EXA_API_KEY=${EXA_API_KEY} + - MCP_TRANSPORT=sse + - MCP_PORT=3002 + ports: + - "3002:3002" + restart: unless-stopped + networks: + - mcp-network + +networks: + mcp-network: + driver: bridge diff --git a/scripts/fix-immich-raf-files.sh b/scripts/fix-immich-raf-files.sh new file mode 100644 index 0000000..3c301d4 --- /dev/null +++ b/scripts/fix-immich-raf-files.sh @@ -0,0 +1,159 @@ +#!/bin/bash +# +# Fix Immich RAF files that were mislabeled as JPG +# This script: +# 1. Finds all JPG files that are actually Fujifilm RAF (RAW) files +# 2. Renames them from .jpg to .raf on the filesystem +# 3. Updates Immich's database to match +# 4. 
Triggers thumbnail regeneration +# +# Run from Mac Mini or any machine with SSH access to PVE +# + +set -e + +# Config +SSH_PASS="GrilledCh33s3#" +PVE_IP="10.10.10.120" +SSH_OPTS="-o StrictHostKeyChecking=no" + +# Colors +RED='\033[0;31m' +GREEN='\033[0;32m' +YELLOW='\033[1;33m' +NC='\033[0m' + +echo "==========================================" +echo " Immich RAF File Fixer" +echo "==========================================" +echo "" + +# Test connectivity +echo "Testing connection to Saltbox..." +if ! sshpass -p "$SSH_PASS" ssh $SSH_OPTS root@$PVE_IP 'qm status 101' &>/dev/null; then + echo -e "${RED}Error: Cannot connect to PVE or Saltbox VM not running${NC}" + exit 1 +fi +echo -e "${GREEN}Connected${NC}" +echo "" + +# Step 1: Find mislabeled files +echo "Step 1: Finding JPG files that are actually RAF..." +echo "" + +MISLABELED_COUNT=$(sshpass -p "$SSH_PASS" ssh $SSH_OPTS root@$PVE_IP 'qm guest exec 101 -- bash -c "echo \"SELECT COUNT(*) FROM asset a JOIN asset_exif e ON a.id = e.\\\"assetId\\\" WHERE a.\\\"originalFileName\\\" ILIKE '"'"'%.jpg'"'"' AND e.\\\"fileSizeInByte\\\" > 35000000 AND e.make = '"'"'FUJIFILM'"'"';\" | docker exec -i immich-postgres psql -U hutson -d immich -t"' 2>/dev/null | grep -o '[0-9]*' | head -1) + +echo -e "Found ${YELLOW}${MISLABELED_COUNT}${NC} mislabeled files" +echo "" + +if [ "$MISLABELED_COUNT" -eq 0 ]; then + echo -e "${GREEN}No mislabeled files found. Nothing to fix!${NC}" + exit 0 +fi + +# Confirm before proceeding +read -p "Proceed with fixing these files? (y/N) " -n 1 -r +echo "" +if [[ ! $REPLY =~ ^[Yy]$ ]]; then + echo "Aborted." + exit 0 +fi + +echo "" +echo "Step 2: Creating fix script on Saltbox..." + +# Create the fix script on Saltbox +sshpass -p "$SSH_PASS" ssh $SSH_OPTS root@$PVE_IP 'qm guest exec 101 -- bash -c "cat > /tmp/fix-raf-files.sh << '"'"'SCRIPT'"'"' +#!/bin/bash +set -e + +echo "Getting list of mislabeled files..." 
+
+# Get list of files to fix
+docker exec -i immich-postgres psql -U hutson -d immich -t -A -F\",\" -c "
+SELECT a.id, a.\"originalPath\", a.\"originalFileName\"
+FROM asset a
+JOIN asset_exif e ON a.id = e.\"assetId\"
+WHERE a.\"originalFileName\" ILIKE '"'"'"'"'"'"'"'"'%.jpg'"'"'"'"'"'"'"'"'
+AND e.\"fileSizeInByte\" > 35000000
+AND e.make = '"'"'"'"'"'"'"'"'FUJIFILM'"'"'"'"'"'"'"'"'
+" > /tmp/files_to_fix.csv
+
+TOTAL=$(wc -l < /tmp/files_to_fix.csv)
+echo "Processing $TOTAL files..."
+
+COUNT=0
+ERRORS=0
+
+while IFS="," read -r asset_id old_path old_filename; do
+  COUNT=$((COUNT + 1))
+
+  # Skip empty lines
+  [ -z "$asset_id" ] && continue
+
+  # Calculate new paths
+  new_filename=$(echo "$old_filename" | sed "s/\.[jJ][pP][gG]$/.RAF/")
+  new_path=$(echo "$old_path" | sed "s/\.[jJ][pP][gG]$/.raf/")
+
+  echo "[$COUNT/$TOTAL] $old_filename -> $new_filename"
+
+  # Rename file on filesystem (inside immich container).
+  # Test the mv directly in the if: with set -e, a bare failing command
+  # would abort the whole script before a later $? check could run.
+  if docker exec immich test -f "$old_path"; then
+    if ! docker exec immich mv "$old_path" "$new_path" 2>/dev/null; then
+      echo "  ERROR: Failed to rename file"
+      ERRORS=$((ERRORS + 1))
+      continue
+    fi
+  else
+    echo "  WARNING: File not found at $old_path"
+    ERRORS=$((ERRORS + 1))
+    continue
+  fi
+
+  # Update database; again test the command directly so set -e cannot
+  # kill the loop before the error is handled
+  if ! docker exec -i immich-postgres psql -U hutson -d immich -c "
+  UPDATE asset
+  SET \"originalPath\" = '"'"'"'"'"'"'"'"'$new_path'"'"'"'"'"'"'"'"',
+      \"originalFileName\" = '"'"'"'"'"'"'"'"'$new_filename'"'"'"'"'"'"'"'"'
+  WHERE id = '"'"'"'"'"'"'"'"'$asset_id'"'"'"'"'"'"'"'"'::uuid;
+  " > /dev/null 2>&1; then
+    echo "  ERROR: Failed to update database"
+    # Try to rename back (|| true: best-effort rollback under set -e)
+    docker exec immich mv "$new_path" "$old_path" 2>/dev/null || true
+    ERRORS=$((ERRORS + 1))
+    continue
+  fi
+
+done < /tmp/files_to_fix.csv
+
+echo ""
+echo "=========================================="
+echo "Completed: $((COUNT - ERRORS)) fixed, $ERRORS errors"
+echo "=========================================="
+
+# Cleanup
+rm -f /tmp/files_to_fix.csv
+SCRIPT
+chmod +x /tmp/fix-raf-files.sh"'
+
+echo ""
+echo "Step 3: Running fix script (this may take a while)..."
+echo ""
+
+# Run the fix script
+sshpass -p "$SSH_PASS" ssh $SSH_OPTS root@$PVE_IP 'qm guest exec 101 -- bash -c "/tmp/fix-raf-files.sh"' 2>&1 | grep -o '"out-data"[^}]*' | sed 's/"out-data" *: *"//' | sed 's/\\n/\n/g' | sed 's/\\t/\t/g' | sed 's/"$//'
+
+echo ""
+echo "Step 4: Restarting Immich to pick up changes..."
+
+sshpass -p "$SSH_PASS" ssh $SSH_OPTS root@$PVE_IP 'qm guest exec 101 -- bash -c "docker restart immich"' > /dev/null 2>&1
+
+echo -e "${GREEN}Done!${NC}"
+echo ""
+echo "Next steps:"
+echo "1. Go to Immich Admin -> Jobs -> Thumbnail Generation -> All -> Start"
+echo "2. 
This will regenerate thumbnails for all assets" +echo "" diff --git a/scripts/health-check.sh b/scripts/health-check.sh new file mode 100755 index 0000000..3643d46 --- /dev/null +++ b/scripts/health-check.sh @@ -0,0 +1,318 @@ +#!/bin/bash +# +# Homelab Health Check & Recovery Script +# Run this to check status and bring services online +# +# Usage: ./health-check.sh [--fix] +# Without --fix: Read-only health check +# With --fix: Attempt to start stopped services and fix issues +# + +set -e + +# Colors +RED='\033[0;31m' +GREEN='\033[0;32m' +YELLOW='\033[1;33m' +NC='\033[0m' # No Color + +# Config +SSH_PASS="GrilledCh33s3#" +PVE_IP="10.10.10.120" +PVE2_IP="10.10.10.102" +SSH_OPTS="-o StrictHostKeyChecking=no -o ConnectTimeout=5" + +FIX_MODE=false +if [[ "$1" == "--fix" ]]; then + FIX_MODE=true + echo -e "${YELLOW}Running in FIX mode - will attempt to start stopped services${NC}" + echo "" +fi + +# Helper functions +ssh_pve() { + sshpass -p "$SSH_PASS" ssh $SSH_OPTS root@$PVE_IP "$@" 2>/dev/null +} + +ssh_pve2() { + sshpass -p "$SSH_PASS" ssh $SSH_OPTS root@$PVE2_IP "$@" 2>/dev/null +} + +print_status() { + if [[ "$2" == "ok" ]]; then + echo -e " ${GREEN}βœ“${NC} $1" + elif [[ "$2" == "warn" ]]; then + echo -e " ${YELLOW}!${NC} $1" + else + echo -e " ${RED}βœ—${NC} $1" + fi +} + +# Check if sshpass is installed +if ! 
command -v sshpass &> /dev/null; then + echo -e "${RED}Error: sshpass is not installed${NC}" + echo "Install with: brew install hudochenkov/sshpass/sshpass" + exit 1 +fi + +echo "================================" +echo " HOMELAB HEALTH CHECK" +echo " $(date '+%Y-%m-%d %H:%M:%S')" +echo "================================" +echo "" + +# ============================================ +# PVE (Primary Server) +# ============================================ +echo "--- PVE (10.10.10.120) ---" + +# Check connectivity +if ssh_pve "echo ok" > /dev/null 2>&1; then + print_status "PVE Reachable" "ok" +else + print_status "PVE Unreachable" "fail" + echo "" + echo "--- PVE2 (10.10.10.102) ---" + if ssh_pve2 "echo ok" > /dev/null 2>&1; then + print_status "PVE2 Reachable" "ok" + else + print_status "PVE2 Unreachable" "fail" + fi + exit 1 +fi + +# Check cluster quorum +QUORUM=$(ssh_pve "pvecm status 2>&1 | grep 'Quorate:' | awk '{print \$2}'" || echo "Unknown") +if [[ "$QUORUM" == "Yes" ]]; then + print_status "Cluster Quorum: $QUORUM" "ok" +else + print_status "Cluster Quorum: $QUORUM" "fail" +fi + +# Check CPU temp +TEMP=$(ssh_pve 'for f in /sys/class/hwmon/hwmon*/temp*_input; do label=$(cat ${f%_input}_label 2>/dev/null); if [ "$label" = "Tctl" ]; then echo $(($(cat $f)/1000)); fi; done') +if [[ -n "$TEMP" ]]; then + if [[ "$TEMP" -lt 85 ]]; then + print_status "CPU Temp: ${TEMP}Β°C" "ok" + elif [[ "$TEMP" -lt 90 ]]; then + print_status "CPU Temp: ${TEMP}Β°C (warm)" "warn" + else + print_status "CPU Temp: ${TEMP}Β°C (HOT!)" "fail" + fi +fi + +# Check ZFS pools +ZFS_STATUS=$(ssh_pve "zpool status -x" || echo "Unknown") +if [[ "$ZFS_STATUS" == "all pools are healthy" ]]; then + print_status "ZFS Pools: Healthy" "ok" +else + print_status "ZFS Pools: $ZFS_STATUS" "fail" +fi + +# Check VMs +echo "" +echo " VMs:" +CRITICAL_VMS="100 101 110 206" # TrueNAS, Saltbox, HomeAssistant, Docker-host +STOPPED_VMS="" +TRUENAS_ZFS_SUSPENDED=false + +while IFS= read -r line; do + VMID=$(echo "$line" 
| awk '{print $1}')
+  NAME=$(echo "$line" | awk '{print $2}')
+  STATUS=$(echo "$line" | awk '{print $3}')
+
+  if [[ "$STATUS" == "running" ]]; then
+    print_status "$VMID $NAME: $STATUS" "ok"
+  else
+    print_status "$VMID $NAME: $STATUS" "fail"
+    if [[ " $CRITICAL_VMS " =~ " $VMID " ]]; then
+      STOPPED_VMS="$STOPPED_VMS $VMID"
+    fi
+  fi
+done < <(ssh_pve "qm list" | tail -n +2)
+
+# Check TrueNAS ZFS (VM 100) if running
+if ssh_pve "qm status 100" 2>/dev/null | grep -q running; then
+  echo ""
+  echo " TrueNAS ZFS:"
+  TRUENAS_ZFS=$(ssh_pve 'qm guest exec 100 -- bash -c "zpool list -H -o name,health vault 2>/dev/null"' 2>/dev/null | grep -o '"out-data"[^}]*' | sed 's/"out-data" : "//' | tr -d '\\n"' || echo "Unknown")
+
+  if [[ "$TRUENAS_ZFS" == *"ONLINE"* ]]; then
+    print_status "vault pool: ONLINE" "ok"
+  elif [[ "$TRUENAS_ZFS" == *"SUSPENDED"* ]]; then
+    print_status "vault pool: SUSPENDED (needs zpool clear)" "fail"
+    TRUENAS_ZFS_SUSPENDED=true
+  elif [[ "$TRUENAS_ZFS" == *"DEGRADED"* ]]; then
+    print_status "vault pool: DEGRADED" "warn"
+  else
+    print_status "vault pool: $TRUENAS_ZFS" "fail"
+  fi
+fi
+
+# Check Containers
+echo ""
+echo " Containers:"
+CRITICAL_CTS="200 202" # Pi-hole, Traefik
+STOPPED_CTS=""
+
+while IFS= read -r line; do
+  CTID=$(echo "$line" | awk '{print $1}')
+  STATUS=$(echo "$line" | awk '{print $2}')
+  NAME=$(echo "$line" | awk '{print $NF}')  # last field: pct list's Lock column is usually empty, so $4 would miss the name
+
+  if [[ "$STATUS" == "running" ]]; then
+    print_status "$CTID $NAME: $STATUS" "ok"
+  else
+    print_status "$CTID $NAME: $STATUS" "fail"
+    if [[ " $CRITICAL_CTS " =~ " $CTID " ]]; then
+      STOPPED_CTS="$STOPPED_CTS $CTID"
+    fi
+  fi
+done < <(ssh_pve "pct list" | tail -n +2)
+
+# ============================================
+# PVE2 (Secondary Server)
+# ============================================
+echo ""
+echo "--- PVE2 (10.10.10.102) ---"
+
+if ssh_pve2 "echo ok" > /dev/null 2>&1; then
+  print_status "PVE2 Reachable" "ok"
+
+  # Check CPU temp
+  TEMP2=$(ssh_pve2 'for f in 
/sys/class/hwmon/hwmon*/temp*_input; do label=$(cat ${f%_input}_label 2>/dev/null); if [ "$label" = "Tctl" ]; then echo $(($(cat $f)/1000)); fi; done') + if [[ -n "$TEMP2" ]]; then + if [[ "$TEMP2" -lt 85 ]]; then + print_status "CPU Temp: ${TEMP2}Β°C" "ok" + elif [[ "$TEMP2" -lt 90 ]]; then + print_status "CPU Temp: ${TEMP2}Β°C (warm)" "warn" + else + print_status "CPU Temp: ${TEMP2}Β°C (HOT!)" "fail" + fi + fi + + # Check VMs + echo "" + echo " VMs:" + while IFS= read -r line; do + VMID=$(echo "$line" | awk '{print $1}') + NAME=$(echo "$line" | awk '{print $2}') + STATUS=$(echo "$line" | awk '{print $3}') + + if [[ "$STATUS" == "running" ]]; then + print_status "$VMID $NAME: $STATUS" "ok" + else + print_status "$VMID $NAME: $STATUS" "fail" + fi + done < <(ssh_pve2 "qm list" | tail -n +2) +else + print_status "PVE2 Unreachable" "fail" +fi + +# ============================================ +# FIX MODE - Start stopped services +# ============================================ +if $FIX_MODE && [[ -n "$STOPPED_VMS" || -n "$STOPPED_CTS" || "$TRUENAS_ZFS_SUSPENDED" == "true" ]]; then + echo "" + echo "================================" + echo " RECOVERY MODE" + echo "================================" + + # Fix TrueNAS ZFS SUSPENDED state first (critical for mounts) + if [[ "$TRUENAS_ZFS_SUSPENDED" == "true" ]]; then + echo "" + echo "Clearing TrueNAS ZFS pool errors..." + ZFS_CLEAR_RESULT=$(ssh_pve 'qm guest exec 100 -- bash -c "zpool clear vault 2>&1 && zpool list -H -o health vault"' 2>/dev/null | grep -o '"out-data"[^}]*' | sed 's/"out-data" : "//' | tr -d '\\n"' || echo "FAILED") + + if [[ "$ZFS_CLEAR_RESULT" == *"ONLINE"* ]]; then + print_status "vault pool recovered: ONLINE" "ok" + else + print_status "vault pool recovery failed: $ZFS_CLEAR_RESULT" "fail" + fi + sleep 5 # Give ZFS time to stabilize + fi + + # Start TrueNAS first (it provides storage) + if [[ " $STOPPED_VMS " =~ " 100 " ]]; then + echo "" + echo "Starting TrueNAS (VM 100) first..." 
+ ssh_pve "qm start 100" && print_status "TrueNAS started" "ok" || print_status "Failed to start TrueNAS" "fail" + echo "Waiting 60s for TrueNAS to boot..." + sleep 60 + fi + + # Start other VMs + for VMID in $STOPPED_VMS; do + if [[ "$VMID" != "100" ]]; then + NAME=$(ssh_pve "qm config $VMID | grep '^name:' | awk '{print \$2}'") + echo "Starting VM $VMID ($NAME)..." + ssh_pve "qm start $VMID" && print_status "$NAME started" "ok" || print_status "Failed to start $NAME" "fail" + sleep 5 + fi + done + + # Start containers + for CTID in $STOPPED_CTS; do + NAME=$(ssh_pve "pct config $CTID | grep '^hostname:' | awk '{print \$2}'") + echo "Starting CT $CTID ($NAME)..." + ssh_pve "pct start $CTID" && print_status "$NAME started" "ok" || print_status "Failed to start $NAME" "fail" + sleep 3 + done + + # Mount TrueNAS shares on Saltbox if Saltbox is running + if ssh_pve "qm status 101" 2>/dev/null | grep -q running; then + echo "" + echo "Checking TrueNAS mounts on Saltbox..." + sleep 10 # Give services time to start + + MOUNT_STATUS=$(ssh_pve 'qm guest exec 101 -- bash -c "mount | grep -c Media"' 2>/dev/null | grep -o '"out-data"[^}]*' | grep -o '[0-9]' || echo "0") + + if [[ "$MOUNT_STATUS" == "0" ]]; then + echo "Mounting TrueNAS shares..." + ssh_pve 'qm guest exec 101 -- bash -c "mount /mnt/local/Media; mount /mnt/local/downloads"' 2>/dev/null + print_status "TrueNAS mounts attempted" "ok" + + # Restart Immich + echo "Restarting Immich..." + ssh_pve 'qm guest exec 101 -- bash -c "docker restart immich"' 2>/dev/null + print_status "Immich restarted" "ok" + else + print_status "TrueNAS mounts already present" "ok" + fi + fi +fi + +# ============================================ +# Summary +# ============================================ +echo "" +echo "================================" +echo " SUMMARY" +echo "================================" + +ISSUES=0 + +if [[ -n "$STOPPED_VMS" ]] && ! 
$FIX_MODE; then + echo -e "${YELLOW}Stopped critical VMs:${NC}$STOPPED_VMS" + ISSUES=$((ISSUES + 1)) +fi + +if [[ -n "$STOPPED_CTS" ]] && ! $FIX_MODE; then + echo -e "${YELLOW}Stopped critical containers:${NC}$STOPPED_CTS" + ISSUES=$((ISSUES + 1)) +fi + +if [[ "$TRUENAS_ZFS_SUSPENDED" == "true" ]] && ! $FIX_MODE; then + echo -e "${RED}TrueNAS ZFS pool SUSPENDED!${NC} SMB mounts will fail." + ISSUES=$((ISSUES + 1)) +fi + +if [[ "$ISSUES" -eq 0 ]]; then + echo -e "${GREEN}All critical services healthy!${NC}" +else + echo "" + echo -e "Run ${YELLOW}./health-check.sh --fix${NC} to attempt recovery" +fi + +echo "" +echo "Done: $(date '+%Y-%m-%d %H:%M:%S')"
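A note on the `grep -o '"out-data"…' | sed | tr` pipelines used throughout these scripts: `qm guest exec` returns real JSON, so a JSON parser is less fragile than regex surgery (the `tr -d` step, for instance, also strips characters from the payload itself). A hedged sketch of the alternative, assuming `python3` is available on the machine running the check; the sample string mimics the shape of qm's output:

```shell
# Extract "out-data" from qm guest exec's JSON reply with a real parser
# instead of grep/sed/tr. The sample stands in for actual qm output.
sample='{"exitcode": 0, "exited": 1, "out-data": "vault\tONLINE\n"}'
out=$(printf '%s' "$sample" | python3 -c 'import json,sys; sys.stdout.write(json.load(sys.stdin)["out-data"])')
printf '%s\n' "$out"
```

In the health check, the pipeline feeding `TRUENAS_ZFS` could be swapped for this parser without changing any of the downstream `ONLINE`/`SUSPENDED` string matches.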