Initial commit: Homelab infrastructure documentation

- CLAUDE.md: Main homelab assistant context and instructions
- IP-ASSIGNMENTS.md: Complete IP address assignments
- NETWORK.md: Network bridges, VLANs, and configuration
- EMC-ENCLOSURE.md: EMC storage enclosure documentation
- SYNCTHING.md: Syncthing setup and device list
- SHELL-ALIASES.md: ZSH aliases for Claude Code sessions
- HOMEASSISTANT.md: Home Assistant API and automations
- INFRASTRUCTURE.md: Server hardware and power management
- configs/: Shared shell configurations
- scripts/: Utility scripts
- mcp-central/: MCP server configuration

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Hutson committed 2025-12-20 02:31:02 -05:00 · commit 93821d1557
17 changed files with 3267 additions and 0 deletions

.gitignore (new file, 22 lines)
# Secrets and credentials
.env
*.credentials
*-credentials*.txt
# macOS
.DS_Store
.AppleDouble
.LSOverride
# Editor/IDE
.obsidian/
.claude/
.vscode/
*.swp
*.swo
*~
# Temporary files
*.tmp
*.bak
nul

CHANGELOG.md (new file, 197 lines)
# Homelab Changelog
## 2024-12-16
### Power Investigation
Investigated UPS power limit issues across both Proxmox servers.
#### Findings
1. **KSMD (Kernel Same-page Merging Daemon)** was consuming 50-57% CPU constantly on PVE
- `sleep_millisecs` set to 12ms (extremely aggressive, default is 200ms)
- `general_profit` was **negative** (-320MB) meaning it was wasting CPU
- No memory overcommit situation (98GB allocated on 128GB RAM)
- Diverse workloads (TrueNAS, Windows, Linux) = few duplicate pages to merge
2. **GPU Power Draw** identified as major consumers:
- RTX A6000 on PVE2: up to 300W TDP
- TITAN RTX on PVE: up to 280W TDP
- Quadro P2000 on PVE: up to 75W TDP
3. **TrueNAS VM** occasionally spiking to 86% CPU (needs investigation)
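The counters referenced above can be read straight from sysfs; a quick check sketch (note `general_profit` only exists on newer kernels):
```bash
# Inspect KSM state and its cost/benefit counters on the host
ssh pve 'cat /sys/kernel/mm/ksm/run /sys/kernel/mm/ksm/sleep_millisecs'
ssh pve 'cat /sys/kernel/mm/ksm/pages_sharing /sys/kernel/mm/ksm/general_profit 2>/dev/null'
```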
#### Changes Made
- [x] **Disabled KSMD on PVE** (10.10.10.120)
```bash
echo 0 > /sys/kernel/mm/ksm/run
```
- Immediate result: KSMD CPU dropped from 51-57% to 0%
- Load average dropped from 1.88 to 1.28
- Estimated savings: ~7-10W continuous
#### Additional Changes
- [x] **Made KSMD disable persistent on both hosts**
- Note: KSM is controlled via sysfs, not sysctl
- Created systemd service `/etc/systemd/system/disable-ksm.service`:
```ini
[Unit]
Description=Disable KSM (Kernel Same-page Merging)
After=multi-user.target
[Service]
Type=oneshot
ExecStart=/bin/sh -c "echo 0 > /sys/kernel/mm/ksm/run"
RemainAfterExit=yes
[Install]
WantedBy=multi-user.target
```
- Enabled on both PVE and PVE2: `systemctl enable disable-ksm.service`
### Syncthing Rescan Interval Fix
**Root Cause**: Syncthing on TrueNAS was rescanning 56GB of data every 60 seconds, causing constant 100% CPU usage (~3172 minutes CPU time in 3 days).
**Folders affected** (changed from 60s to 3600s):
- downloads (38GB)
- documents (11GB)
- desktop (7.2GB)
- config, movies, notes, pictures
**Fix applied**:
```bash
# Downloaded config from TrueNAS
ssh pve 'qm guest exec 100 -- cat /mnt/.ix-apps/app_mounts/syncthing/config/config/config.xml'
# Changed all rescanIntervalS="60" to rescanIntervalS="3600"
sed -i 's/rescanIntervalS="60"/rescanIntervalS="3600"/g' config.xml
# Uploaded and restarted Syncthing
curl -X POST -H "X-API-Key: xxx" http://localhost:20910/rest/system/restart
```
**Note**: fsWatcher is enabled, so changes are detected in real-time. The rescan is just a safety net.
**Estimated savings**: ~60-80W (TrueNAS VM CPU will drop from 86% to ~5-10% at idle)
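As a follow-up sanity check, the active intervals and the fsWatcher flag can be read back from the Syncthing REST API (a sketch, using the same placeholder API key as above):
```bash
# List each folder's rescan interval and whether fsWatcher is enabled
curl -s -H "X-API-Key: xxx" http://localhost:20910/rest/config/folders \
  | python3 -c 'import sys,json; [print(f["id"], f["rescanIntervalS"], f["fsWatcherEnabled"]) for f in json.load(sys.stdin)]'
```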
### GPU Power State Investigation
| GPU | VM | Idle Power | P-State | Status |
|-----|-----|-----------|---------|--------|
| RTX A6000 | trading-vm (301) | **11W** | P8 | Optimal |
| TITAN RTX | lmdev1 (111) | **2W** | P8 | Excellent! |
| Quadro P2000 | saltbox (101) | **25W** | P0 | Stuck due to Plex |
**Findings**:
- RTX A6000: Properly entering P8 (11W idle) - excellent
- TITAN RTX: Only 2W at idle despite ComfyUI/Python processes (436MiB VRAM used)
- Modern GPUs have much better idle power management
- Quadro P2000: Stuck in P0 at 25W because Plex Transcoder holds GPU memory
- Older Quadro cards don't idle as efficiently with processes attached
- Power limit fixed at 75W (not adjustable)
**Changes made**:
- [x] Installed QEMU guest agent on lmdev1 (VM 111)
- [x] Added SSH key access to lmdev1 (10.10.10.111)
- [x] Updated ~/.ssh/config with lmdev1 entry
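With the SSH access added above, the table can be re-verified from inside the guests (a sketch; assumes `nvidia-smi` is installed in each VM and the `lmdev1`/`trading-vm` SSH aliases):
```bash
# Report GPU name, P-state, current draw, and VRAM use from inside the VMs
ssh lmdev1 'nvidia-smi --query-gpu=name,pstate,power.draw,memory.used --format=csv,noheader'
ssh trading-vm 'nvidia-smi --query-gpu=name,pstate,power.draw,memory.used --format=csv,noheader'
```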
### CPU Governor Optimization
**Issue**: Both servers using `performance` CPU governor, keeping CPUs at high frequencies (3-4GHz) even when 99% idle.
**Changes**:
#### PVE (10.10.10.120)
- **Driver**: `amd-pstate-epp` (modern AMD P-State with Energy Performance Preference)
- **Change**: Governor `performance` → `powersave`, EPP `performance` → `balance_power`
- **Result**: Idle frequencies dropped from ~4GHz to ~1.7GHz
- **Persistence**: Created `/etc/systemd/system/cpu-powersave.service`
```ini
[Unit]
Description=Set CPU governor to powersave with balance_power EPP
After=multi-user.target
[Service]
Type=oneshot
ExecStart=/bin/bash -c 'for gov in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo powersave > "$gov"; done; for epp in /sys/devices/system/cpu/cpu*/cpufreq/energy_performance_preference; do echo balance_power > "$epp"; done'
RemainAfterExit=yes
[Install]
WantedBy=multi-user.target
```
#### PVE2 (10.10.10.102)
- **Driver**: `acpi-cpufreq` (older driver)
- **Change**: Governor `performance` → `schedutil`
- **Result**: Idle frequencies dropped from ~4GHz to ~2.2GHz
- **Persistence**: Created `/etc/systemd/system/cpu-powersave.service`
```ini
[Unit]
Description=Set CPU governor to schedutil for power savings
After=multi-user.target
[Service]
Type=oneshot
ExecStart=/bin/bash -c 'for gov in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo schedutil > "$gov"; done'
RemainAfterExit=yes
[Install]
WantedBy=multi-user.target
```
**Estimated savings**: 30-60W per server (60-120W total)
### ksmtuned Service Disabled
**Issue**: The `ksmtuned` (KSM tuning daemon) was still running on both servers even after KSMD was disabled, consuming ~39 min of CPU time on PVE and ~12 min on PVE2 over 3 days.
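A quick verification sketch after enabling the services:
```bash
# Confirm governor/EPP took effect and CPUs are clocking down at idle
ssh pve 'cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor /sys/devices/system/cpu/cpu0/cpufreq/energy_performance_preference'
ssh pve2 'cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor'
ssh pve 'grep "cpu MHz" /proc/cpuinfo | head -3'
```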
**Fix**:
```bash
systemctl stop ksmtuned
systemctl disable ksmtuned
```
Applied to both PVE and PVE2.
**Estimated savings**: ~2-5W
### HDD Spindown on PVE2
**Issue**: Two WD Red 6TB drives (local-zfs2 pool) spinning 24/7 despite pool having only 768KB used. Each drive uses 5-8W spinning.
**Fix**:
```bash
# Set 30-minute spindown timeout
hdparm -S 241 /dev/sda /dev/sdb
```
**Persistence**: Created udev rule `/etc/udev/rules.d/69-hdd-spindown.rules`:
```
ACTION=="add", KERNEL=="sd[a-z]", ATTRS{model}=="WDC WD60EFRX-68L*", RUN+="/usr/sbin/hdparm -S 241 /dev/%k"
```
**Estimated savings**: ~10-16W (when drives spin down)
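To confirm the timeout is working, the drives' power state can be polled without waking them (sketch):
```bash
# "standby" means spun down; "active/idle" means still spinning
ssh pve2 'hdparm -C /dev/sda /dev/sdb'
```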
#### Pending Changes
- [ ] Monitor overall power consumption after all optimizations
- [ ] Consider PCIe ASPM optimization
- [ ] Consider NMI watchdog disable
### SSH Key Setup
- Added SSH key authentication to both Proxmox servers
- Updated `~/.ssh/config` with entries for `pve` and `pve2`
---
## Notes
### What is KSMD?
Kernel Same-page Merging Daemon - scans memory for duplicate pages across VMs and merges them. Trades CPU cycles for RAM savings. Useful when:
- Overcommitting memory
- Running many identical VMs
Not useful when:
- Plenty of RAM headroom (our case)
- Diverse workloads with few duplicate pages
- `general_profit` is negative
### What is Memory Ballooning?
Guest-cooperative memory management. Hypervisor can request VMs to give back unused RAM. Independent from KSMD. Both are Proxmox/KVM memory optimization features but serve different purposes.

CLAUDE.md (new file, 962 lines)
# Homelab Infrastructure
## Quick Reference - Common Tasks
| Task | Section | Quick Command |
|------|---------|---------------|
| **Add new public service** | [Reverse Proxy](#reverse-proxy-architecture-traefik) | Create Traefik config + Cloudflare DNS |
| **Add Cloudflare DNS** | [Cloudflare API](#cloudflare-api-access) | `curl -X POST cloudflare.com/...` |
| **Check server temps** | [Temperature Check](#server-temperature-check) | `ssh pve 'grep Tctl ...'` |
| **Syncthing issues** | [Troubleshooting](#troubleshooting-runbooks) | Check API connections |
| **SSL cert issues** | [Traefik DNS Challenge](#ssl-certificates) | Use `cloudflare` resolver |
**Key Credentials (see sections for full details):**
- Cloudflare: `cloudflare@htsn.io` / API Key in [Cloudflare API](#cloudflare-api-access)
- SSH Password: `GrilledCh33s3#`
- Traefik: CT 202 @ 10.10.10.250
---
## Role
You are the **Homelab Assistant** - a Claude Code session dedicated to managing and maintaining Hutson's home infrastructure. Your responsibilities include:
- **Infrastructure Management**: Proxmox servers, VMs, containers, networking
- **File Sync**: Syncthing configuration across all devices (Mac Mini, MacBook, Windows PC, TrueNAS, Android)
- **Network Administration**: Router config, SSH access, Tailscale, device management
- **Power Optimization**: CPU governors, GPU power states, service tuning
- **Documentation**: Keep CLAUDE.md, SYNCTHING.md, and SHELL-ALIASES.md up to date
- **Automation**: Shell aliases, startup scripts, scheduled tasks
You have full access to all homelab devices via SSH and APIs. Use this context to help troubleshoot, configure, and optimize the infrastructure.
### Proactive Behaviors
When the user mentions issues or asks questions, proactively:
- **"sync not working"** → Check Syncthing status on ALL devices, identify which is offline
- **"device offline"** → Ping both local and Tailscale IPs, check if service is running
- **"slow"** → Check CPU usage, running processes, Syncthing rescan activity
- **"check status"** → Run full health check across all systems
- **"something's wrong"** → Run diagnostics on likely culprits based on context
### Quick Health Checks
Run these to get a quick overview of the homelab:
```bash
# === FULL HEALTH CHECK ===
# Syncthing connections (Mac Mini)
curl -s -H "X-API-Key: oSQSrPnMnrEXuHqjWrRdrvq3TSXesAT5" "http://127.0.0.1:8384/rest/system/connections" | python3 -c "import sys,json; d=json.load(sys.stdin)['connections']; [print(f\"{v.get('name',k[:7])}: {'UP' if v['connected'] else 'DOWN'}\") for k,v in d.items()]"
# Proxmox VMs
ssh pve 'qm list' 2>/dev/null || echo "PVE: unreachable"
ssh pve2 'qm list' 2>/dev/null || echo "PVE2: unreachable"
# Ping critical devices
ping -c 1 -W 1 10.10.10.200 >/dev/null && echo "TrueNAS: UP" || echo "TrueNAS: DOWN"
ping -c 1 -W 1 10.10.10.1 >/dev/null && echo "Router: UP" || echo "Router: DOWN"
# Check Windows PC Syncthing (often goes offline)
nc -zw1 10.10.10.150 22000 && echo "Windows Syncthing: UP" || echo "Windows Syncthing: DOWN"
```
### Troubleshooting Runbooks
| Symptom | Check | Fix |
|---------|-------|-----|
| Device not syncing | `curl Syncthing API → connections` | Check if device online, restart Syncthing |
| Windows PC offline | `ping 10.10.10.150` then `nc -zw1 10.10.10.150 22000` | SSH in, `Start-ScheduledTask -TaskName "Syncthing"` |
| Phone not syncing | Phone Syncthing app in background? | User must open app, keep screen on |
| High CPU on TrueNAS | Syncthing rescan? KSM? | Check rescan intervals, disable KSM |
| VM won't start | Storage available? RAM free? | `ssh pve 'qm start VMID'`, check logs |
| Tailscale offline | `tailscale status` | `tailscale up` or restart service |
| Sync stuck at X% | Folder errors? Conflicts? | Check `rest/folder/errors?folder=NAME` |
| Server running hot | Check KSM, check CPU processes | Disable KSM, identify runaway process |
| Storage enclosure loud | Check fan speed via SES | See [EMC-ENCLOSURE.md](EMC-ENCLOSURE.md) |
| Drives not detected | Check SAS link, LCC status | Switch LCC, rescan SCSI hosts |
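For the folder-error row above, a concrete example against the Mac Mini instance (a sketch; `FOLDER` is a placeholder as elsewhere in this doc):
```bash
# List per-folder errors reported by Syncthing
curl -s -H "X-API-Key: oSQSrPnMnrEXuHqjWrRdrvq3TSXesAT5" \
  "http://127.0.0.1:8384/rest/folder/errors?folder=FOLDER" | python3 -m json.tool
```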
### Server Temperature Check
```bash
# Check temps on both servers (Threadripper PRO max safe: 90°C Tctl)
ssh pve 'for f in /sys/class/hwmon/hwmon*/temp*_input; do label=$(cat ${f%_input}_label 2>/dev/null); if [ "$label" = "Tctl" ]; then echo "PVE Tctl: $(($(cat $f)/1000))°C"; fi; done'
ssh pve2 'for f in /sys/class/hwmon/hwmon*/temp*_input; do label=$(cat ${f%_input}_label 2>/dev/null); if [ "$label" = "Tctl" ]; then echo "PVE2 Tctl: $(($(cat $f)/1000))°C"; fi; done'
```
**Healthy temps**: 70-80°C under load. **Warning**: >85°C. **Throttle**: 90°C.
### Service Dependencies
```
TrueNAS (10.10.10.200)
├── Central Syncthing hub - if down, sync breaks between devices
├── NFS/SMB shares for VMs
└── Media storage for Plex
PiHole (CT 200)
└── DNS for entire network - if down, name resolution fails
Traefik (CT 202)
└── Reverse proxy - if down, external access to services fails
Router (10.10.10.1)
└── Everything - gateway for all traffic
```
### API Quick Reference
| Service | Device | Endpoint | Auth |
|---------|--------|----------|------|
| Syncthing | Mac Mini | `http://127.0.0.1:8384/rest/` | `X-API-Key: oSQSrPnMnrEXuHqjWrRdrvq3TSXesAT5` |
| Syncthing | MacBook | `http://127.0.0.1:8384/rest/` (via SSH) | `X-API-Key: qYkNdVLwy9qZZZ6MqnJr7tHX7KKdxGMJ` |
| Syncthing | Phone | `https://10.10.10.54:8384/rest/` | `X-API-Key: Xxz3jDT4akUJe6psfwZsbZwG2LhfZuDM` |
| Proxmox | PVE | `https://10.10.10.120:8006/api2/json/` | SSH key auth |
| Proxmox | PVE2 | `https://10.10.10.102:8006/api2/json/` | SSH key auth |
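Since no Proxmox API tokens are configured, the API is easiest to reach locally over SSH with `pvesh` (a sketch, not the only option):
```bash
# Dump all VM resources across the cluster as JSON
ssh pve 'pvesh get /cluster/resources --type vm --output-format json'
```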
### Common Maintenance Tasks
When user asks for maintenance or you notice issues:
1. **Check Syncthing sync status** - Any folders behind? Errors?
2. **Verify all devices connected** - Run connection check
3. **Check disk space** - `ssh pve 'df -h'`, `ssh pve2 'df -h'`
4. **Review ZFS pool health** - `ssh pve 'zpool status'`
5. **Check for stuck processes** - High CPU? Memory pressure?
6. **Verify backups** - Are critical folders syncing?
### Emergency Commands
```bash
# Restart VM on Proxmox
ssh pve 'qm stop VMID && qm start VMID'
# Check what's using CPU
ssh pve 'ps aux --sort=-%cpu | head -10'
# Check ZFS pool status (via QEMU agent)
ssh pve 'qm guest exec 100 -- bash -c "zpool status vault"'
# Check EMC enclosure fans
ssh pve 'qm guest exec 100 -- bash -c "sg_ses --index=coo,-1 --get=speed_code /dev/sg15"'
# Force Syncthing rescan
curl -X POST "http://127.0.0.1:8384/rest/db/scan?folder=FOLDER" -H "X-API-Key: API_KEY"
# Restart Syncthing on Windows (when stuck)
sshpass -p 'GrilledCh33s3#' ssh claude@10.10.10.150 'Stop-Process -Name syncthing -Force; Start-ScheduledTask -TaskName "Syncthing"'
# Get all device IPs from router
expect -c 'spawn ssh root@10.10.10.1 "cat /proc/net/arp"; expect "Password:"; send "GrilledCh33s3#\r"; expect eof'
```
## Overview
Two Proxmox servers running various VMs and containers for home infrastructure, media, development, and AI workloads.
## Servers
### PVE (10.10.10.120) - Primary
- **CPU**: AMD Ryzen Threadripper PRO 3975WX (32-core, 64 threads, 280W TDP)
- **RAM**: 128 GB
- **Storage**:
- `nvme-mirror1`: 2x Sabrent Rocket Q NVMe (3.6TB usable)
- `nvme-mirror2`: 2x Kingston SFYRD 2TB (1.8TB usable)
- `rpool`: 2x Samsung 870 QVO 4TB SSD mirror (3.6TB usable)
- **GPUs**:
- NVIDIA Quadro P2000 (75W TDP) - Plex transcoding
- NVIDIA TITAN RTX (280W TDP) - AI workloads, passed to saltbox/lmdev1
- **Role**: Primary VM host, TrueNAS, media services
### PVE2 (10.10.10.102) - Secondary
- **CPU**: AMD Ryzen Threadripper PRO 3975WX (32-core, 64 threads, 280W TDP)
- **RAM**: 128 GB
- **Storage**:
- `nvme-mirror3`: 2x NVMe mirror
- `local-zfs2`: 2x WD Red 6TB HDD mirror
- **GPUs**:
- NVIDIA RTX A6000 (300W TDP) - passed to trading-vm
- **Role**: Trading platform, development
## SSH Access
### SSH Key Authentication (All Hosts)
SSH keys are configured in `~/.ssh/config` on both Mac Mini and MacBook. Use the `~/.ssh/homelab` key.
| Host Alias | IP | User | Type | Notes |
|------------|-----|------|------|-------|
| `pve` | 10.10.10.120 | root | Proxmox | Primary server |
| `pve2` | 10.10.10.102 | root | Proxmox | Secondary server |
| `truenas` | 10.10.10.200 | root | VM | NAS/storage |
| `saltbox` | 10.10.10.100 | hutson | VM | Media automation |
| `lmdev1` | 10.10.10.111 | hutson | VM | AI/LLM development |
| `docker-host` | 10.10.10.206 | hutson | VM | Docker services |
| `fs-dev` | 10.10.10.5 | hutson | VM | Development |
| `copyparty` | 10.10.10.201 | hutson | VM | File sharing |
| `gitea-vm` | 10.10.10.220 | hutson | VM | Git server |
| `trading-vm` | 10.10.10.221 | hutson | VM | AI trading platform |
| `pihole` | 10.10.10.10 | root | LXC | DNS/Ad blocking |
| `traefik` | 10.10.10.250 | root | LXC | Reverse proxy |
| `findshyt` | 10.10.10.8 | root | LXC | Custom app |
**Usage examples:**
```bash
ssh pve 'qm list' # List VMs
ssh truenas 'zpool status vault' # Check ZFS pool
ssh saltbox 'docker ps' # List containers
ssh pihole 'pihole status' # Check Pi-hole
```
### Password Auth (Special Cases)
| Device | IP | User | Auth Method | Notes |
|--------|-----|------|-------------|-------|
| UniFi Router | 10.10.10.1 | root | expect (keyboard-interactive) | Gateway |
| Windows PC | 10.10.10.150 | claude | sshpass | PowerShell, use `;` not `&&` |
| HomeAssistant | 10.10.10.110 | - | QEMU agent only | No SSH server |
**Router access (requires expect):**
```bash
# Run command on router
expect -c 'spawn ssh root@10.10.10.1 "hostname"; expect "Password:"; send "GrilledCh33s3#\r"; expect eof'
# Get ARP table (all device IPs)
expect -c 'spawn ssh root@10.10.10.1 "cat /proc/net/arp"; expect "Password:"; send "GrilledCh33s3#\r"; expect eof'
```
**Windows PC access:**
```bash
sshpass -p 'GrilledCh33s3#' ssh claude@10.10.10.150 'Get-Process | Select -First 5'
```
**HomeAssistant (no SSH, use QEMU agent):**
```bash
ssh pve 'qm guest exec 110 -- bash -c "ha core info"'
```
## VMs and Containers
### PVE (10.10.10.120)
| VMID | Name | vCPUs | RAM | Purpose | GPU/Passthrough | QEMU Agent |
|------|------|-------|-----|---------|-----------------|------------|
| 100 | truenas | 8 | 32GB | NAS, storage | LSI SAS2308 HBA, Samsung NVMe | Yes |
| 101 | saltbox | 16 | 16GB | Media automation | TITAN RTX | Yes |
| 105 | fs-dev | 10 | 8GB | Development | - | Yes |
| 110 | homeassistant | 2 | 2GB | Home automation | - | No |
| 111 | lmdev1 | 8 | 32GB | AI/LLM development | TITAN RTX | Yes |
| 201 | copyparty | 2 | 2GB | File sharing | - | Yes |
| 206 | docker-host | 2 | 4GB | Docker services | - | Yes |
| 200 | pihole (CT) | - | - | DNS/Ad blocking | - | N/A |
| 202 | traefik (CT) | - | - | Reverse proxy | - | N/A |
| 205 | findshyt (CT) | - | - | Custom app | - | N/A |
### PVE2 (10.10.10.102)
| VMID | Name | vCPUs | RAM | Purpose | GPU/Passthrough | QEMU Agent |
|------|------|-------|-----|---------|-----------------|------------|
| 300 | gitea-vm | 2 | 4GB | Git server | - | Yes |
| 301 | trading-vm | 16 | 32GB | AI trading platform | RTX A6000 | Yes |
### QEMU Guest Agent
VMs with QEMU agent can be managed via `qm guest exec`:
```bash
# Execute command in VM
ssh pve 'qm guest exec 100 -- bash -c "zpool status vault"'
# Get VM IP addresses
ssh pve 'qm guest exec 100 -- bash -c "ip addr"'
```
Only VM 110 (homeassistant) lacks QEMU agent - use its web UI instead.
## Power Management
### Estimated Power Draw
- **PVE**: 500-750W (CPU + TITAN RTX + P2000 + storage + HBAs)
- **PVE2**: 450-600W (CPU + RTX A6000 + storage)
- **Combined**: ~1000-1350W under load
### Optimizations Applied
1. **KSMD Disabled** (updated 2024-12-17)
- Was consuming 44-57% CPU on PVE with negative profit
- Caused CPU temp to rise from 74°C to 83°C
- Savings: ~7-10W + significant temp reduction
- Made permanent via:
- systemd service: `/etc/systemd/system/disable-ksm.service`
- **ksmtuned masked**: `systemctl mask ksmtuned` (prevents re-enabling)
- **Note**: KSM can get re-enabled by Proxmox updates. If CPU is hot, check:
```bash
cat /sys/kernel/mm/ksm/run # Should be 0
ps aux | grep ksmd # Should show 0% CPU
# If KSM is running (run=1), disable it:
echo 0 > /sys/kernel/mm/ksm/run
systemctl mask ksmtuned
```
2. **Syncthing Rescan Intervals** (2024-12-16)
- Changed aggressive 60s rescans to 3600s for large folders
- Affected: downloads (38GB), documents (11GB), desktop (7.2GB), movies, pictures, notes, config
- Savings: ~60-80W (TrueNAS VM was at constant 86% CPU)
3. **CPU Governor Optimization** (2024-12-16)
- PVE: `powersave` governor + `balance_power` EPP (amd-pstate-epp driver)
- PVE2: `schedutil` governor (acpi-cpufreq driver)
- Made permanent via systemd service: `/etc/systemd/system/cpu-powersave.service`
- Savings: ~60-120W combined (CPUs now idle at 1.7-2.2GHz vs 4GHz)
4. **GPU Power States** (2024-12-16) - Verified optimal
- RTX A6000: 11W idle (P8 state)
- TITAN RTX: 2-3W idle (P8 state)
- Quadro P2000: 25W (P0 - Plex keeps it active)
5. **ksmtuned Disabled** (2024-12-16)
- KSM tuning daemon was still running after KSMD disabled
- Stopped and disabled on both servers
- Savings: ~2-5W
6. **HDD Spindown on PVE2** (2024-12-16)
- local-zfs2 pool (2x WD Red 6TB) had only 768KB used but drives spinning 24/7
- Set 30-minute spindown via `hdparm -S 241`
- Persistent via udev rule: `/etc/udev/rules.d/69-hdd-spindown.rules`
- Savings: ~10-16W when spun down
### Potential Optimizations
- [ ] PCIe ASPM power management
- [ ] NMI watchdog disable
## Memory Configuration
- Ballooning enabled on most VMs but not actively used
- No memory overcommit (98GB allocated on 128GB physical for PVE)
- KSMD was wasting CPU with no benefit (negative general_profit)
## Network
See [NETWORK.md](NETWORK.md) for full details.
### Network Ranges
| Network | Range | Purpose |
|---------|-------|---------|
| LAN | 10.10.10.0/24 | Primary network, all external access |
| Internal | 10.10.20.0/24 | Inter-VM only (storage, NFS/iSCSI) |
### PVE Bridges (10.10.10.120)
| Bridge | NIC | Speed | Purpose | Use For |
|--------|-----|-------|---------|---------|
| vmbr0 | enp1s0 | 1 Gb | Management | General VMs/CTs |
| vmbr1 | enp35s0f0 | 10 Gb | High-speed LXC | Bandwidth-heavy containers |
| vmbr2 | enp35s0f1 | 10 Gb | High-speed VM | TrueNAS, Saltbox, storage VMs |
| vmbr3 | (none) | Virtual | Internal only | NFS/iSCSI traffic, no internet |
### Quick Reference
```bash
# Add VM to standard network (1Gb)
qm set VMID --net0 virtio,bridge=vmbr0
# Add VM to high-speed network (10Gb)
qm set VMID --net0 virtio,bridge=vmbr2
# Add secondary NIC for internal storage network
qm set VMID --net1 virtio,bridge=vmbr3
```
- MTU 9000 (jumbo frames) on all bridges
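To confirm jumbo frames are actually applied, check each bridge (sketch):
```bash
# The first line of each interface shows its MTU (expect mtu 9000)
ssh pve 'for br in vmbr0 vmbr1 vmbr2 vmbr3; do ip link show "$br" | head -1; done'
```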
## Common Commands
```bash
# Check VM status
ssh pve 'qm list'
ssh pve2 'qm list'
# Check container status
ssh pve 'pct list'
# Monitor CPU/power
ssh pve 'top -bn1 | head -20'
# Check ZFS pools
ssh pve 'zpool status'
# Check GPU (if nvidia-smi installed in VM)
ssh pve 'lspci | grep -i nvidia'
```
## Remote Claude Code Sessions (Mac Mini)
### Overview
The Mac Mini (`hutson-mac-mini.local`) runs the Happy Coder daemon, enabling on-demand Claude Code sessions accessible from anywhere via the Happy Coder mobile app. Sessions are created when you need them - no persistent tmux sessions required.
### Architecture
```
Mac Mini (100.108.89.58 via Tailscale)
├── launchd (auto-starts on boot)
│ └── com.hutson.happy-daemon.plist (starts Happy daemon)
├── Happy Coder daemon (manages remote sessions)
└── Tailscale (secure remote access)
```
### How It Works
1. Happy daemon runs on Mac Mini (auto-starts on boot)
2. Open Happy Coder app on phone/tablet
3. Start a new Claude session from the app
4. Session runs in any working directory you choose
5. Session ends when you're done - no cleanup needed
### Quick Commands
```bash
# Check daemon status
happy daemon list
# Start a new session manually (from Mac Mini terminal)
cd ~/Projects/homelab && happy claude
# Check active sessions
happy daemon list
```
### Mobile Access Setup (One-time)
1. Download Happy Coder app:
- iOS: https://apps.apple.com/us/app/happy-claude-code-client/id6748571505
- Android: https://play.google.com/store/apps/details?id=com.ex3ndr.happy
2. On Mac Mini, run: `happy auth` and scan QR code with the app
3. Daemon auto-starts on boot via launchd
### Daemon Management
```bash
happy daemon start # Start daemon
happy daemon stop # Stop daemon
happy daemon status # Check status
happy daemon list # List active sessions
```
### Remote Access via SSH + Tailscale
From any device on Tailscale network:
```bash
# SSH to Mac Mini
ssh hutson@100.108.89.58
# Or via hostname
ssh hutson@mac-mini
# Start Claude in desired directory
cd ~/Projects/homelab && happy claude
```
### Files & Configuration
| File | Purpose |
|------|---------|
| `~/Library/LaunchAgents/com.hutson.happy-daemon.plist` | launchd auto-start Happy daemon |
| `~/.happy/` | Happy Coder config and logs |
### Troubleshooting
```bash
# Check if daemon is running
pgrep -f "happy.*daemon"
# Check launchd status
launchctl list | grep happy
# List active sessions
happy daemon list
# Restart daemon
happy daemon stop && happy daemon start
# If Tailscale is disconnected
/Applications/Tailscale.app/Contents/MacOS/Tailscale up
```
## Agent and Tool Guidelines
### Background Agents
- **Always spin up background agents when doing multiple independent tasks**
- Background agents allow parallel execution of tasks that don't depend on each other
- This improves efficiency and reduces total execution time
- Use background agents for tasks like running tests, builds, or searches simultaneously
### MCP Tools for Web Searches
#### ref.tools - Documentation Lookups
- **`mcp__Ref__ref_search_documentation`**: Search through documentation for specific topics
- **`mcp__Ref__ref_read_url`**: Read and parse content from documentation URLs
#### Exa MCP - General Web and Code Searches
- **`mcp__exa__web_search_exa`**: General web searches for current information
- **`mcp__exa__get_code_context_exa`**: Code-related searches and repository lookups
### MCP Tools Reference Table
| Tool Name | Provider | Purpose | Use Case |
|-----------|----------|---------|----------|
| `mcp__Ref__ref_search_documentation` | ref.tools | Search documentation | Finding specific topics in official docs |
| `mcp__Ref__ref_read_url` | ref.tools | Read documentation URLs | Parsing and extracting content from doc pages |
| `mcp__exa__web_search_exa` | Exa MCP | General web search | Current events, general information lookup |
| `mcp__exa__get_code_context_exa` | Exa MCP | Code-specific search | Finding code examples, repository searches |
## Reverse Proxy Architecture (Traefik)
### Overview
There are **TWO separate Traefik instances** handling different services:
| Instance | Location | IP | Purpose | Manages |
|----------|----------|-----|---------|---------|
| **Traefik-Primary** | CT 202 | **10.10.10.250** | General services | All non-Saltbox services |
| **Traefik-Saltbox** | VM 101 (Docker) | **10.10.10.100** | Saltbox services | Plex, *arr apps, media stack |
### ⚠️ CRITICAL RULE: Which Traefik to Use
**When adding ANY new service:**
- ✅ **Use Traefik-Primary (10.10.10.250)** - Unless service lives inside Saltbox VM
- ❌ **DO NOT touch Traefik-Saltbox** - It manages Saltbox services with their own certificates
**Why this matters:**
- Traefik-Saltbox has complex Saltbox-managed configs
- Messing with it breaks Plex, Sonarr, Radarr, and all media services
- Each Traefik has its own Let's Encrypt certificates
- Mixing them causes certificate conflicts
### Traefik-Primary (CT 202) - For New Services
**Location**: `/etc/traefik/` on Container 202
**Config**: `/etc/traefik/traefik.yaml`
**Dynamic Configs**: `/etc/traefik/conf.d/*.yaml`
**Services using Traefik-Primary (10.10.10.250):**
- excalidraw.htsn.io → 10.10.10.206:8080 (docker-host)
- findshyt.htsn.io → 10.10.10.205 (CT 205)
- gitea (git.htsn.io) → 10.10.10.220:3000
- homeassistant → 10.10.10.110
- lmdev → 10.10.10.111
- pihole → 10.10.10.200
- truenas → 10.10.10.200
- proxmox → 10.10.10.120
- copyparty → 10.10.10.201
- aitrade → trading server
- pulse.htsn.io → 10.10.10.206:7655 (Pulse monitoring)
**Access Traefik config:**
```bash
# From Mac Mini:
ssh pve 'pct exec 202 -- cat /etc/traefik/traefik.yaml'
ssh pve 'pct exec 202 -- ls /etc/traefik/conf.d/'
# Edit a service config:
ssh pve 'pct exec 202 -- vi /etc/traefik/conf.d/myservice.yaml'
```
### Traefik-Saltbox (VM 101) - DO NOT MODIFY
**Location**: `/opt/traefik/` inside Saltbox VM
**Managed by**: Saltbox Ansible playbooks
**Mounts**: Docker bind mount from `/opt/traefik` → `/etc/traefik` in container
**Services using Traefik-Saltbox (10.10.10.100):**
- Plex (plex.htsn.io)
- Sonarr, Radarr, Lidarr
- SABnzbd, NZBGet, qBittorrent
- Overseerr, Tautulli, Organizr
- Jackett, NZBHydra2
- Authelia (SSO)
- All other Saltbox-managed containers
**View Saltbox Traefik (read-only):**
```bash
ssh pve 'qm guest exec 101 -- bash -c "docker exec traefik cat /etc/traefik/traefik.yml"'
```
### Adding a New Public Service - Complete Workflow
Follow these steps to deploy a new service and make it publicly accessible at `servicename.htsn.io`.
#### Step 0. Deploy Your Service
First, deploy your service on the appropriate host:
**Option A: Docker on docker-host (10.10.10.206)**
```bash
ssh hutson@10.10.10.206
sudo mkdir -p /opt/myservice
sudo tee /opt/myservice/docker-compose.yml > /dev/null << 'EOF'
version: "3.8"
services:
  myservice:
    image: myimage:latest
    ports:
      - "8080:80"
    restart: unless-stopped
EOF
cd /opt/myservice && sudo docker-compose up -d
```
**Option B: New LXC Container on PVE**
```bash
ssh pve 'pct create CTID local:vztmpl/ubuntu-22.04-standard_22.04-1_amd64.tar.zst \
--hostname myservice --memory 2048 --cores 2 \
--net0 name=eth0,bridge=vmbr0,ip=10.10.10.XXX/24,gw=10.10.10.1 \
--rootfs local-zfs:8 --unprivileged 1 --start 1'
```
**Option C: New VM on PVE**
```bash
ssh pve 'qm create VMID --name myservice --memory 2048 --cores 2 \
--net0 virtio,bridge=vmbr0 --scsihw virtio-scsi-pci'
```
#### Step 1. Create Traefik Config File
Use this template for new services on **Traefik-Primary (CT 202)**:
```yaml
# /etc/traefik/conf.d/myservice.yaml
http:
  routers:
    # HTTPS router
    myservice-secure:
      entryPoints:
        - websecure
      rule: "Host(`myservice.htsn.io`)"
      service: myservice
      tls:
        certResolver: cloudflare  # Use 'cloudflare' for proxied domains, 'letsencrypt' for DNS-only
      priority: 50
    # HTTP → HTTPS redirect
    myservice-redirect:
      entryPoints:
        - web
      rule: "Host(`myservice.htsn.io`)"
      middlewares:
        - myservice-https-redirect
      service: myservice
      priority: 50
  services:
    myservice:
      loadBalancer:
        servers:
          - url: "http://10.10.10.XXX:PORT"
  middlewares:
    myservice-https-redirect:
      redirectScheme:
        scheme: https
        permanent: true
```
### SSL Certificates
Traefik has **two certificate resolvers** configured:
| Resolver | Use When | Challenge Type | Notes |
|----------|----------|----------------|-------|
| `letsencrypt` | Cloudflare DNS-only (gray cloud) | HTTP-01 | Requires port 80 reachable |
| `cloudflare` | Cloudflare Proxied (orange cloud) | DNS-01 | Works with Cloudflare proxy |
**⚠️ Important:** If Cloudflare proxy is enabled (orange cloud), HTTP challenge fails because Cloudflare redirects HTTP→HTTPS. Use `cloudflare` resolver instead.
**Cloudflare API credentials** are configured in `/etc/systemd/system/traefik.service`:
```bash
Environment="CF_API_EMAIL=cloudflare@htsn.io"
Environment="CF_API_KEY=849ebefd163d2ccdec25e49b3e1b3fe2cdadc"
```
**Certificate storage:**
- HTTP challenge certs: `/etc/traefik/acme.json`
- DNS challenge certs: `/etc/traefik/acme-cf.json`
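To see which hostnames each resolver has already issued certificates for, the ACME stores can be read directly (a read-only sketch; assumes the standard Traefik v2 `acme.json` layout keyed by the resolver names above):
```bash
# Certificates issued by the HTTP-01 (letsencrypt) resolver
ssh pve 'pct exec 202 -- bash -c "jq -r \".letsencrypt.Certificates[].domain.main\" /etc/traefik/acme.json"'
# Certificates issued by the DNS-01 (cloudflare) resolver
ssh pve 'pct exec 202 -- bash -c "jq -r \".cloudflare.Certificates[].domain.main\" /etc/traefik/acme-cf.json"'
```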
**Deploy the config:**
```bash
# Create file on CT 202
ssh pve 'pct exec 202 -- bash -c "cat > /etc/traefik/conf.d/myservice.yaml << '\''EOF'\''
<paste config here>
EOF"'
# Traefik auto-reloads (watches conf.d directory)
# Check logs:
ssh pve 'pct exec 202 -- tail -f /var/log/traefik/traefik.log'
```
#### Step 2. Add Cloudflare DNS Entry
**Cloudflare Credentials:**
- Email: `cloudflare@htsn.io`
- API Key: `849ebefd163d2ccdec25e49b3e1b3fe2cdadc`
**Manual method (via Cloudflare Dashboard):**
1. Go to https://dash.cloudflare.com/
2. Select `htsn.io` domain
3. DNS → Add Record
4. Type: `A`, Name: `myservice`, IPv4: `70.237.94.174`, Proxied: ☑️
**Automated method (CLI script):**
Save this as `~/bin/add-cloudflare-dns.sh`:
```bash
#!/bin/bash
# Add DNS record to Cloudflare for htsn.io
SUBDOMAIN="$1"
CF_EMAIL="cloudflare@htsn.io"
CF_API_KEY="849ebefd163d2ccdec25e49b3e1b3fe2cdadc"
ZONE_ID="c0f5a80448c608af35d39aa820a5f3af" # htsn.io zone
PUBLIC_IP="70.237.94.174" # Update if IP changes: curl -s ifconfig.me
if [ -z "$SUBDOMAIN" ]; then
echo "Usage: $0 <subdomain>"
echo "Example: $0 myservice # Creates myservice.htsn.io"
exit 1
fi
curl -X POST "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/dns_records" \
-H "X-Auth-Email: $CF_EMAIL" \
-H "X-Auth-Key: $CF_API_KEY" \
-H "Content-Type: application/json" \
--data "{
\"type\":\"A\",
\"name\":\"$SUBDOMAIN\",
\"content\":\"$PUBLIC_IP\",
\"ttl\":1,
\"proxied\":true
}" | jq .
```
**Usage:**
```bash
chmod +x ~/bin/add-cloudflare-dns.sh
~/bin/add-cloudflare-dns.sh excalidraw # Creates excalidraw.htsn.io
```
#### Step 3. Testing
```bash
# Check if DNS resolves
dig myservice.htsn.io
# Test HTTP redirect
curl -I http://myservice.htsn.io
# Test HTTPS
curl -I https://myservice.htsn.io
# Check Traefik dashboard (if enabled)
# Access: http://10.10.10.250:8080/dashboard/
```
#### Step 4. Update Documentation
After deploying, update these files:
1. **IP-ASSIGNMENTS.md** - Add to Services & Reverse Proxy Mapping table
2. **CLAUDE.md** - Add to "Services using Traefik-Primary" list (line ~495)
### Quick Reference - One-Liner Commands
```bash
# === DEPLOY SERVICE (example: myservice on docker-host port 8080) ===
# 1. Create Traefik config
ssh pve 'pct exec 202 -- bash -c "cat > /etc/traefik/conf.d/myservice.yaml << EOF
http:
  routers:
    myservice-secure:
      entryPoints: [websecure]
      rule: Host(\\\`myservice.htsn.io\\\`)
      service: myservice
      tls: {certResolver: letsencrypt}
  services:
    myservice:
      loadBalancer:
        servers:
          - url: http://10.10.10.206:8080
EOF"'
# 2. Add Cloudflare DNS
curl -s -X POST "https://api.cloudflare.com/client/v4/zones/c0f5a80448c608af35d39aa820a5f3af/dns_records" \
-H "X-Auth-Email: cloudflare@htsn.io" \
-H "X-Auth-Key: 849ebefd163d2ccdec25e49b3e1b3fe2cdadc" \
-H "Content-Type: application/json" \
--data '{"type":"A","name":"myservice","content":"70.237.94.174","proxied":true}'
# 3. Test (wait a few seconds for DNS propagation)
curl -I https://myservice.htsn.io
```
### Traefik Troubleshooting
```bash
# View Traefik logs (CT 202)
ssh pve 'pct exec 202 -- tail -f /var/log/traefik/traefik.log'
# Check if config is valid
ssh pve 'pct exec 202 -- cat /etc/traefik/conf.d/myservice.yaml'
# List all dynamic configs
ssh pve 'pct exec 202 -- ls -la /etc/traefik/conf.d/'
# Check certificate
ssh pve 'pct exec 202 -- cat /etc/traefik/acme.json | jq'
# Restart Traefik (if needed)
ssh pve 'pct exec 202 -- systemctl restart traefik'
```
### Certificate Management
**Let's Encrypt certificates** are automatically managed by Traefik.
**Certificate storage:**
- Traefik-Primary: `/etc/traefik/acme.json` on CT 202
- Traefik-Saltbox: `/opt/traefik/acme.json` on VM 101
**Certificate renewal:**
- Automatic via HTTP-01 challenge
- Traefik checks every 24h
- Renews 30 days before expiry
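To confirm a renewal actually reached the edge, check the live certificate dates (a sketch; uses the `myservice.htsn.io` placeholder from earlier). Note that for proxied (orange-cloud) hostnames this shows Cloudflare's edge certificate rather than Traefik's:
```bash
# Print notBefore/notAfter for the certificate currently being served
echo | openssl s_client -connect myservice.htsn.io:443 -servername myservice.htsn.io 2>/dev/null \
  | openssl x509 -noout -dates
```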
**If certificates fail:**
```bash
# Check acme.json permissions (must be 600)
ssh pve 'pct exec 202 -- ls -la /etc/traefik/acme.json'
# Check Traefik can reach Let's Encrypt
ssh pve 'pct exec 202 -- curl -I https://acme-v02.api.letsencrypt.org/directory'
# Delete bad certificate (Traefik will re-request)
ssh pve 'pct exec 202 -- rm /etc/traefik/acme.json'
ssh pve 'pct exec 202 -- touch /etc/traefik/acme.json'
ssh pve 'pct exec 202 -- chmod 600 /etc/traefik/acme.json'
ssh pve 'pct exec 202 -- systemctl restart traefik'
```
### Docker Service with Traefik Labels (Alternative)
If deploying a service via Docker on `docker-host` (VM 206), you can use Traefik labels instead of config files:
```yaml
# docker-compose.yml
services:
  myservice:
    image: myimage:latest
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.myservice.rule=Host(`myservice.htsn.io`)"
      - "traefik.http.routers.myservice.entrypoints=websecure"
      - "traefik.http.routers.myservice.tls.certresolver=letsencrypt"
      - "traefik.http.services.myservice.loadbalancer.server.port=8080"
    networks:
      - traefik
networks:
  traefik:
    external: true
```
**Note**: This requires Traefik to have access to the Docker socket and to be on the same Docker network.
## Cloudflare API Access
**Credentials** (stored in Saltbox config):
- Email: `cloudflare@htsn.io`
- API Key: `849ebefd163d2ccdec25e49b3e1b3fe2cdadc`
- Domain: `htsn.io`
**Retrieve from Saltbox:**
```bash
ssh pve 'qm guest exec 101 -- bash -c "cat /srv/git/saltbox/accounts.yml | grep -A2 cloudflare"'
```
**Cloudflare API Documentation:**
- API Docs: https://developers.cloudflare.com/api/
- DNS Records: https://developers.cloudflare.com/api/operations/dns-records-for-a-zone-create-dns-record
**Common API operations:**
```bash
# Set credentials
CF_EMAIL="cloudflare@htsn.io"
CF_API_KEY="849ebefd163d2ccdec25e49b3e1b3fe2cdadc"
ZONE_ID="c0f5a80448c608af35d39aa820a5f3af"
# List all DNS records
curl -X GET "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/dns_records" \
-H "X-Auth-Email: $CF_EMAIL" \
-H "X-Auth-Key: $CF_API_KEY" | jq
# Add A record
curl -X POST "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/dns_records" \
-H "X-Auth-Email: $CF_EMAIL" \
-H "X-Auth-Key: $CF_API_KEY" \
-H "Content-Type: application/json" \
--data '{"type":"A","name":"subdomain","content":"IP","proxied":true}'
# Delete record
curl -X DELETE "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/dns_records/$RECORD_ID" \
-H "X-Auth-Email: $CF_EMAIL" \
-H "X-Auth-Key: $CF_API_KEY"
```
## Related Documentation
| File | Description |
|------|-------------|
| [EMC-ENCLOSURE.md](EMC-ENCLOSURE.md) | EMC storage enclosure (SES commands, LCC troubleshooting, maintenance) |
| [HOMEASSISTANT.md](HOMEASSISTANT.md) | Home Assistant API access, automations, integrations |
| [NETWORK.md](NETWORK.md) | Network bridges, VLANs, which bridge to use for new VMs |
| [IP-ASSIGNMENTS.md](IP-ASSIGNMENTS.md) | Complete IP address assignments for all devices and services |
| [SYNCTHING.md](SYNCTHING.md) | Syncthing setup, API access, device list, troubleshooting |
| [SHELL-ALIASES.md](SHELL-ALIASES.md) | ZSH aliases for Claude Code (`chomelab`, `ctrading`, etc.) |
| [configs/](configs/) | Symlinks to shared shell configs |
---
## Backlog
Future improvements and maintenance tasks:
| Priority | Task | Notes |
|----------|------|-------|
| Medium | **Re-IP all devices** | Current IP scheme is inconsistent. Plan: VMs 10.10.10.100-199, LXCs 10.10.10.200-249, Services 10.10.10.250-254 |
| Low | Install SSH on HomeAssistant | Currently only accessible via QEMU agent |
| Low | Set up SSH key for router | Currently requires expect/password |
---
## Changelog
### 2024-12-20
**SSH Key Deployment - All Systems**
- Added SSH keys to ALL VMs and LXCs (13 total hosts now accessible via key)
- Updated `~/.ssh/config` with complete host aliases
- Fixed permissions: FindShyt LXC `.ssh` ownership, enabled PermitRootLogin on LXCs
- Hosts now accessible: pve, pve2, truenas, saltbox, lmdev1, docker-host, fs-dev, copyparty, gitea-vm, trading-vm, pihole, traefik, findshyt
**Documentation Updates**
- Rewrote SSH Access section with complete host table
- Added Password Auth section for router/Windows/HomeAssistant
- Added Backlog section with re-IP task
### 2024-12-19
**EMC Storage Enclosure - LCC B Failure**
- Diagnosed loud fan issue (speed code 5 → 4160 RPM)
- Root cause: Faulty LCC B controller causing false readings
- Resolution: Switched SAS cable to LCC A, fans now quiet (speed code 3 → 2670 RPM)
- Replacement ordered: EMC 303-108-000E ($14.95 eBay)
- Created [EMC-ENCLOSURE.md](EMC-ENCLOSURE.md) with full documentation
**SSH Key Consolidation**
- Renamed `~/.ssh/ai_trading_ed25519` → `~/.ssh/homelab`
- Updated `~/.ssh/config` on MacBook with all homelab hosts
- SSH key auth now works for: pve, pve2, docker-host, fs-dev, copyparty, lmdev1, gitea-vm, trading-vm
- No more sshpass needed for PVE servers
**QEMU Guest Agent Deployment**
- Installed on: docker-host (206), fs-dev (105), copyparty (201)
- All PVE VMs now have agent except homeassistant (110)
- Can now use `qm guest exec` for remote commands
**VM Configuration Updates**
- docker-host: Fixed SSH key in cloud-init
- fs-dev: Fixed `.ssh` directory ownership (1000 → 1001)
- copyparty: Changed from DHCP to static IP (10.10.10.201)
**Documentation Updates**
- Updated CLAUDE.md SSH section (removed sshpass examples)
- Added QEMU Agent column to VM tables
- Added storage enclosure troubleshooting to runbooks

EMC-ENCLOSURE.md (new file, 247 lines)
# EMC Storage Enclosure Documentation
## Hardware Overview
| Component | Details |
|-----------|---------|
| **Model** | EMC ESES Viper DAE (KTN-STL3) |
| **Capacity** | 15x 3.5" SAS/SATA drive bays |
| **SES Device** | `/dev/sg15` (on TrueNAS) |
| **Connection** | SAS to LSI SAS2308 HBA (mpt2sas driver) |
| **Location** | Connected to PVE (10.10.10.120) via TrueNAS VM |
## Components
### LCC Controllers (Link Control Cards)
The enclosure has **dual LCC controllers** for redundancy:
| Controller | Slot | Status | Notes |
|------------|------|--------|-------|
| **LCC A** | Left | Working | Currently in use |
| **LCC B** | Right | Faulty | Causes high fan speed, SAS discovery failure |
**Replacement Part**: EMC 303-108-000E VIPER 6G SAS LCC (~$15 on eBay)
### Power Supplies
Two redundant PSUs with integrated fans.
### Fans
Multiple cooling fans controlled by enclosure firmware. Fan speeds are **automatically managed** based on temperature - manual override is not supported on EMC ESES enclosures.
**Fan Speed Codes**:
| Code | Description | RPM (approx) |
|------|-------------|--------------|
| 1 | Lowest | ~1500 |
| 2 | Second lowest | ~2000 |
| 3 | Third lowest | ~2670 |
| 4 | Medium | ~3300 |
| 5 | Fifth | ~4160 |
| 6 | Sixth | ~4800 |
| 7 | Highest | ~5500+ |
## ZFS Pool Using This Enclosure
```
Pool: vault
Size: 164TB raidz1
Drives: 13x HDD in raidz1 + special mirror + NVMe cache/log
Mount: /mnt/vault on TrueNAS
```
## SES Commands Reference
All commands run from TrueNAS (VM 100):
```bash
# Check overall enclosure status
sg_ses -p 0x02 /dev/sg15
# Check fan speeds
sg_ses --index=coo,-1 --get=speed_code /dev/sg15
# Check temperatures
sg_ses -p 0x02 /dev/sg15 | grep -E "(Temperature|Cooling)"
# Check PSU status
sg_ses -p 0x02 /dev/sg15 | grep -A5 "Power supply"
# Check LCC controller status
sg_ses -p 0x02 /dev/sg15 | grep -A5 "Enclosure services controller"
# List all SES elements
sg_ses -p 0x07 /dev/sg15
# Identify enclosure (flash LEDs)
sg_ses --index=enc,0 --set=ident:1 /dev/sg15
```
### Running SES Commands via Proxmox
```bash
# From Mac (via SSH key auth)
ssh pve 'qm guest exec 100 -- bash -c "sg_ses -p 0x02 /dev/sg15"'
# Quick fan check
ssh pve 'qm guest exec 100 -- bash -c "sg_ses --index=coo,-1 --get=speed_code /dev/sg15"'
# Quick temp check
ssh pve 'qm guest exec 100 -- bash -c "sg_ses -p 0x02 /dev/sg15 | grep Temperature"'
```
## Troubleshooting
### Symptom: Fans Running Loud (Speed 5+)
**Possible Causes**:
1. **Faulty LCC controller** - Switch to other LCC
2. **High temperatures** - Check temp sensors
3. **PSU issue** - Check PSU status via SES
4. **Failed drive** - Check drive status LEDs
**Diagnosis Steps**:
```bash
# 1. Check current fan speed
ssh pve 'qm guest exec 100 -- bash -c "sg_ses --index=coo,-1 --get=speed_code /dev/sg15"'
# Normal: 1-3, High: 4-5, Critical: 6-7
# 2. Check temperatures
ssh pve 'qm guest exec 100 -- bash -c "sg_ses -p 0x02 /dev/sg15 | grep Temperature"'
# Normal: 25-40C, Warning: 45-50C, Critical: 55C+
# 3. Check for component failures
ssh pve 'qm guest exec 100 -- bash -c "sg_ses -p 0x02 /dev/sg15 | grep -i fail"'
# 4. If no obvious cause, try switching LCC
# Power down enclosure, move SAS cable to other LCC port
```
### Symptom: Drives Not Detected After Enclosure Power Cycle
**Possible Causes**:
1. Enclosure not fully initialized (wait for green LEDs to stop blinking)
2. Faulty LCC controller
3. SAS cable loose
4. HBA needs rescan
**Diagnosis Steps**:
```bash
# 1. Check SAS link status
cat /sys/class/sas_phy/*/negotiated_linkrate
# 2. Check for expanders (should show enclosure)
lsscsi -g | grep -i enclo
# 3. Force HBA rescan
echo "- - -" > /sys/class/scsi_host/host0/scan
# 4. If no expander, check SAS cable and try other LCC port
```
### Symptom: Pool Won't Import After Enclosure Maintenance
```bash
# 1. Wait for enclosure to fully initialize (1-2 minutes)
# 2. Rescan for devices
echo "- - -" > /sys/class/scsi_host/host0/scan
# 3. Import pool
zpool import vault
# 4. If read-only mount issues, reboot TrueNAS
ssh pve 'qm reboot 100'
```
## Maintenance Procedures
### Safe Shutdown for Enclosure Maintenance
```bash
# 1. Stop services using the pool
ssh pve 'qm guest exec 101 -- bash -c "docker stop \$(docker ps -q)"'
# 2. Shutdown TrueNAS (auto-exports ZFS pool)
ssh pve 'qm shutdown 100 --timeout 120'
# 3. Wait for TrueNAS to fully stop
ssh pve 'while qm status 100 | grep -q running; do sleep 5; done'
# 4. Power off enclosure
# (Physical switch or PDU)
# 5. Perform maintenance
# 6. Power on enclosure, wait for initialization (green LEDs solid)
# 7. Start TrueNAS
ssh pve 'qm start 100'
# 8. Verify pool imported
ssh pve 'qm guest exec 100 -- bash -c "zpool status vault"'
```
### Hot-Swap LCC Controller
LCCs can be hot-swapped while the enclosure is running (verification sketch follows the steps):
1. Order replacement LCC (EMC 303-108-000E)
2. Move SAS cable to working LCC (if not already)
3. Wait for drives to come online via new LCC
4. Remove faulty LCC
5. Install replacement LCC
6. Optionally move SAS cable back to original port
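Verification sketch after the swap, reusing commands from elsewhere in this doc:
```bash
# Fan speed should settle back to code 1-3
ssh pve 'qm guest exec 100 -- bash -c "sg_ses --index=coo,-1 --get=speed_code /dev/sg15"'
# The enclosure/expander should be visible again
ssh pve 'qm guest exec 100 -- bash -c "lsscsi -g | grep -i enclo"'
# And the pool should be ONLINE
ssh pve 'qm guest exec 100 -- bash -c "zpool status vault | head -5"'
```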
## Incident Log
### 2024-12-19: LCC B Failure
**Symptoms**:
- Fans running at speed code 5 (~4160 RPM) - very loud
- After enclosure power cycle, drives not detected
- SAS link UP (4 PHYs at 6.0 Gbit) but no expander discovery
**Root Cause**:
LCC B controller malfunction causing:
- False temperature/error readings → high fan speed
- SAS expander not responding → drives not enumerated
**Resolution**:
1. Moved SAS cable from LCC B to LCC A
2. Drives immediately appeared
3. Fan speed dropped to code 3 (2670 RPM) - quiet
4. Imported vault pool, all data intact
**Replacement Ordered**:
- Part: EMC 303-108-000E VIPER 6G SAS LCC
- Source: eBay
- Price: $14.95 + free shipping
## LED Status Reference
### Drive LEDs
| LED | Color | Status |
|-----|-------|--------|
| Solid Blue | Power | Drive has power |
| Blinking Blue | Activity | I/O in progress |
| Solid Amber | Fault | Drive failed |
| Blinking Amber | Identify | Drive being located |
### LCC LEDs
| LED | Color | Status |
|-----|-------|--------|
| Solid Green | Link | SAS connection active |
| Blinking Green | Activity | Data transfer |
| Amber | Fault | LCC issue |
### PSU LEDs
| LED | Color | Status |
|-----|-------|--------|
| Solid Green | OK | Power supply healthy |
| Off | No Power | No AC input |
| Amber | Fault | PSU failure |
## Related Documentation
- [CLAUDE.md](CLAUDE.md) - Main homelab documentation
- [IP-ASSIGNMENTS.md](IP-ASSIGNMENTS.md) - Network configuration
- TrueNAS Web UI: https://10.10.10.200

HOMEASSISTANT.md (new file, 145 lines)
# Home Assistant
## Overview
| Setting | Value |
|---------|-------|
| VM ID | 110 |
| Host | PVE (10.10.10.120) |
| IP Address | 10.10.10.210 (DHCP - should be static) |
| Port | 8123 |
| Web UI | http://10.10.10.210:8123 |
| OS | Home Assistant OS 16.3 |
| Version | 2025.11.3 (update available: 2025.12.3) |
## API Access
Home Assistant uses Long-Lived Access Tokens for API authentication.
### Getting an API Token
1. Go to http://10.10.10.210:8123
2. Click your profile (bottom left)
3. Scroll to "Long-Lived Access Tokens"
4. Click "Create Token"
5. Name it (e.g., "Claude Code")
6. Copy the token (only shown once!)
### API Configuration
```
API_URL: http://10.10.10.210:8123/api
API_TOKEN: eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiIwZThjZmJjMzVlNDA0NzYwOTMzMjg3MTQ5ZjkwOGU2NyIsImlhdCI6MTc2NTk5MjQ4OCwiZXhwIjoyMDgxMzUyNDg4fQ.r743tsb3E5NNlrwEEu9glkZdiI4j_3SKIT1n5PGUytY
```
### API Examples
```bash
# Set these variables
HA_URL="http://10.10.10.210:8123"
HA_TOKEN="your-token-here"
# Check API is working
curl -s -H "Authorization: Bearer $HA_TOKEN" "$HA_URL/api/"
# Get all states
curl -s -H "Authorization: Bearer $HA_TOKEN" "$HA_URL/api/states" | jq
# Get specific entity state
curl -s -H "Authorization: Bearer $HA_TOKEN" "$HA_URL/api/states/light.living_room" | jq
# Turn on a light
curl -X POST -H "Authorization: Bearer $HA_TOKEN" \
-H "Content-Type: application/json" \
-d '{"entity_id": "light.living_room"}' \
"$HA_URL/api/services/light/turn_on"
# Turn off a light
curl -X POST -H "Authorization: Bearer $HA_TOKEN" \
-H "Content-Type: application/json" \
-d '{"entity_id": "light.living_room"}' \
"$HA_URL/api/services/light/turn_off"
# Call any service
curl -X POST -H "Authorization: Bearer $HA_TOKEN" \
-H "Content-Type: application/json" \
-d '{"entity_id": "switch.my_switch"}' \
"$HA_URL/api/services/switch/toggle"
```
## Common Tasks
### List All Entities
```bash
curl -s -H "Authorization: Bearer $HA_TOKEN" "$HA_URL/api/states" | jq '.[].entity_id'
```
### List Entities by Domain
```bash
# All lights
curl -s -H "Authorization: Bearer $HA_TOKEN" "$HA_URL/api/states" | jq '[.[] | select(.entity_id | startswith("light."))]'
# All switches
curl -s -H "Authorization: Bearer $HA_TOKEN" "$HA_URL/api/states" | jq '[.[] | select(.entity_id | startswith("switch."))]'
# All sensors
curl -s -H "Authorization: Bearer $HA_TOKEN" "$HA_URL/api/states" | jq '[.[] | select(.entity_id | startswith("sensor."))]'
```
### Get Entity History
```bash
# Last 24 hours for an entity
curl -s -H "Authorization: Bearer $HA_TOKEN" \
"$HA_URL/api/history/period?filter_entity_id=sensor.temperature" | jq
```
## Device Summary
**265 total entities**
| Domain | Count | Examples |
|--------|-------|----------|
| scene | 87 | Lighting scenes |
| light | 41 | Kitchen, Living room, Bedroom, Office, Cabinet, etc. |
| switch | 36 | Automations, Sonos controls, Motion sensors |
| sensor | 28 | Various sensors |
| number | 21 | Settings/controls |
| event | 17 | Event triggers |
| binary_sensor | 13 | Motion, door sensors |
| media_player | 8 | Sonos speakers (Bedroom, Living Room, Kitchen, Console) |
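The per-domain counts above can be recomputed from the live API (a sketch; assumes `HA_URL`/`HA_TOKEN` are set as in the API examples):
```bash
# Count entities by domain (scene, light, switch, ...)
curl -s -H "Authorization: Bearer $HA_TOKEN" "$HA_URL/api/states" \
  | jq -r '.[].entity_id | split(".")[0]' | sort | uniq -c | sort -rn
```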
### Lights by Room
- **Kitchen**: Kitchen light
- **Living Room**: Living room, Living Room Lamp, TV Bias
- **Bedroom**: Bedroom, Bedside Lamp 1 & 2, Dresser
- **Office**: Office, Office Floor Lamp, Office Lamp
- **Guest Room**: Guest Bed Left, Guest Lamp Right
- **Other**: Cabinet 1 & 2, Pantry, Bathroom, Front Porch, etc.
### Sonos Speakers
- Bedroom (with surround)
- Living Room (with surround)
- Kitchen
- Console
### Motion Sensors
- Kitchen Motion
- Office Sensor
## Integrations
- **Philips Hue** - Lights
- **Sonos** - Speakers
- **Motion Sensors** - Various locations
## Automations
TODO: Document key automations
## TODO
- [ ] Set static IP (currently DHCP at .210, should be .110)
- [ ] Add API token to this document
- [ ] Document installed integrations
- [ ] Document automations
- [ ] Set up Traefik reverse proxy (ha.htsn.io)

INFRASTRUCTURE.md (new file, 330 lines)
# Homelab Infrastructure Documentation
## Network Topology
```
┌─────────────────┐
│ Internet │
└────────┬────────┘
┌────────▼────────┐
│ Router/Firewall │
│ 10.10.10.1 │
└────────┬────────┘
┌────────────────────────┼────────────────────────┐
│ │ │
┌────────▼────────┐ ┌────────▼────────┐ ┌────────▼────────┐
│ Main Switch │ │ Storage VLAN │ │ Tailscale │
│ vmbr0/vmbr2 │ │ vmbr3 │ │ 100.x.x.x/8 │
│ 10.10.10.0/24 │ │ (Jumbo 9000) │ │ │
└────────┬────────┘ └────────┬────────┘ └─────────────────┘
│ │
┌───────────┼───────────┐ │
│ │ │ │
┌────▼───┐ ┌────▼───┐ ┌────▼───┐ │
│ PVE │ │ PVE2 │ │ Other │ │
│ .120 │ │ .102 │ │ Devices│ │
└────┬───┘ └────┬───┘ └────────┘ │
│ │ │
└───────────┴────────────────────────┘
┌───────▼───────┐
│ TrueNAS │
│ (Storage via │
│ HBA/NVMe) │
└───────────────┘
```
## IP Address Assignments
### Management Network (10.10.10.0/24)
| IP Address | Hostname | Description |
|------------|----------|-------------|
| 10.10.10.1 | router | Gateway/Firewall |
| 10.10.10.102 | pve2 | Proxmox Server 2 |
| 10.10.10.120 | pve | Proxmox Server 1 (Primary) |
| 10.10.10.123 | mac-mini | Mac Mini (Syncthing node) |
| 10.10.10.150 | windows-pc | Windows PC (Syncthing node) |
| 10.10.10.147 | macbook | MacBook Pro (Syncthing node) |
| 10.10.10.200 | truenas | TrueNAS (Storage/Syncthing hub) |
| 10.10.10.220 | gitea-vm | Git Server |
| 10.10.10.221 | trading-vm | AI Trading Platform |
### Tailscale Network (100.x.x.x)
| IP Address | Hostname | Description |
|------------|----------|-------------|
| 100.88.161.110 | macbook | MacBook |
| 100.106.175.37 | phone | Mobile Device |
| 100.108.89.58 | mac-mini | Mac Mini |
---
## Server Hardware
### PVE (10.10.10.120) - Primary Virtualization Host
| Component | Specification |
|-----------|---------------|
| **CPU** | AMD Ryzen Threadripper PRO 3975WX (32C/64T, 280W TDP) |
| **RAM** | 128 GB DDR4 ECC |
| **Boot** | Samsung 870 QVO 4TB (mirrored) |
| **NVMe Pool 1** | 2x Sabrent Rocket Q NVMe (nvme-mirror1, 3.6TB) |
| **NVMe Pool 2** | 2x Kingston SFYRD 2TB (nvme-mirror2, 1.8TB) |
| **GPU 1** | NVIDIA Quadro P2000 (75W) - Plex transcoding |
| **GPU 2** | NVIDIA TITAN RTX (280W) - AI workloads |
| **HBA** | LSI SAS2308 - Passed to TrueNAS |
| **NVMe Controller** | Samsung PM9A1 - Passed to TrueNAS |
### PVE2 (10.10.10.102) - Secondary Virtualization Host
| Component | Specification |
|-----------|---------------|
| **CPU** | AMD Ryzen Threadripper PRO 3975WX (32C/64T, 280W TDP) |
| **RAM** | 128 GB DDR4 ECC |
| **NVMe Pool** | 2x NVMe (nvme-mirror3) |
| **HDD Pool** | 2x WD Red 6TB (local-zfs2, mirrored) |
| **GPU** | NVIDIA RTX A6000 (300W) - AI Trading |
---
## Virtual Machines
### PVE (10.10.10.120)
| VMID | Name | vCPUs | RAM | Storage | Purpose | Passthrough |
|------|------|-------|-----|---------|---------|-------------|
| 100 | truenas | 8 | 32GB | rpool | NAS/Storage | LSI SAS2308 HBA, Samsung NVMe |
| 101 | saltbox | 16 | 16GB | rpool/nvme-mirror1/2 | Media automation | TITAN RTX |
| 105 | fs-dev | 10 | 8GB | nvme-mirror1 | Development | - |
| 110 | homeassistant | 2 | 2GB | nvme-mirror2 | Home automation | - |
| 111 | lmdev1 | 8 | 32GB | nvme-mirror1 | AI/LLM development | TITAN RTX |
| 201 | copyparty | 2 | 2GB | nvme-mirror1 | File sharing | - |
| 206 | docker-host | 2 | 4GB | rpool | Docker services | - |
### PVE2 (10.10.10.102)
| VMID | Name | vCPUs | RAM | Storage | Purpose | Passthrough |
|------|------|-------|-----|---------|---------|-------------|
| 300 | gitea-vm | 2 | 4GB | nvme-mirror3 | Git server | - |
| 301 | trading-vm | 16 | 32GB | nvme-mirror3 | AI trading platform | RTX A6000 |
---
## LXC Containers
### PVE (10.10.10.120)
| VMID | Name | Purpose | Status |
|------|------|---------|--------|
| 200 | pihole | DNS/Ad blocking | Running |
| 202 | traefik | Reverse proxy | Running |
| 205 | findshyt | Custom application | Running |
| 500 | dev1 | Development | Stopped |
---
## Storage Architecture
```
PVE (10.10.10.120)
├── rpool (Samsung 870 QVO 4TB mirror)
│ ├── Proxmox system
│ ├── VM 100 (truenas) boot
│ ├── VM 101 (saltbox) boot
│ └── VM 206 (docker-host)
├── nvme-mirror1 (Sabrent Rocket Q mirror, 3.6TB)
│ ├── VM 101 (saltbox) data
│ ├── VM 105 (fs-dev)
│ ├── VM 111 (lmdev1)
│ └── VM 201 (copyparty)
└── nvme-mirror2 (Kingston SFYRD mirror, 1.8TB)
├── VM 101 (saltbox) data
└── VM 110 (homeassistant)
PVE2 (10.10.10.102)
├── nvme-mirror3 (NVMe mirror)
│ ├── VM 300 (gitea-vm)
│ └── VM 301 (trading-vm)
└── local-zfs2 (WD Red 6TB mirror)
└── Backup/archive storage
TrueNAS (VM 100 on PVE)
├── HBA Passthrough (LSI SAS2308)
│ └── [Physical drives managed by TrueNAS]
└── NVMe Passthrough (Samsung PM9A1)
└── [NVMe drives managed by TrueNAS]
```
---
## Services Map
```
┌─────────────────────────────────────────────────────────────────┐
│ EXTERNAL ACCESS │
├─────────────────────────────────────────────────────────────────┤
│ Tailscale VPN ──► All services accessible via 100.x.x.x │
│ Traefik (CT 202) ──► Reverse proxy for web services │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ CORE SERVICES │
├─────────────────────────────────────────────────────────────────┤
│ PiHole (CT 200) ──► DNS + Ad blocking │
│ TrueNAS (VM 100) ──► NAS, Syncthing, Storage │
│ Gitea (VM 300) ──► Git repository hosting │
│ Home Assistant (VM 110) ──► Home automation │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ MEDIA SERVICES │
├─────────────────────────────────────────────────────────────────┤
│ Saltbox (VM 101) ──► Plex, *arr stack, media automation │
│ CopyParty (VM 201) ──► File sharing │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ DEVELOPMENT/AI │
├─────────────────────────────────────────────────────────────────┤
│ Trading VM (VM 301) ──► AI trading platform (RTX A6000) │
│ LMDev1 (VM 111) ──► LLM development (TITAN RTX) │
│ FS-Dev (VM 105) ──► General development │
│ Docker Host (VM 206) ──► Containerized services │
└─────────────────────────────────────────────────────────────────┘
```
---
## Syncthing Topology
```
┌─────────────────┐
│ TrueNAS │
│ (Hub/Server) │
│ Port 20910 │
└────────┬────────┘
┌───────────────────┼───────────────────┐
│ │ │
┌────▼────┐ ┌────▼────┐ ┌────▼────┐
│ MacBook │ │ Mac Mini│ │ Windows │
│ .147 │ │ .123 │ │ PC .150 │
└─────────┘ └─────────┘ └─────────┘
Synced Folders:
├── antigravity (310MB)
├── bin (23KB)
├── claude-code (257MB)
├── claude-desktop (784MB)
├── config (436KB)
├── cursor (459MB)
├── desktop (7.2GB)
├── documents (11GB)
├── dotconfig (212MB)
├── downloads (38GB)
├── movies (334MB)
├── music (606KB)
├── notes (73KB)
├── pictures (259MB)
└── projects (3.1GB)
```
---
## Power Consumption
### Estimated Power Draw
| Component | Idle | Load | Notes |
|-----------|------|------|-------|
| **PVE CPU** | 50W | 280W | TR PRO 3975WX |
| **PVE2 CPU** | 50W | 280W | TR PRO 3975WX |
| **TITAN RTX** | 20W | 280W | Passthrough to saltbox/lmdev1 |
| **RTX A6000** | 25W | 300W | Passthrough to trading-vm |
| **Quadro P2000** | 10W | 75W | Plex transcoding |
| **Storage (per server)** | 30W | 50W | NVMe + SSD mirrors |
| **Base system (each)** | 50W | 60W | Motherboard, RAM, fans |
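Actual GPU draw can be spot-checked with nvidia-smi inside the VMs that own the cards (a sketch; assumes the NVIDIA driver is installed in each guest):
```bash
# e.g. on lmdev1 (TITAN RTX) or trading-vm (RTX A6000)
nvidia-smi --query-gpu=name,power.draw,power.limit --format=csv
```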
### Total Estimates
- **Idle**: ~400-500W combined
- **Moderate load**: ~700-900W combined
- **Full load**: ~1200-1400W combined
### Power Optimizations Applied
1. KSMD disabled on both hosts (saved ~10W)
2. Syncthing rescan intervals increased (saved ~60-80W from TrueNAS CPU)
3. CPU governor optimization (saved ~60-120W; see the sketch after this list)
- PVE: `powersave` + `balance_power` EPP (amd-pstate-epp)
- PVE2: `schedutil` (acpi-cpufreq)
4. ksmtuned service disabled on both hosts (saved ~2-5W)
5. HDD spindown on PVE2 - 30 min timeout (saved ~10-16W)
- local-zfs2 pool (2x WD Red 6TB) essentially empty
**Total estimated savings: ~142-231W**
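A minimal sketch of the underlying commands (assumptions: amd-pstate-epp exposes `energy_performance_preference` on PVE, acpi-cpufreq on PVE2, and `/dev/sdX` / `/dev/sdY` are placeholders for the WD Reds; these need to run at boot, e.g. from a oneshot systemd unit, to persist):
```bash
# PVE: powersave governor + balance_power EPP (amd-pstate-epp)
for c in /sys/devices/system/cpu/cpu[0-9]*/cpufreq; do
  echo powersave     > "$c/scaling_governor"
  echo balance_power > "$c/energy_performance_preference"
done

# PVE2: schedutil governor (acpi-cpufreq, no EPP knob)
for c in /sys/devices/system/cpu/cpu[0-9]*/cpufreq; do
  echo schedutil > "$c/scaling_governor"
done

# PVE2: spin the WD Reds down after 30 minutes of idle (-S 241 = 30 min)
hdparm -S 241 /dev/sdX /dev/sdY
```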
---
## SSH Access
### Credentials
| Host | IP Address | Username | Password | Notes |
|------|------------|----------|----------|-------|
| Hutson-PC | 10.10.10.150 | claude | GrilledCh33s3# | Windows PC |
| MacBook | 10.10.10.147 | hutson | GrilledCh33s3# | MacBook Pro |
| TrueNAS | 10.10.10.200 | truenas_admin | GrilledCh33s3# | SSH key configured |
### SSH Keys
The Mac Mini has an SSH key configured at `~/.ssh/id_ed25519` for passwordless authentication to Proxmox hosts and other infrastructure.
For Proxmox servers (PVE and PVE2), SSH access is configured in `~/.ssh/config`:
```
Host pve
HostName 10.10.10.120
User root
IdentityFile ~/.ssh/ai_trading_ed25519
Host pve2
HostName 10.10.10.102
User root
IdentityFile ~/.ssh/ai_trading_ed25519
```
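With those entries in place the hosts are reachable by alias, which most examples in this repo assume:
```bash
ssh pve 'qm list'     # VMs on the primary host
ssh pve2 'qm list'    # VMs on the secondary host
```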
---
## Credentials Management
Sensitive credentials are stored in `/Users/hutson/Projects/homelab/.env` for use with infrastructure management scripts and automation.
This file contains:
- Service passwords
- API keys
- Database credentials
- Other sensitive configuration values
**Note**: The `.env` file is git-ignored and should never be committed to version control.
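A minimal sketch of consuming that file from a script (the variable name below is illustrative, not an actual key from `.env`):
```bash
#!/bin/bash
# Export everything defined in .env into the environment
set -a
source ~/Projects/homelab/.env
set +a

# Example use of an assumed variable name
curl -s -H "X-API-Key: ${SYNCTHING_API_KEY}" "http://127.0.0.1:8384/rest/system/status"
```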
---
## Configuration Backups
Configuration files are backed up in `/Users/hutson/Projects/homelab/configs/` directory.
### Current Backups
| File | Description |
|------|-------------|
| ghostty.conf | Ghostty terminal emulator configuration |
This directory serves as a centralized location for storing configuration backups from various systems and applications in the homelab environment.

139
IP-ASSIGNMENTS.md Normal file
View File

@@ -0,0 +1,139 @@
# IP Address Assignments
This document tracks all IP addresses in the homelab infrastructure.
## Network Overview
| Network | Range | Purpose |
|---------|-------|---------|
| Management VLAN | 10.10.10.0/24 | Primary network for all devices |
| Storage VLAN | 10.10.20.0/24 | NFS/iSCSI storage traffic |
| Tailscale | 100.x.x.x | VPN overlay network |
## Infrastructure Devices
| IP Address | Device | Type | Notes |
|------------|--------|------|-------|
| 10.10.10.1 | UniFi UCG-Fiber | Router | Gateway for all traffic |
| 10.10.10.120 | PVE | Proxmox Host | Primary server (Threadripper PRO 3975WX) |
| 10.10.10.102 | PVE2 | Proxmox Host | Secondary server (Threadripper PRO 3975WX) |
## Virtual Machines - PVE (10.10.10.120)
| VMID | Name | IP Address | Purpose | Status |
|------|------|------------|---------|--------|
| 100 | truenas | 10.10.10.200 | NAS, central Syncthing hub | Running |
| 101 | saltbox | 10.10.10.100 | Media automation, Plex, *arr apps | Running |
| 105 | fs-dev | 10.10.10.5 | Development environment | Running |
| 110 | homeassistant | 10.10.10.110 | Home automation | Running |
| 111 | lmdev1 | 10.10.10.111 | AI/LLM development (TITAN RTX) | Running |
| 201 | copyparty | 10.10.10.201 | File sharing | Running |
| 206 | docker-host | 10.10.10.206 | Docker services (Excalidraw, etc.) | Running |
## Containers (LXC) - PVE (10.10.10.120)
| CTID | Name | IP Address | Purpose | Status |
|------|------|------------|---------|--------|
| 200 | pihole | 10.10.10.10 | DNS/Ad blocking | Running |
| 202 | traefik | 10.10.10.250 | Reverse proxy (Traefik-Primary) | Running |
| 205 | findshyt | 10.10.10.8 | Custom app | Running |
| 500 | dev1 | DHCP | Development container | Stopped |
## Virtual Machines - PVE2 (10.10.10.102)
| VMID | Name | IP Address | Purpose | Status |
|------|------|------------|---------|--------|
| 300 | gitea-vm | 10.10.10.220 | Git server | Running |
| 301 | trading-vm | 10.10.10.221 | AI trading platform (RTX A6000) | Running |
## Workstations & Personal Devices
| IP Address | Tailscale IP | Device | User | Notes |
|------------|--------------|--------|------|-------|
| 10.10.10.147 | 100.88.161.1 | MacBook Pro | hutson | Laptop |
| 10.10.10.148 | 100.108.89.58 | Mac Mini | hutson | Persistent Claude sessions |
| 10.10.10.150 | 100.120.97.76 | Hutson-PC (Windows) | claude/micro | Windows workstation |
| 10.10.10.54 | - | Android Phone | hutson | Syncthing mobile |
## Services & Reverse Proxy Mapping
| Service | Domain | Backend IP:Port | Traefik Instance |
|---------|--------|-----------------|------------------|
| Traefik-Primary | - | 10.10.10.250 | Self (CT 202) |
| Traefik-Saltbox | - | 10.10.10.100 | Self (VM 101) |
| FindShyt | findshyt.htsn.io | 10.10.10.8:3000 | Traefik-Primary |
| Gitea | git.htsn.io | 10.10.10.220:3000 | Traefik-Primary |
| Home Assistant | ha.htsn.io | 10.10.10.110:8123 | Traefik-Primary |
| TrueNAS | nas.htsn.io | 10.10.10.200 | Traefik-Primary |
| Proxmox | pve.htsn.io | 10.10.10.120:8006 | Traefik-Primary |
| CopyParty | cp.htsn.io | 10.10.10.201:3923 | Traefik-Primary |
| LMDev | lmdev.htsn.io | 10.10.10.111 | Traefik-Primary |
| Excalidraw | excalidraw.htsn.io | 10.10.10.206:8080 | Traefik-Primary |
| Plex | plex.htsn.io | 10.10.10.100:32400 | Traefik-Saltbox |
| Sonarr | sonarr.htsn.io | 10.10.10.100:8989 | Traefik-Saltbox |
| Radarr | radarr.htsn.io | 10.10.10.100:7878 | Traefik-Saltbox |
## Reserved/Available IPs
### Currently Used (10.10.10.x)
- .1 - Router (gateway)
- .5 - fs-dev
- .8 - FindShyt
- .10 - PiHole (DNS)
- .54 - Android Phone
- .100 - Saltbox (Traefik-Saltbox)
- .102 - PVE2
- .110 - Home Assistant
- .111 - LMDev1
- .120 - PVE
- .147 - MacBook Pro
- .148 - Mac Mini
- .150 - Windows PC
- .200 - TrueNAS
- .201 - CopyParty
- .206 - Docker-host
- .220 - Gitea
- .221 - Trading VM
- .250 - Traefik-Primary
### Available Ranges
- 10.10.10.2 - 10.10.10.4 (3 IPs)
- 10.10.10.6 - 10.10.10.7 (2 IPs)
- 10.10.10.9 (1 IP)
- 10.10.10.11 - 10.10.10.53 (43 IPs)
- 10.10.10.55 - 10.10.10.99 (45 IPs)
- 10.10.10.101 (1 IP)
- 10.10.10.103 - 10.10.10.109 (7 IPs)
- 10.10.10.112 - 10.10.10.119 (8 IPs)
- 10.10.10.121 - 10.10.10.146 (26 IPs)
- 10.10.10.149 (1 IP)
- 10.10.10.151 - 10.10.10.199 (49 IPs)
- 10.10.10.202 - 10.10.10.205 (4 IPs)
- 10.10.10.207 - 10.10.10.219 (13 IPs)
- 10.10.10.222 - 10.10.10.249 (28 IPs)
- 10.10.10.251 - 10.10.10.254 (4 IPs)
## Docker Host Services (10.10.10.206)
| Service | Port | Purpose |
|---------|------|---------|
| Excalidraw | 8080 | Whiteboard/diagramming (excalidraw.htsn.io) |
| Portainer CE | 9000, 9443 | Local Docker management UI |
| Portainer Agent | 9001 | Remote management from other Portainer |
| Gotenberg | 3000 | PDF generation API |
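To confirm what is actually running and listening on the docker host (a quick sketch; assumes SSH access as `hutson`):
```bash
ssh hutson@10.10.10.206 'docker ps --format "table {{.Names}}\t{{.Ports}}"'
```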
## Syncthing API Endpoints
| Device | IP | Port | API Key |
|--------|-----|------|---------|
| Mac Mini | 127.0.0.1 | 8384 | oSQSrPnMnrEXuHqjWrRdrvq3TSXesAT5 |
| MacBook | 127.0.0.1 (via SSH) | 8384 | qYkNdVLwy9qZZZ6MqnJr7tHX7KKdxGMJ |
| Android Phone | 10.10.10.54 | 8384 | Xxz3jDT4akUJe6psfwZsbZwG2LhfZuDM |
| TrueNAS | 10.10.10.200 | 8384 | (check TrueNAS config) |
## Notes
- **MTU 9000** (jumbo frames) enabled on storage networks
- **Tailscale** provides VPN access from anywhere
- **DNS** handled by PiHole at 10.10.10.10 (quick check after this list)
- All new services should use **Traefik-Primary (10.10.10.250)** unless they're Saltbox services
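A quick way to confirm a hostname is being answered by PiHole rather than an upstream resolver (sketch):
```bash
dig @10.10.10.10 git.htsn.io +short
```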

226
NETWORK.md Normal file
View File

@@ -0,0 +1,226 @@
# Network Architecture
## Network Ranges
| Network | Range | Purpose | Gateway |
|---------|-------|---------|---------|
| LAN | 10.10.10.0/24 | Primary network, management, general access | 10.10.10.1 (UniFi Router) |
| Storage/Internal | 10.10.20.0/24 | Inter-VM traffic, NFS/iSCSI, no external access | 10.10.20.1 (vmbr3) |
| Tailscale | 100.x.x.x | VPN overlay for remote access | N/A |
## PVE (10.10.10.120) - Network Bridges
### Physical NICs
| Interface | Speed | Type | MAC Address | Connected To |
|-----------|-------|------|-------------|--------------|
| enp1s0 | 1 Gbps | Onboard NIC | e0:4f:43:e6:41:6c | Switch → UniFi eth5 |
| enp35s0f0 | 10 Gbps | Intel X550 Port 0 | b4:96:91:39:86:98 | Switch → UniFi eth5 |
| enp35s0f1 | 10 Gbps | Intel X550 Port 1 | b4:96:91:39:86:99 | Switch → UniFi eth5 |
**Note:** All three NICs connect through a switch to the UniFi Gateway's 10Gb SFP+ port (eth5). No direct firewall connection.
### Bridge Configuration
#### vmbr0 - Management Bridge (1Gb)
- **Physical NIC**: enp1s0 (1 Gbps onboard)
- **IP**: 10.10.10.120/24
- **Gateway**: 10.10.10.1
- **MTU**: 9000
- **Purpose**: General VM/CT networking, management access
- **Use for**: Most VMs and containers that need basic internet access
**VMs/CTs on vmbr0:**
| VMID | Name | IP |
|------|------|-----|
| 105 | fs-dev | 10.10.10.5 |
| 110 | homeassistant | 10.10.10.110 |
| 201 | copyparty | DHCP |
| 206 | docker-host | 10.10.10.206 |
| 200 | pihole (CT) | 10.10.10.10 |
| 205 | findshyt (CT) | 10.10.10.8 |
---
#### vmbr1 - High-Speed LXC Bridge (10Gb)
- **Physical NIC**: enp35s0f0 (10 Gbps Intel X550)
- **IP**: 10.10.10.121/24
- **Gateway**: 10.10.10.1
- **MTU**: 9000
- **Purpose**: High-bandwidth LXC containers and VMs
- **Use for**: Containers/VMs that need high throughput to network
**VMs/CTs on vmbr1:**
| VMID | Name | IP |
|------|------|-----|
| 111 | lmdev1 | 10.10.10.111 |
---
#### vmbr2 - High-Speed VM Bridge (10Gb)
- **Physical NIC**: enp35s0f1 (10 Gbps Intel X550)
- **IP**: 10.10.10.122/24
- **Gateway**: (none configured)
- **MTU**: 9000
- **Purpose**: High-bandwidth VMs, storage traffic
- **Use for**: VMs that need high throughput (TrueNAS, Saltbox)
**VMs/CTs on vmbr2:**
| VMID | Name | IP |
|------|------|-----|
| 100 | truenas | 10.10.10.200 |
| 101 | saltbox | 10.10.10.100 |
| 202 | traefik (CT) | 10.10.10.250 |
---
#### vmbr3 - Internal-Only Bridge (Virtual)
- **Physical NIC**: None (isolated virtual network)
- **IP**: 10.10.20.1/24
- **Gateway**: N/A (no external routing)
- **MTU**: 9000
- **Purpose**: Inter-VM communication without external access
- **Use for**: Storage traffic (NFS/iSCSI), internal APIs, secure VM-to-VM
**VMs with secondary interface on vmbr3:**
| VMID | Name | Internal IP | Notes |
|------|------|-------------|-------|
| 100 | truenas | (check TrueNAS config) | NFS/iSCSI server |
| 101 | saltbox | (check VM config) | Media storage access |
| 111 | lmdev1 | (check VM config) | AI model storage |
| 201 | copyparty | 10.10.20.201 | Confirmed via cloud-init |
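For reference, an isolated bridge like vmbr3 is typically defined in `/etc/network/interfaces` roughly as follows (a sketch; verify against the actual file on PVE before changing anything):
```bash
# /etc/network/interfaces (excerpt)
auto vmbr3
iface vmbr3 inet static
        address 10.10.20.1/24
        bridge-ports none
        bridge-stp off
        bridge-fd 0
        mtu 9000
```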
---
## PVE2 (10.10.10.102) - Network Bridges
### Physical NICs
| Interface | Speed | Type | MAC Address | Connected To |
|-----------|-------|------|-------------|--------------|
| nic0 | Unknown | Unused | e0:4f:43:e6:1b:e3 | Not connected |
| nic1 | 10 Gbps | Primary NIC | a0:36:9f:26:b9:bc | **Direct to UCG-Fiber (10Gb negotiated)** |
**Note:** PVE2 connects directly to the UCG-Fiber. Link negotiates at 10Gb.
### Bridge Configuration
#### vmbr0 - Single Bridge (10Gb)
- **Physical NIC**: nic1 (10 Gbps)
- **IP**: 10.10.10.102/24
- **Gateway**: 10.10.10.1
- **Purpose**: All VMs on PVE2
**VMs on vmbr0:**
| VMID | Name | IP |
|------|------|-----|
| 300 | gitea-vm | 10.10.10.220 |
| 301 | trading-vm | 10.10.10.221 |
---
## Which Bridge to Use?
| Scenario | Bridge | Reason |
|----------|--------|--------|
| General VM/CT | vmbr0 | Standard networking, 1Gb is sufficient |
| High-bandwidth VM (media, AI) | vmbr1 or vmbr2 | 10Gb for large file transfers |
| Storage-heavy VM (NAS access) | vmbr2 + vmbr3 | 10Gb external + internal storage network |
| Isolated internal service | vmbr3 only | No external access, secure |
| VM needing both external + internal | vmbr0/1/2 + vmbr3 | Dual-homed configuration |
## Traffic Flow
```
Internet
┌─────────────────────────────────────────────────────────────┐
│ UCG-Fiber (10.10.10.1) │
│ │
│ eth5 (10Gb SFP+) switch0 (eth0-eth4, 10Gb) │
│ │ │ │
└────────┼───────────────────────────────┼────────────────────┘
│ │
▼ │
┌─────────────────────┐ │
│ 10Gb Switch │ │
└─────────────────────┘ │
│ │ │ │
│ │ │ │
▼ ▼ ▼ ▼
enp1s0 enp35s0f0 enp35s0f1 nic1
(1Gb) (10Gb) (10Gb) (10Gb)
│ │ │ │
▼ ▼ ▼ ▼
vmbr0 vmbr1 vmbr2 vmbr0
│ │ │ │
│ │ │ │
PVE PVE PVE PVE2
General lmdev1 TrueNAS, gitea-vm,
VMs Saltbox, trading-vm
Traefik
Internal Only (no external access):
┌─────────────────────────────────────┐
│ vmbr3 (10.10.20.0/24) - Virtual │
│ No physical NIC - inter-VM only │
│ │
│ TrueNAS ◄──► Saltbox │
│ ▲ ▲ │
│ │ │ │
│ └─── lmdev1 ┘ │
│ ▲ │
│ │ │
│ copyparty │
└─────────────────────────────────────┘
```
## Determining Physical Connections
To determine which 10Gb port goes where, check:
1. **Physical cable tracing** - Follow cables from server to switch/firewall
2. **Switch port status** - Check UniFi controller for connected ports
3. **MAC addresses** - Compare `ip link show` MACs with switch ARP table
```bash
# On PVE - get MAC addresses
ip link show enp35s0f0 | grep ether
ip link show enp35s0f1 | grep ether
# On router - check ARP
ssh root@10.10.10.1 'cat /proc/net/arp'
```
## Adding a New VM to a Specific Network
```bash
# Add VM to vmbr0 (standard)
qm set VMID --net0 virtio,bridge=vmbr0
# Add VM to vmbr2 (10Gb)
qm set VMID --net0 virtio,bridge=vmbr2
# Add second NIC for internal network
qm set VMID --net1 virtio,bridge=vmbr3
# For containers
pct set CTID --net0 name=eth0,bridge=vmbr0,ip=10.10.10.XXX/24,gw=10.10.10.1
```
## MTU Configuration
All bridges use **MTU 9000** (jumbo frames) for optimal storage performance.
If adding a new VM that will access NFS/iSCSI storage, ensure the guest OS also uses MTU 9000:
```bash
# Linux guest
ip link set eth0 mtu 9000
# Permanent (netplan)
# /etc/netplan/00-installer-config.yaml
network:
  version: 2
  ethernets:
    eth0:
      mtu: 9000
```

147
SHELL-ALIASES.md Normal file
View File

@@ -0,0 +1,147 @@
# Shell Aliases & Shortcuts
## Overview
ZSH aliases for quickly launching Claude Code in project directories with `--dangerously-skip-permissions` enabled. Aliases sync across devices via Syncthing.
## Setup
### File Locations
```
~/.config/shell/shared.zsh # Main shared config (sourced by .zshrc)
~/.config/shell/claude-aliases.zsh # Claude Code aliases
~/Projects/homelab/configs/ # Symlinks for reference
```
### Installation
Add to `~/.zshrc`:
```bash
source ~/.config/shell/shared.zsh
```
## Claude Code Aliases
### Quick Start (--continue)
Continue the most recent session in each project:
| Alias | Directory | Command |
|-------|-----------|---------|
| `chomelab` | ~/Projects/homelab | `claude --dangerously-skip-permissions --continue` |
| `ctrading` | ~/Projects/ai-trading-platform | `claude --dangerously-skip-permissions --continue` |
| `cnotes` | ~/Notes | `claude --dangerously-skip-permissions --continue --ide` |
| `chome` | ~ | `claude --dangerously-skip-permissions --continue` |
| `cfindshyt` | ~/Desktop/findshyt-working-folder | `claude --dangerously-skip-permissions --continue` |
| `ciconik` | ~/Projects/iconik-uploader | `claude --dangerously-skip-permissions --continue` |
| `cghostty` | ~/.config/ghostty | `claude --dangerously-skip-permissions --continue` |
| `cprojects` | ~/Projects | `claude --dangerously-skip-permissions --continue` |
| `cclaudeui` | ~/Projects/claude-ui | `claude --dangerously-skip-permissions --continue` |
| `clucid` | ~/Projects/lucidlink-upgrade | `claude --dangerously-skip-permissions --continue` |
| `cbeeper` | ~/Projects/beeper | `claude --dangerously-skip-permissions --continue` |
### Resume (--resume)
Show list of sessions to pick from:
| Alias | Directory |
|-------|-----------|
| `chomelab-r` | ~/Projects/homelab |
| `ctrading-r` | ~/Projects/ai-trading-platform |
| `cnotes-r` | ~/Notes |
| `chome-r` | ~ |
| `ciconik-r` | ~/Projects/iconik-uploader |
| `cbeeper-r` | ~/Projects/beeper |
### Fresh Start (no flags)
Start a new session without resuming:
| Alias | Directory |
|-------|-----------|
| `chomelab-new` | ~/Projects/homelab |
| `ctrading-new` | ~/Projects/ai-trading-platform |
| `cnotes-new` | ~/Notes |
| `chome-new` | ~ |
## Usage Examples
```bash
# Continue homelab session
chomelab
# Pick from recent homelab sessions
chomelab-r
# Start fresh homelab session
chomelab-new
# Quick AI trading work
ctrading
```
## Adding New Aliases
Edit `~/.config/shell/claude-aliases.zsh`:
```bash
# Template for new project
alias cproject='cd ~/Projects/new-project && claude --dangerously-skip-permissions --continue'
alias cproject-r='cd ~/Projects/new-project && claude --dangerously-skip-permissions --resume'
alias cproject-new='cd ~/Projects/new-project && claude --dangerously-skip-permissions'
```
Changes sync automatically to all devices via Syncthing (~/.config folder).
## Enterprise/Work Aliases (claude-gateway)
Use `ec` prefix for work Claude account via `claude-gateway`:
### Quick Start (--continue)
| Alias | Directory |
|-------|-----------|
| `echomelab` | ~/Projects/homelab |
| `ectrading` | ~/Projects/ai-trading-platform |
| `ecnotes` | ~/Notes |
| `echome` | ~ |
| `ecfindshyt` | ~/Desktop/findshyt-working-folder |
| `eciconik` | ~/Projects/iconik-uploader |
| `ecghostty` | ~/.config/ghostty |
| `ecprojects` | ~/Projects |
| `ecclaudeui` | ~/Projects/claude-ui |
| `eclucid` | ~/Projects/lucidlink-upgrade |
| `ecbeeper` | ~/Projects/beeper |
### Resume & Fresh
- Resume: `echomelab-r`, `ectrading-r`, `ecnotes-r`, `echome-r`, `eciconik-r`, `ecbeeper-r`
- Fresh: `echomelab-new`, `ectrading-new`, `ecnotes-new`, `echome-new`
## Full Alias File
Located at: `~/.config/shell/claude-aliases.zsh`
```bash
# Claude Code Project Aliases
# Main projects
alias chome='cd ~ && claude --dangerously-skip-permissions --continue'
alias ctrading='cd ~/Projects/ai-trading-platform && claude --dangerously-skip-permissions --continue'
alias ciconik='cd ~/Projects/iconik-uploader && claude --dangerously-skip-permissions --continue'
alias cnotes='cd ~/Notes && claude --dangerously-skip-permissions --continue --ide'
alias chomelab='cd ~/Projects/homelab && claude --dangerously-skip-permissions --continue'
alias cfindshyt='cd ~/Desktop/findshyt-working-folder && claude --dangerously-skip-permissions --continue'
alias cghostty='cd ~/.config/ghostty && claude --dangerously-skip-permissions --continue'
alias cprojects='cd ~/Projects && claude --dangerously-skip-permissions --continue'
alias cclaudeui='cd ~/projects/claude-ui && claude --dangerously-skip-permissions --continue'
alias clucid='cd ~/Projects/lucidlink-upgrade && claude --dangerously-skip-permissions --continue'
alias cbeeper='cd ~/Projects/beeper && claude --dangerously-skip-permissions --continue'
# Resume variants
alias chome-r='cd ~ && claude --dangerously-skip-permissions --resume'
alias ctrading-r='cd ~/Projects/ai-trading-platform && claude --dangerously-skip-permissions --resume'
alias ciconik-r='cd ~/Projects/iconik-uploader && claude --dangerously-skip-permissions --resume'
alias cnotes-r='cd ~/Notes && claude --dangerously-skip-permissions --resume --ide'
alias chomelab-r='cd ~/Projects/homelab && claude --dangerously-skip-permissions --resume'
alias cbeeper-r='cd ~/Projects/beeper && claude --dangerously-skip-permissions --resume'
# Fresh start
alias chome-new='cd ~ && claude --dangerously-skip-permissions'
alias ctrading-new='cd ~/Projects/ai-trading-platform && claude --dangerously-skip-permissions'
alias cnotes-new='cd ~/Notes && claude --dangerously-skip-permissions --ide'
alias chomelab-new='cd ~/Projects/homelab && claude --dangerously-skip-permissions'
```

166
SYNCTHING.md Normal file
View File

@@ -0,0 +1,166 @@
# Syncthing Setup
## Overview
Syncthing provides real-time file synchronization across all devices. Files sync automatically when devices connect.
## Devices
| Device | ID Prefix | Local IP | Tailscale IP | Port | Role |
|--------|-----------|----------|--------------|------|------|
| Mac Mini | L3PJR73 | 10.10.10.123 | 100.108.89.58 | 22000 | Primary workstation |
| MacBook Pro | 3TFMYEI | 10.10.10.147 | 100.88.161.1 | 22000 | Laptop |
| TrueNAS | TPO72EY | 10.10.10.200 | 100.100.94.71 | 20978 | Storage server (central hub) |
| Windows PC | YDCPUQK | 10.10.10.150 | 100.120.97.76 | 22000 | Windows workstation |
| Phone (Android) | XLMZCCH | 10.10.10.54 | 100.106.175.37 | 22000 | Android, Notes only, HTTPS API |
## Network Configuration
**IPv4 Only** - All devices configured with explicit IPv4 addresses (no dynamic/IPv6):
- Local network: `10.10.10.0/24`
- Tailscale network: `100.x.x.x`
Device address format: `tcp4://IP:PORT` (e.g., `tcp4://10.10.10.123:22000`)
## Synced Folders
| Folder | Path | Devices | Notes |
|--------|------|---------|-------|
| Downloads | ~/Downloads | Mac Mini, MacBook, TrueNAS, Windows | Large folder, 3600s rescan |
| Notes | ~/Notes | Mac Mini, MacBook, TrueNAS | Documentation |
| Projects | ~/Projects | Mac Mini, MacBook, TrueNAS | Code repositories |
| bin | ~/bin | Mac Mini, MacBook, TrueNAS | Scripts and tools |
| Documents | ~/Documents | Mac Mini, MacBook, TrueNAS | Personal documents |
| Desktop | ~/Desktop | Mac Mini, MacBook, TrueNAS | Desktop files |
| config | ~/.config | Mac Mini, MacBook | Shell configs, app settings |
| Antigravity | ~/.gemini | Mac Mini, MacBook, TrueNAS | Gemini config |
## API Access
### Mac Mini
```bash
API_KEY="oSQSrPnMnrEXuHqjWrRdrvq3TSXesAT5"
curl -s "http://127.0.0.1:8384/rest/system/status" -H "X-API-Key: $API_KEY"
```
### MacBook Pro
```bash
API_KEY="qYkNdVLwy9qZZZ6MqnJr7tHX7KKdxGMJ"
curl -s "http://127.0.0.1:8384/rest/system/status" -H "X-API-Key: $API_KEY"
```
### Windows PC
```bash
API_KEY="KPHGteJv6APPE7zFun33b3qM3Vn5KSA7"
curl -s "http://10.10.10.150:8384/rest/system/status" -H "X-API-Key: $API_KEY"
```
### Phone (Android) - Uses HTTPS
```bash
API_KEY="Xxz3jDT4akUJe6psfwZsbZwG2LhfZuDM"
# Access via local IP (use -k to skip cert verification)
curl -sk "https://10.10.10.54:8384/rest/system/status" -H "X-API-Key: $API_KEY"
# Or via Tailscale
curl -sk "https://100.106.175.37:8384/rest/system/status" -H "X-API-Key: $API_KEY"
```
## Common Commands
### Check Status
```bash
# Folder status
curl -s "http://127.0.0.1:8384/rest/db/status?folder=downloads" -H "X-API-Key: $API_KEY"
# Connection status
curl -s "http://127.0.0.1:8384/rest/system/connections" -H "X-API-Key: $API_KEY"
# Device completion for a folder
curl -s "http://127.0.0.1:8384/rest/db/completion?folder=downloads&device=DEVICE_ID" -H "X-API-Key: $API_KEY"
```
### Check Errors
```bash
curl -s "http://127.0.0.1:8384/rest/folder/errors?folder=downloads" -H "X-API-Key: $API_KEY"
```
### Rescan Folder
```bash
curl -X POST "http://127.0.0.1:8384/rest/db/scan?folder=downloads" -H "X-API-Key: $API_KEY"
```
## Configuration Files
| Device | Config Path |
|--------|-------------|
| Mac Mini | ~/Library/Application Support/Syncthing/config.xml |
| MacBook Pro | ~/Library/Application Support/Syncthing/config.xml |
| TrueNAS | /mnt/tank/syncthing/config/config.xml |
## Performance Tuning
### Speed Optimizations (2024-12-17)
#### Global Options
| Setting | Value | Effect |
|---------|-------|--------|
| `numConnections` | 4 | Parallel transfers per device |
| `compression` | never | No CPU overhead on fast LAN |
| `setLowPriority` | false | Normal CPU priority |
| `connectionPriorityQuicLan` | 10 | QUIC preferred on LAN |
| `connectionPriorityTcpLan` | 20 | TCP fallback on LAN |
| `connectionPriorityQuicWan` | 30 | QUIC preferred on WAN |
| `connectionPriorityTcpWan` | 40 | TCP fallback on WAN |
| `progressUpdateIntervalS` | -1 | Disabled progress updates (reduces overhead) |
| `maxConcurrentIncomingRequestKiB` | 1048576 | 1GB buffer for incoming requests |
**Applied to**: Mac Mini, MacBook, Windows PC (Phone uses 512MB buffer)
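These options can be inspected or adjusted through the REST config API rather than editing config.xml by hand (a sketch, run against the local instance with its `$API_KEY`):
```bash
# Current global options
curl -s -H "X-API-Key: $API_KEY" "http://127.0.0.1:8384/rest/config/options" | python3 -m json.tool

# Patch a single option (numConnections shown as an example)
curl -s -X PATCH -H "X-API-Key: $API_KEY" -H "Content-Type: application/json" \
  -d '{"numConnections": 4}' "http://127.0.0.1:8384/rest/config/options"
```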
#### Folder-Level Settings
| Setting | Value | Effect |
|---------|-------|--------|
| `pullerMaxPendingKiB` | 131072-262144 | 128-256MB pending data buffer per folder |
**Applied to**: downloads, projects, documents, desktop, notes folders
### Rescan Intervals (set to 3600s for large folders)
Large folders like Downloads use 1-hour rescan intervals to reduce CPU usage:
- File system watcher handles real-time changes
- Hourly rescan catches anything missed
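The interval for any folder can be checked or changed the same way (sketch; `downloads` used as the folder ID):
```bash
curl -s -H "X-API-Key: $API_KEY" "http://127.0.0.1:8384/rest/config/folders/downloads" \
  | python3 -c "import sys,json; print(json.load(sys.stdin)['rescanIntervalS'])"
curl -s -X PATCH -H "X-API-Key: $API_KEY" -H "Content-Type: application/json" \
  -d '{"rescanIntervalS": 3600}' "http://127.0.0.1:8384/rest/config/folders/downloads"
```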
### Power Optimization
From CLAUDE.md - Syncthing rescan optimization saved ~60-80W on TrueNAS VM.
## Troubleshooting
### Device Not Syncing
1. Check connection status:
```bash
curl -s "http://127.0.0.1:8384/rest/system/connections" -H "X-API-Key: $API_KEY" | python3 -c "import sys,json; d=json.load(sys.stdin)['connections']; [print(f'{k[:7]}: {v[\"connected\"]}') for k,v in d.items()]"
```
2. Check folder completion:
```bash
curl -s "http://127.0.0.1:8384/rest/db/status?folder=FOLDER" -H "X-API-Key: $API_KEY"
```
3. Check for errors:
```bash
curl -s "http://127.0.0.1:8384/rest/folder/errors?folder=FOLDER" -H "X-API-Key: $API_KEY"
```
### Many Pending Deletes
If a device shows thousands of "needDeletes", it means files were deleted elsewhere and need to propagate. This is normal after reorganization - let it complete.
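To watch that number drain (a sketch; fill in the folder and device ID):
```bash
curl -s -H "X-API-Key: $API_KEY" \
  "http://127.0.0.1:8384/rest/db/completion?folder=FOLDER&device=DEVICE_ID" \
  | python3 -c "import sys,json; print(json.load(sys.stdin)['needDeletes'])"
```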
### Web UI
Access Syncthing web interface at http://127.0.0.1:8384
## SSH Access to Devices
### MacBook Pro (via Tailscale)
```bash
sshpass -p 'GrilledCh33s3#' ssh -o StrictHostKeyChecking=no hutson@100.88.161.1
```
### Check Syncthing remotely
```bash
sshpass -p 'GrilledCh33s3#' ssh hutson@100.88.161.1 'curl -s "http://127.0.0.1:8384/rest/db/status?folder=downloads" -H "X-API-Key: qYkNdVLwy9qZZZ6MqnJr7tHX7KKdxGMJ"'
```

1
configs/claude-aliases.zsh Symbolic link
View File

@@ -0,0 +1 @@
/Users/hutson/.config/shell/claude-aliases.zsh

5
configs/ghostty.conf Normal file
View File

@@ -0,0 +1,5 @@
theme = Gruvbox Dark
font-feature = -liga
font-size = 16
font-family = "JetBrains Mono"
split-divider-color = #83a598

16
mcp-central/.env.example Normal file
View File

@@ -0,0 +1,16 @@
# MCP Central Server Environment Variables
# Copy to .env and fill in your values
# Airtable
AIRTABLE_API_KEY=patIrM3XYParyuHQL.xxxxx
# Exa
EXA_API_KEY=your_exa_api_key
# TickTick (if using)
TICKTICK_CLIENT_ID=your_client_id
TICKTICK_CLIENT_SECRET=your_client_secret
# Slack (if using)
SLACK_BOT_TOKEN=xoxb-xxxxx
SLACK_USER_TOKEN=xoxp-xxxxx

129
mcp-central/README.md Normal file
View File

@@ -0,0 +1,129 @@
# Centralized MCP Servers for Homelab
## Current State of MCP Remote Access
**The Problem**: Most MCP servers use `stdio` transport (local process communication).
Claude Code clients expect to spawn local processes.
**The Solution**: Use `mcp-remote` to bridge local clients to remote servers.
## Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│ docker-host (10.10.10.206) │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ airtable-mcp│ │ exa-mcp │ │ ticktick-mcp│ ... │
│ │ :3001/sse │ │ :3002/sse │ │ :3003/sse │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
└─────────────────────────────────────────────────────────────────┘
▲ ▲ ▲
│ │ │
┌──────┴───────────────┴───────────────┴──────┐
│ Tailscale / LAN │
└──────┬───────────────┬───────────────┬──────┘
│ │ │
┌─────────▼─────┐ ┌───────▼───────┐ ┌─────▼─────────┐
│ MacBook │ │ Mac Mini │ │ Windows PC │
│ Claude Code │ │ Claude Code │ │ Claude Code │
│ mcp-remote │ │ mcp-remote │ │ mcp-remote │
└───────────────┘ └───────────────┘ └───────────────┘
```
## Setup
### Step 1: Deploy MCP Servers on docker-host
```bash
ssh hutson@10.10.10.206
cd /opt/mcp-central
docker-compose up -d
```
### Step 2: Configure Claude Code Clients
Each device needs `mcp-remote` installed and configured.
**Install mcp-remote:**
```bash
npm install -g mcp-remote
```
**Update ~/.claude/settings.json:**
```json
{
  "mcpServers": {
    "airtable": {
      "command": "npx",
      "args": ["mcp-remote", "http://10.10.10.206:3001/sse"]
    },
    "exa": {
      "command": "npx",
      "args": ["mcp-remote", "http://10.10.10.206:3002/sse"]
    },
    "ticktick": {
      "command": "npx",
      "args": ["mcp-remote", "http://10.10.10.206:3003/sse"]
    }
  }
}
```
**For remote access via Tailscale, use Tailscale IP:**
```json
{
  "mcpServers": {
    "airtable": {
      "command": "npx",
      "args": ["mcp-remote", "http://100.x.x.x:3001/sse"]
    }
  }
}
```
## Which Servers Can Be Centralized?
| Server | Centralizable | Notes |
|--------|--------------|-------|
| Airtable | Yes | Just needs API key |
| Exa | Yes | Just needs API key |
| TickTick | Yes | OAuth token stored server-side |
| Slack | Yes | Bot token stored server-side |
| Ref | Yes | API key only |
| Beeper | No | Needs local Beeper Desktop |
| Google Sheets | Partial | OAuth flow needs user interaction |
| Monarch Money | Partial | Credentials stored server-side |
## Alternative: Shared Config File
If full centralization is too complex, you can at least share the config:
1. Store `settings.json` in a synced folder (e.g., Syncthing `configs/`)
2. Symlink from each device:
```bash
ln -s ~/Sync/configs/claude-settings.json ~/.claude/settings.json
```
This doesn't centralize the servers, but ensures all devices have the same config.
## Traefik Integration (Optional)
Add to Traefik for HTTPS access:
```yaml
# /etc/traefik/conf.d/mcp.yaml
http:
  routers:
    mcp-airtable:
      rule: "Host(`mcp-airtable.htsn.io`)"
      service: mcp-airtable
      tls:
        certResolver: cloudflare
  services:
    mcp-airtable:
      loadBalancer:
        servers:
          - url: "http://10.10.10.206:3001"
```
Then use: `https://mcp-airtable.htsn.io/sse` in your config.

View File

@@ -0,0 +1,58 @@
# Centralized MCP Server Stack
# Deploy on docker-host (10.10.10.206)
# All Claude Code clients connect via HTTP/SSE
version: "3.8"

services:
  # MCP Gateway - Routes all MCP requests
  mcp-gateway:
    image: node:20-slim
    container_name: mcp-gateway
    working_dir: /app
    volumes:
      - ./gateway:/app
    ports:
      - "3100:3100"
    command: node server.js
    restart: unless-stopped
    environment:
      - PORT=3100
    networks:
      - mcp-network

  # Airtable MCP Server
  airtable-mcp:
    image: node:20-slim
    container_name: airtable-mcp
    working_dir: /app
    command: sh -c "npm install airtable-mcp-server && npx airtable-mcp-server"
    environment:
      - AIRTABLE_API_KEY=${AIRTABLE_API_KEY}
      - MCP_TRANSPORT=sse
      - MCP_PORT=3001
    ports:
      - "3001:3001"
    restart: unless-stopped
    networks:
      - mcp-network

  # Exa MCP Server
  exa-mcp:
    image: node:20-slim
    container_name: exa-mcp
    working_dir: /app
    command: sh -c "npm install @anthropic/mcp-server-exa && npx @anthropic/mcp-server-exa"
    environment:
      - EXA_API_KEY=${EXA_API_KEY}
      - MCP_TRANSPORT=sse
      - MCP_PORT=3002
    ports:
      - "3002:3002"
    restart: unless-stopped
    networks:
      - mcp-network

networks:
  mcp-network:
    driver: bridge

View File

@@ -0,0 +1,159 @@
#!/bin/bash
#
# Fix Immich RAF files that were mislabeled as JPG
# This script:
# 1. Finds all JPG files that are actually Fujifilm RAF (RAW) files
# 2. Renames them from .jpg to .raf on the filesystem
# 3. Updates Immich's database to match
# 4. Triggers thumbnail regeneration
#
# Run from Mac Mini or any machine with SSH access to PVE
#
set -e
# Config
SSH_PASS="GrilledCh33s3#"
PVE_IP="10.10.10.120"
SSH_OPTS="-o StrictHostKeyChecking=no"
# Colors
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'
echo "=========================================="
echo " Immich RAF File Fixer"
echo "=========================================="
echo ""
# Test connectivity
echo "Testing connection to Saltbox..."
if ! sshpass -p "$SSH_PASS" ssh $SSH_OPTS root@$PVE_IP 'qm status 101' &>/dev/null; then
echo -e "${RED}Error: Cannot connect to PVE or Saltbox VM not running${NC}"
exit 1
fi
echo -e "${GREEN}Connected${NC}"
echo ""
# Step 1: Find mislabeled files
echo "Step 1: Finding JPG files that are actually RAF..."
echo ""
MISLABELED_COUNT=$(sshpass -p "$SSH_PASS" ssh $SSH_OPTS root@$PVE_IP 'qm guest exec 101 -- bash -c "echo \"SELECT COUNT(*) FROM asset a JOIN asset_exif e ON a.id = e.\\\"assetId\\\" WHERE a.\\\"originalFileName\\\" ILIKE '"'"'%.jpg'"'"' AND e.\\\"fileSizeInByte\\\" > 35000000 AND e.make = '"'"'FUJIFILM'"'"';\" | docker exec -i immich-postgres psql -U hutson -d immich -t"' 2>/dev/null | grep -o '[0-9]*' | head -1)
echo -e "Found ${YELLOW}${MISLABELED_COUNT}${NC} mislabeled files"
echo ""
if [ "$MISLABELED_COUNT" -eq 0 ]; then
echo -e "${GREEN}No mislabeled files found. Nothing to fix!${NC}"
exit 0
fi
# Confirm before proceeding
read -p "Proceed with fixing these files? (y/N) " -n 1 -r
echo ""
if [[ ! $REPLY =~ ^[Yy]$ ]]; then
echo "Aborted."
exit 0
fi
echo ""
echo "Step 2: Creating fix script on Saltbox..."
# Create the fix script on Saltbox
sshpass -p "$SSH_PASS" ssh $SSH_OPTS root@$PVE_IP 'qm guest exec 101 -- bash -c "cat > /tmp/fix-raf-files.sh << '"'"'SCRIPT'"'"'
#!/bin/bash
set -e
echo "Getting list of mislabeled files..."
# Get list of files to fix
docker exec -i immich-postgres psql -U hutson -d immich -t -A -F\",\" -c "
SELECT a.id, a.\"originalPath\", a.\"originalFileName\"
FROM asset a
JOIN asset_exif e ON a.id = e.\"assetId\"
WHERE a.\"originalFileName\" ILIKE '"'"'"'"'"'"'"'"'%.jpg'"'"'"'"'"'"'"'"'
AND e.\"fileSizeInByte\" > 35000000
AND e.make = '"'"'"'"'"'"'"'"'FUJIFILM'"'"'"'"'"'"'"'"'
" > /tmp/files_to_fix.csv
TOTAL=$(wc -l < /tmp/files_to_fix.csv)
echo "Processing $TOTAL files..."
COUNT=0
ERRORS=0
while IFS="," read -r asset_id old_path old_filename; do
COUNT=$((COUNT + 1))
# Skip empty lines
[ -z "$asset_id" ] && continue
# Calculate new paths
new_filename=$(echo "$old_filename" | sed "s/\.[jJ][pP][gG]$/.RAF/")
new_path=$(echo "$old_path" | sed "s/\.[jJ][pP][gG]$/.raf/")
echo "[$COUNT/$TOTAL] $old_filename -> $new_filename"
# Rename file on filesystem (inside immich container)
if docker exec immich test -f "$old_path"; then
docker exec immich mv "$old_path" "$new_path" 2>/dev/null
if [ $? -ne 0 ]; then
echo " ERROR: Failed to rename file"
ERRORS=$((ERRORS + 1))
continue
fi
else
echo " WARNING: File not found at $old_path"
ERRORS=$((ERRORS + 1))
continue
fi
# Update database
docker exec -i immich-postgres psql -U hutson -d immich -c "
UPDATE asset
SET \"originalPath\" = '"'"'"'"'"'"'"'"'$new_path'"'"'"'"'"'"'"'"',
\"originalFileName\" = '"'"'"'"'"'"'"'"'$new_filename'"'"'"'"'"'"'"'"'
WHERE id = '"'"'"'"'"'"'"'"'$asset_id'"'"'"'"'"'"'"'"'::uuid;
" > /dev/null 2>&1
if [ $? -ne 0 ]; then
echo " ERROR: Failed to update database"
# Try to rename back
docker exec immich mv "$new_path" "$old_path" 2>/dev/null
ERRORS=$((ERRORS + 1))
continue
fi
done < /tmp/files_to_fix.csv
echo ""
echo "=========================================="
echo "Completed: $((COUNT - ERRORS)) fixed, $ERRORS errors"
echo "=========================================="
# Cleanup
rm -f /tmp/files_to_fix.csv
SCRIPT
chmod +x /tmp/fix-raf-files.sh"'
echo ""
echo "Step 3: Running fix script (this may take a while)..."
echo ""
# Run the fix script
sshpass -p "$SSH_PASS" ssh $SSH_OPTS root@$PVE_IP 'qm guest exec 101 -- bash -c "/tmp/fix-raf-files.sh"' 2>&1 | grep -o '"out-data"[^}]*' | sed 's/"out-data" *: *"//' | sed 's/\\n/\n/g' | sed 's/\\t/\t/g' | sed 's/"$//'
echo ""
echo "Step 4: Restarting Immich to pick up changes..."
sshpass -p "$SSH_PASS" ssh $SSH_OPTS root@$PVE_IP 'qm guest exec 101 -- bash -c "docker restart immich"' > /dev/null 2>&1
echo -e "${GREEN}Done!${NC}"
echo ""
echo "Next steps:"
echo "1. Go to Immich Admin -> Jobs -> Thumbnail Generation -> All -> Start"
echo "2. This will regenerate thumbnails for all assets"
echo ""

318
scripts/health-check.sh Executable file
View File

@@ -0,0 +1,318 @@
#!/bin/bash
#
# Homelab Health Check & Recovery Script
# Run this to check status and bring services online
#
# Usage: ./health-check.sh [--fix]
# Without --fix: Read-only health check
# With --fix: Attempt to start stopped services and fix issues
#
set -e
# Colors
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color
# Config
SSH_PASS="GrilledCh33s3#"
PVE_IP="10.10.10.120"
PVE2_IP="10.10.10.102"
SSH_OPTS="-o StrictHostKeyChecking=no -o ConnectTimeout=5"
FIX_MODE=false
if [[ "$1" == "--fix" ]]; then
FIX_MODE=true
echo -e "${YELLOW}Running in FIX mode - will attempt to start stopped services${NC}"
echo ""
fi
# Helper functions
ssh_pve() {
sshpass -p "$SSH_PASS" ssh $SSH_OPTS root@$PVE_IP "$@" 2>/dev/null
}
ssh_pve2() {
sshpass -p "$SSH_PASS" ssh $SSH_OPTS root@$PVE2_IP "$@" 2>/dev/null
}
print_status() {
if [[ "$2" == "ok" ]]; then
echo -e " ${GREEN}${NC} $1"
elif [[ "$2" == "warn" ]]; then
echo -e " ${YELLOW}!${NC} $1"
else
echo -e " ${RED}${NC} $1"
fi
}
# Check if sshpass is installed
if ! command -v sshpass &> /dev/null; then
echo -e "${RED}Error: sshpass is not installed${NC}"
echo "Install with: brew install hudochenkov/sshpass/sshpass"
exit 1
fi
echo "================================"
echo " HOMELAB HEALTH CHECK"
echo " $(date '+%Y-%m-%d %H:%M:%S')"
echo "================================"
echo ""
# ============================================
# PVE (Primary Server)
# ============================================
echo "--- PVE (10.10.10.120) ---"
# Check connectivity
if ssh_pve "echo ok" > /dev/null 2>&1; then
print_status "PVE Reachable" "ok"
else
print_status "PVE Unreachable" "fail"
echo ""
echo "--- PVE2 (10.10.10.102) ---"
if ssh_pve2 "echo ok" > /dev/null 2>&1; then
print_status "PVE2 Reachable" "ok"
else
print_status "PVE2 Unreachable" "fail"
fi
exit 1
fi
# Check cluster quorum
QUORUM=$(ssh_pve "pvecm status 2>&1 | grep 'Quorate:' | awk '{print \$2}'" || echo "Unknown")
if [[ "$QUORUM" == "Yes" ]]; then
print_status "Cluster Quorum: $QUORUM" "ok"
else
print_status "Cluster Quorum: $QUORUM" "fail"
fi
# Check CPU temp
TEMP=$(ssh_pve 'for f in /sys/class/hwmon/hwmon*/temp*_input; do label=$(cat ${f%_input}_label 2>/dev/null); if [ "$label" = "Tctl" ]; then echo $(($(cat $f)/1000)); fi; done')
if [[ -n "$TEMP" ]]; then
if [[ "$TEMP" -lt 85 ]]; then
print_status "CPU Temp: ${TEMP}°C" "ok"
elif [[ "$TEMP" -lt 90 ]]; then
print_status "CPU Temp: ${TEMP}°C (warm)" "warn"
else
print_status "CPU Temp: ${TEMP}°C (HOT!)" "fail"
fi
fi
# Check ZFS pools
ZFS_STATUS=$(ssh_pve "zpool status -x" || echo "Unknown")
if [[ "$ZFS_STATUS" == "all pools are healthy" ]]; then
print_status "ZFS Pools: Healthy" "ok"
else
print_status "ZFS Pools: $ZFS_STATUS" "fail"
fi
# Check VMs
echo ""
echo " VMs:"
CRITICAL_VMS="100 101 110 206" # TrueNAS, Saltbox, HomeAssistant, Docker-host
STOPPED_VMS=""
TRUENAS_ZFS_SUSPENDED=false
while IFS= read -r line; do
VMID=$(echo "$line" | awk '{print $1}')
NAME=$(echo "$line" | awk '{print $2}')
STATUS=$(echo "$line" | awk '{print $3}')
if [[ "$STATUS" == "running" ]]; then
print_status "$VMID $NAME: $STATUS" "ok"
else
print_status "$VMID $NAME: $STATUS" "fail"
if [[ " $CRITICAL_VMS " =~ " $VMID " ]]; then
STOPPED_VMS="$STOPPED_VMS $VMID"
fi
fi
done < <(ssh_pve "qm list" | tail -n +2)
# Check TrueNAS ZFS (VM 100) if running
if ssh_pve "qm status 100" 2>/dev/null | grep -q running; then
echo ""
echo " TrueNAS ZFS:"
TRUENAS_ZFS=$(ssh_pve 'qm guest exec 100 -- bash -c "zpool list -H -o name,health vault 2>/dev/null"' 2>/dev/null | grep -o '"out-data"[^}]*' | sed 's/"out-data" : "//' | tr -d '\\n"' || echo "Unknown")
if [[ "$TRUENAS_ZFS" == *"ONLINE"* ]]; then
print_status "vault pool: ONLINE" "ok"
elif [[ "$TRUENAS_ZFS" == *"SUSPENDED"* ]]; then
print_status "vault pool: SUSPENDED (needs zpool clear)" "fail"
TRUENAS_ZFS_SUSPENDED=true
elif [[ "$TRUENAS_ZFS" == *"DEGRADED"* ]]; then
print_status "vault pool: DEGRADED" "warn"
else
print_status "vault pool: $TRUENAS_ZFS" "fail"
fi
fi
# Check Containers
echo ""
echo " Containers:"
CRITICAL_CTS="200 202" # PiHole, Traefik
STOPPED_CTS=""
while IFS= read -r line; do
CTID=$(echo "$line" | awk '{print $1}')
STATUS=$(echo "$line" | awk '{print $2}')
NAME=$(echo "$line" | awk '{print $4}')
if [[ "$STATUS" == "running" ]]; then
print_status "$CTID $NAME: $STATUS" "ok"
else
print_status "$CTID $NAME: $STATUS" "fail"
if [[ " $CRITICAL_CTS " =~ " $CTID " ]]; then
STOPPED_CTS="$STOPPED_CTS $CTID"
fi
fi
done < <(ssh_pve "pct list" | tail -n +2)
# ============================================
# PVE2 (Secondary Server)
# ============================================
echo ""
echo "--- PVE2 (10.10.10.102) ---"
if ssh_pve2 "echo ok" > /dev/null 2>&1; then
print_status "PVE2 Reachable" "ok"
# Check CPU temp
TEMP2=$(ssh_pve2 'for f in /sys/class/hwmon/hwmon*/temp*_input; do label=$(cat ${f%_input}_label 2>/dev/null); if [ "$label" = "Tctl" ]; then echo $(($(cat $f)/1000)); fi; done')
if [[ -n "$TEMP2" ]]; then
if [[ "$TEMP2" -lt 85 ]]; then
print_status "CPU Temp: ${TEMP2}°C" "ok"
elif [[ "$TEMP2" -lt 90 ]]; then
print_status "CPU Temp: ${TEMP2}°C (warm)" "warn"
else
print_status "CPU Temp: ${TEMP2}°C (HOT!)" "fail"
fi
fi
# Check VMs
echo ""
echo " VMs:"
while IFS= read -r line; do
VMID=$(echo "$line" | awk '{print $1}')
NAME=$(echo "$line" | awk '{print $2}')
STATUS=$(echo "$line" | awk '{print $3}')
if [[ "$STATUS" == "running" ]]; then
print_status "$VMID $NAME: $STATUS" "ok"
else
print_status "$VMID $NAME: $STATUS" "fail"
fi
done < <(ssh_pve2 "qm list" | tail -n +2)
else
print_status "PVE2 Unreachable" "fail"
fi
# ============================================
# FIX MODE - Start stopped services
# ============================================
if $FIX_MODE && [[ -n "$STOPPED_VMS" || -n "$STOPPED_CTS" || "$TRUENAS_ZFS_SUSPENDED" == "true" ]]; then
echo ""
echo "================================"
echo " RECOVERY MODE"
echo "================================"
# Fix TrueNAS ZFS SUSPENDED state first (critical for mounts)
if [[ "$TRUENAS_ZFS_SUSPENDED" == "true" ]]; then
echo ""
echo "Clearing TrueNAS ZFS pool errors..."
ZFS_CLEAR_RESULT=$(ssh_pve 'qm guest exec 100 -- bash -c "zpool clear vault 2>&1 && zpool list -H -o health vault"' 2>/dev/null | grep -o '"out-data"[^}]*' | sed 's/"out-data" : "//' | tr -d '\\n"' || echo "FAILED")
if [[ "$ZFS_CLEAR_RESULT" == *"ONLINE"* ]]; then
print_status "vault pool recovered: ONLINE" "ok"
else
print_status "vault pool recovery failed: $ZFS_CLEAR_RESULT" "fail"
fi
sleep 5 # Give ZFS time to stabilize
fi
# Start TrueNAS first (it provides storage)
if [[ " $STOPPED_VMS " =~ " 100 " ]]; then
echo ""
echo "Starting TrueNAS (VM 100) first..."
ssh_pve "qm start 100" && print_status "TrueNAS started" "ok" || print_status "Failed to start TrueNAS" "fail"
echo "Waiting 60s for TrueNAS to boot..."
sleep 60
fi
# Start other VMs
for VMID in $STOPPED_VMS; do
if [[ "$VMID" != "100" ]]; then
NAME=$(ssh_pve "qm config $VMID | grep '^name:' | awk '{print \$2}'")
echo "Starting VM $VMID ($NAME)..."
ssh_pve "qm start $VMID" && print_status "$NAME started" "ok" || print_status "Failed to start $NAME" "fail"
sleep 5
fi
done
# Start containers
for CTID in $STOPPED_CTS; do
NAME=$(ssh_pve "pct config $CTID | grep '^hostname:' | awk '{print \$2}'")
echo "Starting CT $CTID ($NAME)..."
ssh_pve "pct start $CTID" && print_status "$NAME started" "ok" || print_status "Failed to start $NAME" "fail"
sleep 3
done
# Mount TrueNAS shares on Saltbox if Saltbox is running
if ssh_pve "qm status 101" 2>/dev/null | grep -q running; then
echo ""
echo "Checking TrueNAS mounts on Saltbox..."
sleep 10 # Give services time to start
MOUNT_STATUS=$(ssh_pve 'qm guest exec 101 -- bash -c "mount | grep -c Media"' 2>/dev/null | grep -o '"out-data"[^}]*' | grep -o '[0-9]' || echo "0")
if [[ "$MOUNT_STATUS" == "0" ]]; then
echo "Mounting TrueNAS shares..."
ssh_pve 'qm guest exec 101 -- bash -c "mount /mnt/local/Media; mount /mnt/local/downloads"' 2>/dev/null
print_status "TrueNAS mounts attempted" "ok"
# Restart Immich
echo "Restarting Immich..."
ssh_pve 'qm guest exec 101 -- bash -c "docker restart immich"' 2>/dev/null
print_status "Immich restarted" "ok"
else
print_status "TrueNAS mounts already present" "ok"
fi
fi
fi
# ============================================
# Summary
# ============================================
echo ""
echo "================================"
echo " SUMMARY"
echo "================================"
ISSUES=0
if [[ -n "$STOPPED_VMS" ]] && ! $FIX_MODE; then
echo -e "${YELLOW}Stopped critical VMs:${NC}$STOPPED_VMS"
ISSUES=$((ISSUES + 1))
fi
if [[ -n "$STOPPED_CTS" ]] && ! $FIX_MODE; then
echo -e "${YELLOW}Stopped critical containers:${NC}$STOPPED_CTS"
ISSUES=$((ISSUES + 1))
fi
if [[ "$TRUENAS_ZFS_SUSPENDED" == "true" ]] && ! $FIX_MODE; then
echo -e "${RED}TrueNAS ZFS pool SUSPENDED!${NC} SMB mounts will fail."
ISSUES=$((ISSUES + 1))
fi
if [[ "$ISSUES" -eq 0 ]]; then
echo -e "${GREEN}All critical services healthy!${NC}"
else
echo ""
echo -e "Run ${YELLOW}./health-check.sh --fix${NC} to attempt recovery"
fi
echo ""
echo "Done: $(date '+%Y-%m-%d %H:%M:%S')"