Initial commit: Homelab infrastructure documentation
- CLAUDE.md: Main homelab assistant context and instructions
- IP-ASSIGNMENTS.md: Complete IP address assignments
- NETWORK.md: Network bridges, VLANs, and configuration
- EMC-ENCLOSURE.md: EMC storage enclosure documentation
- SYNCTHING.md: Syncthing setup and device list
- SHELL-ALIASES.md: ZSH aliases for Claude Code sessions
- HOMEASSISTANT.md: Home Assistant API and automations
- INFRASTRUCTURE.md: Server hardware and power management
- configs/: Shared shell configurations
- scripts/: Utility scripts
- mcp-central/: MCP server configuration

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
## .gitignore (new file, 22 lines, vendored)
```gitignore
# Secrets and credentials
.env
*.credentials
*-credentials*.txt

# macOS
.DS_Store
.AppleDouble
.LSOverride

# Editor/IDE
.obsidian/
.claude/
.vscode/
*.swp
*.swo
*~

# Temporary files
*.tmp
*.bak
nul
```
## CHANGELOG.md (new file, 197 lines)
# Homelab Changelog

## 2024-12-16

### Power Investigation

Investigated UPS power limit issues across both Proxmox servers.

#### Findings

1. **KSMD (Kernel Same-page Merging Daemon)** was consuming 50-57% CPU constantly on PVE
   - `sleep_millisecs` set to 12ms (extremely aggressive; the default is 200ms)
   - `general_profit` was **negative** (-320MB), meaning KSM was wasting CPU
   - No memory overcommit (98GB allocated on 128GB RAM)
   - Diverse workloads (TrueNAS, Windows, Linux) = few duplicate pages to merge

2. **GPU power draw** identified as a major consumer:
   - RTX A6000 on PVE2: up to 300W TDP
   - TITAN RTX on PVE: up to 280W TDP
   - Quadro P2000 on PVE: up to 75W TDP

3. **TrueNAS VM** occasionally spiking to 86% CPU (needs investigation)

#### Changes Made

- [x] **Disabled KSMD on PVE** (10.10.10.120)

  ```bash
  echo 0 > /sys/kernel/mm/ksm/run
  ```

  - Immediate result: KSMD CPU dropped from 51-57% to 0%
  - Load average dropped from 1.88 to 1.28
  - Estimated savings: ~7-10W continuous
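For reference, the `general_profit` figure is the kernel's own cost/benefit estimate: roughly `pages_sharing * PAGE_SIZE` minus the memory spent tracking merge candidates. A minimal sketch of that arithmetic, assuming ~64 bytes per rmap item (an x86_64 approximation; check your kernel) and sample counter values standing in for `/sys/kernel/mm/ksm/`:

```bash
# Sketch of the KSM profitability formula; sizeof(rmap_item) ~= 64 bytes is an
# assumption, and the real counters live under /sys/kernel/mm/ksm/.
ksm_profit() {
  # $1 = pages_sharing, $2 = total rmap items, $3 = page size in bytes
  echo $(( $1 * $3 - $2 * 64 ))
}

# Few shared pages but many tracked candidates -> profit goes negative:
ksm_profit 1000 100000 4096   # -2304000 bytes: KSM costs ~2.2MB more than it saves
```

A negative result, as seen on PVE, means KSM's bookkeeping outweighs the pages it deduplicates.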
#### Additional Changes

- [x] **Made the KSMD disable persistent on both hosts**
  - Note: KSM is controlled via sysfs, not sysctl
  - Created systemd service `/etc/systemd/system/disable-ksm.service`:

    ```ini
    [Unit]
    Description=Disable KSM (Kernel Same-page Merging)
    After=multi-user.target

    [Service]
    Type=oneshot
    ExecStart=/bin/sh -c "echo 0 > /sys/kernel/mm/ksm/run"
    RemainAfterExit=yes

    [Install]
    WantedBy=multi-user.target
    ```

  - Enabled on both PVE and PVE2: `systemctl enable disable-ksm.service`
### Syncthing Rescan Interval Fix

**Root cause**: Syncthing on TrueNAS was rescanning 56GB of data every 60 seconds, causing constant 100% CPU usage (~3172 minutes of CPU time in 3 days).

**Folders affected** (changed from 60s to 3600s):
- downloads (38GB)
- documents (11GB)
- desktop (7.2GB)
- config, movies, notes, pictures

**Fix applied**:

```bash
# Download config from TrueNAS
ssh pve 'qm guest exec 100 -- cat /mnt/.ix-apps/app_mounts/syncthing/config/config/config.xml'

# Change all rescanIntervalS="60" to rescanIntervalS="3600"
sed -i 's/rescanIntervalS="60"/rescanIntervalS="3600"/g' config.xml

# Uploaded and restarted Syncthing
curl -X POST -H "X-API-Key: xxx" http://localhost:20910/rest/system/restart
```

**Note**: fsWatcher is enabled, so changes are detected in real time. The rescan is just a safety net.

**Estimated savings**: ~60-80W (TrueNAS VM CPU should drop from 86% to ~5-10% at idle)
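Before uploading an edited config, the same substitution can be dry-run on a scratch copy (the path below is a throwaway; note that BSD/macOS `sed -i` needs an explicit backup suffix, e.g. `sed -i ''`):

```bash
# Dry-run the rescan-interval rewrite on a scratch file before touching the real config
cat > /tmp/config-test.xml <<'EOF'
<folder id="downloads" rescanIntervalS="60"/>
<folder id="notes" rescanIntervalS="3600"/>
EOF
sed -i 's/rescanIntervalS="60"/rescanIntervalS="3600"/g' /tmp/config-test.xml
grep -c 'rescanIntervalS="3600"' /tmp/config-test.xml   # prints 2
```

If the match count equals the number of folders you expected to change, the pattern is safe to run on the real `config.xml`.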
### GPU Power State Investigation

| GPU | VM | Idle Power | P-State | Status |
|-----|-----|-----------|---------|--------|
| RTX A6000 | trading-vm (301) | **11W** | P8 | Optimal |
| TITAN RTX | lmdev1 (111) | **2W** | P8 | Excellent |
| Quadro P2000 | saltbox (101) | **25W** | P0 | Stuck due to Plex |

**Findings**:
- RTX A6000: properly entering P8 (11W idle)
- TITAN RTX: only 2W at idle despite ComfyUI/Python processes (436MiB VRAM used)
  - Modern GPUs have much better idle power management
- Quadro P2000: stuck in P0 at 25W because the Plex transcoder holds GPU memory
  - Older Quadro cards don't idle as efficiently with processes attached
  - Power limit fixed at 75W (not adjustable)

**Changes made**:
- [x] Installed QEMU guest agent on lmdev1 (VM 111)
- [x] Added SSH key access to lmdev1 (10.10.10.111)
- [x] Updated `~/.ssh/config` with an lmdev1 entry
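The per-GPU numbers come from `nvidia-smi`; a small filter can flag anything not resting in P8. The sample CSV is inlined so the parsing is visible; on a real host, pipe `nvidia-smi --query-gpu=name,pstate,power.draw --format=csv,noheader` into the same `awk`:

```bash
# Flag GPUs stuck above the P8 idle state (sample nvidia-smi CSV inlined)
printf '%s\n' \
  'Quadro P2000, P0, 25.00 W' \
  'TITAN RTX, P8, 2.00 W' \
  'RTX A6000, P8, 11.00 W' |
awk -F', ' '$2 != "P8" { print $1 " stuck in " $2 " at " $3 }'
# prints: Quadro P2000 stuck in P0 at 25.00 W
```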
### CPU Governor Optimization

**Issue**: Both servers were using the `performance` CPU governor, keeping CPUs at high frequencies (3-4GHz) even when 99% idle.

**Changes**:

#### PVE (10.10.10.120)
- **Driver**: `amd-pstate-epp` (modern AMD P-State with Energy Performance Preference)
- **Change**: Governor `performance` → `powersave`, EPP `performance` → `balance_power`
- **Result**: Idle frequencies dropped from ~4GHz to ~1.7GHz
- **Persistence**: Created `/etc/systemd/system/cpu-powersave.service`

```ini
[Unit]
Description=Set CPU governor to powersave with balance_power EPP
After=multi-user.target

[Service]
Type=oneshot
ExecStart=/bin/bash -c 'for gov in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo powersave > "$gov"; done; for epp in /sys/devices/system/cpu/cpu*/cpufreq/energy_performance_preference; do echo balance_power > "$epp"; done'
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
```

#### PVE2 (10.10.10.102)
- **Driver**: `acpi-cpufreq` (older driver)
- **Change**: Governor `performance` → `schedutil`
- **Result**: Idle frequencies dropped from ~4GHz to ~2.2GHz
- **Persistence**: Created `/etc/systemd/system/cpu-powersave.service`

```ini
[Unit]
Description=Set CPU governor to schedutil for power savings
After=multi-user.target

[Service]
Type=oneshot
ExecStart=/bin/bash -c 'for gov in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo schedutil > "$gov"; done'
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
```

**Estimated savings**: 30-60W per server (60-120W total)
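To confirm the governors are actually biting, the per-core frequencies in cpufreq sysfs can be averaged. Sample kHz values stand in for `/sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq` here; on a host, replace the `printf` with a `cat` of those files:

```bash
# Average current core frequency from cpufreq sysfs (values are in kHz)
printf '%s\n' 1700000 1800000 2200000 |
awk '{ sum += $1 } END { printf "avg %.1f MHz\n", sum / NR / 1000 }'
# prints: avg 1900.0 MHz
```

An average well below base clock at idle means the governor change took effect.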
### ksmtuned Service Disabled

**Issue**: The `ksmtuned` daemon (KSM tuning daemon) was still running on both servers even after KSMD was disabled, consuming ~39 min of CPU time on PVE and ~12 min on PVE2 over 3 days.

**Fix**:

```bash
systemctl stop ksmtuned
systemctl disable ksmtuned
```

Applied to both PVE and PVE2.

**Estimated savings**: ~2-5W
### HDD Spindown on PVE2

**Issue**: Two WD Red 6TB drives (local-zfs2 pool) were spinning 24/7 despite the pool having only 768KB used. Each drive draws 5-8W while spinning.

**Fix**:

```bash
# Set a 30-minute spindown timeout
hdparm -S 241 /dev/sda /dev/sdb
```

**Persistence**: Created udev rule `/etc/udev/rules.d/69-hdd-spindown.rules`:

```
ACTION=="add", KERNEL=="sd[a-z]", ATTRS{model}=="WDC WD60EFRX-68L*", RUN+="/usr/sbin/hdparm -S 241 /dev/%k"
```

**Estimated savings**: ~10-16W (when drives spin down)
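The `-S 241` value is not seconds: per hdparm(8), values 1-240 mean n × 5 seconds, while 241-251 mean (n − 240) × 30 minutes. A small decoder makes the mapping explicit:

```bash
# Decode an hdparm -S standby value into minutes, per hdparm(8):
# 1-240 -> n * 5 seconds; 241-251 -> (n - 240) * 30 minutes
spindown_minutes() {
  if [ "$1" -ge 241 ] && [ "$1" -le 251 ]; then
    echo $(( ($1 - 240) * 30 ))
  else
    echo $(( $1 * 5 / 60 ))
  fi
}

spindown_minutes 241   # prints 30
spindown_minutes 120   # prints 10  (120 * 5s = 600s)
```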
#### Pending Changes
- [ ] Monitor overall power consumption after all optimizations
- [ ] Consider PCIe ASPM optimization
- [ ] Consider disabling the NMI watchdog

### SSH Key Setup
- Added SSH key authentication to both Proxmox servers
- Updated `~/.ssh/config` with entries for `pve` and `pve2`

---

## Notes

### What is KSMD?

Kernel Same-page Merging Daemon - scans memory for duplicate pages across VMs and merges them, trading CPU cycles for RAM savings. Useful when:
- Overcommitting memory
- Running many identical VMs

Not useful when:
- There is plenty of RAM headroom (our case)
- Workloads are diverse, with few duplicate pages
- `general_profit` is negative

### What is Memory Ballooning?

Guest-cooperative memory management: the hypervisor can ask VMs to give back unused RAM. It is independent of KSMD - both are Proxmox/KVM memory optimization features, but they serve different purposes.
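For reference, ballooning is a per-VM setting in Proxmox. A sketch, assuming a hypothetical VMID 999 with an 8GB ceiling that can be squeezed to 4GB under host memory pressure (`qm set` flags as documented by Proxmox):

```bash
# Hypothetical VMID 999: 8GB assigned, balloon floor of 4GB under host pressure
qm set 999 --memory 8192 --balloon 4096

# Setting --balloon 0 disables the balloon device entirely
qm set 999 --balloon 0
```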
## CLAUDE.md (new file, 962 lines)
# Homelab Infrastructure

## Quick Reference - Common Tasks

| Task | Section | Quick Command |
|------|---------|---------------|
| **Add new public service** | [Reverse Proxy](#reverse-proxy-architecture-traefik) | Create Traefik config + Cloudflare DNS |
| **Add Cloudflare DNS** | [Cloudflare API](#cloudflare-api-access) | `curl -X POST cloudflare.com/...` |
| **Check server temps** | [Temperature Check](#server-temperature-check) | `ssh pve 'grep Tctl ...'` |
| **Syncthing issues** | [Troubleshooting](#troubleshooting-runbooks) | Check API connections |
| **SSL cert issues** | [Traefik DNS Challenge](#ssl-certificates) | Use `cloudflare` resolver |

**Key Credentials (see sections for full details):**
- Cloudflare: `cloudflare@htsn.io` / API key in [Cloudflare API](#cloudflare-api-access)
- SSH password: `GrilledCh33s3#`
- Traefik: CT 202 @ 10.10.10.250

---

## Role

You are the **Homelab Assistant** - a Claude Code session dedicated to managing and maintaining Hutson's home infrastructure. Your responsibilities include:

- **Infrastructure Management**: Proxmox servers, VMs, containers, networking
- **File Sync**: Syncthing configuration across all devices (Mac Mini, MacBook, Windows PC, TrueNAS, Android)
- **Network Administration**: Router config, SSH access, Tailscale, device management
- **Power Optimization**: CPU governors, GPU power states, service tuning
- **Documentation**: Keep CLAUDE.md, SYNCTHING.md, and SHELL-ALIASES.md up to date
- **Automation**: Shell aliases, startup scripts, scheduled tasks

You have full access to all homelab devices via SSH and APIs. Use this context to help troubleshoot, configure, and optimize the infrastructure.

### Proactive Behaviors

When the user mentions issues or asks questions, proactively:
- **"sync not working"** → Check Syncthing status on ALL devices, identify which is offline
- **"device offline"** → Ping both local and Tailscale IPs, check if the service is running
- **"slow"** → Check CPU usage, running processes, Syncthing rescan activity
- **"check status"** → Run a full health check across all systems
- **"something's wrong"** → Run diagnostics on likely culprits based on context
### Quick Health Checks

Run these to get a quick overview of the homelab:

```bash
# === FULL HEALTH CHECK ===
# Syncthing connections (Mac Mini)
curl -s -H "X-API-Key: oSQSrPnMnrEXuHqjWrRdrvq3TSXesAT5" "http://127.0.0.1:8384/rest/system/connections" | python3 -c "import sys,json; d=json.load(sys.stdin)['connections']; [print(f\"{v.get('name',k[:7])}: {'UP' if v['connected'] else 'DOWN'}\") for k,v in d.items()]"

# Proxmox VMs
ssh pve 'qm list' 2>/dev/null || echo "PVE: unreachable"
ssh pve2 'qm list' 2>/dev/null || echo "PVE2: unreachable"

# Ping critical devices
ping -c 1 -W 1 10.10.10.200 >/dev/null && echo "TrueNAS: UP" || echo "TrueNAS: DOWN"
ping -c 1 -W 1 10.10.10.1 >/dev/null && echo "Router: UP" || echo "Router: DOWN"

# Check Windows PC Syncthing (often goes offline)
nc -zw1 10.10.10.150 22000 && echo "Windows Syncthing: UP" || echo "Windows Syncthing: DOWN"
```

### Troubleshooting Runbooks

| Symptom | Check | Fix |
|---------|-------|-----|
| Device not syncing | `curl` Syncthing API → connections | Check if device online, restart Syncthing |
| Windows PC offline | `ping 10.10.10.150` then `nc -z 22000` | SSH in, `Start-ScheduledTask -TaskName "Syncthing"` |
| Phone not syncing | Phone Syncthing app in background? | User must open app, keep screen on |
| High CPU on TrueNAS | Syncthing rescan? KSM? | Check rescan intervals, disable KSM |
| VM won't start | Storage available? RAM free? | `ssh pve 'qm start VMID'`, check logs |
| Tailscale offline | `tailscale status` | `tailscale up` or restart service |
| Sync stuck at X% | Folder errors? Conflicts? | Check `rest/folder/errors?folder=NAME` |
| Server running hot | Check KSM, check CPU processes | Disable KSM, identify runaway process |
| Storage enclosure loud | Check fan speed via SES | See [EMC-ENCLOSURE.md](EMC-ENCLOSURE.md) |
| Drives not detected | Check SAS link, LCC status | Switch LCC, rescan SCSI hosts |
### Server Temperature Check

```bash
# Check temps on both servers (Threadripper PRO max safe: 90°C Tctl)
ssh pve 'for f in /sys/class/hwmon/hwmon*/temp*_input; do label=$(cat ${f%_input}_label 2>/dev/null); if [ "$label" = "Tctl" ]; then echo "PVE Tctl: $(($(cat $f)/1000))°C"; fi; done'
ssh pve2 'for f in /sys/class/hwmon/hwmon*/temp*_input; do label=$(cat ${f%_input}_label 2>/dev/null); if [ "$label" = "Tctl" ]; then echo "PVE2 Tctl: $(($(cat $f)/1000))°C"; fi; done'
```

**Healthy temps**: 70-80°C under load. **Warning**: >85°C. **Throttle**: 90°C.

### Service Dependencies

```
TrueNAS (10.10.10.200)
├── Central Syncthing hub - if down, sync breaks between devices
├── NFS/SMB shares for VMs
└── Media storage for Plex

PiHole (CT 200)
└── DNS for entire network - if down, name resolution fails

Traefik (CT 202)
└── Reverse proxy - if down, external access to services fails

Router (10.10.10.1)
└── Everything - gateway for all traffic
```

### API Quick Reference

| Service | Device | Endpoint | Auth |
|---------|--------|----------|------|
| Syncthing | Mac Mini | `http://127.0.0.1:8384/rest/` | `X-API-Key: oSQSrPnMnrEXuHqjWrRdrvq3TSXesAT5` |
| Syncthing | MacBook | `http://127.0.0.1:8384/rest/` (via SSH) | `X-API-Key: qYkNdVLwy9qZZZ6MqnJr7tHX7KKdxGMJ` |
| Syncthing | Phone | `https://10.10.10.54:8384/rest/` | `X-API-Key: Xxz3jDT4akUJe6psfwZsbZwG2LhfZuDM` |
| Proxmox | PVE | `https://10.10.10.120:8006/api2/json/` | SSH key auth |
| Proxmox | PVE2 | `https://10.10.10.102:8006/api2/json/` | SSH key auth |
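As an example of working with these endpoints, Syncthing's `/rest/db/completion?folder=NAME` returns JSON with a `completion` percentage and `needBytes`. The parse below is fed a sample response so it runs standalone; on a live hub, replace the `echo` with the real `curl -H "X-API-Key: ..."` call against that endpoint:

```bash
# Parse a sample /rest/db/completion response (replace echo with the real curl)
echo '{"completion": 97.5, "needBytes": 1048576}' |
python3 -c "import sys,json; d=json.load(sys.stdin); print(f\"{d['completion']:.1f}% complete, {d['needBytes'] // 1048576} MiB still needed\")"
# prints: 97.5% complete, 1 MiB still needed
```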
### Common Maintenance Tasks

When the user asks for maintenance or you notice issues:

1. **Check Syncthing sync status** - Any folders behind? Errors?
2. **Verify all devices connected** - Run connection check
3. **Check disk space** - `ssh pve 'df -h'`, `ssh pve2 'df -h'`
4. **Review ZFS pool health** - `ssh pve 'zpool status'`
5. **Check for stuck processes** - High CPU? Memory pressure?
6. **Verify backups** - Are critical folders syncing?

### Emergency Commands

```bash
# Restart VM on Proxmox
ssh pve 'qm stop VMID && qm start VMID'

# Check what's using CPU
ssh pve 'ps aux --sort=-%cpu | head -10'

# Check ZFS pool status (via QEMU agent)
ssh pve 'qm guest exec 100 -- bash -c "zpool status vault"'

# Check EMC enclosure fans
ssh pve 'qm guest exec 100 -- bash -c "sg_ses --index=coo,-1 --get=speed_code /dev/sg15"'

# Force Syncthing rescan
curl -X POST "http://127.0.0.1:8384/rest/db/scan?folder=FOLDER" -H "X-API-Key: API_KEY"

# Restart Syncthing on Windows (when stuck)
sshpass -p 'GrilledCh33s3#' ssh claude@10.10.10.150 'Stop-Process -Name syncthing -Force; Start-ScheduledTask -TaskName "Syncthing"'

# Get all device IPs from router
expect -c 'spawn ssh root@10.10.10.1 "cat /proc/net/arp"; expect "Password:"; send "GrilledCh33s3#\r"; expect eof'
```
## Overview

Two Proxmox servers running various VMs and containers for home infrastructure, media, development, and AI workloads.

## Servers

### PVE (10.10.10.120) - Primary
- **CPU**: AMD Ryzen Threadripper PRO 3975WX (32 cores, 64 threads, 280W TDP)
- **RAM**: 128 GB
- **Storage**:
  - `nvme-mirror1`: 2x Sabrent Rocket Q NVMe (3.6TB usable)
  - `nvme-mirror2`: 2x Kingston SFYRD 2TB (1.8TB usable)
  - `rpool`: 2x Samsung 870 QVO 4TB SSD mirror (3.6TB usable)
- **GPUs**:
  - NVIDIA Quadro P2000 (75W TDP) - Plex transcoding
  - NVIDIA TITAN RTX (280W TDP) - AI workloads, passed to saltbox/lmdev1
- **Role**: Primary VM host, TrueNAS, media services

### PVE2 (10.10.10.102) - Secondary
- **CPU**: AMD Ryzen Threadripper PRO 3975WX (32 cores, 64 threads, 280W TDP)
- **RAM**: 128 GB
- **Storage**:
  - `nvme-mirror3`: 2x NVMe mirror
  - `local-zfs2`: 2x WD Red 6TB HDD mirror
- **GPUs**:
  - NVIDIA RTX A6000 (300W TDP) - passed to trading-vm
- **Role**: Trading platform, development
## SSH Access

### SSH Key Authentication (All Hosts)

SSH keys are configured in `~/.ssh/config` on both the Mac Mini and MacBook. Use the `~/.ssh/homelab` key.

| Host Alias | IP | User | Type | Notes |
|------------|-----|------|------|-------|
| `pve` | 10.10.10.120 | root | Proxmox | Primary server |
| `pve2` | 10.10.10.102 | root | Proxmox | Secondary server |
| `truenas` | 10.10.10.200 | root | VM | NAS/storage |
| `saltbox` | 10.10.10.100 | hutson | VM | Media automation |
| `lmdev1` | 10.10.10.111 | hutson | VM | AI/LLM development |
| `docker-host` | 10.10.10.206 | hutson | VM | Docker services |
| `fs-dev` | 10.10.10.5 | hutson | VM | Development |
| `copyparty` | 10.10.10.201 | hutson | VM | File sharing |
| `gitea-vm` | 10.10.10.220 | hutson | VM | Git server |
| `trading-vm` | 10.10.10.221 | hutson | VM | AI trading platform |
| `pihole` | 10.10.10.10 | root | LXC | DNS/Ad blocking |
| `traefik` | 10.10.10.250 | root | LXC | Reverse proxy |
| `findshyt` | 10.10.10.8 | root | LXC | Custom app |

**Usage examples:**
```bash
ssh pve 'qm list'                 # List VMs
ssh truenas 'zpool status vault'  # Check ZFS pool
ssh saltbox 'docker ps'           # List containers
ssh pihole 'pihole status'        # Check Pi-hole
```

### Password Auth (Special Cases)

| Device | IP | User | Auth Method | Notes |
|--------|-----|------|-------------|-------|
| UniFi Router | 10.10.10.1 | root | expect (keyboard-interactive) | Gateway |
| Windows PC | 10.10.10.150 | claude | sshpass | PowerShell, use `;` not `&&` |
| HomeAssistant | 10.10.10.110 | - | QEMU agent only | No SSH server |

**Router access (requires expect):**
```bash
# Run command on router
expect -c 'spawn ssh root@10.10.10.1 "hostname"; expect "Password:"; send "GrilledCh33s3#\r"; expect eof'

# Get ARP table (all device IPs)
expect -c 'spawn ssh root@10.10.10.1 "cat /proc/net/arp"; expect "Password:"; send "GrilledCh33s3#\r"; expect eof'
```

**Windows PC access:**
```bash
sshpass -p 'GrilledCh33s3#' ssh claude@10.10.10.150 'Get-Process | Select -First 5'
```

**HomeAssistant (no SSH, use QEMU agent):**
```bash
ssh pve 'qm guest exec 110 -- bash -c "ha core info"'
```
## VMs and Containers

### PVE (10.10.10.120)

| VMID | Name | vCPUs | RAM | Purpose | GPU/Passthrough | QEMU Agent |
|------|------|-------|-----|---------|-----------------|------------|
| 100 | truenas | 8 | 32GB | NAS, storage | LSI SAS2308 HBA, Samsung NVMe | Yes |
| 101 | saltbox | 16 | 16GB | Media automation | TITAN RTX | Yes |
| 105 | fs-dev | 10 | 8GB | Development | - | Yes |
| 110 | homeassistant | 2 | 2GB | Home automation | - | No |
| 111 | lmdev1 | 8 | 32GB | AI/LLM development | TITAN RTX | Yes |
| 201 | copyparty | 2 | 2GB | File sharing | - | Yes |
| 206 | docker-host | 2 | 4GB | Docker services | - | Yes |
| 200 | pihole (CT) | - | - | DNS/Ad blocking | - | N/A |
| 202 | traefik (CT) | - | - | Reverse proxy | - | N/A |
| 205 | findshyt (CT) | - | - | Custom app | - | N/A |

### PVE2 (10.10.10.102)

| VMID | Name | vCPUs | RAM | Purpose | GPU/Passthrough | QEMU Agent |
|------|------|-------|-----|---------|-----------------|------------|
| 300 | gitea-vm | 2 | 4GB | Git server | - | Yes |
| 301 | trading-vm | 16 | 32GB | AI trading platform | RTX A6000 | Yes |

### QEMU Guest Agent

VMs with the QEMU agent can be managed via `qm guest exec`:

```bash
# Execute command in VM
ssh pve 'qm guest exec 100 -- bash -c "zpool status vault"'

# Get VM IP addresses
ssh pve 'qm guest exec 100 -- bash -c "ip addr"'
```

Only VM 110 (homeassistant) lacks the QEMU agent - use its web UI instead.
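Note that `qm guest exec` returns a JSON envelope rather than raw stdout; the command's output sits in the `out-data` field. The parse below is fed a sample envelope so it can run standalone; in practice, pipe the real `ssh pve 'qm guest exec ...'` output into the same `python3`:

```bash
# Extract stdout from a qm guest exec JSON envelope (sample inlined)
printf '%s' '{"exitcode": 0, "out-data": "pool: vault\nstate: ONLINE\n"}' |
python3 -c "import sys, json; sys.stdout.write(json.load(sys.stdin)['out-data'])"
# prints:
#   pool: vault
#   state: ONLINE
```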
## Power Management

### Estimated Power Draw
- **PVE**: 500-750W (CPU + TITAN RTX + P2000 + storage + HBAs)
- **PVE2**: 450-600W (CPU + RTX A6000 + storage)
- **Combined**: ~1000-1350W under load

### Optimizations Applied

1. **KSMD Disabled** (updated 2024-12-17)
   - Was consuming 44-57% CPU on PVE with negative profit
   - Caused CPU temp to rise from 74°C to 83°C
   - Savings: ~7-10W + significant temp reduction
   - Made permanent via:
     - systemd service: `/etc/systemd/system/disable-ksm.service`
     - **ksmtuned masked**: `systemctl mask ksmtuned` (prevents re-enabling)
   - **Note**: KSM can get re-enabled by Proxmox updates. If the CPU is hot, check:

     ```bash
     cat /sys/kernel/mm/ksm/run   # Should be 0
     ps aux | grep ksmd           # Should show 0% CPU

     # If KSM is running (run=1), disable it:
     echo 0 > /sys/kernel/mm/ksm/run
     systemctl mask ksmtuned
     ```

2. **Syncthing Rescan Intervals** (2024-12-16)
   - Changed aggressive 60s rescans to 3600s for large folders
   - Affected: downloads (38GB), documents (11GB), desktop (7.2GB), movies, pictures, notes, config
   - Savings: ~60-80W (TrueNAS VM was at a constant 86% CPU)

3. **CPU Governor Optimization** (2024-12-16)
   - PVE: `powersave` governor + `balance_power` EPP (amd-pstate-epp driver)
   - PVE2: `schedutil` governor (acpi-cpufreq driver)
   - Made permanent via systemd service: `/etc/systemd/system/cpu-powersave.service`
   - Savings: ~60-120W combined (CPUs now idle at 1.7-2.2GHz vs 4GHz)

4. **GPU Power States** (2024-12-16) - verified optimal
   - RTX A6000: 11W idle (P8 state)
   - TITAN RTX: 2-3W idle (P8 state)
   - Quadro P2000: 25W (P0 - Plex keeps it active)

5. **ksmtuned Disabled** (2024-12-16)
   - The KSM tuning daemon was still running after KSMD was disabled
   - Stopped and disabled on both servers
   - Savings: ~2-5W

6. **HDD Spindown on PVE2** (2024-12-16)
   - The local-zfs2 pool (2x WD Red 6TB) had only 768KB used but the drives were spinning 24/7
   - Set a 30-minute spindown via `hdparm -S 241`
   - Persistent via udev rule: `/etc/udev/rules.d/69-hdd-spindown.rules`
   - Savings: ~10-16W when spun down

### Potential Optimizations
- [ ] PCIe ASPM power management
- [ ] NMI watchdog disable
## Memory Configuration
- Ballooning enabled on most VMs but not actively used
- No memory overcommit (98GB allocated on 128GB physical for PVE)
- KSMD was wasting CPU with no benefit (negative `general_profit`)
## Network

See [NETWORK.md](NETWORK.md) for full details.

### Network Ranges

| Network | Range | Purpose |
|---------|-------|---------|
| LAN | 10.10.10.0/24 | Primary network, all external access |
| Internal | 10.10.20.0/24 | Inter-VM only (storage, NFS/iSCSI) |

### PVE Bridges (10.10.10.120)

| Bridge | NIC | Speed | Purpose | Use For |
|--------|-----|-------|---------|---------|
| vmbr0 | enp1s0 | 1 Gb | Management | General VMs/CTs |
| vmbr1 | enp35s0f0 | 10 Gb | High-speed LXC | Bandwidth-heavy containers |
| vmbr2 | enp35s0f1 | 10 Gb | High-speed VM | TrueNAS, Saltbox, storage VMs |
| vmbr3 | (none) | Virtual | Internal only | NFS/iSCSI traffic, no internet |

### Quick Reference

```bash
# Add VM to standard network (1Gb)
qm set VMID --net0 virtio,bridge=vmbr0

# Add VM to high-speed network (10Gb)
qm set VMID --net0 virtio,bridge=vmbr2

# Add secondary NIC for internal storage network
qm set VMID --net1 virtio,bridge=vmbr3
```

- MTU 9000 (jumbo frames) on all bridges
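With jumbo frames on every bridge, a quick end-to-end check is a do-not-fragment ping sized to the MTU minus the IP (20-byte) and ICMP (8-byte) headers; the TrueNAS address serves as an example target:

```bash
# Largest ICMP payload that fits a 9000-byte MTU without fragmentation
mtu=9000
payload=$(( mtu - 20 - 8 ))
echo "$payload"   # prints 8972

# From any LAN host (Linux syntax; macOS uses `ping -D -s 8972` instead):
#   ping -M do -s "$payload" -c 1 10.10.10.200
```

If the ping fails with "message too long", some hop in the path is not honoring MTU 9000.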
## Common Commands

```bash
# Check VM status
ssh pve 'qm list'
ssh pve2 'qm list'

# Check container status
ssh pve 'pct list'

# Monitor CPU/power
ssh pve 'top -bn1 | head -20'

# Check ZFS pools
ssh pve 'zpool status'

# Check GPU presence on the host (nvidia-smi runs inside the VMs, not on the host)
ssh pve 'lspci | grep -i nvidia'
```
## Remote Claude Code Sessions (Mac Mini)

### Overview

The Mac Mini (`hutson-mac-mini.local`) runs the Happy Coder daemon, enabling on-demand Claude Code sessions accessible from anywhere via the Happy Coder mobile app. Sessions are created when you need them - no persistent tmux sessions required.

### Architecture

```
Mac Mini (100.108.89.58 via Tailscale)
├── launchd (auto-starts on boot)
│   └── com.hutson.happy-daemon.plist (starts Happy daemon)
├── Happy Coder daemon (manages remote sessions)
└── Tailscale (secure remote access)
```

### How It Works

1. The Happy daemon runs on the Mac Mini (auto-starts on boot)
2. Open the Happy Coder app on a phone/tablet
3. Start a new Claude session from the app
4. The session runs in any working directory you choose
5. The session ends when you're done - no cleanup needed

### Quick Commands

```bash
# Check daemon status and active sessions
happy daemon list

# Start a new session manually (from the Mac Mini terminal)
cd ~/Projects/homelab && happy claude
```

### Mobile Access Setup (One-time)

1. Download the Happy Coder app:
   - iOS: https://apps.apple.com/us/app/happy-claude-code-client/id6748571505
   - Android: https://play.google.com/store/apps/details?id=com.ex3ndr.happy
2. On the Mac Mini, run `happy auth` and scan the QR code with the app
3. The daemon auto-starts on boot via launchd

### Daemon Management

```bash
happy daemon start    # Start daemon
happy daemon stop     # Stop daemon
happy daemon status   # Check status
happy daemon list     # List active sessions
```

### Remote Access via SSH + Tailscale

From any device on the Tailscale network:

```bash
# SSH to Mac Mini
ssh hutson@100.108.89.58

# Or via hostname
ssh hutson@mac-mini

# Start Claude in the desired directory
cd ~/Projects/homelab && happy claude
```

### Files & Configuration

| File | Purpose |
|------|---------|
| `~/Library/LaunchAgents/com.hutson.happy-daemon.plist` | launchd auto-start for the Happy daemon |
| `~/.happy/` | Happy Coder config and logs |

### Troubleshooting

```bash
# Check if the daemon is running
pgrep -f "happy.*daemon"

# Check launchd status
launchctl list | grep happy

# List active sessions
happy daemon list

# Restart daemon
happy daemon stop && happy daemon start

# If Tailscale is disconnected
/Applications/Tailscale.app/Contents/MacOS/Tailscale up
```
## Agent and Tool Guidelines

### Background Agents

- **Always spin up background agents when doing multiple independent tasks**
- Background agents allow parallel execution of tasks that don't depend on each other
- This improves efficiency and reduces total execution time
- Use background agents for tasks like running tests, builds, or searches simultaneously

### MCP Tools for Web Searches

#### ref.tools - Documentation Lookups
- **`mcp__Ref__ref_search_documentation`**: Search through documentation for specific topics
- **`mcp__Ref__ref_read_url`**: Read and parse content from documentation URLs

#### Exa MCP - General Web and Code Searches
- **`mcp__exa__web_search_exa`**: General web searches for current information
- **`mcp__exa__get_code_context_exa`**: Code-related searches and repository lookups

### MCP Tools Reference Table

| Tool Name | Provider | Purpose | Use Case |
|-----------|----------|---------|----------|
| `mcp__Ref__ref_search_documentation` | ref.tools | Search documentation | Finding specific topics in official docs |
| `mcp__Ref__ref_read_url` | ref.tools | Read documentation URLs | Parsing and extracting content from doc pages |
| `mcp__exa__web_search_exa` | Exa MCP | General web search | Current events, general information lookup |
| `mcp__exa__get_code_context_exa` | Exa MCP | Code-specific search | Finding code examples, repository searches |

## Reverse Proxy Architecture (Traefik)

### Overview

There are **TWO separate Traefik instances** handling different services:

| Instance | Location | IP | Purpose | Manages |
|----------|----------|-----|---------|---------|
| **Traefik-Primary** | CT 202 | **10.10.10.250** | General services | All non-Saltbox services |
| **Traefik-Saltbox** | VM 101 (Docker) | **10.10.10.100** | Saltbox services | Plex, *arr apps, media stack |

### ⚠️ CRITICAL RULE: Which Traefik to Use

**When adding ANY new service:**
- ✅ **Use Traefik-Primary (10.10.10.250)** - Unless the service lives inside the Saltbox VM
- ❌ **DO NOT touch Traefik-Saltbox** - It manages Saltbox services with their own certificates

**Why this matters:**
- Traefik-Saltbox has complex Saltbox-managed configs
- Messing with it breaks Plex, Sonarr, Radarr, and all media services
- Each Traefik has its own Let's Encrypt certificates
- Mixing them causes certificate conflicts

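The rule above is simple enough to capture in a tiny helper (illustrative sketch only - `which_traefik` is not an existing script in this repo):

```bash
# Return the Traefik instance (and IP) responsible for a service,
# per the rule above: anything inside the Saltbox VM goes to
# Traefik-Saltbox; everything else goes to Traefik-Primary.
which_traefik() {
  case "$1" in
    saltbox) echo "Traefik-Saltbox 10.10.10.100" ;;
    *)       echo "Traefik-Primary 10.10.10.250" ;;
  esac
}

which_traefik saltbox   # Traefik-Saltbox 10.10.10.100
which_traefik gitea     # Traefik-Primary 10.10.10.250
```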
### Traefik-Primary (CT 202) - For New Services

**Location**: `/etc/traefik/` on Container 202
**Config**: `/etc/traefik/traefik.yaml`
**Dynamic Configs**: `/etc/traefik/conf.d/*.yaml`

**Services using Traefik-Primary (10.10.10.250):**
- excalidraw.htsn.io → 10.10.10.206:8080 (docker-host)
- findshyt.htsn.io → 10.10.10.205 (CT 205)
- gitea (git.htsn.io) → 10.10.10.220:3000
- homeassistant → 10.10.10.110
- lmdev → 10.10.10.111
- pihole → 10.10.10.200
- truenas → 10.10.10.200
- proxmox → 10.10.10.120
- copyparty → 10.10.10.201
- aitrade → trading server
- pulse.htsn.io → 10.10.10.206:7655 (Pulse monitoring)

**Access Traefik config:**
```bash
# From Mac Mini:
ssh pve 'pct exec 202 -- cat /etc/traefik/traefik.yaml'
ssh pve 'pct exec 202 -- ls /etc/traefik/conf.d/'

# Edit a service config:
ssh pve 'pct exec 202 -- vi /etc/traefik/conf.d/myservice.yaml'
```

### Traefik-Saltbox (VM 101) - DO NOT MODIFY

**Location**: `/opt/traefik/` inside Saltbox VM
**Managed by**: Saltbox Ansible playbooks
**Mounts**: Docker bind mount from `/opt/traefik` → `/etc/traefik` in container

**Services using Traefik-Saltbox (10.10.10.100):**
- Plex (plex.htsn.io)
- Sonarr, Radarr, Lidarr
- SABnzbd, NZBGet, qBittorrent
- Overseerr, Tautulli, Organizr
- Jackett, NZBHydra2
- Authelia (SSO)
- All other Saltbox-managed containers

**View Saltbox Traefik (read-only):**
```bash
ssh pve 'qm guest exec 101 -- bash -c "docker exec traefik cat /etc/traefik/traefik.yml"'
```

### Adding a New Public Service - Complete Workflow

Follow these steps to deploy a new service and make it publicly accessible at `servicename.htsn.io`.

#### Step 0. Deploy Your Service

First, deploy your service on the appropriate host:

**Option A: Docker on docker-host (10.10.10.206)**
```bash
ssh hutson@10.10.10.206
sudo mkdir -p /opt/myservice
cat > /opt/myservice/docker-compose.yml << 'EOF'
version: "3.8"
services:
  myservice:
    image: myimage:latest
    ports:
      - "8080:80"
    restart: unless-stopped
EOF
cd /opt/myservice && sudo docker-compose up -d
```

**Option B: New LXC Container on PVE**
```bash
ssh pve 'pct create CTID local:vztmpl/ubuntu-22.04-standard_22.04-1_amd64.tar.zst \
  --hostname myservice --memory 2048 --cores 2 \
  --net0 name=eth0,bridge=vmbr0,ip=10.10.10.XXX/24,gw=10.10.10.1 \
  --rootfs local-zfs:8 --unprivileged 1 --start 1'
```

**Option C: New VM on PVE**
```bash
ssh pve 'qm create VMID --name myservice --memory 2048 --cores 2 \
  --net0 virtio,bridge=vmbr0 --scsihw virtio-scsi-pci'
```

#### Step 1. Create Traefik Config File

Use this template for new services on **Traefik-Primary (CT 202)**:

```yaml
# /etc/traefik/conf.d/myservice.yaml
http:
  routers:
    # HTTPS router
    myservice-secure:
      entryPoints:
        - websecure
      rule: "Host(`myservice.htsn.io`)"
      service: myservice
      tls:
        certResolver: cloudflare  # Use 'cloudflare' for proxied domains, 'letsencrypt' for DNS-only
      priority: 50

    # HTTP → HTTPS redirect
    myservice-redirect:
      entryPoints:
        - web
      rule: "Host(`myservice.htsn.io`)"
      middlewares:
        - myservice-https-redirect
      service: myservice
      priority: 50

  services:
    myservice:
      loadBalancer:
        servers:
          - url: "http://10.10.10.XXX:PORT"

  middlewares:
    myservice-https-redirect:
      redirectScheme:
        scheme: https
        permanent: true
```

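Since only the service name, backend IP, and port change between copies of the template, it can be stamped out with a small shell function (a hypothetical helper, not an existing repo script; the redirect router and middleware are omitted for brevity):

```bash
# Print a filled-in HTTPS router + service config for Traefik-Primary.
# Usage: gen_traefik_config NAME BACKEND_IP PORT
gen_traefik_config() {
  NAME="$1" IP="$2" PORT="$3"
  cat <<EOF
http:
  routers:
    ${NAME}-secure:
      entryPoints:
        - websecure
      rule: "Host(\`${NAME}.htsn.io\`)"
      service: ${NAME}
      tls:
        certResolver: cloudflare
      priority: 50
  services:
    ${NAME}:
      loadBalancer:
        servers:
          - url: "http://${IP}:${PORT}"
EOF
}

gen_traefik_config myservice 10.10.10.206 8080
```

Redirect the output into `/etc/traefik/conf.d/<name>.yaml` on CT 202 to deploy it.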
### SSL Certificates

Traefik has **two certificate resolvers** configured:

| Resolver | Use When | Challenge Type | Notes |
|----------|----------|----------------|-------|
| `letsencrypt` | Cloudflare DNS-only (gray cloud) | HTTP-01 | Requires port 80 reachable |
| `cloudflare` | Cloudflare Proxied (orange cloud) | DNS-01 | Works with Cloudflare proxy |

**⚠️ Important:** If the Cloudflare proxy is enabled (orange cloud), the HTTP challenge fails because Cloudflare redirects HTTP→HTTPS. Use the `cloudflare` resolver instead.

**Cloudflare API credentials** are configured in `/etc/systemd/system/traefik.service`:
```bash
Environment="CF_API_EMAIL=cloudflare@htsn.io"
Environment="CF_API_KEY=849ebefd163d2ccdec25e49b3e1b3fe2cdadc"
```

**Certificate storage:**
- HTTP challenge certs: `/etc/traefik/acme.json`
- DNS challenge certs: `/etc/traefik/acme-cf.json`

**Deploy the config:**
```bash
# Create file on CT 202
ssh pve 'pct exec 202 -- bash -c "cat > /etc/traefik/conf.d/myservice.yaml << '\''EOF'\''
<paste config here>
EOF"'

# Traefik auto-reloads (watches conf.d directory)
# Check logs:
ssh pve 'pct exec 202 -- tail -f /var/log/traefik/traefik.log'
```

#### Step 2. Add Cloudflare DNS Entry

**Cloudflare Credentials:**
- Email: `cloudflare@htsn.io`
- API Key: `849ebefd163d2ccdec25e49b3e1b3fe2cdadc`

**Manual method (via Cloudflare Dashboard):**
1. Go to https://dash.cloudflare.com/
2. Select the `htsn.io` domain
3. DNS → Add Record
4. Type: `A`, Name: `myservice`, IPv4: `70.237.94.174`, Proxied: ☑️

**Automated method (CLI script):**

Save this as `~/bin/add-cloudflare-dns.sh`:
```bash
#!/bin/bash
# Add DNS record to Cloudflare for htsn.io

SUBDOMAIN="$1"
CF_EMAIL="cloudflare@htsn.io"
CF_API_KEY="849ebefd163d2ccdec25e49b3e1b3fe2cdadc"
ZONE_ID="c0f5a80448c608af35d39aa820a5f3af"  # htsn.io zone
PUBLIC_IP="70.237.94.174"  # Update if IP changes: curl -s ifconfig.me

if [ -z "$SUBDOMAIN" ]; then
  echo "Usage: $0 <subdomain>"
  echo "Example: $0 myservice   # Creates myservice.htsn.io"
  exit 1
fi

curl -X POST "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/dns_records" \
  -H "X-Auth-Email: $CF_EMAIL" \
  -H "X-Auth-Key: $CF_API_KEY" \
  -H "Content-Type: application/json" \
  --data "{
    \"type\":\"A\",
    \"name\":\"$SUBDOMAIN\",
    \"content\":\"$PUBLIC_IP\",
    \"ttl\":1,
    \"proxied\":true
  }" | jq .
```

**Usage:**
```bash
chmod +x ~/bin/add-cloudflare-dns.sh
~/bin/add-cloudflare-dns.sh excalidraw   # Creates excalidraw.htsn.io
```

#### Step 3. Testing

```bash
# Check if DNS resolves
dig myservice.htsn.io

# Test HTTP redirect
curl -I http://myservice.htsn.io

# Test HTTPS
curl -I https://myservice.htsn.io

# Check Traefik dashboard (if enabled)
# Access: http://10.10.10.250:8080/dashboard/
```

#### Step 4. Update Documentation

After deploying, update these files:

1. **IP-ASSIGNMENTS.md** - Add to Services & Reverse Proxy Mapping table
2. **CLAUDE.md** - Add to "Services using Traefik-Primary" list (line ~495)

### Quick Reference - One-Liner Commands

```bash
# === DEPLOY SERVICE (example: myservice on docker-host port 8080) ===

# 1. Create Traefik config
ssh pve 'pct exec 202 -- bash -c "cat > /etc/traefik/conf.d/myservice.yaml << EOF
http:
  routers:
    myservice-secure:
      entryPoints: [websecure]
      rule: Host(\\\`myservice.htsn.io\\\`)
      service: myservice
      tls: {certResolver: cloudflare}  # proxied DNS record → DNS-01 challenge
  services:
    myservice:
      loadBalancer:
        servers:
          - url: http://10.10.10.206:8080
EOF"'

# 2. Add Cloudflare DNS
curl -s -X POST "https://api.cloudflare.com/client/v4/zones/c0f5a80448c608af35d39aa820a5f3af/dns_records" \
  -H "X-Auth-Email: cloudflare@htsn.io" \
  -H "X-Auth-Key: 849ebefd163d2ccdec25e49b3e1b3fe2cdadc" \
  -H "Content-Type: application/json" \
  --data '{"type":"A","name":"myservice","content":"70.237.94.174","proxied":true}'

# 3. Test (wait a few seconds for DNS propagation)
curl -I https://myservice.htsn.io
```

### Traefik Troubleshooting

```bash
# View Traefik logs (CT 202)
ssh pve 'pct exec 202 -- tail -f /var/log/traefik/traefik.log'

# Check if config is valid
ssh pve 'pct exec 202 -- cat /etc/traefik/conf.d/myservice.yaml'

# List all dynamic configs
ssh pve 'pct exec 202 -- ls -la /etc/traefik/conf.d/'

# Check certificate
ssh pve 'pct exec 202 -- cat /etc/traefik/acme.json | jq'

# Restart Traefik (if needed)
ssh pve 'pct exec 202 -- systemctl restart traefik'
```

### Certificate Management

**Let's Encrypt certificates** are automatically managed by Traefik.

**Certificate storage:**
- Traefik-Primary: `/etc/traefik/acme.json` on CT 202
- Traefik-Saltbox: `/opt/traefik/acme.json` on VM 101

**Certificate renewal:**
- Automatic via HTTP-01 challenge
- Traefik checks every 24h
- Renews 30 days before expiry

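The renewal rule above is easy to mirror when checking a cert by hand: given the days remaining until expiry, classify it the same way Traefik's ~30-day renewal window does (illustrative helper; Traefik applies this logic internally):

```bash
# Classify a certificate by days remaining, mirroring the
# renews-30-days-before-expiry behavior described above.
renewal_status() {
  if [ "$1" -le 0 ]; then
    echo "expired"
  elif [ "$1" -le 30 ]; then
    echo "renew-window"
  else
    echo "ok"
  fi
}

renewal_status 45   # ok
renewal_status 12   # renew-window
```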
**If certificates fail:**
```bash
# Check acme.json permissions (must be 600)
ssh pve 'pct exec 202 -- ls -la /etc/traefik/acme.json'

# Check Traefik can reach Let's Encrypt
ssh pve 'pct exec 202 -- curl -I https://acme-v02.api.letsencrypt.org/directory'

# Delete bad certificate (Traefik will re-request)
ssh pve 'pct exec 202 -- rm /etc/traefik/acme.json'
ssh pve 'pct exec 202 -- touch /etc/traefik/acme.json'
ssh pve 'pct exec 202 -- chmod 600 /etc/traefik/acme.json'
ssh pve 'pct exec 202 -- systemctl restart traefik'
```

### Docker Service with Traefik Labels (Alternative)

If deploying a service via Docker on `docker-host` (VM 206), you can use Traefik labels instead of config files:

```yaml
# docker-compose.yml
services:
  myservice:
    image: myimage:latest
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.myservice.rule=Host(`myservice.htsn.io`)"
      - "traefik.http.routers.myservice.entrypoints=websecure"
      - "traefik.http.routers.myservice.tls.certresolver=letsencrypt"
      - "traefik.http.services.myservice.loadbalancer.server.port=8080"
    networks:
      - traefik

networks:
  traefik:
    external: true
```

**Note**: This requires Traefik to have access to the Docker socket and to be on the same network.

## Cloudflare API Access

**Credentials** (stored in Saltbox config):
- Email: `cloudflare@htsn.io`
- API Key: `849ebefd163d2ccdec25e49b3e1b3fe2cdadc`
- Domain: `htsn.io`

**Retrieve from Saltbox:**
```bash
ssh pve 'qm guest exec 101 -- bash -c "cat /srv/git/saltbox/accounts.yml | grep -A2 cloudflare"'
```

**Cloudflare API Documentation:**
- API Docs: https://developers.cloudflare.com/api/
- DNS Records: https://developers.cloudflare.com/api/operations/dns-records-for-a-zone-create-dns-record

**Common API operations:**

```bash
# Set credentials
CF_EMAIL="cloudflare@htsn.io"
CF_API_KEY="849ebefd163d2ccdec25e49b3e1b3fe2cdadc"
ZONE_ID="c0f5a80448c608af35d39aa820a5f3af"

# List all DNS records
curl -X GET "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/dns_records" \
  -H "X-Auth-Email: $CF_EMAIL" \
  -H "X-Auth-Key: $CF_API_KEY" | jq

# Add A record
curl -X POST "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/dns_records" \
  -H "X-Auth-Email: $CF_EMAIL" \
  -H "X-Auth-Key: $CF_API_KEY" \
  -H "Content-Type: application/json" \
  --data '{"type":"A","name":"subdomain","content":"IP","proxied":true}'

# Delete record (requires its RECORD_ID, from the list call above)
curl -X DELETE "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/dns_records/$RECORD_ID" \
  -H "X-Auth-Email: $CF_EMAIL" \
  -H "X-Auth-Key: $CF_API_KEY"
```

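The DELETE call needs a `RECORD_ID`, which comes from the list-records response. A minimal sketch of pulling it out with coreutils only (the sample JSON below is illustrative - the ID and record are made up, and a real response has more fields):

```bash
# Trimmed, illustrative list-records response for one A record
RESPONSE='{"result":[{"id":"372e67954025e0ba6aaa6d586b9e0b59","type":"A","name":"myservice.htsn.io","content":"70.237.94.174"}],"success":true}'

# Extract the first record ID without jq (grep/cut only)
RECORD_ID=$(printf '%s' "$RESPONSE" | grep -o '"id":"[a-f0-9]*"' | head -n1 | cut -d'"' -f4)
echo "$RECORD_ID"   # 372e67954025e0ba6aaa6d586b9e0b59
```

With `jq` available, `jq -r '.result[0].id'` does the same thing more robustly.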
## Related Documentation

| File | Description |
|------|-------------|
| [EMC-ENCLOSURE.md](EMC-ENCLOSURE.md) | EMC storage enclosure (SES commands, LCC troubleshooting, maintenance) |
| [HOMEASSISTANT.md](HOMEASSISTANT.md) | Home Assistant API access, automations, integrations |
| [NETWORK.md](NETWORK.md) | Network bridges, VLANs, which bridge to use for new VMs |
| [IP-ASSIGNMENTS.md](IP-ASSIGNMENTS.md) | Complete IP address assignments for all devices and services |
| [SYNCTHING.md](SYNCTHING.md) | Syncthing setup, API access, device list, troubleshooting |
| [SHELL-ALIASES.md](SHELL-ALIASES.md) | ZSH aliases for Claude Code (`chomelab`, `ctrading`, etc.) |
| [configs/](configs/) | Symlinks to shared shell configs |

---

## Backlog

Future improvements and maintenance tasks:

| Priority | Task | Notes |
|----------|------|-------|
| Medium | **Re-IP all devices** | Current IP scheme is inconsistent. Plan: VMs 10.10.10.100-199, LXCs 10.10.10.200-249, Services 10.10.10.250-254 |
| Low | Install SSH on HomeAssistant | Currently only accessible via QEMU agent |
| Low | Set up SSH key for router | Currently requires expect/password |

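The planned re-IP scheme from the table can be expressed as a quick classifier, handy for auditing current assignments against the plan (a sketch only - `ip_class` is hypothetical, and the ranges are the proposed ones, not the current state):

```bash
# Classify a 10.10.10.x address under the planned re-IP scheme:
# VMs .100-.199, LXCs .200-.249, Services .250-.254
ip_class() {
  last=${1##*.}
  if   [ "$last" -ge 100 ] && [ "$last" -le 199 ]; then echo "VM"
  elif [ "$last" -ge 200 ] && [ "$last" -le 249 ]; then echo "LXC"
  elif [ "$last" -ge 250 ] && [ "$last" -le 254 ]; then echo "Service"
  else echo "unassigned"
  fi
}

ip_class 10.10.10.120   # VM
ip_class 10.10.10.250   # Service
```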
---

## Changelog

### 2024-12-20

**SSH Key Deployment - All Systems**
- Added SSH keys to ALL VMs and LXCs (13 total hosts now accessible via key)
- Updated `~/.ssh/config` with complete host aliases
- Fixed permissions: FindShyt LXC `.ssh` ownership, enabled PermitRootLogin on LXCs
- Hosts now accessible: pve, pve2, truenas, saltbox, lmdev1, docker-host, fs-dev, copyparty, gitea-vm, trading-vm, pihole, traefik, findshyt

**Documentation Updates**
- Rewrote SSH Access section with complete host table
- Added Password Auth section for router/Windows/HomeAssistant
- Added Backlog section with re-IP task

### 2024-12-19

**EMC Storage Enclosure - LCC B Failure**
- Diagnosed loud fan issue (speed code 5 → 4160 RPM)
- Root cause: Faulty LCC B controller causing false readings
- Resolution: Switched SAS cable to LCC A, fans now quiet (speed code 3 → 2670 RPM)
- Replacement ordered: EMC 303-108-000E ($14.95 eBay)
- Created [EMC-ENCLOSURE.md](EMC-ENCLOSURE.md) with full documentation

**SSH Key Consolidation**
- Renamed `~/.ssh/ai_trading_ed25519` → `~/.ssh/homelab`
- Updated `~/.ssh/config` on MacBook with all homelab hosts
- SSH key auth now works for: pve, pve2, docker-host, fs-dev, copyparty, lmdev1, gitea-vm, trading-vm
- No more sshpass needed for PVE servers

**QEMU Guest Agent Deployment**
- Installed on: docker-host (206), fs-dev (105), copyparty (201)
- All PVE VMs now have the agent except homeassistant (110)
- Can now use `qm guest exec` for remote commands

**VM Configuration Updates**
- docker-host: Fixed SSH key in cloud-init
- fs-dev: Fixed `.ssh` directory ownership (1000 → 1001)
- copyparty: Changed from DHCP to static IP (10.10.10.201)

**Documentation Updates**
- Updated CLAUDE.md SSH section (removed sshpass examples)
- Added QEMU Agent column to VM tables
- Added storage enclosure troubleshooting to runbooks

247 EMC-ENCLOSURE.md Normal file
@@ -0,0 +1,247 @@

# EMC Storage Enclosure Documentation

## Hardware Overview

| Component | Details |
|-----------|---------|
| **Model** | EMC ESES Viper DAE (KTN-STL3) |
| **Capacity** | 15x 3.5" SAS/SATA drive bays |
| **SES Device** | `/dev/sg15` (on TrueNAS) |
| **Connection** | SAS to LSI SAS2308 HBA (mpt2sas driver) |
| **Location** | Connected to PVE (10.10.10.120) via TrueNAS VM |

## Components

### LCC Controllers (Link Control Cards)
The enclosure has **dual LCC controllers** for redundancy:

| Controller | Slot | Status | Notes |
|------------|------|--------|-------|
| **LCC A** | Left | Working | Currently in use |
| **LCC B** | Right | Faulty | Causes high fan speed, SAS discovery failure |

**Replacement Part**: EMC 303-108-000E VIPER 6G SAS LCC (~$15 on eBay)

### Power Supplies
Two redundant PSUs with integrated fans.

### Fans
Multiple cooling fans controlled by enclosure firmware. Fan speeds are **automatically managed** based on temperature - manual override is not supported on EMC ESES enclosures.

**Fan Speed Codes**:
| Code | Description | RPM (approx) |
|------|-------------|--------------|
| 1 | Lowest | ~1500 |
| 2 | Low | ~2000 |
| 3 | Low-medium | ~2670 |
| 4 | Medium | ~3300 |
| 5 | High | ~4160 |
| 6 | Very high | ~4800 |
| 7 | Highest | ~5500+ |

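When reading `speed_code` values out of `sg_ses` output, the table above maps directly to a lookup helper (illustrative, not an existing script; RPM values are the approximations from the table):

```bash
# Translate an EMC ESES fan speed code (1-7) into its approximate RPM,
# per the Fan Speed Codes table above.
fan_rpm() {
  case "$1" in
    1) echo 1500 ;;
    2) echo 2000 ;;
    3) echo 2670 ;;
    4) echo 3300 ;;
    5) echo 4160 ;;
    6) echo 4800 ;;
    7) echo 5500 ;;
    *) echo "unknown" ;;
  esac
}

fan_rpm 5   # 4160
```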
## ZFS Pool Using This Enclosure

```
Pool:   vault
Size:   164TB raidz1
Drives: 13x HDD in raidz1 + special mirror + NVMe cache/log
Mount:  /mnt/vault on TrueNAS
```

## SES Commands Reference

All commands run from TrueNAS (VM 100):

```bash
# Check overall enclosure status
sg_ses -p 0x02 /dev/sg15

# Check fan speeds
sg_ses --index=coo,-1 --get=speed_code /dev/sg15

# Check temperatures
sg_ses -p 0x02 /dev/sg15 | grep -E "(Temperature|Cooling)"

# Check PSU status
sg_ses -p 0x02 /dev/sg15 | grep -A5 "Power supply"

# Check LCC controller status
sg_ses -p 0x02 /dev/sg15 | grep -A5 "Enclosure services controller"

# List all SES elements
sg_ses -p 0x07 /dev/sg15

# Identify enclosure (flash LEDs)
sg_ses --index=enc,0 --set=ident:1 /dev/sg15
```

### Running SES Commands via Proxmox

```bash
# From Mac (via SSH key auth)
ssh pve 'qm guest exec 100 -- bash -c "sg_ses -p 0x02 /dev/sg15"'

# Quick fan check
ssh pve 'qm guest exec 100 -- bash -c "sg_ses --index=coo,-1 --get=speed_code /dev/sg15"'

# Quick temp check
ssh pve 'qm guest exec 100 -- bash -c "sg_ses -p 0x02 /dev/sg15 | grep Temperature"'
```

## Troubleshooting

### Symptom: Fans Running Loud (Speed 5+)

**Possible Causes**:
1. **Faulty LCC controller** - Switch to other LCC
2. **High temperatures** - Check temp sensors
3. **PSU issue** - Check PSU status via SES
4. **Failed drive** - Check drive status LEDs

**Diagnosis Steps**:
```bash
# 1. Check current fan speed
ssh pve 'qm guest exec 100 -- bash -c "sg_ses --index=coo,-1 --get=speed_code /dev/sg15"'
# Normal: 1-3, High: 4-5, Critical: 6-7

# 2. Check temperatures
ssh pve 'qm guest exec 100 -- bash -c "sg_ses -p 0x02 /dev/sg15 | grep Temperature"'
# Normal: 25-40C, Warning: 45-50C, Critical: 55C+

# 3. Check for component failures
ssh pve 'qm guest exec 100 -- bash -c "sg_ses -p 0x02 /dev/sg15 | grep -i fail"'

# 4. If no obvious cause, try switching LCC
# Power down enclosure, move SAS cable to other LCC port
```

### Symptom: Drives Not Detected After Enclosure Power Cycle

**Possible Causes**:
1. Enclosure not fully initialized (wait for green LEDs to stop blinking)
2. Faulty LCC controller
3. SAS cable loose
4. HBA needs rescan

**Diagnosis Steps**:
```bash
# 1. Check SAS link status
cat /sys/class/sas_phy/*/negotiated_linkrate

# 2. Check for expanders (should show enclosure)
lsscsi -g | grep -i enclo

# 3. Force HBA rescan
echo "- - -" > /sys/class/scsi_host/host0/scan

# 4. If no expander, check SAS cable and try other LCC port
```

### Symptom: Pool Won't Import After Enclosure Maintenance

```bash
# 1. Wait for enclosure to fully initialize (1-2 minutes)

# 2. Rescan for devices
echo "- - -" > /sys/class/scsi_host/host0/scan

# 3. Import pool
zpool import vault

# 4. If read-only mount issues, reboot TrueNAS
ssh pve 'qm reboot 100'
```

## Maintenance Procedures

### Safe Shutdown for Enclosure Maintenance

```bash
# 1. Stop services using the pool
ssh pve 'qm guest exec 101 -- bash -c "docker stop \$(docker ps -q)"'

# 2. Shutdown TrueNAS (auto-exports ZFS pool)
ssh pve 'qm shutdown 100 --timeout 120'

# 3. Wait for TrueNAS to fully stop
ssh pve 'while qm status 100 | grep -q running; do sleep 5; done'

# 4. Power off enclosure
# (Physical switch or PDU)

# 5. Perform maintenance

# 6. Power on enclosure, wait for initialization (green LEDs solid)

# 7. Start TrueNAS
ssh pve 'qm start 100'

# 8. Verify pool imported
ssh pve 'qm guest exec 100 -- bash -c "zpool status vault"'
```

### Hot-Swap LCC Controller

LCCs can be hot-swapped while the enclosure is running:

1. Order replacement LCC (EMC 303-108-000E)
2. Move SAS cable to working LCC (if not already)
3. Wait for drives to come online via new LCC
4. Remove faulty LCC
5. Install replacement LCC
6. Optionally move SAS cable back to original port

## Incident Log

### 2024-12-19: LCC B Failure

**Symptoms**:
- Fans running at speed code 5 (~4160 RPM) - very loud
- After enclosure power cycle, drives not detected
- SAS link UP (4 PHYs at 6.0 Gbit) but no expander discovery

**Root Cause**:
LCC B controller malfunction causing:
- False temperature/error readings → high fan speed
- SAS expander not responding → drives not enumerated

**Resolution**:
1. Moved SAS cable from LCC B to LCC A
2. Drives immediately appeared
3. Fan speed dropped to code 3 (2670 RPM) - quiet
4. Imported vault pool, all data intact

**Replacement Ordered**:
- Part: EMC 303-108-000E VIPER 6G SAS LCC
- Source: eBay
- Price: $14.95 + free shipping

## LED Status Reference

### Drive LEDs
| LED | Meaning | Description |
|-----|---------|-------------|
| Solid Blue | Power | Drive has power |
| Blinking Blue | Activity | I/O in progress |
| Solid Amber | Fault | Drive failed |
| Blinking Amber | Identify | Drive being located |

### LCC LEDs
| LED | Meaning | Description |
|-----|---------|-------------|
| Solid Green | Link | SAS connection active |
| Blinking Green | Activity | Data transfer |
| Amber | Fault | LCC issue |

### PSU LEDs
| LED | Meaning | Description |
|-----|---------|-------------|
| Solid Green | OK | Power supply healthy |
| Off | No Power | No AC input |
| Amber | Fault | PSU failure |

## Related Documentation

- [CLAUDE.md](CLAUDE.md) - Main homelab documentation
- [IP-ASSIGNMENTS.md](IP-ASSIGNMENTS.md) - Network configuration
- TrueNAS Web UI: https://10.10.10.200

145 HOMEASSISTANT.md Normal file
@@ -0,0 +1,145 @@

# Home Assistant

## Overview

| Setting | Value |
|---------|-------|
| VM ID | 110 |
| Host | PVE (10.10.10.120) |
| IP Address | 10.10.10.210 (DHCP - should be static) |
| Port | 8123 |
| Web UI | http://10.10.10.210:8123 |
| OS | Home Assistant OS 16.3 |
| Version | 2025.11.3 (update available: 2025.12.3) |

## API Access

Home Assistant uses Long-Lived Access Tokens for API authentication.

### Getting an API Token

1. Go to http://10.10.10.210:8123
2. Click your profile (bottom left)
3. Scroll to "Long-Lived Access Tokens"
4. Click "Create Token"
5. Name it (e.g., "Claude Code")
6. Copy the token (only shown once!)

### API Configuration

```
API_URL: http://10.10.10.210:8123/api
API_TOKEN: eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiIwZThjZmJjMzVlNDA0NzYwOTMzMjg3MTQ5ZjkwOGU2NyIsImlhdCI6MTc2NTk5MjQ4OCwiZXhwIjoyMDgxMzUyNDg4fQ.r743tsb3E5NNlrwEEu9glkZdiI4j_3SKIT1n5PGUytY
```

### API Examples
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Set these variables
|
||||||
|
HA_URL="http://10.10.10.210:8123"
|
||||||
|
HA_TOKEN="your-token-here"
|
||||||
|
|
||||||
|
# Check API is working
|
||||||
|
curl -s -H "Authorization: Bearer $HA_TOKEN" "$HA_URL/api/"
|
||||||
|
|
||||||
|
# Get all states
|
||||||
|
curl -s -H "Authorization: Bearer $HA_TOKEN" "$HA_URL/api/states" | jq
|
||||||
|
|
||||||
|
# Get specific entity state
|
||||||
|
curl -s -H "Authorization: Bearer $HA_TOKEN" "$HA_URL/api/states/light.living_room" | jq
|
||||||
|
|
||||||
|
# Turn on a light
|
||||||
|
curl -X POST -H "Authorization: Bearer $HA_TOKEN" \
|
||||||
|
-H "Content-Type: application/json" \
|
||||||
|
-d '{"entity_id": "light.living_room"}' \
|
||||||
|
"$HA_URL/api/services/light/turn_on"
|
||||||
|
|
||||||
|
# Turn off a light
|
||||||
|
curl -X POST -H "Authorization: Bearer $HA_TOKEN" \
|
||||||
|
-H "Content-Type: application/json" \
|
||||||
|
-d '{"entity_id": "light.living_room"}' \
|
||||||
|
"$HA_URL/api/services/light/turn_off"
|
||||||
|
|
||||||
|
# Call any service
|
||||||
|
curl -X POST -H "Authorization: Bearer $HA_TOKEN" \
|
||||||
|
-H "Content-Type: application/json" \
|
||||||
|
-d '{"entity_id": "switch.my_switch"}' \
|
||||||
|
"$HA_URL/api/services/switch/toggle"
|
||||||
|
```
|
||||||
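The per-service curl calls above repeat the same headers and body; a small wrapper can derive the service domain from the entity_id. This is a sketch, not a script from this repo - the `ha_service` name and `DRY_RUN` flag are hypothetical:

```shell
# Hypothetical wrapper: derives the service domain from the entity_id,
# so "light.living_room" automatically targets /api/services/light/...
ha_service() {
  local action="$1" entity="$2"
  local domain="${entity%%.*}"          # "light" from "light.living_room"
  local url="$HA_URL/api/services/$domain/$action"
  if [ -n "$DRY_RUN" ]; then            # dry-run mode: print, don't send
    echo "POST $url {\"entity_id\": \"$entity\"}"
    return 0
  fi
  curl -s -X POST -H "Authorization: Bearer $HA_TOKEN" \
    -H "Content-Type: application/json" \
    -d "{\"entity_id\": \"$entity\"}" "$url"
}

# Dry run (no network needed) shows the request that would be sent:
HA_URL="http://10.10.10.210:8123" DRY_RUN=1 ha_service turn_on light.living_room
```

With a real token exported, `ha_service toggle switch.my_switch` replaces the longer curl invocations above.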
## Common Tasks

### List All Entities
```bash
curl -s -H "Authorization: Bearer $HA_TOKEN" "$HA_URL/api/states" | jq '.[].entity_id'
```

### List Entities by Domain
```bash
# All lights
curl -s -H "Authorization: Bearer $HA_TOKEN" "$HA_URL/api/states" | jq '[.[] | select(.entity_id | startswith("light."))]'

# All switches
curl -s -H "Authorization: Bearer $HA_TOKEN" "$HA_URL/api/states" | jq '[.[] | select(.entity_id | startswith("switch."))]'

# All sensors
curl -s -H "Authorization: Bearer $HA_TOKEN" "$HA_URL/api/states" | jq '[.[] | select(.entity_id | startswith("sensor."))]'
```
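The same prefix filtering can be reduced to a per-domain tally, which is how counts like the Device Summary in this document can be regenerated. A self-contained sketch - the sample entity ids below are made-up stand-ins; in practice they come from `/api/states` as shown above:

```shell
# Count entities per domain (the part of the entity_id before the first dot).
# Sample ids are hypothetical stand-ins for the /api/states output.
entities="light.kitchen light.office_lamp sensor.temperature switch.fan"
for e in $entities; do
  echo "${e%%.*}"          # strip everything after the first dot
done | sort | uniq -c | sort -rn
```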
### Get Entity History
```bash
# Last 24 hours for an entity
curl -s -H "Authorization: Bearer $HA_TOKEN" \
  "$HA_URL/api/history/period?filter_entity_id=sensor.temperature" | jq
```
## Device Summary

**265 total entities**

| Domain | Count | Examples |
|--------|-------|----------|
| scene | 87 | Lighting scenes |
| light | 41 | Kitchen, Living room, Bedroom, Office, Cabinet, etc. |
| switch | 36 | Automations, Sonos controls, Motion sensors |
| sensor | 28 | Various sensors |
| number | 21 | Settings/controls |
| event | 17 | Event triggers |
| binary_sensor | 13 | Motion, door sensors |
| media_player | 8 | Sonos speakers (Bedroom, Living Room, Kitchen, Console) |

### Lights by Room
- **Kitchen**: Kitchen light
- **Living Room**: Living room, Living Room Lamp, TV Bias
- **Bedroom**: Bedroom, Bedside Lamp 1 & 2, Dresser
- **Office**: Office, Office Floor Lamp, Office Lamp
- **Guest Room**: Guest Bed Left, Guest Lamp Right
- **Other**: Cabinet 1 & 2, Pantry, Bathroom, Front Porch, etc.

### Sonos Speakers
- Bedroom (with surround)
- Living Room (with surround)
- Kitchen
- Console

### Motion Sensors
- Kitchen Motion
- Office Sensor

## Integrations

- **Philips Hue** - Lights
- **Sonos** - Speakers
- **Motion Sensors** - Various locations

## Automations

TODO: Document key automations

## TODO

- [ ] Set static IP (currently DHCP at .210, should be .110)
- [ ] Add API token to this document
- [ ] Document installed integrations
- [ ] Document automations
- [ ] Set up Traefik reverse proxy (ha.htsn.io)
330
INFRASTRUCTURE.md
Normal file
@@ -0,0 +1,330 @@
# Homelab Infrastructure Documentation

## Network Topology

```
                     ┌─────────────────┐
                     │    Internet     │
                     └────────┬────────┘
                              │
                     ┌────────▼────────┐
                     │ Router/Firewall │
                     │   10.10.10.1    │
                     └────────┬────────┘
                              │
         ┌────────────────────┼────────────────────┐
         │                    │                    │
┌────────▼────────┐  ┌────────▼────────┐  ┌────────▼────────┐
│   Main Switch   │  │  Storage VLAN   │  │    Tailscale    │
│   vmbr0/vmbr2   │  │      vmbr3      │  │   100.x.x.x/8   │
│  10.10.10.0/24  │  │  (Jumbo 9000)   │  │                 │
└────────┬────────┘  └────────┬────────┘  └─────────────────┘
         │                    │
         │                    └─────┐
     ┌───┴──────┬──────────┐        │
     │          │          │        │
┌────▼───┐ ┌────▼───┐ ┌────▼───┐    │
│  PVE   │ │  PVE2  │ │ Other  │    │
│  .120  │ │  .102  │ │ Devices│    │
└────┬───┘ └────┬───┘ └────────┘    │
     │          │                   │
     └──────────┴───────────────────┘
                │
        ┌───────▼───────┐
        │    TrueNAS    │
        │ (Storage via  │
        │   HBA/NVMe)   │
        └───────────────┘
```
## IP Address Assignments

### Management Network (10.10.10.0/24)

| IP Address | Hostname | Description |
|------------|----------|-------------|
| 10.10.10.1 | router | Gateway/Firewall |
| 10.10.10.102 | pve2 | Proxmox Server 2 |
| 10.10.10.120 | pve | Proxmox Server 1 (Primary) |
| 10.10.10.123 | mac-mini | Mac Mini (Syncthing node) |
| 10.10.10.150 | windows-pc | Windows PC (Syncthing node) |
| 10.10.10.147 | macbook | MacBook Pro (Syncthing node) |
| 10.10.10.200 | truenas | TrueNAS (Storage/Syncthing hub) |
| 10.10.10.220 | gitea-vm | Git Server |
| 10.10.10.221 | trading-vm | AI Trading Platform |

### Tailscale Network (100.x.x.x)

| IP Address | Hostname | Description |
|------------|----------|-------------|
| 100.88.161.110 | macbook | MacBook |
| 100.106.175.37 | phone | Mobile Device |
| 100.108.89.58 | mac-mini | Mac Mini |

---
## Server Hardware

### PVE (10.10.10.120) - Primary Virtualization Host

| Component | Specification |
|-----------|---------------|
| **CPU** | AMD Ryzen Threadripper PRO 3975WX (32C/64T, 280W TDP) |
| **RAM** | 128 GB DDR4 ECC |
| **Boot** | Samsung 870 QVO 4TB (mirrored) |
| **NVMe Pool 1** | 2x Sabrent Rocket Q NVMe (nvme-mirror1, 3.6TB) |
| **NVMe Pool 2** | 2x Kingston SFYRD 2TB (nvme-mirror2, 1.8TB) |
| **GPU 1** | NVIDIA Quadro P2000 (75W) - Plex transcoding |
| **GPU 2** | NVIDIA TITAN RTX (280W) - AI workloads |
| **HBA** | LSI SAS2308 - Passed to TrueNAS |
| **NVMe Controller** | Samsung PM9A1 - Passed to TrueNAS |

### PVE2 (10.10.10.102) - Secondary Virtualization Host

| Component | Specification |
|-----------|---------------|
| **CPU** | AMD Ryzen Threadripper PRO 3975WX (32C/64T, 280W TDP) |
| **RAM** | 128 GB DDR4 ECC |
| **NVMe Pool** | 2x NVMe (nvme-mirror3) |
| **HDD Pool** | 2x WD Red 6TB (local-zfs2, mirrored) |
| **GPU** | NVIDIA RTX A6000 (300W) - AI Trading |

---
## Virtual Machines

### PVE (10.10.10.120)

| VMID | Name | vCPUs | RAM | Storage | Purpose | Passthrough |
|------|------|-------|-----|---------|---------|-------------|
| 100 | truenas | 8 | 32GB | rpool | NAS/Storage | LSI SAS2308 HBA, Samsung NVMe |
| 101 | saltbox | 16 | 16GB | rpool/nvme-mirror1/2 | Media automation | TITAN RTX |
| 105 | fs-dev | 10 | 8GB | nvme-mirror1 | Development | - |
| 110 | homeassistant | 2 | 2GB | nvme-mirror2 | Home automation | - |
| 111 | lmdev1 | 8 | 32GB | nvme-mirror1 | AI/LLM development | TITAN RTX |
| 201 | copyparty | 2 | 2GB | nvme-mirror1 | File sharing | - |
| 206 | docker-host | 2 | 4GB | rpool | Docker services | - |

### PVE2 (10.10.10.102)

| VMID | Name | vCPUs | RAM | Storage | Purpose | Passthrough |
|------|------|-------|-----|---------|---------|-------------|
| 300 | gitea-vm | 2 | 4GB | nvme-mirror3 | Git server | - |
| 301 | trading-vm | 16 | 32GB | nvme-mirror3 | AI trading platform | RTX A6000 |

---
## LXC Containers

### PVE (10.10.10.120)

| VMID | Name | Purpose | Status |
|------|------|---------|--------|
| 200 | pihole | DNS/Ad blocking | Running |
| 202 | traefik | Reverse proxy | Running |
| 205 | findshyt | Custom application | Running |
| 500 | dev1 | Development | Stopped |

---
## Storage Architecture

```
PVE (10.10.10.120)
├── rpool (Samsung 870 QVO 4TB mirror)
│   ├── Proxmox system
│   ├── VM 100 (truenas) boot
│   ├── VM 101 (saltbox) boot
│   └── VM 206 (docker-host)
│
├── nvme-mirror1 (Sabrent Rocket Q mirror, 3.6TB)
│   ├── VM 101 (saltbox) data
│   ├── VM 105 (fs-dev)
│   ├── VM 111 (lmdev1)
│   └── VM 201 (copyparty)
│
└── nvme-mirror2 (Kingston SFYRD mirror, 1.8TB)
    ├── VM 101 (saltbox) data
    └── VM 110 (homeassistant)

PVE2 (10.10.10.102)
├── nvme-mirror3 (NVMe mirror)
│   ├── VM 300 (gitea-vm)
│   └── VM 301 (trading-vm)
│
└── local-zfs2 (WD Red 6TB mirror)
    └── Backup/archive storage

TrueNAS (VM 100 on PVE)
├── HBA Passthrough (LSI SAS2308)
│   └── [Physical drives managed by TrueNAS]
│
└── NVMe Passthrough (Samsung PM9A1)
    └── [NVMe drives managed by TrueNAS]
```

---
## Services Map

```
┌────────────────────────────────────────────────────────────┐
│                      EXTERNAL ACCESS                       │
├────────────────────────────────────────────────────────────┤
│ Tailscale VPN  ──►  All services accessible via 100.x.x.x  │
│ Traefik (CT 202) ──► Reverse proxy for web services        │
└────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌────────────────────────────────────────────────────────────┐
│                       CORE SERVICES                        │
├────────────────────────────────────────────────────────────┤
│ PiHole (CT 200) ──► DNS + Ad blocking                      │
│ TrueNAS (VM 100) ──► NAS, Syncthing, Storage               │
│ Gitea (VM 300) ──► Git repository hosting                  │
│ Home Assistant (VM 110) ──► Home automation                │
└────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌────────────────────────────────────────────────────────────┐
│                       MEDIA SERVICES                       │
├────────────────────────────────────────────────────────────┤
│ Saltbox (VM 101) ──► Plex, *arr stack, media automation    │
│ CopyParty (VM 201) ──► File sharing                        │
└────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌────────────────────────────────────────────────────────────┐
│                       DEVELOPMENT/AI                       │
├────────────────────────────────────────────────────────────┤
│ Trading VM (VM 301) ──► AI trading platform (RTX A6000)    │
│ LMDev1 (VM 111) ──► LLM development (TITAN RTX)            │
│ FS-Dev (VM 105) ──► General development                    │
│ Docker Host (VM 206) ──► Containerized services            │
└────────────────────────────────────────────────────────────┘
```

---
## Syncthing Topology

```
                    ┌─────────────────┐
                    │     TrueNAS     │
                    │  (Hub/Server)   │
                    │   Port 20910    │
                    └────────┬────────┘
                             │
         ┌───────────────────┼───────────────────┐
         │                   │                   │
    ┌────▼────┐         ┌────▼────┐         ┌────▼────┐
    │ MacBook │         │ Mac Mini│         │ Windows │
    │  .147   │         │  .123   │         │ PC .150 │
    └─────────┘         └─────────┘         └─────────┘

Synced Folders:
├── antigravity (310MB)
├── bin (23KB)
├── claude-code (257MB)
├── claude-desktop (784MB)
├── config (436KB)
├── cursor (459MB)
├── desktop (7.2GB)
├── documents (11GB)
├── dotconfig (212MB)
├── downloads (38GB)
├── movies (334MB)
├── music (606KB)
├── notes (73KB)
├── pictures (259MB)
└── projects (3.1GB)
```

---
## Power Consumption

### Estimated Power Draw

| Component | Idle | Load | Notes |
|-----------|------|------|-------|
| **PVE CPU** | 50W | 280W | TR PRO 3975WX |
| **PVE2 CPU** | 50W | 280W | TR PRO 3975WX |
| **TITAN RTX** | 20W | 280W | Passthrough to saltbox/lmdev1 |
| **RTX A6000** | 25W | 300W | Passthrough to trading-vm |
| **Quadro P2000** | 10W | 75W | Plex transcoding |
| **Storage (per server)** | 30W | 50W | NVMe + SSD mirrors |
| **Base system (each)** | 50W | 60W | Motherboard, RAM, fans |

### Total Estimates
- **Idle**: ~400-500W combined
- **Moderate load**: ~700-900W combined
- **Full load**: ~1200-1400W combined

### Power Optimizations Applied
1. KSMD disabled on both hosts (saved ~10W)
2. Syncthing rescan intervals increased (saved ~60-80W from TrueNAS CPU)
3. CPU governor optimization (saved ~60-120W)
   - PVE: `powersave` + `balance_power` EPP (amd-pstate-epp)
   - PVE2: `schedutil` (acpi-cpufreq)
4. ksmtuned service disabled on both hosts (saved ~2-5W)
5. HDD spindown on PVE2 - 30 min timeout (saved ~10-16W)
   - local-zfs2 pool (2x WD Red 6TB) essentially empty

**Total estimated savings: ~142-231W**
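As a cross-check, the per-component figures in the table can be summed directly with shell arithmetic (storage and base-system rows count once per server). The idle sum undershoots the documented range because PSU losses, drives behind the HBA, and peripherals are excluded, while the full-load sum overshoots it slightly since all components rarely peak at once:

```shell
# Component-level sums from the table above (watts).
# CPUs: PVE + PVE2; GPUs: TITAN RTX + RTX A6000 + Quadro P2000;
# storage and base system counted twice (one per server).
idle=$((50 + 50 + 20 + 25 + 10 + 2*30 + 2*50))
load=$((280 + 280 + 280 + 300 + 75 + 2*50 + 2*60))
echo "components only: idle ~${idle}W, full load ~${load}W"
# → components only: idle ~315W, full load ~1435W
```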
---

## SSH Access

### Credentials

| Host | IP Address | Username | Password | Notes |
|------|------------|----------|----------|-------|
| Hutson-PC | 10.10.10.150 | claude | GrilledCh33s3# | Windows PC |
| MacBook | 10.10.10.147 | hutson | GrilledCh33s3# | MacBook Pro |
| TrueNAS | 10.10.10.200 | truenas_admin | GrilledCh33s3# | SSH key configured |

### SSH Keys

The Mac Mini has an SSH key configured at `~/.ssh/id_ed25519` for passwordless authentication to Proxmox hosts and other infrastructure.

For Proxmox servers (PVE and PVE2), SSH access is configured in `~/.ssh/config`:
```
Host pve
    HostName 10.10.10.120
    User root
    IdentityFile ~/.ssh/ai_trading_ed25519

Host pve2
    HostName 10.10.10.102
    User root
    IdentityFile ~/.ssh/ai_trading_ed25519
```

---

## Credentials Management

Sensitive credentials are stored in `/Users/hutson/Projects/homelab/.env` for use with infrastructure management scripts and automation.

This file contains:
- Service passwords
- API keys
- Database credentials
- Other sensitive configuration values

**Note**: The `.env` file is git-ignored and should never be committed to version control.

---

## Configuration Backups

Configuration files are backed up in the `/Users/hutson/Projects/homelab/configs/` directory.

### Current Backups

| File | Description |
|------|-------------|
| ghostty.conf | Ghostty terminal emulator configuration |

This directory serves as a centralized location for storing configuration backups from various systems and applications in the homelab environment.
139
IP-ASSIGNMENTS.md
Normal file
@@ -0,0 +1,139 @@
# IP Address Assignments

This document tracks all IP addresses in the homelab infrastructure.

## Network Overview

| Network | Range | Purpose |
|---------|-------|---------|
| Management VLAN | 10.10.10.0/24 | Primary network for all devices |
| Storage VLAN | 10.10.20.0/24 | NFS/iSCSI storage traffic |
| Tailscale | 100.x.x.x | VPN overlay network |

## Infrastructure Devices

| IP Address | Device | Type | Notes |
|------------|--------|------|-------|
| 10.10.10.1 | UniFi UCG-Fiber | Router | Gateway for all traffic |
| 10.10.10.120 | PVE | Proxmox Host | Primary server (Threadripper PRO 3975WX) |
| 10.10.10.102 | PVE2 | Proxmox Host | Secondary server (Threadripper PRO 3975WX) |

## Virtual Machines - PVE (10.10.10.120)

| VMID | Name | IP Address | Purpose | Status |
|------|------|------------|---------|--------|
| 100 | truenas | 10.10.10.200 | NAS, central Syncthing hub | Running |
| 101 | saltbox | 10.10.10.100 | Media automation, Plex, *arr apps | Running |
| 105 | fs-dev | 10.10.10.5 | Development environment | Running |
| 110 | homeassistant | 10.10.10.110 | Home automation | Running |
| 111 | lmdev1 | 10.10.10.111 | AI/LLM development (TITAN RTX) | Running |
| 201 | copyparty | 10.10.10.201 | File sharing | Running |
| 206 | docker-host | 10.10.10.206 | Docker services (Excalidraw, etc.) | Running |

## Containers (LXC) - PVE (10.10.10.120)

| CTID | Name | IP Address | Purpose | Status |
|------|------|------------|---------|--------|
| 200 | pihole | 10.10.10.10 | DNS/Ad blocking | Running |
| 202 | traefik | 10.10.10.250 | Reverse proxy (Traefik-Primary) | Running |
| 205 | findshyt | 10.10.10.8 | Custom app | Running |
| 500 | dev1 | DHCP | Development container | Stopped |

## Virtual Machines - PVE2 (10.10.10.102)

| VMID | Name | IP Address | Purpose | Status |
|------|------|------------|---------|--------|
| 300 | gitea-vm | 10.10.10.220 | Git server | Running |
| 301 | trading-vm | 10.10.10.221 | AI trading platform (RTX A6000) | Running |

## Workstations & Personal Devices

| IP Address | Tailscale IP | Device | User | Notes |
|------------|--------------|--------|------|-------|
| 10.10.10.147 | 100.88.161.1 | MacBook Pro | hutson | Laptop |
| 10.10.10.148 | 100.108.89.58 | Mac Mini | hutson | Persistent Claude sessions |
| 10.10.10.150 | 100.120.97.76 | Hutson-PC (Windows) | claude/micro | Windows workstation |
| 10.10.10.54 | - | Android Phone | hutson | Syncthing mobile |

## Services & Reverse Proxy Mapping

| Service | Domain | Backend IP:Port | Traefik Instance |
|---------|--------|-----------------|------------------|
| Traefik-Primary | - | 10.10.10.250 | Self (CT 202) |
| Traefik-Saltbox | - | 10.10.10.100 | Self (VM 101) |
| FindShyt | findshyt.htsn.io | 10.10.10.8:3000 | Traefik-Primary |
| Gitea | git.htsn.io | 10.10.10.220:3000 | Traefik-Primary |
| Home Assistant | ha.htsn.io | 10.10.10.110:8123 | Traefik-Primary |
| TrueNAS | nas.htsn.io | 10.10.10.200 | Traefik-Primary |
| Proxmox | pve.htsn.io | 10.10.10.120:8006 | Traefik-Primary |
| CopyParty | cp.htsn.io | 10.10.10.201:3923 | Traefik-Primary |
| LMDev | lmdev.htsn.io | 10.10.10.111 | Traefik-Primary |
| Excalidraw | excalidraw.htsn.io | 10.10.10.206:8080 | Traefik-Primary |
| Plex | plex.htsn.io | 10.10.10.100:32400 | Traefik-Saltbox |
| Sonarr | sonarr.htsn.io | 10.10.10.100:8989 | Traefik-Saltbox |
| Radarr | radarr.htsn.io | 10.10.10.100:7878 | Traefik-Saltbox |
## Reserved/Available IPs

### Currently Used (10.10.10.x)
- .1 - Router (gateway)
- .5 - fs-dev
- .8 - FindShyt
- .10 - PiHole (DNS)
- .54 - Android Phone
- .100 - Saltbox (Traefik-Saltbox)
- .102 - PVE2
- .110 - Home Assistant
- .111 - LMDev1
- .120 - PVE
- .147 - MacBook Pro
- .148 - Mac Mini
- .150 - Windows PC
- .200 - TrueNAS
- .201 - CopyParty
- .206 - Docker-host
- .220 - Gitea
- .221 - Trading VM
- .250 - Traefik-Primary

### Available Ranges
- 10.10.10.2 - 10.10.10.4 (3 IPs)
- 10.10.10.6 - 10.10.10.7 (2 IPs)
- 10.10.10.9 (1 IP)
- 10.10.10.11 - 10.10.10.53 (43 IPs)
- 10.10.10.55 - 10.10.10.99 (45 IPs)
- 10.10.10.101 (1 IP)
- 10.10.10.103 - 10.10.10.109 (7 IPs)
- 10.10.10.112 - 10.10.10.119 (8 IPs)
- 10.10.10.121 - 10.10.10.146 (26 IPs)
- 10.10.10.149 (1 IP)
- 10.10.10.151 - 10.10.10.199 (49 IPs)
- 10.10.10.202 - 10.10.10.205 (4 IPs)
- 10.10.10.207 - 10.10.10.219 (13 IPs)
- 10.10.10.222 - 10.10.10.249 (28 IPs)
- 10.10.10.251 - 10.10.10.254 (4 IPs)
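The available ranges above can be regenerated from the used-IP list so the two sections cannot drift apart. A sketch - the `free_ranges` helper is hypothetical, not a script in this repo:

```shell
# Hypothetical helper: print the gaps between used host numbers in
# 10.10.10.0/24. The used list must be sorted ascending.
free_ranges() {
  local used="1 5 8 10 54 100 102 110 111 120 147 148 150 200 201 206 220 221 250"
  local prev=0 n gap_start gap_end
  for n in $used 255; do    # 255 caps the scan at .254 (broadcast excluded)
    gap_start=$((prev + 1)); gap_end=$((n - 1))
    if [ "$gap_start" -le "$gap_end" ]; then
      echo "10.10.10.$gap_start - 10.10.10.$gap_end ($((gap_end - gap_start + 1)) IPs)"
    fi
    prev=$n
  done
}
free_ranges
```

Its output reproduces the fifteen ranges listed above, starting with `10.10.10.2 - 10.10.10.4 (3 IPs)`.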
## Docker Host Services (10.10.10.206)

| Service | Port | Purpose |
|---------|------|---------|
| Excalidraw | 8080 | Whiteboard/diagramming (excalidraw.htsn.io) |
| Portainer CE | 9000, 9443 | Local Docker management UI |
| Portainer Agent | 9001 | Remote management from other Portainer |
| Gotenberg | 3000 | PDF generation API |

## Syncthing API Endpoints

| Device | IP | Port | API Key |
|--------|-----|------|---------|
| Mac Mini | 127.0.0.1 | 8384 | oSQSrPnMnrEXuHqjWrRdrvq3TSXesAT5 |
| MacBook | 127.0.0.1 (via SSH) | 8384 | qYkNdVLwy9qZZZ6MqnJr7tHX7KKdxGMJ |
| Android Phone | 10.10.10.54 | 8384 | Xxz3jDT4akUJe6psfwZsbZwG2LhfZuDM |
| TrueNAS | 10.10.10.200 | 8384 | (check TrueNAS config) |

## Notes

- **MTU 9000** (jumbo frames) enabled on storage networks
- **Tailscale** provides VPN access from anywhere
- **DNS** handled by PiHole at 10.10.10.10
- All new services should use **Traefik-Primary (10.10.10.250)** unless they're Saltbox services
226
NETWORK.md
Normal file
@@ -0,0 +1,226 @@
# Network Architecture

## Network Ranges

| Network | Range | Purpose | Gateway |
|---------|-------|---------|---------|
| LAN | 10.10.10.0/24 | Primary network, management, general access | 10.10.10.1 (UniFi Router) |
| Storage/Internal | 10.10.20.0/24 | Inter-VM traffic, NFS/iSCSI, no external access | 10.10.20.1 (vmbr3) |
| Tailscale | 100.x.x.x | VPN overlay for remote access | N/A |

## PVE (10.10.10.120) - Network Bridges

### Physical NICs

| Interface | Speed | Type | MAC Address | Connected To |
|-----------|-------|------|-------------|--------------|
| enp1s0 | 1 Gbps | Onboard NIC | e0:4f:43:e6:41:6c | Switch → UniFi eth5 |
| enp35s0f0 | 10 Gbps | Intel X550 Port 0 | b4:96:91:39:86:98 | Switch → UniFi eth5 |
| enp35s0f1 | 10 Gbps | Intel X550 Port 1 | b4:96:91:39:86:99 | Switch → UniFi eth5 |

**Note:** All three NICs connect through a switch to the UniFi Gateway's 10Gb SFP+ port (eth5). No direct firewall connection.

### Bridge Configuration

#### vmbr0 - Management Bridge (1Gb)
- **Physical NIC**: enp1s0 (1 Gbps onboard)
- **IP**: 10.10.10.120/24
- **Gateway**: 10.10.10.1
- **MTU**: 9000
- **Purpose**: General VM/CT networking, management access
- **Use for**: Most VMs and containers that need basic internet access

**VMs/CTs on vmbr0:**
| VMID | Name | IP |
|------|------|-----|
| 105 | fs-dev | 10.10.10.5 |
| 110 | homeassistant | 10.10.10.110 |
| 201 | copyparty | DHCP |
| 206 | docker-host | 10.10.10.206 |
| 200 | pihole (CT) | 10.10.10.10 |
| 205 | findshyt (CT) | 10.10.10.205 |

---

#### vmbr1 - High-Speed LXC Bridge (10Gb)
- **Physical NIC**: enp35s0f0 (10 Gbps Intel X550)
- **IP**: 10.10.10.121/24
- **Gateway**: 10.10.10.1
- **MTU**: 9000
- **Purpose**: High-bandwidth LXC containers and VMs
- **Use for**: Containers/VMs that need high throughput to network

**VMs/CTs on vmbr1:**
| VMID | Name | IP |
|------|------|-----|
| 111 | lmdev1 | 10.10.10.111 |

---

#### vmbr2 - High-Speed VM Bridge (10Gb)
- **Physical NIC**: enp35s0f1 (10 Gbps Intel X550)
- **IP**: 10.10.10.122/24
- **Gateway**: (none configured)
- **MTU**: 9000
- **Purpose**: High-bandwidth VMs, storage traffic
- **Use for**: VMs that need high throughput (TrueNAS, Saltbox)

**VMs/CTs on vmbr2:**
| VMID | Name | IP |
|------|------|-----|
| 100 | truenas | 10.10.10.200 |
| 101 | saltbox | 10.10.10.100 |
| 202 | traefik (CT) | 10.10.10.250 |

---

#### vmbr3 - Internal-Only Bridge (Virtual)
- **Physical NIC**: None (isolated virtual network)
- **IP**: 10.10.20.1/24
- **Gateway**: N/A (no external routing)
- **MTU**: 9000
- **Purpose**: Inter-VM communication without external access
- **Use for**: Storage traffic (NFS/iSCSI), internal APIs, secure VM-to-VM

**VMs with secondary interface on vmbr3:**
| VMID | Name | Internal IP | Notes |
|------|------|-------------|-------|
| 100 | truenas | (check TrueNAS config) | NFS/iSCSI server |
| 101 | saltbox | (check VM config) | Media storage access |
| 111 | lmdev1 | (check VM config) | AI model storage |
| 201 | copyparty | 10.10.20.201 | Confirmed via cloud-init |

---

## PVE2 (10.10.10.102) - Network Bridges

### Physical NICs

| Interface | Speed | Type | MAC Address | Connected To |
|-----------|-------|------|-------------|--------------|
| nic0 | Unknown | Unused | e0:4f:43:e6:1b:e3 | Not connected |
| nic1 | 10 Gbps | Primary NIC | a0:36:9f:26:b9:bc | **Direct to UCG-Fiber (10Gb negotiated)** |

**Note:** PVE2 connects directly to the UCG-Fiber. Link negotiates at 10Gb.

### Bridge Configuration

#### vmbr0 - Single Bridge (10Gb)
- **Physical NIC**: nic1 (10 Gbps)
- **IP**: 10.10.10.102/24
- **Gateway**: 10.10.10.1
- **Purpose**: All VMs on PVE2

**VMs on vmbr0:**
| VMID | Name | IP |
|------|------|-----|
| 300 | gitea-vm | 10.10.10.220 |
| 301 | trading-vm | 10.10.10.221 |

---

## Which Bridge to Use?

| Scenario | Bridge | Reason |
|----------|--------|--------|
| General VM/CT | vmbr0 | Standard networking, 1Gb is sufficient |
| High-bandwidth VM (media, AI) | vmbr1 or vmbr2 | 10Gb for large file transfers |
| Storage-heavy VM (NAS access) | vmbr2 + vmbr3 | 10Gb external + internal storage network |
| Isolated internal service | vmbr3 only | No external access, secure |
| VM needing both external + internal | vmbr0/1/2 + vmbr3 | Dual-homed configuration |
## Traffic Flow

```
Internet
   │
   ▼
┌─────────────────────────────────────────────────────────────┐
│                    UCG-Fiber (10.10.10.1)                   │
│                                                             │
│  eth5 (10Gb SFP+)               switch0 (eth0-eth4, 10Gb)   │
│        │                               │                    │
└────────┼───────────────────────────────┼────────────────────┘
         │                               │
         ▼                               │
┌─────────────────────┐                  │
│     10Gb Switch     │                  │
└─────────────────────┘                  │
   │         │         │                 │
   ▼         ▼         ▼                 ▼
enp1s0   enp35s0f0  enp35s0f1          nic1
 (1Gb)     (10Gb)     (10Gb)          (10Gb)
   │         │         │                 │
   ▼         ▼         ▼                 ▼
 vmbr0     vmbr1     vmbr2             vmbr0
   │         │         │                 │
  PVE       PVE       PVE              PVE2
General    lmdev1   TrueNAS,         gitea-vm,
  VMs               Saltbox,        trading-vm
                    Traefik

Internal Only (no external access):
┌─────────────────────────────────────┐
│  vmbr3 (10.10.20.0/24) - Virtual    │
│  No physical NIC - inter-VM only    │
│                                     │
│  TrueNAS ◄──► Saltbox               │
│     ▲            ▲                  │
│     │            │                  │
│     └── lmdev1 ──┘                  │
│           ▲                         │
│           │                         │
│       copyparty                     │
└─────────────────────────────────────┘
```
## Determining Physical Connections

To determine which 10Gb port goes where, check:
1. **Physical cable tracing** - Follow cables from server to switch/firewall
2. **Switch port status** - Check UniFi controller for connected ports
3. **MAC addresses** - Compare `ip link show` MACs with switch ARP table

```bash
# On PVE - get MAC addresses
ip link show enp35s0f0 | grep ether
ip link show enp35s0f1 | grep ether

# On router - check ARP
ssh root@10.10.10.1 'cat /proc/net/arp'
```
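Switch ARP tables and `ip link` often print MACs in different formats (dashes, dots, colons, mixed case). A small helper — hypothetical, not part of this repo — makes the comparison mechanical:

```python
def norm_mac(mac: str) -> str:
    """Normalize a MAC address (AA-BB-CC-DD-EE-FF, aabb.ccdd.eeff, ...)
    to lowercase colon-separated form so different tools' output can
    be compared directly."""
    digits = "".join(c for c in mac.lower() if c in "0123456789abcdef")
    if len(digits) != 12:
        raise ValueError(f"not a MAC address: {mac!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))

# Cisco-style dotted and Linux colon forms normalize identically:
print(norm_mac("AABB.CCDD.EEFF"))  # → aa:bb:cc:dd:ee:ff
print(norm_mac("AA-BB-CC-DD-EE-FF") == norm_mac("aa:bb:cc:dd:ee:ff"))  # → True
```

Normalize both lists, then a plain `diff` or set intersection shows which server NIC sits behind which switch port.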
## Adding a New VM to a Specific Network

```bash
# Add VM to vmbr0 (standard)
qm set VMID --net0 virtio,bridge=vmbr0

# Add VM to vmbr2 (10Gb)
qm set VMID --net0 virtio,bridge=vmbr2

# Add second NIC for internal network
qm set VMID --net1 virtio,bridge=vmbr3

# For containers
pct set CTID --net0 name=eth0,bridge=vmbr0,ip=10.10.10.XXX/24,gw=10.10.10.1
```
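Since `qm set` changes a VM's NIC immediately, it can help to build the command first and only run it after checking the output; `VMID=105` and `vmbr2` below are placeholders, not real assignments:

```shell
# Hypothetical dry-run pattern: assemble the qm command, print it,
# then eval it once it looks right (run on the Proxmox host).
VMID=105          # placeholder VM ID
BRIDGE=vmbr2      # target bridge
CMD="qm set ${VMID} --net0 virtio,bridge=${BRIDGE}"
echo "$CMD"
# eval "$CMD"     # uncomment to apply
```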
## MTU Configuration

All bridges use **MTU 9000** (jumbo frames) for optimal storage performance.

If adding a new VM that will access NFS/iSCSI storage, ensure the guest OS also uses MTU 9000:

```bash
# Linux guest
ip link set eth0 mtu 9000

# Permanent (netplan)
# /etc/netplan/00-installer-config.yaml
network:
  ethernets:
    eth0:
      mtu: 9000
```
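To confirm jumbo frames actually pass end-to-end, ping with the don't-fragment bit and a payload sized for MTU 9000 — 9000 minus the 20-byte IP header and 8-byte ICMP header. TrueNAS at 10.10.10.200 is just an example target:

```shell
# Largest ICMP payload that fits in a 9000-byte frame without fragmenting.
PAYLOAD=$((9000 - 20 - 8))
# Linux ping syntax; if this fails but a plain ping works, some hop
# in the path is still at MTU 1500.
echo "ping -c 2 -M do -s ${PAYLOAD} 10.10.10.200"
# macOS equivalent: ping -c 2 -D -s ${PAYLOAD} 10.10.10.200
```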
147  SHELL-ALIASES.md  Normal file
@@ -0,0 +1,147 @@
# Shell Aliases & Shortcuts

## Overview
ZSH aliases for quickly launching Claude Code in project directories with `--dangerously-skip-permissions` enabled. Aliases sync across devices via Syncthing.

## Setup

### File Locations
```
~/.config/shell/shared.zsh           # Main shared config (sourced by .zshrc)
~/.config/shell/claude-aliases.zsh   # Claude Code aliases
~/Projects/homelab/configs/          # Symlinks for reference
```

### Installation
Add to `~/.zshrc`:
```bash
source ~/.config/shell/shared.zsh
```

## Claude Code Aliases

### Quick Start (--continue)
Continue the most recent session in each project:

| Alias | Directory | Command |
|-------|-----------|---------|
| `chomelab` | ~/Projects/homelab | `claude --dangerously-skip-permissions --continue` |
| `ctrading` | ~/Projects/ai-trading-platform | `claude --dangerously-skip-permissions --continue` |
| `cnotes` | ~/Notes | `claude --dangerously-skip-permissions --continue --ide` |
| `chome` | ~ | `claude --dangerously-skip-permissions --continue` |
| `cfindshyt` | ~/Desktop/findshyt-working-folder | `claude --dangerously-skip-permissions --continue` |
| `ciconik` | ~/Projects/iconik-uploader | `claude --dangerously-skip-permissions --continue` |
| `cghostty` | ~/.config/ghostty | `claude --dangerously-skip-permissions --continue` |
| `cprojects` | ~/Projects | `claude --dangerously-skip-permissions --continue` |
| `cclaudeui` | ~/Projects/claude-ui | `claude --dangerously-skip-permissions --continue` |
| `clucid` | ~/Projects/lucidlink-upgrade | `claude --dangerously-skip-permissions --continue` |
| `cbeeper` | ~/Projects/beeper | `claude --dangerously-skip-permissions --continue` |

### Resume (--resume)
Show a list of sessions to pick from:

| Alias | Directory |
|-------|-----------|
| `chomelab-r` | ~/Projects/homelab |
| `ctrading-r` | ~/Projects/ai-trading-platform |
| `cnotes-r` | ~/Notes |
| `chome-r` | ~ |
| `ciconik-r` | ~/Projects/iconik-uploader |
| `cbeeper-r` | ~/Projects/beeper |

### Fresh Start (no flags)
Start a new session without resuming:

| Alias | Directory |
|-------|-----------|
| `chomelab-new` | ~/Projects/homelab |
| `ctrading-new` | ~/Projects/ai-trading-platform |
| `cnotes-new` | ~/Notes |
| `chome-new` | ~ |

## Usage Examples

```bash
# Continue homelab session
chomelab

# Pick from recent homelab sessions
chomelab-r

# Start fresh homelab session
chomelab-new

# Quick AI trading work
ctrading
```

## Adding New Aliases

Edit `~/.config/shell/claude-aliases.zsh`:

```bash
# Template for new project
alias cproject='cd ~/Projects/new-project && claude --dangerously-skip-permissions --continue'
alias cproject-r='cd ~/Projects/new-project && claude --dangerously-skip-permissions --resume'
alias cproject-new='cd ~/Projects/new-project && claude --dangerously-skip-permissions'
```
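The three variants follow a fixed pattern, so a small generator function — a hypothetical sketch, not currently in the alias file — can print them ready to append:

```shell
# Hypothetical generator: print the three alias definitions for a new
# project. Append its output to ~/.config/shell/claude-aliases.zsh, e.g.:
#   claude_aliases demo '~/Projects/demo' >> ~/.config/shell/claude-aliases.zsh
claude_aliases() {
  local name=$1 dir=$2
  echo "alias c${name}='cd ${dir} && claude --dangerously-skip-permissions --continue'"
  echo "alias c${name}-r='cd ${dir} && claude --dangerously-skip-permissions --resume'"
  echo "alias c${name}-new='cd ${dir} && claude --dangerously-skip-permissions'"
}

claude_aliases demo '~/Projects/demo'
```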
Changes sync automatically to all devices via Syncthing (~/.config folder).

## Enterprise/Work Aliases (claude-gateway)

Use the `ec` prefix for the work Claude account via `claude-gateway`:

### Quick Start (--continue)
| Alias | Directory |
|-------|-----------|
| `echomelab` | ~/Projects/homelab |
| `ectrading` | ~/Projects/ai-trading-platform |
| `ecnotes` | ~/Notes |
| `echome` | ~ |
| `ecfindshyt` | ~/Desktop/findshyt-working-folder |
| `eciconik` | ~/Projects/iconik-uploader |
| `ecghostty` | ~/.config/ghostty |
| `ecprojects` | ~/Projects |
| `ecclaudeui` | ~/Projects/claude-ui |
| `eclucid` | ~/Projects/lucidlink-upgrade |
| `ecbeeper` | ~/Projects/beeper |

### Resume & Fresh
- Resume: `echomelab-r`, `ectrading-r`, `ecnotes-r`, `echome-r`, `eciconik-r`, `ecbeeper-r`
- Fresh: `echomelab-new`, `ectrading-new`, `ecnotes-new`, `echome-new`

## Full Alias File

Located at: `~/.config/shell/claude-aliases.zsh`

```bash
# Claude Code Project Aliases

# Main projects
alias chome='cd ~ && claude --dangerously-skip-permissions --continue'
alias ctrading='cd ~/Projects/ai-trading-platform && claude --dangerously-skip-permissions --continue'
alias ciconik='cd ~/Projects/iconik-uploader && claude --dangerously-skip-permissions --continue'
alias cnotes='cd ~/Notes && claude --dangerously-skip-permissions --continue --ide'
alias chomelab='cd ~/Projects/homelab && claude --dangerously-skip-permissions --continue'
alias cfindshyt='cd ~/Desktop/findshyt-working-folder && claude --dangerously-skip-permissions --continue'
alias cghostty='cd ~/.config/ghostty && claude --dangerously-skip-permissions --continue'
alias cprojects='cd ~/Projects && claude --dangerously-skip-permissions --continue'
alias cclaudeui='cd ~/Projects/claude-ui && claude --dangerously-skip-permissions --continue'
alias clucid='cd ~/Projects/lucidlink-upgrade && claude --dangerously-skip-permissions --continue'
alias cbeeper='cd ~/Projects/beeper && claude --dangerously-skip-permissions --continue'

# Resume variants
alias chome-r='cd ~ && claude --dangerously-skip-permissions --resume'
alias ctrading-r='cd ~/Projects/ai-trading-platform && claude --dangerously-skip-permissions --resume'
alias ciconik-r='cd ~/Projects/iconik-uploader && claude --dangerously-skip-permissions --resume'
alias cnotes-r='cd ~/Notes && claude --dangerously-skip-permissions --resume --ide'
alias chomelab-r='cd ~/Projects/homelab && claude --dangerously-skip-permissions --resume'
alias cbeeper-r='cd ~/Projects/beeper && claude --dangerously-skip-permissions --resume'

# Fresh start
alias chome-new='cd ~ && claude --dangerously-skip-permissions'
alias ctrading-new='cd ~/Projects/ai-trading-platform && claude --dangerously-skip-permissions'
alias cnotes-new='cd ~/Notes && claude --dangerously-skip-permissions --ide'
alias chomelab-new='cd ~/Projects/homelab && claude --dangerously-skip-permissions'
```
166  SYNCTHING.md  Normal file
@@ -0,0 +1,166 @@
# Syncthing Setup

## Overview
Syncthing provides real-time file synchronization across all devices. Files sync automatically when devices connect.

## Devices

| Device | ID Prefix | Local IP | Tailscale IP | Port | Role |
|--------|-----------|----------|--------------|------|------|
| Mac Mini | L3PJR73 | 10.10.10.123 | 100.108.89.58 | 22000 | Primary workstation |
| MacBook Pro | 3TFMYEI | 10.10.10.147 | 100.88.161.1 | 22000 | Laptop |
| TrueNAS | TPO72EY | 10.10.10.200 | 100.100.94.71 | 20978 | Storage server (central hub) |
| Windows PC | YDCPUQK | 10.10.10.150 | 100.120.97.76 | 22000 | Windows workstation |
| Phone (Android) | XLMZCCH | 10.10.10.54 | 100.106.175.37 | 22000 | Android, Notes only, HTTPS API |

## Network Configuration

**IPv4 Only** - All devices configured with explicit IPv4 addresses (no dynamic/IPv6):
- Local network: `10.10.10.0/24`
- Tailscale network: `100.x.x.x`

Device address format: `tcp4://IP:PORT` (e.g., `tcp4://10.10.10.123:22000`)

## Synced Folders

| Folder | Path | Devices | Notes |
|--------|------|---------|-------|
| Downloads | ~/Downloads | Mac Mini, MacBook, TrueNAS, Windows | Large folder, 3600s rescan |
| Notes | ~/Notes | Mac Mini, MacBook, TrueNAS | Documentation |
| Projects | ~/Projects | Mac Mini, MacBook, TrueNAS | Code repositories |
| bin | ~/bin | Mac Mini, MacBook, TrueNAS | Scripts and tools |
| Documents | ~/Documents | Mac Mini, MacBook, TrueNAS | Personal documents |
| Desktop | ~/Desktop | Mac Mini, MacBook, TrueNAS | Desktop files |
| config | ~/.config | Mac Mini, MacBook | Shell configs, app settings |
| Antigravity | ~/.gemini | Mac Mini, MacBook, TrueNAS | Gemini config |

## API Access

### Mac Mini
```bash
API_KEY="oSQSrPnMnrEXuHqjWrRdrvq3TSXesAT5"
curl -s "http://127.0.0.1:8384/rest/system/status" -H "X-API-Key: $API_KEY"
```

### MacBook Pro
```bash
API_KEY="qYkNdVLwy9qZZZ6MqnJr7tHX7KKdxGMJ"
curl -s "http://127.0.0.1:8384/rest/system/status" -H "X-API-Key: $API_KEY"
```

### Windows PC
```bash
API_KEY="KPHGteJv6APPE7zFun33b3qM3Vn5KSA7"
curl -s "http://10.10.10.150:8384/rest/system/status" -H "X-API-Key: $API_KEY"
```

### Phone (Android) - Uses HTTPS
```bash
API_KEY="Xxz3jDT4akUJe6psfwZsbZwG2LhfZuDM"
# Access via local IP (use -k to skip cert verification)
curl -sk "https://10.10.10.54:8384/rest/system/status" -H "X-API-Key: $API_KEY"
# Or via Tailscale
curl -sk "https://100.106.175.37:8384/rest/system/status" -H "X-API-Key: $API_KEY"
```

## Common Commands

### Check Status
```bash
# Folder status
curl -s "http://127.0.0.1:8384/rest/db/status?folder=downloads" -H "X-API-Key: $API_KEY"

# Connection status
curl -s "http://127.0.0.1:8384/rest/system/connections" -H "X-API-Key: $API_KEY"

# Device completion for a folder
curl -s "http://127.0.0.1:8384/rest/db/completion?folder=downloads&device=DEVICE_ID" -H "X-API-Key: $API_KEY"
```
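The connections payload can be post-processed instead of eyeballed. This sketch assumes the documented `/rest/system/connections` response shape (`{"connections": {deviceID: {"connected": ...}}}`); the device IDs below are made-up samples:

```python
import json

def disconnected_devices(body: str) -> list:
    """Return the device IDs that /rest/system/connections reports
    as not currently connected."""
    data = json.loads(body)
    return sorted(dev for dev, info in data.get("connections", {}).items()
                  if not info.get("connected"))

# Sample payload (in practice: pipe the curl output into this script).
sample = json.dumps({"connections": {
    "L3PJR73-XXXX": {"connected": True},
    "YDCPUQK-YYYY": {"connected": False},
}})
print(disconnected_devices(sample))  # → ['YDCPUQK-YYYY']
```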
### Check Errors
```bash
curl -s "http://127.0.0.1:8384/rest/folder/errors?folder=downloads" -H "X-API-Key: $API_KEY"
```

### Rescan Folder
```bash
curl -X POST "http://127.0.0.1:8384/rest/db/scan?folder=downloads" -H "X-API-Key: $API_KEY"
```

## Configuration Files

| Device | Config Path |
|--------|-------------|
| Mac Mini | ~/Library/Application Support/Syncthing/config.xml |
| MacBook Pro | ~/Library/Application Support/Syncthing/config.xml |
| TrueNAS | /mnt/tank/syncthing/config/config.xml |

## Performance Tuning

### Speed Optimizations (2024-12-17)

#### Global Options
| Setting | Value | Effect |
|---------|-------|--------|
| `numConnections` | 4 | Parallel transfers per device |
| `compression` | never | No CPU overhead on fast LAN |
| `setLowPriority` | false | Normal CPU priority |
| `connectionPriorityQuicLan` | 10 | QUIC preferred on LAN |
| `connectionPriorityTcpLan` | 20 | TCP fallback on LAN |
| `connectionPriorityQuicWan` | 30 | QUIC preferred on WAN |
| `connectionPriorityTcpWan` | 40 | TCP fallback on WAN |
| `progressUpdateIntervalS` | -1 | Progress updates disabled (reduces overhead) |
| `maxConcurrentIncomingRequestKiB` | 1048576 | 1GB buffer for incoming requests |

**Applied to**: Mac Mini, MacBook, Windows PC (Phone uses 512MB buffer)

#### Folder-Level Settings
| Setting | Value | Effect |
|---------|-------|--------|
| `pullerMaxPendingKiB` | 131072-262144 | 128-256MB pending data buffer per folder |

**Applied to**: downloads, projects, documents, desktop, notes folders

### Rescan Intervals (set to 3600s for large folders)
Large folders like Downloads use 1-hour rescan intervals to reduce CPU usage:
- File system watcher handles real-time changes
- Hourly rescan catches anything missed

### Power Optimization
From CLAUDE.md - the Syncthing rescan optimization saved ~60-80W on the TrueNAS VM.

## Troubleshooting

### Device Not Syncing
1. Check connection status:
```bash
curl -s "http://127.0.0.1:8384/rest/system/connections" -H "X-API-Key: $API_KEY" | python3 -c "import sys,json; d=json.load(sys.stdin)['connections']; [print(f'{k[:7]}: {v[\"connected\"]}') for k,v in d.items()]"
```

2. Check folder completion:
```bash
curl -s "http://127.0.0.1:8384/rest/db/status?folder=FOLDER" -H "X-API-Key: $API_KEY"
```

3. Check for errors:
```bash
curl -s "http://127.0.0.1:8384/rest/folder/errors?folder=FOLDER" -H "X-API-Key: $API_KEY"
```

### Many Pending Deletes
If a device shows thousands of "needDeletes", files were deleted elsewhere and the deletions still need to propagate. This is normal after a reorganization - let it complete.

### Web UI
Access the Syncthing web interface at http://127.0.0.1:8384

## SSH Access to Devices

### MacBook Pro (via Tailscale)
```bash
sshpass -p 'GrilledCh33s3#' ssh -o StrictHostKeyChecking=no hutson@100.88.161.1
```

### Check Syncthing remotely
```bash
sshpass -p 'GrilledCh33s3#' ssh hutson@100.88.161.1 'curl -s "http://127.0.0.1:8384/rest/db/status?folder=downloads" -H "X-API-Key: qYkNdVLwy9qZZZ6MqnJr7tHX7KKdxGMJ"'
```
1  configs/claude-aliases.zsh  Symbolic link
@@ -0,0 +1 @@
/Users/hutson/.config/shell/claude-aliases.zsh
5  configs/ghostty.conf  Normal file
@@ -0,0 +1,5 @@
theme = Gruvbox Dark
font-feature = -liga
font-size = 16
font-family = "JetBrains Mono"
split-divider-color = #83a598
16  mcp-central/.env.example  Normal file
@@ -0,0 +1,16 @@
# MCP Central Server Environment Variables
# Copy to .env and fill in your values

# Airtable
AIRTABLE_API_KEY=patIrM3XYParyuHQL.xxxxx

# Exa
EXA_API_KEY=your_exa_api_key

# TickTick (if using)
TICKTICK_CLIENT_ID=your_client_id
TICKTICK_CLIENT_SECRET=your_client_secret

# Slack (if using)
SLACK_BOT_TOKEN=xoxb-xxxxx
SLACK_USER_TOKEN=xoxp-xxxxx
129  mcp-central/README.md  Normal file
@@ -0,0 +1,129 @@
# Centralized MCP Servers for Homelab

## Current State of MCP Remote Access

**The Problem**: Most MCP servers use `stdio` transport (local process communication). Claude Code clients expect to spawn local processes.

**The Solution**: Use `mcp-remote` to bridge local clients to remote servers.

## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                    docker-host (10.10.10.206)                   │
│   ┌─────────────┐   ┌─────────────┐   ┌─────────────┐           │
│   │ airtable-mcp│   │   exa-mcp   │   │ ticktick-mcp│  ...      │
│   │  :3001/sse  │   │  :3002/sse  │   │  :3003/sse  │           │
│   └─────────────┘   └─────────────┘   └─────────────┘           │
└─────────────────────────────────────────────────────────────────┘
          ▲               ▲               ▲
          │               │               │
   ┌──────┴───────────────┴───────────────┴──────┐
   │              Tailscale / LAN                │
   └──────┬───────────────┬───────────────┬──────┘
          │               │               │
┌─────────▼─────┐ ┌───────▼───────┐ ┌─────▼─────────┐
│    MacBook    │ │   Mac Mini    │ │  Windows PC   │
│  Claude Code  │ │  Claude Code  │ │  Claude Code  │
│  mcp-remote   │ │  mcp-remote   │ │  mcp-remote   │
└───────────────┘ └───────────────┘ └───────────────┘
```

## Setup

### Step 1: Deploy MCP Servers on docker-host

```bash
ssh hutson@10.10.10.206
cd /opt/mcp-central
docker-compose up -d
```
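A quick reachability check after bringing the stack up. `port_open` is a hypothetical helper (not part of the repo); the ports match the compose file:

```python
import socket

def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds --
    a quick sanity check that an MCP container is listening."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# e.g. after `docker-compose up -d` on docker-host:
for port in (3001, 3002, 3100):
    print(port, port_open("10.10.10.206", port))
```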
### Step 2: Configure Claude Code Clients

Each device needs `mcp-remote` installed and configured.

**Install mcp-remote:**
```bash
npm install -g mcp-remote
```

**Update ~/.claude/settings.json:**
```json
{
  "mcpServers": {
    "airtable": {
      "command": "npx",
      "args": ["mcp-remote", "http://10.10.10.206:3001/sse"]
    },
    "exa": {
      "command": "npx",
      "args": ["mcp-remote", "http://10.10.10.206:3002/sse"]
    },
    "ticktick": {
      "command": "npx",
      "args": ["mcp-remote", "http://10.10.10.206:3003/sse"]
    }
  }
}
```

**For remote access via Tailscale, use the Tailscale IP:**
```json
{
  "mcpServers": {
    "airtable": {
      "command": "npx",
      "args": ["mcp-remote", "http://100.x.x.x:3001/sse"]
    }
  }
}
```

## Which Servers Can Be Centralized?

| Server | Centralizable | Notes |
|--------|--------------|-------|
| Airtable | Yes | Just needs API key |
| Exa | Yes | Just needs API key |
| TickTick | Yes | OAuth token stored server-side |
| Slack | Yes | Bot token stored server-side |
| Ref | Yes | API key only |
| Beeper | No | Needs local Beeper Desktop |
| Google Sheets | Partial | OAuth flow needs user interaction |
| Monarch Money | Partial | Credentials stored server-side |

## Alternative: Shared Config File

If full centralization is too complex, you can at least share the config:

1. Store `settings.json` in a synced folder (e.g., Syncthing `configs/`)
2. Symlink from each device:
```bash
ln -s ~/Sync/configs/claude-settings.json ~/.claude/settings.json
```

This doesn't centralize the servers, but it ensures all devices have the same config.

## Traefik Integration (Optional)

Add to Traefik for HTTPS access:

```yaml
# /etc/traefik/conf.d/mcp.yaml
http:
  routers:
    mcp-airtable:
      rule: "Host(`mcp-airtable.htsn.io`)"
      service: mcp-airtable
      tls:
        certResolver: cloudflare
  services:
    mcp-airtable:
      loadBalancer:
        servers:
          - url: "http://10.10.10.206:3001"
```

Then use `https://mcp-airtable.htsn.io/sse` in your config.
58  mcp-central/docker-compose.yml  Normal file
@@ -0,0 +1,58 @@
# Centralized MCP Server Stack
# Deploy on docker-host (10.10.10.206)
# All Claude Code clients connect via HTTP/SSE

version: "3.8"

services:
  # MCP Gateway - Routes all MCP requests
  mcp-gateway:
    image: node:20-slim
    container_name: mcp-gateway
    working_dir: /app
    volumes:
      - ./gateway:/app
    ports:
      - "3100:3100"
    command: node server.js
    restart: unless-stopped
    environment:
      - PORT=3100
    networks:
      - mcp-network

  # Airtable MCP Server
  airtable-mcp:
    image: node:20-slim
    container_name: airtable-mcp
    working_dir: /app
    command: sh -c "npm install airtable-mcp-server && npx airtable-mcp-server"
    environment:
      - AIRTABLE_API_KEY=${AIRTABLE_API_KEY}
      - MCP_TRANSPORT=sse
      - MCP_PORT=3001
    ports:
      - "3001:3001"
    restart: unless-stopped
    networks:
      - mcp-network

  # Exa MCP Server
  exa-mcp:
    image: node:20-slim
    container_name: exa-mcp
    working_dir: /app
    command: sh -c "npm install @anthropic/mcp-server-exa && npx @anthropic/mcp-server-exa"
    environment:
      - EXA_API_KEY=${EXA_API_KEY}
      - MCP_TRANSPORT=sse
      - MCP_PORT=3002
    ports:
      - "3002:3002"
    restart: unless-stopped
    networks:
      - mcp-network

networks:
  mcp-network:
    driver: bridge
159  scripts/fix-immich-raf-files.sh  Normal file
@@ -0,0 +1,159 @@
#!/bin/bash
#
# Fix Immich RAF files that were mislabeled as JPG
# This script:
# 1. Finds all JPG files that are actually Fujifilm RAF (RAW) files
# 2. Renames them from .jpg to .raf on the filesystem
# 3. Updates Immich's database to match
# 4. Triggers thumbnail regeneration
#
# Run from Mac Mini or any machine with SSH access to PVE
#

set -e

# Config
SSH_PASS="GrilledCh33s3#"
PVE_IP="10.10.10.120"
SSH_OPTS="-o StrictHostKeyChecking=no"

# Colors
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'

echo "=========================================="
echo " Immich RAF File Fixer"
echo "=========================================="
echo ""

# Test connectivity
echo "Testing connection to Saltbox..."
if ! sshpass -p "$SSH_PASS" ssh $SSH_OPTS root@$PVE_IP 'qm status 101' &>/dev/null; then
    echo -e "${RED}Error: Cannot connect to PVE or Saltbox VM not running${NC}"
    exit 1
fi
echo -e "${GREEN}Connected${NC}"
echo ""

# Step 1: Find mislabeled files
echo "Step 1: Finding JPG files that are actually RAF..."
echo ""

MISLABELED_COUNT=$(sshpass -p "$SSH_PASS" ssh $SSH_OPTS root@$PVE_IP 'qm guest exec 101 -- bash -c "echo \"SELECT COUNT(*) FROM asset a JOIN asset_exif e ON a.id = e.\\\"assetId\\\" WHERE a.\\\"originalFileName\\\" ILIKE '"'"'%.jpg'"'"' AND e.\\\"fileSizeInByte\\\" > 35000000 AND e.make = '"'"'FUJIFILM'"'"';\" | docker exec -i immich-postgres psql -U hutson -d immich -t"' 2>/dev/null | grep -o '[0-9]*' | head -1)

echo -e "Found ${YELLOW}${MISLABELED_COUNT}${NC} mislabeled files"
echo ""

if [ "$MISLABELED_COUNT" -eq 0 ]; then
    echo -e "${GREEN}No mislabeled files found. Nothing to fix!${NC}"
    exit 0
fi

# Confirm before proceeding
read -p "Proceed with fixing these files? (y/N) " -n 1 -r
echo ""
if [[ ! $REPLY =~ ^[Yy]$ ]]; then
    echo "Aborted."
    exit 0
fi

echo ""
echo "Step 2: Creating fix script on Saltbox..."

# Create the fix script on Saltbox
sshpass -p "$SSH_PASS" ssh $SSH_OPTS root@$PVE_IP 'qm guest exec 101 -- bash -c "cat > /tmp/fix-raf-files.sh << '"'"'SCRIPT'"'"'
#!/bin/bash
set -e

echo "Getting list of mislabeled files..."

# Get list of files to fix
docker exec -i immich-postgres psql -U hutson -d immich -t -A -F\",\" -c "
SELECT a.id, a.\"originalPath\", a.\"originalFileName\"
FROM asset a
JOIN asset_exif e ON a.id = e.\"assetId\"
WHERE a.\"originalFileName\" ILIKE '"'"'"'"'"'"'"'"'%.jpg'"'"'"'"'"'"'"'"'
AND e.\"fileSizeInByte\" > 35000000
AND e.make = '"'"'"'"'"'"'"'"'FUJIFILM'"'"'"'"'"'"'"'"'
" > /tmp/files_to_fix.csv

TOTAL=$(wc -l < /tmp/files_to_fix.csv)
echo "Processing $TOTAL files..."

COUNT=0
ERRORS=0

while IFS="," read -r asset_id old_path old_filename; do
    COUNT=$((COUNT + 1))

    # Skip empty lines
    [ -z "$asset_id" ] && continue

    # Calculate new paths
    new_filename=$(echo "$old_filename" | sed "s/\.[jJ][pP][gG]$/.RAF/")
    new_path=$(echo "$old_path" | sed "s/\.[jJ][pP][gG]$/.raf/")

    echo "[$COUNT/$TOTAL] $old_filename -> $new_filename"

    # Rename file on filesystem (inside immich container)
    if docker exec immich test -f "$old_path"; then
        docker exec immich mv "$old_path" "$new_path" 2>/dev/null
        if [ $? -ne 0 ]; then
            echo "  ERROR: Failed to rename file"
            ERRORS=$((ERRORS + 1))
            continue
        fi
    else
        echo "  WARNING: File not found at $old_path"
        ERRORS=$((ERRORS + 1))
        continue
    fi

    # Update database
    docker exec -i immich-postgres psql -U hutson -d immich -c "
UPDATE asset
SET \"originalPath\" = '"'"'"'"'"'"'"'"'$new_path'"'"'"'"'"'"'"'"',
    \"originalFileName\" = '"'"'"'"'"'"'"'"'$new_filename'"'"'"'"'"'"'"'"'
WHERE id = '"'"'"'"'"'"'"'"'$asset_id'"'"'"'"'"'"'"'"'::uuid;
" > /dev/null 2>&1

    if [ $? -ne 0 ]; then
        echo "  ERROR: Failed to update database"
        # Try to rename back
        docker exec immich mv "$new_path" "$old_path" 2>/dev/null
        ERRORS=$((ERRORS + 1))
        continue
    fi

done < /tmp/files_to_fix.csv

echo ""
echo "=========================================="
echo "Completed: $((COUNT - ERRORS)) fixed, $ERRORS errors"
echo "=========================================="

# Cleanup
rm -f /tmp/files_to_fix.csv
SCRIPT
chmod +x /tmp/fix-raf-files.sh"'

echo ""
echo "Step 3: Running fix script (this may take a while)..."
echo ""

# Run the fix script
sshpass -p "$SSH_PASS" ssh $SSH_OPTS root@$PVE_IP 'qm guest exec 101 -- bash -c "/tmp/fix-raf-files.sh"' 2>&1 | grep -o '"out-data"[^}]*' | sed 's/"out-data" *: *"//' | sed 's/\\n/\n/g' | sed 's/\\t/\t/g' | sed 's/"$//'

echo ""
echo "Step 4: Restarting Immich to pick up changes..."

sshpass -p "$SSH_PASS" ssh $SSH_OPTS root@$PVE_IP 'qm guest exec 101 -- bash -c "docker restart immich"' > /dev/null 2>&1

echo -e "${GREEN}Done!${NC}"
echo ""
echo "Next steps:"
echo "1. Go to Immich Admin -> Jobs -> Thumbnail Generation -> All -> Start"
echo "2. This will regenerate thumbnails for all assets"
echo ""
318  scripts/health-check.sh  Executable file
@@ -0,0 +1,318 @@
|
|||||||
|
#!/bin/bash
#
# Homelab Health Check & Recovery Script
# Run this to check status and bring services online
#
# Usage: ./health-check.sh [--fix]
#   Without --fix: Read-only health check
#   With --fix: Attempt to start stopped services and fix issues
#

set -e

# Colors
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color

# Config
SSH_PASS="GrilledCh33s3#"
PVE_IP="10.10.10.120"
PVE2_IP="10.10.10.102"
SSH_OPTS="-o StrictHostKeyChecking=no -o ConnectTimeout=5"

FIX_MODE=false
if [[ "$1" == "--fix" ]]; then
    FIX_MODE=true
    echo -e "${YELLOW}Running in FIX mode - will attempt to start stopped services${NC}"
    echo ""
fi

# Helper functions
ssh_pve() {
    sshpass -p "$SSH_PASS" ssh $SSH_OPTS root@$PVE_IP "$@" 2>/dev/null
}

ssh_pve2() {
    sshpass -p "$SSH_PASS" ssh $SSH_OPTS root@$PVE2_IP "$@" 2>/dev/null
}

print_status() {
    if [[ "$2" == "ok" ]]; then
        echo -e " ${GREEN}✓${NC} $1"
    elif [[ "$2" == "warn" ]]; then
        echo -e " ${YELLOW}!${NC} $1"
    else
        echo -e " ${RED}✗${NC} $1"
    fi
}

# Check if sshpass is installed
if ! command -v sshpass &> /dev/null; then
    echo -e "${RED}Error: sshpass is not installed${NC}"
    echo "Install with: brew install hudochenkov/sshpass/sshpass"
    exit 1
fi

echo "================================"
echo " HOMELAB HEALTH CHECK"
echo " $(date '+%Y-%m-%d %H:%M:%S')"
echo "================================"
echo ""

# ============================================
# PVE (Primary Server)
# ============================================
echo "--- PVE (10.10.10.120) ---"

# Check connectivity
if ssh_pve "echo ok" > /dev/null 2>&1; then
    print_status "PVE Reachable" "ok"
else
    print_status "PVE Unreachable" "fail"
    echo ""
    echo "--- PVE2 (10.10.10.102) ---"
    if ssh_pve2 "echo ok" > /dev/null 2>&1; then
        print_status "PVE2 Reachable" "ok"
    else
        print_status "PVE2 Unreachable" "fail"
    fi
    exit 1
fi

# Check cluster quorum
QUORUM=$(ssh_pve "pvecm status 2>&1 | grep 'Quorate:' | awk '{print \$2}'" || echo "Unknown")
if [[ "$QUORUM" == "Yes" ]]; then
    print_status "Cluster Quorum: $QUORUM" "ok"
else
    print_status "Cluster Quorum: $QUORUM" "fail"
fi

# Check CPU temp
TEMP=$(ssh_pve 'for f in /sys/class/hwmon/hwmon*/temp*_input; do label=$(cat ${f%_input}_label 2>/dev/null); if [ "$label" = "Tctl" ]; then echo $(($(cat $f)/1000)); fi; done')
if [[ -n "$TEMP" ]]; then
    if [[ "$TEMP" -lt 85 ]]; then
        print_status "CPU Temp: ${TEMP}°C" "ok"
    elif [[ "$TEMP" -lt 90 ]]; then
        print_status "CPU Temp: ${TEMP}°C (warm)" "warn"
    else
        print_status "CPU Temp: ${TEMP}°C (HOT!)" "fail"
    fi
fi

# Check ZFS pools
ZFS_STATUS=$(ssh_pve "zpool status -x" || echo "Unknown")
if [[ "$ZFS_STATUS" == "all pools are healthy" ]]; then
    print_status "ZFS Pools: Healthy" "ok"
else
    print_status "ZFS Pools: $ZFS_STATUS" "fail"
fi

# Check VMs
echo ""
echo " VMs:"
CRITICAL_VMS="100 101 110 206" # TrueNAS, Saltbox, HomeAssistant, Docker-host
STOPPED_VMS=""
TRUENAS_ZFS_SUSPENDED=false

while IFS= read -r line; do
    VMID=$(echo "$line" | awk '{print $1}')
    NAME=$(echo "$line" | awk '{print $2}')
    STATUS=$(echo "$line" | awk '{print $3}')

    if [[ "$STATUS" == "running" ]]; then
        print_status "$VMID $NAME: $STATUS" "ok"
    else
        print_status "$VMID $NAME: $STATUS" "fail"
        if [[ " $CRITICAL_VMS " =~ " $VMID " ]]; then
            STOPPED_VMS="$STOPPED_VMS $VMID"
        fi
    fi
done < <(ssh_pve "qm list" | tail -n +2)
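The `[[ " $CRITICAL_VMS " =~ " $VMID " ]]` test relies on space-padding both the list and the candidate, so an ID like `10` cannot match inside `100`. A minimal standalone sketch of the pattern (bash-specific, since `[[ =~ ]]` is not POSIX; `is_critical` is a name invented here for illustration):

```shell
#!/bin/bash
# Space-padded substring membership, as used for CRITICAL_VMS above.
CRITICAL_VMS="100 101 110 206"
is_critical() { [[ " $CRITICAL_VMS " =~ " $1 " ]]; }

is_critical 101 && echo "101: critical"
is_critical 10  || echo "10: not critical"   # 10 does NOT match inside 100/110
```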

# Check TrueNAS ZFS (VM 100) if running
if ssh_pve "qm status 100" 2>/dev/null | grep -q running; then
    echo ""
    echo " TrueNAS ZFS:"
    TRUENAS_ZFS=$(ssh_pve 'qm guest exec 100 -- bash -c "zpool list -H -o name,health vault 2>/dev/null"' 2>/dev/null | grep -o '"out-data"[^}]*' | sed 's/"out-data" : "//' | tr -d '\\n"' || echo "Unknown")

    if [[ "$TRUENAS_ZFS" == *"ONLINE"* ]]; then
        print_status "vault pool: ONLINE" "ok"
    elif [[ "$TRUENAS_ZFS" == *"SUSPENDED"* ]]; then
        print_status "vault pool: SUSPENDED (needs zpool clear)" "fail"
        TRUENAS_ZFS_SUSPENDED=true
    elif [[ "$TRUENAS_ZFS" == *"DEGRADED"* ]]; then
        print_status "vault pool: DEGRADED" "warn"
    else
        print_status "vault pool: $TRUENAS_ZFS" "fail"
    fi
fi

# Check Containers
echo ""
echo " Containers:"
CRITICAL_CTS="200 202" # PiHole, Traefik
STOPPED_CTS=""

while IFS= read -r line; do
    CTID=$(echo "$line" | awk '{print $1}')
    STATUS=$(echo "$line" | awk '{print $2}')
    NAME=$(echo "$line" | awk '{print $4}')

    if [[ "$STATUS" == "running" ]]; then
        print_status "$CTID $NAME: $STATUS" "ok"
    else
        print_status "$CTID $NAME: $STATUS" "fail"
        if [[ " $CRITICAL_CTS " =~ " $CTID " ]]; then
            STOPPED_CTS="$STOPPED_CTS $CTID"
        fi
    fi
done < <(ssh_pve "pct list" | tail -n +2)

# ============================================
# PVE2 (Secondary Server)
# ============================================
echo ""
echo "--- PVE2 (10.10.10.102) ---"

if ssh_pve2 "echo ok" > /dev/null 2>&1; then
    print_status "PVE2 Reachable" "ok"

    # Check CPU temp
    TEMP2=$(ssh_pve2 'for f in /sys/class/hwmon/hwmon*/temp*_input; do label=$(cat ${f%_input}_label 2>/dev/null); if [ "$label" = "Tctl" ]; then echo $(($(cat $f)/1000)); fi; done')
    if [[ -n "$TEMP2" ]]; then
        if [[ "$TEMP2" -lt 85 ]]; then
            print_status "CPU Temp: ${TEMP2}°C" "ok"
        elif [[ "$TEMP2" -lt 90 ]]; then
            print_status "CPU Temp: ${TEMP2}°C (warm)" "warn"
        else
            print_status "CPU Temp: ${TEMP2}°C (HOT!)" "fail"
        fi
    fi

    # Check VMs
    echo ""
    echo " VMs:"
    while IFS= read -r line; do
        VMID=$(echo "$line" | awk '{print $1}')
        NAME=$(echo "$line" | awk '{print $2}')
        STATUS=$(echo "$line" | awk '{print $3}')

        if [[ "$STATUS" == "running" ]]; then
            print_status "$VMID $NAME: $STATUS" "ok"
        else
            print_status "$VMID $NAME: $STATUS" "fail"
        fi
    done < <(ssh_pve2 "qm list" | tail -n +2)
else
    print_status "PVE2 Unreachable" "fail"
fi

# ============================================
# FIX MODE - Start stopped services
# ============================================
if $FIX_MODE && [[ -n "$STOPPED_VMS" || -n "$STOPPED_CTS" || "$TRUENAS_ZFS_SUSPENDED" == "true" ]]; then
    echo ""
    echo "================================"
    echo " RECOVERY MODE"
    echo "================================"

    # Fix TrueNAS ZFS SUSPENDED state first (critical for mounts)
    if [[ "$TRUENAS_ZFS_SUSPENDED" == "true" ]]; then
        echo ""
        echo "Clearing TrueNAS ZFS pool errors..."
        ZFS_CLEAR_RESULT=$(ssh_pve 'qm guest exec 100 -- bash -c "zpool clear vault 2>&1 && zpool list -H -o health vault"' 2>/dev/null | grep -o '"out-data"[^}]*' | sed 's/"out-data" : "//' | tr -d '\\n"' || echo "FAILED")

        if [[ "$ZFS_CLEAR_RESULT" == *"ONLINE"* ]]; then
            print_status "vault pool recovered: ONLINE" "ok"
        else
            print_status "vault pool recovery failed: $ZFS_CLEAR_RESULT" "fail"
        fi
        sleep 5 # Give ZFS time to stabilize
    fi

    # Start TrueNAS first (it provides storage)
    if [[ " $STOPPED_VMS " =~ " 100 " ]]; then
        echo ""
        echo "Starting TrueNAS (VM 100) first..."
        ssh_pve "qm start 100" && print_status "TrueNAS started" "ok" || print_status "Failed to start TrueNAS" "fail"
        echo "Waiting 60s for TrueNAS to boot..."
        sleep 60
    fi

    # Start other VMs
    for VMID in $STOPPED_VMS; do
        if [[ "$VMID" != "100" ]]; then
            NAME=$(ssh_pve "qm config $VMID | grep '^name:' | awk '{print \$2}'")
            echo "Starting VM $VMID ($NAME)..."
            ssh_pve "qm start $VMID" && print_status "$NAME started" "ok" || print_status "Failed to start $NAME" "fail"
            sleep 5
        fi
    done

    # Start containers
    for CTID in $STOPPED_CTS; do
        NAME=$(ssh_pve "pct config $CTID | grep '^hostname:' | awk '{print \$2}'")
        echo "Starting CT $CTID ($NAME)..."
        ssh_pve "pct start $CTID" && print_status "$NAME started" "ok" || print_status "Failed to start $NAME" "fail"
        sleep 3
    done

    # Mount TrueNAS shares on Saltbox if Saltbox is running
    if ssh_pve "qm status 101" 2>/dev/null | grep -q running; then
        echo ""
        echo "Checking TrueNAS mounts on Saltbox..."
        sleep 10 # Give services time to start

        MOUNT_STATUS=$(ssh_pve 'qm guest exec 101 -- bash -c "mount | grep -c Media"' 2>/dev/null | grep -o '"out-data"[^}]*' | grep -o '[0-9]' || echo "0")

        if [[ "$MOUNT_STATUS" == "0" ]]; then
            echo "Mounting TrueNAS shares..."
            ssh_pve 'qm guest exec 101 -- bash -c "mount /mnt/local/Media; mount /mnt/local/downloads"' 2>/dev/null
            print_status "TrueNAS mounts attempted" "ok"

            # Restart Immich
            echo "Restarting Immich..."
            ssh_pve 'qm guest exec 101 -- bash -c "docker restart immich"' 2>/dev/null
            print_status "Immich restarted" "ok"
        else
            print_status "TrueNAS mounts already present" "ok"
        fi
    fi
fi

# ============================================
# Summary
# ============================================
echo ""
echo "================================"
echo " SUMMARY"
echo "================================"

ISSUES=0

if [[ -n "$STOPPED_VMS" ]] && ! $FIX_MODE; then
    echo -e "${YELLOW}Stopped critical VMs:${NC}$STOPPED_VMS"
    ISSUES=$((ISSUES + 1))
fi

if [[ -n "$STOPPED_CTS" ]] && ! $FIX_MODE; then
    echo -e "${YELLOW}Stopped critical containers:${NC}$STOPPED_CTS"
    ISSUES=$((ISSUES + 1))
fi

if [[ "$TRUENAS_ZFS_SUSPENDED" == "true" ]] && ! $FIX_MODE; then
    echo -e "${RED}TrueNAS ZFS pool SUSPENDED!${NC} SMB mounts will fail."
    ISSUES=$((ISSUES + 1))
fi

if [[ "$ISSUES" -eq 0 ]]; then
    echo -e "${GREEN}All critical services healthy!${NC}"
else
    echo ""
    echo -e "Run ${YELLOW}./health-check.sh --fix${NC} to attempt recovery"
fi

echo ""
echo "Done: $(date '+%Y-%m-%d %H:%M:%S')"