Auto-sync: 20260105-172251

Hutson
2026-01-05 17:22:52 -05:00
parent eddd98c57f
commit 54a71124ae
2 changed files with 313 additions and 3 deletions


@@ -11,6 +11,7 @@ This is your **quick reference guide** for common homelab tasks. For detailed in
| Task | Documentation | Quick Command |
|------|--------------|---------------|
| **Gateway issues** | [GATEWAY.md](GATEWAY.md) | `ssh ucg-fiber 'free -m'` |
| **Tailscale/VPN issues** | [TAILSCALE.md](TAILSCALE.md) | `tailscale status` |
| **Add new public service** | [TRAEFIK.md](TRAEFIK.md) | Create Traefik config + Cloudflare DNS |
| **Check UPS status** | [UPS.md](UPS.md) | `ssh pve 'upsc cyberpower@localhost'` |
| **Check server temps** | [Temperature Check](#server-temperature-check) | `ssh pve 'grep Tctl ...'` |
@@ -85,6 +86,9 @@ nc -zw1 10.10.10.150 22000 && echo "Windows: UP" || echo "Windows: DOWN"
| Symptom | Check | Fix | Docs |
|---------|-------|-----|------|
| **Network down** | `ssh ucg-fiber 'free -m'` | Check memory; the watchdog reboots the gateway automatically | [GATEWAY.md](GATEWAY.md) |
| **Tailscale DNS not working** | `tailscale status` | Check that PVE is online and subnet routing works | [TAILSCALE.md](TAILSCALE.md) |
| **Subnet unreachable** | `ping 10.10.10.10` | Check `--accept-routes` on local devices | [TAILSCALE.md](TAILSCALE.md) |
| **Relay-only connections** | `tailscale ping <ip>` | Check for VPN conflicts, restart tailscaled | [TAILSCALE.md](TAILSCALE.md) |
| Device not syncing | `curl Syncthing API` | Restart Syncthing | [SYNCTHING.md](SYNCTHING.md) |
| VM won't start | Storage/RAM available? | `ssh pve 'qm start VMID'` | [VMS.md](VMS.md) |
| Server running hot | Check KSM, CPU processes | Disable KSM | [POWER-MANAGEMENT.md](POWER-MANAGEMENT.md) |
@@ -246,9 +250,10 @@ ssh pve 'qm guest exec VMID -- bash -c "COMMAND"'
### Infrastructure
- [README.md](README.md) - Start here
- [GATEWAY.md](GATEWAY.md) - UniFi gateway, monitoring services
- [TAILSCALE.md](TAILSCALE.md) - VPN, subnet routing, DNS
- [VMS.md](VMS.md) - VM/CT inventory
- [STORAGE.md](STORAGE.md) - ZFS pools, shares
- [NETWORK.md](NETWORK.md) - Bridges, VLANs, Tailscale
- [NETWORK.md](NETWORK.md) - Bridges, VLANs, MTU
- [POWER-MANAGEMENT.md](POWER-MANAGEMENT.md) - Optimizations
- [UPS.md](UPS.md) - UPS config, NUT monitoring
@@ -310,6 +315,15 @@ git add -A && git commit -m "Update docs" && git push
## Recent Changes
### 2026-01-05
- Created [TAILSCALE.md](TAILSCALE.md) - comprehensive Tailscale VPN documentation
- **Fixed Tailscale subnet routing issues:**
  - Switched primary subnet router from UCG-Fiber to PVE (gateway had relay-only connections)
  - Disabled `--accept-routes` on UCG-Fiber and PiHole (devices on subnet must not accept subnet routes)
  - Fixed PiHole ProtonVPN from full-tunnel to split-tunnel (DNS-only via fwmark routing)
- **Root cause:** Devices directly on 10.10.10.0/24 with `--accept-routes=true` were routing local traffic through Tailscale mesh instead of local interface
- **Key lesson:** Any device directly connected to an advertised subnet MUST have `--accept-routes=false`
### 2026-01-03
- Deployed **Crafty Controller 4** on docker-host2 for Minecraft server management
- URL: https://mc.htsn.io (Web GUI)
@@ -348,8 +362,8 @@ git add -A && git commit -m "Update docs" && git push
---
**Last Updated**: 2026-01-03
**Documentation Status**: ✅ Phase 1 Complete + Gateway Monitoring + MetaMCP
**Last Updated**: 2026-01-05
**Documentation Status**: ✅ Phase 1 Complete + Gateway Monitoring + MetaMCP + Tailscale
---

TAILSCALE.md (new file, 296 lines)

@@ -0,0 +1,296 @@
# Tailscale VPN Configuration
## Overview
Tailscale provides secure remote access to the homelab via a mesh VPN. This document covers the configuration, subnet routing, and critical gotchas learned from troubleshooting.
---
## Network Architecture
```
     Remote Clients (MacBook, Phone)
                    ▼ Tailscale Mesh (100.x.x.x)
         ┌──────────┴───────────┐
         │                      │
         ▼                      ▼
PVE (Subnet Router)    UCG-Fiber (Gateway)
  100.113.177.80          100.94.246.32
         │                      │
         │    10.10.10.0/24     │
         └──────────┬───────────┘
             ┌──────┴──────┐
             │             │
          PiHole        TrueNAS
        10.10.10.10  10.10.10.200
```
---
## Device Configuration
| Device | Tailscale IP | Role | Accept Routes | Advertise Routes |
|--------|--------------|------|---------------|------------------|
| **PVE** | 100.113.177.80 | Subnet Router (Primary) | **NO** | 10.10.10.0/24, 10.10.20.0/24 |
| **UCG-Fiber** | 100.94.246.32 | Gateway (backup) | **NO** | (disabled) |
| **PiHole** | 100.112.59.128 | DNS Server | **NO** | None |
| **TrueNAS** | 100.100.94.71 | NAS | Yes | None |
| **Mac-Mini** | 100.108.89.58 | Desktop | Yes | None |
| **MacBook** | 100.88.161.1 | Laptop | Yes | None |
| **Phone** | 100.106.175.37 | Mobile | Yes | None |
---
## Critical Configuration Rules
### 1. Devices on the Advertised Subnet MUST Have `--accept-routes=false`
**Problem:** If a device is directly connected to 10.10.10.0/24 AND has `--accept-routes=true`, Tailscale will route local subnet traffic through the mesh instead of the local interface.
**Symptom:** Device can't reach neighbors on the same subnet; `ip route get 10.10.10.X` shows `dev tailscale0` instead of the local interface.
**Fix:**
```bash
# On any device directly connected to 10.10.10.0/24
tailscale set --accept-routes=false
```
**Affected devices:**
- UCG-Fiber (gateway) - directly on 10.10.10.0/24
- PiHole - directly on 10.10.10.0/24
- PVE - directly on 10.10.10.0/24; as the subnet router it advertises these routes rather than accepting them, so it also runs with accept-routes off (see the table above)
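The symptom check above can be scripted. This is a minimal sketch, not part of the deployed setup: the helper names `route_dev` and `uses_tailscale` are hypothetical, and they only parse the text that `ip route get` prints, where the egress interface follows the `dev` keyword.

```bash
# route_dev (hypothetical helper): read `ip route get` output on stdin
# and print the interface name that follows the "dev" keyword.
route_dev() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# uses_tailscale (hypothetical helper): succeed when the route on stdin
# leaves via tailscale0 instead of a local interface.
uses_tailscale() {
  [ "$(route_dev)" = "tailscale0" ]
}
```

For example, `ip route get 10.10.10.120 | uses_tailscale && echo "local traffic is going through Tailscale"` would flag a device that still accepts the subnet route.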
### 2. Only ONE Device Should Be Primary Subnet Router
**Problem:** Multiple devices advertising the same subnet can cause routing conflicts or failover issues.
**Current Setup:**
- **PVE** is the primary subnet router for both 10.10.10.0/24 and 10.10.20.0/24
- **UCG-Fiber** has subnet advertisement DISABLED (was causing relay-only connections)
**To change subnet router:**
1. Go to https://login.tailscale.com/admin/machines
2. Disable the route on the old device, then enable it on the new device
3. Alternatively, if both devices advertise the route, set the intended device as primary
### 3. VPNs on Tailscale Devices Can Break Connectivity
**Problem:** A full-tunnel VPN (like ProtonVPN with `AllowedIPs = 0.0.0.0/0`) will route Tailscale's DERP/STUN traffic through the VPN, breaking NAT traversal.
**Symptom:** Device shows relay-only connections with asymmetric traffic (high TX, near-zero RX).
**Fix:** Use split-tunnel configuration that excludes Tailscale traffic. See [PiHole ProtonVPN Configuration](#pihole-protonvpn-split-tunnel) below.
---
## DNS Configuration
### Tailscale Admin DNS Settings
- **Nameserver:** 10.10.10.10 (PiHole via subnet route)
- **Fallback:** None configured
### How DNS Works
1. Remote client enables "Use Tailscale DNS"
2. DNS queries go to 10.10.10.10
3. Traffic routes through PVE (subnet router) to PiHole
4. PiHole resolves via Unbound (recursive) through ProtonVPN
---
## Subnet Routing
### Current Primary Routes
```
PVE advertises:
- 10.10.10.0/24 (LAN)
- 10.10.20.0/24 (Storage network)
```
### Verifying Routes
```bash
# From MacBook - check who's advertising routes
tailscale status --json | python3 -c "
import sys, json
data = json.load(sys.stdin)
for peer in data.get('Peer', {}).values():
    routes = peer.get('PrimaryRoutes', [])
    if routes:
        print(f\"{peer.get('HostName')}: {routes}\")"
```
### Testing Subnet Connectivity
```bash
# Test from remote client
ping 10.10.10.10 # PiHole
ping 10.10.10.120 # PVE
ping 10.10.10.1 # Gateway
dig @10.10.10.10 google.com # DNS
```
---
## PiHole ProtonVPN Split-Tunnel
PiHole runs a WireGuard tunnel to ProtonVPN for encrypted upstream DNS queries. The configuration uses policy-based routing to ONLY route Unbound's DNS traffic through the VPN.
### Configuration File: `/etc/wireguard/piehole.conf`
```ini
[Interface]
PrivateKey = <key>
Address = 10.2.0.2/32
# CRITICAL: Disable automatic routing - we handle it manually
Table = off
# Policy routing: only route Unbound DNS through VPN
PostUp = ip route add default dev %i table 51820
PostUp = ip rule add fwmark 0x51820 table 51820 priority 100
PostUp = iptables -t mangle -N UNBOUND_VPN 2>/dev/null || true
PostUp = iptables -t mangle -F UNBOUND_VPN
PostUp = iptables -t mangle -A UNBOUND_VPN -d 10.0.0.0/8 -j RETURN
PostUp = iptables -t mangle -A UNBOUND_VPN -d 127.0.0.0/8 -j RETURN
PostUp = iptables -t mangle -A UNBOUND_VPN -d 100.64.0.0/10 -j RETURN
PostUp = iptables -t mangle -A UNBOUND_VPN -d 192.168.0.0/16 -j RETURN
PostUp = iptables -t mangle -A UNBOUND_VPN -d 172.16.0.0/12 -j RETURN
PostUp = iptables -t mangle -A UNBOUND_VPN -j MARK --set-mark 0x51820
PostUp = iptables -t mangle -A OUTPUT -p udp --dport 53 -m owner --uid-owner unbound -j UNBOUND_VPN
PostUp = iptables -t mangle -A OUTPUT -p tcp --dport 53 -m owner --uid-owner unbound -j UNBOUND_VPN
PostUp = iptables -t nat -A POSTROUTING -o %i -j MASQUERADE
PostDown = iptables -t mangle -D OUTPUT -p udp --dport 53 -m owner --uid-owner unbound -j UNBOUND_VPN
PostDown = iptables -t mangle -D OUTPUT -p tcp --dport 53 -m owner --uid-owner unbound -j UNBOUND_VPN
PostDown = iptables -t mangle -F UNBOUND_VPN
PostDown = iptables -t mangle -X UNBOUND_VPN
PostDown = ip rule del fwmark 0x51820 table 51820 priority 100
PostDown = ip route del default dev %i table 51820
PostDown = iptables -t nat -D POSTROUTING -o %i -j MASQUERADE
[Peer]
PublicKey = <ProtonVPN-key>
AllowedIPs = 0.0.0.0/0, ::/0
Endpoint = 149.102.242.1:51820
PersistentKeepalive = 25
```
**Key Points:**
- `Table = off` stops wg-quick from installing any routes for `AllowedIPs` (including the default route), so routing is controlled only by the `PostUp` rules
- Only traffic from the `unbound` user to port 53 gets marked and routed through VPN
- Local, private, and Tailscale (100.64.0.0/10) traffic is excluded
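To confirm the split tunnel is in place after the interface comes up, the policy rule can be checked. The sketch below assumes the fwmark `0x51820` and table `51820` from the config above; the helper name `has_vpn_rule` is hypothetical, and it only matches text in the form `ip rule show` prints (`from all fwmark 0x51820 lookup 51820`).

```bash
# has_vpn_rule (hypothetical helper): read `ip rule show` output on stdin and
# succeed when a rule sends packets marked 0x51820 to routing table 51820.
has_vpn_rule() {
  grep -Eq 'fwmark 0x51820( |/).*lookup 51820( |$)'
}
```

A possible use is `ip rule show | has_vpn_rule && ip route show table 51820`, which should then list a default route on the WireGuard interface. `iptables -t mangle -S UNBOUND_VPN` shows the exclusion and marking rules themselves.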
---
## Troubleshooting
### Symptom: Can't reach subnet (10.10.10.x) from remote
**Check 1:** Is PVE online and advertising routes?
```bash
tailscale status | grep pve
# Should not show "offline" (an idle peer is fine; "active" means traffic is flowing)
```
**Check 2:** Is PVE the primary subnet router?
```bash
tailscale status --json | python3 -c "..." # See above
```
**Check 3:** Can PVE reach the target on local network?
```bash
ssh pve 'ping -c 1 10.10.10.10'
```
### Symptom: Device shows "relay" with asymmetric traffic (high TX, low RX)
**Cause:** Usually a VPN or firewall blocking Tailscale's UDP traffic.
**Check:** Run netcheck on the affected device:
```bash
tailscale netcheck
```
Look for:
- Wrong external IP (indicates VPN routing issue)
- Missing DERP latencies
- `MappingVariesByDestIP: true` with no direct connections
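The netcheck report is long, so a small filter can help when comparing devices. This is a sketch under an assumption: it expects the report lines to look like `* UDP: true` and `* IPv4: yes, <ip>:<port>`, which may differ between Tailscale versions. The helper names `netcheck_summary` and `udp_blocked` are hypothetical.

```bash
# netcheck_summary (hypothetical helper): read `tailscale netcheck` output on
# stdin and keep only the fields discussed above, without the bullet prefix.
netcheck_summary() {
  grep -E '(UDP|IPv4|MappingVariesByDestIP|Nearest DERP):' | sed -E 's/^[[:space:]]*\* //'
}

# udp_blocked (hypothetical helper): succeed when the report on stdin
# says UDP is unavailable.
udp_blocked() {
  grep -Eq 'UDP: false'
}
```

Running `tailscale netcheck | netcheck_summary` on the affected device and on a healthy one makes a wrong external IP easier to spot.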
### Symptom: Local devices can't reach each other
**Cause:** `--accept-routes=true` on a device that's directly on the subnet.
**Fix:**
```bash
# Check current setting
tailscale debug prefs | grep -i route
# Disable accept-routes
tailscale set --accept-routes=false
```
### Symptom: Gateway can ping Tailscale IPs but not local IPs
**Check routing:**
```bash
ip route get 10.10.10.120
# If it shows "dev tailscale0" instead of "dev br0", that's the problem
```
**Fix:** `tailscale set --accept-routes=false` on the gateway
---
## Maintenance Commands
### Restart Tailscale
```bash
# On Linux
systemctl restart tailscaled
# Check status
tailscale status
```
### Re-advertise Routes (PVE)
```bash
tailscale set --advertise-routes=10.10.10.0/24,10.10.20.0/24
```
### Check Connection Type
```bash
# Shows direct vs relay for each peer
tailscale status
# Detailed ping with path info
tailscale ping <tailscale-ip>
```
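When checking several peers, the path reported by `tailscale ping` can be reduced to a single word. This is a minimal sketch: the helper name `ping_path` is hypothetical, and it assumes pong lines of the form `pong from <host> (<ip>) via DERP(<region>) in <time>` for relayed paths and `via <ip>:<port>` for direct ones.

```bash
# ping_path (hypothetical helper): read `tailscale ping` output on stdin and
# print "relay", "direct", or "unknown" based on the last pong line seen.
ping_path() {
  awk '/^pong from/ { last = $0 }
       END {
         if (last ~ /via DERP\(/)             print "relay"
         else if (last ~ /via (\[|[0-9])/)    print "direct"
         else                                 print "unknown"
       }'
}
```

For example, `tailscale ping <tailscale-ip> | ping_path` prints `relay` if the connection never upgraded from DERP during the ping run.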
### Force Re-connection
```bash
tailscale down && tailscale up
```
---
## Known Issues
### UCG-Fiber Relay-Only Connections
The UniFi gateway sometimes fails to establish direct Tailscale connections, falling back to relay. This appears related to memory pressure or the gateway's NAT implementation. Current workaround: use PVE as the subnet router instead.
### Gateway Memory Pressure
The UCG-Fiber has limited RAM (~3GB) and can become unstable under load. The internet-watchdog service will auto-reboot if connectivity is lost. See [GATEWAY.md](GATEWAY.md).
---
## Change History
### 2026-01-05
- Switched subnet router from UCG-Fiber to PVE
- Fixed PiHole ProtonVPN from full-tunnel to split-tunnel (DNS-only)
- Disabled `--accept-routes` on UCG-Fiber and PiHole
- Documented critical configuration rules
---
**Last Updated:** 2026-01-05