From 54a71124aeda8ab2d37521eac5b2279e82173aff Mon Sep 17 00:00:00 2001
From: Hutson
Date: Mon, 5 Jan 2026 17:22:52 -0500
Subject: [PATCH] Auto-sync: 20260105-172251

---
 CLAUDE.md    |  20 +++-
 TAILSCALE.md | 296 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 313 insertions(+), 3 deletions(-)
 create mode 100644 TAILSCALE.md

diff --git a/CLAUDE.md b/CLAUDE.md
index 2cbc24e..620e647 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -11,6 +11,7 @@ This is your **quick reference guide** for common homelab tasks. For detailed in
 | Task | Documentation | Quick Command |
 |------|--------------|---------------|
 | **Gateway issues** | [GATEWAY.md](GATEWAY.md) | `ssh ucg-fiber 'free -m'` |
+| **Tailscale/VPN issues** | [TAILSCALE.md](TAILSCALE.md) | `tailscale status` |
 | **Add new public service** | [TRAEFIK.md](TRAEFIK.md) | Create Traefik config + Cloudflare DNS |
 | **Check UPS status** | [UPS.md](UPS.md) | `ssh pve 'upsc cyberpower@localhost'` |
 | **Check server temps** | [Temperature Check](#server-temperature-check) | `ssh pve 'grep Tctl ...'` |
@@ -85,6 +86,9 @@ nc -zw1 10.10.10.150 22000 && echo "Windows: UP" || echo "Windows: DOWN"
 | Symptom | Check | Fix | Docs |
 |---------|-------|-----|------|
 | **Network down** | `ssh ucg-fiber 'free -m'` | Check memory, watchdog reboots auto | [GATEWAY.md](GATEWAY.md) |
+| **Tailscale DNS not working** | `tailscale status` | Check PVE online, subnet routing | [TAILSCALE.md](TAILSCALE.md) |
+| **Subnet unreachable** | `ping 10.10.10.10` | Check `--accept-routes` on local devices | [TAILSCALE.md](TAILSCALE.md) |
+| **Relay-only connections** | `tailscale ping ` | Check for VPN conflicts, restart tailscaled | [TAILSCALE.md](TAILSCALE.md) |
 | Device not syncing | `curl Syncthing API` | Restart Syncthing | [SYNCTHING.md](SYNCTHING.md) |
 | VM won't start | Storage/RAM available? | `ssh pve 'qm start VMID'` | [VMS.md](VMS.md) |
 | Server running hot | Check KSM, CPU processes | Disable KSM | [POWER-MANAGEMENT.md](POWER-MANAGEMENT.md) |
@@ -246,9 +250,10 @@ ssh pve 'qm guest exec VMID -- bash -c "COMMAND"'
 ### Infrastructure
 - [README.md](README.md) - Start here
 - [GATEWAY.md](GATEWAY.md) - UniFi gateway, monitoring services
+- [TAILSCALE.md](TAILSCALE.md) - VPN, subnet routing, DNS
 - [VMS.md](VMS.md) - VM/CT inventory
 - [STORAGE.md](STORAGE.md) - ZFS pools, shares
-- [NETWORK.md](NETWORK.md) - Bridges, VLANs, Tailscale
+- [NETWORK.md](NETWORK.md) - Bridges, VLANs, MTU
 - [POWER-MANAGEMENT.md](POWER-MANAGEMENT.md) - Optimizations
 - [UPS.md](UPS.md) - UPS config, NUT monitoring
 
@@ -310,6 +315,15 @@ git add -A && git commit -m "Update docs" && git push
 
 ## Recent Changes
 
+### 2026-01-05
+- Created [TAILSCALE.md](TAILSCALE.md) - comprehensive Tailscale VPN documentation
+- **Fixed Tailscale subnet routing issues:**
+  - Switched primary subnet router from UCG-Fiber to PVE (gateway had relay-only connections)
+  - Disabled `--accept-routes` on UCG-Fiber and PiHole (devices on subnet must not accept subnet routes)
+  - Fixed PiHole ProtonVPN from full-tunnel to split-tunnel (DNS-only via fwmark routing)
+- **Root cause:** Devices directly on 10.10.10.0/24 with `--accept-routes=true` were routing local traffic through Tailscale mesh instead of local interface
+- **Key lesson:** Any device directly connected to an advertised subnet MUST have `--accept-routes=false`
+
 ### 2026-01-03
 - Deployed **Crafty Controller 4** on docker-host2 for Minecraft server management
   - URL: https://mc.htsn.io (Web GUI)
@@ -348,8 +362,8 @@ git add -A && git commit -m "Update docs" && git push
 
 ---
 
-**Last Updated**: 2026-01-03
-**Documentation Status**: ✅ Phase 1 Complete + Gateway Monitoring + MetaMCP
+**Last Updated**: 2026-01-05
+**Documentation Status**: ✅ Phase 1 Complete + Gateway Monitoring + MetaMCP + Tailscale
 
 ---
 
diff --git a/TAILSCALE.md b/TAILSCALE.md
new file mode 100644
index 0000000..749286a
--- /dev/null
+++ b/TAILSCALE.md
@@ -0,0 +1,296 @@
+# Tailscale VPN Configuration
+
+## Overview
+
+Tailscale provides secure remote access to the homelab via a mesh VPN. This document covers the configuration, subnet routing, and critical gotchas learned from troubleshooting.
+
+---
+
+## Network Architecture
+
+```
+Remote Clients (MacBook, Phone)
+        │
+        ▼ Tailscale Mesh (100.x.x.x)
+        │
+┌───────┴────────┐
+│                │
+▼                ▼
+PVE (Subnet Router)     UCG-Fiber (Gateway)
+100.113.177.80          100.94.246.32
+    │                      │
+    │    10.10.10.0/24     │
+    └──────────┬───────────┘
+               │
+        ┌──────┴──────┐
+        │             │
+     PiHole        TrueNAS
+   10.10.10.10  10.10.10.200
+```
+
+---
+
+## Device Configuration
+
+| Device | Tailscale IP | Role | Accept Routes | Advertise Routes |
+|--------|--------------|------|---------------|------------------|
+| **PVE** | 100.113.177.80 | Subnet Router (Primary) | **NO** | 10.10.10.0/24, 10.10.20.0/24 |
+| **UCG-Fiber** | 100.94.246.32 | Gateway (backup) | **NO** | (disabled) |
+| **PiHole** | 100.112.59.128 | DNS Server | **NO** | None |
+| **TrueNAS** | 100.100.94.71 | NAS | Yes | None |
+| **Mac-Mini** | 100.108.89.58 | Desktop | Yes | None |
+| **MacBook** | 100.88.161.1 | Laptop | Yes | None |
+| **Phone** | 100.106.175.37 | Mobile | Yes | None |
+
+---
+
+## Critical Configuration Rules
+
+### 1. Devices on the Advertised Subnet MUST Have `--accept-routes=false`
+
+**Problem:** If a device is directly connected to 10.10.10.0/24 AND has `--accept-routes=true`, Tailscale will route local subnet traffic through the mesh instead of the local interface.
+
+**Symptom:** Device can't reach neighbors on the same subnet; `ip route get 10.10.10.X` shows `dev tailscale0` instead of the local interface.
+
+**Fix:**
+```bash
+# On any device directly connected to 10.10.10.0/24
+tailscale set --accept-routes=false
+```
+
+**Affected devices:**
+- UCG-Fiber (gateway) - directly on 10.10.10.0/24
+- PiHole - directly on 10.10.10.0/24
+- PVE - directly on 10.10.10.0/24 (but is the subnet router, so different)
+
+### 2. Only ONE Device Should Be Primary Subnet Router
+
+**Problem:** Multiple devices advertising the same subnet can cause routing conflicts or failover issues.
+
+**Current Setup:**
+- **PVE** is the primary subnet router for both 10.10.10.0/24 and 10.10.20.0/24
+- **UCG-Fiber** has subnet advertisement DISABLED (was causing relay-only connections)
+
+**To change subnet router:**
+1. Go to https://login.tailscale.com/admin/machines
+2. Disable route on old device, enable on new device
+3. Or set primary if both advertise
+
+### 3. VPNs on Tailscale Devices Can Break Connectivity
+
+**Problem:** A full-tunnel VPN (like ProtonVPN with `AllowedIPs = 0.0.0.0/0`) will route Tailscale's DERP/STUN traffic through the VPN, breaking NAT traversal.
+
+**Symptom:** Device shows relay-only connections with asymmetric traffic (high TX, near-zero RX).
+
+**Fix:** Use split-tunnel configuration that excludes Tailscale traffic. See [PiHole ProtonVPN Configuration](#pihole-protonvpn-split-tunnel) below.
+
+---
+
+## DNS Configuration
+
+### Tailscale Admin DNS Settings
+- **Nameserver:** 10.10.10.10 (PiHole via subnet route)
+- **Fallback:** None configured
+
+### How DNS Works
+1. Remote client enables "Use Tailscale DNS"
+2. DNS queries go to 10.10.10.10
+3. Traffic routes through PVE (subnet router) to PiHole
+4. PiHole resolves via Unbound (recursive) through ProtonVPN
+
+---
+
+## Subnet Routing
+
+### Current Primary Routes
+```
+PVE advertises:
+  - 10.10.10.0/24 (LAN)
+  - 10.10.20.0/24 (Storage network)
+```
+
+### Verifying Routes
+```bash
+# From MacBook - check who's advertising routes
+tailscale status --json | python3 -c "
+import sys, json
+data = json.load(sys.stdin)
+for peer in data.get('Peer', {}).values():
+    routes = peer.get('PrimaryRoutes', [])
+    if routes:
+        print(f\"{peer.get('HostName')}: {routes}\")"
+```
+
+### Testing Subnet Connectivity
+```bash
+# Test from remote client
+ping 10.10.10.10 # PiHole
+ping 10.10.10.120 # PVE
+ping 10.10.10.1 # Gateway
+dig @10.10.10.10 google.com # DNS
+```
+
+---
+
+## PiHole ProtonVPN Split-Tunnel
+
+PiHole runs a WireGuard tunnel to ProtonVPN for encrypted upstream DNS queries. The configuration uses policy-based routing to ONLY route Unbound's DNS traffic through the VPN.
+
+### Configuration File: `/etc/wireguard/piehole.conf`
+
+```ini
+[Interface]
+PrivateKey =
+Address = 10.2.0.2/32
+# CRITICAL: Disable automatic routing - we handle it manually
+Table = off
+
+# Policy routing: only route Unbound DNS through VPN
+PostUp = ip route add default dev %i table 51820
+PostUp = ip rule add fwmark 0x51820 table 51820 priority 100
+PostUp = iptables -t mangle -N UNBOUND_VPN 2>/dev/null || true
+PostUp = iptables -t mangle -F UNBOUND_VPN
+PostUp = iptables -t mangle -A UNBOUND_VPN -d 10.0.0.0/8 -j RETURN
+PostUp = iptables -t mangle -A UNBOUND_VPN -d 127.0.0.0/8 -j RETURN
+PostUp = iptables -t mangle -A UNBOUND_VPN -d 100.64.0.0/10 -j RETURN
+PostUp = iptables -t mangle -A UNBOUND_VPN -d 192.168.0.0/16 -j RETURN
+PostUp = iptables -t mangle -A UNBOUND_VPN -d 172.16.0.0/12 -j RETURN
+PostUp = iptables -t mangle -A UNBOUND_VPN -j MARK --set-mark 0x51820
+PostUp = iptables -t mangle -A OUTPUT -p udp --dport 53 -m owner --uid-owner unbound -j UNBOUND_VPN
+PostUp = iptables -t mangle -A OUTPUT -p tcp --dport 53 -m owner --uid-owner unbound -j UNBOUND_VPN
+PostUp = iptables -t nat -A POSTROUTING -o %i -j MASQUERADE
+
+PostDown = iptables -t mangle -D OUTPUT -p udp --dport 53 -m owner --uid-owner unbound -j UNBOUND_VPN
+PostDown = iptables -t mangle -D OUTPUT -p tcp --dport 53 -m owner --uid-owner unbound -j UNBOUND_VPN
+PostDown = iptables -t mangle -F UNBOUND_VPN
+PostDown = iptables -t mangle -X UNBOUND_VPN
+PostDown = ip rule del fwmark 0x51820 table 51820 priority 100
+PostDown = ip route del default dev %i table 51820
+PostDown = iptables -t nat -D POSTROUTING -o %i -j MASQUERADE
+
+[Peer]
+PublicKey =
+AllowedIPs = 0.0.0.0/0, ::/0
+Endpoint = 149.102.242.1:51820
+PersistentKeepalive = 25
+```
+
+**Key Points:**
+- `Table = off` prevents wg-quick from adding default routes
+- Only traffic from the `unbound` user to port 53 gets marked and routed through VPN
+- Local, private, and Tailscale (100.64.0.0/10) traffic is excluded
+
+---
+
+## Troubleshooting
+
+### Symptom: Can't reach subnet (10.10.10.x) from remote
+
+**Check 1:** Is PVE online and advertising routes?
+```bash
+tailscale status | grep pve
+# Should show "active" not "offline"
+```
+
+**Check 2:** Is PVE the primary subnet router?
+```bash
+tailscale status --json | python3 -c "..." # See above
+```
+
+**Check 3:** Can PVE reach the target on local network?
+```bash
+ssh pve 'ping -c 1 10.10.10.10'
+```
+
+### Symptom: Device shows "relay" with asymmetric traffic (high TX, low RX)
+
+**Cause:** Usually a VPN or firewall blocking Tailscale's UDP traffic.
+
+**Check:** Run netcheck on the affected device:
+```bash
+tailscale netcheck
+```
+
+Look for:
+- Wrong external IP (indicates VPN routing issue)
+- Missing DERP latencies
+- `MappingVariesByDestIP: true` with no direct connections
+
+### Symptom: Local devices can't reach each other
+
+**Cause:** `--accept-routes=true` on a device that's directly on the subnet.
+
+**Fix:**
+```bash
+# Check current setting
+tailscale debug prefs | grep -i route
+
+# Disable accept-routes
+tailscale set --accept-routes=false
+```
+
+### Symptom: Gateway can ping Tailscale IPs but not local IPs
+
+**Check routing:**
+```bash
+ip route get 10.10.10.120
+# If it shows "dev tailscale0" instead of "dev br0", that's the problem
+```
+
+**Fix:** `tailscale set --accept-routes=false` on the gateway
+
+---
+
+## Maintenance Commands
+
+### Restart Tailscale
+```bash
+# On Linux
+systemctl restart tailscaled
+
+# Check status
+tailscale status
+```
+
+### Re-advertise Routes (PVE)
+```bash
+tailscale set --advertise-routes=10.10.10.0/24,10.10.20.0/24
+```
+
+### Check Connection Type
+```bash
+# Shows direct vs relay for each peer
+tailscale status
+
+# Detailed ping with path info
+tailscale ping
+```
+
+### Force Re-connection
+```bash
+tailscale down && tailscale up
+```
+
+---
+
+## Known Issues
+
+### UCG-Fiber Relay-Only Connections
+The UniFi gateway sometimes fails to establish direct Tailscale connections, falling back to relay. This appears related to memory pressure or the gateway's NAT implementation. Current workaround: use PVE as the subnet router instead.
+
+### Gateway Memory Pressure
+The UCG-Fiber has limited RAM (~3GB) and can become unstable under load. The internet-watchdog service will auto-reboot if connectivity is lost. See [GATEWAY.md](GATEWAY.md).
+
+---
+
+## Change History
+
+### 2026-01-05
+- Switched subnet router from UCG-Fiber to PVE
+- Fixed PiHole ProtonVPN from full-tunnel to split-tunnel (DNS-only)
+- Disabled `--accept-routes` on UCG-Fiber and PiHole
+- Documented critical configuration rules
+
+---
+
+**Last Updated:** 2026-01-05