Auto-sync: 20260105-122831
339
GATEWAY.md
Normal file
@@ -0,0 +1,339 @@
# UniFi Gateway (UCG-Fiber)

Documentation for the UniFi Cloud Gateway Fiber (10.10.10.1) - the primary network gateway and router.

## Overview

| Property | Value |
|----------|-------|
| **Device** | UniFi Cloud Gateway Fiber (UCG-Fiber) |
| **IP Address** | 10.10.10.1 |
| **SSH User** | root |
| **SSH Auth** | SSH key (`~/.ssh/id_ed25519`) |
| **Host Aliases** | `ucg-fiber`, `gateway` |
| **Firmware** | v4.4.9 (as of 2026-01-02) |
| **UniFi Core** | 4.4.19 |
| **RAM** | 2.9 GB (shared with UniFi apps) |

---

## SSH Access

SSH key authentication is configured. Use host aliases:

```bash
# Quick access
ssh ucg-fiber 'hostname'
ssh gateway 'free -m'

# Or use IP directly
ssh root@10.10.10.1 'uptime'
```

**Note**: The SSH key may need to be re-deployed after firmware updates if UniFi clears `authorized_keys`.

---

## Monitoring Services

Two custom monitoring services run on the gateway to prevent and diagnose issues.

### Internet Watchdog Service

**Purpose**: Auto-reboots the gateway if internet connectivity is lost for 5+ minutes

**Location**: `/data/scripts/internet-watchdog.sh`

**How it works**:
1. Pings 1.1.1.1, 8.8.8.8, 208.67.222.222 every 60 seconds
2. If all three fail, increments failure counter
3. After 5 consecutive failures (~5 minutes), triggers reboot
4. Logs all activity to `/var/log/internet-watchdog.log`

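The steps above can be sketched as a small Bash script. This is a minimal illustration, not the deployed `/data/scripts/internet-watchdog.sh`: the function names, the `ping` flags, the plain `reboot` call, the "Rebooting gateway" log line, and the `run` argument are all assumptions made for the example.

```bash
#!/usr/bin/env bash
# Hypothetical watchdog sketch; the real script may differ.

TARGETS="1.1.1.1 8.8.8.8 208.67.222.222"   # hosts named in the steps above
MAX_FAILURES=5                              # consecutive failed rounds before reboot
INTERVAL=60                                 # seconds between rounds
LOG_FILE="${LOG_FILE:-/var/log/internet-watchdog.log}"
PING_CMD="${PING_CMD:-ping -c 1 -W 5}"      # assumption: one probe, 5 s timeout
REBOOT_CMD="${REBOOT_CMD:-reboot}"          # assumption: plain reboot

failures=0

log() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') - $*" >> "$LOG_FILE"
}

# Succeeds as soon as any one target answers; fails only if all are down.
internet_up() {
    local host
    for host in $TARGETS; do
        $PING_CMD "$host" >/dev/null 2>&1 && return 0
    done
    return 1
}

# Runs one check round and updates the global failure counter.
check_once() {
    if internet_up; then
        if [ "$failures" -gt 0 ]; then
            log "Internet restored after $failures failures"
        fi
        failures=0
    else
        failures=$((failures + 1))
        log "Internet check failed ($failures/$MAX_FAILURES)"
        if [ "$failures" -ge "$MAX_FAILURES" ]; then
            log "Rebooting gateway"
            $REBOOT_CMD
        fi
    fi
}

main() {
    log "Watchdog started"
    while true; do
        check_once
        sleep "$INTERVAL"
    done
}

# Start the loop only when invoked with the (hypothetical) "run" argument.
if [ "${1:-}" = "run" ]; then
    main
fi
```

Counting consecutive failed rounds, rather than single failed pings, means one dropped packet or one unreachable resolver does not trigger a reboot.
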
**Commands**:
```bash
# Check service status
ssh ucg-fiber 'systemctl status internet-watchdog'

# View recent logs
ssh ucg-fiber 'tail -50 /var/log/internet-watchdog.log'

# Stop temporarily (if troubleshooting)
ssh ucg-fiber 'systemctl stop internet-watchdog'

# Restart
ssh ucg-fiber 'systemctl restart internet-watchdog'
```

**Log Format**:
```
2026-01-02 22:45:01 - Watchdog started
2026-01-02 22:46:01 - Internet check failed (1/5)
2026-01-02 22:47:01 - Internet restored after 1 failures
```

---

### Memory Monitor Service

**Purpose**: Logs memory usage and top processes every 10 minutes for diagnostics

**Location**: `/data/scripts/memory-monitor.sh`

**Log File**: `/data/logs/memory-history.log`

**How it works**:
1. Every 10 minutes, logs current memory usage (`free -m`)
2. Logs top 12 memory-consuming processes
3. Auto-rotates log when it exceeds 10 MB (keeps one `.old` file)

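A minimal sketch of how such a monitor could be written is shown here. It is not the deployed `/data/scripts/memory-monitor.sh`; the function names, the `wc -c` size check, and the `run` argument are assumptions. It relies on `free` and a `ps` that supports `--sort` (procps), which the commands elsewhere in this document already use.

```bash
#!/usr/bin/env bash
# Hypothetical memory monitor sketch; the real script may differ.

LOG_FILE="${LOG_FILE:-/data/logs/memory-history.log}"
MAX_BYTES=$((10 * 1024 * 1024))   # rotate once the log exceeds 10 MB
INTERVAL=600                       # 10 minutes between snapshots
TOP_N=12                           # number of processes to record

# Moves the log to <log>.old when it is too large, overwriting any
# previous .old file so only one rotated copy is kept.
rotate_if_needed() {
    [ -f "$LOG_FILE" ] || return 0
    local size
    size=$(( $(wc -c < "$LOG_FILE") ))
    if [ "$size" -gt "$MAX_BYTES" ]; then
        mv -f "$LOG_FILE" "$LOG_FILE.old"
    fi
}

# Appends one timestamped snapshot in the layout shown under "Log Format".
snapshot() {
    {
        echo "========== $(date '+%Y-%m-%d %H:%M:%S') =========="
        echo "--- MEMORY ---"
        free -m
        echo "--- TOP MEMORY PROCESSES ---"
        # +1 because the first line of ps output is the column header
        ps -eo pid,rss,comm --sort=-rss | head -n $((TOP_N + 1))
    } >> "$LOG_FILE"
}

main() {
    while true; do
        rotate_if_needed
        snapshot
        sleep "$INTERVAL"
    done
}

# Start the loop only when invoked with the (hypothetical) "run" argument.
if [ "${1:-}" = "run" ]; then
    main
fi
```

Rotating before each snapshot keeps the size check cheap and bounds disk use at roughly twice the rotation threshold.
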
**Commands**:
```bash
# Check service status
ssh ucg-fiber 'systemctl status memory-monitor'

# View recent memory history
ssh ucg-fiber 'tail -100 /data/logs/memory-history.log'

# Check current memory usage
ssh ucg-fiber 'free -m'

# See top memory consumers right now
ssh ucg-fiber 'ps -eo pid,rss,comm --sort=-rss | head -12'
```

**Log Format**:
```
========== 2026-01-02 22:30:00 ==========
--- MEMORY ---
               total        used        free      shared  buff/cache   available
Mem:            2892        1890         102         456         899        1002
Swap:            512          88         424
--- TOP MEMORY PROCESSES ---
    PID    RSS COMMAND
   1234 327456 unifi-protect
   2345 252108 mongod
   3456 236544 java
...
```

---

## Known Memory Consumers

| Process | Typical Memory | Purpose |
|---------|----------------|---------|
| unifi-protect | ~320 MB | Camera/NVR management |
| mongod | ~250 MB | UniFi configuration database |
| java (controller) | ~230 MB | UniFi Network controller |
| postgres | ~180 MB | PostgreSQL database |
| unifi-core | ~150 MB | UniFi OS core |
| tailscaled | ~80 MB | Tailscale VPN |

Thresholds refer to the `available` column of `free -m`, not the `free` column, which is normally low because the kernel uses spare RAM for cache:

- **Total available**: ~2.9 GB
- **Typical usage**: ~1.8-2.0 GB (leaves ~1 GB available)
- **Warning threshold**: <500 MB available
- **Critical**: <200 MB available or swap >50% used

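The thresholds above can be checked mechanically. The sketch below defines a hypothetical helper, `classify_memory`, that reads `free -m` output on standard input and prints `OK`, `WARNING`, or `CRITICAL`. It assumes the thresholds apply to the `available` column (seventh field of the `Mem:` line) and that "swap >50% used" means used swap exceeds half of total swap.

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify `free -m` output against the documented thresholds.

classify_memory() {
    awk '
        $1 == "Mem:"  { avail = $7 }                      # "available" column, in MB
        $1 == "Swap:" { swap_total = $2; swap_used = $3 }
        END {
            if (avail < 200 || (swap_total > 0 && swap_used * 2 > swap_total))
                print "CRITICAL"
            else if (avail < 500)
                print "WARNING"
            else
                print "OK"
        }
    '
}
```

Typical use would be `ssh ucg-fiber 'free -m' | classify_memory`.
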
---

## Disabled Services

The following services were disabled to reduce memory usage:

| Service | Memory Saved | Reason Disabled |
|---------|--------------|-----------------|
| UniFi Connect | ~200 MB | Not needed (cameras use Protect) |

To re-enable if needed:
```bash
ssh ucg-fiber 'systemctl enable unifi-connect && systemctl start unifi-connect'
```

---

## Common Issues

### Gateway Freeze / Network Loss

**Symptoms**:
- All devices lose internet
- Cannot ping 10.10.10.1
- Physical reboot required

**Root Cause**: Memory exhaustion causing soft lockup

**Prevention**:
1. Internet watchdog auto-reboots after 5 min outage
2. Memory monitor logs help identify runaway processes
3. UniFi Connect disabled to free ~200 MB

**Post-Incident Analysis**:
```bash
# Check memory history for spike before freeze
ssh ucg-fiber 'grep -B5 "Swap:" /data/logs/memory-history.log | tail -50'

# Check watchdog logs
ssh ucg-fiber 'cat /var/log/internet-watchdog.log'

# Check system logs for errors
ssh ucg-fiber 'dmesg | tail -100'
ssh ucg-fiber 'journalctl -p err --since "1 hour ago"'
```

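To find the tightest moments in the memory history without scrolling through it, the snapshots can be reduced to one line each. This is a sketch under the assumption that the log matches the layout shown under "Log Format" (a `==========` timestamp header followed by `free -m` output); `lowest_available` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list the snapshots with the least available memory.
# Reads memory-history.log on stdin; prints "<available MB> <date> <time>".

lowest_available() {
    awk '
        $1 ~ /^=+$/  { ts = $2 " " $3 }   # snapshot header carries the timestamp
        $1 == "Mem:" { print $7, ts }     # 7th field is the "available" column
    ' | sort -n | head -n "${1:-5}"
}
```

For example, `ssh ucg-fiber 'cat /data/logs/memory-history.log' | lowest_available 10` would print the ten snapshots with the lowest available memory, lowest first.
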
---

### High Memory Usage

**Check current state**:
```bash
ssh ucg-fiber 'free -m && echo "---" && ps -eo pid,rss,comm --sort=-rss | head -15'
```

**If swap is heavily used**:
```bash
# Check swap usage
ssh ucg-fiber 'cat /proc/swaps'

# See what's in swap (processes with more than 10000 kB swapped out)
ssh ucg-fiber 'for pid in $(ls /proc | grep -E "^[0-9]+$"); do
  swap=$(grep VmSwap /proc/$pid/status 2>/dev/null | awk "{print \$2}");
  [ "$swap" -gt 10000 ] 2>/dev/null && echo "$pid: ${swap}kB - $(cat /proc/$pid/comm)";
done | sort -t: -k2 -rn | head -10'
```

**Consider reboot if**:
- Available memory <200 MB
- Swap usage >300 MB
- System becoming unresponsive

---

### Tailscale Issues

**Check Tailscale status**:
```bash
ssh ucg-fiber 'tailscale status'
```

**Common errors and fixes**:

| Error | Fix |
|-------|-----|
| `DNS resolution failed` | Check upstream DNS (Pi-hole at 10.10.10.10) |
| `TLS handshake failed` | Usually temporary; Tailscale auto-reconnects |
| `Not connected` | `ssh ucg-fiber 'tailscale up'` |

---

## Firmware Updates

**Check current version**:
```bash
ssh ucg-fiber 'ubnt-systool version'
```

**Update process**:
1. Check UniFi site for latest stable firmware
2. Download via UI or CLI
3. Schedule update during low-usage time

**After update**:
- Verify SSH key still works
- Check custom services still running
- Verify Tailscale reconnects

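The post-update checklist above could be run as one script from a workstation. The sketch below is illustrative only: `check` and `post_update_check` are hypothetical names, the `ssh` options are a suggestion (`BatchMode=yes` makes a missing key fail immediately instead of prompting for a password), and it assumes `tailscale status` exits non-zero when Tailscale is not connected.

```bash
#!/usr/bin/env bash
# Hypothetical post-update check, run from a workstation with the ucg-fiber alias.

GATEWAY="${GATEWAY:-ucg-fiber}"
SSH_CMD="${SSH_CMD:-ssh -o BatchMode=yes -o ConnectTimeout=10}"

# Runs a remote command and reports OK/FAIL under the given label.
check() {
    local label="$1"
    shift
    if $SSH_CMD "$GATEWAY" "$@" >/dev/null 2>&1; then
        echo "OK    $label"
    else
        echo "FAIL  $label"
        return 1
    fi
}

# Runs every check even if an earlier one fails; returns 1 if any failed.
post_update_check() {
    local rc=0
    check "SSH key login" true || rc=1
    check "internet-watchdog active" systemctl is-active --quiet internet-watchdog || rc=1
    check "memory-monitor active" systemctl is-active --quiet memory-monitor || rc=1
    # assumption: non-zero exit status means Tailscale is not connected
    check "Tailscale connected" tailscale status || rc=1
    return "$rc"
}
```

Calling `post_update_check` after a firmware update prints one line per item and returns non-zero if anything needs attention.
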
**Re-deploy SSH key if needed**:
```bash
ssh-copy-id -i ~/.ssh/id_ed25519 root@10.10.10.1
```

---

## Service Locations

| File | Purpose |
|------|---------|
| `/data/scripts/internet-watchdog.sh` | Watchdog script |
| `/data/scripts/memory-monitor.sh` | Memory monitor script |
| `/etc/systemd/system/internet-watchdog.service` | Watchdog systemd unit |
| `/etc/systemd/system/memory-monitor.service` | Memory monitor systemd unit |
| `/var/log/internet-watchdog.log` | Watchdog log |
| `/data/logs/memory-history.log` | Memory history log |

**Note**: `/data/` persists across firmware updates. `/var/log/` may not.

---

## Quick Reference Commands

```bash
# System status
ssh ucg-fiber 'uptime && free -m'

# Check both monitoring services
ssh ucg-fiber 'systemctl status internet-watchdog memory-monitor'

# Memory history (most recent snapshots)
ssh ucg-fiber 'tail -60 /data/logs/memory-history.log'

# Watchdog activity
ssh ucg-fiber 'tail -20 /var/log/internet-watchdog.log'

# Network devices (ARP table)
ssh ucg-fiber 'cat /proc/net/arp'

# Tailscale status
ssh ucg-fiber 'tailscale status'

# System logs
ssh ucg-fiber 'journalctl -p warning --since "1 hour ago" | head -50'
```

---

## Backup Considerations

Custom services in `/data/scripts/` persist across firmware updates, but afterwards you may need to:
- Re-enable the systemd services (after major updates)
- Re-apply script permissions if they were wiped

**Backup critical files**:
```bash
# Copy scripts locally for reference
# (the remote glob is quoted so the local shell does not try to expand it)
scp 'ucg-fiber:/data/scripts/*.sh' ~/Projects/homelab/data/scripts/
```

---

## Related Documentation

- [SSH-ACCESS.md](SSH-ACCESS.md) - SSH configuration and host aliases
- [NETWORK.md](NETWORK.md) - Network architecture
- [MONITORING.md](MONITORING.md) - Overall monitoring strategy
- [HOMEASSISTANT.md](HOMEASSISTANT.md) - Home Assistant integration

---

## Incident History

### 2025-12-27 to 2025-12-29: Gateway Freeze

**Timeline**:
- Dec 7: Firmware update to v4.4.9
- Dec 24: Last healthy system logs
- Dec 27-29: "No internet detected" errors in logs
- Dec 29+: Complete silence (gateway frozen)
- Jan 2: Physical reboot restored access

**Root Cause**: Memory exhaustion causing soft lockup (no crash dump saved)

**Resolution**:
- Deployed internet-watchdog service
- Deployed memory-monitor service
- Disabled UniFi Connect (~200 MB saved)
- Configured SSH key auth

---

**Last Updated**: 2026-01-02