Auto-sync: 20260105-122831

Hutson
2026-01-05 12:28:33 -05:00
parent 56b82df497
commit eddd98c57f
17 changed files with 1770 additions and 27 deletions

@@ -6,17 +6,18 @@ Documentation for system monitoring, health checks, and alerting across the home
| Component | Monitored? | Method | Alerts | Notes |
|-----------|------------|--------|--------|-------|
| **Gateway** | ✅ Yes | Custom services | ✅ Auto-reboot | Internet watchdog + memory monitor |
| **UPS** | ✅ Yes | NUT + Home Assistant | ❌ No | Battery, load, runtime tracked |
| **Syncthing** | ✅ Partial | API (manual checks) | ❌ No | Connection status available |
| **Server temps** | ✅ Partial | Manual checks | ❌ No | Via `sensors` command |
| **VM status** | ✅ Partial | Proxmox UI | ❌ No | Manual monitoring |
| **ZFS health** | ❌ No | Manual `zpool status` | ❌ No | No automated checks |
| **Disk health (SMART)** | ❌ No | Manual `smartctl` | ❌ No | No automated checks |
| **Network** | ❌ No | - | ❌ No | No uptime monitoring |
| **Network** | ✅ Partial | Gateway watchdog | ✅ Auto-reboot | Connectivity check every 60s |
| **Services** | ❌ No | - | ❌ No | No health checks |
| **Backups** | ❌ No | - | ❌ No | No verification |
-**Overall Status**: ⚠️ **MINIMAL** - Most monitoring is manual, no automated alerts
+**Overall Status**: ⚠️ **PARTIAL** - Gateway monitoring active, most else is manual
---
@@ -51,6 +52,41 @@ ssh pve 'upsc cyberpower@localhost | grep -E "battery.charge:|battery.runtime:|u
---
### Gateway Monitoring
**Status**: ✅ **Active with auto-recovery**
Two custom systemd services monitor the UCG-Fiber gateway (10.10.10.1):
**1. Internet Watchdog** (`internet-watchdog.service`)
- Pings external DNS servers (1.1.1.1, 8.8.8.8, 208.67.222.222) every 60 seconds
- Auto-reboots gateway after 5 consecutive failures (~5 minutes)
- Logs to `/var/log/internet-watchdog.log`
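The watchdog behavior described above can be sketched roughly as follows. This is an illustrative reconstruction, not the script that actually runs on the gateway: the function names, the `PING_CMD` and `REBOOT_CMD` variables, the `--run` switch, and the choice to treat a check as successful when any one target answers are all assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of an internet watchdog loop (not the deployed script).

TARGETS="1.1.1.1 8.8.8.8 208.67.222.222"   # hosts listed in this document
MAX_FAILURES=5                              # consecutive failures before reboot
INTERVAL=60                                 # seconds between checks
LOG_FILE="${LOG_FILE:-/var/log/internet-watchdog.log}"
PING_CMD="${PING_CMD:-ping -c 1 -W 2}"      # assumption: one ping, 2 s timeout
REBOOT_CMD="${REBOOT_CMD:-reboot}"          # assumption: plain reboot command

# Succeeds if at least one target answers (assumed success criterion).
internet_up() {
    local host
    for host in $TARGETS; do
        $PING_CMD "$host" >/dev/null 2>&1 && return 0
    done
    return 1
}

# Prints the new consecutive-failure count, given the previous count.
next_failure_count() {
    local previous
    previous=$1
    if internet_up; then
        echo 0
    else
        echo $((previous + 1))
    fi
}

# Checks connectivity every INTERVAL seconds and reboots at the threshold.
watchdog_loop() {
    local failures
    failures=0
    while true; do
        failures=$(next_failure_count "$failures")
        if [ "$failures" -ge "$MAX_FAILURES" ]; then
            echo "$(date '+%F %T') $failures consecutive failures, rebooting" >>"$LOG_FILE"
            $REBOOT_CMD
            failures=0
        fi
        sleep "$INTERVAL"
    done
}

# Start the loop only when explicitly requested.
if [ "${1:-}" = "--run" ]; then
    watchdog_loop
fi
```

With a 60-second interval and a threshold of 5, the reboot fires about five minutes after connectivity is lost, which matches the figure given above. Resetting the counter on any successful check is what makes the failures "consecutive".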
**2. Memory Monitor** (`memory-monitor.service`)
- Logs memory usage and top processes every 10 minutes
- Logs to `/data/logs/memory-history.log`
- Auto-rotates when the log exceeds 10 MB
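A memory monitor with size-based rotation might look something like the sketch below. It reuses the `free` and `ps` invocations from the Quick Commands in this document; the function names, the single `.1` backup file, and the `--run` switch are assumptions rather than details of the real service.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a memory logger with size-based rotation.

LOG_FILE="${LOG_FILE:-/data/logs/memory-history.log}"
MAX_BYTES=$((10 * 1024 * 1024))   # rotate once the log exceeds 10 MB
INTERVAL=600                      # 10 minutes between snapshots

# Moves the log aside when it is larger than the limit (bytes).
# Assumption: only one previous log is kept, as <file>.1.
rotate_if_needed() {
    local file limit size
    file=$1
    limit=$2
    [ -f "$file" ] || return 0
    size=$(($(wc -c <"$file")))
    if [ "$size" -gt "$limit" ]; then
        mv "$file" "$file.1"
    fi
}

# Appends a timestamped memory summary and the top processes by RSS.
log_snapshot() {
    {
        echo "=== $(date '+%F %T') ==="
        free -m
        ps -eo pid,rss,comm --sort=-rss | head -12
    } >>"$1"
}

# Rotates if needed, then records a snapshot, every INTERVAL seconds.
monitor_loop() {
    while true; do
        rotate_if_needed "$LOG_FILE" "$MAX_BYTES"
        log_snapshot "$LOG_FILE"
        sleep "$INTERVAL"
    done
}

# Start the loop only when explicitly requested.
if [ "${1:-}" = "--run" ]; then
    monitor_loop
fi
```

Checking the size before each write keeps the log bounded at roughly the limit plus one snapshot, without needing a separate logrotate rule.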
**Quick Commands**:
```bash
# Check service status
ssh ucg-fiber 'systemctl status internet-watchdog memory-monitor'
# View watchdog activity
ssh ucg-fiber 'tail -20 /var/log/internet-watchdog.log'
# View memory history
ssh ucg-fiber 'tail -100 /data/logs/memory-history.log'
# Current memory usage
ssh ucg-fiber 'free -m && ps -eo pid,rss,comm --sort=-rss | head -12'
```
**See**: [GATEWAY.md](GATEWAY.md)
---
### Syncthing Monitoring
**Status**: ⚠️ **Partial** - API available, no automated monitoring
@@ -534,6 +570,7 @@ done'
## Related Documentation
- [GATEWAY.md](GATEWAY.md) - Gateway monitoring and troubleshooting
- [UPS.md](UPS.md) - UPS monitoring details
- [STORAGE.md](STORAGE.md) - ZFS health checks
- [SERVICES.md](SERVICES.md) - Service inventory
@@ -542,5 +579,5 @@ done'
---
-**Last Updated**: 2025-12-22
-**Status**: ⚠️ **Minimal monitoring currently in place - implementation needed**
+**Last Updated**: 2026-01-02
+**Status**: ⚠️ **Partial monitoring - Gateway active, other systems need implementation**