Auto-sync: 20260105-122831
@@ -6,17 +6,18 @@ Documentation for system monitoring, health checks, and alerting across the home

 | Component | Monitored? | Method | Alerts | Notes |
 |-----------|------------|--------|--------|-------|
+| **Gateway** | ✅ Yes | Custom services | ✅ Auto-reboot | Internet watchdog + memory monitor |
 | **UPS** | ✅ Yes | NUT + Home Assistant | ❌ No | Battery, load, runtime tracked |
 | **Syncthing** | ✅ Partial | API (manual checks) | ❌ No | Connection status available |
 | **Server temps** | ✅ Partial | Manual checks | ❌ No | Via `sensors` command |
 | **VM status** | ✅ Partial | Proxmox UI | ❌ No | Manual monitoring |
 | **ZFS health** | ❌ No | Manual `zpool status` | ❌ No | No automated checks |
 | **Disk health (SMART)** | ❌ No | Manual `smartctl` | ❌ No | No automated checks |
-| **Network** | ❌ No | - | ❌ No | No uptime monitoring |
+| **Network** | ✅ Partial | Gateway watchdog | ✅ Auto-reboot | Connectivity check every 60s |
 | **Services** | ❌ No | - | ❌ No | No health checks |
 | **Backups** | ❌ No | - | ❌ No | No verification |

-**Overall Status**: ⚠️ **MINIMAL** - Most monitoring is manual, no automated alerts
+**Overall Status**: ⚠️ **PARTIAL** - Gateway monitoring active, most else is manual

 ---

@@ -51,6 +52,41 @@ ssh pve 'upsc cyberpower@localhost | grep -E "battery.charge:|battery.runtime:|u

 ---

+### Gateway Monitoring
+
+**Status**: ✅ **Active with auto-recovery**
+
+Two custom systemd services monitor the UCG-Fiber gateway (10.10.10.1):
+
+**1. Internet Watchdog** (`internet-watchdog.service`)
+- Pings external DNS (1.1.1.1, 8.8.8.8, 208.67.222.222) every 60 seconds
+- Auto-reboots gateway after 5 consecutive failures (~5 minutes)
+- Logs to `/var/log/internet-watchdog.log`
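
The watchdog behaviour listed above could be sketched roughly as follows. This is not the actual `internet-watchdog.service` script, which the diff does not show. The function names, the `PING_CMD` and `REBOOT_CMD` override hooks, the `--run` flag, the 3-second ping timeout, and the rule that one answering host counts as "up" are all assumptions made for illustration.

```bash
#!/bin/sh
# Hypothetical sketch of the watchdog logic; not the real service script.

TARGETS="1.1.1.1 8.8.8.8 208.67.222.222"   # hosts named in the list above
INTERVAL=60                                 # seconds between checks
MAX_FAILURES=5                              # consecutive failures before a reboot
LOG_FILE=${LOG_FILE:-/var/log/internet-watchdog.log}

# Succeeds if at least one target answers a single ping (assumed rule).
# PING_CMD is a hypothetical hook so the logic can be exercised offline.
internet_up() {
    for host in $TARGETS; do
        "${PING_CMD:-ping}" -c 1 -W 3 "$host" >/dev/null 2>&1 && return 0
    done
    return 1
}

# Prints the new failure count: 0 after a success, previous + 1 after a failure.
next_failure_count() {
    if internet_up; then
        echo 0
    else
        echo $(( $1 + 1 ))
    fi
}

# Checks connectivity every INTERVAL seconds and reboots after MAX_FAILURES misses.
watchdog_loop() {
    failures=0
    while true; do
        failures=$(next_failure_count "$failures")
        if [ "$failures" -gt 0 ]; then
            echo "$(date '+%F %T') check failed ($failures/$MAX_FAILURES)" >>"$LOG_FILE"
        fi
        if [ "$failures" -ge "$MAX_FAILURES" ]; then
            echo "$(date '+%F %T') rebooting gateway" >>"$LOG_FILE"
            "${REBOOT_CMD:-reboot}"   # assumed: the service reboots the host it runs on
            failures=0
        fi
        sleep "$INTERVAL"
    done
}

# Start the loop only when asked to, so the functions can be sourced safely.
if [ "${1:-}" = "--run" ]; then
    watchdog_loop
fi
```

Resetting the counter on any success is what makes the threshold mean five consecutive failures, which at a 60-second interval gives the roughly 5 minutes quoted above.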
+
+**2. Memory Monitor** (`memory-monitor.service`)
+- Logs memory usage and top processes every 10 minutes
+- Logs to `/data/logs/memory-history.log`
+- Auto-rotates when log exceeds 10MB
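
A minimal sketch of the logging and rotation described above, under stated assumptions: the real `memory-monitor.service` script is not part of this diff, so the function names, the `--run` flag, the single `.1` backup generation, and reading "10MB" as 10 MiB are all hypothetical. The `free` and `ps` invocations are the ones used in the Quick Commands section.

```bash
#!/bin/sh
# Hypothetical sketch of the memory monitor; not the real service script.

MEM_LOG=${MEM_LOG:-/data/logs/memory-history.log}
MAX_BYTES=${MAX_BYTES:-10485760}   # 10 MiB (assumed reading of "10MB")

# Moves the log aside to "$MEM_LOG.1" once it exceeds MAX_BYTES.
# Keeping a single old generation is an assumption.
rotate_if_needed() {
    [ -f "$MEM_LOG" ] || return 0
    size=$(( $(wc -c <"$MEM_LOG") ))
    if [ "$size" -gt "$MAX_BYTES" ]; then
        mv -f "$MEM_LOG" "$MEM_LOG.1"
    fi
}

# Appends a timestamp, overall memory usage, and the top processes by RSS.
log_snapshot() {
    rotate_if_needed
    {
        echo "=== $(date '+%F %T') ==="
        free -m
        ps -eo pid,rss,comm --sort=-rss | head -12
    } >>"$MEM_LOG"
}

# Take a snapshot every 10 minutes, but only when asked to run.
if [ "${1:-}" = "--run" ]; then
    while true; do
        log_snapshot
        sleep 600
    done
fi
```

Checking the size before each write keeps the log close to the limit without needing a separate logrotate configuration, though the real service may well rotate differently.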
+
+**Quick Commands**:
+```bash
+# Check service status
+ssh ucg-fiber 'systemctl status internet-watchdog memory-monitor'
+
+# View watchdog activity
+ssh ucg-fiber 'tail -20 /var/log/internet-watchdog.log'
+
+# View memory history
+ssh ucg-fiber 'tail -100 /data/logs/memory-history.log'
+
+# Current memory usage
+ssh ucg-fiber 'free -m && ps -eo pid,rss,comm --sort=-rss | head -12'
+```
+
+**See**: [GATEWAY.md](GATEWAY.md)
+
+---
+
 ### Syncthing Monitoring

 **Status**: ⚠️ **Partial** - API available, no automated monitoring
@@ -534,6 +570,7 @@ done'

 ## Related Documentation

+- [GATEWAY.md](GATEWAY.md) - Gateway monitoring and troubleshooting
 - [UPS.md](UPS.md) - UPS monitoring details
 - [STORAGE.md](STORAGE.md) - ZFS health checks
 - [SERVICES.md](SERVICES.md) - Service inventory
@@ -542,5 +579,5 @@ done'

 ---

-**Last Updated**: 2025-12-22
-**Status**: ⚠️ **Minimal monitoring currently in place - implementation needed**
+**Last Updated**: 2026-01-02
+**Status**: ⚠️ **Partial monitoring - Gateway active, other systems need implementation**