Complete Phase 2 documentation: Add HARDWARE, SERVICES, MONITORING, MAINTENANCE
Phase 2 documentation implementation:

- Created HARDWARE.md: Complete hardware inventory (servers, GPUs, storage, network cards)
- Created SERVICES.md: Service inventory with URLs, credentials, health checks (25+ services)
- Created MONITORING.md: Health monitoring recommendations, alert setup, implementation plan
- Created MAINTENANCE.md: Regular procedures, update schedules, testing checklists
- Updated README.md: Added all Phase 2 documentation links
- Updated CLAUDE.md: Cleaned up to quick reference only (1340→377 lines)

All detailed content now in specialized documentation files with cross-references.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
605
UPS.md
Normal file
@@ -0,0 +1,605 @@
# UPS and Power Management

Documentation for UPS (Uninterruptible Power Supply) configuration, NUT (Network UPS Tools) monitoring, and power failure procedures.

## Hardware

### Current UPS

| Specification | Value |
|---------------|-------|
| **Model** | CyberPower OR2200PFCRT2U |
| **Capacity** | 2200VA / 1320W |
| **Form Factor** | 2U rackmount |
| **Output** | PFC Sinewave (compatible with active PFC PSUs) |
| **Outlets** | 2x NEMA 5-20R + 6x NEMA 5-15R (all battery + surge) |
| **Input Plug** | ⚠️ Originally NEMA 5-20P (20A), **rewired to 5-15P (15A)** |
| **Runtime** | ~15-20 min at typical load (~33% / 440W) |
| **Installed** | 2025-12-21 |
| **Status** | Active |

### ⚠️ Temporary Wiring Modification

**Issue**: UPS came with a NEMA 5-20P plug (20A), but the server rack is on a 15A circuit
**Solution**: Temporarily rewired the plug from 5-20P → 5-15P for compatibility
**Risk**: A 15A circuit supplies at most 1800W (15A × 120V), or about 1440W continuous under the common 80% guideline, while the UPS can deliver 1320W and draws extra for charging and conversion losses
**Current draw**: ~1000-1350W total under heavy load (within the 1800W circuit limit, but close to the continuous guideline)
**Backlog**: Upgrade to a 20A circuit and restore the original 5-20P plug
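
The circuit figures above can be reproduced with a few lines of shell arithmetic. This is only a sketch: the function names are made up for illustration, and the 80% continuous-load factor is a common rule of thumb assumed here, not something measured on this circuit.

```bash
# Peak wattage a branch circuit can supply.
circuit_watts() {        # usage: circuit_watts AMPS VOLTS
    echo $(( $1 * $2 ))
}

# Continuous wattage, assuming the common 80% guideline.
continuous_watts() {     # usage: continuous_watts AMPS VOLTS
    echo $(( $1 * $2 * 80 / 100 ))
}

circuit_watts 15 120       # current 15A circuit, peak
continuous_watts 15 120    # current 15A circuit, continuous
continuous_watts 20 120    # planned 20A circuit, continuous
```

Comparing the continuous figure for 15A with the 1000-1350W heavy-load draw shows why the 20A upgrade is on the backlog.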

### Previous UPS

| Model | Capacity | Issue | Replaced |
|-------|----------|-------|----------|
| WattBox WB-1100-IPVMB-6 | 1100VA / 660W | Insufficient for dual Threadripper setup | 2025-12-21 |

**Why replaced**: Combined server load of 1000-1350W exceeded the 660W capacity.

---

## Power Draw Estimates

### Typical Load

| Component | Idle | Load | Notes |
|-----------|------|------|-------|
| PVE Server | 250-350W | 500-750W | CPU + TITAN RTX + P2000 + storage |
| PVE2 Server | 200-300W | 450-600W | CPU + RTX A6000 + storage |
| Network gear | ~50W | ~50W | Router, switches |
| **Total** | **500-700W** | **1000-1400W** | Varies by workload |

**UPS Load**: ~33-50% typical, ~75% or more under heavy load (the 1400W upper estimate would exceed the 1320W rating)
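
The load percentages follow directly from the 1320W rating. A minimal sketch, using a hypothetical helper name and integer arithmetic that rounds down:

```bash
# Integer percentage of UPS capacity used by a given draw (rounds down).
load_percent() {         # usage: load_percent DRAW_WATTS CAPACITY_WATTS
    echo $(( $1 * 100 / $2 ))
}

# Draws taken from the tables in this document; 1320 is the UPS watt rating.
for watts in 440 660 1000 1400; do
    echo "${watts}W -> $(load_percent "$watts" 1320)%"
done
```

Anything that prints above 100% is more than the UPS can carry, so the upper end of the heavy-load estimate deserves attention.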

### Runtime Calculation

At **440W load** (33%): ~15-20 min runtime (tested 2025-12-21)
At **660W load** (50%): ~10-12 min estimated
At **1000W load** (75%): ~6-8 min estimated

**NUT shutdown trigger**: 120 seconds (2 min) remaining runtime

---

## NUT (Network UPS Tools) Configuration

### Architecture

```
UPS (USB) ──> PVE (NUT Server/Master) ──> PVE2 (NUT Client/Slave)
                        │
                        └──> Home Assistant (monitoring only)
```

**Master**: PVE (10.10.10.120) - UPS connected via USB, runs NUT server
**Slave**: PVE2 (10.10.10.102) - Monitors PVE's NUT server, shuts down when triggered

### NUT Server Configuration (PVE)

#### 1. UPS Driver Config: `/etc/nut/ups.conf`

```ini
[cyberpower]
    driver = usbhid-ups
    port = auto
    desc = "CyberPower OR2200PFCRT2U"
    override.battery.charge.low = 20
    override.battery.runtime.low = 120
```

**Key settings**:
- `driver = usbhid-ups`: USB HID UPS driver (generic for CyberPower)
- `port = auto`: Auto-detect USB device
- `override.battery.charge.low = 20`: Report the low-battery threshold as 20% charge
- `override.battery.runtime.low = 120`: Trigger shutdown at 120 seconds (2 min) remaining
- Per `ups.conf(5)`, the driver may only apply these overrides to its own low-battery decision when `ignorelb` is also set; without it, the UPS's own `LB` flag is what triggers shutdown
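
To confirm that the driver actually reports the overridden thresholds, the `name: value` lines printed by `upsc` can be filtered. The helper below is a sketch with a hypothetical name; the sample text stands in for real `upsc cyberpower@localhost` output and its values are illustrative only.

```bash
# Read upsc-style "name: value" lines on stdin and print the value of one variable.
upsc_get() {             # usage: upsc cyberpower@localhost | upsc_get VARIABLE
    awk -v key="$1" 'index($0, key ": ") == 1 { print substr($0, length(key) + 3) }'
}

# Illustrative sample, not captured from the real UPS.
sample='battery.charge: 100
battery.charge.low: 20
battery.runtime: 1234
battery.runtime.low: 120
ups.status: OL'

printf '%s\n' "$sample" | upsc_get battery.runtime.low
printf '%s\n' "$sample" | upsc_get battery.charge.low
```

For a single variable, `upsc cyberpower@localhost battery.runtime.low` returns the value directly; the filter is mainly useful when several values are pulled from one captured snapshot.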

#### 2. NUT Server Config: `/etc/nut/upsd.conf`

```ini
LISTEN 127.0.0.1 3493
LISTEN 10.10.10.120 3493
```

**Listens on**:
- Localhost (for local monitoring)
- LAN IP (for PVE2 to connect)

#### 3. User Config: `/etc/nut/upsd.users`

```ini
[admin]
    password = upsadmin123
    actions = SET
    instcmds = ALL

[upsmon]
    password = upsmon123
    upsmon master
```

**Users**:
- `admin`: Can set variables (`SET`) and run all instant commands
- `upsmon`: Monitoring account used by `upsmon` on both hosts (as master on PVE, as slave on PVE2)
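
Because this file holds passwords in plain text, it is worth checking that it is not world-readable. The sketch below assumes GNU `stat` and assumes that mode `640` (typically owned by `root:nut` on Debian-based systems) is the intended permission; the helper name is hypothetical, and the demonstration runs against a throwaway file rather than the real config.

```bash
# Succeeds when FILE has exactly the expected octal mode (GNU stat).
has_mode() {             # usage: has_mode FILE MODE
    [ "$(stat -c '%a' "$1")" = "$2" ]
}

# Demonstration on a temporary file instead of /etc/nut/upsd.users.
tmp=$(mktemp)
chmod 640 "$tmp"
if has_mode "$tmp" 640; then
    echo "mode ok"
else
    echo "mode differs from 640"
fi
rm -f "$tmp"
```

On the server itself the same check would be `has_mode /etc/nut/upsd.users 640`.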

#### 4. Monitor Config: `/etc/nut/upsmon.conf`

```ini
MONITOR cyberpower@localhost 1 upsmon upsmon123 master

MINSUPPLIES 1
SHUTDOWNCMD "/usr/local/bin/ups-shutdown.sh"
NOTIFYCMD /usr/sbin/upssched
POLLFREQ 5
POLLFREQALERT 5
HOSTSYNC 15
DEADTIME 15
POWERDOWNFLAG /etc/killpower

NOTIFYMSG ONLINE "UPS %s on line power"
NOTIFYMSG ONBATT "UPS %s on battery"
NOTIFYMSG LOWBATT "UPS %s battery is low"
NOTIFYMSG FSD "UPS %s: forced shutdown in progress"
NOTIFYMSG COMMOK "Communications with UPS %s established"
NOTIFYMSG COMMBAD "Communications with UPS %s lost"
NOTIFYMSG SHUTDOWN "Auto logout and shutdown proceeding"
NOTIFYMSG REPLBATT "UPS %s battery needs to be replaced"
NOTIFYMSG NOCOMM "UPS %s is unavailable"
NOTIFYMSG NOPARENT "upsmon parent process died - shutdown impossible"

NOTIFYFLAG ONLINE SYSLOG+WALL
NOTIFYFLAG ONBATT SYSLOG+WALL
NOTIFYFLAG LOWBATT SYSLOG+WALL
NOTIFYFLAG FSD SYSLOG+WALL
NOTIFYFLAG COMMOK SYSLOG+WALL
NOTIFYFLAG COMMBAD SYSLOG+WALL
NOTIFYFLAG SHUTDOWN SYSLOG+WALL
NOTIFYFLAG REPLBATT SYSLOG+WALL
NOTIFYFLAG NOCOMM SYSLOG+WALL
NOTIFYFLAG NOPARENT SYSLOG
```

**Key settings**:
- `MONITOR cyberpower@localhost 1 upsmon upsmon123 master`: Monitor local UPS
- `SHUTDOWNCMD "/usr/local/bin/ups-shutdown.sh"`: Custom shutdown script
- `NOTIFYCMD /usr/sbin/upssched`: Only called for events whose `NOTIFYFLAG` includes `EXEC`; none of the flags above do, so it is currently inactive
- `POLLFREQ 5`: Check UPS every 5 seconds

#### 5. USB Permissions: `/etc/udev/rules.d/99-nut-ups.rules`

```udev
SUBSYSTEM=="usb", ATTR{idVendor}=="0764", ATTR{idProduct}=="0501", MODE="0660", GROUP="nut"
```

**Purpose**: Ensure NUT can access USB UPS device

**Apply rule**:
```bash
udevadm control --reload-rules
udevadm trigger
```

### NUT Client Configuration (PVE2)

#### Monitor Config: `/etc/nut/upsmon.conf`

```ini
MONITOR cyberpower@10.10.10.120 1 upsmon upsmon123 slave

MINSUPPLIES 1
SHUTDOWNCMD "/usr/local/bin/ups-shutdown.sh"
POLLFREQ 5
POLLFREQALERT 5
HOSTSYNC 15
DEADTIME 15
POWERDOWNFLAG /etc/killpower

# Same NOTIFYMSG and NOTIFYFLAG as PVE
```

**Key difference**: `slave` instead of `master` - monitors remote UPS on PVE

---

## Custom Shutdown Script

### `/usr/local/bin/ups-shutdown.sh` (Same on both PVE and PVE2)

```bash
#!/bin/bash
# Graceful VM/CT shutdown when UPS battery low

LOG="/var/log/ups-shutdown.log"

# Write a timestamped message to stdout and append it to the log file
log() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') - $1" | tee -a "$LOG"
}

log "=== UPS Shutdown Triggered ==="
log "Battery low - initiating graceful shutdown of VMs/CTs"

# Get list of running VMs (skip TrueNAS, VM 100, for now)
VMS=$(qm list | awk '$3=="running" && $1!=100 {print $1}')
for VMID in $VMS; do
    log "Stopping VM $VMID..."
    qm shutdown "$VMID"
done

# Get list of running containers
CTS=$(pct list | awk '$2=="running" {print $1}')
for CTID in $CTS; do
    log "Stopping CT $CTID..."
    pct shutdown "$CTID"
done

# Wait for VMs/CTs to stop
log "Waiting 60 seconds for VMs/CTs to shut down..."
sleep 60

# Now stop TrueNAS (storage - must be last).
# stderr is silenced so hosts without a VM 100 skip this step quietly.
if qm status 100 2>/dev/null | grep -q running; then
    log "Stopping TrueNAS (VM 100) last..."
    qm shutdown 100
    sleep 30
fi

log "All VMs/CTs stopped. Host will remain running until UPS dies."
log "=== UPS Shutdown Complete ==="
```

**Make executable**:
```bash
chmod +x /usr/local/bin/ups-shutdown.sh
```

**Script behavior**:
1. Stops all VMs (except TrueNAS)
2. Stops all containers
3. Waits 60 seconds
4. Stops TrueNAS last (storage must be cleanly unmounted)
5. **Does NOT shut down Proxmox hosts** - intentionally left running
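
The `awk` filter that picks running VMs while holding back TrueNAS can be exercised without touching any guests by feeding it sample text. This is a sketch: the function name is hypothetical, and the listing is illustrative (only VM 100 and VM 101 are IDs taken from this document).

```bash
# Print the IDs of running guests from `qm list`-style text on stdin, skipping one VMID.
running_vms_except() {   # usage: qm list | running_vms_except 100
    awk -v skip="$1" '$3 == "running" && $1 != skip { print $1 }'
}

# Illustrative listing in the column layout of `qm list`; VM 102 is made up.
sample='      VMID NAME                 STATUS     MEM(MB)    BOOTDISK(GB) PID
       100 truenas              running    32768            32.00 2101
       101 saltbox              running    16384           100.00 2202
       102 example-vm           stopped    4096             32.00 0'

printf '%s\n' "$sample" | running_vms_except 100
```

The header row and stopped guests fall out because their third column is not `running`, and VM 100 is excluded so it can be stopped last.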

**Why not shut down hosts?**
- BIOS configured to "Restore on AC Power Loss"
- When power returns, servers auto-boot and start VMs in order
- Avoids need for manual intervention

---

## Power Failure Behavior

### When Power Fails

1. **UPS switches to battery** (`OB DISCHRG` status)
2. **NUT monitors runtime** - polls every 5 seconds
3. **At 120 seconds (2 min) remaining**:
   - NUT triggers `/usr/local/bin/ups-shutdown.sh` on both servers
   - Script gracefully stops all VMs/CTs
   - TrueNAS stopped last (storage integrity)
4. **Hosts remain running** until UPS battery depletes
5. **UPS battery dies** → Hosts lose power (ungraceful but safe - VMs already stopped)
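
The decision in step 3 can be sketched as a small status check. This is a simplified model of the condition, not `upsmon`'s actual code: it assumes shutdown is warranted when the status contains both `OB` and `LB`, or when `FSD` is set. The function name is hypothetical.

```bash
# Succeeds when a NUT ups.status string calls for shutdown:
# either FSD is set, or the UPS is both on battery (OB) and low (LB).
is_critical() {          # usage: is_critical "OB DISCHRG LB"
    case " $1 " in
        *" FSD "*) return 0 ;;
    esac
    case " $1 " in
        *" OB "*) ;;          # on battery: keep checking
        *) return 1 ;;        # on line power: not critical
    esac
    case " $1 " in
        *" LB "*) return 0 ;;
    esac
    return 1
}

for status in "OL CHRG" "OB DISCHRG" "OB DISCHRG LB"; do
    if is_critical "$status"; then
        echo "$status -> shut down"
    else
        echo "$status -> keep running"
    fi
done
```

Padding the string with spaces before matching keeps a token such as `OB` from matching inside a longer word.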

### When Power Returns

1. **UPS charges battery**, power returns to servers
2. **BIOS "Restore on AC Power Loss"** boots both servers
3. **Proxmox starts** and auto-starts VMs in configured order:

| Order | Wait | VMs/CTs | Reason |
|-------|------|---------|--------|
| 1 | 30s | TrueNAS (VM 100) | Storage must start first |
| 2 | 60s | Saltbox (VM 101) | Depends on TrueNAS NFS |
| 3 | 10s | fs-dev, homeassistant, lmdev1, copyparty, docker-host | General VMs |
| 4 | 5s | pihole, traefik, findshyt | Containers |

PVE2 VMs: order=1, wait=10s

**Total recovery time**: ~7 minutes from power restoration to fully operational (tested 2025-12-21)
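
A start order like the one in the table is normally set through the guest's `startup` option. The sketch below only prints the commands instead of running them; it assumes the "Wait" column corresponds to the `up=` delay, and the container ID 200 is hypothetical since this document does not list container IDs.

```bash
# Print (rather than run) the Proxmox command that sets a guest's boot order
# and the delay after it starts.
startup_cmd() {          # usage: startup_cmd qm|pct ID ORDER WAIT_SECONDS
    echo "$1 set $2 --startup order=$3,up=$4"
}

startup_cmd qm 100 1 30     # TrueNAS first, then wait 30s
startup_cmd qm 101 2 60     # Saltbox second, then wait 60s
startup_cmd pct 200 4 5     # hypothetical container ID
```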

---

## UPS Status Codes

| Code | Meaning | Action |
|------|---------|--------|
| `OL` | Online (AC power) | Normal operation |
| `OB` | On Battery | Power outage - monitor runtime |
| `LB` | Low Battery | <2 min remaining - shutdown imminent |
| `CHRG` | Charging | Battery charging after power restored |
| `DISCHRG` | Discharging | On battery, draining |
| `FSD` | Forced Shutdown | NUT triggered shutdown |
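
Since `ups.status` usually carries several of these codes at once (for example `OB DISCHRG`), a small decoder can make log lines easier to read. This is a sketch with a hypothetical function name, covering only the codes in the table above.

```bash
# Translate each token of a NUT ups.status string using the table above.
describe_status() {      # usage: describe_status "OB DISCHRG"
    for token in $1; do
        case "$token" in
            OL)      echo "OL: online (AC power)" ;;
            OB)      echo "OB: on battery" ;;
            LB)      echo "LB: low battery" ;;
            CHRG)    echo "CHRG: charging" ;;
            DISCHRG) echo "DISCHRG: discharging" ;;
            FSD)     echo "FSD: forced shutdown" ;;
            *)       echo "$token: not covered by this table" ;;
        esac
    done
}

describe_status "OB DISCHRG LB"
```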

---

## Monitoring & Commands

### Check UPS Status

```bash
# Full status
ssh pve 'upsc cyberpower@localhost'

# Key metrics only
ssh pve 'upsc cyberpower@localhost | grep -E "battery.charge:|battery.runtime:|ups.load:|ups.status:"'

# Example output:
# battery.charge: 100
# battery.runtime: 1234 (seconds remaining)
# ups.load: 33 (% load)
# ups.status: OL (online)
```
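
`battery.runtime` is reported in seconds, which is awkward to read at a glance. A minimal sketch of a formatter (the function name is hypothetical):

```bash
# Convert a runtime in seconds to a minutes-and-seconds string.
fmt_runtime() {          # usage: fmt_runtime SECONDS
    echo "$(( $1 / 60 ))m$(( $1 % 60 ))s"
}

fmt_runtime 1234    # the example value shown above
fmt_runtime 120     # the shutdown threshold
```

Fed with a live reading, this could be combined as `fmt_runtime "$(upsc cyberpower@localhost battery.runtime)"`.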

### Control UPS Beeper

```bash
# Mute beeper (temporary - until next power event)
ssh pve 'upscmd -u admin -p upsadmin123 cyberpower@localhost beeper.mute'

# Disable beeper (permanent)
ssh pve 'upscmd -u admin -p upsadmin123 cyberpower@localhost beeper.disable'

# Enable beeper
ssh pve 'upscmd -u admin -p upsadmin123 cyberpower@localhost beeper.enable'
```

### Test Shutdown Procedure

**Simulate low battery** (careful - this will shut down VMs!):

```bash
# Raise the low-runtime threshold to 300s so shutdown triggers earlier during an on-battery test
ssh pve 'upsrw -s battery.runtime.low=300 -u admin -p upsadmin123 cyberpower@localhost'

# Watch it trigger (when runtime drops below 300 seconds)
ssh pve 'tail -f /var/log/ups-shutdown.log'

# Reset to normal
ssh pve 'upsrw -s battery.runtime.low=120 -u admin -p upsadmin123 cyberpower@localhost'
```

**Better test**: Run the shutdown script manually without triggering NUT (this still stops all VMs/CTs on that host):
```bash
ssh pve '/usr/local/bin/ups-shutdown.sh'
```

---

## Home Assistant Integration

UPS metrics are exposed to Home Assistant via the NUT integration.

### Available Sensors

| Entity ID | Description |
|-----------|-------------|
| `sensor.cyberpower_battery_charge` | Battery % (0-100) |
| `sensor.cyberpower_battery_runtime` | Seconds remaining on battery |
| `sensor.cyberpower_load` | Load % (0-100) |
| `sensor.cyberpower_input_voltage` | Input voltage (V AC) |
| `sensor.cyberpower_output_voltage` | Output voltage (V AC) |
| `sensor.cyberpower_status` | Status text (OL, OB, LB, etc.) |

### Configuration

**Home Assistant**: See [HOMEASSISTANT.md](HOMEASSISTANT.md) for integration setup.

### Example Automations

**Send notification when on battery**:
```yaml
automation:
  - alias: "UPS On Battery Alert"
    trigger:
      # Matches any status containing the OB token (e.g. "OB DISCHRG");
      # an exact `to: "OB"` state match would miss combined statuses.
      - platform: template
        value_template: "{{ 'OB' in states('sensor.cyberpower_status').split() }}"
    action:
      - service: notify.mobile_app
        data:
          message: "⚠️ Power outage! UPS on battery. Runtime: {{ states('sensor.cyberpower_battery_runtime') }}s"
```

**Alert when battery low**:
```yaml
automation:
  - alias: "UPS Low Battery Alert"
    trigger:
      - platform: numeric_state
        entity_id: sensor.cyberpower_battery_runtime
        below: 300
    action:
      - service: notify.mobile_app
        data:
          message: "🚨 UPS battery low! {{ states('sensor.cyberpower_battery_runtime') }}s remaining"
```

---

## Testing Results

### Full Power Failure Test (2025-12-21)

Complete end-to-end test of power failure and recovery:

| Event | Time | Duration | Notes |
|-------|------|----------|-------|
| **Power pulled** | 22:30 | - | UPS on battery, ~15 min runtime at 33% load |
| **Low battery trigger** | 22:40:38 | +10:38 | Runtime < 120s, shutdown script ran |
| **All VMs stopped** | 22:41:36 | +0:58 | Graceful shutdown completed |
| **UPS died** | 22:46:29 | +4:53 | Hosts lost power at 0% battery |
| **Power restored** | ~22:47 | - | Plugged back in |
| **PVE online** | 22:49:11 | +2:11 | BIOS boot, Proxmox started |
| **PVE2 online** | 22:50:47 | +3:47 | BIOS boot, Proxmox started |
| **All VMs running** | 22:53:39 | +6:39 | Auto-started in correct order |
| **Total recovery** | - | **~7 min** | From power return to fully operational |

**Results**:
✅ VMs shut down gracefully
✅ Hosts remained running until UPS died (as intended)
✅ Auto-boot on power restoration worked
✅ VMs started in correct order with appropriate delays
✅ No data corruption or issues

**Runtime calculation**:
- Load: ~33% (440W estimated)
- Total runtime on battery: ~16 minutes (22:30 → 22:46:29)
- Matches manufacturer estimate for 33% load
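
The elapsed times in the table can be recomputed from the timestamps. A minimal sketch with hypothetical helper names, assuming both timestamps fall on the same day:

```bash
# Convert HH:MM:SS to seconds since midnight.
to_seconds() {           # usage: to_seconds HH:MM:SS
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Seconds between two same-day timestamps.
elapsed() {              # usage: elapsed START END
    echo $(( $(to_seconds "$2") - $(to_seconds "$1") ))
}

elapsed 22:30:00 22:46:29    # power pulled -> UPS died
elapsed 22:30:00 22:40:38    # power pulled -> low battery trigger
```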

---

## Proxmox Cluster Quorum Fix

### Problem

With a 2-node cluster, if one node goes down, the other loses quorum and can't manage VMs.

During UPS testing, this would prevent the remaining node from starting VMs after power restoration.

### Solution

Modified `/etc/pve/corosync.conf` to enable 2-node mode:

```
quorum {
  provider: corosync_votequorum
  two_node: 1
}
```

**Effect**:
- Either node can operate independently if the other is down
- No more waiting for quorum when one server is offline
- Both nodes visible in single Proxmox interface when both up
- Per `votequorum(5)`, `two_node: 1` also enables `wait_for_all` by default, so after a full outage a node may wait until it has seen its peer once before becoming quorate
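
Whether a node currently has quorum can be read from `pvecm status`. The helper below is a sketch with a hypothetical name; it assumes the output contains a `Quorate:` line with `Yes` or `No`, and the sample text is a minimal illustration rather than a capture from this cluster.

```bash
# Succeeds when `pvecm status`-style text on stdin reports "Quorate: Yes".
is_quorate() {           # usage: pvecm status | is_quorate
    awk '$1 == "Quorate:" { found = ($2 == "Yes") } END { exit !found }'
}

# Illustrative excerpt only.
sample='Nodes:            1
Quorate:          Yes'

if printf '%s\n' "$sample" | is_quorate; then
    echo "quorate"
else
    echo "no quorum"
fi
```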

**Applied**: 2025-12-21

---

## Maintenance

### Monthly Checks

```bash
# Check UPS status
ssh pve 'upsc cyberpower@localhost'

# Check NUT server running
ssh pve 'systemctl status nut-server'
ssh pve 'systemctl status nut-monitor'

# Check NUT client running (PVE2)
ssh pve2 'systemctl status nut-monitor'

# Verify PVE2 can see UPS
ssh pve2 'upsc cyberpower@10.10.10.120'

# Check logs for errors
ssh pve 'journalctl -u nut-server -n 50'
ssh pve 'journalctl -u nut-monitor -n 50'
```

### Battery Health

**Check battery stats**:
```bash
ssh pve 'upsc cyberpower@localhost | grep battery'

# Key metrics:
# battery.charge: 100 (should be near 100 when on AC)
# battery.runtime: 1200+ (seconds at current load)
# battery.voltage: ~24V (normal for 24V battery system)
```

**Battery replacement**: Replace the battery when runtime drops significantly or the UPS reports `REPLBATT`. To check the battery's manufacture date:
```bash
ssh pve 'upsc cyberpower@localhost | grep battery.mfr.date'
```

CyberPower batteries typically last 3-5 years.

### Firmware Updates

Check CyberPower website for firmware updates:
https://www.cyberpowersystems.com/support/firmware/

---

## Troubleshooting

### UPS Not Detected

```bash
# Check USB connection
ssh pve 'lsusb | grep Cyber'

# Expected (the USB descriptor reads "CP1500 AVR UPS" even for this OR2200 unit):
# Bus 001 Device 003: ID 0764:0501 Cyber Power System, Inc. CP1500 AVR UPS

# Restart NUT driver
ssh pve 'systemctl restart nut-driver'
ssh pve 'systemctl status nut-driver'
```

### PVE2 Can't Connect

```bash
# Verify NUT server listening (use `ss -tuln` if net-tools is not installed)
ssh pve 'netstat -tuln | grep 3493'

# Should show:
# tcp 0 0 10.10.10.120:3493 0.0.0.0:* LISTEN

# Test connection from PVE2
ssh pve2 'telnet 10.10.10.120 3493'

# Check firewall (should allow port 3493)
ssh pve 'iptables -L -n | grep 3493'
```

### Shutdown Script Not Running

```bash
# Check script permissions
ssh pve 'ls -la /usr/local/bin/ups-shutdown.sh'

# Should be: -rwxr-xr-x (executable)

# Check logs
ssh pve 'cat /var/log/ups-shutdown.log'

# Test script manually (this stops all VMs/CTs on the host)
ssh pve '/usr/local/bin/ups-shutdown.sh'
```

### UPS Status Shows UNKNOWN

```bash
# Driver may not be compatible
ssh pve 'upsc cyberpower@localhost ups.status'

# Try different driver (in /etc/nut/ups.conf)
# driver = usbhid-ups
# or
# driver = blazer_usb

# Restart after change
ssh pve 'systemctl restart nut-driver nut-server'
```

---

## Future Improvements

- [ ] Add email alerts for UPS events (power fail, low battery)
- [ ] Log runtime statistics to track battery degradation
- [ ] Set up Grafana dashboard for UPS metrics
- [ ] Test battery runtime at different load levels
- [ ] Upgrade to 20A circuit, restore original 5-20P plug
- [ ] Consider adding network management card for out-of-band UPS access

---

## Related Documentation

- [POWER-MANAGEMENT.md](POWER-MANAGEMENT.md) - Overall power optimization
- [VMS.md](VMS.md) - VM startup order configuration
- [HOMEASSISTANT.md](HOMEASSISTANT.md) - UPS sensor integration

---

**Last Updated**: 2025-12-22