Homelab Changelog
2024-12-16
Power Investigation
Investigated UPS power limit issues across both Proxmox servers.
Findings
- KSMD (Kernel Same-page Merging Daemon) was consuming 50-57% CPU constantly on PVE
  - sleep_millisecs set to 12ms (extremely aggressive, default is 200ms)
  - general_profit was negative (-320MB), meaning it was wasting CPU
  - No memory overcommit situation (98GB allocated on 128GB RAM)
  - Diverse workloads (TrueNAS, Windows, Linux) = few duplicate pages to merge
- GPUs identified as major power consumers:
  - RTX A6000 on PVE2: up to 300W TDP
  - TITAN RTX on PVE: up to 280W TDP
  - Quadro P2000 on PVE: up to 75W TDP
- TrueNAS VM occasionally spiking to 86% CPU (needs investigation)
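The KSM figures in the findings above come from sysfs. A minimal sketch of how they could be collected is below; the function names `ksm_report` and `ksm_is_wasteful` are hypothetical helpers, not part of any tool, and `general_profit` is only exposed on newer kernels, so it may show as unavailable.

```shell
#!/bin/sh
# Hypothetical helpers for reading the KSM tunables and counters.
# The directory argument defaults to the real sysfs path; it is a
# parameter only so the functions can be pointed at a scratch copy.

ksm_report() {
    ksm_dir="${1:-/sys/kernel/mm/ksm}"
    for f in run sleep_millisecs pages_shared pages_sharing general_profit; do
        if [ -r "$ksm_dir/$f" ]; then
            printf '%s=%s\n' "$f" "$(cat "$ksm_dir/$f")"
        else
            printf '%s=unavailable\n' "$f"
        fi
    done
}

# Exit status 0 when general_profit is negative (KSM costs more than it
# saves), 1 when it is zero or positive, 2 when the file is missing.
ksm_is_wasteful() {
    ksm_dir="${1:-/sys/kernel/mm/ksm}"
    [ -r "$ksm_dir/general_profit" ] || return 2
    profit="$(cat "$ksm_dir/general_profit")"
    [ "$profit" -lt 0 ]
}
```

On a host, `ksm_report` with no arguments prints one `name=value` line per file, and `ksm_is_wasteful && echo "KSM is a net loss"` gives a quick yes/no.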
Changes Made
- Disabled KSMD on PVE (10.10.10.120):
echo 0 > /sys/kernel/mm/ksm/run
- Immediate result: KSMD CPU dropped from 51-57% to 0%
- Load average dropped from 1.88 to 1.28
- Estimated savings: ~7-10W continuous
Additional Changes
- Made KSMD disable persistent on both hosts
- Note: KSM is controlled via sysfs, not sysctl
- Created systemd service /etc/systemd/system/disable-ksm.service:
[Unit]
Description=Disable KSM (Kernel Same-page Merging)
After=multi-user.target

[Service]
Type=oneshot
ExecStart=/bin/sh -c "echo 0 > /sys/kernel/mm/ksm/run"
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
- Enabled on both PVE and PVE2:
systemctl enable disable-ksm.service
Syncthing Rescan Interval Fix
Root Cause: Syncthing on TrueNAS was rescanning 56GB of data every 60 seconds, causing constant 100% CPU usage (~3172 minutes CPU time in 3 days).
Folders affected (changed from 60s to 3600s):
- downloads (38GB)
- documents (11GB)
- desktop (7.2GB)
- config, movies, notes, pictures
Fix applied:
# Downloaded config from TrueNAS
ssh pve 'qm guest exec 100 -- cat /mnt/.ix-apps/app_mounts/syncthing/config/config/config.xml'
# Changed all rescanIntervalS="60" to rescanIntervalS="3600"
sed -i 's/rescanIntervalS="60"/rescanIntervalS="3600"/g' config.xml
# Uploaded and restarted Syncthing
curl -X POST -H "X-API-Key: xxx" http://localhost:20910/rest/system/restart
Note: fsWatcher is enabled, so changes are detected in real-time. The rescan is just a safety net.
Estimated savings: ~60-80W (TrueNAS VM CPU will drop from 86% to ~5-10% at idle)
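The sed step above can be wrapped so that it keeps a backup and reports how many folder entries it touched. This is only a sketch: `set_rescan_interval` is a hypothetical helper, it works on a local copy of config.xml, and the count assumes each folder element sits on its own line.

```shell
#!/bin/sh
# Hypothetical helper: rewrite rescanIntervalS="<old>" to "<new>" in a
# local copy of Syncthing's config.xml. A backup is kept at <file>.bak
# and the number of matching lines is printed.
set_rescan_interval() {
    cfg="$1"; old="$2"; new="$3"
    # grep -c exits non-zero when nothing matches; the count is still printed.
    count="$(grep -c "rescanIntervalS=\"$old\"" "$cfg" || true)"
    cp "$cfg" "$cfg.bak"
    sed "s/rescanIntervalS=\"$old\"/rescanIntervalS=\"$new\"/g" "$cfg.bak" > "$cfg"
    echo "$count"
}
```

For example, `set_rescan_interval config.xml 60 3600` prints the number of lines changed. Folders already on another interval are left alone.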
GPU Power State Investigation
| GPU | VM | Idle Power | P-State | Status |
|---|---|---|---|---|
| RTX A6000 | trading-vm (301) | 11W | P8 | Optimal |
| TITAN RTX | lmdev1 (111) | 2W | P8 | Excellent! |
| Quadro P2000 | saltbox (101) | 25W | P0 | Stuck due to Plex |
Findings:
- RTX A6000: Properly entering P8 (11W idle) - excellent
- TITAN RTX: Only 2W at idle despite ComfyUI/Python processes (436MiB VRAM used)
  - Modern GPUs have much better idle power management
- Quadro P2000: Stuck in P0 at 25W because Plex Transcoder holds GPU memory
  - Older Quadro cards don't idle as efficiently with processes attached
  - Power limit fixed at 75W (not adjustable)
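A table like the one above could be re-checked by running nvidia-smi inside each guest, since the GPUs are passed through to the VMs. The sketch below assumes that setup; `flag_non_idle_gpus` is a hypothetical filter that reads nvidia-smi's CSV output on stdin and prints only the GPUs that are not in P8.

```shell
#!/bin/sh
# Hypothetical filter: expects CSV lines of "name, pstate, power.draw"
# on stdin (nvidia-smi separates CSV fields with ", ") and reports any
# GPU whose performance state is not P8.
flag_non_idle_gpus() {
    awk -F', ' '$2 != "P8" { printf "%s is in %s at %s\n", $1, $2, $3 }'
}
```

Inside a VM this would be used as `nvidia-smi --query-gpu=name,pstate,power.draw --format=csv,noheader | flag_non_idle_gpus`. No output means every GPU in that guest has reached P8.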
Changes made:
- Installed QEMU guest agent on lmdev1 (VM 111)
- Added SSH key access to lmdev1 (10.10.10.111)
- Updated ~/.ssh/config with lmdev1 entry
CPU Governor Optimization
Issue: Both servers were using the performance CPU governor, keeping CPUs at high frequencies (3-4GHz) even when 99% idle.
Changes:
PVE (10.10.10.120)
- Driver: amd-pstate-epp (modern AMD P-State with Energy Performance Preference)
- Change: Governor performance → powersave, EPP performance → balance_power
- Result: Idle frequencies dropped from ~4GHz to ~1.7GHz
- Persistence: Created /etc/systemd/system/cpu-powersave.service:
[Unit]
Description=Set CPU governor to powersave with balance_power EPP
After=multi-user.target

[Service]
Type=oneshot
ExecStart=/bin/bash -c 'for gov in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo powersave > "$gov"; done; for epp in /sys/devices/system/cpu/cpu*/cpufreq/energy_performance_preference; do echo balance_power > "$epp"; done'
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
PVE2 (10.10.10.102)
- Driver: acpi-cpufreq (older driver)
- Change: Governor performance → schedutil
- Result: Idle frequencies dropped from ~4GHz to ~2.2GHz
- Persistence: Created /etc/systemd/system/cpu-powersave.service:
[Unit]
Description=Set CPU governor to schedutil for power savings
After=multi-user.target

[Service]
Type=oneshot
ExecStart=/bin/bash -c 'for gov in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo schedutil > "$gov"; done'
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
Estimated savings: 30-60W per server (60-120W total)
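To confirm after a reboot that the services took effect on every core, the per-CPU sysfs values can be tallied. A minimal sketch, with `cpufreq_summary` as a hypothetical helper name; the root directory is a parameter only so it can be tried against a scratch tree.

```shell
#!/bin/sh
# Hypothetical helper: count how many CPUs report each value of a
# cpufreq attribute (e.g. scaling_governor), printed as "count value".
cpufreq_summary() {
    attr="$1"
    cpu_root="${2:-/sys/devices/system/cpu}"
    cat "$cpu_root"/cpu[0-9]*/cpufreq/"$attr" 2>/dev/null \
        | sort | uniq -c | awk '{ print $1, $2 }'
}
```

`cpufreq_summary scaling_governor` should show a single line per host if the change applied everywhere. `cpufreq_summary energy_performance_preference` does the same for EPP on PVE, and prints nothing where the driver does not expose that file.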
ksmtuned Service Disabled
Issue: The ksmtuned service (KSM tuning daemon) was still running on both servers even after KSMD was disabled, consuming ~39 min of CPU time on PVE and ~12 min on PVE2 over 3 days.
Fix:
systemctl stop ksmtuned
systemctl disable ksmtuned
Applied to both PVE and PVE2.
Estimated savings: ~2-5W
HDD Spindown on PVE2
Issue: Two WD Red 6TB drives (local-zfs2 pool) spinning 24/7 despite pool having only 768KB used. Each drive uses 5-8W spinning.
Fix:
# Set 30-minute spindown timeout
hdparm -S 241 /dev/sda /dev/sdb
Persistence: Created udev rule /etc/udev/rules.d/69-hdd-spindown.rules:
ACTION=="add", KERNEL=="sd[a-z]", ATTRS{model}=="WDC WD60EFRX-68L*", RUN+="/usr/sbin/hdparm -S 241 /dev/%k"
Estimated savings: ~10-16W (when drives spin down)
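The value 241 is not a number of minutes. Per the hdparm man page, -S values 1-240 are multiples of 5 seconds and 241-251 are multiples of 30 minutes. The decoder below is a small sketch of that encoding; `hdparm_s_to_seconds` is a hypothetical helper, and values above 251 are reported as special without being decoded.

```shell
#!/bin/sh
# Hypothetical helper: decode an hdparm -S value into a standby timeout
# in seconds, following the encoding described in the hdparm man page.
hdparm_s_to_seconds() {
    v="$1"
    if [ "$v" -eq 0 ]; then
        echo "disabled"                  # 0 turns the standby timer off
    elif [ "$v" -ge 1 ] && [ "$v" -le 240 ]; then
        echo $(( v * 5 ))                # units of 5 seconds
    elif [ "$v" -ge 241 ] && [ "$v" -le 251 ]; then
        echo $(( (v - 240) * 30 * 60 ))  # units of 30 minutes
    else
        echo "special"                   # 252-255 have special meanings
    fi
}
```

`hdparm_s_to_seconds 241` prints 1800, matching the 30-minute timeout set above.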
Pending Changes
- Monitor overall power consumption after all optimizations
- Consider PCIe ASPM optimization
- Consider NMI watchdog disable
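For the monitoring item, it may help to have the combined estimate on hand as a baseline. The sketch below simply adds up the per-change ranges recorded in this entry; `estimates` and `sum_savings` are hypothetical names. The figures are rough estimates and may not be fully additive (the HDD saving, for instance, only applies while the drives are spun down).

```shell
#!/bin/sh
# Per-change estimates from this entry as "low high description" (watts).
estimates() {
cat <<'EOF'
7 10 KSMD disabled on PVE
60 80 Syncthing rescan interval
60 120 CPU governor (both servers)
2 5 ksmtuned disabled
10 16 HDD spindown on PVE2
EOF
}

# Sum the first two columns into a single low-high range.
sum_savings() {
    awk '{ low += $1; high += $2 } END { printf "%d-%dW\n", low, high }'
}
```

`estimates | sum_savings` prints 139-231W, which is the range to compare against measured UPS load once everything has settled.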
SSH Key Setup
- Added SSH key authentication to both Proxmox servers
- Updated ~/.ssh/config with entries for pve and pve2
Notes
What is KSMD?
Kernel Same-page Merging Daemon - scans memory for duplicate pages across VMs and merges them. Trades CPU cycles for RAM savings. Useful when:
- Overcommitting memory
- Running many identical VMs
Not useful when:
- Plenty of RAM headroom (our case)
- Diverse workloads with few duplicate pages
- general_profit is negative
What is Memory Ballooning?
Guest-cooperative memory management. Hypervisor can request VMs to give back unused RAM. Independent from KSMD. Both are Proxmox/KVM memory optimization features but serve different purposes.