homelab-docs/HARDWARE.md
Hutson 56b82df497 Complete Phase 2 documentation: Add HARDWARE, SERVICES, MONITORING, MAINTENANCE
2025-12-23 00:34:21 -05:00

Hardware Inventory

Complete hardware specifications for all homelab equipment.

Servers

PVE (10.10.10.120) - Primary Proxmox Server

CPU

  • Model: AMD Ryzen Threadripper PRO 3975WX
  • Cores: 32 cores / 64 threads
  • Base Clock: 3.5 GHz
  • Boost Clock: 4.2 GHz
  • TDP: 280W
  • Architecture: Zen 2 (7nm)
  • Socket: sWRX8
  • Features: ECC support, PCIe 4.0

RAM

  • Capacity: 128 GB
  • Type: DDR4 ECC Registered
  • Speed: Unknown (needs investigation)
  • Channels: 8-channel (platform maximum; number of populated channels unverified)
  • Idle Power: ~30-40W

Storage

OS/VM Storage:

| Pool | Devices | Type | Capacity | Purpose |
|------|---------|------|----------|---------|
| nvme-mirror1 | 2x Sabrent Rocket Q NVMe | ZFS Mirror | 3.6 TB usable | High-performance VM storage |
| nvme-mirror2 | 2x Kingston SFYRD 2TB NVMe | ZFS Mirror | 1.8 TB usable | Additional fast VM storage |
| rpool | 2x Samsung 870 QVO 4TB SSD | ZFS Mirror | 3.6 TB usable | Proxmox OS, containers, backups |

Total Storage: ~9 TB usable
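
The usable figures above are consistent with a two-way mirror exposing roughly one drive's worth of space, reported in binary units: a drive sold as 4 TB (4 × 10^12 bytes) comes to about 3.6 TiB. The sketch below reproduces that arithmetic. It assumes the pool sizes are reported in TiB and ignores ZFS metadata overhead; `mirror_usable_tib` is a hypothetical helper name, not something from this repository.

```shell
#!/bin/sh
# Hypothetical helper: approximate usable size of a two-way ZFS mirror, in TiB,
# given the marketed size of ONE member drive in decimal TB.
# Assumption: usable space is about one drive's capacity; ZFS overhead is ignored.
mirror_usable_tib() {
  awk -v tb="$1" 'BEGIN { printf "%.1f\n", tb * 1e12 / 2^40 }'
}

mirror_usable_tib 4   # rpool (4 TB drives)
mirror_usable_tib 2   # nvme-mirror2 (2 TB drives)
```

By the same arithmetic, the 3.6 TB usable reported for nvme-mirror1 suggests 4 TB drives, although the exact Sabrent model still needs to be confirmed.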

GPUs

| Model | Slot | VRAM | TDP | Purpose | Passed To |
|-------|------|------|-----|---------|-----------|
| NVIDIA Quadro P2000 | PCIe slot 1 | 5 GB GDDR5 | 75W | Plex transcoding | Host |
| NVIDIA TITAN RTX | PCIe slot 2 | 24 GB GDDR6 | 280W | AI workloads | Saltbox (101), lmdev1 (111) |

Total GPU Power: 75W + 280W = 355W (under load)

Network Cards

| Interface | Model | Speed | Purpose | Bridge |
|-----------|-------|-------|---------|--------|
| enp1s0 | Intel I210 (onboard) | 1 Gb | Management | vmbr0 |
| enp35s0f0 | Intel X520 (dual-port SFP+) | 10 Gb | High-speed LXC | vmbr1 |
| enp35s0f1 | Intel X520 (dual-port SFP+) | 10 Gb | High-speed VM | vmbr2 |

10Gb Transceivers: Intel FTLX8571D3BCV (SFP+ 10GBASE-SR, 850nm, multimode)

Storage Controllers

| Model | Interface | Purpose |
|-------|-----------|---------|
| LSI SAS2308 HBA | PCIe 3.0 x8 | Passed to TrueNAS VM for EMC enclosure |
| Samsung NVMe controller | PCIe | Passed to TrueNAS VM for ZFS caching |

Motherboard

  • Model: Unknown - needs investigation
  • Chipset: AMD WRX80 (required by Threadripper PRO; TRX40 boards do not support this CPU)
  • Form Factor: ATX/EATX
  • PCIe Slots: Multiple PCIe 4.0 slots
  • Features: IOMMU support, ECC memory

Power Supply

  • Model: Unknown
  • Wattage: Likely 1000W+ (needs investigation)
  • Type: ATX, 80+ certification unknown

Cooling

  • CPU Cooler: Unknown - likely large tower or AIO
  • Case Fans: Unknown quantity
  • Note: CPU temps 70-80°C under load (healthy)

PVE2 (10.10.10.102) - Secondary Proxmox Server

CPU

  • Model: AMD Ryzen Threadripper PRO 3975WX
  • Specs: Same as PVE (32C/64T, 280W TDP)

RAM

  • Capacity: 128 GB DDR4 ECC
  • Same specs as PVE

Storage

| Pool | Devices | Type | Capacity | Purpose |
|------|---------|------|----------|---------|
| nvme-mirror3 | 2x NVMe (model unknown) | ZFS Mirror | Unknown | High-performance VM storage |
| local-zfs2 | 2x WD Red 6TB HDD | ZFS Mirror | ~6 TB usable | Bulk/archival storage (spins down) |

HDD Spindown: Configured for 30-min idle spindown (saves ~10-16W)
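
This document does not say which tool enforces the 30-minute spindown (see POWER-MANAGEMENT.md for the actual setup). If it is `hdparm -S`, the timeout is not given in minutes but in an encoded value, which is easy to get wrong. The sketch below converts minutes to that value, assuming the encoding described in hdparm(8); `hdparm_standby_value` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: convert an idle timeout in minutes to an `hdparm -S` value.
# Encoding per hdparm(8): 1-240 are multiples of 5 seconds (up to 20 minutes),
# 241-251 are 1 to 11 units of 30 minutes (30 minutes to 5.5 hours).
hdparm_standby_value() {
  minutes=$1
  if [ "$minutes" -ge 1 ] && [ "$minutes" -le 20 ]; then
    echo $(( minutes * 60 / 5 ))
  elif [ "$minutes" -ge 30 ] && [ "$minutes" -le 330 ] && [ $(( minutes % 30 )) -eq 0 ]; then
    echo $(( 240 + minutes / 30 ))
  else
    echo "unsupported timeout: ${minutes} min" >&2
    return 1
  fi
}

hdparm_standby_value 30   # value matching the 30-minute spindown described above
```

Under that assumption, the setting on PVE2 would look like `hdparm -S 241 /dev/sdX`, where `/dev/sdX` is a placeholder for each WD Red device.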

GPUs

| Model | Slot | VRAM | TDP | Purpose | Passed To |
|-------|------|------|-----|---------|-----------|
| NVIDIA RTX A6000 | PCIe slot 1 | 48 GB GDDR6 | 300W | AI trading workloads | trading-vm (301) |

Network Cards

| Interface | Model | Speed | Purpose |
|-----------|-------|-------|---------|
| nic1 | Unknown (onboard) | 1 Gb | Management |

Note: MTU set to 9000 for jumbo frames
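
A 9000-byte MTU only helps if every hop on the path accepts it. One common check is a ping with fragmentation disabled and the largest payload that fits in a single packet. The sketch below works out that payload size; `icmp_payload_for_mtu` is a hypothetical helper name, and the header sizes assume IPv4 without options.

```shell
#!/bin/sh
# Largest ICMP echo payload that fits in one unfragmented IPv4 packet:
# MTU minus 20 bytes of IPv4 header (no options) minus 8 bytes of ICMP header.
icmp_payload_for_mtu() {
  echo $(( $1 - 20 - 8 ))
}

icmp_payload_for_mtu 9000   # payload size for the 9000-byte MTU noted above
```

On a Linux host with iputils, the matching test from PVE2 to PVE would be `ping -M do -s 8972 -c 3 10.10.10.120`. If any hop has a smaller MTU, the ping fails instead of silently fragmenting.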

Motherboard

  • Model: Unknown
  • Chipset: AMD WRX80 (required by Threadripper PRO; TRX40 boards do not support this CPU)
  • Similar to PVE

Network Equipment

UniFi Cloud Gateway Fiber (UCG-Fiber)

  • Model: UniFi Cloud Gateway Fiber
  • IP: 10.10.10.1
  • Ports: Multiple 1Gb + SFP+ uplink
  • Features: Router, firewall, VPN, IDS/IPS
  • MTU: 9216 (supports jumbo frames)
  • Tailscale: Installed for VPN failover

Switches

Details needed - investigate current switch setup:

  • 10Gb switch for high-speed connections?
  • 1Gb switch for general devices?
  • PoE capabilities?

# Check link state of the 10Gb interfaces (shows whether the link is up, not what is on the other end)
ssh pve 'ip link show enp35s0f0'
ssh pve 'ip link show enp35s0f1'
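
`ip link show` reports link state but not the negotiated speed. `ethtool` does, which helps confirm that the SFP+ ports really came up at 10Gb. The sketch below condenses its output to one line per interface. `link_summary` is a hypothetical helper, and it assumes `ethtool` is installed on the host.

```shell
#!/bin/sh
# Hypothetical helper: condense `ethtool <iface>` output (read from stdin)
# into "speed=<value> link=<yes|no>".
link_summary() {
  awk -F': ' '
    /Speed:/         { speed = $2 }
    /Link detected:/ { link = $2 }
    END { printf "speed=%s link=%s\n", speed, link }
  '
}

# Sample input in the format ethtool prints. On the real host this would be:
#   ssh pve 'ethtool enp35s0f0' | link_summary
printf '\tSpeed: 10000Mb/s\n\tDuplex: Full\n\tLink detected: yes\n' | link_summary
```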

Storage Hardware

EMC Storage Enclosure

See EMC-ENCLOSURE.md for complete details

  • Model: EMC KTN-STL4 (or similar)
  • Form Factor: 4U rackmount
  • Drive Bays: 25x 3.5" SAS/SATA
  • Controllers: Dual LCC (Link Control Cards)
  • Connection: SAS via LSI SAS2308 HBA
  • Passed to: TrueNAS VM (VMID 100)

Current Status:

  • LCC A: Active (working)
  • LCC B: Failed (replacement ordered)

Drive Inventory: Unknown - needs audit

# Get drive list from TrueNAS
ssh truenas 'smartctl --scan'
ssh truenas 'lsblk'

NVMe Drives

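| Model | Quantity | Capacity | Location | Pool |
|-------|----------|----------|----------|------|
| Sabrent Rocket Q | 2 | Unknown (3.6 TB usable mirror implies ~4 TB each) | PVE | nvme-mirror1 |
| Kingston SFYRD | 2 | 2 TB each | PVE | nvme-mirror2 |
| Unknown model | 2 | Unknown | PVE2 | nvme-mirror3 |
| Samsung (model unknown) | 1 | Unknown | TrueNAS (passed) | ZFS cache |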

SSDs

| Model | Quantity | Capacity | Location | Pool |
|-------|----------|----------|----------|------|
| Samsung 870 QVO | 2 | 4 TB each | PVE | rpool |

HDDs

| Model | Quantity | Capacity | Location | Pool |
|-------|----------|----------|----------|------|
| WD Red | 2 | 6 TB each | PVE2 | local-zfs2 |
| Unknown (in EMC) | Unknown | Unknown | TrueNAS | vault |

UPS

Current UPS

| Specification | Value |
|---------------|-------|
| Model | CyberPower OR2200PFCRT2U |
| Capacity | 2200VA / 1320W |
| Form Factor | 2U rackmount |
| Input | NEMA 5-15P (rewired from 5-20P) |
| Outlets | 2x 5-20R + 6x 5-15R |
| Output | PFC Sinewave |
| Runtime | ~15-20 min @ 33% load |
| Interface | USB (connected to PVE) |

See UPS.md for configuration details


Client Devices

Mac Mini (Hutson's Workstation)

  • Model: Unknown generation
  • CPU: Unknown
  • RAM: Unknown
  • Storage: Unknown
  • Network: 1Gb Ethernet (en0) - MTU 9000
  • Tailscale IP: 100.108.89.58
  • Local IP: 10.10.10.125 (static)
  • Purpose: Primary workstation, Happy Coder daemon host

MacBook (Mobile)

  • Model: Unknown
  • Network: Wi-Fi + Ethernet adapter
  • Tailscale IP: Unknown
  • Purpose: Mobile work, development

Windows PC

  • Model: Unknown
  • CPU: Unknown
  • Network: 1Gb Ethernet
  • IP: 10.10.10.150
  • Purpose: Gaming, Windows development, Syncthing node

Phone (Android)

  • Model: Unknown
  • IP: 10.10.10.54 (when on Wi-Fi)
  • Purpose: Syncthing mobile node, Happy Coder client

Rack Layout (If Applicable)

Needs documentation - Current rack configuration unknown

Suggested format:

U42: Blank panel
U41: UPS (CyberPower 2U)
U40: UPS (CyberPower 2U)
U39: Switch (10Gb)
U38-U35: EMC Storage Enclosure (4U)
U34: PVE Server
U33: PVE2 Server
...

Power Consumption

Measured Power Draw

| Component | Idle | Typical | Peak | Notes |
|-----------|------|---------|------|-------|
| PVE Server | 250-350W | 500W | 750W | CPU + GPUs + storage |
| PVE2 Server | 200-300W | 400W | 600W | CPU + GPU + storage |
| Network Gear | ~50W | ~50W | ~50W | Router + switches |
| Total | 500-700W | ~950W | ~1400W | Exceeds UPS under peak load |

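UPS Capacity: 1320W

Idle Load: ~38-53% of capacity (safe margin)

Typical Load: ~72% of capacity

Peak Load: ~106% of capacity; can exceed UPS capacity temporarily (treated as acceptable, though a sustained overload can trip the UPS)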
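
Dividing each total in the table above by the UPS's 1320W rating shows where the load sits relative to capacity. This is a minimal sketch of that arithmetic; `ups_load_pct` is a hypothetical helper name, and the wattages are the table's totals, not fresh measurements.

```shell
#!/bin/sh
# Hypothetical helper: load as a rounded percentage of UPS capacity.
# The capacity defaults to 1320W, the rating listed in the UPS table.
ups_load_pct() {
  awk -v w="$1" -v cap="${2:-1320}" 'BEGIN { printf "%.0f\n", w / cap * 100 }'
}

# Idle low, idle high, typical, and peak totals from the table above.
for watts in 500 700 950 1400; do
  echo "${watts}W -> $(ups_load_pct "$watts")%"
done
```

Anything above 100% means the UPS is overloaded at that draw, so the peak column is the one to watch.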

Power Optimizations Applied

See POWER-MANAGEMENT.md for details

  • KSMD disabled: ~60-80W saved
  • CPU governors: ~60-120W saved
  • Syncthing rescans: ~60-80W saved
  • HDD spindown: ~10-16W saved when idle
  • Total savings: ~190-300W (sum of the ranges above)

Thermal Management

CPU Cooling

PVE & PVE2:

  • CPU cooler: Unknown model
  • Thermal paste: Unknown, likely needs refresh if temps >85°C
  • Target temp: 70-80°C under load
  • Max safe: 90°C Tctl (Threadripper PRO spec)
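
The thresholds above can be turned into a simple check for use in a monitoring script. The sketch below is one way to do that. `classify_tctl` and its labels are hypothetical, and the 80-85°C "warm" band is an assumption that fills the gap between the target range and the thermal paste threshold.

```shell
#!/bin/sh
# Hypothetical helper: classify a Tctl reading (in °C) against the thresholds above.
#   >= 90  at or over the stated maximum
#   >  85  thermal paste likely needs a refresh
#   >  80  above the target range (assumed "warm" band)
#   else   within or below the 70-80°C target
classify_tctl() {
  awk -v t="$1" 'BEGIN {
    if (t >= 90)     print "critical"
    else if (t > 85) print "check-paste"
    else if (t > 80) print "warm"
    else             print "ok"
  }'
}

classify_tctl 75
classify_tctl 87
```

The reading itself would come from `ssh pve 'sensors'`, assuming the k10temp driver exposes a line labelled `Tctl`.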

GPU Cooling

All GPUs run on their stock coolers with no custom fan control:

  • TITAN RTX: 2-3W idle, 280W load
  • RTX A6000: 11W idle, 300W load
  • Quadro P2000: 25W constant (Plex active)

Case Airflow

Unknown - needs investigation:

  • Case model?
  • Fan configuration?
  • Positive or negative pressure?

Cable Management

Network Cables

| Connection | Type | Length | Speed |
|------------|------|--------|-------|
| PVE → Switch (10Gb) | OM3 fiber | Unknown | 10Gb |
| PVE2 → Router | Cat6 | Unknown | 1Gb |
| Mac Mini → Switch | Cat6 | Unknown | 1Gb |
| TrueNAS → EMC | SAS cable | Unknown | 6Gb/s |

Power Cables

Critical: All servers on UPS battery-backed outlets


Maintenance Schedule

Periodic Maintenance

  • Clean dust from servers (every 6-12 months)
  • Check thermal paste on CPUs (every 2-3 years)
  • Test UPS battery runtime (annually)
  • Verify all fans operational
  • Check for bulging capacitors on PSUs

Drive Health

# Check SMART status on all drives
ssh pve 'smartctl -a /dev/nvme0'
ssh pve2 'smartctl -a /dev/sda'
ssh truenas 'smartctl --scan | while read -r dev _; do echo "=== $dev ==="; smartctl -a "$dev" | grep -iE "Model|Serial|Health|Reallocated|Current_Pending"; done'
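
For a quick pass/fail view, the output of `smartctl -H` can be reduced to a single verdict per drive. The sketch below assumes the health lines smartctl normally prints: "test result: PASSED" for ATA and NVMe devices, and "SMART Health Status: OK" for SCSI/SAS devices such as those in the EMC enclosure. `smart_verdict` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: reduce `smartctl -H <device>` output (read from stdin)
# to PASS, FAIL, or UNKNOWN.
smart_verdict() {
  awk '
    /test result: PASSED/ || /SMART Health Status: OK/ { v = "PASS" }
    /test result: FAILED/                              { v = "FAIL" }
    /SMART Health Status:/ && !/: OK/                  { v = "FAIL" }
    END { print (v == "" ? "UNKNOWN" : v) }
  '
}

# Sample input. On the real host this would be:
#   ssh pve 'smartctl -H /dev/nvme0' | smart_verdict
printf 'SMART overall-health self-assessment test result: PASSED\n' | smart_verdict
```

An UNKNOWN verdict usually means the health line was missing, for example when the command ran without root privileges or against the wrong device type.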

Temperature Monitoring

# Check all temps (needs lm-sensors installed)
ssh pve 'sensors'
ssh pve2 'sensors'

Warranty & Purchase Info

Needs documentation:

  • When were servers purchased?
  • Where were components bought?
  • Any warranties still active?
  • Replacement part sources?

Upgrade Path

Short-term Upgrades (< 6 months)

  • 20A circuit for UPS (restore original 5-20P plug)
  • Document missing hardware specs
  • Label all cables
  • Create rack diagram

Medium-term Upgrades (6-12 months)

  • Additional 10Gb NIC for PVE2?
  • More NVMe storage?
  • Upgrade network switches?
  • Replace EMC enclosure with newer model?

Long-term Upgrades (1-2 years)

  • CPU upgrade to newer Threadripper?
  • RAM expansion to 256GB?
  • Additional GPU for AI workloads?
  • Migrate to PCIe 5.0 storage?

Investigation Needed

High-priority items to document:

  • Get exact motherboard model (both servers)
  • Get PSU model and wattage
  • CPU cooler models
  • Network switch models and configuration
  • Complete drive inventory in EMC enclosure
  • RAM speed and timings
  • Case models
  • Exact NVMe models for all drives

Commands to gather info:

# Motherboard
ssh pve 'dmidecode -t baseboard'

# CPU details
ssh pve 'lscpu'

# RAM details
ssh pve 'dmidecode -t memory | grep -E "Size|Speed|Manufacturer"'

# Storage devices
ssh pve 'lsblk -o NAME,SIZE,TYPE,TRAN,MODEL'

# Network cards
ssh pve 'lspci | grep -iE "ethernet|network"'

# GPU details
ssh pve 'lspci | grep -iE "vga|3d controller"'
ssh pve 'nvidia-smi -L'  # If nvidia-smi available


Last Updated: 2025-12-22

Status: ⚠️ Incomplete - many specs need investigation