# Hardware Inventory

Complete hardware specifications for all homelab equipment.

## Servers

### PVE (10.10.10.120) - Primary Proxmox Server

#### CPU
- **Model**: AMD Ryzen Threadripper PRO 3975WX
- **Cores**: 32 cores / 64 threads
- **Base Clock**: 3.5 GHz
- **Boost Clock**: 4.2 GHz
- **TDP**: 280W
- **Architecture**: Zen 2 (7nm)
- **Socket**: sWRX8
- **Features**: ECC support, PCIe 4.0

#### RAM
- **Capacity**: 128 GB
- **Type**: DDR4 ECC Registered
- **Speed**: Unknown (needs investigation)
- **Channels**: Up to 8 channels (single-socket platform; populated DIMM count unverified)
- **Idle Power**: ~30-40W

#### Storage

**OS/VM Storage:**

| Pool | Devices | Type | Capacity | Purpose |
|------|---------|------|----------|---------|
| `nvme-mirror1` | 2x Sabrent Rocket Q NVMe | ZFS Mirror | 3.6 TB usable | High-performance VM storage |
| `nvme-mirror2` | 2x Kingston SFYRD 2TB NVMe | ZFS Mirror | 1.8 TB usable | Additional fast VM storage |
| `rpool` | 2x Samsung 870 QVO 4TB SSD | ZFS Mirror | 3.6 TB usable | Proxmox OS, containers, backups |

**Total Storage**: ~9 TB usable

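The usable figures above look like binary units: a 4 TB drive holds 4×10^12 bytes, which is about 3.64 TiB and matches the 3.6 TB listed for the mirrors. A minimal sketch for re-deriving the total from live data, assuming `zpool list -Hp -o name,size` is run on the host (`-H` drops the header, `-p` prints exact byte counts). `sum_pool_tib` is a hypothetical helper name, not something from this repo.

```bash
# Sum the SIZE column (bytes) of `zpool list -Hp -o name,size` and print TiB.
# Expects whitespace-separated "name size" lines on stdin.
sum_pool_tib() {
  awk '{ total += $2 } END { printf "%.1f\n", total / (1024 ^ 4) }'
}
```

Possible usage: `ssh pve 'zpool list -Hp -o name,size' | sum_pool_tib`. Pool size is slightly higher than the space `zfs list` reports as available, so treat the result as an upper bound.
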
#### GPUs

| Model | Slot | VRAM | TDP | Purpose | Passed To |
|-------|------|------|-----|---------|-----------|
| NVIDIA Quadro P2000 | PCIe slot 1 | 5 GB GDDR5 | 75W | Plex transcoding | Host |
| NVIDIA TITAN RTX | PCIe slot 2 | 24 GB GDDR6 | 280W | AI workloads | Saltbox (101), lmdev1 (111) |

**Total GPU Power**: 75W + 280W = 355W (under load)

#### Network Cards

| Interface | Model | Speed | Purpose | Bridge |
|-----------|-------|-------|---------|--------|
| enp1s0 | Intel I210 (onboard) | 1 Gb | Management | vmbr0 |
| enp35s0f0 | Intel X520 (dual-port SFP+) | 10 Gb | High-speed LXC | vmbr1 |
| enp35s0f1 | Intel X520 (dual-port SFP+) | 10 Gb | High-speed VM | vmbr2 |

**10Gb Transceivers**: Intel FTLX8571D3BCV (SFP+ 10GBASE-SR, 850nm, multimode)

#### Storage Controllers

| Model | Interface | Purpose |
|-------|-----------|---------|
| LSI SAS2308 HBA | PCIe 3.0 x8 | Passed to TrueNAS VM for EMC enclosure |
| Samsung NVMe controller | PCIe | Passed to TrueNAS VM for ZFS caching |

#### Motherboard
- **Model**: Unknown - needs investigation
- **Chipset**: AMD WRX80 (the Threadripper PRO 3975WX requires WRX80/sWRX8, not TRX40)
- **Form Factor**: ATX/EATX
- **PCIe Slots**: Multiple PCIe 4.0 slots
- **Features**: IOMMU support, ECC memory

#### Power Supply
- **Model**: Unknown
- **Wattage**: Likely 1000W+ (needs investigation)
- **Type**: ATX, 80+ certification unknown

#### Cooling
- **CPU Cooler**: Unknown - likely large tower or AIO
- **Case Fans**: Unknown quantity
- **Note**: CPU temps 70-80°C under load (healthy)

---

### PVE2 (10.10.10.102) - Secondary Proxmox Server

#### CPU
- **Model**: AMD Ryzen Threadripper PRO 3975WX
- **Specs**: Same as PVE (32C/64T, 280W TDP)

#### RAM
- **Capacity**: 128 GB DDR4 ECC
- **Same specs as PVE**

#### Storage

| Pool | Devices | Type | Capacity | Purpose |
|------|---------|------|----------|---------|
| `nvme-mirror3` | 2x NVMe (model unknown) | ZFS Mirror | Unknown | High-performance VM storage |
| `local-zfs2` | 2x WD Red 6TB HDD | ZFS Mirror | ~6 TB usable | Bulk/archival storage (spins down) |

**HDD Spindown**: Configured for 30-min idle spindown (saves ~10-16W)

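This file does not say how the 30-minute spindown was configured. If it is done with `hdparm -S`, note that the timeout is encoded rather than given in minutes: values 1-240 are multiples of 5 seconds (up to 20 minutes), and 241-251 are multiples of 30 minutes. A small sketch of that encoding, with `hdparm_standby_value` as a hypothetical helper name:

```bash
# Convert a spindown timeout in minutes to an `hdparm -S` value.
# Up to 20 min: units of 5 seconds (minutes * 12).
# Above 20 min: 240 + number of whole 30-minute units, capped at 251 (5.5 h).
# Timeouts between 21 and 29 minutes round down to 240 (20 minutes).
hdparm_standby_value() {
  minutes=$1
  if [ "$minutes" -le 20 ]; then
    echo $(( minutes * 12 ))
  else
    value=$(( 240 + minutes / 30 ))
    if [ "$value" -gt 251 ]; then
      value=251
    fi
    echo "$value"
  fi
}
```

For the 30-minute setting above this yields 241, so the call would look like `hdparm -S "$(hdparm_standby_value 30)" /dev/sdX`, where `/dev/sdX` is a placeholder for each WD Red. Some drives ignore this setting, so confirm the result with `hdparm -C`.
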
#### GPUs

| Model | Slot | VRAM | TDP | Purpose | Passed To |
|-------|------|------|-----|---------|-----------|
| NVIDIA RTX A6000 | PCIe slot 1 | 48 GB GDDR6 | 300W | AI trading workloads | trading-vm (301) |

#### Network Cards

| Interface | Model | Speed | Purpose |
|-----------|-------|-------|---------|
| nic1 | Unknown (onboard) | 1 Gb | Management |

**Note**: MTU set to 9000 for jumbo frames

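Jumbo frames only work if every hop accepts them, so the MTU 9000 setting is worth verifying end to end. A minimal sketch, assuming Linux `ping` from iputils (`-M do` forbids fragmentation). The helper names are hypothetical. The payload is the MTU minus 28 bytes: 20 for the IPv4 header and 8 for the ICMP header.

```bash
# ICMP payload size that exactly fills a given MTU:
# MTU minus 20 bytes of IPv4 header and 8 bytes of ICMP header.
icmp_payload_for_mtu() {
  echo $(( $1 - 28 ))
}

# Send 3 non-fragmentable pings sized for the given MTU (default 9000).
# Fails if any hop on the path has a smaller MTU.
check_jumbo() {
  target=$1
  mtu=${2:-9000}
  ping -M do -c 3 -s "$(icmp_payload_for_mtu "$mtu")" "$target"
}
```

For example, `check_jumbo 10.10.10.1` run on PVE2 sends 8972-byte payloads to the gateway. On macOS the equivalent flag for "do not fragment" is `-D`.
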
#### Motherboard
- **Model**: Unknown
- **Chipset**: AMD WRX80 (the Threadripper PRO 3975WX requires WRX80/sWRX8, not TRX40)
- **Similar to PVE**

---

## Network Equipment

### UniFi Cloud Gateway Fiber (UCG-Fiber)

- **Model**: UniFi Cloud Gateway Fiber
- **IP**: 10.10.10.1
- **Ports**: Multiple 1Gb + SFP+ uplink
- **Features**: Router, firewall, VPN, IDS/IPS
- **MTU**: 9216 (supports jumbo frames)
- **Tailscale**: Installed for VPN failover

### Switches

**Details needed** - investigate current switch setup:
- 10Gb switch for high-speed connections?
- 1Gb switch for general devices?
- PoE capabilities?

```bash
# Check link state of the 10Gb interfaces
# (shows whether a link is up, not which switch is on the other end)
ssh pve 'ip link show enp35s0f0'
ssh pve 'ip link show enp35s0f1'
```

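`ip link show` reports link state but not the negotiated speed. A minimal sketch that reads both from sysfs, assuming it runs directly on the Proxmox host. `link_report` and the `SYSFS_NET` override are hypothetical names introduced here for illustration and dry runs.

```bash
# Print operational state and negotiated speed (Mb/s) for each interface.
# SYSFS_NET is a hypothetical override for dry runs; defaults to /sys/class/net.
link_report() {
  sysfs="${SYSFS_NET:-/sys/class/net}"
  for ifname in "$@"; do
    # Missing interfaces or unreadable files fall back to placeholder values.
    state=$(cat "$sysfs/$ifname/operstate" 2>/dev/null || echo "missing")
    speed=$(cat "$sysfs/$ifname/speed" 2>/dev/null || echo "unknown")
    echo "$ifname state=$state speed_mbps=$speed"
  done
}
```

Running `link_report enp35s0f0 enp35s0f1` on PVE should show `speed_mbps=10000` for a port that has negotiated 10Gb. Identifying the switch itself still needs a look at the UniFi controller or the physical cabling.
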
---

## Storage Hardware

### EMC Storage Enclosure

**See [EMC-ENCLOSURE.md](EMC-ENCLOSURE.md) for complete details**

- **Model**: EMC KTN-STL4 (or similar)
- **Form Factor**: 4U rackmount
- **Drive Bays**: 25x 3.5" SAS/SATA
- **Controllers**: Dual LCC (Link Control Cards)
- **Connection**: SAS via LSI SAS2308 HBA
- **Passed to**: TrueNAS VM (VMID 100)

**Current Status**:
- LCC A: Active (working)
- LCC B: Failed (replacement ordered)

**Drive Inventory**: Unknown - needs audit

```bash
# Get drive list from TrueNAS
ssh truenas 'smartctl --scan'
ssh truenas 'lsblk'
```

### NVMe Drives

| Model | Quantity | Capacity | Location | Pool |
|-------|----------|----------|----------|------|
| Sabrent Rocket Q | 2 | Unknown | PVE | nvme-mirror1 |
| Kingston SFYRD | 2 | 2 TB each | PVE | nvme-mirror2 |
| Unknown model | 2 | Unknown | PVE2 | nvme-mirror3 |
| Samsung (model unknown) | 1 | Unknown | TrueNAS (passed) | ZFS cache |

### SSDs

| Model | Quantity | Capacity | Location | Pool |
|-------|----------|----------|----------|------|
| Samsung 870 QVO | 2 | 4 TB each | PVE | rpool |

### HDDs

| Model | Quantity | Capacity | Location | Pool |
|-------|----------|----------|----------|------|
| WD Red | 2 | 6 TB each | PVE2 | local-zfs2 |
| Unknown (in EMC) | Unknown | Unknown | TrueNAS | vault |

---

## UPS

### Current UPS

| Specification | Value |
|---------------|-------|
| **Model** | CyberPower OR2200PFCRT2U |
| **Capacity** | 2200VA / 1320W |
| **Form Factor** | 2U rackmount |
| **Input** | NEMA 5-15P (rewired from 5-20P) |
| **Outlets** | 2x 5-20R + 6x 5-15R |
| **Output** | PFC Sinewave |
| **Runtime** | ~15-20 min @ 33% load |
| **Interface** | USB (connected to PVE) |

**See [UPS.md](UPS.md) for configuration details**

---

## Client Devices

### Mac Mini (Hutson's Workstation)

- **Model**: Unknown generation
- **CPU**: Unknown
- **RAM**: Unknown
- **Storage**: Unknown
- **Network**: 1Gb Ethernet (en0) - MTU 9000
- **Tailscale IP**: 100.108.89.58
- **Local IP**: 10.10.10.125 (static)
- **Purpose**: Primary workstation, Happy Coder daemon host

### MacBook (Mobile)

- **Model**: Unknown
- **Network**: Wi-Fi + Ethernet adapter
- **Tailscale IP**: Unknown
- **Purpose**: Mobile work, development

### Windows PC

- **Model**: Unknown
- **CPU**: Unknown
- **Network**: 1Gb Ethernet
- **IP**: 10.10.10.150
- **Purpose**: Gaming, Windows development, Syncthing node

### Phone (Android)

- **Model**: Unknown
- **IP**: 10.10.10.54 (when on Wi-Fi)
- **Purpose**: Syncthing mobile node, Happy Coder client

---

## Rack Layout (If Applicable)

**Needs documentation** - Current rack configuration unknown

Suggested format:
```
U42: Blank panel
U41: UPS (CyberPower 2U)
U40: UPS (CyberPower 2U)
U39: Switch (10Gb)
U38-U35: EMC Storage Enclosure (4U)
U34: PVE Server
U33: PVE2 Server
...
```

---

## Power Consumption

### Measured Power Draw

| Component | Idle | Typical | Peak | Notes |
|-----------|------|---------|------|-------|
| PVE Server | 250-350W | 500W | 750W | CPU + GPUs + storage |
| PVE2 Server | 200-300W | 400W | 600W | CPU + GPU + storage |
| Network Gear | ~50W | ~50W | ~50W | Router + switches |
| **Total** | **500-700W** | **~950W** | **~1400W** | Peak exceeds the 1320W UPS rating |

- **UPS Capacity**: 1320W
- **Typical Load**: 33-50% as originally recorded; the table totals imply 38-53% at idle and ~72% at typical load, so this figure needs re-measuring
- **Peak Load**: ~1400W is about 106% of UPS capacity; treated as acceptable because peaks are temporary

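The load percentages follow directly from the table totals and the 1320W rating. A minimal sketch of the arithmetic, with `ups_load_pct` as a hypothetical helper name:

```bash
# Percentage of UPS capacity used by a given draw in watts, rounded.
# Second argument is the UPS capacity in watts (defaults to 1320).
ups_load_pct() {
  awk -v w="$1" -v cap="${2:-1320}" 'BEGIN { printf "%.0f\n", w * 100 / cap }'
}
```

Applied to the totals: `ups_load_pct 500` gives 38, `ups_load_pct 700` gives 53, `ups_load_pct 950` gives 72, and `ups_load_pct 1400` gives 106. The UPS's own reported load, where available, is the better source of truth than these estimates.
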
### Power Optimizations Applied

**See [POWER-MANAGEMENT.md](POWER-MANAGEMENT.md) for details**

- KSMD disabled: ~60-80W saved
- CPU governors: ~60-120W saved
- Syncthing rescans: ~60-80W saved
- HDD spindown: ~10-16W saved when idle
- **Total savings**: ~150-300W overall (the individual ranges above sum to 190-296W)

---

## Thermal Management

### CPU Cooling

**PVE & PVE2**:
- CPU cooler: Unknown model
- Thermal paste: Unknown, likely needs refresh if temps >85°C
- Target temp: 70-80°C under load
- Max safe: 90°C Tctl (Threadripper PRO spec)

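The thresholds above can be turned into a simple check. This is a sketch only: `cpu_temp_status` and its labels are hypothetical, and the cut-offs are taken straight from the list (80°C target ceiling, 85°C paste check, 90°C maximum).

```bash
# Classify a CPU temperature in °C against the thresholds listed above.
# Decimal readings are truncated, e.g. 72.5 is treated as 72.
cpu_temp_status() {
  temp=${1%.*}
  if [ "$temp" -ge 90 ]; then
    echo "critical"      # at or above the 90°C Tctl limit
  elif [ "$temp" -gt 85 ]; then
    echo "check-paste"   # above 85°C: thermal paste may need a refresh
  elif [ "$temp" -gt 80 ]; then
    echo "warm"          # above the 70-80°C load target
  else
    echo "ok"
  fi
}
```

If the `k10temp` driver labels its reading `Tctl` in `sensors` output, the value could be fed in with something like `cpu_temp_status "$(ssh pve sensors | awk '/Tctl/ {gsub(/[^0-9.]/, "", $2); print $2}')"`. Check the label on the actual hosts first.
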
### GPU Cooling

All GPUs use their stock coolers. Power draw per card:
- TITAN RTX: 2-3W idle, 280W load
- RTX A6000: 11W idle, 300W load
- Quadro P2000: 25W constant (Plex active)

### Case Airflow

**Unknown** - needs investigation:
- Case model?
- Fan configuration?
- Positive or negative pressure?

---

## Cable Management

### Network Cables

| Connection | Type | Length | Speed |
|------------|------|--------|-------|
| PVE → Switch (10Gb) | OM3 fiber | Unknown | 10Gb |
| PVE2 → Router | Cat6 | Unknown | 1Gb |
| Mac Mini → Switch | Cat6 | Unknown | 1Gb |
| TrueNAS → EMC | SAS cable | Unknown | 6Gb/s |

### Power Cables

**Critical**: All servers on UPS battery-backed outlets

---

## Maintenance Schedule

### Annual Maintenance

- [ ] Clean dust from servers (every 6-12 months)
- [ ] Check thermal paste on CPUs (every 2-3 years)
- [ ] Test UPS battery runtime (annually)
- [ ] Verify all fans operational
- [ ] Check for bulging capacitors on PSUs

### Drive Health

```bash
# Check SMART status: one example device on each Proxmox host,
# then every drive that TrueNAS can see
ssh pve 'smartctl -a /dev/nvme0'
ssh pve2 'smartctl -a /dev/sda'
ssh truenas 'smartctl --scan | while read -r dev type; do echo "=== $dev ==="; smartctl -a "$dev" | grep -E "Model|Serial|Health|Reallocated|Current_Pending"; done'
```

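For a quick pass/fail check instead of the full attribute dump, `smartctl -H` prints a single health verdict. A minimal sketch that interprets it, with `smart_health_ok` as a hypothetical helper name. It matches the two verdict lines smartmontools prints for healthy drives: `test result: PASSED` for ATA and NVMe devices and `SMART Health Status: OK` for SAS devices.

```bash
# Read `smartctl -H` output on stdin; succeed only if the drive reports healthy.
# Matches the ATA/NVMe verdict ("...test result: PASSED") and the
# SAS verdict ("SMART Health Status: OK").
smart_health_ok() {
  grep -Eq 'test result: PASSED|SMART Health Status: OK'
}
```

Example: `ssh pve 'smartctl -H /dev/nvme0' | smart_health_ok && echo healthy || echo "needs attention"`. A passing verdict does not rule out rising reallocated or pending sector counts, so the detailed command above is still worth running periodically.
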
### Temperature Monitoring

```bash
# Check all temps (needs lm-sensors installed)
ssh pve 'sensors'
ssh pve2 'sensors'
```

---

## Warranty & Purchase Info

**Needs documentation**:
- When were servers purchased?
- Where were components bought?
- Any warranties still active?
- Replacement part sources?

---

## Upgrade Path

### Short-term Upgrades (< 6 months)

- [ ] 20A circuit for UPS (restore original 5-20P plug)
- [ ] Document missing hardware specs
- [ ] Label all cables
- [ ] Create rack diagram

### Medium-term Upgrades (6-12 months)

- [ ] Additional 10Gb NIC for PVE2?
- [ ] More NVMe storage?
- [ ] Upgrade network switches?
- [ ] Replace EMC enclosure with newer model?

### Long-term Upgrades (1-2 years)

- [ ] CPU upgrade to newer Threadripper?
- [ ] RAM expansion to 256GB?
- [ ] Additional GPU for AI workloads?
- [ ] Migrate to PCIe 5.0 storage?

---

## Investigation Needed

High-priority items to document:

- [ ] Get exact motherboard model (both servers)
- [ ] Get PSU model and wattage
- [ ] CPU cooler models
- [ ] Network switch models and configuration
- [ ] Complete drive inventory in EMC enclosure
- [ ] RAM speed and timings
- [ ] Case models
- [ ] Exact NVMe models for all drives

**Commands to gather info**:

```bash
# Motherboard (dmidecode needs root)
ssh pve 'dmidecode -t baseboard'

# CPU details
ssh pve 'lscpu'

# RAM details
ssh pve 'dmidecode -t memory | grep -E "Size|Speed|Manufacturer"'

# Storage devices
ssh pve 'lsblk -o NAME,SIZE,TYPE,TRAN,MODEL'

# Network cards (wired NICs are listed as "Ethernet controller")
ssh pve 'lspci | grep -iE "ethernet|network"'

# GPU details (some GPUs are listed as "3D controller" rather than VGA)
ssh pve 'lspci | grep -iE "vga|3d controller"'
ssh pve 'nvidia-smi -L'  # If nvidia-smi available
```

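Since both servers need the same audit, the commands above can be wrapped so the output is saved per host for later transcription into this file. A minimal sketch: `collect_inventory`, the `inventory/` output directory, and the `SSH_CMD` override are all hypothetical names introduced for illustration.

```bash
# Run the inventory commands on one host and save each output to a file
# under <outdir>/<host>/. SSH_CMD is a hypothetical override for dry runs
# (for example SSH_CMD=echo); it defaults to ssh.
collect_inventory() {
  host=$1
  outdir=${2:-inventory}
  mkdir -p "$outdir/$host"
  ${SSH_CMD:-ssh} "$host" 'dmidecode -t baseboard' > "$outdir/$host/baseboard.txt"
  ${SSH_CMD:-ssh} "$host" 'dmidecode -t memory' > "$outdir/$host/memory.txt"
  ${SSH_CMD:-ssh} "$host" 'lscpu' > "$outdir/$host/cpu.txt"
  ${SSH_CMD:-ssh} "$host" 'lsblk -o NAME,SIZE,TYPE,TRAN,MODEL' > "$outdir/$host/disks.txt"
  ${SSH_CMD:-ssh} "$host" 'lspci' > "$outdir/$host/pci.txt"
}
```

Usage would be along the lines of `for h in pve pve2; do collect_inventory "$h"; done`, assuming the same SSH host aliases used elsewhere in this file.
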
---

## Related Documentation

- [VMS.md](VMS.md) - VM resource allocation
- [STORAGE.md](STORAGE.md) - Storage pools and usage
- [POWER-MANAGEMENT.md](POWER-MANAGEMENT.md) - Power optimizations
- [UPS.md](UPS.md) - UPS configuration
- [NETWORK.md](NETWORK.md) - Network configuration
- [EMC-ENCLOSURE.md](EMC-ENCLOSURE.md) - Storage enclosure details

---

**Last Updated**: 2025-12-22
**Status**: ⚠️ Incomplete - many specs need investigation