Complete Phase 2 documentation: Add HARDWARE, SERVICES, MONITORING, MAINTENANCE

Phase 2 documentation implementation:

- Created HARDWARE.md: Complete hardware inventory (servers, GPUs, storage, network cards)
- Created SERVICES.md: Service inventory with URLs, credentials, health checks (25+ services)
- Created MONITORING.md: Health monitoring recommendations, alert setup, implementation plan
- Created MAINTENANCE.md: Regular procedures, update schedules, testing checklists
- Updated README.md: Added all Phase 2 documentation links
- Updated CLAUDE.md: Cleaned up to quick reference only (1340→377 lines)

All detailed content now in specialized documentation files with cross-references.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

**HARDWARE.md** (new file, 455 lines)

# Hardware Inventory

Complete hardware specifications for all homelab equipment.

## Servers

### PVE (10.10.10.120) - Primary Proxmox Server

#### CPU
- **Model**: AMD Ryzen Threadripper PRO 3975WX
- **Cores**: 32 cores / 64 threads
- **Base Clock**: 3.5 GHz
- **Boost Clock**: 4.2 GHz
- **TDP**: 280W
- **Architecture**: Zen 2 (7nm)
- **Socket**: sWRX8
- **Features**: ECC support, 128 PCIe 4.0 lanes

#### RAM
- **Capacity**: 128 GB
- **Type**: DDR4 ECC Registered
- **Speed**: Unknown (needs investigation)
- **Channels**: 8-channel (single socket)
- **Idle Power**: ~30-40W

#### Storage

**OS/VM Storage:**

| Pool | Devices | Type | Capacity | Purpose |
|------|---------|------|----------|---------|
| `nvme-mirror1` | 2x Sabrent Rocket Q NVMe | ZFS Mirror | 3.6 TB usable | High-performance VM storage |
| `nvme-mirror2` | 2x Kingston SFYRD 2TB NVMe | ZFS Mirror | 1.8 TB usable | Additional fast VM storage |
| `rpool` | 2x Samsung 870 QVO 4TB SSD | ZFS Mirror | 3.6 TB usable | Proxmox OS, containers, backups |

**Total Storage**: ~9 TB usable
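
To check the usable-capacity figures above against what ZFS itself reports, a small helper can total the `zpool list` output. This is a sketch, not an existing tool. The function name `zfs_pool_summary` is hypothetical, and it assumes the TB figures on this page are the binary (TiB) values that `zpool` prints. For mirrored pools, `zpool list` SIZE is already the capacity of one side of the mirror, so summing it gives usable space.

```bash
# Summarize ZFS pool capacity from `zpool list -Hp -o name,size,alloc,free` on stdin.
# -H drops the header and separates fields with tabs; -p prints exact byte counts.
# (zfs_pool_summary is a hypothetical helper name.)
zfs_pool_summary() {
  awk -F'\t' '
    BEGIN { tib = 1024 ^ 4 }
    {
      printf "%-14s size=%.1f TiB  used=%.1f TiB  free=%.1f TiB\n", $1, $2 / tib, $3 / tib, $4 / tib
      total += $2
    }
    END { printf "%-14s size=%.1f TiB\n", "TOTAL", total / tib }
  '
}

# Example (run from the workstation):
#   ssh pve 'zpool list -Hp -o name,size,alloc,free' | zfs_pool_summary
```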

#### GPUs

| Model | Slot | VRAM | TDP | Purpose | Passed To |
|-------|------|------|-----|---------|-----------|
| NVIDIA Quadro P2000 | PCIe slot 1 | 5 GB GDDR5 | 75W | Plex transcoding | Host |
| NVIDIA TITAN RTX | PCIe slot 2 | 24 GB GDDR6 | 280W | AI workloads | Saltbox (101), lmdev1 (111) |

**Total GPU Power**: 75W + 280W = 355W (under load)
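
The TITAN RTX is listed against two guests. If both are full VMs using PCIe passthrough, only one of them can own the card at any moment. Reading the `hostpciN` lines from each VM config shows which guests reference the card. The sketch below assumes it runs in a root shell on the Proxmox host, where `qm` is available. `list_gpu_passthrough` is a hypothetical name. Note that `qm config` only covers VMs; an LXC guest would need `pct config` instead.

```bash
# List PCI passthrough (hostpciN) entries for the given Proxmox VMIDs.
# Runs on the Proxmox host itself, where `qm` is available.
# (list_gpu_passthrough is a hypothetical helper name.)
list_gpu_passthrough() {
  local vmid
  for vmid in "$@"; do
    qm config "$vmid" | awk -F': ' -v id="$vmid" '/^hostpci[0-9]+:/ { print id, $1, $2 }'
  done
}

# Example (in a root shell on pve):
#   list_gpu_passthrough 101 111
```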

#### Network Cards

| Interface | Model | Speed | Purpose | Bridge |
|-----------|-------|-------|---------|--------|
| enp1s0 | Intel I210 (onboard) | 1 Gb | Management | vmbr0 |
| enp35s0f0 | Intel X520 (dual-port SFP+) | 10 Gb | High-speed LXC | vmbr1 |
| enp35s0f1 | Intel X520 (dual-port SFP+) | 10 Gb | High-speed VM | vmbr2 |

**10Gb Transceivers**: Intel FTLX8571D3BCV (SFP+ 10GBASE-SR, 850nm, multimode)
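
`ethtool -m` dumps a module's EEPROM, which includes the vendor part number, serial, and laser wavelength. That is enough to confirm the transceiver model listed above on each port. The filter below is a sketch with a hypothetical name, `sfp_summary`. It matches the field labels that current ethtool versions print. Some modules or drivers may not expose the EEPROM at all.

```bash
# Reduce `ethtool -m <iface>` output (stdin) to the module's identity fields.
# (sfp_summary is a hypothetical helper name.)
sfp_summary() {
  awk '
    /^[ \t]*(Vendor name|Vendor PN|Vendor SN|Laser wavelength)[ \t]*:/ {
      key = $0; sub(/[ \t]*:.*/, "", key); sub(/^[ \t]+/, "", key)   # label before the colon
      val = substr($0, index($0, ":") + 1); sub(/^[ \t]+/, "", val)   # value after the colon
      print key ": " val
    }
  '
}

# Example:
#   ssh pve 'ethtool -m enp35s0f0' | sfp_summary
#   ssh pve 'ethtool -m enp35s0f1' | sfp_summary
```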

#### Storage Controllers

| Model | Interface | Purpose |
|-------|-----------|---------|
| LSI SAS2308 HBA | PCIe 3.0 x8 | Passed to TrueNAS VM for EMC enclosure |
| Samsung NVMe controller | PCIe | Passed to TrueNAS VM for ZFS caching |
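
Passing the HBA and the Samsung NVMe controller through to TrueNAS works cleanly only if each device sits in its own IOMMU group, or shares a group only with devices passed through alongside it. A common way to check this is to walk `/sys/kernel/iommu_groups`. The function below sketches that approach. `list_iommu_groups` is a hypothetical name, and the sysfs root is a parameter only so the function can be exercised without real hardware.

```bash
# Print "<group> <PCI address> [description]" for every device in every IOMMU group.
# The root path is a parameter only for testing; the default is the real sysfs location.
# (list_iommu_groups is a hypothetical helper name.)
list_iommu_groups() {
  local root=${1:-/sys/kernel/iommu_groups} dev group addr desc
  for dev in "$root"/*/devices/*; do
    [ -e "$dev" ] || continue              # no match: IOMMU disabled or no groups
    addr=${dev##*/}                        # e.g. 0000:21:00.0
    group=${dev%/devices/*}; group=${group##*/}
    desc=""
    if command -v lspci >/dev/null 2>&1; then
      desc=$(lspci -s "$addr" 2>/dev/null | cut -d' ' -f2-)
    fi
    printf '%s %s%s\n' "$group" "$addr" "${desc:+ $desc}"
  done | sort -n
}

# Example:
#   ssh pve "$(declare -f list_iommu_groups); list_iommu_groups"
```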

#### Motherboard
- **Model**: Unknown - needs investigation
- **Chipset**: AMD WRX80
- **Form Factor**: ATX/EATX
- **PCIe Slots**: Multiple PCIe 4.0 slots
- **Features**: IOMMU support, ECC memory

#### Power Supply
- **Model**: Unknown
- **Wattage**: Likely 1000W+ (needs investigation)
- **Type**: ATX, 80+ certification unknown

#### Cooling
- **CPU Cooler**: Unknown - likely large tower or AIO
- **Case Fans**: Unknown quantity
- **Note**: CPU temps 70-80°C under load (healthy)

---

### PVE2 (10.10.10.102) - Secondary Proxmox Server

#### CPU
- **Model**: AMD Ryzen Threadripper PRO 3975WX
- **Specs**: Same as PVE (32C/64T, 280W TDP)

#### RAM
- **Capacity**: 128 GB DDR4 ECC
- **Same specs as PVE**

#### Storage

| Pool | Devices | Type | Capacity | Purpose |
|------|---------|------|----------|---------|
| `nvme-mirror3` | 2x NVMe (model unknown) | ZFS Mirror | Unknown | High-performance VM storage |
| `local-zfs2` | 2x WD Red 6TB HDD | ZFS Mirror | ~6 TB usable | Bulk/archival storage (spins down) |

**HDD Spindown**: Configured for 30-min idle spindown (saves ~10-16W)
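
This page does not say how the spindown is configured on pve2. Assuming it is done with `hdparm -S`, note that the flag takes an encoded value rather than minutes. The helper below converts minutes using the encoding described in the hdparm man page, so 30 minutes becomes 241. The helper name `spindown_value` and the `/dev/sdX` placeholders are illustrative. A timeout set with `hdparm -S` alone does not survive a reboot; on Debian-based systems such as Proxmox it is typically persisted in `/etc/hdparm.conf`.

```bash
# Convert an idle timeout in whole minutes to an `hdparm -S` value.
# Encoding per the hdparm man page: 1-240 = N x 5 s (up to 20 min),
# 241-251 = (N - 240) x 30 min (30 min to 5.5 h), 252 = 21 min.
# (spindown_value is a hypothetical helper name.)
spindown_value() {
  local minutes=$1
  if [ "$minutes" -ge 1 ] && [ "$minutes" -le 20 ]; then
    echo $(( minutes * 12 ))               # 12 x 5 s per minute
  elif [ "$minutes" -eq 21 ]; then
    echo 252
  elif [ "$minutes" -ge 30 ] && [ "$minutes" -le 330 ] && [ $(( minutes % 30 )) -eq 0 ]; then
    echo $(( 240 + minutes / 30 ))
  else
    echo "no exact hdparm -S encoding for ${minutes} min" >&2
    return 1
  fi
}

# Example for the WD Reds on pve2 (/dev/sdX is a placeholder; check lsblk first):
#   ssh pve2 "hdparm -S $(spindown_value 30) /dev/sdX"   # 30-min timeout (value 241)
#   ssh pve2 'hdparm -C /dev/sdX'                         # "active/idle" or "standby"
```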

#### GPUs

| Model | Slot | VRAM | TDP | Purpose | Passed To |
|-------|------|------|-----|---------|-----------|
| NVIDIA RTX A6000 | PCIe slot 1 | 48 GB GDDR6 | 300W | AI trading workloads | trading-vm (301) |

#### Network Cards

| Interface | Model | Speed | Purpose |
|-----------|-------|-------|---------|
| nic1 | Unknown (onboard) | 1 Gb | Management |

**Note**: MTU set to 9000 for jumbo frames
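
To confirm that jumbo frames work end to end, send a don't-fragment ping sized to fill a 9000-byte frame. The payload is the MTU minus 28 bytes: 20 for the IPv4 header and 8 for the ICMP header. The helper name `jumbo_payload` is hypothetical. The target address (PVE at 10.10.10.120) is just one choice of peer on the same network.

```bash
# ICMP payload size that exactly fills one frame at the given MTU:
# MTU - 20 bytes (IPv4 header) - 8 bytes (ICMP header).
# (jumbo_payload is a hypothetical helper name.)
jumbo_payload() {
  echo $(( $1 - 28 ))
}

# Example from pve2. With -M do, ping sets Don't Fragment, so it fails
# if any hop on the path has a smaller MTU than the packet needs.
#   ssh pve2 "ping -c 3 -M do -s $(jumbo_payload 9000) 10.10.10.120"
```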

#### Motherboard
- **Model**: Unknown
- **Chipset**: AMD WRX80
- **Similar to PVE**

---

## Network Equipment

### UniFi Cloud Gateway Fiber (UCG-Fiber)

- **Model**: UniFi Cloud Gateway Fiber
- **IP**: 10.10.10.1
- **Ports**: Multiple 1Gb + SFP+ uplink
- **Features**: Router, firewall, VPN, IDS/IPS
- **MTU**: 9216 (supports jumbo frames)
- **Tailscale**: Installed for VPN failover

### Switches

**Details needed** - investigate current switch setup:
- 10Gb switch for high-speed connections?
- 1Gb switch for general devices?
- PoE capabilities?
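
```bash
# Check link state and negotiated speed of the 10Gb interfaces
ssh pve 'ethtool enp35s0f0 | grep -E "Speed|Link detected"'
ssh pve 'ethtool enp35s0f1 | grep -E "Speed|Link detected"'

# Identify the switch on the other end (needs lldpd on PVE and LLDP enabled on the switch)
ssh pve 'lldpctl enp35s0f0 enp35s0f1'
```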

---

## Storage Hardware

### EMC Storage Enclosure

**See [EMC-ENCLOSURE.md](EMC-ENCLOSURE.md) for complete details**

- **Model**: EMC KTN-STL4 (or similar)
- **Form Factor**: 4U rackmount
- **Drive Bays**: 25x 3.5" SAS/SATA
- **Controllers**: Dual LCC (Link Control Cards)
- **Connection**: SAS via LSI SAS2308 HBA
- **Passed to**: TrueNAS VM (VMID 100)

**Current Status**:
- LCC A: Active (working)
- LCC B: Failed (replacement ordered)

**Drive Inventory**: Unknown - needs audit

```bash
# Get drive list from TrueNAS (lsblk exists on TrueNAS SCALE; on CORE use `camcontrol devlist`)
ssh truenas 'smartctl --scan'
ssh truenas 'lsblk -d -o NAME,SIZE,MODEL,SERIAL,TRAN'
```

### NVMe Drives

| Model | Quantity | Capacity | Location | Pool |
|-------|----------|----------|----------|------|
| Sabrent Rocket Q | 2 | ~4 TB each (inferred from the 3.6 TB mirror) | PVE | nvme-mirror1 |
| Kingston SFYRD | 2 | 2 TB each | PVE | nvme-mirror2 |
| Unknown model | 2 | Unknown | PVE2 | nvme-mirror3 |
| Samsung (model unknown) | 1 | Unknown | TrueNAS (passed) | ZFS cache |
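
Several NVMe models and capacities above are still unknown. On Linux, the NVMe driver exposes the model, serial, and firmware strings under `/sys/class/nvme`, so a short loop can fill in the table without extra packages. If nvme-cli is installed, `nvme list` shows the same data and more. The Samsung controller passed to TrueNAS is bound to vfio-pci on the host, so it will not appear when this runs on PVE. `nvme_inventory` is a hypothetical name, and the root path is a parameter only so the function can be tested.

```bash
# Print "<controller>\t<model>\t<serial>\t<firmware>" for each NVMe controller.
# (nvme_inventory is a hypothetical helper name.)
nvme_inventory() {
  local root=${1:-/sys/class/nvme} ctrl model serial fw
  for ctrl in "$root"/nvme*; do
    [ -r "$ctrl/model" ] || continue
    # read strips the padding spaces the kernel leaves around these strings
    read -r model < "$ctrl/model"
    read -r serial < "$ctrl/serial"
    read -r fw < "$ctrl/firmware_rev"
    printf '%s\t%s\t%s\t%s\n' "${ctrl##*/}" "$model" "$serial" "$fw"
  done
}

# Example:
#   ssh pve "$(declare -f nvme_inventory); nvme_inventory"
#   ssh pve2 "$(declare -f nvme_inventory); nvme_inventory"
```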

### SSDs

| Model | Quantity | Capacity | Location | Pool |
|-------|----------|----------|----------|------|
| Samsung 870 QVO | 2 | 4 TB each | PVE | rpool |

### HDDs

| Model | Quantity | Capacity | Location | Pool |
|-------|----------|----------|----------|------|
| WD Red | 2 | 6 TB each | PVE2 | local-zfs2 |
| Unknown (in EMC) | Unknown | Unknown | TrueNAS | vault |

---

## UPS

### Current UPS

| Specification | Value |
|---------------|-------|
| **Model** | CyberPower OR2200PFCRT2U |
| **Capacity** | 2200VA / 1320W |
| **Form Factor** | 2U rackmount |
| **Input** | NEMA 5-15P (rewired from 5-20P) |
| **Outlets** | 2x 5-20R + 6x 5-15R |
| **Output** | PFC Sinewave |
| **Runtime** | ~15-20 min @ 33% load |
| **Interface** | USB (connected to PVE) |

**See [UPS.md](UPS.md) for configuration details**

---

## Client Devices

### Mac Mini (Hutson's Workstation)

- **Model**: Unknown generation
- **CPU**: Unknown
- **RAM**: Unknown
- **Storage**: Unknown
- **Network**: 1Gb Ethernet (en0) - MTU 9000
- **Tailscale IP**: 100.108.89.58
- **Local IP**: 10.10.10.125 (static)
- **Purpose**: Primary workstation, Happy Coder daemon host
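
The Mac mini's missing specs can be read with `system_profiler SPHardwareDataType`. The filter below keeps only the fields this page tracks. Field labels differ between Apple silicon Macs (`Chip`) and Intel Macs (`Processor Name`), so it matches both. `mac_hw_summary` is a hypothetical helper name.

```bash
# Keep the model, CPU, and memory lines from `system_profiler SPHardwareDataType` (stdin).
# (mac_hw_summary is a hypothetical helper name.)
mac_hw_summary() {
  awk '
    /^[ \t]*(Model Name|Model Identifier|Chip|Processor Name|Total Number of Cores|Memory):/ {
      sub(/^[ \t]+/, "")     # drop the indentation system_profiler adds
      print
    }
  '
}

# Example (on the Mac mini):
#   system_profiler SPHardwareDataType | mac_hw_summary
#   system_profiler SPStorageDataType     # internal storage details
#   networksetup -getMTU en0              # confirm the MTU 9000 setting
```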

### MacBook (Mobile)

- **Model**: Unknown
- **Network**: Wi-Fi + Ethernet adapter
- **Tailscale IP**: Unknown
- **Purpose**: Mobile work, development

### Windows PC

- **Model**: Unknown
- **CPU**: Unknown
- **Network**: 1Gb Ethernet
- **IP**: 10.10.10.150
- **Purpose**: Gaming, Windows development, Syncthing node

### Phone (Android)

- **Model**: Unknown
- **IP**: 10.10.10.54 (when on Wi-Fi)
- **Purpose**: Syncthing mobile node, Happy Coder client

---

## Rack Layout (If Applicable)

**Needs documentation** - Current rack configuration unknown

Suggested format:
```
U42: Blank panel
U41-U40: UPS (CyberPower, 2U)
U39: Switch (10Gb)
U38-U35: EMC Storage Enclosure (4U)
U34: PVE Server
U33: PVE2 Server
...
```

---

## Power Consumption

### Measured Power Draw

| Component | Idle | Typical | Peak | Notes |
|-----------|------|---------|------|-------|
| PVE Server | 250-350W | 500W | 750W | CPU + GPUs + storage |
| PVE2 Server | 200-300W | 400W | 600W | CPU + GPU + storage |
| Network Gear | ~50W | ~50W | ~50W | Router + switches |
| **Total** | **500-700W** | **~950W** | **~1400W** | Peak exceeds the UPS rating (1320W) |

**UPS Capacity**: 1320W
**Idle Load**: ~38-53% of capacity (500-700W)
**Typical Load**: ~72% of capacity (~950W)
**Peak Load**: ~1400W briefly exceeds the rating. Short spikes on mains power may be tolerated, but a sustained overload, or any overload while running on battery, can trip the UPS overload protection and cut power to the servers.
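
The load percentages follow directly from the table: divide each total by the UPS's 1320W rating. A small shell helper makes the arithmetic repeatable when the measurements change. The name `ups_load_pct` is hypothetical. It uses integer math rounded to the nearest percent.

```bash
# Percentage of the UPS rating used by a given draw, rounded to the nearest percent.
# (ups_load_pct is a hypothetical helper name; capacity defaults to 1320 W.)
ups_load_pct() {
  local watts=$1 capacity=${2:-1320}
  echo $(( (watts * 100 + capacity / 2) / capacity ))
}

# Totals from the table above. Prints:
#   500 W ->  38%
#   700 W ->  53%
#   950 W ->  72%
#  1400 W -> 106%
for w in 500 700 950 1400; do
  printf '%5d W -> %3d%%\n' "$w" "$(ups_load_pct "$w")"
done
```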

### Power Optimizations Applied

**See [POWER-MANAGEMENT.md](POWER-MANAGEMENT.md) for details**

- KSMD disabled: ~60-80W saved
- CPU frequency governor changes: ~60-120W saved
- Syncthing rescan tuning: ~60-80W saved
- HDD spindown: ~10-16W saved when idle
- **Total savings**: ~150-300W
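
A quick way to confirm that the KSM and governor changes are still in effect after updates or reboots is to read them back from sysfs. On Proxmox, KSM is normally driven by the `ksmtuned` service. The kernel's `run` file reports 0 (stopped), 1 (running), or 2 (stopped, with merged pages unmerged). `power_tuning_status` is a hypothetical name, and the sysfs root is a parameter only so the function can be tested.

```bash
# Report KSM state and the CPU frequency governors in use.
# The sysfs root is a parameter only for testing; the default is /sys.
# (power_tuning_status is a hypothetical helper name.)
power_tuning_status() {
  local sys=${1:-/sys} ksm
  ksm=$(cat "$sys/kernel/mm/ksm/run" 2>/dev/null || true)
  case $ksm in
    0) echo "KSM: stopped" ;;
    1) echo "KSM: running" ;;
    2) echo "KSM: stopped (pages unmerged)" ;;
    *) echo "KSM: unknown" ;;
  esac
  # One line per distinct governor, with the number of CPUs using it
  cat "$sys"/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor 2>/dev/null |
    sort | uniq -c | awk '{ print "governor " $2 ": " $1 " CPUs" }'
}

# Example:
#   ssh pve "$(declare -f power_tuning_status); power_tuning_status"
```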

---

## Thermal Management

### CPU Cooling

**PVE & PVE2**:
- CPU cooler: Unknown model
- Thermal paste: Unknown, likely needs refresh if temps >85°C
- Target temp: 70-80°C under load
- Max safe: 90°C Tctl (Threadripper PRO spec)

### GPU Cooling

All GPUs use their stock coolers with no custom fan control. Power draw:
- TITAN RTX: 2-3W idle, 280W load
- RTX A6000: 11W idle, 300W load
- Quadro P2000: 25W constant (Plex active)
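
To see actual GPU temperatures and power draw, `nvidia-smi` can print selected fields as CSV. It has to run wherever the NVIDIA driver owns the card: on the PVE host for the P2000, and inside the guest for the passed-through TITAN RTX and A6000. The helper below flags readings at or above a threshold. `gpu_temp_check` is a hypothetical name, and its 85°C default is a placeholder, not a vendor limit.

```bash
# Flag GPUs whose temperature is at or above a threshold (placeholder default: 85 C).
# Input (stdin): output of
#   nvidia-smi --query-gpu=name,temperature.gpu,power.draw --format=csv,noheader,nounits
# (gpu_temp_check is a hypothetical helper name.)
gpu_temp_check() {
  local limit=${1:-85}
  awk -F', ' -v limit="$limit" '{
    status = ($2 + 0 >= limit + 0) ? "HOT" : "ok"
    printf "%s: %s C, %s W [%s]\n", $1, $2, $3, status
  }'
}

# Example (on the host or inside the guest that owns the GPU):
#   nvidia-smi --query-gpu=name,temperature.gpu,power.draw --format=csv,noheader,nounits | gpu_temp_check
```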

### Case Airflow

**Unknown** - needs investigation:
- Case model?
- Fan configuration?
- Positive or negative pressure?

---

## Cable Management

### Network Cables

| Connection | Type | Length | Speed |
|------------|------|--------|-------|
| PVE → Switch (10Gb) | OM3 fiber | Unknown | 10Gb |
| PVE2 → Router | Cat6 | Unknown | 1Gb |
| Mac Mini → Switch | Cat6 | Unknown | 1Gb |
| TrueNAS → EMC | SAS cable | Unknown | 6Gb/s |

### Power Cables

**Critical**: All servers on UPS battery-backed outlets

---

## Maintenance Schedule

### Annual Maintenance

- [ ] Clean dust from servers (every 6-12 months)
- [ ] Check thermal paste on CPUs (every 2-3 years)
- [ ] Test UPS battery runtime (annually)
- [ ] Verify all fans operational
- [ ] Check for bulging capacitors on PSUs

### Drive Health
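
```bash
# Check SMART status (one example device per host; repeat for each drive)
ssh pve 'smartctl -a /dev/nvme0'
ssh pve2 'smartctl -a /dev/sda'

# Summarize every drive TrueNAS can see: identity, overall health, reallocated/pending sectors.
# smartctl --scan prints "<device> -d <type> # ...", so the detected type is passed back with -d.
ssh truenas 'smartctl --scan | while read -r dev _ type _; do echo "=== $dev ==="; smartctl -i -H -A -d "$type" "$dev" | grep -iE "model|product|serial|health|reallocated|pending"; done'
```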

### Temperature Monitoring

```bash
# Check all temps (needs lm-sensors installed)
ssh pve 'sensors'
ssh pve2 'sensors'
```
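
The 85°C threshold mentioned under CPU Cooling can be turned into a simple check on `sensors` output. On Zen 2 CPUs, the `k10temp` driver reports the CPU temperature as `Tctl`. The parser below prints the reading and exits non-zero when it is above the limit, so it can be used from cron or a monitoring script. `cpu_tctl_check` is a hypothetical name.

```bash
# Parse `sensors` output (stdin) and compare the CPU's Tctl reading with a limit (default 85 C).
# Exit status: 0 = within the limit, 1 = above the limit, 2 = no Tctl line found.
# (cpu_tctl_check is a hypothetical helper name.)
cpu_tctl_check() {
  local limit=${1:-85}
  awk -v limit="$limit" '
    /^Tctl:/ {
      t = $2
      gsub(/[^0-9.]/, "", t)                # "+72.4°C" -> "72.4"
      found = 1
      if (t + 0 > limit + 0) { printf "WARN Tctl %.1f C > %s C\n", t, limit; hot = 1 }
      else                   { printf "OK   Tctl %.1f C\n", t }
    }
    END {
      if (!found) { print "no Tctl reading (is the k10temp module loaded?)"; exit 2 }
      exit (hot ? 1 : 0)
    }
  '
}

# Example:
#   ssh pve 'sensors' | cpu_tctl_check || echo "pve CPU temperature check failed"
```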

---

## Warranty & Purchase Info

**Needs documentation**:
- When were servers purchased?
- Where were components bought?
- Any warranties still active?
- Replacement part sources?

---

## Upgrade Path

### Short-term Upgrades (< 6 months)

- [ ] 20A circuit for UPS (restore original 5-20P plug)
- [ ] Document missing hardware specs
- [ ] Label all cables
- [ ] Create rack diagram

### Medium-term Upgrades (6-12 months)

- [ ] Additional 10Gb NIC for PVE2?
- [ ] More NVMe storage?
- [ ] Upgrade network switches?
- [ ] Replace EMC enclosure with newer model?

### Long-term Upgrades (1-2 years)
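
- [ ] CPU upgrade to a newer Threadripper PRO? (boards for the 3000WX series can also take the 5000WX series, typically after a BIOS update)
- [ ] RAM expansion to 256GB?
- [ ] Additional GPU for AI workloads?
- [ ] Migrate to PCIe 5.0 storage? (would need a new platform; the Threadripper PRO 3000WX platform is PCIe 4.0)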

---

## Investigation Needed

High-priority items to document:

- [ ] Get exact motherboard model (both servers)
- [ ] Get PSU model and wattage
- [ ] CPU cooler models
- [ ] Network switch models and configuration
- [ ] Complete drive inventory in EMC enclosure
- [ ] RAM speed and timings
- [ ] Case models
- [ ] Exact NVMe models for all drives

**Commands to gather info**:

```bash
# Run each command against pve2 as well

# Motherboard
ssh pve 'dmidecode -t baseboard'

# Power supply (DMI type 39; many boards leave this table empty)
ssh pve 'dmidecode -t 39'

# CPU details
ssh pve 'lscpu'

# RAM details
ssh pve 'dmidecode -t memory | grep -E "Size|Speed|Manufacturer|Part Number"'

# Storage devices
ssh pve 'lsblk -o NAME,SIZE,TYPE,TRAN,MODEL'

# Network cards
ssh pve 'lspci | grep -iE "ethernet|network"'

# GPU details (some NVIDIA cards enumerate as "3D controller" rather than VGA)
ssh pve 'lspci | grep -iE "vga|3d controller"'
ssh pve 'nvidia-smi -L' # If nvidia-smi available; lists only GPUs not passed through to VMs
```
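
Every command above has to be repeated on both servers, so a loop can capture all of the output into one text file per host for later transcription into this page. `pve` and `pve2` are the SSH host aliases used throughout. The function name `collect_hw_inventory` and the default output directory are hypothetical choices.

```bash
# Run the inventory commands on both hosts and save the output as <outdir>/<host>.txt.
# (collect_hw_inventory and ./hw-inventory are hypothetical names.)
collect_hw_inventory() {
  local outdir=${1:-./hw-inventory} host cmd
  mkdir -p "$outdir"
  for host in pve pve2; do
    for cmd in 'dmidecode -t baseboard' 'dmidecode -t 39' 'lscpu' \
               'dmidecode -t memory' 'lsblk -o NAME,SIZE,TYPE,TRAN,MODEL' \
               'lspci -nn' 'nvidia-smi -L'; do
      printf '===== %s =====\n' "$cmd"
      ssh "$host" "$cmd" 2>&1            # keep errors (e.g. missing tools) in the file
    done > "$outdir/$host.txt"
  done
}

# Example (from the workstation):
#   collect_hw_inventory ~/hw-inventory
```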

---

## Related Documentation

- [VMS.md](VMS.md) - VM resource allocation
- [STORAGE.md](STORAGE.md) - Storage pools and usage
- [POWER-MANAGEMENT.md](POWER-MANAGEMENT.md) - Power optimizations
- [UPS.md](UPS.md) - UPS configuration
- [NETWORK.md](NETWORK.md) - Network configuration
- [EMC-ENCLOSURE.md](EMC-ENCLOSURE.md) - Storage enclosure details

---

**Last Updated**: 2025-12-22
**Status**: ⚠️ Incomplete - many specs need investigation