# VMs and Containers
Complete inventory of all virtual machines and LXC containers across both Proxmox servers.
## Overview
| Server | VMs | LXCs | Total |
|--------|-----|------|-------|
| **PVE** (10.10.10.120) | 7 | 3 | 10 |
| **PVE2** (10.10.10.102) | 2 | 0 | 2 |
| **Total** | **9** | **3** | **12** |
---
## PVE (10.10.10.120) - Primary Server
### Virtual Machines
| VMID | Name | IP | vCPUs | RAM | Storage | Purpose | GPU/Passthrough | QEMU Agent |
|------|------|-----|-------|-----|---------|---------|-----------------|------------|
| **100** | truenas | 10.10.10.200 | 8 | 32GB | nvme-mirror1 | NAS, central file storage | LSI SAS2308 HBA, Samsung NVMe | ✅ Yes |
| **101** | saltbox | 10.10.10.100 | 16 | 16GB | nvme-mirror1 | Media automation (Plex, *arr) | TITAN RTX | ✅ Yes |
| **105** | fs-dev | 10.10.10.5 | 10 | 8GB | rpool | Development environment | - | ✅ Yes |
| **110** | homeassistant | 10.10.10.110 | 2 | 2GB | rpool | Home automation platform | - | ❌ No |
| **111** | lmdev1 | 10.10.10.111 | 8 | 32GB | nvme-mirror1 | AI/LLM development | TITAN RTX | ✅ Yes |
| **201** | copyparty | 10.10.10.201 | 2 | 2GB | rpool | File sharing service | - | ✅ Yes |
| **206** | docker-host | 10.10.10.206 | 2 | 4GB | rpool | Docker services (Excalidraw, Happy, Pulse) | - | ✅ Yes |
### LXC Containers
| CTID | Name | IP | RAM | Storage | Purpose |
|------|------|-----|-----|---------|---------|
| **200** | pihole | 10.10.10.10 | - | rpool | DNS, ad blocking |
| **202** | traefik | 10.10.10.250 | - | rpool | Reverse proxy (primary) |
| **205** | findshyt | 10.10.10.8 | - | rpool | Custom app |
---
## PVE2 (10.10.10.102) - Secondary Server
### Virtual Machines
| VMID | Name | IP | vCPUs | RAM | Storage | Purpose | GPU/Passthrough | QEMU Agent |
|------|------|-----|-------|-----|---------|---------|-----------------|------------|
| **300** | gitea-vm | 10.10.10.220 | 2 | 4GB | nvme-mirror3 | Git server (Gitea) | - | ✅ Yes |
| **301** | trading-vm | 10.10.10.221 | 16 | 32GB | nvme-mirror3 | AI trading platform | RTX A6000 | ✅ Yes |
### LXC Containers
None on PVE2.
---
## VM Details
### 100 - TrueNAS (Storage Server)
**Purpose**: Central NAS for all file storage, NFS/SMB shares, and media libraries
**Specs**:
- **OS**: TrueNAS SCALE
- **vCPUs**: 8
- **RAM**: 32 GB
- **Storage**: nvme-mirror1 (OS), EMC storage enclosure (data pool via HBA passthrough)
- **Network**:
  - Primary: 10 Gb (vmbr2)
  - Secondary: Internal storage network (vmbr3 @ 10.10.20.x)
**Hardware Passthrough**:
- LSI SAS2308 HBA (for EMC enclosure drives)
- Samsung NVMe (for ZFS caching)
**ZFS Pools**:
- `vault`: Main storage pool on EMC drives
- Boot pool on passed-through NVMe
**See**: [STORAGE.md](STORAGE.md), [EMC-ENCLOSURE.md](EMC-ENCLOSURE.md)
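To confirm the passthrough is still in place and the pool is healthy, a minimal check (assumes SSH is enabled on the TrueNAS VM and root login is permitted):
```bash
# Hostpci entries on the host side should show the HBA and the Samsung NVMe
ssh pve 'qm config 100 | grep hostpci'
# Inside TrueNAS (assumption: SSH enabled, root login allowed), verify the vault pool
ssh root@10.10.10.200 'zpool status vault'
```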
---
### 101 - Saltbox (Media Automation)
**Purpose**: Media server stack - Plex, Sonarr, Radarr, SABnzbd, Overseerr, etc.
**Specs**:
- **OS**: Ubuntu 22.04
- **vCPUs**: 16
- **RAM**: 16 GB
- **Storage**: nvme-mirror1
- **Network**: 10 Gb (vmbr2)
**GPU Passthrough**:
- NVIDIA TITAN RTX (for Plex hardware transcoding)
**Services**:
- Plex Media Server (plex.htsn.io)
- Sonarr, Radarr, Lidarr (TV/movie/music automation)
- SABnzbd, NZBGet (downloaders)
- Overseerr (request management)
- Tautulli (Plex stats)
- Organizr (dashboard)
- Authelia (SSO authentication)
- Traefik (reverse proxy - separate from CT 202)
**Managed By**: Saltbox Ansible playbooks
**See**: [SALTBOX.md](#) (coming soon)
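Quick post-reboot sanity checks (a sketch; assumes an SSH host entry named `saltbox` pointing at 10.10.10.100):
```bash
# TITAN RTX visible for Plex hardware transcoding
ssh saltbox 'nvidia-smi --query-gpu=name --format=csv,noheader'
# TrueNAS NFS mounts present (media libraries depend on them)
ssh saltbox 'findmnt -t nfs,nfs4'
```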
---
### 105 - fs-dev (Development Environment)
**Purpose**: General development work, testing, prototyping
**Specs**:
- **OS**: Ubuntu 22.04
- **vCPUs**: 10
- **RAM**: 8 GB
- **Storage**: rpool
- **Network**: 1 Gb (vmbr0)
---
### 110 - Home Assistant (Home Automation)
**Purpose**: Smart home automation platform
**Specs**:
- **OS**: Home Assistant OS
- **vCPUs**: 2
- **RAM**: 2 GB
- **Storage**: rpool
- **Network**: 1 Gb (vmbr0)
**Access**:
- Web UI: https://homeassistant.htsn.io
- API: See [HOMEASSISTANT.md](HOMEASSISTANT.md)
**Special Notes**:
- ❌ No QEMU agent (Home Assistant OS doesn't support it)
- No SSH server by default (access via web terminal)
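With no guest agent or SSH, the practical health check is reachability from outside the VM (both targets are from this page):
```bash
# Basic reachability of the VM and its web UI
ping -c 3 10.10.10.110
curl -sI https://homeassistant.htsn.io | head -n 1
```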
---
### 111 - lmdev1 (AI/LLM Development)
**Purpose**: AI model development, fine-tuning, inference
**Specs**:
- **OS**: Ubuntu 22.04
- **vCPUs**: 8
- **RAM**: 32 GB
- **Storage**: nvme-mirror1
- **Network**: 1 Gb (vmbr0)
**GPU Passthrough**:
- NVIDIA TITAN RTX (shared with Saltbox, but can be dedicated if needed)
**Installed**:
- CUDA toolkit
- Python 3.11+
- PyTorch, TensorFlow
- Hugging Face transformers
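To verify the GPU and CUDA stack from inside the VM (a sketch; assumes PyTorch is installed in the default `python3` environment):
```bash
# Driver sees the passed-through TITAN RTX
nvidia-smi
# PyTorch can reach CUDA (prints True when the GPU is usable)
python3 -c "import torch; print(torch.cuda.is_available())"
```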
---
### 201 - Copyparty (File Sharing)
**Purpose**: Simple HTTP file sharing server
**Specs**:
- **OS**: Ubuntu 22.04
- **vCPUs**: 2
- **RAM**: 2 GB
- **Storage**: rpool
- **Network**: 1 Gb (vmbr0)
**Access**: https://copyparty.htsn.io
---
### 206 - docker-host (Docker Services)
**Purpose**: General-purpose Docker host for miscellaneous services
**Specs**:
- **OS**: Ubuntu 22.04
- **vCPUs**: 2
- **RAM**: 4 GB
- **Storage**: rpool
- **Network**: 1 Gb (vmbr0)
- **CPU**: `host` passthrough (for x86-64-v3 support)
**Services Running**:
- Excalidraw (excalidraw.htsn.io) - Whiteboard
- Happy Coder relay server (happy.htsn.io) - Self-hosted relay for Happy Coder mobile app
- Pulse (pulse.htsn.io) - Monitoring dashboard
**Docker Compose Files**: `/opt/*/docker-compose.yml`
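To compare what is actually running against what the compose files define (assumes SSH access to the VM; the `hutson@` user is an assumption):
```bash
# Running containers on docker-host
ssh hutson@10.10.10.206 'docker ps --format "table {{.Names}}\t{{.Image}}\t{{.Status}}"'
# Compose v2: list the compose projects behind them
ssh hutson@10.10.10.206 'docker compose ls'
```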
---
### 300 - gitea-vm (Git Server)
**Purpose**: Self-hosted Git server
**Specs**:
- **OS**: Ubuntu 22.04
- **vCPUs**: 2
- **RAM**: 4 GB
- **Storage**: nvme-mirror3 (PVE2)
- **Network**: 1 Gb (vmbr0)
**Access**: https://git.htsn.io
**Repositories**:
- homelab-docs (this documentation)
- Personal projects
- Private repos
---
### 301 - trading-vm (AI Trading Platform)
**Purpose**: Algorithmic trading system with AI models
**Specs**:
- **OS**: Ubuntu 22.04
- **vCPUs**: 16
- **RAM**: 32 GB
- **Storage**: nvme-mirror3 (PVE2)
- **Network**: 1 Gb (vmbr0)
**GPU Passthrough**:
- NVIDIA RTX A6000 (300W TDP, 48GB VRAM)
**Software**:
- Trading algorithms
- AI models for market prediction
- Real-time data feeds
- Backtesting infrastructure
---
## LXC Container Details
### 200 - Pi-hole (DNS & Ad Blocking)
**Purpose**: Network-wide DNS server and ad blocker
**Type**: LXC (unprivileged)
**OS**: Ubuntu 22.04
**IP**: 10.10.10.10
**Storage**: rpool
**Access**:
- Web UI: http://10.10.10.10/admin
- Public URL: https://pihole.htsn.io
**Configuration**:
- Upstream DNS: Cloudflare (1.1.1.1)
- DHCP: Disabled (router handles DHCP)
- Interface: All interfaces
**Usage**: Set router DNS to 10.10.10.10 for network-wide ad blocking
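To verify resolution and blocking from any LAN host (assumes the default blocklists, which typically include ad domains such as doubleclick.net):
```bash
# Normal domain should resolve to a public IP
dig @10.10.10.10 example.com +short
# Blocked ad domain typically returns 0.0.0.0 (or NXDOMAIN, depending on blocking mode)
dig @10.10.10.10 doubleclick.net +short
```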
---
### 202 - Traefik (Reverse Proxy)
**Purpose**: Primary reverse proxy for all public-facing services
**Type**: LXC (unprivileged)
**OS**: Ubuntu 22.04
**IP**: 10.10.10.250
**Storage**: rpool
**Configuration**: `/etc/traefik/`
**Dynamic Configs**: `/etc/traefik/conf.d/*.yaml`
**See**: [TRAEFIK.md](TRAEFIK.md) for complete documentation
**⚠️ Important**: This is the PRIMARY Traefik instance. Do NOT confuse with Saltbox's Traefik (VM 101).
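A quick way to confirm you are on CT 202 before editing configs (a sketch run from the Proxmox host):
```bash
# Dynamic config files on the primary instance
ssh pve 'pct exec 202 -- ls /etc/traefik/conf.d/'
# Confirm Traefik is the process bound to 80/443 in this container
ssh pve 'pct exec 202 -- ss -tlnp' | grep -E ':(80|443)'
```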
---
### 205 - FindShyt (Custom App)
**Purpose**: Custom application (details TBD)
**Type**: LXC (unprivileged)
**OS**: Ubuntu 22.04
**IP**: 10.10.10.8
**Storage**: rpool
**Access**: https://findshyt.htsn.io
---
## VM Startup Order & Dependencies
### Power-On Sequence
When servers boot (after power failure or restart), VMs/CTs start in this order:
#### PVE (10.10.10.120)
| Order | Wait | VMID | Name | Reason |
|-------|------|------|------|--------|
| **1** | 30s | 100 | TrueNAS | ⚠️ Storage must start first - other VMs depend on NFS |
| **2** | 60s | 101 | Saltbox | Depends on TrueNAS NFS mounts for media |
| **3** | 10s | 105, 110, 111, 201, 206 | Other VMs | General VMs, no critical dependencies |
| **4** | 5s | 200, 202, 205 | Containers | Lightweight, start quickly |
**Configure startup order** (already set):
```bash
# View current config
ssh pve 'qm config 100 | grep -E "startup|onboot"'
# Set startup order (example)
ssh pve 'qm set 100 --onboot 1 --startup order=1,up=30'
ssh pve 'qm set 101 --onboot 1 --startup order=2,up=60'
```
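To audit onboot/startup settings across every VM at once (a hypothetical one-liner; the same pattern works with `pct` for containers):
```bash
# Print startup settings for all VMs on PVE; "(not set)" flags VMs without one
ssh pve 'for id in $(qm list | awk "NR>1 {print \$1}"); do echo "== VM $id"; qm config $id | grep -E "onboot|startup" || echo "   (not set)"; done'
```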
#### PVE2 (10.10.10.102)
| Order | Wait | VMID | Name |
|-------|------|------|------|
| **1** | 10s | 300, 301 | All VMs |
**Less critical** - no dependencies between PVE2 VMs.
---
## Resource Allocation Summary
### Total Allocated (PVE)
| Resource | Allocated | Physical | % Used |
|----------|-----------|----------|--------|
| **vCPUs** | 56 | 64 (32 cores × 2 threads) | 88% |
| **RAM** | 98 GB | 128 GB | 77% |
**Note**: vCPU overcommit is acceptable (VMs rarely use all cores simultaneously)
### Total Allocated (PVE2)
| Resource | Allocated | Physical | % Used |
|----------|-----------|----------|--------|
| **vCPUs** | 18 | 64 | 28% |
| **RAM** | 36 GB | 128 GB | 28% |
**PVE2** has significant headroom for additional VMs.
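The figures above can be recomputed from the live configs; a sketch (VMs only, containers excluded):
```bash
# Sum cores and memory across all VM configs on PVE (memory reported in GB)
ssh pve 'for id in $(qm list | awk "NR>1 {print \$1}"); do
  qm config $id | grep -E "^(cores|memory):"
done | awk -F": " "/cores/ {c+=\$2} /memory/ {m+=\$2} END {print \"vCPUs:\", c, \"RAM:\", m/1024, \"GB\"}"'
```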
---
## Adding a New VM
### Quick Template
```bash
# Create VM
ssh pve 'qm create VMID \
--name myvm \
--memory 4096 \
--cores 2 \
--net0 virtio,bridge=vmbr0 \
--scsihw virtio-scsi-pci \
--scsi0 nvme-mirror1:32 \
--boot order=scsi0 \
--ostype l26 \
--agent enabled=1'
# Attach ISO for installation
ssh pve 'qm set VMID --ide2 local:iso/ubuntu-22.04.iso,media=cdrom'
# Start VM
ssh pve 'qm start VMID'
# Access console
ssh pve 'qm vncproxy VMID' # Then connect with VNC client
# Or via Proxmox web UI
```
### Cloud-Init Template (Faster)
Use cloud-init for automated VM deployment:
```bash
# Download cloud image
ssh pve 'wget https://cloud-images.ubuntu.com/releases/22.04/release/ubuntu-22.04-server-cloudimg-amd64.img -O /var/lib/vz/template/iso/ubuntu-22.04-cloud.img'
# Create VM
ssh pve 'qm create VMID --name myvm --memory 4096 --cores 2 --net0 virtio,bridge=vmbr0'
# Import disk
ssh pve 'qm importdisk VMID /var/lib/vz/template/iso/ubuntu-22.04-cloud.img nvme-mirror1'
# Attach disk
ssh pve 'qm set VMID --scsi0 nvme-mirror1:vm-VMID-disk-0'
# Add cloud-init drive
ssh pve 'qm set VMID --ide2 nvme-mirror1:cloudinit'
# Set boot disk
ssh pve 'qm set VMID --boot order=scsi0'
# Configure cloud-init (user, SSH key, network)
ssh pve 'qm set VMID --ciuser hutson --sshkeys ~/.ssh/homelab.pub --ipconfig0 ip=10.10.10.XXX/24,gw=10.10.10.1'
# Enable QEMU agent
ssh pve 'qm set VMID --agent enabled=1'
# Resize disk (cloud images are small by default)
ssh pve 'qm resize VMID scsi0 +30G'
# Start VM
ssh pve 'qm start VMID'
```
**Cloud-init VMs boot ready-to-use** with SSH keys, static IP, and user configured.
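After the first boot, two checks confirm the agent is up and cloud-init finished (both are standard `qm` subcommands):
```bash
# Guest agent responds
ssh pve 'qm guest cmd VMID ping && echo "agent OK"'
# Cloud-init completed inside the guest (requires the agent)
ssh pve 'qm guest exec VMID -- cloud-init status'
```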
---
## Adding a New LXC Container
```bash
# Download template (if not already downloaded)
ssh pve 'pveam update'
ssh pve 'pveam available | grep ubuntu'
ssh pve 'pveam download local ubuntu-22.04-standard_22.04-1_amd64.tar.zst'
# Create container
ssh pve 'pct create CTID local:vztmpl/ubuntu-22.04-standard_22.04-1_amd64.tar.zst \
--hostname mycontainer \
--memory 2048 \
--cores 2 \
--net0 name=eth0,bridge=vmbr0,ip=10.10.10.XXX/24,gw=10.10.10.1 \
--rootfs local-zfs:8 \
--unprivileged 1 \
--features nesting=1 \
--start 1'
# Set root password
ssh pve 'pct exec CTID -- passwd'
# Add SSH key
ssh pve 'pct exec CTID -- mkdir -p /root/.ssh'
ssh pve 'pct exec CTID -- bash -c "echo \"$(cat ~/.ssh/homelab.pub)\" >> /root/.ssh/authorized_keys"'
ssh pve 'pct exec CTID -- bash -c "chmod 700 /root/.ssh && chmod 600 /root/.ssh/authorized_keys"'
```
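A quick check that the new container is up and got its address (run from the host):
```bash
# Container state and configured IP on eth0
ssh pve 'pct status CTID && pct exec CTID -- ip -4 addr show eth0'
```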
---
## GPU Passthrough Configuration
### Current GPU Assignments
| GPU | Location | Passed To | VMID | Purpose |
|-----|----------|-----------|------|---------|
| **NVIDIA Quadro P2000** | PVE | - | - | Proxmox host (Plex transcoding via driver) |
| **NVIDIA TITAN RTX** | PVE | saltbox, lmdev1 | 101, 111 | Media transcoding + AI dev (shared) |
| **NVIDIA RTX A6000** | PVE2 | trading-vm | 301 | AI trading (dedicated) |
### How to Pass a GPU to a VM
1. **Identify GPU PCI ID**:
```bash
ssh pve 'lspci | grep -i nvidia'
# Example output:
# 81:00.0 VGA compatible controller: NVIDIA Corporation TU102 [TITAN RTX] (rev a1)
# 81:00.1 Audio device: NVIDIA Corporation TU102 High Definition Audio Controller (rev a1)
```
2. **Pass GPU to VM** (include both VGA and Audio):
```bash
ssh pve 'qm set VMID -hostpci0 81:00.0,pcie=1'
# If multi-function device (GPU + Audio), use:
ssh pve 'qm set VMID -hostpci0 81:00,pcie=1'
```
3. **Configure VM for GPU**:
```bash
# Set machine type to q35
ssh pve 'qm set VMID --machine q35'
# Set BIOS to OVMF (UEFI)
ssh pve 'qm set VMID --bios ovmf'
# Add EFI disk
ssh pve 'qm set VMID --efidisk0 nvme-mirror1:1,format=raw,efitype=4m,pre-enrolled-keys=1'
```
4. **Reboot VM** and install NVIDIA drivers inside the VM
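A quick check from inside the guest once it is back up:
```bash
# The passed-through card should appear on the guest's PCI bus
lspci | grep -i nvidia
# After driver install, nvidia-smi should list the card
nvidia-smi
```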
**See**: [GPU-PASSTHROUGH.md](#) (coming soon) for a detailed guide
---
## Backup Priority
See [BACKUP-STRATEGY.md](BACKUP-STRATEGY.md) for complete backup plan.
### Critical VMs (Must Backup)
| Priority | VMID | Name | Reason |
|----------|------|------|--------|
| 🔴 **CRITICAL** | 100 | truenas | All storage lives here - catastrophic if lost |
| 🟡 **HIGH** | 101 | saltbox | Complex media stack config |
| 🟡 **HIGH** | 110 | homeassistant | Home automation config |
| 🟡 **HIGH** | 300 | gitea-vm | Git repositories (code, docs) |
| 🟡 **HIGH** | 301 | trading-vm | Trading algorithms and AI models |
### Medium Priority
| VMID | Name | Notes |
|------|------|-------|
| 200 | pihole | Easy to rebuild, but DNS config valuable |
| 202 | traefik | Config files backed up separately |
### Low Priority (Ephemeral/Rebuildable)
| VMID | Name | Notes |
|------|------|-------|
| 105 | fs-dev | Development - code is in Git |
| 111 | lmdev1 | Ephemeral development |
| 201 | copyparty | Simple app, easy to redeploy |
| 206 | docker-host | Docker Compose files backed up separately |
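For an ad-hoc backup outside the regular schedule, a minimal sketch (the `local` storage name is an assumption; use whatever backup target BACKUP-STRATEGY.md defines):
```bash
# One-off backup of a VM (snapshot mode, zstd compression)
ssh pve 'vzdump VMID --storage local --mode snapshot --compress zstd'
# Same command works for containers by CTID
ssh pve 'vzdump CTID --storage local --mode snapshot --compress zstd'
```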
---
## Quick Reference Commands
```bash
# List all VMs
ssh pve 'qm list'
ssh pve2 'qm list'
# List all containers
ssh pve 'pct list'
# Start/stop VM
ssh pve 'qm start VMID'
ssh pve 'qm stop VMID'
ssh pve 'qm shutdown VMID' # Graceful
# Start/stop container
ssh pve 'pct start CTID'
ssh pve 'pct stop CTID'
ssh pve 'pct shutdown CTID' # Graceful
# VM console
ssh pve 'qm terminal VMID'
# Container console
ssh pve 'pct enter CTID'
# Clone VM
ssh pve 'qm clone VMID NEW_VMID --name newvm'
# Delete VM
ssh pve 'qm destroy VMID'
# Delete container
ssh pve 'pct destroy CTID'
```
---
## Related Documentation
- [STORAGE.md](STORAGE.md) - Storage pool assignments
- [SSH-ACCESS.md](SSH-ACCESS.md) - How to access VMs
- [BACKUP-STRATEGY.md](BACKUP-STRATEGY.md) - VM backup strategy
- [POWER-MANAGEMENT.md](POWER-MANAGEMENT.md) - VM resource optimization
- [NETWORK.md](NETWORK.md) - Which bridge to use for new VMs
---
**Last Updated**: 2025-12-22