Complete Phase 2 documentation: Add HARDWARE, SERVICES, MONITORING, MAINTENANCE
Phase 2 documentation implementation:

- Created HARDWARE.md: Complete hardware inventory (servers, GPUs, storage, network cards)
- Created SERVICES.md: Service inventory with URLs, credentials, health checks (25+ services)
- Created MONITORING.md: Health monitoring recommendations, alert setup, implementation plan
- Created MAINTENANCE.md: Regular procedures, update schedules, testing checklists
- Updated README.md: Added all Phase 2 documentation links
- Updated CLAUDE.md: Cleaned up to quick reference only (1340→377 lines)

All detailed content now in specialized documentation files with cross-references.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
STORAGE.md (new file, 510 lines added)
# Storage Architecture

Documentation of all storage pools, datasets, shares, and capacity planning across the homelab.

## Overview

### Storage Distribution

| Location | Type | Capacity | Purpose |
|----------|------|----------|---------|
| **PVE** | NVMe + SSD mirrors | ~9 TB usable | VM storage, fast IO |
| **PVE2** | NVMe + HDD mirrors | ~6+ TB usable | VM storage, bulk data |
| **TrueNAS** | ZFS pool + EMC enclosure | ~12+ TB usable | Central file storage, NFS/SMB |

---

## PVE (10.10.10.120) Storage Pools

### nvme-mirror1 (Primary Fast Storage)
- **Type**: ZFS mirror
- **Devices**: 2x Sabrent Rocket Q NVMe
- **Capacity**: 3.6 TB usable
- **Purpose**: High-performance VM storage
- **Used By**:
  - Critical VMs requiring fast IO
  - Database workloads
  - Development environments

**Check status**:
```bash
ssh pve 'zpool status nvme-mirror1'
ssh pve 'zpool list nvme-mirror1'
```

### nvme-mirror2 (Secondary Fast Storage)
- **Type**: ZFS mirror
- **Devices**: 2x Kingston SFYRD 2TB NVMe
- **Capacity**: 1.8 TB usable
- **Purpose**: Additional fast VM storage
- **Used By**: TBD

**Check status**:
```bash
ssh pve 'zpool status nvme-mirror2'
ssh pve 'zpool list nvme-mirror2'
```

### rpool (Root Pool)
- **Type**: ZFS mirror
- **Devices**: 2x Samsung 870 QVO 4TB SSD
- **Capacity**: 3.6 TB usable
- **Purpose**: Proxmox OS, container storage, VM backups
- **Used By**:
  - Proxmox root filesystem
  - LXC containers
  - Local VM backups

**Check status**:
```bash
ssh pve 'zpool status rpool'
ssh pve 'df -h /var/lib/vz'
```

### Storage Pool Usage Summary (PVE)

**Get current usage**:
```bash
ssh pve 'zpool list'
ssh pve 'pvesm status'
```

---

## PVE2 (10.10.10.102) Storage Pools

### nvme-mirror3 (Fast Storage)
- **Type**: ZFS mirror
- **Devices**: 2x NVMe (model unknown)
- **Capacity**: Unknown (needs investigation)
- **Purpose**: High-performance VM storage
- **Used By**: Trading VM (301), other VMs

**Check status**:
```bash
ssh pve2 'zpool status nvme-mirror3'
ssh pve2 'zpool list nvme-mirror3'
```

### local-zfs2 (Bulk Storage)
- **Type**: ZFS mirror
- **Devices**: 2x WD Red 6TB HDD
- **Capacity**: ~6 TB usable
- **Purpose**: Bulk/archival storage
- **Power Management**: 30-minute spindown configured
  - Saves ~10-16W when idle
  - Udev rule: `/etc/udev/rules.d/69-hdd-spindown.rules` (example sketch below)
  - Command: `hdparm -S 241` (30 min)

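A minimal sketch of what that udev rule might contain, based only on the documented path and `hdparm -S 241` setting; the `ID_MODEL` match string is a placeholder (check the real value with `udevadm info /dev/sdX`):

```
# /etc/udev/rules.d/69-hdd-spindown.rules (sketch)
# Apply a 30-minute spindown timer to the WD Red HDDs when they appear
ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="sd[a-z]", ENV{ID_MODEL}=="WDC_WD60EFRX*", RUN+="/usr/sbin/hdparm -S 241 /dev/%k"
```

After editing the rule, reload with `udevadm control --reload-rules` and re-trigger (or reboot) for it to take effect.
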
**Notes**:
- Pool had only 768 KB used as of 2024-12-16
- Drives configured to spin down after 30 min idle
- Good for archival, NOT for active workloads

**Check status**:
```bash
ssh pve2 'zpool status local-zfs2'
ssh pve2 'zpool list local-zfs2'

# Check if drives are spun down
ssh pve2 'hdparm -C /dev/sdX'   # Shows active/standby
```

---

## TrueNAS (VM 100 @ 10.10.10.200) - Central Storage

### ZFS Pool: vault

**Primary storage pool** for all shared data.

**Devices**: ❓ Needs investigation
- EMC storage enclosure with multiple drives
- SAS connection via LSI SAS2308 HBA (passed through to VM)

**Capacity**: ❓ Needs investigation

**Check pool status**:
```bash
ssh truenas 'zpool status vault'
ssh truenas 'zpool list vault'

# Get detailed capacity
ssh truenas 'zfs list -o name,used,avail,refer,mountpoint'
```

### Datasets (Known)

Based on Syncthing configuration, these datasets likely exist:

| Dataset | Purpose | Synced Devices | Notes |
|---------|---------|----------------|-------|
| vault/documents | Personal documents | Mac Mini, MacBook, Windows PC, Phone | ~11 GB |
| vault/downloads | Downloads folder | Mac Mini, TrueNAS | ~38 GB |
| vault/pictures | Photos | Mac Mini, MacBook, Phone | Unknown size |
| vault/notes | Note files | Mac Mini, MacBook, Phone | Unknown size |
| vault/desktop | Desktop sync | Unknown | 7.2 GB |
| vault/movies | Movie library | Unknown | Unknown size |
| vault/config | Config files | Mac Mini, MacBook | Unknown size |

**Get complete dataset list**:
```bash
ssh truenas 'zfs list -r vault'
```

### NFS/SMB Shares

**Status**: ❓ Not documented

**Needs investigation**:
```bash
# List NFS exports
ssh truenas 'showmount -e localhost'

# List SMB shares
ssh truenas 'smbclient -L localhost -N'

# Via TrueNAS API/UI
# Sharing → Unix Shares (NFS)
# Sharing → Windows Shares (SMB)
```

**Expected shares**:
- Media libraries for Plex (on Saltbox VM)
- Document storage
- VM backups?
- ISO storage?

### EMC Storage Enclosure

**Model**: EMC KTN-STL4 (or similar)
**Connection**: SAS via LSI SAS2308 HBA (passthrough to TrueNAS VM)
**Drives**: ❓ Unknown count and capacity

**See [EMC-ENCLOSURE.md](EMC-ENCLOSURE.md)** for:
- SES commands
- Fan control
- LCC (Link Control Card) troubleshooting
- Maintenance procedures

**Check enclosure status**:
```bash
ssh truenas 'sg_ses --page=0x02 /dev/sgX'   # Enclosure status page
ssh truenas 'smartctl --scan'               # List all drives
```

---

## Storage Network Architecture

### Internal Storage Network (10.10.20.0/24)

**Purpose**: Dedicated network for NFS/iSCSI traffic to reduce congestion on the main network.

**Bridge**: vmbr3 on PVE (virtual bridge, no physical NIC)
**Subnet**: 10.10.20.0/24
**DHCP**: No
**Gateway**: No (internal only, no internet)

**Connected VMs**:
- TrueNAS VM (secondary NIC)
- Saltbox VM (secondary NIC) - for NFS mounts
- Other VMs needing storage access

**Configuration**:
```bash
# On TrueNAS VM - check second NIC
ssh truenas 'ip addr show enp6s19'

# On Saltbox - check NFS mounts
ssh saltbox 'mount | grep nfs'
```

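For reference, a host-side definition of an internal-only bridge like vmbr3 typically looks like the sketch below; the host address on the storage subnet is an assumption (the bridge also works without one):

```
# /etc/network/interfaces on PVE (excerpt, sketch)
# Internal-only bridge: no physical ports, no gateway
auto vmbr3
iface vmbr3 inet static
    # host address on the storage subnet (assumed)
    address 10.10.20.1/24
    bridge-ports none
    bridge-stp off
    bridge-fd 0
```
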
**Benefits**:
- Separates storage traffic from general network
- Prevents NFS/SMB from saturating main network
- Better performance for storage-heavy workloads

---

## Storage Capacity Planning

### Current Usage (Estimate)

**Needs actual audit**:
```bash
# PVE pools
ssh pve 'zpool list -o name,size,alloc,free'

# PVE2 pools
ssh pve2 'zpool list -o name,size,alloc,free'

# TrueNAS vault pool
ssh truenas 'zpool list vault'

# Get detailed breakdown
ssh truenas 'zfs list -r vault -o name,used,avail'
```

### Growth Rate

**Needs tracking** - recommend monthly snapshots of capacity:

```bash
#!/bin/bash
# Save as ~/bin/storage-capacity-report.sh

DATE=$(date +%Y-%m-%d)
REPORT=~/Backups/storage-reports/capacity-$DATE.txt

mkdir -p ~/Backups/storage-reports

echo "Storage Capacity Report - $DATE" > "$REPORT"
echo "================================" >> "$REPORT"
echo "" >> "$REPORT"

echo "PVE Pools:" >> "$REPORT"
ssh pve 'zpool list' >> "$REPORT"
echo "" >> "$REPORT"

echo "PVE2 Pools:" >> "$REPORT"
ssh pve2 'zpool list' >> "$REPORT"
echo "" >> "$REPORT"

echo "TrueNAS Pools:" >> "$REPORT"
ssh truenas 'zpool list' >> "$REPORT"
echo "" >> "$REPORT"

echo "TrueNAS Datasets:" >> "$REPORT"
ssh truenas 'zfs list -r vault -o name,used,avail' >> "$REPORT"

echo "Report saved to $REPORT"
```

**Run monthly via cron**:
```cron
0 9 1 * * ~/bin/storage-capacity-report.sh
```

### Expansion Planning

**When to expand**:
- Pool reaches 80% capacity
- Performance degrades
- New workloads require more space

**Expansion options**:
1. Add drives to existing pools (for mirrored pools, add another mirror vdev - see the sketch below)
2. Add new NVMe drives to PVE/PVE2
3. Expand EMC enclosure (add more drives)
4. Add second EMC enclosure

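A sketch of option 1, assuming a mirrored pool is being grown by one more two-disk mirror vdev (the device paths are placeholders):

```bash
# Dry run first to confirm the resulting layout
zpool add -n nvme-mirror2 mirror /dev/disk/by-id/nvme-NEW1 /dev/disk/by-id/nvme-NEW2

# Then add the new mirror vdev for real
zpool add nvme-mirror2 mirror /dev/disk/by-id/nvme-NEW1 /dev/disk/by-id/nvme-NEW2
```

Review the dry-run output carefully before committing the change.
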
**Cost estimates**: TBD

---

## ZFS Health Monitoring

### Daily Health Checks

```bash
# Check for errors on all pools
ssh pve 'zpool status -x'    # Shows only unhealthy pools
ssh pve2 'zpool status -x'
ssh truenas 'zpool status -x'

# Check scrub status
ssh pve 'zpool status | grep scrub'
ssh pve2 'zpool status | grep scrub'
ssh truenas 'zpool status | grep scrub'
```

### Scrub Schedule

**Recommended**: Monthly scrub on all pools

**Configure scrub**:
```bash
# Via Proxmox UI: Node → Disks → ZFS → Select pool → Scrub
# Or via cron (one entry per pool):
0 2 1 * * /sbin/zpool scrub nvme-mirror1
0 2 1 * * /sbin/zpool scrub nvme-mirror2
0 2 1 * * /sbin/zpool scrub rpool
```

**On TrueNAS**:
- Configure via UI: Storage → Pools → Scrub Tasks
- Recommended: 1st of every month at 2 AM

### SMART Monitoring

**Check drive health**:
```bash
# PVE
ssh pve 'smartctl -a /dev/nvme0'
ssh pve 'smartctl -a /dev/sda'

# TrueNAS
ssh truenas 'smartctl --scan'
ssh truenas 'smartctl -a /dev/sdX'   # For each drive
```

**Configure SMART tests**:
- TrueNAS UI: Tasks → S.M.A.R.T. Tests
- Recommended: Weekly short test, monthly long test (manual commands below)

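Tests can also be kicked off manually with smartctl, e.g. when a drive looks suspicious (drive paths are placeholders):

```bash
ssh truenas 'smartctl -t short /dev/sdX'    # quick electrical/mechanical check (~2 min)
ssh truenas 'smartctl -t long /dev/sdX'     # full surface scan, runs in the background for hours

# Check progress and results of past self-tests
ssh truenas 'smartctl -l selftest /dev/sdX'
```
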
### Alerts

**Set up email alerts for** (see the sketch below for a simple starting point):
- ZFS pool errors
- SMART test failures
- Pool capacity > 80%
- Scrub failures

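Until full alerting is configured, a minimal cron-able sketch covering the pool-error and capacity checks; it assumes the machine running it has SSH access to all three hosts and a working `mail` command, and the address is a placeholder:

```bash
#!/bin/bash
# Sketch: storage-alerts.sh - email if any pool is unhealthy or above 80% capacity
ALERT_EMAIL="admin@example.com"   # placeholder address
BODY=""

for host in pve pve2 truenas; do
    # 'zpool status -x' prints "all pools are healthy" when nothing is wrong
    STATUS=$(ssh "$host" 'zpool status -x')
    echo "$STATUS" | grep -q 'all pools are healthy' || BODY+="[$host] $STATUS"$'\n'

    # Flag pools over 80% full
    FULL=$(ssh "$host" 'zpool list -H -o name,capacity' | awk '{cap=$2; gsub(/%/,"",cap); if (cap+0 > 80) print}')
    [ -n "$FULL" ] && BODY+="[$host] capacity warning: $FULL"$'\n'
done

[ -n "$BODY" ] && echo "$BODY" | mail -s "Storage alert" "$ALERT_EMAIL"
```

ZFS's own event daemon (`zed`) can also send mail on pool events and is worth enabling on the Proxmox hosts.
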
---

## Storage Performance Tuning

### ZFS ARC (Cache)

**Check ARC usage**:
```bash
ssh pve 'arc_summary'
ssh truenas 'arc_summary'
```

**Tuning** (if needed):
- PVE/PVE2: Set max ARC in `/etc/modprobe.d/zfs.conf` (example below)
- TrueNAS: Configure via UI (System → Advanced → Tunables)

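A sketch of the Proxmox-side cap, assuming a 16 GiB limit is wanted; size the value to each host's RAM and workload:

```
# /etc/modprobe.d/zfs.conf (sketch) - cap ARC at 16 GiB (16 * 1024^3 bytes)
options zfs zfs_arc_max=17179869184
```

After editing, run `update-initramfs -u -k all` and reboot (the module option is read at boot), then confirm the new limit with `arc_summary`.
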
### NFS Performance

**Mount options** (on clients like Saltbox):
```
rsize=131072,wsize=131072,hard,timeo=600,retrans=2,vers=3
```

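In context, an `/etc/fstab` entry on a client would look like the sketch below; the server address on the storage network and the export path are assumptions:

```
# /etc/fstab on the NFS client (sketch - server IP and export path are placeholders)
10.10.20.10:/mnt/vault/movies  /mnt/movies  nfs  rsize=131072,wsize=131072,hard,timeo=600,retrans=2,vers=3  0  0
```

Or test interactively first with `mount -t nfs -o <same options> 10.10.20.10:/mnt/vault/movies /mnt/movies` before making it permanent.
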
**Verify NFS mounts**:
```bash
ssh saltbox 'mount | grep nfs'
```

### Record Size Optimization

**Different workloads suit different record sizes**:
- VM disks: 64K (reasonable general-purpose value)
- Databases: 8K or 16K (match the database page size)
- Media files: 1M (large sequential reads)

**Set record size** (on TrueNAS datasets):
```bash
ssh truenas 'zfs set recordsize=1M vault/movies'
```

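To see what each dataset currently uses (and confirm a change took effect):

```bash
ssh truenas 'zfs get -r recordsize vault'
```

Note that `recordsize` only applies to newly written blocks; existing data keeps the record size it was written with.
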
---

## Disaster Recovery

### Pool Recovery

**If a pool fails to import**:
```bash
# Import under a different name (without mounting datasets)
zpool import -f -N poolname newpoolname

# Import read-only for inspection
zpool import -f -o readonly=on poolname

# Force import with recovery mode (last resort)
zpool import -f -F poolname
```

### Drive Replacement

**When a drive fails**:
```bash
# Identify failed drive
zpool status poolname

# Replace drive
zpool replace poolname old-device new-device

# Monitor resilver
watch zpool status poolname
```

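Before pulling hardware, it helps to map the failed device name to a physical drive; serial numbers are the reliable link (device paths are placeholders):

```bash
# The by-id symlink names include model and serial number
ls -l /dev/disk/by-id/ | grep sdX

# Cross-check model and serial directly
smartctl -i /dev/sdX

# For drives in the EMC enclosure, SES can map devices to slots (see EMC-ENCLOSURE.md)
sg_ses --page=0x0a /dev/sgX
```
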
### Data Recovery

**If pool is completely lost**:
1. Restore from offsite backup (see [BACKUP-STRATEGY.md](BACKUP-STRATEGY.md))
2. Recreate pool structure
3. Restore data

**Critical**: This is why we need offsite backups!

---

## Quick Reference

### Common Commands

```bash
# Pool status
zpool status [poolname]
zpool list

# Dataset usage
zfs list
zfs list -r vault

# Check pool health (only unhealthy)
zpool status -x

# Scrub pool
zpool scrub poolname

# Get pool IO stats
zpool iostat -v 1

# Snapshot management
zfs snapshot poolname/dataset@snapname
zfs list -t snapshot
zfs rollback poolname/dataset@snapname
zfs destroy poolname/dataset@snapname
```

### Storage Locations by Use Case

| Use Case | Recommended Storage | Why |
|----------|---------------------|-----|
| VM OS disk | nvme-mirror1 (PVE) | Fastest IO |
| Database | nvme-mirror1/2 | Low latency |
| Media files | TrueNAS vault | Large capacity |
| Development | nvme-mirror2 | Fast, mid-tier |
| Containers | rpool | Good performance |
| Backups | TrueNAS or rpool | Large capacity |
| Archive | local-zfs2 (PVE2) | Cheap, can spin down |

---

## Investigation Needed

- [ ] Get complete TrueNAS dataset list
- [ ] Document NFS/SMB share configuration
- [ ] Inventory EMC enclosure drives (count, capacity, model)
- [ ] Document current pool usage percentages
- [ ] Set up monthly capacity reports
- [ ] Configure ZFS scrub schedules
- [ ] Set up storage health alerts

---

## Related Documentation

- [BACKUP-STRATEGY.md](BACKUP-STRATEGY.md) - Backup and snapshot strategy
- [EMC-ENCLOSURE.md](EMC-ENCLOSURE.md) - Storage enclosure maintenance
- [VMS.md](VMS.md) - VM storage assignments
- [NETWORK.md](NETWORK.md) - Storage network configuration

---

**Last Updated**: 2025-12-22