Storage Architecture

Documentation of all storage pools, datasets, shares, and capacity planning across the homelab.

Overview

Storage Distribution

| Location | Type | Capacity | Purpose |
|----------|------|----------|---------|
| PVE | NVMe + SSD mirrors | ~9 TB usable | VM storage, fast IO |
| PVE2 | NVMe + HDD mirrors | ~6+ TB usable | VM storage, bulk data |
| TrueNAS | ZFS pool + EMC enclosure | ~12+ TB usable | Central file storage, NFS/SMB |

PVE (10.10.10.120) Storage Pools

nvme-mirror1 (Primary Fast Storage)

  • Type: ZFS mirror
  • Devices: 2x Sabrent Rocket Q NVMe
  • Capacity: 3.6 TB usable
  • Purpose: High-performance VM storage
  • Used By:
    • Critical VMs requiring fast IO
    • Database workloads
    • Development environments

Check status:

ssh pve 'zpool status nvme-mirror1'
ssh pve 'zpool list nvme-mirror1'

nvme-mirror2 (Secondary Fast Storage)

  • Type: ZFS mirror
  • Devices: 2x Kingston SFYRD 2TB NVMe
  • Capacity: 1.8 TB usable
  • Purpose: Additional fast VM storage
  • Used By: TBD

Check status:

ssh pve 'zpool status nvme-mirror2'
ssh pve 'zpool list nvme-mirror2'

rpool (Root Pool)

  • Type: ZFS mirror
  • Devices: 2x Samsung 870 QVO 4TB SSD
  • Capacity: 3.6 TB usable
  • Purpose: Proxmox OS, container storage, VM backups
  • Used By:
    • Proxmox root filesystem
    • LXC containers
    • Local VM backups

Check status:

ssh pve 'zpool status rpool'
ssh pve 'df -h /var/lib/vz'

Storage Pool Usage Summary (PVE)

Get current usage:

ssh pve 'zpool list'
ssh pve 'pvesm status'

PVE2 (10.10.10.102) Storage Pools

nvme-mirror3 (Fast Storage)

  • Type: ZFS mirror
  • Devices: 2x NVMe (model unknown)
  • Capacity: Unknown (needs investigation)
  • Purpose: High-performance VM storage
  • Used By: Trading VM (301), other VMs

Check status:

ssh pve2 'zpool status nvme-mirror3'
ssh pve2 'zpool list nvme-mirror3'

local-zfs2 (Bulk Storage)

  • Type: ZFS mirror
  • Devices: 2x WD Red 6TB HDD
  • Capacity: ~6 TB usable
  • Purpose: Bulk/archival storage
  • Power Management: 30-minute spindown configured
    • Saves ~10-16W when idle
    • Udev rule: /etc/udev/rules.d/69-hdd-spindown.rules
    • Command: hdparm -S 241 (30 min)
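
A sketch of what that udev rule likely contains, reconstructed from the hdparm setting above rather than copied from the host (the real rule may match specific drive models instead of all rotational disks):

# /etc/udev/rules.d/69-hdd-spindown.rules (assumed content)
# Apply a 30-minute spindown timer (-S 241) to rotational disks as they appear
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="1", RUN+="/usr/sbin/hdparm -S 241 /dev/%k"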

Notes:

  • Pool had only 768 KB used as of 2024-12-16
  • Drives configured to spin down after 30 min idle
  • Good for archival, NOT for active workloads

Check status:

ssh pve2 'zpool status local-zfs2'
ssh pve2 'zpool list local-zfs2'

# Check if drives are spun down
ssh pve2 'hdparm -C /dev/sdX'  # Shows active/standby

TrueNAS (VM 100 @ 10.10.10.200) - Central Storage

ZFS Pool: vault

Primary storage pool for all shared data.

Devices: Needs investigation

  • EMC storage enclosure with multiple drives
  • SAS connection via LSI SAS2308 HBA (passed through to VM)

Capacity: Needs investigation

Check pool status:

ssh truenas 'zpool status vault'
ssh truenas 'zpool list vault'

# Get detailed capacity
ssh truenas 'zfs list -o name,used,avail,refer,mountpoint'

Datasets (Known)

Based on Syncthing configuration, likely datasets:

| Dataset | Purpose | Synced Devices | Notes |
|---------|---------|----------------|-------|
| vault/documents | Personal documents | Mac Mini, MacBook, Windows PC, Phone | ~11 GB |
| vault/downloads | Downloads folder | Mac Mini, TrueNAS | ~38 GB |
| vault/pictures | Photos | Mac Mini, MacBook, Phone | Unknown size |
| vault/notes | Note files | Mac Mini, MacBook, Phone | Unknown size |
| vault/desktop | Desktop sync | Unknown | 7.2 GB |
| vault/movies | Movie library | Unknown | Unknown size |
| vault/config | Config files | Mac Mini, MacBook | Unknown size |

Get complete dataset list:

ssh truenas 'zfs list -r vault'

NFS/SMB Shares

Status: Not documented

Needs investigation:

# List NFS exports
ssh truenas 'showmount -e localhost'

# List SMB shares
ssh truenas 'smbclient -L localhost -N'

# Via TrueNAS API/UI
# Sharing → Unix Shares (NFS)
# Sharing → Windows Shares (SMB)

Expected shares:

  • Media libraries for Plex (on Saltbox VM)
  • Document storage
  • VM backups?
  • ISO storage?

EMC Storage Enclosure

Model: EMC KTN-STL4 (or similar)
Connection: SAS via LSI SAS2308 HBA (passthrough to TrueNAS VM)
Drives: Unknown count and capacity

See EMC-ENCLOSURE.md for:

  • SES commands
  • Fan control
  • LCC (Link Control Card) troubleshooting
  • Maintenance procedures

Check enclosure status:

ssh truenas 'sg_ses --page=0x02 /dev/sgX'  # Enclosure status page
ssh truenas 'smartctl --scan'              # List all drives

Storage Network Architecture

Internal Storage Network (10.10.20.0/24)

Purpose: Dedicated network for NFS/iSCSI traffic to reduce congestion on main network.

Bridge: vmbr3 on PVE (virtual bridge, no physical NIC)
Subnet: 10.10.20.0/24
DHCP: No
Gateway: None (internal only, no internet)

Connected VMs:

  • TrueNAS VM (secondary NIC)
  • Saltbox VM (secondary NIC) - for NFS mounts
  • Other VMs needing storage access

Configuration:

# On TrueNAS VM - check second NIC
ssh truenas 'ip addr show enp6s19'

# On Saltbox - check NFS mounts
ssh saltbox 'mount | grep nfs'
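
For reference, the host-side bridge definition for vmbr3 on PVE would look roughly like this in /etc/network/interfaces (a sketch; the address is an assumption until the storage subnet is confirmed):

# /etc/network/interfaces on PVE (sketch)
auto vmbr3
iface vmbr3 inet static
        address 10.10.20.1/24      # assumed host address on the storage subnet
        bridge-ports none          # no physical NIC, internal-only bridge
        bridge-stp off
        bridge-fd 0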

Benefits:

  • Separates storage traffic from general network
  • Prevents NFS/SMB from saturating main network
  • Better performance for storage-heavy workloads

Storage Capacity Planning

Current Usage (Estimate)

Needs actual audit:

# PVE pools
ssh pve 'zpool list -o name,size,alloc,free'

# PVE2 pools
ssh pve2 'zpool list -o name,size,alloc,free'

# TrueNAS vault pool
ssh truenas 'zpool list vault'

# Get detailed breakdown
ssh truenas 'zfs list -r vault -o name,used,avail'

Growth Rate

Needs tracking. Recommend capturing a monthly capacity report:

#!/bin/bash
# Save as ~/bin/storage-capacity-report.sh

DATE=$(date +%Y-%m-%d)
REPORT="$HOME/Backups/storage-reports/capacity-$DATE.txt"

mkdir -p "$HOME/Backups/storage-reports"

echo "Storage Capacity Report - $DATE" > "$REPORT"
echo "================================" >> "$REPORT"
echo "" >> "$REPORT"

echo "PVE Pools:" >> "$REPORT"
ssh pve 'zpool list' >> "$REPORT"
echo "" >> "$REPORT"

echo "PVE2 Pools:" >> "$REPORT"
ssh pve2 'zpool list' >> "$REPORT"
echo "" >> "$REPORT"

echo "TrueNAS Pools:" >> "$REPORT"
ssh truenas 'zpool list' >> "$REPORT"
echo "" >> "$REPORT"

echo "TrueNAS Datasets:" >> "$REPORT"
ssh truenas 'zfs list -r vault -o name,used,avail' >> "$REPORT"

echo "Report saved to $REPORT"

Run monthly via cron:

0 9 1 * * ~/bin/storage-capacity-report.sh

Expansion Planning

When to expand:

  • Pool reaches 80% capacity
  • Performance degrades
  • New workloads require more space

Expansion options:

  1. Add drives to existing pools (for mirrored pools, add another mirror vdev; see the sketch below)
  2. Add new NVMe drives to PVE/PVE2
  3. Expand EMC enclosure (add more drives)
  4. Add second EMC enclosure
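
For option 1, attaching an additional mirror vdev to an existing pool looks roughly like this (device paths are placeholders; verify the correct by-id paths before running):

# Add a second mirror vdev to nvme-mirror1 (placeholder device names)
zpool add nvme-mirror1 mirror /dev/disk/by-id/nvme-NEWDRIVE1 /dev/disk/by-id/nvme-NEWDRIVE2

# Verify the new vdev appears
zpool status nvme-mirror1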

Cost estimates: TBD


ZFS Health Monitoring

Daily Health Checks

# Check for errors on all pools
ssh pve 'zpool status -x'     # Shows only unhealthy pools
ssh pve2 'zpool status -x'
ssh truenas 'zpool status -x'

# Check scrub status
ssh pve 'zpool status | grep scrub'
ssh pve2 'zpool status | grep scrub'
ssh truenas 'zpool status | grep scrub'
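
These checks can be wrapped in a small loop so one command covers every host (assumes the SSH aliases pve, pve2, and truenas used throughout this doc):

# One-shot health check across all hosts; 'zpool status -x' prints
# "all pools are healthy" when nothing needs attention
for host in pve pve2 truenas; do
    echo "== $host =="
    ssh "$host" 'zpool status -x'
done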

Scrub Schedule

Recommended: Monthly scrub on all pools

Configure scrub:

# Via Proxmox UI: Node → Disks → ZFS → Select pool → Scrub
# Or via cron:
0 2 1 * * /sbin/zpool scrub nvme-mirror1
0 2 1 * * /sbin/zpool scrub rpool

On TrueNAS:

  • Configure via UI: Storage → Pools → Scrub Tasks
  • Recommended: 1st of every month at 2 AM

SMART Monitoring

Check drive health:

# PVE
ssh pve 'smartctl -a /dev/nvme0'
ssh pve 'smartctl -a /dev/sda'

# TrueNAS
ssh truenas 'smartctl --scan'
ssh truenas 'smartctl -a /dev/sdX'  # For each drive

Configure SMART tests:

  • TrueNAS UI: Tasks → S.M.A.R.T. Tests
  • Recommended: Weekly short test, monthly long test
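
On PVE/PVE2 (plain Debian smartmontools), the same schedule can be driven by smartd; a minimal /etc/smartd.conf sketch, assuming local mail delivery to root already works:

# /etc/smartd.conf (sketch)
# Monitor all drives, email root on problems,
# short self-test daily at 02:00, long self-test every Saturday at 03:00
DEVICESCAN -a -m root -s (S/../.././02|L/../../6/03)

Restart the smartmontools service after editing for the schedule to take effect.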

Alerts

Set up email alerts for:

  • ZFS pool errors
  • SMART test failures
  • Pool capacity > 80%
  • Scrub failures
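
On PVE/PVE2 the ZFS Event Daemon (ZED) can cover the pool-error and scrub-failure alerts; a sketch of the relevant /etc/zfs/zed.d/zed.rc settings (the address is a placeholder, and mail delivery must already work on the host):

# /etc/zfs/zed.d/zed.rc (sketch)
ZED_EMAIL_ADDR="admin@example.com"   # placeholder address
ZED_NOTIFY_INTERVAL_SECS=3600        # rate-limit repeat notifications
ZED_NOTIFY_VERBOSE=1                 # also mail on scrub/resilver completion

Capacity (>80%) and SMART alerts still need TrueNAS alert services or the monthly capacity report above.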

Storage Performance Tuning

ZFS ARC (Cache)

Check ARC usage:

ssh pve 'arc_summary'
ssh truenas 'arc_summary'

Tuning (if needed):

  • PVE/PVE2: Set max ARC in /etc/modprobe.d/zfs.conf (sketch below)
  • TrueNAS: Configure via UI (System → Advanced → Tunables)
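
A minimal sketch of the PVE/PVE2 file, capping ARC at 16 GiB (the size is an assumption; pick a value that leaves enough RAM for VMs):

# /etc/modprobe.d/zfs.conf (sketch)
# 16 GiB = 16 * 1024^3 bytes
options zfs zfs_arc_max=17179869184

# Then regenerate the initramfs and reboot for it to take effect:
# update-initramfs -u -k all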

NFS Performance

Mount options (on clients like Saltbox):

rsize=131072,wsize=131072,hard,timeo=600,retrans=2,vers=3
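
A sketch of a matching /etc/fstab entry on the client; the server address and export path are placeholders until the actual shares are documented above:

# /etc/fstab on the NFS client (sketch; replace server address and paths)
<truenas-storage-ip>:/mnt/vault/movies  /mnt/movies  nfs  rsize=131072,wsize=131072,hard,timeo=600,retrans=2,vers=3  0  0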

Verify NFS mounts:

ssh saltbox 'mount | grep nfs'

Record Size Optimization

Different workloads need different record sizes:

  • VMs: 64K (common choice for VM images; the ZFS default recordsize is 128K)
  • Databases: 8K or 16K
  • Media files: 1M (large sequential reads)

Set record size (on TrueNAS datasets):

ssh truenas 'zfs set recordsize=1M vault/movies'
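
Check the current value before and after changing it (recordsize only affects files written after the change):

ssh truenas 'zfs get recordsize -r vault'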

Disaster Recovery

Pool Recovery

If a pool fails to import:

# Try importing under a different name (-N skips mounting datasets)
zpool import -f -N poolname newpoolname

# Import read-only to inspect the pool without writing to it
zpool import -f -o readonly=on poolname

# Recovery-mode import, discards the last few transactions (last resort)
zpool import -f -F poolname

Drive Replacement

When a drive fails:

# Identify failed drive
zpool status poolname

# Replace drive
zpool replace poolname old-device new-device

# Monitor resilver
watch zpool status poolname

Data Recovery

If pool is completely lost:

  1. Restore from offsite backup (see BACKUP-STRATEGY.md)
  2. Recreate pool structure
  3. Restore data

Critical: This is why we need offsite backups!


Quick Reference

Common Commands

# Pool status
zpool status [poolname]
zpool list

# Dataset usage
zfs list
zfs list -r vault

# Check pool health (only unhealthy)
zpool status -x

# Scrub pool
zpool scrub poolname

# Get pool IO stats
zpool iostat -v 1

# Snapshot management
zfs snapshot poolname/dataset@snapname
zfs list -t snapshot
zfs rollback poolname/dataset@snapname
zfs destroy poolname/dataset@snapname

Storage Locations by Use Case

| Use Case | Recommended Storage | Why |
|----------|---------------------|-----|
| VM OS disk | nvme-mirror1 (PVE) | Fastest IO |
| Database | nvme-mirror1/2 | Low latency |
| Media files | TrueNAS vault | Large capacity |
| Development | nvme-mirror2 | Fast, mid-tier |
| Containers | rpool | Good performance |
| Backups | TrueNAS or rpool | Large capacity |
| Archive | local-zfs2 (PVE2) | Cheap, can spin down |

Investigation Needed

  • Get complete TrueNAS dataset list
  • Document NFS/SMB share configuration
  • Inventory EMC enclosure drives (count, capacity, model)
  • Document current pool usage percentages
  • Set up monthly capacity reports
  • Configure ZFS scrub schedules
  • Set up storage health alerts


Last Updated: 2025-12-22