Initial commit: Homelab infrastructure documentation
- CLAUDE.md: Main homelab assistant context and instructions
- IP-ASSIGNMENTS.md: Complete IP address assignments
- NETWORK.md: Network bridges, VLANs, and configuration
- EMC-ENCLOSURE.md: EMC storage enclosure documentation
- SYNCTHING.md: Syncthing setup and device list
- SHELL-ALIASES.md: ZSH aliases for Claude Code sessions
- HOMEASSISTANT.md: Home Assistant API and automations
- INFRASTRUCTURE.md: Server hardware and power management
- configs/: Shared shell configurations
- scripts/: Utility scripts
- mcp-central/: MCP server configuration

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
EMC-ENCLOSURE.md
@@ -0,0 +1,247 @@
# EMC Storage Enclosure Documentation

## Hardware Overview

| Component | Details |
|-----------|---------|
| **Model** | EMC ESES Viper DAE (KTN-STL3) |
| **Capacity** | 15x 3.5" SAS/SATA drive bays |
| **SES Device** | `/dev/sg15` (on TrueNAS) |
| **Connection** | SAS to LSI SAS2308 HBA (mpt2sas driver) |
| **Location** | Connected to PVE (10.10.10.120) via TrueNAS VM |

## Components

### LCC Controllers (Link Control Cards)

The enclosure has **dual LCC controllers** for redundancy:

| Controller | Slot | Status | Notes |
|------------|------|--------|-------|
| **LCC A** | Left | Working | Currently in use |
| **LCC B** | Right | Faulty | Causes high fan speed, SAS discovery failure |

**Replacement Part**: EMC 303-108-000E VIPER 6G SAS LCC (~$15 on eBay)

### Power Supplies

Two redundant PSUs with integrated fans.

### Fans

Multiple cooling fans are controlled by the enclosure firmware. Fan speeds are **automatically managed** based on temperature; manual override is not supported on EMC ESES enclosures.

**Fan Speed Codes**:

| Code | Description | RPM (approx) |
|------|-------------|--------------|
| 1 | Lowest | ~1500 |
| 2 | Second lowest | ~2000 |
| 3 | Third lowest | ~2670 |
| 4 | Medium | ~3300 |
| 5 | Third highest | ~4160 |
| 6 | Second highest | ~4800 |
| 7 | Highest | ~5500+ |

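As a convenience, the speed-code table can be wrapped in a small lookup. This is a minimal sketch, not part of any existing script: the function name `fan_speed_info` is hypothetical, the RPM figures are the approximate values from the table above, and the normal/high/critical bands are the ones used in the Troubleshooting section of this document.

```bash
# Hypothetical helper: map an SES fan speed code (1-7) to the approximate
# RPM from the table above and the severity band used in Troubleshooting
# (normal: 1-3, high: 4-5, critical: 6-7).
fan_speed_info() {
    case "$1" in
        1) echo "code 1: ~1500 RPM (normal)" ;;
        2) echo "code 2: ~2000 RPM (normal)" ;;
        3) echo "code 3: ~2670 RPM (normal)" ;;
        4) echo "code 4: ~3300 RPM (high)" ;;
        5) echo "code 5: ~4160 RPM (high)" ;;
        6) echo "code 6: ~4800 RPM (critical)" ;;
        7) echo "code 7: ~5500+ RPM (critical)" ;;
        *) echo "unknown speed code: $1" >&2; return 1 ;;
    esac
}
```

For example, `fan_speed_info 5` prints the row that matches the loud-fan symptom described in the incident log.
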
## ZFS Pool Using This Enclosure

```
Pool: vault
Size: 164TB raidz1
Drives: 13x HDD in raidz1 + special mirror + NVMe cache/log
Mount: /mnt/vault on TrueNAS
```

## SES Commands Reference

All commands run from TrueNAS (VM 100):

```bash
# Check overall enclosure status
sg_ses -p 0x02 /dev/sg15

# Check fan speeds
sg_ses --index=coo,-1 --get=speed_code /dev/sg15

# Check temperatures
sg_ses -p 0x02 /dev/sg15 | grep -E "(Temperature|Cooling)"

# Check PSU status
sg_ses -p 0x02 /dev/sg15 | grep -A5 "Power supply"

# Check LCC controller status
sg_ses -p 0x02 /dev/sg15 | grep -A5 "Enclosure services controller"

# List all SES elements
sg_ses -p 0x07 /dev/sg15

# Identify enclosure (flash LEDs)
sg_ses --index=enc,0 --set=ident:1 /dev/sg15
```

### Running SES Commands via Proxmox

```bash
# From Mac (via SSH key auth)
ssh pve 'qm guest exec 100 -- bash -c "sg_ses -p 0x02 /dev/sg15"'

# Quick fan check
ssh pve 'qm guest exec 100 -- bash -c "sg_ses --index=coo,-1 --get=speed_code /dev/sg15"'

# Quick temp check
ssh pve 'qm guest exec 100 -- bash -c "sg_ses -p 0x02 /dev/sg15 | grep Temperature"'
```

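The repeated `ssh pve 'qm guest exec 100 ...'` wrapper could be factored into a helper. The sketch below is illustrative only: the function name `ses_remote` and the variables `PVE_HOST`, `TRUENAS_VMID`, `SES_DEV`, and `DRY_RUN` are assumptions introduced here, with defaults taken from the commands above. It only handles plain `sg_ses` arguments, not pipelines inside the guest.

```bash
# Hypothetical wrapper around the ssh + qm guest exec pattern above.
# Defaults mirror the values used in this document.
PVE_HOST="${PVE_HOST:-pve}"
TRUENAS_VMID="${TRUENAS_VMID:-100}"
SES_DEV="${SES_DEV:-/dev/sg15}"

ses_remote() {
    # Build the command that runs on the Proxmox host.
    cmd="qm guest exec ${TRUENAS_VMID} -- bash -c \"sg_ses $* ${SES_DEV}\""
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the full command instead of running it.
        printf 'ssh %s %s\n' "$PVE_HOST" "'$cmd'"
    else
        ssh "$PVE_HOST" "$cmd"
    fi
}
```

With `DRY_RUN=1` set, `ses_remote -p 0x02` prints the same command line as the first example above, which makes it easy to check the quoting before running anything against the enclosure.
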
## Troubleshooting

### Symptom: Fans Running Loud (Speed 5+)

**Possible Causes**:
1. **Faulty LCC controller** - Switch to other LCC
2. **High temperatures** - Check temp sensors
3. **PSU issue** - Check PSU status via SES
4. **Failed drive** - Check drive status LEDs

**Diagnosis Steps**:
```bash
# 1. Check current fan speed
ssh pve 'qm guest exec 100 -- bash -c "sg_ses --index=coo,-1 --get=speed_code /dev/sg15"'
# Normal: 1-3, High: 4-5, Critical: 6-7

# 2. Check temperatures
ssh pve 'qm guest exec 100 -- bash -c "sg_ses -p 0x02 /dev/sg15 | grep Temperature"'
# Normal: 25-40C, Warning: 45-50C, Critical: 55C+

# 3. Check for component failures
ssh pve 'qm guest exec 100 -- bash -c "sg_ses -p 0x02 /dev/sg15 | grep -i fail"'

# 4. If no obvious cause, try switching LCC
# Power down enclosure, move SAS cable to other LCC port
```

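The temperature bands in step 2 can be expressed as a simple check. This is a sketch under stated assumptions: the function name `classify_temp` is hypothetical, and because the documented bands leave gaps (41-44C and 51-54C), this version assumes anything above 40C and below 55C counts as a warning and anything at or below 40C counts as normal.

```bash
# Hypothetical classifier for a temperature reading in whole degrees C.
# Documented bands: Normal 25-40C, Warning 45-50C, Critical 55C+.
# Assumption: the undocumented gaps (41-44C, 51-54C) are treated as warning.
classify_temp() {
    t=$1
    if [ "$t" -ge 55 ]; then
        echo critical
    elif [ "$t" -gt 40 ]; then
        echo warning
    else
        echo normal
    fi
}
```

Taking the conservative side for the gaps means a borderline reading prompts a second look rather than being reported as healthy.
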
### Symptom: Drives Not Detected After Enclosure Power Cycle

**Possible Causes**:
1. Enclosure not fully initialized (wait for green LEDs to stop blinking)
2. Faulty LCC controller
3. SAS cable loose
4. HBA needs rescan

**Diagnosis Steps**:
```bash
# 1. Check SAS link status
cat /sys/class/sas_phy/*/negotiated_linkrate

# 2. Check for expanders (should show enclosure)
lsscsi -g | grep -i enclo

# 3. Force HBA rescan
# (host0 is assumed to be the HBA; check /sys/class/scsi_host/ if unsure)
echo "- - -" > /sys/class/scsi_host/host0/scan

# 4. If no expander, check SAS cable and try other LCC port
```

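To make step 1 easier to read at a glance, the link-rate output can be reduced to a count of PHYs that negotiated a link. This is a minimal sketch: the function name `count_linked_phys` is hypothetical, and it assumes linked PHYs report a rate ending in `Gbit` (as in the "6.0 Gbit" reading noted in the incident log) while unlinked PHYs report something else.

```bash
# Hypothetical helper: read negotiated_linkrate values on stdin and print
# how many PHYs report a negotiated rate (lines containing "Gbit").
count_linked_phys() {
    awk '/Gbit/ { n++ } END { print n + 0 }'
}

# Intended use on TrueNAS (not run here):
#   cat /sys/class/sas_phy/*/negotiated_linkrate | count_linked_phys
```

A count above zero with no enclosure in the `lsscsi` output matches the pattern seen in the LCC B incident: the SAS link is up, but the expander is not responding.
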
### Symptom: Pool Won't Import After Enclosure Maintenance

```bash
# 1. Wait for enclosure to fully initialize (1-2 minutes)

# 2. Rescan for devices
echo "- - -" > /sys/class/scsi_host/host0/scan

# 3. Import pool
zpool import vault

# 4. If the pool mounts read-only, reboot TrueNAS
ssh pve 'qm reboot 100'
```

## Maintenance Procedures

### Safe Shutdown for Enclosure Maintenance

```bash
# 1. Stop services using the pool
ssh pve 'qm guest exec 101 -- bash -c "docker stop \$(docker ps -q)"'

# 2. Shut down TrueNAS (auto-exports ZFS pool)
ssh pve 'qm shutdown 100 --timeout 120'

# 3. Wait for TrueNAS to fully stop
ssh pve 'while qm status 100 | grep -q running; do sleep 5; done'

# 4. Power off enclosure
# (Physical switch or PDU)

# 5. Perform maintenance

# 6. Power on enclosure, wait for initialization (green LEDs solid)

# 7. Start TrueNAS
ssh pve 'qm start 100'

# 8. Verify pool imported
ssh pve 'qm guest exec 100 -- bash -c "zpool status vault"'
```

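The wait loop in step 3 runs indefinitely if the VM never stops. One possible variant adds an upper bound. This is a sketch only: the function name `wait_until_stopped` and its arguments are assumptions introduced here, and the status command is passed in so the loop can be tried against a stub before pointing it at `qm status`.

```bash
# Hypothetical bounded wait: poll a status command until its output no
# longer contains "running", or give up after TIMEOUT seconds.
# Usage: wait_until_stopped "STATUS_CMD" TIMEOUT_SECS [INTERVAL_SECS]
wait_until_stopped() {
    status_cmd=$1
    timeout=$2
    interval=${3:-5}
    waited=0
    # $status_cmd is intentionally unquoted so it splits into command + args.
    while $status_cmd | grep -q running; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "timed out after ${waited}s waiting for stop" >&2
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
}

# Intended use from the Mac (not run here):
#   wait_until_stopped "ssh pve qm status 100" 180 5
```

A non-zero return is a signal to investigate before powering off the enclosure, since the pool may still be imported.
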
### Hot-Swap LCC Controller

LCCs can be hot-swapped while the enclosure is running:

1. Order replacement LCC (EMC 303-108-000E)
2. Move SAS cable to working LCC (if not already)
3. Wait for drives to come online via the working LCC
4. Remove faulty LCC
5. Install replacement LCC
6. Optionally move SAS cable back to original port

## Incident Log

### 2024-12-19: LCC B Failure

**Symptoms**:
- Fans running at speed code 5 (~4160 RPM) - very loud
- After enclosure power cycle, drives not detected
- SAS link UP (4 PHYs at 6.0 Gbit) but no expander discovery

**Root Cause**:
LCC B controller malfunction causing:
- False temperature/error readings → high fan speed
- SAS expander not responding → drives not enumerated

**Resolution**:
1. Moved SAS cable from LCC B to LCC A
2. Drives immediately appeared
3. Fan speed dropped to code 3 (~2670 RPM) - quiet
4. Imported vault pool, all data intact

**Replacement Ordered**:
- Part: EMC 303-108-000E VIPER 6G SAS LCC
- Source: eBay
- Price: $14.95 + free shipping

## LED Status Reference

### Drive LEDs
| LED State | Indicates | Meaning |
|-----------|-----------|---------|
| Solid Blue | Power | Drive has power |
| Blinking Blue | Activity | I/O in progress |
| Solid Amber | Fault | Drive failed |
| Blinking Amber | Identify | Drive being located |

### LCC LEDs
| LED State | Indicates | Meaning |
|-----------|-----------|---------|
| Solid Green | Link | SAS connection active |
| Blinking Green | Activity | Data transfer |
| Amber | Fault | LCC issue |

### PSU LEDs
| LED State | Indicates | Meaning |
|-----------|-----------|---------|
| Solid Green | OK | Power supply healthy |
| Off | No Power | No AC input |
| Amber | Fault | PSU failure |

## Related Documentation

- [CLAUDE.md](CLAUDE.md) - Main homelab documentation
- [IP-ASSIGNMENTS.md](IP-ASSIGNMENTS.md) - Network configuration
- TrueNAS Web UI: https://10.10.10.200