homelab-docs/EMC-ENCLOSURE.md
Hutson 93821d1557 Initial commit: Homelab infrastructure documentation
- CLAUDE.md: Main homelab assistant context and instructions
- IP-ASSIGNMENTS.md: Complete IP address assignments
- NETWORK.md: Network bridges, VLANs, and configuration
- EMC-ENCLOSURE.md: EMC storage enclosure documentation
- SYNCTHING.md: Syncthing setup and device list
- SHELL-ALIASES.md: ZSH aliases for Claude Code sessions
- HOMEASSISTANT.md: Home Assistant API and automations
- INFRASTRUCTURE.md: Server hardware and power management
- configs/: Shared shell configurations
- scripts/: Utility scripts
- mcp-central/: MCP server configuration

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-20 02:31:02 -05:00


# EMC Storage Enclosure Documentation
## Hardware Overview
| Component | Details |
|-----------|---------|
| **Model** | EMC ESES Viper DAE (KTN-STL3) |
| **Capacity** | 15x 3.5" SAS/SATA drive bays |
| **SES Device** | `/dev/sg15` (on TrueNAS) |
| **Connection** | SAS to LSI SAS2308 HBA (mpt2sas driver) |
| **Location** | Connected to PVE (10.10.10.120) via TrueNAS VM |
## Components
### LCC Controllers (Link Control Cards)
The enclosure has **dual LCC controllers** for redundancy:
| Controller | Slot | Status | Notes |
|------------|------|--------|-------|
| **LCC A** | Left | Working | Currently in use |
| **LCC B** | Right | Faulty | Causes high fan speed, SAS discovery failure |
**Replacement Part**: EMC 303-108-000E VIPER 6G SAS LCC (~$15 on eBay)
### Power Supplies
Two redundant PSUs with integrated fans.
### Fans
Multiple cooling fans are controlled by the enclosure firmware. Fan speeds are **automatically managed** based on temperature; manual override is not supported on EMC ESES enclosures.
**Fan Speed Codes**:
| Code | Description | RPM (approx) |
|------|-------------|--------------|
| 1 | Lowest | ~1500 |
| 2 | Second lowest | ~2000 |
| 3 | Third lowest | ~2670 |
| 4 | Medium | ~3300 |
| 5 | Fifth | ~4160 |
| 6 | Sixth | ~4800 |
| 7 | Highest | ~5500+ |
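
The table can be turned into a small lookup for scripts that log or alert on fan speed. This is a minimal sketch: the function names `fan_rpm` and `fan_level` are hypothetical helpers, the RPM values are the approximate figures from the table, and the normal/high/critical bands are the ones used in the troubleshooting section of this document.

```shell
# Map an EMC fan speed code (1-7) to the approximate RPM from the table.
# fan_rpm and fan_level are illustrative helpers, not part of sg_ses.
fan_rpm() {
  case "$1" in
    1) echo 1500 ;;
    2) echo 2000 ;;
    3) echo 2670 ;;
    4) echo 3300 ;;
    5) echo 4160 ;;
    6) echo 4800 ;;
    7) echo 5500 ;;
    *) echo "unknown speed code: $1" >&2; return 1 ;;
  esac
}

# Classify a speed code: 1-3 normal, 4-5 high, 6-7 critical.
fan_level() {
  case "$1" in
    [1-3]) echo normal ;;
    [4-5]) echo high ;;
    [6-7]) echo critical ;;
    *) echo unknown; return 1 ;;
  esac
}
```

For example, `fan_level 5` prints `high`, which matches the code observed during the LCC B incident.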
## ZFS Pool Using This Enclosure
```
Pool: vault
Size: 164TB raidz1
Drives: 13x HDD in raidz1 + special mirror + NVMe cache/log
Mount: /mnt/vault on TrueNAS
```
## SES Commands Reference
All commands run from TrueNAS (VM 100):
```bash
# Check overall enclosure status
sg_ses -p 0x02 /dev/sg15
# Check fan speeds
sg_ses --index=coo,-1 --get=speed_code /dev/sg15
# Check temperatures
sg_ses -p 0x02 /dev/sg15 | grep -E "(Temperature|Cooling)"
# Check PSU status
sg_ses -p 0x02 /dev/sg15 | grep -A5 "Power supply"
# Check LCC controller status
sg_ses -p 0x02 /dev/sg15 | grep -A5 "Enclosure services controller"
# List all SES elements
sg_ses -p 0x07 /dev/sg15
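# Identify enclosure (flash LEDs); sg_ses expects acronym=value
sg_ses --index=enc,0 --set=ident=1 /dev/sg15
# Turn the identify LED off again
sg_ses --index=enc,0 --clear=ident /dev/sg15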
```
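
When scripting the fan check, it can help to reduce the per-fan readings to a single number. The sketch below assumes that `sg_ses --get=speed_code` prints one decimal value per cooling element, one per line; that output shape is an assumption and should be confirmed on the actual system. `max_speed_code` is a hypothetical helper name.

```shell
# Read one speed code per line on stdin and print the highest one.
# Lines that are not plain integers are ignored; prints 0 if none are found.
max_speed_code() {
  awk '/^[[:space:]]*[0-9]+[[:space:]]*$/ { if ($1 + 0 > max) max = $1 + 0 }
       END { print max + 0 }'
}

# Possible usage on TrueNAS:
#   sg_ses --index=coo,-1 --get=speed_code /dev/sg15 | max_speed_code
```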
### Running SES Commands via Proxmox
```bash
# From Mac (via SSH key auth)
ssh pve 'qm guest exec 100 -- bash -c "sg_ses -p 0x02 /dev/sg15"'
# Quick fan check
ssh pve 'qm guest exec 100 -- bash -c "sg_ses --index=coo,-1 --get=speed_code /dev/sg15"'
# Quick temp check
ssh pve 'qm guest exec 100 -- bash -c "sg_ses -p 0x02 /dev/sg15 | grep Temperature"'
```
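
The nested quoting in these one-liners is easy to get wrong once the inner command itself contains quotes. One option is to build the remote command line with a quoting helper. This is a sketch only: `sq` and `guest_cmd` are hypothetical names, and `guest_cmd` only prints the command rather than running it.

```shell
# Wrap a string in single quotes, escaping any embedded single quotes.
sq() {
  printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

# Print the command line to run on the Proxmox host (does not execute it).
# $1 = VM ID, $2 = command to run inside the guest via bash -c
guest_cmd() {
  printf 'qm guest exec %s -- bash -c %s\n' "$1" "$(sq "$2")"
}

# Possible usage from the Mac:
#   ssh pve "$(guest_cmd 100 'sg_ses -p 0x02 /dev/sg15 | grep Temperature')"
```

Single-quoting the inner command means `$`, pipes, and double quotes reach the guest's `bash -c` unchanged, so no backslash escaping is needed at the call site.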
## Troubleshooting
### Symptom: Fans Running Loud (Speed 5+)
**Possible Causes**:
1. **Faulty LCC controller** - Switch to other LCC
2. **High temperatures** - Check temp sensors
3. **PSU issue** - Check PSU status via SES
4. **Failed drive** - Check drive status LEDs
**Diagnosis Steps**:
```bash
# 1. Check current fan speed
ssh pve 'qm guest exec 100 -- bash -c "sg_ses --index=coo,-1 --get=speed_code /dev/sg15"'
# Normal: 1-3, High: 4-5, Critical: 6-7
# 2. Check temperatures
ssh pve 'qm guest exec 100 -- bash -c "sg_ses -p 0x02 /dev/sg15 | grep Temperature"'
# Normal: 25-40C, Warning: 45-50C, Critical: 55C+
# 3. Check for component failures
ssh pve 'qm guest exec 100 -- bash -c "sg_ses -p 0x02 /dev/sg15 | grep -i fail"'
# 4. If no obvious cause, try switching LCC
# Power down enclosure, move SAS cable to other LCC port
```
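
The temperature thresholds in step 2 can be applied automatically. The sketch below makes two assumptions: that sensor lines look like `Temperature=27 C` (check the real `sg_ses -p 0x02` output before relying on the pattern), and that readings in the undocumented gaps (41-44C and 51-54C) fall into the lower band. `temps_c` and `temp_level` are hypothetical helper names.

```shell
# Extract integer Celsius readings from lines such as "Temperature=27 C".
# The line format is an assumption; adjust the pattern if it differs.
temps_c() {
  sed -n 's/.*Temperature=\([0-9][0-9]*\) C.*/\1/p'
}

# Classify one reading: 55C+ critical, 45C+ warning, otherwise normal.
# Gap values (41-44, 51-54) land in the lower band by assumption.
temp_level() {
  if [ "$1" -ge 55 ]; then
    echo critical
  elif [ "$1" -ge 45 ]; then
    echo warning
  else
    echo normal
  fi
}

# Possible usage on TrueNAS:
#   sg_ses -p 0x02 /dev/sg15 | temps_c | while read -r t; do temp_level "$t"; done
```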
### Symptom: Drives Not Detected After Enclosure Power Cycle
**Possible Causes**:
1. Enclosure not fully initialized (wait for green LEDs to stop blinking)
2. Faulty LCC controller
3. SAS cable loose
4. HBA needs rescan
**Diagnosis Steps**:
```bash
# 1. Check SAS link status
cat /sys/class/sas_phy/*/negotiated_linkrate
# 2. Check for expanders (should show enclosure)
lsscsi -g | grep -i enclo
# 3. Force a rescan on every SCSI host (the HBA is not necessarily host0)
for h in /sys/class/scsi_host/host*/scan; do echo "- - -" > "$h"; done
# 4. If no expander, check SAS cable and try other LCC port
```
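
Steps 1 and 2 together distinguish a cabling problem from an LCC problem: in the 2024-12-19 incident the link was up on 4 PHYs but no enclosure appeared. The sketch below encodes that decision. It assumes linked PHYs report a rate containing `Gbit` (for example `6.0 Gbit`) and that unlinked ones report something else; `count_linked_phys` and `sas_hint` are hypothetical names.

```shell
# Count PHYs that report a negotiated rate such as "6.0 Gbit".
# Reads the contents of the negotiated_linkrate files on stdin.
count_linked_phys() {
  grep -c 'Gbit' || true
}

# Suggest a next step. $1 = linked PHY count, $2 = enclosure device count.
sas_hint() {
  if [ "$1" -eq 0 ]; then
    echo "no SAS link: check cable and enclosure power"
  elif [ "$2" -eq 0 ]; then
    echo "link up but no enclosure: try the other LCC port"
  else
    echo "link and enclosure OK"
  fi
}

# Possible usage on TrueNAS:
#   phys=$(cat /sys/class/sas_phy/*/negotiated_linkrate | count_linked_phys)
#   encs=$(lsscsi -g | grep -ci enclo || true)
#   sas_hint "$phys" "$encs"
```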
### Symptom: Pool Won't Import After Enclosure Maintenance
```bash
# 1. Wait for enclosure to fully initialize (1-2 minutes)
# 2. Rescan for devices on every SCSI host (the HBA is not necessarily host0)
for h in /sys/class/scsi_host/host*/scan; do echo "- - -" > "$h"; done
# 3. Import pool
zpool import vault
# 4. If the pool mounts read-only, reboot TrueNAS
ssh pve 'qm reboot 100'
```
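
Because the enclosure takes a minute or two to initialize, the import may fail on the first attempt. Rather than waiting by hand, the import can be retried at intervals. This is a generic sketch; `retry` is a hypothetical helper, and the attempt count and delay in the usage comment are illustrative values, not tested settings.

```shell
# Run a command up to MAX times, sleeping DELAY seconds between attempts.
# $1 = MAX attempts, $2 = DELAY in seconds, remaining args = the command.
# Returns 0 on the first success, 1 if every attempt fails.
retry() {
  retry_max=$1; retry_delay=$2; shift 2
  retry_n=1
  while ! "$@"; do
    if [ "$retry_n" -ge "$retry_max" ]; then
      return 1
    fi
    retry_n=$((retry_n + 1))
    sleep "$retry_delay"
  done
}

# Possible usage on TrueNAS (6 attempts, 20 seconds apart):
#   retry 6 20 zpool import vault
```

Note that `zpool import` also fails if the pool is already imported, so check `zpool status vault` first when the state is unclear.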
## Maintenance Procedures
### Safe Shutdown for Enclosure Maintenance
```bash
# 1. Stop services using the pool
ssh pve 'qm guest exec 101 -- bash -c "docker stop \$(docker ps -q)"'
# 2. Shutdown TrueNAS (auto-exports ZFS pool)
ssh pve 'qm shutdown 100 --timeout 120'
# 3. Wait for TrueNAS to fully stop
ssh pve 'while qm status 100 | grep -q running; do sleep 5; done'
# 4. Power off enclosure
# (Physical switch or PDU)
# 5. Perform maintenance
# 6. Power on enclosure, wait for initialization (green LEDs solid)
# 7. Start TrueNAS
ssh pve 'qm start 100'
# 8. Verify pool imported
ssh pve 'qm guest exec 100 -- bash -c "zpool status vault"'
```
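
The wait loop in step 3 runs forever if the VM hangs during shutdown. A bounded version is sketched below, assuming `qm status` prints a line containing `running` while the VM is up (for example `status: running`). `wait_stopped` is a hypothetical helper, and the timeout in the usage comment is an illustrative value.

```shell
# Poll a status command until its output no longer contains "running".
# $1 = timeout in seconds, $2 = poll interval in seconds (use 1 or more),
# remaining args = the status command. Returns 1 if the timeout is reached.
wait_stopped() {
  ws_timeout=$1; ws_interval=$2; shift 2
  ws_waited=0
  while "$@" | grep -q running; do
    if [ "$ws_waited" -ge "$ws_timeout" ]; then
      return 1
    fi
    sleep "$ws_interval"
    ws_waited=$((ws_waited + ws_interval))
  done
}

# Possible usage from the Mac (give up after 3 minutes):
#   wait_stopped 180 5 ssh pve 'qm status 100' || echo "TrueNAS still running"
```

Only power off the enclosure once this returns success; cutting power while the pool is still imported risks an unclean export.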
### Hot-Swap LCC Controller
LCCs can be hot-swapped while the enclosure is running:
1. Order replacement LCC (EMC 303-108-000E)
2. Move SAS cable to working LCC (if not already)
3. Wait for drives to come online via new LCC
4. Remove faulty LCC
5. Install replacement LCC
6. Optionally move SAS cable back to original port
## Incident Log
### 2024-12-19: LCC B Failure
**Symptoms**:
- Fans running at speed code 5 (~4160 RPM) - very loud
- After enclosure power cycle, drives not detected
- SAS link UP (4 PHYs at 6.0 Gbit) but no expander discovery
**Root Cause**:
LCC B controller malfunction causing:
- False temperature/error readings → high fan speed
- SAS expander not responding → drives not enumerated
**Resolution**:
1. Moved SAS cable from LCC B to LCC A
2. Drives immediately appeared
3. Fan speed dropped to code 3 (~2670 RPM) - quiet
4. Imported vault pool, all data intact
**Replacement Ordered**:
- Part: EMC 303-108-000E VIPER 6G SAS LCC
- Source: eBay
- Price: $14.95 + free shipping
## LED Status Reference
### Drive LEDs
| LED State | Function | Meaning |
|-----|-------|--------|
| Solid Blue | Power | Drive has power |
| Blinking Blue | Activity | I/O in progress |
| Solid Amber | Fault | Drive failed |
| Blinking Amber | Identify | Drive being located |
### LCC LEDs
| LED State | Function | Meaning |
|-----|-------|--------|
| Solid Green | Link | SAS connection active |
| Blinking Green | Activity | Data transfer |
| Amber | Fault | LCC issue |
### PSU LEDs
| LED State | Function | Meaning |
|-----|-------|--------|
| Solid Green | OK | Power supply healthy |
| Off | No Power | No AC input |
| Amber | Fault | PSU failure |
## Related Documentation
- [CLAUDE.md](CLAUDE.md) - Main homelab documentation
- [IP-ASSIGNMENTS.md](IP-ASSIGNMENTS.md) - Network configuration
- TrueNAS Web UI: https://10.10.10.200