Auto-sync: 20260116-150510

Hutson
2026-01-16 15:05:12 -05:00
parent d38de8bfb1
commit 8c1cbf3dac


@@ -16,8 +16,9 @@ Documentation for system monitoring, health checks, and alerting across the home
| **Network** | ✅ Partial | Gateway watchdog | ✅ Auto-reboot | Connectivity check every 60s |
| **Services** | ❌ No | - | ❌ No | No health checks |
| **Backups** | ❌ No | - | ❌ No | No verification |
| **Claude Code** | ✅ Yes | Prometheus + Grafana | ✅ Yes | Token usage, burn rate, cost tracking |
**Overall Status**: ⚠️ **PARTIAL** - Gateway monitoring active, Claude Code active, most else is manual
---
@@ -87,6 +88,102 @@ ssh ucg-fiber 'free -m && ps -eo pid,rss,comm --sort=-rss | head -12'
---
### Claude Code Token Monitoring
**Status**: ✅ **Active with alerts**
Monitors Claude Code token usage across all machines to track subscription consumption and warn before the weekly limit is reached.
**Architecture**:
```
Claude Code (MacBook/Mac Mini)
        │
        ▼ (OpenTelemetry Prometheus exporter :9464)
Prometheus (docker-host:9090)
        │
        ├──► Grafana Dashboard
        │
        └──► Alertmanager (burn rate alerts)
```
**Monitored Devices**:
| Device | IP Address | Metrics Port |
|--------|------------|--------------|
| MacBook | 10.10.10.147 | 9464 |
| Mac Mini | 10.10.10.123 | 9464 |
**What's monitored**:
- Token usage (input/output/cache) over time
- Burn rate (tokens/hour)
- Cost tracking (USD)
- Usage by model (Opus, Sonnet, Haiku)
- Session count
- Per-device breakdown
**Dashboard**: https://grafana.htsn.io/d/claude-code-usage/claude-code-token-usage
**Alerts Configured**:
| Alert | Threshold | Severity |
|-------|-----------|----------|
| High Burn Rate | >100k tokens/hour for 15min | Warning |
| Weekly Limit Risk | Projected >5M tokens/week | Critical |
| No Metrics | Scrape fails for 5min | Info |
**Configuration Files**:
- Claude settings: `~/.claude/settings.json` (on each Mac)
- Prometheus scrape: `/opt/monitoring/prometheus/prometheus.yml` (docker-host)
- Alert rules: `/opt/monitoring/prometheus/rules/claude-code.yml` (docker-host)
**Claude Code Settings** (in `~/.claude/settings.json`):
```json
{
  "env": {
    "CLAUDE_CODE_ENABLE_TELEMETRY": "1",
    "OTEL_METRICS_EXPORTER": "prometheus",
    "OTEL_EXPORTER_PROMETHEUS_PORT": "9464",
    "OTEL_METRIC_EXPORT_INTERVAL": "60000"
  }
}
```
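After changing the settings above and restarting Claude Code, one way to confirm that the exporter is serving data is to fetch its metrics page and sum the token counter. The following is a minimal sketch, not the setup's actual tooling. It assumes the exporter serves the Prometheus text format at the conventional `/metrics` path on port 9464, and that the counter is named `claude_code_token_usage_total` as in the queries in this document. `fetch_metrics` and `sum_metric` are hypothetical helper names.

```python
import urllib.request


def fetch_metrics(host: str, port: int = 9464, timeout: float = 5.0) -> str:
    """Download the exporter's metrics page (assumes the /metrics path)."""
    url = f"http://{host}:{port}/metrics"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def sum_metric(exposition_text: str, metric_name: str) -> float:
    """Sum every sample of `metric_name` in Prometheus text exposition format."""
    total = 0.0
    for raw in exposition_text.splitlines():
        line = raw.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            # "<name>{labels} <value> [timestamp]"; label values may contain spaces.
            name = line[: line.index("{")]
            fields = line[line.rindex("}") + 1 :].split()
        else:
            # "<name> <value> [timestamp]"
            name, *fields = line.split()
        if name == metric_name and fields:
            total += float(fields[0])
    return total
```

For example, `sum_metric(fetch_metrics("10.10.10.147"), "claude_code_token_usage_total")` would return the MacBook's current total. A connection error suggests that Claude Code is not running on that machine or that telemetry is not enabled.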
**Prometheus Scrape Config**:
```yaml
# Entry under scrape_configs: in prometheus.yml
- job_name: "claude-code"
  scrape_interval: 60s
  static_configs:
    - targets: ["10.10.10.147:9464"]
      labels:
        device: "macbook"
    - targets: ["10.10.10.123:9464"]
      labels:
        device: "mac-mini"
```
**Useful PromQL Queries**:
```promql
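# Total tokens across all currently exported series (all devices and sessions)
sum(claude_code_token_usage_total)

# Burn rate (tokens/hour): per-second rate over the last hour, scaled to one hour
sum(rate(claude_code_token_usage_total[1h])) * 3600

# Usage by device
sum(claude_code_token_usage_total) by (device)

# Projected weekly usage: last 24h of usage extrapolated to 7 days
sum(increase(claude_code_token_usage_total[24h])) * 7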
```
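The arithmetic behind the burn-rate and projection queries can be sketched in plain Python. This is a simplified illustration only: Prometheus' `rate()` and `increase()` also handle counter resets and extrapolate to the window edges, which this sketch ignores. The function names are hypothetical; the two thresholds are taken from the alerts table in this document.

```python
BURN_RATE_WARNING = 100_000    # tokens/hour, from the "High Burn Rate" alert
WEEKLY_LIMIT_RISK = 5_000_000  # tokens/week, from the "Weekly Limit Risk" alert


def burn_rate_per_hour(first_value: float, last_value: float, window_seconds: float) -> float:
    """Average tokens/hour over a window, mirroring rate(...) * 3600."""
    # Multiply before dividing so whole-number inputs stay exact.
    return (last_value - first_value) * 3600 / window_seconds


def projected_weekly(tokens_last_24h: float) -> float:
    """Extrapolate the last 24 hours of usage to 7 days, mirroring increase(...[24h]) * 7."""
    return tokens_last_24h * 7


# Counter went from 1,200,000 to 1,350,000 tokens within one hour.
rate = burn_rate_per_hour(1_200_000, 1_350_000, 3600)
print(rate > BURN_RATE_WARNING)

# 800,000 tokens used in the last 24 hours.
weekly = projected_weekly(800_000)
print(weekly > WEEKLY_LIMIT_RISK)
```

In this example the burn rate works out to 150,000 tokens/hour and the projection to 5.6M tokens/week, so both values are above their alert thresholds.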
**Important Notes**:
- Claude Code must be restarted after changing telemetry settings
- Metrics only flow while Claude Code is running
- The weekly subscription limit resets every Monday at 1:00 AM (America/New_York)
**Added**: 2026-01-16
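When judging how much of the weekly budget remains, it can help to know how far away the next reset is. The sketch below computes the next Monday 01:00 in America/New_York, following the reset schedule noted above. `next_weekly_reset` is a hypothetical helper, and the function expects a timezone-aware `datetime`.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

RESET_TZ = ZoneInfo("America/New_York")


def next_weekly_reset(now: datetime) -> datetime:
    """Return the first Monday 01:00 (America/New_York) strictly after `now`."""
    local = now.astimezone(RESET_TZ)
    # Monday is weekday 0; days_ahead is 0 when today is already Monday.
    days_ahead = (0 - local.weekday()) % 7
    candidate = (local + timedelta(days=days_ahead)).replace(
        hour=1, minute=0, second=0, microsecond=0
    )
    # If this week's reset has already happened, move to next Monday.
    if candidate <= local:
        candidate += timedelta(days=7)
    return candidate
```

Called with Friday 2026-01-16 15:05 New York time, the function returns Monday 2026-01-19 01:00. Subtracting `now` from the result gives the time left in the current window.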
---
### Syncthing Monitoring
**Status**: ⚠️ **Partial** - API available, no automated monitoring