Auto-sync: 20260116-150510
@@ -16,8 +16,9 @@ Documentation for system monitoring, health checks, and alerting across the home
| **Network** | ✅ Partial | Gateway watchdog | ✅ Auto-reboot | Connectivity check every 60s |
| **Services** | ❌ No | - | ❌ No | No health checks |
| **Backups** | ❌ No | - | ❌ No | No verification |
| **Claude Code** | ✅ Yes | Prometheus + Grafana | ✅ Yes | Token usage, burn rate, cost tracking |

**Overall Status**: ⚠️ **PARTIAL** - Gateway monitoring active, most else is manual
**Overall Status**: ⚠️ **PARTIAL** - Gateway monitoring active, Claude Code active, most else is manual

---

@@ -87,6 +88,102 @@ ssh ucg-fiber 'free -m && ps -eo pid,rss,comm --sort=-rss | head -12'

---

### Claude Code Token Monitoring

**Status**: ✅ **Active with alerts**

Monitors Claude Code token usage across all machines to track subscription consumption and prevent hitting weekly limits.

**Architecture**:
```
Claude Code (MacBook/Mac Mini)
        │
        ▼ (OpenTelemetry Prometheus exporter :9464)
        │
Prometheus (docker-host:9090)
        │
        ├──► Grafana Dashboard
        │
        └──► Alertmanager (burn rate alerts)
```

**Monitored Devices**:
| Device | IP Address | Metrics Port |
|--------|------------|--------------|
| MacBook | 10.10.10.147 | 9464 |
| Mac Mini | 10.10.10.123 | 9464 |

**What's monitored**:
- Token usage (input/output/cache) over time
- Burn rate (tokens/hour)
- Cost tracking (USD)
- Usage by model (Opus, Sonnet, Haiku)
- Session count
- Per-device breakdown

**Dashboard**: https://grafana.htsn.io/d/claude-code-usage/claude-code-token-usage

**Alerts Configured**:
| Alert | Threshold | Severity |
|-------|-----------|----------|
| High Burn Rate | >100k tokens/hour for 15min | Warning |
| Weekly Limit Risk | Projected >5M tokens/week | Critical |
| No Metrics | Scrape fails for 5min | Info |

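The two usage thresholds in the alert table reduce to simple arithmetic, which the following Python sketch makes explicit. It is an illustration only, not the actual Alertmanager rule file: the function names are hypothetical, the 15-minute "for" duration of the High Burn Rate alert is not modeled, and the weekly projection assumes the "last 24h times 7" extrapolation used in this document's PromQL queries.

```python
# Thresholds copied from the "Alerts Configured" table.
HIGH_BURN_RATE_TOKENS_PER_HOUR = 100_000
WEEKLY_LIMIT_RISK_TOKENS = 5_000_000


def burn_rate_per_hour(tokens_in_window: float, window_seconds: float) -> float:
    """Scale the tokens consumed in a window to a tokens/hour rate."""
    return tokens_in_window * 3600 / window_seconds


def projected_weekly(tokens_last_24h: float) -> float:
    """Extrapolate the last 24 hours of usage to a full week."""
    return tokens_last_24h * 7


def evaluate_alerts(tokens_last_hour: float, tokens_last_24h: float) -> list[str]:
    """Return the names of the usage alerts whose thresholds are exceeded.

    Simplification: the real High Burn Rate alert only fires after the
    condition has held for 15 minutes; this check is instantaneous.
    """
    firing = []
    if burn_rate_per_hour(tokens_last_hour, 3600) > HIGH_BURN_RATE_TOKENS_PER_HOUR:
        firing.append("High Burn Rate")
    if projected_weekly(tokens_last_24h) > WEEKLY_LIMIT_RISK_TOKENS:
        firing.append("Weekly Limit Risk")
    return firing
```

For example, 800k tokens in the last 24 hours projects to 5.6M tokens per week, which crosses the Weekly Limit Risk threshold even when the hourly burn rate is well under 100k.
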
**Configuration Files**:
- Claude settings: `~/.claude/settings.json` (on each Mac)
- Prometheus scrape: `/opt/monitoring/prometheus/prometheus.yml` (docker-host)
- Alert rules: `/opt/monitoring/prometheus/rules/claude-code.yml` (docker-host)

**Claude Code Settings** (in `~/.claude/settings.json`):
```json
{
  "env": {
    "CLAUDE_CODE_ENABLE_TELEMETRY": "1",
    "OTEL_METRICS_EXPORTER": "prometheus",
    "OTEL_EXPORTER_PROMETHEUS_PORT": "9464",
    "OTEL_METRIC_EXPORT_INTERVAL": "60000"
  }
}
```

**Prometheus Scrape Config**:
```yaml
- job_name: "claude-code"
  scrape_interval: 60s
  static_configs:
    - targets: ["10.10.10.147:9464"]
      labels:
        device: "macbook"
    - targets: ["10.10.10.123:9464"]
      labels:
        device: "mac-mini"
```

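Before relying on the scrape job, it can help to confirm that a target actually exposes the token counter. The sketch below is one possible check in Python, under several assumptions not stated in this document: that the exporter serves the Prometheus text format at `/metrics` over plain HTTP, and that the counter is exposed under the name `claude_code_token_usage_total` used in this document's queries. The helper names and the `type` label in the sample are hypothetical, and the line parser is deliberately simple (it does not handle `}` inside label values).

```python
import urllib.request


def fetch_metrics(host: str, port: int = 9464, timeout: float = 5.0) -> str:
    """Fetch the raw text exposition from an exporter (assumed path: /metrics)."""
    with urllib.request.urlopen(f"http://{host}:{port}/metrics", timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def sum_metric(exposition_text: str, metric_name: str) -> float:
    """Sum the sample values of every series with the given metric name."""
    total = 0.0
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name = line.split("{", 1)[0].split()[0]
        if name != metric_name:
            continue
        # The value is the first token after the label set (or after the name).
        rest = line.rsplit("}", 1)[-1] if "{" in line else line[len(name):]
        total += float(rest.split()[0])
    return total


# Hypothetical sample of what an exporter might return.
SAMPLE = """\
# TYPE claude_code_token_usage_total counter
claude_code_token_usage_total{type="input"} 1200
claude_code_token_usage_total{type="output"} 300
up 1
"""
```

With the sample text, `sum_metric(SAMPLE, "claude_code_token_usage_total")` adds the two token series and ignores the unrelated `up` sample. Against a live machine, the same function could be fed `fetch_metrics("10.10.10.147")`.
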
**Useful PromQL Queries**:
```promql
# Total tokens across all series (cumulative counter values)
sum(claude_code_token_usage_total)

# Burn rate (tokens/hour)
sum(rate(claude_code_token_usage_total[1h])) * 3600

# Usage by device
sum(claude_code_token_usage_total) by (device)

# Projected weekly usage (last 24h extrapolated to 7 days)
sum(increase(claude_code_token_usage_total[24h])) * 7
```

**Important Notes**:
- Claude Code must be restarted after changing telemetry settings
- Metrics only flow while Claude Code is running
- Weekly subscription limit resets Monday 1am (America/New_York)

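The weekly reset noted above (Monday 1am, America/New_York) can be turned into a concrete timestamp, which is useful when judging how much of the week a projection still covers. The sketch below is a minimal Python helper with a hypothetical name; it expects a timezone-aware datetime that is already expressed in the reset timezone, for example one created with `zoneinfo.ZoneInfo("America/New_York")`.

```python
from datetime import datetime, timedelta

RESET_WEEKDAY = 0  # Monday (datetime.weekday() convention)
RESET_HOUR = 1     # 1am local time


def next_weekly_reset(now: datetime) -> datetime:
    """Return the next Monday 01:00 strictly after `now`.

    `now` must be timezone-aware and already in the reset timezone.
    """
    candidate = now.replace(hour=RESET_HOUR, minute=0, second=0, microsecond=0)
    candidate += timedelta(days=(RESET_WEEKDAY - now.weekday()) % 7)
    if candidate <= now:
        candidate += timedelta(days=7)  # this week's reset has already passed
    return candidate


# Example call (requires the system time zone database):
#   from zoneinfo import ZoneInfo
#   next_weekly_reset(datetime.now(ZoneInfo("America/New_York")))
```

The hours between the current time and the returned timestamp give the remaining window in the current subscription week.
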
**Added**: 2026-01-16

---

### Syncthing Monitoring

**Status**: ⚠️ **Partial** - API available, no automated monitoring