From 42cfdd8552380eef92c2b7ec67218fea27f34daf Mon Sep 17 00:00:00 2001
From: Hutson
Date: Fri, 16 Jan 2026 15:50:17 -0500
Subject: [PATCH] Auto-sync: 20260116-155016

---
 MONITORING.md | 68 +++++++++++++++++++++++++++++++--------------------
 1 file changed, 42 insertions(+), 26 deletions(-)

diff --git a/MONITORING.md b/MONITORING.md
index 3876f7f..44bf209 100644
--- a/MONITORING.md
+++ b/MONITORING.md
@@ -98,19 +98,21 @@ Monitors Claude Code token usage across all machines to track subscription consu
 ```
 Claude Code (MacBook/Mac Mini)
         │
-        ▼ (OTLP HTTP push)
+        ▼ (OTLP HTTP push every 60s)
         │
 OTEL Collector (docker-host:4318)
         │
-        ▼ (Remote Write)
+        ▼ (Prometheus exporter on :8889)
         │
-Prometheus (docker-host:9090)
+Prometheus (docker-host:9090) ─── scrapes ───► otel-collector:8889
         │
         ├──► Grafana Dashboard
         │
         └──► Alertmanager (burn rate alerts)
 ```
 
+**Note**: Uses Prometheus exporter instead of Remote Write because Claude Code sends Delta temporality metrics, which Remote Write doesn't support.
+
 **Monitored Devices**:
 All Claude Code sessions on any device automatically push metrics via OTLP.
 
@@ -132,23 +134,22 @@ All Claude Code sessions on any device automatically push metrics via OTLP.
 | No Metrics | Scrape fails for 5min | Info |
 
 **Configuration Files**:
-- Claude settings: `~/.claude/settings.json` (on each Mac - synced via Syncthing)
+- Shell config: `~/.zshrc` (on each Mac - synced via Syncthing)
 - OTEL Collector: `/opt/monitoring/otel-collector/config.yaml` (docker-host)
 - Alert rules: `/opt/monitoring/prometheus/rules/claude-code.yml` (docker-host)
 
-**Claude Code Settings** (in `~/.claude/settings.json`):
-```json
-{
-  "env": {
-    "CLAUDE_CODE_ENABLE_TELEMETRY": "1",
-    "OTEL_METRICS_EXPORTER": "otlp",
-    "OTEL_EXPORTER_OTLP_ENDPOINT": "http://10.10.10.206:4318",
-    "OTEL_EXPORTER_OTLP_PROTOCOL": "http/protobuf",
-    "OTEL_METRIC_EXPORT_INTERVAL": "60000"
-  }
-}
+**Shell Environment Setup** (in `~/.zshrc`):
+```bash
+# Claude Code OpenTelemetry Metrics (push to OTEL Collector)
+export CLAUDE_CODE_ENABLE_TELEMETRY=1
+export OTEL_METRICS_EXPORTER=otlp
+export OTEL_EXPORTER_OTLP_ENDPOINT="http://10.10.10.206:4318"
+export OTEL_EXPORTER_OTLP_PROTOCOL="http/protobuf"
+export OTEL_METRIC_EXPORT_INTERVAL=60000
 ```
 
+**Note**: These can be set either in shell environment (`~/.zshrc`) or in `~/.claude/settings.json` under the `env` block. Both methods work.
+
 **OTEL Collector Config** (`/opt/monitoring/otel-collector/config.yaml`):
 ```yaml
 receivers:
@@ -164,36 +165,51 @@ processors:
     timeout: 10s
 
 exporters:
-  prometheusremotewrite:
-    endpoint: "http://prometheus:9090/api/v1/write"
+  prometheus:
+    endpoint: 0.0.0.0:8889
+    resource_to_telemetry_conversion:
+      enabled: true
 
 service:
   pipelines:
     metrics:
       receivers: [otlp]
       processors: [batch]
-      exporters: [prometheusremotewrite]
+      exporters: [prometheus]
+```
+
+**Prometheus Scrape Config** (add to `/opt/monitoring/prometheus/prometheus.yml`):
+```yaml
+  - job_name: "claude-code"
+    static_configs:
+      - targets: ["otel-collector:8889"]
+        labels:
+          group: "claude-code"
 ```
 
 **Useful PromQL Queries**:
 ```promql
-# Total tokens this session
-sum(claude_code_token_usage_total)
+# Total tokens by model
+sum(claude_code_token_usage_tokens_total) by (model)
 
 # Burn rate (tokens/hour)
-sum(rate(claude_code_token_usage_total[1h])) * 3600
+sum(rate(claude_code_token_usage_tokens_total[1h])) * 3600
 
-# Usage by device
-sum(claude_code_token_usage_total) by (device)
+# Total cost by model
+sum(claude_code_cost_usage_USD_total) by (model)
 
-# Projected weekly usage
-sum(increase(claude_code_token_usage_total[24h])) * 7
+# Usage by type (input, output, cacheRead, cacheCreation)
+sum(claude_code_token_usage_tokens_total) by (type)
+
+# Projected weekly usage (rough estimate)
+sum(increase(claude_code_token_usage_tokens_total[24h])) * 7
 ```
 
 **Important Notes**:
-- Claude Code must be restarted after changing telemetry settings
+- After changing `~/.zshrc`, start a new terminal/shell session before running Claude Code
 - Metrics only flow while Claude Code is running
 - Weekly subscription resets Monday 1am (America/New_York)
+- Verify env vars are set: `env | grep OTEL`
 
 **Added**: 2026-01-16
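
The patch suggests verifying the setup with `env | grep OTEL`, which does not match `CLAUDE_CODE_ENABLE_TELEMETRY` because that name does not contain the substring `OTEL`. A minimal sketch of a fuller check follows, assuming the five variables exported in the `~/.zshrc` block are the complete required set. The names `REQUIRED_VARS` and `missing_telemetry_vars` are hypothetical helpers for illustration and are not part of Claude Code or the patch.

```python
import os

# Assumption: these five names, taken from the ~/.zshrc block in the patch,
# are everything that has to be set for metrics to be pushed.
REQUIRED_VARS = (
    "CLAUDE_CODE_ENABLE_TELEMETRY",
    "OTEL_METRICS_EXPORTER",
    "OTEL_EXPORTER_OTLP_ENDPOINT",
    "OTEL_EXPORTER_OTLP_PROTOCOL",
    "OTEL_METRIC_EXPORT_INTERVAL",
)


def missing_telemetry_vars(environ=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_telemetry_vars()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All Claude Code telemetry variables are set.")
```

Running this in a fresh terminal, before starting Claude Code, checks the same precondition as the note about opening a new shell session after editing `~/.zshrc`. It only sees variables exported in the shell, so values set solely in `~/.claude/settings.json` would be reported as missing.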