WZP Telemetry & Observability
Overview
WarzonePhone exports metrics from all services: the relay and web bridge expose Prometheus-compatible /metrics endpoints for Grafana dashboards, and the client can write per-second JSONL telemetry to a file. Inter-relay health probes provide always-on monitoring with negligible bandwidth overhead (about 50 bytes/s per peer) via a multiplexed test line.
Architecture
┌──────────┐    probe (1 pkt/s)    ┌──────────┐
│ Relay A  │◄─────────────────────►│ Relay B  │
│ :4433    │                       │ :4433    │
│ /metrics │                       │ /metrics │
└────┬─────┘                       └────┬─────┘
     │                                  │
     │ scrape                           │ scrape
     ▼                                  ▼
┌─────────────────────────────────────────────┐
│                 Prometheus                  │
└─────────────────┬───────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────┐
│                   Grafana                   │
│  ┌─────────┐ ┌──────────┐ ┌──────────────┐  │
│  │ Relay   │ │ Per-call │ │ Inter-relay  │  │
│  │ Health  │ │ Quality  │ │ Latency Map  │  │
│  └─────────┘ └──────────┘ └──────────────┘  │
└─────────────────────────────────────────────┘
Metrics Exported
Relay (/metrics on HTTP port, default :9090)
| Metric | Type | Labels | Description |
|---|---|---|---|
| wzp_relay_active_sessions | Gauge | — | Current active sessions |
| wzp_relay_active_rooms | Gauge | — | Current active rooms |
| wzp_relay_packets_forwarded_total | Counter | room | Total packets forwarded |
| wzp_relay_bytes_forwarded_total | Counter | room | Total bytes forwarded |
| wzp_relay_auth_attempts_total | Counter | result (ok/fail) | Auth validation attempts |
| wzp_relay_handshake_duration_seconds | Histogram | — | Crypto handshake time |
| wzp_relay_session_jitter_buffer_depth | Gauge | session_id | Buffer depth per session |
| wzp_relay_session_loss_pct | Gauge | session_id | Packet loss percentage |
| wzp_relay_session_rtt_ms | Gauge | session_id | Round-trip time |
| wzp_relay_session_underruns_total | Counter | session_id | Jitter buffer underruns |
| wzp_relay_session_overruns_total | Counter | session_id | Jitter buffer overruns |
Web Bridge (/metrics on same HTTP port)
| Metric | Type | Labels | Description |
|---|---|---|---|
| wzp_web_active_connections | Gauge | — | Current WebSocket connections |
| wzp_web_frames_bridged_total | Counter | direction (up/down) | Audio frames bridged |
| wzp_web_auth_failures_total | Counter | — | Browser auth failures |
| wzp_web_handshake_latency_seconds | Histogram | — | Relay handshake time |
Inter-Relay Probes
| Metric | Type | Labels | Description |
|---|---|---|---|
| wzp_probe_rtt_ms | Gauge | target | RTT to peer relay |
| wzp_probe_loss_pct | Gauge | target | Loss to peer relay |
| wzp_probe_jitter_ms | Gauge | target | Jitter to peer relay |
| wzp_probe_up | Gauge | target | 1 if reachable, 0 if not |
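As a rough illustration of how the probe gauges could be consumed outside Grafana, the sketch below parses a /metrics response in the Prometheus text exposition format and lists the peers whose wzp_probe_up is 0. The sample payload and the target values (relay-b:4433, relay-c:4433) are made up for the example, and the parser only handles the single-label form shown in the table above; it is not a general exposition-format parser.

```python
import re

# Hypothetical /metrics excerpt; the target label values are illustrative only.
SAMPLE = """\
# TYPE wzp_probe_up gauge
wzp_probe_up{target="relay-b:4433"} 1
wzp_probe_up{target="relay-c:4433"} 0
wzp_probe_rtt_ms{target="relay-b:4433"} 12.5
"""

# Matches lines of the form: name{target="value"} number
LINE = re.compile(r'^(\w+)\{target="([^"]*)"\}\s+(\S+)$')


def probe_samples(text):
    """Return {(metric_name, target): value} for every wzp_probe_* sample."""
    samples = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match and match.group(1).startswith("wzp_probe_"):
            samples[(match.group(1), match.group(2))] = float(match.group(3))
    return samples


def unreachable(text):
    """List targets whose wzp_probe_up gauge is 0, sorted by name."""
    return sorted(
        target
        for (name, target), value in probe_samples(text).items()
        if name == "wzp_probe_up" and value == 0
    )


if __name__ == "__main__":
    print(unreachable(SAMPLE))
```

In a real deployment the same check would more naturally live in a Prometheus alerting rule; the script form is only meant to show the shape of the data.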
Client (JSONL file)
When --metrics-file <path> is used, the client writes one JSON object per second:
{
"ts": "2026-03-28T06:30:00Z",
"buffer_depth": 45,
"underruns": 0,
"overruns": 0,
"loss_pct": 1.2,
"rtt_ms": 34,
"jitter_ms": 8,
"frames_sent": 50,
"frames_received": 49,
"quality_profile": "GOOD"
}
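A file in this shape is easy to post-process. The following sketch reads the records and produces call-level totals. It assumes frames_sent and frames_received are per-interval counts rather than running totals (the sample value of 50 per second suggests this, but the source does not say so), and the second record in the sample is invented for the example.

```python
import io
import json


def summarize(lines):
    """Aggregate per-second client telemetry records into call-level figures."""
    records = [json.loads(line) for line in lines if line.strip()]
    if not records:
        return None
    return {
        "seconds": len(records),
        # Assumption: frame counters are per-interval, so they can be summed.
        "frames_sent": sum(r["frames_sent"] for r in records),
        "frames_received": sum(r["frames_received"] for r in records),
        "mean_rtt_ms": sum(r["rtt_ms"] for r in records) / len(records),
        "max_loss_pct": max(r["loss_pct"] for r in records),
    }


# Two sample records: the first mirrors the example above, the second is made up.
SAMPLE = io.StringIO(
    '{"ts": "2026-03-28T06:30:00Z", "buffer_depth": 45, "underruns": 0, '
    '"overruns": 0, "loss_pct": 1.2, "rtt_ms": 34, "jitter_ms": 8, '
    '"frames_sent": 50, "frames_received": 49, "quality_profile": "GOOD"}\n'
    '{"ts": "2026-03-28T06:30:01Z", "buffer_depth": 44, "underruns": 0, '
    '"overruns": 0, "loss_pct": 0.0, "rtt_ms": 38, "jitter_ms": 7, '
    '"frames_sent": 50, "frames_received": 50, "quality_profile": "GOOD"}\n'
)

summary = summarize(SAMPLE)

if __name__ == "__main__":
    print(summary)
```

To run it against a real capture, pass an open file handle instead of the in-memory sample, for example `summarize(open("/tmp/call-metrics.jsonl"))`.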
Task Breakdown
WZP-P2-T5: Telemetry & Observability
| ID | Task | Dependencies | Effort |
|---|---|---|---|
| S1 | Prometheus /metrics on relay | None | 2-3h |
| S2 | Per-session metrics (jitter, loss, RTT) | S1 | 2-3h |
| S3 | Prometheus /metrics on web bridge | None | 2h |
| S4 | Client --metrics-file JSONL export | None | 2h |
| S5 | Inter-relay health probe (--probe) | S1 | 4-6h |
| S6 | Probe mesh mode (all relays probe each other) | S5 | 2-3h |
| S7 | Grafana dashboard JSON | S1-S6 | 2h |
Parallelization
- Group A (parallel): S1, S3, S4 — three different binaries, no file overlap
- Group B (sequential): S2 after S1, then S5 → S6
- Last: S7 after all metrics are defined
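The grouping above can be cross-checked against the dependency column of the task table. This sketch uses Python's standard graphlib module to compute the waves of tasks that could start together; the DEPS mapping is transcribed from the table, with "S1-S6" read as S1 through S6.

```python
from graphlib import TopologicalSorter

# Task -> set of tasks it depends on, transcribed from the task table.
DEPS = {
    "S1": set(),
    "S2": {"S1"},
    "S3": set(),
    "S4": set(),
    "S5": {"S1"},
    "S6": {"S5"},
    "S7": {"S1", "S2", "S3", "S4", "S5", "S6"},
}


def waves(deps):
    """Return lists of tasks; each list can run once all earlier lists are done."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    result = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        result.append(ready)
        sorter.done(*ready)
    return result


if __name__ == "__main__":
    for number, tasks in enumerate(waves(DEPS), start=1):
        print(number, tasks)
```

By the dependency column alone, S2 and S5 land in the same wave because both depend only on S1. The list above schedules them one after the other, which is a stricter ordering than the table itself requires.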
Inter-Relay Health Probes
The probe is a multiplexed test line: one QUIC connection per peer relay, one silent media packet per second (~50 bytes/s). This provides:
- Continuous RTT measurement: Ping/Pong signals timed to <1ms precision
- Loss detection: Sequence gaps tracked over sliding 60s window
- Jitter monitoring: Variation in inter-packet arrival times
- Outage detection: wzp_probe_up drops to 0 within seconds
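The loss and jitter bullets can be sketched as a small sliding-window tracker. This is a minimal illustration, not the relay's implementation: the ProbeWindow class and its method names are hypothetical, sequence-number wraparound is ignored, and jitter is taken as the mean absolute change between consecutive inter-arrival gaps, which is only one of several possible definitions (RFC 3550, for instance, uses a smoothed estimator).

```python
from collections import deque


class ProbeWindow:
    """Tracks probe arrivals over a sliding time window (hypothetical helper)."""

    def __init__(self, window_s=60.0):
        self.window_s = window_s
        self.arrivals = deque()  # (arrival_time_s, sequence_number)

    def record(self, now_s, seq):
        """Store an arrival and evict entries older than the window."""
        self.arrivals.append((now_s, seq))
        while self.arrivals and self.arrivals[0][0] < now_s - self.window_s:
            self.arrivals.popleft()

    def loss_pct(self):
        """Loss from sequence gaps: missing numbers between min and max seen."""
        if len(self.arrivals) < 2:
            return 0.0
        seqs = {seq for _, seq in self.arrivals}
        expected = max(seqs) - min(seqs) + 1  # ignores wraparound
        return 100.0 * (expected - len(seqs)) / expected

    def jitter_ms(self):
        """Mean absolute change between consecutive inter-arrival gaps."""
        times = [t for t, _ in self.arrivals]
        gaps = [b - a for a, b in zip(times, times[1:])]
        if len(gaps) < 2:
            return 0.0
        deltas = [abs(b - a) for a, b in zip(gaps, gaps[1:])]
        return 1000.0 * sum(deltas) / len(deltas)


if __name__ == "__main__":
    window = ProbeWindow()
    for seq in range(10):
        if seq not in (3, 7):  # simulate two lost probes
            window.record(float(seq), seq)
    print(window.loss_pct(), window.jitter_ms())
```

Note that with this simple definition a lost probe also inflates the jitter figure, since the gap around the missing packet doubles. A production implementation would likely compensate for that.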
Why multiplexed?
WZP already multiplexes media on a single QUIC connection. The probe session shares the same connection pool — no extra ports, no extra TLS handshakes. At 1 pkt/s of silence (~50 bytes after Opus encoding + headers), the overhead is negligible even on metered links.
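To put "negligible" in numbers, the arithmetic below takes the approximate 50-byte figure and 1 pkt/s rate from the text and scales them to a day. It counts one direction only and treats the 50 bytes as the full on-wire cost, which is an assumption; lower-layer overhead may add to it.

```python
BYTES_PER_PROBE = 50       # approximate figure quoted in the text
PROBES_PER_SECOND = 1
SECONDS_PER_DAY = 86_400


def probe_bytes_per_day(peers):
    """Outbound probe traffic per day for a relay probing `peers` other relays."""
    return BYTES_PER_PROBE * PROBES_PER_SECOND * SECONDS_PER_DAY * peers


if __name__ == "__main__":
    for peers in (1, 2, 10):
        megabytes = probe_bytes_per_day(peers) / 1_000_000
        print(f"{peers} peer(s): {megabytes:.2f} MB/day")
```

One peer works out to about 4.32 MB per day, so even a ten-relay mesh stays in the tens of megabytes per relay per day.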
Probe mesh example
With 3 relays (A, B, C), each probes the other 2:
A → B: rtt=12ms loss=0.0% jitter=2ms
A → C: rtt=45ms loss=0.1% jitter=5ms
B → A: rtt=13ms loss=0.0% jitter=2ms
B → C: rtt=38ms loss=0.0% jitter=4ms
C → A: rtt=44ms loss=0.2% jitter=6ms
C → B: rtt=37ms loss=0.0% jitter=3ms
This matrix feeds the Grafana latency heatmap and triggers alerts on degradation.
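A mesh of N relays yields N × (N − 1) directed probe pairs, which is where the six rows above come from. The sketch below enumerates the pairs and flags degraded links from the example matrix. The thresholds (40 ms RTT, 0.1% loss) are hypothetical values chosen for the illustration and are not alert settings from the project.

```python
from itertools import permutations


def probe_pairs(relays):
    """All directed (source, target) pairs in a full probe mesh."""
    return list(permutations(relays, 2))


# (source, target) -> (rtt_ms, loss_pct, jitter_ms), copied from the example.
MESH = {
    ("A", "B"): (12, 0.0, 2),
    ("A", "C"): (45, 0.1, 5),
    ("B", "A"): (13, 0.0, 2),
    ("B", "C"): (38, 0.0, 4),
    ("C", "A"): (44, 0.2, 6),
    ("C", "B"): (37, 0.0, 3),
}


def degraded(mesh, max_rtt_ms=40, max_loss_pct=0.1):
    """Links whose RTT or loss exceeds the (hypothetical) thresholds."""
    return [
        pair
        for pair, (rtt_ms, loss_pct, _jitter_ms) in mesh.items()
        if rtt_ms > max_rtt_ms or loss_pct > max_loss_pct
    ]


if __name__ == "__main__":
    print(len(probe_pairs(["A", "B", "C"])))
    print(degraded(MESH))
```

Because each direction is measured separately, an asymmetric problem such as loss only on C → A shows up as a single flagged pair rather than being averaged away.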
Usage
# Relay with metrics
wzp-relay --listen 0.0.0.0:4433 --metrics-port 9090
# Relay with metrics + probe peer
wzp-relay --listen 0.0.0.0:4433 --metrics-port 9090 --probe relay-b:4433
# Web bridge with metrics
wzp-web --port 8080 --relay 127.0.0.1:4433 --metrics-port 9091
# Client with JSONL telemetry
wzp-client --live --metrics-file /tmp/call-metrics.jsonl relay:4433
Grafana Dashboard
The pre-built dashboard (docs/grafana-dashboard.json) includes:
- Relay Health — active sessions, rooms, packets/s, bytes/s
- Call Quality — per-session jitter depth, loss%, RTT, underruns over time
- Inter-Relay Mesh — latency heatmap, probe status, loss trends
- Web Bridge — active connections, frames bridged, auth failures