feat: probe mesh mode + Grafana dashboard (T5-S6/S7) — completes T5

WZP-P2-T5-S6: Probe mesh mode
- ProbeMesh coordinator: wraps multiple ProbeRunners, spawns all concurrently
- mesh_summary(): scans registry, formats human-readable health table
- /mesh HTTP endpoint on metrics port alongside /metrics
- --probe-mesh flag, --mesh-status for CLI diagnostics
- Replaces individual probe spawn loop with ProbeMesh::run_all()
- 4 tests: mesh creation, empty/populated summary, zero targets
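
For readers without the source at hand, the coordinator and summary described above can be sketched roughly as follows. This is a minimal, std-only illustration, not the code in this commit: OS threads stand in for whatever concurrency the relay actually uses, a plain closure stands in for a ProbeRunner, and `ProbeStats`, `Registry`, and the table layout are hypothetical names and shapes chosen for the example.

```rust
use std::collections::BTreeMap;
use std::net::SocketAddr;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical per-target probe result; the real registry fields are not shown in this commit.
#[derive(Debug, Clone, PartialEq)]
pub struct ProbeStats {
    pub up: bool,
    pub rtt_ms: f64,
    pub loss_pct: f64,
}

/// Hypothetical shared registry keyed by target address.
pub type Registry = Arc<Mutex<BTreeMap<SocketAddr, ProbeStats>>>;

pub struct ProbeMesh {
    targets: Vec<SocketAddr>,
    registry: Registry,
}

impl ProbeMesh {
    pub fn new(targets: Vec<SocketAddr>) -> Self {
        Self { targets, registry: Arc::new(Mutex::new(BTreeMap::new())) }
    }

    /// Spawns one worker per target and waits for all of them to finish.
    /// `probe` is a stand-in for a real probe runner (assumption).
    pub fn run_all<F>(&self, probe: F)
    where
        F: Fn(SocketAddr) -> ProbeStats + Send + Sync + 'static,
    {
        let probe = Arc::new(probe);
        let handles: Vec<_> = self
            .targets
            .iter()
            .copied()
            .map(|addr| {
                let probe = Arc::clone(&probe);
                let registry = Arc::clone(&self.registry);
                thread::spawn(move || {
                    let stats = (*probe)(addr);
                    registry.lock().unwrap().insert(addr, stats);
                })
            })
            .collect();
        for handle in handles {
            handle.join().expect("probe worker panicked");
        }
    }

    /// Scans the registry and formats one table row per target.
    pub fn mesh_summary(&self) -> String {
        let registry = self.registry.lock().unwrap();
        if registry.is_empty() {
            return String::from("no probe targets in registry\n");
        }
        let mut out = format!("{:<24} {:<6} {:>9} {:>8}\n", "TARGET", "STATE", "RTT(ms)", "LOSS(%)");
        for (addr, stats) in registry.iter() {
            let state = if stats.up { "up" } else { "down" };
            out.push_str(&format!(
                "{:<24} {:<6} {:>9.1} {:>8.1}\n",
                addr.to_string(),
                state,
                stats.rtt_ms,
                stats.loss_pct
            ));
        }
        out
    }
}
```

With zero targets, `run_all` spawns nothing and `mesh_summary` returns the single "no probe targets" line, which mirrors the zero-target and empty-summary cases listed in the tests above.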

WZP-P2-T5-S7: Grafana dashboard
- docs/grafana-dashboard.json — importable directly into Grafana
- Row 1: Relay Health (sessions, rooms, packets/s, bytes/s, auth, handshake)
- Row 2: Call Quality (buffer depth, loss%, RTT, underruns per session)
- Row 3: Inter-Relay Mesh (RTT heatmap, loss, jitter, probe up/down)
- Row 4: Web Bridge (connections, frames bridged, auth failures, latency)
- Datasource variable ${DS_PROMETHEUS}, auto-refresh 10s
- Color thresholds: loss 2%/5%, RTT 100ms/300ms, probe up=green/down=red
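
As a rough illustration of how the two-step thresholds above behave, the sketch below classifies a value against a warning and a critical bound. Only the numeric bounds (2%/5% loss, 100 ms/300 ms RTT) come from this commit; the green/yellow/red naming for loss and RTT, the inclusive `>=` comparison, and the `Level`, `classify`, `loss_level`, and `rtt_level` names are assumptions made for the example.

```rust
/// Hypothetical severity levels; the dashboard's actual colors for loss and RTT are assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Level {
    Green,
    Yellow,
    Red,
}

/// Returns Red at or above `crit`, Yellow at or above `warn`, Green otherwise.
pub fn classify(value: f64, warn: f64, crit: f64) -> Level {
    if value >= crit {
        Level::Red
    } else if value >= warn {
        Level::Yellow
    } else {
        Level::Green
    }
}

/// Loss thresholds from the commit message: 2% and 5%.
pub fn loss_level(loss_pct: f64) -> Level {
    classify(loss_pct, 2.0, 5.0)
}

/// RTT thresholds from the commit message: 100 ms and 300 ms.
pub fn rtt_level(rtt_ms: f64) -> Level {
    classify(rtt_ms, 100.0, 300.0)
}
```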

T5 Telemetry & Observability is now COMPLETE (all 7 subtasks).
235 tests passing across all crates.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Siavash Sameni
2026-03-28 13:18:50 +04:00
parent 216ebf4a25
commit a64b79d953
5 changed files with 1105 additions and 15 deletions

@@ -29,6 +29,10 @@ pub struct RelayConfig {
     /// Each target gets a persistent QUIC connection sending 1 Ping/s.
     #[serde(default)]
     pub probe_targets: Vec<SocketAddr>,
+    /// Enable mesh mode: each relay probes all configured targets concurrently.
+    /// Discovery is manual via multiple --probe flags; this flag signals intent.
+    #[serde(default)]
+    pub probe_mesh: bool,
 }
 
 impl Default for RelayConfig {
@@ -43,6 +47,7 @@ impl Default for RelayConfig {
             auth_url: None,
             metrics_port: None,
             probe_targets: Vec::new(),
+            probe_mesh: false,
         }
     }
 }
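
Because the new field defaults to `false`, mesh mode stays opt-in. The sketch below shows this with a trimmed copy of the struct that holds only the fields visible in this diff. The `Option<String>` and `Option<u16>` types are assumptions (the diff shows only `None`), the serde attributes are omitted to keep the example std-only, and `mesh_config` is a hypothetical helper.

```rust
use std::net::SocketAddr;

/// Trimmed stand-in for RelayConfig; field types for auth_url and metrics_port are assumed.
#[derive(Debug, Clone, PartialEq)]
pub struct RelayConfig {
    pub auth_url: Option<String>,
    pub metrics_port: Option<u16>,
    pub probe_targets: Vec<SocketAddr>,
    pub probe_mesh: bool,
}

impl Default for RelayConfig {
    fn default() -> Self {
        Self {
            auth_url: None,
            metrics_port: None,
            probe_targets: Vec::new(),
            probe_mesh: false,
        }
    }
}

/// Hypothetical helper: builds a config with mesh mode enabled for the given targets,
/// leaving every other field at its default via struct update syntax.
pub fn mesh_config(targets: Vec<SocketAddr>) -> RelayConfig {
    RelayConfig {
        probe_targets: targets,
        probe_mesh: true,
        ..RelayConfig::default()
    }
}
```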