feat: per-session metrics + inter-relay health probe (T5-S2/S5)

WZP-P2-T5-S2: Per-session Prometheus metrics
- 5 new per-session gauges/counters: buffer_depth, loss_pct, rtt_ms,
  underruns, overruns — all labeled by session_id
- update_session_quality() reads QualityReport from packet headers
- update_session_buffer() tracks jitter buffer state per session
- remove_session_metrics() cleans up labels on disconnect
- Delta-aware counter increments avoid double-counting
- 2 tests: session_quality_update, session_metrics_cleanup
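
The delta-aware increment mentioned above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: it assumes the jitter buffer reports cumulative underrun/overrun totals, and `DeltaTracker` and its methods are hypothetical names.

```rust
use std::collections::HashMap;

/// Hypothetical helper: remembers the last cumulative total seen per session
/// so a monotonic counter is advanced only by the difference.
#[derive(Default)]
pub struct DeltaTracker {
    last_seen: HashMap<String, u64>,
}

impl DeltaTracker {
    /// Returns the amount to add to the counter for this report.
    pub fn delta(&mut self, session_id: &str, cumulative: u64) -> u64 {
        // Store the new total and fetch the previous one (0 for a new session).
        let prev = self
            .last_seen
            .insert(session_id.to_string(), cumulative)
            .unwrap_or(0);
        if cumulative >= prev {
            cumulative - prev
        } else {
            // Assumption: a total that went backwards means the source was
            // reset, so the whole new total is counted.
            cumulative
        }
    }

    /// Forgets a session, mirroring label cleanup on disconnect.
    pub fn remove(&mut self, session_id: &str) {
        self.last_seen.remove(session_id);
    }
}
```

Reporting the same total twice yields a delta of 0, which is what prevents double-counting when the same buffer state is observed more than once.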

WZP-P2-T5-S5: Inter-relay health probe
- New probe.rs: ProbeConfig, ProbeMetrics, SlidingWindow, ProbeRunner
- --probe <addr> flag (repeatable) spawns background probe per target
- Sends 1 Ping/s over QUIC, receives Pong, computes RTT/loss/jitter
- SlidingWindow(60): tracks last 60 pings, loss = missed pongs,
  jitter = std deviation of RTT
- Prometheus gauges: wzp_probe_rtt_ms, loss_pct, jitter_ms, up
  with target label
- Probe connections use SNI "_probe" — relay responds with Pong loop,
  skipping auth/handshake
- Auto-reconnect with 5s backoff on disconnect
- 6 tests: metrics_register, rtt/loss/jitter calculation,
  window eviction, empty edge cases
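
A rough sketch of the sliding-window statistics described above, assuming each ping is stored as `Some(rtt_ms)` when its Pong arrived and `None` when it was missed. The field and method names here are illustrative and may not match probe.rs; the jitter uses the population standard deviation, which is also an assumption.

```rust
use std::collections::VecDeque;

/// Illustrative fixed-size window over the most recent ping results.
pub struct SlidingWindow {
    capacity: usize,
    samples: VecDeque<Option<f64>>, // Some(rtt_ms) = pong received, None = missed
}

impl SlidingWindow {
    pub fn new(capacity: usize) -> Self {
        Self { capacity, samples: VecDeque::with_capacity(capacity) }
    }

    /// Records one ping result, evicting the oldest when the window is full.
    pub fn push(&mut self, sample: Option<f64>) {
        if self.capacity == 0 {
            return;
        }
        if self.samples.len() == self.capacity {
            self.samples.pop_front();
        }
        self.samples.push_back(sample);
    }

    pub fn len(&self) -> usize {
        self.samples.len()
    }

    /// Percentage of pings in the window with no pong; 0.0 when empty.
    pub fn loss_pct(&self) -> f64 {
        if self.samples.is_empty() {
            return 0.0;
        }
        let missed = self.samples.iter().filter(|s| s.is_none()).count();
        missed as f64 / self.samples.len() as f64 * 100.0
    }

    /// Mean RTT over received pongs, or None if there are none.
    pub fn mean_rtt_ms(&self) -> Option<f64> {
        let rtts: Vec<f64> = self.samples.iter().flatten().copied().collect();
        if rtts.is_empty() {
            None
        } else {
            Some(rtts.iter().sum::<f64>() / rtts.len() as f64)
        }
    }

    /// Population standard deviation of received RTTs; 0.0 when empty.
    pub fn jitter_ms(&self) -> f64 {
        let mean = match self.mean_rtt_ms() {
            Some(m) => m,
            None => return 0.0,
        };
        let rtts: Vec<f64> = self.samples.iter().flatten().copied().collect();
        let variance =
            rtts.iter().map(|r| (r - mean).powi(2)).sum::<f64>() / rtts.len() as f64;
        variance.sqrt()
    }
}
```

With `SlidingWindow::new(60)` and one ping per second, the statistics cover roughly the last minute of probing.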

231 tests passing across all crates.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Siavash Sameni
2026-03-28 13:09:52 +04:00
parent 39f6908478
commit 216ebf4a25
6 changed files with 650 additions and 4 deletions


@@ -25,6 +25,10 @@ pub struct RelayConfig {
     /// Port for the Prometheus metrics HTTP endpoint (e.g., 9090).
     /// If None, the metrics endpoint is disabled.
     pub metrics_port: Option<u16>,
+    /// Peer relay addresses to probe for health monitoring.
+    /// Each target gets a persistent QUIC connection sending 1 Ping/s.
+    #[serde(default)]
+    pub probe_targets: Vec<SocketAddr>,
 }
 
 impl Default for RelayConfig {
@@ -38,6 +42,7 @@ impl Default for RelayConfig {
             log_level: "info".to_string(),
             auth_url: None,
             metrics_port: None,
+            probe_targets: Vec::new(),
         }
     }
 }
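
For illustration, a repeatable `--probe <addr>` flag could be collected into a `Vec<SocketAddr>` such as `probe_targets` along these lines. This is a std-only sketch: the relay's actual argument parsing is not shown in this diff, and `parse_probe_targets` is a hypothetical helper.

```rust
use std::net::SocketAddr;

/// Hypothetical helper: collects every `--probe <addr>` pair from an argument
/// list into a vector of socket addresses, ignoring unrelated arguments.
pub fn parse_probe_targets<I, S>(args: I) -> Result<Vec<SocketAddr>, String>
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    let mut targets = Vec::new();
    let mut iter = args.into_iter();
    while let Some(arg) = iter.next() {
        if arg.as_ref() == "--probe" {
            // The flag must be followed by an address value.
            let value = iter
                .next()
                .ok_or_else(|| "--probe requires an address".to_string())?;
            let addr: SocketAddr = value
                .as_ref()
                .parse()
                .map_err(|e| format!("invalid probe address {:?}: {}", value.as_ref(), e))?;
            targets.push(addr);
        }
    }
    Ok(targets)
}
```

An empty result corresponds to the `Vec::new()` default, in which case no probes would be configured.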