Siavash Sameni 99c0173590 feat(telemetry): Phase 4 — LossRecoveryUpdate protocol + relay metrics + DebugReporter
Phase 4 lays the telemetry foundation for distinguishing DRED recoveries
from classical PLC in production: a new SignalMessage variant, two new
per-session Prometheus counters on the relay side, and a highlighted
loss-recovery section in the Android DebugReporter.

The periodic emitter (client → relay) and Grafana panel are deferred to
Phase 4b — this commit ships the protocol surface, the relay sink, and
the immediate user-visible debug output. Once 4b lands the full path
(emitter → relay → Prometheus → Grafana), the metrics here will
automatically start receiving data.

Scope decision — why not extend QualityReport instead:
The existing wire-format QualityReport is a fixed 4-byte media packet
trailer. Adding counter fields to it would shift the binary layout and
break backward compatibility (old receivers would parse the last 4
bytes of the extended trailer as QR, corrupting audio). Using a
new SignalMessage variant on the reliable QUIC signal stream sidesteps
the wire-format problem entirely — serde JSON enums tolerate unknown
variants gracefully on old receivers, and the signal channel is the
right layer for periodic telemetry aggregates.

Changes:

  wzp-proto/src/packet.rs:
    - New SignalMessage::LossRecoveryUpdate variant carrying:
        * dred_reconstructions: u64 (monotonic since call start)
        * classical_plc_invocations: u64 (monotonic)
        * frames_decoded: u64 (for rate calculation)
    - All three fields tagged #[serde(default)] for forward compat.

  wzp-client/src/featherchat.rs:
    - Added a match arm so signal_to_call_type() handles the new
      variant (treat as Offer for featherChat bridging purposes).

  wzp-relay/src/metrics.rs:
    - Two new IntCounterVec metrics on the relay, labeled by session_id:
        * wzp_relay_session_dred_reconstructions_total
        * wzp_relay_session_classical_plc_total
    - New method update_session_loss_recovery(session_id, dred, plc)
      applies monotonic deltas: if the incoming totals exceed the
      current counter, the difference is inc_by'd. If the incoming
      totals are LOWER (client restart or counter reset), the
      Prometheus counter holds steady until the client catches up.
      This matches the existing update_session_buffer delta pattern.
    - remove_session_metrics() now cleans up the two new labels.
    - New test session_loss_recovery_monotonic_delta exercises:
        * initial population (10 DRED, 2 PLC)
        * forward advance (25, 5 → delta +15, +3)
        * lower values ignored (client reset → counters unchanged)
        * client catches up (30, 8 → advances to new max)
    - Existing session_metrics_cleanup test extended to cover the
      new counters.

  android/app/src/main/java/com/wzp/debug/DebugReporter.kt:
    - Phase 4 users — and incident responders — need to quickly see
      whether DRED is actually firing during a call. The stats JSON
      already carries the counters (after Phase 3c), but they were
      buried in the trailing JSON dump. Added a dedicated
      "=== Loss Recovery ===" section to the meta preamble that
      extracts dred_reconstructions, classical_plc_invocations,
      frames_decoded, and fec_recovered from the JSON and displays
      them plainly, plus computed percentages when frames_decoded > 0.
    - New extractLongField helper: tiny hand-rolled JSON integer
      extractor. We don't want to pull in a full JSON parser for this
      single use case and CallStats has a flat, well-known schema.

Verification:
- cargo check --workspace: zero errors
- cargo test -p wzp-proto --lib: 63 passing
- cargo test -p wzp-codec --lib: 68 passing
- cargo test -p wzp-client --lib: 35 passing (+1 ignored probe)
- cargo test -p wzp-relay --lib: 68 passing (+1 new Phase 4 test)
- cargo check -p wzp-android --lib: zero errors
- Android APK build verified earlier today (unridden-alfonso.apk
  via the remote Docker builder) — Phase 0–3c confirmed to compile
  end-to-end on the NDK target.

Phase 4b remaining (not blocking this commit):
- Periodic LossRecoveryUpdate emitter in wzp-client/src/call.rs and
  wzp-android/src/engine.rs (every ~5 s)
- Relay-side handler in main.rs that matches the new variant and
  calls metrics.update_session_loss_recovery
- Grafana "Loss recovery breakdown" panel in docs/grafana-dashboard.json

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 19:21:04 +04:00

WarzonePhone

Custom lossy VoIP protocol built in Rust. E2E encrypted, FEC-protected, adaptive quality, designed for hostile network conditions.

Quick Start

# Build
cargo build --release

# Run relay
./target/release/wzp-relay --listen 0.0.0.0:4433

# Send a test tone
./target/release/wzp-client --send-tone 5 relay-addr:4433

# Web bridge (browser calls)
./target/release/wzp-web --port 8080 --relay 127.0.0.1:4433 --tls
# Open https://localhost:8080/room-name in two browser tabs

Architecture

See docs/ARCHITECTURE.md for the full system architecture with Mermaid diagrams covering:

  • System overview and data flow
  • Crate dependency graph (8 crates)
  • Wire formats (MediaHeader, MiniHeader, TrunkFrame, SignalMessage)
  • Cryptographic handshake (X25519 + Ed25519 + ChaCha20-Poly1305)
  • Identity model (BIP39 seed, featherChat compatible)
  • Quality profiles (GOOD/DEGRADED/CATASTROPHIC)
  • FEC protection (RaptorQ with interleaving)
  • Adaptive jitter buffer (NetEq-inspired)
  • Telemetry stack (Prometheus + Grafana)
  • Deployment topology

Features

  • 3 quality tiers: Opus 24k (28.8 kbps) / Opus 6k (9 kbps) / Codec2 1200 (2.4 kbps)
  • RaptorQ FEC: Recovers from 20-100% packet loss depending on tier
  • E2E encryption: ChaCha20-Poly1305 with X25519 key exchange
  • Adaptive jitter buffer: EMA-based playout delay tracking
  • Silence suppression: VAD + comfort noise (~50% bandwidth savings)
  • ML noise removal: RNNoise (nnnoiseless pure Rust port)
  • Mini-frames: 67% header compression for steady-state packets
  • Trunking: Multiplex sessions into batched datagrams
  • featherChat integration: Shared BIP39 identity, token auth, call signaling
  • Prometheus metrics: Relay, web bridge, inter-relay probes
  • Grafana dashboard: Pre-built JSON with 18 panels

Documentation

Document Description
ARCHITECTURE.md Full system architecture with diagrams
TELEMETRY.md Prometheus metrics specification
INTEGRATION_TASKS.md featherChat integration tracker
WZP-FC-SHARED-CRATES.md Shared crate strategy
grafana-dashboard.json Importable Grafana dashboard

Binaries

Binary Description
wzp-relay Relay daemon (SFU room mode, forward mode, probes)
wzp-client CLI client (send-tone, record, live mic, echo-test, drift-test, sweep)
wzp-web Browser bridge (HTTPS + WebSocket + AudioWorklet)
wzp-bench Component benchmarks

Linux Build

./scripts/build-linux.sh --prepare   # Create Hetzner VM + install deps
./scripts/build-linux.sh --build     # Build release binaries
./scripts/build-linux.sh --transfer  # Download to target/linux-x86_64/
./scripts/build-linux.sh --destroy   # Delete VM

Tests

cargo test --workspace   # 272 tests

License

MIT OR Apache-2.0

Description
No description provided
Readme 147 MiB
Languages
Rust 78%
Kotlin 7.9%
Shell 6.7%
TypeScript 3.2%
C++ 1.5%
Other 2.6%