Commit Graph

2 Commits

Author SHA1 Message Date
Siavash Sameni
59ce52f8e8 feat(p2p): Phase 3.5 dual-path QUIC race + GUI call-flow debug logs
Two features in one commit because they ship and test together:
Phase 3.5 closes the hole-punching loop and the call-flow debug
logs give the user live visibility into every step of a call so
real-hardware testing of the new P2P path is debuggable.

## Phase 3.5 — dual-path QUIC connect race

Completes the hole-punching work Phase 3 scaffolded. On receiving
a CallSetup with peer_direct_addr, the client now actually races a
direct QUIC handshake against the relay dial and uses whichever
completes first. Symmetric role assignment avoids the two-conns-
per-call problem:

- Both peers compare `own_reflex_addr` vs `peer_reflex_addr`
  lexicographically.
- Smaller addr → **Acceptor** (A-role): builds a server-capable
  dual endpoint, awaits an incoming QUIC session. Does NOT dial.
- Larger addr → **Dialer** (D-role): builds a client-only
  endpoint, dials the peer's addr with `call-<id>` SNI. Does NOT
  listen.
- Both sides always dial the relay in parallel as fallback.
- `tokio::select!` with `biased` preference for direct, `tokio::pin!`
  so each branch can await the losing opposite as fallback.
- Direct timeout 2s, relay fallback timeout 5s (so 7s worst case
  from CallSetup to "no media path" error).

New crate module `wzp_client::dual_path::{race, WinningPath}`
(moved here from desktop/src-tauri so it's testable from a
workspace test). `determine_role` in `wzp_client::reflect` is
pure-function and unit-tested.

### CallEngine integration
- New `pre_connected_transport: Option<Arc<QuinnTransport>>` arg
  on both android + desktop `CallEngine::start` branches. Skips
  the internal wzp_transport::connect step when Some. Backward-
  compat: None keeps Phase 0 relay-only behavior.
- `connect` Tauri command reads own_reflex_addr from SignalState,
  computes role, runs the race, passes the winning transport
  into CallEngine. If ANY input is missing (no peer addr, no own
  addr, equal addrs), falls back to classic relay path —
  identical to pre-Phase-3.5 behavior.

### Tests (9 new, all passing)
- 6 unit tests for `determine_role` truth table in
  `wzp-client/src/reflect.rs` (smaller=Acceptor, larger=Dialer,
  port-only diff, equal, missing-side, symmetry)
- 3 integration tests in `crates/wzp-client/tests/dual_path.rs`:
    * `dual_path_direct_wins_on_loopback` — two-endpoint test
      rig, Dialer wins direct path vs loopback mock relay
    * `dual_path_relay_wins_when_direct_is_dead` — dead peer
      port, 2s direct timeout, relay fallback wins
    * `dual_path_errors_cleanly_when_both_paths_dead` — <10s
      error, no hang

## GUI call-flow debug logs

Runtime-toggled structured events at every step of a call so the
user can see where a call progressed or stalled on real hardware.
Modeled on the existing DRED_VERBOSE_LOGS pattern.

### Rust side
- `static CALL_DEBUG_LOGS: AtomicBool` + `emit_call_debug(&app,
  step, details)` helper. Always logs via `tracing::info!`
  (logcat always has a copy); GUI Tauri `call-debug-log` event
  only fires when the flag is on.
- Tauri commands `set_call_debug_logs` / `get_call_debug_logs`.

### Instrumented steps (24 emit_call_debug sites)
- `register_signal`: start, identity loaded, endpoint created,
  connect failed/ok, RegisterPresence sent, ack received/failed,
  recv loop spawning
- Recv loop: CallRinging, DirectCallOffer (w/ caller_reflexive_addr),
  DirectCallAnswer (w/ callee_reflexive_addr), CallSetup (w/
  peer_direct_addr), Hangup
- `place_call`: start, reflect query start/ok/none, offer sent,
  send failed
- `answer_call`: start, reflect query start/ok/none or privacy
  skip, answer sent, send failed
- `connect`: start, dual_path_race_start (w/ role), won (w/
  path), failed, skipped (w/ reasons), call_engine_starting/
  started/failed

### JS side
- New `callDebugLogs: boolean` field on Settings type.
- Boot-time hydrate of the Rust flag from localStorage so the
  choice survives restarts (like `dredDebugLogs`).
- Settings panel: new "Call flow debug logs" checkbox alongside
  the DRED toggle.
- New "Call Debug Log" section that ONLY shows when the flag is
  on. Rolling in-memory buffer of the last 200 events, rendered
  as monospace `HH:MM:SS.mmm step {details}` lines with auto-
  scroll and a Clear button.
- `listen("call-debug-log", ...)` subscribed at app startup,
  appends to the buffer, re-renders on every event.

Full workspace test goes from 404 → 413 passing. Clippy clean
on touched crates.

PRD: .taskmaster/docs/prd_phase35_dual_path_race.txt
Tasks: 61-69 all completed

Next: APK + desktop build carrying everything — Phase 2 NAT
detect, Phase 3 advertising, Phase 3.5 dual-path + call debug
logs, plus the earlier Android first-join diagnostics — so the
user can validate the P2P path on real hardware with live
per-step visibility into where any failures happen.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 14:06:44 +04:00
Siavash Sameni
8d903f16c6 feat(reflect): multi-relay NAT type detection — Phase 2
Builds on Phase 1's SignalMessage::Reflect to probe N relays in
parallel through transient QUIC connections and classify the
client's NAT type for the future P2P hole-punching path. No wire
protocol changes — Phase 1's Reflect/ReflectResponse pair is
reused unchanged.

New client-side module (crates/wzp-client/src/reflect.rs):
- probe_reflect_addr(relay, timeout_ms): opens a throwaway
  quinn::Endpoint (fresh ephemeral source port per probe,
  essential for NAT-type detection — sharing one endpoint would
  make a symmetric NAT look like a cone NAT), connects to _signal,
  sends RegisterPresence with zero identity, consumes the Ack,
  sends Reflect, awaits ReflectResponse, cleanly closes.
- detect_nat_type(relays, timeout_ms): parallel probes via
  tokio::task::JoinSet (bounded by slowest probe not sum) and
  returns a NatDetection with per-probe results + aggregate
  classification.
- classify_nat(probes): pure-function classifier split out for
  network-free unit tests. Rules:
    * 0-1 successful probes              → Unknown
    * 2+ successes, same ip same port    → Cone (P2P viable)
    * 2+ successes, same ip diff ports   → SymmetricPort (relay)
    * 2+ successes, different ips        → Multiple (treat as
                                             symmetric)

Tauri command (desktop/src-tauri/src/lib.rs):
- detect_nat_type({ relays: [{ name, address }] }) -> NatDetection
  as JSON. Takes the relay list from JS because localStorage
  owns the config. Parse-up-front so a malformed entry fails
  clean instead of as a probe error. 1500ms per-probe timeout.

UI (desktop/index.html + src/main.ts):
- New "NAT type" row + "Detect NAT" button in the Network
  settings section. Renders per-probe status (name, address,
  observed addr, latency, or error) plus the colored verdict:
    * green  Cone — shows consensus addr
    * amber  SymmetricPort / Multiple — must relay
    * gray   Unknown — not enough data

Tests:
- 7 unit tests in wzp-client/src/reflect.rs covering every
  classifier branch (empty, 1 success, 2 identical, 2 diff ports,
  2 diff ips, success+failure mix, pure-failure).
- 3 integration tests in crates/wzp-relay/tests/multi_reflect.rs:
    * probe_reflect_addr_happy_path — single mock relay end-to-end
    * detect_nat_type_two_loopback_relays_is_cone — two concurrent
      relays, asserts both see 127.0.0.1 and classifier returns
      Cone or SymmetricPort (accepted because the test harness
      uses fresh ephemeral ports per probe which look like
      SymmetricPort on single-host loopback)
    * detect_nat_type_dead_relay_is_unknown — alive + dead port
      mix, asserts the dead probe surfaces an error string and
      the aggregator returns Unknown (only 1 success)

Full workspace test goes from 386 → 396 passing.

PRD: .taskmaster/docs/prd_multi_relay_reflect.txt
Tasks: 47-52 all completed

Next up: hole-punching (Phase 3) — use the reflected address in
DirectCallOffer/Answer and CallSetup so peers attempt a direct
QUIC handshake to each other, with relay fallback on timeout.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 12:47:12 +04:00