feat(hole-punching): advertise peer reflexive addrs in DirectCall flow — Phase 3

Completes the signal-plane plumbing for P2P direct calling: both
peers now learn their own server-reflexive address (Phase 1
Reflect), include it in DirectCallOffer / DirectCallAnswer, and
the relay cross-wires them into each side's CallSetup so the
client knows the OTHER party's direct addr. Dual-path QUIC race
is scaffolded but deferred to Phase 3.5 — this commit ships the
full advertising layer so real-hardware testing can confirm the
addrs flow end-to-end before adding the concurrent-connect logic.

Wire protocol (wzp-proto/src/packet.rs):
- DirectCallOffer gains optional `caller_reflexive_addr`
- DirectCallAnswer gains optional `callee_reflexive_addr`
- CallSetup gains optional `peer_direct_addr`
- All #[serde(default, skip_serializing_if = "Option::is_none")] so
  pre-Phase-3 peers and relays stay backward compatible by
  construction — the new fields are elided from the JSON on the
  wire when None, and older clients parse the JSON ignoring any
  fields they don't know.
- 2 new roundtrip tests (Some + None cases, old-JSON parse-back).

Call registry (wzp-relay/src/call_registry.rs):
- DirectCall gains caller_reflexive_addr + callee_reflexive_addr.
- set_caller_reflexive_addr / set_callee_reflexive_addr setters.
- 2 new unit tests: stores and returns addrs, clearing works.

Relay cross-wiring (wzp-relay/src/main.rs):
- On DirectCallOffer: stash the caller's addr in the registry.
- On DirectCallAnswer: stash the callee's addr (only set by
  AcceptTrusted answers — privacy-mode leaves it None).
- Send two different CallSetup messages: one to the caller with
  peer_direct_addr=callee_addr, and one to the callee with
  peer_direct_addr=caller_addr. The cross-wiring means each side
  gets the OTHER party's direct addr, not its own.
- Logs `p2p_viable=true` when both sides advertised.

Client advertising (desktop/src-tauri/src/lib.rs):
- New `try_reflect_own_addr` helper that reuses the Phase 1
  oneshot pattern WITHOUT holding state.signal.lock() across the
  await (critical: the recv loop reacquires the same mutex to
  fire the oneshot, so holding it would deadlock).
- `place_call` queries reflect first and includes the returned
  addr in DirectCallOffer. Falls back to None on any failure —
  call still proceeds via the relay path.
- `answer_call` queries reflect ONLY on AcceptTrusted so
  AcceptGeneric keeps the callee's IP private by design. Reject
  and AcceptGeneric both pass None.
- recv loop's CallSetup handler destructures and forwards
  peer_direct_addr to the JS layer in the signal-event payload.

Client scaffolding for dual-path (desktop/src-tauri/src/lib.rs +
desktop/src/main.ts):
- `connect` Tauri command gets a new optional `peer_direct_addr`
  argument. Currently LOGS the addr but still uses the relay
  path for the media connection — Phase 3.5 will swap in a
  tokio::select! race between direct dial + relay dial. Scaffolding
  lands here so the JS wire is stable, real-hardware testing can
  confirm advertising works end-to-end, and Phase 3.5 is a pure
  Rust change with no JS touches.
- JS setup handler forwards `data.peer_direct_addr` to invoke.

Back-compat with the CLI client (crates/wzp-client/src/cli.rs):
- CLI test harness updated for the new fields — always passes
  None for both reflex addrs (no hole-punching). Also destructures
  peer_direct_addr: _ in its CallSetup handler.

Tests (8 new, all passing):
- wzp-proto: hole_punching_optional_fields_roundtrip,
  hole_punching_backward_compat_old_json_parses
- wzp-relay call_registry: call_registry_stores_reflexive_addrs,
  call_registry_clearing_reflex_addr_works
- wzp-relay integration: crates/wzp-relay/tests/hole_punching.rs
    * both_peers_advertise_reflex_addrs_cross_wire_in_setup
    * privacy_mode_answer_omits_callee_addr_from_setup
    * pre_phase3_caller_leaves_both_setups_relay_only
    * neither_peer_advertises_both_setups_are_relay_only

Full workspace test goes from 396 → 404 passing.

PRD: .taskmaster/docs/prd_hole_punching.txt
Tasks: 53-60 all completed (58 = scaffolding-only; 3.5 follow-up)

Next up: **Phase 3.5 — dual-path QUIC connect race**. With the
advertising layer live, this becomes a focused change: on
CallSetup-with-peer_direct_addr, start a server-capable dual
endpoint, and tokio::select! across (direct dial, relay dial,
inbound accept). Whichever QUIC handshake completes first wins,
the losers drop, 2s direct timeout falls back to relay.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
Siavash Sameni
2026-04-11 13:37:04 +04:00
parent 8d903f16c6
commit 39277bf3a0
7 changed files with 777 additions and 44 deletions

View File

@@ -766,9 +766,15 @@ async fn main() -> anyhow::Result<()> {
match transport.recv_signal().await {
Ok(Some(msg)) => {
match msg {
SignalMessage::DirectCallOffer { ref target_fingerprint, ref call_id, .. } => {
SignalMessage::DirectCallOffer {
ref target_fingerprint,
ref call_id,
ref caller_reflexive_addr,
..
} => {
let target_fp = target_fingerprint.clone();
let call_id = call_id.clone();
let caller_addr_for_registry = caller_reflexive_addr.clone();
// Check if target is online
let online = {
@@ -783,10 +789,15 @@ async fn main() -> anyhow::Result<()> {
continue;
}
// Create call in registry
// Create call in registry + stash the caller's
// reflex addr (Phase 3 hole-punching). The relay
// treats the addr as opaque — no validation.
// Injected later into the callee's CallSetup as
// peer_direct_addr.
{
let mut reg = call_registry.lock().await;
reg.create_call(call_id.clone(), client_fp.clone(), target_fp.clone());
reg.set_caller_reflexive_addr(&call_id, caller_addr_for_registry);
}
// Forward offer to callee
@@ -803,9 +814,15 @@ async fn main() -> anyhow::Result<()> {
}).await;
}
SignalMessage::DirectCallAnswer { ref call_id, ref accept_mode, .. } => {
SignalMessage::DirectCallAnswer {
ref call_id,
ref accept_mode,
ref callee_reflexive_addr,
..
} => {
let call_id = call_id.clone();
let mode = *accept_mode;
let callee_addr_for_registry = callee_reflexive_addr.clone();
let peer_fp = {
let reg = call_registry.lock().await;
@@ -827,13 +844,30 @@ async fn main() -> anyhow::Result<()> {
reason: wzp_proto::HangupReason::Normal,
}).await;
} else {
// Accept — create private room
// Accept — create private room + stash the
// callee's reflex addr if it advertised one
// (AcceptTrusted only — privacy-mode answers
// leave it None by design). Then read back
// BOTH parties' addrs so we can cross-wire
// peer_direct_addr on the CallSetups below.
let room = format!("call-{call_id}");
{
let (caller_addr, callee_addr) = {
let mut reg = call_registry.lock().await;
reg.set_active(&call_id, mode, room.clone());
}
info!(call_id = %call_id, room = %room, mode = ?mode, "call accepted, creating room");
reg.set_callee_reflexive_addr(&call_id, callee_addr_for_registry);
let call = reg.get(&call_id);
(
call.and_then(|c| c.caller_reflexive_addr.clone()),
call.and_then(|c| c.callee_reflexive_addr.clone()),
)
};
info!(
call_id = %call_id,
room = %room,
?mode,
p2p_viable = caller_addr.is_some() && callee_addr.is_some(),
"call accepted, creating room"
);
// Forward answer to caller
{
@@ -843,25 +877,41 @@ async fn main() -> anyhow::Result<()> {
// Send CallSetup to both parties.
//
// BUG FIX: the previous version of this used `addr.ip()`
// which is `connection.remote_address()` — the CLIENT'S
// IP, not the relay's. So CallSetup told both parties to
// dial the answerer's own IP, which meant the caller was
// sending QUIC Initials into the callee's client (no
// server listening there) and the callee was sending to
// itself. In both cases endpoint.connect() hung forever.
// Each party's `peer_direct_addr` carries the
// OTHER party's reflex addr so they can attempt
// a direct QUIC handshake to each other in
// parallel with the relay path (Phase 3
// hole-punching). Both sides falling back to the
// relay path is the Phase 0 behavior, so
// emitting `None` here is always safe.
//
// Use the relay's precomputed advertised address instead.
// BUG FIX (pre-Phase 3): the previous version of
// this used `addr.ip()` which is the client's
// remote address, not the relay's. Use the
// precomputed advertised address.
let relay_addr_for_setup = advertised_addr_str.clone();
let setup = SignalMessage::CallSetup {
// peer_fp identifies the caller here (the
// fingerprint currently on the other end of this
// answer flow); client_fp identifies the callee.
// So the CALLER gets the callee's addr as its
// peer_direct_addr, and vice versa.
let setup_for_caller = SignalMessage::CallSetup {
call_id: call_id.clone(),
room: room.clone(),
relay_addr: relay_addr_for_setup.clone(),
peer_direct_addr: callee_addr.clone(),
};
let setup_for_callee = SignalMessage::CallSetup {
call_id: call_id.clone(),
room: room.clone(),
relay_addr: relay_addr_for_setup,
peer_direct_addr: caller_addr.clone(),
};
{
let hub = signal_hub.lock().await;
let _ = hub.send_to(&peer_fp, &setup).await;
let _ = hub.send_to(&client_fp, &setup).await;
let _ = hub.send_to(&peer_fp, &setup_for_caller).await;
let _ = hub.send_to(&client_fp, &setup_for_callee).await;
}
}
}