fix(p2p): skip direct P2P when peers are on different public IPs

Race condition: when two phones are on different networks (WiFi
vs LTE, home vs office, etc.), each side's dual-path race runs
independently. One side may pick Direct while the other picks
Relay, so each sends media on a path the other is not reading:
TX > 0 and RX: 0 on both sides, a completely silent call.

Root cause: the dual-path race doesn't have a negotiation step.
Each side picks the first transport that completes a QUIC
handshake, which may be a different path than the other side
picked. On same-LAN this doesn't matter because direct always
wins on both (the 500ms relay delay guarantees it). On cross-
network, the asymmetry bites.
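
To make the asymmetry concrete, here is a toy model of one side's
race. It is a sketch under assumptions, not the real dial code:
`race_winner`, `Path`, the millisecond inputs, and the tie-break
are all hypothetical, and the 500ms figure is modelled as a delay
before the relay dial starts.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Path {
    Direct,
    Relay,
}

/// Assumed head start given to the direct dial, in milliseconds.
const RELAY_DELAY_MS: u64 = 500;

/// Hypothetical model of ONE side's race: pick whichever handshake
/// finishes first. `None` means the direct handshake never completes.
/// Ties go to Direct (an arbitrary choice for this sketch).
fn race_winner(direct_handshake_ms: Option<u64>, relay_handshake_ms: u64) -> Path {
    let relay_done_at = RELAY_DELAY_MS + relay_handshake_ms;
    match direct_handshake_ms {
        Some(direct_done_at) if direct_done_at <= relay_done_at => Path::Direct,
        _ => Path::Relay,
    }
}
```

In this model, a side whose direct handshake completes in 80ms
picks Direct, while a peer whose direct handshake never completes
picks Relay. Nothing reconciles the two results.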

Heuristic fix: compare own_reflex_addr IP to peer_reflex_addr
IP. If they're different → different networks → force relay-only
(set role = None, which skips the dual-path race entirely).

Same public IP means same LAN / same NAT:
  → LAN host candidates work, direct always wins on both sides
  → Safe for P2P

Different public IPs means cross-network:
  → Direct may work on one side but not the other
  → Relay is the safe choice for both
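
The comparison can be sketched as a standalone function. This is
an illustration only: `same_public_ip` is a hypothetical helper
name (the diff inlines the check in `connect`), and the addresses
in the usage note are documentation-range examples.

```rust
use std::net::SocketAddr;

/// Hypothetical helper: true only when both reflexive addresses
/// parse as socket addresses and share the same IP. Ports are
/// ignored, since two peers behind one NAT get different ports.
fn same_public_ip(own_reflex: Option<&str>, peer_reflex: Option<&str>) -> bool {
    let own = own_reflex.and_then(|s| s.parse::<SocketAddr>().ok());
    let peer = peer_reflex.and_then(|s| s.parse::<SocketAddr>().ok());
    match (own, peer) {
        (Some(own), Some(peer)) => own.ip() == peer.ip(),
        // A missing or unparseable address is treated as
        // cross-network, which falls back to relay-only.
        _ => false,
    }
}
```

For example, `203.0.113.7:40000` and `203.0.113.7:51000` compare
as the same public IP, while `203.0.113.7:40000` and
`198.51.100.9:40000` do not. An absent address on either side
also yields `false`, so the fallback is always the relay.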

This preserves the proven same-LAN P2P and eliminates the broken
cross-network case. The full fix is ICE-style path negotiation
(Phase 6) where both sides exchange connectivity check results
through the signal plane and agree on a winner before committing
media — but that's a 500+ line protocol change.
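
The property such a negotiation needs can be shown in a few
lines. This is not the Phase 6 design, only a minimal sketch of
the agreement rule, assuming each side has already learned the
other's direct-check result over the signal plane. `agree_on_path`
and `Path` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Path {
    Direct,
    Relay,
}

/// Hypothetical agreement rule: use Direct only if BOTH sides
/// report a successful direct connectivity check. The rule is
/// symmetric in its arguments, so both peers reach the same
/// answer from the same pair of results.
fn agree_on_path(local_direct_ok: bool, remote_direct_ok: bool) -> Path {
    if local_direct_ok && remote_direct_ok {
        Path::Direct
    } else {
        Path::Relay
    }
}
```

Because swapping the arguments never changes the result, the
"one side Direct, other side Relay" outcome cannot occur once
both results have been exchanged.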

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Siavash Sameni
2026-04-12 09:50:56 +04:00
parent 0a973b234b
commit de007ec2fd

@@ -366,11 +366,59 @@ async fn connect(
.as_deref()
.and_then(|s| s.parse().ok());
let relay_addr_parsed: Option<std::net::SocketAddr> = relay.parse().ok();
-let role = wzp_client::reflect::determine_role(
+let mut role = wzp_client::reflect::determine_role(
own_reflex_addr.as_deref(),
peer_direct_addr.as_deref(),
);
// Phase 5.6 safety heuristic: only attempt P2P direct when
// both peers are behind the SAME public IP (= same LAN / same
// NAT). When they have DIFFERENT public IPs they're on
// different networks, and the current dual-path race has a
// race condition: one side may pick Direct while the other
// picks Relay, sending media to different places (TX > 0,
// RX: 0 on both — the screenshots that triggered this fix).
//
// Same public IP means:
// - Same LAN behind the same router → LAN candidates work
// - Same WAN NAT → reflex addr from the relay is the same
// endpoint for both peers, direct dial should work
//
// Different public IPs means:
// - Cross-network (WiFi ↔ LTE, home ↔ office, etc.)
// - Direct P2P requires symmetric hole-punching which the
// current architecture can't guarantee both sides agree on
// - Relay-only is the safe choice until ICE negotiation is
// implemented (Phase 6)
//
// This heuristic preserves same-LAN P2P (proven working) and
// eliminates the broken cross-network case. The full fix is
// ICE-style path negotiation where both sides agree on the
// winner before committing media.
if let Some(ref r) = role {
let same_public_ip = match (
own_reflex_addr.as_deref().and_then(|s| s.parse::<std::net::SocketAddr>().ok()),
peer_addr_parsed,
) {
(Some(own), Some(peer)) => own.ip() == peer.ip(),
_ => false,
};
if !same_public_ip {
tracing::info!(
?r,
own = ?own_reflex_addr,
peer = ?peer_direct_addr,
"connect: different public IPs → skipping P2P direct (relay-only until ICE negotiation is implemented)"
);
emit_call_debug(&app, "connect:cross_network_relay_only", serde_json::json!({
"own_reflex": own_reflex_addr,
"peer_reflex": peer_direct_addr,
"reason": "different public IPs — both sides must use relay to avoid path mismatch",
}));
role = None; // forces relay-only path
}
}
// Phase 5.5: build the full peer candidate bundle (reflex +
// LAN hosts). The dial_order helper will fan them out in
// priority order for the D-role race.