feat(p2p): Phase 5.5 — ICE LAN host candidates (IPv4 + IPv6)

Same-LAN P2P was failing because MikroTik masquerade (like most
consumer NATs) doesn't support NAT hairpinning — the advertised
WAN reflex addr is unreachable from a peer on the same LAN as
the advertiser. Phase 5 got us Cone NAT classification and fixed
the measurement artifact, but same-LAN direct dials still had
nowhere to land.

Phase 5.5 adds ICE-style host candidates: each client enumerates
its LAN-local network interface addresses, includes them in the
DirectCallOffer/Answer alongside the reflex addr, and the
dual-path race fans out to ALL peer candidates in parallel.
Same-LAN peers find each other via their RFC1918 IPv4 + ULA /
global-unicast IPv6 addresses without touching the NAT at all.

Dual-stack IPv6 is in scope from the start — on modern ISPs
(including Starlink) the v6 path often works even when v4
hairpinning doesn't, because there's no NAT on the v6 side.

## Changes

### `wzp_client::reflect::local_host_candidates(port)` (new)

Enumerates network interfaces via `if-addrs` and returns
SocketAddrs paired with the caller's port. Filters:

- IPv4: RFC1918 (10/8, 172.16/12, 192.168/16) + CGNAT (100.64/10)
- IPv6: global unicast (2000::/3) + ULA (fc00::/7)
- Skipped: loopback, link-local (169.254, fe80::), public v4
  (already covered by reflex-addr), unspecified

Safe from any thread, one `getifaddrs(3)` syscall.

### Wire protocol (wzp-proto/packet.rs)

Three new `#[serde(default, skip_serializing_if = "Vec::is_empty")]`
fields, backward-compat with pre-5.5 clients/relays by
construction:

- `DirectCallOffer.caller_local_addrs: Vec<String>`
- `DirectCallAnswer.callee_local_addrs: Vec<String>`
- `CallSetup.peer_local_addrs: Vec<String>`

### Call registry (wzp-relay/call_registry.rs)

`DirectCall` gains `caller_local_addrs` + `callee_local_addrs`
Vec<String> fields. New `set_caller_local_addrs` /
`set_callee_local_addrs` setters. Follow the same pattern as
the reflex addr fields.

### Relay cross-wiring (wzp-relay/main.rs)

Both the local-call and cross-relay-federation paths now track
the local_addrs through the registry and inject them into the
CallSetup's peer_local_addrs. Cross-wiring is identical to the
existing peer_direct_addr logic — each party's CallSetup
carries the OTHER party's LAN candidates.

### Client side (desktop/src-tauri/lib.rs)

- `place_call`: gathers local host candidates via
  `local_host_candidates(signal_endpoint.local_addr().port())`
  and includes them in `DirectCallOffer.caller_local_addrs`.
  The port match is critical — it's the Phase 5 shared signal
  socket, so incoming dials to these addrs land on the same
  endpoint that's already listening.
- `answer_call`: same, AcceptTrusted only (privacy mode keeps
  LAN addrs hidden too, for consistency with the reflex addr).
- `connect` Tauri command: new `peer_local_addrs: Vec<String>`
  arg. Builds a `PeerCandidates` bundle and passes it to the
  dual-path race.
- Recv loop's CallSetup handler: destructures + forwards the
  new field to JS via the signal-event payload.

### `dual_path::race` (wzp-client/dual_path.rs)

Signature change: takes `PeerCandidates` (reflex + local Vec)
instead of a single SocketAddr. The D-role branch now fans out
N parallel dials via `tokio::task::JoinSet` — one per candidate
— and the first successful dial wins (losers are aborted
immediately via `set.abort_all()`). Only when ALL candidates
have failed do we return Err; individual candidate failures are
just traced at debug level and the race waits for the others.

LAN host candidates are tried BEFORE the reflex addr in
`PeerCandidates::dial_order()` — they're faster when they work,
and the reflex addr is the fallback for the not-on-same-LAN
case.

### JS side (desktop/main.ts)

`connect` invoke now passes `peerLocalAddrs: data.peer_local_addrs ?? []`
alongside the existing `peerDirectAddr`.

### Tests

All existing test callsites updated for the new Vec<String>
fields (defaults to Vec::new() in tests — they don't exercise
the multi-candidate path). `dual_path.rs` integration tests
wrap the single `dead_peer` / `acceptor_listen_addr` in a
`PeerCandidates { reflexive: Some(_), local: Vec::new() }`.

Full workspace test: 423 passing (same as before 5.5).

## Expected behavior on the reporter's setup

Two phones behind MikroTik, both on the same LAN:

  place_call:host_candidates {"local_addrs": ["192.168.88.21:XXX", "2001:...:YY:XXX"]}
  recv:DirectCallAnswer {"callee_local_addrs": ["192.168.88.22:ZZZ", "2001:...:WW:ZZZ"]}
  recv:CallSetup {"peer_direct_addr":"150.228.49.65:NN",
                  "peer_local_addrs":["192.168.88.22:ZZZ","2001:...:WW:ZZZ"]}
  connect:dual_path_race_start {"peer_reflex":"...","peer_local":[...]}
  dual_path: direct dial succeeded on candidate 0   ← LAN v4 wins
  connect:dual_path_race_won {"path":"Direct"}

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
Siavash Sameni
2026-04-12 07:34:49 +04:00
parent 8990514417
commit fa038df057
13 changed files with 463 additions and 82 deletions

View File

@@ -24,6 +24,12 @@ chrono = "0.4"
rustls = { version = "0.23", default-features = false, features = ["ring", "std"] }
cpal = { version = "0.15", optional = true }
libc = "0.2"
# Phase 5.5 — LAN host-candidate ICE: enumerate local network
# interface addresses for inclusion in DirectCallOffer/Answer so
# peers on the same LAN can direct-connect without NAT hairpinning
# through the WAN reflex addr (which many consumer NATs, including
# MikroTik's default masquerade, don't support).
if-addrs = "0.13"
# coreaudio-rs is Apple-framework-only; gate it to macOS so enabling
# the `vpio` feature from a non-macOS target builds cleanly instead of

View File

@@ -773,6 +773,7 @@ async fn run_signal_mode(
// CLI client doesn't attempt hole-punching; always
// relay-path.
caller_reflexive_addr: None,
caller_local_addrs: Vec::new(),
}).await?;
}
@@ -805,12 +806,13 @@ async fn run_signal_mode(
// CLI auto-accept uses generic (privacy) mode,
// so callee addr stays hidden from the caller.
callee_reflexive_addr: None,
callee_local_addrs: Vec::new(),
}).await;
}
SignalMessage::DirectCallAnswer { call_id, accept_mode, .. } => {
info!(call_id = %call_id, mode = ?accept_mode, "call answered");
}
SignalMessage::CallSetup { call_id, room, relay_addr: setup_relay, peer_direct_addr: _ } => {
SignalMessage::CallSetup { call_id, room, relay_addr: setup_relay, peer_direct_addr: _, peer_local_addrs: _ } => {
info!(call_id = %call_id, room = %room, relay = %setup_relay, "call setup — connecting to media room");
// Connect to the media room

View File

@@ -52,28 +52,66 @@ pub enum WinningPath {
/// genuinely fail (network partition). Returns
/// `Err(anyhow::anyhow!(...))` if both paths fail within the
/// timeout.
/// Phase 5.5 candidate bundle — full ICE-ish candidate list for
/// the peer. The race tries them all in parallel alongside the
/// relay path. At minimum this should contain the peer's
/// server-reflexive address; `local_addrs` carries LAN host
/// candidates gathered from their physical interfaces.
///
/// Empty is valid: the D-role has nothing to dial and the race
/// reduces to "relay only" + (if A-role) accepting on the
/// shared endpoint.
#[derive(Debug, Clone, Default)]
pub struct PeerCandidates {
/// Peer's server-reflexive address (Phase 3). `None` if the
/// peer didn't advertise one.
pub reflexive: Option<SocketAddr>,
/// Peer's LAN host addresses (Phase 5.5). Tried first on
/// same-LAN pairs — direct dials to these bypass the NAT
/// entirely.
pub local: Vec<SocketAddr>,
}
impl PeerCandidates {
/// Flatten into the list of addrs the D-role should dial.
/// Order: LAN host candidates first (fastest when they
/// work), then reflexive (covers the non-LAN case).
pub fn dial_order(&self) -> Vec<SocketAddr> {
let mut out = Vec::with_capacity(self.local.len() + 1);
out.extend(self.local.iter().copied());
if let Some(a) = self.reflexive {
// Only add if it's not already in the list (some
// edge cases on same-LAN could have the same addr
// in both).
if !out.contains(&a) {
out.push(a);
}
}
out
}
/// Is there anything for the D-role to dial? If not, the
/// race reduces to relay-only.
pub fn is_empty(&self) -> bool {
self.reflexive.is_none() && self.local.is_empty()
}
}
#[allow(clippy::too_many_arguments)]
pub async fn race(
role: Role,
peer_direct_addr: SocketAddr,
peer_candidates: PeerCandidates,
relay_addr: SocketAddr,
room_sni: String,
call_sni: String,
// Phase 5: when `Some`, reuse this endpoint for BOTH the
// direct-path branch AND the relay dial. This is critical
// for hole-punching through port-preserving NATs — the
// advertised reflex addr only matches what peers can dial if
// the listening socket is the SAME one that registered with
// the relay. Pass the signal endpoint here.
// direct-path branch AND the relay dial. Pass the signal
// endpoint. The endpoint MUST be server-capable (created
// with a server config) for the A-role accept branch to
// work.
//
// The endpoint MUST have been created with a server config
// (`create_endpoint(bind, Some(server_config()))`) if the
// A-role branch is going to run, otherwise `accept()` will
// return None immediately.
//
// When `None`, falls back to the pre-Phase-5 behavior of
// creating fresh endpoints per role. Used by tests and by
// paths where we're not registered to a relay.
// When `None`, falls back to fresh endpoints per role.
// Used by tests.
shared_endpoint: Option<wzp_transport::Endpoint>,
) -> anyhow::Result<(Arc<QuinnTransport>, WinningPath)> {
// Rustls provider must be installed before any quinn endpoint
@@ -81,9 +119,22 @@ pub async fn race(
let _ = rustls::crypto::ring::default_provider().install_default();
// Build the direct-path endpoint + future based on role.
// Each future returns an already-wrapped `QuinnTransport` so we
// don't need a direct `quinn::Connection` type in scope here
// (this crate doesn't depend on quinn directly).
//
// A-role: one accept future on the shared endpoint. The
// first incoming QUIC connection wins — we don't care
// which peer candidate the dialer used to reach us.
//
// D-role: N parallel dial futures, one per peer candidate
// (all LAN host addrs + the reflex addr), consolidated
// into a single direct_fut via FuturesUnordered-style
// "first OK wins" semantics. The first successful dial
// becomes the direct path; the losers are dropped (quinn
// will abort the in-flight handshakes via the dropped
// Connecting futures).
//
// Either way, direct_fut resolves to a single QuinnTransport
// (or an error) and is raced against the relay_fut by the
// outer tokio::select!.
let direct_ep: wzp_transport::Endpoint;
let direct_fut: std::pin::Pin<
Box<dyn std::future::Future<Output = anyhow::Result<QuinnTransport>> + Send>,
@@ -113,15 +164,12 @@ pub async fn race(
let ep_for_fut = ep.clone();
direct_fut = Box::pin(async move {
// `wzp_transport::accept` wraps the same
// `endpoint.accept().await?.await?` dance we want
// and maps errors into TransportError for us.
//
// `endpoint.accept().await?.await?` dance we want.
// If `ep_for_fut` is the shared signal endpoint,
// this accept pulls the NEXT incoming connection
// normally that's the peer's direct-P2P dial.
// Signal recv is done via the existing signal
// CONNECTION (accept_bi), not the endpoint, so
// there's no conflict.
// this pulls the NEXT incoming connection
// normally that's the peer's direct-P2P dial.
// Signal recv is done via the signal CONNECTION
// (accept_bi), not the endpoint, so no conflict.
let conn = wzp_transport::accept(&ep_for_fut)
.await
.map_err(|e| anyhow::anyhow!("direct accept: {e}"))?;
@@ -134,8 +182,8 @@ pub async fn race(
Some(ep) => {
tracing::info!(
local_addr = ?ep.local_addr().ok(),
%peer_direct_addr,
"dual_path: D-role reusing shared endpoint to dial peer"
candidates = ?peer_candidates.dial_order(),
"dual_path: D-role reusing shared endpoint to dial peer candidates"
);
ep
}
@@ -144,21 +192,86 @@ pub async fn race(
let fresh = wzp_transport::create_endpoint(bind, None)?;
tracing::info!(
local_addr = ?fresh.local_addr().ok(),
%peer_direct_addr,
"dual_path: D-role fresh endpoint up, dialing peer"
candidates = ?peer_candidates.dial_order(),
"dual_path: D-role fresh endpoint up, dialing peer candidates"
);
fresh
}
};
let ep_for_fut = ep.clone();
let client_cfg = wzp_transport::client_config();
let dial_order = peer_candidates.dial_order();
let sni = call_sni.clone();
direct_fut = Box::pin(async move {
let conn =
wzp_transport::connect(&ep_for_fut, peer_direct_addr, &sni, client_cfg)
.await
.map_err(|e| anyhow::anyhow!("direct dial: {e}"))?;
Ok(QuinnTransport::new(conn))
if dial_order.is_empty() {
// No candidates — the race reduces to
// relay-only. Surface a stable error so the
// outer select falls through to relay_fut
// without a spurious "direct failed" warning.
// Use a pending future that never resolves so
// the select's "other side wins" branch is
// the natural outcome.
std::future::pending::<anyhow::Result<QuinnTransport>>().await
} else {
// Fan out N parallel dials via JoinSet. First
// `Ok` wins; `Err` from a single candidate is
// not fatal — we wait for the others. Only
// when ALL have failed do we return Err.
let mut set = tokio::task::JoinSet::new();
for (idx, candidate) in dial_order.iter().enumerate() {
let ep = ep_for_fut.clone();
let client_cfg = wzp_transport::client_config();
let sni = sni.clone();
let candidate = *candidate;
set.spawn(async move {
let result = wzp_transport::connect(
&ep,
candidate,
&sni,
client_cfg,
)
.await;
(idx, candidate, result)
});
}
let mut last_err: Option<String> = None;
while let Some(join_res) = set.join_next().await {
let (idx, candidate, dial_res) = match join_res {
Ok(t) => t,
Err(e) => {
last_err = Some(format!("join {e}"));
continue;
}
};
match dial_res {
Ok(conn) => {
tracing::info!(
%candidate,
candidate_idx = idx,
"dual_path: direct dial succeeded on candidate"
);
// Abort the remaining in-flight
// dials so they don't complete
// and leak QUIC sessions.
set.abort_all();
return Ok(QuinnTransport::new(conn));
}
Err(e) => {
tracing::debug!(
%candidate,
candidate_idx = idx,
error = %e,
"dual_path: direct dial failed, trying others"
);
last_err = Some(format!("candidate {candidate}: {e}"));
}
}
}
Err(anyhow::anyhow!(
"all {} direct candidates failed; last: {}",
dial_order.len(),
last_err.unwrap_or_else(|| "n/a".into())
))
}
});
direct_ep = ep;
}
@@ -193,7 +306,12 @@ pub async fn race(
// below need to await the OPPOSITE future after the winning
// branch fires. Without pinning, tokio::select! moves the
// future out and we can't touch it again.
tracing::info!(?role, %peer_direct_addr, %relay_addr, "dual_path: racing direct vs relay");
tracing::info!(
?role,
candidates = ?peer_candidates.dial_order(),
%relay_addr,
"dual_path: racing direct vs relay"
);
let direct_timed = tokio::time::timeout(Duration::from_secs(2), direct_fut);
tokio::pin!(direct_timed, relay_fut);
@@ -202,7 +320,7 @@ pub async fn race(
direct_result = &mut direct_timed => {
match direct_result {
Ok(Ok(transport)) => {
tracing::info!(%peer_direct_addr, "dual_path: direct WON");
tracing::info!("dual_path: direct WON");
Ok((Arc::new(transport), WinningPath::Direct))
}
Ok(Err(e)) => {

View File

@@ -262,6 +262,88 @@ pub async fn detect_nat_type(
}
}
/// Enumerate LAN-local host candidates this client is reachable
/// on, paired with the given port (typically the signal
/// endpoint's bound port so that incoming dials land on the same
/// socket the advertised reflex addr points to).
///
/// Gathers BOTH IPv4 and IPv6 candidates:
///
/// - **IPv4**: RFC1918 private ranges (10/8, 172.16/12, 192.168/16)
/// and CGNAT shared-transition (100.64/10). Public IPv4 is
/// skipped because the reflex-addr path already covers it.
/// Loopback and link-local (169.254/16) are skipped.
///
/// - **IPv6**: ALL global-unicast addresses (2000::/3 — the real
/// routable IPv6 space) AND unique-local (fc00::/7). These
/// are directly dialable from a peer on the same LAN, and on
/// true dual-stack LANs (which most consumer ISPs now provide,
/// including Starlink) IPv6 often gives a direct path even
/// when IPv4 can't hairpin. Loopback (::1), unspecified (::),
/// and link-local (fe80::/10) are skipped — link-local would
/// require a scope ID to be useful and is basically never
/// reachable across interface boundaries.
///
/// The port must come from the caller — typically
/// `signal_endpoint.local_addr()?.port()`, so that the peer's
/// dials to these addresses land on the same socket that's
/// already listening (Phase 5 shared-endpoint architecture).
///
/// Safe to call from any thread; no I/O, no async. The `if-addrs`
/// crate reads the kernel's interface table via a single
/// getifaddrs(3) syscall.
pub fn local_host_candidates(port: u16) -> Vec<SocketAddr> {
let Ok(ifaces) = if_addrs::get_if_addrs() else {
return Vec::new();
};
let mut out = Vec::new();
for iface in ifaces {
if iface.is_loopback() {
continue;
}
match iface.ip() {
std::net::IpAddr::V4(v4) => {
if v4.is_link_local() {
continue;
}
// Keep RFC1918 private ranges and CGNAT — those
// are the LAN-dialable addrs we actually want.
// Skip public v4 because the reflex addr already
// covers that path.
if v4.is_private() {
out.push(SocketAddr::new(std::net::IpAddr::V4(v4), port));
} else if v4.octets()[0] == 100 && (v4.octets()[1] & 0xc0) == 0x40 {
// 100.64/10 CGNAT — rare but valid if two
// phones are on the same CGNAT-hairpinned
// carrier LAN (some hotspot setups).
out.push(SocketAddr::new(std::net::IpAddr::V4(v4), port));
}
}
std::net::IpAddr::V6(v6) => {
if v6.is_loopback() || v6.is_unspecified() {
continue;
}
// Link-local (fe80::/10) — skip because it needs
// a zone/scope ID to be usable and that scope is
// meaningless to the peer.
let first = v6.segments()[0];
if (first & 0xffc0) == 0xfe80 {
continue;
}
// Include everything else: ULA (fc00::/7, high
// bits 0xfc00/0xfd00) and global unicast
// (2000::/3, first segment 0x2000-0x3fff). Both
// are directly dialable from a peer on the same
// dual-stack LAN, and on Starlink / most modern
// ISPs the IPv6 path usually has no CGNAT and
// works even when the v4 path doesn't hairpin.
out.push(SocketAddr::new(std::net::IpAddr::V6(v6), port));
}
}
}
out
}
/// Role assignment for the Phase 3.5 dual-path QUIC race.
///
/// Both peers already know two strings at CallSetup time: their

View File

@@ -19,7 +19,7 @@
use std::net::{Ipv4Addr, SocketAddr};
use std::time::Duration;
use wzp_client::dual_path::{race, WinningPath};
use wzp_client::dual_path::{race, PeerCandidates, WinningPath};
use wzp_client::reflect::Role;
use wzp_transport::{create_endpoint, server_config};
@@ -110,7 +110,10 @@ async fn dual_path_direct_wins_on_loopback() {
// should win.
let result = race(
Role::Dialer,
acceptor_listen_addr,
PeerCandidates {
reflexive: Some(acceptor_listen_addr),
local: Vec::new(),
},
relay_addr,
"test-room".into(),
"call-test".into(),
@@ -148,7 +151,10 @@ async fn dual_path_relay_wins_when_direct_is_dead() {
let result = race(
Role::Dialer,
dead_peer,
PeerCandidates {
reflexive: Some(dead_peer),
local: Vec::new(),
},
relay_addr,
"test-room".into(),
"call-test".into(),
@@ -182,7 +188,10 @@ async fn dual_path_errors_cleanly_when_both_paths_dead() {
let start = std::time::Instant::now();
let result = race(
Role::Dialer,
dead_peer,
PeerCandidates {
reflexive: Some(dead_peer),
local: Vec::new(),
},
dead_relay,
"test-room".into(),
"call-test".into(),