feat(p2p): Phase 4 cross-relay direct calling over federation

Teaches the relay pair to route direct-call signaling across an
existing federation link. Alice on Relay A can now place a direct
call to Bob on Relay B if A and B are federation peers — the
wire protocol, call registry, and signal dispatch all learn to
track and route the cross-relay flow.

Phase 3.5's dual-path QUIC race then carries the media directly
peer-to-peer using the advertised reflex addrs, with zero
changes needed on the client side.

## Wire protocol (wzp-proto)

New `SignalMessage::FederatedSignalForward { inner, origin_relay_fp }`
envelope variant, appended at end of enum — JSON serde is
name-tagged so pre-Phase-4 relays just log "unknown variant" and
drop it. 2 new roundtrip tests (any-inner nesting + single
DirectCallOffer case).

## Call registry (wzp-relay)

`DirectCall.peer_relay_fp: Option<String>` — federation TLS fp
of the peer relay that forwarded the offer/answer for this call.
`None` on local calls, `Some` on cross-relay. Used by the answer
path to route the reply back through the same federation link
instead of trying (and failing) to deliver via local signal_hub.
New `set_peer_relay_fp` setter + 1 new unit test.

## FederationManager (wzp-relay)

Three new methods:
- `local_tls_fp()` — exposes the relay's own federation TLS fp
  so main.rs can build `origin_relay_fp` fields.
- `broadcast_signal(msg) -> usize` — fan out any signal message
  (in practice `FederatedSignalForward`) to every active peer
  link, returning the reach count. Used when Relay A doesn't
  know which peer has the target fingerprint.
- `send_signal_to_peer(fp, msg)` — targeted send for the reply
  path where the registry already knows which peer relay to
  hit.

Plus a new `cross_relay_signal_tx: Mutex<Option<Sender<...>>>`
field that `set_cross_relay_tx()` wires at startup so the
federation `handle_signal` can push unwrapped inner messages
into the main signal dispatcher.

## Federation handle_signal (wzp-relay)

New match arm for `FederatedSignalForward`:
- Loop prevention: drops forwards whose `origin_relay_fp` equals
  this relay's own fp (prevents A→B→A echo loops without needing
  TTL yet).
- Otherwise pulls the inner message out and pushes it through
  `cross_relay_signal_tx` so the main loop's dispatcher task
  handles it as if it had arrived locally.

## Main signal loop (wzp-relay)

### DirectCallOffer when target not local
Before falling through to Hangup, try the federation path:
- Wrap the offer in `FederatedSignalForward` with
  `origin_relay_fp = this relay's tls_fp`
- `fm.broadcast_signal(forward)` — returns peer count
- If any peers reached, stash the call in local registry with
  `caller_reflexive_addr` set, `peer_relay_fp` still None
  (broadcast — the answer-side will identify itself when it
  replies)
- Send `CallRinging` to caller immediately for UX feedback
- Only if no federation or no peers → legacy Hangup path

### DirectCallAnswer when peer is remote
- Registry lookup now reads both `peer_fingerprint` and
  `peer_relay_fp` in one acquisition
- If `peer_relay_fp.is_some()`:
  * Reject → forward a `Hangup` over federation via
    `send_signal_to_peer` instead of local signal_hub
  * Accept → wrap the raw answer in `FederatedSignalForward`,
    route to the specific origin peer, then emit the LOCAL
    CallSetup to our callee with `peer_direct_addr =
    caller_reflexive_addr` (caller is remote; this side only
    has the callee)
- If `peer_relay_fp.is_none()` → existing Phase 3 same-relay
  path with both CallSetups (caller + callee)

### Cross-relay signal dispatcher task
New long-running task reading `(inner, origin_relay_fp)` from
`cross_relay_rx`. In Phase 4 MVP handles:
- `DirectCallOffer` — if target is local, create the call in
  the registry with `peer_relay_fp = origin_relay_fp`, stash
  caller addr, deliver offer to local callee. If target isn't
  local, drop (no multi-hop in Phase 4 MVP).
- `DirectCallAnswer` — look up local caller by call_id, stash
  callee addr, forward raw answer to local caller via
  signal_hub, emit local CallSetup with `peer_direct_addr =
  callee_reflexive_addr` (peer is local now; this side only
  has the caller).
- `CallRinging` — best-effort forward to local caller for UX.
- `Hangup` — logged for now; Phase 4.1 will target by call_id.

## Integration tests

`crates/wzp-relay/tests/cross_relay_direct_call.rs` — 3 tests
that reproduce the main.rs cross-relay dispatcher logic inline
and assert the invariants without spinning up real binaries:

1. `cross_relay_offer_forwards_and_stashes_peer_relay_fp` —
   Relay A gets Alice's offer, broadcasts. Relay B's dispatcher
   creates the call with `peer_relay_fp = relay_a_tls_fp`.
2. `cross_relay_answer_crosswires_peer_direct_addrs` — full
   round trip; both CallSetups (one on each relay) carry the
   OTHER party's reflex addr.
3. `cross_relay_loop_prevention_drops_self_sourced_forward` —
   explicit loop-prevention check.

Full workspace test goes from 413 → 419 passing. Clippy clean
on touched files.

## Non-goals (deferred to Phase 4.1+)

- Relay-mediated media fallback across federation — if P2P
  direct fails (symmetric NAT on either side), the call errors
  out with "no media path". Making the existing federation
  media pipeline carry ephemeral call-<id> rooms is the Phase
  4.1 lift.
- Multi-hop federation (A → B → C). Phase 4 MVP supports a
  direct federation link between A and B only.
- Fingerprint → peer-relay routing gossip.

PRD: .taskmaster/docs/prd_phase4_cross_relay_p2p.txt
Tasks: 70-78 all completed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
Siavash Sameni
2026-04-11 17:31:43 +04:00
parent 59ce52f8e8
commit 8cdf8d486a
6 changed files with 976 additions and 48 deletions

View File

@@ -146,6 +146,14 @@ pub struct FederationManager {
event_log: EventLogger,
/// Per-room rate limiters for inbound federation media.
rate_limiters: Mutex<HashMap<String, RateLimiter>>,
/// Phase 4: channel for handing cross-relay direct-call
/// signaling (inner message + origin relay fp) back to the
/// main signal loop in `main.rs`. Set once at startup via
/// `set_cross_relay_tx`. `None` when the main loop hasn't
/// wired it up yet (e.g. during startup warmup) — forwards
/// that arrive before wiring are dropped with a warning.
cross_relay_signal_tx:
Mutex<Option<tokio::sync::mpsc::Sender<(wzp_proto::SignalMessage, String)>>>,
}
impl FederationManager {
@@ -171,6 +179,78 @@ impl FederationManager {
dedup: Mutex::new(Deduplicator::new(DEDUP_WINDOW_SIZE)),
event_log,
rate_limiters: Mutex::new(HashMap::new()),
cross_relay_signal_tx: Mutex::new(None),
}
}
/// Phase 4: expose this relay's federation TLS fingerprint so
/// the main signal loop can populate
/// `SignalMessage::FederatedSignalForward.origin_relay_fp`.
pub fn local_tls_fp(&self) -> &str {
&self.local_tls_fp
}
/// Phase 4: wire the channel that the main signal loop uses
/// to receive unwrapped cross-relay direct-call signals. Called
/// once at startup from `main.rs`.
pub async fn set_cross_relay_tx(
&self,
tx: tokio::sync::mpsc::Sender<(wzp_proto::SignalMessage, String)>,
) {
*self.cross_relay_signal_tx.lock().await = Some(tx);
}
/// Phase 4: broadcast a `SignalMessage::FederatedSignalForward`
/// to every active federation peer link. Returns the number of
/// peers the broadcast reached (not the number that successfully
/// delivered the message further). Used when the local relay
/// doesn't know which peer holds the target fingerprint for a
/// `DirectCallOffer` — whichever peer has it will unwrap and
/// handle locally; the rest drop silently after "target not
/// local" check.
///
/// Loop prevention: the receiving relay checks
/// `origin_relay_fp` against its own fp and drops self-sourced
/// forwards.
pub async fn broadcast_signal(&self, msg: &wzp_proto::SignalMessage) -> usize {
let links = self.peer_links.lock().await;
let mut count = 0;
for (fp, link) in links.iter() {
match link.transport.send_signal(msg).await {
Ok(()) => {
count += 1;
tracing::debug!(peer = %link.label, %fp, "federation: broadcast signal ok");
}
Err(e) => {
tracing::warn!(peer = %link.label, %fp, error = %e, "federation: broadcast signal failed");
}
}
}
count
}
/// Phase 4: targeted send — used by the
/// `DirectCallAnswer` path when the registry knows exactly
/// which peer relay to route the reply back to. More efficient
/// than re-broadcasting and avoids leaking the call to
/// uninvolved peers.
///
/// Returns `Ok(())` on success, `Err(String)` when the peer
/// isn't currently linked or the send fails.
pub async fn send_signal_to_peer(
&self,
peer_relay_fp: &str,
msg: &wzp_proto::SignalMessage,
) -> Result<(), String> {
let normalized = normalize_fp(peer_relay_fp);
let links = self.peer_links.lock().await;
match links.get(&normalized) {
Some(link) => link
.transport
.send_signal(msg)
.await
.map_err(|e| format!("send to peer {normalized}: {e}")),
None => Err(format!("no active federation link for {normalized}")),
}
}
@@ -852,6 +932,57 @@ async fn handle_signal(
}
}
}
// Phase 4: cross-relay direct-call signal envelope.
//
// Unwrap the inner message and hand it off to the main
// signal loop via the cross_relay_signal_tx channel. The
// main loop will then dispatch the inner DirectCallOffer/
// Answer/Ringing/Hangup exactly as if it had arrived on a
// local signal transport — with the extra context that
// the call is "federated" (origin_relay_fp).
//
// Loop prevention: drop any forward whose origin matches
// our own federation TLS fingerprint. With
// broadcast-to-all-peers this prevents A→B→A echo loops.
SignalMessage::FederatedSignalForward { inner, origin_relay_fp } => {
if origin_relay_fp == fm.local_tls_fp {
tracing::debug!(
peer = %peer_label,
"federation: dropping self-sourced FederatedSignalForward (loop prevention)"
);
return;
}
let tx_opt = {
let guard = fm.cross_relay_signal_tx.lock().await;
guard.clone()
};
match tx_opt {
Some(tx) => {
let inner_discriminant = std::mem::discriminant(&*inner);
if let Err(e) = tx.send((*inner, origin_relay_fp.clone())).await {
warn!(
peer = %peer_label,
?inner_discriminant,
error = %e,
"federation: cross-relay signal dispatcher full / closed"
);
} else {
tracing::debug!(
peer = %peer_label,
?inner_discriminant,
%origin_relay_fp,
"federation: forwarded cross-relay signal to main dispatcher"
);
}
}
None => {
warn!(
peer = %peer_label,
"federation: cross_relay_signal_tx not wired yet — dropping forward"
);
}
}
}
_ => {} // ignore other signals
}
}