New signal infrastructure for the lobby-first UI:
- PresenceUser struct: { fingerprint, alias }
- SignalMessage::PresenceList: relay broadcasts full user list
to all signal clients on every register/deregister
- SignalHub::presence_list(): builds the list from connected clients
- SignalHub::broadcast(): sends to ALL signal clients
- Relay calls broadcast on register + unregister
- Desktop emits "presence_list" signal-event to JS frontend
This gives clients real-time visibility of who's online via the
signal channel, without needing to join a voice room first.
603 tests pass, 0 regressions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New SignalMessage variants for P2P quality coordination:
UpgradeProposal/UpgradeResponse/UpgradeConfirm (#28):
- Consensual quality upgrade flow — proposer sends desired profile,
peer accepts/rejects based on own conditions, confirm commits both
- All carry call_id for relay routing
QualityCapability (#30):
- Peer reports its max sustainable profile — enables asymmetric
encoding where each side uses its own best quality instead of
forcing everyone to the weakest link
Relay forwards all 4 signals to the call peer (same pattern as
MediaPathReport, CandidateUpdate, HardNatProbe).
Desktop signal recv loop handles all 4 with debug logging.
Encoder switching TODOs noted for wiring into CallEngine.
4 new serde roundtrip tests. 603 total, 0 regressions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Birthday attack for random symmetric NATs:
- birthday.rs: open_acceptor_ports() opens N sockets, STUN-probes
each to learn external ports. generate_dialer_targets() builds
hit list (known ports first, then random fill). spray_dialer()
sprays QUIC connects with rate limiting, first success wins.
- Default: 32 acceptor ports, 128 dialer probes, 20ms interval
Signal coordination:
- HardNatBirthdayStart { acceptor_ports, external_ip } sent by
Acceptor when peer's HardNatProbe shows random/sequential NAT
- Relay forwards it like other call signals
- Desktop recv loop handles and logs it
Hybrid waterfall integration:
- On receiving HardNatProbe with non-cone allocation, Acceptor
auto-opens birthday ports and sends BirthdayStart
- Sockets kept alive 10s for NAT mapping persistence
- Dialer spray integration into race() pending (needs transport
hot-swap for background upgrade)
6 new tests, 599 total, 0 regressions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Phase A of hard NAT traversal (PRD-hard-nat.md):
- PortAllocation enum: PortPreserving / Sequential{delta} / Random / Unknown
- detect_port_allocation(): sequential STUN probes from single socket,
analyzes port sequence for allocation pattern
- classify_port_allocation(): pure function with jitter tolerance,
wraparound handling, 60% threshold for noisy sequences
- predict_ports(): generates target port range from last_port + delta
- HardNatProbe signal message: carries port_sequence, allocation
pattern, external_ip for peer coordination
- Relay forwards HardNatProbe to call peer
- Netcheck gains port_allocation field + format_report display
588 tests pass (17 new), 0 regressions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
After 30s stable at a tier, the AdaptiveQualityController actively
probes the next tier up by switching the encoder and observing for 5s.
If loss/RTT stay within the target tier's thresholds, the upgrade
commits. If >1 bad report, the probe aborts with a 60s cooldown.
Probing is disabled on cellular (studio tiers aren't classified there)
and skipped when already at Studio64k (highest tier).
This complements the passive upgrade path (10 consecutive good reports)
by actively discovering that a path can sustain higher quality, rather
than waiting for the classification to drift upward.
New: ProbeState struct, check_probe() method, 4 constants, 5 tests.
377 tests passing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
P2P calls now adapt codec quality based on observed network conditions,
matching what relay calls already had.
Three-layer implementation:
- QualityReport::from_path_stats(): construct reports from local quinn
stats (loss%, RTT, jitter) without needing relay-generated reports
- CallEncoder.pending_quality_report: one-shot attachment to next
source packet (consumed on encode, not repeated)
- Engine send tasks: generate quality report every 50 frames (~1s)
from quinn_path_stats() and attach via set_pending_quality_report()
- Engine recv tasks: self-observe from own QUIC path stats every 50
packets, feed to AdaptiveQualityController for P2P adaptation
(works even if peer isn't sending quality reports yet)
Both relay and P2P calls now have adaptive quality. On relay calls,
both peer-sent reports AND local observations feed the controller.
Hysteresis (3 consecutive bad reports to downgrade) prevents thrashing.
372 tests passing (+4 new: from_path_stats encoding, clamping, zero
values, encoder quality report attachment).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Extend Tier enum from 3 to 6 levels: Studio64k/48k/32k + Good +
Degraded + Catastrophic with asymmetric hysteresis (down:3, up:5,
studio:10)
- Handle QualityDirective signals in both desktop and Android engines
— relay-coordinated codec switching now works end-to-end
- Add periodic TAP STATS to debug tap: packets in/out, fan-out avg,
seq gaps, codecs seen (every 5s)
- Mark task #2 done (ParticipantInfo in federation signals already
implemented)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Wire AdaptiveQualityController into desktop engine send/recv tasks
(mirrors Android pattern: AtomicU8 pending_profile, auto-mode check)
- Wire same into Android engine send task (was only in recv before)
- QualityDirective SignalMessage variant for relay-initiated codec switch
- ParticipantQuality tracking in relay RoomManager (per-participant
AdaptiveQualityController, weakest-link tier computation)
- Relay broadcasts QualityDirective to all participants when room-wide
tier degrades (coordinated codec switching)
- Oboe stream state polling: poll getState() for up to 2s after
requestStart() to ensure both streams reach Started before proceeding
(fixes intermittent silent calls on cold start, Nothing Phone A059)
Tasks: #7, #25, #26, #31, #35
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Root cause: Hangup had no call_id field. The relay forwarded hangups to
ALL active calls for a user. When user A hung up call 1 and user B
immediately placed call 2, the relay's processing of A's hangup would
also kill call 2 (race window ~1-2s).
Fix: add optional call_id to Hangup (backwards-compatible via serde
skip_serializing_if). When present, the relay only ends the named call.
Old clients send call_id=None and get the legacy broadcast behavior.
Also: clear pending_path_report in Hangup recv handler and
internal_deregister to prevent stale oneshot channels from blocking
subsequent call setups.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add relay_build field to RegisterPresenceAck so the client logs
which relay version it connected to. Shows in the debug log as
register_signal:ack_received {"relay_build":"f843a93"}.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add caller_build_version / callee_build_version (git short hash)
to DirectCallOffer and DirectCallAnswer so peers can identify each
other's build in debug logs. Also log own build at register time.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Before Phase 6, each side's dual-path race ran independently and
committed to whichever transport completed first. When one side
picked Direct and the other picked Relay, they sent media to
different places — TX > 0 RX: 0 on both, completely silent call.
Phase 6 adds a negotiation step: after the local race completes,
each side sends a MediaPathReport { call_id, direct_ok, winner }
to the peer through the relay. Both wait for the other's report
before committing a transport to the CallEngine. The decision
rule is simple: if BOTH report direct_ok = true, use direct; if
EITHER reports false, BOTH use relay.
## Wire protocol
New `SignalMessage::MediaPathReport { call_id, direct_ok,
race_winner }`. The relay forwards it to the call peer via the
same signal_hub routing used for DirectCallOffer/Answer. The
cross-relay dispatcher also forwards it.
## dual_path::race restructured
Returns `RaceResult` instead of `(Arc<QuinnTransport>, WinningPath)`:
- `direct_transport: Option<Arc<QuinnTransport>>`
- `relay_transport: Option<Arc<QuinnTransport>>`
- `local_winner: WinningPath`
Both paths are run as spawned tasks. After the first completes,
a 1s grace period lets the loser also finish. The connect
command gets BOTH transports (when available) and picks the
right one based on the negotiation outcome. The unused transport
is dropped.
## connect command flow (revised)
1. Run race() → RaceResult with both transports
2. Send MediaPathReport to relay with our direct_ok
3. Install oneshot; wait for peer's report (3s timeout)
4. Decision: both direct_ok → use direct; else → use relay
5. Start CallEngine with the agreed transport
If the peer never responds (old build, timeout), falls back to
relay — backward compatible.
## Relay forwarding
MediaPathReport is forwarded like DirectCallOffer/Answer: via
signal_hub.send_to(peer_fp) for same-relay calls, and via
cross-relay dispatcher for federated calls.
## Debug log events
- `connect:dual_path_race_done` — local race result
- `connect:path_report_sent` — our report to the peer
- `connect:peer_report_received` — peer's report
- `connect:peer_report_timeout` — peer didn't respond (3s)
- `connect:path_negotiated` — final agreed path with reasons
Full workspace test: 423 passing (no regressions).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Same-LAN P2P was failing because MikroTik masquerade (like most
consumer NATs) doesn't support NAT hairpinning — the advertised
WAN reflex addr is unreachable from a peer on the same LAN as
the advertiser. Phase 5 got us Cone NAT classification and fixed
the measurement artifact, but same-LAN direct dials still had
nowhere to land.
Phase 5.5 adds ICE-style host candidates: each client enumerates
its LAN-local network interface addresses, includes them in the
DirectCallOffer/Answer alongside the reflex addr, and the
dual-path race fans out to ALL peer candidates in parallel.
Same-LAN peers find each other via their RFC1918 IPv4 + ULA /
global-unicast IPv6 addresses without touching the NAT at all.
Dual-stack IPv6 is in scope from the start — on modern ISPs
(including Starlink) the v6 path often works even when v4
hairpinning doesn't, because there's no NAT on the v6 side.
## Changes
### `wzp_client::reflect::local_host_candidates(port)` (new)
Enumerates network interfaces via `if-addrs` and returns
SocketAddrs paired with the caller's port. Filters:
- IPv4: RFC1918 (10/8, 172.16/12, 192.168/16) + CGNAT (100.64/10)
- IPv6: global unicast (2000::/3) + ULA (fc00::/7)
- Skipped: loopback, link-local (169.254, fe80::), public v4
(already covered by reflex-addr), unspecified
Safe from any thread, one `getifaddrs(3)` syscall.
### Wire protocol (wzp-proto/packet.rs)
Three new `#[serde(default, skip_serializing_if = "Vec::is_empty")]`
fields, backward-compat with pre-5.5 clients/relays by
construction:
- `DirectCallOffer.caller_local_addrs: Vec<String>`
- `DirectCallAnswer.callee_local_addrs: Vec<String>`
- `CallSetup.peer_local_addrs: Vec<String>`
### Call registry (wzp-relay/call_registry.rs)
`DirectCall` gains `caller_local_addrs` + `callee_local_addrs`
Vec<String> fields. New `set_caller_local_addrs` /
`set_callee_local_addrs` setters. Follow the same pattern as
the reflex addr fields.
### Relay cross-wiring (wzp-relay/main.rs)
Both the local-call and cross-relay-federation paths now track
the local_addrs through the registry and inject them into the
CallSetup's peer_local_addrs. Cross-wiring is identical to the
existing peer_direct_addr logic — each party's CallSetup
carries the OTHER party's LAN candidates.
### Client side (desktop/src-tauri/lib.rs)
- `place_call`: gathers local host candidates via
`local_host_candidates(signal_endpoint.local_addr().port())`
and includes them in `DirectCallOffer.caller_local_addrs`.
The port match is critical — it's the Phase 5 shared signal
socket, so incoming dials to these addrs land on the same
endpoint that's already listening.
- `answer_call`: same, AcceptTrusted only (privacy mode keeps
LAN addrs hidden too, for consistency with the reflex addr).
- `connect` Tauri command: new `peer_local_addrs: Vec<String>`
arg. Builds a `PeerCandidates` bundle and passes it to the
dual-path race.
- Recv loop's CallSetup handler: destructures + forwards the
new field to JS via the signal-event payload.
### `dual_path::race` (wzp-client/dual_path.rs)
Signature change: takes `PeerCandidates` (reflex + local Vec)
instead of a single SocketAddr. The D-role branch now fans out
N parallel dials via `tokio::task::JoinSet` — one per candidate
— and the first successful dial wins (losers are aborted
immediately via `set.abort_all()`). Only when ALL candidates
have failed do we return Err; individual candidate failures are
just traced at debug level and the race waits for the others.
LAN host candidates are tried BEFORE the reflex addr in
`PeerCandidates::dial_order()` — they're faster when they work,
and the reflex addr is the fallback for the not-on-same-LAN
case.
### JS side (desktop/main.ts)
`connect` invoke now passes `peerLocalAddrs: data.peer_local_addrs ?? []`
alongside the existing `peerDirectAddr`.
### Tests
All existing test callsites updated for the new Vec<String>
fields (defaults to Vec::new() in tests — they don't exercise
the multi-candidate path). `dual_path.rs` integration tests
wrap the single `dead_peer` / `acceptor_listen_addr` in a
`PeerCandidates { reflexive: Some(_), local: Vec::new() }`.
Full workspace test: 423 passing (same as before 5.5).
## Expected behavior on the reporter's setup
Two phones behind MikroTik, both on the same LAN:
place_call:host_candidates {"local_addrs": ["192.168.88.21:XXX", "2001:...:YY:XXX"]}
recv:DirectCallAnswer {"callee_local_addrs": ["192.168.88.22:ZZZ", "2001:...:WW:ZZZ"]}
recv:CallSetup {"peer_direct_addr":"150.228.49.65:NN",
"peer_local_addrs":["192.168.88.22:ZZZ","2001:...:WW:ZZZ"]}
connect:dual_path_race_start {"peer_reflex":"...","peer_local":[...]}
dual_path: direct dial succeeded on candidate 0 ← LAN v4 wins
connect:dual_path_race_won {"path":"Direct"}
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Both sides of the signal channel previously broke their recv loop
on any deserialize error, which meant adding a new variant in one
build silently killed signal connections from peers running an
older build. This bit us during Phase 1 testing: a new client
sending SignalMessage::Reflect to a pre-Phase-1 relay caused the
relay to drop the whole signal connection, which looked like
"Error: not registered" on the next place_call.
Fix:
- New TransportError::Deserialize(String) variant in wzp-proto
carries serde errors as a distinct category.
- wzp-transport/reliable.rs::recv_signal returns Deserialize on
serde_json::from_slice failures (was wrapped in Internal).
- wzp-relay/main.rs signal loop matches on Deserialize → warn +
continue (instead of break).
- desktop/src-tauri/lib.rs recv loop does the same.
Other TransportError variants (ConnectionLost, Io, Internal) still
break the loop — only pure parse failures are recoverable.
This means future SignalMessage variant additions are backward-
compat by construction: older peers will see "unknown variant,
continuing" in their logs while newer peers can keep evolving the
protocol.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Teaches the relay pair to route direct-call signaling across an
existing federation link. Alice on Relay A can now place a direct
call to Bob on Relay B if A and B are federation peers — the
wire protocol, call registry, and signal dispatch all learn to
track and route the cross-relay flow.
Phase 3.5's dual-path QUIC race then carries the media directly
peer-to-peer using the advertised reflex addrs, with zero
changes needed on the client side.
## Wire protocol (wzp-proto)
New `SignalMessage::FederatedSignalForward { inner, origin_relay_fp }`
envelope variant, appended at end of enum — JSON serde is
name-tagged so pre-Phase-4 relays just log "unknown variant" and
drop it. 2 new roundtrip tests (any-inner nesting + single
DirectCallOffer case).
## Call registry (wzp-relay)
`DirectCall.peer_relay_fp: Option<String>` — federation TLS fp
of the peer relay that forwarded the offer/answer for this call.
`None` on local calls, `Some` on cross-relay. Used by the answer
path to route the reply back through the same federation link
instead of trying (and failing) to deliver via local signal_hub.
New `set_peer_relay_fp` setter + 1 new unit test.
## FederationManager (wzp-relay)
Three new methods:
- `local_tls_fp()` — exposes the relay's own federation TLS fp
so main.rs can build `origin_relay_fp` fields.
- `broadcast_signal(msg) -> usize` — fan out any signal message
(in practice `FederatedSignalForward`) to every active peer
link, returning the reach count. Used when Relay A doesn't
know which peer has the target fingerprint.
- `send_signal_to_peer(fp, msg)` — targeted send for the reply
path where the registry already knows which peer relay to
hit.
Plus a new `cross_relay_signal_tx: Mutex<Option<Sender<...>>>`
field that `set_cross_relay_tx()` wires at startup so the
federation `handle_signal` can push unwrapped inner messages
into the main signal dispatcher.
## Federation handle_signal (wzp-relay)
New match arm for `FederatedSignalForward`:
- Loop prevention: drops forwards whose `origin_relay_fp` equals
this relay's own fp (prevents A→B→A echo loops without needing
TTL yet).
- Otherwise pulls the inner message out and pushes it through
`cross_relay_signal_tx` so the main loop's dispatcher task
handles it as if it had arrived locally.
## Main signal loop (wzp-relay)
### DirectCallOffer when target not local
Before falling through to Hangup, try the federation path:
- Wrap the offer in `FederatedSignalForward` with
`origin_relay_fp = this relay's tls_fp`
- `fm.broadcast_signal(forward)` — returns peer count
- If any peers reached, stash the call in local registry with
`caller_reflexive_addr` set, `peer_relay_fp` still None
(broadcast — the answer-side will identify itself when it
replies)
- Send `CallRinging` to caller immediately for UX feedback
- Only if no federation or no peers → legacy Hangup path
### DirectCallAnswer when peer is remote
- Registry lookup now reads both `peer_fingerprint` and
`peer_relay_fp` in one acquisition
- If `peer_relay_fp.is_some()`:
* Reject → forward a `Hangup` over federation via
`send_signal_to_peer` instead of local signal_hub
* Accept → wrap the raw answer in `FederatedSignalForward`,
route to the specific origin peer, then emit the LOCAL
CallSetup to our callee with `peer_direct_addr =
caller_reflexive_addr` (caller is remote; this side only
has the callee)
- If `peer_relay_fp.is_none()` → existing Phase 3 same-relay
path with both CallSetups (caller + callee)
### Cross-relay signal dispatcher task
New long-running task reading `(inner, origin_relay_fp)` from
`cross_relay_rx`. In Phase 4 MVP handles:
- `DirectCallOffer` — if target is local, create the call in
the registry with `peer_relay_fp = origin_relay_fp`, stash
caller addr, deliver offer to local callee. If target isn't
local, drop (no multi-hop in Phase 4 MVP).
- `DirectCallAnswer` — look up local caller by call_id, stash
callee addr, forward raw answer to local caller via
signal_hub, emit local CallSetup with `peer_direct_addr =
callee_reflexive_addr` (peer is local now; this side only
has the caller).
- `CallRinging` — best-effort forward to local caller for UX.
- `Hangup` — logged for now; Phase 4.1 will target by call_id.
## Integration tests
`crates/wzp-relay/tests/cross_relay_direct_call.rs` — 3 tests
that reproduce the main.rs cross-relay dispatcher logic inline
and assert the invariants without spinning up real binaries:
1. `cross_relay_offer_forwards_and_stashes_peer_relay_fp` —
Relay A gets Alice's offer, broadcasts. Relay B's dispatcher
creates the call with `peer_relay_fp = relay_a_tls_fp`.
2. `cross_relay_answer_crosswires_peer_direct_addrs` — full
round trip; both CallSetups (one on each relay) carry the
OTHER party's reflex addr.
3. `cross_relay_loop_prevention_drops_self_sourced_forward` —
explicit loop-prevention check.
Full workspace test goes from 413 → 419 passing. Clippy clean
on touched files.
## Non-goals (deferred to Phase 4.1+)
- Relay-mediated media fallback across federation — if P2P
direct fails (symmetric NAT on either side), the call errors
out with "no media path". Making the existing federation
media pipeline carry ephemeral call-<id> rooms is the Phase
4.1 lift.
- Multi-hop federation (A → B → C). Phase 4 MVP supports a
direct federation link between A and B only.
- Fingerprint → peer-relay routing gossip.
PRD: .taskmaster/docs/prd_phase4_cross_relay_p2p.txt
Tasks: 70-78 all completed
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Completes the signal-plane plumbing for P2P direct calling: both
peers now learn their own server-reflexive address (Phase 1
Reflect), include it in DirectCallOffer / DirectCallAnswer, and
the relay cross-wires them into each side's CallSetup so the
client knows the OTHER party's direct addr. Dual-path QUIC race
is scaffolded but deferred to Phase 3.5 — this commit ships the
full advertising layer so real-hardware testing can confirm the
addrs flow end-to-end before adding the concurrent-connect logic.
Wire protocol (wzp-proto/src/packet.rs):
- DirectCallOffer gains optional `caller_reflexive_addr`
- DirectCallAnswer gains optional `callee_reflexive_addr`
- CallSetup gains optional `peer_direct_addr`
- All #[serde(default, skip_serializing_if = "Option::is_none")] so
pre-Phase-3 peers and relays stay backward compatible by
construction — the new fields are elided from the JSON on the
wire when None, and older clients parse the JSON ignoring any
fields they don't know.
- 2 new roundtrip tests (Some + None cases, old-JSON parse-back).
Call registry (wzp-relay/src/call_registry.rs):
- DirectCall gains caller_reflexive_addr + callee_reflexive_addr.
- set_caller_reflexive_addr / set_callee_reflexive_addr setters.
- 2 new unit tests: stores and returns addrs, clearing works.
Relay cross-wiring (wzp-relay/src/main.rs):
- On DirectCallOffer: stash the caller's addr in the registry.
- On DirectCallAnswer: stash the callee's addr (only set by
AcceptTrusted answers — privacy-mode leaves it None).
- Send two different CallSetup messages: one to the caller with
peer_direct_addr=callee_addr, and one to the callee with
peer_direct_addr=caller_addr. The cross-wiring means each side
gets the OTHER party's direct addr, not its own.
- Logs `p2p_viable=true` when both sides advertised.
Client advertising (desktop/src-tauri/src/lib.rs):
- New `try_reflect_own_addr` helper that reuses the Phase 1
oneshot pattern WITHOUT holding state.signal.lock() across the
await (critical: the recv loop reacquires the same mutex to
fire the oneshot, so holding it would deadlock).
- `place_call` queries reflect first and includes the returned
addr in DirectCallOffer. Falls back to None on any failure —
call still proceeds via the relay path.
- `answer_call` queries reflect ONLY on AcceptTrusted so
AcceptGeneric keeps the callee's IP private by design. Reject
and AcceptGeneric both pass None.
- recv loop's CallSetup handler destructures and forwards
peer_direct_addr to the JS layer in the signal-event payload.
Client scaffolding for dual-path (desktop/src-tauri/src/lib.rs +
desktop/src/main.ts):
- `connect` Tauri command gets a new optional `peer_direct_addr`
argument. Currently LOGS the addr but still uses the relay
path for the media connection — Phase 3.5 will swap in a
tokio::select! race between direct dial + relay dial. Scaffolding
lands here so the JS wire is stable, real-hardware testing can
confirm advertising works end-to-end, and Phase 3.5 is a pure
Rust change with no JS touches.
- JS setup handler forwards `data.peer_direct_addr` to invoke.
Back-compat with the CLI client (crates/wzp-client/src/cli.rs):
- CLI test harness updated for the new fields — always passes
None for both reflex addrs (no hole-punching). Also destructures
peer_direct_addr: _ in its CallSetup handler.
Tests (8 new, all passing):
- wzp-proto: hole_punching_optional_fields_roundtrip,
hole_punching_backward_compat_old_json_parses
- wzp-relay call_registry: call_registry_stores_reflexive_addrs,
call_registry_clearing_reflex_addr_works
- wzp-relay integration: crates/wzp-relay/tests/hole_punching.rs
* both_peers_advertise_reflex_addrs_cross_wire_in_setup
* privacy_mode_answer_omits_callee_addr_from_setup
* pre_phase3_caller_leaves_both_setups_relay_only
* neither_peer_advertises_both_setups_are_relay_only
Full workspace test goes from 396 → 404 passing.
PRD: .taskmaster/docs/prd_hole_punching.txt
Tasks: 53-60 all completed (58 = scaffolding-only; 3.5 follow-up)
Next up: **Phase 3.5 — dual-path QUIC connect race**. With the
advertising layer live, this becomes a focused change: on
CallSetup-with-peer_direct_addr, start a server-capable dual
endpoint, and tokio::select! across (direct dial, relay dial,
inbound accept). Whichever QUIC handshake completes first wins,
the losers drop, 2s direct timeout falls back to relay.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Lets a client ask its registered relay "what IP:port do you see for
me?" over the existing TLS-authenticated signal channel, returning
the client's server-reflexive address as a SocketAddr. Replaces the
need for a classic STUN deployment and becomes the bootstrap step
for future P2P hole-punching: once both peers know their own reflex
addrs, they can advertise them in DirectCallOffer and attempt a
direct QUIC handshake to each other.
Wire protocol (wzp-proto):
- SignalMessage::Reflect — unit variant, client -> relay
- SignalMessage::ReflectResponse { observed_addr: String } — relay -> client
- JSON-serde, appended at end of enum: zero ordinal concerns,
backward compat with pre-Phase-1 relays by construction (older
relays log "unexpected message" and drop; newer clients time out
cleanly within 1s).
Relay handler (wzp-relay/src/main.rs, signal loop):
- New match arm next to Ping reuses the already-bound `addr` from
connection.remote_address() and replies with observed_addr as a
string. debug!-level log on success, warn!-level on send failure.
Client side (desktop/src-tauri/src/lib.rs):
- SignalState gains pending_reflect: Option<oneshot::Sender<SocketAddr>>.
- get_reflected_address Tauri command installs the oneshot before
sending Reflect and awaits it with a 1s timeout; cleans up on
every exit path (send failure, timeout, parse error).
- recv loop's new ReflectResponse arm fires the pending sender or
emits a debug log for unsolicited responses — never crashes the
loop on malformed input.
- Integrated into invoke_handler! alongside the other signal
commands.
UI (desktop/index.html + src/main.ts):
- New "Network" section in settings panel with a "Detect" button
that displays the reflected address or a categorized warning
("register first" / "relay does not support reflection" / error).
Tests (crates/wzp-relay/tests/reflect.rs — 3 new, all passing):
- reflect_happy_path: client on loopback gets back 127.0.0.1:<its own port>
- reflect_two_clients_distinct_ports: two concurrent clients see
their own distinct ports, proving per-connection remote_address
- reflect_old_relay_times_out: mock relay that ignores Reflect —
client times out between 1000-1200ms and does not hang
Also pre-existing test bit-rot unrelated to this PR — fixed so the
full workspace `cargo test` goes green:
- handshake_integration tests in wzp-client, wzp-relay and
featherchat_compat in wzp-crypto all missed the `alias` field
addition to CallOffer and the 3-arg form of perform_handshake
plus 4-tuple return of accept_handshake. Updated to the current
API surface.
Results:
cargo test --workspace --exclude wzp-android: 386 passed
cargo check --workspace: clean
cargo clippy: no new warnings in touched files
Verification excludes wzp-android because it's dead code on this
branch (Tauri mobile uses wzp-native instead) and can't link -llog
on macOS host — unchanged status quo.
PRD: .taskmaster/docs/prd_reflect_over_quic.txt
Tasks: 39-46 all completed
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Phase 4 lays the telemetry foundation for distinguishing DRED recoveries
from classical PLC in production: a new SignalMessage variant, two new
per-session Prometheus counters on the relay side, and a highlighted
loss-recovery section in the Android DebugReporter.
The periodic emitter (client → relay) and Grafana panel are deferred to
Phase 4b — this commit ships the protocol surface, the relay sink, and
the immediate user-visible debug output. Once 4b lands the full path
(emitter → relay → Prometheus → Grafana), the metrics here will
automatically start receiving data.
Scope decision — why not extend QualityReport instead:
The existing wire-format QualityReport is a fixed 4-byte media packet
trailer. Adding counter fields to it would shift the binary layout and
break backward compatibility (old receivers would parse the last 4
bytes of the extended trailer as QR, corrupting audio). Using a
new SignalMessage variant on the reliable QUIC signal stream sidesteps
the wire-format problem entirely — serde JSON enums tolerate unknown
variants gracefully on old receivers, and the signal channel is the
right layer for periodic telemetry aggregates.
Changes:
wzp-proto/src/packet.rs:
- New SignalMessage::LossRecoveryUpdate variant carrying:
* dred_reconstructions: u64 (monotonic since call start)
* classical_plc_invocations: u64 (monotonic)
* frames_decoded: u64 (for rate calculation)
- All three fields tagged #[serde(default)] for forward compat.
wzp-client/src/featherchat.rs:
- Added a match arm so signal_to_call_type() handles the new
variant (treat as Offer for featherChat bridging purposes).
wzp-relay/src/metrics.rs:
- Two new IntCounterVec metrics on the relay, labeled by session_id:
* wzp_relay_session_dred_reconstructions_total
* wzp_relay_session_classical_plc_total
- New method update_session_loss_recovery(session_id, dred, plc)
applies monotonic deltas: if the incoming totals exceed the
current counter, the difference is inc_by'd. If the incoming
totals are LOWER (client restart or counter reset), the
Prometheus counter holds steady until the client catches up.
This matches the existing update_session_buffer delta pattern.
- remove_session_metrics() now cleans up the two new labels.
- New test session_loss_recovery_monotonic_delta exercises:
* initial population (10 DRED, 2 PLC)
* forward advance (25, 5 → delta +15, +3)
* lower values ignored (client reset → counters unchanged)
* client catches up (30, 8 → advances to new max)
- Existing session_metrics_cleanup test extended to cover the
new counters.
android/app/src/main/java/com/wzp/debug/DebugReporter.kt:
- Phase 4 users — and incident responders — need to quickly see
whether DRED is actually firing during a call. The stats JSON
already carries the counters (after Phase 3c), but they were
buried in the trailing JSON dump. Added a dedicated
"=== Loss Recovery ===" section to the meta preamble that
extracts dred_reconstructions, classical_plc_invocations,
frames_decoded, and fec_recovered from the JSON and displays
them plainly, plus computed percentages when frames_decoded > 0.
- New extractLongField helper: tiny hand-rolled JSON integer
extractor. We don't want to pull in a full JSON parser for this
single use case and CallStats has a flat, well-known schema.
Verification:
- cargo check --workspace: zero errors
- cargo test -p wzp-proto --lib: 63 passing
- cargo test -p wzp-codec --lib: 68 passing
- cargo test -p wzp-client --lib: 35 passing (+1 ignored probe)
- cargo test -p wzp-relay --lib: 68 passing (+1 new Phase 4 test)
- cargo check -p wzp-android --lib: zero errors
- Android APK build verified earlier today (unridden-alfonso.apk
via the remote Docker builder) — Phase 0–3c confirmed to compile
end-to-end on the NDK target.
Phase 4b remaining (not blocking this commit):
- Periodic LossRecoveryUpdate emitter in wzp-client/src/call.rs and
wzp-android/src/engine.rs (every ~5 s)
- Relay-side handler in main.rs that matches the new variant and
calls metrics.update_session_loss_recovery
- Grafana "Loss recovery breakdown" panel in docs/grafana-dashboard.json
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Derive a 4-digit code from the shared DH secret via HKDF with label
"warzone-sas-code". Both peers compute the same code; a MITM relay
produces a different one. Users compare verbally during the call.
- CryptoSession::sas_code() -> Option<u32> on the trait
- ChaChaSession stores and returns the SAS
- HKDF derivation in WarzoneKeyExchange::derive_session()
- Tests: both peers match, MITM produces different code
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New feature: call someone directly by fingerprint through the relay.
- Client connects with SNI "_signal" for persistent signaling
- RegisterPresence/RegisterPresenceAck for relay registration
- DirectCallOffer routed to target by fingerprint
- DirectCallAnswer with AcceptGeneric/AcceptTrusted/Reject modes
- Relay creates private room (call-{id}), sends CallSetup to both
- Both clients connect to private room for media (existing SFU path)
- Hangup forwarding + cleanup on disconnect
- Desktop CLI: --signal + --call <fingerprint> for testing
- CallRegistry tracks call state (Pending/Ringing/Active/Ended)
- SignalHub manages persistent signaling connections
Tested: Alice calls Bob by fingerprint, relay routes offer, Bob
auto-accepts, both join private room, media flows bidirectionally.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Time-based dedup (2s TTL) replaces fixed-window dedup — consecutive
senders with same seq numbers no longer collide
- Raw byte forwarding for federation local delivery (no re-serialization)
- Jitter buffer resets on large backward seq jumps (>100)
- recv_media skips malformed datagrams instead of returning connection-closed
- SIGTERM handler for clean QUIC shutdown on wzp-client
- JSONL event log infrastructure (--event-log flag) for protocol analysis
- FEC disabled on GOOD profile for federation debugging (fec_ratio=0.0)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
RoomParticipant.relay_label identifies which relay a participant is
connected to. Local participants have None, federated participants
get tagged with the peer relay's label when storing remote_participants.
This enables clients to group participants by relay in the UI.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
GlobalRoomActive signal now carries participant list from the
announcing relay. When received, the relay:
1. Stores remote participants per peer link
2. Broadcasts merged RoomUpdate to local clients (local + all remote)
This means clients on different relays can now SEE each other in the
participant list. Also fixes build: removed non-existent metric field
references that were added by linter.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Major rewrite of relay federation replacing virtual participants with
a clean router model:
1. Global rooms: [[global_rooms]] in TOML config declares rooms that
are bridged across federation. Each relay is a router + local SFU.
2. Room events: RoomManager emits LocalJoin/LocalLeave via broadcast
channel when rooms transition between empty and non-empty.
3. GlobalRoomActive/Inactive signals: relays announce when they have
local participants in global rooms. Peers track active state and
forward media accordingly. Announcements propagate for multi-hop.
4. Media forwarding: separated from SFU loop. Local participant sends
via mpsc channel → egress task → forward_to_peers() → room-hash
tagged datagrams to active peer links. Inbound datagrams delivered
to local participants + forwarded to other active peers (multi-hop).
5. Loop prevention: don't forward back to source relay.
6. Room name hashing: is_global_room() checks both plain name and
hash (clients hash room names for SNI privacy).
Removed: ParticipantSender::Federation, federated_participants, virtual
participant join/leave, periodic room polling. Rooms now only contain
local participants.
Signaling tested: 3-relay chain (A→B←C) correctly propagates
GlobalRoomActive through B to both A and C. Media forwarding plumbing
in place but needs final debugging.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Added [[trusted]] config: relay B can accept inbound federation
from relay A by fingerprint alone, without knowing A's address.
A connects to B with [[peers]], B trusts A with [[trusted]].
- FederationHello signal: outbound connections send their TLS
fingerprint as first signal. The accepting relay verifies it
against [[peers]] (by IP) or [[trusted]] (by fingerprint).
- Tested 3-relay chain: A→B←C. Both A and C connect to B, B trusts
both. B correctly accepts both inbound connections. Room
announcements flow A→B and C→B.
- Remaining: B needs to announce rooms back to A and C on the same
connection so media can flow A→B→C. Currently A has no virtual
participant for B, so media doesn't reach B's SFU for forwarding.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Phase 1 of relay federation:
1. Signal messages: FederationRoomJoin/Leave/ParticipantUpdate added
to SignalMessage enum for relay-to-relay room coordination.
2. Room changes: ParticipantOrigin (Local/Federated) tracking, loop
prevention (federated media only forwards to local participants),
ParticipantSender::Federation with 8-byte room-hash prefixed
datagrams, merged participant lists (local + remote), new methods:
join_federated(), update_federated_participants(), local_senders(),
active_rooms(), local_participants().
3. FederationManager: connects to configured peers via QUIC with SNI
"_federation", reconnects with exponential backoff (5s-300s),
exchanges FederationRoomJoin signals, runs recv loops for both
signals and media datagrams, creates virtual participants in rooms.
4. Accept-side: _federation SNI handling in main.rs, unknown peer
gets helpful "add to relay.toml" log message, recognized peers
handed off to FederationManager.
TODO: TLS fingerprint verification — currently outbound connections
use client_config() which doesn't present a cert, so inbound
verification fails. Need mutual TLS or URL-based peer matching.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1. Wire protocol: add Opus 32k/48k/64k (CodecId 6/7/8) + STUDIO
profiles with is_opus() helper. Opus enc/dec accept all Opus variants.
2. JNI bridge: expand profile_from_int to 7 levels (0-6) mapping to
GOOD, DEGRADED, CATASTROPHIC, Codec2_3200, STUDIO_32K/48K/64K.
3. Settings UI: replace radio buttons with Material3 Slider — 7 stops
from Studio 64k (green) to Codec2 1.2k (dark red), matching desktop.
4. Key-change warning: AlertDialog on connect when server fingerprint
has changed. Shows old vs new fingerprint, Accept New Key or Cancel.
Accepting saves the new fingerprint and proceeds with the call.
5. Engine recv: handle studio codec IDs in auto-switch path.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add SettingsScreen with identity (alias, key backup/restore), audio defaults,
server management, network prefs, and default room
- SettingsRepository persists all settings via SharedPreferences
- Auto-generate random display names on first launch (e.g. "Swift Wolf")
- Thread alias through CallOffer → relay handshake → RoomUpdate broadcast
- Derive caller fingerprint from identity key in relay handshake (fixes null
fingerprints when --auth-url is not set)
- Persist identity seed for stable fingerprints across reconnects
- Add alias field to SignalMessage::CallOffer (serde default for backward compat)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add RoomUpdate signal message to wzp-proto with participant count + list
- Add RoomParticipant struct (fingerprint + optional alias)
- Store fingerprint/alias in relay Participant struct
- Broadcast RoomUpdate to all room members on join and leave
- Add signal recv task in Android engine to handle RoomUpdate
- Surface room_participant_count + room_participants in CallStats JSON
- Show "X in room" with participant names in Android in-call UI
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- New wzp-android crate with Oboe C++ backend, lock-free SPSC ring buffers,
engine orchestrator, codec pipeline, and Android Gradle project structure
- AEC (NLMS adaptive filter), AGC (two-stage with fast attack/slow release),
windowed-sinc FIR resampler replacing linear interpolation (wzp-codec)
- Opus encoder tuning: complexity 7 default, set_expected_loss support
- Mobile jitter buffer: asymmetric EMA (fast up/slow down), handoff spike
detection with 2s cooldown, configurable safety margin
- Network-aware quality control: cellular-specific thresholds, faster
downgrade on cellular, proactive tier drop on WiFi→cellular handoff,
FEC ratio boost during network transitions
- Handoff detection in PathMonitor via RTT jitter spike analysis
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
RelayLink: QUIC connection to peer relay (SNI "_relay") for forwarding
specific sessions. Methods: connect, forward, add/remove_session, is_idle.
RelayLinkManager: manages connections to multiple peers.
- get_or_connect: lazy connection establishment
- forward_to: send media packet to specific peer
- register/unregister_session: track which sessions use which links
- Auto-closes idle links on session unregister
Protocol: added SignalMessage::SessionForward { session_id,
target_fingerprint, source_relay } and SessionForwardAck { session_id,
room_name } for relay-link session setup signaling.
Building block for P3-T7 (call setup over mesh) which wires
route resolution + relay links + handshake into a complete flow.
62 relay tests + 42 proto tests passing (7 new relay_link tests).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
RouteResolver queries PresenceRegistry to determine how to reach a target:
- Route::Local — connected to this relay
- Route::DirectPeer(addr) — on a directly connected peer relay
- Route::Chain(addrs) — multi-hop (structure ready, single-hop for now)
- Route::NotFound — not in any known relay
Protocol: added SignalMessage::RouteQuery { fingerprint, ttl } and
RouteResponse { fingerprint, found, relay_chain } for peer-to-peer
route queries over probe connections.
HTTP API: GET /route/:fingerprint returns JSON with route type + chain.
Relay handles incoming RouteQuery on probe connections: looks up locally,
replies with RouteResponse. TTL decremented for future multi-hop forwarding.
55 relay tests + 42 proto tests passing (7 new route tests).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
PresenceRegistry tracks who is connected where:
- register_local/unregister_local for directly connected users
- update_peer for fingerprints reported by peer relays
- lookup returns Local or Remote(addr)
- expire_stale removes entries older than timeout
Gossip via probe connections:
- New SignalMessage::PresenceUpdate { fingerprints, relay_addr }
- Probes send local fingerprints every 10s alongside Ping/Pong
- Receiving relay updates its remote presence table
HTTP API on metrics port:
- GET /presence — all known fingerprints + locations
- GET /presence/:fingerprint — single lookup
- GET /peers — peer relays + their connected users
Wired into relay main:
- Registry created at startup
- register_local after auth+handshake
- unregister_local on disconnect
- Passed to probe mesh and metrics server
Also marks FC-10 as DONE in integration tracker.
48 relay tests + 42 proto tests passing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
T6 wiring: Trunking in relay hot path
- TrunkedForwarder wraps transport with TrunkBatcher
- run_participant uses 5ms flush timer when trunking enabled
- send_trunk/recv_trunk on QuinnTransport
- --trunking flag on relay config
- 2 new tests: forwarder batches, auto-flush on full
T7 wiring: Mini-frames in encoder/decoder
- MediaPacket::encode_compact/decode_compact with MiniFrameContext
- CallEncoder sends mini-headers for consecutive frames (full every 50th)
- CallDecoder auto-detects full vs mini on receive
- mini_frames_enabled in CallConfig (default true)
- 3 new tests: encode/decode sequence, periodic full, disabled mode
Noise suppression (nnnoiseless/RNNoise)
- NoiseSupressor in wzp-codec: pure Rust ML-based noise removal
- Processes 960-sample frames as two 480-sample halves
- Integrated in CallEncoder before silence detection
- noise_suppression in CallConfig (default true)
- 4 new tests: creation, processing, SNR improvement, passthrough
T1-S4: Adaptive playout delay
- AdaptivePlayoutDelay: EMA-based jitter tracking (NetEq-inspired)
- Computes target_delay from observed inter-arrival jitter
- JitterBuffer::new_adaptive() uses adaptive delay
- adaptive_jitter in CallConfig (default true)
- 5 new tests: stable, jitter increase, recovery, clamping, estimate
272 tests passing across all crates.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
WZP-S-4: Room access control
- hash_room_name() in wzp-crypto: SHA-256("featherchat-group:"+name)[:16]
- CLI --room flag hashes before SNI, web bridge does the same
- RoomManager gains ACL: with_acl(), allow(), is_authorized()
- join() returns Result, rejects unauthorized fingerprints
WZP-S-5: Crypto handshake wired into all live paths
- CLI: perform_handshake() after connect, before any mode
- Relay: accept_handshake() after auth, before room join
- Web bridge: perform_handshake() after auth, before audio
- Relay generates ephemeral identity at startup
WZP-S-6: Web bridge featherChat auth
- --auth-url flag: browsers send {"type":"auth","token":"..."} as first WS msg
- Validates against featherChat, passes token to relay
- --cert/--key flags for production TLS (replaces self-signed)
WZP-S-7: wzp-proto standalone
- Cargo.toml uses explicit versions (no workspace inheritance)
- FC can use as git dependency
WZP-S-9: All 6 hardcoded assumptions resolved
- Auth, hashed rooms, mandatory handshake, real TLS certs,
profile negotiation, token validation
CLI also gains --room and --token flags.
179 tests passing across all crates.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
WZP-S-2: Relay token authentication
- New --auth-url flag: relay calls POST {url} with bearer token
- Clients must send SignalMessage::AuthToken as first signal
- Relay validates against featherChat's /v1/auth/validate endpoint
- Rejects unauthenticated clients before they join rooms
- New auth.rs module with validate_token() + tests
WZP-S-3: featherChat signaling bridge
- New featherchat.rs module for CallSignal interop
- WzpCallPayload: wraps SignalMessage + relay_addr + room name
- encode_call_payload/decode_call_payload for JSON serialization
- CallSignalType enum mirrors featherChat's variant
- signal_to_call_type maps WZP signals to FC types
Protocol: Added SignalMessage::AuthToken { token } variant
129 tests passing across all crates.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>