main
104 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
1120c7b579 |
feat(signal): PresenceList broadcast for lobby user discovery
New signal infrastructure for the lobby-first UI:
- PresenceUser struct: { fingerprint, alias }
- SignalMessage::PresenceList: relay broadcasts full user list
to all signal clients on every register/deregister
- SignalHub::presence_list(): builds the list from connected clients
- SignalHub::broadcast(): sends to ALL signal clients
- Relay calls broadcast on register + unregister
- Desktop emits "presence_list" signal-event to JS frontend
This gives clients real-time visibility of who's online via the
signal channel, without needing to join a voice room first.
603 tests pass, 0 regressions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
bb23976076 |
feat(quality): upgrade negotiation + asymmetric quality signals (#28, #29, #30)
New SignalMessage variants for P2P quality coordination:
UpgradeProposal/UpgradeResponse/UpgradeConfirm (#28):
- Consensual quality upgrade flow: proposer sends desired profile,
peer accepts/rejects based on own conditions, confirm commits both
- All carry call_id for relay routing
QualityCapability (#30):
- Peer reports its max sustainable profile, which enables asymmetric
encoding where each side uses its own best quality instead of
forcing everyone to the weakest link
Relay forwards all 4 signals to the call peer (same pattern as
MediaPathReport, CandidateUpdate, HardNatProbe).
Desktop signal recv loop handles all 4 with debug logging.
Encoder switching TODOs noted for wiring into CallEngine.
4 new serde roundtrip tests. 603 total, 0 regressions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
18e5e75f33 |
feat(analyzer): encrypted payload decoding in replay mode (#17)
When --key <64-char-hex> is provided with --replay, the analyzer
decrypts each packet's ChaCha20-Poly1305 payload using the session
key and logs plaintext frame sizes. Prints first 5 + every 100th
decrypt result, and a summary at the end.
This completes all 5 protocol analyzer tasks (#13-17):
- #13: Observer mode (live passive listener), was done
- #14: TUI with Ratatui (per-participant panels), was done
- #15: Capture and replay (.wzp format), was done
- #16: HTML report (Chart.js loss/jitter graphs), was done
- #17: Encrypted decode (--key for replay), done now
Usage:
wzp-analyzer --replay session.wzp --key <64-hex-chars> --html report.html
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
f06f9073ae |
feat(nat): birthday attack module + HardNatBirthdayStart signal (#86, #87)
Birthday attack for random symmetric NATs:
- birthday.rs: open_acceptor_ports() opens N sockets, STUN-probes
each to learn external ports. generate_dialer_targets() builds
hit list (known ports first, then random fill). spray_dialer()
sprays QUIC connects with rate limiting, first success wins.
- Default: 32 acceptor ports, 128 dialer probes, 20ms interval
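A minimal sketch of the "known ports first, then random fill" hit list described above. This is an illustration only: the signature, the `seed` parameter, the xorshift generator, and the 1024..=65535 fill range are assumptions, not the contents of birthday.rs.

```rust
use std::collections::HashSet;

/// Hypothetical version of generate_dialer_targets(): known external
/// ports come first (deduplicated), then the list is padded with
/// pseudo-random ports until `total` targets exist.
pub fn generate_dialer_targets(known_ports: &[u16], total: usize, seed: u64) -> Vec<u16> {
    const MIN_PORT: u32 = 1024; // assumption: skip privileged ports in the fill
    const SPAN: u32 = 65536 - MIN_PORT; // number of ports in the fill range

    // Never ask for more distinct ports than the fill range can supply.
    let total = total.min(SPAN as usize);
    let mut seen = HashSet::new();
    let mut targets = Vec::with_capacity(total);

    // 1. Known ports first, in the order they were learned.
    for &port in known_ports {
        if targets.len() == total {
            break;
        }
        if seen.insert(port) {
            targets.push(port);
        }
    }

    // 2. Random fill. xorshift64 is used only to keep the sketch
    //    dependency-free; the state must be non-zero.
    let mut state = seed | 1;
    while targets.len() < total {
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        let port = (MIN_PORT + (state % SPAN as u64) as u32) as u16;
        if seen.insert(port) {
            targets.push(port);
        }
    }
    targets
}
```

With the defaults quoted above, a dialer would call something like `generate_dialer_targets(&acceptor_ports, 128, seed)` and hand the result to the rate-limited spray.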
Signal coordination:
- HardNatBirthdayStart { acceptor_ports, external_ip } sent by
Acceptor when peer's HardNatProbe shows random/sequential NAT
- Relay forwards it like other call signals
- Desktop recv loop handles and logs it
Hybrid waterfall integration:
- On receiving HardNatProbe with non-cone allocation, Acceptor
auto-opens birthday ports and sends BirthdayStart
- Sockets kept alive 10s for NAT mapping persistence
- Dialer spray integration into race() pending (needs transport
hot-swap for background upgrade)
6 new tests, 599 total, 0 regressions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
1de280fe04 |
fix(nat): working NAT tickle + smart filter debug + timeout diags
Fixes from real-world 5G↔Starlink testing:
NAT tickle fix:
- tokio::net::UdpSocket::bind() doesn't set SO_REUSEADDR, so binding
to the same port as quinn silently failed. Now uses socket2::Socket
with explicit SO_REUSEADDR + SO_REUSEPORT (via libc on unix).
- Tickle now logs success/failure for debugging.
Diagnostic fixes:
- connect:dual_path_race_start shows both dial_order_raw and
dial_order_smart so we can see what filtering removed
- Grace-period timeout (relay wins first, direct still running) now
fills "timeout:grace" diags for unrecorded candidates
- Previously candidate_diags was empty when relay won the race
Dependencies:
- Added socket2 = "0.5" to wzp-client
593 tests pass, 0 regressions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
bc6d327ebb |
feat(nat): smart candidate filtering + acceptor NAT tickle + 4s timeout
Major P2P improvements for cross-network calls:
Smart candidate filtering (smart_dial_order):
- Strip LAN candidates when peer's public IP differs from ours
(172.16.x.x is unreachable from a different network)
- Strip all IPv6 candidates (Phase 7 disabled, wastes dial slots)
- Only keep mapped + reflexive for cross-network calls
- LAN candidates preserved when both peers share the same public IP
Acceptor NAT tickle:
- A-role sends a 1-byte UDP packet to each peer candidate BEFORE
accepting. This opens the NAT pinhole for return traffic from the
Dialer's IP, which is critical for address-restricted NATs that only
allow inbound from IPs they've seen outbound traffic to.
- Uses SO_REUSEADDR on the same port as the quinn endpoint.
Direct timeout increased from 2s to 4s:
- Cross-network QUIC handshakes through CGNAT can take 2-3s
- 2s was too aggressive for 5G/LTE networks
Diagnostic fix:
- Record "timeout:4s" for candidates still in-flight when the
timeout fires (previously these had no diagnostic entry)
5 new tests for smart_dial_order edge cases.
593 tests pass, 0 regressions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
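The filtering rules in this commit can be sketched roughly as follows. The struct layout, the way "same network" is detected (comparing our public IP with the peer's reflexive IP), and the relative order of mapped vs. reflexive candidates are assumptions made for illustration; only the rules themselves come from the commit message.

```rust
use std::net::{IpAddr, SocketAddr};

/// Hypothetical shape of the peer's candidate bundle.
pub struct PeerCandidates {
    pub local: Vec<SocketAddr>,
    pub mapped: Option<SocketAddr>,
    pub reflexive: Option<SocketAddr>,
}

/// Sketch of smart_dial_order(): returns the candidates worth dialing.
pub fn smart_dial_order(peer: &PeerCandidates, own_public_ip: Option<IpAddr>) -> Vec<SocketAddr> {
    // Assumption: both peers are behind the same NAT when our public IP
    // equals the IP of the peer's reflexive address.
    let same_network = match (own_public_ip, peer.reflexive) {
        (Some(ours), Some(theirs)) => ours == theirs.ip(),
        _ => false,
    };

    let mut order = Vec::new();
    // LAN candidates are only reachable when both peers share a public IP.
    if same_network {
        order.extend(peer.local.iter().copied());
    }
    // Mapped + reflexive candidates are always kept.
    order.extend(peer.mapped);
    order.extend(peer.reflexive);
    // IPv6 dialing is disabled, so IPv6 candidates only waste dial slots.
    order.retain(|addr| addr.is_ipv4());
    order
}
```

Keeping the filter a pure function of its inputs makes edge cases (missing reflexive address, IPv6-only candidate lists) easy to unit-test.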
||
|
|
c0dd6c06ff |
feat(debug): per-candidate dial diagnostics in dual-path race
Added CandidateDiag struct to RaceResult with per-candidate:
- address attempted
- result (ok / skipped:ipv6 / error:reason)
- elapsed time in ms
Surfaced in call-debug events:
- connect:dual_path_race_start now includes dial_order + peer_mapped
- connect:dual_path_race_done now includes candidate_diags array
Upgraded dual_path tracing from debug to info for IPv6 skips and
dial failures so they appear in logcat/console.
Helps diagnose why P2P fails on specific networks (5G CGNAT,
address-restricted NATs, etc).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
ec1bdf3cd5 |
feat(nat): hard NAT port allocation detection + prediction + HardNatProbe signal (#29)
Phase A of hard NAT traversal (PRD-hard-nat.md):
- PortAllocation enum: PortPreserving / Sequential{delta} / Random / Unknown
- detect_port_allocation(): sequential STUN probes from single socket,
analyzes port sequence for allocation pattern
- classify_port_allocation(): pure function with jitter tolerance,
wraparound handling, 60% threshold for noisy sequences
- predict_ports(): generates target port range from last_port + delta
- HardNatProbe signal message: carries port_sequence, allocation
pattern, external_ip for peer coordination
- Relay forwards HardNatProbe to call peer
- Netcheck gains port_allocation field + format_report display
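As an illustration of the classification and prediction steps listed above, here is a rough sketch. The enum variant names and the 60% threshold come from the commit message; everything else is assumed, including the function signatures, the meaning of PortPreserving (here: every probe saw the same external port), the minimum of 3 samples, and wraparound modulo 65536.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PortAllocation {
    PortPreserving,
    Sequential { delta: i32 },
    Random,
    Unknown,
}

/// Sketch of a pure classifier over the external ports seen by
/// consecutive STUN probes. `jitter` is the tolerated deviation
/// from the dominant step.
pub fn classify_port_allocation(ports: &[u16], jitter: i32) -> PortAllocation {
    if ports.len() < 3 {
        return PortAllocation::Unknown; // assumption: too few samples to judge
    }
    if ports.iter().all(|&p| p == ports[0]) {
        return PortAllocation::PortPreserving;
    }
    // Signed step between consecutive probes; wrapping_sub + i16 cast
    // treats 65534 -> 0 as a step of +2 rather than -65534.
    let deltas: Vec<i32> = ports
        .windows(2)
        .map(|w| w[1].wrapping_sub(w[0]) as i16 as i32)
        .collect();

    // Pick the step with the most neighbours within the jitter tolerance.
    let mut best = (0i32, 0usize);
    for &candidate in &deltas {
        let support = deltas
            .iter()
            .filter(|&&d| (d - candidate).abs() <= jitter)
            .count();
        if support > best.1 {
            best = (candidate, support);
        }
    }
    let (delta, support) = best;
    // 60% of the steps must agree for a noisy sequence to count as sequential.
    if delta != 0 && support * 100 >= deltas.len() * 60 {
        PortAllocation::Sequential { delta }
    } else {
        PortAllocation::Random
    }
}

/// Sketch of predict_ports(): the next `count` ports after `last_port`.
pub fn predict_ports(last_port: u16, delta: i32, count: usize) -> Vec<u16> {
    (1..=count as i32)
        .map(|k| (last_port as i32 + delta * k).rem_euclid(65536) as u16)
        .collect()
}
```

A peer that classifies as `Sequential { delta }` can then feed `predict_ports(last_port, delta, n)` into its dial targets, while `Random` is the case the birthday approach is meant for.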
588 tests pass (17 new), 0 regressions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
8fcf1be341 |
feat(nat): Tailscale-inspired STUN/ICE + port mapping + mid-call re-gathering (#28)
Phase 8: 5 new modules bringing NAT traversal close to Tailscale's
approach.
- stun.rs: RFC 5389 STUN client (public server reflexive discovery,
XOR-MAPPED-ADDRESS parsing, parallel probe with retry, STUN
fallback in desktop try_reflect_own_addr())
- portmap.rs: NAT-PMP (RFC 6886) + PCP (RFC 6887) + UPnP IGD port
mapping (gateway discovery, acquire/release/refresh lifecycle, new
PeerCandidates.mapped candidate type in dial order)
- ice_agent.rs: candidate lifecycle (gather(), re_gather(),
apply_peer_update() with monotonic generation counter,
CandidateUpdate signal message forwarded by relay)
- netcheck.rs: comprehensive diagnostic (NAT type, IPv4/v6, port
mapping availability, relay latencies, CLI --netcheck)
- relay_map.rs: RTT-sorted relay map, preferred() selection,
populate_from_ack() for RegisterPresenceAck.available_relays
Relay: CallRegistry stores + cross-wires caller/callee_mapped_addr
into CallSetup.peer_mapped_addr. Region config + available_relays
populated from federation peers in RegisterPresenceAck.
Desktop: place_call/answer_call call acquire_port_mapping() and fill
caller/callee_mapped_addr. STUN+relay combined NAT detection.
571 tests pass (66 new), 0 regressions, 0 warnings.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
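One plausible reading of "apply_peer_update() with monotonic generation counter" is that stale or duplicate CandidateUpdate messages are dropped. The sketch below shows that idea only; the struct, the field names, and the use of strings for candidates are hypothetical and not taken from ice_agent.rs.

```rust
/// Hypothetical holder for the peer's most recent candidate set.
#[derive(Debug, Default)]
pub struct PeerCandidateState {
    last_generation: Option<u64>,
    candidates: Vec<String>,
}

impl PeerCandidateState {
    /// Applies an update only if its generation is newer than anything
    /// seen so far. Returns true when the update was accepted.
    pub fn apply_peer_update(&mut self, generation: u64, candidates: Vec<String>) -> bool {
        if let Some(seen) = self.last_generation {
            if generation <= seen {
                // Reordered or replayed update: keep the newer state.
                return false;
            }
        }
        self.last_generation = Some(generation);
        self.candidates = candidates;
        true
    }

    pub fn candidates(&self) -> &[String] {
        &self.candidates
    }
}
```

The counter matters for mid-call re-gathering: updates relayed over the signal channel can arrive late, and an older candidate set must not overwrite a newer one.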
||
|
|
425c67a08a |
feat(analyzer): replay, HTML report, encrypted decode stub (#15, #16, #17)
#15 - Replay mode: --replay <file.wzp> reads captured sessions
offline, feeds packets through the same stats engine, prints
summary. CaptureReader mirrors CaptureWriter's binary format.
#16 - HTML report: --html <report.html> generates self-contained
HTML with Chart.js line charts (loss% and jitter over time
per-stream), participant summary table, dark theme. Works with live
sessions (after exit) or replay mode.
#17 - Encrypted decode: --key <hex> flag accepted and stored. Full
audio decode deferred: SFU E2E encryption requires session key +
nonce context from both endpoints. Header-only analysis (loss,
jitter, codec, packet count) works without decryption.
Usage:
wzp-analyzer --replay session.wzp --html report.html
wzp-analyzer relay:4433 --room test --capture out.wzp --html report.html
372 tests passing, 0 regressions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
88ca3e099a |
feat: wzp-analyzer binary — protocol analyzer with TUI (#13, #14, #15)
New binary: wzp-analyzer joins a room as a passive observer and
displays real-time per-participant quality metrics.
Features:
- Passive observation: connects to relay, receives all media, never
sends
- Participant detection: identifies senders by sequence number
streams
- Per-participant stats: packets, loss%, jitter, codec, codec
switches
- TUI mode (ratatui): color-coded table (green/yellow/red by loss),
10 FPS refresh, session header, quit with q/Ctrl+C
- No-TUI mode: prints stats to stderr every 2s (for headless/CI use)
- Capture mode: binary .wzp format with microsecond timestamps for
offline replay (magic WZP\x01, JSON header, per-packet records)
- Session summary on exit
Usage:
wzp-analyzer 193.180.213.68:4433 --room general
wzp-analyzer 193.180.213.68:4433 --room general --no-tui --duration 60
wzp-analyzer 193.180.213.68:4433 --room general --capture session.wzp
372 tests passing, 0 regressions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
1e82811cc1 |
feat(p2p): adaptive quality on direct calls (#23)
P2P calls now adapt codec quality based on observed network
conditions, matching what relay calls already had.
Three-layer implementation:
- QualityReport::from_path_stats(): construct reports from local
quinn stats (loss%, RTT, jitter) without needing relay-generated
reports
- CallEncoder.pending_quality_report: one-shot attachment to next
source packet (consumed on encode, not repeated)
- Engine send tasks: generate quality report every 50 frames (~1s)
from quinn_path_stats() and attach via set_pending_quality_report()
- Engine recv tasks: self-observe from own QUIC path stats every 50
packets, feed to AdaptiveQualityController for P2P adaptation
(works even if peer isn't sending quality reports yet)
Both relay and P2P calls now have adaptive quality. On relay calls,
both peer-sent reports AND local observations feed the controller.
Hysteresis (3 consecutive bad reports to downgrade) prevents
thrashing.
372 tests passing (+4 new: from_path_stats encoding, clamping, zero
values, encoder quality report attachment).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
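The hysteresis rule mentioned in this commit (3 consecutive bad reports before a downgrade) can be sketched as a small counter. The type and method names below are hypothetical, and what counts as a "bad" report is left to the caller; this is not the AdaptiveQualityController itself.

```rust
/// Counts consecutive bad quality reports and signals a downgrade
/// only once the threshold is reached.
pub struct DowngradeHysteresis {
    threshold: u32,
    consecutive_bad: u32,
}

impl DowngradeHysteresis {
    pub fn new(threshold: u32) -> Self {
        Self { threshold, consecutive_bad: 0 }
    }

    /// Feed one report. Returns true when a downgrade should happen.
    pub fn observe(&mut self, report_is_bad: bool) -> bool {
        if !report_is_bad {
            // Any good report breaks the streak.
            self.consecutive_bad = 0;
            return false;
        }
        self.consecutive_bad += 1;
        if self.consecutive_bad >= self.threshold {
            // Start counting again from the new, lower profile.
            self.consecutive_bad = 0;
            true
        } else {
            false
        }
    }
}
```

A single lossy second therefore never changes the codec; only a sustained run of bad reports does, which is what keeps the encoder from thrashing between profiles.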
||
|
|
ea5fc17c34 |
fix(relay): debug tap signal logging, dual_path test regression, PRD updates
- Add log_signal() and log_event() to DebugTap for RoomUpdate,
QualityDirective, join/leave lifecycle events (task #11)
- Fix dual_path.rs Phase 7 regression: add missing ipv6_endpoint arg
to 3 race() call sites
- Update PRDs to reflect actual implementation status: mark adaptive
quality, coordinated codec, P2P, network awareness, protocol
analyzer
- Update PROGRESS.md with QualityDirective gap and dual_path
regression
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
22045bc5e6 |
feat: adaptive quality in desktop, relay quality directive, Oboe state polling
- Wire AdaptiveQualityController into desktop engine send/recv tasks
(mirrors Android pattern: AtomicU8 pending_profile, auto-mode check)
- Wire same into Android engine send task (was only in recv before)
- QualityDirective SignalMessage variant for relay-initiated codec
switch
- ParticipantQuality tracking in relay RoomManager (per-participant
AdaptiveQualityController, weakest-link tier computation)
- Relay broadcasts QualityDirective to all participants when
room-wide tier degrades (coordinated codec switching)
- Oboe stream state polling: poll getState() for up to 2s after
requestStart() to ensure both streams reach Started before
proceeding (fixes intermittent silent calls on cold start, Nothing
Phone A059)
Tasks: #7, #25, #26, #31, #35
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
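A rough sketch of the weakest-link computation and the "broadcast only on degradation" rule described above. The tier names and both function names are invented for illustration; the relay's actual tiers and RoomManager API are not shown in this log.

```rust
/// Hypothetical quality tiers, declared from worst to best so that the
/// derived ordering makes the minimum the weakest link.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Tier {
    Degraded,
    Normal,
    High,
}

/// Room-wide tier: the worst tier among all participants.
pub fn room_tier(participants: &[Tier]) -> Option<Tier> {
    participants.iter().copied().min()
}

/// Returns the tier to broadcast in a QualityDirective, or None when
/// the room-wide tier has not dropped below the previous one.
pub fn directive_for(previous: Tier, participants: &[Tier]) -> Option<Tier> {
    let current = room_tier(participants)?;
    if current < previous {
        Some(current)
    } else {
        None
    }
}
```

Deriving `Ord` from the declaration order keeps the weakest-link rule to a single `min()` call.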
||
|
|
766c9df442 |
feat(dred): continuous DRED tuning, PMTUD, extended Opus6k window
- DredTuner: maps live network metrics (loss/RTT/jitter) to
continuous DRED duration every ~500ms instead of discrete
tier-locked values. Includes jitter-spike detection for pre-emptive
Starlink-style boost.
- Opus6k DRED extended from 500ms to 1040ms (max libopus 1.5
supports)
- PMTUD: quinn MtuDiscoveryConfig with upper_bound=1452, 300s
interval
- TrunkedForwarder respects discovered MTU (was hard-coded 1200)
- QuinnPathSnapshot exposes quinn internal stats + discovered MTU
- AudioEncoder trait: set_expected_loss() + set_dred_duration()
methods
- PathMonitor: sliding-window jitter variance for spike detection
- Integrated into both Android and desktop send tasks in engine.rs
- 14 new tests (10 tuner unit + 4 encoder integration)
- Updated ARCHITECTURE.md, PROGRESS.md, PRD-dred-integration, PRD-mtu
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
a798634b3d |
fix(signal): add call_id to Hangup — prevents stale hangup killing new calls
Root cause: Hangup had no call_id field. The relay forwarded hangups
to ALL active calls for a user. When user A hung up call 1 and user
B immediately placed call 2, the relay's processing of A's hangup
would also kill call 2 (race window ~1-2s).
Fix: add optional call_id to Hangup (backwards-compatible via serde
skip_serializing_if). When present, the relay only ends the named
call. Old clients send call_id=None and get the legacy broadcast
behavior.
Also: clear pending_path_report in Hangup recv handler and
internal_deregister to prevent stale oneshot channels from blocking
subsequent call setups.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
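The routing rule from this fix can be illustrated with a small pure function. The `ActiveCall` struct, the string-keyed map, and the function name are assumptions for the sketch; only the two behaviours (scoped when call_id is present, legacy broadcast when it is None) come from the commit message.

```rust
use std::collections::HashMap;

/// Hypothetical record of an active call, keyed by call_id in the map.
pub struct ActiveCall {
    pub caller: String,
    pub callee: String,
}

/// Returns the ids of the calls a Hangup from `user` should end.
/// With `call_id == None` (old clients) every call involving the user
/// is ended; with `Some(id)` only that call is.
pub fn calls_to_end(
    active: &HashMap<String, ActiveCall>,
    user: &str,
    call_id: Option<&str>,
) -> Vec<String> {
    let mut ended: Vec<String> = active
        .iter()
        .filter(|(id, call)| {
            let involved = call.caller == user || call.callee == user;
            let matches = call_id.map_or(true, |wanted| wanted == id.as_str());
            involved && matches
        })
        .map(|(id, _)| id.clone())
        .collect();
    ended.sort(); // deterministic order; HashMap iteration order is not
    ended
}
```

In the race described above, A's late hangup for call 1 now carries `Some("call-1")`, so a freshly placed call 2 between the same two users is left alone.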
||
|
|
29cd23fe39 |
fix(p2p): connection cleanup — 4 fixes for stale/dead connections
PRD 4: Disable IPv6 direct dial/accept temporarily. IPv6 QUIC
handshakes succeed but connections die immediately on datagram
send ("connection lost"). IPv4 candidates work reliably. IPv6
candidates still gathered but filtered at dial time.
PRD 1: Close losing transport after Phase 6 negotiation. The
non-selected transport now gets an explicit QUIC close frame
instead of silently dropping after 30s idle timeout. Prevents
phantom connections from polluting future accept() calls.
PRD 2: Harden accept loop with max 3 stale retries. Stale
connections are explicitly closed (conn.close) and counted.
After 3 stale connections, the accept loop aborts instead of
spinning until the race timeout.
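The bounded accept loop from PRD 2 can be sketched generically. The function below is hypothetical and works over any iterator of connections with an `is_closed` predicate, standing in for quinn's accept() and close_reason(); a real implementation would also call conn.close() on each stale connection.

```rust
/// Pulls connections from `incoming` until a live one appears.
/// Returns Err(stale_count) after `max_stale` stale connections, or
/// when the source runs dry, instead of spinning until a timeout.
pub fn accept_live<C, I>(
    incoming: I,
    is_closed: impl Fn(&C) -> bool,
    max_stale: usize,
) -> Result<C, usize>
where
    I: IntoIterator<Item = C>,
{
    let mut stale = 0;
    for conn in incoming {
        if !is_closed(&conn) {
            return Ok(conn);
        }
        // Stale connection from a previous call: count it and move on.
        stale += 1;
        if stale >= max_stale {
            return Err(stale);
        }
    }
    Err(stale)
}
```

With `max_stale = 3` this matches the limit named in the commit: three stale connections in a row abort the accept side of the race.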
PRD 3: Resource cleanup — close old IPv6 endpoint before
creating a new one in place_call/answer_call. Add Drop impl
to CallEngine so tasks are signalled to stop on ungraceful
shutdown.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
002df15c5e |
fix(cli): add .. rest pattern for RegisterPresenceAck error arm
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
1904b19d05 |
fix(direct): validate A-role accepted connection, skip stale ones
The Acceptor's accept() on the shared signal endpoint can dequeue a
stale QUIC connection from a previous call that the Dialer has
already dropped. This results in "connection lost" errors when media
datagrams are sent: 100% drops on both sides.
Fix: after accepting a connection, check close_reason(). If the
connection is already closed, log a warning and re-accept. Also
verify max_datagram_size() is available before returning.
Additionally: emit transport details (remote addr, max_datagram,
close_reason) in the call_engine_starting debug event so stale
connection issues are visible in the user-facing debug log.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
40955bd11c |
debug(media): add connection diagnostics for direct P2P drops
When direct P2P calls show 100% datagram drops, we need to know
WHY send_media() fails. This commit adds:
- Remote address + stable_id logging on A-role accept and D-role
dial success (dual_path.rs) — tells us which candidate won
- Remote address + max_datagram_size on engine transport init —
verifies datagrams are negotiated
- last_send_err in send heartbeat — captures the actual error
from send_datagram() failures
- QuinnTransport::remote_address() helper
Also fixes UI badge: was looking for wrong event name
("dual_path_race_won" → "path_negotiated").
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
0b62d3e22f |
fix(cli): add missing build_version fields to Offer/Answer
CLI binary was missing the new caller_build_version and callee_build_version fields, causing E0063 compile errors on Linux relay/client builds. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
c2d298beb5 |
feat(net): Phase 7 — dual-socket IPv4+IPv6 ICE
Adds a dedicated IPv6 QUIC endpoint (IPV6_V6ONLY=1 via socket2)
alongside the existing IPv4 signal endpoint for proper dual-stack
P2P connectivity. Previous [::]:0 dual-stack attempt broke IPv4 on
Android; this uses separate sockets per address family like
WebRTC/libwebrtc.
- create_ipv6_endpoint(): socket2-based IPv6-only UDP socket, tries
same port as IPv4 signal EP, falls back to ephemeral
- local_host_candidates(v4_port, v6_port): now gathers IPv6
global-unicast (2000::/3) and unique-local (fc00::/7) addrs
- dual_path::race(): A-role accepts on both v4+v6 via select!,
D-role routes each candidate to matching-AF endpoint
- Graceful fallback: if IPv6 unavailable, .ok() → None → pure IPv4
behavior identical to pre-Phase-7
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
aee41a638d |
fix(audio+net): revert dual-stack [::]:0, add Oboe playout stall auto-restart
Two fixes:
## Revert [::]:0 dual-stack sockets → back to 0.0.0.0:0
Android's IPV6_V6ONLY=1 default on some kernels (confirmed on
Nothing Phone) makes [::]:0 IPv6-only, silently killing ALL IPv4
traffic. This broke P2P direct calls: IPv4 LAN candidates
(172.16.81.x) couldn't complete QUIC handshakes through the
IPv6-only socket, causing local_direct_ok=false and relay fallback
on every call after the first.
Reverted all bind sites to 0.0.0.0:0 (reliable IPv4). IPv6 host
candidates are disabled in local_host_candidates() until a proper
dual-socket approach (one IPv4 + one IPv6 endpoint, Phase 7) is
implemented.
## Fix A (task #35): Oboe playout callback stall auto-restart
The Nothing Phone's Oboe playout callback fires once (cb#0) and then
stops draining the ring on ~50% of cold-launch calls. Fix D+C
(stop+prime from previous commit) didn't help because audio_stop is
a no-op on cold launch.
New approach: self-healing watchdog in audio_write_playout. Tracks
the playout ring's read_idx across writes. If read_idx hasn't
advanced in 50 consecutive writes (~1 second), the Oboe playout
callback has stopped:
1. Log "playout STALL detected"
2. Call wzp_oboe_stop() to tear down the stuck streams
3. Clear both ring buffers (prevent stale data reads)
4. Call wzp_oboe_start() to rebuild fresh streams
5. Log success/failure
6. Return 0 (caller retries on next frame)
This is the same teardown+rebuild that "rejoin" does, but triggered
automatically from the first stalled call instead of requiring the
user to hang up and redial. The watchdog runs on every write so it
fires within 1s of the stall starting.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
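The stall detection at the heart of the watchdog can be sketched as a small state machine. The struct and method names are hypothetical, and the teardown/rebuild steps (wzp_oboe_stop/wzp_oboe_start, clearing the rings) are deliberately left to the caller; this only shows the "read_idx has not advanced in N writes" check.

```rust
/// Tracks the playout ring's read index across writes and reports a
/// stall once it has not moved for `limit` consecutive writes.
pub struct PlayoutWatchdog {
    last_read_idx: usize,
    stalled_writes: u32,
    limit: u32,
}

impl PlayoutWatchdog {
    pub fn new(limit: u32) -> Self {
        Self { last_read_idx: 0, stalled_writes: 0, limit }
    }

    /// Call on every playout write with the ring's current read index.
    /// Returns true when the caller should restart the audio streams.
    pub fn on_write(&mut self, read_idx: usize) -> bool {
        if read_idx != self.last_read_idx {
            // The playout callback drained something: healthy.
            self.last_read_idx = read_idx;
            self.stalled_writes = 0;
            return false;
        }
        self.stalled_writes += 1;
        if self.stalled_writes >= self.limit {
            // Reset so the rebuilt streams get a full window again.
            self.stalled_writes = 0;
            true
        } else {
            false
        }
    }
}
```

The commit equates 50 writes with roughly 1 second, so `PlayoutWatchdog::new(50)` fires about a second after the callback stops draining.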
||
|
|
9fb92967eb |
fix(net): bind all endpoints to [::]:0 for dual-stack IPv4+IPv6
Every QUIC endpoint was bound to 0.0.0.0:0 (IPv4-only). This
silently killed ALL IPv6 host candidates: the Dialer couldn't send
packets to [2a0d:...] addresses (wrong address family on the
socket), and the Acceptor couldn't receive incoming IPv6 QUIC
handshakes. The IPv6 candidates were gathered and advertised in
DirectCallOffer/Answer but were completely non-functional.
On same-LAN with dual-stack (which both test phones have), this
meant:
- JoinSet fanned out 3+ candidates (2× IPv6 + 1× IPv4)
- IPv6 dials failed silently or timed out
- IPv4 dial worked but competed with failed IPv6 for JoinSet
attention
- Sometimes the JoinSet returned an IPv6 failure before the IPv4
success, causing unnecessary fallback to relay
Fix: bind to [::]:0 (IPv6 any) instead of 0.0.0.0:0. On dual-stack
systems (Linux/Android default), [::]:0 creates a socket that
handles BOTH:
- IPv6 natively (global unicast, ULA)
- IPv4 via v4-mapped addresses (::ffff:172.16.81.x)
One socket, both protocols.
All 7 bind sites updated:
- register_signal (signal endpoint)
- do_register_signal
- ping_relay
- probe_reflect_addr (fresh endpoint fallback)
- dual_path::race (A-role fresh, D-role fresh, relay fresh)
With this fix, same-LAN P2P should prefer the IPv6 path (no NAT,
direct routing, lower latency) and fall through to IPv4 if IPv6
fails. Relay is the last resort after ALL candidates are exhausted.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
f5542ef822 |
feat(p2p): Phase 6 — ICE-style path negotiation
Before Phase 6, each side's dual-path race ran independently and
committed to whichever transport completed first. When one side
picked Direct and the other picked Relay, they sent media to
different places: TX > 0 but RX = 0 on both sides, a completely silent call.
Phase 6 adds a negotiation step: after the local race completes,
each side sends a MediaPathReport { call_id, direct_ok, race_winner }
to the peer through the relay. Both wait for the other's report
before committing a transport to the CallEngine. The decision
rule is simple: if BOTH report direct_ok = true, use direct; if
EITHER reports false, BOTH use relay.
## Wire protocol
New `SignalMessage::MediaPathReport { call_id, direct_ok,
race_winner }`. The relay forwards it to the call peer via the
same signal_hub routing used for DirectCallOffer/Answer. The
cross-relay dispatcher also forwards it.
## dual_path::race restructured
Returns `RaceResult` instead of `(Arc<QuinnTransport>, WinningPath)`:
- `direct_transport: Option<Arc<QuinnTransport>>`
- `relay_transport: Option<Arc<QuinnTransport>>`
- `local_winner: WinningPath`
Both paths are run as spawned tasks. After the first completes,
a 1s grace period lets the loser also finish. The connect
command gets BOTH transports (when available) and picks the
right one based on the negotiation outcome. The unused transport
is dropped.
## connect command flow (revised)
1. Run race() → RaceResult with both transports
2. Send MediaPathReport to relay with our direct_ok
3. Install oneshot; wait for peer's report (3s timeout)
4. Decision: both direct_ok → use direct; else → use relay
5. Start CallEngine with the agreed transport
If the peer never responds (old build, timeout), falls back to
relay — backward compatible.
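The decision rule in steps 3 and 4, including the timeout fallback, reduces to a small pure function. The names below are hypothetical; `peer_direct_ok` is `None` when the peer's MediaPathReport did not arrive within the 3s window.

```rust
/// Transport both sides commit to after negotiation.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AgreedPath {
    Direct,
    Relay,
}

/// Direct is used only when BOTH sides report direct_ok = true.
/// A false report from either side, or no report at all, means relay.
pub fn negotiate_path(local_direct_ok: bool, peer_direct_ok: Option<bool>) -> AgreedPath {
    match (local_direct_ok, peer_direct_ok) {
        (true, Some(true)) => AgreedPath::Direct,
        _ => AgreedPath::Relay,
    }
}
```

Because both peers evaluate the same symmetric rule on the same two booleans, they reach the same answer whenever both reports are delivered, which is what removes the Direct/Relay mismatch described at the top of this commit.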
## Relay forwarding
MediaPathReport is forwarded like DirectCallOffer/Answer: via
signal_hub.send_to(peer_fp) for same-relay calls, and via
cross-relay dispatcher for federated calls.
## Debug log events
- `connect:dual_path_race_done` — local race result
- `connect:path_report_sent` — our report to the peer
- `connect:peer_report_received` — peer's report
- `connect:peer_report_timeout` — peer didn't respond (3s)
- `connect:path_negotiated` — final agreed path with reasons
Full workspace test: 423 passing (no regressions).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
16793be36f |
fix(p2p): Phase 5.6 — direct-path head start + hangup propagation + media debug events
Three fixes from a field-test log where same-LAN calls were
still losing the dual-path race to the relay path, peers were
getting stuck on an empty call screen when the other side
hung up, and 1-way audio was hard to diagnose because the
GUI debug log had no media-level events.
## 1. Direct-path 500ms head start (dual_path.rs)
The race was resolving in ~105ms with Relay winning even when
both phones were on the same MikroTik LAN with valid IPv6 host
candidates. Root cause: the relay dial is a plain outbound QUIC
connect that completes in whatever the client→relay RTT is
(~100ms), while the direct path needs the PEER to also process
its CallSetup, spin up its own race, and complete at least one
LAN dial back to us. That cross-client sequence reliably takes
longer than 100ms, so relay always won.
Fix: delay the relay_fut with `tokio::time::sleep(500ms)` before
starting its connect. Same-LAN direct dials complete in 30-50ms
typically, so the head start gives direct plenty of time to win
cleanly. Users on setups where direct genuinely can't work
(LTE-to-LTE cross-carrier) pay 500ms extra on the relay fallback,
which is invisible for a call setup.
## 2. Hangup propagation via a new hangup_call command (lib.rs + main.ts)
The hangup button was calling `disconnect` which stopped the
local media engine but never sent a SignalMessage::Hangup to
the relay. The peer never got notified and was stuck on the
call screen with silent audio. My earlier fix (commit
|
||
|
|
fa038df057 |
feat(p2p): Phase 5.5 — ICE LAN host candidates (IPv4 + IPv6)
Same-LAN P2P was failing because MikroTik masquerade (like most
consumer NATs) doesn't support NAT hairpinning — the advertised
WAN reflex addr is unreachable from a peer on the same LAN as
the advertiser. Phase 5 got us Cone NAT classification and fixed
the measurement artifact, but same-LAN direct dials still had
nowhere to land.
Phase 5.5 adds ICE-style host candidates: each client enumerates
its LAN-local network interface addresses, includes them in the
DirectCallOffer/Answer alongside the reflex addr, and the
dual-path race fans out to ALL peer candidates in parallel.
Same-LAN peers find each other via their RFC1918 IPv4 + ULA /
global-unicast IPv6 addresses without touching the NAT at all.
Dual-stack IPv6 is in scope from the start — on modern ISPs
(including Starlink) the v6 path often works even when v4
hairpinning doesn't, because there's no NAT on the v6 side.
## Changes
### `wzp_client::reflect::local_host_candidates(port)` (new)
Enumerates network interfaces via `if-addrs` and returns
SocketAddrs paired with the caller's port. Filters:
- IPv4: RFC1918 (10/8, 172.16/12, 192.168/16) + CGNAT (100.64/10)
- IPv6: global unicast (2000::/3) + ULA (fc00::/7)
- Skipped: loopback, link-local (169.254, fe80::), public v4
(already covered by reflex-addr), unspecified
Safe from any thread, one `getifaddrs(3)` syscall.
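The address filters listed above can be expressed with the standard library alone. This is a sketch of the filter predicate only; the function names are hypothetical, and interface enumeration via `if-addrs` is left out.

```rust
use std::net::{IpAddr, Ipv4Addr};

/// CGNAT shared address space, 100.64.0.0/10.
fn is_cgnat(v4: Ipv4Addr) -> bool {
    let o = v4.octets();
    o[0] == 100 && (o[1] & 0xC0) == 64
}

/// True when `ip` should be advertised as a LAN host candidate.
pub fn is_host_candidate(ip: IpAddr) -> bool {
    match ip {
        // RFC1918 + CGNAT. Loopback, link-local (169.254/16),
        // unspecified and public v4 all fall through to false.
        IpAddr::V4(v4) => v4.is_private() || is_cgnat(v4),
        IpAddr::V6(v6) => {
            let first = v6.segments()[0];
            let global_unicast = (first & 0xE000) == 0x2000; // 2000::/3
            let unique_local = (first & 0xFE00) == 0xFC00; // fc00::/7
            // ::1, :: and fe80::/10 match neither prefix.
            global_unicast || unique_local
        }
    }
}
```

Each address that passes the predicate would then be paired with the caller's port to form a `SocketAddr`, as described above.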
### Wire protocol (wzp-proto/packet.rs)
Three new `#[serde(default, skip_serializing_if = "Vec::is_empty")]`
fields, backward-compat with pre-5.5 clients/relays by
construction:
- `DirectCallOffer.caller_local_addrs: Vec<String>`
- `DirectCallAnswer.callee_local_addrs: Vec<String>`
- `CallSetup.peer_local_addrs: Vec<String>`
### Call registry (wzp-relay/call_registry.rs)
`DirectCall` gains `caller_local_addrs` + `callee_local_addrs`
Vec<String> fields. New `set_caller_local_addrs` /
`set_callee_local_addrs` setters. Follow the same pattern as
the reflex addr fields.
### Relay cross-wiring (wzp-relay/main.rs)
Both the local-call and cross-relay-federation paths now track
the local_addrs through the registry and inject them into the
CallSetup's peer_local_addrs. Cross-wiring is identical to the
existing peer_direct_addr logic — each party's CallSetup
carries the OTHER party's LAN candidates.
### Client side (desktop/src-tauri/lib.rs)
- `place_call`: gathers local host candidates via
`local_host_candidates(signal_endpoint.local_addr().port())`
and includes them in `DirectCallOffer.caller_local_addrs`.
The port match is critical — it's the Phase 5 shared signal
socket, so incoming dials to these addrs land on the same
endpoint that's already listening.
- `answer_call`: same, AcceptTrusted only (privacy mode keeps
LAN addrs hidden too, for consistency with the reflex addr).
- `connect` Tauri command: new `peer_local_addrs: Vec<String>`
arg. Builds a `PeerCandidates` bundle and passes it to the
dual-path race.
- Recv loop's CallSetup handler: destructures + forwards the
new field to JS via the signal-event payload.
### `dual_path::race` (wzp-client/dual_path.rs)
Signature change: takes `PeerCandidates` (reflex + local Vec)
instead of a single SocketAddr. The D-role branch now fans out
N parallel dials via `tokio::task::JoinSet` — one per candidate
— and the first successful dial wins (losers are aborted
immediately via `set.abort_all()`). Only when ALL candidates
have failed do we return Err; individual candidate failures are
just traced at debug level and the race waits for the others.
LAN host candidates are tried BEFORE the reflex addr in
`PeerCandidates::dial_order()` — they're faster when they work,
and the reflex addr is the fallback for the not-on-same-LAN
case.
### JS side (desktop/main.ts)
`connect` invoke now passes `peerLocalAddrs: data.peer_local_addrs ?? []`
alongside the existing `peerDirectAddr`.
### Tests
All existing test callsites updated for the new Vec<String>
fields (defaults to Vec::new() in tests — they don't exercise
the multi-candidate path). `dual_path.rs` integration tests
wrap the single `dead_peer` / `acceptor_listen_addr` in a
`PeerCandidates { reflexive: Some(_), local: Vec::new() }`.
Full workspace test: 423 passing (same as before 5.5).
## Expected behavior on the reporter's setup
Two phones behind MikroTik, both on the same LAN:
place_call:host_candidates {"local_addrs": ["192.168.88.21:XXX", "2001:...:YY:XXX"]}
recv:DirectCallAnswer {"callee_local_addrs": ["192.168.88.22:ZZZ", "2001:...:WW:ZZZ"]}
recv:CallSetup {"peer_direct_addr":"150.228.49.65:NN",
"peer_local_addrs":["192.168.88.22:ZZZ","2001:...:WW:ZZZ"]}
connect:dual_path_race_start {"peer_reflex":"...","peer_local":[...]}
dual_path: direct dial succeeded on candidate 0 ← LAN v4 wins
connect:dual_path_race_won {"path":"Direct"}
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
1618ff6c9d |
feat(p2p): Phase 5 — single-socket architecture (Nebula-style)
Before Phase 5 WarzonePhone used THREE separate UDP sockets per
client:
1. Signal endpoint (register_signal, client-only)
2. Reflect probe endpoints (one fresh socket per relay probe)
3. Dual-path race endpoint (fresh per call setup)
This broke two things in production on port-preserving NATs
(MikroTik masquerade, most consumer routers):
a. Phase 2 NAT detection was WRONG. Each probe used a fresh
internal port, so MikroTik mapped each one to a different
external port, and the classifier saw "different port per
relay" and labeled it SymmetricPort. The real NAT was
cone-like but measurement via fresh sockets hid that.
b. Phase 3.5 dual-path P2P race was BROKEN. The reflex addr
we advertised in DirectCallOffer was observed by the signal
endpoint's socket. The actual dual-path race listened on a
DIFFERENT fresh socket, on a different internal (and
therefore external) port. Peers dialed the advertised addr
and hit MikroTik's mapping for the signal socket, which
forwarded to the signal endpoint — a client-only endpoint
that doesn't accept incoming connections. Direct path
silently failed, relay always won the race.
Nebula-style fix: one socket for everything. The signal endpoint
is now dual-purpose (client + server_config), and both the
reflect probes and the dual-path race reuse it instead of
creating fresh ones. MikroTik's port-preservation then gives us
a stable external port across all flows → classifier correctly
sees Cone NAT → advertised reflex addr is the actual listening
port → direct dials from peers land on the right socket →
`endpoint.accept()` in the A-role branch of the dual-path race
picks up the incoming connection.
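The socket-reuse idea can be sketched with plain `std::net::UdpSocket` standing in for the QUIC endpoint (an assumption made to keep the example dependency-free; `probe_source_addr` is a hypothetical name). Reusing the shared socket keeps the internal source port stable, which is what a port-preserving NAT needs to keep one external mapping.

```rust
use std::io;
use std::net::{SocketAddr, UdpSocket};
use std::sync::Arc;

/// Returns the local address a probe would send from.
/// `Some` reuses the caller's socket (same internal port for every flow);
/// `None` binds a fresh ephemeral port, as the pre-Phase-5 probes did.
pub fn probe_source_addr(existing: Option<Arc<UdpSocket>>) -> io::Result<SocketAddr> {
    let socket = match existing {
        Some(shared) => shared,
        None => Arc::new(UdpSocket::bind("127.0.0.1:0")?),
    };
    socket.local_addr()
    // `socket` drops at end of scope: a fresh socket is closed here, a
    // shared one only loses one Arc reference and stays open for the caller.
}
```

Two calls with the same shared socket report the same source addr, while a `None` call gets a different port. That difference is the measurement artifact described in (a).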
## Changes
### `register_signal` (desktop/src-tauri/src/lib.rs)
- Endpoint now created with `Some(server_config())` instead of
`None`. The socket can now accept incoming QUIC connections as
well as dial outbound.
- Every code path that previously read `sig.endpoint` for the
relay-dial reuse benefits automatically — same socket is now
ALSO listening for peer dials.
### `probe_reflect_addr` (wzp-client/src/reflect.rs)
- New `existing_endpoint: Option<Endpoint>` arg. `Some` reuses
the caller's socket (production: pass the signal endpoint).
`None` creates a fresh one (tests + pre-registration).
- Removed the `drop(endpoint)` at the end — was correct for
fresh endpoints (explicit early socket close) but incorrect
for shared ones. End-of-scope drop does the right thing in
both cases via Arc semantics.
### `detect_nat_type` (wzp-client/src/reflect.rs)
- New `shared_endpoint: Option<Endpoint>` arg, forwarded to
every probe in the JoinSet fan-out. One shared socket means
the classifier sees the true NAT type.
### `detect_nat_type` Tauri command (desktop/src-tauri/src/lib.rs)
- Reads `state.signal.endpoint` and passes it as the shared
endpoint. Falls back to None when not registered. NAT detection
now produces accurate classifications against MikroTik / most
consumer NATs.
### `dual_path::race` (wzp-client/src/dual_path.rs)
- New `shared_endpoint: Option<Endpoint>` arg.
- A-role: when `Some`, reuses it for `accept()`. This is the
critical change — the reflex addr advertised to peers is now
the address listening for incoming direct dials.
- D-role: when `Some`, reuses it for the outbound direct dial.
MikroTik keeps the same external port for the dial as for
the signal flow → direct dial through a cone-mapped NAT.
- Relay path: also reuses the shared endpoint so MikroTik has
a single consistent mapping across the whole call (saves one
extra external port and makes firewall traces cleaner).
- When `None`, falls back to fresh per-role endpoints as before.
### `connect` Tauri command (desktop/src-tauri/src/lib.rs)
- Reads `state.signal.endpoint` once when acquiring own reflex
addr and passes it through to `dual_path::race`.
### Tests
- `wzp-client/tests/dual_path.rs` and
`wzp-relay/tests/multi_reflect.rs` updated to pass `None` for
the new endpoint arg — tests use fresh sockets and that's
fine because the loopback harness doesn't care about
port-preserving NAT behavior.
Full workspace test: 423 passing (no regressions).
## Expected behavior after this commit on real hardware
Behind MikroTik + Starlink-bypass (the reporter's setup):
- Phase 2 NAT detect → **Cone NAT** (was SymmetricPort — false
positive from the measurement artifact)
- Phase 3.5 direct-P2P dial → succeeds for both cone-cone and
cone-CGNAT cases where the remote side was previously blocked
by our own socket mismatch
- LTE ↔ LTE cross-carrier → still likely relay fallback; that's
genuinely strict symmetric and needs Phase 5.5 port prediction.
## Phase 5.5 (next, separate PRD)
Multi-candidate port prediction + ICE-style candidate aggregation
for truly strict symmetric NATs. Not needed for the 95% case —
Phase 5 alone fixes most consumer-router setups.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
00deb97a5d |
fix(reflect): drop LAN/private reflex addrs from NAT classification
Real-world report: a user with one LAN relay + one internet relay
got "Multiple IPs — treating as symmetric" because the LAN relay
saw the client's LAN IP (172.16.81.172) while the internet relay
saw the WAN IP (150.228.49.65). Two observations of "different
public IPs" from the classifier's perspective, but semantically
they describe two different network paths and shouldn't be
compared.
The LAN relay's reflection is always true, just not useful for
public NAT classification: there's no NAT between the client and
the LAN relay, so that path's reflex addr is always the LAN
interface IP regardless of what the public-facing NAT beyond it
looks like.
Fix: new `is_private_or_loopback` helper filters the probe set
before classification. Drops:
- 127.0.0.0/8 loopback
- 10/8, 172.16/12, 192.168/16 RFC1918 private
- 169.254/16 link-local
- 100.64/10 CGNAT shared-transition (same reasoning: a relay that
sees the client with a CGNAT addr is on the same carrier network
and can't describe public NAT state)
- IPv6 loopback, unspecified, fe80::/10 link-local
Failed probes still filtered out of classification (they were
already) but now dimmed in the UI list instead of highlighted
amber. Same rationale: a momentarily-offline probe target isn't a
warning-worthy state, it's just a fact about the probe run.
UI palette rebalance: only Cone gets green, everything else
neutral text-dim. Wording changed from warning-tone "⚠ must use
relay" to informational "ℹ P2P falls back to relay, calls still
work" — symmetric NAT isn't broken state, it just means media
takes the relay path.
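A sketch of what the `is_private_or_loopback` filter could look like using only `std::net`. The function name is from this commit; the signature, the two private helpers, and the manual CGNAT and fe80::/10 checks are assumptions, not the code in `reflect.rs`.

```rust
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr};

/// True for addresses that cannot describe public NAT state.
pub fn is_private_or_loopback(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => {
            v4.is_loopback()          // 127.0.0.0/8
                || v4.is_private()    // 10/8, 172.16/12, 192.168/16
                || v4.is_link_local() // 169.254/16
                || is_cgnat(v4)       // 100.64/10
        }
        IpAddr::V6(v6) => {
            v6.is_loopback() || v6.is_unspecified() || is_v6_link_local(v6)
        }
    }
}

/// 100.64.0.0/10: first octet 100, top two bits of the second octet are 01.
fn is_cgnat(v4: Ipv4Addr) -> bool {
    let [a, b, _, _] = v4.octets();
    a == 100 && (b & 0b1100_0000) == 64
}

/// fe80::/10: top ten bits of the first segment are 1111 1110 10.
fn is_v6_link_local(v6: Ipv6Addr) -> bool {
    (v6.segments()[0] & 0xffc0) == 0xfe80
}
```

Under these rules, 172.16.81.172 from the report is dropped before classification and 150.228.49.65 is kept.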
Tests added (4 new in wzp_client::reflect):
- classify_drops_private_ip_probes — LAN + public → Unknown
- classify_drops_loopback_probes — loopback + 2 public → Cone
- classify_drops_cgnat_probes — CGNAT + 2 public
same-IP-diff-port → SymmetricPort
- classify_two_lan_probes_is_unknown_not_cone — all LAN → Unknown
Existing multi_reflect integration test updated: two loopback
relays now correctly classify as Unknown (because loopback reflex
addrs are filtered) with the plumbing-works invariant preserved.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
8cdf8d486a |
feat(p2p): Phase 4 cross-relay direct calling over federation
Teaches the relay pair to route direct-call signaling across an
existing federation link. Alice on Relay A can now place a direct
call to Bob on Relay B if A and B are federation peers — the
wire protocol, call registry, and signal dispatch all learn to
track and route the cross-relay flow.
Phase 3.5's dual-path QUIC race then carries the media directly
peer-to-peer using the advertised reflex addrs, with zero
changes needed on the client side.
## Wire protocol (wzp-proto)
New `SignalMessage::FederatedSignalForward { inner, origin_relay_fp }`
envelope variant, appended at end of enum — JSON serde is
name-tagged so pre-Phase-4 relays just log "unknown variant" and
drop it. 2 new roundtrip tests (any-inner nesting + single
DirectCallOffer case).
## Call registry (wzp-relay)
`DirectCall.peer_relay_fp: Option<String>` — federation TLS fp
of the peer relay that forwarded the offer/answer for this call.
`None` on local calls, `Some` on cross-relay. Used by the answer
path to route the reply back through the same federation link
instead of trying (and failing) to deliver via local signal_hub.
New `set_peer_relay_fp` setter + 1 new unit test.
## FederationManager (wzp-relay)
Three new methods:
- `local_tls_fp()` — exposes the relay's own federation TLS fp
so main.rs can build `origin_relay_fp` fields.
- `broadcast_signal(msg) -> usize` — fan out any signal message
(in practice `FederatedSignalForward`) to every active peer
link, returning the reach count. Used when Relay A doesn't
know which peer has the target fingerprint.
- `send_signal_to_peer(fp, msg)` — targeted send for the reply
path where the registry already knows which peer relay to
hit.
Plus a new `cross_relay_signal_tx: Mutex<Option<Sender<...>>>`
field that `set_cross_relay_tx()` wires at startup so the
federation `handle_signal` can push unwrapped inner messages
into the main signal dispatcher.
## Federation handle_signal (wzp-relay)
New match arm for `FederatedSignalForward`:
- Loop prevention: drops forwards whose `origin_relay_fp` equals
this relay's own fp (prevents A→B→A echo loops without needing
TTL yet).
- Otherwise pulls the inner message out and pushes it through
`cross_relay_signal_tx` so the main loop's dispatcher task
handles it as if it had arrived locally.
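The loop-prevention rule above, as a minimal sketch. The enum is trimmed to two variants, `inner` is assumed to be boxed, and `unwrap_forward` is a hypothetical name; the real `handle_signal` pushes the pair into `cross_relay_signal_tx` rather than returning it.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum SignalMessage {
    /// Trimmed stand-in; the real variant carries more fields.
    DirectCallOffer { call_id: String },
    FederatedSignalForward {
        inner: Box<SignalMessage>,
        origin_relay_fp: String,
    },
}

/// Returns the (inner, origin_relay_fp) pair to hand to the dispatcher,
/// or None when the message is dropped.
pub fn unwrap_forward(msg: SignalMessage, own_fp: &str) -> Option<(SignalMessage, String)> {
    match msg {
        SignalMessage::FederatedSignalForward { inner, origin_relay_fp } => {
            if origin_relay_fp == own_fp {
                // Our own forward echoed back (A -> B -> A): drop it.
                return None;
            }
            Some((*inner, origin_relay_fp))
        }
        // Not a federation envelope; nothing to unwrap in this arm.
        _ => None,
    }
}
```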
## Main signal loop (wzp-relay)
### DirectCallOffer when target not local
Before falling through to Hangup, try the federation path:
- Wrap the offer in `FederatedSignalForward` with
`origin_relay_fp = this relay's tls_fp`
- `fm.broadcast_signal(forward)` — returns peer count
- If any peers reached, stash the call in local registry with
`caller_reflexive_addr` set, `peer_relay_fp` still None
(broadcast — the answer-side will identify itself when it
replies)
- Send `CallRinging` to caller immediately for UX feedback
- Only if no federation or no peers → legacy Hangup path
### DirectCallAnswer when peer is remote
- Registry lookup now reads both `peer_fingerprint` and
`peer_relay_fp` in one acquisition
- If `peer_relay_fp.is_some()`:
* Reject → forward a `Hangup` over federation via
`send_signal_to_peer` instead of local signal_hub
* Accept → wrap the raw answer in `FederatedSignalForward`,
route to the specific origin peer, then emit the LOCAL
CallSetup to our callee with `peer_direct_addr =
caller_reflexive_addr` (caller is remote; this side only
has the callee)
- If `peer_relay_fp.is_none()` → existing Phase 3 same-relay
path with both CallSetups (caller + callee)
### Cross-relay signal dispatcher task
New long-running task reading `(inner, origin_relay_fp)` from
`cross_relay_rx`. In Phase 4 MVP handles:
- `DirectCallOffer` — if target is local, create the call in
the registry with `peer_relay_fp = origin_relay_fp`, stash
caller addr, deliver offer to local callee. If target isn't
local, drop (no multi-hop in Phase 4 MVP).
- `DirectCallAnswer` — look up local caller by call_id, stash
callee addr, forward raw answer to local caller via
signal_hub, emit local CallSetup with `peer_direct_addr =
callee_reflexive_addr` (callee is remote; this side only
has the caller).
- `CallRinging` — best-effort forward to local caller for UX.
- `Hangup` — logged for now; Phase 4.1 will target by call_id.
## Integration tests
`crates/wzp-relay/tests/cross_relay_direct_call.rs` — 3 tests
that reproduce the main.rs cross-relay dispatcher logic inline
and assert the invariants without spinning up real binaries:
1. `cross_relay_offer_forwards_and_stashes_peer_relay_fp` —
Relay A gets Alice's offer, broadcasts. Relay B's dispatcher
creates the call with `peer_relay_fp = relay_a_tls_fp`.
2. `cross_relay_answer_crosswires_peer_direct_addrs` — full
round trip; both CallSetups (one on each relay) carry the
OTHER party's reflex addr.
3. `cross_relay_loop_prevention_drops_self_sourced_forward` —
explicit loop-prevention check.
Full workspace test goes from 413 → 419 passing. Clippy clean
on touched files.
## Non-goals (deferred to Phase 4.1+)
- Relay-mediated media fallback across federation — if P2P
direct fails (symmetric NAT on either side), the call errors
out with "no media path". Making the existing federation
media pipeline carry ephemeral call-<id> rooms is the Phase
4.1 lift.
- Multi-hop federation (A → B → C). Phase 4 MVP supports a
direct federation link between A and B only.
- Fingerprint → peer-relay routing gossip.
PRD: .taskmaster/docs/prd_phase4_cross_relay_p2p.txt
Tasks: 70-78 all completed
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
59ce52f8e8 |
feat(p2p): Phase 3.5 dual-path QUIC race + GUI call-flow debug logs
Two features in one commit because they ship and test together:
Phase 3.5 closes the hole-punching loop and the call-flow debug
logs give the user live visibility into every step of a call so
real-hardware testing of the new P2P path is debuggable.
## Phase 3.5 — dual-path QUIC connect race
Completes the hole-punching work Phase 3 scaffolded. On receiving
a CallSetup with peer_direct_addr, the client now actually races a
direct QUIC handshake against the relay dial and uses whichever
completes first. Symmetric role assignment avoids the two-conns-
per-call problem:
- Both peers compare `own_reflex_addr` vs `peer_reflex_addr`
lexicographically.
- Smaller addr → **Acceptor** (A-role): builds a server-capable
dual endpoint, awaits an incoming QUIC session. Does NOT dial.
- Larger addr → **Dialer** (D-role): builds a client-only
endpoint, dials the peer's addr with `call-<id>` SNI. Does NOT
listen.
- Both sides always dial the relay in parallel as fallback.
- `tokio::select!` with `biased` preference for direct, `tokio::pin!`
so each branch can await the losing opposite as fallback.
- Direct timeout 2s, relay fallback timeout 5s (so 7s worst case
from CallSetup to "no media path" error).
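The role assignment in the list above can be sketched as a pure function. `determine_role` and the Acceptor/Dialer names are from this commit; the `Option<&str>` signature and the plain string comparison are assumptions about how the lexicographic compare is done.

```rust
use std::cmp::Ordering;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    /// Smaller addr: listens for the incoming direct dial, does not dial.
    Acceptor,
    /// Larger addr: dials the peer, does not listen.
    Dialer,
}

/// Both peers run this on the same pair of addrs with the arguments
/// swapped, so they always land on opposite roles. Returns None when
/// either addr is missing or the addrs are equal; the caller then falls
/// back to the relay-only path.
pub fn determine_role(own: Option<&str>, peer: Option<&str>) -> Option<Role> {
    let (own, peer) = (own?, peer?);
    match own.cmp(peer) {
        Ordering::Less => Some(Role::Acceptor),
        Ordering::Greater => Some(Role::Dialer),
        Ordering::Equal => None,
    }
}
```

Because the comparison is antisymmetric, no negotiation message is needed: each side derives its role from data it already has after CallSetup.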
New crate module `wzp_client::dual_path::{race, WinningPath}`
(moved here from desktop/src-tauri so it's testable from a
workspace test). `determine_role` in `wzp_client::reflect` is
pure-function and unit-tested.
### CallEngine integration
- New `pre_connected_transport: Option<Arc<QuinnTransport>>` arg
on both android + desktop `CallEngine::start` branches. Skips
the internal wzp_transport::connect step when Some. Backward-
compat: None keeps Phase 0 relay-only behavior.
- `connect` Tauri command reads own_reflex_addr from SignalState,
computes role, runs the race, passes the winning transport
into CallEngine. If ANY input is missing (no peer addr, no own
addr, equal addrs), falls back to classic relay path —
identical to pre-Phase-3.5 behavior.
### Tests (9 new, all passing)
- 6 unit tests for `determine_role` truth table in
`wzp-client/src/reflect.rs` (smaller=Acceptor, larger=Dialer,
port-only diff, equal, missing-side, symmetry)
- 3 integration tests in `crates/wzp-client/tests/dual_path.rs`:
* `dual_path_direct_wins_on_loopback` — two-endpoint test
rig, Dialer wins direct path vs loopback mock relay
* `dual_path_relay_wins_when_direct_is_dead` — dead peer
port, 2s direct timeout, relay fallback wins
* `dual_path_errors_cleanly_when_both_paths_dead` — <10s
error, no hang
## GUI call-flow debug logs
Runtime-toggled structured events at every step of a call so the
user can see where a call progressed or stalled on real hardware.
Modeled on the existing DRED_VERBOSE_LOGS pattern.
### Rust side
- `static CALL_DEBUG_LOGS: AtomicBool` + `emit_call_debug(&app,
step, details)` helper. Always logs via `tracing::info!`
(logcat always has a copy); GUI Tauri `call-debug-log` event
only fires when the flag is on.
- Tauri commands `set_call_debug_logs` / `get_call_debug_logs`.
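A dependency-free sketch of the runtime toggle described above. `eprintln!` stands in for `tracing::info!` and the returned `Option<String>` stands in for firing the Tauri `call-debug-log` event; both substitutions, and the two-string signature, are assumptions made so the example runs without Tauri.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Process-wide flag, off by default.
static CALL_DEBUG_LOGS: AtomicBool = AtomicBool::new(false);

pub fn set_call_debug_logs(enabled: bool) {
    CALL_DEBUG_LOGS.store(enabled, Ordering::Relaxed);
}

pub fn get_call_debug_logs() -> bool {
    CALL_DEBUG_LOGS.load(Ordering::Relaxed)
}

/// Always writes the log line; only produces the GUI payload when the
/// flag is on.
pub fn emit_call_debug(step: &str, details: &str) -> Option<String> {
    let line = format!("{step} {details}");
    eprintln!("{line}"); // stand-in for tracing::info!
    if get_call_debug_logs() {
        Some(line) // stand-in for emitting the GUI event
    } else {
        None
    }
}
```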
### Instrumented steps (24 emit_call_debug sites)
- `register_signal`: start, identity loaded, endpoint created,
connect failed/ok, RegisterPresence sent, ack received/failed,
recv loop spawning
- Recv loop: CallRinging, DirectCallOffer (w/ caller_reflexive_addr),
DirectCallAnswer (w/ callee_reflexive_addr), CallSetup (w/
peer_direct_addr), Hangup
- `place_call`: start, reflect query start/ok/none, offer sent,
send failed
- `answer_call`: start, reflect query start/ok/none or privacy
skip, answer sent, send failed
- `connect`: start, dual_path_race_start (w/ role), won (w/
path), failed, skipped (w/ reasons), call_engine_starting/
started/failed
### JS side
- New `callDebugLogs: boolean` field on Settings type.
- Boot-time hydrate of the Rust flag from localStorage so the
choice survives restarts (like `dredDebugLogs`).
- Settings panel: new "Call flow debug logs" checkbox alongside
the DRED toggle.
- New "Call Debug Log" section that ONLY shows when the flag is
on. Rolling in-memory buffer of the last 200 events, rendered
as monospace `HH:MM:SS.mmm step {details}` lines with auto-
scroll and a Clear button.
- `listen("call-debug-log", ...)` subscribed at app startup,
appends to the buffer, re-renders on every event.
Full workspace test goes from 404 → 413 passing. Clippy clean
on touched crates.
PRD: .taskmaster/docs/prd_phase35_dual_path_race.txt
Tasks: 61-69 all completed
Next: APK + desktop build carrying everything — Phase 2 NAT
detect, Phase 3 advertising, Phase 3.5 dual-path + call debug
logs, plus the earlier Android first-join diagnostics — so the
user can validate the P2P path on real hardware with live
per-step visibility into where any failures happen.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
39277bf3a0 |
feat(hole-punching): advertise peer reflexive addrs in DirectCall flow — Phase 3
Completes the signal-plane plumbing for P2P direct calling: both
peers now learn their own server-reflexive address (Phase 1
Reflect), include it in DirectCallOffer / DirectCallAnswer, and
the relay cross-wires them into each side's CallSetup so the
client knows the OTHER party's direct addr. Dual-path QUIC race
is scaffolded but deferred to Phase 3.5 — this commit ships the
full advertising layer so real-hardware testing can confirm the
addrs flow end-to-end before adding the concurrent-connect logic.
Wire protocol (wzp-proto/src/packet.rs):
- DirectCallOffer gains optional `caller_reflexive_addr`
- DirectCallAnswer gains optional `callee_reflexive_addr`
- CallSetup gains optional `peer_direct_addr`
- All #[serde(default, skip_serializing_if = "Option::is_none")] so
pre-Phase-3 peers and relays stay backward compatible by
construction — the new fields are elided from the JSON on the
wire when None, and older clients parse the JSON ignoring any
fields they don't know.
- 2 new roundtrip tests (Some + None cases, old-JSON parse-back).
Call registry (wzp-relay/src/call_registry.rs):
- DirectCall gains caller_reflexive_addr + callee_reflexive_addr.
- set_caller_reflexive_addr / set_callee_reflexive_addr setters.
- 2 new unit tests: stores and returns addrs, clearing works.
Relay cross-wiring (wzp-relay/src/main.rs):
- On DirectCallOffer: stash the caller's addr in the registry.
- On DirectCallAnswer: stash the callee's addr (only set by
AcceptTrusted answers — privacy-mode leaves it None).
- Send two different CallSetup messages: one to the caller with
peer_direct_addr=callee_addr, and one to the callee with
peer_direct_addr=caller_addr. The cross-wiring means each side
gets the OTHER party's direct addr, not its own.
- Logs `p2p_viable=true` when both sides advertised.
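The cross-wiring step, sketched with a trimmed-down `CallSetup` (only `call_id` and `peer_direct_addr`; the real message has more fields) and a hypothetical `cross_wire` helper. Addresses are kept as strings here as an assumption.

```rust
/// Trimmed-down stand-in for the real CallSetup message.
#[derive(Debug, Clone, PartialEq)]
pub struct CallSetup {
    pub call_id: String,
    pub peer_direct_addr: Option<String>,
}

/// Builds (setup for caller, setup for callee, p2p_viable).
/// Each side receives the OTHER party's reflexive addr, never its own.
pub fn cross_wire(
    call_id: &str,
    caller_reflexive_addr: Option<&str>,
    callee_reflexive_addr: Option<&str>,
) -> (CallSetup, CallSetup, bool) {
    let setup = |peer: Option<&str>| CallSetup {
        call_id: call_id.to_string(),
        peer_direct_addr: peer.map(str::to_string),
    };
    let to_caller = setup(callee_reflexive_addr);
    let to_callee = setup(caller_reflexive_addr);
    // Direct P2P is only worth attempting when both sides advertised.
    let p2p_viable = caller_reflexive_addr.is_some() && callee_reflexive_addr.is_some();
    (to_caller, to_callee, p2p_viable)
}
```

When the callee addr is `None` (the privacy-mode case), the caller's setup carries no direct addr and `p2p_viable` is false.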
Client advertising (desktop/src-tauri/src/lib.rs):
- New `try_reflect_own_addr` helper that reuses the Phase 1
oneshot pattern WITHOUT holding state.signal.lock() across the
await (critical: the recv loop reacquires the same mutex to
fire the oneshot, so holding it would deadlock).
- `place_call` queries reflect first and includes the returned
addr in DirectCallOffer. Falls back to None on any failure —
call still proceeds via the relay path.
- `answer_call` queries reflect ONLY on AcceptTrusted so
AcceptGeneric keeps the callee's IP private by design. Reject
and AcceptGeneric both pass None.
- recv loop's CallSetup handler destructures and forwards
peer_direct_addr to the JS layer in the signal-event payload.
Client scaffolding for dual-path (desktop/src-tauri/src/lib.rs +
desktop/src/main.ts):
- `connect` Tauri command gets a new optional `peer_direct_addr`
argument. Currently LOGS the addr but still uses the relay
path for the media connection — Phase 3.5 will swap in a
tokio::select! race between direct dial + relay dial. Scaffolding
lands here so the JS wire is stable, real-hardware testing can
confirm advertising works end-to-end, and Phase 3.5 is a pure
Rust change with no JS touches.
- JS setup handler forwards `data.peer_direct_addr` to invoke.
Back-compat with the CLI client (crates/wzp-client/src/cli.rs):
- CLI test harness updated for the new fields — always passes
None for both reflex addrs (no hole-punching). Also destructures
peer_direct_addr: _ in its CallSetup handler.
Tests (8 new, all passing):
- wzp-proto: hole_punching_optional_fields_roundtrip,
hole_punching_backward_compat_old_json_parses
- wzp-relay call_registry: call_registry_stores_reflexive_addrs,
call_registry_clearing_reflex_addr_works
- wzp-relay integration: crates/wzp-relay/tests/hole_punching.rs
* both_peers_advertise_reflex_addrs_cross_wire_in_setup
* privacy_mode_answer_omits_callee_addr_from_setup
* pre_phase3_caller_leaves_both_setups_relay_only
* neither_peer_advertises_both_setups_are_relay_only
Full workspace test goes from 396 → 404 passing.
PRD: .taskmaster/docs/prd_hole_punching.txt
Tasks: 53-60 all completed (58 = scaffolding-only; 3.5 follow-up)
Next up: **Phase 3.5 — dual-path QUIC connect race**. With the
advertising layer live, this becomes a focused change: on
CallSetup-with-peer_direct_addr, start a server-capable dual
endpoint, and tokio::select! across (direct dial, relay dial,
inbound accept). Whichever QUIC handshake completes first wins,
the losers drop, 2s direct timeout falls back to relay.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
8d903f16c6 |
feat(reflect): multi-relay NAT type detection — Phase 2
Builds on Phase 1's SignalMessage::Reflect to probe N relays in
parallel through transient QUIC connections and classify the
client's NAT type for the future P2P hole-punching path. No wire
protocol changes — Phase 1's Reflect/ReflectResponse pair is
reused unchanged.
New client-side module (crates/wzp-client/src/reflect.rs):
- probe_reflect_addr(relay, timeout_ms): opens a throwaway
quinn::Endpoint (fresh ephemeral source port per probe,
essential for NAT-type detection — sharing one endpoint would
make a symmetric NAT look like a cone NAT), connects to _signal,
sends RegisterPresence with zero identity, consumes the Ack,
sends Reflect, awaits ReflectResponse, cleanly closes.
- detect_nat_type(relays, timeout_ms): parallel probes via
tokio::task::JoinSet (bounded by slowest probe not sum) and
returns a NatDetection with per-probe results + aggregate
classification.
- classify_nat(probes): pure-function classifier split out for
network-free unit tests. Rules:
* 0-1 successful probes → Unknown
* 2+ successes, same ip same port → Cone (P2P viable)
* 2+ successes, same ip diff ports → SymmetricPort (relay)
* 2+ successes, different ips → Multiple (treat as
symmetric)
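The four classifier rules above, as a network-free sketch. `classify_nat` and the variant names are from this commit; the `NatType` type name and the `&[Option<SocketAddr>]` input (one entry per probe, `None` for a failed probe) are assumptions.

```rust
use std::collections::HashSet;
use std::net::{IpAddr, SocketAddr};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NatType {
    Unknown,
    Cone,
    SymmetricPort,
    Multiple,
}

/// Each entry is one probe: Some(observed addr) on success, None on failure.
pub fn classify_nat(probes: &[Option<SocketAddr>]) -> NatType {
    // Failed probes carry no information and are ignored.
    let ok: Vec<SocketAddr> = probes.iter().flatten().copied().collect();
    if ok.len() < 2 {
        return NatType::Unknown; // 0-1 successes: nothing to compare
    }
    let ips: HashSet<IpAddr> = ok.iter().map(|a| a.ip()).collect();
    if ips.len() > 1 {
        return NatType::Multiple; // different IPs: treat as symmetric
    }
    let ports: HashSet<u16> = ok.iter().map(|a| a.port()).collect();
    if ports.len() == 1 {
        NatType::Cone // same IP, same port everywhere
    } else {
        NatType::SymmetricPort // same IP, port varies per relay
    }
}
```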
Tauri command (desktop/src-tauri/src/lib.rs):
- detect_nat_type({ relays: [{ name, address }] }) -> NatDetection
as JSON. Takes the relay list from JS because localStorage
owns the config. Parse-up-front so a malformed entry fails
clean instead of as a probe error. 1500ms per-probe timeout.
UI (desktop/index.html + src/main.ts):
- New "NAT type" row + "Detect NAT" button in the Network
settings section. Renders per-probe status (name, address,
observed addr, latency, or error) plus the colored verdict:
* green Cone — shows consensus addr
* amber SymmetricPort / Multiple — must relay
* gray Unknown — not enough data
Tests:
- 7 unit tests in wzp-client/src/reflect.rs covering every
classifier branch (empty, 1 success, 2 identical, 2 diff ports,
2 diff ips, success+failure mix, pure-failure).
- 3 integration tests in crates/wzp-relay/tests/multi_reflect.rs:
* probe_reflect_addr_happy_path — single mock relay end-to-end
* detect_nat_type_two_loopback_relays_is_cone — two concurrent
relays, asserts both see 127.0.0.1 and classifier returns
Cone or SymmetricPort (accepted because the test harness
uses fresh ephemeral ports per probe which look like
SymmetricPort on single-host loopback)
* detect_nat_type_dead_relay_is_unknown — alive + dead port
mix, asserts the dead probe surfaces an error string and
the aggregator returns Unknown (only 1 success)
Full workspace test goes from 386 → 396 passing.
PRD: .taskmaster/docs/prd_multi_relay_reflect.txt
Tasks: 47-52 all completed
Next up: hole-punching (Phase 3) — use the reflected address in
DirectCallOffer/Answer and CallSetup so peers attempt a direct
QUIC handshake to each other, with relay fallback on timeout.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
921856eba9 |
feat(reflect): QUIC-native NAT reflection ("STUN for QUIC") — Phase 1
Lets a client ask its registered relay "what IP:port do you see for
me?" over the existing TLS-authenticated signal channel, returning
the client's server-reflexive address as a SocketAddr. Replaces the
need for a classic STUN deployment and becomes the bootstrap step
for future P2P hole-punching: once both peers know their own reflex
addrs, they can advertise them in DirectCallOffer and attempt a
direct QUIC handshake to each other.
Wire protocol (wzp-proto):
- SignalMessage::Reflect — unit variant, client -> relay
- SignalMessage::ReflectResponse { observed_addr: String } — relay -> client
- JSON-serde, appended at end of enum: zero ordinal concerns,
backward compat with pre-Phase-1 relays by construction (older
relays log "unexpected message" and drop; newer clients time out
cleanly within 1s).
Relay handler (wzp-relay/src/main.rs, signal loop):
- New match arm next to Ping reuses the already-bound `addr` from
connection.remote_address() and replies with observed_addr as a
string. debug!-level log on success, warn!-level on send failure.
Client side (desktop/src-tauri/src/lib.rs):
- SignalState gains pending_reflect: Option<oneshot::Sender<SocketAddr>>.
- get_reflected_address Tauri command installs the oneshot before
sending Reflect and awaits it with a 1s timeout; cleans up on
every exit path (send failure, timeout, parse error).
- recv loop's new ReflectResponse arm fires the pending sender or
emits a debug log for unsolicited responses — never crashes the
loop on malformed input.
- Integrated into invoke_handler! alongside the other signal
commands.
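A sketch of the install-then-send-then-await pattern described above, with `std::sync::mpsc` and `recv_timeout` swapped in for the tokio oneshot and timeout so it runs without an async runtime. The `send_reflect` closure parameter and both method signatures are hypothetical; only `pending_reflect` is named in this commit.

```rust
use std::net::SocketAddr;
use std::sync::mpsc::{channel, Sender};
use std::sync::Mutex;
use std::time::Duration;

#[derive(Default)]
pub struct SignalState {
    pub pending_reflect: Mutex<Option<Sender<SocketAddr>>>,
}

impl SignalState {
    /// Installs the waiter BEFORE sending Reflect, then waits up to `timeout`.
    /// The mutex is never held while waiting, so the recv loop can take it.
    pub fn get_reflected_address<F>(
        &self,
        send_reflect: F,
        timeout: Duration,
    ) -> Result<SocketAddr, String>
    where
        F: FnOnce() -> Result<(), String>,
    {
        let (tx, rx) = channel();
        *self.pending_reflect.lock().unwrap() = Some(tx);
        let outcome = send_reflect()
            .and_then(|()| rx.recv_timeout(timeout).map_err(|e| format!("reflect: {e}")));
        // Clean up on every exit path (send failure, timeout, success).
        self.pending_reflect.lock().unwrap().take();
        outcome
    }

    /// Recv-loop arm: fires the pending waiter. Returns false for malformed
    /// or unsolicited responses instead of panicking.
    pub fn on_reflect_response(&self, observed_addr: &str) -> bool {
        let addr: SocketAddr = match observed_addr.parse() {
            Ok(addr) => addr,
            Err(_) => return false,
        };
        match self.pending_reflect.lock().unwrap().take() {
            Some(tx) => tx.send(addr).is_ok(),
            None => false,
        }
    }
}
```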
UI (desktop/index.html + src/main.ts):
- New "Network" section in settings panel with a "Detect" button
that displays the reflected address or a categorized warning
("register first" / "relay does not support reflection" / error).
Tests (crates/wzp-relay/tests/reflect.rs — 3 new, all passing):
- reflect_happy_path: client on loopback gets back 127.0.0.1:<its own port>
- reflect_two_clients_distinct_ports: two concurrent clients see
their own distinct ports, proving per-connection remote_address
- reflect_old_relay_times_out: mock relay that ignores Reflect —
client times out between 1000-1200ms and does not hang
Also fixed pre-existing test bit-rot unrelated to this PR, so the
full workspace `cargo test` goes green:
- handshake_integration tests in wzp-client, wzp-relay and
featherchat_compat in wzp-crypto all missed the `alias` field
addition to CallOffer and the 3-arg form of perform_handshake
plus 4-tuple return of accept_handshake. Updated to the current
API surface.
Results:
cargo test --workspace --exclude wzp-android: 386 passed
cargo check --workspace: clean
cargo clippy: no new warnings in touched files
Verification excludes wzp-android because it's dead code on this
branch (Tauri mobile uses wzp-native instead) and can't link -llog
on macOS host — unchanged status quo.
PRD: .taskmaster/docs/prd_reflect_over_quic.txt
Tasks: 39-46 all completed
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
daf7bcd9ba |
chore(warnings): sweep the workspace — zero warnings on lib + bin targets
Addressed every rustc warning surfaced by `cargo check --workspace
--release --lib --bins` on opus-DRED-v2. Split across three
categories:
## Real bugs surfaced by the audit (fix, don't silence)
- **crates/wzp-relay/src/federation.rs** — the per-peer RTT monitor
task computed `rtt_ms` every 5 s and threw it on the floor. The
`wzp_federation_peer_rtt_ms` gauge has been registered in
metrics.rs the whole time but was never receiving samples, leaving
the Grafana panel blank. Wired it up: the task now calls
`fm_rtt.metrics.federation_peer_rtt_ms.with_label_values(&[&label_rtt]).set(rtt_ms)`
on every sample. Fixes three warnings (`rtt_ms`, `fm_rtt`,
`label_rtt` were all captured for this task and all dead).
## Dead code removal
- **crates/wzp-relay/src/federation.rs** — removed `local_delivery_seq:
AtomicU16` field and its initializer. It was described in comments
as "per-room seq counter for federation media delivered to local
clients" but was declared, initialized to 0, and never read or
written anywhere else. Genuine half-wired feature; deletable with
zero behavior change.
- **crates/wzp-relay/src/room.rs** — removed `let recv_start =
Instant::now()` at the top of a recv loop that was never read.
Separate variable `last_recv_instant` already measures the actual
gap that's used for the `max_recv_gap_ms` stat.
- **crates/wzp-client/src/cli.rs** — removed `let my_fp = fp.clone()`
from the signal loop setup. Cloned but never used in any match arm.
## Stub-intent warnings (underscore + explanatory comment)
- **crates/wzp-relay/src/handshake.rs** — `choose_profile` hardcodes
`QualityProfile::GOOD` and ignores its `supported` parameter.
Comment already documented "Cap at GOOD (24k) for now — studio
tiers not yet tested for federation reliability". Renamed to
`_supported`, expanded the comment to explicitly note the future
plan (pick highest supported ≤ relay ceiling).
- **crates/wzp-relay/src/federation.rs** — `forward_to_peers` takes
`room_name: &str` but only uses `room_hash`. The caller
(handle_datagram) passes the name for caller-site symmetry with
other helpers; kept the param shape and underscored the binding
with a comment noting it's reserved for future per-name logging.
## Cosmetic fixes
- **crates/wzp-relay/src/event_log.rs** — dropped `use std::sync::Arc`
(unused).
- **crates/wzp-relay/src/signal_hub.rs** — trimmed `use tracing::{info,
warn}` to `use tracing::info`. Also removed unnecessary `mut` on
`hub` binding in the `register_unregister` test.
- **crates/wzp-relay/src/room.rs** — trimmed `use tracing::{debug,
error, info, trace, warn}` to `{error, info, warn}`. Also removed
unnecessary `mut` on `mgr` binding in the `room_join_leave` test.
- **crates/wzp-relay/src/main.rs** — removed unnecessary `mut` on the
`config` destructured binding from `parse_args()`; and dropped
`ref caller_alias` from the `DirectCallOffer` match pattern since
the relay just forwards the full `msg` (caller_alias is preserved
end-to-end, we don't need to read it on the relay).
- **crates/wzp-crypto/tests/featherchat_compat.rs** — dropped
`CallSignalType` from a `use wzp_client::featherchat::{...}`
(unused in the test body). Note: this test file has pre-existing
compile errors from SignalMessage schema drift unrelated to this
sweep; that's tracked separately.
## Crate-level annotation
- **crates/wzp-android/src/lib.rs** — added
`#![allow(dead_code, unused_imports, unused_variables, unused_mut)]`
with a doc block explaining the crate is dead code since the Tauri
mobile rewrite. The legacy Kotlin+JNI Android app that consumed
this crate was replaced by desktop/src-tauri (live Android recv
path) + crates/wzp-native (Oboe bridge). Rather than piecemeal
cleanup of a crate that shouldn't be maintained, the whole-crate
allow keeps CI clean until someone removes the crate entirely. Kills
all 6 wzp-android warnings (4 unused imports/vars, 1 unused `mut`
on a JNI env param, 1 dead `command_rx` field) in one line.
## Not touched
- **deps/featherchat/warzone/crates/warzone-protocol/src/x3dh.rs** —
3 unused-variable warnings in `alice_spk_secret`, `alice_bundle`,
`bob_bundle_bytes`. This is a vendored third-party submodule;
upstream's problem, not ours. Would need to be reported to
featherchat upstream if we care.
## Verification
- `cargo check --workspace --release --lib --bins` → 0 warnings, 0 errors
- `cargo check --workspace --release --all-targets` → only the 3
featherchat submodule warnings remain, plus the pre-existing 3
broken integration tests (SignalMessage schema drift from Phase 2,
tracked separately and explicitly out of scope).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
df1a45a5f5 |
fix(cli): port live mode to ring API (read_frame/write_frame removed)
AudioCapture and AudioPlayback no longer expose the old read_frame() and
write_frame() methods — they were replaced with ring() returning
&Arc<AudioRing> when the lock-free SPSC ring was introduced. The CLI
live-mode loop still referenced the removed methods, which broke every
workspace build that touched wzp-client bin (including the remote Linux
x86_64 docker build).
- Send loop: allocate a 960-sample scratch buffer, fill it in a loop via
capture.ring().read() until a full 20 ms frame is available, sleep 2 ms
between empty reads to avoid hot-spinning.
- Recv loop: write decoded PCM into playback.ring() instead of calling
write_frame(). Short writes on full ring drop the tail, which is the
correct real-time behavior for CLI live mode.
No behavioral change on the wire or in the call pipeline — this is
purely a compile fix for cli.rs bitrot that accumulated since the ring
API landed.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
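The send-loop and recv-loop behavior described in the ring API port above can be sketched roughly as follows. This is an illustration only: `Ring` is a hypothetical mutex-based stand-in for the real AudioRing (it is not lock-free), and `read_full_frame` is an invented helper name. Only the 960-sample frame size, the 2 ms sleep, and the "short write drops the tail" rule come from the commit message.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;
use std::thread;
use std::time::Duration;

/// 20 ms of 48 kHz mono audio.
pub const FRAME_SAMPLES: usize = 960;

/// Hypothetical stand-in for the real ring buffer. NOT lock-free.
pub struct Ring {
    buf: Mutex<VecDeque<i16>>,
    capacity: usize,
}

impl Ring {
    pub fn new(capacity: usize) -> Self {
        Ring { buf: Mutex::new(VecDeque::with_capacity(capacity)), capacity }
    }

    /// Copies up to `out.len()` samples into `out` and returns the count read.
    pub fn read(&self, out: &mut [i16]) -> usize {
        let mut buf = self.buf.lock().unwrap();
        let n = out.len().min(buf.len());
        for slot in out[..n].iter_mut() {
            *slot = buf.pop_front().unwrap();
        }
        n
    }

    /// Writes as many samples as fit. A short write drops the tail.
    pub fn write(&self, samples: &[i16]) -> usize {
        let mut buf = self.buf.lock().unwrap();
        let n = samples.len().min(self.capacity - buf.len());
        buf.extend(samples[..n].iter().copied());
        n
    }
}

/// Blocks until one full 20 ms frame has been read from the ring,
/// sleeping 2 ms after each empty read to avoid hot-spinning.
pub fn read_full_frame(ring: &Ring) -> [i16; FRAME_SAMPLES] {
    let mut frame = [0i16; FRAME_SAMPLES];
    let mut filled = 0;
    while filled < FRAME_SAMPLES {
        let n = ring.read(&mut frame[filled..]);
        if n == 0 {
            thread::sleep(Duration::from_millis(2));
        }
        filled += n;
    }
    frame
}
```

In a live loop the capture callback would call `write` from another thread while the send loop calls `read_full_frame` before each encode.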
||
|
|
7515417202 |
feat(telemetry): Phase 4 — LossRecoveryUpdate protocol + relay metrics + DebugReporter
Phase 4 lays the telemetry foundation for distinguishing DRED recoveries
from classical PLC in production: a new SignalMessage variant, two new
per-session Prometheus counters on the relay side, and a highlighted
loss-recovery section in the Android DebugReporter.
The periodic emitter (client → relay) and Grafana panel are deferred to
Phase 4b — this commit ships the protocol surface, the relay sink, and
the immediate user-visible debug output. Once 4b lands the full path
(emitter → relay → Prometheus → Grafana), the metrics here will
automatically start receiving data.
Scope decision — why not extend QualityReport instead:
The existing wire-format QualityReport is a fixed 4-byte media packet
trailer. Adding counter fields to it would shift the binary layout and
break backward compatibility (old receivers would parse the last 4
bytes of the extended trailer as QR, corrupting audio). Using a
new SignalMessage variant on the reliable QUIC signal stream sidesteps
the wire-format problem entirely — serde JSON enums tolerate unknown
variants gracefully on old receivers, and the signal channel is the
right layer for periodic telemetry aggregates.
Changes:
wzp-proto/src/packet.rs:
- New SignalMessage::LossRecoveryUpdate variant carrying:
* dred_reconstructions: u64 (monotonic since call start)
* classical_plc_invocations: u64 (monotonic)
* frames_decoded: u64 (for rate calculation)
- All three fields tagged #[serde(default)] for forward compat.
wzp-client/src/featherchat.rs:
- Added a match arm so signal_to_call_type() handles the new
variant (treat as Offer for featherChat bridging purposes).
wzp-relay/src/metrics.rs:
- Two new IntCounterVec metrics on the relay, labeled by session_id:
* wzp_relay_session_dred_reconstructions_total
* wzp_relay_session_classical_plc_total
- New method update_session_loss_recovery(session_id, dred, plc)
applies monotonic deltas: if the incoming totals exceed the
current counter, the difference is inc_by'd. If the incoming
totals are LOWER (client restart or counter reset), the
Prometheus counter holds steady until the client catches up.
This matches the existing update_session_buffer delta pattern.
- remove_session_metrics() now cleans up the two new labels.
- New test session_loss_recovery_monotonic_delta exercises:
* initial population (10 DRED, 2 PLC)
* forward advance (25, 5 → delta +15, +3)
* lower values ignored (client reset → counters unchanged)
* client catches up (30, 8 → advances to new max)
- Existing session_metrics_cleanup test extended to cover the
new counters.
android/app/src/main/java/com/wzp/debug/DebugReporter.kt:
- Phase 4 users — and incident responders — need to quickly see
whether DRED is actually firing during a call. The stats JSON
already carries the counters (after Phase 3c), but they were
buried in the trailing JSON dump. Added a dedicated
"=== Loss Recovery ===" section to the meta preamble that
extracts dred_reconstructions, classical_plc_invocations,
frames_decoded, and fec_recovered from the JSON and displays
them plainly, plus computed percentages when frames_decoded > 0.
- New extractLongField helper: tiny hand-rolled JSON integer
extractor. We don't want to pull in a full JSON parser for this
single use case and CallStats has a flat, well-known schema.
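The monotonic-delta rule described for update_session_loss_recovery in the metrics.rs changes can be sketched as below. This is a minimal model, not the relay's code: `LossRecoveryCounters` and `advance` are hypothetical names, and plain `u64` values in a `HashMap` stand in for the Prometheus IntCounterVec. The sequence of values in the usage note mirrors the one listed for the session_loss_recovery_monotonic_delta test.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for two per-session counter families.
#[derive(Default)]
pub struct LossRecoveryCounters {
    dred: HashMap<String, u64>,
    plc: HashMap<String, u64>,
}

impl LossRecoveryCounters {
    /// Applies client-reported running totals as monotonic deltas.
    pub fn update_session_loss_recovery(&mut self, session_id: &str, dred: u64, plc: u64) {
        advance(self.dred.entry(session_id.to_string()).or_insert(0), dred);
        advance(self.plc.entry(session_id.to_string()).or_insert(0), plc);
    }

    /// Returns the current (dred, plc) counter values for a session.
    pub fn get(&self, session_id: &str) -> (u64, u64) {
        (
            self.dred.get(session_id).copied().unwrap_or(0),
            self.plc.get(session_id).copied().unwrap_or(0),
        )
    }
}

/// Counters may only grow: a lower reported total (client restart or
/// counter reset) leaves the counter untouched until the client catches up.
fn advance(counter: &mut u64, reported_total: u64) {
    if reported_total > *counter {
        let delta = reported_total - *counter;
        *counter += delta; // the real counter would call inc_by(delta)
    }
}
```

Feeding the totals (10, 2), (25, 5), (3, 1), (30, 8) for one session leaves the counters at (10, 2), (25, 5), (25, 5), (30, 8) respectively.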
Verification:
- cargo check --workspace: zero errors
- cargo test -p wzp-proto --lib: 63 passing
- cargo test -p wzp-codec --lib: 68 passing
- cargo test -p wzp-client --lib: 35 passing (+1 ignored probe)
- cargo test -p wzp-relay --lib: 68 passing (+1 new Phase 4 test)
- cargo check -p wzp-android --lib: zero errors
- Android APK build verified earlier today (unridden-alfonso.apk
via the remote Docker builder) — Phase 0–3c confirmed to compile
end-to-end on the NDK target.
Phase 4b remaining (not blocking this commit):
- Periodic LossRecoveryUpdate emitter in wzp-client/src/call.rs and
wzp-android/src/engine.rs (every ~5 s)
- Relay-side handler in main.rs that matches the new variant and
calls metrics.update_session_loss_recovery
- Grafana "Loss recovery breakdown" panel in docs/grafana-dashboard.json
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
27bc264738 |
feat(codec): Phase 3b — CallDecoder DRED reconstruction on packet loss
Phase 3b of the DRED integration — wires the Phase 3a FFI primitives
into the desktop receive path. When the jitter buffer reports a missing
Opus frame, CallDecoder now attempts to reconstruct the audio from the
most recently parsed DRED side-channel state before falling through to
classical PLC.
Architectural refinement vs the PRD's literal wording: the PRD said
"jitter buffer takes a Box<dyn DredReconstructor>". After checking deps,
wzp-transport depends only on wzp-proto (not wzp-codec). Putting DRED
state in the jitter buffer would require a new cross-crate dep and
couple the codec-agnostic buffer to libopus. Instead, this commit keeps
the DRED state ring and reconstruction dispatch inside CallDecoder (one
layer up from the jitter buffer), intercepting the existing
PlayoutResult::Missing signal. Same lookahead/backfill semantics,
cleaner layering, zero change to wzp-transport.
Changes:
CallDecoder field type: Box<dyn AudioDecoder> → AdaptiveDecoder.
Required because Phase 3b calls the inherent reconstruct_from_dred
method, which cannot live on the AudioDecoder trait without dragging
libopus DredState through wzp-proto. In practice AdaptiveDecoder was
the only AudioDecoder implementor anyway — the trait abstraction was
buying nothing. Method call sites unchanged because AdaptiveDecoder
also implements AudioDecoder.
New CallDecoder fields:
- dred_decoder: DredDecoderHandle
- dred_parse_scratch: DredState (scratch for parse_into)
- last_good_dred: DredState (cached most-recent valid state)
- last_good_dred_seq: Option<u16>
- dred_reconstructions: u64 (Phase 4 telemetry)
- classical_plc_invocations: u64 (Phase 4 telemetry)
CallDecoder::ingest — on Opus non-repair packets, parse DRED into the
scratch state. On success (samples_available > 0), std::mem::swap the
scratch into last_good_dred and record the seq. This is O(1) per
packet, zero allocation after construction (the two DredState buffers
are allocated once in new() and reused forever).
CallDecoder::decode_next — on PlayoutResult::Missing(seq) for Opus
profiles: if last_good_dred_seq > seq and the seq delta × frame_samples
fits within samples_available, call audio_dec.reconstruct_from_dred
and bump dred_reconstructions. Otherwise fall through to classical
PLC and bump classical_plc_invocations. The Codec2 path always falls
through to classical PLC since DRED is libopus-only and
AdaptiveDecoder::reconstruct_from_dred rejects Codec2 tiers
explicitly.
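The ingest and decode_next behavior described above can be modeled with a small sketch. Everything here is a simplified assumption: `DredState` holds only a sample count instead of real libopus state, `DredCache`, `on_missing`, and `Recovery` are hypothetical names, and the wrap-aware comparison on `u16` sequence numbers is one plausible reading of "last_good_dred_seq > seq", not something the commit message specifies.

```rust
/// Hypothetical stand-in for parsed DRED state (the real type wraps libopus).
#[derive(Debug, Default)]
pub struct DredState {
    pub samples_available: u32,
}

#[derive(Debug, PartialEq)]
pub enum Recovery {
    /// Reconstruct from DRED, looking back `offset_samples` from the cached packet.
    Dred { offset_samples: u32 },
    ClassicalPlc,
}

#[derive(Default)]
pub struct DredCache {
    scratch: DredState,
    last_good: DredState,
    last_good_seq: Option<u16>,
    pub dred_reconstructions: u64,
    pub classical_plc_invocations: u64,
}

impl DredCache {
    /// `parse_into` stands in for the DRED parser and overwrites the scratch state.
    pub fn ingest(&mut self, seq: u16, parse_into: impl FnOnce(&mut DredState)) {
        parse_into(&mut self.scratch);
        if self.scratch.samples_available > 0 {
            // O(1) swap with no allocation: the old state becomes the next scratch.
            std::mem::swap(&mut self.scratch, &mut self.last_good);
            self.last_good_seq = Some(seq);
        }
    }

    /// Decides how to fill a missing frame and bumps the matching counter.
    pub fn on_missing(&mut self, missing_seq: u16, frame_samples: u32, is_opus: bool) -> Recovery {
        let offset = match self.last_good_seq {
            Some(dred_seq) if is_opus => {
                // Assumption: wrap-aware distance, cached packet must be newer.
                let delta = dred_seq.wrapping_sub(missing_seq);
                if delta != 0 && delta < 0x8000 {
                    Some(u32::from(delta) * frame_samples)
                } else {
                    None
                }
            }
            _ => None,
        };
        match offset {
            Some(offset_samples) if offset_samples <= self.last_good.samples_available => {
                self.dred_reconstructions += 1;
                Recovery::Dred { offset_samples }
            }
            _ => {
                self.classical_plc_invocations += 1;
                Recovery::ClassicalPlc
            }
        }
    }
}
```

With a cached state of 4560 samples from packet 101 and 960-sample frames, a missing packet 100 resolves to DRED at an offset of 960 samples, while a missing packet 90 (11 frames back, 10560 samples) falls through to classical PLC.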
OpusDecoder and AdaptiveDecoder: new inherent reconstruct_from_dred
method that delegates to the underlying DecoderHandle. Needed to
bridge CallDecoder's wzp-client code to the Phase 3a FFI wrappers
without touching the AudioDecoder trait.
CRITICAL FINDING — raised DRED loss floor from 5% to 15%:
Phase 3b testing discovered that libopus 1.5's DRED emission window
scales aggressively with OPUS_SET_PACKET_LOSS_PERC. Empirical data
(see probe_dred_samples_available_by_loss_floor, an #[ignore]'d
diagnostic test in call.rs):
    loss_pct   samples_available   effective_ms
    5%         720                 15 ms (useless!)
    10%        2640                55 ms
    15%        4560                95 ms
    20%        6480                135 ms
    25%+       8400 (capped)       175 ms (~87% of 200 ms configured)
The Phase 1 default of 5% produced only a 15 ms reconstruction window
— too small to even cover a single 20 ms Opus frame. DRED was
effectively disabled even though it was emitting bytes. Raised the
floor to 15% (95 ms window) as the minimum that actually provides
single-frame loss recovery. This updates Phase 1's DRED_LOSS_FLOOR_PCT
constant in opus_enc.rs and the accompanying module docstring.
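The effective_ms column in the table above follows from the sample counts if the decoder runs at 48 kHz, which is an assumption here (it matches every row of the table and the 48 kHz figures quoted elsewhere in this log). A short sketch of the arithmetic, with hypothetical helper names:

```rust
/// Assumed decoder sample rate; consistent with the table rows.
const SAMPLE_RATE_HZ: u32 = 48_000;
/// DRED duration the encoder was configured for, per the table's last row.
const CONFIGURED_DRED_MS: u32 = 200;

/// Converts a DRED `samples_available` value to milliseconds of audio.
pub fn window_ms(samples_available: u32) -> u32 {
    samples_available * 1000 / SAMPLE_RATE_HZ
}

/// Window size as a whole-number percentage of the configured duration.
pub fn percent_of_configured(samples_available: u32) -> u32 {
    window_ms(samples_available) * 100 / CONFIGURED_DRED_MS
}
```

For example, `window_ms(720)` is 15 and `window_ms(4560)` is 95, so the 15% floor yields a window 95 / 15 ≈ 6.3 times larger than the 5% floor.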
Trade-off: 15% assumed loss slightly increases encoder bitrate overhead
on clean networks. Measured via the existing phase1 bitrate probe:
Before (5% floor): 3649 bytes/sec at Opus 24k + 300 Hz sine
After (15% floor): 3568 bytes/sec at Opus 24k + 300 Hz sine
The delta is within noise — 15% isn't meaningfully more expensive than
5% on this signal, which suggests the DRED emission size is signal-
dependent rather than loss-dependent for small values. Net result: we
get a 6x larger reconstruction window for essentially free.
Tests (+3 DRED recovery, +1 #[ignore]'d probe):
- opus_single_packet_loss_is_recovered_via_dred — full encode → ingest
→ decode_next loop with one packet dropped mid-stream. Asserts
dred_reconstructions ≥ 1 and observes the exact counter deltas.
- opus_lossless_ingest_never_triggers_dred_or_plc — baseline behavior,
lossless stream never takes the Missing branch.
- codec2_loss_falls_through_to_classical_plc — Codec2 never
reconstructs via DRED even if state were populated (which it won't
be — Codec2 packets don't carry DRED bytes).
- probe_dred_samples_available_by_loss_floor — #[ignore]'d diagnostic
that sweeps loss_pct values and prints the resulting DRED window
sizes. Kept for future tuning work.
New CallDecoder introspection accessors (public but undocumented in
the PRD): last_good_dred_seq() and last_good_dred_samples_available()
for test diagnostics and future telemetry surfaces in Phase 4.
Verification:
- cargo check --workspace: zero errors
- cargo test -p wzp-codec --lib: 68 passing (Phase 3a baseline held)
- cargo test -p wzp-client --lib: 35 passing (+3 Phase 3b tests,
+1 ignored diagnostic, no regressions)
Next up: Phase 3c mirrors this on the Android engine.rs receive path.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
6db5c25b54 |
feat(codec): Phase 2 — remove RaptorQ from Opus tiers, Codec2 unchanged
Phase 2 of the DRED integration (docs/PRD-dred-integration.md). With
Phase 1 having enabled DRED on every Opus profile, the app-level RaptorQ
layer is now redundant overhead on those tiers: +20% bitrate, +40–100 ms
receive-side latency (block wait), +CPU for stats we never used. This
phase removes RaptorQ from the Opus encode and decode paths on both the
desktop (wzp-client/call.rs) and Android (wzp-android/engine.rs) sides.
Codec2 tiers keep RaptorQ with their current ratios unchanged — DRED is
libopus-only and Codec2 has no neural equivalent.
Encoder changes (the real bandwidth / CPU win):
- CallEncoder::encode_frame and engine.rs encode loop now gate the
RaptorQ path on !codec.is_opus():
- Opus source packets emit fec_block=0, fec_symbol=0,
fec_ratio_encoded=0 in the MediaHeader
- fec_enc.add_source_symbol is skipped on Opus
- generate_repair + repair packet emission is skipped on Opus
- block_id and frame_in_block counters stay frozen at 0 for Opus
- Codec2 path is byte-for-byte identical to pre-Phase-2 behavior.
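The encoder-side gate listed above can be illustrated with a small sketch. The types are hypothetical: `Codec`, `FecHeaderFields`, and `FecCounters` are invented for this example, and the `u8` field widths are assumptions rather than the real MediaHeader layout. What the sketch takes from the commit message is the rule itself: Opus packets get zeroed FEC fields and never advance the block counters, while Codec2 packets keep counting.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Codec {
    Opus24k,
    Codec2,
}

impl Codec {
    pub fn is_opus(self) -> bool {
        matches!(self, Codec::Opus24k)
    }
}

/// Hypothetical subset of the media header (field widths are assumptions).
#[derive(Debug, Default, PartialEq)]
pub struct FecHeaderFields {
    pub fec_block: u8,
    pub fec_symbol: u8,
    pub fec_ratio_encoded: u8,
}

pub struct FecCounters {
    pub block_id: u8,
    pub frame_in_block: u8,
    frames_per_block: u8,
    ratio_encoded: u8,
}

impl FecCounters {
    pub fn new(frames_per_block: u8, ratio_encoded: u8) -> Self {
        FecCounters { block_id: 0, frame_in_block: 0, frames_per_block, ratio_encoded }
    }

    /// Header fields for the next source packet of the given codec.
    pub fn next_source_fields(&mut self, codec: Codec) -> FecHeaderFields {
        if codec.is_opus() {
            // Opus relies on DRED: zeroed fields, counters stay frozen.
            return FecHeaderFields::default();
        }
        let fields = FecHeaderFields {
            fec_block: self.block_id,
            fec_symbol: self.frame_in_block,
            fec_ratio_encoded: self.ratio_encoded,
        };
        self.frame_in_block += 1;
        if self.frame_in_block == self.frames_per_block {
            // Block complete: this is where repair packets would be generated.
            self.frame_in_block = 0;
            self.block_id = self.block_id.wrapping_add(1);
        }
        fields
    }
}
```

With 8 frames per block, eight Codec2 packets carry symbols 0 through 7 of block 0 and the ninth starts block 1; any number of Opus packets leaves both counters at 0.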
Decoder changes (mostly cleanup, since both live decoder paths were
already reading audio directly from source packets and only using the
RaptorQ decoder output for stats):
- CallDecoder::ingest skips fec_dec.add_symbol on Opus packets. Source
packets still flow to the jitter buffer; Opus repair packets from old
senders are dropped cleanly (repair packets never hit the jitter
buffer either).
- engine.rs recv loop skips fec_dec.add_symbol, fec_dec.try_decode, and
fec_dec.expire_before on Opus packets. The `fec_recovered` stat
counter becomes Codec2-only (a separate DRED reconstruction counter
lands in Phase 4).
Wire-format backward compat verified at pre-flight:
- Old receiver + new sender: engine.rs pipeline.rs path gates on
non-zero fec_block/fec_symbol which now never fire for Opus, so the
RaptorQ decoder simply isn't fed. Audio flows normally. Desktop
CallDecoder's old path accumulated packets into the stale-eviction
HashMap, which cleans up after 2s — harmless.
- New receiver + old sender: new receiver skips RaptorQ on Opus so
old-sender repair packets are ignored entirely (no crash, no double-
decode). Loses the (previously vestigial) RaptorQ recovery benefit,
which was never actually active in the audio path. Source packets
still decode normally.
- No wire format version bump required. MediaHeader is unchanged; we
just zero the FEC fields on Opus packets.
Test changes:
- Removed `encoder_generates_repair_on_full_block` — asserted the old
(pre-Phase-2) RaptorQ-on-Opus behavior and is now incorrect. Replaced
with two symmetric tests:
- `opus_source_packets_have_zero_fec_header_fields` — verifies
Phase 2 invariants on Opus packets
- `opus_encoder_never_emits_repair_packets` — runs 20 frames of
non-silent sine wave through a GOOD-profile encoder, asserts
exactly 20 output packets, zero repair
- `codec2_encoder_generates_repair_on_full_block` — same shape as
the old test but on CATASTROPHIC profile (Codec2 1200, 8
frames/block, ratio 1.0) to verify Codec2 path still emits
repairs as before
Verification:
- cargo check --workspace: zero errors
- cargo test -p wzp-codec --lib: 61 passing (Phase 1 baseline held)
- cargo test -p wzp-client --lib: 32 passing (+3 new Phase 2 tests,
-1 old test removed)
- cargo check -p wzp-android --lib: zero errors (host link of
wzp-android tests fails on -llog per pre-existing Android-only
build.rs, unrelated to this work; integration build via
build-and-notify.sh will validate Android end-to-end)
- Pre-existing broken integration test in
crates/wzp-client/tests/handshake_integration.rs (SignalMessage
schema drift) is NOT caused by this commit — baseline had the same
3 compile errors before Phase 2. Flagged as a separate cleanup task.
Expected observable effects on a real call:
- Opus 24k outgoing bitrate drops from ~28.8 kbps (ratio 0.2 RaptorQ)
to ~25 kbps (base 24 kbps + DRED ~1–10 kbps signal-dependent)
- Opus receive-side latency drops ~40 ms on clean network (no more
block wait — jitter buffer emits as soon as a source packet arrives)
- Codec2 calls show no latency or bitrate change
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
07873ea598 |
fix(linux-aec): fall back to 0.3 crate + apt lib (2.x bundled is broken)
Switch the webrtc-audio-processing dep from the 2.x git source (bundled
mode) back to crates.io 0.3, and link against Debian's apt package
libwebrtc-audio-processing-dev (0.3-1+b1 on Bookworm). The 2.x path
fails because both the crates.io tarball and the upstream git main
branch of webrtc-audio-processing-sys 2.0.3 have a build.rs bug where
`meson setup --reconfigure` is passed unconditionally, panicking on
first-run empty build dirs with "Directory does not contain a valid
build tree". The 0.x line sidesteps bundled mode entirely by linking the
apt-provided library.
Trade-off: we get AEC2 (the older generation) instead of AEC3, but it's
the same algorithm family and is what PulseAudio's module-echo-cancel
and PipeWire's filter-chain use on current Debian-family distros. Fine
for shipping — we can revisit AEC3 once the 2.x bundled build is fixed
upstream.
API changes:
- 0.3's Processor::process_capture_frame and process_render_frame take
&mut self, so wrap the module-level processor in a Mutex. Capture and
playback threads each lock briefly (sub-ms per 10 ms frame); contention
is minimal.
- Import NUM_SAMPLES_PER_FRAME from the crate directly instead of
hardcoding 480, so the code tracks whatever sample rate the upstream
C++ lib exposes (currently 48 kHz hardcoded -> 480).
- Helper fns drain_frames_through_apm / tee_render_samples / etc. take
&Mutex<Processor> instead of &Processor.
- Use explicit EchoCancellationSuppressionLevel and
NoiseSuppressionLevel imports rather than fully-qualified paths.
Dockerfile:
- Drop meson / ninja-build / python3 (only needed for bundled build).
- Add libwebrtc-audio-processing-dev for the system link path.
- Keep clang (may be needed by the bindgen step in some versions).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
cc00f7cace |
fix(linux-aec): try main branch of webrtc-audio-processing
v2.0.3 bundled build hits 'Directory does not contain a valid build tree' because the crate's build.rs uses `meson setup --reconfigure` unconditionally, which fails on first run when the build dir doesn't yet contain prior meson state. Try the main branch in case it's been fixed post-release. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
eb9de988d6 |
fix(linux-aec): use git dep for webrtc-audio-processing
The crates.io tarball of webrtc-audio-processing-sys 2.0.3 is missing the vendored C++ submodule — the bundled build fails with 'Directory does not contain a valid build tree' when meson tries to configure the ./webrtc-audio-processing subdirectory. Cargo clones git deps with submodules auto-initialized since ~1.27, so pulling from the upstream git repo (pinned to tag v2.0.3) gives us the full source tree. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
4ba77c8c0e |
feat(linux): WebRTC AEC3 capture/playback backend with render-side tee
Adds gold-standard Linux echo cancellation: in-app WebRTC AEC3 (Audio
Processing Module) via the webrtc-audio-processing crate, using the same
algorithm as Chrome WebRTC, Zoom, Teams, and Jitsi. Runs entirely
in-process, so it works identically on ALSA / PulseAudio / PipeWire
systems — no dependency on user-configured echo-cancel modules.
Architecture:
- New crates/wzp-client/src/audio_linux_aec.rs module (~470 lines).
Contains LinuxAecCapture and LinuxAecPlayback, both using CPAL under
the hood but routing samples through a shared
Arc<webrtc_audio_processing::Processor>. The playback path tees each
20 ms frame into APM.process_render_frame as the echo reference BEFORE
handing the samples to CPAL's output callback. The capture path runs
APM.process_capture_frame on each mic frame in place before pushing to
the audio ring buffer. This is the "tee the playback ring" approach
that Zoom/Teams/Jitsi use.
- New `linux-aec` feature in wzp-client pulling in the
webrtc-audio-processing crate at v2.x with the `bundled` sub-feature.
Bundled means the vendored PulseAudio WebRTC C++ sources are statically
compiled via meson+ninja at cargo build time — no runtime .so
dependency, avoids Debian Bookworm's stale
libwebrtc-audio-processing-dev 0.3 package (which predates AEC3). Dep
is target-gated to Linux, so enabling the feature on non-Linux is a
no-op.
- lib.rs re-exports LinuxAecCapture/LinuxAecPlayback as
AudioCapture/AudioPlayback when `linux-aec` is on, otherwise falls back
to the CPAL audio_io path. Shared public API (start/ring/stop/Drop)
means downstream code is unchanged.
- New `linux-aec` feature in wzp-desktop forwards to
wzp-client/linux-aec so
`cargo tauri build -- --features wzp-desktop/linux-aec` builds the AEC
variant.
APM configuration:
- EchoCancellation: High suppression, delay-agnostic mode on, extended
filter on, stream_delay_ms=60 initial hint
- NoiseSuppression: High
- HighPassFilter: on
- AGC: off (can fight Opus encoder's own gain staging + adaptive quality
controller; add later if users report low mic level)
Frame size handling:
- Pipeline uses 20 ms frames (960 samples @ 48 kHz mono)
- APM requires strict 10 ms (480 samples) per call
- Each 20 ms frame is split into two 480-sample halves, APM called
twice, halves stitched back
- Same pattern for render and capture sides
- Carry-buffer logic handles the case where CPAL delivers samples in
arbitrary chunk sizes that don't divide 960
Build infrastructure:
- scripts/Dockerfile.linux-desktop-builder adds meson, ninja-build,
python3, clang for the webrtc-audio-processing bundled build
- scripts/build-linux-desktop-docker.sh takes a new --aec flag that
enables the linux-aec feature and renames the output artifacts with an
`-aec` suffix so noAEC and AEC variants can coexist on disk
Task #30.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
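The frame size handling described in the AEC3 commit above (arbitrary CPAL chunk sizes in, strict 480-sample calls to the processor, output stitched back together) can be sketched generically. `FrameChunker` is a hypothetical helper written for this illustration, and the `process` closure stands in for the real process_capture_frame / process_render_frame call; nothing here uses the webrtc-audio-processing crate.

```rust
/// 10 ms at 48 kHz mono, the frame size the processor requires per call.
pub const APM_FRAME: usize = 480;

/// Accumulates arbitrary-size chunks and hands out strict 480-sample frames.
pub struct FrameChunker {
    carry: Vec<i16>,
}

impl FrameChunker {
    pub fn new() -> Self {
        FrameChunker { carry: Vec::new() }
    }

    /// Appends `input`, runs `process` in place on every complete frame,
    /// and appends the processed frames to `out`. Leftover samples are
    /// carried over to the next call.
    pub fn push(
        &mut self,
        input: &[i16],
        out: &mut Vec<i16>,
        mut process: impl FnMut(&mut [i16; APM_FRAME]),
    ) {
        self.carry.extend_from_slice(input);
        let mut consumed = 0;
        while self.carry.len() - consumed >= APM_FRAME {
            let mut frame = [0i16; APM_FRAME];
            frame.copy_from_slice(&self.carry[consumed..consumed + APM_FRAME]);
            process(&mut frame);
            out.extend_from_slice(&frame);
            consumed += APM_FRAME;
        }
        self.carry.drain(..consumed);
    }

    /// Number of samples waiting for the next complete frame.
    pub fn pending(&self) -> usize {
        self.carry.len()
    }
}
```

A 960-sample pipeline frame pushed into an empty chunker results in exactly two `process` calls, which is the "split into two halves" case; a 700-sample chunk results in one call and 220 carried samples.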
||
|
|
4e9244eb00 |
fix(windows): add Win32_Security feature + 2024 edition unsafe wrappers
- CreateEventW is gated behind Win32_Security in the windows crate
because its signature takes SECURITY_ATTRIBUTES; add to features.
- Remove unused HANDLE import.
- Wrap GetId() and PWSTR::to_string() in explicit unsafe { ... }
blocks for Rust 2024 edition's unsafe_op_in_unsafe_fn lint.
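The unsafe-wrapper change in the last bullet follows a general Rust 2024 pattern: inside an `unsafe fn`, each unsafe operation still needs its own `unsafe { ... }` block once the unsafe_op_in_unsafe_fn lint is active. The sketch below shows the pattern with std-only code; `wide_len` and `wide_to_string` are hypothetical helpers for a NUL-terminated UTF-16 buffer and do not use the `windows` crate or the actual GetId() / PWSTR::to_string() calls.

```rust
/// Counts UTF-16 code units before the terminating NUL.
///
/// # Safety
/// `ptr` must be non-null, aligned, and point to a NUL-terminated
/// sequence of `u16` values that stays valid for the whole call.
#[deny(unsafe_op_in_unsafe_fn)]
pub unsafe fn wide_len(ptr: *const u16) -> usize {
    let mut len = 0;
    // With the lint denied, the raw-pointer read needs an explicit
    // unsafe block even though the enclosing function is `unsafe fn`.
    while unsafe { *ptr.add(len) } != 0 {
        len += 1;
    }
    len
}

/// Safe wrapper: converts a NUL-terminated UTF-16 buffer to a `String`.
pub fn wide_to_string(buf: &[u16]) -> String {
    assert!(buf.contains(&0), "buffer must contain a terminating NUL");
    // SAFETY: the slice is valid and the assert guarantees a NUL inside it.
    let len = unsafe { wide_len(buf.as_ptr()) };
    String::from_utf16_lossy(&buf[..len])
}
```

The `#[deny]` attribute only makes the example behave the same on older editions; it is not taken from the commit.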
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
03a80a3196 |
feat(windows): WASAPI capture backend with OS-level AEC
Adds a direct WASAPI microphone capture path for the Windows desktop
build that opens the default communications endpoint via
IMMDeviceEnumerator -> IAudioClient2 -> SetClientProperties with
AudioCategory_Communications, turning on Windows's communications audio
processing chain (AEC, noise suppression, automatic gain control). The
communications AEC operates at the OS level and uses the system render
mix as the reference signal, so echo from our existing CPAL playback
stream is cancelled automatically with no per-process reference
plumbing.
Architecture:
- New crates/wzp-client/src/audio_wasapi.rs module (~280 lines).
Event-driven capture loop on a dedicated thread; pushes PCM into the
same lock-free AudioRing used by the CPAL path. Same public API as
audio_io::AudioCapture so downstream code is unchanged.
- New `windows-aec` feature in wzp-client that pulls in the `windows`
crate (Microsoft's official Rust COM bindings) gated to
target_os = "windows" only. Enabling the feature on non-Windows targets
is a no-op since both the module and the dep are
cfg(target_os = "windows").
- lib.rs re-exports WasapiAudioCapture as AudioCapture when the feature
is on, otherwise falls back to the CPAL AudioCapture. AudioPlayback is
always the CPAL one — no reason to swap it.
- desktop/src-tauri/Cargo.toml Windows target enables the new feature:
`features = ["audio", "windows-aec"]`.
Implementation notes:
- Uses eCommunications role (not eConsole) for
GetDefaultAudioEndpoint — the user-configured "communications" device
that Teams/Zoom pick up, and the one Windows's AEC is tuned for.
- Requests 48 kHz mono i16 with AUDCLNT_STREAMFLAGS_AUTOCONVERTPCM +
SRC_DEFAULT_QUALITY so Windows handles any format conversion in the
audio engine instead of rejecting our format.
- Event-driven with SetEventHandle / WaitForSingleObject — no polling,
minimal CPU cost between packets.
- 200 ms wait timeout so the capture thread polls `running` often
enough for Drop to stop cleanly even if the audio engine stalls (e.g.
device unplug).
Task #24.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
da09fdb6e9 |
windows(desktop): gate coreaudio / VoiceProcessingIO to macOS-only targets
First step of the Windows x86_64 desktop build: stop pulling
coreaudio-rs into the Windows dependency graph so the project can at
least run `cargo check --target x86_64-pc-windows-msvc`. Software AEC
is already disabled in engine.rs so there's nothing else to stub — the
macOS-specific VPIO path is skipped via #[cfg(target_os = "macos")] on
both sides and Windows falls through to the plain CPAL
AudioCapture/AudioPlayback branch that already existed.
crates/wzp-client/Cargo.toml
- coreaudio-rs optional dep moved under [target.'cfg(target_os = "macos")']
- `vpio` feature now uses `dep:coreaudio-rs` syntax and the gated dep
- Enabling `vpio` on Windows/Linux is a no-op at resolution time
crates/wzp-client/src/lib.rs
- `pub mod audio_vpio` is now #[cfg(all(feature = "vpio", target_os = "macos"))]
- Previously `vpio` alone was enough to try to compile the Core Audio
bindings, which would fail on non-Apple targets the moment the
feature flag was flipped on
desktop/src-tauri/Cargo.toml
- [target.'cfg(not(target_os = "android"))'] removed — was leaking
vpio into Windows/Linux via the catch-all.
- macOS: wzp-client with features = ["audio", "vpio"]
- Windows: wzp-client with features = ["audio"]
- Linux: wzp-client with features = ["audio"]
- Android: wzp-client with default-features = false (unchanged)
- Dropped the unused direct coreaudio-rs = "0.11" dep on macOS —
wzp-desktop's own sources never call Core Audio directly.
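The lib.rs gating described in the file list above combines a feature flag with a target check so that flipping the feature on a non-Apple target cannot pull in the Core Audio bindings. A minimal sketch of that shape follows; the `audio_backend` module and `backend_name` function are hypothetical placeholders, and only the `cfg(all(feature = "vpio", target_os = "macos"))` predicate is taken from the commit message.

```rust
// Compiled only when BOTH conditions hold: the feature is enabled
// and the target is macOS.
#[cfg(all(feature = "vpio", target_os = "macos"))]
mod audio_backend {
    pub const NAME: &str = "vpio";
}

// Every other combination (feature off, or any non-macOS target)
// falls through to the portable backend.
#[cfg(not(all(feature = "vpio", target_os = "macos")))]
mod audio_backend {
    pub const NAME: &str = "cpal";
}

/// Name of the audio backend selected at compile time.
pub fn backend_name() -> &'static str {
    audio_backend::NAME
}
```

Because the two predicates are exact complements, exactly one `audio_backend` module exists in any build, so callers never need their own `cfg` checks.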
Verified via `cargo tree --target x86_64-pc-windows-msvc -p wzp-desktop`
that the Windows target now resolves wzp-client with cpal but without
coreaudio-rs. macOS target still resolves with coreaudio (direct via
vpio feature and transitively via cpal). macOS `cargo check` still
builds cleanly.
Cross-compile from macOS hit a cargo-xwin + llvm-lib setup issue in
ring's build.rs, so the actual `cargo check --target
x86_64-pc-windows-msvc` did not complete locally. Build verification
belongs on the user's Windows x86_64 host where MSVC is present
natively.
See tasks #23 (this one), #24 (Voice Capture DSP / WASAPI Communications
for OS-level AEC on Windows), and #25 (aarch64-pc-windows-msvc support).
|
||
|
|
2288c1ae07 |
feat: direct calling UI for desktop Tauri app + merge android branch
Tauri backend:
- register_signal: persistent _signal connection, presence registration
- place_call: send DirectCallOffer by fingerprint
- answer_call: accept/reject incoming calls
- get_signal_status: poll signal state
Frontend:
- Mode toggle: "Room" vs "Direct Call"
- Register button → registers on relay signal channel
- Incoming call panel with Accept/Reject
- Fingerprint input + Call button
- Auto-connect to media room on CallSetup event
Also merges feat/android-voip-client into desktop branch:
- Federation fixes, time-based dedup, FEC stale blocks
- Direct calling protocol types
- ACL + SAS verification
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
3351cb6473 |
feat: direct 1:1 calling via relay signaling (Phase 1)
New feature: call someone directly by fingerprint through the relay.
- Client connects with SNI "_signal" for persistent signaling
- RegisterPresence/RegisterPresenceAck for relay registration
- DirectCallOffer routed to target by fingerprint
- DirectCallAnswer with AcceptGeneric/AcceptTrusted/Reject modes
- Relay creates private room (call-{id}), sends CallSetup to both
- Both clients connect to private room for media (existing SFU path)
- Hangup forwarding + cleanup on disconnect
- Desktop CLI: --signal + --call <fingerprint> for testing
- CallRegistry tracks call state (Pending/Ringing/Active/Ended)
- SignalHub manages persistent signaling connections
Tested: Alice calls Bob by fingerprint, relay routes offer, Bob
auto-accepts, both join private room, media flows bidirectionally.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
f935bd69cd |
fix: rewrite seq/fec for federation-delivered packets
- Time-based dedup (2s TTL) replaces fixed-window dedup — consecutive
senders with same seq numbers no longer collide
- Raw byte forwarding for federation local delivery (no
re-serialization)
- Jitter buffer resets on large backward seq jumps (>100)
- recv_media skips malformed datagrams instead of returning
connection-closed
- SIGTERM handler for clean QUIC shutdown on wzp-client
- JSONL event log infrastructure (--event-log flag) for protocol
analysis
- FEC disabled on GOOD profile for federation debugging (fec_ratio=0.0)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
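The time-based dedup mentioned in the federation fix above can be sketched as a map from a packet key to the time it was first seen, with entries expiring after the TTL. This is a guess at the general shape only: `TimedDedup` is a hypothetical type, and the commit message does not say how the relay derives its dedup key, so an opaque `u64` is used here.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical TTL-based duplicate filter.
pub struct TimedDedup {
    ttl: Duration,
    seen: HashMap<u64, Instant>,
}

impl TimedDedup {
    pub fn new(ttl: Duration) -> Self {
        TimedDedup { ttl, seen: HashMap::new() }
    }

    /// Returns `true` if `key` has not been seen within the TTL (the
    /// packet should be forwarded) and records it; returns `false` for
    /// a duplicate.
    pub fn check_and_insert(&mut self, key: u64, now: Instant) -> bool {
        let ttl = self.ttl;
        // Evict expired entries so an old key cannot block a later packet.
        self.seen
            .retain(|_, first_seen| now.saturating_duration_since(*first_seen) < ttl);
        if self.seen.contains_key(&key) {
            false
        } else {
            self.seen.insert(key, now);
            true
        }
    }
}
```

With a 2 s TTL, a key seen at t = 0 is rejected at t = 1 s and accepted again at t = 3 s. Scanning the whole map on every packet is fine for a sketch; a production version would likely expire entries in batches.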
||
|
|
5c24adf1c1 |
feat: remote version query — wzp-client --version-check <relay>
Connects to a relay over QUIC with SNI "version", reads build hash
from a unidirectional stream, prints "<relay> <git-hash>" and exits.
Usage: wzp-client --version-check 172.16.81.175:4434
Output: 172.16.81.175:4434
|