wz-phone

Author	SHA1	Message	Date
Siavash Sameni	18e5e75f33	feat(analyzer): encrypted payload decoding in replay mode (#17) When --key <64-char-hex> is provided with --replay, the analyzer decrypts each packet's ChaCha20-Poly1305 payload using the session key and logs plaintext frame sizes. Prints first 5 + every 100th decrypt result, and a summary at the end. This completes all 5 protocol analyzer tasks (#13-17): - #13: Observer mode (live passive listener), was done - #14: TUI with Ratatui (per-participant panels), was done - #15: Capture and replay (.wzp format), was done - #16: HTML report (Chart.js loss/jitter graphs), was done - #17: Encrypted decode (--key for replay), done now Usage: wzp-analyzer --replay session.wzp --key <64-hex-chars> --html report.html Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 17:07:43 +04:00
Siavash Sameni	f06f9073ae	feat(nat): birthday attack module + HardNatBirthdayStart signal (#86, #87) Birthday attack for random symmetric NATs: - birthday.rs: open_acceptor_ports() opens N sockets, STUN-probes each to learn external ports. generate_dialer_targets() builds hit list (known ports first, then random fill). spray_dialer() sprays QUIC connects with rate limiting, first success wins. - Default: 32 acceptor ports, 128 dialer probes, 20ms interval Signal coordination: - HardNatBirthdayStart { acceptor_ports, external_ip } sent by Acceptor when peer's HardNatProbe shows random/sequential NAT - Relay forwards it like other call signals - Desktop recv loop handles and logs it Hybrid waterfall integration: - On receiving HardNatProbe with non-cone allocation, Acceptor auto-opens birthday ports and sends BirthdayStart - Sockets kept alive 10s for NAT mapping persistence - Dialer spray integration into race() pending (needs transport hot-swap for background upgrade) 6 new tests, 599 total, 0 regressions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 16:44:36 +04:00
Siavash Sameni	1de280fe04	fix(nat): working NAT tickle + smart filter debug + timeout diags Fixes from real-world 5G↔Starlink testing: NAT tickle fix: - tokio::net::UdpSocket::bind() doesn't set SO_REUSEADDR, so binding to the same port as quinn silently failed. Now uses socket2::Socket with explicit SO_REUSEADDR + SO_REUSEPORT (via libc on unix). - Tickle now logs success/failure for debugging. Diagnostic fixes: - connect:dual_path_race_start shows both dial_order_raw and dial_order_smart so we can see what filtering removed - Grace-period timeout (relay wins first, direct still running) now fills "timeout:grace" diags for unrecorded candidates - Previously candidate_diags was empty when relay won the race Dependencies: - Added socket2 = "0.5" to wzp-client 593 tests pass, 0 regressions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 15:58:13 +04:00
Siavash Sameni	bc6d327ebb	feat(nat): smart candidate filtering + acceptor NAT tickle + 4s timeout Major P2P improvements for cross-network calls: Smart candidate filtering (smart_dial_order): - Strip LAN candidates when peer's public IP differs from ours (172.16.x.x is unreachable from a different network) - Strip all IPv6 candidates (Phase 7 disabled, wastes dial slots) - Only keep mapped + reflexive for cross-network calls - LAN candidates preserved when both peers share the same public IP Acceptor NAT tickle: - A-role sends a 1-byte UDP packet to each peer candidate BEFORE accepting. This opens the NAT pinhole for return traffic from the Dialer's IP, which is critical for address-restricted NATs that only allow inbound from IPs they've seen outbound traffic to. - Uses SO_REUSEADDR on the same port as the quinn endpoint. Direct timeout increased from 2s to 4s: - Cross-network QUIC handshakes through CGNAT can take 2-3s - 2s was too aggressive for 5G/LTE networks Diagnostic fix: - Record "timeout:4s" for candidates still in-flight when the timeout fires (previously these had no diagnostic entry) 5 new tests for smart_dial_order edge cases. 593 tests pass, 0 regressions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 15:42:02 +04:00
Siavash Sameni	c0dd6c06ff	feat(debug): per-candidate dial diagnostics in dual-path race Added CandidateDiag struct to RaceResult with per-candidate: - address attempted - result (ok / skipped:ipv6 / error:reason) - elapsed time in ms Surfaced in call-debug events: - connect:dual_path_race_start now includes dial_order + peer_mapped - connect:dual_path_race_done now includes candidate_diags array Upgraded dual_path tracing from debug to info for IPv6 skips and dial failures so they appear in logcat/console. Helps diagnose why P2P fails on specific networks (5G CGNAT, address-restricted NATs, etc). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 12:16:34 +04:00
Siavash Sameni	ec1bdf3cd5	feat(nat): hard NAT port allocation detection + prediction + HardNatProbe signal (#29) Phase A of hard NAT traversal (PRD-hard-nat.md): - PortAllocation enum: PortPreserving / Sequential{delta} / Random / Unknown - detect_port_allocation(): sequential STUN probes from single socket, analyzes port sequence for allocation pattern - classify_port_allocation(): pure function with jitter tolerance, wraparound handling, 60% threshold for noisy sequences - predict_ports(): generates target port range from last_port + delta - HardNatProbe signal message: carries port_sequence, allocation pattern, external_ip for peer coordination - Relay forwards HardNatProbe to call peer - Netcheck gains port_allocation field + format_report display 588 tests pass (17 new), 0 regressions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 11:29:35 +04:00
Siavash Sameni	8fcf1be341	feat(nat): Tailscale-inspired STUN/ICE + port mapping + mid-call re-gathering (#28) Phase 8: 5 new modules bringing NAT traversal close to Tailscale's approach. - stun.rs: RFC 5389 STUN client (public server reflexive discovery, XOR-MAPPED-ADDRESS parsing, parallel probe with retry, STUN fallback in desktop try_reflect_own_addr()) - portmap.rs: NAT-PMP (RFC 6886) + PCP (RFC 6887) + UPnP IGD port mapping (gateway discovery, acquire/release/refresh lifecycle, new PeerCandidates.mapped candidate type in dial order) - ice_agent.rs: candidate lifecycle (gather(), re_gather(), apply_peer_update() with monotonic generation counter, CandidateUpdate signal message forwarded by relay) - netcheck.rs: comprehensive diagnostic (NAT type, IPv4/v6, port mapping availability, relay latencies, CLI --netcheck) - relay_map.rs: RTT-sorted relay map, preferred() selection, populate_from_ack() for RegisterPresenceAck.available_relays Relay: CallRegistry stores + cross-wires caller/callee_mapped_addr into CallSetup.peer_mapped_addr. Region config + available_relays populated from federation peers in RegisterPresenceAck. Desktop: place_call/answer_call call acquire_port_mapping() and fill caller/callee_mapped_addr. STUN+relay combined NAT detection. 571 tests pass (66 new), 0 regressions, 0 warnings. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 10:17:17 +04:00
Siavash Sameni	9377a9009c	feat(quality): bandwidth probing for upward adaptive quality (#10) After 30s stable at a tier, the AdaptiveQualityController actively probes the next tier up by switching the encoder and observing for 5s. If loss/RTT stay within the target tier's thresholds, the upgrade commits. If >1 bad report, the probe aborts with a 60s cooldown. Probing is disabled on cellular (studio tiers aren't classified there) and skipped when already at Studio64k (highest tier). This complements the passive upgrade path (10 consecutive good reports) by actively discovering that a path can sustain higher quality, rather than waiting for the classification to drift upward. New: ProbeState struct, check_probe() method, 4 constants, 5 tests. 377 tests passing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 16:47:21 +04:00
Siavash Sameni	425c67a08a	feat(analyzer): replay, HTML report, encrypted decode stub (#15, #16, #17) #15 - Replay mode: --replay <file.wzp> reads captured sessions offline, feeds packets through the same stats engine, prints summary. CaptureReader mirrors CaptureWriter's binary format. #16 - HTML report: --html <report.html> generates self-contained HTML with Chart.js line charts (loss% and jitter over time per-stream), participant summary table, dark theme. Works with live sessions (after exit) or replay mode. #17 - Encrypted decode: --key <hex> flag accepted and stored. Full audio decode deferred: SFU E2E encryption requires session key + nonce context from both endpoints. Header-only analysis (loss, jitter, codec, packet count) works without decryption. Usage: wzp-analyzer --replay session.wzp --html report.html wzp-analyzer relay:4433 --room test --capture out.wzp --html report.html 372 tests passing, 0 regressions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 16:31:28 +04:00
Siavash Sameni	88ca3e099a	feat: wzp-analyzer binary, protocol analyzer with TUI (#13, #14, #15) New binary: wzp-analyzer joins a room as a passive observer and displays real-time per-participant quality metrics. Features: - Passive observation: connects to relay, receives all media, never sends - Participant detection: identifies senders by sequence number streams - Per-participant stats: packets, loss%, jitter, codec, codec switches - TUI mode (ratatui): color-coded table (green/yellow/red by loss), 10 FPS refresh, session header, quit with q/Ctrl+C - No-TUI mode: prints stats to stderr every 2s (for headless/CI use) - Capture mode: binary .wzp format with microsecond timestamps for offline replay (magic WZP\x01, JSON header, per-packet records) - Session summary on exit Usage: wzp-analyzer 193.180.213.68:4433 --room general wzp-analyzer 193.180.213.68:4433 --room general --no-tui --duration 60 wzp-analyzer 193.180.213.68:4433 --room general --capture session.wzp 372 tests passing, 0 regressions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 16:26:46 +04:00
Siavash Sameni	1e82811cc1	feat(p2p): adaptive quality on direct calls (#23) P2P calls now adapt codec quality based on observed network conditions, matching what relay calls already had. Three-layer implementation: - QualityReport::from_path_stats(): construct reports from local quinn stats (loss%, RTT, jitter) without needing relay-generated reports - CallEncoder.pending_quality_report: one-shot attachment to next source packet (consumed on encode, not repeated) - Engine send tasks: generate quality report every 50 frames (~1s) from quinn_path_stats() and attach via set_pending_quality_report() - Engine recv tasks: self-observe from own QUIC path stats every 50 packets, feed to AdaptiveQualityController for P2P adaptation (works even if peer isn't sending quality reports yet) Both relay and P2P calls now have adaptive quality. On relay calls, both peer-sent reports AND local observations feed the controller. Hysteresis (3 consecutive bad reports to downgrade) prevents thrashing. 372 tests passing (+4 new: from_path_stats encoding, clamping, zero values, encoder quality report attachment). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 16:14:06 +04:00
Siavash Sameni	81b5522942	refactor: clap CLI parser, safety docs, dead code docs, cross-refs Audit items 6, 8, 9, 10: #6 - Relay CLI: replaced 154-line manual parse_args() with clap derive (13 flags/options preserved, auto --help, --version from build hash) #8 - wzp-native: added # Safety docs to all 3 unsafe extern "C" fns #9 - wzp-crypto: documented x25519_static_secret/public as reserved for future static-key federation auth (not dead code, intentionally unused) #10 - Cross-references between quality.rs ↔ dred_tuner.rs module docs 368 tests passing, 0 regressions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 15:40:49 +04:00
Siavash Sameni	d539a6dfb9	test(federation): 29 tests for federation.rs (was 0), engine dedup PRD Federation test coverage (crates/wzp-relay/tests/federation.rs): - room_hash: determinism, uniqueness, length, case sensitivity (5) - is_global_room: static config, call-* implicit, exact match (3) - resolve_global_room: static + call-* resolution (2) - global_room_hash: canonical names, fallthrough, independence (4) - forward_to_peers: zero peers, live QUIC datagram delivery (2) - broadcast_signal: zero peers, live QUIC signal delivery (2) - send_signal_to_peer: unknown fingerprint error (1) - peer lookup: fingerprint normalization, IP, trust priority (5) - accessors: local_tls_fp, cross_relay_tx, remote_participants (3) - integration: full media egress over live QUIC link (1) - edge case: exact room match (1) Total relay tests: 120 (was 91). Full suite: 368 passing. Also added PRD-engine-dedup.md for the engine.rs helper extraction completed in the previous commit. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 15:35:04 +04:00
Siavash Sameni	ba12aae439	refactor: extract shared engine helpers, federation clone-before-send, constants Engine deduplication (PRD-engine-dedup.md): - build_call_config(): shared CallConfig construction (was 23 lines × 2) - codec_to_profile(): shared CodecId → QualityProfile mapping (was 19 lines × 2) - run_signal_task(): shared signal handler (was 48 lines × 2) - Net -39 lines from engine.rs, 6 duplicated blocks → single-line calls Quick wins from REFACTOR-codebase-audit.md: - 6 magic number constants extracted (CAPTURE_POLL_MS, RECV_TIMEOUT_MS, etc.) - DRED_POLL_INTERVAL moved from 2 local defs to 1 module-level const - federation.rs: forward_to_peers, broadcast_signal, send_signal_to_peer now clone peer list and release lock before sending (was holding Mutex across async I/O; last lock-during-send pattern eliminated) - main.rs: close_transport() helper replaces 12 silent .ok() calls with debug-level logging 314 tests passing, 0 regressions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 15:22:44 +04:00
Siavash Sameni	a52b011fb5	feat(relay): replace global Mutex<RoomManager> with DashMap sharding Eliminates the single-lock bottleneck for media forwarding. Before: all participants across all rooms competed for one Mutex. Now rooms are stored in DashMap (64 internal shards with per-shard RwLocks). Changes: - RoomManager.rooms: HashMap → DashMap<String, Room> - Per-room quality tracking (qualities, current_tier moved into Room) - Arc<Mutex<RoomManager>> → Arc<RoomManager> everywhere - 20 .lock().await sites removed across room.rs, main.rs, federation.rs, ws.rs - federation forward_to_peers: clone peer list, release lock, then send - ACL uses std::sync::Mutex (rarely accessed, non-async) Concurrency improvement: - Before: 100 rooms × 10 people = 1000 tasks → 1 Mutex - After: distributed across 64 DashMap shards, ~15 tasks per shard avg - Rooms are fully independent: room A never blocks room B 314 tests passing, 0 regressions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 12:17:57 +04:00
Siavash Sameni	9ae9441de4	fix(audio): check capture ring available before read (fixes Opus6k choppy) Partial reads from the capture ring consumed samples that were then discarded when the send loop retried from buf[0]. For 20ms codecs this was invisible (single Oboe burst fills 960 samples in one read), but 40ms codecs (Opus6k, 1920 samples) needed 2 bursts: the first partial read consumed 960 real samples and threw them away. Result: Opus6k produced ~11 frames/s instead of 25 (~44% of expected). Fix: expose wzp_native_audio_capture_available() and check it before reading, matching the desktop capture_ring.available() pattern. Partial reads no longer occur because we only read when enough samples exist. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 11:46:15 +04:00
Siavash Sameni	d424515542	feat: 5-tier quality classification, QualityDirective handling, debug tap stats - Extend Tier enum from 3 to 6 levels: Studio64k/48k/32k + Good + Degraded + Catastrophic with asymmetric hysteresis (down:3, up:5, studio:10) - Handle QualityDirective signals in both desktop and Android engines, so relay-coordinated codec switching now works end-to-end - Add periodic TAP STATS to debug tap: packets in/out, fan-out avg, seq gaps, codecs seen (every 5s) - Mark task #2 done (ParticipantInfo in federation signals already implemented) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 10:23:48 +04:00
Siavash Sameni	ea5fc17c34	fix(relay): debug tap signal logging, dual_path test regression, PRD updates - Add log_signal() and log_event() to DebugTap for RoomUpdate, QualityDirective, join/leave lifecycle events (task #11) - Fix dual_path.rs Phase 7 regression: add missing ipv6_endpoint arg to 3 race() call sites - Update PRDs to reflect actual implementation status: mark adaptive quality, coordinated codec, P2P, network awareness, protocol analyzer - Update PROGRESS.md with QualityDirective gap and dual_path regression Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 09:54:52 +04:00
Siavash Sameni	d249b32ee5	test+docs: add tests for QualityDirective, ParticipantQuality; update docs - QualityDirective signal roundtrip tests (with/without reason) - ParticipantQuality unit tests (initial tier, degradation, weakest-link) - Updated PROGRESS.md with desktop adaptive quality, relay coordinated switching, Oboe state polling entries - Updated ARCHITECTURE.md SFU fan-out rules with QualityDirective - Updated PRD-coordinated-codec.md with implementation status - 312 tests passing across all modified crates Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 19:56:46 +04:00
Siavash Sameni	22045bc5e6	feat: adaptive quality in desktop, relay quality directive, Oboe state polling - Wire AdaptiveQualityController into desktop engine send/recv tasks (mirrors Android pattern: AtomicU8 pending_profile, auto-mode check) - Wire same into Android engine send task (was only in recv before) - QualityDirective SignalMessage variant for relay-initiated codec switch - ParticipantQuality tracking in relay RoomManager (per-participant AdaptiveQualityController, weakest-link tier computation) - Relay broadcasts QualityDirective to all participants when room-wide tier degrades (coordinated codec switching) - Oboe stream state polling: poll getState() for up to 2s after requestStart() to ensure both streams reach Started before proceeding (fixes intermittent silent calls on cold start, Nothing Phone A059) Tasks: #7, #25, #26, #31, #35 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 19:54:04 +04:00
Siavash Sameni	766c9df442	feat(dred): continuous DRED tuning, PMTUD, extended Opus6k window - DredTuner: maps live network metrics (loss/RTT/jitter) to continuous DRED duration every ~500ms instead of discrete tier-locked values. Includes jitter-spike detection for pre-emptive Starlink-style boost. - Opus6k DRED extended from 500ms to 1040ms (max libopus 1.5 supports) - PMTUD: quinn MtuDiscoveryConfig with upper_bound=1452, 300s interval - TrunkedForwarder respects discovered MTU (was hard-coded 1200) - QuinnPathSnapshot exposes quinn internal stats + discovered MTU - AudioEncoder trait: set_expected_loss() + set_dred_duration() methods - PathMonitor: sliding-window jitter variance for spike detection - Integrated into both Android and desktop send tasks in engine.rs - 14 new tests (10 tuner unit + 4 encoder integration) - Updated ARCHITECTURE.md, PROGRESS.md, PRD-dred-integration, PRD-mtu Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 19:38:37 +04:00
Siavash Sameni	a37c8b30fe	fix(native): add missing bt_active field to stall detector config Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 17:25:11 +04:00
Siavash Sameni	137fe5f084	fix(bluetooth): BT SCO mode skips 48kHz + VoiceCommunication on capture Root cause: Oboe capture at 48kHz with InputPreset::VoiceCommunication cannot open against a BT SCO device (only supports 8/16kHz). The stream silently falls back to builtin mic, delivering zeros. Fix: add bt_active flag to WzpOboeConfig. When set, capture skips setSampleRate and setInputPreset, letting the system route to BT SCO at its native rate. Oboe's SampleRateConversionQuality::Best resamples to 48kHz for our ring buffers. Playout uses Usage::Media in BT mode. New API: wzp_native_audio_start_bt() for BT mode, called from set_bluetooth_sco(on=true). Normal audio_start() restores the standard config when switching back to earpiece/speaker. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 17:23:19 +04:00
Siavash Sameni	5dfb5b3581	fix(bluetooth): use Shared mode for Oboe + delay restart for BT route Two fixes for BT audio silence: 1. Switch Oboe streams from Exclusive to Shared sharing mode. Exclusive mode bypasses Oboe's internal resampler, so opening a 48kHz stream against a BT SCO device (8/16kHz only) fails at the AudioPolicy level. Shared mode lets Oboe's resampler bridge the gap. 2. Add 500ms post-SCO delay before Oboe restart. The audio policy needs time to apply the bt-sco route after setCommunicationDevice returns. Without the delay, Oboe opens against the old device (handset). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 17:14:06 +04:00
Siavash Sameni	fd0ccf8e99	fix(bluetooth): enable Oboe sample rate conversion for BT SCO (8/16kHz) BT SCO devices only support 8kHz or 16kHz but our Oboe streams request 48kHz. Without resampling, AudioPolicyManager rejects the input stream ("getInputProfile could not find profile for... sampling rate 48000"). Fix: add setSampleRateConversionQuality(Best) to both capture and playout stream builders. Oboe resamples internally so our ring buffers stay at 48kHz regardless of the hardware sample rate. Also removes the broken setBluetoothScoOn/isBluetoothScoOn calls from stop_bluetooth_sco — just call stopBluetoothSco() unconditionally. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 17:08:48 +04:00
Siavash Sameni	a798634b3d	fix(signal): add call_id to Hangup — prevents stale hangup killing new calls Root cause: Hangup had no call_id field. The relay forwarded hangups to ALL active calls for a user. When user A hung up call 1 and user B immediately placed call 2, the relay's processing of A's hangup would also kill call 2 (race window ~1-2s). Fix: add optional call_id to Hangup (backwards-compatible via serde skip_serializing_if). When present, the relay only ends the named call. Old clients send call_id=None and get the legacy broadcast behavior. Also: clear pending_path_report in Hangup recv handler and internal_deregister to prevent stale oneshot channels from blocking subsequent call setups. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 16:39:21 +04:00
Siavash Sameni	4c1ad841e1	feat(android): Bluetooth audio routing + network change detection + per-arch APK builds Bluetooth: wire existing AudioRouteManager SCO support through both app variants. Replace binary speaker toggle with 3-way route cycling (Earpiece → Speaker → Bluetooth). Tauri side adds JNI bridge functions (start/stop/query SCO, device availability) and Oboe stream restart. Network awareness: integrate Android ConnectivityManager to detect WiFi/cellular transitions and feed them to AdaptiveQualityController via lock-free AtomicU8 signaling. Enables proactive quality downgrade and FEC boost on network handoffs. Build: add --arch flag to build-tauri-android.sh supporting arm64, armv7, or all (separate per-arch APKs for smaller tester binaries). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 16:07:41 +04:00
Siavash Sameni	29cd23fe39	fix(p2p): connection cleanup — 4 fixes for stale/dead connections PRD 4: Disable IPv6 direct dial/accept temporarily. IPv6 QUIC handshakes succeed but connections die immediately on datagram send ("connection lost"). IPv4 candidates work reliably. IPv6 candidates still gathered but filtered at dial time. PRD 1: Close losing transport after Phase 6 negotiation. The non-selected transport now gets an explicit QUIC close frame instead of silently dropping after 30s idle timeout. Prevents phantom connections from polluting future accept() calls. PRD 2: Harden accept loop with max 3 stale retries. Stale connections are explicitly closed (conn.close) and counted. After 3 stale connections, the accept loop aborts instead of spinning until the race timeout. PRD 3: Resource cleanup — close old IPv6 endpoint before creating a new one in place_call/answer_call. Add Drop impl to CallEngine so tasks are signalled to stop on ungraceful shutdown. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 15:11:50 +04:00
Siavash Sameni	4d66d3769d	fix(relay): set peer_relay_fp on originating relay when answer arrives The originating relay (where the caller is) never set peer_relay_fp because the call was created locally. When the callee's answer arrived via federation, the cross-relay dispatcher handled it but didn't mark the call as cross-relay. This meant the caller's MediaPathReport was delivered via local hub.send_to() to a peer fingerprint that isn't connected locally — silently dropped. Fix: in the cross-relay answer dispatcher, call reg.set_peer_relay_fp(call_id, Some(origin_relay_fp)) so the originating relay knows to forward MediaPathReport via federation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 14:49:34 +04:00
Siavash Sameni	002df15c5e	fix(cli): add .. rest pattern for RegisterPresenceAck error arm Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 14:32:57 +04:00
Siavash Sameni	1eb82d77b8	feat(relay+client): relay reports build version in Ack Add relay_build field to RegisterPresenceAck so the client logs which relay version it connected to. Shows in the debug log as register_signal:ack_received {"relay_build":"f843a93"}. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 14:27:58 +04:00
Siavash Sameni	f843a934fe	fix(relay): forward MediaPathReport across federation MediaPathReport was only delivered via local signal_hub, so calls between peers on different relays always hit peer_report_timeout and fell back to relay — even when direct P2P worked perfectly. Fix: check peer_relay_fp in call_registry (same pattern as DirectCallAnswer). If the peer is on a remote relay, wrap in FederatedSignalForward and send via federation link. Also fix the cross-relay dispatcher to deliver to BOTH caller and callee (not just caller), since the report can come from either side. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 14:14:30 +04:00
Siavash Sameni	1904b19d05	fix(direct): validate A-role accepted connection, skip stale ones The Acceptor's accept() on the shared signal endpoint can dequeue a stale QUIC connection from a previous call that the Dialer has already dropped. This results in "connection lost" errors when media datagrams are sent — 100% drops on both sides. Fix: after accepting a connection, check close_reason(). If the connection is already closed, log a warning and re-accept. Also verify max_datagram_size() is available before returning. Additionally: emit transport details (remote addr, max_datagram, close_reason) in the call_engine_starting debug event so stale connection issues are visible in the user-facing debug log. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 13:50:21 +04:00
Siavash Sameni	40955bd11c	debug(media): add connection diagnostics for direct P2P drops When direct P2P calls show 100% datagram drops, we need to know WHY send_media() fails. This commit adds: - Remote address + stable_id logging on A-role accept and D-role dial success (dual_path.rs) — tells us which candidate won - Remote address + max_datagram_size on engine transport init — verifies datagrams are negotiated - last_send_err in send heartbeat — captures the actual error from send_datagram() failures - QuinnTransport::remote_address() helper Also fixes UI badge: was looking for wrong event name ("dual_path_race_won" → "path_negotiated"). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 13:29:58 +04:00
Siavash Sameni	0b62d3e22f	fix(cli): add missing build_version fields to Offer/Answer CLI binary was missing the new caller_build_version and callee_build_version fields, causing E0063 compile errors on Linux relay/client builds. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 13:09:26 +04:00
Siavash Sameni	bd6733b2e5	feat(signal): advertise build version in Offer/Answer Add caller_build_version / callee_build_version (git short hash) to DirectCallOffer and DirectCallAnswer so peers can identify each other's build in debug logs. Also log own build at register time. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 12:43:55 +04:00
Siavash Sameni	7d1b8f1fdc	fix(android): add missing CallSetup pattern fields (.. rest) The CallSetup enum gained peer_direct_addr and peer_local_addrs in Phase 5.5 but the wzp-android signal recv match arm was never updated, breaking cargo ndk builds. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 12:09:44 +04:00
Siavash Sameni	c2d298beb5	feat(net): Phase 7 — dual-socket IPv4+IPv6 ICE Adds a dedicated IPv6 QUIC endpoint (IPV6_V6ONLY=1 via socket2) alongside the existing IPv4 signal endpoint for proper dual-stack P2P connectivity. Previous [::]:0 dual-stack attempt broke IPv4 on Android; this uses separate sockets per address family like WebRTC/libwebrtc. - create_ipv6_endpoint(): socket2-based IPv6-only UDP socket, tries same port as IPv4 signal EP, falls back to ephemeral - local_host_candidates(v4_port, v6_port): now gathers IPv6 global-unicast (2000::/3) and unique-local (fc00::/7) addrs - dual_path::race(): A-role accepts on both v4+v6 via select!, D-role routes each candidate to matching-AF endpoint - Graceful fallback: if IPv6 unavailable, .ok() → None → pure IPv4 behavior identical to pre-Phase-7 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 11:54:13 +04:00
Siavash Sameni	aee41a638d	fix(audio+net): revert dual-stack [::]:0, add Oboe playout stall auto-restart Two fixes: ## Revert [::]:0 dual-stack sockets → back to 0.0.0.0:0 Android's IPV6_V6ONLY=1 default on some kernels (confirmed on Nothing Phone) makes [::]:0 IPv6-only, silently killing ALL IPv4 traffic. This broke P2P direct calls: IPv4 LAN candidates (172.16.81.x) couldn't complete QUIC handshakes through the IPv6-only socket, causing local_direct_ok=false and relay fallback on every call after the first. Reverted all bind sites to 0.0.0.0:0 (reliable IPv4). IPv6 host candidates are disabled in local_host_candidates() until a proper dual-socket approach (one IPv4 + one IPv6 endpoint, Phase 7) is implemented. ## Fix A (task #35): Oboe playout callback stall auto-restart The Nothing Phone's Oboe playout callback fires once (cb#0) and then stops draining the ring on ~50% of cold-launch calls. Fix D+C (stop+prime from previous commit) didn't help because audio_stop is a no-op on cold launch. New approach: self-healing watchdog in audio_write_playout. Tracks the playout ring's read_idx across writes. If read_idx hasn't advanced in 50 consecutive writes (~1 second), the Oboe playout callback has stopped: 1. Log "playout STALL detected" 2. Call wzp_oboe_stop() to tear down the stuck streams 3. Clear both ring buffers (prevent stale data reads) 4. Call wzp_oboe_start() to rebuild fresh streams 5. Log success/failure 6. Return 0 (caller retries on next frame) This is the same teardown+rebuild that "rejoin" does — but triggered automatically from the first stalled call instead of requiring the user to hang up and redial. The watchdog runs on every write so it fires within 1s of the stall starting. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 11:24:16 +04:00
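The stall watchdog from Fix A can be illustrated as a small counter over the ring's `read_idx`. This is a sketch under stated assumptions: `PlayoutWatchdog`, `on_write`, and `STALL_WRITES` are hypothetical names, and the actual teardown and rebuild (`wzp_oboe_stop()` / `wzp_oboe_start()`, clearing the rings) is left to the caller.

```rust
/// Consecutive writes without reader progress that count as a stall
/// (50 writes is roughly 1 second according to the commit message).
pub const STALL_WRITES: u32 = 50;

#[derive(Debug, Default)]
pub struct PlayoutWatchdog {
    last_read_idx: Option<u64>,
    stalled_writes: u32,
}

impl PlayoutWatchdog {
    /// Call once per playout write with the ring's current read index.
    /// Returns true when the caller should stop the streams, clear the
    /// rings, and start fresh streams.
    pub fn on_write(&mut self, read_idx: u64) -> bool {
        if self.last_read_idx == Some(read_idx) {
            // The playout callback has not drained anything since the last write.
            self.stalled_writes += 1;
        } else {
            // Progress (or first observation): remember the index, reset the count.
            self.last_read_idx = Some(read_idx);
            self.stalled_writes = 0;
        }
        if self.stalled_writes >= STALL_WRITES {
            // Reset so the rebuilt streams start with a clean slate.
            self.stalled_writes = 0;
            self.last_read_idx = None;
            true
        } else {
            false
        }
    }
}
```

Keeping the watchdog free of any audio calls makes the threshold logic testable off-device; the write path only has to act on the boolean.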
Siavash Sameni	9fb92967eb	fix(net): bind all endpoints to [::]:0 for dual-stack IPv4+IPv6 Every QUIC endpoint was bound to 0.0.0.0:0 (IPv4-only). This silently killed ALL IPv6 host candidates: the Dialer couldn't send packets to [2a0d:...] addresses (wrong address family on the socket), and the Acceptor couldn't receive incoming IPv6 QUIC handshakes. The IPv6 candidates were gathered and advertised in DirectCallOffer/Answer but were completely non-functional. On same-LAN with dual-stack (which both test phones have), this meant: - JoinSet fanned out 3+ candidates (2× IPv6 + 1× IPv4) - IPv6 dials failed silently or timed out - IPv4 dial worked but competed with failed IPv6 for JoinSet attention - Sometimes the JoinSet returned an IPv6 failure before the IPv4 success, causing unnecessary fallback to relay Fix: bind to [::]:0 (IPv6 any) instead of 0.0.0.0:0. On dual-stack systems (Linux/Android default), [::]:0 creates a socket that handles BOTH: - IPv6 natively (global unicast, ULA) - IPv4 via v4-mapped addresses (::ffff:172.16.81.x) One socket, both protocols. All 7 bind sites updated: - register_signal (signal endpoint) - do_register_signal - ping_relay - probe_reflect_addr (fresh endpoint fallback) - dual_path::race (A-role fresh, D-role fresh, relay fresh) With this fix, same-LAN P2P should prefer the IPv6 path (no NAT, direct routing, lower latency) and fall through to IPv4 if IPv6 fails — relay is the last resort after ALL candidates are exhausted. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 11:09:06 +04:00
Siavash Sameni	f5542ef822	feat(p2p): Phase 6 — ICE-style path negotiation Before Phase 6, each side's dual-path race ran independently and committed to whichever transport completed first. When one side picked Direct and the other picked Relay, they sent media to different places — TX > 0, RX = 0 on both sides, a completely silent call. Phase 6 adds a negotiation step: after the local race completes, each side sends a MediaPathReport { call_id, direct_ok, race_winner } to the peer through the relay. Both wait for the other's report before committing a transport to the CallEngine. The decision rule is simple: if BOTH report direct_ok = true, use direct; if EITHER reports false, BOTH use relay. ## Wire protocol New `SignalMessage::MediaPathReport { call_id, direct_ok, race_winner }`. The relay forwards it to the call peer via the same signal_hub routing used for DirectCallOffer/Answer. The cross-relay dispatcher also forwards it. ## dual_path::race restructured Returns `RaceResult` instead of `(Arc<QuinnTransport>, WinningPath)`: - `direct_transport: Option<Arc<QuinnTransport>>` - `relay_transport: Option<Arc<QuinnTransport>>` - `local_winner: WinningPath` Both paths are run as spawned tasks. After the first completes, a 1s grace period lets the loser also finish. The connect command gets BOTH transports (when available) and picks the right one based on the negotiation outcome. The unused transport is dropped. ## connect command flow (revised) 1. Run race() → RaceResult with both transports 2. Send MediaPathReport to relay with our direct_ok 3. Install oneshot; wait for peer's report (3s timeout) 4. Decision: both direct_ok → use direct; else → use relay 5. Start CallEngine with the agreed transport If the peer never responds (old build, timeout), falls back to relay — backward compatible. ## Relay forwarding MediaPathReport is forwarded like DirectCallOffer/Answer: via signal_hub.send_to(peer_fp) for same-relay calls, and via cross-relay dispatcher for federated calls. 
## Debug log events - `connect:dual_path_race_done` — local race result - `connect:path_report_sent` — our report to the peer - `connect:peer_report_received` — peer's report - `connect:peer_report_timeout` — peer didn't respond (3s) - `connect:path_negotiated` — final agreed path with reasons Full workspace test: 423 passing (no regressions). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 10:03:42 +04:00
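The Phase 6 decision rule is small enough to write as a pure function. A minimal sketch, assuming the peer's report is modelled as `Option<bool>` where `None` stands for the 3s timeout or an old build; `AgreedPath` and `negotiate` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AgreedPath {
    Direct,
    Relay,
}

/// Both sides evaluate the same inputs, so both reach the same answer:
/// direct only when BOTH reports say direct_ok, relay in every other case
/// (including a peer that never reported).
pub fn negotiate(local_direct_ok: bool, peer_direct_ok: Option<bool>) -> AgreedPath {
    match (local_direct_ok, peer_direct_ok) {
        (true, Some(true)) => AgreedPath::Direct,
        _ => AgreedPath::Relay,
    }
}
```

Because the rule is symmetric in its two inputs, swapping the local and peer values never changes the outcome, which is what keeps the two clients from committing to different transports.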
Siavash Sameni	026940d492	fix(federation): diagnostic logging for cross-relay media routing Added warn-level log in handle_datagram when a federation datagram arrives but no matching local room is found. Prints: - room_hash (8-byte tag from the datagram) - active_rooms (all rooms the relay currently has) - seq + peer label This diagnoses the cross-relay recv_fr=0 issue: if media IS arriving from the peer relay but the room hash doesn't match any active room, the log tells us exactly what hash is expected vs what rooms exist locally. If no datagram log fires at all, the issue is upstream (peer relay not forwarding, federation link down, etc.). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 09:27:34 +04:00
Siavash Sameni	6cd61fc63b	feat(federation): Phase 4.1 — call-* rooms are implicitly global All rooms with names starting with 'call-' are now treated as global rooms by the federation pipeline. This enables relay- mediated media fallback for cross-relay direct calls: when Alice on Relay A and Bob on Relay B both join the same call-<id> room, the federation media forwarding pipeline (GlobalRoomActive announcements + datagram forwarding + presence replication) kicks in automatically without any runtime registration step. Previously, cross-relay direct calls that couldn't go P2P (symmetric NAT on either side) failed with "no media path" because the call-<id> room wasn't in the configured global_rooms set and media datagrams weren't forwarded across the federation link. The relay's existing ACL for call-* rooms (only the two authorized fingerprints from the call registry can join) prevents random clients from creating or eavesdropping on call rooms. ## Changes ### `is_global_room` (federation.rs) Added `room.starts_with("call-")` check before the static global_rooms set lookup. Returns true immediately for any call-prefixed room. ### `resolve_global_room` (federation.rs) Return type changed from `Option<&str>` to `Option<String>` (owned) because call-* room names aren't stored on `self` — they come from the caller and resolve to themselves as the canonical name. The 13 callers continue to work via String/&str auto-deref; 4 HashMap lookups needed explicit `.as_str()` or `&` borrows. Full workspace test: 423 passing (no regressions). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 08:55:01 +04:00
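The two federation changes can be sketched as follows. This is a simplified model, not the relay's code: the `Federation` struct and its constructor are hypothetical, and the real `resolve_global_room` may do more than an exact-name lookup for configured rooms.

```rust
use std::collections::HashSet;

/// Simplified stand-in for the relay's federation state (hypothetical).
pub struct Federation {
    global_rooms: HashSet<String>,
}

impl Federation {
    pub fn new<I, S>(rooms: I) -> Self
    where
        I: IntoIterator<Item = S>,
        S: Into<String>,
    {
        Self { global_rooms: rooms.into_iter().map(Into::into).collect() }
    }

    /// call-* rooms are implicitly global; everything else needs to be in
    /// the configured set.
    pub fn is_global_room(&self, room: &str) -> bool {
        room.starts_with("call-") || self.global_rooms.contains(room)
    }

    /// Returns an owned String because a call-* name is not stored on
    /// `self`; it resolves to itself as the canonical name.
    pub fn resolve_global_room(&self, room: &str) -> Option<String> {
        if room.starts_with("call-") {
            return Some(room.to_string());
        }
        self.global_rooms.get(room).cloned()
    }
}
```

The owned return type is the reason callers doing `HashMap` lookups need an explicit `.as_str()` or `&` borrow, as the commit notes.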
Siavash Sameni	16793be36f	fix(p2p): Phase 5.6 — direct-path head start + hangup propagation + media debug events Three fixes from a field-test log where same-LAN calls were still losing the dual-path race to the relay path, peers were getting stuck on an empty call screen when the other side hung up, and 1-way audio was hard to diagnose because the GUI debug log had no media-level events. ## 1. Direct-path 500ms head start (dual_path.rs) The race was resolving in ~105ms with Relay winning even when both phones were on the same MikroTik LAN with valid IPv6 host candidates. Root cause: the relay dial is a plain outbound QUIC connect that completes in whatever the client→relay RTT is (~100ms), while the direct path needs the PEER to also process its CallSetup, spin up its own race, and complete at least one LAN dial back to us. That cross-client sequence reliably takes longer than 100ms, so relay always won. Fix: delay the relay_fut with `tokio::time::sleep(500ms)` before starting its connect. Same-LAN direct dials complete in 30-50ms typically, so the head start gives direct plenty of time to win cleanly. Users on setups where direct genuinely can't work (LTE-to-LTE cross-carrier) pay 500ms extra on the relay fallback, which is invisible for a call setup. ## 2. Hangup propagation via a new hangup_call command (lib.rs + main.ts) The hangup button was calling `disconnect` which stopped the local media engine but never sent a SignalMessage::Hangup to the relay. The peer never got notified and was stuck on the call screen with silent audio. My earlier fix (commit `e75b045`) only handled the RECEIVE side — auto-dismiss call screen on recv:Hangup — but the SEND side was still missing. New Tauri command `hangup_call`: 1. Acquire state.signal.lock(), send SignalMessage::Hangup over the signal transport (best-effort; log + continue if signal is down) 2. 
Acquire state.engine.lock(), stop the CallEngine JS hangupBtn click handler now calls hangup_call with a fallback to raw disconnect if the command is missing (older builds). ## 3. Media debug events (engine.rs + lib.rs) Threaded tauri::AppHandle into CallEngine::start so the send/ recv tasks can emit call-debug events when the user has debug logs enabled. Added on the Android branch (desktop branch accepts the arg for API symmetry but doesn't emit yet): - media:first_send — emitted when the first encoded frame is handed to the transport. Useful for 1-way audio diagnosis: if this fires on side A but side B never sees media:first_recv, A's outbound is broken. - media:first_recv — emitted when the first packet from the peer arrives. Mirror of first_send. - media:send_heartbeat — every 2s with frames_sent, last_rms, last_pkt_bytes, short_reads, drops. A stalled last_rms (== 0) tells you the mic isn't producing samples; a frozen frames_sent tells you the encode pipeline hung. - media:recv_heartbeat — every 2s with recv_fr, decoded_frames, last_written, written_samples, decode_errs, codec. Mirror invariants for the inbound direction. All four are gated by `call_debug_logs_enabled()` via `emit_call_debug`, so they only show up in the GUI log when the user has the Call Flow Debug Logs checkbox on. Tracing::info! still runs unconditionally so logcat (adb) keeps its copy regardless. The `emit_call_debug` fn in lib.rs is now `pub(crate)` so engine.rs can call it via `crate::emit_call_debug`. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 07:55:41 +04:00
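The two diagnostic rules given for `media:send_heartbeat` (a frozen `frames_sent` and a zero `last_rms`) can be expressed as a comparison of consecutive heartbeats. A sketch only: the field names come from the commit, but the struct, the field types, and the `diagnose` helper are assumptions.

```rust
/// Subset of the send heartbeat payload (types are assumed).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct SendHeartbeat {
    pub frames_sent: u64,
    pub last_rms: f32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum SendDiagnosis {
    Healthy,
    /// frames_sent did not move between two heartbeats.
    EncodeStalled,
    /// Frames are flowing but the last frame was pure silence.
    MicSilent,
}

/// Compare two consecutive heartbeats (emitted every 2s per the commit).
pub fn diagnose(prev: &SendHeartbeat, cur: &SendHeartbeat) -> SendDiagnosis {
    if cur.frames_sent == prev.frames_sent {
        SendDiagnosis::EncodeStalled
    } else if cur.last_rms == 0.0 {
        SendDiagnosis::MicSilent
    } else {
        SendDiagnosis::Healthy
    }
}
```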
Siavash Sameni	fa038df057	feat(p2p): Phase 5.5 — ICE LAN host candidates (IPv4 + IPv6) Same-LAN P2P was failing because MikroTik masquerade (like most consumer NATs) doesn't support NAT hairpinning — the advertised WAN reflex addr is unreachable from a peer on the same LAN as the advertiser. Phase 5 got us Cone NAT classification and fixed the measurement artifact, but same-LAN direct dials still had nowhere to land. Phase 5.5 adds ICE-style host candidates: each client enumerates its LAN-local network interface addresses, includes them in the DirectCallOffer/Answer alongside the reflex addr, and the dual-path race fans out to ALL peer candidates in parallel. Same-LAN peers find each other via their RFC1918 IPv4 + ULA / global-unicast IPv6 addresses without touching the NAT at all. Dual-stack IPv6 is in scope from the start — on modern ISPs (including Starlink) the v6 path often works even when v4 hairpinning doesn't, because there's no NAT on the v6 side. ## Changes ### `wzp_client::reflect::local_host_candidates(port)` (new) Enumerates network interfaces via `if-addrs` and returns SocketAddrs paired with the caller's port. Filters: - IPv4: RFC1918 (10/8, 172.16/12, 192.168/16) + CGNAT (100.64/10) - IPv6: global unicast (2000::/3) + ULA (fc00::/7) - Skipped: loopback, link-local (169.254, fe80::), public v4 (already covered by reflex-addr), unspecified Safe from any thread, one `getifaddrs(3)` syscall. ### Wire protocol (wzp-proto/packet.rs) Three new `#[serde(default, skip_serializing_if = "Vec::is_empty")]` fields, backward-compat with pre-5.5 clients/relays by construction: - `DirectCallOffer.caller_local_addrs: Vec<String>` - `DirectCallAnswer.callee_local_addrs: Vec<String>` - `CallSetup.peer_local_addrs: Vec<String>` ### Call registry (wzp-relay/call_registry.rs) `DirectCall` gains `caller_local_addrs` + `callee_local_addrs` Vec<String> fields. New `set_caller_local_addrs` / `set_callee_local_addrs` setters. Follow the same pattern as the reflex addr fields. 
### Relay cross-wiring (wzp-relay/main.rs) Both the local-call and cross-relay-federation paths now track the local_addrs through the registry and inject them into the CallSetup's peer_local_addrs. Cross-wiring is identical to the existing peer_direct_addr logic — each party's CallSetup carries the OTHER party's LAN candidates. ### Client side (desktop/src-tauri/lib.rs) - `place_call`: gathers local host candidates via `local_host_candidates(signal_endpoint.local_addr().port())` and includes them in `DirectCallOffer.caller_local_addrs`. The port match is critical — it's the Phase 5 shared signal socket, so incoming dials to these addrs land on the same endpoint that's already listening. - `answer_call`: same, AcceptTrusted only (privacy mode keeps LAN addrs hidden too, for consistency with the reflex addr). - `connect` Tauri command: new `peer_local_addrs: Vec<String>` arg. Builds a `PeerCandidates` bundle and passes it to the dual-path race. - Recv loop's CallSetup handler: destructures + forwards the new field to JS via the signal-event payload. ### `dual_path::race` (wzp-client/dual_path.rs) Signature change: takes `PeerCandidates` (reflex + local Vec) instead of a single SocketAddr. The D-role branch now fans out N parallel dials via `tokio::task::JoinSet` — one per candidate — and the first successful dial wins (losers are aborted immediately via `set.abort_all()`). Only when ALL candidates have failed do we return Err; individual candidate failures are just traced at debug level and the race waits for the others. LAN host candidates are tried BEFORE the reflex addr in `PeerCandidates::dial_order()` — they're faster when they work, and the reflex addr is the fallback for the not-on-same-LAN case. ### JS side (desktop/main.ts) `connect` invoke now passes `peerLocalAddrs: data.peer_local_addrs ?? []` alongside the existing `peerDirectAddr`. 
### Tests All existing test callsites updated for the new Vec<String> fields (defaults to Vec::new() in tests — they don't exercise the multi-candidate path). `dual_path.rs` integration tests wrap the single `dead_peer` / `acceptor_listen_addr` in a `PeerCandidates { reflexive: Some(_), local: Vec::new() }`. Full workspace test: 423 passing (same as before 5.5). ## Expected behavior on the reporter's setup Two phones behind MikroTik, both on the same LAN: place_call:host_candidates {"local_addrs": ["192.168.88.21:XXX", "2001:...:YY:XXX"]} recv:DirectCallAnswer {"callee_local_addrs": ["192.168.88.22:ZZZ", "2001:...:WW:ZZZ"]} recv:CallSetup {"peer_direct_addr":"150.228.49.65:NN", "peer_local_addrs":["192.168.88.22:ZZZ","2001:...:WW:ZZZ"]} connect:dual_path_race_start {"peer_reflex":"...","peer_local":[...]} dual_path: direct dial succeeded on candidate 0 ← LAN v4 wins connect:dual_path_race_won {"path":"Direct"} Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 07:34:49 +04:00
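The candidate filter and dial ordering from Phase 5.5 can be approximated with the standard library alone. This is a hedged sketch: interface enumeration via `if-addrs` is replaced by a plain slice of addresses, `is_host_candidate` and `host_candidates` are hypothetical helper names, and only the `PeerCandidates` fields and `dial_order()` are taken from the commit text.

```rust
use std::net::{IpAddr, SocketAddr};

/// True for addresses the commit lists as LAN host candidates:
/// IPv4 RFC1918 + CGNAT (100.64/10), IPv6 global unicast (2000::/3) + ULA (fc00::/7).
pub fn is_host_candidate(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => {
            let o = v4.octets();
            let cgnat = o[0] == 100 && (o[1] & 0xC0) == 64;
            v4.is_private() || cgnat
        }
        IpAddr::V6(v6) => {
            let first = v6.segments()[0];
            let global_unicast = (first & 0xE000) == 0x2000;
            let ula = (first & 0xFE00) == 0xFC00;
            global_unicast || ula
        }
    }
}

/// Pair every qualifying interface address with the shared signal port.
pub fn host_candidates(interface_ips: &[IpAddr], port: u16) -> Vec<SocketAddr> {
    interface_ips
        .iter()
        .copied()
        .filter(|ip| is_host_candidate(*ip))
        .map(|ip| SocketAddr::new(ip, port))
        .collect()
}

pub struct PeerCandidates {
    pub reflexive: Option<SocketAddr>,
    pub local: Vec<SocketAddr>,
}

impl PeerCandidates {
    /// LAN candidates first, reflexive address last as the fallback.
    pub fn dial_order(&self) -> Vec<SocketAddr> {
        let mut order = self.local.clone();
        order.extend(self.reflexive);
        order
    }
}
```

Loopback, link-local, and public IPv4 addresses fall through both branches and are dropped, matching the "Skipped" list in the commit.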
Siavash Sameni	1618ff6c9d	feat(p2p): Phase 5 — single-socket architecture (Nebula-style) Before Phase 5 WarzonePhone used THREE separate UDP sockets per client: 1. Signal endpoint (register_signal, client-only) 2. Reflect probe endpoints (one fresh socket per relay probe) 3. Dual-path race endpoint (fresh per call setup) This broke two things in production on port-preserving NATs (MikroTik masquerade, most consumer routers): a. Phase 2 NAT detection was WRONG. Each probe used a fresh internal port, so MikroTik mapped each one to a different external port, and the classifier saw "different port per relay" and labeled it SymmetricPort. The real NAT was cone-like but measurement via fresh sockets hid that. b. Phase 3.5 dual-path P2P race was BROKEN. The reflex addr we advertised in DirectCallOffer was observed by the signal endpoint's socket. The actual dual-path race listened on a DIFFERENT fresh socket, on a different internal (and therefore external) port. Peers dialed the advertised addr and hit MikroTik's mapping for the signal socket, which forwarded to the signal endpoint — a client-only endpoint that doesn't accept incoming connections. Direct path silently failed, relay always won the race. Nebula-style fix: one socket for everything. The signal endpoint is now dual-purpose (client + server_config), and both the reflect probes and the dual-path race reuse it instead of creating fresh ones. MikroTik's port-preservation then gives us a stable external port across all flows → classifier correctly sees Cone NAT → advertised reflex addr is the actual listening port → direct dials from peers land on the right socket → `endpoint.accept()` in the A-role branch of the dual-path race picks up the incoming connection. ## Changes ### `register_signal` (desktop/src-tauri/src/lib.rs) - Endpoint now created with `Some(server_config())` instead of `None`. The socket can now accept incoming QUIC connections as well as dial outbound. 
- Every code path that previously read `sig.endpoint` for the relay-dial reuse benefits automatically — same socket is now ALSO listening for peer dials. ### `probe_reflect_addr` (wzp-client/src/reflect.rs) - New `existing_endpoint: Option<Endpoint>` arg. `Some` reuses the caller's socket (production: pass the signal endpoint). `None` creates a fresh one (tests + pre-registration). - Removed the `drop(endpoint)` at the end — was correct for fresh endpoints (explicit early socket close) but incorrect for shared ones. End-of-scope drop does the right thing in both cases via Arc semantics. ### `detect_nat_type` (wzp-client/src/reflect.rs) - New `shared_endpoint: Option<Endpoint>` arg, forwarded to every probe in the JoinSet fan-out. One shared socket means the classifier sees the true NAT type. ### `detect_nat_type` Tauri command (desktop/src-tauri/src/lib.rs) - Reads `state.signal.endpoint` and passes it as the shared endpoint. Falls back to None when not registered. NAT detection now produces accurate classifications against MikroTik / most consumer NATs. ### `dual_path::race` (wzp-client/src/dual_path.rs) - New `shared_endpoint: Option<Endpoint>` arg. - A-role: when `Some`, reuses it for `accept()`. This is the critical change — the reflex addr advertised to peers is now the address listening for incoming direct dials. - D-role: when `Some`, reuses it for the outbound direct dial. MikroTik keeps the same external port for the dial as for the signal flow → direct dial through a cone-mapped NAT. - Relay path: also reuses the shared endpoint so MikroTik has a single consistent mapping across the whole call (saves one extra external port and makes firewall traces cleaner). - When `None`, falls back to fresh per-role endpoints as before. ### `connect` Tauri command (desktop/src-tauri/src/lib.rs) - Reads `state.signal.endpoint` once when acquiring own reflex addr and passes it through to `dual_path::race`. 
### Tests - `wzp-client/tests/dual_path.rs` and `wzp-relay/tests/multi_reflect.rs` updated to pass `None` for the new endpoint arg — tests use fresh sockets and that's fine because the loopback harness doesn't care about port-preserving NAT behavior. Full workspace test: 423 passing (no regressions). ## Expected behavior after this commit on real hardware Behind MikroTik + Starlink-bypass (the reporter's setup): - Phase 2 NAT detect → Cone NAT (was SymmetricPort — false positive from the measurement artifact) - Phase 3.5 direct-P2P dial → succeeds for both cone-cone and cone-CGNAT cases where the remote side was previously blocked by our own socket mismatch - LTE ↔ LTE cross-carrier → still likely relay fallback; that's genuinely strict symmetric and needs Phase 5.5 port prediction. ## Phase 5.5 (next, separate PRD) Multi-candidate port prediction + ICE-style candidate aggregation for truly strict symmetric NATs. Not needed for the 95% case — Phase 5 alone fixes most consumer-router setups. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-11 19:47:20 +04:00
Siavash Sameni	00deb97a5d	fix(reflect): drop LAN/private reflex addrs from NAT classification Real-world report: a user with one LAN relay + one internet relay got "Multiple IPs — treating as symmetric" because the LAN relay saw the client's LAN IP (172.16.81.172) while the internet relay saw the WAN IP (150.228.49.65). Two observations of "different public IPs" from the classifier's perspective, but semantically they describe two different network paths and shouldn't be compared. The LAN relay's reflection is always true, just not useful for public NAT classification: there's no NAT between the client and the LAN relay, so that path's reflex addr is always the LAN interface IP regardless of what the public-facing NAT beyond it looks like. Fix: new `is_private_or_loopback` helper filters the probe set before classification. Drops: - 127.0.0.0/8 loopback - 10/8, 172.16/12, 192.168/16 RFC1918 private - 169.254/16 link-local - 100.64/10 CGNAT shared-transition (same reasoning: a relay that sees the client with a CGNAT addr is on the same carrier network and can't describe public NAT state) - IPv6 loopback, unspecified, fe80::/10 link-local Failed probes still filtered out of classification (they were already) but now dimmed in the UI list instead of highlighted amber. Same rationale: a momentarily-offline probe target isn't a warning-worthy state, it's just a fact about the probe run. UI palette rebalance: only Cone gets green, everything else neutral text-dim. Wording changed from warning-tone "⚠ must use relay" to informational "ℹ P2P falls back to relay, calls still work" — symmetric NAT isn't broken state, it just means media takes the relay path. 
Tests added (4 new in wzp_client::reflect): - classify_drops_private_ip_probes — LAN + public → Unknown - classify_drops_loopback_probes — loopback + 2 public → Cone - classify_drops_cgnat_probes — CGNAT + 2 public same-IP- diff-port → SymmetricPort - classify_two_lan_probes_is_unknown_not_cone — all LAN → Unknown Existing multi_reflect integration test updated: two loopback relays now correctly classify as Unknown (because loopback reflex addrs are filtered) with the plumbing-works invariant preserved. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-11 18:29:09 +04:00
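A rough model of the filter-then-classify step is shown here. `is_private_or_loopback` and the `Cone` / `SymmetricPort` / `Unknown` outcomes come from the commit; the `NatType` enum layout, the `SymmetricMultiIp` variant name, the `classify` signature, and the "fewer than two public probes means Unknown" rule are assumptions inferred from the listed test names.

```rust
use std::collections::HashSet;
use std::net::{IpAddr, SocketAddr};

#[derive(Debug, PartialEq, Eq)]
pub enum NatType {
    Cone,
    SymmetricPort,
    /// Hypothetical name for the "Multiple IPs, treating as symmetric" case.
    SymmetricMultiIp,
    Unknown,
}

/// Reflex addresses that say nothing about the public-facing NAT.
pub fn is_private_or_loopback(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => {
            let o = v4.octets();
            let cgnat = o[0] == 100 && (o[1] & 0xC0) == 64;
            v4.is_loopback() || v4.is_private() || v4.is_link_local() || cgnat
        }
        IpAddr::V6(v6) => {
            let link_local = (v6.segments()[0] & 0xFFC0) == 0xFE80;
            v6.is_loopback() || v6.is_unspecified() || link_local
        }
    }
}

/// Classify from the reflex addresses reported by successful probes.
pub fn classify(probes: &[SocketAddr]) -> NatType {
    let public: Vec<&SocketAddr> = probes
        .iter()
        .filter(|a| !is_private_or_loopback(a.ip()))
        .collect();
    if public.len() < 2 {
        // Assumption: one observation cannot distinguish cone from symmetric.
        return NatType::Unknown;
    }
    let ips: HashSet<IpAddr> = public.iter().map(|a| a.ip()).collect();
    if ips.len() > 1 {
        return NatType::SymmetricMultiIp;
    }
    let ports: HashSet<u16> = public.iter().map(|a| a.port()).collect();
    if ports.len() > 1 { NatType::SymmetricPort } else { NatType::Cone }
}
```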
Siavash Sameni	da08723fe7	fix(signal): forward-compat — log+continue on unknown SignalMessage variants Both sides of the signal channel previously broke their recv loop on any deserialize error, which meant adding a new variant in one build silently killed signal connections from peers running an older build. This bit us during Phase 1 testing: a new client sending SignalMessage::Reflect to a pre-Phase-1 relay caused the relay to drop the whole signal connection, which looked like "Error: not registered" on the next place_call. Fix: - New TransportError::Deserialize(String) variant in wzp-proto carries serde errors as a distinct category. - wzp-transport/reliable.rs::recv_signal returns Deserialize on serde_json::from_slice failures (was wrapped in Internal). - wzp-relay/main.rs signal loop matches on Deserialize → warn + continue (instead of break). - desktop/src-tauri/lib.rs recv loop does the same. Other TransportError variants (ConnectionLost, Io, Internal) still break the loop — only pure parse failures are recoverable. This means future SignalMessage variant additions are backward- compat by construction: older peers will see "unknown variant, continuing" in their logs while newer peers can keep evolving the protocol. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-11 18:13:31 +04:00
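The log-and-continue policy can be shown with a simplified receive loop. The variant names follow the commit (`Deserialize`, `ConnectionLost`, `Io`, `Internal`); the payload types of the non-`Deserialize` variants, the `run_recv_loop` helper, and the use of plain strings as messages are assumptions made to keep the sketch self-contained.

```rust
#[derive(Debug, PartialEq)]
pub enum TransportError {
    /// Parse failure, e.g. an unknown SignalMessage variant from a newer peer.
    Deserialize(String),
    ConnectionLost,
    Io(String),
    Internal(String),
}

/// Drain a stream of receive results. Returns the handled messages and the
/// number of unparseable messages that were skipped.
pub fn run_recv_loop<I>(events: I) -> (Vec<String>, usize)
where
    I: IntoIterator<Item = Result<String, TransportError>>,
{
    let mut handled = Vec::new();
    let mut skipped = 0;
    for ev in events {
        match ev {
            Ok(msg) => handled.push(msg),
            Err(TransportError::Deserialize(e)) => {
                // Recoverable: the connection itself is still healthy.
                eprintln!("warn: unknown signal message, continuing: {e}");
                skipped += 1;
                continue;
            }
            Err(e) => {
                // Everything else means the transport is gone; stop the loop.
                eprintln!("error: signal recv failed: {e:?}");
                break;
            }
        }
    }
    (handled, skipped)
}
```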
Siavash Sameni	8cdf8d486a	feat(p2p): Phase 4 cross-relay direct calling over federation Teaches the relay pair to route direct-call signaling across an existing federation link. Alice on Relay A can now place a direct call to Bob on Relay B if A and B are federation peers — the wire protocol, call registry, and signal dispatch all learn to track and route the cross-relay flow. Phase 3.5's dual-path QUIC race then carries the media directly peer-to-peer using the advertised reflex addrs, with zero changes needed on the client side. ## Wire protocol (wzp-proto) New `SignalMessage::FederatedSignalForward { inner, origin_relay_fp }` envelope variant, appended at end of enum — JSON serde is name-tagged so pre-Phase-4 relays just log "unknown variant" and drop it. 2 new roundtrip tests (any-inner nesting + single DirectCallOffer case). ## Call registry (wzp-relay) `DirectCall.peer_relay_fp: Option<String>` — federation TLS fp of the peer relay that forwarded the offer/answer for this call. `None` on local calls, `Some` on cross-relay. Used by the answer path to route the reply back through the same federation link instead of trying (and failing) to deliver via local signal_hub. New `set_peer_relay_fp` setter + 1 new unit test. ## FederationManager (wzp-relay) Three new methods: - `local_tls_fp()` — exposes the relay's own federation TLS fp so main.rs can build `origin_relay_fp` fields. - `broadcast_signal(msg) -> usize` — fan out any signal message (in practice `FederatedSignalForward`) to every active peer link, returning the reach count. Used when Relay A doesn't know which peer has the target fingerprint. - `send_signal_to_peer(fp, msg)` — targeted send for the reply path where the registry already knows which peer relay to hit. Plus a new `cross_relay_signal_tx: Mutex<Option<Sender<...>>>` field that `set_cross_relay_tx()` wires at startup so the federation `handle_signal` can push unwrapped inner messages into the main signal dispatcher. 
## Federation handle_signal (wzp-relay) New match arm for `FederatedSignalForward`: - Loop prevention: drops forwards whose `origin_relay_fp` equals this relay's own fp (prevents A→B→A echo loops without needing TTL yet). - Otherwise pulls the inner message out and pushes it through `cross_relay_signal_tx` so the main loop's dispatcher task handles it as if it had arrived locally. ## Main signal loop (wzp-relay) ### DirectCallOffer when target not local Before falling through to Hangup, try the federation path: - Wrap the offer in `FederatedSignalForward` with `origin_relay_fp = this relay's tls_fp` - `fm.broadcast_signal(forward)` — returns peer count - If any peers reached, stash the call in local registry with `caller_reflexive_addr` set, `peer_relay_fp` still None (broadcast — the answer-side will identify itself when it replies) - Send `CallRinging` to caller immediately for UX feedback - Only if no federation or no peers → legacy Hangup path ### DirectCallAnswer when peer is remote - Registry lookup now reads both `peer_fingerprint` and `peer_relay_fp` in one acquisition - If `peer_relay_fp.is_some()`: * Reject → forward a `Hangup` over federation via `send_signal_to_peer` instead of local signal_hub * Accept → wrap the raw answer in `FederatedSignalForward`, route to the specific origin peer, then emit the LOCAL CallSetup to our callee with `peer_direct_addr = caller_reflexive_addr` (caller is remote; this side only has the callee) - If `peer_relay_fp.is_none()` → existing Phase 3 same-relay path with both CallSetups (caller + callee) ### Cross-relay signal dispatcher task New long-running task reading `(inner, origin_relay_fp)` from `cross_relay_rx`. In Phase 4 MVP handles: - `DirectCallOffer` — if target is local, create the call in the registry with `peer_relay_fp = origin_relay_fp`, stash caller addr, deliver offer to local callee. If target isn't local, drop (no multi-hop in Phase 4 MVP). 
- `DirectCallAnswer` — look up local caller by call_id, stash callee addr, forward raw answer to local caller via signal_hub, emit local CallSetup with `peer_direct_addr = callee_reflexive_addr` (peer is local now; this side only has the caller). - `CallRinging` — best-effort forward to local caller for UX. - `Hangup` — logged for now; Phase 4.1 will target by call_id. ## Integration tests `crates/wzp-relay/tests/cross_relay_direct_call.rs` — 3 tests that reproduce the main.rs cross-relay dispatcher logic inline and assert the invariants without spinning up real binaries: 1. `cross_relay_offer_forwards_and_stashes_peer_relay_fp` — Relay A gets Alice's offer, broadcasts. Relay B's dispatcher creates the call with `peer_relay_fp = relay_a_tls_fp`. 2. `cross_relay_answer_crosswires_peer_direct_addrs` — full round trip; both CallSetups (one on each relay) carry the OTHER party's reflex addr. 3. `cross_relay_loop_prevention_drops_self_sourced_forward` — explicit loop-prevention check. Full workspace test goes from 413 → 419 passing. Clippy clean on touched files. ## Non-goals (deferred to Phase 4.1+) - Relay-mediated media fallback across federation — if P2P direct fails (symmetric NAT on either side), the call errors out with "no media path". Making the existing federation media pipeline carry ephemeral call-<id> rooms is the Phase 4.1 lift. - Multi-hop federation (A → B → C). Phase 4 MVP supports a direct federation link between A and B only. - Fingerprint → peer-relay routing gossip. PRD: .taskmaster/docs/prd_phase4_cross_relay_p2p.txt Tasks: 70-78 all completed Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-11 17:31:43 +04:00
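The envelope and its loop-prevention check can be sketched in a few lines. This is a simplified model: the real `SignalMessage` has many more variants and `DirectCallOffer` carries more fields than the single `call_id` used here; `wrap_for_federation` and `unwrap_forward` are hypothetical helper names.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum SignalMessage {
    /// Reduced to one field for illustration.
    DirectCallOffer { call_id: String },
    FederatedSignalForward {
        inner: Box<SignalMessage>,
        origin_relay_fp: String,
    },
}

/// Wrap a local signal for delivery over a federation link.
pub fn wrap_for_federation(inner: SignalMessage, own_fp: &str) -> SignalMessage {
    SignalMessage::FederatedSignalForward {
        inner: Box::new(inner),
        origin_relay_fp: own_fp.to_string(),
    }
}

/// Unwrap a forward received from a peer relay. Returns `None` for
/// non-forward messages and for forwards this relay originated itself
/// (the A -> B -> A echo case).
pub fn unwrap_forward(msg: SignalMessage, own_fp: &str) -> Option<(SignalMessage, String)> {
    match msg {
        SignalMessage::FederatedSignalForward { inner, origin_relay_fp } => {
            if origin_relay_fp == own_fp {
                return None;
            }
            Some((*inner, origin_relay_fp))
        }
        _ => None,
    }
}
```

The returned `origin_relay_fp` is what a dispatcher would store as `peer_relay_fp` so that the answer can be routed back over the same link.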
Siavash Sameni	59ce52f8e8	feat(p2p): Phase 3.5 dual-path QUIC race + GUI call-flow debug logs

Two features in one commit because they ship and test together: Phase 3.5 closes the hole-punching loop and the call-flow debug logs give the user live visibility into every step of a call so real-hardware testing of the new P2P path is debuggable.

## Phase 3.5 — dual-path QUIC connect race

Completes the hole-punching work Phase 3 scaffolded. On receiving a CallSetup with peer_direct_addr, the client now actually races a direct QUIC handshake against the relay dial and uses whichever completes first.

Symmetric role assignment avoids the two-conns-per-call problem:

- Both peers compare `own_reflex_addr` vs `peer_reflex_addr` lexicographically.
- Smaller addr → Acceptor (A-role): builds a server-capable dual endpoint, awaits an incoming QUIC session. Does NOT dial.
- Larger addr → Dialer (D-role): builds a client-only endpoint, dials the peer's addr with `call-<id>` SNI. Does NOT listen.
- Both sides always dial the relay in parallel as fallback.
- `tokio::select!` with `biased` preference for direct, `tokio::pin!` so each branch can await the losing opposite as fallback.
- Direct timeout 2s, relay fallback timeout 5s (so 7s worst case from CallSetup to "no media path" error).

New crate module `wzp_client::dual_path::{race, WinningPath}` (moved here from desktop/src-tauri so it's testable from a workspace test). `determine_role` in `wzp_client::reflect` is pure-function and unit-tested.

### CallEngine integration

- New `pre_connected_transport: Option<Arc<QuinnTransport>>` arg on both android + desktop `CallEngine::start` branches. Skips the internal wzp_transport::connect step when Some. Backward-compat: None keeps Phase 0 relay-only behavior.
- `connect` Tauri command reads own_reflex_addr from SignalState, computes role, runs the race, passes the winning transport into CallEngine. If ANY input is missing (no peer addr, no own addr, equal addrs), falls back to classic relay path — identical to pre-Phase-3.5 behavior.

### Tests (9 new, all passing)

- 6 unit tests for `determine_role` truth table in `wzp-client/src/reflect.rs` (smaller=Acceptor, larger=Dialer, port-only diff, equal, missing-side, symmetry)
- 3 integration tests in `crates/wzp-client/tests/dual_path.rs`:
  * `dual_path_direct_wins_on_loopback` — two-endpoint test rig, Dialer wins direct path vs loopback mock relay
  * `dual_path_relay_wins_when_direct_is_dead` — dead peer port, 2s direct timeout, relay fallback wins
  * `dual_path_errors_cleanly_when_both_paths_dead` — <10s error, no hang

## GUI call-flow debug logs

Runtime-toggled structured events at every step of a call so the user can see where a call progressed or stalled on real hardware. Modeled on the existing DRED_VERBOSE_LOGS pattern.

### Rust side

- `static CALL_DEBUG_LOGS: AtomicBool` + `emit_call_debug(&app, step, details)` helper. Always logs via `tracing::info!` (logcat always has a copy); GUI Tauri `call-debug-log` event only fires when the flag is on.
- Tauri commands `set_call_debug_logs` / `get_call_debug_logs`.

### Instrumented steps (24 emit_call_debug sites)

- `register_signal`: start, identity loaded, endpoint created, connect failed/ok, RegisterPresence sent, ack received/failed, recv loop spawning
- Recv loop: CallRinging, DirectCallOffer (w/ caller_reflexive_addr), DirectCallAnswer (w/ callee_reflexive_addr), CallSetup (w/ peer_direct_addr), Hangup
- `place_call`: start, reflect query start/ok/none, offer sent, send failed
- `answer_call`: start, reflect query start/ok/none or privacy skip, answer sent, send failed
- `connect`: start, dual_path_race_start (w/ role), won (w/ path), failed, skipped (w/ reasons), call_engine_starting/started/failed

### JS side

- New `callDebugLogs: boolean` field on Settings type.
- Boot-time hydrate of the Rust flag from localStorage so the choice survives restarts (like `dredDebugLogs`).
- Settings panel: new "Call flow debug logs" checkbox alongside the DRED toggle.
- New "Call Debug Log" section that ONLY shows when the flag is on. Rolling in-memory buffer of the last 200 events, rendered as monospace `HH:MM:SS.mmm step {details}` lines with auto-scroll and a Clear button.
- `listen("call-debug-log", ...)` subscribed at app startup, appends to the buffer, re-renders on every event.

Full workspace test goes from 404 → 413 passing. Clippy clean on touched crates.

PRD: .taskmaster/docs/prd_phase35_dual_path_race.txt
Tasks: 61-69 all completed

Next: APK + desktop build carrying everything — Phase 2 NAT detect, Phase 3 advertising, Phase 3.5 dual-path + call debug logs, plus the earlier Android first-join diagnostics — so the user can validate the P2P path on real hardware with live per-step visibility into where any failures happen.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-11 14:06:44 +04:00
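
The symmetric role assignment in the Phase 3.5 commit above can be sketched as a small pure function. This is an illustration only: the signature, the `Role` enum, and the choice to compare addresses as plain strings are assumptions made for the example. The commit message says only that the comparison is lexicographic and that missing or equal addresses fall back to the relay path, so the real `determine_role` in `wzp_client::reflect` may take different types.

```rust
use std::cmp::Ordering;

/// Hypothetical role enum; the names follow the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Role {
    /// Smaller address: listens for the incoming direct QUIC session.
    Acceptor,
    /// Larger address: dials the peer's reflexive address.
    Dialer,
}

/// Assumed signature. Returns `None` when no direct race should be run
/// (either address missing, or both equal), which the caller would
/// treat as "use the relay-only path".
fn determine_role(own_reflex_addr: Option<&str>, peer_reflex_addr: Option<&str>) -> Option<Role> {
    let (own, peer) = (own_reflex_addr?, peer_reflex_addr?);
    // Byte-wise string comparison, assumed here to be what
    // "lexicographically" means in the commit message.
    match own.cmp(peer) {
        Ordering::Less => Some(Role::Acceptor),
        Ordering::Greater => Some(Role::Dialer),
        Ordering::Equal => None,
    }
}
```

Because both peers evaluate the same comparison with the arguments swapped, they always land on opposite roles without exchanging any extra message, which is the property that avoids two direct connections per call.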


234 Commits