148 Commits

Author SHA1 Message Date
Siavash Sameni
1120c7b579 feat(signal): PresenceList broadcast for lobby user discovery
Some checks failed
Build Release Binaries / build-amd64 (push) Failing after 7m21s
Mirror to GitHub / mirror (push) Failing after 27s
New signal infrastructure for the lobby-first UI:

- PresenceUser struct: { fingerprint, alias }
- SignalMessage::PresenceList: relay broadcasts full user list
  to all signal clients on every register/deregister
- SignalHub::presence_list(): builds the list from connected clients
- SignalHub::broadcast(): sends to ALL signal clients
- Relay calls broadcast on register + unregister
- Desktop emits "presence_list" signal-event to JS frontend

This gives clients real-time visibility of who's online via the
signal channel, without needing to join a voice room first.

603 tests pass, 0 regressions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 18:12:47 +04:00
Siavash Sameni
bb23976076 feat(quality): upgrade negotiation + asymmetric quality signals (#28, #29, #30)
Some checks failed
Mirror to GitHub / mirror (push) Failing after 31s
Build Release Binaries / build-amd64 (push) Failing after 3m33s
New SignalMessage variants for P2P quality coordination:

UpgradeProposal/UpgradeResponse/UpgradeConfirm (#28):
- Consensual quality upgrade flow — proposer sends desired profile,
  peer accepts/rejects based on own conditions, confirm commits both
- All carry call_id for relay routing

QualityCapability (#30):
- Peer reports its max sustainable profile — enables asymmetric
  encoding where each side uses its own best quality instead of
  forcing everyone to the weakest link

Relay forwards all 4 signals to the call peer (same pattern as
MediaPathReport, CandidateUpdate, HardNatProbe).

Desktop signal recv loop handles all 4 with debug logging.
Encoder switching TODOs noted for wiring into CallEngine.

4 new serde roundtrip tests. 603 total, 0 regressions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 17:25:34 +04:00
Siavash Sameni
488efcb614 feat(ui): birthday attack toggle in settings (default off)
Some checks failed
Mirror to GitHub / mirror (push) Failing after 22s
Build Release Binaries / build-amd64 (push) Failing after 3m36s
New setting: "Birthday attack (opens extra ports for hard NAT)"
- Default: OFF — no extra latency on call setup
- When ON: waits up to 3s for peer's birthday ports if peer has
  non-cone NAT, adds them to the dial race

Gated end-to-end: Settings → localStorage → JS invoke →
Rust connect param → birthday wait + target injection.
LAN/cone calls unaffected regardless of setting.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 16:54:22 +04:00
Siavash Sameni
8c360186df feat(nat): wire birthday attack end-to-end into connect flow
Some checks failed
Mirror to GitHub / mirror (push) Failing after 32s
Build Release Binaries / build-amd64 (push) Failing after 3m19s
Complete Dialer-side birthday attack integration:

- SignalState stores peer_birthday_ports from HardNatBirthdayStart
- connect command: if peer's HardNatProbe shows non-cone NAT, waits
  up to 3s for birthday ports to arrive (Acceptor needs time to open
  32 sockets + STUN-probe each)
- When birthday ports arrive, generate_dialer_targets() builds hit
  list (known ports + random fill) and adds them to PeerCandidates
- All birthday targets go into the dual-path race as extra candidates
- LAN/cone calls skip the wait entirely (gated on allocation type)

Full waterfall now:
1. Standard candidates (reflexive + mapped)     → immediate
2. Port prediction (sequential delta)           → immediate
3. Birthday targets (if non-cone peer)          → +3s wait
4. All of above raced in parallel via JoinSet
5. Relay runs concurrently with 500ms head-start

599 tests pass, 0 regressions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 16:50:11 +04:00
Siavash Sameni
f06f9073ae feat(nat): birthday attack module + HardNatBirthdayStart signal (#86, #87)
Some checks failed
Mirror to GitHub / mirror (push) Failing after 25s
Build Release Binaries / build-amd64 (push) Failing after 3m43s
Birthday attack for random symmetric NATs:
- birthday.rs: open_acceptor_ports() opens N sockets, STUN-probes
  each to learn external ports. generate_dialer_targets() builds
  hit list (known ports first, then random fill). spray_dialer()
  sprays QUIC connects with rate limiting, first success wins.
- Default: 32 acceptor ports, 128 dialer probes, 20ms interval

Signal coordination:
- HardNatBirthdayStart { acceptor_ports, external_ip } sent by
  Acceptor when peer's HardNatProbe shows random/sequential NAT
- Relay forwards it like other call signals
- Desktop recv loop handles and logs it

Hybrid waterfall integration:
- On receiving HardNatProbe with non-cone allocation, Acceptor
  auto-opens birthday ports and sends BirthdayStart
- Sockets kept alive 10s for NAT mapping persistence
- Dialer spray integration into race() pending (needs transport
  hot-swap for background upgrade)

6 new tests, 599 total, 0 regressions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 16:44:36 +04:00
Siavash Sameni
6c49d7436f feat(ui): direct-only mode setting (no relay fallback)
Some checks failed
Mirror to GitHub / mirror (push) Failing after 24s
Build Release Binaries / build-amd64 (push) Failing after 3m38s
New toggle in Settings → "Direct-only mode (no relay fallback)":
- Default: OFF (normal behavior, relay fallback on P2P failure)
- When ON: connect returns error if P2P fails, with full
  candidate_diags in the debug log showing why each candidate
  failed. Call never falls back to relay.

Useful for testing NAT traversal — you see the exact failure
reason instead of the call silently working through relay.

Wired end-to-end:
- Settings.directOnly persisted in localStorage
- Passed as directOnly param to Rust connect command
- connect:path_negotiated shows direct_only flag
- connect:direct_only_failed emits on failure with diags

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 16:04:45 +04:00
Siavash Sameni
1de280fe04 fix(nat): working NAT tickle + smart filter debug + timeout diags
Some checks failed
Mirror to GitHub / mirror (push) Failing after 27s
Build Release Binaries / build-amd64 (push) Failing after 3m39s
Fixes from real-world 5G↔Starlink testing:

NAT tickle fix:
- tokio::net::UdpSocket::bind() doesn't set SO_REUSEADDR, so binding
  to the same port as quinn silently failed. Now uses socket2::Socket
  with explicit SO_REUSEADDR + SO_REUSEPORT (via libc on unix).
- Tickle now logs success/failure for debugging.

Diagnostic fixes:
- connect:dual_path_race_start shows both dial_order_raw and
  dial_order_smart so we can see what filtering removed
- Grace-period timeout (relay wins first, direct still running)
  now fills "timeout:grace" diags for unrecorded candidates
- Previously candidate_diags was empty when relay won the race

Dependencies:
- Added socket2 = "0.5" to wzp-client

593 tests pass, 0 regressions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 15:58:13 +04:00
Siavash Sameni
bc6d327ebb feat(nat): smart candidate filtering + acceptor NAT tickle + 4s timeout
Some checks failed
Mirror to GitHub / mirror (push) Failing after 24s
Build Release Binaries / build-amd64 (push) Failing after 3m33s
Major P2P improvements for cross-network calls:

Smart candidate filtering (smart_dial_order):
- Strip LAN candidates when peer's public IP differs from ours
  (172.16.x.x is unreachable from a different network)
- Strip all IPv6 candidates (Phase 7 disabled, wastes dial slots)
- Only keep mapped + reflexive for cross-network calls
- LAN candidates preserved when both peers share the same public IP

Acceptor NAT tickle:
- A-role sends a 1-byte UDP packet to each peer candidate BEFORE
  accepting. This opens the NAT pinhole for return traffic from
  the Dialer's IP — critical for address-restricted NATs that only
  allow inbound from IPs they've seen outbound traffic to.
- Uses SO_REUSEADDR on the same port as the quinn endpoint.

Direct timeout increased from 2s to 4s:
- Cross-network QUIC handshakes through CGNAT can take 2-3s
- 2s was too aggressive for 5G/LTE networks

Diagnostic fix:
- Record "timeout:4s" for candidates still in-flight when the
  timeout fires (previously these had no diagnostic entry)

5 new tests for smart_dial_order edge cases.
593 tests pass, 0 regressions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 15:42:02 +04:00
Siavash Sameni
c478224d67 fix(ui): remove buffer clear that wiped connect events
Some checks failed
Mirror to GitHub / mirror (push) Failing after 28s
Build Release Binaries / build-amd64 (push) Failing after 3m35s
The callDebugBuffer.length=0 in showCallScreen() ran AFTER the
connect command returned, wiping all connect: events (path_negotiated,
race_start, race_done, candidate_diags). Only media: events survived
because they arrived after the clear.

Removed all automatic buffer clearing. The reverse().find() already
handles stale data by picking the most recent event. The manual
"Clear log" button (line 624) is the only way to clear now.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 15:25:13 +04:00
Siavash Sameni
16dcc75514 fix(ui): move buffer clear from call-end to call-start
Some checks failed
Mirror to GitHub / mirror (push) Failing after 25s
Build Release Binaries / build-amd64 (push) Failing after 3m42s
Clearing callDebugBuffer in showConnectScreen() wiped all debug
events the moment a call ended, so the user saw empty logs. Moved
the clear to showCallScreen() instead — the buffer is reset at the
START of a new call, not the end. This way:

- After hanging up, all events from the call are still visible
- Starting a new call clears stale data from the previous one
- The reverse().find() for P2P badge still gets fresh data

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 15:17:16 +04:00
Siavash Sameni
db5751985e fix(ui): replace findLast with reverse().find() for WebView compat
Some checks failed
Mirror to GitHub / mirror (push) Failing after 26s
Build Release Binaries / build-amd64 (push) Failing after 3m46s
findLast() requires Chrome 97+ / Android WebView 97+. Older Android
devices crash with TypeError in pollStatus(), killing all status
updates including the debug log. Use [...arr].reverse().find() which
works everywhere.

Also pass peerMappedAddr in the direct-call connect invoke.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 15:06:07 +04:00
Siavash Sameni
c0dd6c06ff feat(debug): per-candidate dial diagnostics in dual-path race
Some checks failed
Mirror to GitHub / mirror (push) Failing after 28s
Build Release Binaries / build-amd64 (push) Failing after 3m24s
Added CandidateDiag struct to RaceResult with per-candidate:
- address attempted
- result (ok / skipped:ipv6 / error:reason)
- elapsed time in ms

Surfaced in call-debug events:
- connect:dual_path_race_start now includes dial_order + peer_mapped
- connect:dual_path_race_done now includes candidate_diags array

Upgraded dual_path tracing from debug to info for IPv6 skips and
dial failures so they appear in logcat/console.

Helps diagnose why P2P fails on specific networks (5G CGNAT,
address-restricted NATs, etc).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 12:16:34 +04:00
Siavash Sameni
6805caae0e fix(ui): P2P badge showing stale status from previous call
Some checks failed
Mirror to GitHub / mirror (push) Failing after 26s
Build Release Binaries / build-amd64 (push) Failing after 3m47s
The callDebugBuffer persisted across calls, so .find() returned the
path_negotiated event from Call 1 (P2P Direct) when rendering the
badge during Call 2 (Relay). Two fixes:

1. Clear callDebugBuffer in showConnectScreen() between calls
2. Use .findLast() instead of .find() so the most recent event wins

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 12:02:06 +04:00
Siavash Sameni
5a03da72d3 feat(ui): selectable NAT detection mode + netcheck Tauri command
Some checks failed
Mirror to GitHub / mirror (push) Failing after 24s
Build Release Binaries / build-amd64 (push) Failing after 3m48s
detect_nat_type now accepts optional `mode` parameter:
- "relay" — relay-based Reflect only (original behavior)
- "stun" — public STUN servers only (no relay needed)
- "both" — relay + STUN in parallel (default, highest confidence)

New run_netcheck Tauri command exposes the full network diagnostic
(NAT type, IPv4/v6, port mapping, relay latencies, port allocation)
to the JS frontend.

JS usage:
  await invoke('detect_nat_type', { relays, mode: 'stun' })
  await invoke('run_netcheck', { relays })

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 11:43:17 +04:00
Siavash Sameni
e3e63a40a0 feat(nat): wire hard NAT port prediction into call flow (#85)
Some checks failed
Mirror to GitHub / mirror (push) Failing after 28s
Build Release Binaries / build-amd64 (push) Failing after 3m27s
End-to-end integration of sequential port prediction:

- place_call: spawns background detect_port_allocation() + sends
  HardNatProbe signal after offer (doesn't delay call setup)
- answer_call: same for AcceptTrusted answers (privacy mode skips)
- Signal recv loop: stashes HardNatProbe in SignalState.peer_hard_nat_probe
- connect: reads peer's probe, if Sequential{delta} runs predict_ports()
  and adds predicted addrs to PeerCandidates.local for the dual-path race
- parse_sequential_delta() helper for "sequential(delta=N)" strings

The full flow: both peers independently detect their NAT's port
allocation, exchange HardNatProbe via relay, and the connect command
uses the peer's sequence to predict which ports to dial — all before
the dual-path race starts.

588 tests pass, 0 regressions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 11:39:40 +04:00
Siavash Sameni
5d431c0721 fix(android): restore tauri::Emitter import for Docker builder toolchain
Some checks failed
Mirror to GitHub / mirror (push) Failing after 24s
Build Release Binaries / build-amd64 (push) Has been cancelled
Edition 2024 on local macOS auto-resolves the Emitter trait, but the
Docker builder's Rust/Tauri version requires the explicit import for
AppHandle::emit() to resolve. Keeps the warning locally to avoid
breaking the CI build.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 10:34:23 +04:00
Siavash Sameni
8fcf1be341 feat(nat): Tailscale-inspired STUN/ICE + port mapping + mid-call re-gathering (#28)
Some checks failed
Mirror to GitHub / mirror (push) Failing after 23s
Build Release Binaries / build-amd64 (push) Failing after 6m8s
Phase 8: 5 new modules bringing NAT traversal close to Tailscale's approach.

- stun.rs: RFC 5389 STUN client — public server reflexive discovery,
  XOR-MAPPED-ADDRESS parsing, parallel probe with retry, STUN fallback
  in desktop try_reflect_own_addr()
- portmap.rs: NAT-PMP (RFC 6886) + PCP (RFC 6887) + UPnP IGD port
  mapping — gateway discovery, acquire/release/refresh lifecycle,
  new PeerCandidates.mapped candidate type in dial order
- ice_agent.rs: candidate lifecycle — gather(), re_gather(),
  apply_peer_update() with monotonic generation counter,
  CandidateUpdate signal message forwarded by relay
- netcheck.rs: comprehensive diagnostic — NAT type, IPv4/v6,
  port mapping availability, relay latencies, CLI --netcheck
- relay_map.rs: RTT-sorted relay map, preferred() selection,
  populate_from_ack() for RegisterPresenceAck.available_relays

Relay: CallRegistry stores + cross-wires caller/callee_mapped_addr
into CallSetup.peer_mapped_addr. Region config + available_relays
populated from federation peers in RegisterPresenceAck.

Desktop: place_call/answer_call call acquire_port_mapping() and
fill caller/callee_mapped_addr. STUN+relay combined NAT detection.

571 tests pass (66 new), 0 regressions, 0 warnings.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 10:17:17 +04:00
Siavash Sameni
1e82811cc1 feat(p2p): adaptive quality on direct calls (#23)
Some checks failed
Mirror to GitHub / mirror (push) Failing after 27s
Build Release Binaries / build-amd64 (push) Failing after 3m37s
P2P calls now adapt codec quality based on observed network conditions,
matching what relay calls already had.

Three-layer implementation:
- QualityReport::from_path_stats(): construct reports from local quinn
  stats (loss%, RTT, jitter) without needing relay-generated reports
- CallEncoder.pending_quality_report: one-shot attachment to next
  source packet (consumed on encode, not repeated)
- Engine send tasks: generate quality report every 50 frames (~1s)
  from quinn_path_stats() and attach via set_pending_quality_report()
- Engine recv tasks: self-observe from own QUIC path stats every 50
  packets, feed to AdaptiveQualityController for P2P adaptation
  (works even if peer isn't sending quality reports yet)

Both relay and P2P calls now have adaptive quality. On relay calls,
both peer-sent reports AND local observations feed the controller.
Hysteresis (3 consecutive bad reports to downgrade) prevents thrashing.

372 tests passing (+4 new: from_path_stats encoding, clamping, zero
values, encoder quality report attachment).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 16:14:06 +04:00
Siavash Sameni
ba12aae439 refactor: extract shared engine helpers, federation clone-before-send, constants
Some checks failed
Mirror to GitHub / mirror (push) Failing after 30s
Build Release Binaries / build-amd64 (push) Failing after 3m48s
Engine deduplication (PRD-engine-dedup.md):
- build_call_config(): shared CallConfig construction (was 23 lines × 2)
- codec_to_profile(): shared CodecId → QualityProfile mapping (was 19 lines × 2)
- run_signal_task(): shared signal handler (was 48 lines × 2)
- Net -39 lines from engine.rs, 6 duplicated blocks → single-line calls

Quick wins from REFACTOR-codebase-audit.md:
- 6 magic number constants extracted (CAPTURE_POLL_MS, RECV_TIMEOUT_MS, etc.)
- DRED_POLL_INTERVAL moved from 2 local defs to 1 module-level const
- federation.rs: forward_to_peers, broadcast_signal, send_signal_to_peer
  now clone peer list and release lock before sending (was holding Mutex
  across async I/O — last lock-during-send pattern eliminated)
- main.rs: close_transport() helper replaces 12 silent .ok() calls with
  debug-level logging

314 tests passing, 0 regressions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 15:22:44 +04:00
Siavash Sameni
9ae9441de4 fix(audio): check capture ring available before read (fixes Opus6k choppy)
Some checks failed
Mirror to GitHub / mirror (push) Failing after 32s
Build Release Binaries / build-amd64 (push) Failing after 3m58s
Partial reads from the capture ring consumed samples that were then
discarded when the send loop retried from buf[0]. For 20ms codecs this
was invisible (single Oboe burst fills 960 samples in one read), but
40ms codecs (Opus6k, 1920 samples) needed 2 bursts — the first partial
read consumed 960 real samples and threw them away.

Result: Opus6k produced ~11 frames/s instead of 25 (~44% of expected).

Fix: expose wzp_native_audio_capture_available() and check it before
reading, matching the desktop capture_ring.available() pattern. Partial
reads no longer occur because we only read when enough samples exist.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 11:46:15 +04:00
Siavash Sameni
8ff0c548a7 fix(audio): update frame_samples on codec profile switch, fix buf sizing
Some checks failed
Mirror to GitHub / mirror (push) Failing after 27s
Build Release Binaries / build-amd64 (push) Has been cancelled
frame_samples was immutable — when adaptive quality switched from 20ms
(Opus24k, 960 samples) to 40ms (Opus6k, 1920 samples), the send loop
kept reading 960 samples and feeding half-sized frames to the encoder.
This caused Opus6k to produce ~11 frames/s instead of 25, making audio
choppy.

Fix:
- frame_samples is now mut and updated on profile switch
- buf sized for max frame (1920) with frame_samples-bounded slices
- RMS, mute, encode, and capture reads all use &buf[..frame_samples]
- Applied to both Android and desktop send tasks

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 11:33:02 +04:00
Siavash Sameni
d424515542 feat: 5-tier quality classification, QualityDirective handling, debug tap stats
Some checks failed
Mirror to GitHub / mirror (push) Failing after 31s
Build Release Binaries / build-amd64 (push) Failing after 3m49s
- Extend Tier enum from 3 to 6 levels: Studio64k/48k/32k + Good +
  Degraded + Catastrophic with asymmetric hysteresis (down:3, up:5,
  studio:10)
- Handle QualityDirective signals in both desktop and Android engines
  — relay-coordinated codec switching now works end-to-end
- Add periodic TAP STATS to debug tap: packets in/out, fan-out avg,
  seq gaps, codecs seen (every 5s)
- Mark task #2 done (ParticipantInfo in federation signals already
  implemented)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 10:23:48 +04:00
Siavash Sameni
22045bc5e6 feat: adaptive quality in desktop, relay quality directive, Oboe state polling
- Wire AdaptiveQualityController into desktop engine send/recv tasks
  (mirrors Android pattern: AtomicU8 pending_profile, auto-mode check)
- Wire same into Android engine send task (was only in recv before)
- QualityDirective SignalMessage variant for relay-initiated codec switch
- ParticipantQuality tracking in relay RoomManager (per-participant
  AdaptiveQualityController, weakest-link tier computation)
- Relay broadcasts QualityDirective to all participants when room-wide
  tier degrades (coordinated codec switching)
- Oboe stream state polling: poll getState() for up to 2s after
  requestStart() to ensure both streams reach Started before proceeding
  (fixes intermittent silent calls on cold start, Nothing Phone A059)

Tasks: #7, #25, #26, #31, #35

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 19:54:04 +04:00
Siavash Sameni
766c9df442 feat(dred): continuous DRED tuning, PMTUD, extended Opus6k window
- DredTuner: maps live network metrics (loss/RTT/jitter) to continuous
  DRED duration every ~500ms instead of discrete tier-locked values.
  Includes jitter-spike detection for pre-emptive Starlink-style boost.
- Opus6k DRED extended from 500ms to 1040ms (max libopus 1.5 supports)
- PMTUD: quinn MtuDiscoveryConfig with upper_bound=1452, 300s interval
- TrunkedForwarder respects discovered MTU (was hard-coded 1200)
- QuinnPathSnapshot exposes quinn internal stats + discovered MTU
- AudioEncoder trait: set_expected_loss() + set_dred_duration() methods
- PathMonitor: sliding-window jitter variance for spike detection
- Integrated into both Android and desktop send tasks in engine.rs
- 14 new tests (10 tuner unit + 4 encoder integration)
- Updated ARCHITECTURE.md, PROGRESS.md, PRD-dred-integration, PRD-mtu

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 19:38:37 +04:00
Siavash Sameni
24cc74d93c fix(audio): clear BT SCO communication device on call end
Without clearCommunicationDevice(), the BT headset stays locked in SCO
mode after the call. Media playback (video, music) can't route to BT
A2DP, requiring a device reboot to restore normal audio.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 17:40:44 +04:00
Siavash Sameni
114d69e488 fix: use tracing::warn! instead of bare warn! in engine.rs
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 17:31:12 +04:00
Siavash Sameni
15c237ceea fix(audio): defer MODE_IN_COMMUNICATION to call start, restore on end
Root cause: MainActivity set MODE_IN_COMMUNICATION at app launch,
hijacking system audio routing immediately — BT A2DP music dropped to
earpiece, and the pre-existing communication mode confused subsequent
setCommunicationDevice calls for BT SCO.

Fix: MainActivity now only sets volumes. MODE_IN_COMMUNICATION is set
via JNI right before Oboe audio_start() in CallEngine, and MODE_NORMAL
is restored after audio_stop() when the call ends.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 17:29:59 +04:00
Siavash Sameni
137fe5f084 fix(bluetooth): BT SCO mode skips 48kHz + VoiceCommunication on capture
Root cause: Oboe capture at 48kHz with InputPreset::VoiceCommunication
cannot open against a BT SCO device (only supports 8/16kHz). The stream
silently falls back to builtin mic, delivering zeros.

Fix: add bt_active flag to WzpOboeConfig. When set, capture skips
setSampleRate and setInputPreset, letting the system route to BT SCO
at its native rate. Oboe's SampleRateConversionQuality::Best resamples
to 48kHz for our ring buffers. Playout uses Usage::Media in BT mode.

New API: wzp_native_audio_start_bt() for BT mode, called from
set_bluetooth_sco(on=true). Normal audio_start() restores the
standard config when switching back to earpiece/speaker.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 17:23:19 +04:00
Siavash Sameni
5dfb5b3581 fix(bluetooth): use Shared mode for Oboe + delay restart for BT route
Two fixes for BT audio silence:

1. Switch Oboe streams from Exclusive to Shared sharing mode. Exclusive
   mode bypasses Oboe's internal resampler, so opening a 48kHz stream
   against a BT SCO device (8/16kHz only) fails at the AudioPolicy
   level. Shared mode lets Oboe's resampler bridge the gap.

2. Add 500ms post-SCO delay before Oboe restart. The audio policy needs
   time to apply the bt-sco route after setCommunicationDevice returns.
   Without the delay, Oboe opens against the old device (handset).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 17:14:06 +04:00
Siavash Sameni
2d4948a7b3 fix(bluetooth): add missing &[] arg to getAvailableCommunicationDevices JNI call
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 17:02:57 +04:00
Siavash Sameni
19703ff66c fix(bluetooth): use setCommunicationDevice API on Android 12+
Root cause: setBluetoothScoOn(true) is silently rejected on Android 12+
for non-system apps ("is greater than FIRST_APPLICATION_UID exiting").
Audio policy routed to handset instead of BT despite SCO link being up.

Fix: use the modern setCommunicationDevice(AudioDeviceInfo) API on
API 31+ which properly routes voice audio to the BT device. Falls back
to deprecated startBluetoothSco() on older APIs.

Also uses getCommunicationDevice() for is_bluetooth_sco_on() and
clearCommunicationDevice() for stop, matching the modern API surface.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 17:01:33 +04:00
Siavash Sameni
7e8dc400dc fix(bluetooth): wait for SCO link before Oboe restart + detect A2DP devices
Three fixes for Bluetooth audio not working:

1. is_bluetooth_available() now checks for TYPE_BLUETOOTH_A2DP (8) in
   addition to TYPE_BLUETOOTH_SCO (7) — many headsets only register as
   A2DP until SCO is explicitly started.

2. set_bluetooth_sco(on=true) polls isBluetoothScoOn() for up to 3s
   before restarting Oboe. startBluetoothSco() is async — the SCO link
   takes 500ms-2s to establish. Without waiting, Oboe opens against
   earpiece and audio goes nowhere.

3. Frontend skips redundant set_speakerphone(false) when transitioning
   to BT — start_bluetooth_sco() handles speaker-off internally,
   avoiding a double Oboe restart.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 16:46:56 +04:00
Siavash Sameni
a798634b3d fix(signal): add call_id to Hangup — prevents stale hangup killing new calls
Root cause: Hangup had no call_id field. The relay forwarded hangups to
ALL active calls for a user. When user A hung up call 1 and user B
immediately placed call 2, the relay's processing of A's hangup would
also kill call 2 (race window ~1-2s).

Fix: add optional call_id to Hangup (backwards-compatible via serde
skip_serializing_if). When present, the relay only ends the named call.
Old clients send call_id=None and get the legacy broadcast behavior.

Also: clear pending_path_report in Hangup recv handler and
internal_deregister to prevent stale oneshot channels from blocking
subsequent call setups.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 16:39:21 +04:00
Siavash Sameni
4c1ad841e1 feat(android): Bluetooth audio routing + network change detection + per-arch APK builds
Bluetooth: wire existing AudioRouteManager SCO support through both app
variants. Replace binary speaker toggle with 3-way route cycling
(Earpiece → Speaker → Bluetooth). Tauri side adds JNI bridge functions
(start/stop/query SCO, device availability) and Oboe stream restart.

Network awareness: integrate Android ConnectivityManager to detect
WiFi/cellular transitions and feed them to AdaptiveQualityController
via lock-free AtomicU8 signaling. Enables proactive quality downgrade
and FEC boost on network handoffs.

Build: add --arch flag to build-tauri-android.sh supporting arm64,
armv7, or all (separate per-arch APKs for smaller tester binaries).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 16:07:41 +04:00
Siavash Sameni
29cd23fe39 fix(p2p): connection cleanup — 4 fixes for stale/dead connections
PRD 4: Disable IPv6 direct dial/accept temporarily. IPv6 QUIC
handshakes succeed but connections die immediately on datagram
send ("connection lost"). IPv4 candidates work reliably. IPv6
candidates still gathered but filtered at dial time.

PRD 1: Close losing transport after Phase 6 negotiation. The
non-selected transport now gets an explicit QUIC close frame
instead of silently dropping after 30s idle timeout. Prevents
phantom connections from polluting future accept() calls.

PRD 2: Harden accept loop with max 3 stale retries. Stale
connections are explicitly closed (conn.close) and counted.
After 3 stale connections, the accept loop aborts instead of
spinning until the race timeout.

PRD 3: Resource cleanup — close old IPv6 endpoint before
creating a new one in place_call/answer_call. Add Drop impl
to CallEngine so tasks are signalled to stop on ungraceful
shutdown.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 15:11:50 +04:00
Siavash Sameni
1eb82d77b8 feat(relay+client): relay reports build version in Ack
Add relay_build field to RegisterPresenceAck so the client logs
which relay version it connected to. Shows in the debug log as
register_signal:ack_received {"relay_build":"f843a93"}.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 14:27:58 +04:00
Siavash Sameni
b79073c649 Revert "fix(connect): trust direct path on peer report timeout"
This reverts commit 82b439595c.
2026-04-12 14:10:44 +04:00
Siavash Sameni
82b439595c fix(connect): trust direct path on peer report timeout
When peers are on different relays, MediaPathReport can't be
forwarded — causing a 3s timeout and false relay fallback even
though direct P2P works perfectly.

Fix: on timeout, if local_direct_ok is true AND the direct
transport's connection is still alive (no close_reason), trust
the direct path instead of falling back to relay. The timeout
indicates a relay forwarding issue, not a direct path failure.

Also fix ALT build paste URL (paste.tbs.manko.yoga not amn.gg).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 14:07:44 +04:00
Siavash Sameni
1904b19d05 fix(direct): validate A-role accepted connection, skip stale ones
The Acceptor's accept() on the shared signal endpoint can dequeue
a stale QUIC connection from a previous call that the Dialer has
already dropped. This results in "connection lost" errors when
media datagrams are sent — 100% drops on both sides.

Fix: after accepting a connection, check close_reason(). If the
connection is already closed, log a warning and re-accept. Also
verify max_datagram_size() is available before returning.

Additionally: emit transport details (remote addr, max_datagram,
close_reason) in the call_engine_starting debug event so stale
connection issues are visible in the user-facing debug log.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 13:50:21 +04:00
Siavash Sameni
40955bd11c debug(media): add connection diagnostics for direct P2P drops
When direct P2P calls show 100% datagram drops, we need to know
WHY send_media() fails. This commit adds:

- Remote address + stable_id logging on A-role accept and D-role
  dial success (dual_path.rs) — tells us which candidate won
- Remote address + max_datagram_size on engine transport init —
  verifies datagrams are negotiated
- last_send_err in send heartbeat — captures the actual error
  from send_datagram() failures
- QuinnTransport::remote_address() helper

Also fixes UI badge: was looking for wrong event name
("dual_path_race_won" → "path_negotiated").

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 13:29:58 +04:00
Siavash Sameni
7554959baa fix(ui): show correct P2P Direct / Via Relay badge
The UI looked for event "connect:dual_path_race_won" which doesn't
exist — the actual event is "connect:path_negotiated" with a
use_direct boolean. Badge always showed "Via Relay" even when the
call was direct P2P.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 13:22:00 +04:00
Siavash Sameni
4cfcd5117f fix(connect): install MediaPathReport oneshot BEFORE race starts
The peer's MediaPathReport can arrive while our dual_path::race is
still running. Previously, the oneshot was created AFTER the race
completed, so the recv loop had nowhere to deliver the report —
it was silently dropped, causing a 3s timeout and false relay
fallback on ~50% of calls.

Fix: create the oneshot and install it in SignalState BEFORE
starting the race. The oneshot::Receiver buffers the value so the
connect command can read it immediately after the race finishes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 13:06:13 +04:00
Siavash Sameni
bd6733b2e5 feat(signal): advertise build version in Offer/Answer
Add caller_build_version / callee_build_version (git short hash)
to DirectCallOffer and DirectCallAnswer so peers can identify each
other's build in debug logs. Also log own build at register time.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 12:43:55 +04:00
Siavash Sameni
c2d298beb5 feat(net): Phase 7 — dual-socket IPv4+IPv6 ICE
Adds a dedicated IPv6 QUIC endpoint (IPV6_V6ONLY=1 via socket2)
alongside the existing IPv4 signal endpoint for proper dual-stack
P2P connectivity. Previous [::]:0 dual-stack attempt broke IPv4
on Android; this uses separate sockets per address family like
WebRTC/libwebrtc.

- create_ipv6_endpoint(): socket2-based IPv6-only UDP socket,
  tries same port as IPv4 signal EP, falls back to ephemeral
- local_host_candidates(v4_port, v6_port): now gathers IPv6
  global-unicast (2000::/3) and unique-local (fc00::/7) addrs
- dual_path::race(): A-role accepts on both v4+v6 via select!,
  D-role routes each candidate to matching-AF endpoint
- Graceful fallback: if IPv6 unavailable, .ok() → None → pure
  IPv4 behavior identical to pre-Phase-7

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 11:54:13 +04:00
Siavash Sameni
aee41a638d fix(audio+net): revert dual-stack [::]:0, add Oboe playout stall auto-restart
Two fixes:

## Revert [::]:0 dual-stack sockets → back to 0.0.0.0:0

Android's IPV6_V6ONLY=1 default on some kernels (confirmed on
Nothing Phone) makes [::]:0 IPv6-only, silently killing ALL
IPv4 traffic. This broke P2P direct calls: IPv4 LAN candidates
(172.16.81.x) couldn't complete QUIC handshakes through the
IPv6-only socket, causing local_direct_ok=false and relay
fallback on every call after the first.

Reverted all bind sites to 0.0.0.0:0 (reliable IPv4). IPv6 host
candidates are disabled in local_host_candidates() until a
proper dual-socket approach (one IPv4 + one IPv6 endpoint,
Phase 7) is implemented.

## Fix A (task #35): Oboe playout callback stall auto-restart

The Nothing Phone's Oboe playout callback fires once (cb#0) and
then stops draining the ring on ~50% of cold-launch calls. Fix
D+C (stop+prime from previous commit) didn't help because
audio_stop is a no-op on cold launch.

New approach: self-healing watchdog in audio_write_playout.
Tracks the playout ring's read_idx across writes. If read_idx
hasn't advanced in 50 consecutive writes (~1 second), the Oboe
playout callback has stopped:

1. Log "playout STALL detected"
2. Call wzp_oboe_stop() to tear down the stuck streams
3. Clear both ring buffers (prevent stale data reads)
4. Call wzp_oboe_start() to rebuild fresh streams
5. Log success/failure
6. Return 0 (caller retries on next frame)

This is the same teardown+rebuild that "rejoin" does — but
triggered automatically from the first stalled call instead of
requiring the user to hang up and redial. The watchdog runs
on every write so it fires within 1s of the stall starting.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 11:24:16 +04:00
Siavash Sameni
9fb92967eb fix(net): bind all endpoints to [::]:0 for dual-stack IPv4+IPv6
Every QUIC endpoint was bound to 0.0.0.0:0 (IPv4-only). This
silently killed ALL IPv6 host candidates: the Dialer couldn't
send packets to [2a0d:...] addresses (wrong address family on
the socket), and the Acceptor couldn't receive incoming IPv6
QUIC handshakes. The IPv6 candidates were gathered and advertised
in DirectCallOffer/Answer but were completely non-functional.

On same-LAN with dual-stack (which both test phones have), this
meant:
- JoinSet fanned out 3+ candidates (2× IPv6 + 1× IPv4)
- IPv6 dials failed silently or timed out
- IPv4 dial worked but competed with failed IPv6 for JoinSet
  attention
- Sometimes the JoinSet returned an IPv6 failure before the
  IPv4 success, causing unnecessary fallback to relay

Fix: bind to [::]:0 (IPv6 any) instead of 0.0.0.0:0. On
dual-stack systems (Linux/Android default), [::]:0 creates a
socket that handles BOTH:
- IPv6 natively (global unicast, ULA)
- IPv4 via v4-mapped addresses (::ffff:172.16.81.x)

One socket, both protocols. All 7 bind sites updated:
- register_signal (signal endpoint)
- do_register_signal
- ping_relay
- probe_reflect_addr (fresh endpoint fallback)
- dual_path::race (A-role fresh, D-role fresh, relay fresh)

With this fix, same-LAN P2P should prefer the IPv6 path (no
NAT, direct routing, lower latency) and fall through to IPv4
if IPv6 fails — relay is the last resort after ALL candidates
are exhausted.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 11:09:06 +04:00
Siavash Sameni
9f2ff6a6ec fix(android-audio): Fix D+C — stop+prime cycle on every call start
Addresses the first-join no-audio regression (tasks #35-37) where
the Oboe playout callback fires once (cb#0) and then stops
draining the ring on the Nothing Phone, causing written_samples
to freeze at 7679 (ring capacity minus one burst). Second call
(rejoin) always works because audio_stop tears down the streams
and audio_start rebuilds them fresh.

Two combined fixes:

**Fix D (task #37)**: always call audio_stop() before audio_start()
at the top of CallEngine::start. On a cold launch this is a no-op
(streams not yet started). On subsequent calls it guarantees a
clean teardown before rebuild — the same thing rejoin does. Added
a 50ms pause between stop and start to let the Android HAL release
the audio session.

**Fix C (task #36)**: after audio_start(), immediately write 960
samples (20ms) of silence into the playout ring. This ensures the
Oboe playout callback has data to drain on its first invocation.
On devices where an empty-ring first callback causes the stream
to self-pause (Nothing Phone's Qualcomm HAL), the priming data
keeps the callback loop alive until real decoded audio arrives
from the recv task.

Together these cover the two most likely root causes:
1. Stale Oboe state from a previous audio_start that didn't
   clean up properly → Fix D forces a clean rebuild
2. Playout callback self-pausing on an empty ring → Fix C
   ensures the ring is non-empty at callback time

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 10:50:58 +04:00
Siavash Sameni
134ee3a77f fix(engine): pass is_direct_p2p explicitly instead of deriving from is_some
Critical Phase 6 bug: when the negotiation agreed on relay path
but delivered the relay transport via pre_connected_transport,
CallEngine saw is_some() = true → is_direct_p2p = true → skipped
perform_handshake. The relay couldn't authenticate the participant
→ room join silently failed → recv_fr: 0, both sides sending
into the void.

Fix: add explicit is_direct_p2p: bool parameter to CallEngine::
start (both android and desktop branches). The connect command
sets it from the Phase 6 negotiation result (use_direct), not
from whether pre_connected_transport is Some.

Now relay-negotiated calls correctly run perform_handshake,
and direct P2P calls correctly skip it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 10:34:21 +04:00
Siavash Sameni
e61397ca85 fix(connect): remove pre-Phase-6 same-IP heuristic
The commit de007ec added a heuristic that forced relay-only when
peers had different public IPs. That was a stopgap for the race
condition where one side picked Direct and the other picked Relay.
Phase 6 (f5542ef) solved this properly via MediaPathReport
negotiation, but the heuristic wasn't cleaned up and was still
running BEFORE the Phase 6 code — suppressing the race entirely
for cross-network calls.

Removed. Phase 6 negotiation now handles ALL cases: both sides
race, exchange reports, and agree on the same path before
committing media. Cross-network calls that can't go P2P will
have both sides report direct_ok=false and agree on relay.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 10:23:36 +04:00
Siavash Sameni
f5542ef822 feat(p2p): Phase 6 — ICE-style path negotiation
Before Phase 6, each side's dual-path race ran independently and
committed to whichever transport completed first. When one side
picked Direct and the other picked Relay, they sent media to
different places — TX > 0 RX: 0 on both, completely silent call.

Phase 6 adds a negotiation step: after the local race completes,
each side sends a MediaPathReport { call_id, direct_ok, winner }
to the peer through the relay. Both wait for the other's report
before committing a transport to the CallEngine. The decision
rule is simple: if BOTH report direct_ok = true, use direct; if
EITHER reports false, BOTH use relay.

## Wire protocol

New `SignalMessage::MediaPathReport { call_id, direct_ok,
race_winner }`. The relay forwards it to the call peer via the
same signal_hub routing used for DirectCallOffer/Answer. The
cross-relay dispatcher also forwards it.

## dual_path::race restructured

Returns `RaceResult` instead of `(Arc<QuinnTransport>, WinningPath)`:
- `direct_transport: Option<Arc<QuinnTransport>>`
- `relay_transport: Option<Arc<QuinnTransport>>`
- `local_winner: WinningPath`

Both paths are run as spawned tasks. After the first completes,
a 1s grace period lets the loser also finish. The connect
command gets BOTH transports (when available) and picks the
right one based on the negotiation outcome. The unused transport
is dropped.

## connect command flow (revised)

1. Run race() → RaceResult with both transports
2. Send MediaPathReport to relay with our direct_ok
3. Install oneshot; wait for peer's report (3s timeout)
4. Decision: both direct_ok → use direct; else → use relay
5. Start CallEngine with the agreed transport

If the peer never responds (old build, timeout), falls back to
relay — backward compatible.

## Relay forwarding

MediaPathReport is forwarded like DirectCallOffer/Answer: via
signal_hub.send_to(peer_fp) for same-relay calls, and via
cross-relay dispatcher for federated calls.

## Debug log events

- `connect:dual_path_race_done` — local race result
- `connect:path_report_sent` — our report to the peer
- `connect:peer_report_received` — peer's report
- `connect:peer_report_timeout` — peer didn't respond (3s)
- `connect:path_negotiated` — final agreed path with reasons

Full workspace test: 423 passing (no regressions).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 10:03:42 +04:00