wz-phone

Author	SHA1	Message	Date
Siavash Sameni	f265fd772d	docs: relay concurrency model, Opus6k fix, build script fixes - ARCHITECTURE.md: new "Relay Concurrency Model" section documenting threading, shared state locking table, scaling characteristics, and the RoomManager Mutex as primary bottleneck - PROGRESS.md: Opus6k frame starvation fix, build script fixes - PRD-dred-integration.md: Opus6k frame starvation bug documentation Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 11:54:37 +04:00
Siavash Sameni	9ae9441de4	fix(audio): check capture ring available before read (fixes Opus6k choppy) Partial reads from the capture ring consumed samples that were then discarded when the send loop retried from buf[0]. For 20ms codecs this was invisible (single Oboe burst fills 960 samples in one read), but 40ms codecs (Opus6k, 1920 samples) needed 2 bursts — the first partial read consumed 960 real samples and threw them away. Result: Opus6k produced ~11 frames/s instead of 25 (~44% of expected). Fix: expose wzp_native_audio_capture_available() and check it before reading, matching the desktop capture_ring.available() pattern. Partial reads no longer occur because we only read when enough samples exist. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 11:46:15 +04:00
Siavash Sameni	d9e7e72978	docs: update PROGRESS, PRDs for completed tasks #9, #11, #12, #27 - PROGRESS.md: add 2026-04-13 section with 5-tier quality, QualityDirective handling, debug tap enhancements, dual_path fix, keystore sync - PRD-coordinated-codec.md: Phase 3 marked complete (client directive handling) - PRD-adaptive-quality.md: milestone table updated with Done/Pending status Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 11:34:01 +04:00
Siavash Sameni	8ff0c548a7	fix(audio): update frame_samples on codec profile switch, fix buf sizing frame_samples was immutable — when adaptive quality switched from 20ms (Opus24k, 960 samples) to 40ms (Opus6k, 1920 samples), the send loop kept reading 960 samples and feeding half-sized frames to the encoder. This caused Opus6k to produce ~11 frames/s instead of 25, making audio choppy. Fix: - frame_samples is now mut and updated on profile switch - buf sized for max frame (1920) with frame_samples-bounded slices - RMS, mute, encode, and capture reads all use &buf[..frame_samples] - Applied to both Android and desktop send tasks Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 11:33:02 +04:00
Siavash Sameni	f17420aa98	fix(build): sync keystores from persistent cache before build Keystores are gitignored so git reset --hard deletes them. The build script now copies them from a persistent $BASE_DIR/data/keystore/ cache into the source tree before building. This ensures both primary and alt servers always have signing keys available. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 11:11:28 +04:00
Siavash Sameni	d424515542	feat: 5-tier quality classification, QualityDirective handling, debug tap stats - Extend Tier enum from 3 to 6 levels: Studio64k/48k/32k + Good + Degraded + Catastrophic with asymmetric hysteresis (down:3, up:5, studio:10) - Handle QualityDirective signals in both desktop and Android engines — relay-coordinated codec switching now works end-to-end - Add periodic TAP STATS to debug tap: packets in/out, fan-out avg, seq gaps, codecs seen (every 5s) - Mark task #2 done (ParticipantInfo in federation signals already implemented) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 10:23:48 +04:00
Siavash Sameni	ea5fc17c34	fix(relay): debug tap signal logging, dual_path test regression, PRD updates - Add log_signal() and log_event() to DebugTap for RoomUpdate, QualityDirective, join/leave lifecycle events (task #11) - Fix dual_path.rs Phase 7 regression: add missing ipv6_endpoint arg to 3 race() call sites - Update PRDs to reflect actual implementation status: mark adaptive quality, coordinated codec, P2P, network awareness, protocol analyzer - Update PROGRESS.md with QualityDirective gap and dual_path regression Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 09:54:52 +04:00
Siavash Sameni	1a7dd935ee	fix(build): add zipalign + apksigner signing to build.sh build.sh was producing unsigned APKs because it reimplemented the Docker build inline without the signing step from build-tauri-android.sh. Now uses the same pipeline: find keystore (release preferred, debug fallback), zipalign -f 4, apksigner sign with keystore credentials. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 20:13:20 +04:00
Siavash Sameni	a7c2261b70	fix(build): clean stale APKs before build, prefer release APK on upload find was picking up a cached 384MB debug APK over the fresh 25MB release APK because the old file was listed first. Now: 1. Delete all APKs before the build starts (clean slate) 2. On upload, prefer release.apk over any other match Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 20:08:06 +04:00
Siavash Sameni	eca0bb7531	Merge branch 'opus-DRED-v2'	2026-04-12 19:57:35 +04:00
Siavash Sameni	d249b32ee5	test+docs: add tests for QualityDirective, ParticipantQuality; update docs - QualityDirective signal roundtrip tests (with/without reason) - ParticipantQuality unit tests (initial tier, degradation, weakest-link) - Updated PROGRESS.md with desktop adaptive quality, relay coordinated switching, Oboe state polling entries - Updated ARCHITECTURE.md SFU fan-out rules with QualityDirective - Updated PRD-coordinated-codec.md with implementation status - 312 tests passing across all modified crates Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 19:56:46 +04:00
Siavash Sameni	22045bc5e6	feat: adaptive quality in desktop, relay quality directive, Oboe state polling - Wire AdaptiveQualityController into desktop engine send/recv tasks (mirrors Android pattern: AtomicU8 pending_profile, auto-mode check) - Wire same into Android engine send task (was only in recv before) - QualityDirective SignalMessage variant for relay-initiated codec switch - ParticipantQuality tracking in relay RoomManager (per-participant AdaptiveQualityController, weakest-link tier computation) - Relay broadcasts QualityDirective to all participants when room-wide tier degrades (coordinated codec switching) - Oboe stream state polling: poll getState() for up to 2s after requestStart() to ensure both streams reach Started before proceeding (fixes intermittent silent calls on cold start, Nothing Phone A059) Tasks: #7, #25, #26, #31, #35 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 19:54:04 +04:00
Siavash Sameni	766c9df442	feat(dred): continuous DRED tuning, PMTUD, extended Opus6k window - DredTuner: maps live network metrics (loss/RTT/jitter) to continuous DRED duration every ~500ms instead of discrete tier-locked values. Includes jitter-spike detection for pre-emptive Starlink-style boost. - Opus6k DRED extended from 500ms to 1040ms (max libopus 1.5 supports) - PMTUD: quinn MtuDiscoveryConfig with upper_bound=1452, 300s interval - TrunkedForwarder respects discovered MTU (was hard-coded 1200) - QuinnPathSnapshot exposes quinn internal stats + discovered MTU - AudioEncoder trait: set_expected_loss() + set_dred_duration() methods - PathMonitor: sliding-window jitter variance for spike detection - Integrated into both Android and desktop send tasks in engine.rs - 14 new tests (10 tuner unit + 4 encoder integration) - Updated ARCHITECTURE.md, PROGRESS.md, PRD-dred-integration, PRD-mtu Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 19:38:37 +04:00
Siavash Sameni	6f43415285	merge opus-DRED-v2 into main 50 commits: BT audio routing, network change detection, Hangup call_id, per-arch APK builds, setCommunicationDevice API 31+, deferred MODE_IN_COMMUNICATION, Oboe BT mode, build signing, doc updates. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 17:41:57 +04:00
Siavash Sameni	24cc74d93c	fix(audio): clear BT SCO communication device on call end Without clearCommunicationDevice(), the BT headset stays locked in SCO mode after the call. Media playback (video, music) can't route to BT A2DP, requiring a device reboot to restore normal audio. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 17:40:44 +04:00
Siavash Sameni	300ea66d13	docs: update DESIGN, ARCHITECTURE, PRDs, PROGRESS for BT + network + build changes Reflects the current reality: setCommunicationDevice API 31+, deferred MODE_IN_COMMUNICATION, BT-mode Oboe (bt_active flag), per-arch builds, Hangup call_id fix, and network monitoring integration. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 17:39:59 +04:00
Siavash Sameni	114d69e488	fix: use tracing::warn! instead of bare warn! in engine.rs Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 17:31:12 +04:00
Siavash Sameni	15c237ceea	fix(audio): defer MODE_IN_COMMUNICATION to call start, restore on end Root cause: MainActivity set MODE_IN_COMMUNICATION at app launch, hijacking system audio routing immediately — BT A2DP music dropped to earpiece, and the pre-existing communication mode confused subsequent setCommunicationDevice calls for BT SCO. Fix: MainActivity now only sets volumes. MODE_IN_COMMUNICATION is set via JNI right before Oboe audio_start() in CallEngine, and MODE_NORMAL is restored after audio_stop() when the call ends. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 17:29:59 +04:00
Siavash Sameni	a37c8b30fe	fix(native): add missing bt_active field to stall detector config Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 17:25:11 +04:00
Siavash Sameni	137fe5f084	fix(bluetooth): BT SCO mode skips 48kHz + VoiceCommunication on capture Root cause: Oboe capture at 48kHz with InputPreset::VoiceCommunication cannot open against a BT SCO device (only supports 8/16kHz). The stream silently falls back to builtin mic, delivering zeros. Fix: add bt_active flag to WzpOboeConfig. When set, capture skips setSampleRate and setInputPreset, letting the system route to BT SCO at its native rate. Oboe's SampleRateConversionQuality::Best resamples to 48kHz for our ring buffers. Playout uses Usage::Media in BT mode. New API: wzp_native_audio_start_bt() for BT mode, called from set_bluetooth_sco(on=true). Normal audio_start() restores the standard config when switching back to earpiece/speaker. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 17:23:19 +04:00
Siavash Sameni	5dfb5b3581	fix(bluetooth): use Shared mode for Oboe + delay restart for BT route Two fixes for BT audio silence: 1. Switch Oboe streams from Exclusive to Shared sharing mode. Exclusive mode bypasses Oboe's internal resampler, so opening a 48kHz stream against a BT SCO device (8/16kHz only) fails at the AudioPolicy level. Shared mode lets Oboe's resampler bridge the gap. 2. Add 500ms post-SCO delay before Oboe restart. The audio policy needs time to apply the bt-sco route after setCommunicationDevice returns. Without the delay, Oboe opens against the old device (handset). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 17:14:06 +04:00
Siavash Sameni	fd0ccf8e99	fix(bluetooth): enable Oboe sample rate conversion for BT SCO (8/16kHz) BT SCO devices only support 8kHz or 16kHz but our Oboe streams request 48kHz. Without resampling, AudioPolicyManager rejects the input stream ("getInputProfile could not find profile for... sampling rate 48000"). Fix: add setSampleRateConversionQuality(Best) to both capture and playout stream builders. Oboe resamples internally so our ring buffers stay at 48kHz regardless of the hardware sample rate. Also removes the broken setBluetoothScoOn/isBluetoothScoOn calls from stop_bluetooth_sco — just call stopBluetoothSco() unconditionally. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 17:08:48 +04:00
Siavash Sameni	2d4948a7b3	fix(bluetooth): add missing &[] arg to getAvailableCommunicationDevices JNI call Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 17:02:57 +04:00
Siavash Sameni	19703ff66c	fix(bluetooth): use setCommunicationDevice API on Android 12+ Root cause: setBluetoothScoOn(true) is silently rejected on Android 12+ for non-system apps ("is greater than FIRST_APPLICATION_UID exiting"). Audio policy routed to handset instead of BT despite SCO link being up. Fix: use the modern setCommunicationDevice(AudioDeviceInfo) API on API 31+ which properly routes voice audio to the BT device. Falls back to deprecated startBluetoothSco() on older APIs. Also uses getCommunicationDevice() for is_bluetooth_sco_on() and clearCommunicationDevice() for stop, matching the modern API surface. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 17:01:33 +04:00
Siavash Sameni	7e8dc400dc	fix(bluetooth): wait for SCO link before Oboe restart + detect A2DP devices Three fixes for Bluetooth audio not working: 1. is_bluetooth_available() now checks for TYPE_BLUETOOTH_A2DP (8) in addition to TYPE_BLUETOOTH_SCO (7) — many headsets only register as A2DP until SCO is explicitly started. 2. set_bluetooth_sco(on=true) polls isBluetoothScoOn() for up to 3s before restarting Oboe. startBluetoothSco() is async — the SCO link takes 500ms-2s to establish. Without waiting, Oboe opens against earpiece and audio goes nowhere. 3. Frontend skips redundant set_speakerphone(false) when transitioning to BT — start_bluetooth_sco() handles speaker-off internally, avoiding a double Oboe restart. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 16:46:56 +04:00
Siavash Sameni	a798634b3d	fix(signal): add call_id to Hangup — prevents stale hangup killing new calls Root cause: Hangup had no call_id field. The relay forwarded hangups to ALL active calls for a user. When user A hung up call 1 and user B immediately placed call 2, the relay's processing of A's hangup would also kill call 2 (race window ~1-2s). Fix: add optional call_id to Hangup (backwards-compatible via serde skip_serializing_if). When present, the relay only ends the named call. Old clients send call_id=None and get the legacy broadcast behavior. Also: clear pending_path_report in Hangup recv handler and internal_deregister to prevent stale oneshot channels from blocking subsequent call setups. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 16:39:21 +04:00
Siavash Sameni	d89376016a	fix(build): sign release APKs with project keystore (wzp-release.jks) Release builds from cargo-tauri are unsigned. After Gradle produces the APK, zipalign + apksigner now sign it with the release keystore (android/keystore/wzp-release.jks). Falls back to debug keystore if release is missing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 16:21:38 +04:00
Siavash Sameni	678695776e	fix(build): correct APK output path — target/ is mounted from cache dir Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 16:10:03 +04:00
Siavash Sameni	4c1ad841e1	feat(android): Bluetooth audio routing + network change detection + per-arch APK builds Bluetooth: wire existing AudioRouteManager SCO support through both app variants. Replace binary speaker toggle with 3-way route cycling (Earpiece → Speaker → Bluetooth). Tauri side adds JNI bridge functions (start/stop/query SCO, device availability) and Oboe stream restart. Network awareness: integrate Android ConnectivityManager to detect WiFi/cellular transitions and feed them to AdaptiveQualityController via lock-free AtomicU8 signaling. Enables proactive quality downgrade and FEC boost on network handoffs. Build: add --arch flag to build-tauri-android.sh supporting arm64, armv7, or all (separate per-arch APKs for smaller tester binaries). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 16:07:41 +04:00
Siavash Sameni	29cd23fe39	fix(p2p): connection cleanup — 4 fixes for stale/dead connections PRD 4: Disable IPv6 direct dial/accept temporarily. IPv6 QUIC handshakes succeed but connections die immediately on datagram send ("connection lost"). IPv4 candidates work reliably. IPv6 candidates still gathered but filtered at dial time. PRD 1: Close losing transport after Phase 6 negotiation. The non-selected transport now gets an explicit QUIC close frame instead of silently dropping after 30s idle timeout. Prevents phantom connections from polluting future accept() calls. PRD 2: Harden accept loop with max 3 stale retries. Stale connections are explicitly closed (conn.close) and counted. After 3 stale connections, the accept loop aborts instead of spinning until the race timeout. PRD 3: Resource cleanup — close old IPv6 endpoint before creating a new one in place_call/answer_call. Add Drop impl to CallEngine so tasks are signalled to stop on ungraceful shutdown. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 15:11:50 +04:00
Siavash Sameni	4d66d3769d	fix(relay): set peer_relay_fp on originating relay when answer arrives The originating relay (where the caller is) never set peer_relay_fp because the call was created locally. When the callee's answer arrived via federation, the cross-relay dispatcher handled it but didn't mark the call as cross-relay. This meant the caller's MediaPathReport was delivered via local hub.send_to() to a peer fingerprint that isn't connected locally — silently dropped. Fix: in the cross-relay answer dispatcher, call reg.set_peer_relay_fp(call_id, Some(origin_relay_fp)) so the originating relay knows to forward MediaPathReport via federation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 14:49:34 +04:00
Siavash Sameni	002df15c5e	fix(cli): add .. rest pattern for RegisterPresenceAck error arm Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 14:32:57 +04:00
Siavash Sameni	1eb82d77b8	feat(relay+client): relay reports build version in Ack Add relay_build field to RegisterPresenceAck so the client logs which relay version it connected to. Shows in the debug log as register_signal:ack_received {"relay_build":"f843a93"}. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 14:27:58 +04:00
Siavash Sameni	f843a934fe	fix(relay): forward MediaPathReport across federation MediaPathReport was only delivered via local signal_hub, so calls between peers on different relays always hit peer_report_timeout and fell back to relay — even when direct P2P worked perfectly. Fix: check peer_relay_fp in call_registry (same pattern as DirectCallAnswer). If the peer is on a remote relay, wrap in FederatedSignalForward and send via federation link. Also fix the cross-relay dispatcher to deliver to BOTH caller and callee (not just caller), since the report can come from either side. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 14:14:30 +04:00
Siavash Sameni	b79073c649	Revert "fix(connect): trust direct path on peer report timeout" This reverts commit `82b439595c`.	2026-04-12 14:10:44 +04:00
Siavash Sameni	82b439595c	fix(connect): trust direct path on peer report timeout When peers are on different relays, MediaPathReport can't be forwarded — causing a 3s timeout and false relay fallback even though direct P2P works perfectly. Fix: on timeout, if local_direct_ok is true AND the direct transport's connection is still alive (no close_reason), trust the direct path instead of falling back to relay. The timeout indicates a relay forwarding issue, not a direct path failure. Also fix ALT build paste URL (paste.tbs.manko.yoga not amn.gg). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 14:07:44 +04:00
Siavash Sameni	1904b19d05	fix(direct): validate A-role accepted connection, skip stale ones The Acceptor's accept() on the shared signal endpoint can dequeue a stale QUIC connection from a previous call that the Dialer has already dropped. This results in "connection lost" errors when media datagrams are sent — 100% drops on both sides. Fix: after accepting a connection, check close_reason(). If the connection is already closed, log a warning and re-accept. Also verify max_datagram_size() is available before returning. Additionally: emit transport details (remote addr, max_datagram, close_reason) in the call_engine_starting debug event so stale connection issues are visible in the user-facing debug log. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 13:50:21 +04:00
Siavash Sameni	40955bd11c	debug(media): add connection diagnostics for direct P2P drops When direct P2P calls show 100% datagram drops, we need to know WHY send_media() fails. This commit adds: - Remote address + stable_id logging on A-role accept and D-role dial success (dual_path.rs) — tells us which candidate won - Remote address + max_datagram_size on engine transport init — verifies datagrams are negotiated - last_send_err in send heartbeat — captures the actual error from send_datagram() failures - QuinnTransport::remote_address() helper Also fixes UI badge: was looking for wrong event name ("dual_path_race_won" → "path_negotiated"). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 13:29:58 +04:00
Siavash Sameni	7554959baa	fix(ui): show correct P2P Direct / Via Relay badge The UI looked for event "connect:dual_path_race_won" which doesn't exist — the actual event is "connect:path_negotiated" with a use_direct boolean. Badge always showed "Via Relay" even when the call was direct P2P. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 13:22:00 +04:00
Siavash Sameni	0b62d3e22f	fix(cli): add missing build_version fields to Offer/Answer CLI binary was missing the new caller_build_version and callee_build_version fields, causing E0063 compile errors on Linux relay/client builds. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 13:09:26 +04:00
Siavash Sameni	4cfcd5117f	fix(connect): install MediaPathReport oneshot BEFORE race starts The peer's MediaPathReport can arrive while our dual_path::race is still running. Previously, the oneshot was created AFTER the race completed, so the recv loop had nowhere to deliver the report — it was silently dropped, causing a 3s timeout and false relay fallback on ~50% of calls. Fix: create the oneshot and install it in SignalState BEFORE starting the race. The oneshot::Receiver buffers the value so the connect command can read it immediately after the race finishes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 13:06:13 +04:00
Siavash Sameni	bd6733b2e5	feat(signal): advertise build version in Offer/Answer Add caller_build_version / callee_build_version (git short hash) to DirectCallOffer and DirectCallAnswer so peers can identify each other's build in debug logs. Also log own build at register time. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 12:43:55 +04:00
Siavash Sameni	7d1b8f1fdc	fix(android): add missing CallSetup pattern fields (.. rest) The CallSetup enum gained peer_direct_addr and peer_local_addrs in Phase 5.5 but the wzp-android signal recv match arm was never updated, breaking cargo ndk builds. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 12:09:44 +04:00
Siavash Sameni	c2d298beb5	feat(net): Phase 7 — dual-socket IPv4+IPv6 ICE Adds a dedicated IPv6 QUIC endpoint (IPV6_V6ONLY=1 via socket2) alongside the existing IPv4 signal endpoint for proper dual-stack P2P connectivity. Previous [::]:0 dual-stack attempt broke IPv4 on Android; this uses separate sockets per address family like WebRTC/libwebrtc. - create_ipv6_endpoint(): socket2-based IPv6-only UDP socket, tries same port as IPv4 signal EP, falls back to ephemeral - local_host_candidates(v4_port, v6_port): now gathers IPv6 global-unicast (2000::/3) and unique-local (fc00::/7) addrs - dual_path::race(): A-role accepts on both v4+v6 via select!, D-role routes each candidate to matching-AF endpoint - Graceful fallback: if IPv6 unavailable, .ok() → None → pure IPv4 behavior identical to pre-Phase-7 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 11:54:13 +04:00
Siavash Sameni	aee41a638d	fix(audio+net): revert dual-stack [::]:0, add Oboe playout stall auto-restart Two fixes: ## Revert [::]:0 dual-stack sockets → back to 0.0.0.0:0 Android's IPV6_V6ONLY=1 default on some kernels (confirmed on Nothing Phone) makes [::]:0 IPv6-only, silently killing ALL IPv4 traffic. This broke P2P direct calls: IPv4 LAN candidates (172.16.81.x) couldn't complete QUIC handshakes through the IPv6-only socket, causing local_direct_ok=false and relay fallback on every call after the first. Reverted all bind sites to 0.0.0.0:0 (reliable IPv4). IPv6 host candidates are disabled in local_host_candidates() until a proper dual-socket approach (one IPv4 + one IPv6 endpoint, Phase 7) is implemented. ## Fix A (task #35): Oboe playout callback stall auto-restart The Nothing Phone's Oboe playout callback fires once (cb#0) and then stops draining the ring on ~50% of cold-launch calls. Fix D+C (stop+prime from previous commit) didn't help because audio_stop is a no-op on cold launch. New approach: self-healing watchdog in audio_write_playout. Tracks the playout ring's read_idx across writes. If read_idx hasn't advanced in 50 consecutive writes (~1 second), the Oboe playout callback has stopped: 1. Log "playout STALL detected" 2. Call wzp_oboe_stop() to tear down the stuck streams 3. Clear both ring buffers (prevent stale data reads) 4. Call wzp_oboe_start() to rebuild fresh streams 5. Log success/failure 6. Return 0 (caller retries on next frame) This is the same teardown+rebuild that "rejoin" does — but triggered automatically from the first stalled call instead of requiring the user to hang up and redial. The watchdog runs on every write so it fires within 1s of the stall starting. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 11:24:16 +04:00
Siavash Sameni	9fb92967eb	fix(net): bind all endpoints to [::]:0 for dual-stack IPv4+IPv6 Every QUIC endpoint was bound to 0.0.0.0:0 (IPv4-only). This silently killed ALL IPv6 host candidates: the Dialer couldn't send packets to [2a0d:...] addresses (wrong address family on the socket), and the Acceptor couldn't receive incoming IPv6 QUIC handshakes. The IPv6 candidates were gathered and advertised in DirectCallOffer/Answer but were completely non-functional. On same-LAN with dual-stack (which both test phones have), this meant: - JoinSet fanned out 3+ candidates (2× IPv6 + 1× IPv4) - IPv6 dials failed silently or timed out - IPv4 dial worked but competed with failed IPv6 for JoinSet attention - Sometimes the JoinSet returned an IPv6 failure before the IPv4 success, causing unnecessary fallback to relay Fix: bind to [::]:0 (IPv6 any) instead of 0.0.0.0:0. On dual-stack systems (Linux/Android default), [::]:0 creates a socket that handles BOTH: - IPv6 natively (global unicast, ULA) - IPv4 via v4-mapped addresses (::ffff:172.16.81.x) One socket, both protocols. All 7 bind sites updated: - register_signal (signal endpoint) - do_register_signal - ping_relay - probe_reflect_addr (fresh endpoint fallback) - dual_path::race (A-role fresh, D-role fresh, relay fresh) With this fix, same-LAN P2P should prefer the IPv6 path (no NAT, direct routing, lower latency) and fall through to IPv4 if IPv6 fails — relay is the last resort after ALL candidates are exhausted. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 11:09:06 +04:00
Siavash Sameni	9f2ff6a6ec	fix(android-audio): Fix D+C — stop+prime cycle on every call start Addresses the first-join no-audio regression (tasks #35-37) where the Oboe playout callback fires once (cb#0) and then stops draining the ring on the Nothing Phone, causing written_samples to freeze at 7679 (ring capacity minus one burst). Second call (rejoin) always works because audio_stop tears down the streams and audio_start rebuilds them fresh. Two combined fixes: Fix D (task #37): always call audio_stop() before audio_start() at the top of CallEngine::start. On a cold launch this is a no-op (streams not yet started). On subsequent calls it guarantees a clean teardown before rebuild — the same thing rejoin does. Added a 50ms pause between stop and start to let the Android HAL release the audio session. Fix C (task #36): after audio_start(), immediately write 960 samples (20ms) of silence into the playout ring. This ensures the Oboe playout callback has data to drain on its first invocation. On devices where an empty-ring first callback causes the stream to self-pause (Nothing Phone's Qualcomm HAL), the priming data keeps the callback loop alive until real decoded audio arrives from the recv task. Together these cover the two most likely root causes: 1. Stale Oboe state from a previous audio_start that didn't clean up properly → Fix D forces a clean rebuild 2. Playout callback self-pausing on an empty ring → Fix C ensures the ring is non-empty at callback time Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 10:50:58 +04:00
Siavash Sameni	134ee3a77f	fix(engine): pass is_direct_p2p explicitly instead of deriving from is_some Critical Phase 6 bug: when the negotiation agreed on relay path but delivered the relay transport via pre_connected_transport, CallEngine saw is_some() = true → is_direct_p2p = true → skipped perform_handshake. The relay couldn't authenticate the participant → room join silently failed → recv_fr: 0, both sides sending into the void. Fix: add explicit is_direct_p2p: bool parameter to CallEngine:: start (both android and desktop branches). The connect command sets it from the Phase 6 negotiation result (use_direct), not from whether pre_connected_transport is Some. Now relay-negotiated calls correctly run perform_handshake, and direct P2P calls correctly skip it. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 10:34:21 +04:00
Siavash Sameni	e61397ca85	fix(connect): remove pre-Phase-6 same-IP heuristic The commit `de007ec` added a heuristic that forced relay-only when peers had different public IPs. That was a stopgap for the race condition where one side picked Direct and the other picked Relay. Phase 6 (`f5542ef`) solved this properly via MediaPathReport negotiation, but the heuristic wasn't cleaned up and was still running BEFORE the Phase 6 code — suppressing the race entirely for cross-network calls. Removed. Phase 6 negotiation now handles ALL cases: both sides race, exchange reports, and agree on the same path before committing media. Cross-network calls that can't go P2P will have both sides report direct_ok=false and agree on relay. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 10:23:36 +04:00
Siavash Sameni	f5542ef822	feat(p2p): Phase 6 — ICE-style path negotiation Before Phase 6, each side's dual-path race ran independently and committed to whichever transport completed first. When one side picked Direct and the other picked Relay, they sent media to different places — TX > 0 RX: 0 on both, completely silent call. Phase 6 adds a negotiation step: after the local race completes, each side sends a MediaPathReport { call_id, direct_ok, winner } to the peer through the relay. Both wait for the other's report before committing a transport to the CallEngine. The decision rule is simple: if BOTH report direct_ok = true, use direct; if EITHER reports false, BOTH use relay. ## Wire protocol New `SignalMessage::MediaPathReport { call_id, direct_ok, race_winner }`. The relay forwards it to the call peer via the same signal_hub routing used for DirectCallOffer/Answer. The cross-relay dispatcher also forwards it. ## dual_path::race restructured Returns `RaceResult` instead of `(Arc<QuinnTransport>, WinningPath)`: - `direct_transport: Option<Arc<QuinnTransport>>` - `relay_transport: Option<Arc<QuinnTransport>>` - `local_winner: WinningPath` Both paths are run as spawned tasks. After the first completes, a 1s grace period lets the loser also finish. The connect command gets BOTH transports (when available) and picks the right one based on the negotiation outcome. The unused transport is dropped. ## connect command flow (revised) 1. Run race() → RaceResult with both transports 2. Send MediaPathReport to relay with our direct_ok 3. Install oneshot; wait for peer's report (3s timeout) 4. Decision: both direct_ok → use direct; else → use relay 5. Start CallEngine with the agreed transport If the peer never responds (old build, timeout), falls back to relay — backward compatible. ## Relay forwarding MediaPathReport is forwarded like DirectCallOffer/Answer: via signal_hub.send_to(peer_fp) for same-relay calls, and via cross-relay dispatcher for federated calls. 
## Debug log events - `connect:dual_path_race_done` — local race result - `connect:path_report_sent` — our report to the peer - `connect:peer_report_received` — peer's report - `connect:peer_report_timeout` — peer didn't respond (3s) - `connect:path_negotiated` — final agreed path with reasons Full workspace test: 423 passing (no regressions). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 10:03:42 +04:00
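
The Phase 6 decision rule from f5542ef822 (use the direct transport only when both sides report direct_ok = true; otherwise both use relay, including when the peer's report never arrives within the 3s timeout) can be sketched in Rust as below. This is a minimal illustration under stated assumptions, not the project's code: `AgreedPath` and `negotiate_path` are hypothetical names, and the peer's report is modeled as an `Option<bool>` where `None` stands for the timeout case.

```rust
/// Transport that both peers commit to after exchanging reports.
/// Hypothetical type for illustration; not taken from the repository.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AgreedPath {
    Direct,
    Relay,
}

/// Apply the rule stated in the commit message:
/// - both sides report direct_ok = true  -> Direct
/// - either side reports false           -> Relay
/// - peer report missing (timeout)       -> Relay (backward compatible)
///
/// `peer_direct_ok` is `None` when no report arrived in time.
pub fn negotiate_path(local_direct_ok: bool, peer_direct_ok: Option<bool>) -> AgreedPath {
    match (local_direct_ok, peer_direct_ok) {
        (true, Some(true)) => AgreedPath::Direct,
        // Any false report, or a missing report, sends both sides to relay.
        _ => AgreedPath::Relay,
    }
}
```

Because the function is symmetric in the two reports, both peers reach the same answer whenever each receives the other's report, which is the property the commit message relies on to avoid the one-side-direct, one-side-relay silent call.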

419 Commits