wz-phone

Author	SHA1	Message	Date
Siavash Sameni	06d28a9280	fix(video): preserve annex-b mediacodec output	2026-05-25 20:20:22 +04:00
Siavash Sameni	d57ebe3d2c	fix(video): force h264 and trace frame pipeline	2026-05-25 20:03:11 +04:00
Siavash Sameni	7eca79846f	fix(quality): use windowed loss instead of cumulative for codec adaptation Quinn's cumulative loss_pct (lost / sent since connection start) was biased forever by handshake-era losses. Even ~5 lost-out-of-100 early packets pinned us at "Degraded" (5% threshold) and Codec2_1200 was just a few more drops away. The metric only diluted as thousands more clean packets accumulated — by which time the call was over. LossWindow tracks prev (sent, lost) and reports delta loss per ~25-packet window. The cumulative value is the fallback when the window hasn't accumulated enough samples (< 20 packets). All 6 sites converted (DRED tuner + QualityReport on both send tasks, self-observation on both recv tasks). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-25 18:55:57 +04:00
Siavash Sameni	25b3278d31	feat(android): wire video send + recv in Android engine; add video:* debug events Mirror the desktop video pipeline into the #[cfg(target_os="android")] start function: capture _negotiated_video_codec from the handshake, spawn a video send task that pulls VideoFrames from camera_tx, encodes/packetizes/sends. Add video reassembly + decode + emit "video:frame" in the recv task before the audio branch so Android can both send and receive video. Instrumentation: emit video:first_send and video:first_recv on both desktop and android paths so we can verify the pipeline end-to-end. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-25 18:19:42 +04:00
Siavash Sameni	e8cab25eda	fix: revert E2E AEAD wrapping (broke multi-client voice); add Android CAMERA Voice regression: EncryptingTransport encrypts media with the pairwise client↔relay session key, but the relay forwards bytes without re-encrypting per recipient. Sender's key_A ≠ recipient's key_B → recipient cannot decrypt → silent audio between mac and android. Drop the wrapper; restore plaintext-over-QUIC-TLS to the relay. Proper E2E needs MLS group keys or relay hop-by-hop re-encryption (future PRD). Android camera: add CAMERA manifest permission + runtime request via MainActivity. NOTE: still not sufficient — Tauri/Wry's WebChromeClient does not grant getUserMedia, so video on Android needs a Tauri plugin override or native Camera2 path. Documented in MainActivity.kt. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-25 17:04:56 +04:00
Siavash Sameni	06253fdeeb	feat(video+desktop): camera capture, video UI, E2E AEAD wiring, test fixes Blockers 4 & 5: browser getUserMedia → JPEG IPC → Rust I420 pipeline; remote video strip renders decoded frames via canvas; EncryptingTransport wraps QuinnTransport so WZP AEAD is applied to all media (C2 fix). Test fixes: HandshakeResult.session destructuring across relay/client/crypto integration tests; video_codecs field added to all CallOffer/CallAnswer structs; wzp-video pipeline_roundtrip integration tests added. PRD docs: five Kimi-ready specs for E2E encryption, Android NDK 0.9 migration, quality upgrade flow, wire-format hardening, and clippy debt. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-25 15:30:26 +04:00
Siavash Sameni	739bdaf3ab	feat(debug): emit media:room_update and participants call-event from signal task Pass AppHandle into run_signal_task so it can emit call-debug events and Tauri events directly. On each RoomUpdate: - emit connect:media:room_update debug event with participant list - emit call-event/participants Tauri event for JS-side diagnostics Helps diagnose whether room join and participant sync is working independently of audio startup. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-25 09:07:08 +04:00
Siavash Sameni	bc1668ed96	fix(android): run set_audio_mode_communication on Tauri main thread spawn_blocking uses arbitrary thread-pool threads that don't have the Android JNI context initialized, causing ndk_context::android_context() to panic. Switch to run_on_main_thread (where the context is always valid) via a oneshot channel, with a 2s timeout. Panic is caught and forwarded as an Err so the debug log captures it rather than crashing. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-25 08:18:18 +04:00
Siavash Sameni	77b036439b	fix(android): spawn_blocking + 2s timeout for set_audio_mode_communication The JNI call into AudioManager.setMode() was running directly on the tokio async thread. If the Android audio policy service is slow (e.g. immediately after mic permission grant), this could block the runtime. Moved to spawn_blocking with a 2s timeout; timeout and panic cases are logged as connect:audio_mode_timeout / connect:audio_mode_panic debug events and treated as non-fatal (we continue to audio_start). Also removes the has_record_audio_permission call from the preflight debug event — it was a redundant JNI round-trip that added latency and is now captured separately in the preflight_start event context. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-25 08:08:24 +04:00
Siavash Sameni	0ebc73ab13	fix(android): remove legacy connected event_cb; add preflight_start debug step The legacy event_cb("connected") call between handshake and audio preflight was a no-op on the frontend (it enters voice only after the command resolves) but added noise to failing traces. Replaced with a connect:connected_event_skipped debug event and added an explicit connect:android_audio_preflight_start marker so the debug log shows a clear boundary between handshake completion and audio startup. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-25 08:02:19 +04:00
Siavash Sameni	394987a349	fix(android): 8s Rust timeout on audio_start; always emit connect: debug events - engine.rs: wrap spawn_blocking(audio_start) in an 8s tokio timeout so the connect command fails fast with a clear error if the Oboe HAL never returns, instead of blocking the JS 45s timer - lib.rs: emit_call_debug now always forwards connect: and register_signal: steps to the JS overlay regardless of the debug-logs toggle — needed because app-data clears reset the toggle to false, making join failures invisible on first install - main.ts: JS timeout bumped to 45s (Rust 8s fires first); timeout message now includes last native connect: step so the toast is actionable without opening the debug log Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-25 07:49:21 +04:00
Siavash Sameni	2aa6582585	fix(android): call-debug instrumentation for audio startup path Add emit_call_debug events at every step of the Android connect/audio path so failures are visible in the Settings debug log without needing adb logcat: - connect:handshake_start/done/failed (with timing) - connect:android_audio_preflight (wzp_native loaded + RECORD_AUDIO permission check via new has_record_audio_permission() JNI helper) - connect:audio_stop_start/done - connect:audio_mode_start/done/failed - connect:audio_start_start/failed/panic/done (with oboe error code) - connect:reuse_endpoint (endpoint reuse diagnostic) Also adds has_record_audio_permission() to android_audio.rs — used in the preflight event to confirm the OS has granted mic access before wzp_oboe_start is called. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-25 07:38:38 +04:00
Siavash Sameni	5a13f12334	fix(android): spawn_blocking for audio_start + 15s JS connect timeout wzp_oboe_start is a sync FFI call that can block the OS thread indefinitely waiting on the Android audio HAL. Calling it directly from an async context freezes all tokio tasks including Rust-side timeouts. Fix: run it via spawn_blocking so tokio stays responsive. Also add a 15s Promise.race timeout in JS so a frozen audio_start surfaces as "connect timed out — check audio permissions" instead of the join button staying stuck in "Connecting…" forever. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-25 07:13:26 +04:00
Siavash Sameni	276ecc660e	T5.1: PriorityMode enum + SetPriorityMode signal; extend QualityProfile with video fields	2026-05-12 12:21:40 +04:00
Siavash Sameni	e73f8a7150	T3.3: SignalMessage version field	2026-05-12 06:11:59 +04:00
Siavash Sameni	6f81487778	T1.6: Protocol version negotiation in handshake	2026-05-11 15:53:04 +04:00
Siavash Sameni	c93d302656	T1.5: Migrate emit/parse sites to v2 wire format	2026-05-11 12:37:32 +04:00
Siavash Sameni	5d431c0721	fix(android): restore tauri::Emitter import for Docker builder toolchain Edition 2024 on local macOS auto-resolves the Emitter trait, but the Docker builder's Rust/Tauri version requires the explicit import for AppHandle::emit() to resolve. Keeps the warning locally to avoid breaking the CI build. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 10:34:23 +04:00
Siavash Sameni	8fcf1be341	feat(nat): Tailscale-inspired STUN/ICE + port mapping + mid-call re-gathering (#28) Phase 8: 5 new modules bringing NAT traversal close to Tailscale's approach. - stun.rs: RFC 5389 STUN client — public server reflexive discovery, XOR-MAPPED-ADDRESS parsing, parallel probe with retry, STUN fallback in desktop try_reflect_own_addr() - portmap.rs: NAT-PMP (RFC 6886) + PCP (RFC 6887) + UPnP IGD port mapping — gateway discovery, acquire/release/refresh lifecycle, new PeerCandidates.mapped candidate type in dial order - ice_agent.rs: candidate lifecycle — gather(), re_gather(), apply_peer_update() with monotonic generation counter, CandidateUpdate signal message forwarded by relay - netcheck.rs: comprehensive diagnostic — NAT type, IPv4/v6, port mapping availability, relay latencies, CLI --netcheck - relay_map.rs: RTT-sorted relay map, preferred() selection, populate_from_ack() for RegisterPresenceAck.available_relays Relay: CallRegistry stores + cross-wires caller/callee_mapped_addr into CallSetup.peer_mapped_addr. Region config + available_relays populated from federation peers in RegisterPresenceAck. Desktop: place_call/answer_call call acquire_port_mapping() and fill caller/callee_mapped_addr. STUN+relay combined NAT detection. 571 tests pass (66 new), 0 regressions, 0 warnings. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 10:17:17 +04:00
Siavash Sameni	1e82811cc1	feat(p2p): adaptive quality on direct calls (#23) P2P calls now adapt codec quality based on observed network conditions, matching what relay calls already had. Three-layer implementation: - QualityReport::from_path_stats(): construct reports from local quinn stats (loss%, RTT, jitter) without needing relay-generated reports - CallEncoder.pending_quality_report: one-shot attachment to next source packet (consumed on encode, not repeated) - Engine send tasks: generate quality report every 50 frames (~1s) from quinn_path_stats() and attach via set_pending_quality_report() - Engine recv tasks: self-observe from own QUIC path stats every 50 packets, feed to AdaptiveQualityController for P2P adaptation (works even if peer isn't sending quality reports yet) Both relay and P2P calls now have adaptive quality. On relay calls, both peer-sent reports AND local observations feed the controller. Hysteresis (3 consecutive bad reports to downgrade) prevents thrashing. 372 tests passing (+4 new: from_path_stats encoding, clamping, zero values, encoder quality report attachment). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 16:14:06 +04:00
Siavash Sameni	ba12aae439	refactor: extract shared engine helpers, federation clone-before-send, constants Engine deduplication (PRD-engine-dedup.md): - build_call_config(): shared CallConfig construction (was 23 lines × 2) - codec_to_profile(): shared CodecId → QualityProfile mapping (was 19 lines × 2) - run_signal_task(): shared signal handler (was 48 lines × 2) - Net -39 lines from engine.rs, 6 duplicated blocks → single-line calls Quick wins from REFACTOR-codebase-audit.md: - 6 magic number constants extracted (CAPTURE_POLL_MS, RECV_TIMEOUT_MS, etc.) - DRED_POLL_INTERVAL moved from 2 local defs to 1 module-level const - federation.rs: forward_to_peers, broadcast_signal, send_signal_to_peer now clone peer list and release lock before sending (was holding Mutex across async I/O — last lock-during-send pattern eliminated) - main.rs: close_transport() helper replaces 12 silent .ok() calls with debug-level logging 314 tests passing, 0 regressions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 15:22:44 +04:00
Siavash Sameni	9ae9441de4	fix(audio): check capture ring available before read (fixes Opus6k choppy) Partial reads from the capture ring consumed samples that were then discarded when the send loop retried from buf[0]. For 20ms codecs this was invisible (single Oboe burst fills 960 samples in one read), but 40ms codecs (Opus6k, 1920 samples) needed 2 bursts — the first partial read consumed 960 real samples and threw them away. Result: Opus6k produced ~11 frames/s instead of 25 (~44% of expected). Fix: expose wzp_native_audio_capture_available() and check it before reading, matching the desktop capture_ring.available() pattern. Partial reads no longer occur because we only read when enough samples exist. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 11:46:15 +04:00
Siavash Sameni	8ff0c548a7	fix(audio): update frame_samples on codec profile switch, fix buf sizing frame_samples was immutable — when adaptive quality switched from 20ms (Opus24k, 960 samples) to 40ms (Opus6k, 1920 samples), the send loop kept reading 960 samples and feeding half-sized frames to the encoder. This caused Opus6k to produce ~11 frames/s instead of 25, making audio choppy. Fix: - frame_samples is now mut and updated on profile switch - buf sized for max frame (1920) with frame_samples-bounded slices - RMS, mute, encode, and capture reads all use &buf[..frame_samples] - Applied to both Android and desktop send tasks Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 11:33:02 +04:00
Siavash Sameni	d424515542	feat: 5-tier quality classification, QualityDirective handling, debug tap stats - Extend Tier enum from 3 to 6 levels: Studio64k/48k/32k + Good + Degraded + Catastrophic with asymmetric hysteresis (down:3, up:5, studio:10) - Handle QualityDirective signals in both desktop and Android engines — relay-coordinated codec switching now works end-to-end - Add periodic TAP STATS to debug tap: packets in/out, fan-out avg, seq gaps, codecs seen (every 5s) - Mark task #2 done (ParticipantInfo in federation signals already implemented) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 10:23:48 +04:00
Siavash Sameni	22045bc5e6	feat: adaptive quality in desktop, relay quality directive, Oboe state polling - Wire AdaptiveQualityController into desktop engine send/recv tasks (mirrors Android pattern: AtomicU8 pending_profile, auto-mode check) - Wire same into Android engine send task (was only in recv before) - QualityDirective SignalMessage variant for relay-initiated codec switch - ParticipantQuality tracking in relay RoomManager (per-participant AdaptiveQualityController, weakest-link tier computation) - Relay broadcasts QualityDirective to all participants when room-wide tier degrades (coordinated codec switching) - Oboe stream state polling: poll getState() for up to 2s after requestStart() to ensure both streams reach Started before proceeding (fixes intermittent silent calls on cold start, Nothing Phone A059) Tasks: #7, #25, #26, #31, #35 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 19:54:04 +04:00
Siavash Sameni	766c9df442	feat(dred): continuous DRED tuning, PMTUD, extended Opus6k window - DredTuner: maps live network metrics (loss/RTT/jitter) to continuous DRED duration every ~500ms instead of discrete tier-locked values. Includes jitter-spike detection for pre-emptive Starlink-style boost. - Opus6k DRED extended from 500ms to 1040ms (max libopus 1.5 supports) - PMTUD: quinn MtuDiscoveryConfig with upper_bound=1452, 300s interval - TrunkedForwarder respects discovered MTU (was hard-coded 1200) - QuinnPathSnapshot exposes quinn internal stats + discovered MTU - AudioEncoder trait: set_expected_loss() + set_dred_duration() methods - PathMonitor: sliding-window jitter variance for spike detection - Integrated into both Android and desktop send tasks in engine.rs - 14 new tests (10 tuner unit + 4 encoder integration) - Updated ARCHITECTURE.md, PROGRESS.md, PRD-dred-integration, PRD-mtu Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 19:38:37 +04:00
Siavash Sameni	24cc74d93c	fix(audio): clear BT SCO communication device on call end Without clearCommunicationDevice(), the BT headset stays locked in SCO mode after the call. Media playback (video, music) can't route to BT A2DP, requiring a device reboot to restore normal audio. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 17:40:44 +04:00
Siavash Sameni	114d69e488	fix: use tracing::warn! instead of bare warn! in engine.rs Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 17:31:12 +04:00
Siavash Sameni	15c237ceea	fix(audio): defer MODE_IN_COMMUNICATION to call start, restore on end Root cause: MainActivity set MODE_IN_COMMUNICATION at app launch, hijacking system audio routing immediately — BT A2DP music dropped to earpiece, and the pre-existing communication mode confused subsequent setCommunicationDevice calls for BT SCO. Fix: MainActivity now only sets volumes. MODE_IN_COMMUNICATION is set via JNI right before Oboe audio_start() in CallEngine, and MODE_NORMAL is restored after audio_stop() when the call ends. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 17:29:59 +04:00
Siavash Sameni	29cd23fe39	fix(p2p): connection cleanup — 4 fixes for stale/dead connections PRD 4: Disable IPv6 direct dial/accept temporarily. IPv6 QUIC handshakes succeed but connections die immediately on datagram send ("connection lost"). IPv4 candidates work reliably. IPv6 candidates still gathered but filtered at dial time. PRD 1: Close losing transport after Phase 6 negotiation. The non-selected transport now gets an explicit QUIC close frame instead of silently dropping after 30s idle timeout. Prevents phantom connections from polluting future accept() calls. PRD 2: Harden accept loop with max 3 stale retries. Stale connections are explicitly closed (conn.close) and counted. After 3 stale connections, the accept loop aborts instead of spinning until the race timeout. PRD 3: Resource cleanup — close old IPv6 endpoint before creating a new one in place_call/answer_call. Add Drop impl to CallEngine so tasks are signalled to stop on ungraceful shutdown. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 15:11:50 +04:00
Siavash Sameni	40955bd11c	debug(media): add connection diagnostics for direct P2P drops When direct P2P calls show 100% datagram drops, we need to know WHY send_media() fails. This commit adds: - Remote address + stable_id logging on A-role accept and D-role dial success (dual_path.rs) — tells us which candidate won - Remote address + max_datagram_size on engine transport init — verifies datagrams are negotiated - last_send_err in send heartbeat — captures the actual error from send_datagram() failures - QuinnTransport::remote_address() helper Also fixes UI badge: was looking for wrong event name ("dual_path_race_won" → "path_negotiated"). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 13:29:58 +04:00
Siavash Sameni	9f2ff6a6ec	fix(android-audio): Fix D+C — stop+prime cycle on every call start Addresses the first-join no-audio regression (tasks #35-37) where the Oboe playout callback fires once (cb#0) and then stops draining the ring on the Nothing Phone, causing written_samples to freeze at 7679 (ring capacity minus one burst). Second call (rejoin) always works because audio_stop tears down the streams and audio_start rebuilds them fresh. Two combined fixes: Fix D (task #37): always call audio_stop() before audio_start() at the top of CallEngine::start. On a cold launch this is a no-op (streams not yet started). On subsequent calls it guarantees a clean teardown before rebuild — the same thing rejoin does. Added a 50ms pause between stop and start to let the Android HAL release the audio session. Fix C (task #36): after audio_start(), immediately write 960 samples (20ms) of silence into the playout ring. This ensures the Oboe playout callback has data to drain on its first invocation. On devices where an empty-ring first callback causes the stream to self-pause (Nothing Phone's Qualcomm HAL), the priming data keeps the callback loop alive until real decoded audio arrives from the recv task. Together these cover the two most likely root causes: 1. Stale Oboe state from a previous audio_start that didn't clean up properly → Fix D forces a clean rebuild 2. Playout callback self-pausing on an empty ring → Fix C ensures the ring is non-empty at callback time Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 10:50:58 +04:00
Siavash Sameni	134ee3a77f	fix(engine): pass is_direct_p2p explicitly instead of deriving from is_some Critical Phase 6 bug: when the negotiation agreed on relay path but delivered the relay transport via pre_connected_transport, CallEngine saw is_some() = true → is_direct_p2p = true → skipped perform_handshake. The relay couldn't authenticate the participant → room join silently failed → recv_fr: 0, both sides sending into the void. Fix: add explicit is_direct_p2p: bool parameter to CallEngine::start (both android and desktop branches). The connect command sets it from the Phase 6 negotiation result (use_direct), not from whether pre_connected_transport is Some. Now relay-negotiated calls correctly run perform_handshake, and direct P2P calls correctly skip it. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 10:34:21 +04:00
Siavash Sameni	0a973b234b	fix(engine): import tauri::Emitter for AppHandle::emit on Android target	2026-04-12 09:29:56 +04:00
Siavash Sameni	0ccf4ed6b5	feat(call): media health watchdog — warn user when no audio arrives When a P2P direct call establishes successfully but the underlying network path dies (phone switched from WiFi to LTE mid-call, or cross-relay media forwarding isn't working), the call stays up silently with recv_fr frozen at 0. No feedback to the user. New watchdog in the Android recv task: tracks consecutive heartbeat ticks (2s each) where recv_fr hasn't advanced. After 3 ticks (6s) with no new packets, emits: - call-event { kind: "media-degraded" } — user-facing warning banner: "No audio — connection may be lost. Try hanging up and reconnecting, or switch to a different relay." - call-debug media:no_recv_timeout for the debug log If packets resume (recv_fr advances), clears the banner via: - call-event { kind: "media-recovered" } JS listener creates/removes a red-tinted banner dynamically at the top of the call screen. Banner is also cleaned up on showConnectScreen (call end). This covers: - Direct P2P that established on WiFi but died when the phone switched to LTE (stale NAT mapping, unreachable peer) - Cross-relay calls where federation media isn't forwarding (relay not upgraded, not federated, etc.) - Any other "connected but silent" scenario Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 09:18:38 +04:00
Siavash Sameni	2427630472	fix(connect): make peerLocalAddrs optional + skip handshake on direct P2P Two regressions from Phase 5.5/5.6: 1. Room connect broken: the connect Tauri command required peerLocalAddrs as a Vec<String>, but the room-join JS path doesn't pass it (only the direct-call setup handler does). Error: "invalid args 'peerLocalAddrs' for command 'connect': command connect missing required key peerLocalAddrs". Fix: change to Option<Vec<String>>, unwrap_or_default() at usage sites. Room connect works again with zero peer addrs. 2. Direct P2P call connects but then CallEngine fails with "expected CallAnswer, got Discriminant(0)". Root cause: after the dual-path race picked a direct P2P transport, CallEngine still ran perform_handshake() on it. That handshake is a relay-specific protocol — sends a CallOffer signal and waits for CallAnswer back. On a direct QUIC connection to a phone, there's nobody running accept_handshake, so the handshake reads garbage from the peer's first media packet and errors. Fix: track is_direct_p2p = pre_connected_transport.is_some() and skip perform_handshake when true. The direct connection is already TLS-encrypted by QUIC, and both peers' identities were verified through the signal channel (DirectCallOffer/Answer carry identity_pub + ephemeral_pub + signature). Both android and desktop branches updated. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 08:09:32 +04:00
Siavash Sameni	16793be36f	fix(p2p): Phase 5.6 — direct-path head start + hangup propagation + media debug events Three fixes from a field-test log where same-LAN calls were still losing the dual-path race to the relay path, peers were getting stuck on an empty call screen when the other side hung up, and 1-way audio was hard to diagnose because the GUI debug log had no media-level events. ## 1. Direct-path 500ms head start (dual_path.rs) The race was resolving in ~105ms with Relay winning even when both phones were on the same MikroTik LAN with valid IPv6 host candidates. Root cause: the relay dial is a plain outbound QUIC connect that completes in whatever the client→relay RTT is (~100ms), while the direct path needs the PEER to also process its CallSetup, spin up its own race, and complete at least one LAN dial back to us. That cross-client sequence reliably takes longer than 100ms, so relay always won. Fix: delay the relay_fut with `tokio::time::sleep(500ms)` before starting its connect. Same-LAN direct dials complete in 30-50ms typically, so the head start gives direct plenty of time to win cleanly. Users on setups where direct genuinely can't work (LTE-to-LTE cross-carrier) pay 500ms extra on the relay fallback, which is invisible for a call setup. ## 2. Hangup propagation via a new hangup_call command (lib.rs + main.ts) The hangup button was calling `disconnect` which stopped the local media engine but never sent a SignalMessage::Hangup to the relay. The peer never got notified and was stuck on the call screen with silent audio. My earlier fix (commit `e75b045`) only handled the RECEIVE side — auto-dismiss call screen on recv:Hangup — but the SEND side was still missing. New Tauri command `hangup_call`: 1. Acquire state.signal.lock(), send SignalMessage::Hangup over the signal transport (best-effort; log + continue if signal is down) 2. 
Acquire state.engine.lock(), stop the CallEngine JS hangupBtn click handler now calls hangup_call with a fallback to raw disconnect if the command is missing (older builds). ## 3. Media debug events (engine.rs + lib.rs) Threaded tauri::AppHandle into CallEngine::start so the send/ recv tasks can emit call-debug events when the user has debug logs enabled. Added on the Android branch (desktop branch accepts the arg for API symmetry but doesn't emit yet): - media:first_send — emitted when the first encoded frame is handed to the transport. Useful for 1-way audio diagnosis: if this fires on side A but side B never sees media:first_recv, A's outbound is broken. - media:first_recv — emitted when the first packet from the peer arrives. Mirror of first_send. - media:send_heartbeat — every 2s with frames_sent, last_rms, last_pkt_bytes, short_reads, drops. A stalled last_rms (== 0) tells you the mic isn't producing samples; a frozen frames_sent tells you the encode pipeline hung. - media:recv_heartbeat — every 2s with recv_fr, decoded_frames, last_written, written_samples, decode_errs, codec. Mirror invariants for the inbound direction. All four are gated by `call_debug_logs_enabled()` via `emit_call_debug`, so they only show up in the GUI log when the user has the Call Flow Debug Logs checkbox on. Tracing::info! still runs unconditionally so logcat (adb) keeps its copy regardless. The `emit_call_debug` fn in lib.rs is now `pub(crate)` so engine.rs can call it via `crate::emit_call_debug`. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 07:55:41 +04:00
Siavash Sameni	59ce52f8e8	feat(p2p): Phase 3.5 dual-path QUIC race + GUI call-flow debug logs Two features in one commit because they ship and test together: Phase 3.5 closes the hole-punching loop and the call-flow debug logs give the user live visibility into every step of a call so real-hardware testing of the new P2P path is debuggable. ## Phase 3.5 — dual-path QUIC connect race Completes the hole-punching work Phase 3 scaffolded. On receiving a CallSetup with peer_direct_addr, the client now actually races a direct QUIC handshake against the relay dial and uses whichever completes first. Symmetric role assignment avoids the two-conns- per-call problem: - Both peers compare `own_reflex_addr` vs `peer_reflex_addr` lexicographically. - Smaller addr → Acceptor (A-role): builds a server-capable dual endpoint, awaits an incoming QUIC session. Does NOT dial. - Larger addr → Dialer (D-role): builds a client-only endpoint, dials the peer's addr with `call-<id>` SNI. Does NOT listen. - Both sides always dial the relay in parallel as fallback. - `tokio::select!` with `biased` preference for direct, `tokio::pin!` so each branch can await the losing opposite as fallback. - Direct timeout 2s, relay fallback timeout 5s (so 7s worst case from CallSetup to "no media path" error). New crate module `wzp_client::dual_path::{race, WinningPath}` (moved here from desktop/src-tauri so it's testable from a workspace test). `determine_role` in `wzp_client::reflect` is pure-function and unit-tested. ### CallEngine integration - New `pre_connected_transport: Option<Arc<QuinnTransport>>` arg on both android + desktop `CallEngine::start` branches. Skips the internal wzp_transport::connect step when Some. Backward- compat: None keeps Phase 0 relay-only behavior. - `connect` Tauri command reads own_reflex_addr from SignalState, computes role, runs the race, passes the winning transport into CallEngine. 
If ANY input is missing (no peer addr, no own addr, equal addrs), falls back to classic relay path — identical to pre-Phase-3.5 behavior. ### Tests (9 new, all passing) - 6 unit tests for `determine_role` truth table in `wzp-client/src/reflect.rs` (smaller=Acceptor, larger=Dialer, port-only diff, equal, missing-side, symmetry) - 3 integration tests in `crates/wzp-client/tests/dual_path.rs`: * `dual_path_direct_wins_on_loopback` — two-endpoint test rig, Dialer wins direct path vs loopback mock relay * `dual_path_relay_wins_when_direct_is_dead` — dead peer port, 2s direct timeout, relay fallback wins * `dual_path_errors_cleanly_when_both_paths_dead` — <10s error, no hang ## GUI call-flow debug logs Runtime-toggled structured events at every step of a call so the user can see where a call progressed or stalled on real hardware. Modeled on the existing DRED_VERBOSE_LOGS pattern. ### Rust side - `static CALL_DEBUG_LOGS: AtomicBool` + `emit_call_debug(&app, step, details)` helper. Always logs via `tracing::info!` (logcat always has a copy); GUI Tauri `call-debug-log` event only fires when the flag is on. - Tauri commands `set_call_debug_logs` / `get_call_debug_logs`. ### Instrumented steps (24 emit_call_debug sites) - `register_signal`: start, identity loaded, endpoint created, connect failed/ok, RegisterPresence sent, ack received/failed, recv loop spawning - Recv loop: CallRinging, DirectCallOffer (w/ caller_reflexive_addr), DirectCallAnswer (w/ callee_reflexive_addr), CallSetup (w/ peer_direct_addr), Hangup - `place_call`: start, reflect query start/ok/none, offer sent, send failed - `answer_call`: start, reflect query start/ok/none or privacy skip, answer sent, send failed - `connect`: start, dual_path_race_start (w/ role), won (w/ path), failed, skipped (w/ reasons), call_engine_starting/ started/failed ### JS side - New `callDebugLogs: boolean` field on Settings type. 
- Boot-time hydrate of the Rust flag from localStorage so the choice survives restarts (like `dredDebugLogs`). - Settings panel: new "Call flow debug logs" checkbox alongside the DRED toggle. - New "Call Debug Log" section that ONLY shows when the flag is on. Rolling in-memory buffer of the last 200 events, rendered as monospace `HH:MM:SS.mmm step {details}` lines with auto- scroll and a Clear button. - `listen("call-debug-log", ...)` subscribed at app startup, appends to the buffer, re-renders on every event. Full workspace test goes from 404 → 413 passing. Clippy clean on touched crates. PRD: .taskmaster/docs/prd_phase35_dual_path_race.txt Tasks: 61-69 all completed Next: APK + desktop build carrying everything — Phase 2 NAT detect, Phase 3 advertising, Phase 3.5 dual-path + call debug logs, plus the earlier Android first-join diagnostics — so the user can validate the P2P path on real hardware with live per-step visibility into where any failures happen. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-11 14:06:44 +04:00
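The symmetric role assignment from the Phase 3.5 commit can be illustrated with a small pure function. This is a sketch under assumptions: the real `determine_role` signature is not shown in the log, and comparing the addresses as plain strings is one possible reading of "lexicographically". The `Role` enum here is hypothetical.

```rust
use std::cmp::Ordering;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    /// Smaller address: listens for the incoming direct session, does not dial.
    Acceptor,
    /// Larger address: dials the peer, does not listen.
    Dialer,
}

/// Returns None when either address is missing or both are equal,
/// in which case the caller would fall back to the relay-only path.
pub fn determine_role(
    own_reflex_addr: Option<&str>,
    peer_reflex_addr: Option<&str>,
) -> Option<Role> {
    let (own, peer) = (own_reflex_addr?, peer_reflex_addr?);
    match own.cmp(peer) {
        Ordering::Less => Some(Role::Acceptor),
        Ordering::Greater => Some(Role::Dialer),
        Ordering::Equal => None,
    }
}
```

Because both peers evaluate the same comparison with the arguments swapped, they always land on opposite roles, which is what avoids two direct connections per call.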
Siavash Sameni	7e7968b2f9	diag(android-engine): first-join no-audio ordering instrumentation Adds a single call_t0 = Instant::now() at the top of the Android CallEngine::start path, threaded through send + recv tasks as send_t0 / recv_t0, and tags the following milestones with t_ms_since_call_start so we can build a clean side-by-side log of first-call vs rejoin: 1. QUIC connection established 2. handshake complete 3. wzp-native audio_start returned (+ how long audio_start itself took) 4. send task spawned 5. send: first full capture frame read (+ short_reads_before count) 6. send: first non-zero capture RMS 7. recv task spawned 8. recv: first media packet received 9. recv: first successful decode 10. recv: first playout-ring write Combined with the existing C++-side cb#0 logs in crates/wzp-native/cpp/oboe_bridge.cpp ("capture cb#0", "playout cb#0") this gives us full-pipeline ordering with no native-side changes needed. PRD: .taskmaster/docs/prd_android_first_join_no_audio.txt Task: 32 (first task in the chain — diagnostics before any fix attempts so we know which of the 5 suspect causes is real). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-11 10:00:20 +04:00
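The `call_t0` threading in this commit amounts to one shared start instant plus a per-milestone elapsed value. A minimal sketch, assuming a hypothetical `CallClock` wrapper (the commit passes a bare `Instant` into the tasks as `send_t0` / `recv_t0`):

```rust
use std::time::Instant;

/// Copyable clock so the same start instant can be handed to send and recv tasks.
#[derive(Debug, Clone, Copy)]
pub struct CallClock {
    t0: Instant,
}

impl CallClock {
    pub fn start() -> Self {
        Self { t0: Instant::now() }
    }

    /// Formats a milestone line tagged with milliseconds since call start.
    pub fn milestone(&self, name: &str) -> String {
        let t_ms = self.t0.elapsed().as_millis();
        format!("{name} t_ms_since_call_start={t_ms}")
    }
}
```

Since every milestone is measured against the same instant, a first-call log and a rejoin log can be compared column by column.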
Siavash Sameni	578ff8cff4	feat(debug): GUI toggle for DRED verbose logs + macOS mic permission DRED verbose logs (off by default — keeps logcat clean in normal use): - wzp-codec: DRED_VERBOSE_LOGS atomic flag with dred_verbose_logs() / set_dred_verbose_logs() helpers - opus_enc: gate "DRED enabled" + libopus version logs behind the flag - desktop/src-tauri/engine.rs: gate DredRecvState parse log, reconstruction log, classical PLC log, and DRED-counter fields in the Android recv heartbeat (non-verbose path still logs basic recv stats) - Tauri commands set_dred_verbose_logs / get_dred_verbose_logs - Settings panel gets a "DRED debug logs (verbose, dev only)" checkbox; preference persists in wzp-settings localStorage and is pushed to Rust on save and on app boot macOS mic permission: - Add desktop/src-tauri/Info.plist with NSMicrophoneUsageDescription. Without it, modern macOS silently denies CoreAudio capture for ad-hoc-signed Tauri builds — capture starts but every callback hands you zeros. Symptom: phones could not hear desktop client, desktop could still hear phones (playout has no TCC gate). The Tauri 2 bundler auto-merges this file into WarzonePhone.app's Contents/Info.plist on the next build, so first launch will pop the standard mic prompt. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-11 09:48:32 +04:00
Siavash Sameni	16890576fb	feat(observability): logcat-visible DRED proof of life on Android Adds enough INFO-level logging that an opus-DRED-v2 APK on Android can be verified end-to-end by reading logcat alone — no debugger, no Prometheus, no telemetry pipeline required. Three observation points: 1. Encoder construction (opus_enc.rs) - Bumped the "DRED enabled" log from debug! to info! so the per-call DRED config is in logcat by default. Each call's first OpusEncoder construction logs codec, dred_frames, dred_ms, loss_floor_pct. - Added a one-shot static OnceLock that logs `opusic_c::version()` the first time an OpusEncoder is built in the process. This is the smoking gun for "is the new libopus actually loaded" — pre- Phase-0 audiopus shipped libopus 1.3 with no DRED, post-Phase-0 should print 1.5.2 here. 2. DRED state ingest (DredRecvState::ingest_opus in desktop/src-tauri/src/engine.rs) - First successful parse on a call logs immediately so we can see "DRED is on the wire" in logcat. - Subsequent parses sample every 100th to confirm steady-state samples_available without drowning the log. - New parses_total / parses_with_data counters track the parse rate vs the success rate (a packet without DRED in it returns `available == 0`, so a low ratio means the encoder isn't emitting DRED bytes). 3. DRED reconstruction events (DredRecvState::fill_gap_to) - Every DRED reconstruction logs at INFO with missing_seq, anchor_seq, offset_samples, offset_ms, samples_available, gap_size, and the running total. These events are rare on a clean network and we want to know exactly which gap was filled. - First three classical PLC fills + every 50th thereafter log so we can see when DRED couldn't cover a gap (offset out of range, no good state, or reconstruct error). 4. 
Recv heartbeat (Android start() in engine.rs) - Existing 2-second heartbeat now includes dred_recv, classical_plc, dred_parses_with_data, dred_parses_total so a steady-state call shows the cumulative counters in logcat without parsing. How to verify on a real call: adb logcat -s 'RustStdoutStderr:*' | grep -i 'dred\|libopus version' Expected output sequence on a successful Opus call: - "linked libopus version libopus_version=libopus 1.5.2-..." (once per process) - "opus encoder: DRED enabled codec=Opus24k dred_frames=20 dred_ms=200 loss_floor_pct=15" (per call) - "DRED state parsed from Opus packet seq=N samples_available=4560 ms=95 ..." (after first DRED-bearing packet) - "recv heartbeat (android) ... dred_recv=0 classical_plc=0 dred_parses_with_data=58 dred_parses_total=58" (every 2s) If you see "linked libopus version libopus 1.3" — the FFI swap didn't take. If dred_parses_with_data stays at 0 while dred_parses_total climbs — the sender isn't emitting DRED (check the encoder's loss floor and the receiver's libopus version). If gaps trigger "classical PLC fill" instead of "DRED reconstruction fired" — DRED state coverage is too small for the observed loss pattern, and the loss floor or DRED duration policy needs tuning. Verification: - cargo check -p wzp-codec -p wzp-client: 0 errors - cargo check -p wzp-desktop: 0 Rust errors (only the pre-existing tauri::generate_context!() proc macro panic on missing ../dist which fires at host check time, irrelevant on the remote build) - cargo test -p wzp-codec --lib: 69 passing (no regressions) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-11 08:58:03 +04:00
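The log sampling policy in this commit (log the first occurrence or first few, then every Nth) reduces to a one-line predicate. A sketch with a hypothetical helper name; the actual code may track its counters differently:

```rust
/// Decides whether the `count`-th event (1-based) should be logged:
/// the first `first_n` events always are, afterwards only every `every`-th.
pub fn should_log(count: u64, first_n: u64, every: u64) -> bool {
    count <= first_n || (every != 0 && count % every == 0)
}
```

Under this reading, the DRED parse log corresponds to `should_log(n, 1, 100)` and the classical PLC log to `should_log(n, 3, 50)`.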
Siavash Sameni	dfbe21fe6e	feat(tauri-engine): Phase 3b/3c re-port — DRED reconstruction on the live Tauri mobile engine The original Phase 3b landed on wzp-client/CallDecoder and Phase 3c landed on wzp-android/src/engine.rs. Both of those are DEAD CODE on feat/desktop-audio-rewrite: the legacy Kotlin app in android/app/ is not built by the Tauri mobile pipeline, and the Tauri engine bypasses CallDecoder by calling wzp_codec::create_decoder directly. The live Android call engine lives at desktop/src-tauri/src/engine.rs with two `pub async fn start<F>` functions — one cfg-gated on Android (Oboe via wzp-native) and one for desktop (CPAL). Both recv tasks were using `let mut decoder = wzp_codec::create_decoder(...)` which returns `Box<dyn AudioDecoder>` and doesn't expose the inherent `reconstruct_from_dred` method. Changes: New helper struct `DredRecvState` at the top of engine.rs, wrapping: - DredDecoderHandle (libopus DRED side-channel parser) - DredState scratch (for parse_into) - DredState last_good (cached valid state, swapped on success) - last_good_seq: Option<u16> (DRED anchor sequence) - expected_seq: Option<u16> (for gap detection) - dred_reconstructions / classical_plc_invocations counters With three methods: - ingest_opus(seq, payload): parse DRED, swap on success - fill_gap_to(decoder, current_seq, frame_samples, scratch, emit): detect gap back from expected_seq, reconstruct each missing frame via DRED if state covers it, fall through to classical decoder.decode_lost() when it doesn't. Calls emit() once per frame with a slice the caller uses for AGC + playout write. - reset_on_profile_switch(): invalidate tracking when codec changes Both recv tasks (Android @ ~line 297 and desktop @ ~line 907): - Decoder type changed from `Box<dyn AudioDecoder>` via `wzp_codec::create_decoder` to concrete `AdaptiveDecoder::new(profile)` so we can call the inherent reconstruct_from_dred method. 
- Added `use wzp_proto::traits::AudioDecoder;` at the top of engine.rs to bring decode/decode_lost/set_profile trait methods into scope on the concrete type. - New `current_profile` local alongside `current_codec` (used for frame_duration lookups that drive the DRED sample offset math). - On codec/profile switch, call dred_recv.reset_on_profile_switch() because the cached DRED state is tied to the old profile's frame rate. - For each arriving Opus source packet: 1. dred_recv.ingest_opus(seq, payload) — parse DRED 2. dred_recv.fill_gap_to(...) — detect gap and reconstruct missing frames, each emitted through a closure that does AGC + playout write (wzp_native on Android, playout_ring on desktop) 3. Normal decoder.decode() fallthrough for the current packet (unchanged) - Codec2 packets skip the DRED path entirely (is_opus() gate) — libopus can't reconstruct Codec2 audio. Ordering invariant: gap reconstruction writes to playout BEFORE the current packet's decoded audio, preserving temporal order since the playout ring is FIFO. The closure captures the `spk_muted` flag once before the gap loop to avoid mid-gap-fill state changes. Kept `crates/wzp-android/src/engine.rs` and `crates/wzp-android/src/ stats.rs` from the earlier Phase 3c commit as-is — they're dead code on feat/desktop-audio-rewrite but harmless, and deleting them would diverge this branch from an independently-useful intermediate state. The old Phase 3c commit (`505a834`) stays as historical reference. Verification: - cargo check -p wzp-codec -p wzp-client -p wzp-relay: 0 errors - cargo check -p wzp-desktop: only pre-existing `tauri::generate_context!()` panic on missing ../dist (Vite output not built on host) — no Rust compile errors from our changes - cargo test -p wzp-codec --lib: 69 passing (unchanged) - cargo test -p wzp-client --lib: 35 passing + 1 ignored (unchanged) Next: scripts/build-tauri-android.sh to get the actual Tauri APK — NOT build-and-notify.sh which builds the dead legacy android/app. 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-10 21:31:09 +04:00
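The gap detection behind `fill_gap_to` can be sketched with wrapping `u16` arithmetic. Everything below is illustrative: `missing_seqs`, `plan_fill`, the `max_gap` cap, and the offset convention (frames back from the anchor times `frame_samples`) are assumptions, since the commit message does not show the real math.

```rust
/// Sequence numbers lost between `expected` and `current` (current excluded),
/// handling u16 wrap-around. Late or absurdly large gaps yield an empty list.
pub fn missing_seqs(expected: u16, current: u16, max_gap: u16) -> Vec<u16> {
    let gap = current.wrapping_sub(expected);
    if gap == 0 || gap > max_gap {
        return Vec::new();
    }
    (0..gap).map(|i| expected.wrapping_add(i)).collect()
}

#[derive(Debug, PartialEq, Eq)]
pub enum Fill {
    /// Reconstruct from cached DRED state at this sample offset.
    Dred { offset_samples: u32 },
    /// DRED state absent or does not reach back far enough.
    ClassicalPlc,
}

/// Chooses DRED when the cached state (anchored at `anchor_seq`) covers the
/// missing frame, otherwise falls through to classical PLC.
pub fn plan_fill(
    missing: u16,
    anchor_seq: Option<u16>,
    frame_samples: u32,
    samples_available: u32,
) -> Fill {
    let Some(anchor) = anchor_seq else {
        return Fill::ClassicalPlc;
    };
    let frames_back = u32::from(anchor.wrapping_sub(missing));
    let offset = frames_back.saturating_mul(frame_samples);
    // frames_back >= 0x8000 means `missing` is actually ahead of the anchor.
    if frames_back > 0 && frames_back < 0x8000 && offset <= samples_available {
        Fill::Dred { offset_samples: offset }
    } else {
        Fill::ClassicalPlc
    }
}
```

With 960-sample frames and `samples_available=4560` (the value quoted in the logcat example from the earlier observability commit), this sketch would cover up to four frames behind the anchor and hand anything older to classical PLC.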
Siavash Sameni	2fd94651e4	fix(desktop): direct calls used wrong identity file — mac identity mismatch The non-Android branch of CallEngine::start loaded the seed from \$HOME/.wzp/identity directly, while register_signal in lib.rs goes through the shared load_or_create_seed() helper which resolves via APP_DATA_DIR → Tauri's app_data_dir(). On macOS those are two completely different files: register_signal → ~/Library/Application Support/com.wzp.desktop/.wzp/identity CallEngine::start (old) → ~/.wzp/identity On a fresh install they end up holding two different random seeds. Register and CallEngine then derive two different fingerprints from those seeds, and when a direct call comes in the relay routes it to "you" under the register_signal fingerprint, but once CallEngine tries to join the call-* room it advertises a DIFFERENT fingerprint — which fails the call_registry ACL check on the relay side (only the two authorised participants of a call can join its room). Silent hang, the call never completes. Android hit this bug earlier in the week and was fixed by switching its CallEngine::start branch to `crate::load_or_create_seed()`. Backport the same single-line change to the desktop branch so both platforms share one identity source of truth. Also bring the desktop branch up to parity with the android branch on diagnostic logging: - log CallEngine::start entry with relay/room/alias/quality/has_reuse - log endpoint.local_addr on reuse / create - log "QUIC connection established, performing handshake" between connect() and perform_handshake() so a hang at either step is immediately localisable - map_err all three potential failure points (create_endpoint, connect, perform_handshake) to an explicit error! trace	2026-04-10 12:15:23 +04:00
Siavash Sameni	cfa9ff67cf	fix(android-audio): VoIP mode + speakerphone + debug PCM recorder Build `96be740` logs proved the entire software pipeline is healthy: capture heartbeat: calls=1100 to_write=960 full_drops=0 total_written=1056000 recv heartbeat: decoded_frames=1035 last_written=960 decode_errs=0 recv decoded PCM: range=[-13564..9244] rms=8044 (real audio) playout WRITE: in_len=960 written=960 rms=2318 (real audio into the ring) playout heartbeat: calls=1100 nonempty=1099 total_played_real=1055040 1055040 samples / 48000 Hz = 22s — exactly matches wall-clock elapsed, meaning Oboe IS calling our playout callback at the expected rate and WE ARE handing it real PCM every 20ms. User still heard nothing. Ergo Oboe accepted the PCM and routed it to a silent output. Two fixes: 1) MainActivity.kt: switch to MODE_IN_COMMUNICATION + speakerphone ON right after permissions are granted, and crank STREAM_VOICE_CALL to max. Without this, an Oboe Usage::VoiceCommunication stream gets opened, the OS creates a real AAudio pipeline, the callback fires on schedule — and audio goes to either the earpiece at muted volume or a "call not active" dead end. Logs the audio mode + volume levels before and after the switch so we can confirm the state change in logcat next run. 2) oboe_bridge.cpp: revert Usage::Media → VoiceCommunication (the mode that matches MODE_IN_COMMUNICATION), pin the audio API to AAudio explicitly instead of letting Oboe fall back to OpenSLES (which has its own silent-drop failure modes on some devices), and add getState + getXRunCount to the playout heartbeat so we'll see silent stream disconnects instead of reading zeros forever. 
3) engine.rs recv task: dump the first ~10s of post-AGC decoded PCM to `<app_data_dir>/decoded.pcm` as raw i16 LE so we can adb pull it and play it back locally: adb shell run-as com.wzp.desktop cat .wzp/decoded.pcm > decoded.pcm ffmpeg -f s16le -ar 48000 -ac 1 -i decoded.pcm decoded.wav This divorces "is our decoder actually producing audible audio" from "is Android's audio stack playing it". If the recorded WAV sounds correct when played on a laptop, the decoder is fine and 100% of the remaining bug surface is AudioManager / Oboe routing. 4) engine.rs: also log when spk_muted=true blocks the write. User reported the Speaker button in the UI has inconsistent semantics between desktop and android — adding this log rules out the accidental "first click muted playback" theory for good.	2026-04-09 21:24:26 +04:00
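The debug recorder in item 3 writes decoded PCM as raw little-endian `i16`, which is what `ffmpeg -f s16le` expects. A minimal sketch of that serialisation, plus the samples-to-seconds arithmetic the commit uses to sanity-check the playout counter; both function names are hypothetical:

```rust
/// Serialises mono PCM samples as raw signed 16-bit little-endian bytes.
pub fn pcm_to_s16le(samples: &[i16]) -> Vec<u8> {
    samples.iter().flat_map(|s| s.to_le_bytes()).collect()
}

/// Converts a sample count into seconds of audio at the given rate.
pub fn duration_secs(total_samples: u64, sample_rate_hz: u32) -> f64 {
    total_samples as f64 / f64::from(sample_rate_hz)
}
```

For the heartbeat quoted in the commit, `duration_secs(1_055_040, 48_000)` comes out at about 22 seconds, in line with the wall-clock figure the author reports.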
Siavash Sameni	96be740fd9	diag(android-audio): aggressive logging across the whole Oboe pipeline User confirmed: mac hears android, android does not hear mac. So Oboe capture works end-to-end but Oboe playout on Android silently drops audio even though QUIC forwards the packets. Archaeology on the legacy wzp-android crate also revealed that the "last known good" Android audio path NEVER used Oboe in production — it used Kotlin AudioRecord + AudioTrack via JNI, and cpp/oboe_bridge.cpp was dead code. So every time we've "tested" Oboe end-to-end this week was the first production use, and any of its config knobs could be the bug. Instrumenting every stage of the pipeline so one smoke-test log dump can isolate the layer at fault: C++ (oboe_bridge.cpp) - Log the ACTUAL stream parameters after openStream for both capture and playout (sample rate, channels, format, framesPerBurst, framesPerDataCallback, bufferCapacityInFrames, sharing, perf mode). Oboe may silently override values we requested — e.g. if we ask for 48kHz mono but the device gives us 44.1kHz stereo our 960-sample frames are the wrong duration and the pipeline drifts. - Capture callback: on cb#0 log sample range+RMS of the first frame to prove we get real mic data (not zeros). Every 50 callbacks (~1s at 20ms burst) log calls, numFrames, ring available_write, bytes actually written, ring_full_drops, total_written. - Playout callback: on cb#0 log numFrames + ring state. On the FIRST non-empty read log sample range+RMS so we can tell if the samples coming out of the ring are real audio or zeros. Every 50 callbacks log calls, nonempty count, numFrames, ring available_read, underrun_frames, total_played_real. 
Rust wzp-native (src/lib.rs) - wzp_native_audio_write_playout now logs the first 3 writes and then every 50th: in_len, written, sample range, RMS, ring write/read cursors before, available_read and available_write after. Reveals ring-overflow and whether the engine is actually handing us audio. - Minimal android logcat shim via __android_log_write extern — no new crate dependency. - AudioBackend grows a `playout_write_log_count` AtomicU64 to gate the write-side log throttle. Rust engine.rs (android branch) - Recv task: log sample range + RMS for the first 3 decoded PCM frames and then every 100th. Reveals whether decoder.decode is producing real audio or silent buffers. - Recv task: if audio_write_playout returns fewer samples than we handed it (partial write → ring nearly full) warn about it in the first 10 frames. - Recv heartbeat every 2s: recv_fr, decoded_frames, last_decode_n, last_written, written_samples, decode_errs, codec. Expected flow in a healthy log: capture cb#0: numFrames=960 range=[-1200..900] rms=180 ← mic OK capture stream opened: actualSR=48000 Ch=1 ... ← no override playout stream opened: actualSR=48000 Ch=1 ... CallEngine::start invoked ... → connected → audio started recv: first media packet received ... recv: decoded PCM sample range decoded_frames=1 range=[-300..250] rms=92 playout WRITE #0: in_len=960 written=960 range=[-300..250] rms=92 playout FIRST nonempty read: to_read=960 range=[-300..250] rms=92 playout heartbeat: calls=50 nonempty=50 underrun=0 ... recv heartbeat: decoded_frames=100 last_written=960 ... If any of those are missing/zero we know the exact stage to fix.	2026-04-09 21:13:29 +04:00
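The "sample range + RMS" figures that appear throughout these logs are cheap to compute per frame. A sketch of one way to do it; `pcm_stats` is a hypothetical helper, not a function named in the commit:

```rust
/// Returns (min, max, rms) for a PCM frame, or None for an empty frame.
/// An RMS of 0 over a whole frame means the buffer is pure silence.
pub fn pcm_stats(frame: &[i16]) -> Option<(i16, i16, f64)> {
    let min = *frame.iter().min()?;
    let max = *frame.iter().max()?;
    let sum_sq: f64 = frame
        .iter()
        .map(|&s| {
            let v = f64::from(s);
            v * v
        })
        .sum();
    Some((min, max, (sum_sq / frame.len() as f64).sqrt()))
}
```

Squaring in `f64` avoids the overflow that squaring `i16` values directly would cause.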
Siavash Sameni	8c4d640f89	fix(android): playout Usage::Media + relay CallSetup advertises real IP Three real bugs, one smoke-test session's worth of progress. 1. RELAY: wrong advertised addr in CallSetup The direct-call CallSetup computed `relay_addr = addr.ip()` where `addr = connection.remote_address()` — i.e. the CLIENT'S IP, not the relay's. So the relay was telling both parties "the call room is at the answerer's IP:4433", which meant each client dialed either the other client (no server listening) or themselves. Both endpoint.connect calls hung forever and the call never happened. Fix: compute the relay's own advertised IP once at startup. If the listen addr is 0.0.0.0, probe the primary outbound interface via the classic UDP-bind-and-connect(8.8.8.8:80) trick to discover the LAN IP the OS would use to reach external hosts. Thread the resulting advertised_addr_str into the CallSetup sender for both parties. 2. RELAY: accept loop serialized QUIC handshakes Previously the main accept loop called `wzp_transport::accept` which did both `endpoint.accept().await` AND `incoming.await` (the server- side QUIC handshake). A single slow handshake therefore blocked every subsequent client from being accepted. Unroll the helper here and move `incoming.await` into the per-connection spawned task, so every handshake runs in parallel. Also log "accept queue: new Incoming", "QUIC handshake complete", and "QUIC handshake failed" so we can tell immediately whether a client's packets are reaching the relay at all. 3. ANDROID: playout was routed to the silent in-call stream The Oboe playout stream was configured with Usage::VoiceCommunication, which routes to the Android in-call earpiece stream. 
That stream is silent unless the Activity has called AudioManager.setMode( IN_COMMUNICATION) and, even then, only the earpiece/BT headset get audio (not the loud speaker). Result: android→mac calls worked because mac had a normal media output, but mac→android calls were silent even though packets flowed through the relay just fine. Switch to Usage::Media + ContentType::Speech so Oboe routes to the loud speaker and uses the media volume slider. A later polish step will wire setMode + setSpeakerphoneOn from MainActivity.kt so we can go back to VoiceCommunication for AEC and proximity-sensor routing. Plus: heartbeat tracing every 2s in the send/recv tasks — frames_sent, last_rms, last_pkt_bytes, short_reads on the send side; decoded_frames, last_decode_n, last_written, decode_errs on the recv side. Will make the next "no sound" regression trivial to localize.	2026-04-09 20:55:10 +04:00
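The UDP-bind-and-connect trick from fix 1 works because connecting a UDP socket sends no packets; it only makes the OS choose a route and therefore a local address. A sketch using `std::net`, with hypothetical function names and the probe injected as a closure so the fallback logic can be exercised without a network:

```rust
use std::net::{IpAddr, SocketAddr, UdpSocket};

/// Asks the OS which local IP it would use to reach an external host.
/// No datagram is sent; returns None if there is no usable route.
pub fn probe_outbound_ip() -> Option<IpAddr> {
    let sock = UdpSocket::bind("0.0.0.0:0").ok()?;
    sock.connect("8.8.8.8:80").ok()?;
    sock.local_addr().ok().map(|a| a.ip())
}

/// Picks the address to advertise: the listen address itself unless it is a
/// wildcard, in which case the probed IP is combined with the listen port.
pub fn advertised_addr(
    listen: SocketAddr,
    probe: impl Fn() -> Option<IpAddr>,
) -> SocketAddr {
    if !listen.ip().is_unspecified() {
        return listen;
    }
    match probe() {
        Some(ip) => SocketAddr::new(ip, listen.port()),
        None => listen, // nothing better known; caller may want to warn
    }
}
```

In a relay this would be called once at startup as `advertised_addr(listen, probe_outbound_ip)`. Note that the probe yields the LAN address, so a relay behind NAT would still need its public address configured some other way.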
Siavash Sameni	49f101d785	fix(android): reuse signal endpoint for direct-call media connection Direct-call accept hangs forever at the QUIC handshake on Android. Logs from `d7b37a5` showed: CallEngine::start (android) invoked relay=172.16.81.172:4433 room=call-… resolved relay addr identity loaded endpoint created, dialing relay ← reached ← nothing, 90s+, no error The "connect failed" and "QUIC connection established" log lines never fire, meaning endpoint.connect_with(…).await never makes progress. Repro is 100%: SFU room join (one endpoint) works perfectly; direct call (opens a SECOND quinn::Endpoint on top of the signal one) hangs in the QUIC handshake. Creating two quinn::Endpoints on Android's AAudio-adjacent UDP stack apparently causes the second one's datagrams to never reach the relay (the server never sees the Initial packet). Rather than fight the platform, quinn is happy to multiplex multiple Connections on a single Endpoint — so we reuse the signal endpoint for the media connection. - SignalState now stores the quinn::Endpoint alongside the QuinnTransport. register_signal populates both at the same time. - CallEngine::start (both android and desktop branches) takes an Option<wzp_transport::Endpoint>. Some → reuse (direct-call path, after register_signal). None → create fresh (SFU room join path). - The connect tauri command reads state.signal.endpoint and threads it through to CallEngine::start, so the direct-call auto-connect (fired by the "setup" signal-event in main.ts) lands on the existing UDP socket. - wzp_transport re-exports quinn::Endpoint so wzp-desktop doesn't need to depend on quinn directly. - Also wraps the android connect in tokio::time::timeout(10s) so future hangs become deterministic "connect TIMED OUT" errors in logcat instead of silent deadlock. 
Same fix applies verbatim to the desktop client — the user suspects direct call is broken there too and this was likely always the cause, just never surfaced because desktop was only tested via SFU rooms.	2026-04-09 20:29:51 +04:00
Siavash Sameni	d7b37a5749	diag: tracing for direct-call signal loop + CallEngine::start stages User reports tapping "answer" on an incoming direct call does nothing visible, and suspects the same may affect desktop. The signal recv loop had no tracing at all, so we can't tell whether CallSetup is being received, whether the recv loop died silently, or whether CallEngine::start is failing between "identity loaded" and "connected to relay, handshake complete". - register_signal recv loop now logs every message type with fields (CallRinging, DirectCallOffer, DirectCallAnswer, CallSetup, Hangup, unhandled), plus a warn! on recv errors and a final warn when the loop exits. - place_call / answer_call commands log entry + success / error. The answer_call error path logs the underlying send_signal error so we can see it in logcat instead of only in the JS error toast. - CallEngine::start android branch logs relay/room/alias on entry, logs "endpoint created, dialing relay" between create_endpoint and connect, "QUIC connection established, performing handshake" between connect and perform_handshake, and promotes all three potential failures to explicit error! logs so a silent hang / error becomes visible in logcat. No functional changes — pure diagnostics. Stacks on `b35a6b7` (the Oboe stack-pointer-escape fix) so build #43 carries both.	2026-04-09 19:17:03 +04:00
Siavash Sameni	5beea7de40	phase 3(android): unify connect/disconnect/toggle_*/get_status commands Step 3 of the Tauri Android rewrite was still returning "audio backend not yet wired on Android (step 3)" because the cfg-gated Android stubs for connect/disconnect/toggle_mic/toggle_speaker/get_status were shadowing the real commands. Now that CallEngine::start() has a real Android body (phase 3, commit `fdbe502`), the gates are unnecessary. - Drop the #[cfg(not(target_os = "android"))] gates from all five engine-backed Tauri commands. - Delete the Android stub block (~50 LOC of "not connected" boilerplate). - Ungate `use engine::CallEngine;` and the AppState.engine field so both targets share the same Mutex<Option<CallEngine>>. - CallEngine::stop() now calls crate::wzp_native::audio_stop() on Android so the mic + speaker are released between calls, matching the desktop behaviour where dropping _audio_handle tears down CPAL. Direct-call flow on Android: peer sends DirectCallOffer → user accepts via answer_call → relay sends signal "setup" event → main.ts auto-invokes connect(relay, room) → CallEngine::start() runs the Android branch → wzp_native::audio_start() brings up Oboe → send/recv tasks stream PCM through the dlopen boundary.	2026-04-09 18:53:54 +04:00
Siavash Sameni	fdbe502524	phase 3(android): wire CallEngine::start to wzp-native audio FFI Replaces the Android-side CallEngine::start() stub with a real implementation that mirrors the desktop start() body but routes all PCM through the standalone wzp-native cdylib loaded at startup via libloading instead of using CPAL. - desktop/src-tauri/src/wzp_native.rs: new module with a static OnceLock<libloading::Library> + cached raw fn pointers for every symbol we need (version, hello, audio_start/stop, read_capture, write_playout, is_running, capture/playout_latency_ms). init() resolves everything once at startup; accessors return default values if init() never ran. - desktop/src-tauri/src/lib.rs: drop the inline dlopen smoke test, add `mod wzp_native;` behind target_os="android", and invoke wzp_native::init() from the Tauri setup() callback so the library is loaded + all symbols cached before any CallEngine can touch audio. - desktop/src-tauri/src/engine.rs: the Android #[cfg] branch of CallEngine::start() now does the full QUIC handshake + signal loop + Opus send/recv tasks, calling wzp_native::audio_start() / audio_read_capture() / audio_write_playout() instead of the desktop CPAL rings. SyncWrapper now holds a placeholder Box<()> on Android because the audio backend lives in a process-global singleton inside libwzp_native.so rather than being owned per-engine. Next step: build #39 on the remote docker builder and smoke-test on Pixel 6 that the Connect button in the UI successfully brings up Oboe and streams audio through the dlopen boundary.	2026-04-09 18:42:27 +04:00
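The "resolve once at startup, accessors return defaults if init() never ran" pattern from the `wzp_native.rs` module can be sketched with `std::sync::OnceLock`. This is a simplified stand-in: it uses plain Rust `fn` pointers in a hypothetical `NativeApi` struct, whereas the real module caches `extern "C"` symbols resolved through libloading.

```rust
use std::sync::OnceLock;

/// Cached entry points. Plain fn pointers here; the real module would hold
/// symbols resolved from the shared library.
pub struct NativeApi {
    pub version: fn() -> i32,
    pub is_running: fn() -> bool,
}

static API: OnceLock<NativeApi> = OnceLock::new();

/// Stores the resolved API exactly once. Returns false if it was already set.
pub fn init(api: NativeApi) -> bool {
    API.set(api).is_ok()
}

/// Accessor that degrades to a default when init() never ran.
pub fn version() -> i32 {
    API.get().map(|a| (a.version)()).unwrap_or(0)
}

/// Accessor that reports "not running" when init() never ran.
pub fn is_running() -> bool {
    API.get().map(|a| (a.is_running)()).unwrap_or(false)
}
```

Returning defaults instead of panicking keeps callers safe if a command runs before setup has finished, at the cost of making a missing `init()` easy to overlook, so logging the `None` case may be worthwhile.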


65 Commits