Commit Graph

65 Commits

Author SHA1 Message Date
Siavash Sameni
06d28a9280 fix(video): preserve annex-b mediacodec output
Some checks failed
Mirror to GitHub / mirror (push) Failing after 31s
Build Release Binaries / build-amd64 (push) Failing after 3m35s
2026-05-25 20:20:22 +04:00
Siavash Sameni
d57ebe3d2c fix(video): force h264 and trace frame pipeline
Some checks failed
Build Release Binaries / build-amd64 (push) Failing after 3m32s
Mirror to GitHub / mirror (push) Failing after 28s
2026-05-25 20:03:11 +04:00
Siavash Sameni
7eca79846f fix(quality): use windowed loss instead of cumulative for codec adaptation
Some checks failed
Mirror to GitHub / mirror (push) Failing after 36s
Build Release Binaries / build-amd64 (push) Failing after 3m9s
Quinn's cumulative loss_pct (lost / sent since connection start) was
biased forever by handshake-era losses. Even ~5 lost-out-of-100 early
packets pinned us at "Degraded" (5% threshold) and Codec2_1200 was just
a few more drops away. The metric only diluted as thousands more clean
packets accumulated — by which time the call was over.

LossWindow tracks prev (sent, lost) and reports delta loss per ~25-
packet window. The cumulative value is the fallback when the window
hasn't accumulated enough samples (< 20 packets).

All 6 sites converted (DRED tuner + QualityReport on both send tasks,
self-observation on both recv tasks).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-25 18:55:57 +04:00
Siavash Sameni
25b3278d31 feat(android): wire video send + recv in Android engine; add video:* debug events
Some checks failed
Mirror to GitHub / mirror (push) Failing after 30s
Build Release Binaries / build-amd64 (push) Failing after 3m5s
Mirror the desktop video pipeline into the #[cfg(target_os="android")] start
function: capture _negotiated_video_codec from the handshake, spawn a video
send task that pulls VideoFrames from camera_tx, encodes/packetizes/sends.
Add video reassembly + decode + emit "video:frame" in the recv task before
the audio branch so Android can both send and receive video.

Instrumentation: emit video:first_send and video:first_recv on both desktop
and android paths so we can verify the pipeline end-to-end.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-25 18:19:42 +04:00
Siavash Sameni
e8cab25eda fix: revert E2E AEAD wrapping (broke multi-client voice); add Android CAMERA
Some checks failed
Mirror to GitHub / mirror (push) Failing after 24s
Build Release Binaries / build-amd64 (push) Failing after 3m19s
Voice regression: EncryptingTransport encrypts media with the pairwise
client↔relay session key, but the relay forwards bytes without re-encrypting
per recipient. Sender's key_A ≠ recipient's key_B → recipient cannot decrypt
→ silent audio between mac and android. Drop the wrapper; restore plaintext-
over-QUIC-TLS to the relay. Proper E2E needs MLS group keys or relay hop-by-
hop re-encryption (future PRD).

Android camera: add CAMERA manifest permission + runtime request via
MainActivity. NOTE: still not sufficient — Tauri/Wry's WebChromeClient does
not grant getUserMedia, so video on Android needs a Tauri plugin override
or native Camera2 path. Documented in MainActivity.kt.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-25 17:04:56 +04:00
Siavash Sameni
06253fdeeb feat(video+desktop): camera capture, video UI, E2E AEAD wiring, test fixes
Blockers 4 & 5: browser getUserMedia → JPEG IPC → Rust I420 pipeline;
remote video strip renders decoded frames via canvas; EncryptingTransport
wraps QuinnTransport so WZP AEAD is applied to all media (C2 fix).

Test fixes: HandshakeResult.session destructuring across relay/client/crypto
integration tests; video_codecs field added to all CallOffer/CallAnswer
structs; wzp-video pipeline_roundtrip integration tests added.

PRD docs: five Kimi-ready specs for E2E encryption, Android NDK 0.9 migration,
quality upgrade flow, wire-format hardening, and clippy debt.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-25 15:30:26 +04:00
Siavash Sameni
739bdaf3ab feat(debug): emit media:room_update and participants call-event from signal task
Pass AppHandle into run_signal_task so it can emit call-debug events
and Tauri events directly. On each RoomUpdate:
- emit connect:media:room_update debug event with participant list
- emit call-event/participants Tauri event for JS-side diagnostics

Helps diagnose whether room join and participant sync is working
independently of audio startup.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-25 09:07:08 +04:00
Siavash Sameni
bc1668ed96 fix(android): run set_audio_mode_communication on Tauri main thread
spawn_blocking uses arbitrary thread-pool threads that don't have the
Android JNI context initialized, causing ndk_context::android_context()
to panic. Switch to run_on_main_thread (where the context is always
valid) via a oneshot channel, with a 2s timeout. Panic is caught and
forwarded as an Err so the debug log captures it rather than crashing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-25 08:18:18 +04:00
Siavash Sameni
77b036439b fix(android): spawn_blocking + 2s timeout for set_audio_mode_communication
The JNI call into AudioManager.setMode() was running directly on the
tokio async thread. If the Android audio policy service is slow (e.g.
immediately after mic permission grant), this could block the runtime.
Moved to spawn_blocking with a 2s timeout; timeout and panic cases are
logged as connect:audio_mode_timeout / connect:audio_mode_panic debug
events and treated as non-fatal (we continue to audio_start).

Also removes the has_record_audio_permission call from the preflight
debug event — it was a redundant JNI round-trip that added latency and
is now captured separately in the preflight_start event context.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-25 08:08:24 +04:00
Siavash Sameni
0ebc73ab13 fix(android): remove legacy connected event_cb; add preflight_start debug step
The legacy event_cb("connected") call between handshake and audio
preflight was a no-op on the frontend (it enters voice only after the
command resolves) but added noise to failing traces. Replaced with a
connect:connected_event_skipped debug event and added an explicit
connect:android_audio_preflight_start marker so the debug log shows a
clear boundary between handshake completion and audio startup.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-25 08:02:19 +04:00
Siavash Sameni
394987a349 fix(android): 8s Rust timeout on audio_start; always emit connect: debug events
- engine.rs: wrap spawn_blocking(audio_start) in an 8s tokio timeout so
  the connect command fails fast with a clear error if the Oboe HAL
  never returns, instead of blocking the JS 45s timer
- lib.rs: emit_call_debug now always forwards connect: and
  register_signal: steps to the JS overlay regardless of the debug-logs
  toggle — needed because app-data clears reset the toggle to false,
  making join failures invisible on first install
- main.ts: JS timeout bumped to 45s (Rust 8s fires first); timeout
  message now includes last native connect: step so the toast is
  actionable without opening the debug log

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-25 07:49:21 +04:00
Siavash Sameni
2aa6582585 fix(android): call-debug instrumentation for audio startup path
Add emit_call_debug events at every step of the Android connect/audio
path so failures are visible in the Settings debug log without needing
adb logcat:

- connect:handshake_start/done/failed (with timing)
- connect:android_audio_preflight (wzp_native loaded + RECORD_AUDIO
  permission check via new has_record_audio_permission() JNI helper)
- connect:audio_stop_start/done
- connect:audio_mode_start/done/failed
- connect:audio_start_start/failed/panic/done (with oboe error code)
- connect:reuse_endpoint (endpoint reuse diagnostic)

Also adds has_record_audio_permission() to android_audio.rs — used in
the preflight event to confirm the OS has granted mic access before
wzp_oboe_start is called.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-25 07:38:38 +04:00
Siavash Sameni
5a13f12334 fix(android): spawn_blocking for audio_start + 15s JS connect timeout
wzp_oboe_start is a sync FFI call that can block the OS thread
indefinitely waiting on the Android audio HAL. Calling it directly
from an async context freezes all tokio tasks including Rust-side
timeouts. Fix: run it via spawn_blocking so tokio stays responsive.

Also add a 15s Promise.race timeout in JS so a frozen audio_start
surfaces as "connect timed out — check audio permissions" instead of
the join button staying stuck in "Connecting…" forever.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-25 07:13:26 +04:00
Siavash Sameni
276ecc660e T5.1: PriorityMode enum + SetPriorityMode signal; extend QualityProfile with video fields 2026-05-12 12:21:40 +04:00
Siavash Sameni
e73f8a7150 T3.3: SignalMessage version field 2026-05-12 06:11:59 +04:00
Siavash Sameni
6f81487778 T1.6: Protocol version negotiation in handshake 2026-05-11 15:53:04 +04:00
Siavash Sameni
c93d302656 T1.5: Migrate emit/parse sites to v2 wire format 2026-05-11 12:37:32 +04:00
Siavash Sameni
5d431c0721 fix(android): restore tauri::Emitter import for Docker builder toolchain
Some checks failed
Mirror to GitHub / mirror (push) Failing after 24s
Build Release Binaries / build-amd64 (push) Has been cancelled
Edition 2024 on local macOS auto-resolves the Emitter trait, but the
Docker builder's Rust/Tauri version requires the explicit import for
AppHandle::emit() to resolve. Keeps the warning locally to avoid
breaking the CI build.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 10:34:23 +04:00
Siavash Sameni
8fcf1be341 feat(nat): Tailscale-inspired STUN/ICE + port mapping + mid-call re-gathering (#28)
Some checks failed
Mirror to GitHub / mirror (push) Failing after 23s
Build Release Binaries / build-amd64 (push) Failing after 6m8s
Phase 8: 5 new modules bringing NAT traversal close to Tailscale's approach.

- stun.rs: RFC 5389 STUN client — public server reflexive discovery,
  XOR-MAPPED-ADDRESS parsing, parallel probe with retry, STUN fallback
  in desktop try_reflect_own_addr()
- portmap.rs: NAT-PMP (RFC 6886) + PCP (RFC 6887) + UPnP IGD port
  mapping — gateway discovery, acquire/release/refresh lifecycle,
  new PeerCandidates.mapped candidate type in dial order
- ice_agent.rs: candidate lifecycle — gather(), re_gather(),
  apply_peer_update() with monotonic generation counter,
  CandidateUpdate signal message forwarded by relay
- netcheck.rs: comprehensive diagnostic — NAT type, IPv4/v6,
  port mapping availability, relay latencies, CLI --netcheck
- relay_map.rs: RTT-sorted relay map, preferred() selection,
  populate_from_ack() for RegisterPresenceAck.available_relays

Relay: CallRegistry stores + cross-wires caller/callee_mapped_addr
into CallSetup.peer_mapped_addr. Region config + available_relays
populated from federation peers in RegisterPresenceAck.

Desktop: place_call/answer_call call acquire_port_mapping() and
fill caller/callee_mapped_addr. STUN+relay combined NAT detection.

571 tests pass (66 new), 0 regressions, 0 warnings.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 10:17:17 +04:00
Siavash Sameni
1e82811cc1 feat(p2p): adaptive quality on direct calls (#23)
Some checks failed
Mirror to GitHub / mirror (push) Failing after 27s
Build Release Binaries / build-amd64 (push) Failing after 3m37s
P2P calls now adapt codec quality based on observed network conditions,
matching what relay calls already had.

Three-layer implementation:
- QualityReport::from_path_stats(): construct reports from local quinn
  stats (loss%, RTT, jitter) without needing relay-generated reports
- CallEncoder.pending_quality_report: one-shot attachment to next
  source packet (consumed on encode, not repeated)
- Engine send tasks: generate quality report every 50 frames (~1s)
  from quinn_path_stats() and attach via set_pending_quality_report()
- Engine recv tasks: self-observe from own QUIC path stats every 50
  packets, feed to AdaptiveQualityController for P2P adaptation
  (works even if peer isn't sending quality reports yet)

Both relay and P2P calls now have adaptive quality. On relay calls,
both peer-sent reports AND local observations feed the controller.
Hysteresis (3 consecutive bad reports to downgrade) prevents thrashing.

372 tests passing (+4 new: from_path_stats encoding, clamping, zero
values, encoder quality report attachment).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 16:14:06 +04:00
Siavash Sameni
ba12aae439 refactor: extract shared engine helpers, federation clone-before-send, constants
Some checks failed
Mirror to GitHub / mirror (push) Failing after 30s
Build Release Binaries / build-amd64 (push) Failing after 3m48s
Engine deduplication (PRD-engine-dedup.md):
- build_call_config(): shared CallConfig construction (was 23 lines × 2)
- codec_to_profile(): shared CodecId → QualityProfile mapping (was 19 lines × 2)
- run_signal_task(): shared signal handler (was 48 lines × 2)
- Net -39 lines from engine.rs, 6 duplicated blocks → single-line calls

Quick wins from REFACTOR-codebase-audit.md:
- 6 magic number constants extracted (CAPTURE_POLL_MS, RECV_TIMEOUT_MS, etc.)
- DRED_POLL_INTERVAL moved from 2 local defs to 1 module-level const
- federation.rs: forward_to_peers, broadcast_signal, send_signal_to_peer
  now clone peer list and release lock before sending (was holding Mutex
  across async I/O — last lock-during-send pattern eliminated)
- main.rs: close_transport() helper replaces 12 silent .ok() calls with
  debug-level logging

314 tests passing, 0 regressions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 15:22:44 +04:00
Siavash Sameni
9ae9441de4 fix(audio): check capture ring available before read (fixes Opus6k choppy)
Some checks failed
Mirror to GitHub / mirror (push) Failing after 32s
Build Release Binaries / build-amd64 (push) Failing after 3m58s
Partial reads from the capture ring consumed samples that were then
discarded when the send loop retried from buf[0]. For 20ms codecs this
was invisible (single Oboe burst fills 960 samples in one read), but
40ms codecs (Opus6k, 1920 samples) needed 2 bursts — the first partial
read consumed 960 real samples and threw them away.

Result: Opus6k produced ~11 frames/s instead of 25 (~44% of expected).

Fix: expose wzp_native_audio_capture_available() and check it before
reading, matching the desktop capture_ring.available() pattern. Partial
reads no longer occur because we only read when enough samples exist.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 11:46:15 +04:00
Siavash Sameni
8ff0c548a7 fix(audio): update frame_samples on codec profile switch, fix buf sizing
Some checks failed
Mirror to GitHub / mirror (push) Failing after 27s
Build Release Binaries / build-amd64 (push) Has been cancelled
frame_samples was immutable — when adaptive quality switched from 20ms
(Opus24k, 960 samples) to 40ms (Opus6k, 1920 samples), the send loop
kept reading 960 samples and feeding half-sized frames to the encoder.
This caused Opus6k to produce ~11 frames/s instead of 25, making audio
choppy.

Fix:
- frame_samples is now mut and updated on profile switch
- buf sized for max frame (1920) with frame_samples-bounded slices
- RMS, mute, encode, and capture reads all use &buf[..frame_samples]
- Applied to both Android and desktop send tasks

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 11:33:02 +04:00
Siavash Sameni
d424515542 feat: 5-tier quality classification, QualityDirective handling, debug tap stats
Some checks failed
Mirror to GitHub / mirror (push) Failing after 31s
Build Release Binaries / build-amd64 (push) Failing after 3m49s
- Extend Tier enum from 3 to 6 levels: Studio64k/48k/32k + Good +
  Degraded + Catastrophic with asymmetric hysteresis (down:3, up:5,
  studio:10)
- Handle QualityDirective signals in both desktop and Android engines
  — relay-coordinated codec switching now works end-to-end
- Add periodic TAP STATS to debug tap: packets in/out, fan-out avg,
  seq gaps, codecs seen (every 5s)
- Mark task #2 done (ParticipantInfo in federation signals already
  implemented)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 10:23:48 +04:00
Siavash Sameni
22045bc5e6 feat: adaptive quality in desktop, relay quality directive, Oboe state polling
- Wire AdaptiveQualityController into desktop engine send/recv tasks
  (mirrors Android pattern: AtomicU8 pending_profile, auto-mode check)
- Wire same into Android engine send task (was only in recv before)
- QualityDirective SignalMessage variant for relay-initiated codec switch
- ParticipantQuality tracking in relay RoomManager (per-participant
  AdaptiveQualityController, weakest-link tier computation)
- Relay broadcasts QualityDirective to all participants when room-wide
  tier degrades (coordinated codec switching)
- Oboe stream state polling: poll getState() for up to 2s after
  requestStart() to ensure both streams reach Started before proceeding
  (fixes intermittent silent calls on cold start, Nothing Phone A059)

Tasks: #7, #25, #26, #31, #35

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 19:54:04 +04:00
Siavash Sameni
766c9df442 feat(dred): continuous DRED tuning, PMTUD, extended Opus6k window
- DredTuner: maps live network metrics (loss/RTT/jitter) to continuous
  DRED duration every ~500ms instead of discrete tier-locked values.
  Includes jitter-spike detection for pre-emptive Starlink-style boost.
- Opus6k DRED extended from 500ms to 1040ms (max libopus 1.5 supports)
- PMTUD: quinn MtuDiscoveryConfig with upper_bound=1452, 300s interval
- TrunkedForwarder respects discovered MTU (was hard-coded 1200)
- QuinnPathSnapshot exposes quinn internal stats + discovered MTU
- AudioEncoder trait: set_expected_loss() + set_dred_duration() methods
- PathMonitor: sliding-window jitter variance for spike detection
- Integrated into both Android and desktop send tasks in engine.rs
- 14 new tests (10 tuner unit + 4 encoder integration)
- Updated ARCHITECTURE.md, PROGRESS.md, PRD-dred-integration, PRD-mtu

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 19:38:37 +04:00
Siavash Sameni
24cc74d93c fix(audio): clear BT SCO communication device on call end
Without clearCommunicationDevice(), the BT headset stays locked in SCO
mode after the call. Media playback (video, music) can't route to BT
A2DP, requiring a device reboot to restore normal audio.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 17:40:44 +04:00
Siavash Sameni
114d69e488 fix: use tracing::warn! instead of bare warn! in engine.rs
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 17:31:12 +04:00
Siavash Sameni
15c237ceea fix(audio): defer MODE_IN_COMMUNICATION to call start, restore on end
Root cause: MainActivity set MODE_IN_COMMUNICATION at app launch,
hijacking system audio routing immediately — BT A2DP music dropped to
earpiece, and the pre-existing communication mode confused subsequent
setCommunicationDevice calls for BT SCO.

Fix: MainActivity now only sets volumes. MODE_IN_COMMUNICATION is set
via JNI right before Oboe audio_start() in CallEngine, and MODE_NORMAL
is restored after audio_stop() when the call ends.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 17:29:59 +04:00
Siavash Sameni
29cd23fe39 fix(p2p): connection cleanup — 4 fixes for stale/dead connections
PRD 4: Disable IPv6 direct dial/accept temporarily. IPv6 QUIC
handshakes succeed but connections die immediately on datagram
send ("connection lost"). IPv4 candidates work reliably. IPv6
candidates still gathered but filtered at dial time.

PRD 1: Close losing transport after Phase 6 negotiation. The
non-selected transport now gets an explicit QUIC close frame
instead of silently dropping after 30s idle timeout. Prevents
phantom connections from polluting future accept() calls.

PRD 2: Harden accept loop with max 3 stale retries. Stale
connections are explicitly closed (conn.close) and counted.
After 3 stale connections, the accept loop aborts instead of
spinning until the race timeout.

PRD 3: Resource cleanup — close old IPv6 endpoint before
creating a new one in place_call/answer_call. Add Drop impl
to CallEngine so tasks are signalled to stop on ungraceful
shutdown.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 15:11:50 +04:00
Siavash Sameni
40955bd11c debug(media): add connection diagnostics for direct P2P drops
When direct P2P calls show 100% datagram drops, we need to know
WHY send_media() fails. This commit adds:

- Remote address + stable_id logging on A-role accept and D-role
  dial success (dual_path.rs) — tells us which candidate won
- Remote address + max_datagram_size on engine transport init —
  verifies datagrams are negotiated
- last_send_err in send heartbeat — captures the actual error
  from send_datagram() failures
- QuinnTransport::remote_address() helper

Also fixes UI badge: was looking for wrong event name
("dual_path_race_won" → "path_negotiated").

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 13:29:58 +04:00
Siavash Sameni
9f2ff6a6ec fix(android-audio): Fix D+C — stop+prime cycle on every call start
Addresses the first-join no-audio regression (tasks #35-37) where
the Oboe playout callback fires once (cb#0) and then stops
draining the ring on the Nothing Phone, causing written_samples
to freeze at 7679 (ring capacity minus one burst). Second call
(rejoin) always works because audio_stop tears down the streams
and audio_start rebuilds them fresh.

Two combined fixes:

**Fix D (task #37)**: always call audio_stop() before audio_start()
at the top of CallEngine::start. On a cold launch this is a no-op
(streams not yet started). On subsequent calls it guarantees a
clean teardown before rebuild — the same thing rejoin does. Added
a 50ms pause between stop and start to let the Android HAL release
the audio session.

**Fix C (task #36)**: after audio_start(), immediately write 960
samples (20ms) of silence into the playout ring. This ensures the
Oboe playout callback has data to drain on its first invocation.
On devices where an empty-ring first callback causes the stream
to self-pause (Nothing Phone's Qualcomm HAL), the priming data
keeps the callback loop alive until real decoded audio arrives
from the recv task.

Together these cover the two most likely root causes:
1. Stale Oboe state from a previous audio_start that didn't
   clean up properly → Fix D forces a clean rebuild
2. Playout callback self-pausing on an empty ring → Fix C
   ensures the ring is non-empty at callback time

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 10:50:58 +04:00
Siavash Sameni
134ee3a77f fix(engine): pass is_direct_p2p explicitly instead of deriving from is_some
Critical Phase 6 bug: when the negotiation agreed on relay path
but delivered the relay transport via pre_connected_transport,
CallEngine saw is_some() = true → is_direct_p2p = true → skipped
perform_handshake. The relay couldn't authenticate the participant
→ room join silently failed → recv_fr: 0, both sides sending
into the void.

Fix: add explicit is_direct_p2p: bool parameter to CallEngine::
start (both android and desktop branches). The connect command
sets it from the Phase 6 negotiation result (use_direct), not
from whether pre_connected_transport is Some.

Now relay-negotiated calls correctly run perform_handshake,
and direct P2P calls correctly skip it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 10:34:21 +04:00
Siavash Sameni
0a973b234b fix(engine): import tauri::Emitter for AppHandle::emit on Android target 2026-04-12 09:29:56 +04:00
Siavash Sameni
0ccf4ed6b5 feat(call): media health watchdog — warn user when no audio arrives
When a P2P direct call establishes successfully but the underlying
network path dies (phone switched from WiFi to LTE mid-call, or
cross-relay media forwarding isn't working), the call stays up
silently with recv_fr frozen at 0. No feedback to the user.

New watchdog in the Android recv task: tracks consecutive
heartbeat ticks (2s each) where recv_fr hasn't advanced. After 3
ticks (6s) with no new packets, emits:

- call-event { kind: "media-degraded" } — user-facing warning
  banner: "No audio — connection may be lost. Try hanging up and
  reconnecting, or switch to a different relay."
- call-debug media:no_recv_timeout for the debug log

If packets resume (recv_fr advances), clears the banner via:
- call-event { kind: "media-recovered" }

JS listener creates/removes a red-tinted banner dynamically at
the top of the call screen. Banner is also cleaned up on
showConnectScreen (call end).

This covers:
- Direct P2P that established on WiFi but died when the phone
  switched to LTE (stale NAT mapping, unreachable peer)
- Cross-relay calls where federation media isn't forwarding
  (relay not upgraded, not federated, etc.)
- Any other "connected but silent" scenario

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 09:18:38 +04:00
Siavash Sameni
2427630472 fix(connect): make peerLocalAddrs optional + skip handshake on direct P2P
Two regressions from Phase 5.5/5.6:

1. Room connect broken: the connect Tauri command required
   peerLocalAddrs as a Vec<String>, but the room-join JS path
   doesn't pass it (only the direct-call setup handler does).
   Error: "invalid args 'peerLocalAddrs' for command 'connect':
   command connect missing required key peerLocalAddrs".

   Fix: change to Option<Vec<String>>, unwrap_or_default() at
   usage sites. Room connect works again with zero peer addrs.

2. Direct P2P call connects but then CallEngine fails with
   "expected CallAnswer, got Discriminant(0)". Root cause: after
   the dual-path race picked a direct P2P transport, CallEngine
   still ran perform_handshake() on it. That handshake is a
   relay-specific protocol — sends a CallOffer signal and waits
   for CallAnswer back. On a direct QUIC connection to a phone,
   there's nobody running accept_handshake, so the handshake
   reads garbage from the peer's first media packet and errors.

   Fix: track is_direct_p2p = pre_connected_transport.is_some()
   and skip perform_handshake when true. The direct connection
   is already TLS-encrypted by QUIC, and both peers' identities
   were verified through the signal channel (DirectCallOffer/
   Answer carry identity_pub + ephemeral_pub + signature). Both
   android and desktop branches updated.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 08:09:32 +04:00
Siavash Sameni
16793be36f fix(p2p): Phase 5.6 — direct-path head start + hangup propagation + media debug events
Three fixes from a field-test log where same-LAN calls were
still losing the dual-path race to the relay path, peers were
getting stuck on an empty call screen when the other side
hung up, and 1-way audio was hard to diagnose because the
GUI debug log had no media-level events.

## 1. Direct-path 500ms head start (dual_path.rs)

The race was resolving in ~105ms with Relay winning even when
both phones were on the same MikroTik LAN with valid IPv6 host
candidates. Root cause: the relay dial is a plain outbound QUIC
connect that completes in whatever the client→relay RTT is
(~100ms), while the direct path needs the PEER to also process
its CallSetup, spin up its own race, and complete at least one
LAN dial back to us. That cross-client sequence reliably takes
longer than 100ms, so relay always won.

Fix: delay the relay_fut with `tokio::time::sleep(500ms)` before
starting its connect. Same-LAN direct dials complete in 30-50ms
typically, so the head start gives direct plenty of time to win
cleanly. Users on setups where direct genuinely can't work
(LTE-to-LTE cross-carrier) pay 500ms extra on the relay fallback,
which is invisible for a call setup.

## 2. Hangup propagation via a new hangup_call command (lib.rs + main.ts)

The hangup button was calling `disconnect` which stopped the
local media engine but never sent a SignalMessage::Hangup to
the relay. The peer never got notified and was stuck on the
call screen with silent audio. My earlier fix (commit e75b045)
only handled the RECEIVE side — auto-dismiss call screen on
recv:Hangup — but the SEND side was still missing.

New Tauri command `hangup_call`:
  1. Acquire state.signal.lock(), send SignalMessage::Hangup
     over the signal transport (best-effort; log + continue if
     signal is down)
  2. Acquire state.engine.lock(), stop the CallEngine

JS hangupBtn click handler now calls hangup_call with a fallback
to raw disconnect if the command is missing (older builds).

## 3. Media debug events (engine.rs + lib.rs)

Threaded tauri::AppHandle into CallEngine::start so the send/
recv tasks can emit call-debug events when the user has debug
logs enabled. Added on the Android branch (desktop branch
accepts the arg for API symmetry but doesn't emit yet):

  - media:first_send — emitted when the first encoded frame is
    handed to the transport. Useful for 1-way audio diagnosis:
    if this fires on side A but side B never sees media:first_recv,
    A's outbound is broken.
  - media:first_recv — emitted when the first packet from the
    peer arrives. Mirror of first_send.
  - media:send_heartbeat — every 2s with frames_sent, last_rms,
    last_pkt_bytes, short_reads, drops. A stalled last_rms
    (== 0) tells you the mic isn't producing samples; a frozen
    frames_sent tells you the encode pipeline hung.
  - media:recv_heartbeat — every 2s with recv_fr, decoded_frames,
    last_written, written_samples, decode_errs, codec. Mirror
    invariants for the inbound direction.

All four are gated by `call_debug_logs_enabled()` via
`emit_call_debug`, so they only show up in the GUI log when the
user has the Call Flow Debug Logs checkbox on. Tracing::info!
still runs unconditionally so logcat (adb) keeps its copy
regardless.

The `emit_call_debug` fn in lib.rs is now `pub(crate)` so
engine.rs can call it via `crate::emit_call_debug`.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 07:55:41 +04:00
Siavash Sameni
59ce52f8e8 feat(p2p): Phase 3.5 dual-path QUIC race + GUI call-flow debug logs
Two features in one commit because they ship and test together:
Phase 3.5 closes the hole-punching loop and the call-flow debug
logs give the user live visibility into every step of a call so
real-hardware testing of the new P2P path is debuggable.

## Phase 3.5 — dual-path QUIC connect race

Completes the hole-punching work Phase 3 scaffolded. On receiving
a CallSetup with peer_direct_addr, the client now actually races a
direct QUIC handshake against the relay dial and uses whichever
completes first. Symmetric role assignment avoids the two-conns-
per-call problem:

- Both peers compare `own_reflex_addr` vs `peer_reflex_addr`
  lexicographically.
- Smaller addr → **Acceptor** (A-role): builds a server-capable
  dual endpoint, awaits an incoming QUIC session. Does NOT dial.
- Larger addr → **Dialer** (D-role): builds a client-only
  endpoint, dials the peer's addr with `call-<id>` SNI. Does NOT
  listen.
- Both sides always dial the relay in parallel as fallback.
- `tokio::select!` with `biased` preference for direct, `tokio::pin!`
  so each branch can await the losing opposite as fallback.
- Direct timeout 2s, relay fallback timeout 5s (so 7s worst case
  from CallSetup to "no media path" error).

New crate module `wzp_client::dual_path::{race, WinningPath}`
(moved here from desktop/src-tauri so it's testable from a
workspace test). `determine_role` in `wzp_client::reflect` is
pure-function and unit-tested.

### CallEngine integration
- New `pre_connected_transport: Option<Arc<QuinnTransport>>` arg
  on both android + desktop `CallEngine::start` branches. Skips
  the internal wzp_transport::connect step when Some. Backward-
  compat: None keeps Phase 0 relay-only behavior.
- `connect` Tauri command reads own_reflex_addr from SignalState,
  computes role, runs the race, passes the winning transport
  into CallEngine. If ANY input is missing (no peer addr, no own
  addr, equal addrs), falls back to classic relay path —
  identical to pre-Phase-3.5 behavior.

### Tests (9 new, all passing)
- 6 unit tests for `determine_role` truth table in
  `wzp-client/src/reflect.rs` (smaller=Acceptor, larger=Dialer,
  port-only diff, equal, missing-side, symmetry)
- 3 integration tests in `crates/wzp-client/tests/dual_path.rs`:
    * `dual_path_direct_wins_on_loopback` — two-endpoint test
      rig, Dialer wins direct path vs loopback mock relay
    * `dual_path_relay_wins_when_direct_is_dead` — dead peer
      port, 2s direct timeout, relay fallback wins
    * `dual_path_errors_cleanly_when_both_paths_dead` — <10s
      error, no hang

## GUI call-flow debug logs

Runtime-toggled structured events at every step of a call so the
user can see where a call progressed or stalled on real hardware.
Modeled on the existing DRED_VERBOSE_LOGS pattern.

### Rust side
- `static CALL_DEBUG_LOGS: AtomicBool` + `emit_call_debug(&app,
  step, details)` helper. Always logs via `tracing::info!`
  (logcat always has a copy); GUI Tauri `call-debug-log` event
  only fires when the flag is on.
- Tauri commands `set_call_debug_logs` / `get_call_debug_logs`.

### Instrumented steps (24 emit_call_debug sites)
- `register_signal`: start, identity loaded, endpoint created,
  connect failed/ok, RegisterPresence sent, ack received/failed,
  recv loop spawning
- Recv loop: CallRinging, DirectCallOffer (w/ caller_reflexive_addr),
  DirectCallAnswer (w/ callee_reflexive_addr), CallSetup (w/
  peer_direct_addr), Hangup
- `place_call`: start, reflect query start/ok/none, offer sent,
  send failed
- `answer_call`: start, reflect query start/ok/none or privacy
  skip, answer sent, send failed
- `connect`: start, dual_path_race_start (w/ role), won (w/
  path), failed, skipped (w/ reasons), call_engine_starting/
  started/failed

### JS side
- New `callDebugLogs: boolean` field on Settings type.
- Boot-time hydrate of the Rust flag from localStorage so the
  choice survives restarts (like `dredDebugLogs`).
- Settings panel: new "Call flow debug logs" checkbox alongside
  the DRED toggle.
- New "Call Debug Log" section that ONLY shows when the flag is
  on. Rolling in-memory buffer of the last 200 events, rendered
  as monospace `HH:MM:SS.mmm step {details}` lines with auto-
  scroll and a Clear button.
- `listen("call-debug-log", ...)` subscribed at app startup,
  appends to the buffer, re-renders on every event.

Full workspace test goes from 404 → 413 passing. Clippy clean
on touched crates.

PRD: .taskmaster/docs/prd_phase35_dual_path_race.txt
Tasks: 61-69 all completed

Next: APK + desktop build carrying everything — Phase 2 NAT
detect, Phase 3 advertising, Phase 3.5 dual-path + call debug
logs, plus the earlier Android first-join diagnostics — so the
user can validate the P2P path on real hardware with live
per-step visibility into where any failures happen.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 14:06:44 +04:00
Siavash Sameni
7e7968b2f9 diag(android-engine): first-join no-audio ordering instrumentation
Adds a single call_t0 = Instant::now() at the top of the Android
CallEngine::start path, threaded through send + recv tasks as
send_t0 / recv_t0, and tags the following milestones with
t_ms_since_call_start so we can build a clean side-by-side log of
first-call vs rejoin:

  1. QUIC connection established
  2. handshake complete
  3. wzp-native audio_start returned (+ how long audio_start itself took)
  4. send task spawned
  5. send: first full capture frame read (+ short_reads_before count)
  6. send: first non-zero capture RMS
  7. recv task spawned
  8. recv: first media packet received
  9. recv: first successful decode
 10. recv: first playout-ring write

Combined with the existing C++-side cb#0 logs in
crates/wzp-native/cpp/oboe_bridge.cpp ("capture cb#0", "playout
cb#0") this gives us full-pipeline ordering with no native-side
changes needed.

PRD: .taskmaster/docs/prd_android_first_join_no_audio.txt
Task: 32 (first task in the chain — diagnostics before any fix
attempts so we know which of the 5 suspect causes is real).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 10:00:20 +04:00
Siavash Sameni
578ff8cff4 feat(debug): GUI toggle for DRED verbose logs + macOS mic permission
DRED verbose logs (off by default — keeps logcat clean in normal use):
- wzp-codec: DRED_VERBOSE_LOGS atomic flag with dred_verbose_logs() /
  set_dred_verbose_logs() helpers
- opus_enc: gate "DRED enabled" + libopus version logs behind the flag
- desktop/src-tauri/engine.rs: gate DredRecvState parse log,
  reconstruction log, classical PLC log, and DRED-counter fields in
  the Android recv heartbeat (non-verbose path still logs basic recv
  stats)
- Tauri commands set_dred_verbose_logs / get_dred_verbose_logs
- Settings panel gets a "DRED debug logs (verbose, dev only)"
  checkbox; preference persists in wzp-settings localStorage and is
  pushed to Rust on save and on app boot

macOS mic permission:
- Add desktop/src-tauri/Info.plist with NSMicrophoneUsageDescription.
  Without it, modern macOS silently denies CoreAudio capture for
  ad-hoc-signed Tauri builds — capture starts but every callback
  hands you zeros. Symptom: phones could not hear desktop client,
  desktop could still hear phones (playout has no TCC gate). The
  Tauri 2 bundler auto-merges this file into WarzonePhone.app's
  Contents/Info.plist on the next build, so first launch will pop
  the standard mic prompt.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 09:48:32 +04:00
Siavash Sameni
16890576fb feat(observability): logcat-visible DRED proof of life on Android
Adds enough INFO-level logging that an opus-DRED-v2 APK on Android can
be verified end-to-end by reading logcat alone — no debugger, no
Prometheus, no telemetry pipeline required. Three observation points:

1. Encoder construction (opus_enc.rs)
   - Bumped the "DRED enabled" log from debug! to info! so the
     per-call DRED config is in logcat by default. Each call's first
     OpusEncoder construction logs codec, dred_frames, dred_ms,
     loss_floor_pct.
   - Added a one-shot static OnceLock that logs `opusic_c::version()`
     the first time an OpusEncoder is built in the process. This is
     the smoking gun for "is the new libopus actually loaded" — pre-
     Phase-0 audiopus shipped libopus 1.3 with no DRED, post-Phase-0
     should print 1.5.2 here.

2. DRED state ingest (DredRecvState::ingest_opus in
   desktop/src-tauri/src/engine.rs)
   - First successful parse on a call logs immediately so we can see
     "DRED is on the wire" in logcat.
   - Subsequent parses sample every 100th to confirm steady-state
     samples_available without drowning the log.
   - New parses_total / parses_with_data counters track the parse
     rate vs the success rate (a packet without DRED in it returns
     `available == 0`, so a low ratio means the encoder isn't
     emitting DRED bytes).

3. DRED reconstruction events (DredRecvState::fill_gap_to)
   - Every DRED reconstruction logs at INFO with missing_seq,
     anchor_seq, offset_samples, offset_ms, samples_available,
     gap_size, and the running total. These events are rare on a
     clean network and we want to know exactly which gap was filled.
   - First three classical PLC fills + every 50th thereafter log so
     we can see when DRED couldn't cover a gap (offset out of range,
     no good state, or reconstruct error).

4. Recv heartbeat (Android start() in engine.rs)
   - Existing 2-second heartbeat now includes dred_recv,
     classical_plc, dred_parses_with_data, dred_parses_total
     so a steady-state call shows the cumulative counters in
     logcat without parsing.

How to verify on a real call:

  adb logcat -s 'RustStdoutStderr:*' | grep -i 'dred\|libopus version'

Expected output sequence on a successful Opus call:
  - "linked libopus version libopus_version=libopus 1.5.2-..."  (once per process)
  - "opus encoder: DRED enabled codec=Opus24k dred_frames=20 dred_ms=200 loss_floor_pct=15"  (per call)
  - "DRED state parsed from Opus packet seq=N samples_available=4560 ms=95 ..."  (after first DRED-bearing packet)
  - "recv heartbeat (android) ... dred_recv=0 classical_plc=0 dred_parses_with_data=58 dred_parses_total=58"  (every 2s)

If you see "linked libopus version libopus 1.3" — the FFI swap didn't
take. If dred_parses_with_data stays at 0 while dred_parses_total
climbs — the sender isn't emitting DRED (check the encoder's loss
floor and the receiver's libopus version). If gaps trigger
"classical PLC fill" instead of "DRED reconstruction fired" —
DRED state coverage is too small for the observed loss pattern,
and the loss floor or DRED duration policy needs tuning.

Verification:
- cargo check -p wzp-codec -p wzp-client: 0 errors
- cargo check -p wzp-desktop: 0 Rust errors (only the pre-existing
  tauri::generate_context!() proc macro panic on missing ../dist
  which fires at host check time, irrelevant on the remote build)
- cargo test -p wzp-codec --lib: 69 passing (no regressions)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 08:58:03 +04:00
Siavash Sameni
dfbe21fe6e feat(tauri-engine): Phase 3b/3c re-port — DRED reconstruction on the live Tauri mobile engine
The original Phase 3b landed on wzp-client/CallDecoder and Phase 3c
landed on wzp-android/src/engine.rs. Both of those are DEAD CODE on
feat/desktop-audio-rewrite: the legacy Kotlin app in android/app/ is
not built by the Tauri mobile pipeline, and the Tauri engine bypasses
CallDecoder by calling wzp_codec::create_decoder directly.

The live Android call engine lives at desktop/src-tauri/src/engine.rs
with two `pub async fn start<F>` functions — one cfg-gated on Android
(Oboe via wzp-native) and one for desktop (CPAL). Both recv tasks
were using `let mut decoder = wzp_codec::create_decoder(...)` which
returns `Box<dyn AudioDecoder>` and doesn't expose the inherent
`reconstruct_from_dred` method.

Changes:

New helper struct `DredRecvState` at the top of engine.rs, wrapping:
  - DredDecoderHandle (libopus DRED side-channel parser)
  - DredState scratch (for parse_into)
  - DredState last_good (cached valid state, swapped on success)
  - last_good_seq: Option<u16> (DRED anchor sequence)
  - expected_seq: Option<u16> (for gap detection)
  - dred_reconstructions / classical_plc_invocations counters

With three methods:
  - ingest_opus(seq, payload): parse DRED, swap on success
  - fill_gap_to(decoder, current_seq, frame_samples, scratch, emit):
    detect gap back from expected_seq, reconstruct each missing
    frame via DRED if state covers it, fall through to classical
    decoder.decode_lost() when it doesn't. Calls emit() once per
    frame with a slice the caller uses for AGC + playout write.
  - reset_on_profile_switch(): invalidate tracking when codec changes

Both recv tasks (Android @ ~line 297 and desktop @ ~line 907):
  - Decoder type changed from `Box<dyn AudioDecoder>` via
    `wzp_codec::create_decoder` to concrete `AdaptiveDecoder::new(profile)`
    so we can call the inherent reconstruct_from_dred method.
  - Added `use wzp_proto::traits::AudioDecoder;` at the top of
    engine.rs to bring decode/decode_lost/set_profile trait methods
    into scope on the concrete type.
  - New `current_profile` local alongside `current_codec` (used for
    frame_duration lookups that drive the DRED sample offset math).
  - On codec/profile switch, call dred_recv.reset_on_profile_switch()
    because the cached DRED state is tied to the old profile's
    frame rate.
  - For each arriving Opus source packet:
      1. dred_recv.ingest_opus(seq, payload) — parse DRED
      2. dred_recv.fill_gap_to(...) — detect gap and reconstruct
         missing frames, each emitted through a closure that does
         AGC + playout write (wzp_native on Android, playout_ring
         on desktop)
      3. Normal decoder.decode() fallthrough for the current packet
         (unchanged)
  - Codec2 packets skip the DRED path entirely (is_opus() gate) —
    libopus can't reconstruct Codec2 audio.

Ordering invariant: gap reconstruction writes to playout BEFORE the
current packet's decoded audio, preserving temporal order since the
playout ring is FIFO. The closure captures the `spk_muted` flag once
before the gap loop to avoid mid-gap-fill state changes.

Kept `crates/wzp-android/src/engine.rs` and `crates/wzp-android/src/
stats.rs` from the earlier Phase 3c commit as-is — they're dead code
on feat/desktop-audio-rewrite but harmless, and deleting them would
diverge this branch from an independently-useful intermediate state.
The old Phase 3c commit (505a834) stays as historical reference.

Verification:
- cargo check -p wzp-codec -p wzp-client -p wzp-relay: 0 errors
- cargo check -p wzp-desktop: only pre-existing `tauri::generate_context!()`
  panic on missing ../dist (Vite output not built on host) — no Rust
  compile errors from our changes
- cargo test -p wzp-codec --lib: 69 passing (unchanged)
- cargo test -p wzp-client --lib: 35 passing + 1 ignored (unchanged)

Next: scripts/build-tauri-android.sh to get the actual Tauri APK —
NOT build-and-notify.sh which builds the dead legacy android/app.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 21:31:09 +04:00
Siavash Sameni
2fd94651e4 fix(desktop): direct calls used wrong identity file — mac identity mismatch
Some checks failed
Mirror to GitHub / mirror (push) Failing after 37s
Build Release Binaries / build-amd64 (push) Failing after 3m40s
The non-Android branch of CallEngine::start loaded the seed from
\$HOME/.wzp/identity directly, while register_signal in lib.rs goes
through the shared load_or_create_seed() helper which resolves via
APP_DATA_DIR → Tauri's app_data_dir(). On macOS those are two
completely different files:

  register_signal → ~/Library/Application Support/com.wzp.desktop/.wzp/identity
  CallEngine::start (old) → ~/.wzp/identity

On a fresh install they end up holding two different random seeds.
Register and CallEngine then derive two different fingerprints from
those seeds, and when a direct call comes in the relay routes it to
"you" under the register_signal fingerprint, but once CallEngine tries
to join the call-* room it advertises a DIFFERENT fingerprint — which
fails the call_registry ACL check on the relay side (only the two
authorised participants of a call can join its room). Silent hang, the
call never completes.

Android hit this bug earlier in the week and was fixed by switching
its CallEngine::start branch to `crate::load_or_create_seed()`.
Backport the same single-line change to the desktop branch so both
platforms share one identity source of truth.

Also bring the desktop branch up to parity with the android branch on
diagnostic logging:
- log CallEngine::start entry with relay/room/alias/quality/has_reuse
- log endpoint.local_addr on reuse / create
- log "QUIC connection established, performing handshake" between
  connect() and perform_handshake() so a hang at either step is
  immediately localisable
- map_err all three potential failure points (create_endpoint,
  connect, perform_handshake) to an explicit error! trace
2026-04-10 12:15:23 +04:00
Siavash Sameni
cfa9ff67cf fix(android-audio): VoIP mode + speakerphone + debug PCM recorder
Some checks failed
Mirror to GitHub / mirror (push) Failing after 40s
Build Release Binaries / build-amd64 (push) Has been cancelled
Build 96be740 logs proved the entire software pipeline is healthy:
  capture heartbeat:   calls=1100 to_write=960 full_drops=0 total_written=1056000
  recv heartbeat:      decoded_frames=1035 last_written=960 decode_errs=0
  recv decoded PCM:    range=[-13564..9244] rms=8044 (real audio)
  playout WRITE:       in_len=960 written=960 rms=2318 (real audio into the ring)
  playout heartbeat:   calls=1100 nonempty=1099 total_played_real=1055040

1055040 samples / 48000 Hz = 22s — exactly matches wall-clock elapsed,
meaning Oboe IS calling our playout callback at the expected rate and
WE ARE handing it real PCM every 20ms. User still heard nothing. Ergo
Oboe accepted the PCM and routed it to a silent output. Two fixes:

1) MainActivity.kt: switch to MODE_IN_COMMUNICATION + speakerphone ON
   right after permissions are granted, and crank STREAM_VOICE_CALL to
   max. Without this, an Oboe Usage::VoiceCommunication stream gets
   opened, the OS creates a real AAudio pipeline, the callback fires on
   schedule — and audio goes to either the earpiece at muted volume or
   a "call not active" dead end. Logs the audio mode + volume levels
   before and after the switch so we can confirm the state change in
   logcat next run.

2) oboe_bridge.cpp: revert Usage::Media → VoiceCommunication (the mode
   that matches MODE_IN_COMMUNICATION), pin the audio API to AAudio
   explicitly instead of letting Oboe fall back to OpenSLES (which has
   its own silent-drop failure modes on some devices), and add getState
   + getXRunCount to the playout heartbeat so we'll see silent stream
   disconnects instead of reading zeros forever.

3) engine.rs recv task: dump the first ~10s of post-AGC decoded PCM to
   `<app_data_dir>/decoded.pcm` as raw i16 LE so we can adb pull it and
   play it back locally:
       adb shell run-as com.wzp.desktop cat .wzp/decoded.pcm > decoded.pcm
       ffmpeg -f s16le -ar 48000 -ac 1 -i decoded.pcm decoded.wav
   This divorces "is our decoder actually producing audible audio" from
   "is Android's audio stack playing it". If the recorded WAV sounds
   correct when played on a laptop, the decoder is fine and 100% of the
   remaining bug surface is AudioManager / Oboe routing.

4) engine.rs: also log when spk_muted=true blocks the write. User
   reported the Speaker button in the UI has inconsistent semantics
   between desktop and android — adding this log rules out the accidental
   "first click muted playback" theory for good.
2026-04-09 21:24:26 +04:00
Siavash Sameni
96be740fd9 diag(android-audio): aggressive logging across the whole Oboe pipeline
Some checks failed
Mirror to GitHub / mirror (push) Failing after 40s
Build Release Binaries / build-amd64 (push) Failing after 3m46s
User confirmed: mac hears android, android does not hear mac. So Oboe
capture works end-to-end but Oboe playout on Android silently drops
audio even though QUIC forwards the packets. Archaeology on the legacy
wzp-android crate also revealed that the "last known good" Android audio
path NEVER used Oboe in production — it used Kotlin AudioRecord +
AudioTrack via JNI, and cpp/oboe_bridge.cpp was dead code. So every time
we've "tested" Oboe end-to-end this week was the first production use,
and any of its config knobs could be the bug.

Instrumenting every stage of the pipeline so one smoke-test log dump can
isolate the layer at fault:

C++ (oboe_bridge.cpp)
  - Log the ACTUAL stream parameters after openStream for both capture
    and playout (sample rate, channels, format, framesPerBurst,
    framesPerDataCallback, bufferCapacityInFrames, sharing, perf mode).
    Oboe may silently override values we requested — e.g. if we ask for
    48kHz mono but the device gives us 44.1kHz stereo our 960-sample
    frames are the wrong duration and the pipeline drifts.
  - Capture callback: on cb#0 log sample range+RMS of the first frame
    to prove we get real mic data (not zeros). Every 50 callbacks
    (~1s at 20ms burst) log calls, numFrames, ring available_write,
    bytes actually written, ring_full_drops, total_written.
  - Playout callback: on cb#0 log numFrames + ring state. On the FIRST
    non-empty read log sample range+RMS so we can tell if the samples
    coming out of the ring are real audio or zeros. Every 50 callbacks
    log calls, nonempty count, numFrames, ring available_read,
    underrun_frames, total_played_real.

Rust wzp-native (src/lib.rs)
  - wzp_native_audio_write_playout now logs the first 3 writes and then
    every 50th: in_len, written, sample range, RMS, ring write/read
    cursors before, available_read and available_write after. Reveals
    ring-overflow and whether the engine is actually handing us audio.
  - Minimal android logcat shim via __android_log_write extern — no
    new crate dependency.
  - AudioBackend grows a `playout_write_log_count` AtomicU64 to gate
    the write-side log throttle.

Rust engine.rs (android branch)
  - Recv task: log sample range + RMS for the first 3 decoded PCM
    frames and then every 100th. Reveals whether decoder.decode is
    producing real audio or silent buffers.
  - Recv task: if audio_write_playout returns fewer samples than we
    handed it (partial write → ring nearly full) warn about it in the
    first 10 frames.
  - Recv heartbeat every 2s: recv_fr, decoded_frames, last_decode_n,
    last_written, written_samples, decode_errs, codec.

Expected flow in a healthy log:
  capture cb#0: numFrames=960 range=[-1200..900] rms=180          ← mic OK
  capture stream opened: actualSR=48000 Ch=1 ...                   ← no override
  playout stream opened: actualSR=48000 Ch=1 ...
  CallEngine::start invoked ... → connected → audio started
  recv: first media packet received ...
  recv: decoded PCM sample range decoded_frames=1 range=[-300..250] rms=92
  playout WRITE #0: in_len=960 written=960 range=[-300..250] rms=92
  playout FIRST nonempty read: to_read=960 range=[-300..250] rms=92
  playout heartbeat: calls=50 nonempty=50 underrun=0 ...
  recv heartbeat: decoded_frames=100 last_written=960 ...

If any of those are missing/zero we know the exact stage to fix.
2026-04-09 21:13:29 +04:00
Siavash Sameni
8c4d640f89 fix(android): playout Usage::Media + relay CallSetup advertises real IP
Some checks failed
Mirror to GitHub / mirror (push) Failing after 40s
Build Release Binaries / build-amd64 (push) Failing after 3m43s
Three real bugs, one smoke-test session's worth of progress.

1. RELAY: wrong advertised addr in CallSetup
   The direct-call CallSetup computed `relay_addr = addr.ip()` where
   `addr = connection.remote_address()` — i.e. the CLIENT'S IP, not the
   relay's. So the relay was telling both parties "the call room is at
   the answerer's IP:4433", which meant each client dialed either the
   other client (no server listening) or themselves. Both endpoint.connect
   calls hung forever and the call never happened.
   Fix: compute the relay's own advertised IP once at startup. If the
   listen addr is 0.0.0.0, probe the primary outbound interface via the
   classic UDP-bind-and-connect(8.8.8.8:80) trick to discover the LAN
   IP the OS would use to reach external hosts. Thread the resulting
   advertised_addr_str into the CallSetup sender for both parties.

2. RELAY: accept loop serialized QUIC handshakes
   Previously the main accept loop called `wzp_transport::accept` which
   did both `endpoint.accept().await` AND `incoming.await` (the server-
   side QUIC handshake). A single slow handshake therefore blocked every
   subsequent client from being accepted. Unroll the helper here and
   move `incoming.await` into the per-connection spawned task, so every
   handshake runs in parallel. Also log "accept queue: new Incoming",
   "QUIC handshake complete", and "QUIC handshake failed" so we can tell
   immediately whether a client's packets are reaching the relay at all.

3. ANDROID: playout was routed to the silent in-call stream
   The Oboe playout stream was configured with Usage::VoiceCommunication,
   which routes to the Android in-call earpiece stream. That stream is
   silent unless the Activity has called AudioManager.setMode(
   IN_COMMUNICATION) and, even then, only the earpiece/BT headset get
   audio (not the loud speaker). Result: android→mac calls worked
   because mac had a normal media output, but mac→android calls were
   silent even though packets flowed through the relay just fine.
   Switch to Usage::Media + ContentType::Speech so Oboe routes to the
   loud speaker and uses the media volume slider. A later polish step
   will wire setMode + setSpeakerphoneOn from MainActivity.kt so we can
   go back to VoiceCommunication for AEC and proximity-sensor routing.

Plus: heartbeat tracing every 2s in the send/recv tasks — frames_sent,
last_rms, last_pkt_bytes, short_reads on the send side; decoded_frames,
last_decode_n, last_written, decode_errs on the recv side. Will make the
next "no sound" regression trivial to localize.
2026-04-09 20:55:10 +04:00
Siavash Sameni
49f101d785 fix(android): reuse signal endpoint for direct-call media connection
Some checks failed
Mirror to GitHub / mirror (push) Failing after 38s
Build Release Binaries / build-amd64 (push) Failing after 3m46s
Direct-call accept hangs forever at the QUIC handshake on Android. Logs
from d7b37a5 showed:
  CallEngine::start (android) invoked relay=172.16.81.172:4433 room=call-…
  resolved relay addr
  identity loaded
  endpoint created, dialing relay   ← reached
                                    ← nothing, 90s+, no error
The "connect failed" and "QUIC connection established" log lines never
fire, meaning endpoint.connect_with(…).await never makes progress.

Repro is 100%: SFU room join (one endpoint) works perfectly; direct call
(opens a SECOND quinn::Endpoint on top of the signal one) hangs in the
QUIC handshake. Creating two quinn::Endpoints on Android's AAudio-adjacent
UDP stack apparently causes the second one's datagrams to never reach the
relay (the server never sees the Initial packet). Rather than fight the
platform, quinn is happy to multiplex multiple Connections on a single
Endpoint — so we reuse the signal endpoint for the media connection.

- SignalState now stores the quinn::Endpoint alongside the QuinnTransport.
  register_signal populates both at the same time.
- CallEngine::start (both android and desktop branches) takes an
  Option<wzp_transport::Endpoint>. Some → reuse (direct-call path, after
  register_signal). None → create fresh (SFU room join path).
- The connect tauri command reads state.signal.endpoint and threads it
  through to CallEngine::start, so the direct-call auto-connect (fired by
  the "setup" signal-event in main.ts) lands on the existing UDP socket.
- wzp_transport re-exports quinn::Endpoint so wzp-desktop doesn't need to
  depend on quinn directly.
- Also wraps the android connect in tokio::time::timeout(10s) so future
  hangs become deterministic "connect TIMED OUT" errors in logcat
  instead of silent deadlock.

Same fix applies verbatim to the desktop client — the user suspects
direct call is broken there too and this was likely always the cause,
just never surfaced because desktop was only tested via SFU rooms.
2026-04-09 20:29:51 +04:00
Siavash Sameni
d7b37a5749 diag: tracing for direct-call signal loop + CallEngine::start stages
Some checks failed
Mirror to GitHub / mirror (push) Failing after 38s
Build Release Binaries / build-amd64 (push) Failing after 3m57s
User reports tapping "answer" on an incoming direct call does nothing
visible, and suspects the same may affect desktop. The signal recv loop
had no tracing at all, so we can't tell whether CallSetup is being
received, whether the recv loop died silently, or whether
CallEngine::start is failing between "identity loaded" and
"connected to relay, handshake complete".

- register_signal recv loop now logs every message type with fields
  (CallRinging, DirectCallOffer, DirectCallAnswer, CallSetup, Hangup,
  unhandled), plus a warn! on recv errors and a final warn when the
  loop exits.
- place_call / answer_call commands log entry + success / error. The
  answer_call error path logs the underlying send_signal error so we
  can see it in logcat instead of only in the JS error toast.
- CallEngine::start android branch logs relay/room/alias on entry,
  logs "endpoint created, dialing relay" between create_endpoint and
  connect, "QUIC connection established, performing handshake" between
  connect and perform_handshake, and promotes all three potential
  failures to explicit error! logs so a silent hang / error becomes
  visible in logcat.

No functional changes — pure diagnostics. Stacks on b35a6b7 (the Oboe
stack-pointer-escape fix) so build #43 carries both.
2026-04-09 19:17:03 +04:00
Siavash Sameni
5beea7de40 phase 3(android): unify connect/disconnect/toggle_*/get_status commands
Some checks failed
Mirror to GitHub / mirror (push) Failing after 37s
Build Release Binaries / build-amd64 (push) Failing after 3m49s
Step 3 of the Tauri Android rewrite was still returning "audio backend not
yet wired on Android (step 3)" because the cfg-gated Android stubs for
connect/disconnect/toggle_mic/toggle_speaker/get_status were shadowing the
real commands. Now that CallEngine::start() has a real Android body (phase
3, commit fdbe502), the gates are unnecessary.

- Drop the #[cfg(not(target_os = "android"))] gates from all five
  engine-backed Tauri commands.
- Delete the Android stub block (~50 LOC of "not connected" boilerplate).
- Ungate `use engine::CallEngine;` and the AppState.engine field so both
  targets share the same Mutex<Option<CallEngine>>.
- CallEngine::stop() now calls crate::wzp_native::audio_stop() on Android so
  the mic + speaker are released between calls, matching the desktop
  behaviour where dropping _audio_handle tears down CPAL.

Direct-call flow on Android: peer sends DirectCallOffer → user accepts via
answer_call → relay sends signal "setup" event → main.ts auto-invokes
connect(relay, room) → CallEngine::start() runs the Android branch →
wzp_native::audio_start() brings up Oboe → send/recv tasks stream PCM
through the dlopen boundary.
2026-04-09 18:53:54 +04:00
Siavash Sameni
fdbe502524 phase 3(android): wire CallEngine::start to wzp-native audio FFI
Some checks failed
Mirror to GitHub / mirror (push) Failing after 39s
Build Release Binaries / build-amd64 (push) Failing after 3m57s
Replaces the Android-side CallEngine::start() stub with a real implementation
that mirrors the desktop start() body but routes all PCM through the
standalone wzp-native cdylib loaded at startup via libloading instead of
using CPAL.

- desktop/src-tauri/src/wzp_native.rs: new module with a static
  OnceLock<libloading::Library> + cached raw fn pointers for every symbol
  we need (version, hello, audio_start/stop, read_capture, write_playout,
  is_running, capture/playout_latency_ms). init() resolves everything once
  at startup; accessors return default values if init() never ran.

- desktop/src-tauri/src/lib.rs: drop the inline dlopen smoke test, add
  `mod wzp_native;` behind target_os="android", and invoke
  wzp_native::init() from the Tauri setup() callback so the library is
  loaded + all symbols cached before any CallEngine can touch audio.

- desktop/src-tauri/src/engine.rs: the Android #[cfg] branch of
  CallEngine::start() now does the full QUIC handshake + signal loop +
  Opus send/recv tasks, calling wzp_native::audio_start() /
  audio_read_capture() / audio_write_playout() instead of the desktop
  CPAL rings. SyncWrapper now holds a placeholder Box<()> on Android
  because the audio backend lives in a process-global singleton inside
  libwzp_native.so rather than being owned per-engine.

Next step: build #39 on the remote docker builder and smoke-test on
Pixel 6 that the Connect button in the UI successfully brings up Oboe
and streams audio through the dlopen boundary.
2026-04-09 18:42:27 +04:00