Quinn's cumulative loss_pct (lost / sent since connection start) stayed
biased by handshake-era losses for the rest of the call. Losing ~5 of the
first 100 packets pinned us at "Degraded" (5% threshold), with
Codec2_1200 only a few more drops away. The metric diluted only after
thousands more clean packets, by which time the call was over.
LossWindow keeps the previous (sent, lost) snapshot and reports the loss
over each ~25-packet window (delta lost / delta sent). It falls back to
the cumulative value while the window has fewer than 20 packets.
All 6 sites converted (DRED tuner + QualityReport on both send tasks,
self-observation on both recv tasks).
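A minimal sketch of the windowing idea, assuming the tracker is fed Quinn's cumulative (sent, lost) counters on each report tick. The struct fields, the `update` method and the `pct` helper are illustrative, not the committed LossWindow code. The 20-packet minimum comes from this message; the ~25-packet window size is assumed to fall out of how often the caller samples.

```rust
// Hypothetical sketch of a windowed loss tracker; not the committed LossWindow.

/// Windows with fewer new packets than this fall back to the cumulative ratio.
const MIN_WINDOW_PACKETS: u64 = 20;

/// Remembers the counters at the start of the current window.
#[derive(Debug, Default)]
struct LossWindow {
    prev_sent: u64,
    prev_lost: u64,
}

impl LossWindow {
    /// `sent` and `lost` are cumulative counters since connection start.
    /// Returns a loss percentage.
    fn update(&mut self, sent: u64, lost: u64) -> f64 {
        let d_sent = sent.saturating_sub(self.prev_sent);
        let d_lost = lost.saturating_sub(self.prev_lost);
        if d_sent < MIN_WINDOW_PACKETS {
            // Not enough samples yet: report the cumulative ratio and keep the
            // snapshot, so these packets count toward the next window.
            return pct(lost, sent);
        }
        // Close this window; the next one starts from the current counters.
        self.prev_sent = sent;
        self.prev_lost = lost;
        pct(d_lost, d_sent)
    }
}

fn pct(lost: u64, sent: u64) -> f64 {
    if sent == 0 {
        0.0
    } else {
        lost as f64 / sent as f64 * 100.0
    }
}

fn main() {
    let mut w = LossWindow::default();
    println!("{:.1}", w.update(100, 5)); // 5 of the first 100 lost: 5.0
    println!("{:.1}", w.update(125, 5)); // 25 clean packets: 0.0 (cumulative would read 4.0)
    println!("{:.1}", w.update(135, 5)); // only 10 new packets: cumulative fallback, 3.7
}
```

Keeping the snapshot when a window is too short means those packets roll into the next window instead of being discarded.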
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Mirror the desktop video pipeline in the #[cfg(target_os="android")] start
function: capture _negotiated_video_codec from the handshake and spawn a
video send task that pulls VideoFrames from camera_tx, then encodes,
packetizes, and sends them. In the recv task, add video reassembly, decode,
and a "video:frame" emit ahead of the audio branch so Android can both send
and receive video.
Instrumentation: emit video:first_send and video:first_recv on both desktop
and android paths so we can verify the pipeline end-to-end.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Move video out of the voice drawer into a fixed-position stage that
covers the lobby above the drawer. Remote canvas fills the stage with
object-fit: contain; local preview is a 200x112 PiP in the bottom-right.
A placeholder reads "Waiting for remote video" with a frame counter until
the first frame arrives; the counter also logs the first remote frame to
the console for debugging.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Document why wrapping QuinnTransport with EncryptingTransport using the
pairwise client↔relay key cannot work for an SFU (recipient has a different
key than sender). Propose two valid paths: MLS group keys (true E2E) or
hop-by-hop relay re-encryption (relay-trusted). Recommend hop-by-hop first.
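The key mismatch and the hop-by-hop alternative can be illustrated with a toy model. The XOR keystream below is a deliberately insecure stand-in for WZP's AEAD (with a real AEAD, the wrong key yields an authentication failure rather than garbage), and the function names and key values are hypothetical.

```rust
/// Toy stream cipher standing in for the real AEAD: NOT secure, illustration only.
/// XORs the data with a xorshift64 keystream seeded by the key (encrypt == decrypt).
fn toy_cipher(key: u64, data: &[u8]) -> Vec<u8> {
    let mut state = key | 1; // xorshift state must be non-zero
    data.iter()
        .map(|b| {
            state ^= state << 13;
            state ^= state >> 7;
            state ^= state << 17;
            b ^ (state as u8)
        })
        .collect()
}

/// Pure-forwarding SFU: bytes pass through unchanged (what the pairwise wrapper assumed).
fn relay_forward(packet: &[u8]) -> Vec<u8> {
    packet.to_vec()
}

/// Hop-by-hop relay (hypothetical): decrypt with the sender's pairwise key and
/// re-encrypt with the recipient's. The relay sees plaintext, so it must be trusted.
fn relay_reencrypt(packet: &[u8], sender_key: u64, recipient_key: u64) -> Vec<u8> {
    let plain = toy_cipher(sender_key, packet);
    toy_cipher(recipient_key, &plain)
}

fn main() {
    let key_a: u64 = 0xA11CE; // client A <-> relay
    let key_b: u64 = 0xB0B; // client B <-> relay
    let frame = b"opus frame".to_vec();
    let wire = toy_cipher(key_a, &frame);

    // B decrypts A's bytes with key_b: garbage, i.e. silent audio.
    let forwarded = relay_forward(&wire);
    println!("forward-only ok: {}", toy_cipher(key_b, &forwarded) == frame); // false

    // Relay re-encrypts per recipient: B recovers the frame.
    let reencrypted = relay_reencrypt(&wire, key_a, key_b);
    println!("hop-by-hop ok: {}", toy_cipher(key_b, &reencrypted) == frame); // true
}
```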
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Voice regression: EncryptingTransport encrypts media with the pairwise
client↔relay session key, but the relay forwards bytes without re-encrypting
per recipient. Sender's key_A ≠ recipient's key_B → recipient cannot decrypt
→ silent audio between macOS and Android. Drop the wrapper and restore the
previous setup: no WZP-layer media encryption, only QUIC's TLS to the relay.
Proper E2E needs MLS group keys or hop-by-hop re-encryption at the relay
(future PRD).
Android camera: add CAMERA manifest permission + runtime request via
MainActivity. NOTE: this is still not sufficient. Tauri/Wry's WebChromeClient
does not grant getUserMedia permission requests, so video on Android needs a
Tauri plugin override or a native Camera2 path. Documented in MainActivity.kt.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add a blue FAB next to Join Voice whose click handler connects and then
calls startCamera(), so video is active from the moment the call starts.
The Cam button inside the drawer still toggles the camera after joining
through either button.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Blockers 4 & 5: browser getUserMedia → JPEG IPC → Rust I420 pipeline;
remote video strip renders decoded frames via canvas; EncryptingTransport
wraps QuinnTransport so WZP AEAD is applied to all media (C2 fix).
Test fixes: HandshakeResult.session destructuring across relay/client/crypto
integration tests; video_codecs field added to all CallOffer/CallAnswer
structs; wzp-video pipeline_roundtrip integration tests added.
PRD docs: five Kimi-ready specs for E2E encryption, Android NDK 0.9 migration,
quality upgrade flow, wire-format hardening, and clippy debt.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The single quotes in awk '{print $5}' and grep 'assets/' closed the outer
single-quoted Docker bash -c '...' string early, producing
"unexpected EOF while looking for matching ')'" at runtime.
Use double-quoted awk with an escaped \$5 instead, so the inner shell
hands a literal $5 to awk.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The previous fix re-ran ./gradlew assembleUniversalRelease to include
the missing frontend assets, but BuildTask.kt calls
`cargo tauri android android-studio-script`, which requires the full
Tauri CLI build environment, so the standalone Gradle invocation fails
immediately.
New approach: inject the dist/ files directly into the unsigned APK
(which is a ZIP file) using `zip -r`. The existing zipalign + apksigner
step re-aligns and signs the result, producing a valid APK. No extra
Gradle invocation needed.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Tauri CLI 2.10.x silently skips copying the frontendDist (desktop/dist/)
to gen/android/app/src/main/assets/ on Android builds. The WebView then
fails at runtime with "Asset not found: index.html".
After cargo tauri android build, check if index.html landed in the
Android assets folder. If not (the bug path), copy dist/ manually and
re-run ./gradlew assembleUniversalRelease. Gradle is incremental here
(no Java/Kotlin changed) so the extra pass takes < 30s.
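The fallback amounts to check-then-copy. Below is a sketch of that logic in Rust using only std::fs, purely for illustration: the function names are hypothetical, the real step is part of the Android build pipeline, and re-running `./gradlew assembleUniversalRelease` is left to the caller.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Recursively copy `src` into `dst`, creating directories as needed.
fn copy_dir(src: &Path, dst: &Path) -> io::Result<()> {
    fs::create_dir_all(dst)?;
    for entry in fs::read_dir(src)? {
        let entry = entry?;
        let target = dst.join(entry.file_name());
        if entry.file_type()?.is_dir() {
            copy_dir(&entry.path(), &target)?;
        } else {
            fs::copy(entry.path(), &target)?;
        }
    }
    Ok(())
}

/// Hypothetical helper: returns Ok(true) if index.html was missing and dist/ was
/// copied (the bug path), Ok(false) if the Tauri CLI already copied the frontend.
fn ensure_frontend_assets(dist: &Path, assets: &Path) -> io::Result<bool> {
    if assets.join("index.html").exists() {
        return Ok(false);
    }
    copy_dir(dist, assets)?;
    // Caller re-runs Gradle at this point so the assets end up in the APK.
    Ok(true)
}

fn main() -> io::Result<()> {
    // Scratch layout mimicking desktop/dist and gen/android/.../assets.
    let root = std::env::temp_dir().join(format!("assets-check-{}", std::process::id()));
    let _ = fs::remove_dir_all(&root);
    let dist = root.join("desktop/dist");
    let assets = root.join("gen/android/app/src/main/assets");
    fs::create_dir_all(dist.join("js"))?;
    fs::write(dist.join("index.html"), "<html></html>")?;
    fs::write(dist.join("js/app.js"), "")?;

    println!("copied: {}", ensure_frontend_assets(&dist, &assets)?); // first run copies
    println!("copied: {}", ensure_frontend_assets(&dist, &assets)?); // second run is a no-op
    fs::remove_dir_all(&root)
}
```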
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The existing build-tauri-android.sh holds an SSH connection open for
the entire Docker build (~10 min). When run in the background, it gets
killed once the SSH keepalive times out (~60s of silence during compile).
New script:
- uploads the build script to remote and launches it in a detached
tmux session so it survives SSH disconnects
- exits immediately (fire-and-forget); build result arrives via ntfy
- --wait blocks until the build finishes and downloads the APK (same as the old script)
- same flags as the original: --init, --rust, --no-pull, --debug
Usage:
./scripts/android-build-async.sh # fire and forget
./scripts/android-build-async.sh --wait # block until APK downloaded
./scripts/android-build-async.sh --init --wait
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pass AppHandle into run_signal_task so it can emit call-debug events
and Tauri events directly. On each RoomUpdate:
- emit connect:media:room_update debug event with participant list
- emit call-event/participants Tauri event for JS-side diagnostics
Helps diagnose whether room join and participant sync are working
independently of audio startup.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
spawn_blocking uses arbitrary thread-pool threads that don't have the
Android JNI context initialized, causing ndk_context::android_context()
to panic. Switch to run_on_main_thread (where the context is always
valid), returning the result through a oneshot channel with a 2s timeout.
A panic is caught and forwarded as an Err so the debug log captures it
instead of crashing the app.
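The dispatch pattern (ship a closure to the one thread where the context is valid, return its result over a one-shot channel, bound the wait, and turn panics into errors) can be sketched with std alone. Assumption: a plain looper thread stands in for the Android main thread that run_on_main_thread targets; `spawn_main_looper` and `call_on_main` are hypothetical names.

```rust
use std::panic::{self, AssertUnwindSafe};
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

type Job = Box<dyn FnOnce() + Send>;

/// Hypothetical stand-in for the main thread: runs queued jobs one at a time.
fn spawn_main_looper() -> mpsc::Sender<Job> {
    let (tx, rx) = mpsc::channel::<Job>();
    thread::spawn(move || {
        for job in rx {
            job();
        }
    });
    tx
}

/// Run `f` on the looper thread and wait up to `timeout` for its result.
/// A panic inside `f` becomes an Err instead of killing the looper.
fn call_on_main<T: Send + 'static>(
    main: &mpsc::Sender<Job>,
    timeout: Duration,
    f: impl FnOnce() -> T + Send + 'static,
) -> Result<T, String> {
    let (done_tx, done_rx) = mpsc::sync_channel(1); // one-shot reply channel
    main.send(Box::new(move || {
        let result = panic::catch_unwind(AssertUnwindSafe(f))
            .map_err(|_| "panicked on main thread".to_string());
        let _ = done_tx.send(result); // caller may have timed out and left
    }))
    .map_err(|_| "main thread gone".to_string())?;
    match done_rx.recv_timeout(timeout) {
        Ok(result) => result,
        Err(mpsc::RecvTimeoutError::Timeout) => Err(format!("timed out after {timeout:?}")),
        Err(mpsc::RecvTimeoutError::Disconnected) => Err("main thread dropped the call".into()),
    }
}

fn main() {
    let looper = spawn_main_looper();
    let two_s = Duration::from_secs(2);
    println!("{:?}", call_on_main(&looper, two_s, || 3));
    println!("{:?}", call_on_main(&looper, two_s, || -> i32 { panic!("no JNI env") }));
    // The looper survives the panic and keeps serving calls.
    println!("{:?}", call_on_main(&looper, two_s, || "still alive"));
}
```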
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The JNI call into AudioManager.setMode() was running directly on the
tokio async thread. If the Android audio policy service is slow (e.g.
immediately after mic permission grant), this could block the runtime.
Moved to spawn_blocking with a 2s timeout; timeout and panic cases are
logged as connect:audio_mode_timeout / connect:audio_mode_panic debug
events and treated as non-fatal (we continue to audio_start).
Also removes the has_record_audio_permission call from the preflight
debug event: it was a redundant JNI round-trip that added latency, and
that information is now captured separately in the preflight_start event
context.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The legacy event_cb("connected") call between handshake and audio
preflight was a no-op on the frontend (it enters voice only after the
command resolves) but added noise to failing traces. Replaced with a
connect:connected_event_skipped debug event and added an explicit
connect:android_audio_preflight_start marker so the debug log shows a
clear boundary between handshake completion and audio startup.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- engine.rs: wrap spawn_blocking(audio_start) in an 8s tokio timeout so
the connect command fails fast with a clear error if the Oboe HAL
never returns, instead of leaving the user waiting on the 45s JS timer
- lib.rs: emit_call_debug now always forwards connect: and
register_signal: steps to the JS overlay regardless of the debug-logs
toggle. This is needed because clearing app data resets the toggle to
false, which made join failures invisible on first install
- main.ts: JS timeout bumped to 45s (Rust 8s fires first); timeout
message now includes last native connect: step so the toast is
actionable without opening the debug log
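Why the Rust limit has to be the shorter one can be sketched with nested deadlines. This std-only version runs the blocking call on its own thread and waits with recv_timeout, standing in for tokio::time::timeout around spawn_blocking; durations are scaled down 100x, and `run_with_timeout` is a hypothetical helper.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical helper: run a possibly-hanging blocking call on its own thread
/// and give up after `limit`. std-only analog of timeout(limit, spawn_blocking(f)).
fn run_with_timeout<T: Send + 'static>(
    limit: Duration,
    f: impl FnOnce() -> T + Send + 'static,
) -> Result<T, String> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let _ = tx.send(f()); // ignored if the caller already gave up
    });
    match rx.recv_timeout(limit) {
        Ok(v) => Ok(v),
        Err(mpsc::RecvTimeoutError::Timeout) => {
            Err(format!("timed out after {} ms", limit.as_millis()))
        }
        Err(mpsc::RecvTimeoutError::Disconnected) => Err("worker exited without a result".into()),
    }
}

fn main() {
    // 80 ms stands in for the 8 s Rust limit, 450 ms for the 45 s JS limit.
    let inner = Duration::from_millis(80);
    let outer = Duration::from_millis(450);
    // Simulated Oboe HAL hang: audio_start does not return in time.
    let hung_audio_start = || thread::sleep(Duration::from_secs(2));

    let result = run_with_timeout(outer, move || {
        run_with_timeout(inner, hung_audio_start).map_err(|e| format!("audio_start: {e}"))
    });
    // The inner limit fires first, so the error names the step that hung.
    println!("{:?}", result); // Ok(Err("audio_start: timed out after 80 ms"))
}
```

If the outer limit were the shorter one, the caller would only ever see a generic timeout and lose the step name.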
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add emit_call_debug events at every step of the Android connect/audio
path so failures are visible in the Settings debug log without needing
adb logcat:
- connect:handshake_start/done/failed (with timing)
- connect:android_audio_preflight (wzp_native loaded + RECORD_AUDIO
permission check via new has_record_audio_permission() JNI helper)
- connect:audio_stop_start/done
- connect:audio_mode_start/done/failed
- connect:audio_start_start/failed/panic/done (with oboe error code)
- connect:reuse_endpoint (endpoint reuse diagnostic)
Also adds has_record_audio_permission() to android_audio.rs — used in
the preflight event to confirm the OS has granted mic access before
wzp_oboe_start is called.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- oboe_bridge.cpp: return -6 (instead of silent 0) when streams do not
reach Started within the 2s poll deadline; also clean up streams on
that path so a retry can succeed
- main.ts: shared connectWithTimeout() so room-join and direct-call
auto-connect both get the 15s JS timeout; shared errorMessage() so
Tauri error objects don't show as [object Object] in toasts
- docs/bugs/001-android-join-voice-hang.md: comprehensive bug report
with root cause chain, evidence, return code table, and next steps
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
wzp_oboe_start is a sync FFI call that can block the OS thread
indefinitely while waiting on the Android audio HAL. Calling it directly
from an async context stalls the tokio worker running it, so even a
Rust-side timeout around the call can never fire. Fix: run it via
spawn_blocking so tokio stays responsive.
Also add a 15s Promise.race timeout in JS so a frozen audio_start
surfaces as "connect timed out — check audio permissions" instead of
the join button staying stuck in "Connecting…" forever.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- handshake.rs: add 10s timeout on recv_signal() waiting for CallAnswer —
previously hung forever if relay didn't respond, making join button
disappear with no feedback
- main.ts: keep join button visible + show "Connecting…" state instead of
hiding it before the await; button restores correctly on error
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- main.ts: add showToast() — surfaces Rust connect errors that were
previously swallowed silently (key for diagnosing "never joins calls")
- main.ts: connectPending flag prevents double-tap race on Join Voice
and CallSetup auto-connect; hides button while connect is in-flight
- build-linux-docker.sh: send ntfy notification per-server after each
relay deploy (shows host + version deployed)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Convert Hold/Unhold/Mute/Unmute/TransferAck from unit variants to struct
variants with `version: u8` (serde default = 2). Every SignalMessage
variant now carries a version field, enabling future semantic versioning
and clean rejection of deprecated variants during federation routing.
305 tests passing.
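A sketch of the variant shape after the change, showing only the five converted variants. Assumptions: serde is omitted so the example runs with std alone (the real code relies on a serde default of 2, as noted above), and `accept_for_federation` with its minimum-version parameter is a hypothetical illustration of the rejection this enables.

```rust
/// Version assumed for messages that omit it. In the real code this would back a
/// serde field default; here it is a plain function so the sketch needs no crates.
fn default_version() -> u8 {
    2
}

#[derive(Debug, Clone, PartialEq)]
enum SignalMessage {
    // Before: `Hold,` (unit variant). After: struct variant carrying a version.
    Hold { version: u8 },
    Unhold { version: u8 },
    Mute { version: u8 },
    Unmute { version: u8 },
    TransferAck { version: u8 },
}

impl SignalMessage {
    /// Every variant carries a version, so routing code can read it uniformly.
    fn version(&self) -> u8 {
        match self {
            SignalMessage::Hold { version }
            | SignalMessage::Unhold { version }
            | SignalMessage::Mute { version }
            | SignalMessage::Unmute { version }
            | SignalMessage::TransferAck { version } => *version,
        }
    }
}

/// Hypothetical federation check: reject messages older than `min_supported`.
fn accept_for_federation(msg: &SignalMessage, min_supported: u8) -> bool {
    msg.version() >= min_supported
}

fn main() {
    let hold = SignalMessage::Hold { version: default_version() };
    let old_mute = SignalMessage::Mute { version: 1 };
    println!("{}", accept_for_federation(&hold, 2)); // true
    println!("{}", accept_for_federation(&old_mute, 2)); // false
}
```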
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>