Add emit_call_debug events at every step of the Android connect/audio path so failures are visible in the Settings debug log without needing adb logcat:
- connect:handshake_start/done/failed (with timing)
- connect:android_audio_preflight (wzp_native loaded + RECORD_AUDIO permission check via new has_record_audio_permission() JNI helper)
- connect:audio_stop_start/done
- connect:audio_mode_start/done/failed
- connect:audio_start_start/failed/panic/done (with oboe error code)
- connect:reuse_endpoint (endpoint reuse diagnostic)
Also adds has_record_audio_permission() to android_audio.rs, used in the preflight event to confirm the OS has granted mic access before wzp_oboe_start is called.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
BUG-001: Android "Connecting…" Hangs / Join Voice Never Completes
Severity: P0 — renders the app non-functional for room joins on a fresh install
Status: Partially mitigated (5a13f12), narrowed by static review; Android repro/logcat still needed
Branch: experimental-ui
Last investigated: 2026-05-25
Device confirmed affected: Nothing Phone A059 (Android 15)
Symptom
User taps "Join Voice". Button changes to "Connecting…" and stays there indefinitely. No error toast, no drawer, no progress. The only recovery is force-quitting the app.
2026-05-25 Static Review Update
The exact indefinite "Connecting…" symptom most likely came from an APK older than 5a13f12, because current desktop/src/main.ts has a 15s JS-side timeout for manual room joins. The current branch can still produce closely related failures:
- Native Oboe start can report false success when Android leaves capture/playout in Starting for 2s. That manifests as "joined but silent/dead audio", not a true JS hang.
- First-run microphone permission can still race the first openStream(Direction::Input), especially when the user joins immediately after granting permission.
- Direct-call auto-connect did not have the 15s JS timeout even after 5a13f12.
- Toasts used ${e}, so object-shaped Tauri errors could appear as [object Object].
Working-tree diagnostic changes applied during this investigation:
- crates/wzp-native/cpp/oboe_bridge.cpp: return -6 if both streams do not reach Started before the 2s poll deadline. This turns Oboe false-success into a visible Rust/JS error.
- desktop/src/main.ts: shared connectWithTimeout() for room joins and direct-call auto-connect; shared errorMessage() for useful toast text.
- desktop/src-tauri/src/engine.rs: emit connect:handshake_*, connect:android_audio_preflight, connect:audio_* markers around each Android-only join step.
- desktop/src-tauri/src/lib.rs: emit connect:reuse_endpoint so we can see whether the room join is sharing the signal QUIC endpoint.
Next Android repro should distinguish:
| Toast / log | Meaning |
|---|---|
| Join failed: wzp_native_audio_start failed: code -2 | mic permission / capture open failure |
| Join failed: wzp_native_audio_start failed: code -6 | Oboe streams opened/requested start, but HAL never transitioned both to Started |
| Join failed: transport: timeout after 10000ms or similar after connect:handshake_start | QUIC connected, but relay media handshake did not return CallAnswer |
| Join failed: connect timed out (15s) - check audio permissions | Tauri command did not resolve to JS; collect Rust/Tauri logs around connect:call_engine_starting |
Root Cause Chain
The invoke("connect") Tauri command runs the full CallEngine::start coroutine on Android. Execution order:
- Parse relay address → QUIC dial → crypto handshake (~200ms, works; relay logs confirm room join succeeds)
- audio_stop() (no-op on first launch)
- tokio::time::sleep(50ms)
- set_audio_mode_communication() (JNI into Kotlin)
- tokio::task::spawn_blocking(crate::wzp_native::audio_start) ← primary hang point
audio_start calls wzp_oboe_start() (C++ FFI in crates/wzp-native/cpp/oboe_bridge.cpp), which:
- Opens capture stream (captureBuilder.openStream)
- Opens playout stream (playoutBuilder.openStream)
- g_capture_stream->requestStart()
- g_playout_stream->requestStart()
- Polls for up to 2 seconds, sleeping std::this_thread::sleep_for(10ms) between checks, waiting for both streams to reach Started state (oboe_bridge.cpp:404–423)
Before the working-tree -6 diagnostic change, if the HAL never transitioned to Started, wzp_oboe_start returned 0 (success!) after the 2s timeout even though streams were not functional. Rust saw ret == 0, considered it success, and CallEngine::start returned Ok.
The invoke("connect") promise resolves successfully, enterVoice(false) is called, the voice drawer appears — but audio streams are dead. The send task reads silence, the playout ring never drains.
However, relay log evidence shows the connection is established and then dropped 166ms later with forwarded=0, which means CallEngine::start did return to the connect command. If the user still sees "Connecting…" at that point, the JS await connectRace is not resolving — suggesting either the Rust command returned an error (which should show as a toast) or the invoke promise is hanging for a different reason.
Evidence
Relay log (pangolin, session at 06:40:04 UTC):
room "general" join accepted
crypto handshake complete t=+184ms
connection dropped t=+350ms forwarded=0
The relay sees a clean connection that self-terminates in ~350ms total. forwarded=0 means no media was exchanged. Consistent with audio_start failing or the call task throwing before media loops start.
Four rapid connects at 06:40:04 in the relay log suggest multiple taps (no connectPending guard in the APK installed at that time, or user was on an older build).
Fixes Applied in 5a13f12
| # | Problem | Fix | File |
|---|---|---|---|
| 1 | wzp_oboe_start called directly on tokio worker thread → froze entire runtime including timeouts | Changed to spawn_blocking | desktop/src-tauri/src/engine.rs:609 |
| 2 | No JS-side timeout → "Connecting…" hangs forever if Rust never returns | Added 15s Promise.race | desktop/src/main.ts:338 |
| 3 | No error feedback to user | Added showToast() in catch block | desktop/src/main.ts:352 |
| 4 | Button disappeared on click | Changed to disabled + "Connecting…" text | desktop/src/main.ts:335 |
| 5 | Handshake could hang forever waiting for CallAnswer | Added 10s tokio::time::timeout | crates/wzp-client/src/handshake.rs:105 |
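Fix 2 (the JS-side Promise.race) can be sketched roughly as follows. This is a minimal illustration, not the code in desktop/src/main.ts: the signature, the ms parameter, and the timer cleanup are assumptions, and only the timeout message text is taken from the toast listed earlier in this report.

```typescript
// Hypothetical sketch of a shared connectWithTimeout() helper.
// The real helper in main.ts may take different arguments.
function connectWithTimeout<T>(connect: Promise<T>, ms: number = 15000): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;

  // A promise that only ever rejects, after `ms` milliseconds.
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => {
      reject(new Error(`connect timed out (${ms / 1000}s) - check audio permissions`));
    }, ms);
  });

  // Whichever settles first wins; always clear the timer so it cannot
  // fire later or keep the event loop alive.
  return Promise.race([connect, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}
```

In the click handler this would wrap the invoke("connect") promise. Note that losing the race does not cancel the underlying Rust command; it only stops the UI from waiting on it.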
Open Issues (Not Yet Fixed)
Issue A: g_running flag race between audio_stop and audio_start
Current status: likely fixed in current branch. crates/wzp-native/cpp/oboe_bridge.cpp:430 now clears g_running at the top of wzp_oboe_stop.
oboe_bridge.cpp:244 checks g_running.load() at entry to wzp_oboe_start. The engine calls audio_stop() then waits 50ms then calls audio_start(). If wzp_oboe_stop does not synchronously clear g_running before returning, the next wzp_oboe_start sees g_running == true and returns -1 immediately (line 246–247).
With 5a13f12, Rust now propagates this as "wzp_native_audio_start failed: code -1" → toast. Confirm via logcat.
Issue B: Mic permission granted at runtime causes audio HAL delay
After clearing app data, Android prompts for mic permission. The OS grants it but the audio HAL may not immediately honor it. The first openStream(Direction::Input) within ~1s of permission grant can fail with ErrorPermissionDenied, in which case wzp_oboe_start returns -2.
With 5a13f12 this should surface as toast: "Join failed: wzp_native_audio_start failed: code -2".
Issue C: wzp_oboe_start 2s poll timeout returns 0 (false success)
oboe_bridge.cpp:404–423: if streams don't reach Started state within 2s, the poll loop exits with no error — wzp_oboe_start returns 0. Rust treats this as success. The drawer appears but audio is dead. This is the "joined but silent" failure mode, distinct from "stuck on Connecting…".
Fix: return a distinct error code (e.g. -6) from wzp_oboe_start when the poll times out without both streams reaching Started.
Working-tree status: implemented as -6; needs Android NDK/device validation.
Issue D: Error object serialization in JS toast
The connect command returns Result<String, String>. Tauri wraps the Err as a JS exception. If e in the catch block is a Tauri error object rather than a plain string, ${e} renders as "[object Object]". Should use e?.message ?? String(e) for robust stringification.
Working-tree status: implemented via errorMessage(e).
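A minimal sketch of what such a helper could look like is shown here. The actual errorMessage() in the working tree is not reproduced in this report, so the body and its fallback order are assumptions.

```typescript
// Hypothetical sketch of errorMessage(); the working-tree version may differ.
function errorMessage(e: unknown): string {
  // Tauri commands returning Err(String) normally reject with a plain string.
  if (typeof e === "string") return e;
  if (e instanceof Error) return e.message;

  // Object-shaped errors: prefer a string `message` field if present.
  if (typeof e === "object" && e !== null && "message" in e) {
    const m = (e as { message: unknown }).message;
    if (typeof m === "string") return m;
  }

  // Otherwise show the structure instead of "[object Object]".
  try {
    const json = JSON.stringify(e);
    if (json !== undefined) return json;
  } catch {
    // Circular structures cannot be serialized; fall through.
  }
  return String(e);
}
```

The toast text would then be built from errorMessage(e) rather than from ${e} directly.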
wzp_oboe_start Return Codes Reference
| Code | Meaning |
|---|---|
| 0 | Success |
| -1 | Already running (g_running == true at entry) |
| -2 | captureBuilder.openStream failed |
| -3 | playoutBuilder.openStream failed |
| -4 | g_capture_stream->requestStart() failed |
| -5 | g_playout_stream->requestStart() failed |
| -6 | streams failed to reach Started before poll timeout |
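When triaging a toast or a shared debug log, the table above can be applied mechanically. The sketch below assumes the error text has the form "wzp_native_audio_start failed: code X" as quoted elsewhere in this report; the function and constant names are hypothetical and do not exist in the codebase.

```typescript
// Meanings copied from the return-code table; names here are illustrative only.
const OBOE_START_CODES = new Map<number, string>([
  [0, "success"],
  [-1, "already running (g_running == true at entry)"],
  [-2, "captureBuilder.openStream failed"],
  [-3, "playoutBuilder.openStream failed"],
  [-4, "g_capture_stream->requestStart() failed"],
  [-5, "g_playout_stream->requestStart() failed"],
  [-6, "streams failed to reach Started before poll timeout"],
]);

// Returns the table meaning for an audio-start error message,
// or null if the message does not carry a wzp_oboe_start code.
function explainAudioStartError(message: string): string | null {
  const match = /wzp_native_audio_start failed: code (-?\d+)/.exec(message);
  if (match === null) return null;
  const code = Number(match[1]);
  return OBOE_START_CODES.get(code) ?? `unknown code ${code}`;
}
```

A caller could append the returned meaning to the toast, or use it only in log tooling.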
Reproduction Steps
- Fresh install (or clear app data) on Nothing Phone A059
- Grant microphone permission when prompted
- Configure relay 193.180.213.68:4433, room general
- Tap "Join Voice"
- Observe: button shows "Connecting…" indefinitely
Diagnostic Steps
We have never captured adb logcat from a failing connect. This is the single highest-value diagnostic:
adb logcat -s "wzp-native" "wzp-desktop" "RustStd" | grep -E "audio|oboe|start|handshake|connect"
Key log lines to look for:
| Log line | Diagnosis |
|---|---|
| connect:reuse_endpoint | Whether media is sharing the existing signal endpoint |
| connect:handshake_start followed by 10s timeout | Relay media handshake is stuck before Android audio starts |
| connect:handshake_done | Network/relay handshake succeeded; continue to audio diagnostics |
| connect:android_audio_preflight | Shows wzp-native load state and RECORD_AUDIO permission |
| connect:audio_start_start with no done/failed | Native Oboe call is hanging |
| wzp_oboe_start: already running | Issue A: g_running not cleared |
| Failed to open capture stream: ErrorPermissionDenied | Issue B: mic permission delay |
| Failed to start capture / Failed to start playout | Oboe HAL error, code -4 or -5 |
| both streams Started after N polls | audio_start succeeded |
| audio_start task panic | spawn_blocking panic (shouldn't happen) |
| wzp_native_audio_start failed: code X | Rust caught it, toast should be visible |
Alternatively: enable Call debug logs in Settings, reproduce, use the share button to extract logs without USB.
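Because the connect:* markers come in start/terminal pairs, a shared debug log can be scanned for the step that started but never finished. The sketch below is illustrative only: it assumes the marker names have already been extracted from the log as plain strings in order, and that a step is closed by a _done, _failed, or _panic marker. The function name is hypothetical.

```typescript
// Suffixes that close a step opened by a matching "<step>_start" marker.
const TERMINAL_SUFFIXES = ["_done", "_failed", "_panic"];

// Returns the steps that have a _start marker but no terminal marker,
// e.g. "connect:audio_start" when the native Oboe call is hanging.
function unfinishedSteps(events: string[]): string[] {
  const open: string[] = [];
  for (const ev of events) {
    if (ev.endsWith("_start")) {
      open.push(ev.slice(0, -"_start".length));
      continue;
    }
    const suffix = TERMINAL_SUFFIXES.find((s) => ev.endsWith(s));
    if (suffix !== undefined) {
      const step = ev.slice(0, -suffix.length);
      const i = open.lastIndexOf(step);
      if (i !== -1) open.splice(i, 1);
    }
    // Markers without a paired suffix (e.g. connect:reuse_endpoint) are ignored.
  }
  return open;
}
```

For example, a log that ends at connect:audio_start_start would yield a single unfinished step, connect:audio_start, which corresponds to the "Native Oboe call is hanging" row in the table above.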
Proposed Fixes (Prioritized)
- Validate -6 from wzp_oboe_start on poll timeout on Android builder/device; this eliminates silent false-success
- Add mic permission pre-check in Kotlin before calling into Rust, to surface a cleaner error if permission is not yet effective
- If -6 reproduces on Nothing A059, test startup sequencing: request/start capture before MODE_IN_COMMUNICATION, add a short post-permission delay, or retry once after a full wzp_oboe_stop
Related Files
- crates/wzp-native/cpp/oboe_bridge.cpp (wzp_oboe_start implementation)
- crates/wzp-native/src/lib.rs:238 (audio_start_inner, Rust FFI wrapper)
- desktop/src-tauri/src/engine.rs:576–635 (CallEngine::start audio section)
- desktop/src/main.ts:328–360 (joinVoiceBtn click handler)
- crates/wzp-client/src/handshake.rs:105 (handshake timeout)