Blockers 4 & 5: browser getUserMedia → JPEG IPC → Rust I420 pipeline;
remote video strip renders decoded frames via canvas; EncryptingTransport
wraps QuinnTransport so WZP AEAD is applied to all media (C2 fix).
Test fixes: HandshakeResult.session destructuring across relay/client/crypto
integration tests; video_codecs field added to all CallOffer/CallAnswer
structs; wzp-video pipeline_roundtrip integration tests added.
PRD docs: five Kimi-ready specs for E2E encryption, Android NDK 0.9 migration,
quality upgrade flow, wire-format hardening, and clippy debt.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pass AppHandle into run_signal_task so it can emit call-debug events
and Tauri events directly. On each RoomUpdate:
- emit connect:media:room_update debug event with participant list
- emit call-event/participants Tauri event for JS-side diagnostics
Helps diagnose whether room join and participant sync is working
independently of audio startup.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
spawn_blocking uses arbitrary thread-pool threads that don't have the
Android JNI context initialized, causing ndk_context::android_context()
to panic. Switch to run_on_main_thread (where the context is always
valid) via a oneshot channel, with a 2s timeout. Panic is caught and
forwarded as an Err so the debug log captures it rather than crashing.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The JNI call into AudioManager.setMode() was running directly on the
tokio async thread. If the Android audio policy service is slow (e.g.
immediately after mic permission grant), this could block the runtime.
Moved to spawn_blocking with a 2s timeout; timeout and panic cases are
logged as connect:audio_mode_timeout / connect:audio_mode_panic debug
events and treated as non-fatal (we continue to audio_start).
Also removes the has_record_audio_permission call from the preflight
debug event — it was a redundant JNI round-trip that added latency and
is now captured separately in the preflight_start event context.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The legacy event_cb("connected") call between handshake and audio
preflight was a no-op on the frontend (it enters voice only after the
command resolves) but added noise to failing traces. Replaced with a
connect:connected_event_skipped debug event and added an explicit
connect:android_audio_preflight_start marker so the debug log shows a
clear boundary between handshake completion and audio startup.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- engine.rs: wrap spawn_blocking(audio_start) in an 8s tokio timeout so
the connect command fails fast with a clear error if the Oboe HAL
never returns, instead of blocking the JS 45s timer
- lib.rs: emit_call_debug now always forwards connect: and
register_signal: steps to the JS overlay regardless of the debug-logs
toggle — needed because app-data clears reset the toggle to false,
making join failures invisible on first install
- main.ts: JS timeout bumped to 45s (Rust 8s fires first); timeout
message now includes last native connect: step so the toast is
actionable without opening the debug log
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add emit_call_debug events at every step of the Android connect/audio
path so failures are visible in the Settings debug log without needing
adb logcat:
- connect:handshake_start/done/failed (with timing)
- connect:android_audio_preflight (wzp_native loaded + RECORD_AUDIO
permission check via new has_record_audio_permission() JNI helper)
- connect:audio_stop_start/done
- connect:audio_mode_start/done/failed
- connect:audio_start_start/failed/panic/done (with oboe error code)
- connect:reuse_endpoint (endpoint reuse diagnostic)
Also adds has_record_audio_permission() to android_audio.rs — used in
the preflight event to confirm the OS has granted mic access before
wzp_oboe_start is called.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- oboe_bridge.cpp: return -6 (instead of silent 0) when streams do not
reach Started within the 2s poll deadline; also clean up streams on
that path so a retry can succeed
- main.ts: shared connectWithTimeout() so room-join and direct-call
auto-connect both get the 15s JS timeout; shared errorMessage() so
Tauri error objects don't show as [object Object] in toasts
- docs/bugs/001-android-join-voice-hang.md: comprehensive bug report
with root cause chain, evidence, return code table, and next steps
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
wzp_oboe_start is a sync FFI call that can block the OS thread
indefinitely waiting on the Android audio HAL. Calling it directly
from an async context freezes all tokio tasks including Rust-side
timeouts. Fix: run it via spawn_blocking so tokio stays responsive.
Also add a 15s Promise.race timeout in JS so a frozen audio_start
surfaces as "connect timed out — check audio permissions" instead of
the join button staying stuck in "Connecting…" forever.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- handshake.rs: add 10s timeout on recv_signal() waiting for CallAnswer —
previously hung forever if relay didn't respond, making join button
disappear with no feedback
- main.ts: keep join button visible + show "Connecting…" state instead of
hiding it before the await; button restores correctly on error
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- main.ts: add showToast() — surfaces Rust connect errors that were
previously swallowed silently (key for diagnosing "never joins calls")
- main.ts: connectPending flag prevents double-tap race on Join Voice
and CallSetup auto-connect; hides button while connect is in-flight
- build-linux-docker.sh: send ntfy notification per-server after each
relay deploy (shows host + version deployed)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Copy/Share log now includes HH:MM:SS timestamps
- callInProgress stays true until call resolves (setup or hangup),
preventing multiple taps from firing multiple place_call offers
- Block place_call when there's a pending incoming call
- leaveVoice clears all call state (callInProgress, pendingCallId)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Filter self from lobby list (double-check in renderLobbyUsers)
- Disable "Direct Call" button when tapping own user
- Debounce call button (callInProgress flag prevents double-tap)
- Block calling own fingerprint
- Stats line shows codec names + fps + audio level
The direct call to the other phone failing is likely because
both phones share the same reflexive addr:port on the same NAT,
making determine_role return None (equal addrs). This is an
existing edge case in reflect.rs — not a UI bug.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Buttons: use text labels (Mic/Spk/End) instead of emoji HTML
entities that rendered as raw text on Android WebView
- Stats: match Rust CallStatus fields (tx_codec, rx_codec,
encode_fps, recv_fps, audio_level, spk_muted)
- Nicknames: register_signal sends derive_alias() as the alias
so other users see "Brave Falcon" instead of "a525:e9b2:..."
- Lobby header shows alias from get_app_info instead of raw fp
- pollStatus uses correct field names from Rust struct
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Discord-style bottom drawer for voice instead of navigating away:
- "Join Voice" hides the FAB, slides up a persistent bottom bar
- Drawer shows: room name, timer, P2P/Relay badge, level meter
- Controls: mic, speaker, end call — all in the drawer
- Direct call info (identicon, name, P2P badge) shown inline
- Lobby stays visible above the drawer at all times
- Stats line shows codec/packet/FEC info
- Leave voice = drawer slides away, FAB returns
Removed: full-screen call-screen, back button, old participant
list, old mic/speaker/hangup buttons. All voice interaction
happens in the 15% bottom drawer while the lobby stays live.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Settings now shows relay list with:
- Visual list of all configured relays
- Active relay highlighted in green with "ACTIVE" badge
- Tap a relay to switch (deregisters + reconnects automatically)
- X button to remove a relay (keeps at least 1)
- Add relay with name + address inputs
- Reconnect flow: deregister → clear lobby → auto-connect to new relay
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The lobby now populates from PresenceList signal events:
- Relay broadcasts user list on register/deregister
- JS receives "presence_list" signal-event
- Updates lobbyUsers map (excluding self)
- Renders user rows with identicon, name, fingerprint
Users appear in the lobby as soon as they register their
signal channel — no need to join voice first.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New signal infrastructure for the lobby-first UI:
- PresenceUser struct: { fingerprint, alias }
- SignalMessage::PresenceList: relay broadcasts full user list
to all signal clients on every register/deregister
- SignalHub::presence_list(): builds the list from connected clients
- SignalHub::broadcast(): sends to ALL signal clients
- Relay calls broadcast on register + unregister
- Desktop emits "presence_list" signal-event to JS frontend
This gives clients real-time visibility of who's online via the
signal channel, without needing to join a voice room first.
603 tests pass, 0 regressions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Complete JS rewrite for IRC-style lobby flow:
- Auto-connect signal channel on app launch (no connect button)
- Lobby shows online users with identicon, name, voice status
- "Join Voice" FAB toggles room voice on/off
- Tap user → context menu → Direct Call
- Incoming call banner slides up from bottom
- Back button returns from call to lobby
- Settings panel preserved with all debug toggles
~500 lines (down from 1786) — focused on the lobby experience.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New IRC-style lobby layout:
- Auto-connect on launch, drop into user list
- User rows with identicon, name, fingerprint, voice status
- Speaking indicator (green highlight + pulsing)
- Join Voice FAB (green, toggles to Leave/red)
- Incoming call banner (slides up from bottom)
- User context menu (tap user → Call / Message)
- Settings panel preserved from original
The old connect-screen HTML is removed. The call-screen is kept
intact. JS adaptation next.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New SignalMessage variants for P2P quality coordination:
UpgradeProposal/UpgradeResponse/UpgradeConfirm (#28):
- Consensual quality upgrade flow — proposer sends desired profile,
peer accepts/rejects based on own conditions, confirm commits both
- All carry call_id for relay routing
QualityCapability (#30):
- Peer reports its max sustainable profile — enables asymmetric
encoding where each side uses its own best quality instead of
forcing everyone to the weakest link
Relay forwards all 4 signals to the call peer (same pattern as
MediaPathReport, CandidateUpdate, HardNatProbe).
Desktop signal recv loop handles all 4 with debug logging.
Encoder switching TODOs noted for wiring into CallEngine.
4 new serde roundtrip tests. 603 total, 0 regressions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New setting: "Birthday attack (opens extra ports for hard NAT)"
- Default: OFF — no extra latency on call setup
- When ON: waits up to 3s for peer's birthday ports if peer has
non-cone NAT, adds them to the dial race
Gated end-to-end: Settings → localStorage → JS invoke →
Rust connect param → birthday wait + target injection.
LAN/cone calls unaffected regardless of setting.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Complete Dialer-side birthday attack integration:
- SignalState stores peer_birthday_ports from HardNatBirthdayStart
- connect command: if peer's HardNatProbe shows non-cone NAT, waits
up to 3s for birthday ports to arrive (Acceptor needs time to open
32 sockets + STUN-probe each)
- When birthday ports arrive, generate_dialer_targets() builds hit
list (known ports + random fill) and adds them to PeerCandidates
- All birthday targets go into the dual-path race as extra candidates
- LAN/cone calls skip the wait entirely (gated on allocation type)
Full waterfall now:
1. Standard candidates (reflexive + mapped) → immediate
2. Port prediction (sequential delta) → immediate
3. Birthday targets (if non-cone peer) → +3s wait
4. All of above raced in parallel via JoinSet
5. Relay runs concurrently with 500ms head-start
599 tests pass, 0 regressions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Birthday attack for random symmetric NATs:
- birthday.rs: open_acceptor_ports() opens N sockets, STUN-probes
each to learn external ports. generate_dialer_targets() builds
hit list (known ports first, then random fill). spray_dialer()
sprays QUIC connects with rate limiting, first success wins.
- Default: 32 acceptor ports, 128 dialer probes, 20ms interval
Signal coordination:
- HardNatBirthdayStart { acceptor_ports, external_ip } sent by
Acceptor when peer's HardNatProbe shows random/sequential NAT
- Relay forwards it like other call signals
- Desktop recv loop handles and logs it
Hybrid waterfall integration:
- On receiving HardNatProbe with non-cone allocation, Acceptor
auto-opens birthday ports and sends BirthdayStart
- Sockets kept alive 10s for NAT mapping persistence
- Dialer spray integration into race() pending (needs transport
hot-swap for background upgrade)
6 new tests, 599 total, 0 regressions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New toggle in Settings → "Direct-only mode (no relay fallback)":
- Default: OFF (normal behavior, relay fallback on P2P failure)
- When ON: connect returns error if P2P fails, with full
candidate_diags in the debug log showing why each candidate
failed. Call never falls back to relay.
Useful for testing NAT traversal — you see the exact failure
reason instead of the call silently working through relay.
Wired end-to-end:
- Settings.directOnly persisted in localStorage
- Passed as directOnly param to Rust connect command
- connect:path_negotiated shows direct_only flag
- connect:direct_only_failed emits on failure with diags
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fixes from real-world 5G↔Starlink testing:
NAT tickle fix:
- tokio::net::UdpSocket::bind() doesn't set SO_REUSEADDR, so binding
to the same port as quinn silently failed. Now uses socket2::Socket
with explicit SO_REUSEADDR + SO_REUSEPORT (via libc on unix).
- Tickle now logs success/failure for debugging.
Diagnostic fixes:
- connect:dual_path_race_start shows both dial_order_raw and
dial_order_smart so we can see what filtering removed
- Grace-period timeout (relay wins first, direct still running)
now fills "timeout:grace" diags for unrecorded candidates
- Previously candidate_diags was empty when relay won the race
Dependencies:
- Added socket2 = "0.5" to wzp-client
593 tests pass, 0 regressions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Major P2P improvements for cross-network calls:
Smart candidate filtering (smart_dial_order):
- Strip LAN candidates when peer's public IP differs from ours
(172.16.x.x is unreachable from a different network)
- Strip all IPv6 candidates (Phase 7 disabled, wastes dial slots)
- Only keep mapped + reflexive for cross-network calls
- LAN candidates preserved when both peers share the same public IP
Acceptor NAT tickle:
- A-role sends a 1-byte UDP packet to each peer candidate BEFORE
accepting. This opens the NAT pinhole for return traffic from
the Dialer's IP — critical for address-restricted NATs that only
allow inbound from IPs they've seen outbound traffic to.
- Uses SO_REUSEADDR on the same port as the quinn endpoint.
Direct timeout increased from 2s to 4s:
- Cross-network QUIC handshakes through CGNAT can take 2-3s
- 2s was too aggressive for 5G/LTE networks
Diagnostic fix:
- Record "timeout:4s" for candidates still in-flight when the
timeout fires (previously these had no diagnostic entry)
5 new tests for smart_dial_order edge cases.
593 tests pass, 0 regressions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The callDebugBuffer.length=0 in showCallScreen() ran AFTER the
connect command returned, wiping all connect: events (path_negotiated,
race_start, race_done, candidate_diags). Only media: events survived
because they arrived after the clear.
Removed all automatic buffer clearing. The reverse().find() already
handles stale data by picking the most recent event. The manual
"Clear log" button (line 624) is the only way to clear now.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Clearing callDebugBuffer in showConnectScreen() wiped all debug
events the moment a call ended, so the user saw empty logs. Moved
the clear to showCallScreen() instead — the buffer is reset at the
START of a new call, not the end. This way:
- After hanging up, all events from the call are still visible
- Starting a new call clears stale data from the previous one
- The reverse().find() for P2P badge still gets fresh data
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
findLast() requires Chrome 97+ / Android WebView 97+. Older Android
devices crash with TypeError in pollStatus(), killing all status
updates including the debug log. Use [...arr].reverse().find() which
works everywhere.
Also pass peerMappedAddr in the direct-call connect invoke.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Added CandidateDiag struct to RaceResult with per-candidate:
- address attempted
- result (ok / skipped:ipv6 / error:reason)
- elapsed time in ms
Surfaced in call-debug events:
- connect:dual_path_race_start now includes dial_order + peer_mapped
- connect:dual_path_race_done now includes candidate_diags array
Upgraded dual_path tracing from debug to info for IPv6 skips and
dial failures so they appear in logcat/console.
Helps diagnose why P2P fails on specific networks (5G CGNAT,
address-restricted NATs, etc).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The callDebugBuffer persisted across calls, so .find() returned the
path_negotiated event from Call 1 (P2P Direct) when rendering the
badge during Call 2 (Relay). Two fixes:
1. Clear callDebugBuffer in showConnectScreen() between calls
2. Use .findLast() instead of .find() so the most recent event wins
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
End-to-end integration of sequential port prediction:
- place_call: spawns background detect_port_allocation() + sends
HardNatProbe signal after offer (doesn't delay call setup)
- answer_call: same for AcceptTrusted answers (privacy mode skips)
- Signal recv loop: stashes HardNatProbe in SignalState.peer_hard_nat_probe
- connect: reads peer's probe, if Sequential{delta} runs predict_ports()
and adds predicted addrs to PeerCandidates.local for the dual-path race
- parse_sequential_delta() helper for "sequential(delta=N)" strings
The full flow: both peers independently detect their NAT's port
allocation, exchange HardNatProbe via relay, and the connect command
uses the peer's sequence to predict which ports to dial — all before
the dual-path race starts.
588 tests pass, 0 regressions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Edition 2024 on local macOS auto-resolves the Emitter trait, but the
Docker builder's Rust/Tauri version requires the explicit import for
AppHandle::emit() to resolve. Keeps the warning locally to avoid
breaking the CI build.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
P2P calls now adapt codec quality based on observed network conditions,
matching what relay calls already had.
Three-layer implementation:
- QualityReport::from_path_stats(): construct reports from local quinn
stats (loss%, RTT, jitter) without needing relay-generated reports
- CallEncoder.pending_quality_report: one-shot attachment to next
source packet (consumed on encode, not repeated)
- Engine send tasks: generate quality report every 50 frames (~1s)
from quinn_path_stats() and attach via set_pending_quality_report()
- Engine recv tasks: self-observe from own QUIC path stats every 50
packets, feed to AdaptiveQualityController for P2P adaptation
(works even if peer isn't sending quality reports yet)
Both relay and P2P calls now have adaptive quality. On relay calls,
both peer-sent reports AND local observations feed the controller.
Hysteresis (3 consecutive bad reports to downgrade) prevents thrashing.
372 tests passing (+4 new: from_path_stats encoding, clamping, zero
values, encoder quality report attachment).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Partial reads from the capture ring consumed samples that were then
discarded when the send loop retried from buf[0]. For 20ms codecs this
was invisible (single Oboe burst fills 960 samples in one read), but
40ms codecs (Opus6k, 1920 samples) needed 2 bursts — the first partial
read consumed 960 real samples and threw them away.
Result: Opus6k produced ~11 frames/s instead of 25 (~44% of expected).
Fix: expose wzp_native_audio_capture_available() and check it before
reading, matching the desktop capture_ring.available() pattern. Partial
reads no longer occur because we only read when enough samples exist.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
frame_samples was immutable — when adaptive quality switched from 20ms
(Opus24k, 960 samples) to 40ms (Opus6k, 1920 samples), the send loop
kept reading 960 samples and feeding half-sized frames to the encoder.
This caused Opus6k to produce ~11 frames/s instead of 25, making audio
choppy.
Fix:
- frame_samples is now mut and updated on profile switch
- buf sized for max frame (1920) with frame_samples-bounded slices
- RMS, mute, encode, and capture reads all use &buf[..frame_samples]
- Applied to both Android and desktop send tasks
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Extend Tier enum from 3 to 6 levels: Studio64k/48k/32k + Good +
Degraded + Catastrophic with asymmetric hysteresis (down:3, up:5,
studio:10)
- Handle QualityDirective signals in both desktop and Android engines
— relay-coordinated codec switching now works end-to-end
- Add periodic TAP STATS to debug tap: packets in/out, fan-out avg,
seq gaps, codecs seen (every 5s)
- Mark task #2 done (ParticipantInfo in federation signals already
implemented)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Wire AdaptiveQualityController into desktop engine send/recv tasks
(mirrors Android pattern: AtomicU8 pending_profile, auto-mode check)
- Wire same into Android engine send task (was only in recv before)
- QualityDirective SignalMessage variant for relay-initiated codec switch
- ParticipantQuality tracking in relay RoomManager (per-participant
AdaptiveQualityController, weakest-link tier computation)
- Relay broadcasts QualityDirective to all participants when room-wide
tier degrades (coordinated codec switching)
- Oboe stream state polling: poll getState() for up to 2s after
requestStart() to ensure both streams reach Started before proceeding
(fixes intermittent silent calls on cold start, Nothing Phone A059)
Tasks: #7, #25, #26, #31, #35
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Without clearCommunicationDevice(), the BT headset stays locked in SCO
mode after the call. Media playback (video, music) can't route to BT
A2DP, requiring a device reboot to restore normal audio.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>