Quinn's cumulative loss_pct (lost / sent since connection start) was
biased forever by handshake-era losses. Even ~5 lost-out-of-100 early
packets pinned us at "Degraded" (5% threshold) and Codec2_1200 was just
a few more drops away. The metric only diluted as thousands more clean
packets accumulated — by which time the call was over.
LossWindow tracks prev (sent, lost) and reports delta loss per ~25-
packet window. The cumulative value is the fallback when the window
hasn't accumulated enough samples (< 20 packets).
All 6 sites converted (DRED tuner + QualityReport on both send tasks,
self-observation on both recv tasks).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Mirror the desktop video pipeline into the #[cfg(target_os="android")] start
function: capture _negotiated_video_codec from the handshake, spawn a video
send task that pulls VideoFrames from camera_tx, encodes/packetizes/sends.
Add video reassembly + decode + emit "video:frame" in the recv task before
the audio branch so Android can both send and receive video.
Instrumentation: emit video:first_send and video:first_recv on both desktop
and android paths so we can verify the pipeline end-to-end.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Move video out of the voice drawer into a fixed-position stage that
covers the lobby above the drawer. Remote canvas fills the stage with
object-fit: contain; local preview is a 200x112 PiP in the bottom-right.
Placeholder shows "Waiting for remote video" with a frame counter until
the first frame arrives. Counter logs first remote frame to console for
debugging.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Voice regression: EncryptingTransport encrypts media with the pairwise
client↔relay session key, but the relay forwards bytes without re-encrypting
per recipient. Sender's key_A ≠ recipient's key_B → recipient cannot decrypt
→ silent audio between mac and android. Drop the wrapper; restore plaintext-
over-QUIC-TLS to the relay. Proper E2E needs MLS group keys or relay hop-by-
hop re-encryption (future PRD).
Android camera: add CAMERA manifest permission + runtime request via
MainActivity. NOTE: still not sufficient — Tauri/Wry's WebChromeClient does
not grant getUserMedia, so video on Android needs a Tauri plugin override
or native Camera2 path. Documented in MainActivity.kt.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Blue FAB alongside Join Voice; click handler connects then calls
startCamera() so video is active from the moment the call starts.
Cam button inside drawer still toggles camera after joining either way.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Blockers 4 & 5: browser getUserMedia → JPEG IPC → Rust I420 pipeline;
remote video strip renders decoded frames via canvas; EncryptingTransport
wraps QuinnTransport so WZP AEAD is applied to all media (C2 fix).
Test fixes: HandshakeResult.session destructuring across relay/client/crypto
integration tests; video_codecs field added to all CallOffer/CallAnswer
structs; wzp-video pipeline_roundtrip integration tests added.
PRD docs: five Kimi-ready specs for E2E encryption, Android NDK 0.9 migration,
quality upgrade flow, wire-format hardening, and clippy debt.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pass AppHandle into run_signal_task so it can emit call-debug events
and Tauri events directly. On each RoomUpdate:
- emit connect:media:room_update debug event with participant list
- emit call-event/participants Tauri event for JS-side diagnostics
Helps diagnose whether room join and participant sync is working
independently of audio startup.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
spawn_blocking uses arbitrary thread-pool threads that don't have the
Android JNI context initialized, causing ndk_context::android_context()
to panic. Switch to run_on_main_thread (where the context is always
valid) via a oneshot channel, with a 2s timeout. Panic is caught and
forwarded as an Err so the debug log captures it rather than crashing.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The JNI call into AudioManager.setMode() was running directly on the
tokio async thread. If the Android audio policy service is slow (e.g.
immediately after mic permission grant), this could block the runtime.
Moved to spawn_blocking with a 2s timeout; timeout and panic cases are
logged as connect:audio_mode_timeout / connect:audio_mode_panic debug
events and treated as non-fatal (we continue to audio_start).
Also removes the has_record_audio_permission call from the preflight
debug event — it was a redundant JNI round-trip that added latency and
is now captured separately in the preflight_start event context.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The legacy event_cb("connected") call between handshake and audio
preflight was a no-op on the frontend (it enters voice only after the
command resolves) but added noise to failing traces. Replaced with a
connect:connected_event_skipped debug event and added an explicit
connect:android_audio_preflight_start marker so the debug log shows a
clear boundary between handshake completion and audio startup.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- engine.rs: wrap spawn_blocking(audio_start) in an 8s tokio timeout so
the connect command fails fast with a clear error if the Oboe HAL
never returns, instead of blocking the JS 45s timer
- lib.rs: emit_call_debug now always forwards connect: and
register_signal: steps to the JS overlay regardless of the debug-logs
toggle — needed because app-data clears reset the toggle to false,
making join failures invisible on first install
- main.ts: JS timeout bumped to 45s (Rust 8s fires first); timeout
message now includes last native connect: step so the toast is
actionable without opening the debug log
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add emit_call_debug events at every step of the Android connect/audio
path so failures are visible in the Settings debug log without needing
adb logcat:
- connect:handshake_start/done/failed (with timing)
- connect:android_audio_preflight (wzp_native loaded + RECORD_AUDIO
permission check via new has_record_audio_permission() JNI helper)
- connect:audio_stop_start/done
- connect:audio_mode_start/done/failed
- connect:audio_start_start/failed/panic/done (with oboe error code)
- connect:reuse_endpoint (endpoint reuse diagnostic)
Also adds has_record_audio_permission() to android_audio.rs — used in
the preflight event to confirm the OS has granted mic access before
wzp_oboe_start is called.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- oboe_bridge.cpp: return -6 (instead of silent 0) when streams do not
reach Started within the 2s poll deadline; also clean up streams on
that path so a retry can succeed
- main.ts: shared connectWithTimeout() so room-join and direct-call
auto-connect both get the 15s JS timeout; shared errorMessage() so
Tauri error objects don't show as [object Object] in toasts
- docs/bugs/001-android-join-voice-hang.md: comprehensive bug report
with root cause chain, evidence, return code table, and next steps
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
wzp_oboe_start is a sync FFI call that can block the OS thread
indefinitely waiting on the Android audio HAL. Calling it directly
from an async context freezes all tokio tasks including Rust-side
timeouts. Fix: run it via spawn_blocking so tokio stays responsive.
Also add a 15s Promise.race timeout in JS so a frozen audio_start
surfaces as "connect timed out — check audio permissions" instead of
the join button staying stuck in "Connecting…" forever.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- handshake.rs: add 10s timeout on recv_signal() waiting for CallAnswer —
previously hung forever if relay didn't respond, making join button
disappear with no feedback
- main.ts: keep join button visible + show "Connecting…" state instead of
hiding it before the await; button restores correctly on error
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- main.ts: add showToast() — surfaces Rust connect errors that were
previously swallowed silently (key for diagnosing "never joins calls")
- main.ts: connectPending flag prevents double-tap race on Join Voice
and CallSetup auto-connect; hides button while connect is in-flight
- build-linux-docker.sh: send ntfy notification per-server after each
relay deploy (shows host + version deployed)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Copy/Share log now includes HH:MM:SS timestamps
- callInProgress stays true until call resolves (setup or hangup),
preventing multiple taps from firing multiple place_call offers
- Block place_call when there's a pending incoming call
- leaveVoice clears all call state (callInProgress, pendingCallId)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Filter self from lobby list (double-check in renderLobbyUsers)
- Disable "Direct Call" button when tapping own user
- Debounce call button (callInProgress flag prevents double-tap)
- Block calling own fingerprint
- Stats line shows codec names + fps + audio level
The direct call to the other phone failing is likely because
both phones share the same reflexive addr:port on the same NAT,
making determine_role return None (equal addrs). This is an
existing edge case in reflect.rs — not a UI bug.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Buttons: use text labels (Mic/Spk/End) instead of emoji HTML
entities that rendered as raw text on Android WebView
- Stats: match Rust CallStatus fields (tx_codec, rx_codec,
encode_fps, recv_fps, audio_level, spk_muted)
- Nicknames: register_signal sends derive_alias() as the alias
so other users see "Brave Falcon" instead of "a525:e9b2:..."
- Lobby header shows alias from get_app_info instead of raw fp
- pollStatus uses correct field names from Rust struct
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Discord-style bottom drawer for voice instead of navigating away:
- "Join Voice" hides the FAB, slides up a persistent bottom bar
- Drawer shows: room name, timer, P2P/Relay badge, level meter
- Controls: mic, speaker, end call — all in the drawer
- Direct call info (identicon, name, P2P badge) shown inline
- Lobby stays visible above the drawer at all times
- Stats line shows codec/packet/FEC info
- Leave voice = drawer slides away, FAB returns
Removed: full-screen call-screen, back button, old participant
list, old mic/speaker/hangup buttons. All voice interaction
happens in the 15% bottom drawer while the lobby stays live.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Settings now shows relay list with:
- Visual list of all configured relays
- Active relay highlighted in green with "ACTIVE" badge
- Tap a relay to switch (deregisters + reconnects automatically)
- X button to remove a relay (keeps at least 1)
- Add relay with name + address inputs
- Reconnect flow: deregister → clear lobby → auto-connect to new relay
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The lobby now populates from PresenceList signal events:
- Relay broadcasts user list on register/deregister
- JS receives "presence_list" signal-event
- Updates lobbyUsers map (excluding self)
- Renders user rows with identicon, name, fingerprint
Users appear in the lobby as soon as they register their
signal channel — no need to join voice first.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New signal infrastructure for the lobby-first UI:
- PresenceUser struct: { fingerprint, alias }
- SignalMessage::PresenceList: relay broadcasts full user list
to all signal clients on every register/deregister
- SignalHub::presence_list(): builds the list from connected clients
- SignalHub::broadcast(): sends to ALL signal clients
- Relay calls broadcast on register + unregister
- Desktop emits "presence_list" signal-event to JS frontend
This gives clients real-time visibility of who's online via the
signal channel, without needing to join a voice room first.
603 tests pass, 0 regressions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Complete JS rewrite for IRC-style lobby flow:
- Auto-connect signal channel on app launch (no connect button)
- Lobby shows online users with identicon, name, voice status
- "Join Voice" FAB toggles room voice on/off
- Tap user → context menu → Direct Call
- Incoming call banner slides up from bottom
- Back button returns from call to lobby
- Settings panel preserved with all debug toggles
~500 lines (down from 1786) — focused on the lobby experience.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New IRC-style lobby layout:
- Auto-connect on launch, drop into user list
- User rows with identicon, name, fingerprint, voice status
- Speaking indicator (green highlight + pulsing)
- Join Voice FAB (green, toggles to Leave/red)
- Incoming call banner (slides up from bottom)
- User context menu (tap user → Call / Message)
- Settings panel preserved from original
The old connect-screen HTML is removed. The call-screen is kept
intact. JS adaptation next.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New SignalMessage variants for P2P quality coordination:
UpgradeProposal/UpgradeResponse/UpgradeConfirm (#28):
- Consensual quality upgrade flow — proposer sends desired profile,
peer accepts/rejects based on own conditions, confirm commits both
- All carry call_id for relay routing
QualityCapability (#30):
- Peer reports its max sustainable profile — enables asymmetric
encoding where each side uses its own best quality instead of
forcing everyone to the weakest link
Relay forwards all 4 signals to the call peer (same pattern as
MediaPathReport, CandidateUpdate, HardNatProbe).
Desktop signal recv loop handles all 4 with debug logging.
Encoder switching TODOs noted for wiring into CallEngine.
4 new serde roundtrip tests. 603 total, 0 regressions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New setting: "Birthday attack (opens extra ports for hard NAT)"
- Default: OFF — no extra latency on call setup
- When ON: waits up to 3s for peer's birthday ports if peer has
non-cone NAT, adds them to the dial race
Gated end-to-end: Settings → localStorage → JS invoke →
Rust connect param → birthday wait + target injection.
LAN/cone calls unaffected regardless of setting.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Complete Dialer-side birthday attack integration:
- SignalState stores peer_birthday_ports from HardNatBirthdayStart
- connect command: if peer's HardNatProbe shows non-cone NAT, waits
up to 3s for birthday ports to arrive (Acceptor needs time to open
32 sockets + STUN-probe each)
- When birthday ports arrive, generate_dialer_targets() builds hit
list (known ports + random fill) and adds them to PeerCandidates
- All birthday targets go into the dual-path race as extra candidates
- LAN/cone calls skip the wait entirely (gated on allocation type)
Full waterfall now:
1. Standard candidates (reflexive + mapped) → immediate
2. Port prediction (sequential delta) → immediate
3. Birthday targets (if non-cone peer) → +3s wait
4. All of above raced in parallel via JoinSet
5. Relay runs concurrently with 500ms head-start
599 tests pass, 0 regressions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Birthday attack for random symmetric NATs:
- birthday.rs: open_acceptor_ports() opens N sockets, STUN-probes
each to learn external ports. generate_dialer_targets() builds
hit list (known ports first, then random fill). spray_dialer()
sprays QUIC connects with rate limiting, first success wins.
- Default: 32 acceptor ports, 128 dialer probes, 20ms interval
Signal coordination:
- HardNatBirthdayStart { acceptor_ports, external_ip } sent by
Acceptor when peer's HardNatProbe shows random/sequential NAT
- Relay forwards it like other call signals
- Desktop recv loop handles and logs it
Hybrid waterfall integration:
- On receiving HardNatProbe with non-cone allocation, Acceptor
auto-opens birthday ports and sends BirthdayStart
- Sockets kept alive 10s for NAT mapping persistence
- Dialer spray integration into race() pending (needs transport
hot-swap for background upgrade)
6 new tests, 599 total, 0 regressions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New toggle in Settings → "Direct-only mode (no relay fallback)":
- Default: OFF (normal behavior, relay fallback on P2P failure)
- When ON: connect returns error if P2P fails, with full
candidate_diags in the debug log showing why each candidate
failed. Call never falls back to relay.
Useful for testing NAT traversal — you see the exact failure
reason instead of the call silently working through relay.
Wired end-to-end:
- Settings.directOnly persisted in localStorage
- Passed as directOnly param to Rust connect command
- connect:path_negotiated shows direct_only flag
- connect:direct_only_failed emits on failure with diags
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fixes from real-world 5G↔Starlink testing:
NAT tickle fix:
- tokio::net::UdpSocket::bind() doesn't set SO_REUSEADDR, so binding
to the same port as quinn silently failed. Now uses socket2::Socket
with explicit SO_REUSEADDR + SO_REUSEPORT (via libc on unix).
- Tickle now logs success/failure for debugging.
Diagnostic fixes:
- connect:dual_path_race_start shows both dial_order_raw and
dial_order_smart so we can see what filtering removed
- Grace-period timeout (relay wins first, direct still running)
now fills "timeout:grace" diags for unrecorded candidates
- Previously candidate_diags was empty when relay won the race
Dependencies:
- Added socket2 = "0.5" to wzp-client
593 tests pass, 0 regressions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Major P2P improvements for cross-network calls:
Smart candidate filtering (smart_dial_order):
- Strip LAN candidates when peer's public IP differs from ours
(172.16.x.x is unreachable from a different network)
- Strip all IPv6 candidates (Phase 7 disabled, wastes dial slots)
- Only keep mapped + reflexive for cross-network calls
- LAN candidates preserved when both peers share the same public IP
Acceptor NAT tickle:
- A-role sends a 1-byte UDP packet to each peer candidate BEFORE
accepting. This opens the NAT pinhole for return traffic from
the Dialer's IP — critical for address-restricted NATs that only
allow inbound from IPs they've seen outbound traffic to.
- Uses SO_REUSEADDR on the same port as the quinn endpoint.
Direct timeout increased from 2s to 4s:
- Cross-network QUIC handshakes through CGNAT can take 2-3s
- 2s was too aggressive for 5G/LTE networks
Diagnostic fix:
- Record "timeout:4s" for candidates still in-flight when the
timeout fires (previously these had no diagnostic entry)
5 new tests for smart_dial_order edge cases.
593 tests pass, 0 regressions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The callDebugBuffer.length=0 in showCallScreen() ran AFTER the
connect command returned, wiping all connect: events (path_negotiated,
race_start, race_done, candidate_diags). Only media: events survived
because they arrived after the clear.
Removed all automatic buffer clearing. The reverse().find() already
handles stale data by picking the most recent event. The manual
"Clear log" button (line 624) is the only way to clear now.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Clearing callDebugBuffer in showConnectScreen() wiped all debug
events the moment a call ended, so the user saw empty logs. Moved
the clear to showCallScreen() instead — the buffer is reset at the
START of a new call, not the end. This way:
- After hanging up, all events from the call are still visible
- Starting a new call clears stale data from the previous one
- The reverse().find() for P2P badge still gets fresh data
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
findLast() requires Chrome 97+ / Android WebView 97+. Older Android
devices crash with TypeError in pollStatus(), killing all status
updates including the debug log. Use [...arr].reverse().find() which
works everywhere.
Also pass peerMappedAddr in the direct-call connect invoke.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>