Quinn's cumulative loss_pct (lost / sent since connection start) was
biased forever by handshake-era losses. Even ~5 lost-out-of-100 early
packets pinned us at "Degraded" (5% threshold) and Codec2_1200 was just
a few more drops away. The metric only diluted as thousands more clean
packets accumulated — by which time the call was over.
LossWindow tracks prev (sent, lost) and reports delta loss per ~25-
packet window. The cumulative value is the fallback when the window
hasn't accumulated enough samples (< 20 packets).
All 6 sites converted (DRED tuner + QualityReport on both send tasks,
self-observation on both recv tasks).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Mirror the desktop video pipeline into the #[cfg(target_os="android")] start
function: capture _negotiated_video_codec from the handshake, spawn a video
send task that pulls VideoFrames from camera_tx, encodes/packetizes/sends.
Add video reassembly + decode + emit "video:frame" in the recv task before
the audio branch so Android can both send and receive video.
Instrumentation: emit video:first_send and video:first_recv on both desktop
and android paths so we can verify the pipeline end-to-end.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Voice regression: EncryptingTransport encrypts media with the pairwise
client↔relay session key, but the relay forwards bytes without re-encrypting
per recipient. Sender's key_A ≠ recipient's key_B → recipient cannot decrypt
→ silent audio between mac and android. Drop the wrapper; restore plaintext-
over-QUIC-TLS to the relay. Proper E2E needs MLS group keys or relay hop-by-
hop re-encryption (future PRD).
Android camera: add CAMERA manifest permission + runtime request via
MainActivity. NOTE: still not sufficient — Tauri/Wry's WebChromeClient does
not grant getUserMedia, so video on Android needs a Tauri plugin override
or native Camera2 path. Documented in MainActivity.kt.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Blockers 4 & 5: browser getUserMedia → JPEG IPC → Rust I420 pipeline;
remote video strip renders decoded frames via canvas; EncryptingTransport
wraps QuinnTransport so WZP AEAD is applied to all media (C2 fix).
Test fixes: HandshakeResult.session destructuring across relay/client/crypto
integration tests; video_codecs field added to all CallOffer/CallAnswer
structs; wzp-video pipeline_roundtrip integration tests added.
PRD docs: five Kimi-ready specs for E2E encryption, Android NDK 0.9 migration,
quality upgrade flow, wire-format hardening, and clippy debt.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pass AppHandle into run_signal_task so it can emit call-debug events
and Tauri events directly. On each RoomUpdate:
- emit connect:media:room_update debug event with participant list
- emit call-event/participants Tauri event for JS-side diagnostics
Helps diagnose whether room join and participant sync is working
independently of audio startup.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
spawn_blocking uses arbitrary thread-pool threads that don't have the
Android JNI context initialized, causing ndk_context::android_context()
to panic. Switch to run_on_main_thread (where the context is always
valid) via a oneshot channel, with a 2s timeout. Panic is caught and
forwarded as an Err so the debug log captures it rather than crashing.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The JNI call into AudioManager.setMode() was running directly on the
tokio async thread. If the Android audio policy service is slow (e.g.
immediately after mic permission grant), this could block the runtime.
Moved to spawn_blocking with a 2s timeout; timeout and panic cases are
logged as connect:audio_mode_timeout / connect:audio_mode_panic debug
events and treated as non-fatal (we continue to audio_start).
Also removes the has_record_audio_permission call from the preflight
debug event — it was a redundant JNI round-trip that added latency and
is now captured separately in the preflight_start event context.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The legacy event_cb("connected") call between handshake and audio
preflight was a no-op on the frontend (it enters voice only after the
command resolves) but added noise to failing traces. Replaced with a
connect:connected_event_skipped debug event and added an explicit
connect:android_audio_preflight_start marker so the debug log shows a
clear boundary between handshake completion and audio startup.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- engine.rs: wrap spawn_blocking(audio_start) in an 8s tokio timeout so
the connect command fails fast with a clear error if the Oboe HAL
never returns, instead of blocking the JS 45s timer
- lib.rs: emit_call_debug now always forwards connect: and
register_signal: steps to the JS overlay regardless of the debug-logs
toggle — needed because app-data clears reset the toggle to false,
making join failures invisible on first install
- main.ts: JS timeout bumped to 45s (Rust 8s fires first); timeout
message now includes last native connect: step so the toast is
actionable without opening the debug log
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add emit_call_debug events at every step of the Android connect/audio
path so failures are visible in the Settings debug log without needing
adb logcat:
- connect:handshake_start/done/failed (with timing)
- connect:android_audio_preflight (wzp_native loaded + RECORD_AUDIO
permission check via new has_record_audio_permission() JNI helper)
- connect:audio_stop_start/done
- connect:audio_mode_start/done/failed
- connect:audio_start_start/failed/panic/done (with oboe error code)
- connect:reuse_endpoint (endpoint reuse diagnostic)
Also adds has_record_audio_permission() to android_audio.rs — used in
the preflight event to confirm the OS has granted mic access before
wzp_oboe_start is called.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
wzp_oboe_start is a sync FFI call that can block the OS thread
indefinitely waiting on the Android audio HAL. Calling it directly
from an async context freezes all tokio tasks including Rust-side
timeouts. Fix: run it via spawn_blocking so tokio stays responsive.
Also add a 15s Promise.race timeout in JS so a frozen audio_start
surfaces as "connect timed out — check audio permissions" instead of
the join button staying stuck in "Connecting…" forever.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Buttons: use text labels (Mic/Spk/End) instead of emoji HTML
entities that rendered as raw text on Android WebView
- Stats: match Rust CallStatus fields (tx_codec, rx_codec,
encode_fps, recv_fps, audio_level, spk_muted)
- Nicknames: register_signal sends derive_alias() as the alias
so other users see "Brave Falcon" instead of "a525:e9b2:..."
- Lobby header shows alias from get_app_info instead of raw fp
- pollStatus uses correct field names from Rust struct
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New signal infrastructure for the lobby-first UI:
- PresenceUser struct: { fingerprint, alias }
- SignalMessage::PresenceList: relay broadcasts full user list
to all signal clients on every register/deregister
- SignalHub::presence_list(): builds the list from connected clients
- SignalHub::broadcast(): sends to ALL signal clients
- Relay calls broadcast on register + unregister
- Desktop emits "presence_list" signal-event to JS frontend
This gives clients real-time visibility of who's online via the
signal channel, without needing to join a voice room first.
603 tests pass, 0 regressions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New SignalMessage variants for P2P quality coordination:
UpgradeProposal/UpgradeResponse/UpgradeConfirm (#28):
- Consensual quality upgrade flow — proposer sends desired profile,
peer accepts/rejects based on own conditions, confirm commits both
- All carry call_id for relay routing
QualityCapability (#30):
- Peer reports its max sustainable profile — enables asymmetric
encoding where each side uses its own best quality instead of
forcing everyone to the weakest link
Relay forwards all 4 signals to the call peer (same pattern as
MediaPathReport, CandidateUpdate, HardNatProbe).
Desktop signal recv loop handles all 4 with debug logging.
Encoder switching TODOs noted for wiring into CallEngine.
4 new serde roundtrip tests. 603 total, 0 regressions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New setting: "Birthday attack (opens extra ports for hard NAT)"
- Default: OFF — no extra latency on call setup
- When ON: waits up to 3s for peer's birthday ports if peer has
non-cone NAT, adds them to the dial race
Gated end-to-end: Settings → localStorage → JS invoke →
Rust connect param → birthday wait + target injection.
LAN/cone calls unaffected regardless of setting.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Complete Dialer-side birthday attack integration:
- SignalState stores peer_birthday_ports from HardNatBirthdayStart
- connect command: if peer's HardNatProbe shows non-cone NAT, waits
up to 3s for birthday ports to arrive (Acceptor needs time to open
32 sockets + STUN-probe each)
- When birthday ports arrive, generate_dialer_targets() builds hit
list (known ports + random fill) and adds them to PeerCandidates
- All birthday targets go into the dual-path race as extra candidates
- LAN/cone calls skip the wait entirely (gated on allocation type)
Full waterfall now:
1. Standard candidates (reflexive + mapped) → immediate
2. Port prediction (sequential delta) → immediate
3. Birthday targets (if non-cone peer) → +3s wait
4. All of above raced in parallel via JoinSet
5. Relay runs concurrently with 500ms head-start
599 tests pass, 0 regressions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Birthday attack for random symmetric NATs:
- birthday.rs: open_acceptor_ports() opens N sockets, STUN-probes
each to learn external ports. generate_dialer_targets() builds
hit list (known ports first, then random fill). spray_dialer()
sprays QUIC connects with rate limiting, first success wins.
- Default: 32 acceptor ports, 128 dialer probes, 20ms interval
Signal coordination:
- HardNatBirthdayStart { acceptor_ports, external_ip } sent by
Acceptor when peer's HardNatProbe shows random/sequential NAT
- Relay forwards it like other call signals
- Desktop recv loop handles and logs it
Hybrid waterfall integration:
- On receiving HardNatProbe with non-cone allocation, Acceptor
auto-opens birthday ports and sends BirthdayStart
- Sockets kept alive 10s for NAT mapping persistence
- Dialer spray integration into race() pending (needs transport
hot-swap for background upgrade)
6 new tests, 599 total, 0 regressions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New toggle in Settings → "Direct-only mode (no relay fallback)":
- Default: OFF (normal behavior, relay fallback on P2P failure)
- When ON: connect returns error if P2P fails, with full
candidate_diags in the debug log showing why each candidate
failed. Call never falls back to relay.
Useful for testing NAT traversal — you see the exact failure
reason instead of the call silently working through relay.
Wired end-to-end:
- Settings.directOnly persisted in localStorage
- Passed as directOnly param to Rust connect command
- connect:path_negotiated shows direct_only flag
- connect:direct_only_failed emits on failure with diags
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fixes from real-world 5G↔Starlink testing:
NAT tickle fix:
- tokio::net::UdpSocket::bind() doesn't set SO_REUSEADDR, so binding
to the same port as quinn silently failed. Now uses socket2::Socket
with explicit SO_REUSEADDR + SO_REUSEPORT (via libc on unix).
- Tickle now logs success/failure for debugging.
Diagnostic fixes:
- connect:dual_path_race_start shows both dial_order_raw and
dial_order_smart so we can see what filtering removed
- Grace-period timeout (relay wins first, direct still running)
now fills "timeout:grace" diags for unrecorded candidates
- Previously candidate_diags was empty when relay won the race
Dependencies:
- Added socket2 = "0.5" to wzp-client
593 tests pass, 0 regressions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Major P2P improvements for cross-network calls:
Smart candidate filtering (smart_dial_order):
- Strip LAN candidates when peer's public IP differs from ours
(172.16.x.x is unreachable from a different network)
- Strip all IPv6 candidates (Phase 7 disabled, wastes dial slots)
- Only keep mapped + reflexive for cross-network calls
- LAN candidates preserved when both peers share the same public IP
Acceptor NAT tickle:
- A-role sends a 1-byte UDP packet to each peer candidate BEFORE
accepting. This opens the NAT pinhole for return traffic from
the Dialer's IP — critical for address-restricted NATs that only
allow inbound from IPs they've seen outbound traffic to.
- Uses SO_REUSEADDR on the same port as the quinn endpoint.
Direct timeout increased from 2s to 4s:
- Cross-network QUIC handshakes through CGNAT can take 2-3s
- 2s was too aggressive for 5G/LTE networks
Diagnostic fix:
- Record "timeout:4s" for candidates still in-flight when the
timeout fires (previously these had no diagnostic entry)
5 new tests for smart_dial_order edge cases.
593 tests pass, 0 regressions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Added CandidateDiag struct to RaceResult with per-candidate:
- address attempted
- result (ok / skipped:ipv6 / error:reason)
- elapsed time in ms
Surfaced in call-debug events:
- connect:dual_path_race_start now includes dial_order + peer_mapped
- connect:dual_path_race_done now includes candidate_diags array
Upgraded dual_path tracing from debug to info for IPv6 skips and
dial failures so they appear in logcat/console.
Helps diagnose why P2P fails on specific networks (5G CGNAT,
address-restricted NATs, etc).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
End-to-end integration of sequential port prediction:
- place_call: spawns background detect_port_allocation() + sends
HardNatProbe signal after offer (doesn't delay call setup)
- answer_call: same for AcceptTrusted answers (privacy mode skips)
- Signal recv loop: stashes HardNatProbe in SignalState.peer_hard_nat_probe
- connect: reads peer's probe, if Sequential{delta} runs predict_ports()
and adds predicted addrs to PeerCandidates.local for the dual-path race
- parse_sequential_delta() helper for "sequential(delta=N)" strings
The full flow: both peers independently detect their NAT's port
allocation, exchange HardNatProbe via relay, and the connect command
uses the peer's sequence to predict which ports to dial — all before
the dual-path race starts.
588 tests pass, 0 regressions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Edition 2024 on local macOS auto-resolves the Emitter trait, but the
Docker builder's Rust/Tauri version requires the explicit import for
AppHandle::emit() to resolve. Keeps the warning locally to avoid
breaking the CI build.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
P2P calls now adapt codec quality based on observed network conditions,
matching what relay calls already had.
Three-layer implementation:
- QualityReport::from_path_stats(): construct reports from local quinn
stats (loss%, RTT, jitter) without needing relay-generated reports
- CallEncoder.pending_quality_report: one-shot attachment to next
source packet (consumed on encode, not repeated)
- Engine send tasks: generate quality report every 50 frames (~1s)
from quinn_path_stats() and attach via set_pending_quality_report()
- Engine recv tasks: self-observe from own QUIC path stats every 50
packets, feed to AdaptiveQualityController for P2P adaptation
(works even if peer isn't sending quality reports yet)
Both relay and P2P calls now have adaptive quality. On relay calls,
both peer-sent reports AND local observations feed the controller.
Hysteresis (3 consecutive bad reports to downgrade) prevents thrashing.
372 tests passing (+4 new: from_path_stats encoding, clamping, zero
values, encoder quality report attachment).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Partial reads from the capture ring consumed samples that were then
discarded when the send loop retried from buf[0]. For 20ms codecs this
was invisible (single Oboe burst fills 960 samples in one read), but
40ms codecs (Opus6k, 1920 samples) needed 2 bursts — the first partial
read consumed 960 real samples and threw them away.
Result: Opus6k produced ~11 frames/s instead of 25 (~44% of expected).
Fix: expose wzp_native_audio_capture_available() and check it before
reading, matching the desktop capture_ring.available() pattern. Partial
reads no longer occur because we only read when enough samples exist.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
frame_samples was immutable — when adaptive quality switched from 20ms
(Opus24k, 960 samples) to 40ms (Opus6k, 1920 samples), the send loop
kept reading 960 samples and feeding half-sized frames to the encoder.
This caused Opus6k to produce ~11 frames/s instead of 25, making audio
choppy.
Fix:
- frame_samples is now mut and updated on profile switch
- buf sized for max frame (1920) with frame_samples-bounded slices
- RMS, mute, encode, and capture reads all use &buf[..frame_samples]
- Applied to both Android and desktop send tasks
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Extend Tier enum from 3 to 6 levels: Studio64k/48k/32k + Good +
Degraded + Catastrophic with asymmetric hysteresis (down:3, up:5,
studio:10)
- Handle QualityDirective signals in both desktop and Android engines
— relay-coordinated codec switching now works end-to-end
- Add periodic TAP STATS to debug tap: packets in/out, fan-out avg,
seq gaps, codecs seen (every 5s)
- Mark task #2 done (ParticipantInfo in federation signals already
implemented)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Wire AdaptiveQualityController into desktop engine send/recv tasks
(mirrors Android pattern: AtomicU8 pending_profile, auto-mode check)
- Wire same into Android engine send task (was only in recv before)
- QualityDirective SignalMessage variant for relay-initiated codec switch
- ParticipantQuality tracking in relay RoomManager (per-participant
AdaptiveQualityController, weakest-link tier computation)
- Relay broadcasts QualityDirective to all participants when room-wide
tier degrades (coordinated codec switching)
- Oboe stream state polling: poll getState() for up to 2s after
requestStart() to ensure both streams reach Started before proceeding
(fixes intermittent silent calls on cold start, Nothing Phone A059)
Tasks: #7, #25, #26, #31, #35
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>