Full analysis of relay lock contention with precise inventory of every
lock acquisition in the hot path. Evaluates 4 design options:
A) Per-room Arc<Mutex<Room>> (recommended — 100x improvement for multi-room)
B) DashMap (good but less explicit)
C) Channel-based fan-out (over-engineered for current scale)
D) Snapshot-on-change via arc-swap (best perf, more complex)
Phases: (1) per-room locks, (2) federation lock fix, (3) move quality
tracking out of the critical path. Estimated 1.5-2.5 days total.
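
Option A can be sketched as follows. This is a minimal illustration, not
the relay's code: `Room`, `RoomManager`, and their fields are hypothetical
names. The map lock is held only long enough to clone the `Arc`, so the
hot path contends only with traffic in the same room.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, RwLock};

/// Hypothetical per-room state; the real relay tracks more than this.
#[derive(Default)]
struct Room {
    participants: Vec<String>,
    packets_forwarded: u64,
}

/// Hypothetical manager: the outer lock guards only the map itself.
#[derive(Default)]
struct RoomManager {
    rooms: RwLock<HashMap<String, Arc<Mutex<Room>>>>,
}

impl RoomManager {
    /// Returns the room handle, creating the room if it is missing.
    fn room(&self, name: &str) -> Arc<Mutex<Room>> {
        // Fast path: shared read lock, released before returning.
        if let Some(room) = self.rooms.read().unwrap().get(name) {
            return Arc::clone(room);
        }
        // Slow path: write lock only when a room must be inserted.
        let mut rooms = self.rooms.write().unwrap();
        let room = rooms.entry(name.to_string()).or_default().clone();
        room
    }

    /// Hot path: locks only the target room and returns the fan-out count.
    fn forward(&self, name: &str) -> usize {
        let room = self.room(name);
        let mut guard = room.lock().unwrap();
        guard.packets_forwarded += 1;
        guard.participants.len()
    }
}
```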
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Partial reads from the capture ring consumed samples that were then
discarded when the send loop retried from buf[0]. For 20ms codecs this
was invisible (single Oboe burst fills 960 samples in one read), but
40ms codecs (Opus6k, 1920 samples) needed 2 bursts — the first partial
read consumed 960 real samples and threw them away.
Result: Opus6k produced ~11 frames/s instead of 25 (~44% of expected).
Fix: expose wzp_native_audio_capture_available() and check it before
reading, matching the desktop capture_ring.available() pattern. Partial
reads no longer occur because we only read when enough samples exist.
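
The check-before-read pattern, sketched with a hypothetical `CaptureRing`
backed by a `VecDeque` (the native ring and its FFI are not shown here):

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for the native capture ring.
struct CaptureRing {
    samples: VecDeque<i16>,
}

impl CaptureRing {
    /// Number of samples that can be read right now.
    fn available(&self) -> usize {
        self.samples.len()
    }

    /// Destructive read: copies up to `out.len()` samples and consumes them.
    fn read(&mut self, out: &mut [i16]) -> usize {
        let n = out.len().min(self.samples.len());
        for (dst, src) in out.iter_mut().zip(self.samples.drain(..n)) {
            *dst = src;
        }
        n
    }
}

/// Reads one full frame, or returns false without consuming anything.
fn try_read_frame(ring: &mut CaptureRing, frame: &mut [i16]) -> bool {
    if ring.available() < frame.len() {
        // Not enough yet: leave the partial burst in the ring.
        return false;
    }
    ring.read(frame) == frame.len()
}
```

With a 1920-sample frame and a single 960-sample burst in the ring,
`try_read_frame` returns false and the 960 samples stay queued for the
next attempt.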
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
frame_samples was immutable — when adaptive quality switched from 20ms
(Opus24k, 960 samples) to 40ms (Opus6k, 1920 samples), the send loop
kept reading 960 samples and feeding half-sized frames to the encoder.
This caused Opus6k to produce ~11 frames/s instead of 25, making audio
choppy.
Fix:
- frame_samples is now mut and updated on profile switch
- buf sized for max frame (1920) with frame_samples-bounded slices
- RMS, mute, encode, and capture reads all use &buf[..frame_samples]
- Applied to both Android and desktop send tasks
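
A rough sketch of the bounded-slice approach. The `Profile` enum and the
helper names are assumptions made for illustration; only the sample
counts (960 and 1920) come from this change.

```rust
/// Largest frame the send loop must hold: 40 ms at 48 kHz.
const MAX_FRAME_SAMPLES: usize = 1920;

/// Hypothetical subset of the codec profiles named above.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Profile {
    Opus24k, // 20 ms frames
    Opus6k,  // 40 ms frames
}

/// Samples per frame at 48 kHz for each profile.
fn frame_samples(profile: Profile) -> usize {
    match profile {
        Profile::Opus24k => 960,
        Profile::Opus6k => 1920,
    }
}

/// Root-mean-square level of one frame.
fn rms(frame: &[i16]) -> f64 {
    if frame.is_empty() {
        return 0.0;
    }
    let sum: f64 = frame.iter().map(|&s| (s as f64) * (s as f64)).sum();
    (sum / frame.len() as f64).sqrt()
}

/// Returns the part of `buf` that the encoder should see for `profile`.
fn active_frame(buf: &[i16; MAX_FRAME_SAMPLES], profile: Profile) -> &[i16] {
    &buf[..frame_samples(profile)]
}
```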
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Keystores are gitignored so git reset --hard deletes them. The build
script now copies them from a persistent $BASE_DIR/data/keystore/ cache
into the source tree before building. This ensures both primary and alt
servers always have signing keys available.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Extend Tier enum from 3 to 6 levels: Studio64k/48k/32k + Good +
Degraded + Catastrophic with asymmetric hysteresis (down:3, up:5,
studio:10)
- Handle QualityDirective signals in both desktop and Android engines
— relay-coordinated codec switching now works end-to-end
- Add periodic TAP STATS to debug tap: packets in/out, fan-out avg,
seq gaps, codecs seen (every 5s)
- Mark task #2 done (ParticipantInfo in federation signals already
implemented)
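
One way the asymmetric hysteresis could work, as a sketch only. The
thresholds (down:3, up:5, studio:10) come from the list above; the
one-step-at-a-time movement, the streak reset rules, and the
`Hysteresis` type are assumptions.

```rust
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
enum Tier {
    Catastrophic,
    Degraded,
    Good,
    Studio32k,
    Studio48k,
    Studio64k,
}

/// Tiers ordered from worst to best, indexable by `tier as usize`.
const TIERS: [Tier; 6] = [
    Tier::Catastrophic,
    Tier::Degraded,
    Tier::Good,
    Tier::Studio32k,
    Tier::Studio48k,
    Tier::Studio64k,
];

/// Hypothetical controller state.
struct Hysteresis {
    current: Tier,
    down_streak: u32,
    up_streak: u32,
}

impl Hysteresis {
    fn new(current: Tier) -> Self {
        Self { current, down_streak: 0, up_streak: 0 }
    }

    /// Feeds the tier the link could sustain right now; returns the active tier.
    fn observe(&mut self, target: Tier) -> Tier {
        let idx = self.current as usize;
        if target < self.current {
            self.up_streak = 0;
            self.down_streak += 1;
            if self.down_streak >= 3 {
                self.current = TIERS[idx - 1]; // step down quickly
                self.down_streak = 0;
            }
        } else if target > self.current {
            self.down_streak = 0;
            self.up_streak += 1;
            // Moving into a studio tier needs a longer clean streak.
            let needed = if TIERS[idx + 1] >= Tier::Studio32k { 10 } else { 5 };
            if self.up_streak >= needed {
                self.current = TIERS[idx + 1]; // step up slowly
                self.up_streak = 0;
            }
        } else {
            self.down_streak = 0;
            self.up_streak = 0;
        }
        self.current
    }
}
```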
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
build.sh was producing unsigned APKs because it reimplemented the Docker
build inline without the signing step from build-tauri-android.sh. Now
uses the same pipeline: find keystore (release preferred, debug fallback),
zipalign -f 4, apksigner sign with keystore credentials.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
find was picking up a cached 384MB debug APK over the fresh 25MB release
APK because the old file was listed first. Now:
1. Delete all APKs before the build starts (clean slate)
2. On upload, prefer *release*.apk over any other match
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Wire AdaptiveQualityController into desktop engine send/recv tasks
(mirrors Android pattern: AtomicU8 pending_profile, auto-mode check)
- Wire same into Android engine send task (was only in recv before)
- QualityDirective SignalMessage variant for relay-initiated codec switch
- ParticipantQuality tracking in relay RoomManager (per-participant
AdaptiveQualityController, weakest-link tier computation)
- Relay broadcasts QualityDirective to all participants when room-wide
tier degrades (coordinated codec switching)
- Oboe stream state polling: poll getState() for up to 2s after
requestStart() to ensure both streams reach Started before proceeding
(fixes intermittent silent calls on cold start, Nothing Phone A059)
Tasks: #7, #25, #26, #31, #35
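
The weakest-link computation can be sketched like this. The three-level
`Tier` enum and both function names are simplifications for illustration,
not the RoomManager's actual types.

```rust
use std::collections::HashMap;

/// Simplified tier ladder, ordered from worst to best.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
enum Tier {
    Catastrophic,
    Degraded,
    Good,
}

/// Weakest-link rule: the room runs at the worst tier any participant reports.
fn room_tier(per_participant: &HashMap<String, Tier>) -> Option<Tier> {
    per_participant.values().copied().min()
}

/// Returns the tier to broadcast when the room-wide tier has degraded.
fn directive_on_degrade(
    previous: Tier,
    per_participant: &HashMap<String, Tier>,
) -> Option<Tier> {
    match room_tier(per_participant) {
        Some(now) if now < previous => Some(now),
        _ => None,
    }
}
```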
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Without clearCommunicationDevice(), the BT headset stays locked in SCO
mode after the call. Media playback (video, music) can't route to BT
A2DP, requiring a device reboot to restore normal audio.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reflects the current reality: setCommunicationDevice API 31+, deferred
MODE_IN_COMMUNICATION, BT-mode Oboe (bt_active flag), per-arch builds,
Hangup call_id fix, and network monitoring integration.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Root cause: MainActivity set MODE_IN_COMMUNICATION at app launch,
hijacking system audio routing immediately — BT A2DP music dropped to
earpiece, and the pre-existing communication mode confused subsequent
setCommunicationDevice calls for BT SCO.
Fix: MainActivity now only sets volumes. MODE_IN_COMMUNICATION is set
via JNI right before Oboe audio_start() in CallEngine, and MODE_NORMAL
is restored after audio_stop() when the call ends.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Root cause: Oboe capture at 48kHz with InputPreset::VoiceCommunication
cannot open against a BT SCO device (which supports only 8/16kHz). The
stream silently falls back to the built-in mic and delivers zeros.
Fix: add bt_active flag to WzpOboeConfig. When set, capture skips
setSampleRate and setInputPreset, letting the system route to BT SCO
at its native rate. Oboe's SampleRateConversionQuality::Best resamples
to 48kHz for our ring buffers. Playout uses Usage::Media in BT mode.
New API: wzp_native_audio_start_bt() for BT mode, called from
set_bluetooth_sco(on=true). Normal audio_start() restores the
standard config when switching back to earpiece/speaker.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two fixes for BT audio silence:
1. Switch Oboe streams from Exclusive to Shared sharing mode. Exclusive
mode bypasses Oboe's internal resampler, so opening a 48kHz stream
against a BT SCO device (8/16kHz only) fails at the AudioPolicy
level. Shared mode lets Oboe's resampler bridge the gap.
2. Add 500ms post-SCO delay before Oboe restart. The audio policy needs
time to apply the bt-sco route after setCommunicationDevice returns.
Without the delay, Oboe opens against the old device (handset).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
BT SCO devices only support 8kHz or 16kHz but our Oboe streams request
48kHz. Without resampling, AudioPolicyManager rejects the input stream
("getInputProfile could not find profile for... sampling rate 48000").
Fix: add setSampleRateConversionQuality(Best) to both capture and
playout stream builders. Oboe resamples internally so our ring buffers
stay at 48kHz regardless of the hardware sample rate.
Also removes the broken setBluetoothScoOn/isBluetoothScoOn calls from
stop_bluetooth_sco — just call stopBluetoothSco() unconditionally.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Root cause: setBluetoothScoOn(true) is silently rejected on Android 12+
for non-system apps ("is greater than FIRST_APPLICATION_UID exiting").
Audio policy routed to handset instead of BT despite SCO link being up.
Fix: use the modern setCommunicationDevice(AudioDeviceInfo) API on
API 31+ which properly routes voice audio to the BT device. Falls back
to deprecated startBluetoothSco() on older APIs.
Also uses getCommunicationDevice() for is_bluetooth_sco_on() and
clearCommunicationDevice() for stop, matching the modern API surface.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three fixes for Bluetooth audio not working:
1. is_bluetooth_available() now checks for TYPE_BLUETOOTH_A2DP (8) in
addition to TYPE_BLUETOOTH_SCO (7) — many headsets only register as
A2DP until SCO is explicitly started.
2. set_bluetooth_sco(on=true) polls isBluetoothScoOn() for up to 3s
before restarting Oboe. startBluetoothSco() is async — the SCO link
takes 500ms-2s to establish. Without waiting, Oboe opens against
earpiece and audio goes nowhere.
3. Frontend skips redundant set_speakerphone(false) when transitioning
to BT — start_bluetooth_sco() handles speaker-off internally,
avoiding a double Oboe restart.
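
The bounded wait in fix 2 follows a generic poll-with-deadline shape. A
sketch, where `wait_until` is a hypothetical helper and the closure
stands in for the JNI call to isBluetoothScoOn():

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Polls `ready` every `interval` until it returns true or `timeout` elapses.
fn wait_until(
    mut ready: impl FnMut() -> bool,
    timeout: Duration,
    interval: Duration,
) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        if ready() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        thread::sleep(interval);
    }
}
```

A caller would pass a 3s timeout and restart Oboe only when the helper
returns true; what to do on false (for example, staying on the current
route) is left to the caller.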
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Root cause: Hangup had no call_id field. The relay forwarded hangups to
ALL active calls for a user. When user A hung up call 1 and user B
immediately placed call 2, the relay's processing of A's hangup would
also kill call 2 (race window ~1-2s).
Fix: add optional call_id to Hangup (backwards-compatible via serde
skip_serializing_if). When present, the relay only ends the named call.
Old clients send call_id=None and get the legacy broadcast behavior.
Also: clear pending_path_report in Hangup recv handler and
internal_deregister to prevent stale oneshot channels from blocking
subsequent call setups.
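
The scoping rule, sketched without serde. The registry shape (call id
mapped to a single user) and `end_calls` are assumptions; the real relay
state is richer.

```rust
use std::collections::HashMap;

/// Hangup signal; `call_id` is None when sent by an old client.
struct Hangup {
    call_id: Option<String>,
}

/// Ends the named call, or every call of `user` when no call_id is given.
/// `active` is a hypothetical registry mapping call id to user.
fn end_calls(
    active: &mut HashMap<String, String>,
    user: &str,
    hangup: &Hangup,
) -> Vec<String> {
    let mut ended: Vec<String> = active
        .iter()
        .filter(|(id, owner)| {
            owner.as_str() == user
                && hangup.call_id.as_ref().map_or(true, |wanted| wanted == *id)
        })
        .map(|(id, _)| id.clone())
        .collect();
    ended.sort(); // deterministic order for logging
    for id in &ended {
        active.remove(id);
    }
    ended
}
```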
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Release builds from cargo-tauri are unsigned. After Gradle produces the
APK, zipalign + apksigner now sign it with the release keystore
(android/keystore/wzp-release.jks). Falls back to debug keystore if
release is missing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Bluetooth: wire existing AudioRouteManager SCO support through both app
variants. Replace binary speaker toggle with 3-way route cycling
(Earpiece → Speaker → Bluetooth). Tauri side adds JNI bridge functions
(start/stop/query SCO, device availability) and Oboe stream restart.
Network awareness: integrate Android ConnectivityManager to detect
WiFi/cellular transitions and feed them to AdaptiveQualityController
via lock-free AtomicU8 signaling. Enables proactive quality downgrade
and FEC boost on network handoffs.
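
A sketch of the AtomicU8 signaling. The `NetworkSignal` type and the
constant values are assumptions; only the use of a lock-free AtomicU8 is
from this change.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

const NET_NONE: u8 = 0; // no pending transition
const NET_WIFI: u8 = 1;
const NET_CELLULAR: u8 = 2;

/// Hypothetical mailbox shared by the connectivity callback and the send task.
struct NetworkSignal(AtomicU8);

impl NetworkSignal {
    const fn new() -> Self {
        Self(AtomicU8::new(NET_NONE))
    }

    /// Called from the platform callback; never blocks.
    fn publish(&self, kind: u8) {
        self.0.store(kind, Ordering::Release);
    }

    /// Called from the audio loop; returns and clears the pending value.
    fn take(&self) -> Option<u8> {
        match self.0.swap(NET_NONE, Ordering::AcqRel) {
            NET_NONE => None,
            kind => Some(kind),
        }
    }
}
```

Only the latest transition survives if several arrive between polls,
which is acceptable when the consumer cares about the current network
and not the history.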
Build: add --arch flag to build-tauri-android.sh supporting arm64,
armv7, or all (separate per-arch APKs for smaller tester binaries).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
PRD 4: Disable IPv6 direct dial/accept temporarily. IPv6 QUIC
handshakes succeed but connections die immediately on datagram
send ("connection lost"). IPv4 candidates work reliably. IPv6
candidates are still gathered but filtered at dial time.
PRD 1: Close losing transport after Phase 6 negotiation. The
non-selected transport now gets an explicit QUIC close frame
instead of silently dropping after 30s idle timeout. Prevents
phantom connections from polluting future accept() calls.
PRD 2: Harden accept loop with max 3 stale retries. Stale
connections are explicitly closed (conn.close) and counted.
After 3 stale connections, the accept loop aborts instead of
spinning until the race timeout.
PRD 3: Resource cleanup — close old IPv6 endpoint before
creating a new one in place_call/answer_call. Add Drop impl
to CallEngine so tasks are signalled to stop on ungraceful
shutdown.
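
The bounded accept loop from PRD 2, sketched over a plain iterator. The
`Accepted` struct is a hypothetical stand-in for a QUIC connection; in
the real loop each stale connection is also closed explicitly.

```rust
/// Hypothetical view of an accepted connection:
/// `close_reason` is `Some` when the connection is already closed.
struct Accepted {
    id: u32,
    close_reason: Option<String>,
}

const MAX_STALE_RETRIES: usize = 3;

/// Pulls connections from `incoming` until a live one is found,
/// giving up after MAX_STALE_RETRIES stale connections.
fn accept_live(incoming: impl Iterator<Item = Accepted>) -> Result<Accepted, String> {
    let mut stale = 0;
    for conn in incoming {
        if conn.close_reason.is_none() {
            return Ok(conn);
        }
        // The real loop would call conn.close(..) here before retrying.
        stale += 1;
        if stale >= MAX_STALE_RETRIES {
            return Err(format!("aborting after {stale} stale connections"));
        }
    }
    Err("no more incoming connections".to_string())
}
```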
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The originating relay (where the caller is) never set peer_relay_fp
because the call was created locally. When the callee's answer
arrived via federation, the cross-relay dispatcher handled it but
didn't mark the call as cross-relay. This meant the caller's
MediaPathReport was delivered via local hub.send_to() to a peer
fingerprint that isn't connected locally — silently dropped.
Fix: in the cross-relay answer dispatcher, call
reg.set_peer_relay_fp(call_id, Some(origin_relay_fp)) so the
originating relay knows to forward MediaPathReport via federation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add relay_build field to RegisterPresenceAck so the client logs
which relay version it connected to. Shows in the debug log as
register_signal:ack_received {"relay_build":"f843a93"}.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
MediaPathReport was only delivered via local signal_hub, so calls
between peers on different relays always hit peer_report_timeout
and fell back to relay — even when direct P2P worked perfectly.
Fix: check peer_relay_fp in call_registry (same pattern as
DirectCallAnswer). If the peer is on a remote relay, wrap in
FederatedSignalForward and send via federation link. Also fix
the cross-relay dispatcher to deliver to BOTH caller and callee
(not just caller), since the report can come from either side.
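
The routing decision, as a sketch. `CallEntry`, `Delivery`, and
`route_report` are hypothetical names; the FederatedSignalForward
wrapping itself is not shown.

```rust
use std::collections::HashMap;

/// Where a MediaPathReport should be sent.
#[derive(Debug, PartialEq)]
enum Delivery {
    Local { peer_fp: String },
    Federated { relay_fp: String, peer_fp: String },
}

/// Hypothetical slice of a call registry entry.
struct CallEntry {
    peer_fp: String,
    peer_relay_fp: Option<String>,
}

/// Picks local or federated delivery based on `peer_relay_fp`.
fn route_report(registry: &HashMap<String, CallEntry>, call_id: &str) -> Option<Delivery> {
    let call = registry.get(call_id)?;
    Some(match &call.peer_relay_fp {
        // Peer lives on another relay: forward over the federation link.
        Some(relay_fp) => Delivery::Federated {
            relay_fp: relay_fp.clone(),
            peer_fp: call.peer_fp.clone(),
        },
        // Peer is connected here: deliver through the local hub.
        None => Delivery::Local { peer_fp: call.peer_fp.clone() },
    })
}
```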
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When peers are on different relays, MediaPathReport can't be
forwarded — causing a 3s timeout and false relay fallback even
though direct P2P works perfectly.
Fix: on timeout, if local_direct_ok is true AND the direct
transport's connection is still alive (no close_reason), trust
the direct path instead of falling back to relay. The timeout
indicates a relay forwarding issue, not a direct path failure.
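
The timeout rule as a small decision function. The `TimedOut` arms follow
the fix described above; the `DirectOk` arms (both sides must report
success) and all type names are assumptions.

```rust
#[derive(Debug, PartialEq)]
enum MediaPath {
    Direct,
    Relay,
}

/// Outcome of waiting for the peer's MediaPathReport.
enum PeerReport {
    DirectOk(bool),
    TimedOut,
}

/// Chooses the media path once the race and the report wait are over.
fn choose_path(
    local_direct_ok: bool,
    direct_close_reason: Option<&str>,
    peer: PeerReport,
) -> MediaPath {
    match peer {
        PeerReport::DirectOk(peer_ok) if local_direct_ok && peer_ok => MediaPath::Direct,
        PeerReport::DirectOk(_) => MediaPath::Relay,
        // Timeout with a live direct connection: trust the direct path.
        PeerReport::TimedOut if local_direct_ok && direct_close_reason.is_none() => {
            MediaPath::Direct
        }
        PeerReport::TimedOut => MediaPath::Relay,
    }
}
```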
Also fix ALT build paste URL (paste.tbs.manko.yoga, not amn.gg).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Acceptor's accept() on the shared signal endpoint can dequeue
a stale QUIC connection from a previous call that the Dialer has
already dropped. This results in "connection lost" errors when
media datagrams are sent — 100% drops on both sides.
Fix: after accepting a connection, check close_reason(). If the
connection is already closed, log a warning and re-accept. Also
verify max_datagram_size() is available before returning.
Additionally: emit transport details (remote addr, max_datagram,
close_reason) in the call_engine_starting debug event so stale
connection issues are visible in the user-facing debug log.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When direct P2P calls show 100% datagram drops, we need to know
WHY send_media() fails. This commit adds:
- Remote address + stable_id logging on A-role accept and D-role
dial success (dual_path.rs) — tells us which candidate won
- Remote address + max_datagram_size on engine transport init —
verifies datagrams are negotiated
- last_send_err in send heartbeat — captures the actual error
from send_datagram() failures
- QuinnTransport::remote_address() helper
Also fixes UI badge: was looking for wrong event name
("dual_path_race_won" → "path_negotiated").
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The UI looked for event "connect:dual_path_race_won" which doesn't
exist — the actual event is "connect:path_negotiated" with a
use_direct boolean. Badge always showed "Via Relay" even when the
call was direct P2P.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CLI binary was missing the new caller_build_version and
callee_build_version fields, causing E0063 compile errors on
Linux relay/client builds.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The peer's MediaPathReport can arrive while our dual_path::race is
still running. Previously, the oneshot was created AFTER the race
completed, so the recv loop had nowhere to deliver the report —
it was silently dropped, causing a 3s timeout and false relay
fallback on ~50% of calls.
Fix: create the oneshot and install it in SignalState BEFORE
starting the race. The oneshot::Receiver buffers the value so the
connect command can read it immediately after the race finishes.
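
The ordering can be illustrated with std channels. This sketch uses
`sync_channel(1)` as a stand-in for the async oneshot, and
`PendingReport`, `deliver_report`, and `connect` are hypothetical names.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical slot in the signal state where the recv loop looks for a waiter.
type PendingReport = Arc<Mutex<Option<SyncSender<bool>>>>;

/// Recv-loop side: delivers the report if a waiter is installed, else drops it.
fn deliver_report(pending: &PendingReport, direct_ok: bool) -> bool {
    match pending.lock().unwrap().take() {
        Some(tx) => tx.send(direct_ok).is_ok(),
        None => false,
    }
}

/// Connect side: install the channel first, then run the race, then read.
fn connect(pending: &PendingReport, race: impl FnOnce()) -> Option<bool> {
    let (tx, rx): (SyncSender<bool>, Receiver<bool>) = sync_channel(1);
    *pending.lock().unwrap() = Some(tx);
    race(); // the peer's report may arrive while this runs
    // The channel buffered the value, so this returns without waiting.
    rx.recv_timeout(Duration::from_secs(3)).ok()
}
```

If the sender were installed only after `race()` returned, a report
arriving mid-race would hit the `None` arm in `deliver_report` and be
lost, which is the failure described above.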
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add caller_build_version / callee_build_version (git short hash)
to DirectCallOffer and DirectCallAnswer so peers can identify each
other's build in debug logs. Also log own build at register time.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The CallSetup enum gained peer_direct_addr and peer_local_addrs
in Phase 5.5 but the wzp-android signal recv match arm was never
updated, breaking cargo ndk builds.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>