diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md
index d104db5..1cb8562 100644
--- a/docs/ARCHITECTURE.md
+++ b/docs/ARCHITECTURE.md
@@ -473,6 +473,34 @@ sequenceDiagram
 R->>R: Remove from room, broadcast RoomUpdate
 ```
 
+## Relay Concurrency Model
+
+### Threading
+- Multi-threaded Tokio runtime (all available cores, work-stealing scheduler)
+- Task-per-connection: each QUIC connection gets a dedicated `tokio::spawn`
+- Task-per-participant-per-room: each participant's media forwarding loop is independent
+
+### Shared State & Locking
+
+| Lock | Protected Data | Hold Duration | Contention |
+|------|---------------|---------------|------------|
+| `RoomManager` (Mutex) | Rooms, participants, quality tiers | ~1ms/packet | O(N) per room |
+| `PresenceRegistry` (Mutex) | Fingerprint registrations | ~1ms | Low (join/leave only) |
+| `SessionManager` (Mutex) | Active session tracking | ~1ms | Low |
+| `FederationManager.peer_links` (Mutex) | Peer connections | ~10ms during forward | Per-federation-packet |
+
+### Scaling Characteristics
+
+- **Many small rooms**: Scales well across all cores (rooms are independent)
+- **Large single room (100+ participants)**: Serialized by RoomManager lock
+- **Federation**: Per-peer tasks scale; `peer_links` lock held during send loop
+
+### Primary Bottleneck
+
+The RoomManager Mutex is acquired per-packet by every participant to get the fan-out peer list. Lock is released before I/O (sends happen outside lock), but packet processing is serialized through the lock within a room.
+
+Future optimization: per-room locks or lock-free participant lists via `DashMap`.
+
 ## Client Architecture
 
 ### Desktop Engine (Tauri)
diff --git a/docs/PRD-dred-integration.md b/docs/PRD-dred-integration.md
index bf35df1..f776184 100644
--- a/docs/PRD-dred-integration.md
+++ b/docs/PRD-dred-integration.md
@@ -386,3 +386,17 @@ When instantaneous jitter exceeds the EWMA × 1.3 (asymmetric: fast-up α=0.3, s
 
 - 10 unit tests for tuner math (baseline, scaling, spike, cooldown, codec switch, Codec2 no-op)
 - 4 integration tests (encoder adjustment, spike boost, Codec2 no-op, profile switch with encode verification)
+
+### Opus6k Frame Starvation Bug (Fixed 2026-04-13)
+
+During testing of the extended 1040ms DRED window on Opus6k, the 40ms codec produced only ~11 frames/s instead of 25 — making audio choppy regardless of DRED quality.
+
+**Root cause:** The Android capture ring read loop did partial reads that consumed samples from the ring but discarded them when retrying:
+1. Ring has 960 samples (one Oboe burst)
+2. `audio_read_capture(&mut buf[..1920])` reads 960 into `buf[0..960]`, returns 960
+3. Loop sees 960 < 1920, sleeps, retries from `buf[0..]` → overwrites the consumed samples
+4. ~50% of captured audio thrown away per frame
+
+**Fix:** Added `wzp_native_audio_capture_available()` to check ring fill level before reading (same pattern as the desktop CPAL path's `capture_ring.available()`). Also made `frame_samples` mutable so codec switches update the read size.
+
+**Affected codecs:** Only 40ms frame codecs (Opus6k, Codec2_1200). 20ms codecs (Opus24k, etc.) were unaffected because a single Oboe burst fills the entire request.
diff --git a/docs/PROGRESS.md b/docs/PROGRESS.md
index 52c143f..921ab6a 100644
--- a/docs/PROGRESS.md
+++ b/docs/PROGRESS.md
@@ -290,3 +290,17 @@ Run with `wzp-bench --all`. Representative results (Apple M-series, single core)
 - Logs initial state, poll count, and final state for HAL debugging
 - Does NOT fail on timeout — Rust-side stall detector remains as safety net
 - Targets Nothing Phone A059 intermittent silent calls on cold start
+
+### Opus6k Frame Starvation Fix (2026-04-13)
+- Root cause: partial reads from capture ring consumed samples that were discarded on retry
+- `audio_read_capture(&mut buf[..1920])` with only 960 available → read 960, loop retried from buf[0], overwriting
+- Added `wzp_native_audio_capture_available()` — check before reading (matches desktop pattern)
+- `frame_samples` made mutable and updated on adaptive profile switch
+- `buf` sized to max frame (1920) with `[..frame_samples]` slices throughout
+- Result: Opus6k frame rate restored from ~11/s to expected 25/s
+
+### Build Script Fixes (2026-04-13)
+- Stale APK cleanup: delete all APKs before build, prefer `*release*.apk` on upload
+- APK signing: added zipalign + apksigner pipeline to `build.sh` (was in `build-tauri-android.sh` only)
+- Keystore persistence: `$BASE_DIR/data/keystore/` cache synced into source tree before build
+- Fixes: 384MB debug APK uploaded instead of 25MB release; unsigned APK on alt server
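
Review note: the check-before-read pattern described in the Opus6k frame starvation entries can be sketched as follows. This is a minimal illustration, not the project's code: `CaptureRing` and `try_read_frame` are hypothetical names, and the real `wzp_native_audio_capture_available()` signature is not shown in this diff. The only point it makes is that the reader never consumes a partial frame, so no samples are overwritten on retry.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for the native capture ring (assumption, not the real type).
struct CaptureRing {
    samples: VecDeque<i16>,
}

impl CaptureRing {
    fn new() -> Self {
        Self { samples: VecDeque::new() }
    }

    /// Producer side: the audio callback appends one burst of samples.
    fn push_burst(&mut self, burst: &[i16]) {
        self.samples.extend(burst.iter().copied());
    }

    /// Number of samples that can currently be read.
    fn available(&self) -> usize {
        self.samples.len()
    }

    /// Copies up to `out.len()` samples into `out`, consuming them from the ring.
    /// Returns how many samples were copied.
    fn read(&mut self, out: &mut [i16]) -> usize {
        let n = out.len().min(self.samples.len());
        for slot in out[..n].iter_mut() {
            *slot = self.samples.pop_front().unwrap();
        }
        n
    }
}

/// Reads exactly one frame or nothing at all.
/// Returns false, without consuming any samples, if the ring holds less than a full frame.
fn try_read_frame(ring: &mut CaptureRing, buf: &mut [i16], frame_samples: usize) -> bool {
    if ring.available() < frame_samples {
        return false; // caller sleeps and retries; nothing was consumed
    }
    ring.read(&mut buf[..frame_samples]) == frame_samples
}

fn main() {
    let mut ring = CaptureRing::new();
    let mut buf = [0i16; 1920]; // sized to the largest frame (40 ms at 48 kHz)
    let frame_samples = 1920;

    ring.push_burst(&[1i16; 960]); // first burst: only half a 40 ms frame
    assert!(!try_read_frame(&mut ring, &mut buf, frame_samples));
    assert_eq!(ring.available(), 960); // the half frame is still in the ring

    ring.push_burst(&[2i16; 960]); // second burst completes the frame
    assert!(try_read_frame(&mut ring, &mut buf, frame_samples));
    assert_eq!((buf[0], buf[1919]), (1, 2)); // both bursts survive, in order
}
```

With a 960-sample burst and a 1920-sample frame, the first attempt returns `false` and leaves the ring untouched. The second attempt delivers both bursts in order, which corresponds to the restored 25 frames/s behaviour described in the entries.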