docs: relay concurrency model, Opus6k fix, build script fixes

- ARCHITECTURE.md: new "Relay Concurrency Model" section documenting
  threading, shared state locking table, scaling characteristics, and
  the RoomManager Mutex as primary bottleneck
- PROGRESS.md: Opus6k frame starvation fix, build script fixes
- PRD-dred-integration.md: Opus6k frame starvation bug documentation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Siavash Sameni
2026-04-13 11:54:37 +04:00
parent 9ae9441de4
commit f265fd772d
3 changed files with 56 additions and 0 deletions


@@ -473,6 +473,34 @@ sequenceDiagram
    R->>R: Remove from room, broadcast RoomUpdate
```
## Relay Concurrency Model
### Threading
- Multi-threaded Tokio runtime (all available cores, work-stealing scheduler)
- Task-per-connection: each QUIC connection gets a dedicated `tokio::spawn`
- Task-per-participant-per-room: each participant's media forwarding loop is independent
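The task-per-connection shape can be sketched as follows. The relay uses `tokio::spawn` on a multi-threaded runtime; this minimal sketch substitutes `std::thread` and an mpsc channel for the QUIC connection so it stays self-contained, and `spawn_connection_task` is an illustrative name, not the real API.

```rust
use std::sync::mpsc;
use std::thread;

// Task-per-connection sketch: each connection gets its own independent loop.
// (The relay uses tokio::spawn on a work-stealing runtime; std::thread and a
// channel stand in for the async task and the QUIC packet stream here.)
fn spawn_connection_task(id: u32, rx: mpsc::Receiver<Vec<u8>>) -> thread::JoinHandle<usize> {
    thread::spawn(move || {
        let mut packets = 0usize;
        // Each connection's loop runs independently; shared state is only
        // touched when a packet needs the room fan-out list.
        for _pkt in rx {
            packets += 1;
        }
        let _ = id; // real code would tag logs/metrics with the connection id
        packets
    })
}
```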
### Shared State & Locking
| Lock | Protected Data | Hold Duration | Contention |
|------|---------------|---------------|------------|
| `RoomManager` (Mutex) | Rooms, participants, quality tiers | ~1ms/packet | O(N) per room |
| `PresenceRegistry` (Mutex) | Fingerprint registrations | ~1ms | Low (join/leave only) |
| `SessionManager` (Mutex) | Active session tracking | ~1ms | Low |
| `FederationManager.peer_links` (Mutex) | Peer connections | ~10ms during forward | Per-federation-packet |
### Scaling Characteristics
- **Many small rooms**: Scales well across all cores (rooms are independent)
- **Large single room (100+ participants)**: Serialized by RoomManager lock
- **Federation**: Per-peer tasks scale; `peer_links` lock held during send loop
### Primary Bottleneck
The RoomManager Mutex is acquired per-packet by every participant to obtain the fan-out peer list. The lock is released before any I/O (sends happen outside the lock), but packet processing within a room is still serialized through it.
Future optimization: per-room locks or lock-free participant lists via `DashMap`.
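The snapshot-then-release pattern described above can be sketched like this. `RoomManager` and the room/participant shapes here are simplified stand-ins (the real structs hold quality tiers and more), and `forward_packet` is an illustrative name; the point is that the guard is dropped before any send happens.

```rust
use std::sync::{Arc, Mutex};

// Simplified stand-ins for the real RoomManager state (illustrative only).
struct Room { participants: Vec<u64> }
struct RoomManager { rooms: Vec<Room> }

// Per-packet hot path: hold the lock only to snapshot the fan-out list,
// then do all sends outside the critical section.
fn forward_packet(mgr: &Arc<Mutex<RoomManager>>, room_idx: usize,
                  sender: u64, packet: &[u8]) -> Vec<u64> {
    // O(N) critical section: copy out the peer list, excluding the sender.
    let peers: Vec<u64> = {
        let guard = mgr.lock().unwrap();
        guard.rooms[room_idx]
            .participants
            .iter()
            .copied()
            .filter(|&p| p != sender)
            .collect()
    }; // guard dropped here — lock released before any I/O

    // Sends happen outside the lock; stubbed out in this sketch.
    for _peer in &peers {
        let _ = packet; // real code writes to each peer's QUIC stream
    }
    peers
}
```

A per-room lock (or a `DashMap` keyed by room) would shrink the critical section from "all rooms" to "this room", which is what the future-optimization note above is pointing at.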
## Client Architecture
### Desktop Engine (Tauri)


@@ -386,3 +386,17 @@ When instantaneous jitter exceeds the EWMA × 1.3 (asymmetric: fast-up α=0.3, s
- 10 unit tests for tuner math (baseline, scaling, spike, cooldown, codec switch, Codec2 no-op)
- 4 integration tests (encoder adjustment, spike boost, Codec2 no-op, profile switch with encode verification)
### Opus6k Frame Starvation Bug (Fixed 2026-04-13)
During testing of the extended 1040ms DRED window on Opus6k, the 40ms codec produced only ~11 frames/s instead of 25 — making audio choppy regardless of DRED quality.
**Root cause:** The Android capture ring read loop did partial reads that consumed samples from the ring but discarded them when retrying:
1. Ring has 960 samples (one Oboe burst)
2. `audio_read_capture(&mut buf[..1920])` reads 960 into `buf[0..960]`, returns 960
3. Loop sees 960 < 1920, sleeps, retries from `buf[0..]` → overwrites the consumed samples
4. ~50% of captured audio thrown away per frame
**Fix:** Added `wzp_native_audio_capture_available()` to check ring fill level before reading (same pattern as the desktop CPAL path's `capture_ring.available()`). Also made `frame_samples` mutable so codec switches update the read size.
**Affected codecs:** Only 40ms frame codecs (Opus6k, Codec2_1200). 20ms codecs (Opus24k, etc.) were unaffected because a single Oboe burst fills the entire request.


@@ -290,3 +290,17 @@ Run with `wzp-bench --all`. Representative results (Apple M-series, single core)
- Logs initial state, poll count, and final state for HAL debugging - Logs initial state, poll count, and final state for HAL debugging
- Does NOT fail on timeout — Rust-side stall detector remains as safety net - Does NOT fail on timeout — Rust-side stall detector remains as safety net
- Targets Nothing Phone A059 intermittent silent calls on cold start - Targets Nothing Phone A059 intermittent silent calls on cold start
### Opus6k Frame Starvation Fix (2026-04-13)
- Root cause: partial reads from capture ring consumed samples that were discarded on retry
- `audio_read_capture(&mut buf[..1920])` with only 960 available → read 960, loop retried from buf[0], overwriting
- Added `wzp_native_audio_capture_available()` — check before reading (matches desktop pattern)
- `frame_samples` made mutable and updated on adaptive profile switch
- `buf` sized to max frame (1920) with `[..frame_samples]` slices throughout
- Result: Opus6k frame rate restored from ~11/s to expected 25/s
### Build Script Fixes (2026-04-13)
- Stale APK cleanup: delete all APKs before build, prefer `*release*.apk` on upload
- APK signing: added zipalign + apksigner pipeline to `build.sh` (was in `build-tauri-android.sh` only)
- Keystore persistence: `$BASE_DIR/data/keystore/` cache synced into source tree before build
- Fixes: 384MB debug APK uploaded instead of 25MB release; unsigned APK on alt server