docs: relay concurrency model, Opus6k fix, build script fixes
- ARCHITECTURE.md: new "Relay Concurrency Model" section documenting threading, shared state locking table, scaling characteristics, and the RoomManager Mutex as primary bottleneck
- PROGRESS.md: Opus6k frame starvation fix, build script fixes
- PRD-dred-integration.md: Opus6k frame starvation bug documentation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -473,6 +473,34 @@ sequenceDiagram
R->>R: Remove from room, broadcast RoomUpdate
```

## Relay Concurrency Model

### Threading
- Multi-threaded Tokio runtime (all available cores, work-stealing scheduler)
- Task-per-connection: each QUIC connection gets a dedicated `tokio::spawn`
- Task-per-participant-per-room: each participant's media forwarding loop is independent

### Shared State & Locking

| Lock | Protected Data | Hold Duration | Contention |
|------|---------------|---------------|------------|
| `RoomManager` (Mutex) | Rooms, participants, quality tiers | ~1ms/packet | O(N) per room |
| `PresenceRegistry` (Mutex) | Fingerprint registrations | ~1ms | Low (join/leave only) |
| `SessionManager` (Mutex) | Active session tracking | ~1ms | Low |
| `FederationManager.peer_links` (Mutex) | Peer connections | ~10ms during forward | Per-federation-packet |

### Scaling Characteristics

- **Many small rooms**: Scales well across all cores (rooms are independent)
- **Large single room (100+ participants)**: Serialized by RoomManager lock
- **Federation**: Per-peer tasks scale; `peer_links` lock held during send loop

### Primary Bottleneck

The RoomManager Mutex is acquired per packet by every participant to get the fan-out peer list. The lock is released before I/O (sends happen outside the lock), but packet processing within a room is still serialized through the lock.

Future optimization: per-room locks or lock-free participant lists via `DashMap`.
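
The lock-then-release pattern can be sketched with plain `std` types (a minimal sketch; `Room`, `fan_out`, and the peer names are illustrative stand-ins, and the real sends go to QUIC streams inside Tokio tasks):

```rust
use std::sync::{Arc, Mutex};

// Illustrative stand-in for the relay's per-room state (names are hypothetical).
struct Room {
    peers: Vec<String>,
}

// Fan out one packet: hold the lock only long enough to snapshot the peer
// list, then do the (potentially slow) sends outside the lock.
fn fan_out(room: &Arc<Mutex<Room>>, sender: &str, packet: &[u8]) -> Vec<String> {
    // Lock held: O(N) clone of the peer list.
    let targets: Vec<String> = {
        let room = room.lock().unwrap();
        room.peers.iter().filter(|p| p.as_str() != sender).cloned().collect()
    }; // lock released here, before any I/O

    // Sends happen outside the lock; stubbed here as collecting the targets.
    let mut sent = Vec::new();
    for peer in targets {
        let _ = packet; // real code would write to this peer's QUIC stream
        sent.push(peer);
    }
    sent
}

fn main() {
    let room = Arc::new(Mutex::new(Room {
        peers: vec!["alice".into(), "bob".into(), "carol".into()],
    }));
    let sent = fan_out(&room, "alice", b"pkt");
    assert_eq!(sent, vec!["bob".to_string(), "carol".to_string()]);
}
```

Even with sends outside the lock, every packet in a room still takes the same Mutex for its snapshot, which is why a large single room serializes on it.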
## Client Architecture

### Desktop Engine (Tauri)

@@ -386,3 +386,17 @@ When instantaneous jitter exceeds the EWMA × 1.3 (asymmetric: fast-up α=0.3, s

- 10 unit tests for tuner math (baseline, scaling, spike, cooldown, codec switch, Codec2 no-op)
- 4 integration tests (encoder adjustment, spike boost, Codec2 no-op, profile switch with encode verification)
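
The asymmetric EWMA spike detection exercised by these tests can be sketched as follows (std-only; the fast-up α=0.3 and ×1.3 spike threshold come from the text above, while the slow-down α of 0.05 is an illustrative assumption since its value is truncated in the hunk header):

```rust
// Minimal sketch of asymmetric EWMA jitter smoothing with spike detection.
struct JitterEwma {
    ewma_ms: f64,
}

impl JitterEwma {
    const ALPHA_UP: f64 = 0.3;    // fast-up, from the text
    const ALPHA_DOWN: f64 = 0.05; // slow-down, assumed value
    const SPIKE_FACTOR: f64 = 1.3;

    // Returns true when the instantaneous sample exceeds EWMA * 1.3.
    fn update(&mut self, sample_ms: f64) -> bool {
        // Spike test against the *previous* smoothed value.
        let spike = sample_ms > self.ewma_ms * Self::SPIKE_FACTOR;
        // Asymmetric smoothing: track rises quickly, decay slowly.
        let alpha = if sample_ms > self.ewma_ms {
            Self::ALPHA_UP
        } else {
            Self::ALPHA_DOWN
        };
        self.ewma_ms += alpha * (sample_ms - self.ewma_ms);
        spike
    }
}

fn main() {
    let mut j = JitterEwma { ewma_ms: 10.0 };
    assert!(!j.update(11.0)); // 11 < 10 * 1.3: no spike; EWMA rises to 10.3
    assert!(j.update(20.0));  // 20 > 10.3 * 1.3 ≈ 13.39: spike
}
```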
### Opus6k Frame Starvation Bug (Fixed 2026-04-13)

During testing of the extended 1040ms DRED window on Opus6k, the 40ms codec produced only ~11 frames/s instead of the expected 25, making audio choppy regardless of DRED quality.

**Root cause:** The Android capture ring read loop did partial reads that consumed samples from the ring but discarded them when retrying:

1. Ring has 960 samples (one Oboe burst)
2. `audio_read_capture(&mut buf[..1920])` reads 960 into `buf[0..960]`, returns 960
3. Loop sees 960 < 1920, sleeps, retries from `buf[0..]` → overwrites the consumed samples
4. ~50% of captured audio is thrown away per frame
**Fix:** Added `wzp_native_audio_capture_available()` to check ring fill level before reading (same pattern as the desktop CPAL path's `capture_ring.available()`). Also made `frame_samples` mutable so codec switches update the read size.
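
The availability-gated read can be illustrated with a toy ring (std-only sketch; `CaptureRing` and `read_frame` are hypothetical stand-ins for the native ring and the FFI calls named above):

```rust
use std::collections::VecDeque;

// Toy stand-in for the native capture ring.
struct CaptureRing {
    samples: VecDeque<i16>,
}

impl CaptureRing {
    // Mirrors the role of wzp_native_audio_capture_available().
    fn available(&self) -> usize {
        self.samples.len()
    }
    // Destructive read: consumes up to buf.len() samples, like the buggy path.
    fn read(&mut self, buf: &mut [i16]) -> usize {
        let n = buf.len().min(self.samples.len());
        for slot in buf[..n].iter_mut() {
            *slot = self.samples.pop_front().unwrap();
        }
        n
    }
}

// Fixed loop body: only issue the destructive read once a full frame is
// buffered, so a partial read can never consume-and-discard samples.
fn read_frame(ring: &mut CaptureRing, buf: &mut [i16], frame_samples: usize) -> bool {
    if ring.available() < frame_samples {
        return false; // caller sleeps and retries; nothing was consumed
    }
    ring.read(&mut buf[..frame_samples]) == frame_samples
}

fn main() {
    // One Oboe burst (960 samples) buffered, but a 40ms frame needs 1920.
    let mut ring = CaptureRing { samples: (0..960).map(|x| x as i16).collect() };
    let mut buf = vec![0i16; 1920];

    assert!(!read_frame(&mut ring, &mut buf, 1920)); // not enough yet
    assert_eq!(ring.available(), 960);               // nothing was discarded

    ring.samples.extend((0..960).map(|x| x as i16)); // second burst arrives
    assert!(read_frame(&mut ring, &mut buf, 1920));  // full 40ms frame read
}
```

The key property is that the destructive read is only issued once a full frame is buffered, so a retry can never overwrite samples that were already consumed.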
**Affected codecs:** Only 40ms frame codecs (Opus6k, Codec2_1200). 20ms codecs (Opus24k, etc.) were unaffected because a single Oboe burst fills the entire request.
@@ -290,3 +290,17 @@ Run with `wzp-bench --all`. Representative results (Apple M-series, single core)

- Logs initial state, poll count, and final state for HAL debugging
- Does NOT fail on timeout; the Rust-side stall detector remains as a safety net
- Targets Nothing Phone A059 intermittent silent calls on cold start

### Opus6k Frame Starvation Fix (2026-04-13)
- Root cause: partial reads from the capture ring consumed samples that were then discarded on retry
- `audio_read_capture(&mut buf[..1920])` with only 960 available → read 960, loop retried from `buf[0]`, overwriting them
- Added `wzp_native_audio_capture_available()`: checked before reading (matches the desktop pattern)
- `frame_samples` made mutable and updated on adaptive profile switch
- `buf` sized to the max frame (1920) with `[..frame_samples]` slices throughout
- Result: Opus6k frame rate restored from ~11/s to the expected 25/s
### Build Script Fixes (2026-04-13)
- Stale APK cleanup: delete all APKs before build, prefer `*release*.apk` on upload
- APK signing: added the zipalign + apksigner pipeline to `build.sh` (previously in `build-tauri-android.sh` only)
- Keystore persistence: `$BASE_DIR/data/keystore/` cache synced into the source tree before build
- Fixes: a 384MB debug APK uploaded instead of the 25MB release, and an unsigned APK on the alt server