docs: relay concurrency model, Opus6k fix, build script fixes

- ARCHITECTURE.md: new "Relay Concurrency Model" section documenting
  threading, shared state locking table, scaling characteristics, and
  the RoomManager Mutex as primary bottleneck
- PROGRESS.md: Opus6k frame starvation fix, build script fixes
- PRD-dred-integration.md: Opus6k frame starvation bug documentation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Siavash Sameni
2026-04-13 11:54:37 +04:00
parent 9ae9441de4
commit f265fd772d
3 changed files with 56 additions and 0 deletions


@@ -473,6 +473,34 @@ sequenceDiagram
    R->>R: Remove from room, broadcast RoomUpdate
```
## Relay Concurrency Model
### Threading
- Multi-threaded Tokio runtime (all available cores, work-stealing scheduler)
- Task-per-connection: each QUIC connection gets a dedicated `tokio::spawn`
- Task-per-participant-per-room: each participant's media forwarding loop is independent
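The task-per-connection shape can be sketched as follows. The relay uses `tokio::spawn` on a multi-threaded runtime; this minimal sketch substitutes `std::thread` and an mpsc channel for the QUIC connection so it stays self-contained, and `spawn_connection_task` is an illustrative name, not the real API.

```rust
use std::sync::mpsc;
use std::thread;

// Task-per-connection sketch: each connection gets its own independent loop.
// (The relay uses tokio::spawn on a work-stealing runtime; std::thread and a
// channel stand in for the async task and the QUIC packet stream here.)
fn spawn_connection_task(id: u32, rx: mpsc::Receiver<Vec<u8>>) -> thread::JoinHandle<usize> {
    thread::spawn(move || {
        let mut packets = 0usize;
        // Each connection's loop runs independently; shared state is only
        // touched when a packet needs the room fan-out list.
        for _pkt in rx {
            packets += 1;
        }
        let _ = id; // real code would tag logs/metrics with the connection id
        packets
    })
}
```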
### Shared State & Locking
| Lock | Protected Data | Hold Duration | Contention |
|------|---------------|---------------|------------|
| `RoomManager` (Mutex) | Rooms, participants, quality tiers | ~1ms/packet | O(N) per room |
| `PresenceRegistry` (Mutex) | Fingerprint registrations | ~1ms | Low (join/leave only) |
| `SessionManager` (Mutex) | Active session tracking | ~1ms | Low |
| `FederationManager.peer_links` (Mutex) | Peer connections | ~10ms during forward | Per-federation-packet |
### Scaling Characteristics
- **Many small rooms**: Scales well across all cores (rooms are independent)
- **Large single room (100+ participants)**: Serialized by RoomManager lock
- **Federation**: Per-peer tasks scale; `peer_links` lock held during send loop
### Primary Bottleneck
The RoomManager Mutex is acquired per-packet by every participant to obtain the fan-out peer list. The lock is released before any I/O (sends happen outside the lock), but packet processing within a room is still serialized through it.
Future optimization: per-room locks or lock-free participant lists via `DashMap`.
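The snapshot-then-release pattern described above can be sketched like this. `RoomManager` and the room/participant shapes here are simplified stand-ins (the real structs hold quality tiers and more), and `forward_packet` is an illustrative name; the point is that the guard is dropped before any send happens.

```rust
use std::sync::{Arc, Mutex};

// Simplified stand-ins for the real RoomManager state (illustrative only).
struct Room { participants: Vec<u64> }
struct RoomManager { rooms: Vec<Room> }

// Per-packet hot path: hold the lock only to snapshot the fan-out list,
// then do all sends outside the critical section.
fn forward_packet(mgr: &Arc<Mutex<RoomManager>>, room_idx: usize,
                  sender: u64, packet: &[u8]) -> Vec<u64> {
    // O(N) critical section: copy out the peer list, excluding the sender.
    let peers: Vec<u64> = {
        let guard = mgr.lock().unwrap();
        guard.rooms[room_idx]
            .participants
            .iter()
            .copied()
            .filter(|&p| p != sender)
            .collect()
    }; // guard dropped here — lock released before any I/O

    // Sends happen outside the lock; stubbed out in this sketch.
    for _peer in &peers {
        let _ = packet; // real code writes to each peer's QUIC stream
    }
    peers
}
```

A per-room lock (or a `DashMap` keyed by room) would shrink the critical section from "all rooms" to "this room", which is what the future-optimization note above is pointing at.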
## Client Architecture
### Desktop Engine (Tauri)


@@ -386,3 +386,17 @@ When instantaneous jitter exceeds the EWMA × 1.3 (asymmetric: fast-up α=0.3, s
- 10 unit tests for tuner math (baseline, scaling, spike, cooldown, codec switch, Codec2 no-op)
- 4 integration tests (encoder adjustment, spike boost, Codec2 no-op, profile switch with encode verification)
### Opus6k Frame Starvation Bug (Fixed 2026-04-13)
During testing of the extended 1040ms DRED window on Opus6k, the 40ms codec produced only ~11 frames/s instead of 25 — making audio choppy regardless of DRED quality.
**Root cause:** The Android capture ring read loop did partial reads that consumed samples from the ring but discarded them when retrying:
1. Ring has 960 samples (one Oboe burst)
2. `audio_read_capture(&mut buf[..1920])` reads 960 into `buf[0..960]`, returns 960
3. Loop sees 960 < 1920, sleeps, retries from `buf[0..]` → overwrites the consumed samples
4. ~50% of captured audio thrown away per frame
**Fix:** Added `wzp_native_audio_capture_available()` to check ring fill level before reading (same pattern as the desktop CPAL path's `capture_ring.available()`). Also made `frame_samples` mutable so codec switches update the read size.
**Affected codecs:** Only 40ms frame codecs (Opus6k, Codec2_1200). 20ms codecs (Opus24k, etc.) were unaffected because a single Oboe burst fills the entire request.


@@ -290,3 +290,17 @@ Run with `wzp-bench --all`. Representative results (Apple M-series, single core)
- Logs initial state, poll count, and final state for HAL debugging - Logs initial state, poll count, and final state for HAL debugging
- Does NOT fail on timeout — Rust-side stall detector remains as safety net - Does NOT fail on timeout — Rust-side stall detector remains as safety net
- Targets Nothing Phone A059 intermittent silent calls on cold start - Targets Nothing Phone A059 intermittent silent calls on cold start
### Opus6k Frame Starvation Fix (2026-04-13)
- Root cause: partial reads from capture ring consumed samples that were discarded on retry
- `audio_read_capture(&mut buf[..1920])` with only 960 available → read 960, loop retried from buf[0], overwriting
- Added `wzp_native_audio_capture_available()` — check before reading (matches desktop pattern)
- `frame_samples` made mutable and updated on adaptive profile switch
- `buf` sized to max frame (1920) with `[..frame_samples]` slices throughout
- Result: Opus6k frame rate restored from ~11/s to expected 25/s
### Build Script Fixes (2026-04-13)
- Stale APK cleanup: delete all APKs before build, prefer `*release*.apk` on upload
- APK signing: added zipalign + apksigner pipeline to `build.sh` (was in `build-tauri-android.sh` only)
- Keystore persistence: `$BASE_DIR/data/keystore/` cache synced into source tree before build
- Fixes: 384MB debug APK uploaded instead of 25MB release; unsigned APK on alt server