---
tags: [report, wzp]
type: report
status: Pending Review
---

# T3.1: Confirm `RoomManager` concurrency (W13)

**Status:** Pending Review
**Agent:** Kimi Code CLI
**Started:** 2026-05-11T20:55Z
**Completed:** 2026-05-11T21:05Z
**Commit:** (see git log)
**PRD:** ../PRD-protocol-hardening.md

## What I changed

- `crates/wzp-relay/src/room.rs`: `RoomManager` concurrency refactor:
  - Changed the value type of the `rooms` `DashMap` from `Room` to `Arc<RwLock<Room>>`.
  - Updated `RoomManager::others()`: it now acquires `arc.read()` on the room-level `RwLock` after retrieving the `Arc` from the DashMap. The DashMap shard guard is dropped before cloning senders.
  - Updated `RoomManager::observe_quality()`: it now acquires `arc.write()` on the room-level `RwLock` instead of calling `DashMap::get_mut()`. Quality updates no longer take the shard write lock, so they no longer stall fan-out for other rooms on the same shard; within a room they still exclude readers for the duration of the update.
  - Updated `RoomManager::join()` / `leave()`: same pattern, with a brief DashMap access to get or insert the `Arc`, then the room-level write lock for the mutation.
  - Updated `room_size()`, `local_participant_list()`, `local_senders()`, `list()`: all use `arc.read()`.
- `docs/PROTOCOL-AUDIT.md`: marked W13 as **RESOLVED** with a one-line explanation of the fix.

## Why these choices

The hot path is `others()`, called once per media packet per participant. Before this change, `others()` held the DashMap shard read lock while cloning all `ParticipantSender`s. With many participants, this clone is non-trivial and blocks concurrent `join()` / `leave()` / `observe_quality()` on the same shard.

By wrapping each `Room` in `Arc<RwLock<Room>>`:

- `others()` → DashMap `get()` (brief) → `RwLock::read()` (while cloning senders)
- `observe_quality()` → DashMap `get()` (brief) → `RwLock::write()` (while updating qualities)
- Concurrent `others()` calls on the same room share the read lock.
- `observe_quality()` takes the room's write lock, so it briefly excludes readers and writers of that one room; it no longer takes a DashMap shard write lock, so other rooms on the same shard are unaffected.

`std::sync::RwLock` is safe here because all critical sections are synchronous (no `.await` inside the lock).

## Deviations from the task spec

None.
The task offered two options (`RwLock<…>` or `ArcSwap<…>`); wrapping the whole `Room` in `Arc<RwLock<Room>>` is a superset that addresses the same hot path and also moves `qualities` updates off the DashMap shard write lock.

## Verification output

```bash
$ cargo test -p wzp-relay
running 86 tests
...(all 86 pass)...
test result: ok. 86 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.01s
```

```bash
$ cargo test -p wzp-relay --test federation
running 29 tests
...(all 29 pass)...
test result: ok. 29 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.12s
```

```bash
$ cargo test -p wzp-relay --test handshake_integration
running 5 tests
...(all 5 pass)...
test result: ok. 5 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.02s
```

## Test summary

- Tests added: 0
- Tests modified: 0
- `wzp-relay` test count: 86 (unchanged)
- Integration tests: 29 (`federation`) + 5 (`handshake_integration`), all pass
- `cargo clippy -p wzp-relay --lib`: pass (no new warnings)
- `cargo fmt --all -- --check`: pass

## Risks / follow-ups

- `std::sync::RwLock` becomes poisoned if a thread panics while holding the write lock; later `read()` / `write()` calls then return `Err(PoisonError)`, which panics if unwrapped. In practice, the relay runs a single async task per participant, and panics are caught by tokio. If poison tolerance is needed, switch to `parking_lot::RwLock` (no poisoning) in a future dependency addition.
- W13 was the last `Mutex`-based concern in the media hot path. The remaining contention points (ACL `std::sync::Mutex`, event broadcast channel) are on cold paths.

## Reviewer checklist (filled in by reviewer)

- [ ] Code matches PRD intent
- [ ] Verification output is real
- [ ] No backward-incompat surprises
- [ ] Tests cover the new behavior
- [ ] Approved
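
## Appendix: sketch of the two-level locking pattern

For reviewers who want the locking order from "Why these choices" in concrete form, here is a minimal, std-only sketch. It is not the committed code: `RwLock<HashMap<..>>` stands in for the real `DashMap`, and the `ParticipantSender(String)` payload, the `u64` participant ids, the `u8` quality score, and the `quality()` helper are all assumptions made so the example compiles on its own.

```rust
use std::collections::{BTreeMap, HashMap};
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for the relay's per-participant sender handle.
#[derive(Clone, Debug, PartialEq)]
pub struct ParticipantSender(pub String);

/// Hypothetical room state: senders plus a per-participant quality score.
#[derive(Default)]
pub struct Room {
    senders: BTreeMap<u64, ParticipantSender>,
    qualities: BTreeMap<u64, u8>,
}

#[derive(Default)]
pub struct RoomManager {
    // Assumption: a plain RwLock<HashMap> replaces DashMap to stay std-only.
    rooms: RwLock<HashMap<String, Arc<RwLock<Room>>>>,
}

impl RoomManager {
    /// Brief outer access: clone the Arc; the outer guard is released on return.
    fn room(&self, name: &str) -> Option<Arc<RwLock<Room>>> {
        self.rooms.read().unwrap().get(name).cloned()
    }

    /// Get or insert the room's Arc, then mutate under the room-level write lock.
    pub fn join(&self, name: &str, id: u64, sender: ParticipantSender) {
        let arc = {
            let mut rooms = self.rooms.write().unwrap();
            rooms.entry(name.to_string()).or_default().clone()
        }; // outer guard dropped here, before the room lock is taken
        arc.write().unwrap().senders.insert(id, sender);
    }

    /// Hot path: clone every sender except `me` under the shared read lock.
    pub fn others(&self, name: &str, me: u64) -> Vec<ParticipantSender> {
        let Some(arc) = self.room(name) else {
            return Vec::new();
        };
        let room = arc.read().unwrap();
        let out: Vec<ParticipantSender> = room
            .senders
            .iter()
            .filter(|(id, _)| **id != me)
            .map(|(_, s)| s.clone())
            .collect();
        out
    }

    /// Quality update: exclusive lock on this one room only.
    pub fn observe_quality(&self, name: &str, id: u64, score: u8) {
        if let Some(arc) = self.room(name) {
            arc.write().unwrap().qualities.insert(id, score);
        }
    }

    /// Hypothetical read accessor, present only to make the sketch testable.
    pub fn quality(&self, name: &str, id: u64) -> Option<u8> {
        let arc = self.room(name)?;
        let q = arc.read().unwrap().qualities.get(&id).copied();
        q
    }
}
```

In this sketch, each method holds the outer map guard only long enough to clone or insert an `Arc`, so the per-room work (cloning senders, updating qualities) never runs while the outer lock is held. The `unwrap()` calls are where a poisoned lock would surface as a panic, which is the risk noted under "Risks / follow-ups".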