3.7 KiB
T3.1 — Confirm RoomManager concurrency (W13)
Status: Pending Review Agent: Kimi Code CLI Started: 2026-05-11T20:55Z Completed: 2026-05-11T21:05Z Commit: (see git log) PRD: ../PRD-protocol-hardening.md
What I changed
-
crates/wzp-relay/src/room.rs—RoomManagerconcurrency refactor:- Changed
rooms: DashMap<String, Room>→rooms: DashMap<String, Arc<RwLock<Room>>>. - Updated
RoomManager::others()— now acquiresarc.read()on the room-level RwLock after retrieving the Arc from DashMap. The DashMap shard guard is dropped before cloning senders. - Updated
RoomManager::observe_quality()— now acquiresarc.write()on the room-level RwLock instead ofDashMap::get_mut(). Quality updates no longer contend with concurrent fan-out on the same room. - Updated
RoomManager::join()/leave()— same pattern: brief DashMap access to get/insert the Arc, then room-level write lock for mutation. - Updated
room_size(),local_participant_list(),local_senders(),list()— all usearc.read().
- Changed
-
docs/PROTOCOL-AUDIT.md— Marked W13 as RESOLVED with a one-line explanation of the fix.
Why these choices
The hot path is others(), called once per media packet per participant. Before this change, others() held the DashMap shard read lock while cloning all ParticipantSenders. With many participants, this clone is non-trivial and blocks concurrent join() / leave() / observe_quality() on the same shard.
By wrapping each Room in Arc<std::sync::RwLock<Room>>:
others()→ DashMapget()(brief) →RwLock::read()(while cloning senders)observe_quality()→ DashMapget()(brief) →RwLock::write()(while updating qualities)- Concurrent
others()calls on the same room share the read lock. observe_quality()only blocks writers, not other readers.
std::sync::RwLock is safe here because all critical sections are synchronous (no .await inside the lock).
Deviations from the task spec
None. The task offered two options (RwLock<Vec<Participant>> or ArcSwap<Vec<Participant>>); wrapping the whole Room in Arc<RwLock<Room>> is a superset that addresses the same hot path plus eliminates contention on qualities updates.
Verification output
$ cargo test -p wzp-relay
running 86 tests
...(all 86 pass)...
test result: ok. 86 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.01s
$ cargo test -p wzp-relay --test federation
running 29 tests
...(all 29 pass)...
test result: ok. 29 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.12s
$ cargo test -p wzp-relay --test handshake_integration
running 5 tests
...(all 5 pass)...
test result: ok. 5 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.02s
Test summary
- Tests added: 0
- Tests modified: 0
wzp-relaytest count: 86 (unchanged)- Integration tests: 40+4 all pass
cargo clippy -p wzp-relay --lib: pass (no new warnings)cargo fmt --all -- --check: pass
Risks / follow-ups
std::sync::RwLockcan panic if the lock is poisoned after a panicking thread. In practice, the relay is a single async task per participant, and panics are caught by tokio. If poison tolerance is needed, switch toparking_lot::RwLock(no poisoning) in a future dependency addition.- W13 was the last
Mutex-based concern in the media hot path. The remaining contention points (ACLstd::sync::Mutex, event broadcast channel) are on cold paths.
Reviewer checklist (filled in by reviewer)
- Code matches PRD intent
- Verification output is real
- No backward-incompat surprises
- Tests cover the new behavior
- Approved