---
tags: [report, wzp]
type: report
status: Pending Review
---

# T3.1 — Confirm `RoomManager` concurrency (W13)

**Status:** Pending Review
**Agent:** Kimi Code CLI
**Started:** 2026-05-11T20:55Z
**Completed:** 2026-05-11T21:05Z
**Commit:** (see git log)
**PRD:** ../PRD-protocol-hardening.md

## What I changed

- `crates/wzp-relay/src/room.rs` — `RoomManager` concurrency refactor:
  - Changed `rooms: DashMap<String, Room>` → `rooms: DashMap<String, Arc<RwLock<Room>>>`.
  - Updated `RoomManager::others()` — now acquires `arc.read()` on the room-level RwLock after retrieving the Arc from DashMap. The DashMap shard guard is dropped before cloning senders.
  - Updated `RoomManager::observe_quality()` — now acquires `arc.write()` on the room-level RwLock instead of `DashMap::get_mut()`. Quality updates hold the shard lock only long enough to fetch the Arc, so they no longer stall fan-out for other rooms on the same shard; on the same room they still exclude readers for the duration of the write.
  - Updated `RoomManager::join()` / `leave()` — same pattern: brief DashMap access to get/insert the Arc, then room-level write lock for mutation.
  - Updated `room_size()`, `local_participant_list()`, `local_senders()`, `list()` — all use `arc.read()`.

- `docs/PROTOCOL-AUDIT.md` — Marked W13 as **RESOLVED** with a one-line explanation of the fix.

## Why these choices

The hot path is `others()`, called once per media packet per participant. Before this change, `others()` held the DashMap shard read lock while cloning all `ParticipantSender`s. With many participants, this clone is non-trivial and blocks concurrent `join()` / `leave()` / `observe_quality()` on the same shard.

By wrapping each `Room` in `Arc<std::sync::RwLock<Room>>`:
- `others()` → DashMap `get()` (brief) → `RwLock::read()` (while cloning senders)
- `observe_quality()` → DashMap `get()` (brief) → `RwLock::write()` (while updating qualities)
- Concurrent `others()` calls on the same room share the read lock.
- `observe_quality()` takes the write lock only on its own room: it briefly excludes readers and writers of that room, but no longer blocks other rooms that share its DashMap shard.

`std::sync::RwLock` is safe here because all critical sections are synchronous (no `.await` while a guard is held).
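
The two-level locking described above can be sketched roughly as follows. This is an illustration, not the code in `room.rs`: the `Room` and `ParticipantSender` fields are hypothetical stand-ins, and a `std::sync::RwLock<HashMap<..>>` replaces `DashMap` only so the sketch builds without external crates. What it tries to show is the ordering: fetch and clone the `Arc` under the outer lock, release the outer lock, then take the room-level lock for the actual work.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

// Hypothetical stand-in for the real sender type.
#[derive(Clone, Debug, PartialEq)]
pub struct ParticipantSender {
    pub id: u64,
}

// Hypothetical stand-in for the real room state.
#[derive(Default)]
pub struct Room {
    pub senders: Vec<ParticipantSender>,
    pub qualities: HashMap<u64, u8>,
}

#[derive(Default)]
pub struct RoomManager {
    // Assumption: a std RwLock<HashMap> stands in for DashMap here.
    rooms: RwLock<HashMap<String, Arc<RwLock<Room>>>>,
}

impl RoomManager {
    // Brief outer access: clone the Arc; the outer guard is a temporary
    // and is released as soon as this expression has been evaluated.
    fn room(&self, name: &str) -> Option<Arc<RwLock<Room>>> {
        self.rooms.read().unwrap().get(name).cloned()
    }

    pub fn join(&self, name: &str, id: u64) {
        // Outer write lock is held only for this one statement.
        let arc = self
            .rooms
            .write()
            .unwrap()
            .entry(name.to_string())
            .or_default()
            .clone();
        // Room-level write lock for the mutation itself.
        arc.write().unwrap().senders.push(ParticipantSender { id });
    }

    // Hot path: room-level read lock, shared by concurrent callers.
    pub fn others(&self, name: &str, me: u64) -> Vec<ParticipantSender> {
        let Some(arc) = self.room(name) else {
            return Vec::new();
        };
        let room = arc.read().unwrap();
        room.senders.iter().filter(|s| s.id != me).cloned().collect()
    }

    // Room-level write lock: excludes readers of this room only.
    pub fn observe_quality(&self, name: &str, id: u64, quality: u8) {
        if let Some(arc) = self.room(name) {
            arc.write().unwrap().qualities.insert(id, quality);
        }
    }
}
```

In this sketch, a caller that joins participants 1, 2 and 3 to one room and then calls `others()` for participant 2 gets back the senders for 1 and 3, and a lookup on an unknown room returns an empty list rather than creating the room.
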

## Deviations from the task spec

None in intent. The task offered two options (`RwLock<Vec<Participant>>` or `ArcSwap<Vec<Participant>>`); wrapping the whole `Room` in `Arc<RwLock<Room>>` covers the same hot path as the `RwLock` option and also moves `qualities` updates off the DashMap shard lock.

## Verification output

```bash
$ cargo test -p wzp-relay
running 86 tests
...(all 86 pass)...

test result: ok. 86 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.01s
```

```bash
$ cargo test -p wzp-relay --test federation
running 29 tests
...(all 29 pass)...

test result: ok. 29 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.12s
```

```bash
$ cargo test -p wzp-relay --test handshake_integration
running 5 tests
...(all 5 pass)...

test result: ok. 5 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.02s
```

## Test summary

- Tests added: 0
- Tests modified: 0
- `wzp-relay` test count: 86 (unchanged)
- Integration tests: 40+4 all pass
- `cargo clippy -p wzp-relay --lib`: pass (no new warnings)
- `cargo fmt --all -- --check`: pass

## Risks / follow-ups

- `std::sync::RwLock` becomes poisoned if a thread panics while holding the write guard; every later `read()` / `write()` then returns `Err(PoisonError)`, and any call site that `.unwrap()`s the result panics. In practice, the relay is a single async task per participant, and panics are caught by tokio, but a caught panic still leaves the lock poisoned. If poison tolerance is needed, switch to `parking_lot::RwLock` (no poisoning) in a future dependency addition.
- W13 was the last `Mutex`-based concern in the media hot path. The remaining contention points (ACL `std::sync::Mutex`, event broadcast channel) are on cold paths.
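
If adding `parking_lot` is not desirable, the standard library also allows a poisoned lock to be recovered in place. The sketch below is only an illustration of that option under assumed names (`read_len` and `poison_for_demo` are hypothetical helpers, and a `Vec<u32>` stands in for `Room`); it is not what the relay currently does. It uses `PoisonError::into_inner` to take the guard even after a writer has panicked.

```rust
use std::sync::{Arc, PoisonError, RwLock};
use std::thread;

// Hypothetical helper: read through the lock even if it is poisoned.
// `PoisonError::into_inner` hands back the guard the error was wrapping.
pub fn read_len(lock: &RwLock<Vec<u32>>) -> usize {
    lock.read().unwrap_or_else(PoisonError::into_inner).len()
}

// Hypothetical helper: poison the lock by panicking in another thread
// while the write guard is held. The panic is contained by `join()`.
pub fn poison_for_demo(lock: &Arc<RwLock<Vec<u32>>>) {
    let lock = Arc::clone(lock);
    let _ = thread::spawn(move || {
        let _guard = lock.write().unwrap();
        panic!("simulated panic while holding the write lock");
    })
    .join();
}
```

Recovering the guard this way is only reasonable when the protected data cannot be left half-updated by the panicking writer; whether that holds for `Room` would need to be checked against the real mutation paths.
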

## Reviewer checklist (filled in by reviewer)

- [ ] Code matches PRD intent
- [ ] Verification output is real
- [ ] No backward-incompat surprises
- [ ] Tests cover the new behavior
- [ ] Approved