wz-phone/vault/Reports/T3.1-report.md
Siavash Sameni ed8a7ae5aa docs: protocol audit 2026-05-25, update architecture + Obsidian vault
Audit:
- docs/AUDIT-2026-05-25.md: full protocol audit covering 8 findings
  (4 critical, 2 high, 5 medium, 4 low) with code references and fix
  effort estimates
- vault/Audit/Tasks.md: Obsidian Tasks plugin file tracking all audit
  items with priorities, due dates, and per-step checklists

Architecture docs updated for Wire format v2 and Wave 5/6 features:
- ARCHITECTURE.md: adds wzp-video to dependency graph and project
  structure; wire format updated to v2 (16B header, 5B MiniHeader);
  relay concurrency section corrected (DashMap+RwLock is current, not
  a future optimization); test count 571→702; Android note
- PROGRESS.md: Wave 5 and Wave 6 sections appended; test count 372→702;
  current status and open blockers as of 2026-05-25
- ROAD-TO-VIDEO.md: implementation status table inserted (/🟡/🔴/🔲
  per phase); 6-step critical path to first video call
- WZP-SPEC.md: MediaHeader updated to v2 (16B byte-aligned); MiniHeader
  updated to 5B with seq_delta; codec IDs 9-12 added (H.264/H.265/AV1);
  version negotiation section added

Obsidian vault (vault/):
- 114 files across Architecture/, PRDs/, Reports/, Android/,
  Reference/, Audit/ with YAML frontmatter
- 00 - Home.md index note with wiki links
- .obsidian/app.json config

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-25 06:00:17 +04:00


tags: report, wzp
type: report
status: Pending Review

T3.1 — Confirm RoomManager concurrency (W13)

Status: Pending Review
Agent: Kimi Code CLI
Started: 2026-05-11T20:55Z
Completed: 2026-05-11T21:05Z
Commit: (see git log)
PRD: ../PRD-protocol-hardening.md

What I changed

  • crates/wzp-relay/src/room.rs (RoomManager concurrency refactor):

    • Changed rooms: DashMap<String, Room> → rooms: DashMap<String, Arc<RwLock<Room>>>.
    • Updated RoomManager::others() — now acquires arc.read() on the room-level RwLock after retrieving the Arc from DashMap. The DashMap shard guard is dropped before cloning senders.
    • Updated RoomManager::observe_quality(): now acquires arc.write() on the room-level RwLock instead of DashMap::get_mut(). Quality updates no longer hold the DashMap shard lock, so they contend only with operations on the same room, not with other rooms in the same shard.
    • Updated RoomManager::join() / leave() — same pattern: brief DashMap access to get/insert the Arc, then room-level write lock for mutation.
    • Updated room_size(), local_participant_list(), local_senders(), list() — all use arc.read().
  • docs/PROTOCOL-AUDIT.md — Marked W13 as RESOLVED with a one-line explanation of the fix.
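
The access pattern listed above can be sketched with the standard library alone. This is a minimal illustration, not the code in room.rs: Mutex<HashMap> stands in for DashMap so the sketch has no external dependencies, and the Room fields, ParticipantId, ParticipantSender, and the room() helper are hypothetical names chosen for the example.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, RwLock};

// Hypothetical stand-ins for the real types in crates/wzp-relay/src/room.rs.
type ParticipantId = u64;

#[derive(Clone, Debug, PartialEq)]
struct ParticipantSender(String); // placeholder for the real sender handle

#[derive(Default)]
struct Room {
    senders: HashMap<ParticipantId, ParticipantSender>,
    qualities: HashMap<ParticipantId, u8>,
}

// ASSUMPTION: Mutex<HashMap> replaces DashMap to keep this std-only.
// The report's code uses DashMap<String, Arc<RwLock<Room>>>.
#[derive(Default)]
struct RoomManager {
    rooms: Mutex<HashMap<String, Arc<RwLock<Room>>>>,
}

impl RoomManager {
    /// Brief map access: clone the Arc, then release the map lock on return.
    fn room(&self, name: &str) -> Option<Arc<RwLock<Room>>> {
        self.rooms.lock().unwrap().get(name).cloned()
    }

    /// Get or insert the room's Arc, then mutate under the room-level write lock.
    fn join(&self, name: &str, id: ParticipantId, sender: ParticipantSender) {
        let arc = {
            let mut rooms = self.rooms.lock().unwrap();
            rooms.entry(name.to_string()).or_default().clone()
        }; // map lock released here, before the room lock is taken
        arc.write().unwrap().senders.insert(id, sender);
    }

    /// Hot path: senders are cloned under the room-level read lock only.
    fn others(&self, name: &str, me: ParticipantId) -> Vec<ParticipantSender> {
        let Some(arc) = self.room(name) else {
            return Vec::new();
        };
        let room = arc.read().unwrap();
        room.senders
            .iter()
            .filter(|(id, _)| **id != me)
            .map(|(_, sender)| sender.clone())
            .collect()
    }

    /// Quality updates take the room-level write lock, not the map lock.
    fn observe_quality(&self, name: &str, id: ParticipantId, quality: u8) {
        if let Some(arc) = self.room(name) {
            arc.write().unwrap().qualities.insert(id, quality);
        }
    }
}
```

In every method the map-level lock is released before the room-level lock is acquired, so a slow clone in others() delays only work on that one room.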

Why these choices

The hot path is others(), called once per media packet per participant. Before this change, others() held the DashMap shard read lock while cloning all ParticipantSenders. With many participants, this clone is non-trivial and blocks concurrent join() / leave() / observe_quality() on the same shard.

By wrapping each Room in Arc<std::sync::RwLock<Room>>:

  • others() → DashMap get() (brief) → RwLock::read() (while cloning senders)
  • observe_quality() → DashMap get() (brief) → RwLock::write() (while updating qualities)
  • Concurrent others() calls on the same room share the read lock.
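  • observe_quality() takes the room's write lock, so it briefly blocks readers and writers of that room only; other rooms in the same DashMap shard are unaffected.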
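
The reader/writer interactions on a single room's lock can be checked with std::sync::RwLock alone. The sketch below is illustrative only: the function name lock_interactions and the Vec payload are hypothetical, and try_read()/try_write() are used so the checks never block.

```rust
use std::sync::{RwLock, TryLockError};

/// Returns (second_reader_ok, writer_blocked_by_reader, reader_blocked_by_writer).
fn lock_interactions() -> (bool, bool, bool) {
    // Hypothetical payload standing in for a Room.
    let room = RwLock::new(vec!["alice", "bob"]);

    // Two fan-out readers on the same room can hold the lock together.
    let first = room.read().unwrap();
    let second_reader_ok = room.try_read().is_ok();

    // A writer (e.g. a quality update) cannot proceed while a reader holds the lock.
    let writer_blocked_by_reader =
        matches!(room.try_write(), Err(TryLockError::WouldBlock));
    drop(first);

    // While a writer holds the lock, readers of this room have to wait.
    let writer = room.write().unwrap();
    let reader_blocked_by_writer =
        matches!(room.try_read(), Err(TryLockError::WouldBlock));
    drop(writer);

    (second_reader_ok, writer_blocked_by_reader, reader_blocked_by_writer)
}
```

Readers share the lock, while a writer excludes both readers and other writers for as long as it holds the guard, which is why the write-side critical sections should stay short.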

std::sync::RwLock is safe here because all critical sections are synchronous (no .await inside the lock).

Deviations from the task spec

None. The task offered two options (RwLock<Vec<Participant>> or ArcSwap<Vec<Participant>>); wrapping the whole Room in Arc<RwLock<Room>> covers the same hot path and also moves qualities updates off the DashMap shard lock.

Verification output

$ cargo test -p wzp-relay
running 86 tests
...(all 86 pass)...

test result: ok. 86 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.01s
$ cargo test -p wzp-relay --test federation
running 29 tests
...(all 29 pass)...

test result: ok. 29 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.12s
$ cargo test -p wzp-relay --test handshake_integration
running 5 tests
...(all 5 pass)...

test result: ok. 5 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.02s

Test summary

  • Tests added: 0
  • Tests modified: 0
  • wzp-relay test count: 86 (unchanged)
  • Integration tests: 29 (federation) + 5 (handshake_integration), all pass
  • cargo clippy -p wzp-relay --lib: pass (no new warnings)
  • cargo fmt --all -- --check: pass

Risks / follow-ups

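  • std::sync::RwLock becomes poisoned if a thread panics while holding the write guard; later read() / write() calls then return Err, and unwrapping that result panics. In practice, the relay runs a single async task per participant, and panics are caught by tokio at the task boundary, but that does not clear the poison. If poison tolerance is needed, switch to parking_lot::RwLock (no poisoning), which would add a new dependency.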
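
If adding parking_lot is not desirable, poisoning can also be tolerated with std alone by recovering the guard from the PoisonError. The following is a minimal sketch under that assumption, not the relay's code; the function name poisoned_lock_demo and the u32 payload are hypothetical.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

/// Returns (lock_was_poisoned, value_read_after_recovery).
fn poisoned_lock_demo() -> (bool, u32) {
    let quality = Arc::new(RwLock::new(7u32));

    // A writer panics while holding the guard, which poisons the lock.
    let writer = Arc::clone(&quality);
    let _ = thread::spawn(move || {
        let mut guard = writer.write().unwrap();
        *guard = 42;
        panic!("simulated panic while holding the write guard");
    })
    .join(); // join() returns Err for the panicked thread; ignored here

    let is_poisoned = quality.is_poisoned();

    // Poison-tolerant read: take the guard out of the PoisonError
    // instead of calling unwrap(), which would panic.
    let value = *quality.read().unwrap_or_else(|e| e.into_inner());
    (is_poisoned, value)
}
```

Recovering the guard this way accepts whatever state the panicking writer left behind, so it is only appropriate when a partially applied update is harmless.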
  • W13 was the last Mutex-based concern in the media hot path. The remaining contention points (ACL std::sync::Mutex, event broadcast channel) are on cold paths.

Reviewer checklist (filled in by reviewer)

  • Code matches PRD intent
  • Verification output is real
  • No backward-incompat surprises
  • Tests cover the new behavior
  • Approved