docs: protocol audit 2026-05-25, update architecture + Obsidian vault

Audit:
- docs/AUDIT-2026-05-25.md: full protocol audit covering 8 findings
  (4 critical, 2 high, 5 medium, 4 low) with code references and fix
  effort estimates
- vault/Audit/Tasks.md: Obsidian Tasks plugin file tracking all audit
  items with priorities, due dates, and per-step checklists

Architecture docs updated for Wire format v2 and Wave 5/6 features:
- ARCHITECTURE.md: adds wzp-video to dependency graph and project
  structure; wire format updated to v2 (16B header, 5B MiniHeader);
  relay concurrency section corrected (DashMap+RwLock is current, not
  a future optimization); test count 571→702; Android note
- PROGRESS.md: Wave 5 and Wave 6 sections appended; test count 372→702;
  current status and open blockers as of 2026-05-25
- ROAD-TO-VIDEO.md: implementation status table inserted (/🟡/🔴/🔲
  per phase); 6-step critical path to first video call
- WZP-SPEC.md: MediaHeader updated to v2 (16B byte-aligned); MiniHeader
  updated to 5B with seq_delta; codec IDs 9-12 added (H.264/H.265/AV1);
  version negotiation section added

Obsidian vault (vault/):
- 114 files across Architecture/, PRDs/, Reports/, Android/,
  Reference/, Audit/ with YAML frontmatter
- 00 - Home.md index note with wiki links
- .obsidian/app.json config

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
Siavash Sameni
2026-05-25 06:00:17 +04:00
parent 12b0d9738f
commit ed8a7ae5aa
120 changed files with 22781 additions and 65 deletions

View File

@@ -59,6 +59,7 @@ graph TD
FEC["wzp-fec<br/>RaptorQ FEC"]
CRYPTO["wzp-crypto<br/>ChaCha20 + Identity"]
TRANSPORT["wzp-transport<br/>QUIC / Quinn"]
VIDEO["wzp-video<br/>H.264 + H.265 + AV1"]
RELAY["wzp-relay<br/>Relay Daemon"]
CLIENT["wzp-client<br/>CLI + Call Engine"]
@@ -68,16 +69,19 @@ graph TD
PROTO --> FEC
PROTO --> CRYPTO
PROTO --> TRANSPORT
PROTO --> VIDEO
CODEC --> CLIENT
FEC --> CLIENT
CRYPTO --> CLIENT
TRANSPORT --> CLIENT
VIDEO --> CLIENT
CODEC --> RELAY
FEC --> RELAY
CRYPTO --> RELAY
TRANSPORT --> RELAY
VIDEO --> RELAY
CLIENT --> WEB
TRANSPORT --> WEB
@@ -90,9 +94,10 @@ graph TD
style CLIENT fill:#00b894,color:#fff
style WEB fill:#0984e3,color:#fff
style FC fill:#fd79a8,color:#fff
style VIDEO fill:#a29bfe,color:#fff
```
**Star pattern**: Each leaf crate (`wzp-codec`, `wzp-fec`, `wzp-crypto`, `wzp-transport`) depends only on `wzp-proto`. No leaf depends on another leaf. Integration crates (`wzp-relay`, `wzp-client`, `wzp-web`) depend on all leaves.
**Star pattern**: Each leaf crate (`wzp-codec`, `wzp-fec`, `wzp-crypto`, `wzp-transport`, `wzp-video`) depends only on `wzp-proto`. No leaf depends on another leaf. Integration crates (`wzp-relay`, `wzp-client`, `wzp-web`) depend on all leaves.
## Audio Encode Pipeline
@@ -106,7 +111,7 @@ sequenceDiagram
participant DT as DredTuner<br/>(wzp-proto)
participant FEC as RaptorQ FEC
participant INT as Interleaver<br/>(depth=3)
participant HDR as MediaHeader<br/>(12B or Mini 4B)
participant HDR as MediaHeader<br/>(16B or Mini 5B)
participant Enc as ChaCha20-Poly1305
participant QUIC as QUIC Datagram
participant QPS as QuinnPathSnapshot
@@ -144,7 +149,7 @@ sequenceDiagram
- RNNoise processes **2 x 480** samples (ML-based noise suppression via nnnoiseless)
- Silence detection uses VAD + 100ms hangover before switching to ComfortNoise
- FEC symbols are padded to **256 bytes** with a 2-byte LE length prefix
- MiniHeaders (4 bytes) replace full headers (12 bytes) for 49 of every 50 frames
- MiniHeaders (5 bytes) replace full headers (16 bytes) for 49 of every 50 audio frames; video always uses full headers
- DRED tuner polls quinn path stats every 25 frames (~500ms) and adjusts DRED lookback duration continuously
- Opus tiers bypass RaptorQ entirely -- DRED handles loss recovery at the codec layer
- Opus6k DRED window: 1040ms (maximum libopus allows)
@@ -324,35 +329,29 @@ sequenceDiagram
## Wire Formats
### MediaHeader (12 bytes)
### `MediaHeader` v2 (16 bytes, byte-aligned)
```
Byte 0: [V:1][T:1][CodecID:4][Q:1][FecRatioHi:1]
Byte 1: [FecRatioLo:6][unused:2]
Bytes 2-3: sequence (u16 BE)
Bytes 4-7: timestamp_ms (u32 BE)
Byte 8: fec_block_id (u8)
Byte 9: fec_symbol_idx (u8)
Byte 10: reserved
Byte 11: csrc_count
Byte 0: version (u8) 0x02
Byte 1: flags (u8) [T:1][Q:1][KeyFrame:1][FrameEnd:1][reserved:4]
T = FEC repair, Q = QualityReport trailer
KeyFrame = packet belongs to an I-frame (video)
FrameEnd = last packet of an access unit (video)
Byte 2: media_type (u8) 0=audio, 1=video, 2=data, 3=control
Byte 3: codec_id (u8) widened from 4-bit (room for 256 codec IDs)
Byte 4: stream_id (u8) simulcast layer; 0=base
Byte 5: fec_ratio (u8) 0..200 → 0.0..2.0
Bytes 6-9: sequence (u32 BE) wrapping packet sequence number
Bytes 10-13: timestamp_ms (u32 BE) milliseconds since session start
Bytes 14-15: fec_block_id (u16 BE)
audio: low 8 bits = block_id, high 8 bits = symbol_idx
video: full u16 block_id (large blocks for I-frames)
```
| Field | Bits | Description |
|-------|------|-------------|
| V (version) | 1 | Protocol version (0 = v1) |
| T (is_repair) | 1 | 1 = FEC repair packet, 0 = source media |
| CodecID | 4 | Codec identifier (0-8, see table below) |
| Q | 1 | 1 = QualityReport trailer appended |
| FecRatio | 7 | FEC ratio encoded as 0-127 mapping to 0.0-2.0 |
| sequence | 16 | Wrapping packet sequence number |
| timestamp_ms | 32 | Milliseconds since session start |
| fec_block_id | 8 | FEC source block ID (wrapping) |
| fec_symbol_idx | 8 | Symbol index within FEC block |
| reserved | 8 | Reserved flags |
| csrc_count | 8 | Contributing source count (future mixing) |
#### CodecID Values
**Audio codecs (media_type = 0)**
| Value | Codec | Bitrate | Sample Rate | Frame Duration |
|-------|-------|---------|-------------|---------------|
| 0 | Opus 24k | 24 kbps | 48 kHz | 20ms |
@@ -365,15 +364,25 @@ Byte 11: csrc_count
| 7 | Opus 48k | 48 kbps | 48 kHz | 20ms |
| 8 | Opus 64k | 64 kbps | 48 kHz | 20ms |
### MiniHeader (4 bytes, compressed)
**Video codecs (media_type = 1)**
| Value | Codec | Notes |
|-------|-------|-------|
| 9 | H.264 Baseline | Universal HW encode coverage |
| 10 | H.264 Main | Slight quality win over baseline |
| 11 | H.265 Main | Apple A10+, Snapdragon ~2017, NVENC GTX 9xx+; ~30% better than H.264 |
| 12 | AV1 Main | Apple M3/A17+, Snapdragon 8 Gen 3+, RTX 40+; best efficiency, narrow HW |
### `MiniHeader` v2 (5 bytes)
```
[FRAME_TYPE_MINI: 0x01]
Bytes 0-1: timestamp_delta_ms (u16 BE)
Bytes 2-3: payload_len (u16 BE)
[FRAME_TYPE_MINI = 0x01]
Byte 0: seq_delta (u8) delta from last full header's seq
Bytes 1-2: timestamp_delta_ms (u16 BE)
Bytes 3-4: payload_len (u16 BE)
```
Used for 49 of every 50 frames (~1s cycle). Saves 8 bytes per packet (67% header reduction). Full header is sent every 50th frame to resynchronize state.
Used for audio only (49 of every 50 frames). Saves 11 bytes per audio packet vs the full 16B header. Full header is sent every 50th frame to resynchronize state. Video always uses full 16B headers.
### TrunkFrame (batched datagrams)
@@ -482,9 +491,12 @@ sequenceDiagram
### Shared State & Locking
The `RoomManager` stores `DashMap<String, Arc<RwLock<Room>>>`. The DashMap guard is held only long enough to clone the `Arc`; all per-room operations then acquire the room-level `RwLock`. Concurrent fan-out calls share a read lock; join/leave acquire write lock.
| Lock | Protected Data | Hold Duration | Contention |
|------|---------------|---------------|------------|
| `RoomManager` (Mutex) | Rooms, participants, quality tiers | ~1ms/packet | O(N) per room |
| `DashMap<room_id, Arc<RwLock<Room>>>` | Room registry | Instant (clone Arc only) | Near-zero |
| `Room` (RwLock) | Participants, quality tiers | ~1ms/packet (read); ~1ms (write on join/leave) | Low (concurrent reads) |
| `PresenceRegistry` (Mutex) | Fingerprint registrations | ~1ms | Low (join/leave only) |
| `SessionManager` (Mutex) | Active session tracking | ~1ms | Low |
| `FederationManager.peer_links` (Mutex) | Peer connections | ~10ms during forward | Per-federation-packet |
@@ -492,15 +504,9 @@ sequenceDiagram
### Scaling Characteristics
- **Many small rooms**: Scales well across all cores (rooms are independent)
- **Large single room (100+ participants)**: Serialized by RoomManager lock
- **Large single room (100+ participants)**: Fan-out reads share RwLock (non-blocking); only join/leave serializes
- **Federation**: Per-peer tasks scale; `peer_links` lock held during send loop
### Primary Bottleneck
The RoomManager Mutex is acquired per-packet by every participant to get the fan-out peer list. Lock is released before I/O (sends happen outside lock), but packet processing is serialized through the lock within a room.
Future optimization: per-room locks or lock-free participant lists via `DashMap`.
## Client Architecture
### Desktop Engine (Tauri)
@@ -553,6 +559,8 @@ Key design decisions:
### Android Engine (Kotlin + JNI)
> **Note (2026-05-12):** The Kotlin+JNI Android app (`android/app/`) described below is superseded by the **Tauri 2.x mobile build** (`desktop/src-tauri/` + `crates/wzp-native/`). The Tauri approach uses the same Rust call engine as desktop, with Oboe audio via `wzp-native` cdylib. The Kotlin codebase is maintained for reference but the Tauri build is the live production app.
```mermaid
graph TB
subgraph "Compose UI"
@@ -902,6 +910,20 @@ warzonePhone/
│ │ └── rekey.rs # Forward secrecy rekeying
│ ├── wzp-transport/ # QUIC transport layer
│ │ └── src/lib.rs # QuinnTransport, send/recv media/signal/trunk
│ ├── wzp-video/ # Video codecs + framer
│ │ └── src/
│ │ ├── factory.rs # VideoEncoder factory (platform dispatch)
│ │ ├── framer.rs # NAL fragmentation (H.264/H.265)
│ │ ├── depacketizer.rs # NAL reassembly, access unit emit
│ │ ├── controller.rs # VideoQualityController
│ │ ├── simulcast.rs # Simulcast layer management
│ │ ├── encoder_mode.rs # Encoder mode selection
│ │ ├── av1_obu.rs # AV1 OBU framing + depacketizer
│ │ ├── dav1d.rs # dav1d AV1 software decoder
│ │ ├── svt_av1.rs # SVT-AV1 software encoder (non-Android)
│ │ ├── videotoolbox.rs # VideoToolbox H.265 + AV1 (macOS)
│ │ ├── mediacodec.rs # MediaCodec H.264/H.265/AV1 (Android, NDK 0.9 migration pending)
│ │ └── nack.rs # NACK sender/receiver framework
│ ├── wzp-relay/ # Relay daemon
│ │ └── src/
│ │ ├── main.rs # CLI, connection loop, auth + handshake
@@ -917,6 +939,10 @@ warzonePhone/
│ │ ├── presence.rs # PresenceRegistry
│ │ ├── route.rs # RouteResolver
│ │ ├── trunk.rs # TrunkBatcher
│ │ ├── audio_scorer.rs # Per-stream audio quality scoring
│ │ ├── response_policy.rs # Relay response policy (rate-limit, drop)
│ │ ├── verdict.rs # Verdict enum (Allow/RateLimit/Drop/Malicious)
│ │ ├── video_scorer.rs # VideoScorer (legitimacy scoring, keyframe regularity)
│ │ └── ws.rs # WebSocket handler for browser clients
│ ├── wzp-client/ # Call engine + CLI
│ │ └── src/
@@ -956,7 +982,7 @@ warzonePhone/
## Test Coverage
571 tests across all crates, 0 failures:
702 tests across all crates (excluding wzp-android), 0 failures:
| Crate | Tests | Key Coverage |
|-------|-------|-------------|
@@ -965,7 +991,8 @@ warzonePhone/
| wzp-fec | 21 | RaptorQ encode/decode, loss recovery, interleaving |
| wzp-crypto | 64 | Encrypt/decrypt, handshake, anti-replay, featherChat identity |
| wzp-transport | 11 | QUIC connection setup, path monitoring |
| wzp-relay | 122 | Room ACL, session mgmt, metrics, probes, mesh, trunking |
| wzp-relay | 137 | Room ACL, session mgmt, metrics, probes, mesh, trunking, scoring, verdict |
| wzp-video | 88 | NAL framing, AV1 OBU, simulcast, quality controller, NACK |
| wzp-client | 170 | Encoder/decoder, quality adapter, silence, drift, sweep |
| wzp-web | 2 | Metrics |
| wzp-native | 0 | Native platform bindings (no unit tests) |