docs: protocol audit 2026-05-25, update architecture + Obsidian vault

Audit:
- docs/AUDIT-2026-05-25.md: full protocol audit covering 8 findings (4 critical, 2 high, 5 medium, 4 low) with code references and fix effort estimates
- vault/Audit/Tasks.md: Obsidian Tasks plugin file tracking all audit items with priorities, due dates, and per-step checklists

Architecture docs updated for Wire format v2 and Wave 5/6 features:
- ARCHITECTURE.md: adds wzp-video to dependency graph and project structure; wire format updated to v2 (16B header, 5B MiniHeader); relay concurrency section corrected (DashMap+RwLock is current, not a future optimization); test count 571→702; Android note
- PROGRESS.md: Wave 5 and Wave 6 sections appended; test count 372→702; current status and open blockers as of 2026-05-25
- ROAD-TO-VIDEO.md: implementation status table inserted (✅/🟡/🔴/🔲 per phase); 6-step critical path to first video call
- WZP-SPEC.md: MediaHeader updated to v2 (16B byte-aligned); MiniHeader updated to 5B with seq_delta; codec IDs 9-12 added (H.264/H.265/AV1); version negotiation section added

Obsidian vault (vault/):
- 114 files across Architecture/, PRDs/, Reports/, Android/, Reference/, Audit/ with YAML frontmatter
- 00 - Home.md index note with wiki links
- .obsidian/app.json config

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1245
vault/Architecture/Architecture.md
Normal file
File diff suppressed because it is too large
233
vault/Architecture/Attack-Surface-Relay-Abuse.md
Normal file
@@ -0,0 +1,233 @@
---
tags: [architecture, wzp]
type: architecture
---

# Relay Abuse: Attack Surface & Mitigations

> WZP is end-to-end encrypted. The relay forwards ciphertext and cannot inspect payload content. This document enumerates the abuse vectors that survive E2E and the mitigations available without breaking it.
>
> Motivating threat: a PoC on another project (LiveKit) showed that an E2E SFU with no conformance enforcement can be repurposed as a free arbitrary-data tunnel. WZP must not be that.

## Threat model

### In scope

- **Bulk data tunneling.** An attacker completes a legitimate handshake, then pushes arbitrary bytes (file transfer, piracy, scraped traffic) through media datagrams.
- **Bandwidth parasitism.** An attacker uses the relay as a cheap forwarder for unrelated traffic at scale.
- **Quota / billing evasion.** An attacker disguises high-bandwidth use as low-bandwidth audio.
- **DoS via amplification.** An attacker sends one packet and the SFU fans it out to N peers, multiplying egress cost N×.

### Out of scope (cannot be solved without breaking E2E)

- **Steganography inside real audio.** Modulating Opus-encoded waveforms to carry a covert channel. Capacity is information-theoretically limited to roughly tens to hundreds of bps, which makes it economically uninteresting.
- **Modem-over-call.** Real audio whose semantic content is data. Same limit.
- **Slow exfiltration under all rate caps.** An attacker who stays within audio's natural bandwidth envelope indefinitely.

### Threat actor profile

We are defending against **economically motivated abuse at scale**, not against a determined nation-state covert channel. The former needs bandwidth and is loud; the latter is impossible to stop and not worth the engineering cost.

## What the relay can observe

Despite E2E, the relay sees a lot. None of the following is hidden from it:

| Observable | Source | Bits available |
|---|---|---|
| `CodecID` (declared codec) | `MediaHeader`, AAD | 4 (today) / 6 (v2) |
| `MediaType` (audio / video / data / control) | `MediaHeader` v2 | 2 |
| `sequence`, `timestamp_ms` | `MediaHeader` | 32 + 32 |
| `fec_block_id`, `fec_symbol_idx`, `FecRatio`, `T` (repair) | `MediaHeader` | varies |
| `KeyFrame` bit | `MediaHeader` v2 | 1 |
| `Q` flag (QualityReport trailer present) | `MediaHeader` | 1 |
| Packet size | QUIC layer | n/a |
| Packet inter-arrival timing | QUIC layer | n/a |
| Aggregate bytes/sec per session | RelayMetrics | n/a |
| Source fingerprint, src IP | Session state | n/a |

This is enough surface for strong conformance enforcement without ever touching encrypted payload.

## Mitigation tiers

Tiers are listed in order of cost-to-implement versus decisiveness. Tier A alone kills the gross-abuse threat; higher tiers add defense in depth.

### Tier A: Codec-conformance bitrate caps

For each declared `CodecID`, the wire bitrate has a hard ceiling that can be derived mathematically:

```
ceiling_bps[CodecID] = nominal_bitrate * (1 + max_FEC_ratio) * (1 + overhead_pct)
                     = nominal * 3.0 * 1.15    // FEC max 2.0 → factor 3.0
                     = nominal * 3.45
```

| Codec | Nominal | Hard ceiling |
|---|---|---|
| Opus 64k | 64 kbps | ~221 kbps |
| Opus 24k | 24 kbps | ~83 kbps |
| Opus 6k | 6 kbps | ~21 kbps |
| Codec2 1200 | 1.2 kbps | ~4 kbps |
| ComfortNoise | 0 | ~2 kbps |

Bitrate is measured over a sliding 1 s window per session. Sustained excess is a hard violation and closes the session.

This is decisive against bulk tunneling. The false-positive rate is negligible if the enforced limit is set at 1.5× the math-derived ceiling.
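
A minimal Rust sketch of the Tier A check follows. It assumes the constants above: max FEC ratio 2.0, 15 % overhead, and enforcement at 1.5× the ceiling. `ceiling_bps` and `BitrateGuard` are hypothetical names, not taken from `wzp-relay`. The guard is fed each packet's wire size and keeps a sliding 1 s byte sum. The `floor_bps` parameter is an assumption to cover ComfortNoise, whose nominal rate is 0.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Math-derived wire ceiling: nominal × (1 + max FEC ratio) × (1 + overhead).
fn ceiling_bps(nominal_bps: f64) -> f64 {
    const MAX_FEC_RATIO: f64 = 2.0;
    const OVERHEAD_PCT: f64 = 0.15;
    nominal_bps * (1.0 + MAX_FEC_RATIO) * (1.0 + OVERHEAD_PCT)
}

/// Hypothetical per-session guard summing wire bytes over a sliding 1 s window.
struct BitrateGuard {
    limit_bps: f64,
    window: Duration,
    samples: VecDeque<(Instant, usize)>,
    bytes_in_window: usize,
}

impl BitrateGuard {
    fn new(nominal_bps: f64, floor_bps: f64) -> Self {
        // Enforce at 1.5× the ceiling; the floor keeps zero-nominal codecs usable.
        let limit_bps = (ceiling_bps(nominal_bps) * 1.5).max(floor_bps);
        Self { limit_bps, window: Duration::from_secs(1), samples: VecDeque::new(), bytes_in_window: 0 }
    }

    /// Records one packet; returns true if the current window exceeds the limit.
    fn observe(&mut self, now: Instant, wire_len: usize) -> bool {
        self.samples.push_back((now, wire_len));
        self.bytes_in_window += wire_len;
        // Evict packets that have fallen out of the 1 s window.
        while let Some(&(t, len)) = self.samples.front() {
            if now.duration_since(t) > self.window {
                self.samples.pop_front();
                self.bytes_in_window -= len;
            } else {
                break;
            }
        }
        (self.bytes_in_window * 8) as f64 > self.limit_bps * self.window.as_secs_f64()
    }
}
```

With these numbers, a stream declared as Opus 24k but carrying 1200-byte packets at 50 pps crosses the ~124 kbps enforced limit on its 13th packet.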

### Tier B: Packet-rate conformance

Each codec has a fixed frame interval (20 ms or 40 ms), so the legal source packet rate is 50 or 25 pps respectively, plus FEC repair packets (at most ~150 pps total at FEC ratio 2.0). Anything sustaining more than 200 pps for an audio codec is not audio.
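
The Tier B arithmetic can be sketched together with a hypothetical `PacketRateGuard`. The guard counts packets in 1 s buckets and reports a violation only after several consecutive buckets exceed the cap. The sustain period is an assumption, since the text only says "sustaining". The function and type names are illustrative.

```rust
/// Upper bound on legitimate audio packets per second:
/// source frames plus FEC repair packets at the given repair ratio.
fn max_audio_pps(frame_ms: u32, fec_ratio: f64) -> f64 {
    let source_pps = 1000.0 / frame_ms as f64;
    source_pps * (1.0 + fec_ratio)
}

/// Counts packets per 1 s bucket; flags only a sustained excess.
struct PacketRateGuard {
    cap_pps: u32,
    sustain_secs: u32,
    current_count: u32,
    over_streak: u32,
}

impl PacketRateGuard {
    fn new(cap_pps: u32, sustain_secs: u32) -> Self {
        Self { cap_pps, sustain_secs, current_count: 0, over_streak: 0 }
    }

    fn on_packet(&mut self) {
        self.current_count += 1;
    }

    /// Call once per second; returns true once the cap has been exceeded
    /// for `sustain_secs` consecutive buckets.
    fn tick(&mut self) -> bool {
        if self.current_count > self.cap_pps {
            self.over_streak += 1;
        } else {
            self.over_streak = 0;
        }
        self.current_count = 0;
        self.over_streak >= self.sustain_secs
    }
}
```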

### Tier C: Timestamp-rate consistency

`timestamp_ms` advances at the declared frame interval, so `Δtimestamp / Δseq` over a rolling window should match the codec's frame duration to within a factor of 2. Divergence catches abusers who send small packets at audio rate but repurpose header fields (such as `timestamp_ms`) to carry payload.
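
Here is a sketch of the Tier C test. It assumes the window is a slice of `(sequence, timestamp_ms)` pairs in sequence order, and that FEC repair packets do not consume sequence numbers. If they do, the expected ratio shifts. The function name is hypothetical.

```rust
/// Returns true if the average Δtimestamp per Δsequence over the window lies
/// within a factor of 2 of the declared frame duration.
fn timestamp_rate_consistent(window: &[(u32, u32)], frame_ms: u32) -> bool {
    let (first, last) = match (window.first(), window.last()) {
        (Some(f), Some(l)) if window.len() >= 2 => (*f, *l),
        _ => return true, // not enough data to judge yet
    };
    // wrapping_sub handles 32-bit wraparound of both counters.
    let d_seq = last.0.wrapping_sub(first.0);
    let d_ts = last.1.wrapping_sub(first.1);
    if d_seq == 0 {
        return false; // several packets, no sequence progress: not media
    }
    let ms_per_seq = d_ts as f64 / d_seq as f64;
    let frame = frame_ms as f64;
    ms_per_seq >= frame / 2.0 && ms_per_seq <= frame * 2.0
}
```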

### Tier D: Per-codec packet-size sanity

Keep an EWMA of packet size per session and compare it to the per-codec typical size:

| Codec | Typical | Reject above |
|---|---|---|
| Opus 24k 20 ms | 60–80 B | 160 B |
| Opus 6k 40 ms | 30–40 B | 90 B |
| Codec2 1200 40 ms | 6 B | 30 B |
| ComfortNoise | 0–4 B | 16 B |
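
Tier D reduces to one EWMA per session. The thresholds below are the "Reject above" column. The smoothing factor (0.1 in the usage), the `AudioProfile` enum, and the helper names are assumptions made for this sketch.

```rust
/// Hypothetical codec profiles matching the rows of the table.
#[derive(Clone, Copy)]
enum AudioProfile {
    Opus24k20ms,
    Opus6k40ms,
    Codec2Mode1200,
    ComfortNoise,
}

/// "Reject above" thresholds, in bytes.
fn reject_above_bytes(p: AudioProfile) -> f64 {
    match p {
        AudioProfile::Opus24k20ms => 160.0,
        AudioProfile::Opus6k40ms => 90.0,
        AudioProfile::Codec2Mode1200 => 30.0,
        AudioProfile::ComfortNoise => 16.0,
    }
}

/// Exponentially weighted moving average of packet size for one session.
struct SizeEwma {
    alpha: f64,
    mean: Option<f64>,
}

impl SizeEwma {
    fn new(alpha: f64) -> Self {
        Self { alpha, mean: None }
    }

    /// Folds in one packet size and returns the updated average.
    fn update(&mut self, size: usize) -> f64 {
        let x = size as f64;
        let m = match self.mean {
            None => x, // seed with the first observation
            Some(prev) => prev + self.alpha * (x - prev),
        };
        self.mean = Some(m);
        m
    }

    /// True once the smoothed size exceeds the profile's threshold.
    fn violates(&self, p: AudioProfile) -> bool {
        self.mean.map_or(false, |m| m > reject_above_bytes(p))
    }
}
```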

### Tier E: Per-fingerprint / per-IP token bucket

An aggregate quota applies regardless of declared codec:

```
For each (fingerprint, src_ip):
  monthly_bytes_quota   authenticated = 50 GB (tune)
                        anonymous     = 1 GB
  per-session cap       audio = 256 kbps
                        video = 5 Mbps
  burst                 = 30 s at 2× cap
```

This won't stop a single rogue session under the cap, but it bounds the aggregate blast radius and makes relay economics predictable.
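
The per-session part of Tier E can be sketched as a byte-denominated token bucket. The sketch assumes the bucket refills at the per-session cap and holds `cap × burst_secs` worth of bytes. That depth is what lets a sender run at 2× the cap for about `burst_secs` before being throttled, because the net drain at 2× is exactly one cap per second. The monthly quota would be a separate, much slower counter and is not shown. `TokenBucket` is an illustrative name.

```rust
use std::time::Instant;

/// Token bucket in bytes, refilled at the per-session cap.
struct TokenBucket {
    rate_bytes_per_sec: f64,
    capacity_bytes: f64,
    tokens: f64,
    last: Instant,
}

impl TokenBucket {
    /// `cap_bps`: per-session cap (e.g. 256 kbps for audio).
    /// `burst_secs`: how long a sender at 2× the cap is tolerated.
    fn new(cap_bps: f64, burst_secs: f64, now: Instant) -> Self {
        let rate = cap_bps / 8.0;
        let capacity = rate * burst_secs;
        Self { rate_bytes_per_sec: rate, capacity_bytes: capacity, tokens: capacity, last: now }
    }

    /// Returns true if `bytes` may pass at `now`; false means over quota.
    fn try_consume(&mut self, bytes: usize, now: Instant) -> bool {
        let elapsed = now.saturating_duration_since(self.last).as_secs_f64();
        self.last = now;
        // Refill for the elapsed time, never beyond the bucket depth.
        self.tokens = (self.tokens + elapsed * self.rate_bytes_per_sec).min(self.capacity_bytes);
        if self.tokens >= bytes as f64 {
            self.tokens -= bytes as f64;
            true
        } else {
            false
        }
    }
}
```

With a 256 kbps cap and a 30 s burst, a sender at 2× the cap (64 000 B/s) is admitted for 29 whole seconds and refused in the 30th. The first second's refill is clipped because the bucket starts full.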

### Tier F: Behavioral entropy / statistical fingerprinting

This is the deeper layer, computed continuously per session over 10–30 s windows. A combined score flags streams that pass declared-codec checks but do not statistically look like real media.

**Why this works:** real audio and real video have very specific statistical signatures that tunneled data does not naturally produce, and that an attacker would have to mimic deliberately and at significant cost. The signatures differ sharply between audio and video, which is exactly why we separate them (see *Separating audio and video* below).

#### Audio fingerprint features

| Feature | Real Opus speech | Tunneled data |
|---|---|---|
| **IAT coefficient of variation** | 0.1–0.4 (clocked) | > 1.0 (bursty) |
| **Payload-size distribution** | Bimodal: speech 60–80 B + silence/CN 0–10 B | Unimodal, large, MTU-skewed |
| **Silence fraction** | 10–40 % (real conversation pauses) | < 2 % |
| **Bitrate over 30 s** | Tracks nominal codec ±20 % | Often saturates ceiling |
| **`Q` flag cadence** | Periodic, regular | Absent or random |
| **DRED / FEC ratio response** | Tracks `QualityReport` trend | Static or noise |

Single derived score: `audio_legitimacy ∈ [0, 1]`. A score below the threshold (e.g. 0.3) for 60 s raises a flag.
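
Two of the audio features can be computed from cleartext observables alone. This sketch derives the IAT coefficient of variation and the silence fraction. The ≤ 10 B silence cutoff follows the table. The sketch then combines the two features with a deliberately naive voting rule. The vote weights and function names are illustrative assumptions, not the real scorer.

```rust
/// Coefficient of variation (std dev / mean) of packet inter-arrival times.
fn iat_cv(arrivals_ms: &[f64]) -> Option<f64> {
    if arrivals_ms.len() < 3 {
        return None;
    }
    let iats: Vec<f64> = arrivals_ms.windows(2).map(|w| w[1] - w[0]).collect();
    let n = iats.len() as f64;
    let mean = iats.iter().sum::<f64>() / n;
    if mean <= 0.0 {
        return None;
    }
    let var = iats.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n;
    Some(var.sqrt() / mean)
}

/// Fraction of packets small enough to be silence / comfort noise (<= 10 B).
fn silence_fraction(payload_sizes: &[usize]) -> f64 {
    if payload_sizes.is_empty() {
        return 0.0;
    }
    let silent = payload_sizes.iter().filter(|&&s| s <= 10).count();
    silent as f64 / payload_sizes.len() as f64
}

/// Toy combination: each feature votes 1.0 inside the "real speech" band,
/// 0.5 in between, 0.0 in the "tunneled data" band; the score is the mean vote.
fn audio_legitimacy(cv: f64, silence: f64) -> f32 {
    let cv_vote = if cv <= 0.4 { 1.0 } else if cv < 1.0 { 0.5 } else { 0.0 };
    let silence_vote = if (0.10..=0.40).contains(&silence) {
        1.0
    } else if silence >= 0.02 {
        0.5
    } else {
        0.0
    };
    ((cv_vote + silence_vote) / 2.0) as f32
}
```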

#### Video fingerprint features (post-V1)

| Feature | Real H.264 / AV1 video | Tunneled data |
|---|---|---|
| **Keyframe periodicity** | Regular (every 1–4 s, or on PLI) | Absent or uniform `KeyFrame=1` |
| **Frame-size ratio (I / P)** | 5–20× | ≈ 1× |
| **Burst structure** | One I-frame = N packets in < 5 ms, then quiet | Uniform spacing |
| **Bitrate response to BWE feedback** | Tracks `TransportFeedback::remb_bps` | Ignores it |
| **Resolution / FPS implied by bitrate** | Coherent (240p ≠ 8 Mbps) | Incoherent |
| **NACK / PLI responsiveness** | Sender produces keyframe within 200 ms | No response |

Single derived score: `video_legitimacy ∈ [0, 1]`.
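
For video, the keyframe-periodicity and I/P-ratio rows can be computed per frame from the `KeyFrame` bit and reassembled frame sizes. `FrameObs` and both functions are hypothetical. The sketch assumes frames arrive in time order and does not separate out PLI-triggered keyframes.

```rust
/// One observed video frame: arrival time, total bytes, and the KeyFrame bit.
struct FrameObs {
    t_ms: u64,
    bytes: usize,
    keyframe: bool,
}

/// Mean keyframe size divided by mean delta-frame size.
/// Returns None if either kind of frame is missing from the window.
fn i_to_p_ratio(frames: &[FrameObs]) -> Option<f64> {
    let mean = |key: bool| {
        let v: Vec<usize> = frames.iter().filter(|f| f.keyframe == key).map(|f| f.bytes).collect();
        if v.is_empty() {
            None
        } else {
            Some(v.iter().sum::<usize>() as f64 / v.len() as f64)
        }
    };
    Some(mean(true)? / mean(false)?)
}

/// True if every gap between consecutive keyframes falls in the 1–4 s band.
fn keyframes_periodic(frames: &[FrameObs]) -> bool {
    let ks: Vec<u64> = frames.iter().filter(|f| f.keyframe).map(|f| f.t_ms).collect();
    ks.len() >= 2 && ks.windows(2).all(|w| (1_000..=4_000).contains(&(w[1] - w[0])))
}
```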

#### Implementation shape

```rust
use std::time::Instant;

// Outline only: `MediaType`, `MediaHeader`, `ExponentialMovingAverage`,
// `ExponentialMovingVariance`, `SizeBuckets` and `RingBuffer` are placeholders.
pub enum Verdict { Legitimate, Suspect, Abusive }

pub struct LegitimacyScorer {
    media_type: MediaType,
    iat_ewma: ExponentialMovingAverage,
    iat_variance: ExponentialMovingVariance,
    size_histogram: SizeBuckets<8>,
    silence_count: u32,
    speech_count: u32,
    quality_reports_seen: u32,
    keyframe_intervals: RingBuffer<u32, 16>, // video only
    window_start: Instant,
}

impl LegitimacyScorer {
    // Called per packet; sees only cleartext header fields and the payload length.
    pub fn observe(&mut self, header: &MediaHeader, payload_len: usize, now: Instant) {
        todo!()
    }
    pub fn score(&self) -> f32 { todo!() } // [0, 1]
    pub fn verdict(&self) -> Verdict { todo!() }
}
```

This is cheap: a few floats and counters per session. Update on every packet, score every 1 s, and escalate over 30+ s.

### Tier G: Reactive response

A scoring system needs a response policy:

| Verdict | Action |
|---|---|
| Legitimate | None |
| Suspect | Apply tighter Tier-E quota; emit `relay_conformance_suspect_total` |
| Abusive | Close session with `Hangup::PolicyViolation`; log to audit; put fingerprint on cool-down |
| Repeat-abusive | Lower-tier quota across the federation (gossip via federation channel) |

Never drop silently. Always close with a typed reason so that legitimate users hitting a bug get a clear error.
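
As code, the response table becomes an exhaustive `match`, so adding a verdict forces a decision about its response. The `Action` variants are hypothetical labels for the table cells; closing would map onto `Hangup::PolicyViolation` in the relay. Each row is mapped literally. Whether "Repeat-abusive" also implies the "Abusive" actions is left as a policy choice.

```rust
/// Verdicts from the Tier F scorer, plus a repeat-offender state.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Verdict {
    Legitimate,
    Suspect,
    Abusive,
    RepeatAbusive,
}

/// Hypothetical labels for the cells of the response table.
#[derive(Debug, PartialEq)]
enum Action {
    TightenTierEQuota,
    IncrementSuspectCounter,  // relay_conformance_suspect_total
    CloseWithTypedReason,     // Hangup::PolicyViolation, never a silent drop
    WriteAuditLog,
    CoolDownFingerprint,
    FederationQuotaDowngrade, // gossiped over the federation channel
}

fn respond(v: Verdict) -> Vec<Action> {
    use Action::*;
    match v {
        Verdict::Legitimate => vec![],
        Verdict::Suspect => vec![TightenTierEQuota, IncrementSuspectCounter],
        Verdict::Abusive => vec![CloseWithTypedReason, WriteAuditLog, CoolDownFingerprint],
        Verdict::RepeatAbusive => vec![FederationQuotaDowngrade],
    }
}
```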

## Separating audio and video

**This is one of the strongest arguments for the v2 `MediaType` field, and separation should be a hard design rule.**

Audio and video have nothing in common statistically:

| Property | Audio | Video |
|---|---|---|
| Bitrate | 6–64 kbps | 100 kbps – 5 Mbps |
| Packet rate | 25–50 pps | 500–2000 pps |
| Packet size | 6–160 B | 200–1450 B |
| Burst structure | Clocked, near-CBR | Bursty (I-frames) |
| Silence | Common (10–40 %) | Meaningless |
| Loss tolerance | High (PLC, DRED) | Variable (keyframes critical) |
| Recovery primitive | FEC + DRED | NACK + PLI + keyframe cache |

A single scoring model covering both would have to be permissive enough to span the union of both envelopes, and that would let tunnels through. **Separation is mandatory for Tier F to work.**

### What separation requires

1. **`MediaType:2` in `MediaHeader` v2** (already in `ROAD-TO-VIDEO.md` Phase V1). Without it, the relay must keep a `CodecID → MediaType` table and update it every time a codec is added, which is fragile.
2. **Per-`MediaType` conformance rules.** Tiers A, B, and D keep separate tables per type; Tier F has separate scorers.
3. **Per-`MediaType` quotas.** Tier E uses two buckets: `audio_bps_cap` and `video_bps_cap`. A session in audio-only mode never gets to spend the video budget. A video session has both, with audio taking priority.
4. **Per-`MediaType` keyframe/silence semantics.** The `KeyFrame` bit is meaningless for audio; silence fraction is meaningless for video. The scorer needs to know which features apply.
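
The sketch below shows how the relay might dispatch on the cleartext type. The mapping from 2-bit values to types is an assumption (table order) and is not taken from the v2 spec. The `Feature` names are also assumed. The per-type caps come from the Tier E table. `Data` and `Control` caps are left unspecified because this document does not give them.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MediaType {
    Audio,
    Video,
    Data,
    Control,
}

impl MediaType {
    /// Decodes the 2-bit field (value-to-type mapping assumed, not from the spec).
    pub fn from_bits(bits: u8) -> Self {
        match bits & 0b11 {
            0 => MediaType::Audio,
            1 => MediaType::Video,
            2 => MediaType::Data,
            _ => MediaType::Control,
        }
    }
}

/// Hypothetical Tier F feature identifiers.
#[derive(Debug, PartialEq)]
pub enum Feature {
    IatJitter,
    SizeBimodality,
    SilenceFraction,
    KeyframePeriodicity,
    IpFrameRatio,
    BweResponse,
}

/// Which Tier F features the scorer evaluates for each media type.
pub fn features_for(mt: MediaType) -> &'static [Feature] {
    match mt {
        MediaType::Audio => &[Feature::IatJitter, Feature::SizeBimodality, Feature::SilenceFraction],
        MediaType::Video => &[Feature::KeyframePeriodicity, Feature::IpFrameRatio, Feature::BweResponse],
        MediaType::Data | MediaType::Control => &[],
    }
}

/// Per-session caps from the Tier E table; other types are not specified here.
pub fn session_cap_bps(mt: MediaType) -> Option<u64> {
    match mt {
        MediaType::Audio => Some(256_000),
        MediaType::Video => Some(5_000_000),
        MediaType::Data | MediaType::Control => None,
    }
}
```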

### Bonus: separation also helps the SFU

Beyond abuse detection, the same separation makes graceful degradation cleaner: under congestion the relay can drop video packets first while preserving audio, because it knows which is which without consulting the codec table.
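
The degradation rule fits in a few lines. This hypothetical `shed` function walks a queue of `(kind, bytes)` entries and drops video packets, oldest first, until the queue fits a byte budget. It touches audio only if dropping all video is not enough. The queue representation and the oldest-first choice are assumptions for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    Audio,
    Video,
}

/// Drops packets until the total queued bytes fit `budget_bytes`,
/// exhausting video before touching audio.
fn shed(queue: &mut Vec<(Kind, usize)>, budget_bytes: usize) {
    let mut total: usize = queue.iter().map(|&(_, len)| len).sum();
    for victim in [Kind::Video, Kind::Audio] {
        // retain visits front to back, so the oldest packets of `victim` go first.
        queue.retain(|&(kind, len)| {
            if total > budget_bytes && kind == victim {
                total -= len;
                false
            } else {
                true
            }
        });
    }
}
```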

## Open questions for later decision

1. **Hard-close on first hard violation, or three-strikes?** Three-strikes is friendlier but lets two extra violations through before acting. Recommendation: hard-close with a clear typed reason; legitimate users will reconnect, and abusers won't try again with the same fingerprint.
2. **Where do verdicts persist?** In-memory per relay is simplest. Federated gossip is more powerful but opens a new attack surface (verdict poisoning).
3. **Threshold tuning.** All thresholds in this doc are first-pass math. Real numbers should come from a few weeks of Prometheus data on legitimate traffic, collected before any enforcement turns on.
4. **Anonymous vs. authenticated split.** featherChat-authenticated users get generous quotas; anonymous users get tight ones. This makes mass abuse economically hostile (it needs many real identities) without locking out small legitimate use.
5. **What to log.** Conformance hits should be Prometheus counters plus a ring buffer of recent violations; never log raw payload content (even encrypted) for privacy.

## Suggested implementation order (whenever this is picked up)

| Step | What | Why first |
|---|---|---|
| 1 | Land v2 wire format with `MediaType:2` | Prereq for separation; already on the road-to-video plan |
| 2 | Tier A + B + C as `wzp-relay/src/conformance.rs` | Kills bulk tunneling; cheap; no false positives if math is right |
| 3 | Prometheus metrics for violations + raw observables (IAT, size, silence frac) | Gather baseline of legitimate traffic before tightening |
| 4 | Tier D + E (size sanity + token bucket) | Defense in depth |
| 5 | Tier F scorer, audio-only first; tuned against the baseline from step 3 | Adds covert-tunnel pressure |
| 6 | Tier F video scorer once video is in production | Same shape, different features |
| 7 | Tier G response policy + audit log | Operationalize |

Steps 1–2 are decisive against the LiveKit-style PoC. The rest is steady tightening as real traffic accumulates.

## What this does NOT promise

- It does not stop a patient adversary running a slow covert channel inside real audio. Nothing E2E-preserving can.
- It does not detect content (no CSAM scan, no copyright fingerprint). Those would require breaking E2E and are out of scope by design.
- It does not eliminate abuse. It makes abuse loud, expensive, and detectable, which is the realistic goal for any E2E system.
169
vault/Architecture/Branch-Desktop-Audio-Rewrite.md
Normal file
@@ -0,0 +1,169 @@
---
tags: [architecture, wzp]
type: architecture
---

# Branch: `feat/desktop-audio-rewrite`

Home of the Tauri desktop client for macOS, Windows, and Linux. The branch is named "audio-rewrite" because its original goal was to replace a CPAL-only audio pipeline with platform-native backends that support OS-level echo cancellation (VoiceProcessingIO on macOS, WASAPI Communications on Windows). It has since grown into the full desktop story: Windows cross-compilation, vendored dependencies, history UI, direct calling, and the rest.

## Purpose

The desktop client shares 100% of its frontend (`desktop/src/`) and Tauri command layer (`desktop/src-tauri/src/lib.rs`, `engine.rs`, `history.rs`) with the Android build on `android-rewrite`. Differences are limited to:

- **Audio backends**, which are platform-gated via Cargo target-dependency sections in `desktop/src-tauri/Cargo.toml` and feature flags in `crates/wzp-client/Cargo.toml`.
- **Identity storage paths**, which resolve via Tauri's `app_data_dir()` (`~/Library/Application Support/…` on macOS, `%APPDATA%\…` on Windows, `~/.local/share/…` on Linux).
- **Build toolchains**: native `cargo build` on macOS/Linux; `cargo xwin` cross-compilation from Linux for Windows, via Docker on SepehrHomeserverdk.

## Audio backend matrix

| Target | Capture | Playback | AEC |
|---|---|---|---|
| macOS | CPAL (CoreAudio via the `cpal` crate) OR VoiceProcessingIO (native Core Audio) | CPAL | VoiceProcessingIO native AEC (when `vpio` feature enabled) |
| Windows (default) | CPAL → WASAPI shared mode | CPAL → WASAPI shared mode | None |
| Windows (AEC build) | Direct WASAPI with `IAudioClient2::SetClientProperties(AudioCategory_Communications)` | CPAL → WASAPI shared mode | **OS-level**: Windows routes the capture stream through the driver's communications APO chain (AEC + NS + AGC) |
| Linux | CPAL → ALSA/PulseAudio | CPAL → ALSA/PulseAudio | None |

The macOS VPIO path is gated behind the `vpio` feature in `wzp-client`, and the `coreaudio-rs` dependency is itself `cfg(target_os = "macos")`, so enabling the feature on Windows or Linux is a no-op.

The Windows AEC path is gated behind the `windows-aec` feature, which is also target-gated (the `windows` crate dependency is only pulled in on Windows). When enabled, it re-exports `WasapiAudioCapture as AudioCapture`, so downstream code doesn't need to know which backend is active. The current Windows build at `target/windows-exe/wzp-desktop.exe` has `windows-aec` on; a baseline no-AEC build is preserved at `target/windows-exe/wzp-desktop-noAEC.exe` for A/B comparison on real hardware.
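
The backend selection can be expressed as a pair of mutually exclusive `cfg` re-exports. In this sketch, the two backends are stand-in unit structs with a hypothetical `backend()` method. Only the `WasapiAudioCapture as AudioCapture` re-export and the `windows-aec` feature name come from the text.

```rust
// Stand-in backends; the real ones live in audio_io.rs and audio_wasapi.rs.
mod audio_io {
    pub struct CpalAudioCapture;
    impl CpalAudioCapture {
        pub fn backend(&self) -> &'static str {
            "cpal"
        }
    }
}

#[cfg(all(target_os = "windows", feature = "windows-aec"))]
mod audio_wasapi {
    pub struct WasapiAudioCapture;
    impl WasapiAudioCapture {
        pub fn backend(&self) -> &'static str {
            "wasapi-communications"
        }
    }
}

// Exactly one re-export is compiled in, so callers always name `AudioCapture`.
#[cfg(all(target_os = "windows", feature = "windows-aec"))]
pub use audio_wasapi::WasapiAudioCapture as AudioCapture;
#[cfg(not(all(target_os = "windows", feature = "windows-aec")))]
pub use audio_io::CpalAudioCapture as AudioCapture;
```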

See [`BRANCH-android-rewrite.md`](BRANCH-android-rewrite.md) for Oboe audio on Android, which is its own story.

## Recent major work

### 1. Desktop direct calling feature (commit `2fd9465` and neighbors)

Brought direct 1:1 calls to macOS with full parity with the Android client:

- **Identity path fix**: the desktop `CallEngine::start` was loading the seed from `$HOME/.wzp/identity` while `register_signal` used Tauri's `app_data_dir()`, producing two different fingerprints per run. Both now route through `load_or_create_seed()`, which uses `app_data_dir()` everywhere.
- **Call history with dedup**: `history.rs` stores a `Vec<CallHistoryEntry>` with a `CallDirection` enum (`Placed | Received | Missed`). The `log` function dedupes by `call_id`, so an outgoing call isn't logged twice: first as "missed" (when the signal loop's `DirectCallOffer` handler fires) and then again as "placed" (when `place_call` returns). Instead, the existing entry is updated in place.
- **Recent contacts row**: a horizontal chip UI in the direct-call panel showing the last N peers with friendly aliases, clickable to re-dial.
- **Deregister button**: lets a user drop their signal registration without quitting the app, which is useful when switching identities.
- **Random alias derivation**: a new client gets a human-friendly alias like "silent-forest-41", derived deterministically from its seed, so it's identifiable in the UI before manual naming.
- **Default room "general"** instead of "android", since the desktop client is not Android.
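
The following is a sketch of the upsert-by-`call_id` logic described above. `CallHistoryEntry` here carries only the fields needed for the example (`peer_fingerprint` is a hypothetical field), and persistence to disk is omitted.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum CallDirection {
    Placed,
    Received,
    Missed,
}

#[derive(Debug, Clone)]
struct CallHistoryEntry {
    call_id: String,
    peer_fingerprint: String,
    direction: CallDirection,
}

/// Appends a new entry, or overwrites the existing one with the same `call_id`,
/// so a call first recorded as `Missed` can later be corrected to `Placed`.
fn log(history: &mut Vec<CallHistoryEntry>, entry: CallHistoryEntry) {
    if let Some(existing) = history.iter_mut().find(|e| e.call_id == entry.call_id) {
        *existing = entry;
    } else {
        history.push(entry);
    }
}
```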

### 2. macOS VoiceProcessingIO integration

`crates/wzp-client/src/audio_vpio.rs` is a native Core Audio implementation using `AUGraph` + `AudioComponentInstance` with the VPIO audio unit. It provides Apple's built-in voice-processing AEC (the same AEC Apple ships in FaceTime, iMessage audio, and voice memos) at the cost of tight coupling to Apple frameworks. Its lock-free ring pattern matches the CPAL path, so the upper layers don't notice the difference.

Enabled by `features = ["audio", "vpio"]` in the macOS target section of `desktop/src-tauri/Cargo.toml`.
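
A "lock-free ring pattern" usually means a single-producer/single-consumer ring buffer. The real-time audio callback pushes samples without locking or allocating, and another thread drains them. The sketch below is a generic SPSC ring for `i16` samples, not the code in `audio_vpio.rs`; `spsc_ring`, `Producer`, and `Consumer` are illustrative names. Splitting the ring into non-`Clone` halves whose methods take `&mut self` is what keeps the internal `unsafe impl Sync` sound.

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

struct Ring {
    buf: Box<[UnsafeCell<i16>]>,
    head: AtomicUsize, // next slot to write; stored only by the producer
    tail: AtomicUsize, // next slot to read; stored only by the consumer
}

// SAFETY: the producer writes only free slots and the consumer reads only
// filled slots; the Release/Acquire pairs on head/tail hand slots over.
unsafe impl Sync for Ring {}

/// Write half: not Clone, and `push` takes &mut self, so there is one producer.
pub struct Producer(Arc<Ring>);
/// Read half: likewise unique.
pub struct Consumer(Arc<Ring>);

pub fn spsc_ring(capacity: usize) -> (Producer, Consumer) {
    // One extra slot stays empty so "full" and "empty" are distinguishable.
    let buf = (0..capacity + 1).map(|_| UnsafeCell::new(0i16)).collect();
    let ring = Arc::new(Ring { buf, head: AtomicUsize::new(0), tail: AtomicUsize::new(0) });
    (Producer(Arc::clone(&ring)), Consumer(ring))
}

impl Producer {
    /// Copies as many samples as fit, never blocking; returns how many were written.
    pub fn push(&mut self, samples: &[i16]) -> usize {
        let r = &*self.0;
        let len = r.buf.len();
        let head = r.head.load(Ordering::Relaxed);
        let tail = r.tail.load(Ordering::Acquire);
        let free = (tail + len - head - 1) % len;
        let n = samples.len().min(free);
        for (i, &s) in samples[..n].iter().enumerate() {
            unsafe { *r.buf[(head + i) % len].get() = s };
        }
        r.head.store((head + n) % len, Ordering::Release);
        n
    }
}

impl Consumer {
    /// Copies out up to `out.len()` available samples; returns how many were read.
    pub fn pop(&mut self, out: &mut [i16]) -> usize {
        let r = &*self.0;
        let len = r.buf.len();
        let tail = r.tail.load(Ordering::Relaxed);
        let head = r.head.load(Ordering::Acquire);
        let avail = (head + len - tail) % len;
        let n = out.len().min(avail);
        for (i, slot) in out[..n].iter_mut().enumerate() {
            *slot = unsafe { *r.buf[(tail + i) % len].get() };
        }
        r.tail.store((tail + n) % len, Ordering::Release);
        n
    }
}
```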

### 3. Windows cross-compilation via cargo-xwin

Rust + Tauri are cross-compiled to `x86_64-pc-windows-msvc` from Linux using `cargo-xwin`, which downloads the Microsoft CRT and Windows SDK on demand and drives `clang-cl` as the compiler. No Windows machine is needed for the build itself, only for runtime testing.

**Build infrastructure**:

- `scripts/Dockerfile.windows-builder`: Debian bookworm + Rust + cargo-xwin + Node 20 + cmake + ninja + llvm + clang + lld + nasm. Pre-warms the xwin MSVC CRT cache at image build time (saves ~4 minutes per cold build).
- `scripts/build-windows-docker.sh`: fire-and-forget remote build via Docker on SepehrHomeserverdk, following the same pattern as `build-tauri-android.sh`. Uploads the `.exe` to rustypaste and fires an `ntfy.sh/wzp` notification on start and on completion.
- `scripts/build-windows-cloud.sh`: alternative pipeline using a temporary Hetzner Cloud VPS. Slower (full VM spin-up) and more expensive, but useful when Docker image rebuilds would be disruptive.

**Two critical blockers resolved** on the way to a working `.exe`:

1. **libopus SSE4.1 / SSSE3 intrinsic compile failure**. `audiopus_sys` vendors libopus 1.3.1, whose `CMakeLists.txt` gates the per-file `-msse4.1` `COMPILE_FLAGS` behind `if(NOT MSVC)`. Under `clang-cl`, CMake sets `MSVC=1` (because `CMAKE_C_COMPILER_FRONTEND_VARIANT=MSVC` triggers `Platform/Windows-MSVC.cmake`, which unconditionally sets the variable), so the per-file flag is never set. The SSE4.1 source files then compile without the target feature and fail with 20+ "always_inline function '_mm_cvtepi16_epi32' requires target feature 'sse4.1'" errors.

   Fixed by **vendoring audiopus_sys into `vendor/audiopus_sys/`** and patching its bundled libopus to introduce an `MSVC_CL` variable that is true only for real `cl.exe` (distinguished via `CMAKE_C_COMPILER_ID STREQUAL "MSVC"`). The eight `if(NOT MSVC)` SIMD guards are flipped to `if(NOT MSVC_CL)`, and the global `/arch` block at line 445 becomes `if(MSVC_CL)`. As a result, clang-cl gets the GCC-style per-file flags while real cl.exe keeps the `/arch:AVX` / `/arch:SSE2` globals.

   Wired in via `[patch.crates-io] audiopus_sys = { path = "vendor/audiopus_sys" }` at the workspace root.

   Upstream tracking: [xiph/opus#256](https://github.com/xiph/opus/issues/256), [xiph/opus PR #257](https://github.com/xiph/opus/pull/257) (both stale).

2. **tauri-build needs `icons/icon.ico` for the Windows PE resource**. The desktop only had `icon.png`, so a multi-size ICO (16/24/32/48/64/128/256) was generated from the existing placeholder via Pillow and committed. It is placeholder quality; real branded icons can replace it later.

### 4. Windows `AudioCategory_Communications` capture path (task #24)

`crates/wzp-client/src/audio_wasapi.rs` performs direct WASAPI capture via `IMMDeviceEnumerator → IAudioClient2 → SetClientProperties` with `AudioCategory_Communications`. This tells Windows "this is a VoIP call", and Windows routes the capture stream through the driver's registered communications APO chain, which on most Win10/11 consumer hardware includes AEC, NS, and AGC.

**Caveat**: quality is driver-dependent. On a machine with a good communications APO (Intel Smart Sound, Dolby, modern Realtek on Win11 24H2+, anything with Voice Clarity enabled) it's excellent. On generic class-compliant drivers with no communications APO registered, it's a no-op. For guaranteed AEC regardless of driver, see task #26, which tracks implementing the classic Voice Capture DSP (`CLSID_CWMAudioAEC`) as a fallback.

Gated behind the `windows-aec` feature in `wzp-client`. Enabled by default in the Windows target section of `desktop/src-tauri/Cargo.toml`.

## Build pipelines

### Native macOS / Linux

```bash
cd desktop
npm install
npm run build
cd src-tauri
cargo build --release --bin wzp-desktop
```

### Windows x86_64 via Docker on SepehrHomeserverdk

```bash
./scripts/build-windows-docker.sh                 # Full: pull + build + download
./scripts/build-windows-docker.sh --no-pull       # Skip git fetch
./scripts/build-windows-docker.sh --rust          # Force-clean Rust target
./scripts/build-windows-docker.sh --image-build   # (Re)build the Docker image (fire-and-forget)
```

Output lands at `target/windows-exe/wzp-desktop.exe`. Both `wzp-desktop.exe` and `wzp-desktop-noAEC.exe` can coexist in that directory. The script always writes `wzp-desktop.exe`, so rename the previous build (to `-noAEC.exe` or any other name) before rebuilding to preserve it.

### Windows x86_64 via Hetzner Cloud (alternative)

```bash
./scripts/build-windows-cloud.sh                  # Full: create VM → build → download → destroy
./scripts/build-windows-cloud.sh --prepare        # Create VM and install deps only
./scripts/build-windows-cloud.sh --build          # Build on existing VM
./scripts/build-windows-cloud.sh --destroy        # Delete the VM
WZP_KEEP_VM=1 ./scripts/build-windows-cloud.sh    # Keep VM alive after build for debug
```

Remember to destroy the VM at the end of the day with `--destroy`.

### Linux x86_64 (relay + CLI + bench)

```bash
./scripts/build-linux-docker.sh                   # Fire-and-forget remote Docker build
./scripts/build-linux-docker.sh --install         # Wait for completion and download
```

This uses the same `wzp-android-builder` Docker image as Android (not a separate image), since the dependencies (Rust + cmake + ring prerequisites) are the same.

## Testing

### Direct calling parity

1. Build on two machines (macOS + Windows, two macOS machines, or any combination).
2. Register both machines on the same relay.
3. Copy one machine's fingerprint into the other's direct-call panel.
4. Place the call. Confirm the ringing UI on the callee and the "calling…" UI on the caller.
5. Answer. Confirm audio flows both ways.
6. Hang up from either side. Confirm call-history entries are labeled correctly (`Outgoing` on caller, `Incoming` on callee, never `Missed` on a successful call).

### Windows AEC A/B

1. Install `wzp-desktop-noAEC.exe` and `wzp-desktop.exe` on the same Windows box.
2. Join a call from each (separately) while a second machine plays known audio through the first machine's speakers.
3. On the remote (listening) side, the `noAEC` call should have clearly audible echo; the AEC call should have minimal or no echo after a 1–2 s convergence period.
4. If both builds sound identical (with echo), the `AudioCategory_Communications` switch isn't triggering the driver's APO chain. Investigate via task #26 (Voice Capture DSP fallback).

## Known quirks

1. **libopus vendor path is workspace-relative**. `[patch.crates-io] audiopus_sys = { path = "vendor/audiopus_sys" }` works from any crate in the workspace because Cargo resolves it against the directory of the root `Cargo.toml`. If the workspace is moved or vendored into another workspace, update the path.

2. **`cargo xwin` overwrites `override.cmake` on every invocation**. Any attempt to patch `~/.cache/cargo-xwin/cmake/clang-cl/override.cmake` at Docker image build time is inert, because cargo-xwin's `src/compiler/clang_cl.rs` (around line 444) writes the bundled file fresh on every run. All real fixes must land in the source tree (via the vendored audiopus_sys, as done here), not in the cargo-xwin cache.

3. **WebView2 runtime is a prerequisite on Windows 10**. Windows 11 ships with it. If the `.exe` launches and immediately exits with no error on a Win10 machine, the runtime is missing; install it from [Microsoft's Evergreen bootstrapper](https://developer.microsoft.com/en-us/microsoft-edge/webview2/).

4. **Rust 2024 edition `unsafe_op_in_unsafe_fn` lint**. The WASAPI backend in `audio_wasapi.rs` emits ~18 of these warnings because Rust 2024 requires explicit `unsafe { ... }` blocks inside `unsafe fn` bodies. The warnings don't block the build and don't affect runtime behavior; cleaning them up is tracked informally as tech debt.
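
The cleanup for these warnings is mechanical: wrap each unsafe operation in its own `unsafe` block with a `SAFETY:` comment. The example below is a generic illustration, not code from `audio_wasapi.rs`, and `copy_frames` is hypothetical. The lint is set to `deny` so the pattern is enforced on any edition.

```rust
/// Copies `dst.len()` samples from a raw pointer (e.g. a capture buffer).
/// The caller must guarantee `src` is valid for that many reads.
#[deny(unsafe_op_in_unsafe_fn)]
unsafe fn copy_frames(src: *const i16, dst: &mut [i16]) {
    // SAFETY: the caller guarantees `src` is valid for `dst.len()` reads
    // and does not overlap `dst`.
    unsafe { std::ptr::copy_nonoverlapping(src, dst.as_mut_ptr(), dst.len()) };
}
```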

## Files of interest

| Path | Purpose |
|---|---|
| `desktop/src/` | Shared frontend (TypeScript + HTML + CSS) |
| `desktop/src-tauri/src/lib.rs` | Tauri commands shared with Android |
| `desktop/src-tauri/src/engine.rs` | `CallEngine` wrapper |
| `desktop/src-tauri/src/history.rs` | Persistent call history store with dedup |
| `crates/wzp-client/src/audio_io.rs` | CPAL capture + playback (baseline) |
| `crates/wzp-client/src/audio_vpio.rs` | macOS VoiceProcessingIO capture (AEC) |
| `crates/wzp-client/src/audio_wasapi.rs` | Windows WASAPI communications capture (AEC) |
| `vendor/audiopus_sys/opus/CMakeLists.txt` | Patched libopus for clang-cl SIMD |
| `scripts/Dockerfile.windows-builder` | Windows cross-compile Docker image |
| `scripts/build-windows-docker.sh` | Remote Docker build pipeline |
| `scripts/build-windows-cloud.sh` | Hetzner VPS alternative pipeline |
| `scripts/build-linux-docker.sh` | Linux x86_64 relay/CLI build pipeline |
666
vault/Architecture/Design.md
Normal file
@@ -0,0 +1,666 @@
---
tags: [architecture, wzp]
type: architecture
---

# WarzonePhone Design Document

> Custom encrypted VoIP protocol built in Rust. Designed for hostile network conditions: 5-70% packet loss, 100-500 kbps throughput, 300-800 ms RTT. Multi-platform: Desktop (Tauri), Android, CLI, Web.

## System Overview

WarzonePhone is a voice-over-IP system built from scratch in Rust, targeting reliable encrypted voice communication over severely degraded networks. The protocol uses adaptive codecs (Opus + Codec2), fountain-code FEC (RaptorQ), and end-to-end ChaCha20-Poly1305 encryption over a QUIC transport layer.

The system comprises three categories of components:

1. **Protocol crates** -- a Rust workspace of 7 library crates with a star dependency graph enabling parallel development
2. **Client applications** -- Desktop (Tauri), Android (Kotlin + JNI), CLI, and Web (browser bridge)
3. **Relay infrastructure** -- SFU relay daemons with federation, health probing, and Prometheus metrics

### Design Principles

- **User sovereignty** -- client-driven route selection, BIP39 identity backup, no central authority
- **End-to-end encryption** -- relays never see plaintext audio; SFU forwarding preserves E2E encryption
- **Adaptive resilience** -- automatic codec and FEC switching based on observed network quality
- **Parallel development** -- star dependency graph allows 5 agents/developers to work simultaneously with zero merge conflicts

## Architecture

### Crate Overview

The workspace contains 7 core crates plus integration binaries:

| Crate | Purpose | Key Dependencies |
|-------|---------|-----------------|
| `wzp-proto` | Protocol types, traits, wire format | serde, bytes |
| `wzp-codec` | Audio codecs (Opus, Codec2, RNNoise) | audiopus, codec2, nnnoiseless |
| `wzp-fec` | Forward error correction | raptorq |
| `wzp-crypto` | Cryptography and identity | ed25519-dalek, x25519-dalek, chacha20poly1305, bip39 |
| `wzp-transport` | QUIC transport layer | quinn, rustls |
| `wzp-relay` | Relay daemon (SFU, federation, metrics) | tokio, prometheus |
| `wzp-client` | Call engine and CLI | All above |

Additional integration targets: `wzp-web` (browser bridge via WebSocket), Android native library (JNI), Desktop (Tauri).
|
||||
|
||||
### Dependency Graph
|
||||
|
||||
```mermaid
|
||||
graph TD
|
||||
PROTO["wzp-proto<br/>(Types, Traits, Wire Format)"]
|
||||
|
||||
CODEC["wzp-codec<br/>(Opus + Codec2 + RNNoise)"]
|
||||
FEC["wzp-fec<br/>(RaptorQ FEC)"]
|
||||
CRYPTO["wzp-crypto<br/>(ChaCha20 + Identity)"]
|
||||
TRANSPORT["wzp-transport<br/>(QUIC / Quinn)"]
|
||||
|
||||
RELAY["wzp-relay<br/>(Relay Daemon)"]
|
||||
CLIENT["wzp-client<br/>(CLI + Call Engine)"]
|
||||
WEB["wzp-web<br/>(Browser Bridge)"]
|
||||
DESKTOP["Desktop<br/>(Tauri + CPAL)"]
|
||||
ANDROID["Android<br/>(Kotlin + JNI)"]
|
||||
|
||||
PROTO --> CODEC
|
||||
PROTO --> FEC
|
||||
PROTO --> CRYPTO
|
||||
PROTO --> TRANSPORT
|
||||
|
||||
CODEC --> CLIENT
|
||||
FEC --> CLIENT
|
||||
CRYPTO --> CLIENT
|
||||
TRANSPORT --> CLIENT
|
||||
|
||||
CODEC --> RELAY
|
||||
FEC --> RELAY
|
||||
CRYPTO --> RELAY
|
||||
TRANSPORT --> RELAY
|
||||
|
||||
CLIENT --> WEB
|
||||
CLIENT --> DESKTOP
|
||||
CLIENT --> ANDROID
|
||||
TRANSPORT --> WEB
|
||||
|
||||
FC["warzone-protocol<br/>(featherChat Identity)"] -.->|path dep| CRYPTO
|
||||
|
||||
style PROTO fill:#6c5ce7,color:#fff
|
||||
style RELAY fill:#ff9f43,color:#fff
|
||||
style CLIENT fill:#00b894,color:#fff
|
||||
style WEB fill:#0984e3,color:#fff
|
||||
style DESKTOP fill:#0984e3,color:#fff
|
||||
style ANDROID fill:#0984e3,color:#fff
|
||||
style FC fill:#fd79a8,color:#fff
|
||||
```
|
||||
|
||||
The star pattern ensures each leaf crate (`wzp-codec`, `wzp-fec`, `wzp-crypto`, `wzp-transport`) depends only on `wzp-proto` and never on each other. This enables:
|
||||
|
||||
- **Parallel development** -- 5 agents work on 5 crates with no merge conflicts
|
||||
- **Independent testing** -- each crate has self-contained tests
|
||||
- **Pluggability** -- any implementation can be swapped by implementing the same trait
|
||||
- **Fast compilation** -- changing one leaf only recompiles that leaf and integration crates
|
||||
|
||||

## Audio Pipeline

### Encode Pipeline (Mic to Network)

```mermaid
sequenceDiagram
    participant Mic as Microphone
    participant RNN as RNNoise Denoise
    participant VAD as Silence Detector
    participant ENC as Opus/Codec2 Encode
    participant FEC as RaptorQ FEC Encode
    participant INT as Interleaver
    participant HDR as Header Assembly
    participant CRYPT as ChaCha20-Poly1305
    participant QUIC as QUIC Datagram

    Mic->>RNN: PCM i16 x 960 (20ms @ 48kHz)
    RNN->>VAD: Denoised samples (2 x 480)
    alt Silence detected (>100ms)
        VAD->>ENC: ComfortNoise packet (every 200ms)
    else Active speech or hangover
        VAD->>ENC: Active audio frame
    end
    ENC->>FEC: Compressed frame (padded to 256 bytes)
    FEC->>FEC: Accumulate block (5-10 frames)
    FEC->>INT: Source + repair symbols
    INT->>HDR: Interleaved packets (depth=3)
    HDR->>CRYPT: MediaHeader (12B) or MiniHeader (4B)
    CRYPT->>QUIC: Header=AAD, Payload=encrypted
```

### Decode Pipeline (Network to Speaker)

```mermaid
sequenceDiagram
    participant QUIC as QUIC Datagram
    participant CRYPT as ChaCha20-Poly1305
    participant HDR as Header Parse
    participant DEINT as De-interleaver
    participant FEC as RaptorQ FEC Decode
    participant JIT as Jitter Buffer
    participant DEC as Opus/Codec2 Decode
    participant SPK as Speaker

    QUIC->>CRYPT: Encrypted packet
    CRYPT->>HDR: Decrypt (header=AAD)
    HDR->>DEINT: Parsed MediaHeader + payload
    DEINT->>FEC: Reordered symbols
    FEC->>FEC: Reconstruct from any K of K+R symbols
    FEC->>JIT: Recovered audio frames
    JIT->>JIT: Sequence-ordered BTreeMap
    JIT->>DEC: Pop when depth >= target
    DEC->>SPK: PCM i16 x 960
```
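
The jitter-buffer stage of the decode pipeline is a sequence-ordered `BTreeMap` that releases frames once its depth reaches the target. It can be sketched as below. This is an illustrative sketch, not the `wzp-proto` type. `JitterBuffer`, `push`, and `pop` are hypothetical names. The key is assumed to be an already-unwrapped `u64` sequence number; the wire carries a `u16`, which real code must extend first.

```rust
use std::collections::BTreeMap;

/// Hypothetical sequence-ordered jitter buffer (illustrative, not the wzp-proto type).
pub struct JitterBuffer {
    frames: BTreeMap<u64, Vec<u8>>, // unwrapped sequence number -> encoded frame
    target_depth: usize,            // frames to hold before playout
    max_depth: usize,               // hard cap; the oldest frames are evicted beyond it
    next_seq: Option<u64>,          // next sequence number owed to the decoder
}

impl JitterBuffer {
    pub fn new(target_depth: usize, max_depth: usize) -> Self {
        Self { frames: BTreeMap::new(), target_depth, max_depth, next_seq: None }
    }

    /// Insert a frame. Frames whose playout slot has already passed are dropped.
    pub fn push(&mut self, seq: u64, frame: Vec<u8>) {
        if matches!(self.next_seq, Some(next) if seq < next) {
            return; // arrived too late to be played
        }
        self.frames.insert(seq, frame);
        while self.frames.len() > self.max_depth {
            self.frames.pop_first(); // evict the oldest frame
        }
    }

    /// Once depth >= target, release the next frame in sequence order.
    /// `(seq, None)` marks a gap that the decoder should conceal.
    pub fn pop(&mut self) -> Option<(u64, Option<Vec<u8>>)> {
        if self.frames.len() < self.target_depth {
            return None; // still filling
        }
        let (&lowest, _) = self.frames.first_key_value()?;
        let seq = self.next_seq.unwrap_or(lowest);
        self.next_seq = Some(seq + 1);
        Some((seq, self.frames.remove(&seq)))
    }
}
```

Returning `(seq, None)` for a missing slot lets the caller run the codec's loss concealment (for example `decode_lost`) instead of silently skipping 20 ms of audio.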

## Codec System

WarzonePhone uses a dual-codec architecture to cover the full range of network conditions:

### Opus (Primary)

Opus is the primary codec for normal to degraded conditions. It operates at 48 kHz natively with built-in inband FEC and DTX (discontinuous transmission). The `audiopus` crate provides mature Rust bindings to libopus.

| Profile | Bitrate | Frame Duration | FEC Ratio | Total Bandwidth | Use Case |
|---------|---------|---------------|-----------|----------------|----------|
| Studio 64k | 64 kbps | 20ms | 10% | 70.4 kbps | LAN, excellent WiFi |
| Studio 48k | 48 kbps | 20ms | 10% | 52.8 kbps | Good WiFi, wired |
| Studio 32k | 32 kbps | 20ms | 10% | 35.2 kbps | WiFi, LTE |
| Good (24k) | 24 kbps | 20ms | 20% | 28.8 kbps | WiFi, LTE, decent links |
| Opus 16k | 16 kbps | 20ms | 20% | 19.2 kbps | 3G, moderate congestion |
| Degraded (6k) | 6 kbps | 40ms | 50% | 9.0 kbps | 3G, congested WiFi |

### Codec2 (Fallback)

Codec2 is a narrowband vocoder designed for HF radio links with extreme bandwidth constraints. It operates at 8 kHz, and the adaptive layer handles 48 kHz <-> 8 kHz resampling transparently. The pure-Rust `codec2` crate means no C dependencies.

| Profile | Bitrate | Frame Duration | FEC Ratio | Total Bandwidth | Use Case |
|---------|---------|---------------|-----------|----------------|----------|
| Codec2 3200 | 3.2 kbps | 20ms | 50% | 4.8 kbps | Poor conditions |
| Catastrophic (1200) | 1.2 kbps | 40ms | 100% | 2.4 kbps | Satellite, extreme loss |

### ComfortNoise

When the silence detector identifies no speech activity for over 100ms, the encoder switches to emitting a ComfortNoise packet every 200ms instead of encoding silence. This provides approximately 50% bandwidth savings in typical conversations.
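
The silence gating described above can be sketched as follows. The sketch assumes 20 ms frames and a per-frame speech/no-speech decision from the silence detector. The first 100 ms of silence are still encoded as a hangover. After that, one ComfortNoise packet goes out immediately and then one every 200 ms. `SilenceGate` and `FrameAction` are hypothetical names, and the exact handling of the hangover boundary is an assumption.

```rust
/// What the encoder should do with the current 20 ms frame (hypothetical).
#[derive(Debug, PartialEq)]
pub enum FrameAction {
    Encode,       // speech or hangover: encode normally
    ComfortNoise, // send a ComfortNoise packet instead of audio
    Skip,         // send nothing for this frame
}

const FRAME_MS: u32 = 20;
const HANGOVER_MS: u32 = 100; // silence up to this long is still encoded
const CN_INTERVAL_MS: u32 = 200;

pub struct SilenceGate {
    silent_ms: u32,   // length of the current silent run
    since_cn_ms: u32, // time since the last ComfortNoise packet
}

impl SilenceGate {
    pub fn new() -> Self {
        Self { silent_ms: 0, since_cn_ms: 0 }
    }

    pub fn next(&mut self, is_speech: bool) -> FrameAction {
        if is_speech {
            self.silent_ms = 0;
            return FrameAction::Encode;
        }
        self.silent_ms = self.silent_ms.saturating_add(FRAME_MS);
        if self.silent_ms <= HANGOVER_MS {
            // Arm the timer so the first frame after the hangover emits ComfortNoise.
            self.since_cn_ms = CN_INTERVAL_MS;
            return FrameAction::Encode;
        }
        self.since_cn_ms += FRAME_MS;
        if self.since_cn_ms >= CN_INTERVAL_MS {
            self.since_cn_ms = 0;
            FrameAction::ComfortNoise
        } else {
            FrameAction::Skip
        }
    }
}
```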

### Adaptive Switching

The `AdaptiveEncoder`/`AdaptiveDecoder` in `wzp-codec` hold both codec instances and switch between them based on the active `QualityProfile`. This avoids codec re-initialization latency during tier transitions. The `AdaptiveQualityController` in `wzp-proto` manages tier transitions with hysteresis:

- **Downgrade**: 3 consecutive bad reports (2 on cellular networks)
- **Upgrade**: 10 consecutive good reports (one tier at a time)
- **Network handoff**: WiFi-to-cellular switch triggers preemptive one-tier downgrade plus a temporary 10-second FEC boost (+20%)
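
The hysteresis rules above can be sketched as a small state machine. This is a hypothetical illustration, not the `AdaptiveQualityController` API. It assumes each report has already been judged good or bad, and it models the profiles as a ladder index where 0 is the best profile. The 10-second FEC boost is left out.

```rust
/// Hypothetical hysteresis state machine; `level` 0 is the best profile,
/// `max_level` the most robust one.
pub struct Hysteresis {
    level: usize,
    max_level: usize,
    bad_streak: u32,
    good_streak: u32,
    cellular: bool,
}

impl Hysteresis {
    pub fn new(max_level: usize, cellular: bool) -> Self {
        Self { level: 0, max_level, bad_streak: 0, good_streak: 0, cellular }
    }

    /// Feed one quality report (already judged good or bad); returns the current level.
    pub fn on_report(&mut self, good: bool) -> usize {
        if good {
            self.bad_streak = 0;
            self.good_streak = self.good_streak.saturating_add(1);
            if self.good_streak >= 10 && self.level > 0 {
                self.level -= 1; // upgrade exactly one step
                self.good_streak = 0;
            }
        } else {
            self.good_streak = 0;
            self.bad_streak = self.bad_streak.saturating_add(1);
            let needed = if self.cellular { 2 } else { 3 };
            if self.bad_streak >= needed && self.level < self.max_level {
                self.level += 1; // downgrade one step
                self.bad_streak = 0;
            }
        }
        self.level
    }

    /// WiFi -> cellular handoff: preemptive one-step downgrade and fresh counters.
    /// The temporary 10-second +20% FEC boost is not modelled here.
    pub fn on_handoff_to_cellular(&mut self) -> usize {
        self.cellular = true;
        self.level = (self.level + 1).min(self.max_level);
        self.bad_streak = 0;
        self.good_streak = 0;
        self.level
    }
}
```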

Quality tier classification thresholds:

| Tier | WiFi/Unknown | Cellular |
|------|-------------|----------|
| Good | loss < 10%, RTT < 400ms | loss < 8%, RTT < 300ms |
| Degraded | loss 10-40%, RTT 400-600ms | loss 8-25%, RTT 300-500ms |
| Catastrophic | loss > 40%, RTT > 600ms | loss > 25%, RTT > 500ms |
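
The thresholds can be applied to each report as sketched below. The table does not say how loss and RTT combine. This sketch therefore assumes a report is Good only when both values are under the Good limits, and Catastrophic when either one exceeds the Catastrophic limit. `Tier` and `classify` are illustrative names.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Tier {
    Good,
    Degraded,
    Catastrophic,
}

/// Classify one report using the thresholds in the table above (sketch).
pub fn classify(loss_pct: f32, rtt_ms: u32, cellular: bool) -> Tier {
    // (good loss limit, good RTT limit, catastrophic loss limit, catastrophic RTT limit)
    let (good_loss, good_rtt, cat_loss, cat_rtt): (f32, u32, f32, u32) = if cellular {
        (8.0, 300, 25.0, 500)
    } else {
        (10.0, 400, 40.0, 600)
    };
    if loss_pct > cat_loss || rtt_ms > cat_rtt {
        Tier::Catastrophic
    } else if loss_pct < good_loss && rtt_ms < good_rtt {
        Tier::Good
    } else {
        Tier::Degraded
    }
}
```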

## Forward Error Correction (FEC)

### Why RaptorQ Over Reed-Solomon

WarzonePhone uses RaptorQ (RFC 6330) fountain codes via the `raptorq` crate:

1. **Rateless** -- generate arbitrary repair symbols on the fly; if conditions worsen mid-block, generate additional repair without re-encoding
2. **Efficient decoding** -- any K received symbols (source or repair, in any mix) decode with high probability; one or two extra symbols (K + 1 or K + 2) make failure very unlikely
3. **Lower complexity** -- O(K) encoding/decoding time vs O(K^2) for Reed-Solomon
4. **Variable block sizes** -- 1-56,403 source symbols per block (WZP uses 5-10)

### FEC Block Structure

Each FEC block consists of 5-10 audio frames padded to 256-byte symbols with a 2-byte LE length prefix:

```
[len:u16 LE][audio_frame][zero_padding_to_256_bytes]
```
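
The sketch below builds and unpacks one 256-byte source symbol in the layout above. `to_symbol` and `from_symbol` are hypothetical helpers. After the 2-byte length prefix, the largest frame that fits is 254 bytes.

```rust
pub const SYMBOL_SIZE: usize = 256;

/// Wrap one encoded audio frame into a fixed-size FEC source symbol:
/// [len:u16 LE][audio_frame][zero padding up to 256 bytes].
pub fn to_symbol(frame: &[u8]) -> Option<[u8; SYMBOL_SIZE]> {
    if frame.len() > SYMBOL_SIZE - 2 {
        return None; // frame does not fit after the 2-byte prefix
    }
    let mut sym = [0u8; SYMBOL_SIZE]; // zero-initialised, so padding is implicit
    sym[..2].copy_from_slice(&(frame.len() as u16).to_le_bytes());
    sym[2..2 + frame.len()].copy_from_slice(frame);
    Some(sym)
}

/// Recover the original frame from a (possibly FEC-reconstructed) symbol.
/// Returns None if the length prefix is corrupt.
pub fn from_symbol(sym: &[u8; SYMBOL_SIZE]) -> Option<&[u8]> {
    let len = u16::from_le_bytes([sym[0], sym[1]]) as usize;
    sym.get(2..2 + len)
}
```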

### Loss Survival by FEC Ratio

With 5 source frames per block:

| FEC Ratio | Repair Symbols | Survives Loss | Profile |
|-----------|---------------|---------------|---------|
| 10% | 1 | 1 of 6 (16.7%) | Studio |
| 20% | 1 | 1 of 6 (16.7%) | Good |
| 50% | 3 | 3 of 8 (37.5%) | Degraded |
| 100% | 5 | 5 of 10 (50.0%) | Catastrophic |
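
The repair counts in the table match a rule that rounds `K x ratio` up to a whole symbol; this sketch assumes that rule is the one in use. The survivable fraction is then R / (K + R). RaptorQ needs about K of the K + R symbols, so decoding at exactly that loss level succeeds with high probability, not with certainty.

```rust
/// Repair symbols for `k` source symbols at `fec_pct` percent, rounded up
/// (assumed rule; it reproduces the table above).
pub fn repair_symbols(k: u32, fec_pct: u32) -> u32 {
    (k * fec_pct + 99) / 100 // integer ceil(k * fec_pct / 100)
}

/// Largest fraction of a block's packets that can be lost while K symbols remain.
pub fn survivable_loss(k: u32, fec_pct: u32) -> f64 {
    let r = repair_symbols(k, fec_pct);
    r as f64 / (k + r) as f64
}
```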

### Interleaving

Burst loss protection via depth-3 interleaving: packets from 3 consecutive FEC blocks are interleaved before transmission. A burst of 3 consecutive lost packets affects 3 different blocks (1 loss each) rather than destroying 1 block entirely.

```mermaid
graph LR
    subgraph "FEC Encoder"
        F1[Frame 1] --> BLK[Source Block<br/>5-10 frames]
        F2[Frame 2] --> BLK
        F3[Frame 3] --> BLK
        F4[Frame 4] --> BLK
        F5[Frame 5] --> BLK
        BLK --> SRC[Source Symbols]
        BLK --> REP[Repair Symbols<br/>ratio-dependent]
        SRC --> INT[Interleaver<br/>depth=3]
        REP --> INT
    end

    subgraph "Network"
        INT --> LOSS{Packet Loss}
        LOSS -->|some lost| RCV[Received Symbols]
    end

    subgraph "FEC Decoder"
        RCV --> DEINT[De-interleaver]
        DEINT --> RAPTORQ[RaptorQ Decode<br/>Any K of K+R]
        RAPTORQ --> OUT[Original Frames]
    end

    style LOSS fill:#e17055,color:#fff
    style RAPTORQ fill:#00b894,color:#fff
```
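
A depth-3 interleaver is a round-robin across the packet lists of three blocks. This generic sketch uses a hypothetical `interleave` helper, not the `wzp-fec` implementation. It also handles blocks of unequal length, which happens when repair counts differ.

```rust
/// Round-robin interleave packets from consecutive FEC blocks so that adjacent
/// packets on the wire belong to different blocks. Depth = blocks.len().
pub fn interleave<T: Clone>(blocks: &[Vec<T>]) -> Vec<T> {
    let longest = blocks.iter().map(Vec::len).max().unwrap_or(0);
    let mut out = Vec::with_capacity(blocks.iter().map(Vec::len).sum());
    for i in 0..longest {
        for block in blocks {
            if let Some(packet) = block.get(i) {
                out.push(packet.clone());
            }
        }
    }
    out
}
```

With three equal-sized blocks, any three consecutive packets on the wire come from three different blocks. A 3-packet burst therefore costs each block at most one symbol.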

## Transport Layer

### Why QUIC Over Raw UDP

WarzonePhone uses QUIC (via the `quinn` crate) rather than raw UDP for several reasons:

| Feature | Benefit |
|---------|---------|
| DATAGRAM frames (RFC 9221) | Unreliable delivery without head-of-line blocking -- behaves like UDP for media |
| Reliable streams | Multiplexed signaling (CallOffer, Hangup, Rekey) without a separate TCP connection |
| Congestion control | Prevents overwhelming degraded links, important when chaining relays |
| Connection migration | Connections survive IP address changes (WiFi to cellular handoff) |
| TLS 1.3 built-in | Transport-level encryption protects headers and signaling |
| NAT keepalive | 5-second interval maintains NAT bindings without application-level pings |
| Firewall traversal | Runs on UDP port 443 with `wzp` ALPN identifier |

The tradeoff is approximately 20-40 bytes of additional per-packet overhead compared to raw UDP.

### Wire Formats

#### MediaHeader (12 bytes)

```
Byte 0: [V:1][T:1][CodecID:4][Q:1][FecRatioHi:1]
Byte 1: [FecRatioLo:6][unused:2]
Bytes 2-3: sequence (u16 BE)
Bytes 4-7: timestamp_ms (u32 BE)
Byte 8: fec_block_id (u8)
Byte 9: fec_symbol_idx (u8)
Byte 10: reserved
Byte 11: csrc_count

V = version (0), T = is_repair, CodecID = codec, Q = quality_report appended,
FecRatio = 7-bit FEC ratio (1 high bit in byte 0 + 6 low bits in byte 1)
```
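
The layout above can be packed and parsed as sketched here. The diagram leaves two details open, so the sketch assumes them. First, bit fields are packed most-significant-bit first, in the order shown. Second, `fec_ratio` is a raw 7-bit value: its high bit sits in byte 0 and its low six bits in the top of byte 1. Struct and field names are illustrative.

```rust
pub const MEDIA_HEADER_LEN: usize = 12;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MediaHeader {
    pub version: u8,       // V, 1 bit (currently 0)
    pub is_repair: bool,   // T
    pub codec_id: u8,      // 4 bits
    pub has_quality: bool, // Q: a QualityReport trailer follows
    pub fec_ratio: u8,     // 7 bits, split across bytes 0 and 1
    pub sequence: u16,
    pub timestamp_ms: u32,
    pub fec_block_id: u8,
    pub fec_symbol_idx: u8,
    pub csrc_count: u8,
}

impl MediaHeader {
    pub fn encode(&self) -> [u8; MEDIA_HEADER_LEN] {
        let mut b = [0u8; MEDIA_HEADER_LEN];
        b[0] = ((self.version & 1) << 7)
            | ((self.is_repair as u8) << 6)
            | ((self.codec_id & 0x0F) << 2)
            | ((self.has_quality as u8) << 1)
            | ((self.fec_ratio >> 6) & 1);
        b[1] = (self.fec_ratio & 0x3F) << 2; // low 2 bits unused
        b[2..4].copy_from_slice(&self.sequence.to_be_bytes());
        b[4..8].copy_from_slice(&self.timestamp_ms.to_be_bytes());
        b[8] = self.fec_block_id;
        b[9] = self.fec_symbol_idx;
        b[10] = 0; // reserved
        b[11] = self.csrc_count;
        b
    }

    pub fn decode(b: &[u8]) -> Option<Self> {
        if b.len() < MEDIA_HEADER_LEN {
            return None;
        }
        Some(Self {
            version: b[0] >> 7,
            is_repair: (b[0] & 0x40) != 0,
            codec_id: (b[0] >> 2) & 0x0F,
            has_quality: (b[0] & 0x02) != 0,
            fec_ratio: ((b[0] & 1) << 6) | (b[1] >> 2),
            sequence: u16::from_be_bytes([b[2], b[3]]),
            timestamp_ms: u32::from_be_bytes([b[4], b[5], b[6], b[7]]),
            fec_block_id: b[8],
            fec_symbol_idx: b[9],
            csrc_count: b[11],
        })
    }
}
```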

#### MiniHeader (4 bytes, compressed)

```
Bytes 0-1: timestamp_delta_ms (u16 BE)
Bytes 2-3: payload_len (u16 BE)

Preceded by FRAME_TYPE_MINI (0x01). Full header every 50 frames (~1s).
Saves 8 bytes/packet (67% header reduction).
```
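
The sketch below covers the compressed path. It assumes the `FRAME_TYPE_MINI` byte is written immediately before the 4-byte MiniHeader, so a mini frame takes 5 bytes on the wire. The helper names are hypothetical. The text does not say what `timestamp_delta_ms` is measured against, so the sketch treats it as an opaque value.

```rust
pub const FRAME_TYPE_MINI: u8 = 0x01;
/// A full MediaHeader every 50 frames (~1 s at 20 ms per frame).
pub const FULL_HEADER_INTERVAL: u64 = 50;

/// [FRAME_TYPE_MINI][timestamp_delta_ms: u16 BE][payload_len: u16 BE]
pub fn encode_mini(ts_delta_ms: u16, payload_len: u16) -> [u8; 5] {
    let [d0, d1] = ts_delta_ms.to_be_bytes();
    let [l0, l1] = payload_len.to_be_bytes();
    [FRAME_TYPE_MINI, d0, d1, l0, l1]
}

/// Returns (timestamp_delta_ms, payload_len), or None if this is not a mini frame.
pub fn decode_mini(buf: &[u8]) -> Option<(u16, u16)> {
    match buf {
        [FRAME_TYPE_MINI, d0, d1, l0, l1, ..] => Some((
            u16::from_be_bytes([*d0, *d1]),
            u16::from_be_bytes([*l0, *l1]),
        )),
        _ => None,
    }
}

/// Sender side: frame `n` (0-based) carries a full header every FULL_HEADER_INTERVAL frames.
pub fn use_full_header(n: u64) -> bool {
    n % FULL_HEADER_INTERVAL == 0
}
```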

#### TrunkFrame (batched datagrams)

```
[count:u16]
[session_id:2][len:u16][payload:len] x count

Packs multiple session packets into one QUIC datagram.
Max 10 entries or 1200 bytes, flushed every 5ms.
```
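
The following sketches a trunk batcher that enforces the 10-entry and 1200-byte limits. It makes three assumptions. Integers are big-endian, as in `MediaHeader`. The 1200-byte limit covers the whole datagram, including the count. The 5 ms flush timer is handled by the caller. `TrunkBatcher` and `parse_trunk` are hypothetical names.

```rust
pub const MAX_ENTRIES: usize = 10;
pub const MAX_DATAGRAM: usize = 1200;

pub struct TrunkBatcher {
    entries: Vec<([u8; 2], Vec<u8>)>,
    size: usize, // encoded size so far, including the 2-byte count
}

impl TrunkBatcher {
    pub fn new() -> Self {
        Self { entries: Vec::new(), size: 2 }
    }

    /// Queue a packet; returns false if it would break a limit (flush first).
    pub fn push(&mut self, session_id: [u8; 2], payload: &[u8]) -> bool {
        let entry_len = 2 + 2 + payload.len();
        if self.entries.len() >= MAX_ENTRIES || self.size + entry_len > MAX_DATAGRAM {
            return false;
        }
        self.entries.push((session_id, payload.to_vec()));
        self.size += entry_len;
        true
    }

    /// Serialize the queued entries into one datagram and reset the batcher.
    pub fn flush(&mut self) -> Vec<u8> {
        let mut out = Vec::with_capacity(self.size);
        out.extend_from_slice(&(self.entries.len() as u16).to_be_bytes());
        for (sid, payload) in self.entries.drain(..) {
            out.extend_from_slice(&sid);
            out.extend_from_slice(&(payload.len() as u16).to_be_bytes());
            out.extend_from_slice(&payload);
        }
        self.size = 2;
        out
    }
}

/// Split a received TrunkFrame back into (session_id, payload) pairs.
pub fn parse_trunk(buf: &[u8]) -> Option<Vec<([u8; 2], &[u8])>> {
    let count = u16::from_be_bytes(buf.get(0..2)?.try_into().ok()?) as usize;
    let mut pos = 2;
    let mut out = Vec::with_capacity(count);
    for _ in 0..count {
        let sid: [u8; 2] = buf.get(pos..pos + 2)?.try_into().ok()?;
        let len = u16::from_be_bytes(buf.get(pos + 2..pos + 4)?.try_into().ok()?) as usize;
        out.push((sid, buf.get(pos + 4..pos + 4 + len)?));
        pos += 4 + len;
    }
    Some(out)
}
```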

#### QualityReport (4 bytes, optional trailer)

```
Byte 0: loss_pct (0-255 maps to 0-100%)
Byte 1: rtt_4ms (0-255 maps to 0-1020ms)
Byte 2: jitter_ms
Byte 3: bitrate_cap_kbps
```
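
The scalings above can be applied as follows. The sketch makes two rounding choices as assumptions: loss is rounded to the nearest step, and RTT is truncated into 4 ms buckets, saturating at 1020 ms. `QualityReport` here is an illustrative struct, not necessarily the `wzp-proto` type.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct QualityReport {
    pub loss_pct: f32, // 0.0..=100.0
    pub rtt_ms: u32,   // carried in 4 ms units, saturating at 1020 ms
    pub jitter_ms: u8,
    pub bitrate_cap_kbps: u8,
}

impl QualityReport {
    pub fn encode(&self) -> [u8; 4] {
        let loss = (self.loss_pct.clamp(0.0, 100.0) / 100.0 * 255.0).round() as u8;
        let rtt = (self.rtt_ms / 4).min(255) as u8;
        [loss, rtt, self.jitter_ms, self.bitrate_cap_kbps]
    }

    pub fn decode(b: [u8; 4]) -> Self {
        Self {
            loss_pct: b[0] as f32 * 100.0 / 255.0,
            rtt_ms: b[1] as u32 * 4,
            jitter_ms: b[2],
            bitrate_cap_kbps: b[3],
        }
    }
}
```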

### Bandwidth Summary

| Profile | Audio | FEC Overhead | Total | Silence Savings |
|---------|-------|-------------|-------|----------------|
| Studio 64k | 64 kbps | 10% = 6.4 kbps | **70.4 kbps** | ~50% with DTX |
| Studio 48k | 48 kbps | 10% = 4.8 kbps | **52.8 kbps** | ~50% with DTX |
| Studio 32k | 32 kbps | 10% = 3.2 kbps | **35.2 kbps** | ~50% with DTX |
| Good (24k) | 24 kbps | 20% = 4.8 kbps | **28.8 kbps** | ~50% with DTX |
| Opus 16k | 16 kbps | 20% = 3.2 kbps | **19.2 kbps** | ~50% with DTX |
| Degraded (6k) | 6 kbps | 50% = 3.0 kbps | **9.0 kbps** | ~50% with DTX |
| Codec2 3200 | 3.2 kbps | 50% = 1.6 kbps | **4.8 kbps** | ~50% with DTX |
| Catastrophic (1.2k) | 1.2 kbps | 100% = 1.2 kbps | **2.4 kbps** | ~50% with DTX |

Additional savings: MiniHeaders save 8 bytes/packet (67% header reduction). Trunking shares QUIC overhead across multiplexed sessions.

## Security

### Identity Model

Every user has a persistent identity derived from a 32-byte seed:

```mermaid
graph TD
    SEED["32-byte Seed<br/>(BIP39 Mnemonic: 24 words)"] --> HKDF1["HKDF<br/>info='warzone-ed25519'"]
    SEED --> HKDF2["HKDF<br/>info='warzone-x25519'"]

    HKDF1 --> ED["Ed25519 SigningKey<br/>(Digital Signatures)"]
    HKDF2 --> X25519["X25519 StaticSecret<br/>(Key Agreement)"]

    ED --> VKEY["Ed25519 VerifyingKey<br/>(Public)"]
    X25519 --> XPUB["X25519 PublicKey<br/>(Public)"]

    VKEY --> FP["Fingerprint<br/>SHA-256(pubkey), truncated 16 bytes<br/>xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx"]

    style SEED fill:#6c5ce7,color:#fff
    style FP fill:#fd79a8,color:#fff
    style ED fill:#ee5a24,color:#fff
    style X25519 fill:#00b894,color:#fff
```
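
The fingerprint's display form can be produced as sketched below. The Rust standard library has no SHA-256, so the function takes the already-truncated 16-byte digest as input. Lowercase hex is an assumption.

```rust
/// Format the first 16 bytes of SHA-256(pubkey) as eight colon-separated groups
/// of four hex digits (xxxx:xxxx:...:xxxx).
pub fn format_fingerprint(digest16: &[u8; 16]) -> String {
    digest16
        .chunks(2)
        .map(|pair| format!("{:02x}{:02x}", pair[0], pair[1]))
        .collect::<Vec<_>>()
        .join(":")
}
```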

**BIP39 Mnemonic Backup**: The 32-byte seed can be encoded as a 24-word BIP39 mnemonic for human-readable backup. The same seed produces the same identity on any platform.

**featherChat Compatibility**: The identity derivation is compatible with the Warzone messenger (featherChat), allowing a shared identity across messaging and calling.

### Cryptographic Handshake

```mermaid
sequenceDiagram
    participant C as Caller
    participant R as Relay / Callee

    Note over C: Derive identity from seed<br/>Ed25519 + X25519 via HKDF

    C->>C: Generate ephemeral X25519 keypair
    C->>C: Sign(ephemeral_pub || "call-offer")
    C->>R: CallOffer { identity_pub, ephemeral_pub, signature, profiles }

    R->>R: Verify Ed25519 signature
    R->>R: Generate ephemeral X25519 keypair
    R->>R: shared_secret = DH(eph_b, eph_a)
    R->>R: session_key = HKDF(shared_secret, "warzone-session-key")
    R->>R: Sign(ephemeral_pub || "call-answer")
    R->>C: CallAnswer { identity_pub, ephemeral_pub, signature, profile }

    C->>C: Verify signature
    C->>C: shared_secret = DH(eph_a, eph_b)
    C->>C: session_key = HKDF(shared_secret, "warzone-session-key")

    Note over C,R: Both have identical ChaCha20-Poly1305 session key
    C->>R: Encrypted media (QUIC datagrams)
    R->>C: Encrypted media (QUIC datagrams)

    Note over C,R: Rekey every 65,536 packets<br/>New ephemeral DH + HKDF mix
```

### Encryption Details

| Component | Algorithm | Purpose |
|-----------|-----------|---------|
| Identity signing | Ed25519 | Authenticate handshake messages |
| Key agreement | X25519 (ephemeral) | Derive shared secret |
| Key derivation | HKDF-SHA256 | Derive session key from shared secret |
| Media encryption | ChaCha20-Poly1305 | Encrypt audio payloads (16-byte tag) |
| Nonce construction | Deterministic from sequence number | No nonce reuse, no state sync needed |
| Anti-replay | Sliding window (64-packet) | Reject duplicate/old packets |
| Forward secrecy | Rekey every 65,536 packets | New ephemeral DH + HKDF mix |
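
A 64-packet sliding anti-replay window is commonly built as a bitmap anchored at the highest sequence number seen, as in IPsec. The sketch below follows that common technique. It is not necessarily how `wzp-crypto` implements the window. It also assumes an already-unwrapped `u64` sequence number.

```rust
/// Bitmap-based 64-packet anti-replay window (sketch).
pub struct ReplayWindow {
    highest: u64, // highest sequence number accepted so far
    bitmap: u64,  // bit i set => packet (highest - i) already seen
    started: bool,
}

impl ReplayWindow {
    pub fn new() -> Self {
        Self { highest: 0, bitmap: 0, started: false }
    }

    /// Returns true if `seq` is fresh (and records it), false for duplicates or
    /// packets older than the window.
    pub fn check_and_update(&mut self, seq: u64) -> bool {
        if !self.started {
            self.started = true;
            self.highest = seq;
            self.bitmap = 1;
            return true;
        }
        if seq > self.highest {
            // Slide the window forward; bits that fall off the end are forgotten.
            let shift = seq - self.highest;
            self.bitmap = if shift >= 64 { 0 } else { self.bitmap << shift };
            self.bitmap |= 1;
            self.highest = seq;
            return true;
        }
        let offset = self.highest - seq;
        if offset >= 64 {
            return false; // too old to judge: reject
        }
        let mask = 1u64 << offset;
        if self.bitmap & mask != 0 {
            return false; // duplicate
        }
        self.bitmap |= mask;
        true
    }
}
```

In the usual pattern, the window is updated only after the AEAD tag verifies. This keeps forged packets from advancing it.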

**Why ChaCha20-Poly1305 over AES-GCM**:
- Faster on hardware without AES-NI (ARM phones, Raspberry Pi relays)
- Inherently constant-time (add-rotate-XOR only)
- Compatible with Warzone messenger (featherChat)
- Same 16-byte authentication tag overhead as AES-GCM

**AEAD with AAD**: The MediaHeader is used as Associated Authenticated Data. The header is authenticated but not encrypted, allowing relays to read routing information (block ID, sequence number) without decrypting the payload.

### Trust on First Use (TOFU)

Clients remember the relay's TLS certificate fingerprint after first connection. If the fingerprint changes on a subsequent connection, the desktop client shows a "Server Key Changed" warning dialog. The relay derives its TLS certificate deterministically from its persisted identity seed, so the fingerprint is stable across restarts.
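
Here is a minimal sketch of the TOFU decision, with persistence and UI left out. `TofuStore`, `TofuResult`, and keying pins by relay address are hypothetical. On a mismatch the sketch keeps the old pin and hands the decision back to the caller, which is the point where the warning dialog would appear.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum TofuResult {
    FirstUse,                     // nothing pinned yet; fingerprint is now pinned
    Match,                        // matches the pinned fingerprint
    Changed { expected: String }, // mismatch: caller should warn the user
}

pub struct TofuStore {
    pinned: HashMap<String, String>, // relay address -> pinned fingerprint
}

impl TofuStore {
    pub fn new() -> Self {
        Self { pinned: HashMap::new() }
    }

    pub fn check(&mut self, relay: &str, fingerprint: &str) -> TofuResult {
        match self.pinned.get(relay).cloned() {
            None => {
                self.pinned.insert(relay.to_string(), fingerprint.to_string());
                TofuResult::FirstUse
            }
            Some(pin) if pin == fingerprint => TofuResult::Match,
            Some(pin) => TofuResult::Changed { expected: pin },
        }
    }
}
```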

## Relay Architecture

### Room Mode (Default SFU)

In room mode, the relay acts as a Selective Forwarding Unit. Clients join named rooms via the TLS SNI (Server Name Indication) field of the QUIC handshake. The relay forwards each participant's encrypted packets to all other participants in the room without decoding or re-encoding.

```mermaid
graph TB
    subgraph "Room Mode (SFU)"
        C1[Client 1] -->|"QUIC SNI=room-hash"| RM[Room Manager]
        C2[Client 2] -->|"QUIC SNI=room-hash"| RM
        C3[Client 3] -->|"QUIC SNI=room-hash"| RM
        RM --> R1[Room 'podcast']
        R1 -->|fan-out| C1
        R1 -->|fan-out| C2
        R1 -->|fan-out| C3
    end

    style RM fill:#ff9f43,color:#fff
    style R1 fill:#fdcb6e
```

**SFU vs MCU trade-off**: SFU was chosen because it preserves end-to-end encryption (the relay never sees plaintext audio). An MCU would need to decode, mix, and re-encode, breaking E2E encryption. The trade-off is bandwidth: each participant receives N-1 separate streams instead of one mixed stream, so relay egress grows as N(N-1) streams for N participants.
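
The fan-out step can be sketched as a loop over the room's other members, passing the packet through untouched. Participant IDs as `u32` and the `send` callback are hypothetical stand-ins for the relay's session handles. `relay_egress_streams` evaluates N(N-1), the number of forwarded streams when all N participants are sending.

```rust
/// Forward one opaque (still-encrypted) packet to every other room member.
/// Returns the number of copies sent.
pub fn fan_out<F: FnMut(u32, &[u8])>(room: &[u32], sender: u32, packet: &[u8], mut send: F) -> usize {
    let mut sent = 0;
    for &peer in room.iter().filter(|&&p| p != sender) {
        send(peer, packet); // no decode, no re-encode: the payload stays ciphertext
        sent += 1;
    }
    sent
}

/// Total forwarded streams at the relay when all `n` participants send.
pub fn relay_egress_streams(n: u64) -> u64 {
    n * n.saturating_sub(1)
}
```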

### Forward Mode

With `--remote`, the relay forwards all traffic to a remote relay. Used for chaining relays across lossy or censored links:

```
Client --> Relay A (--remote B) --> Relay B --> Destination Client
```

The relay pipeline in forward mode: FEC decode, jitter buffer, then FEC re-encode for the next hop.

## Federation

### Overview

Two or more relays form a federation mesh. Each relay is an independent SFU. When configured to trust each other, they bridge **global rooms** -- participants on relay A in a global room hear participants on relay B in the same room.

### Configuration

Federation uses three TOML configuration sections:

- `[[peers]]` -- outbound connections to peer relays (url + TLS fingerprint)
- `[[trusted]]` -- inbound connections accepted from relays (TLS fingerprint only)
- `[[global_rooms]]` -- room names to bridge across all federated peers

### Federation Topology

```mermaid
graph TB
    subgraph "Relay A (EU)"
        A_RM[Room Manager]
        A_FM[Federation Manager]
        A1[Alice - local]
        A2[Bob - local]
        A_RM --> A_FM
    end

    subgraph "Relay B (US)"
        B_RM[Room Manager]
        B_FM[Federation Manager]
        B1[Charlie - local]
        B_RM --> B_FM
    end

    A_FM <-->|"QUIC SNI='_federation'<br/>GlobalRoomActive/Inactive<br/>Media forwarding"| B_FM

    A1 -->|media| A_RM
    A2 -->|media| A_RM
    B1 -->|media| B_RM

    A_RM -->|"federated fan-out"| A1
    A_RM -->|"federated fan-out"| A2
    B_RM -->|"federated fan-out"| B1

    style A_FM fill:#6c5ce7,color:#fff
    style B_FM fill:#6c5ce7,color:#fff
    style A_RM fill:#ff9f43,color:#fff
    style B_RM fill:#ff9f43,color:#fff
```

### Protocol

1. On startup, each relay connects to all configured `[[peers]]` via QUIC with SNI `"_federation"`
2. After QUIC handshake, sends `FederationHello { tls_fingerprint }` for identity verification
3. Peer verifies the fingerprint against its `[[trusted]]` or `[[peers]]` list
4. When a local participant joins a global room, sends `GlobalRoomActive { room }` to all peers
5. When the last local participant leaves, sends `GlobalRoomInactive { room }`
6. Media is forwarded as `[room_hash:8][original_media_packet]` -- the relay does not decrypt
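
Steps 4 to 6 can be sketched as two pieces. The first is a per-room counter that reports only the first-join and last-leave transitions. The second is the `[room_hash:8][original_media_packet]` framing. How `room_hash` is computed is not described here, so the sketch takes it as an input. All names are hypothetical.

```rust
use std::collections::HashMap;

/// Control message a relay would send to all peers (sketch).
#[derive(Debug, PartialEq)]
pub enum FedEvent {
    GlobalRoomActive(String),
    GlobalRoomInactive(String),
}

/// Counts local participants per global room; only 0 -> 1 and 1 -> 0 emit events.
#[derive(Default)]
pub struct GlobalRooms {
    local_counts: HashMap<String, usize>,
}

impl GlobalRooms {
    pub fn join(&mut self, room: &str) -> Option<FedEvent> {
        let n = self.local_counts.entry(room.to_string()).or_insert(0);
        *n += 1;
        (*n == 1).then(|| FedEvent::GlobalRoomActive(room.to_string()))
    }

    pub fn leave(&mut self, room: &str) -> Option<FedEvent> {
        let n = self.local_counts.get_mut(room)?; // unknown room: nothing to do
        *n -= 1;
        if *n == 0 {
            self.local_counts.remove(room);
            return Some(FedEvent::GlobalRoomInactive(room.to_string()));
        }
        None
    }
}

/// Prefix an (encrypted) media packet with the 8-byte room hash for a peer relay.
pub fn frame_federated(room_hash: [u8; 8], packet: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(8 + packet.len());
    out.extend_from_slice(&room_hash);
    out.extend_from_slice(packet);
    out
}

/// Split a federated datagram back into (room_hash, original packet).
pub fn unframe_federated(buf: &[u8]) -> Option<([u8; 8], &[u8])> {
    if buf.len() < 8 {
        return None;
    }
    let (hash, rest) = buf.split_at(8);
    Some((hash.try_into().ok()?, rest))
}
```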

### What Relays Do NOT Do

- **No transcoding** -- media passes through as-is
- **No re-encryption** -- packets are already encrypted E2E
- **No central coordinator** -- each relay independently connects to configured peers
- **No automatic peer discovery** -- peers must be explicitly configured

### Failure Handling

- If a peer goes down, local rooms continue working; federated participants disappear from presence
- Reconnection: retries start at a 30-second interval and back off exponentially up to a 5-minute cap
- If a peer restarts with a different identity, the fingerprint check fails with a clear log message
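
The reconnect schedule is sketched below. It assumes the delay doubles from the 30-second base after each consecutive failure and is capped at 5 minutes. The doubling factor is an assumption, since the text only says "exponential".

```rust
use std::time::Duration;

/// Delay before reconnect attempt `attempt` (0-based, consecutive failures).
pub fn reconnect_delay(attempt: u32) -> Duration {
    const BASE_SECS: u64 = 30;
    const MAX_SECS: u64 = 300; // 5-minute cap
    // Limit the shift so the multiplication cannot overflow for large attempt counts.
    let secs = BASE_SECS.saturating_mul(1u64 << attempt.min(16));
    Duration::from_secs(secs.min(MAX_SECS))
}
```

With these assumptions the schedule is 30 s, 60 s, 120 s, 240 s, and then 300 s for every later attempt.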

## Jitter Buffer

The jitter buffer balances latency vs quality:

| Setting | Client | Relay |
|---------|--------|-------|
| Target depth | 10 packets (200ms) | 50 packets (1s) |
| Minimum before playout | 3 packets (60ms) | 25 packets (500ms) |
| Maximum cap | 250 packets (5s) | 250 packets (5s) |

The relay uses a deeper buffer to absorb jitter from lossy inter-relay links. The client uses a shallower buffer for lower latency.

The adaptive playout delay tracks jitter with an exponential moving average (EMA) and sets the target depth in packets, where 20 ms is one frame:

```
target_delay = ceil(jitter_ema / 20ms) + 2
```
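
The formula can be sketched as below. The EMA gain of 1/16 is borrowed from the RFC 3550 interarrival-jitter estimator and is only an assumption. The choice of jitter sample is also an assumption. The clamp applies the minimum and maximum depths from the table.

```rust
const FRAME_MS: f64 = 20.0;
const ALPHA: f64 = 1.0 / 16.0; // assumed EMA gain (RFC 3550 uses 1/16)

pub struct PlayoutTarget {
    jitter_ema_ms: f64,
    min_pkts: usize,
    max_pkts: usize,
}

impl PlayoutTarget {
    pub fn new(min_pkts: usize, max_pkts: usize) -> Self {
        Self { jitter_ema_ms: 0.0, min_pkts, max_pkts }
    }

    /// Feed one jitter sample in ms and return the new target depth in packets.
    pub fn update(&mut self, jitter_sample_ms: f64) -> usize {
        self.jitter_ema_ms += ALPHA * (jitter_sample_ms - self.jitter_ema_ms);
        // target_delay = ceil(jitter_ema / 20 ms) + 2
        let target = (self.jitter_ema_ms / FRAME_MS).ceil() as usize + 2;
        target.clamp(self.min_pkts, self.max_pkts)
    }
}
```

With the client settings (minimum 3, maximum 250), steady jitter of about 50 ms settles at ceil(2.5) + 2 = 5 packets.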

**Known limitation**: The current jitter buffer does not use timestamp-based playout scheduling. It relies on sequence-number ordering only, which can lead to drift during long calls.

## Signal Messages

Signal messages are sent over reliable QUIC streams as length-prefixed JSON:

```
[4-byte length prefix][serde_json payload]
```
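
The framing can be sketched with `std::io`. The sketch makes three assumptions. The 4-byte prefix is a big-endian `u32`. The JSON arrives already serialized (the real code uses `serde_json`). `MAX_SIGNAL_LEN` is a hypothetical sanity limit that stops a peer from forcing a huge allocation.

```rust
use std::io::{self, Read, Write};

const MAX_SIGNAL_LEN: u32 = 1 << 20; // hypothetical 1 MiB limit

/// Write one signal message: [4-byte length prefix][JSON bytes].
pub fn write_signal<W: Write>(w: &mut W, json: &[u8]) -> io::Result<()> {
    let len = u32::try_from(json.len())
        .ok()
        .filter(|&l| l <= MAX_SIGNAL_LEN)
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "signal message too large"))?;
    w.write_all(&len.to_be_bytes())?;
    w.write_all(json)
}

/// Read one signal message body (the JSON bytes) from a reliable stream.
pub fn read_signal<R: Read>(r: &mut R) -> io::Result<Vec<u8>> {
    let mut len_buf = [0u8; 4];
    r.read_exact(&mut len_buf)?;
    let len = u32::from_be_bytes(len_buf);
    if len > MAX_SIGNAL_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "signal message too large"));
    }
    let mut body = vec![0u8; len as usize];
    r.read_exact(&mut body)?;
    Ok(body)
}
```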

| Message | Purpose |
|---------|---------|
| `CallOffer` | Identity, ephemeral key, signature, supported profiles |
| `CallAnswer` | Identity, ephemeral key, signature, chosen profile |
| `AuthToken` | featherChat bearer token for relay authentication |
| `Hangup` | Reason: Normal, Busy, Declined, Timeout, Error |
| `Hold` / `Unhold` | Call hold state |
| `Mute` / `Unmute` | Mic mute state |
| `Transfer` | Call transfer to another relay/fingerprint |
| `Rekey` | New ephemeral key for forward secrecy |
| `QualityUpdate` | Quality report + recommended profile |
| `Ping` / `Pong` | Latency measurement (timestamp_ms) |
| `RoomUpdate` | Participant list changes |
| `PresenceUpdate` | Federation presence gossip |
| `RouteQuery` / `RouteResponse` | Presence discovery for routing |
| `FederationHello` | Relay identity during federation setup |
| `GlobalRoomActive` / `GlobalRoomInactive` | Federation room bridging |

## Test Coverage

571 tests across all crates, 0 failures:

| Crate | Tests | Key Coverage |
|-------|-------|-------------|
| wzp-proto | 41 | Wire format, jitter buffer, quality tiers, mini-frames, trunking |
| wzp-codec | 31 | Opus/Codec2 roundtrip, silence detection, noise suppression |
| wzp-fec | 22 | RaptorQ encode/decode, loss recovery, interleaving |
| wzp-crypto | 34 + 28 compat | Encrypt/decrypt, handshake, anti-replay, featherChat identity |
| wzp-transport | 2 | QUIC connection setup |
| wzp-relay | 40 + 4 integration | Room ACL, session mgmt, metrics, probes, mesh, trunking |
| wzp-client | 30 + 2 integration | Encoder/decoder, quality adapter, silence, drift, sweep |
| wzp-web | 2 | Metrics |

## Audio Routing (Android)

WarzonePhone supports three audio output routes on Android: **Earpiece**, **Speaker**, and **Bluetooth SCO**. The user cycles through available routes with a single button.

### Audio mode lifecycle

`MODE_IN_COMMUNICATION` is set **when the call engine starts** (right before Oboe `audio_start()`), not at app launch. This is critical: setting it early hijacks system audio routing (e.g. music drops from BT A2DP to earpiece). `MODE_NORMAL` is restored when the call engine stops.

```
App launch → MODE_NORMAL (other apps' audio unaffected)
Call start → set_audio_mode_communication() → MODE_IN_COMMUNICATION
Call end → audio_stop() → set_audio_mode_normal() → MODE_NORMAL
```

### Route lifecycle

1. Call starts → Earpiece (default).
2. User taps route button → cycles to next available route.
3. Route change requires Oboe stream restart (~60-400ms) because AAudio silently tears down streams on some OEMs when the routing target changes mid-stream.
4. Bluetooth disconnect mid-call → `AudioDeviceCallback.onAudioDevicesRemoved` fires → auto-fallback to Earpiece or Speaker.

### Bluetooth SCO

SCO (Synchronous Connection Oriented) is the correct Bluetooth profile for VoIP -- it provides bidirectional mono audio at 8/16 kHz with ~30ms latency. A2DP (stereo, high-quality) is unidirectional and adds 100-200ms of buffering, making it unsuitable for real-time voice.

On API 31+ (Android 12), we use the modern `setCommunicationDevice(AudioDeviceInfo)` API to route audio to the BT SCO device. The deprecated `startBluetoothSco()` + `setBluetoothScoOn()` path is used as fallback on older APIs. `setBluetoothScoOn()` is silently rejected on Android 12+ for non-system apps.

BT SCO devices only support 8/16kHz sample rates, but our pipeline runs at 48kHz. When BT is active, Oboe opens in **BT mode** (`bt_active=1`): capture skips `setSampleRate(48000)` and `setInputPreset(VoiceCommunication)`, letting the system open at the device's native rate. Oboe's `SampleRateConversionQuality::Best` resamples to/from 48kHz for our ring buffers.

### Two app variants

Both the native Kotlin app (`AudioRouteManager.kt`) and the Tauri app (`android_audio.rs` JNI bridge) support BT SCO routing. The native app uses `AudioDeviceCallback` for automatic device detection; the Tauri app uses `getAvailableCommunicationDevices()` (API 31+) or `getDevices()` on demand.

## Network Change Response

The `AdaptiveQualityController` in `wzp-proto` reacts to network transport changes signaled via `signal_network_change(NetworkContext)`:

| Transition | Response |
|-----------|----------|
| WiFi → Cellular | Preemptive 1-tier quality downgrade + 10s FEC boost |
| Cellular → WiFi | FEC boost only (quality recovers via normal adaptive logic) |
| Any change | Reset hysteresis counters to avoid stale state |

On Android, `NetworkMonitor.kt` wraps `ConnectivityManager.NetworkCallback` and classifies the transport type using bandwidth heuristics (no `READ_PHONE_STATE` needed). The classification is delivered to the Rust engine via JNI → `AtomicU8` → recv task polling, the same lock-free cross-task signaling pattern used for adaptive profile switches.
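
The JNI-to-recv-task handoff can be sketched with a single `AtomicU8`. The callback stores into it, and the recv task drains it with `swap`. The constant values and function names are hypothetical. Only the pattern, one atomic byte and no locks, comes from the text above.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

// Hypothetical encoding of the classification (values are illustrative).
pub const NET_UNCHANGED: u8 = 0;
pub const NET_WIFI: u8 = 1;
pub const NET_CELLULAR: u8 = 2;

pub static PENDING_NETWORK: AtomicU8 = AtomicU8::new(NET_UNCHANGED);

/// Called from the JNI side when the network monitor reports a change.
pub fn report_network(kind: u8) {
    PENDING_NETWORK.store(kind, Ordering::Release);
}

/// Polled by the recv task each loop iteration; consumes at most one pending change.
pub fn take_network_change() -> Option<u8> {
    match PENDING_NETWORK.swap(NET_UNCHANGED, Ordering::AcqRel) {
        NET_UNCHANGED => None,
        kind => Some(kind),
    }
}
```

If two changes arrive between polls, only the latest survives. That is usually what the quality controller wants.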

### Cellular generation heuristics

| Downstream bandwidth | Classification |
|---------------------|---------------|
| >= 100 Mbps | 5G NR |
| >= 10 Mbps | LTE |
| < 10 Mbps | 3G or worse |

These thresholds are conservative. Carriers over-report bandwidth, but for VoIP quality decisions the exact generation matters less than the rough category.
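
The table maps directly onto a threshold match. The input is assumed to be the link's downstream estimate in kbps, such as the value from Android's `NetworkCapabilities.getLinkDownstreamBandwidthKbps()`. The enum and function names are illustrative.

```rust
#[derive(Debug, PartialEq)]
pub enum CellularGen {
    Nr5g,   // >= 100 Mbps
    Lte,    // >= 10 Mbps
    Legacy, // < 10 Mbps: 3G or worse
}

pub fn classify_cellular(downstream_kbps: u32) -> CellularGen {
    match downstream_kbps {
        k if k >= 100_000 => CellularGen::Nr5g,
        k if k >= 10_000 => CellularGen::Lte,
        _ => CellularGen::Legacy,
    }
}
```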

## Build Requirements

- **Rust** 1.85+ (2024 edition)
- **Linux**: cmake, pkg-config, libasound2-dev (for audio feature)
- **macOS**: Xcode command line tools (CoreAudio included)
- **Android**: NDK 26.1 (r26b), cmake 3.25-3.28 (system package)

### Android APK Builds

```bash
# arm64 only (default, 25MB release APK)
./scripts/build-tauri-android.sh --init --release --arch arm64

# armv7 only (smaller devices)
./scripts/build-tauri-android.sh --init --release --arch armv7

# both architectures as separate APKs
./scripts/build-tauri-android.sh --init --release --arch all
```

Release APKs are signed with `android/keystore/wzp-release.jks` via `apksigner`. Per-arch builds produce separate APKs (~25MB each vs ~50MB universal) for easier sharing with testers.

209
vault/Architecture/Extensibility.md
Normal file
@@ -0,0 +1,209 @@
---
tags: [architecture, wzp]
type: architecture
---

# WarzonePhone Extension Points & Future Features

## Trait-Based Architecture

The protocol is designed around trait interfaces defined in `crates/wzp-proto/src/traits.rs`. Any implementation that satisfies the trait contract can be plugged in without modifying other crates.

### Adding a New Audio Codec

Implement `AudioEncoder` and `AudioDecoder` from `wzp_proto::traits`:

```rust
pub trait AudioEncoder: Send + Sync {
    fn encode(&mut self, pcm: &[i16], out: &mut [u8]) -> Result<usize, CodecError>;
    fn codec_id(&self) -> CodecId;
    fn set_profile(&mut self, profile: QualityProfile) -> Result<(), CodecError>;
    fn max_frame_bytes(&self) -> usize;
    fn set_inband_fec(&mut self, _enabled: bool) {}
    fn set_dtx(&mut self, _enabled: bool) {}
}

pub trait AudioDecoder: Send + Sync {
    fn decode(&mut self, encoded: &[u8], pcm: &mut [i16]) -> Result<usize, CodecError>;
    fn decode_lost(&mut self, pcm: &mut [i16]) -> Result<usize, CodecError>;
    fn codec_id(&self) -> CodecId;
    fn set_profile(&mut self, profile: QualityProfile) -> Result<(), CodecError>;
}
```

Steps:

1. Add a new variant to `CodecId` in `crates/wzp-proto/src/codec_id.rs` (uses 4-bit wire encoding, currently 5 of 16 values used)
2. Implement `AudioEncoder` and `AudioDecoder` for your codec
3. Register the codec in `AdaptiveEncoder`/`AdaptiveDecoder` in `crates/wzp-codec/src/adaptive.rs`
4. Add a `QualityProfile` constant for the new codec
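
As a sketch of step 2, here is a toy PCM passthrough codec that implements both traits. To keep the block self-contained, `CodecId`, `QualityProfile`, and `CodecError` are minimal hypothetical stand-ins for the real `wzp_proto` types. `CodecId::Pcm16` is not a real variant.

```rust
// Minimal stand-ins so the sketch compiles on its own (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CodecId { Pcm16 }
#[derive(Debug, Clone, Copy)]
pub struct QualityProfile;
#[derive(Debug, PartialEq)]
pub enum CodecError { BufferTooSmall }

pub trait AudioEncoder: Send + Sync {
    fn encode(&mut self, pcm: &[i16], out: &mut [u8]) -> Result<usize, CodecError>;
    fn codec_id(&self) -> CodecId;
    fn set_profile(&mut self, profile: QualityProfile) -> Result<(), CodecError>;
    fn max_frame_bytes(&self) -> usize;
    fn set_inband_fec(&mut self, _enabled: bool) {}
    fn set_dtx(&mut self, _enabled: bool) {}
}

pub trait AudioDecoder: Send + Sync {
    fn decode(&mut self, encoded: &[u8], pcm: &mut [i16]) -> Result<usize, CodecError>;
    fn decode_lost(&mut self, pcm: &mut [i16]) -> Result<usize, CodecError>;
    fn codec_id(&self) -> CodecId;
    fn set_profile(&mut self, profile: QualityProfile) -> Result<(), CodecError>;
}

/// Toy codec: little-endian PCM passthrough, one 20 ms frame at 48 kHz.
pub struct Pcm16;
const FRAME_SAMPLES: usize = 960;

impl AudioEncoder for Pcm16 {
    fn encode(&mut self, pcm: &[i16], out: &mut [u8]) -> Result<usize, CodecError> {
        let n = pcm.len() * 2;
        if out.len() < n {
            return Err(CodecError::BufferTooSmall);
        }
        for (chunk, s) in out.chunks_exact_mut(2).zip(pcm) {
            chunk.copy_from_slice(&s.to_le_bytes());
        }
        Ok(n)
    }
    fn codec_id(&self) -> CodecId { CodecId::Pcm16 }
    fn set_profile(&mut self, _p: QualityProfile) -> Result<(), CodecError> { Ok(()) }
    fn max_frame_bytes(&self) -> usize { FRAME_SAMPLES * 2 }
}

impl AudioDecoder for Pcm16 {
    fn decode(&mut self, encoded: &[u8], pcm: &mut [i16]) -> Result<usize, CodecError> {
        let n = encoded.len() / 2;
        if pcm.len() < n {
            return Err(CodecError::BufferTooSmall);
        }
        for (s, b) in pcm.iter_mut().zip(encoded.chunks_exact(2)) {
            *s = i16::from_le_bytes([b[0], b[1]]);
        }
        Ok(n)
    }
    fn decode_lost(&mut self, pcm: &mut [i16]) -> Result<usize, CodecError> {
        let n = pcm.len().min(FRAME_SAMPLES);
        pcm[..n].fill(0); // simplest possible concealment: one frame of silence
        Ok(n)
    }
    fn codec_id(&self) -> CodecId { CodecId::Pcm16 }
    fn set_profile(&mut self, _p: QualityProfile) -> Result<(), CodecError> { Ok(()) }
}
```

Both traits declare `codec_id` and `set_profile`. A type that implements both therefore needs fully qualified calls, such as `AudioEncoder::codec_id(&enc)`, whenever both traits are in scope.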
|
||||
|
||||
### Adding a New FEC Scheme
|
||||
|
||||
Implement `FecEncoder` and `FecDecoder` from `wzp_proto::traits`:
|
||||
|
||||
```rust
|
||||
pub trait FecEncoder: Send + Sync {
|
||||
fn add_source_symbol(&mut self, data: &[u8]) -> Result<(), FecError>;
|
||||
fn generate_repair(&mut self, ratio: f32) -> Result<Vec<(u8, Vec<u8>)>, FecError>;
|
||||
fn finalize_block(&mut self) -> Result<u8, FecError>;
|
||||
fn current_block_id(&self) -> u8;
|
||||
fn current_block_size(&self) -> usize;
|
||||
}
|
||||
|
||||
pub trait FecDecoder: Send + Sync {
|
||||
fn add_symbol(&mut self, block_id: u8, symbol_index: u8, is_repair: bool, data: &[u8]) -> Result<(), FecError>;
|
||||
fn try_decode(&mut self, block_id: u8) -> Result<Option<Vec<Vec<u8>>>, FecError>;
|
||||
fn expire_before(&mut self, block_id: u8);
|
||||
}
|
||||
```
|
||||
|
||||
For example, a Reed-Solomon implementation would maintain the same block/symbol structure but use a different coding algorithm internally. The FEC block ID and symbol index fields in `MediaHeader` support any scheme that fits the block/symbol model.

### Adding a New Transport

Implement `MediaTransport` from `wzp_proto::traits`:

```rust
#[async_trait]
pub trait MediaTransport: Send + Sync {
    async fn send_media(&self, packet: &MediaPacket) -> Result<(), TransportError>;
    async fn recv_media(&self) -> Result<Option<MediaPacket>, TransportError>;
    async fn send_signal(&self, msg: &SignalMessage) -> Result<(), TransportError>;
    async fn recv_signal(&self) -> Result<Option<SignalMessage>, TransportError>;
    fn path_quality(&self) -> PathQuality;
    async fn close(&self) -> Result<(), TransportError>;
}
```

A raw UDP transport, a WebRTC data channel transport, or a TCP tunnel transport could all implement this trait.

## Obfuscation Layer (Phase 2)

The `ObfuscationLayer` trait is defined in `crates/wzp-proto/src/traits.rs` but not yet implemented:

```rust
pub trait ObfuscationLayer: Send + Sync {
    fn obfuscate(&mut self, data: &[u8], out: &mut Vec<u8>) -> Result<(), ObfuscationError>;
    fn deobfuscate(&mut self, data: &[u8], out: &mut Vec<u8>) -> Result<(), ObfuscationError>;
}
```

Planned implementations:
- **TLS-in-TLS**: Wrap QUIC traffic inside a TLS connection to port 443, making it look like ordinary HTTPS
- **HTTP/2 mimicry**: Frame QUIC packets as HTTP/2 data frames
- **Random padding**: Add random-length padding to defeat traffic analysis
- **Domain fronting**: Use CDN infrastructure to hide the true destination

The obfuscation layer sits between the crypto layer and the transport layer in the protocol stack, wrapping encrypted packets before transmission.
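
Below is a minimal sketch of the random-padding option. It assumes a simple framing: a 2-byte length, then the payload, then 0 to `max_pad` filler bytes. `RandomPadding`, this framing and the `ObfuscationError` variants are hypothetical. The xorshift generator is a placeholder; a real layer should draw padding lengths from a CSPRNG.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum ObfuscationError {
    TooLarge,
    Truncated,
}

// Trait shape copied from the listing above.
pub trait ObfuscationLayer: Send + Sync {
    fn obfuscate(&mut self, data: &[u8], out: &mut Vec<u8>) -> Result<(), ObfuscationError>;
    fn deobfuscate(&mut self, data: &[u8], out: &mut Vec<u8>) -> Result<(), ObfuscationError>;
}

/// xorshift64: fast but NOT cryptographically secure; a stand-in for a CSPRNG.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Frames each packet as [len: u16 BE][payload][0..=max_pad random bytes].
pub struct RandomPadding {
    rng: XorShift64,
    max_pad: usize,
}

impl RandomPadding {
    pub fn new(seed: u64, max_pad: usize) -> Self {
        // xorshift state must be non-zero
        Self { rng: XorShift64(seed | 1), max_pad }
    }
}

impl ObfuscationLayer for RandomPadding {
    fn obfuscate(&mut self, data: &[u8], out: &mut Vec<u8>) -> Result<(), ObfuscationError> {
        let len = u16::try_from(data.len()).map_err(|_| ObfuscationError::TooLarge)?;
        let pad = (self.rng.next_u64() % (self.max_pad as u64 + 1)) as usize;
        out.clear();
        out.extend_from_slice(&len.to_be_bytes());
        out.extend_from_slice(data);
        out.extend((0..pad).map(|_| self.rng.next_u64() as u8));
        Ok(())
    }

    fn deobfuscate(&mut self, data: &[u8], out: &mut Vec<u8>) -> Result<(), ObfuscationError> {
        let header = data.get(..2).ok_or(ObfuscationError::Truncated)?;
        let len = usize::from(u16::from_be_bytes([header[0], header[1]]));
        // Everything after the declared payload is padding and is discarded.
        let body = data.get(2..2 + len).ok_or(ObfuscationError::Truncated)?;
        out.clear();
        out.extend_from_slice(body);
        Ok(())
    }
}
```

This layer wraps packets that are already encrypted, so the length prefix travels in the clear. A production design would also need to mask the prefix, or it becomes a fingerprint of its own.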

## FeatherChat / Warzone Messenger Integration

As described in `docs/featherchat.md`, WarzonePhone is designed to integrate with the existing Warzone messenger.

### Shared Identity Model

Both WarzonePhone and Warzone use the same identity derivation:
- 32-byte seed (BIP39 mnemonic backup)
- HKDF with context strings: `"warzone-ed25519-identity"` and `"warzone-x25519-identity"`
- Ed25519 for signing, X25519 for encryption
- Fingerprint: `SHA-256(Ed25519_pub)[:16]`

This is implemented in `crates/wzp-crypto/src/handshake.rs` as `WarzoneKeyExchange::from_identity_seed()`.

### Signaling via Existing WebSocket

Call initiation flows through the Warzone messenger's existing WebSocket connection:
1. Caller looks up callee via `@alias`, federated address, or raw fingerprint
2. Caller sends `WireMessage::CallOffer` through the existing message channel
3. Callee receives the offer and responds with `WireMessage::CallAnswer`
4. Both sides establish a direct QUIC connection to the relay using ephemeral keys from the signaling exchange

The `SignalMessage::CallOffer` and `SignalMessage::CallAnswer` variants in `crates/wzp-proto/src/packet.rs` carry the same fields needed for this flow.

### Key Derivation from Existing Shared Secret

When two Warzone users already have an X3DH shared secret from their messaging session, call keys can be derived from it:
- `HKDF(x3dh_shared_secret, "warzone-call-session")` -> 32-byte session key
- Or: fresh ephemeral exchange per call (current implementation) for independent forward secrecy

### Unified Addressing

The Warzone addressing system resolves user identities across multiple namespaces:

| Method | Format | Resolution |
|--------|--------|------------|
| Local alias | `@manwe` | Server resolves to fingerprint |
| Federated | `@manwe.b1.example.com` | DNS TXT record -> fingerprint + endpoint |
| ENS | `@manwe.eth` | Ethereum address -> fingerprint (planned) |
| Raw fingerprint | `xxxx:xxxx:...` | Direct lookup |

A user calls `@manwe` the same way they message `@manwe`.
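
The four formats in the table can be told apart syntactically before any resolution happens. The classifier below is hypothetical. Its rules are assumptions read off the table, not the messenger's actual parser: a `.eth` suffix means ENS, any other dot means a federated address, and colon-separated hex groups mean a raw fingerprint.

```rust
/// Which namespace an address string belongs to (classification rules are assumptions).
#[derive(Debug, PartialEq, Eq)]
pub enum Address<'a> {
    LocalAlias(&'a str),
    Federated { alias: &'a str, domain: &'a str },
    Ens(&'a str),
    Fingerprint(&'a str),
}

pub fn classify(input: &str) -> Option<Address<'_>> {
    if let Some(rest) = input.strip_prefix('@') {
        if rest.len() > 4 && rest.ends_with(".eth") {
            return Some(Address::Ens(rest));
        }
        return match rest.split_once('.') {
            None if !rest.is_empty() => Some(Address::LocalAlias(rest)),
            Some((alias, domain)) if !alias.is_empty() && !domain.is_empty() => {
                Some(Address::Federated { alias, domain })
            }
            _ => None,
        };
    }
    // Raw fingerprint: colon-separated groups of hex digits.
    let is_hex_groups = input.contains(':')
        && input
            .split(':')
            .all(|g| !g.is_empty() && g.chars().all(|c| c.is_ascii_hexdigit()));
    is_hex_groups.then_some(Address::Fingerprint(input))
}
```

Under these rules, a federated domain that ends in `.eth` would be misread as ENS. The real resolver may therefore need an explicit namespace marker.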

## Authentication: Caller Verification Before Bridging

Currently, relays bridge any caller: there is no admission check based on the caller's identity. To add authentication:

1. **Relay-side handshake**: The relay receives the `CallOffer`, verifies the Ed25519 signature, and checks the caller's identity against an allowlist before accepting the connection.

2. **Implementation point**: `crates/wzp-relay/src/handshake.rs` already implements `accept_handshake()`, which performs signature verification. To gate admission, add an authorization check after signature verification.

3. **Token-based auth**: Add a `token: Vec<u8>` field to `CallOffer` containing a relay-issued authentication token (e.g., signed by the relay operator's key).

## Multi-Relay Mesh

The current two-relay chain (`--remote` flag) can be extended to a multi-hop mesh:

```
Client -> Relay A -> Relay B -> Relay C -> Destination
```

Each hop uses the relay pipeline (FEC decode -> jitter buffer -> FEC re-encode) to absorb loss on each link independently. This requires:

1. Relay discovery and route selection (not yet implemented)
2. Per-hop FEC parameters (each link may have different loss characteristics)
3. Cumulative latency management (each hop adds jitter buffer delay)
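
Item 3 is mostly arithmetic. The sketch below uses a hypothetical `Hop` type and illustrative numbers. Each link adds its transit delay plus the jitter buffer at its receiving end, on top of one frame of packetization. Take 20 ms frames, 20 ms links and 40 ms of buffering per receiver. The four-link chain above then already costs 260 ms. That is well beyond the roughly 150 ms one-way figure commonly cited from ITU-T G.114 for interactive speech.

```rust
/// One link plus the jitter buffer at its receiving end (relay or client).
/// Field names are hypothetical; values are whatever you measure or configure.
#[derive(Clone, Copy)]
pub struct Hop {
    pub link_ms: u32,
    pub jitter_buffer_ms: u32,
}

/// Rough mouth-to-ear estimate: one frame of packetization, then every link
/// and every receiving jitter buffer. Ignores codec lookahead, FEC block
/// accumulation and audio-device buffering, which all add more.
pub fn one_way_ms(frame_ms: u32, hops: &[Hop]) -> u32 {
    frame_ms + hops.iter().map(|h| h.link_ms + h.jitter_buffer_ms).sum::<u32>()
}

/// How many identical hops fit inside a one-way latency budget.
pub fn max_hops(budget_ms: u32, frame_ms: u32, hop: Hop) -> u32 {
    budget_ms.saturating_sub(frame_ms) / (hop.link_ms + hop.jitter_buffer_ms).max(1)
}
```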

## Video Support

The trait architecture supports video by adding:

1. **Video codec trait**: Similar to `AudioEncoder`/`AudioDecoder` but for video frames
2. **Codec choices**: AV1 (best compression, higher CPU), VP9 SVC (scalable, moderate CPU)
3. **Separate FEC strategy**: Video frames are larger and more critical (I-frames vs P-frames need different protection levels)
4. **SVC (Scalable Video Coding)**: With VP9 SVC, the relay can drop enhancement layers without transcoding, adapting video quality to each receiver's bandwidth

Video would add new `CodecId` variants and a separate `QualityProfile` for video parameters.
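
Point 4 requires the relay to tell layers apart without decrypting anything. The layer IDs would therefore have to sit in the cleartext, AAD-authenticated header. If such a field existed, per-receiver forwarding reduces to a comparison. `LayerId`, `ReceiverCap` and `select` are hypothetical names for this sketch.

```rust
/// Layer identifiers a relay would read from a cleartext header field
/// (hypothetical: no such field is described in the current header).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct LayerId {
    pub spatial: u8,
    pub temporal: u8,
}

/// Highest layers a given receiver should get, e.g. chosen from its bandwidth estimate.
#[derive(Clone, Copy, Debug)]
pub struct ReceiverCap {
    pub max_spatial: u8,
    pub max_temporal: u8,
}

/// Forward only layers the receiver can use. In typical SVC structures higher
/// layers depend on lower ones, so trimming from the top keeps the stream decodable.
pub fn should_forward(layer: LayerId, cap: ReceiverCap) -> bool {
    layer.spatial <= cap.max_spatial && layer.temporal <= cap.max_temporal
}

/// Per-receiver fan-out: indices of the packets to send to this receiver.
pub fn select(packets: &[LayerId], cap: ReceiverCap) -> Vec<usize> {
    packets
        .iter()
        .enumerate()
        .filter(|(_, l)| should_forward(**l, cap))
        .map(|(i, _)| i)
        .collect()
}
```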

## Android Native Client

The workspace is designed with Android in mind (`wzp-client` description mentions "for Android (JNI) and Windows desktop"):

1. **JNI bindings**: Use `jni` crate or `uniffi` to expose `CallEncoder`, `CallDecoder`, and `MediaTransport` to Kotlin/Java
2. **Audio I/O**: Android uses AAudio or OpenSL ES instead of cpal
3. **Build**: Cross-compile with `cargo ndk` targeting `aarch64-linux-android` and `armv7-linux-androideabi`
4. **Permissions**: `RECORD_AUDIO`, `INTERNET`, `WAKE_LOCK`

## STUN/TURN NAT Traversal Integration

The `SignalMessage::IceCandidate` variant is already defined for NAT traversal:

```rust
IceCandidate { candidate: String }
```

Integration would involve:
1. STUN server queries to discover the client's public IP/port
2. ICE candidate exchange via the signaling channel
3. TURN relay fallback when direct UDP is blocked
4. Integration with the existing QUIC transport (QUIC connection migration lets an established connection survive NAT rebinding, but it does not punch through NATs on its own)
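
Before any of this can run, the `candidate` string in `IceCandidate` has to be parsed. The sketch below assumes the string uses the standard SDP candidate-attribute syntax from RFC 8839; the codebase does not say which syntax it carries. `Candidate` and `parse_candidate` are hypothetical names.

```rust
use std::net::IpAddr;

/// Core fields of an ICE candidate attribute (RFC 8839 grammar), minus extensions.
#[derive(Debug, PartialEq, Eq)]
pub struct Candidate {
    pub foundation: String,
    pub component: u16,
    pub transport: String,
    pub priority: u32,
    pub address: IpAddr,
    pub port: u16,
    pub kind: String, // "host", "srflx", "prflx" or "relay"
}

/// Parses `candidate:<foundation> <component> <transport> <priority> <addr> <port> typ <type> ...`.
/// Returns None for malformed lines and for FQDN / mDNS (`.local`) addresses.
pub fn parse_candidate(line: &str) -> Option<Candidate> {
    let line = line.trim();
    let body = line.strip_prefix("candidate:").unwrap_or(line);
    let mut f = body.split_ascii_whitespace();
    let foundation = f.next()?.to_string();
    let component = f.next()?.parse().ok()?;
    let transport = f.next()?.to_ascii_lowercase();
    let priority = f.next()?.parse().ok()?;
    let address = f.next()?.parse().ok()?;
    let port = f.next()?.parse().ok()?;
    if f.next()? != "typ" {
        return None;
    }
    let kind = f.next()?.to_string();
    // Trailing "raddr"/"rport" and extension attributes are ignored here.
    Some(Candidate { foundation, component, transport, priority, address, port, kind })
}
```

Browsers increasingly advertise host candidates as mDNS `.local` names, which this parser rejects. A real implementation would resolve those candidates or skip them explicitly.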

## Bandwidth Estimation and Adaptive Bitrate

The `PathMonitor` in `crates/wzp-transport/src/path_monitor.rs` already estimates bandwidth from observed packet rates. To close the loop:

1. Feed `PathMonitor::quality()` into `AdaptiveQualityController::observe()` as `QualityReport` values
2. The controller will trigger tier transitions when conditions change
3. Propagate the new `QualityProfile` to both encoder (codec switch) and FEC (ratio change)
4. Signal the peer via `SignalMessage::QualityUpdate` so both sides switch simultaneously

The framework is in place; the missing piece is the integration wiring in the client's main loop to periodically generate quality reports from path metrics.
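
The missing wiring can be sketched with stand-in types. `Tier` mirrors the Good/Degraded/Catastrophic tiers described for `quality.rs`. The thresholds, the hysteresis counts and the `Controller`/`tick` names are hypothetical. The real `AdaptiveQualityController` may already apply its own hysteresis; if so, only the `tick` part is actually missing.

```rust
/// Discrete tiers as described for quality.rs; thresholds below are hypothetical.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum Tier {
    Good,
    Degraded,
    Catastrophic,
}

/// Stand-in for one periodic sample derived from PathMonitor metrics.
pub struct PathQuality {
    pub loss_pct: f32,
    pub rtt_ms: u32,
}

fn classify(q: &PathQuality) -> Tier {
    if q.loss_pct >= 15.0 || q.rtt_ms >= 600 {
        Tier::Catastrophic
    } else if q.loss_pct >= 3.0 || q.rtt_ms >= 250 {
        Tier::Degraded
    } else {
        Tier::Good
    }
}

/// Switches tier only after several consecutive agreeing samples: quickly
/// when conditions worsen, slowly when they improve, to avoid flapping.
pub struct Controller {
    current: Tier,
    candidate: Tier,
    streak: u32,
    down_after: u32,
    up_after: u32,
}

impl Controller {
    pub fn new() -> Self {
        Self { current: Tier::Good, candidate: Tier::Good, streak: 0, down_after: 2, up_after: 5 }
    }

    /// Returns the new tier when a transition should be applied.
    pub fn observe(&mut self, q: &PathQuality) -> Option<Tier> {
        let t = classify(q);
        if t == self.current {
            self.candidate = t;
            self.streak = 0;
            return None;
        }
        if t == self.candidate {
            self.streak += 1;
        } else {
            self.candidate = t;
            self.streak = 1;
        }
        // Higher Tier value = worse conditions.
        let needed = if t > self.current { self.down_after } else { self.up_after };
        if self.streak < needed {
            return None;
        }
        self.current = t;
        self.streak = 0;
        Some(t)
    }
}

/// One iteration of the client-loop wiring: sample, observe, and on a
/// transition apply the profile locally and tell the peer.
pub fn tick(
    ctrl: &mut Controller,
    sample: &PathQuality,
    apply_profile: &mut impl FnMut(Tier),
    send_quality_update: &mut impl FnMut(Tier),
) {
    if let Some(tier) = ctrl.observe(sample) {
        apply_profile(tier); // switch codec and FEC ratio
        send_quality_update(tier); // e.g. SignalMessage::QualityUpdate
    }
}
```

The controller downgrades after two bad samples but upgrades only after five. This keeps the call from flapping between codecs when conditions hover near a threshold.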

113
vault/Architecture/Protocol-Audit.md
Normal file
@@ -0,0 +1,113 @@
---
tags: [architecture, wzp]
type: architecture
---

# WZP Protocol Audit

> Protocol-level review of WZP as of 2026-05-11. See `WZP-SPEC.md` for the spec being audited.

## Strengths

- **QUIC datagrams instead of raw UDP + SRTP** — buys TLS 1.3, PLPMTUD, path migration, and ACK-based loss/RTT estimation. Quinn's `PathSnapshot` feeding `DredTuner` is something WebRTC stacks build from scratch.

- **Continuous DRED tuning.** Mapping RTT / loss / jitter to a continuous Opus DRED lookback window is genuinely better than discrete tiers — most stacks treat DRED as on/off.

- **MiniHeader on 49 of every 50 packets.** At 50 pps that is ~400 B/s saved per stream; meaningful at scale.

- **SFU never decodes.** Preserves E2E. Most SFUs (LiveKit, Janus) terminate SRTP at the SFU.

- **RaptorQ for low-bitrate Codec2 + DRED for Opus.** Correct split — DRED is cheaper than FEC at high bitrate; RaptorQ shines when you can afford many small symbols.

## Weaknesses

### W1. `u16` sequence wraps every ~21 minutes at 50 pps
The anti-replay window is 64 packets, so the wrap is safe for replay. **But** the jitter buffer keys frames in a `BTreeMap<u16, _>`, whose natural ordering puts `0` before `65535`, so frames buffered across the wrap boundary come out misordered. Unwrapping to a wider key with serial-number arithmetic fixes the ordering, but only while reordering stays under half the sequence space (~32 k frames). Widen to `u32` (or version the field).
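
Whatever width the field ends up with, a jitter buffer should key its `BTreeMap` on a non-wrapping value rather than the raw wire sequence. Below is a minimal sketch of a hypothetical helper, `SeqUnwrapper`, which is not in the codebase. It extends `u16` sequence numbers to `i64` by placing each value at the position closest to the highest one seen so far.

```rust
/// Maps 16-bit wire sequence numbers onto a non-wrapping i64 using
/// serial-number arithmetic. Correct while reordering stays under 32,768 packets.
pub struct SeqUnwrapper {
    highest: Option<i64>,
}

impl SeqUnwrapper {
    pub fn new() -> Self {
        Self { highest: None }
    }

    pub fn extend(&mut self, seq: u16) -> i64 {
        let ext = match self.highest {
            None => i64::from(seq),
            Some(h) => {
                // Signed distance from the highest value seen, in [-32768, 32767].
                let delta = seq.wrapping_sub(h as u16) as i16;
                h + i64::from(delta)
            }
        };
        // Track the maximum so late packets do not drag the reference backwards.
        self.highest = Some(self.highest.map_or(ext, |h| h.max(ext)));
        ext
    }
}
```

A call such as `jitter.insert(unwrapper.extend(seq), frame)` then orders frames correctly across the wrap. Widening to `u32` pushes the wrap out to about 2.7 years at 50 pps (2^32 packets / 50 pps is roughly 994 days).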

### W2. `fec_block_id: u8` wraps every 256 blocks (~25 s at 5-frame blocks)
A late-joining peer or a slow reconstructor can collide block IDs. Widen to `u16` or carry an epoch counter.

### W3. `timestamp_ms` rebase behavior at rekey is unspecified
Rekey happens every 65,536 packets (~22 min). If `timestamp_ms` resets, downstream sync glitches. If it does not, document that explicitly.

### W4. `MiniHeader` has no `seq`
Receiver infers absolute seq from the most recent full header + frame count. One missed full header (every 50 frames = 1 s) leaves 49 packets with unknown absolute seq. Acceptable for audio with short jitter buffers — **fatal for video** where one missed full header can desync an entire GOP. **Add `seq_delta: u8` to MiniHeader before video lands.**

### W5. `QualityReport` placement vs. AEAD
A 4-byte trailer on encrypted media is fine **iff it sits inside the AEAD payload**. If it is outside, anything stripping the last 4 bytes corrupts decryption and creates a downgrade vector. Verify in `packet.rs`; if outside, move it inside or AAD-bind it.

### W6. Adaptive controller is loss / RTT-only — no bandwidth estimator
Quinn exposes `cwnd` and `bytes_in_flight`, but `AdaptiveQualityController` does not consume them. Under low utilization you cannot detect that you *could* upgrade to Opus at 64 kbps. **For video this is mandatory** — without BWE you will either oscillate or never use available capacity.

### W7. No NACK / explicit retransmit path
For audio with DRED + FEC this is fine. For video keyframes it is wasteful — an I-frame is 50–200 packets, and protecting it at 50 % FEC doubles its bitrate. A NACK path is far cheaper than blanket FEC for I-frames.

### W8. TrunkFrame batching multiplies AEAD cost
Each inner payload is its own AEAD operation. At 10 entries that is 10× ChaCha calls per recv. Fine on x86 / ARM cores with SIMD (AVX2 / NEON), which is what ChaCha20 relies on (AES-NI does not accelerate it); profile on weak Android (Nothing A059 baseline).

### W9. `CodecID` is 4 bits → max 16 codecs; 9 already used
Adding H.264, H.265, AV1, VP9 takes you to 13. Land the widening **before** deployment — either steal from `reserved` / `csrc_count` to make CodecID 8-bit, or split into `MediaType:2 / CodecID:6`. Doing this post-deployment is painful.

### W10. No `MediaType` field
Audio vs. video vs. data is implicit in CodecID. A 2-bit `MediaType` lets the SFU apply per-type policy (drop video first under congestion, prioritize audio fan-out) without a codec lookup.

### W11. Anti-replay window 64 packets is tight for video
One keyframe burst can be 100+ packets; a packet reordered behind that burst ends up more than 64 sequence numbers behind the newest one and is rejected as too old. Bump to 256 or 1024 for video streams, or maintain a per-stream window.
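
A per-stream window of configurable size is cheap to implement. The sketch below is a hypothetical `ReplayWindow`. It follows the ring-bitmap approach of RFC 6479, where slots are indexed by sequence modulo the window size and cleared as the window advances, rather than the shifting bitmap of RFC 4303. It assumes sequence numbers have already been extended to `u64`.

```rust
/// Sliding anti-replay window with a configurable size (a multiple of 64).
pub struct ReplayWindow {
    top: Option<u64>, // highest sequence accepted so far
    bits: Vec<u64>,   // one bit per slot; slot = seq % size
    size: u64,
}

impl ReplayWindow {
    pub fn new(size: usize) -> Self {
        assert!(size > 0 && size % 64 == 0, "window size must be a positive multiple of 64");
        Self { top: None, bits: vec![0; size / 64], size: size as u64 }
    }

    fn slot(&self, seq: u64) -> (usize, u64) {
        let i = seq % self.size;
        ((i / 64) as usize, 1u64 << (i % 64))
    }

    fn get(&self, seq: u64) -> bool {
        let (w, m) = self.slot(seq);
        (self.bits[w] & m) != 0
    }

    fn set(&mut self, seq: u64, on: bool) {
        let (w, m) = self.slot(seq);
        if on { self.bits[w] |= m } else { self.bits[w] &= !m }
    }

    /// Returns true (and records `seq`) if it is new and inside the window.
    pub fn accept(&mut self, seq: u64) -> bool {
        let Some(top) = self.top else {
            self.top = Some(seq);
            self.set(seq, true);
            return true;
        };
        if seq > top {
            // Clear the slots of every number we jumped over (at most one full window).
            let gap = (seq - top).min(self.size);
            for s in (seq - gap + 1)..=seq {
                self.set(s, false);
            }
            self.top = Some(seq);
            self.set(seq, true);
            return true;
        }
        if top - seq >= self.size || self.get(seq) {
            return false; // too old, or a replay
        }
        self.set(seq, true);
        true
    }
}
```

With a 64-slot window, a packet that arrives 100 positions behind the newest one is rejected as too old. With 256 slots, the same packet is accepted once and rejected as a replay after that.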

### W12. `SignalMessage` has no version byte
Bincode + `#[serde(default, skip_serializing_if)]` covers field additions but not variant removal or semantic change. Lead every variant with `version: u8`.

### W13. RoomManager Mutex per-packet — **RESOLVED**
Already flagged in `ARCHITECTURE.md`. At ~1500 pps/sender for video this becomes a real ceiling.

**Resolution (T3.1):** `RoomManager` now stores `DashMap<String, Arc<RwLock<Room>>>` instead of `DashMap<String, Room>`. The DashMap guard is held only long enough to clone the `Arc`; all per-room operations (fan-out `others()`, quality `observe_quality()`, join/leave) then acquire the room-level `std::sync::RwLock`. This lets concurrent `others()` calls share a read lock while writers hold the write lock, eliminating the per-packet DashMap contention that was the original concern.
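
The same "clone the `Arc`, then lock the room" pattern can be shown with the standard library alone. To run without external crates, this sketch swaps `DashMap` for a `RwLock<HashMap<...>>`. `Room`'s single field and the method bodies are simplified stand-ins, not the relay's code.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

#[derive(Default)]
pub struct Room {
    pub participants: Vec<String>, // stand-in for the real participant senders
}

#[derive(Default)]
pub struct RoomManager {
    // DashMap in the relay; a plain RwLock<HashMap> here to stay std-only.
    rooms: RwLock<HashMap<String, Arc<RwLock<Room>>>>,
}

impl RoomManager {
    /// The map lock is held only long enough to clone the room's Arc.
    fn room(&self, name: &str) -> Option<Arc<RwLock<Room>>> {
        self.rooms.read().unwrap().get(name).cloned()
    }

    pub fn join(&self, name: &str, who: &str) {
        let room = self.rooms.write().unwrap().entry(name.to_string()).or_default().clone();
        // Map lock already released; only this room's write lock is taken.
        room.write().unwrap().participants.push(who.to_string());
    }

    /// Fan-out lookup: concurrent callers share the room's read lock.
    pub fn others(&self, name: &str, me: &str) -> Vec<String> {
        let Some(room) = self.room(name) else { return Vec::new() };
        let guard = room.read().unwrap();
        guard.participants.iter().filter(|p| p.as_str() != me).cloned().collect()
    }
}
```

The outer lock is held only while the `Arc` is looked up or inserted, so room-level work in one room never blocks another. A `std::sync::RwLock` guard must not be held across an `.await`, which this pattern avoids naturally.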

### W14. No receiver → sender congestion feedback beyond inline QualityReport
For video you need REMB-style or transport-CC-style explicit BWE feedback at ~50 ms cadence, independent of media packets.

## Priorities

| Priority | Issue | Why |
|---|---|---|
| P0 | W9 (CodecID width), W10 (MediaType), W4 (MiniHeader seq_delta) | Wire-format changes — must land before video, painful to change post-deploy |
| P0 | W1 (seq u16 → u32) | Same window; audio benefits too |
| P1 | W6 (BWE), W14 (transport feedback) | Blocking for usable video; improves audio adaptation |
| P1 | W5 (QualityReport in AEAD) | Security correctness |
| P2 | W2 (fec_block_id width), W11 (anti-replay window), W12 (signal version byte) | Long-tail correctness |
| P2 | W7 (NACK path), W13 (RoomManager lock) | Video performance, not correctness |
| P3 | W3 (timestamp rebase doc), W8 (AEAD profiling) | Documentation / measurement |

## Resolution status (2026-05-11)

The v2 wire format specified in `ROAD-TO-VIDEO.md` Phase V1 addresses:

| Issue | Resolved by |
|---|---|
| W1 (seq u16 → u32) | `sequence: u32` in MediaHeader v2 |
| W4 (MiniHeader seq) | `seq_delta: u8` added; MiniHeader v2 is 5 B |
| W9 (CodecID width) | Widened to 8-bit (room for 256) |
| W10 (MediaType) | Explicit `media_type: u8` byte |

W6 / W14 (BWE + TransportFeedback) addressed in Phase V2. W7 (NACK) addressed in Phase V2 / V4. Others remain open.

## Known pre-existing clippy debt (as of T1.5.2)

Measured at commit `c93d302` on `experimental-ui` (2026-05-11).

`cargo clippy --workspace --all-targets -- -D warnings` fails in two crates with **pre-existing** errors (verified against `HEAD~1`). These are not introduced by any Wave 1 task; they should be cleaned up in a dedicated hygiene sprint or accepted as known debt.

### `wzp-codec` — 9 errors

| Category | Count | Lint | Files |
|---|---|---|---|
| Manual saturating sub | 1 | `clippy::implicit_saturating_sub` | `aec.rs:117` |
| Needless range loop | 2 | `clippy::needless_range_loop` | `aec.rs:164`, `resample.rs:51` |
| Manual `div_ceil` | 2 | `clippy::manual_div_ceil` | `codec2_dec.rs:48`, `codec2_enc.rs:48` |
| Manual `clamp` | 2 | `clippy::manual_clamp` | `denoise.rs:59`, `opus_enc.rs:250` |
| Manual ASCII case-cmp | 1 | `clippy::manual_ascii_check` | `opus_enc.rs:99` |
| Same-item push in loop | 1 | `clippy::same_item_push` | `resample.rs:184` |

### `warzone-protocol` (submodule `deps/featherchat`) — 3 errors

| Category | Count | Lint | Files |
|---|---|---|---|
| `clone` on `Copy` type | 1 | `clippy::clone_on_copy` | `ratchet.rs:202` |
| Missing `Default` impl | 2 | `clippy::new_without_default` | `types.rs:59`, `types.rs:69` |

**Policy:** New tasks must not add *new* clippy errors in crates they touch. The 12 errors above are grandfathered; a follow-up cleanup task should be scheduled to fix them (especially the `wzp-codec` ones, which are straightforward mechanical replacements).
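
Most of the `wzp-codec` items are one-line replacements. The functions below illustrate the before/after shape for three of the lints. They are examples written for this note, not the actual code at the cited lines.

```rust
/// clippy::manual_div_ceil: number of fixed-size frames needed to hold `samples`.
fn frames_needed(samples: usize, frame: usize) -> usize {
    // was: (samples + frame - 1) / frame
    samples.div_ceil(frame)
}

/// clippy::manual_clamp: restrict a gain to [0, 1].
fn clamp_gain(g: f32) -> f32 {
    // was: g.max(0.0).min(1.0)
    g.clamp(0.0, 1.0)
}

/// clippy::implicit_saturating_sub: decrement without underflow.
fn decrement(mut n: usize) -> usize {
    // was: if n != 0 { n -= 1; }
    n = n.saturating_sub(1);
    n
}
```

One caveat: `f32::clamp` returns NaN for a NaN input, while the `max(..).min(..)` chain returns the lower bound. The `manual_clamp` fix therefore preserves behavior only when NaN cannot reach it. `clamp` also panics if `min > max`.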

276
vault/Architecture/Refactor-Codebase-Audit.md
Normal file
@@ -0,0 +1,276 @@
---
tags: [architecture, wzp]
type: architecture
---

# Codebase Refactoring Audit (2026-04-13)

> Full analysis of the WarzonePhone codebase after the DashMap relay refactor, DRED continuous tuning, and adaptive quality wiring. The codebase is ~15K lines of Rust across 8 crates plus a 1.7K-line Tauri engine. This document identifies every refactoring opportunity ranked by impact.

## Critical: engine.rs is 1,705 Lines With ~35% Duplication

`desktop/src-tauri/src/engine.rs` has two nearly-identical `CallEngine::start()` implementations:
- **Android path:** 880 lines (lines 321–1200)
- **Desktop path:** 430 lines (lines 1203–1633)

### What's Duplicated (350+ lines)

| Block | Android Lines | Desktop Lines | Size | Identical? |
|-------|--------------|---------------|------|-----------|
| CallConfig initialization | 529–539 | 1353–1363 | 23 lines | Yes |
| DRED tuner + frame_samples setup | 541–555 | 1360–1375 | 15 lines | Yes |
| Adaptive quality profile switch | 651–665 | 1414–1428 | 15 lines | Yes |
| Codec-to-QualityProfile match | 852–864 | 1488–1500 | 19 lines | Yes |
| DRED ingest + gap fill | 886–902 | 1511–1528 | 17 lines | Yes |
| Quality report ingestion | 905–912 | 1531–1538 | 8 lines | Yes |
| Signal task (entire thing) | 1133–1180 | 1569–1616 | 48 lines | Yes |

### Suggested Fix: Extract Shared Helpers

```rust
// Top of engine.rs — shared between both platforms

fn build_call_config(quality: &str) -> CallConfig { ... }

fn codec_to_profile(codec: CodecId) -> QualityProfile { ... }

fn check_adaptive_switch(
    pending: &AtomicU8,
    encoder: &mut CallEncoder,
    tuner: &mut DredTuner,
    frame_samples: &mut usize,
    tx_codec: &Mutex<String>,
) { ... }

async fn run_signal_task(
    transport: Arc<QuinnTransport>,
    running: Arc<AtomicBool>,
    pending_profile: Arc<AtomicU8>,
    participants: Arc<Mutex<Vec<ParticipantInfo>>>,
) { ... }
```

This would shrink engine.rs by ~200 lines and leave the Android and desktop paths differing only in their audio I/O (Oboe vs CPAL).

**Effort:** 2-3 hours. **Impact:** High — every future change to the send/recv pipeline currently requires editing two places.

---

## High: SignalMessage Enum Has 36 Variants

`crates/wzp-proto/src/packet.rs` (1,727 lines) has a `SignalMessage` enum with 36 variants mixing orthogonal concerns:

- Legacy call signaling (CallOffer, CallAnswer, IceCandidate, Rekey...)
- Direct calling (RegisterPresence, DirectCallOffer, DirectCallAnswer, CallSetup...)
- Federation (FederationHello, GlobalRoomActive/Inactive, FederatedSignalForward)
- Relay control (SessionForward, PresenceUpdate, RouteQuery, RoomUpdate)
- NAT traversal (Reflect, ReflectResponse, MediaPathReport)
- Quality (QualityUpdate, QualityDirective)
- Call control (Ping/Pong, Hold/Unhold, Mute/Unmute, Transfer)

Every new feature adds variants here, and every match on `SignalMessage` must handle all 36 arms or fall back to a `_` wildcard, which silently absorbs any variant added later.

### Suggested Fix: Sub-Enum Grouping

```rust
enum SignalMessage {
    Call(CallSignal),         // CallOffer, CallAnswer, IceCandidate, Rekey, Hangup...
    Direct(DirectCallSignal), // RegisterPresence, DirectCallOffer, CallSetup, MediaPathReport...
    Federation(FedSignal),    // FederationHello, GlobalRoomActive, FederatedSignalForward...
    Control(ControlSignal),   // Ping/Pong, Hold/Unhold, Mute/Unmute, QualityDirective...
    Relay(RelaySignal),       // SessionForward, PresenceUpdate, RouteQuery, RoomUpdate...
}
```

**Caution:** This is a wire-format change. Serde serialization must remain backward-compatible with already-deployed relays. `#[serde(untagged)]` is not an option if these messages are bincode-encoded, because untagged enums need a self-describing format. Use versioned deserialization instead, ideally as part of a v2 protocol bump.

**Effort:** 1 day. **Impact:** High for maintainability, but risky for wire compatibility.

---

## High: Federation Has Zero Tests

`crates/wzp-relay/src/federation.rs` (1,132 lines) has **no unit tests and no integration tests**. This is the most complex file in the relay crate, handling:

- Peer link management (connect, reconnect, stale sweep)
- Federation media egress (forward_to_peers)
- Federation media ingress (handle_datagram: dedup, rate limit, local delivery, multi-hop)
- Cross-relay signal forwarding
- Room event subscription and GlobalRoomActive/Inactive broadcasting

The relay crate has 91 tests, but none cover federation. Any refactoring of federation (like the DashMap migration or clone-before-send) is flying blind.

### Suggested Fix

Priority test cases:
1. `forward_to_peers` with 0, 1, 3 peers — verify datagram construction and label tracking
2. `handle_datagram` — dedup (same packet twice → second dropped), rate limit (exceed → dropped)
3. Stale presence sweeper — verify cleanup after timeout
4. `broadcast_signal` — verify signal reaches all peers
5. Multi-hop forward — verify source peer excluded from re-forward

**Effort:** 1 day. **Impact:** Critical for safe refactoring.

---

## Medium: Federation `peer_links` Lock-During-Send

`broadcast_signal()` (line 216) holds the `peer_links` Mutex **across async `send_signal()` calls**. A slow peer blocks all signal delivery. `forward_to_peers()` (line 406) holds it during sync sends (less severe, but still serializing).

### Fix (30 minutes)

```rust
// Before:
let links = self.peer_links.lock().await;
for (fp, link) in links.iter() {
    link.transport.send_signal(msg).await; // lock held across await!
}

// After:
let peers: Vec<_> = {
    let links = self.peer_links.lock().await;
    links.values().map(|l| (l.label.clone(), l.transport.clone())).collect()
}; // guard dropped here
for (label, transport) in &peers {
    // no lock held; log failures instead of discarding the Result
    if let Err(e) = transport.send_signal(msg).await {
        tracing::warn!(peer = %label, error = %e, "federation signal send failed");
    }
}
```

Apply to `forward_to_peers()`, `broadcast_signal()`, and `send_signal_to_peer()`.

**Effort:** 30 minutes. **Impact:** Medium — eliminates last lock-during-I/O pattern.

---

## Medium: Magic Numbers Scattered Through engine.rs

```rust
// These appear as literals in multiple places:
tokio::time::sleep(Duration::from_millis(5)) // 6 occurrences
tokio::time::sleep(Duration::from_millis(100)) // 2 occurrences
Duration::from_millis(200) // 2 occurrences (signal timeout)
Duration::from_secs(10) // 1 occurrence (QUIC connect timeout)
Duration::from_secs(2) // 2 occurrences (heartbeat interval)
const DRED_POLL_INTERVAL: u32 = 25; // defined twice (Android + desktop)
vec![0i16; 1920] // 2 occurrences (should use FRAME_SAMPLES_40MS)
```

### Fix

```rust
// Top of engine.rs
const CAPTURE_POLL_MS: u64 = 5;
const RECV_TIMEOUT_MS: u64 = 100;
const SIGNAL_TIMEOUT_MS: u64 = 200;
const CONNECT_TIMEOUT_SECS: u64 = 10;
const HEARTBEAT_INTERVAL_SECS: u64 = 2;
const DRED_POLL_INTERVAL: u32 = 25;
// Already exists: const FRAME_SAMPLES_40MS: usize = 1920;
```

**Effort:** 15 minutes. **Impact:** Low, but prevents bugs from inconsistent values.

---

## Medium: CLI Arg Parsing in Relay main.rs

`parse_args()` in main.rs is 154 lines of manual `while i < args.len()` parsing with `match args[i].as_str()`. Every new flag adds 5-10 lines of boilerplate.

### Suggested Fix

Replace with `clap` derive macro:

```rust
use std::net::SocketAddr;

use clap::Parser;

#[derive(Parser)]
struct RelayArgs {
    #[arg(long, default_value = "0.0.0.0:4433")]
    listen: SocketAddr,
    #[arg(long)]
    remote: Option<String>,
    #[arg(long)]
    auth_url: Option<String>,
    // ...
}

// In main(): let args = RelayArgs::parse();
```

**Effort:** 1 hour. **Impact:** Medium — cleaner, auto-generates `--help`, validates types at parse time.

---

## Medium: Error Handling Inconsistency

13 instances of `.ok()` silently swallowing errors on `transport.close()` across the relay. Federation signal forwarding has inconsistent error handling — some paths log, some don't.

### Fix

```rust
// Helper at top of main.rs/federation.rs:
async fn close_transport(t: &impl MediaTransport, context: &str) {
    if let Err(e) = t.close().await {
        tracing::debug!(context, error = %e, "transport close error (non-fatal)");
    }
}
```

**Effort:** 30 minutes. **Impact:** Better observability when debugging connection issues.

---

## Low: Unused Crypto Fields

`crates/wzp-crypto/src/handshake.rs` has `x25519_static_secret` and `x25519_static_public` fields marked `#[allow(dead_code)]`. These are derived from the identity seed but never used in any handshake flow.

**Decision needed:** Are these intended for a future feature (static key federation auth)? If not, remove. If yes, document the intended use.

**Effort:** 5 minutes to remove, or 10 minutes to document.

---

## Low: 20 Unsafe Functions Missing Safety Docs

`crates/wzp-native/src/lib.rs` has 20 `unsafe` functions (extern "C" FFI bridge to Oboe) without `/// # Safety` documentation. Clippy flags all of them.

**Effort:** 30 minutes. **Impact:** Clippy clean, better documentation for contributors.

---

## Low: quality.rs vs dred_tuner.rs Overlap

Both files deal with network quality → codec decisions, but they're complementary:
- `quality.rs`: discrete tier classification (Good/Degraded/Catastrophic) → codec profile
- `dred_tuner.rs`: continuous DRED frame mapping from loss/RTT/jitter

No consolidation needed, but add cross-references:

```rust
// In dred_tuner.rs:
//! See also: `quality.rs` for discrete tier classification that drives
//! codec switching. DredTuner operates within a tier, adjusting DRED
//! parameters continuously.

// In quality.rs:
//! See also: `dred_tuner.rs` for continuous DRED tuning within a tier.
```

**Effort:** 5 minutes.

---

## Summary: Priority Matrix

| # | Refactor | Effort | Impact | Risk |
|---|----------|--------|--------|------|
| 1 | Extract shared engine.rs helpers | 2-3h | High | Low |
| 2 | Federation tests | 1 day | Critical | None |
| 3 | Federation clone-before-send | 30 min | Medium | Low |
| 4 | Extract magic numbers to constants | 15 min | Low | None |
| 5 | Error handling helpers | 30 min | Medium | None |
| 6 | CLI parser → clap | 1h | Medium | Low |
| 7 | SignalMessage sub-enums | 1 day | High | High (wire compat) |
| 8 | Safety docs on unsafe fns | 30 min | Low | None |
| 9 | Remove/document dead crypto fields | 5 min | Low | None |
| 10 | Cross-reference quality.rs ↔ dred_tuner.rs | 5 min | Low | None |

**Recommended order:** 4 → 3 → 5 → 1 → 2 → 6 → 8 → 9 → 10 → 7

Items 4, 3, and 5 are quick wins (about 75 minutes total). Item 1 is the biggest maintainability win. Item 2 is the most important for safety. Item 7 should wait for a protocol version bump.

261
vault/Architecture/Refactor-Relay-Concurrency.md
Normal file
@@ -0,0 +1,261 @@
---
tags: [architecture, wzp]
type: architecture
---

# Relay Concurrency Refactor Guide

> Post-DashMap analysis: what was done, what remains, and what to do next.

## What Was Done (2026-04-13)

Replaced the global `Arc<Mutex<RoomManager>>` with `DashMap<String, Room>` inside `RoomManager`. The relay's media forwarding hot path no longer serializes through a single lock.

### Before

```
Participant A recv_media()
  → room_mgr.lock().await       ← ALL participants, ALL rooms compete here
  → mgr.observe_quality(...)    ← O(N) quality computation inside lock
  → mgr.others(...)             ← clone Vec<ParticipantSender>
  → drop(lock)
  → fan-out sends
```

One `tokio::sync::Mutex` guarding all rooms, all participants, all quality state. A 100-room relay was effectively single-threaded for media forwarding.

### After

```
Participant A recv_media()
  → room_mgr.observe_quality(...)  ← DashMap::get_mut(), per-shard write lock
  → room_mgr.others(...)           ← DashMap::get(), per-shard read lock
  → fan-out sends                  ← no lock held
```

DashMap splits the map into internal shards (by default four times the available parallelism, rounded up to a power of two, e.g. 64 on a 16-thread host). Rooms on different shards are fully parallel. Rooms on the same shard use RwLock semantics — reads (`others()`) are concurrent, writes (`observe_quality()`, `join()`, `leave()`) are exclusive per-shard only.

### Files Changed

| File | Change |
|------|--------|
| `crates/wzp-relay/Cargo.toml` | Added `dashmap = "6"` |
| `crates/wzp-relay/src/room.rs` | `HashMap<String, Room>` → `DashMap<String, Room>`, per-room quality/tier, all methods `&self` |
| `crates/wzp-relay/src/main.rs` | `Arc<Mutex<RoomManager>>` → `Arc<RoomManager>`, 3 lock sites removed |
| `crates/wzp-relay/src/federation.rs` | 11 lock sites removed, `room_mgr` field type changed |
| `crates/wzp-relay/src/ws.rs` | 3 lock sites removed, `room_mgr` field type changed |

### Measured Improvement

| Metric | Before | After |
|--------|--------|-------|
| Lock type (rooms) | 1 global `tokio::sync::Mutex` | Sharded `DashMap` with per-shard RwLock |
| Cross-room blocking | Yes (all rooms share 1 lock) | No (rooms are independent) |
| Read concurrency within room | None (Mutex is exclusive) | Yes (`get()` is shared) |
| `.lock().await` sites | 20 across 4 files | 0 for room operations |
| Test count | 314 passing | 314 passing (0 regressions) |
---
|
||||
|
||||
## Current Lock Inventory

### Tier 0: Eliminated (Room Hot Path)

These are gone — DashMap handles them internally:

- ~~`room_mgr.lock().await` in media forwarding~~ → `room_mgr.others()` (DashMap shard)
- ~~`room_mgr.lock().await` in quality tracking~~ → `room_mgr.observe_quality()` (DashMap shard)
- ~~`room_mgr.lock().await` in join/leave~~ → `room_mgr.join()` / `.leave()` (DashMap entry)

### Tier 1: Federation `peer_links` (Medium Priority)

**Location:** `crates/wzp-relay/src/federation.rs:142`
```rust
peer_links: Arc<Mutex<HashMap<String, PeerLink>>>
```

**22 lock sites** across federation.rs. The most important:

| Method | Line | Hold Duration | I/O While Locked | Frequency |
|--------|------|---------------|-------------------|-----------|
| `forward_to_peers()` | 406 | 1-5ms (iterate + sync send) | Sync only | Per-packet batch |
| `broadcast_signal()` | 216 | N × send_signal latency | **YES (async)** | Per-signal |
| `handle_datagram()` multi-hop | 1123 | 1-2ms (iterate + sync send) | Sync only | Per-federation-packet |
| `send_signal_to_peer()` | 246 | send_signal latency | **YES (async)** | Per-signal |
| Stale sweeper | 523 | 1-5ms | No | Every 5s |

**Impact:** Only matters with 5+ federation peers or high federation datagram rates (>1000 pps). For 1-3 peers, contention is negligible.

### Tier 2: Control Plane (Low Priority)

These are on the connection setup / signal path, not the media hot path:

| Lock | Location | Frequency |
|------|----------|-----------|
| `session_mgr` | main.rs:450 | Per-connection setup |
| `signal_hub` | main.rs:453 | Per-signal lookup |
| `call_registry` | main.rs:454 | Per-call setup |
| `presence` | main.rs:283 | Per-presence change |
| `ACL` | room.rs:357 | Per-room join |

**Impact:** None. These handle rare events (connection setup, call signaling) and hold locks for <5ms with no I/O inside.

### Tier 3: Forward Mode Pipeline (Niche)

| Lock | Location | Notes |
|------|----------|-------|
| `RelayPipeline` | main.rs:198, 228 | Only used in `--remote` forward mode (relay-to-relay), not SFU room mode |

**Impact:** None for normal operation. Forward mode is a niche deployment.

---

## Suggested Next Refactors (Priority Order)

### 1. Federation `peer_links` Clone-Before-Send

**Effort:** 30 minutes
**Impact:** Eliminates the lock-held-during-iteration pattern in `forward_to_peers()` and `broadcast_signal()`

**Current:**
```rust
pub async fn forward_to_peers(&self, ...) {
    let links = self.peer_links.lock().await; // held for entire loop
    for (_fp, link) in links.iter() {
        link.transport.send_raw_datagram(&tagged); // sync, but lock still held
    }
}
```

**Fix:**
```rust
pub async fn forward_to_peers(&self, ...) {
    let peers: Vec<(String, Arc<QuinnTransport>)> = {
        let links = self.peer_links.lock().await;
        links.values().map(|l| (l.label.clone(), l.transport.clone())).collect()
    }; // lock released here; hold time is only the label/Arc clones (~1μs)

    for (_label, transport) in &peers {
        transport.send_raw_datagram(&tagged); // no lock held
    }
}
```

Same treatment for `broadcast_signal()` (line 216) which currently holds the lock across **async** `send_signal()` calls — this is the worst offender since a slow peer blocks all signal delivery.

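The clone-before-send fix can be checked in isolation with a std-only sketch. It is an illustration under assumptions, not the relay code. `std::sync::Mutex` stands in for the relay's `tokio::sync::Mutex`. `PeerTransport` is a hypothetical stand-in for `QuinnTransport` that only counts datagrams. The guard lives inside the inner block, so it is dropped before any send runs. For `broadcast_signal()`, the same shape applies with `.lock().await` and an awaited send in the loop.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for `QuinnTransport`: counts datagrams instead of sending.
#[derive(Default)]
struct PeerTransport {
    sent: AtomicUsize,
}

impl PeerTransport {
    fn send_raw_datagram(&self, _data: &[u8]) {
        self.sent.fetch_add(1, Ordering::Relaxed);
    }
}

struct PeerLink {
    transport: Arc<PeerTransport>,
}

struct Federation {
    peer_links: Mutex<HashMap<String, PeerLink>>,
}

impl Federation {
    /// Clone-before-send: hold the lock only long enough to clone the Arcs.
    fn forward_to_peers(&self, tagged: &[u8]) -> usize {
        let peers: Vec<Arc<PeerTransport>> = {
            let links = self.peer_links.lock().unwrap();
            links.values().map(|l| Arc::clone(&l.transport)).collect()
        }; // guard dropped here

        // Sends run with no lock held, so join/leave and the sweeper are never
        // blocked behind I/O.
        for transport in &peers {
            transport.send_raw_datagram(tagged);
        }
        peers.len()
    }
}
```

The trade-off is that a peer removed mid-loop may still receive one last datagram through its cloned `Arc`. For fire-and-forget media datagrams that is usually harmless.
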
### 2. Federation `peer_links` → DashMap

**Effort:** 2 hours
**Impact:** Sharded peer map; only peers whose keys hash to the same shard can contend with each other

Only worth doing if:
- Running 10+ federation peers
- `forward_to_peers()` shows up in profiling
- The clone-before-send fix from suggestion 1 is insufficient

```rust
peer_links: DashMap<String, PeerLink>
```

Most lock sites become `self.peer_links.get(&fp)` or `.get_mut(&fp)`. The multi-hop forward loop would use `.iter()` which takes temporary shared locks per shard.

### 3. Quality Tracking Out of Hot Path

**Effort:** 1 day
**Impact:** Reduces per-packet DashMap shard lock from exclusive (`get_mut`) to shared (`get`)

Currently, every packet with a `QualityReport` calls `observe_quality()` which uses `rooms.get_mut()` (exclusive shard lock). This serializes quality-carrying packets within the same DashMap shard.

**Fix:** Use per-participant `AtomicU8` for latest loss/RTT (written lock-free from hot path). A background task (every 1s) reads the atomics, computes tiers via `rooms.get_mut()`, and broadcasts `QualityDirective`. The per-packet hot path becomes purely read-only: `rooms.get()` → `others()`.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

struct ParticipantQualityAtomic {
    latest_loss: AtomicU8, // written per-packet (lock-free)
    latest_rtt: AtomicU8,  // RTT in 4 ms units, written per-packet (lock-free)
}

// Hot path (per-packet):
if let Some(ref qr) = pkt.quality_report {
    participant_quality.latest_loss.store(qr.loss_pct, Ordering::Relaxed);
    participant_quality.latest_rtt.store(qr.rtt_4ms, Ordering::Relaxed);
}
let others = room_mgr.others(&room_name, participant_id); // DashMap::get(): shared lock

// Background task (every 1 second):
for mut room in room_mgr.rooms.iter_mut() { // DashMap::iter_mut(): exclusive per-shard
    let tier_changed = room.recompute_tiers_from_atomics();
    if tier_changed {
        room.broadcast_quality_directive(); // illustrative helper: send QualityDirective
    }
}
```

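Here is a runnable std-only sketch of the atomics half of this design. The `tier` field, the `record`/`recompute_tier` names, and the tier thresholds in `tier_for` are hypothetical, chosen only for illustration. Loss and RTT are read in 4 ms units as in `rtt_4ms`. The hot path does two relaxed stores. The background tick recomputes a tier and reports whether it changed, i.e. whether a `QualityDirective` would need to go out.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

/// Per-participant latest quality sample, written lock-free from the media hot path.
#[derive(Default)]
struct ParticipantQualityAtomic {
    latest_loss: AtomicU8, // loss percentage, 0..=100
    latest_rtt: AtomicU8,  // RTT in 4 ms units (255 = 1020 ms)
    tier: AtomicU8,        // last tier computed by the background task
}

impl ParticipantQualityAtomic {
    /// Hot path: two relaxed stores, no lock.
    fn record(&self, loss_pct: u8, rtt_4ms: u8) {
        self.latest_loss.store(loss_pct, Ordering::Relaxed);
        self.latest_rtt.store(rtt_4ms, Ordering::Relaxed);
    }

    /// Background tick: recompute the tier and report whether it changed.
    fn recompute_tier(&self) -> bool {
        let loss = self.latest_loss.load(Ordering::Relaxed);
        let rtt_ms = u32::from(self.latest_rtt.load(Ordering::Relaxed)) * 4;
        let new_tier = tier_for(loss, rtt_ms);
        self.tier.swap(new_tier, Ordering::Relaxed) != new_tier
    }
}

/// Hypothetical thresholds: 0 = good, 1 = degraded, 2 = poor.
fn tier_for(loss_pct: u8, rtt_ms: u32) -> u8 {
    if loss_pct >= 10 || rtt_ms >= 400 {
        2
    } else if loss_pct >= 3 || rtt_ms >= 200 {
        1
    } else {
        0
    }
}
```

Loss and RTT are stored independently. The background task may therefore pair the loss from one report with the RTT from the next. For a tier recomputed once per second, that skew is usually acceptable.
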
### 4. Lock-Free Participant Snapshot (Future)

**Effort:** 0.5 day
**Impact:** Zero-lock media hot path

Keep `Vec<Participant>` in `Room` as the source of truth and add an `arc-swap` snapshot of the senders:

```rust
struct Room {
    participants: Vec<Participant>,
    sender_snapshot: arc_swap::ArcSwap<Vec<ParticipantSender>>,
}
```

The snapshot is rebuilt on join/leave (rare). The hot path calls `sender_snapshot.load()`, a lock-free atomic pointer read. If each participant task also keeps its own `Arc` handle to the room's snapshot, DashMap is not involved in the per-packet path at all.

Only worth doing if DashMap shard contention becomes measurable in profiling (unlikely for rooms <100 people).

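The copy-on-write idea behind the snapshot can be shown with std alone. In this sketch `RwLock<Arc<Vec<_>>>` stands in for `arc_swap::ArcSwap`. Unlike `ArcSwap::load()`, it takes a brief read lock to clone the `Arc`, so it illustrates the data flow, not the lock-free property. `SenderSnapshot` and this minimal `ParticipantSender` are hypothetical. The key behaviour is that an in-flight fan-out keeps iterating its old snapshot while join/leave publishes a new one.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical minimal send handle (stands in for `ParticipantSender`).
#[derive(Clone, Debug, PartialEq)]
struct ParticipantSender {
    id: u32,
}

/// Copy-on-write snapshot; `arc_swap::ArcSwap` would make `load()` lock-free.
struct SenderSnapshot {
    current: RwLock<Arc<Vec<ParticipantSender>>>,
}

impl SenderSnapshot {
    fn new() -> Self {
        Self { current: RwLock::new(Arc::new(Vec::new())) }
    }

    /// Hot path: grab the current snapshot; iteration then happens with no lock held.
    fn load(&self) -> Arc<Vec<ParticipantSender>> {
        let guard = self.current.read().unwrap();
        Arc::clone(&*guard)
    }

    /// Join/leave (rare): build a fresh Vec and publish it in one pointer swap.
    fn rebuild(&self, senders: Vec<ParticipantSender>) {
        *self.current.write().unwrap() = Arc::new(senders);
    }
}
```
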
---

## Decision Matrix

| Scenario | Current (DashMap) | + Clone-Before-Send | + Quality Atomics | + arc-swap |
|----------|-------------------|---------------------|-------------------|-----------|
| 10 rooms × 5 people | Saturates all cores | Same | Same | Same |
| 1 room × 100 people | Good (shared read) | Same | Better (no exclusive) | Best |
| 5 federation peers | 1-5ms contention | <1μs contention | Same | Same |
| 20 federation peers | 10-20ms contention | <1μs contention | Same | Same |
| 1000 rooms × 3 people | Excellent | Same | Same | Same |

**Recommendation:** Do suggestion 1 (clone-before-send, 30 min) now. Everything else is future optimization that current workloads don't need.

---

## Concurrency Diagram (Current State)

```
                      ┌─────────────────────────────────┐
                      │      tokio multi-threaded       │
                      │      work-stealing runtime      │
                      └────────────────┬────────────────┘
                                       │
       ┌───────────────────────────────┼───────────────────────────────┐
       │                               │                               │
┌──────▼──────┐                ┌───────▼───────┐               ┌───────▼───────┐
│ QUIC Accept │                │  Federation   │               │  Signal Hub   │
│  (per-conn  │                │   (per-peer   │               │  (per-client  │
│    task)    │                │     task)     │               │     task)     │
└──────┬──────┘                └───────┬───────┘               └───────┬───────┘
       │                               │                               │
┌──────▼──────┐                ┌───────▼───────┐               ┌───────▼───────┐
│  Per-Room   │                │  peer_links   │               │  signal_hub   │
│   DashMap   │◄──64 shards    │     Mutex     │◄──1 lock      │     Mutex     │
│ (media hot  │                │  (federation  │               │    (signal    │
│    path)    │                │   hot path)   │               │    plane)     │
└─────────────┘                └───────────────┘               └───────────────┘
       │                                                               │
 No cross-room                                                   Low frequency
   blocking                                                      (<1 call/sec)
```

## Files Reference

| File | Lines | Role |
|------|-------|------|
| `crates/wzp-relay/src/room.rs` | ~1275 | DashMap room storage, participant management, quality tracking, media forwarding loops |
| `crates/wzp-relay/src/federation.rs` | ~1152 | Peer link management, federation media egress/ingress, signal forwarding |
| `crates/wzp-relay/src/main.rs` | ~1746 | Connection accept, handshake dispatch, signal handling, room/federation wiring |
| `crates/wzp-relay/src/ws.rs` | ~250 | WebSocket bridge, room integration |
| `crates/wzp-relay/src/metrics.rs` | ~200 | Prometheus counters (lock-free atomics) |
| `crates/wzp-relay/src/trunk.rs` | ~150 | TrunkBatcher (per-instance, no shared state) |
290
vault/Architecture/Road-To-Video.md
Normal file
@@ -0,0 +1,290 @@
---
tags: [architecture, wzp]
type: architecture
---

# Road to Video

> Plan for adding video to WZP. Audio remains unchanged through Phase V1; video is additive. See `PROTOCOL-AUDIT.md` for the issues this plan addresses.

## Premise

The transport, crypto, session, federation, and SFU layers are codec-agnostic. The work is concentrated in:

1. Wire format (CodecID width, MediaType, MiniHeader seq, simulcast hooks)
2. Framer / depacketizer (NAL fragmentation, access-unit reassembly)
3. Bandwidth estimator (Quinn cwnd + transport feedback)
4. Keyframe semantics (PLI, NACK, keyframe cache at SFU)
5. Capture / encode pipeline (VideoToolbox / MediaCodec / NVENC)

## Implementation Status (as of 2026-05-25)

| Phase | Description | Status |
|---|---|---|
| V1 — Wire format | 16B MediaHeader v2, 5B MiniHeader v2, MediaType, u32 seq, 8-bit CodecID | ✅ Complete (T1.x) |
| V2 — Transport additions | BWE, NACK loop, TransportFeedback, dynamic FEC boost on I-frames | 🔲 Not started |
| V3 — `wzp-video` crate | H.264 baseline framer/depacketizer; VideoToolbox/MediaCodec codecs; dav1d decoder | ✅ Substantially complete (T4.x, T5.x, T6.x) |
| V3 — H.264 Baseline | Single-layer H.264 | ✅ Complete |
| V3 — H.265 | VideoToolbox + MediaCodec H.265 | ✅ Complete (T5.x) |
| V3 — AV1 | dav1d + SVT-AV1 (non-Android), VideoToolbox AV1 (macOS M3+) | ✅ Complete; Android MediaCodec AV1 compile errors pending (T4.3.1.1) |
| V3 — Android MediaCodec | NDK 0.9 API migration for `mediacodec.rs` | 🔴 Blocked (31 compile errors) |
| V3 — Call engine wiring | `create_video_encoder()` integrated into active call negotiation | 🔴 Not started (T6.1.2 follow-up) |
| V4 — Keyframe & loss policy | NACK path, PLI, keyframe cache at SFU | 🟡 Framework present (`nack.rs`); not wired |
| V5 — Video adaptive controller | `VideoQualityController` + `PriorityMode` | 🟡 Controller built (`controller.rs`); not wired into call |
| V5 — Simulcast | Simulcast layer management | 🟡 `simulcast.rs` present; not wired |
| V6 — SFU changes | Keyframe cache, per-receiver layer selection, PLI suppression | 🟡 PLI suppression wired; keyframe cache + layer selection not started |
| V6 — Video scorer | `VideoScorer` legitimacy detection | 🟡 Built (`video_scorer.rs`); `observe()` not wired into room forwarding |
| V7 — Capture pipeline | Camera capture (AVCaptureSession, Camera2, NVENC) | 🔲 Not started |

**Legend:** ✅ Complete · 🟡 Partial/Framework only · 🔴 Blocked · 🔲 Not started

### Critical path to first video call

1. Fix Android MediaCodec compile errors (T4.3.1.1) — ~2h
2. Wire `create_video_encoder()` into call engine codec negotiation (T6.1.2) — ~2h
3. Fix crypto nonce bug (`decrypt()` must use `MediaHeader.seq`) — see `AUDIT-2026-05-25.md` C1 — ~1h
4. Wire `VideoScorer::observe()` into relay room forwarding (T6.2 follow-up) — ~2h
5. Implement Phase V2 BWE (mandatory for usable video) — ~3–4 days
6. Implement capture pipeline for at least one platform (V7) — ~1 week

## Phase V1 — Wire format & negotiation (no new code paths yet)

Bump protocol version. Land all wire changes together so compat breaks exactly once.

### Sizing decision (2026-05-11)

Modeled overhead for a 12 B packed header vs a 16 B byte-aligned header shows the delta is negligible in every realistic scenario (at most ~1.3 %, for the lowest-bitrate audio codec):

| Scenario | Δ overhead (12 B → 16 B) | Δ % of stream |
|---|---|---|
| Opus 24k audio (MiniHeader 49/50) | 4 B/s | 0.13 % |
| Codec2 1200 audio | 2 B/s | 1.3 % |
| H.264 SD 500 kbps video | 1.6 kbps | 0.32 % |
| H.264 HD 2.5 Mbps video | 7.1 kbps | 0.28 % |
| H.264 FHD 5 Mbps video | 14.1 kbps | 0.28 % |

Trunking cap (10) binds before MTU for audio, so TrunkFrame layout is unaffected. ChaCha20-Poly1305 cost is dominated by AEAD setup, not byte count; 4 extra bytes per packet is < 0.1 % of AEAD CPU on Cortex-A55.

**Decision: 16 B byte-aligned.** Bit-packing saves nothing material and costs recurring debug / fuzzer / evolution complexity. Reserves headroom for the next decade.

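The Δ columns can be re-derived with a little arithmetic. This sketch makes several assumptions, inferred from the B/s figures rather than stated in the source:

- Opus sends 20 ms packets (50 pkt/s).
- Codec2 1200 sends 40 ms packets (25 pkt/s).
- A full header rides on 1 of every 50 audio packets (the 49/50 MiniHeader ratio).
- Every video packet carries a full header.

```rust
/// Extra bytes per second from widening the full header by 4 B (12 B to 16 B),
/// when only one in `full_header_every` packets carries a full header.
fn extra_bytes_per_sec(packets_per_sec: f64, full_header_every: f64) -> f64 {
    const EXTRA_BYTES: f64 = 4.0; // 16 B - 12 B
    EXTRA_BYTES * packets_per_sec / full_header_every
}

/// Overhead as a percentage of the nominal stream bitrate.
fn pct_of_stream(extra_bytes_per_sec: f64, stream_bps: f64) -> f64 {
    extra_bytes_per_sec * 8.0 / stream_bps * 100.0
}
```

For Opus 24k, this gives 4 B/s = 32 bit/s, which is 32 / 24 000 ≈ 0.13 % of the stream. For Codec2 1200, 2 B/s = 16 bit/s, which is 16 / 1200 ≈ 1.3 %.
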
### `MediaHeader` v2 (16 B byte-aligned)

```
Byte 0:      version (u8)       currently 0x02
Byte 1:      flags (u8)         [T:1][Q:1][KeyFrame:1][FrameEnd:1][reserved:4]
                                  T        = FEC repair
                                  Q        = QualityReport trailer present
                                  KeyFrame = packet belongs to an I-frame (video)
                                  FrameEnd = last packet of an access unit (video)
Byte 2:      media_type (u8)    0=audio, 1=video, 2=data, 3=control
Byte 3:      codec_id (u8)      widened from 4-bit (room for 256)
Byte 4:      stream_id (u8)     simulcast layer; 0=base
Byte 5:      fec_ratio (u8)     0..200 → 0.0..2.0
Bytes 6-9:   sequence (u32 BE)
Bytes 10-13: timestamp_ms (u32 BE)
Bytes 14-15: fec_block_id (u16 BE)
                                  audio: low 8 bits block_id, high 8 bits symbol_idx
                                  video: full u16 block_id (large blocks for I-frames)
```

- `version=2` is a hard switch — old clients receive a typed `Hangup::ProtocolVersionMismatch`.
- `media_type` (W10) lets the SFU drop video first under load without a codec lookup.
- `KeyFrame` lets a joining peer fast-forward to the next I-frame; SFU keyframe cache keys on it.
- `FrameEnd` lets the depacketizer fire an access unit without counting packets.
- `stream_id` is forward-compatible for simulcast (Phase V5).
- `sequence` widened to u32 (W1) — also benefits audio.

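Here is a sketch of encoding and decoding this 16-byte layout. One assumption: the four flag bits are assigned MSB-first in the order drawn (`T` = 0x80, `Q` = 0x40, `KeyFrame` = 0x20, `FrameEnd` = 0x10). The authoritative masks live in `crates/wzp-proto/src/packet.rs`. `MediaHeaderV2` is an illustrative name. Decoding rejects short buffers and any version other than 2, mirroring the hard version switch.

```rust
// Assumed flag masks (MSB-first); check packet.rs for the real values.
const FLAG_FEC_REPAIR: u8 = 0x80;
const FLAG_QUALITY_REPORT: u8 = 0x40;
const FLAG_KEYFRAME: u8 = 0x20;
const FLAG_FRAME_END: u8 = 0x10;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct MediaHeaderV2 {
    flags: u8,
    media_type: u8, // 0=audio, 1=video, 2=data, 3=control
    codec_id: u8,
    stream_id: u8,
    fec_ratio: u8, // 0..=200 maps to 0.0..=2.0
    sequence: u32,
    timestamp_ms: u32,
    fec_block_id: u16,
}

impl MediaHeaderV2 {
    const LEN: usize = 16;
    const VERSION: u8 = 0x02;

    fn encode(&self) -> [u8; 16] {
        let mut b = [0u8; 16];
        b[0] = Self::VERSION;
        b[1] = self.flags;
        b[2] = self.media_type;
        b[3] = self.codec_id;
        b[4] = self.stream_id;
        b[5] = self.fec_ratio;
        b[6..10].copy_from_slice(&self.sequence.to_be_bytes());
        b[10..14].copy_from_slice(&self.timestamp_ms.to_be_bytes());
        b[14..16].copy_from_slice(&self.fec_block_id.to_be_bytes());
        b
    }

    /// Returns None for short buffers or any version other than 2.
    fn decode(b: &[u8]) -> Option<Self> {
        if b.len() < Self::LEN || b[0] != Self::VERSION {
            return None;
        }
        Some(Self {
            flags: b[1],
            media_type: b[2],
            codec_id: b[3],
            stream_id: b[4],
            fec_ratio: b[5],
            sequence: u32::from_be_bytes(b[6..10].try_into().ok()?),
            timestamp_ms: u32::from_be_bytes(b[10..14].try_into().ok()?),
            fec_block_id: u16::from_be_bytes(b[14..16].try_into().ok()?),
        })
    }

    fn is_keyframe(&self) -> bool {
        self.flags & FLAG_KEYFRAME != 0
    }
}
```

Every multi-byte field sits at a fixed byte offset, so no shifting or masking is needed beyond the flags byte. That is the debugging and fuzzing advantage the sizing decision cites.
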
### `MiniHeader` v2 (5 B)

```
[FRAME_TYPE_MINI = 0x01]
Byte 0:     seq_delta (u8)                ← new (W4)
Bytes 1-2:  timestamp_delta_ms (u16 BE)
Bytes 3-4:  payload_len (u16 BE)
```

Audio-only in V1. Video pays the full 16 B header per packet (every frame is a new access unit; no clean periodic structure to compress).

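The MiniHeader can be sketched the same way, including the `FRAME_TYPE_MINI` tag byte (6 bytes total on the wire). The source does not say what `seq_delta` and `timestamp_delta_ms` are relative to. This sketch assumes they are deltas from the last full `MediaHeader`, and that name is hypothetical. Under that assumption, 49 MiniHeaders at 20 ms stay well inside `u8` and `u16` range. Reconstruction uses wrapping arithmetic so the u32 sequence survives rollover.

```rust
/// Frame-type tag that precedes a MiniHeader on the wire.
const FRAME_TYPE_MINI: u8 = 0x01;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct MiniHeaderV2 {
    seq_delta: u8,
    timestamp_delta_ms: u16,
    payload_len: u16,
}

impl MiniHeaderV2 {
    /// Tag byte + 5-byte MiniHeader.
    fn encode(&self) -> [u8; 6] {
        let [t0, t1] = self.timestamp_delta_ms.to_be_bytes();
        let [l0, l1] = self.payload_len.to_be_bytes();
        [FRAME_TYPE_MINI, self.seq_delta, t0, t1, l0, l1]
    }

    fn decode(b: &[u8]) -> Option<Self> {
        match b {
            [FRAME_TYPE_MINI, seq, t0, t1, l0, l1, ..] => Some(Self {
                seq_delta: *seq,
                timestamp_delta_ms: u16::from_be_bytes([*t0, *t1]),
                payload_len: u16::from_be_bytes([*l0, *l1]),
            }),
            _ => None,
        }
    }

    /// Assumption: deltas are relative to the last full MediaHeader (the baseline).
    fn expand(&self, base_seq: u32, base_ts_ms: u32) -> (u32, u32) {
        (
            base_seq.wrapping_add(u32::from(self.seq_delta)),
            base_ts_ms.wrapping_add(u32::from(self.timestamp_delta_ms)),
        )
    }
}
```
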
### New codec IDs

| ID | Codec | Notes |
|---|---|---|
| 9 | H.264 baseline | Universal HW encode coverage; ship first |
| 10 | H.264 main | Slight quality win over baseline; same HW |
| 11 | H.265 main | Apple A10+ universal, Snapdragon since ~2017, NVENC GTX 9xx+; ~30 % win vs H.264 |
| 12 | AV1 | Apple M3/A17+, Snapdragon 8 Gen 3+, RTX 40+, Arc, RX 7000+; best efficiency, narrow HW |
| 13 | VP9 | Reserved; may not implement |

Negotiation: `CallOffer.supported_codecs: Vec<CodecId>`. Both sides pick the highest mutually supported codec from preference cascade `[AV1, H.265, H.264 main, H.264 baseline]`.

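Because the preference cascade is fixed, both sides reach the same answer independently from the two `supported_codecs` lists, with no extra round trip. A minimal sketch follows. `VideoCodec` is an illustrative local enum using the IDs from the table above, and the reserved VP9 ID is omitted.

```rust
/// Video codec IDs from the table above (VP9 = 13 is reserved and omitted).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
enum VideoCodec {
    H264Baseline = 9,
    H264Main = 10,
    H265Main = 11,
    Av1 = 12,
}

/// Preference cascade, best first.
const PREFERENCE: [VideoCodec; 4] = [
    VideoCodec::Av1,
    VideoCodec::H265Main,
    VideoCodec::H264Main,
    VideoCodec::H264Baseline,
];

/// Pick the most preferred codec that both sides list in `supported_codecs`.
fn negotiate(local: &[VideoCodec], remote: &[VideoCodec]) -> Option<VideoCodec> {
    PREFERENCE
        .iter()
        .copied()
        .find(|c| local.contains(c) && remote.contains(c))
}
```

`negotiate` returns `None` when the lists do not overlap. How the call proceeds in that case is not specified here.
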
### `QualityProfile` extension

Add:
- `video_bitrate_kbps: Option<u32>`
- `video_resolution: Option<(u16, u16)>`
- `video_fps: Option<u8>`
- `priority_mode: PriorityMode` (see Phase V5)

`CallOffer` / `CallAnswer` already negotiate profiles — slot video into the same path.

### Acceptance
- All 571 audio tests pass with `V=2` headers.
- Old v1 clients refused gracefully (clear error in `CallAnswer`).

## Phase V2 — Transport additions

**Decision (2026-05-11): all media on QUIC datagrams; no separate "reliable media" stream.**

A QUIC stream for I-frames was considered and rejected. A 200 KB I-frame on a 1 Mbps mobile link takes ~1.6 s to transit a stream, and the next I-frame queues behind it (HoL blocking by design). Datagrams + NACK + dynamic per-keyframe FEC degrade more gracefully on the lossy links we care about.

1. **All media on datagrams.** Uniform wire format; no HoL.
2. **NACK loop for video P-frames.** When `RTT < 2 × frame_interval`, receiver NACKs missing P-frame packets via `SignalMessage::Nack { stream_id, seqs }`. Otherwise (high RTT) skip NACK and request a keyframe via `PictureLossIndication`.
3. **Dynamic FEC boost on I-frames.** Encoder bumps `fec_ratio` to ~0.5 for keyframe packets (k=20 source → r=10 repair). Recovers most I-frame loss without a round trip.
4. **SPS/PPS / parameter sets on the existing signal stream.** Reliable, ordered, one-time at session start. Re-sent on codec switch. No new stream needed.
5. **`SignalMessage::TransportFeedback`** — `{ acked_seqs: Vec<u32>, nacked_seqs: Vec<u32>, remb_bps: u32, recv_time_us: u64 }`. Sent every 50 ms or every N packets, whichever first. Feeds BWE.
6. **`BandwidthEstimator` in `wzp-proto`** — consumes Quinn `cwnd`, `bytes_in_flight`, plus `TransportFeedback`. Output: `target_send_bps = min(cwnd_bps * 0.9, remb_bps)`.

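Two pieces of the list above reduce to small formulas: the BWE output (item 6) and the repair count implied by `fec_ratio` (item 3). This sketch assumes `cwnd_bps` is derived from a byte-denominated congestion window and the current RTT as `cwnd × 8 / RTT`. The source does not define `cwnd_bps`, and the function names are illustrative. Repair counts round up, so any nonzero ratio yields at least one repair packet.

```rust
/// Rate allowed by the congestion window. Assumption: cwnd is in bytes and is
/// converted to a rate with the current RTT (cwnd * 8 bits / RTT).
fn cwnd_bps(cwnd_bytes: u64, rtt_ms: u64) -> u64 {
    cwnd_bytes * 8 * 1000 / rtt_ms.max(1)
}

/// target_send_bps = min(cwnd_bps * 0.9, remb_bps), in integer arithmetic.
fn target_send_bps(cwnd_bytes: u64, rtt_ms: u64, remb_bps: u64) -> u64 {
    let cwnd_limit = cwnd_bps(cwnd_bytes, rtt_ms) * 9 / 10;
    cwnd_limit.min(remb_bps)
}

/// Repair packets for `k` source packets at the wire `fec_ratio` byte
/// (0..=200 maps to 0.0..=2.0), rounded up.
fn repair_count(k: u32, fec_ratio: u8) -> u32 {
    (k * u32::from(fec_ratio.min(200)) + 99) / 100
}
```

For example, a 62 500 B window at 100 ms RTT allows 5 Mbps, so the cwnd term is 4.5 Mbps, and a receiver REMB of 3 Mbps would win. A keyframe block of k = 20 at `fec_ratio` byte 50 (0.5) gets r = 10 repair packets.
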
### Acceptance
- Audio adapts to bandwidth (not just loss/RTT); fewer oscillations between 24 k and 32 k Opus on stable links.
- BWE output is on Prometheus.
- NACK round-trip recovery verified under 1–5 % packet loss at RTT ≤ 100 ms.

## Phase V3 — `wzp-video` crate

New crate parallel to `wzp-codec`:

```
wzp-video/
  src/
    encoder.rs       # trait VideoEncoder; VideoToolboxEncoder, MediaCodecEncoder,
                     # OpenH264Encoder fallback
    decoder.rs       # trait VideoDecoder
    framer.rs        # NAL unit fragmentation to MTU-sized chunks
                     # (simpler than RFC 6184 FU-A — we own both ends)
    depacketizer.rs  # Reassemble NALs, emit access units
    keyframe.rs      # Keyframe request handling
```

Framing rules:
- One access unit → N packets, each ≤ MTU − 16 (MediaHeader v2) − 16 (AEAD tag).
- `sequence` global per stream; `timestamp_ms` is presentation time.
- `KeyFrame` bit set on every packet of an I-frame.
- Last packet of frame: `FrameEnd` flag set (a dedicated bit in the v2 `flags` byte).

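Here is a sketch of the framing rules plus the matching reassembly. It assumes a 1200-byte datagram budget; the real limit comes from the QUIC path MTU. It also assumes the receiver knows the access unit's first sequence number, namely the one after the previous `FrameEnd`. `VideoPacket`, `packetize`, and `depacketize` are illustrative names, not the `wzp-video` API. There are no FU-A style payload headers: the `KeyFrame`/`FrameEnd` flags and the sequence numbers carry all the framing.

```rust
// Assumed sizes: 1200 B datagram budget, 16 B MediaHeader v2, 16 B AEAD tag.
const MTU: usize = 1200;
const MEDIA_HEADER_LEN: usize = 16;
const AEAD_TAG_LEN: usize = 16;
const MAX_CHUNK: usize = MTU - MEDIA_HEADER_LEN - AEAD_TAG_LEN;

#[derive(Debug, Clone, PartialEq)]
struct VideoPacket {
    sequence: u32,
    timestamp_ms: u32,
    keyframe: bool,  // set on every packet of an I-frame
    frame_end: bool, // set only on the last packet of the access unit
    payload: Vec<u8>,
}

/// Split one access unit into MTU-sized packets; `next_seq` is the per-stream counter.
fn packetize(au: &[u8], ts_ms: u32, keyframe: bool, next_seq: &mut u32) -> Vec<VideoPacket> {
    let chunks: Vec<&[u8]> = au.chunks(MAX_CHUNK).collect();
    let last = chunks.len().saturating_sub(1);
    chunks
        .into_iter()
        .enumerate()
        .map(|(i, chunk)| {
            let sequence = *next_seq;
            *next_seq = next_seq.wrapping_add(1);
            VideoPacket { sequence, timestamp_ms: ts_ms, keyframe, frame_end: i == last, payload: chunk.to_vec() }
        })
        .collect()
}

/// Reassemble once FrameEnd is present and no sequence number is missing.
fn depacketize(first_seq: u32, mut pkts: Vec<VideoPacket>) -> Option<Vec<u8>> {
    // Order relative to the AU start, so u32 wraparound is handled.
    pkts.sort_by_key(|p| p.sequence.wrapping_sub(first_seq));
    let complete = pkts.last()?.frame_end
        && pkts.iter().enumerate().all(|(i, p)| p.sequence == first_seq.wrapping_add(i as u32));
    complete.then(|| pkts.iter().flat_map(|p| p.payload.iter().copied()).collect())
}
```

Any gap makes `depacketize` return `None`. That is the point where the NACK-or-PLI policy takes over.
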
Platform encoders:
- macOS / iOS: VideoToolbox
- Android: MediaCodec (surface texture path, no CPU copy)
- Windows: MediaFoundation → NVENC / QSV / AMF
- Linux: VAAPI / NVENC; OpenH264 software fallback

### Acceptance
- Unidirectional H.264 call working between two desktop clients.
- CPU usage on M1 < 5 % at 720p30; on Android mid-tier < 15 %.

## Phase V4 — Keyframe & loss policy

- On packet loss inside a P-frame: NACK if RTT < 2× frame interval, otherwise request keyframe via `SignalMessage::PictureLossIndication { stream_id }`.
- Joining peer: relay sends most recent keyframe from its cache.
- Tier downgrade: drop to lower simulcast layer, request keyframe for the new layer.

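The P-frame loss rule is a single comparison. When the RTT is short enough, a retransmission arrives within two frame intervals and is cheaper than a new keyframe. When it is not, the frame is too late to be useful, so the receiver asks for a keyframe instead. This is a minimal sketch. `LossAction` and `on_pframe_loss` are illustrative names, and the frame interval is derived from an assumed integer fps.

```rust
#[derive(Debug, PartialEq, Eq)]
enum LossAction {
    /// Retransmit the listed P-frame packets.
    Nack(Vec<u32>),
    /// Ask the sender for a fresh keyframe on this stream.
    PictureLossIndication { stream_id: u8 },
}

/// NACK if RTT < 2 x frame interval, otherwise PLI.
fn on_pframe_loss(rtt_ms: u32, fps: u32, stream_id: u8, missing: Vec<u32>) -> LossAction {
    let frame_interval_ms = 1000 / fps.max(1);
    if rtt_ms < 2 * frame_interval_ms {
        LossAction::Nack(missing)
    } else {
        LossAction::PictureLossIndication { stream_id }
    }
}
```

At 30 fps the threshold is 66 ms; at 15 fps it is 132 ms. Lower frame rates therefore tolerate longer RTTs before falling back to PLI.
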
### Acceptance
- Black-screen-on-join < 200 ms when keyframe cache is warm.
- < 1 keyframe / 2 s on stable links; bursty on lossy links.

## Phase V5 — Video adaptive controller + PriorityMode

### `PriorityMode` on `QualityProfile`

```rust
pub enum PriorityMode {
    AudioFirst,  // default for calls: audio absolute priority, video elastic
    VideoFirst,  // user override: video priority, audio degrades first
    ScreenShare, // video + slide-fallback; audio = intelligible speech only
    Balanced,    // proportional split, no absolute priority
}
```

Selected at call setup. Mutable mid-call via `SignalMessage::SetPriorityMode { mode }`. Defaults to `AudioFirst` for voice/video calls; presentation apps set `ScreenShare`; users can override to `VideoFirst` from settings.

### `VideoQualityController`

```
inputs:  bwe_bps, loss_pct, rtt_ms, encoder_queue_ms, priority_mode
outputs: target_bitrate, target_fps, target_resolution, simulcast_layer

allocation gate (per PriorityMode):

  AudioFirst:
    audio_budget = max(24 kbps, audio_tier_min)
    video_budget = bwe_bps - audio_budget
    Under congestion: video → 0 before audio degrades.

  VideoFirst:
    video_budget = max(video_floor, target_video_kbps)
    audio_budget = bwe_bps - video_budget
    Audio degrades first to Opus 16 k; video held at floor.

  ScreenShare:
    video_budget = bwe_bps - 16 kbps   // audio gets just Opus 16 k floor
    If video_budget < SD floor: switch encoder to slide mode
      (single high-quality I-frame every 2-5s instead of continuous video).
    Audio floor in this mode is Opus 16 k (speech only, no music).

  Balanced:
    audio_budget = bwe_bps * 0.15
    video_budget = bwe_bps * 0.85
    Both degrade proportionally.
```

Slide mode in `ScreenShare` is an encoder policy on the existing `wzp-video` framer (lower fps, higher per-frame quality, prefer HEVC/AV1 for text). No wire format change.

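The allocation gate translates almost line for line into code. This sketch is a literal transcription under assumptions. All rates are in bps, and subtraction saturates at 0 instead of going negative. The `Limits` fields (`audio_tier_min`, `video_floor`, `target_video`, `sd_floor`) are hypothetical knobs standing in for values that would come from `QualityProfile`.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PriorityMode {
    AudioFirst,
    VideoFirst,
    ScreenShare,
    Balanced,
}

/// Hypothetical tuning knobs, all in bps.
struct Limits {
    audio_tier_min: u64,
    video_floor: u64,
    target_video: u64,
    sd_floor: u64,
}

#[derive(Debug, PartialEq, Eq)]
struct Budget {
    audio_bps: u64,
    video_bps: u64,
    slide_mode: bool,
}

/// Literal transcription of the allocation gate; subtraction saturates at 0.
fn allocate(mode: PriorityMode, bwe_bps: u64, lim: &Limits) -> Budget {
    match mode {
        PriorityMode::AudioFirst => {
            let audio = lim.audio_tier_min.max(24_000);
            Budget { audio_bps: audio, video_bps: bwe_bps.saturating_sub(audio), slide_mode: false }
        }
        PriorityMode::VideoFirst => {
            let video = lim.video_floor.max(lim.target_video);
            Budget { audio_bps: bwe_bps.saturating_sub(video), video_bps: video, slide_mode: false }
        }
        PriorityMode::ScreenShare => {
            let video = bwe_bps.saturating_sub(16_000); // audio pinned at the Opus 16 k floor
            Budget { audio_bps: 16_000, video_bps: video, slide_mode: video < lim.sd_floor }
        }
        PriorityMode::Balanced => Budget {
            audio_bps: bwe_bps * 15 / 100,
            video_bps: bwe_bps * 85 / 100,
            slide_mode: false,
        },
    }
}
```

Taken literally, the `AudioFirst` gate leaves 76 kbps for video on a 100 kbps link. For video to drop to 0 there, as the acceptance criterion below expects, there must be an additional minimum-useful-video threshold, which this sketch does not model.
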
### Acceptance
- On a 100 kbps link in `AudioFirst`, audio stays at Opus 24 k and video drops to 0.
- On a 100 kbps link in `ScreenShare`, slide mode emits one I-frame every 3 s and audio holds Opus 16 k.
- On a 5 Mbps link, video ramps to top simulcast layer within 10 s.
- `SetPriorityMode` mid-call is honored within 1 s.

## Phase V6 — SFU changes

- **Per-room keyframe cache.** Latest I-frame per `(sender, stream_id)`. Sent to new joiners immediately. Eliminates "black screen for 2 seconds" on join.
- **Per-receiver layer selection.** Sender uploads ~3 simulcast layers; relay decides which to forward to each receiver based on their last `QualityReport`. Critical for N > 3 rooms.
- **PLI suppression.** If 10 receivers PLI within 200 ms, send one `KeyframeRequest` upstream, not 10.

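PLI suppression amounts to a per-`(sender, stream_id)` time window. This sketch forwards the first PLI in each 200 ms window and drops the rest. `PliSuppressor` is a hypothetical name, and time is passed in as milliseconds so the logic can be tested deterministically. A real relay would read a monotonic clock instead.

```rust
use std::collections::HashMap;

/// Collapses a burst of PLIs for one (sender, stream_id) into a single upstream
/// keyframe request per window.
struct PliSuppressor {
    window_ms: u64,
    last_forwarded: HashMap<(u32, u8), u64>,
}

impl PliSuppressor {
    fn new(window_ms: u64) -> Self {
        Self { window_ms, last_forwarded: HashMap::new() }
    }

    /// Returns true if this PLI should go upstream as a keyframe request.
    fn on_pli(&mut self, sender: u32, stream_id: u8, now_ms: u64) -> bool {
        let key = (sender, stream_id);
        let suppressed = self
            .last_forwarded
            .get(&key)
            .map_or(false, |&t| now_ms.saturating_sub(t) < self.window_ms);
        if !suppressed {
            self.last_forwarded.insert(key, now_ms);
        }
        !suppressed
    }
}
```

Keying on `stream_id` as well as the sender means a PLI for one simulcast layer never suppresses a request for another.
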
### Acceptance
- 8-peer room with mixed link quality; high-quality peers see HD, low-quality peers see SD, no peer holds the room back.
- PLI traffic at SFU upstream < 1 / s under simulated mass packet loss.

## Phase V7 — Capture pipeline (platform-specific)

- macOS: `AVCaptureSession` → VideoToolbox → `wzp-video`. Wire into Tauri backend.
- Android: Camera2 → MediaCodec → JNI bridge into `wzp-native` or sibling cdylib. Surface texture path.
- Desktop Tauri (Windows): MediaFoundation → NVENC.

### Acceptance
- Camera permission flows on all platforms.
- < 50 ms end-to-end capture-to-encode latency on M1.

## Deferred

- **SVC** (per-layer temporal scalability in one bitstream). Simulcast (separate streams per layer) is enough for v1; wire format already supports it via `StreamId`.
- **Screen sharing.** Same codec path with a different capture source.
- **Group video keys.** Existing X25519 session key works; no protocol change needed.

## Suggested order of work

| Step | Effort | Output |
|---|---|---|
| 1. Wire format v2: 16 B MediaHeader, 5 B MiniHeader, MediaType, KeyFrame, FrameEnd, u32 seq, 8-bit CodecID | ~1 day | Audio still works under new header layout |
| 2. TransportFeedback + BandwidthEstimator (Quinn cwnd + remb) | 3–4 days | Audio adaptation improves; BWE on Prom |
| 3. `wzp-video` crate, H.264 baseline single-layer | 1–2 weeks | Unidirectional video call works |
| 4. NACK path + dynamic FEC boost on I-frames | 4–5 days | Loss recovery for video |
| 5. Keyframe cache at SFU + PLI suppression | 1 week | Fast join, low PLI traffic |
| 6. H.265 codec support (reuse framer) | 3 days | ~30 % quality win on Apple HW |
| 7. Simulcast + per-receiver layer selection | 1 week | Mixed-quality rooms work |
| 8. `VideoQualityController` + PriorityMode (incl. ScreenShare slide mode) | 1 week | Graceful degradation under congestion, user choice |
| 9. AV1 codec (gated on HW telemetry) | 4–5 days | Top-tier efficiency on capable devices |
| 10. Native capture pipelines (VideoToolbox / MediaCodec / NVENC) | 2 weeks | Production camera support per OS |

Step 1 is the lowest-regret, highest-leverage change and unlocks everything else.

Steps 3 + 6 + 9 form the codec rollout: ship H.264 first (works everywhere → unblocks integration testing on every device), add H.265 once framer is stable (low-effort, big Apple win), gate AV1 on real device telemetry. By 2028 we should be in a position to deprecate H.264 if telemetry says < 5 % of sessions still need it.
262
vault/Architecture/WS-Relay-Spec.md
Normal file
@@ -0,0 +1,262 @@
---
tags: [architecture, wzp]
type: architecture
---

# WS Support in wzp-relay — Implementation Spec

## Goal

Add a WebSocket listener to `wzp-relay` so browsers connect directly, eliminating the `wzp-web` bridge.

```
Before: Browser → WS → wzp-web → QUIC → wzp-relay
After:  Browser → WS → wzp-relay (handles both WS + QUIC)
```

## Architecture

```
wzp-relay
├── QUIC listener (:4433)   — native clients, inter-relay
├── WS listener (:8080)     — browsers via Caddy
│   ├── GET /ws/{room}      — WebSocket upgrade
│   └── Auth: first msg = {"type":"auth","token":"..."}
└── Shared RoomManager      — both transports in same rooms
```

## Key Changes

### 1. Abstract `Participant` over transport type

**File: `room.rs`**

Currently:
```rust
struct Participant {
    id: ParticipantId,
    _addr: std::net::SocketAddr,
    transport: Arc<wzp_transport::QuinnTransport>,
}
```

Change to:
```rust
struct Participant {
    id: ParticipantId,
    _addr: std::net::SocketAddr,
    sender: ParticipantSender,
}

/// How to send a media packet to a participant.
/// Cloning is cheap (an `Arc` or an mpsc `Sender` handle), so `others()` can return owned copies.
#[derive(Clone)]
enum ParticipantSender {
    Quic(Arc<wzp_transport::QuinnTransport>),
    WebSocket(tokio::sync::mpsc::Sender<bytes::Bytes>),
}
```

The `others()` method returns `Vec<ParticipantSender>` instead of `Vec<Arc<QuinnTransport>>`.

`ParticipantSender` implements a `send_raw(&self, data: &[u8])` method (see section 5):
- **Quic**: wraps in `MediaPacket`, calls `transport.send_media()`
- **WebSocket**: sends raw binary frame via the mpsc channel

### 2. Add `join_ws()` to RoomManager

```rust
pub fn join_ws(
    &mut self,
    room_name: &str,
    addr: std::net::SocketAddr,
    sender: tokio::sync::mpsc::Sender<bytes::Bytes>,
    fingerprint: Option<&str>,
) -> Result<ParticipantId, String>
```

### 3. Add WS listener in `main.rs`

New flag: `--ws-port 8080`

```rust
if let Some(ws_port) = config.ws_port {
    let room_mgr = room_mgr.clone();
    let auth_url = config.auth_url.clone();
    let metrics = metrics.clone();
    tokio::spawn(run_ws_server(ws_port, room_mgr, auth_url, metrics));
}
```

### 4. WebSocket handler (`ws.rs` — new file)

```rust
use std::net::SocketAddr;

use axum::{
    extract::{
        ws::{Message, WebSocket, WebSocketUpgrade},
        ConnectInfo, Path, State,
    },
    response::IntoResponse,
    routing::get,
    Router,
};
use futures_util::{SinkExt, StreamExt}; // socket.split(), .next(), .send()

// ConnectInfo requires serving the router with
// `into_make_service_with_connect_info::<SocketAddr>()`.
async fn ws_handler(
    Path(room): Path<String>,
    ConnectInfo(addr): ConnectInfo<SocketAddr>,
    State(state): State<WsState>,
    ws: WebSocketUpgrade,
) -> impl IntoResponse {
    ws.on_upgrade(move |socket| handle_ws(socket, room, addr, state))
}

async fn handle_ws(mut socket: WebSocket, room: String, addr: SocketAddr, state: WsState) {
    // 1. Auth: first message must be {"type":"auth","token":"..."}
    //    Any malformed or rejected auth closes the connection.
    let fingerprint = if let Some(ref auth_url) = state.auth_url {
        let Some(Ok(Message::Text(text))) = socket.recv().await else { return };
        let Ok(parsed) = serde_json::from_str::<serde_json::Value>(text.as_str()) else { return };
        if parsed["type"] != "auth" {
            return;
        }
        let Some(token) = parsed["token"].as_str() else { return };
        let Ok(client) = auth::validate_token(auth_url, token).await else { return };
        Some(client.fingerprint)
    } else {
        None
    };

    // 2. Create bounded mpsc channel for outbound frames
    let (tx, mut rx) = tokio::sync::mpsc::channel::<bytes::Bytes>(64);

    // 3. Join room
    let participant_id = {
        let mut mgr = state.room_mgr.lock().await;
        match mgr.join_ws(&room, addr, tx, fingerprint.as_deref()) {
            Ok(id) => id,
            Err(_) => return,
        }
    };

    // 4. Run send/recv loops
    let (mut ws_tx, mut ws_rx) = socket.split();

    // Outbound: mpsc rx → WS send
    let send_task = tokio::spawn(async move {
        while let Some(data) = rx.recv().await {
            // axum 0.8: Message::Binary carries `Bytes`, so no copy is needed
            if ws_tx.send(Message::Binary(data)).await.is_err() {
                break;
            }
        }
    });

    // Inbound: WS recv → fan-out to room
    loop {
        match ws_rx.next().await {
            Some(Ok(Message::Binary(data))) => {
                // Raw PCM Int16 from browser: fan out to all others
                let others = {
                    let mgr = state.room_mgr.lock().await;
                    mgr.others(&room, participant_id)
                }; // lock released before any send
                for other in &others {
                    other.send_raw(&data).await; // async: must be awaited to run
                }
            }
            Some(Ok(Message::Close(_))) | Some(Err(_)) | None => break,
            _ => continue,
        }
    }

    // 5. Cleanup
    send_task.abort();
    let mut mgr = state.room_mgr.lock().await;
    mgr.leave(&room, participant_id);
}
```

### 5. Cross-transport fan-out

When a QUIC participant sends audio → WS participants receive raw PCM bytes.
When a WS participant sends audio → QUIC participants receive a `MediaPacket`.

The `ParticipantSender::send_raw()` method:
```rust
impl ParticipantSender {
    async fn send_raw(&self, pcm_bytes: &[u8]) {
        match self {
            ParticipantSender::WebSocket(tx) => {
                // Non-blocking: if this browser's 64-frame queue is full, the frame is dropped
                let _ = tx.try_send(bytes::Bytes::copy_from_slice(pcm_bytes));
            }
            ParticipantSender::Quic(transport) => {
                // Wrap raw PCM in a MediaPacket
                let pkt = MediaPacket {
                    header: MediaHeader::default_pcm(),
                    payload: bytes::Bytes::copy_from_slice(pcm_bytes),
                    quality_report: None,
                };
                let _ = transport.send_media(&pkt).await;
            }
        }
    }
}
```

For QUIC→WS direction, `run_participant` extracts `pkt.payload` bytes and sends to WS channels.

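The `try_send` choice is what keeps one slow browser from stalling a whole room. A full queue loses a frame instead of blocking the fan-out loop. This std-only sketch shows that behaviour. `std::sync::mpsc::sync_channel` stands in for `tokio::sync::mpsc::channel`, whose `try_send` likewise fails with a `Full` error when the buffer is full. `ws_queue` and `fan_out` are illustrative names.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Bounded per-browser outbound queue (like the `channel(64)` in `handle_ws`).
fn ws_queue(capacity: usize) -> (SyncSender<Vec<u8>>, Receiver<Vec<u8>>) {
    sync_channel(capacity)
}

/// Fan one frame out to every WebSocket participant without ever blocking.
/// Returns how many receivers dropped the frame because their queue was full.
fn fan_out(frame: &[u8], others: &[SyncSender<Vec<u8>>]) -> usize {
    let mut dropped = 0;
    for tx in others {
        match tx.try_send(frame.to_vec()) {
            Ok(()) => {}
            // A slow browser loses this frame instead of stalling the room.
            Err(TrySendError::Full(_)) => dropped += 1,
            // Receiver gone; the room's leave() path cleans it up.
            Err(TrySendError::Disconnected(_)) => {}
        }
    }
    dropped
}
```

For real-time audio, a dropped 20 ms frame is usually less harmful than queueing delay, which is why the sender side never waits.
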
### 6. Dependencies to add
|
||||
|
||||
```toml
|
||||
# wzp-relay/Cargo.toml
|
||||
axum = { version = "0.8", features = ["ws"] }
|
||||
tokio = { version = "1", features = ["full"] } # already present
|
||||
```
|
||||
|
||||
### 7. Config change

```rust
// config.rs
pub struct RelayConfig {
    // ... existing fields ...
    pub ws_port: Option<u16>,
}
```

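Because `ws_port` is an `Option`, the WebSocket listener can stay opt-in. Below is a minimal sketch of how startup might interpret it. It rests on two assumptions: `None` means QUIC-only operation, and the listener binds on all IPv4 interfaces. `ws_bind_addr` is a hypothetical helper, not existing relay code.

```rust
use std::net::{Ipv4Addr, SocketAddr, SocketAddrV4};

/// Trimmed-down `RelayConfig`: only the new field is shown.
pub struct RelayConfig {
    pub ws_port: Option<u16>,
}

/// Hypothetical helper: the address the WebSocket listener should bind,
/// or `None` when `--ws-port` was not given and WS support stays off.
/// ASSUMPTION: bind on 0.0.0.0, mirroring `--listen 0.0.0.0:4433` for QUIC.
pub fn ws_bind_addr(cfg: &RelayConfig) -> Option<SocketAddr> {
    cfg.ws_port
        .map(|port| SocketAddr::V4(SocketAddrV4::new(Ipv4Addr::UNSPECIFIED, port)))
}
```

With `--ws-port 8080` this yields `0.0.0.0:8080`, the port Caddy proxies `/audio/*` to. The QUIC listener on :4433 is unaffected either way.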
### 8. Docker compose change (featherChat side)

Remove `wzp-web` service entirely. Update Caddy to proxy `/audio/*` to relay's WS port:

```yaml
# Before:
wzp-web:
  entrypoint: ["wzp-web"]
  command: ["--port", "8080", "--relay", "172.28.0.10:4433"]

# After: REMOVED. Relay handles WS directly.

wzp-relay:
  command:
    - "--listen"
    - "0.0.0.0:4433"
    - "--ws-port"
    - "8080"
    - "--auth-url"
    - "http://warzone-server:7700/v1/auth/validate"
```

## What Stays the Same

- Browser's `startAudio()`: unchanged, still connects WS to `/audio/ws/ROOM`
- Caddy proxies `/audio/*` → relay:8080 (same path, different backend)
- Auth flow: same JSON token as first message
- PCM format: same Int16 binary frames
- QUIC clients: unchanged, still connect to :4433
- Room naming, ACL, session management: all unchanged

## Testing

1. Start relay with `--ws-port 8080 --listen 0.0.0.0:4433`
2. Open browser, initiate call via featherChat
3. Verify audio flows (both directions)
4. Verify QUIC + WS clients can be in the same room (mixed mode)
5. Verify auth works (valid token accepted, invalid token rejected)
6. Verify room cleanup on disconnect

## Migration Path

1. Implement WS in relay
2. Test with featherChat (no featherChat changes needed)
3. Remove wzp-web from Docker stack
4. Later: add WebTransport alongside WS

152
vault/Architecture/WZP-Spec.md
Normal file
@@ -0,0 +1,152 @@
---
tags: [architecture, wzp]
type: architecture
---

# WZP Protocol Specification (one-page reference)

> Distilled from `docs/ARCHITECTURE.md` and the `wzp-proto` crate. Authoritative wire details live in `crates/wzp-proto/src/packet.rs`.
>
> **Status:** v2 is the deployed protocol (audio + video, 16 B header, MediaType, u32 seq). v1 clients are rejected with `Hangup::ProtocolVersionMismatch`.

## Layer summary

| Layer | WZP | FaceTime equivalent |
|---|---|---|
| Transport | **QUIC datagrams** (Quinn), PLPMTUD 1200 → 1452 | RTP/SRTP over UDP, ICE |
| Signaling | `SignalMessage` (bincode) over a QUIC stream, SNI = hashed room name | APNs-tunneled binary plist |
| Identity | Ed25519 + X25519 from BIP39 seed; fingerprint = SHA-256(pubkey)[..16] | IDS RSA + ECDSA per device |
| Key agreement | X25519 DH + HKDF, Ed25519 signatures, rekey every 65,536 packets | Per-call DH signed by IDS keys |
| Bulk crypto | ChaCha20-Poly1305, 64-packet sliding anti-replay | SRTP (AES-CTR + HMAC) |
| Loss recovery | **RaptorQ FEC + Opus DRED + classical PLC** | NACK / PLI + reference-picture selection |
| Adaptive | 3-tier hysteresis (Good / Degraded / Catastrophic) + continuous DRED tuner | Per-frame bitrate ladder |
| Topology | SFU rooms + inter-relay federation + P2P via ICE | Mesh ≤ ~3, SFU above, Apple relays |
| Header | 16 B `MediaHeader` v2 / 5 B `MiniHeader` (49 of 50), 4 B `QualityReport` trailer | RTP 12 B + extensions |

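The "64-packet sliding anti-replay" entry can be illustrated with the classic bitmap window used by IPsec ESP (RFC 4303). This is a sketch under assumptions. It keys on the u32 `sequence` from `MediaHeader` v2, which is not reset by rekey, and it ignores u32 wraparound. `ReplayWindow` is a hypothetical name, not the `wzp-proto` type.

```rust
/// Sliding anti-replay window covering the 64 most recent sequence numbers.
pub struct ReplayWindow {
    highest: Option<u32>, // highest sequence accepted so far
    seen: u64,            // bit i set => sequence (highest - i) already accepted
}

impl ReplayWindow {
    pub const SIZE: u32 = 64;

    pub fn new() -> Self {
        Self { highest: None, seen: 0 }
    }

    /// Returns true (and records `seq`) if the packet is fresh; false if it is
    /// a duplicate or older than the window. Call only after the AEAD tag has
    /// verified, so forged packets cannot slide the window forward.
    pub fn check_and_update(&mut self, seq: u32) -> bool {
        let high = match self.highest {
            Some(h) => h,
            None => {
                // First packet of the session initialises the window.
                self.highest = Some(seq);
                self.seen = 1;
                return true;
            }
        };
        if seq > high {
            // Newer packet: slide the window forward by the gap.
            let shift = seq - high;
            self.seen = if shift >= Self::SIZE { 0 } else { self.seen << shift };
            self.seen |= 1;
            self.highest = Some(seq);
            return true;
        }
        let offset = high - seq;
        if offset >= Self::SIZE {
            return false; // too old to judge: reject
        }
        let bit = 1u64 << offset;
        if (self.seen & bit) != 0 {
            return false; // replay
        }
        self.seen |= bit; // late but unseen: accept and mark
        true
    }
}
```

Reordered packets are accepted as long as they land inside the window; only exact duplicates and packets more than 63 behind the newest are dropped.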
## Distinctive choices

- **QUIC datagrams instead of raw UDP + SRTP.** Brings TLS 1.3, PLPMTUD, path migration, and ACK-based RTT/loss estimation for free.
- **Continuous DRED tuning.** Maps live `(loss%, RTT, jitter)` to a continuous Opus DRED lookback window. Most stacks treat DRED as discrete tiers.
- **MiniHeader (5 B for 49/50 packets).** Saves 11 B on each compressed packet vs. the full 16 B header: 49 × 11 B ≈ 540 B/s per stream at 50 pps.
- **E2E-preserving SFU.** The relay forwards encrypted datagrams; it never decrypts media. Room membership uses SNI = `hash(room_name)`.
- **Codec coordination via `QualityReport` trailer.** Receivers attach a 4-byte loss/RTT/jitter/bitrate-cap report to media packets; the SFU broadcasts `QualityDirective` so all senders in a room converge on the same tier.

## Wire format (current: v2)

### `MediaHeader` v2 (16 bytes, byte-aligned)

```
Byte 0:      version      (u8)      0x02
Byte 1:      flags        (u8)      [T:1][Q:1][KeyFrame:1][FrameEnd:1][reserved:4]
Byte 2:      media_type   (u8)      0=audio, 1=video, 2=data, 3=control
Byte 3:      codec_id     (u8)      0-255 (see codec table)
Byte 4:      stream_id    (u8)      simulcast layer; 0=base
Byte 5:      fec_ratio    (u8)      0..200 → 0.0..2.0
Bytes 6-9:   sequence     (u32 BE)
Bytes 10-13: timestamp_ms (u32 BE)
Bytes 14-15: fec_block_id (u16 BE)
```

| Field | Bits | Meaning |
|---|---|---|
| version | 8 | Must be `0x02`; v1 clients receive `Hangup::ProtocolVersionMismatch` |
| T (bit 7 of flags) | 1 | 1 = FEC repair packet |
| Q (bit 6 of flags) | 1 | QualityReport trailer present |
| KeyFrame (bit 5 of flags) | 1 | Packet belongs to a video I-frame |
| FrameEnd (bit 4 of flags) | 1 | Last packet of an access unit |
| reserved (bits 3-0 of flags) | 4 | Must be zero |
| media_type | 8 | 0=audio, 1=video, 2=data, 3=control |
| codec_id | 8 | See codec table (widened from v1's 4-bit field) |
| stream_id | 8 | Simulcast layer; 0=base layer |
| fec_ratio | 8 | 0..200 → 0.0..2.0 |
| sequence | 32 | Monotonically increasing packet sequence number (not reset by rekey) |
| timestamp_ms | 32 | Milliseconds since session start; monotonic across the full session and **not reset by rekey** |
| fec_block_id | 16 | FEC source block ID |

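Here is a round-trip codec for the 16-byte layout above. It is written from the table rather than from `crates/wzp-proto/src/packet.rs`, so treat it as a sketch. `MediaHeaderV2` and `HeaderError` are illustrative names. The extra validation (rejecting `media_type > 3` and `fec_ratio > 200`) is an assumption. The version and reserved-bit checks follow the table directly.

```rust
/// Decoded v2 `MediaHeader`, field-for-field with the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MediaHeaderV2 {
    pub is_repair: bool,   // T flag: FEC repair packet
    pub has_quality: bool, // Q flag: QualityReport trailer present
    pub key_frame: bool,
    pub frame_end: bool,
    pub media_type: u8,
    pub codec_id: u8,
    pub stream_id: u8,
    pub fec_ratio: u8, // hundredths: 0..=200 maps to 0.0..=2.0
    pub sequence: u32,
    pub timestamp_ms: u32,
    pub fec_block_id: u16,
}

#[derive(Debug, PartialEq, Eq)]
pub enum HeaderError {
    TooShort,
    BadVersion(u8), // caller would answer with Hangup::ProtocolVersionMismatch
    ReservedBitsSet,
    BadMediaType(u8),
    BadFecRatio(u8),
}

pub const HEADER_VERSION: u8 = 0x02;
pub const HEADER_LEN: usize = 16;

impl MediaHeaderV2 {
    pub fn encode(&self) -> [u8; HEADER_LEN] {
        let mut b = [0u8; HEADER_LEN];
        b[0] = HEADER_VERSION;
        // Flag bits 7..4; bits 3..0 are reserved and stay zero.
        b[1] = (self.is_repair as u8) << 7
            | (self.has_quality as u8) << 6
            | (self.key_frame as u8) << 5
            | (self.frame_end as u8) << 4;
        b[2] = self.media_type;
        b[3] = self.codec_id;
        b[4] = self.stream_id;
        b[5] = self.fec_ratio;
        b[6..10].copy_from_slice(&self.sequence.to_be_bytes());
        b[10..14].copy_from_slice(&self.timestamp_ms.to_be_bytes());
        b[14..16].copy_from_slice(&self.fec_block_id.to_be_bytes());
        b
    }

    pub fn decode(b: &[u8]) -> Result<Self, HeaderError> {
        if b.len() < HEADER_LEN {
            return Err(HeaderError::TooShort);
        }
        if b[0] != HEADER_VERSION {
            return Err(HeaderError::BadVersion(b[0]));
        }
        let flags = b[1];
        if (flags & 0x0F) != 0 {
            return Err(HeaderError::ReservedBitsSet);
        }
        if b[2] > 3 {
            return Err(HeaderError::BadMediaType(b[2]));
        }
        if b[5] > 200 {
            return Err(HeaderError::BadFecRatio(b[5]));
        }
        Ok(Self {
            is_repair: (flags & 0x80) != 0,
            has_quality: (flags & 0x40) != 0,
            key_frame: (flags & 0x20) != 0,
            frame_end: (flags & 0x10) != 0,
            media_type: b[2],
            codec_id: b[3],
            stream_id: b[4],
            fec_ratio: b[5],
            sequence: u32::from_be_bytes([b[6], b[7], b[8], b[9]]),
            timestamp_ms: u32::from_be_bytes([b[10], b[11], b[12], b[13]]),
            fec_block_id: u16::from_be_bytes([b[14], b[15]]),
        })
    }

    /// The FEC ratio as a fraction, e.g. 50 -> 0.5.
    pub fn fec_ratio_f32(&self) -> f32 {
        f32::from(self.fec_ratio) / 100.0
    }
}
```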
### Codec table

| ID | Codec | Bitrate | Sample rate | Frame |
|---|---|---|---|---|
| 0 | Opus 24k | 24 kbps | 48 kHz | 20 ms |
| 1 | Opus 16k | 16 kbps | 48 kHz | 20 ms |
| 2 | Opus 6k | 6 kbps | 48 kHz | 40 ms |
| 3 | Codec2 3200 | 3.2 kbps | 8 kHz | 20 ms |
| 4 | Codec2 1200 | 1.2 kbps | 8 kHz | 40 ms |
| 5 | ComfortNoise | 0 | 48 kHz | 20 ms |
| 6 | Opus 32k | 32 kbps | 48 kHz | 20 ms |
| 7 | Opus 48k | 48 kbps | 48 kHz | 20 ms |
| 8 | Opus 64k | 64 kbps | 48 kHz | 20 ms |
| 9 | H.264 Baseline | n/a | n/a | n/a |
| 10 | H.264 Main | n/a | n/a | n/a |
| 11 | H.265 Main | n/a | n/a | n/a |
| 12 | AV1 Main | n/a | n/a | n/a |

### `MiniHeader` v2 (5 bytes, compressed; 49 of every 50 packets)

```
[FRAME_TYPE_MINI = 0x01]
Byte 0:     seq_delta          (u8)
Bytes 1-2:  timestamp_delta_ms (u16 BE)
Bytes 3-4:  payload_len        (u16 BE)
```

A full header is sent every 50th packet to resync.

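The spec does not say what the deltas are measured against. The sketch below assumes they are relative to the most recent full header, which keeps a lost compressed packet from corrupting the ones after it. If WZP measures them from the previous packet instead, only `MiniContext::expand` would change. It also assumes the `FRAME_TYPE_MINI` byte is consumed by the framing layer before the 5-byte body is parsed. All type names are hypothetical.

```rust
pub const MINI_LEN: usize = 5;

/// The 5-byte MiniHeader body (the FRAME_TYPE_MINI prefix is not included).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MiniHeader {
    pub seq_delta: u8,
    pub timestamp_delta_ms: u16,
    pub payload_len: u16,
}

impl MiniHeader {
    pub fn encode(&self) -> [u8; MINI_LEN] {
        let ts = self.timestamp_delta_ms.to_be_bytes();
        let len = self.payload_len.to_be_bytes();
        [self.seq_delta, ts[0], ts[1], len[0], len[1]]
    }

    pub fn decode(b: &[u8]) -> Option<Self> {
        if b.len() < MINI_LEN {
            return None;
        }
        Some(Self {
            seq_delta: b[0],
            timestamp_delta_ms: u16::from_be_bytes([b[1], b[2]]),
            payload_len: u16::from_be_bytes([b[3], b[4]]),
        })
    }
}

/// Receiver state taken from the last full header.
/// ASSUMPTION: deltas are relative to this header, not to the previous packet.
#[derive(Debug, Clone, Copy)]
pub struct MiniContext {
    pub base_seq: u32,
    pub base_ts_ms: u32,
}

impl MiniContext {
    /// Rebuild absolute (sequence, timestamp_ms) for a compressed packet.
    pub fn expand(&self, mini: &MiniHeader) -> (u32, u32) {
        (
            self.base_seq.wrapping_add(u32::from(mini.seq_delta)),
            self.base_ts_ms.wrapping_add(u32::from(mini.timestamp_delta_ms)),
        )
    }
}
```

Under this assumption, a full header every 50th packet means `seq_delta` never needs to exceed 49, well within a `u8`.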
### `TrunkFrame` (batched, relay-internal)

```
[count: u16]
[session_id: 2][len: u16][payload: len] × count
```

A frame holds up to 10 entries or as many as fit in the PMTUD-discovered MTU, whichever limit is hit first, and is flushed every 5 ms.

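Here is a sketch of the batching rule. It assumes big-endian integers (as in `MediaHeader`) and treats `session_id` as two opaque bytes. `TrunkBuilder` and `decode_trunk` are hypothetical. The 5 ms timer is left to the caller, which would call `flush()` whenever the timer fires or `try_push()` returns `false`.

```rust
/// One trunk entry: a 2-byte session id and an opaque (still encrypted) payload.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TrunkEntry {
    pub session_id: [u8; 2],
    pub payload: Vec<u8>,
}

pub const MAX_ENTRIES: usize = 10;
const COUNT_LEN: usize = 2; // [count: u16]
const ENTRY_OVERHEAD: usize = 4; // [session_id: 2][len: u16]

pub struct TrunkBuilder {
    mtu: usize,
    entries: Vec<TrunkEntry>,
    encoded_len: usize, // size the frame would have if flushed now
}

impl TrunkBuilder {
    pub fn new(mtu: usize) -> Self {
        Self { mtu, entries: Vec::new(), encoded_len: COUNT_LEN }
    }

    pub fn is_empty(&self) -> bool {
        self.entries.is_empty()
    }

    /// Adds `entry` if it fits under both limits. `false` means "flush first";
    /// an entry bigger than the MTU on its own never fits.
    pub fn try_push(&mut self, entry: TrunkEntry) -> bool {
        let needed = ENTRY_OVERHEAD + entry.payload.len();
        if self.entries.len() >= MAX_ENTRIES || self.encoded_len + needed > self.mtu {
            return false;
        }
        self.encoded_len += needed;
        self.entries.push(entry);
        true
    }

    /// Serialises the batch (big-endian lengths) and resets the builder.
    pub fn flush(&mut self) -> Vec<u8> {
        let mut out = Vec::with_capacity(self.encoded_len);
        out.extend_from_slice(&(self.entries.len() as u16).to_be_bytes());
        for e in self.entries.drain(..) {
            out.extend_from_slice(&e.session_id);
            out.extend_from_slice(&(e.payload.len() as u16).to_be_bytes());
            out.extend_from_slice(&e.payload);
        }
        self.encoded_len = COUNT_LEN;
        out
    }
}

/// Parses a trunk frame back into entries; `None` if it is truncated.
pub fn decode_trunk(frame: &[u8]) -> Option<Vec<TrunkEntry>> {
    if frame.len() < COUNT_LEN {
        return None;
    }
    let count = u16::from_be_bytes([frame[0], frame[1]]) as usize;
    let mut rest = &frame[COUNT_LEN..];
    let mut entries = Vec::with_capacity(count.min(MAX_ENTRIES));
    for _ in 0..count {
        if rest.len() < ENTRY_OVERHEAD {
            return None;
        }
        let session_id = [rest[0], rest[1]];
        let len = u16::from_be_bytes([rest[2], rest[3]]) as usize;
        rest = &rest[ENTRY_OVERHEAD..];
        if rest.len() < len {
            return None;
        }
        entries.push(TrunkEntry { session_id, payload: rest[..len].to_vec() });
        rest = &rest[len..];
    }
    Some(entries)
}
```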
### `QualityReport` (4 bytes, optional inline trailer)

```
Byte 0: loss_pct         (0-255 → 0-100%)
Byte 1: rtt_4ms          (0-255 → 0-1020 ms)
Byte 2: jitter_ms        (0-255 ms)
Byte 3: bitrate_cap_kbps (0-255 kbps)
```

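The byte ranges above imply a simple quantisation, sketched here. Loss is scaled so 255 means 100%, RTT is stored in 4 ms units, and jitter and the bitrate cap saturate at 255. Three things are assumptions: rounding loss to the nearest step, truncating RTT, and the hypothetical `QualityReportView` struct (the real type may keep the raw bytes).

```rust
/// Hypothetical decoded view of the 4-byte QualityReport trailer.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct QualityReportView {
    pub loss_pct: f32,         // 0.0..=100.0
    pub rtt_ms: u32,           // 4 ms resolution, saturates at 1020
    pub jitter_ms: u32,        // saturates at 255
    pub bitrate_cap_kbps: u32, // saturates at 255
}

impl QualityReportView {
    pub fn encode(&self) -> [u8; 4] {
        // Byte 0: 0..=255 spans 0..=100 % (rounded to nearest; an assumption).
        let loss = (self.loss_pct.clamp(0.0, 100.0) * 255.0 / 100.0).round() as u8;
        // Byte 1: RTT in 4 ms units (truncating division; an assumption).
        let rtt = (self.rtt_ms / 4).min(255) as u8;
        let jitter = self.jitter_ms.min(255) as u8;
        let cap = self.bitrate_cap_kbps.min(255) as u8;
        [loss, rtt, jitter, cap]
    }

    pub fn decode(b: [u8; 4]) -> Self {
        Self {
            loss_pct: f32::from(b[0]) * 100.0 / 255.0,
            rtt_ms: u32::from(b[1]) * 4,
            jitter_ms: u32::from(b[2]),
            bitrate_cap_kbps: u32::from(b[3]),
        }
    }
}
```

The trailer trades precision for size. A 150 ms RTT reads back as 148 ms, and any RTT of 1020 ms or more reads back as 1020 ms.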
### Version negotiation

- `version=0x02` in `MediaHeader` is a hard switch; there is no fallback negotiation.
- Both endpoints must speak v2. A v1 peer receives `Hangup::ProtocolVersionMismatch` immediately.
- Relays inspect only `version` and `media_type`; they never downgrade or translate between versions.

## Session lifecycle

```
Idle → Connecting → Handshaking → Active ⇄ Rekeying → Closed
```

- `CallOffer { identity_pub, ephemeral_pub, signature, profiles }`
- `CallAnswer { identity_pub, ephemeral_pub, signature, chosen_profile }`
- `session_key = HKDF(X25519_DH(eph_a, eph_b), "warzone-session-key")`
- Rekey every 65,536 packets via fresh ephemeral DH.

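The lifecycle diagram maps naturally onto a small state machine. In this sketch the event names (`Dial`, `TransportUp`, `HandshakeDone`, `RekeyDone`, `Hangup`) are hypothetical. `HandshakeDone` stands for a verified `CallOffer`/`CallAnswer` exchange, and the rekey trigger counts packets sent while `Active`. Media presumably keeps flowing under the old key during `Rekeying`; the sketch simply stops counting until the rekey completes.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Idle,
    Connecting,
    Handshaking,
    Active,
    Rekeying,
    Closed,
}

/// Hypothetical events driving the lifecycle.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    Dial,          // start connecting
    TransportUp,   // QUIC connection established
    HandshakeDone, // CallOffer/CallAnswer verified, session_key derived
    RekeyDone,     // fresh ephemeral DH completed
    Hangup,        // either side ends the call
}

pub const REKEY_INTERVAL: u32 = 65_536;

pub struct Session {
    pub state: State,
    sent_since_rekey: u32,
}

impl Session {
    pub fn new() -> Self {
        Self { state: State::Idle, sent_since_rekey: 0 }
    }

    /// Apply an event; pairs not in the diagram leave the state unchanged.
    pub fn on_event(&mut self, ev: Event) {
        self.state = match (self.state, ev) {
            (_, Event::Hangup) => State::Closed,
            (State::Idle, Event::Dial) => State::Connecting,
            (State::Connecting, Event::TransportUp) => State::Handshaking,
            (State::Handshaking, Event::HandshakeDone) => State::Active,
            (State::Rekeying, Event::RekeyDone) => {
                self.sent_since_rekey = 0;
                State::Active
            }
            (current, _) => current,
        };
    }

    /// Count one outgoing media packet; enter `Rekeying` at the interval.
    pub fn on_packet_sent(&mut self) {
        if self.state != State::Active {
            return;
        }
        self.sent_since_rekey += 1;
        if self.sent_since_rekey >= REKEY_INTERVAL {
            self.state = State::Rekeying;
        }
    }
}
```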
## SFU forwarding rules

1. Fan out to all room participants except the sender.
2. Failed sends are skipped; forwarding is best-effort.
3. The relay never decrypts media.
4. With trunking on, packets to the same receiver are batched (flushed every 5 ms).
5. `QualityDirective` is broadcast when the room-wide tier degrades.

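Rules 1-3 reduce to a loop that never looks inside the datagram and never retries. Here is a minimal sketch with a caller-supplied send function. `forward` and `FanOut` are hypothetical names, and trunking and `QualityDirective` are left out.

```rust
/// Result of one fan-out pass.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct FanOut {
    pub sent: usize,
    pub skipped: usize,
}

/// Forward an encrypted datagram to every participant in `room` except
/// `sender`. The bytes are passed through untouched (rule 3), and a failed
/// send is counted and skipped, never retried (rule 2).
pub fn forward<E>(
    room: &[u32],
    sender: u32,
    datagram: &[u8],
    mut send: impl FnMut(u32, &[u8]) -> Result<(), E>,
) -> FanOut {
    let mut out = FanOut::default();
    for &peer in room.iter().filter(|&&p| p != sender) {
        match send(peer, datagram) {
            Ok(()) => out.sent += 1,
            Err(_) => out.skipped += 1,
        }
    }
    out
}
```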
## Adaptive quality (audio, today)

| Tier | Codec | FEC | Frame |
|---|---|---|---|
| Good | Opus 24k | 20% | 20 ms |
| Degraded | Opus 6k | 50% | 40 ms |
| Catastrophic | Codec2 1200 | 100% | 40 ms |

Hysteresis: 3 reports to downgrade (2 on cellular), 10 to upgrade.

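Here is one way to read the hysteresis rule, as a sketch. Each `QualityReport` is first classified into the tier it suggests; the thresholds are not given here, so classification is left to the caller. The tier steps down one level after 3 consecutive worse reports (2 on cellular) and up one level after 10 consecutive better ones. Any in-tier report resets both streaks. The "consecutive" reading and the one-step transitions are assumptions. `profile()` restates the tier table using IDs from the codec table.

```rust
/// Audio tiers, ordered best to worst so `>` means "worse".
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Tier {
    Good,
    Degraded,
    Catastrophic,
}

impl Tier {
    fn worse(self) -> Tier {
        match self {
            Tier::Good => Tier::Degraded,
            _ => Tier::Catastrophic,
        }
    }

    fn better(self) -> Tier {
        match self {
            Tier::Catastrophic => Tier::Degraded,
            _ => Tier::Good,
        }
    }

    /// (codec_id, FEC %, frame ms), restating the tier table above.
    pub fn profile(self) -> (u8, u8, u8) {
        match self {
            Tier::Good => (0, 20, 20),          // Opus 24k
            Tier::Degraded => (2, 50, 40),      // Opus 6k
            Tier::Catastrophic => (4, 100, 40), // Codec2 1200
        }
    }
}

pub struct Hysteresis {
    pub tier: Tier,
    down_after: u32,
    up_after: u32,
    worse_streak: u32,
    better_streak: u32,
}

impl Hysteresis {
    pub fn new(cellular: bool) -> Self {
        Self {
            tier: Tier::Good,
            down_after: if cellular { 2 } else { 3 },
            up_after: 10,
            worse_streak: 0,
            better_streak: 0,
        }
    }

    /// Feed the tier one report suggests; returns the tier now in effect.
    pub fn on_report(&mut self, suggested: Tier) -> Tier {
        if suggested > self.tier {
            self.better_streak = 0;
            self.worse_streak += 1;
            if self.worse_streak >= self.down_after {
                self.tier = self.tier.worse();
                self.worse_streak = 0;
            }
        } else if suggested < self.tier {
            self.worse_streak = 0;
            self.better_streak += 1;
            if self.better_streak >= self.up_after {
                self.tier = self.tier.better();
                self.better_streak = 0;
            }
        } else {
            self.worse_streak = 0;
            self.better_streak = 0;
        }
        self.tier
    }
}
```

The asymmetry (fast down, slow up) keeps a call from oscillating between tiers when conditions hover near a threshold.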
## NAT traversal (Phase 8)

- Candidate types: Host, Port-mapped (NAT-PMP / PCP / UPnP), Server-reflexive (STUN), Relay.
- Hard-NAT port prediction with `classify_port_allocation()` → `predict_ports()` → `HardNatProbe` signal.
- Mid-call re-gather: `CandidateUpdate { generation }`.