docs: comprehensive project documentation

- ARCHITECTURE.md: protocol design, wire format, FEC, crypto, relay modes
- USAGE.md: build instructions, all CLI flags, deployment examples
- DESIGN.md: rationale for codec/FEC/transport/crypto choices
- EXTENSIBILITY.md: trait extension points, Warzone integration, future features
- PROGRESS.md: phase 1-4 timeline, test coverage, known issues
- API.md: complete crate API reference for all 8 crates

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Siavash Sameni
2026-03-28 05:30:11 +04:00
parent d8330525ef
commit 5425c59e7d
6 changed files with 1840 additions and 0 deletions

docs/ARCHITECTURE.md Normal file

@@ -0,0 +1,329 @@
# WarzonePhone Protocol Design & Architecture
## Network Topology
```
Lossy / censored link
◄──────────────────────►
┌────────┐ ┌─────────┐ ┌─────────┐ ┌─────────────┐
│ Client │─QUIC─│ Relay A │─QUIC─│ Relay B │─QUIC─│ Destination │
└────────┘ └─────────┘ └─────────┘ └─────────────┘
│ │ │ │
Encode Forward Forward Decode
FEC FEC FEC FEC
Encrypt (opaque) (opaque) Decrypt
```
In the simplest deployment a single relay serves as the meeting point (room mode, SFU). Clients connect directly to one relay, which forwards media to all other participants in the same room. For censorship-resistant links, two relays can be chained: a client-facing relay forwards all traffic to a remote relay via QUIC.
Room names are carried in the QUIC SNI field during the TLS handshake, so a single relay can host many independent rooms without additional signaling.
## Protocol Stack
```
┌──────────────────────────────────────────────┐
│ Application (Opus / Codec2 audio) │ wzp-codec
├──────────────────────────────────────────────┤
│ Redundancy (RaptorQ FEC + interleaving) │ wzp-fec
├──────────────────────────────────────────────┤
│ Crypto (ChaCha20-Poly1305 AEAD) │ wzp-crypto
├──────────────────────────────────────────────┤
│ Transport (QUIC DATAGRAM + reliable stream) │ wzp-transport
├──────────────────────────────────────────────┤
│ Obfuscation (Phase 2 — trait defined) │ wzp-proto::ObfuscationLayer
└──────────────────────────────────────────────┘
```
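Audio is end-to-end between caller and callee: relays never decode or re-encode it, and crypto keys are never shared with relays. In room mode the relay forwards opaque, encrypted, FEC-protected packets unchanged; in forward mode the `RelayPipeline` re-applies FEC for each hop without touching the audio payload.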
## Wire Format
### MediaHeader (12 bytes)
```
Byte 0: [V:1][T:1][CodecID:4][Q:1][FecRatioHi:1]
Byte 1: [FecRatioLo:6][unused:2]
Byte 2-3: Sequence number (big-endian u16)
Byte 4-7: Timestamp in ms since session start (big-endian u32)
Byte 8: FEC block ID (wrapping u8)
Byte 9: FEC symbol index within block
Byte 10: Reserved / flags
Byte 11: CSRC count (for future mixing)
```
Field details:
| Field | Bits | Description |
|-------|------|-------------|
| V | 1 | Protocol version (0 = v1) |
| T | 1 | 1 = FEC repair packet, 0 = source media |
| CodecID | 4 | Codec identifier (0=Opus24k, 1=Opus16k, 2=Opus6k, 3=Codec2_3200, 4=Codec2_1200) |
| Q | 1 | QualityReport trailer appended |
| FecRatio | 7 | FEC ratio encoded as 7-bit value (0-127 maps to 0.0-2.0) |
| Seq | 16 | Wrapping packet sequence number |
| Timestamp | 32 | Milliseconds since session start |
| FEC block | 8 | Source block ID (wrapping) |
| FEC symbol | 8 | Symbol index within the FEC block |
| Reserved | 8 | Reserved flags |
| CSRC count | 8 | Contributing source count (future) |
Defined in `crates/wzp-proto/src/packet.rs` as `MediaHeader`.
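The byte layout above can be illustrated with a small pack/unpack sketch. This is not the code in `packet.rs`: the `Header` struct and its method names are hypothetical, and the sketch assumes the bit fields in bytes 0 and 1 are packed most-significant-bit first in the order they are listed.

```rust
/// Hypothetical mirror of the 12-byte layout (not the real `MediaHeader`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Header {
    pub version: u8,       // 1 bit
    pub is_repair: bool,   // T flag
    pub codec_id: u8,      // 4 bits
    pub has_quality: bool, // Q flag
    pub fec_ratio: u8,     // 7 bits (0-127)
    pub seq: u16,
    pub timestamp_ms: u32,
    pub fec_block: u8,
    pub fec_symbol: u8,
    pub reserved: u8,
    pub csrc_count: u8,
}

impl Header {
    /// Serialize, assuming MSB-first packing of the bit fields.
    pub fn to_bytes(&self) -> [u8; 12] {
        let mut b = [0u8; 12];
        b[0] = ((self.version & 0x01) << 7)
            | ((self.is_repair as u8) << 6)
            | ((self.codec_id & 0x0F) << 2)
            | ((self.has_quality as u8) << 1)
            | ((self.fec_ratio >> 6) & 0x01); // FecRatioHi
        b[1] = (self.fec_ratio & 0x3F) << 2; // FecRatioLo, low 2 bits unused
        b[2..4].copy_from_slice(&self.seq.to_be_bytes());
        b[4..8].copy_from_slice(&self.timestamp_ms.to_be_bytes());
        b[8] = self.fec_block;
        b[9] = self.fec_symbol;
        b[10] = self.reserved;
        b[11] = self.csrc_count;
        b
    }

    /// Parse the same layout back into fields.
    pub fn from_bytes(b: &[u8; 12]) -> Self {
        Header {
            version: b[0] >> 7,
            is_repair: (b[0] >> 6) & 1 == 1,
            codec_id: (b[0] >> 2) & 0x0F,
            has_quality: (b[0] >> 1) & 1 == 1,
            fec_ratio: ((b[0] & 1) << 6) | (b[1] >> 2),
            seq: u16::from_be_bytes([b[2], b[3]]),
            timestamp_ms: u32::from_be_bytes([b[4], b[5], b[6], b[7]]),
            fec_block: b[8],
            fec_symbol: b[9],
            reserved: b[10],
            csrc_count: b[11],
        }
    }
}
```

Values wider than their field (for example a `fec_ratio` above 127) are masked on write in this sketch, so round-tripping only holds for in-range values.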
### QualityReport (4 bytes)
Appended to a media packet when the Q flag is set.
```
Byte 0: loss_pct — 0-255 maps to 0-100% loss
Byte 1: rtt_4ms — RTT in 4ms units (0-255 = 0-1020ms)
Byte 2: jitter_ms — Jitter in milliseconds
Byte 3: bitrate_cap — Max receive bitrate in kbps
```
Defined in `crates/wzp-proto/src/packet.rs` as `QualityReport`.
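As an illustration of the unit conversions in this trailer, here is a minimal sketch. The `Report` type and its methods are hypothetical stand-ins for the real `QualityReport`, and the rounding and saturation behaviour is an assumption rather than something the layout specifies.

```rust
/// Hypothetical stand-in for the 4-byte trailer.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Report {
    pub loss_pct: u8,    // 0-255 maps to 0-100 %
    pub rtt_4ms: u8,     // RTT in 4 ms units
    pub jitter_ms: u8,   // milliseconds
    pub bitrate_cap: u8, // kbps
}

impl Report {
    /// Quantize measurements into the wire fields (saturating; assumed behaviour).
    pub fn from_measurements(loss_percent: f32, rtt_ms: u32, jitter_ms: u32, cap_kbps: u32) -> Self {
        Report {
            loss_pct: (loss_percent.clamp(0.0, 100.0) * 255.0 / 100.0).round() as u8,
            rtt_4ms: (rtt_ms / 4).min(255) as u8,
            jitter_ms: jitter_ms.min(255) as u8,
            bitrate_cap: cap_kbps.min(255) as u8,
        }
    }

    /// Loss as a percentage in 0.0..=100.0.
    pub fn loss_percent(&self) -> f32 {
        self.loss_pct as f32 * 100.0 / 255.0
    }

    /// RTT in milliseconds (resolution 4 ms, maximum 1020 ms).
    pub fn rtt_ms(&self) -> u32 {
        self.rtt_4ms as u32 * 4
    }

    pub fn to_bytes(&self) -> [u8; 4] {
        [self.loss_pct, self.rtt_4ms, self.jitter_ms, self.bitrate_cap]
    }

    pub fn from_bytes(b: [u8; 4]) -> Self {
        Report { loss_pct: b[0], rtt_4ms: b[1], jitter_ms: b[2], bitrate_cap: b[3] }
    }
}
```

Any RTT above 1020 ms saturates at 255 units here, so the receiver cannot distinguish 1.02 s from longer delays.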
### MediaPacket
A complete media packet on the wire:
```
[MediaHeader: 12 bytes][Payload: variable][QualityReport: 4 bytes if Q=1]
```
Defined in `crates/wzp-proto/src/packet.rs` as `MediaPacket`.
### SignalMessage (reliable stream)
Signaling uses length-prefixed JSON over reliable QUIC bidirectional streams. Each message opens a new bidi stream, writes a 4-byte big-endian length prefix followed by the JSON payload, then finishes the send side.
Variants defined in `crates/wzp-proto/src/packet.rs`:
- `CallOffer` — identity_pub, ephemeral_pub, signature, supported_profiles
- `CallAnswer` — identity_pub, ephemeral_pub, signature, chosen_profile
- `IceCandidate` — NAT traversal candidate string
- `Rekey` — new_ephemeral_pub, signature
- `QualityUpdate` — report, recommended_profile
- `Ping` / `Pong` — timestamp_ms for RTT measurement
- `Hangup` — reason (Normal, Busy, Declined, Timeout, Error)
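The length-prefixed framing used for these messages can be sketched over any `Read`/`Write` pair. The function names and the `max_len` guard are assumptions for illustration; the real transport writes to QUIC streams and serializes `SignalMessage` to JSON before framing.

```rust
use std::io::{self, Read, Write};

/// Write one frame: 4-byte big-endian length, then the JSON bytes.
pub fn write_frame<W: Write>(w: &mut W, json: &[u8]) -> io::Result<()> {
    if json.len() > u32::MAX as usize {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "payload too large"));
    }
    w.write_all(&(json.len() as u32).to_be_bytes())?;
    w.write_all(json)
}

/// Read one frame, rejecting lengths above `max_len` (a hypothetical safety limit).
pub fn read_frame<R: Read>(r: &mut R, max_len: usize) -> io::Result<Vec<u8>> {
    let mut len_buf = [0u8; 4];
    r.read_exact(&mut len_buf)?;
    let len = u32::from_be_bytes(len_buf) as usize;
    if len > max_len {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame exceeds limit"));
    }
    let mut payload = vec![0u8; len];
    r.read_exact(&mut payload)?;
    Ok(payload)
}
```

Bounding the length before allocating keeps a malformed or hostile prefix from forcing a multi-gigabyte allocation.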
## FEC Strategy
WarzonePhone uses **RaptorQ fountain codes** (via the `raptorq` crate) for forward error correction. This is implemented in `crates/wzp-fec/`.
### Block Structure
Audio frames are grouped into FEC blocks. Each block contains a fixed number of source symbols (configured per quality profile). Each source symbol is a single encoded audio frame, zero-padded to a uniform 256-byte symbol size with a 2-byte little-endian length prefix.
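A minimal sketch of the symbol padding follows. It assumes the 2-byte length prefix counts toward the 256-byte symbol, which the description above does not state explicitly; the function names are hypothetical.

```rust
pub const SYMBOL_SIZE: usize = 256;

/// Build a fixed-size symbol: [len: u16 LE][frame bytes][zero padding].
/// Returns None if the frame does not fit (assumes the prefix is inside the 256 bytes).
pub fn pad_symbol(frame: &[u8]) -> Option<[u8; SYMBOL_SIZE]> {
    if frame.len() > SYMBOL_SIZE - 2 {
        return None;
    }
    let mut sym = [0u8; SYMBOL_SIZE];
    sym[..2].copy_from_slice(&(frame.len() as u16).to_le_bytes());
    sym[2..2 + frame.len()].copy_from_slice(frame);
    Some(sym)
}

/// Recover the original frame; None if the stored length is out of range.
pub fn unpad_symbol(sym: &[u8; SYMBOL_SIZE]) -> Option<&[u8]> {
    let len = u16::from_le_bytes([sym[0], sym[1]]) as usize;
    sym.get(2..2 + len)
}
```

The explicit length is what lets the decoder strip the padding again, since encoded audio frames may legitimately end in zero bytes.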
### Encoding Process
1. Audio frames are added to the encoder as source symbols
2. When a block is full (`frames_per_block` symbols), repair symbols are generated
3. The repair ratio determines how many repair symbols: `ceil(num_source * ratio)`
4. Both source and repair packets are transmitted with the block ID and symbol index in the header
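The repair count in step 3 is a ceiling, so fractional results round up. The sketch below expresses the ratio as an integer percentage to sidestep floating-point rounding; that representation and the function name are assumptions, not the crate's API.

```rust
/// Number of repair symbols for a block: ceil(num_source * ratio),
/// with the ratio given as a whole percentage (20 means 0.2).
pub fn repair_symbol_count(num_source: u32, ratio_percent: u32) -> u32 {
    (num_source * ratio_percent + 99) / 100
}
```

For the three documented tiers this gives 1, 5, and 8 repair symbols for blocks of 5, 10, and 8 source symbols respectively.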
### Decoding Process
1. Received symbols (source or repair) are fed to the decoder keyed by block ID
2. The decoder attempts reconstruction when sufficient symbols arrive
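3. RaptorQ can recover the full block with high probability from any `K` of the `K + R` transmitted symbols, and each additional symbol beyond `K` sharply reduces the residual failure probability (where K = source count, R = repair count)
4. Old blocks are expired based on the wrapping u8 distance between block IDs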
### Interleaving
The `Interleaver` spreads symbols from multiple FEC blocks across transmission slots in round-robin fashion. With depth=3, a burst loss of 6 consecutive packets damages at most 2 symbols per block instead of 6 symbols in one block.
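The burst-loss claim can be checked with a small simulation. This is a sketch of round-robin interleaving in general, not the `Interleaver` type itself; symbols are modelled as hypothetical `(block_id, symbol_index)` pairs.

```rust
use std::collections::HashMap;

/// Emit symbols round-robin: one symbol from each block per pass.
pub fn interleave(blocks: &[Vec<(u8, u8)>]) -> Vec<(u8, u8)> {
    let longest = blocks.iter().map(|b| b.len()).max().unwrap_or(0);
    let mut out = Vec::new();
    for i in 0..longest {
        for block in blocks {
            if let Some(sym) = block.get(i) {
                out.push(*sym);
            }
        }
    }
    out
}

/// Worst per-block damage when `burst_len` consecutive packets are lost.
pub fn max_losses_per_block(stream: &[(u8, u8)], burst_start: usize, burst_len: usize) -> usize {
    let mut counts: HashMap<u8, usize> = HashMap::new();
    for (block_id, _) in stream.iter().skip(burst_start).take(burst_len) {
        *counts.entry(*block_id).or_insert(0) += 1;
    }
    counts.values().copied().max().unwrap_or(0)
}
```

With three equally sized blocks, any window of 6 consecutive packets contains exactly 2 symbols from each block, whereas the same burst on a non-interleaved stream can remove 6 symbols from a single block.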
### FEC Configuration by Quality Tier
| Tier | Frames/Block | Repair Ratio | Total Bandwidth Overhead |
|------|-------------|-------------|-------------------------|
| GOOD | 5 | 0.2 (20%) | 1.2x |
| DEGRADED | 10 | 0.5 (50%) | 1.5x |
| CATASTROPHIC | 8 | 1.0 (100%) | 2.0x |
## Adaptive Quality
Three quality tiers drive codec and FEC selection. The controller is implemented in `crates/wzp-proto/src/quality.rs` as `AdaptiveQualityController`.
### Tier Thresholds
| Tier | Loss | RTT | Codec | FEC Ratio |
|------|------|-----|-------|-----------|
| GOOD | < 10% | < 400ms | Opus 24kbps, 20ms frames | 0.2 |
| DEGRADED | 10-40% | 400-600ms | Opus 6kbps, 40ms frames | 0.5 |
| CATASTROPHIC | > 40% | > 600ms | Codec2 1200bps, 40ms frames | 1.0 |

GOOD requires both the loss and the RTT condition to hold. DEGRADED and CATASTROPHIC apply when either the loss or the RTT condition is met.
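Read as code, the thresholds amount to checking the worst tier first. The sketch below is an interpretation of the table, not the logic in `quality.rs`; the `Tier` enum, the function name, and the treatment of exact boundary values (10%, 40%, 400 ms, 600 ms fall into DEGRADED) are assumptions.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tier {
    Good,
    Degraded,
    Catastrophic,
}

/// Classify a single report. Either metric alone can push the tier down.
pub fn classify(loss_percent: f32, rtt_ms: u32) -> Tier {
    if loss_percent > 40.0 || rtt_ms > 600 {
        Tier::Catastrophic
    } else if loss_percent >= 10.0 || rtt_ms >= 400 {
        Tier::Degraded
    } else {
        Tier::Good
    }
}
```

A link with no loss but 650 ms RTT is therefore classified as catastrophic, the same as a low-latency link with 50% loss.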
### Hysteresis
- **Downgrade**: Triggers after 3 consecutive reports in a worse tier (fast reaction)
- **Upgrade**: Triggers after 10 consecutive reports in a better tier (slow, cautious)
- **Step limit**: Upgrades move only one tier at a time (Catastrophic -> Degraded -> Good)
- **History**: A sliding window of 20 recent reports is maintained for smoothing
- **Force mode**: Manual `force_profile()` disables adaptive logic entirely
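The asymmetric streak counting can be sketched as follows. This is a simplified model, not `AdaptiveQualityController`: the `Hysteresis` type is hypothetical, the 20-report history and force mode are omitted, and it assumes a downgrade jumps straight to the most recently observed tier.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Tier {
    Good,
    Degraded,
    Catastrophic,
}

const DOWNGRADE_AFTER: u32 = 3;
const UPGRADE_AFTER: u32 = 10;

pub struct Hysteresis {
    current: Tier,
    worse_streak: u32,
    better_streak: u32,
}

impl Hysteresis {
    pub fn new() -> Self {
        Hysteresis { current: Tier::Good, worse_streak: 0, better_streak: 0 }
    }

    /// Feed the tier one report classifies into; returns the active tier.
    pub fn observe(&mut self, observed: Tier) -> Tier {
        if observed > self.current {
            // Worse than the active tier: react quickly.
            self.worse_streak += 1;
            self.better_streak = 0;
            if self.worse_streak >= DOWNGRADE_AFTER {
                self.current = observed; // assumption: no step limit on downgrades
                self.worse_streak = 0;
            }
        } else if observed < self.current {
            // Better than the active tier: recover slowly, one tier per upgrade.
            self.better_streak += 1;
            self.worse_streak = 0;
            if self.better_streak >= UPGRADE_AFTER {
                self.current = match self.current {
                    Tier::Catastrophic => Tier::Degraded,
                    _ => Tier::Good,
                };
                self.better_streak = 0;
            }
        } else {
            self.worse_streak = 0;
            self.better_streak = 0;
        }
        self.current
    }
}
```

At one report per second, a collapse is acted on within about 3 seconds, while climbing from Catastrophic back to Good takes at least 20 consecutive good reports.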
### QualityProfile Constants
```
GOOD: Opus24k, fec=0.2, 20ms, 5 frames/block 28.8 kbps total
DEGRADED: Opus6k, fec=0.5, 40ms, 10 frames/block 9.0 kbps total
CATASTROPHIC: Codec2_1200, fec=1.0, 40ms, 8 frames/block 2.4 kbps total
```
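The totals above are the codec bitrate scaled by `1 + fec`. A short sketch of that arithmetic, using integer bits per second and a whole-percent FEC ratio (both representation choices are assumptions, as is the function name):

```rust
/// Media bitrate including FEC repair symbols, in bits per second.
/// Header and AEAD tag bytes are not counted, matching the totals listed above.
pub fn total_bps(codec_bps: u32, fec_percent: u32) -> u32 {
    codec_bps * (100 + fec_percent) / 100
}
```

This reproduces 28.8, 9.0, and 2.4 kbps for the three profiles.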
## Encryption
Implemented in `crates/wzp-crypto/`.
### Identity Model (Warzone-Compatible)
- **Seed**: 32-byte random value (BIP39 mnemonic for backup)
- **Ed25519**: Derived via `HKDF(seed, "warzone-ed25519-identity")` -- signing/identity
- **X25519**: Derived via `HKDF(seed, "warzone-x25519-identity")` -- encryption
- **Fingerprint**: `SHA-256(Ed25519_pub)[:16]` -- 128-bit identifier
### Per-Call Key Exchange
1. Each side generates an ephemeral X25519 keypair
2. Ephemeral public keys are exchanged via `CallOffer`/`CallAnswer` signaling
3. Signatures are computed: `Ed25519_sign(ephemeral_pub || context_string)`
4. Shared secret: `X25519_DH(our_ephemeral_secret, peer_ephemeral_pub)`
5. Session key: `HKDF(shared_secret, "warzone-session-key")` -> 32 bytes
### Nonce Construction (12 bytes, not transmitted)
```
session_id[0..4] || sequence_number (u32 BE) || direction (1 byte) || padding (3 bytes zero)
```
- `session_id`: First 4 bytes of `SHA-256(session_key)`
- `direction`: 0 = Send, 1 = Recv
- Nonces are derived deterministically, saving 12 bytes per packet
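The nonce layout can be sketched as a pure function. The names below are hypothetical, and `session_id` is taken as an input because deriving it needs a SHA-256 implementation outside the standard library.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Direction {
    Send = 0,
    Recv = 1,
}

/// Assemble the 12-byte nonce:
/// session_id (4) || sequence number, big-endian (4) || direction (1) || zero padding (3).
pub fn build_nonce(session_id: [u8; 4], seq: u32, dir: Direction) -> [u8; 12] {
    let mut nonce = [0u8; 12];
    nonce[..4].copy_from_slice(&session_id);
    nonce[4..8].copy_from_slice(&seq.to_be_bytes());
    nonce[8] = dir as u8;
    // Bytes 9..12 stay zero.
    nonce
}
```

Because both peers can compute the same value from state they already hold, the nonce never needs to travel with the packet.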
### AEAD Encryption
- Algorithm: ChaCha20-Poly1305
- AAD: The 12-byte MediaHeader (authenticated but not encrypted)
- Tag: 16 bytes appended to ciphertext
- Overhead per packet: 16 bytes
### Rekeying
- Trigger: Every 2^16 packets (65536)
- Process: New ephemeral X25519 exchange, mixed with old key via HKDF
- Key evolution: `HKDF(old_key as salt, new_DH_result, "warzone-rekey")`
- Old key is zeroized after derivation (forward secrecy)
- Sequence counters reset to 0 after rekey
### Anti-Replay
- Sliding window of 1024 packets using a bitmap
- Sequence numbers too old (> 1024 behind highest seen) are rejected
- Handles u16 wrapping correctly (RFC 1982 serial number arithmetic)
- Implemented in `crates/wzp-crypto/src/anti_replay.rs` as `AntiReplayWindow`
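A bitmap-based window with wrapping u16 comparison can be sketched as below. This is an illustration, not the `AntiReplayWindow` code: the type and method names are hypothetical, and with a 1024-slot bitmap the sketch rejects anything 1024 or more behind the highest sequence number, which may differ by one from the real boundary.

```rust
pub const WINDOW: u16 = 1024;

pub struct ReplayWindow {
    highest: u16,
    seen: [u64; 16], // 1024 bits, slot = seq % 1024
    initialized: bool,
}

impl ReplayWindow {
    pub fn new() -> Self {
        ReplayWindow { highest: 0, seen: [0; 16], initialized: false }
    }

    fn slot(seq: u16) -> (usize, u64) {
        let i = (seq % WINDOW) as usize;
        (i / 64, 1u64 << (i % 64))
    }

    /// Returns true and records `seq` if it is fresh; false if replayed or too old.
    pub fn check_and_update(&mut self, seq: u16) -> bool {
        let (word, mask) = Self::slot(seq);
        if !self.initialized {
            self.initialized = true;
            self.highest = seq;
            self.seen[word] |= mask;
            return true;
        }
        // Serial-number comparison: a forward distance below 2^15 means "newer".
        let ahead = seq.wrapping_sub(self.highest);
        if ahead != 0 && ahead < 0x8000 {
            // Clear the slots the window slides over before reusing them.
            for k in 1..=ahead.min(WINDOW) {
                let (w, m) = Self::slot(self.highest.wrapping_add(k));
                self.seen[w] &= !m;
            }
            self.highest = seq;
            self.seen[word] |= mask;
            return true;
        }
        let behind = self.highest.wrapping_sub(seq);
        if behind >= WINDOW || self.seen[word] & mask != 0 {
            return false; // too old, or already seen
        }
        self.seen[word] |= mask;
        true
    }
}
```

Since 65536 is a multiple of 1024, the slot index `seq % 1024` stays consistent across the u16 wrap, so no special case is needed at the boundary.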
## Jitter Buffer
Implemented in `crates/wzp-proto/src/jitter.rs` as `JitterBuffer`.
- **Structure**: BTreeMap keyed by sequence number for ordered playout
- **Target depth**: 50 packets by default (1 second at 20ms/frame)
- **Max depth**: 250 packets (5 seconds at 20ms/frame)
- **Min depth**: 25 packets (0.5 seconds at 20ms/frame) before playout begins
- **Sequence wrapping**: RFC 1982 serial number arithmetic for u16
- **Duplicate handling**: Silently dropped
- **Late packets**: Packets arriving after their sequence has been played out are dropped
- **Overflow**: When buffer exceeds max depth, oldest packets are evicted
### Playout Results
- `Packet(MediaPacket)` -- normal delivery
- `Missing { seq }` -- gap detected, decoder should generate PLC
- `NotReady` -- buffer not yet filled to minimum depth
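The three playout outcomes can be illustrated with a much-reduced buffer. This is a sketch under simplifying assumptions, not the `JitterBuffer` implementation: the `Jitter` and `Playout` names are hypothetical, payloads are raw bytes, max-depth eviction is omitted, playout starts at the first sequence number received, and an empty buffer re-enters the `NotReady` state.

```rust
use std::collections::BTreeMap;

#[derive(Debug, PartialEq, Eq)]
pub enum Playout {
    Packet(Vec<u8>),
    Missing { seq: u16 },
    NotReady,
}

pub struct Jitter {
    packets: BTreeMap<u16, Vec<u8>>,
    next_seq: Option<u16>,
    min_depth: usize,
    playing: bool,
}

impl Jitter {
    pub fn new(min_depth: usize) -> Self {
        Jitter { packets: BTreeMap::new(), next_seq: None, min_depth, playing: false }
    }

    pub fn push(&mut self, seq: u16, payload: Vec<u8>) {
        match self.next_seq {
            None => self.next_seq = Some(seq),
            // Late packet: `seq` is behind the playout point (wrapping compare).
            Some(next) if seq.wrapping_sub(next) >= 0x8000 => return,
            Some(_) => {}
        }
        // Duplicates are silently dropped: the first copy wins.
        self.packets.entry(seq).or_insert(payload);
    }

    pub fn pop(&mut self) -> Playout {
        let next = match self.next_seq {
            Some(n) => n,
            None => return Playout::NotReady,
        };
        if !self.playing {
            if self.packets.len() < self.min_depth {
                return Playout::NotReady;
            }
            self.playing = true;
        }
        if self.packets.is_empty() {
            self.playing = false; // underrun: rebuffer (assumed behaviour)
            return Playout::NotReady;
        }
        self.next_seq = Some(next.wrapping_add(1));
        match self.packets.remove(&next) {
            Some(p) => Playout::Packet(p),
            None => Playout::Missing { seq: next },
        }
    }
}
```

Looking packets up by the expected sequence number, rather than iterating the map in key order, avoids ordering problems when the u16 sequence wraps.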
### Known Limitations
- No adaptive depth adjustment based on observed jitter (target_depth is configurable but not self-tuning in the current implementation)
- No timestamp-based playout scheduling (uses sequence-number ordering only)
- Jitter buffer drift has been observed during long echo tests
## Session State Machine
Defined in `crates/wzp-proto/src/session.rs`:
```
Idle -> Connecting -> Handshaking -> Active <-> Rekeying -> Active
|
Closed
```
- Media flows during both `Active` and `Rekeying` states
- Any state can transition to `Closed` via `Terminate` or `ConnectionLost`
- Invalid transitions produce a `TransitionError`
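The transition rules can be written as a single match. In this sketch only `Terminate`, `ConnectionLost`, and `TransitionError` are names taken from the description above; the other event names, the `transition` function, and the error's fields are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Idle,
    Connecting,
    Handshaking,
    Active,
    Rekeying,
    Closed,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    Connect,       // hypothetical name
    TransportUp,   // hypothetical name
    HandshakeDone, // hypothetical name
    RekeyStart,    // hypothetical name
    RekeyDone,     // hypothetical name
    Terminate,
    ConnectionLost,
}

#[derive(Debug, PartialEq, Eq)]
pub struct TransitionError {
    pub from: State,
    pub event: Event,
}

pub fn transition(from: State, event: Event) -> Result<State, TransitionError> {
    match (from, event) {
        // Any state may close.
        (_, Event::Terminate) | (_, Event::ConnectionLost) => Ok(State::Closed),
        (State::Idle, Event::Connect) => Ok(State::Connecting),
        (State::Connecting, Event::TransportUp) => Ok(State::Handshaking),
        (State::Handshaking, Event::HandshakeDone) => Ok(State::Active),
        (State::Active, Event::RekeyStart) => Ok(State::Rekeying),
        (State::Rekeying, Event::RekeyDone) => Ok(State::Active),
        _ => Err(TransitionError { from, event }),
    }
}

/// Media flows in both Active and Rekeying.
pub fn media_allowed(state: State) -> bool {
    matches!(state, State::Active | State::Rekeying)
}
```

Keeping the function pure (state in, state out) makes every edge of the diagram straightforward to unit test.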
## Relay Modes
### Room Mode (Default, SFU)
- Clients join named rooms via QUIC SNI
- When a participant sends a packet, the relay forwards it to all other participants
- No transcoding -- packets are forwarded opaquely
- Rooms are auto-created when the first participant joins and auto-deleted when empty
- Managed by `RoomManager` in `crates/wzp-relay/src/room.rs`
### Forward Mode (`--remote`)
- All incoming traffic is forwarded to a remote relay via QUIC
- Two-pipeline architecture: upstream (client->remote) and downstream (remote->client)
- Each direction has its own `RelayPipeline` with FEC decode/encode and jitter buffering
- Intended for chaining relays across censored/lossy boundaries
### Relay Pipeline (Forward Mode)
Implemented in `crates/wzp-relay/src/pipeline.rs` as `RelayPipeline`:
```
Inbound: recv -> FEC decode -> jitter buffer -> pop
Outbound: packet -> assign seq -> FEC encode -> repair packets -> send
```
The pipeline does NOT decode/re-encode audio. It operates on FEC-protected packets, managing loss recovery and re-FEC-encoding for the next hop.
## Transport
Implemented in `crates/wzp-transport/` using QUIC via the `quinn` crate.
### QUIC Configuration
- ALPN protocol: `wzp`
- Idle timeout: 30 seconds
- Keep-alive interval: 5 seconds
- DATAGRAM extension enabled (for unreliable media)
- Datagram receive buffer: 64 KB
- Receive window: 256 KB
- Send window: 128 KB
- Stream receive window: 64 KB per stream
- Initial RTT estimate: 300ms (tuned for high-latency links)
### Media Transport
- **Unreliable media**: QUIC DATAGRAM frames (no retransmission, no head-of-line blocking)
- **Reliable signaling**: QUIC bidirectional streams with length-prefixed JSON framing
### Path Quality Monitoring
`PathMonitor` in `crates/wzp-transport/src/path_monitor.rs` tracks:
- **Loss**: EWMA-smoothed percentage from sent/received packet counts
- **RTT**: EWMA-smoothed round-trip time (alpha=0.1)
- **Jitter**: EWMA of the absolute difference between consecutive RTT samples (|current_rtt - previous_rtt|)
- **Bandwidth**: Estimated from bytes received over elapsed time
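The RTT and jitter smoothing can be sketched with a generic exponentially weighted moving average. The types below are hypothetical, not `PathMonitor`; seeding the average with the first sample and reusing the same alpha for jitter are assumptions.

```rust
/// Exponentially weighted moving average: new = old + alpha * (sample - old).
pub struct Ewma {
    alpha: f64,
    value: Option<f64>,
}

impl Ewma {
    pub fn new(alpha: f64) -> Self {
        Ewma { alpha, value: None }
    }

    pub fn update(&mut self, sample: f64) -> f64 {
        let next = match self.value {
            None => sample, // seed with the first sample (assumed)
            Some(prev) => prev + self.alpha * (sample - prev),
        };
        self.value = Some(next);
        next
    }

    pub fn get(&self) -> Option<f64> {
        self.value
    }
}

/// Tracks smoothed RTT and jitter from raw RTT samples.
pub struct RttTracker {
    rtt: Ewma,
    jitter: Ewma,
    last_rtt_ms: Option<f64>,
}

impl RttTracker {
    pub fn new(alpha: f64) -> Self {
        RttTracker { rtt: Ewma::new(alpha), jitter: Ewma::new(alpha), last_rtt_ms: None }
    }

    pub fn on_rtt_sample(&mut self, rtt_ms: f64) {
        // Jitter sample is the absolute change between consecutive RTTs.
        if let Some(prev) = self.last_rtt_ms {
            self.jitter.update((rtt_ms - prev).abs());
        }
        self.last_rtt_ms = Some(rtt_ms);
        self.rtt.update(rtt_ms);
    }

    pub fn rtt_ms(&self) -> Option<f64> {
        self.rtt.get()
    }

    pub fn jitter_ms(&self) -> Option<f64> {
        self.jitter.get()
    }
}
```

With alpha=0.1 each new sample contributes 10% of the estimate, so a single RTT spike from 100 ms to 200 ms moves the smoothed RTT only to 110 ms.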
### Codec Selection by Tier
| Codec | Sample Rate | Frame Duration | Bitrate | Use Case |
|-------|------------|----------------|---------|----------|
| Opus24k | 48 kHz | 20ms (960 samples) | 24 kbps | Good conditions |
| Opus16k | 48 kHz | 20ms | 16 kbps | Moderate conditions |
| Opus6k | 48 kHz | 40ms (1920 samples) | 6 kbps | Degraded conditions |
| Codec2_3200 | 8 kHz | 20ms (160 samples) | 3.2 kbps | Poor conditions |
| Codec2_1200 | 8 kHz | 40ms (320 samples) | 1.2 kbps | Catastrophic conditions |
Opus operates at 48 kHz natively. When Codec2 is selected, the adaptive codec layer handles 48 kHz <-> 8 kHz resampling transparently using a simple linear resampler (6:1 decimation/interpolation).
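A 6:1 linear resampler can be as small as the sketch below. It is an illustration of the technique rather than the project's resampler: the function names are hypothetical, samples are assumed to be 16-bit PCM, and no anti-aliasing filter is applied before decimation.

```rust
/// 48 kHz -> 8 kHz: with an integer ratio, linear interpolation at the output
/// instants reduces to keeping every 6th input sample.
pub fn downsample_6to1(input: &[i16]) -> Vec<i16> {
    input.iter().step_by(6).copied().collect()
}

/// 8 kHz -> 48 kHz: emit 6 output samples per input sample, linearly
/// interpolated toward the next sample (the last sample is held).
pub fn upsample_1to6(input: &[i16]) -> Vec<i16> {
    let mut out = Vec::with_capacity(input.len() * 6);
    for (i, &a) in input.iter().enumerate() {
        let b = *input.get(i + 1).unwrap_or(&a);
        for k in 0..6i32 {
            let v = a as i32 + (b as i32 - a as i32) * k / 6;
            out.push(v as i16);
        }
    }
    out
}
```

A 20 ms frame of 960 samples at 48 kHz becomes 160 samples at 8 kHz, matching the Codec2_3200 row in the table above.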