Audit: - docs/AUDIT-2026-05-25.md: full protocol audit covering 8 findings (4 critical, 2 high, 5 medium, 4 low) with code references and fix effort estimates - vault/Audit/Tasks.md: Obsidian Tasks plugin file tracking all audit items with priorities, due dates, and per-step checklists Architecture docs updated for Wire format v2 and Wave 5/6 features: - ARCHITECTURE.md: adds wzp-video to dependency graph and project structure; wire format updated to v2 (16B header, 5B MiniHeader); relay concurrency section corrected (DashMap+RwLock is current, not a future optimization); test count 571→702; Android note - PROGRESS.md: Wave 5 and Wave 6 sections appended; test count 372→702; current status and open blockers as of 2026-05-25 - ROAD-TO-VIDEO.md: implementation status table inserted (✅/🟡/🔴/🔲 per phase); 6-step critical path to first video call - WZP-SPEC.md: MediaHeader updated to v2 (16B byte-aligned); MiniHeader updated to 5B with seq_delta; codec IDs 9-12 added (H.264/H.265/AV1); version negotiation section added Obsidian vault (vault/): - 114 files across Architecture/, PRDs/, Reports/, Android/, Reference/, Audit/ with YAML frontmatter - 00 - Home.md index note with wiki links - .obsidian/app.json config Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1246 lines
51 KiB
Markdown
# WarzonePhone Architecture

> Custom lossy VoIP protocol built in Rust. E2E encrypted, FEC-protected, adaptive quality, designed for hostile network conditions.

## System Overview

```mermaid
graph TB
    subgraph "Client A (Desktop / Android / CLI)"
        MIC[Microphone] --> DN[NoiseSuppressor<br/>RNNoise ML]
        DN --> SD[SilenceDetector<br/>VAD + Hangover]
        SD --> ENC[CallEncoder<br/>Opus / Codec2]
        ENC --> FEC_E[FEC Encoder<br/>RaptorQ]
        FEC_E --> CRYPT_E[ChaCha20-Poly1305<br/>Encrypt]
        CRYPT_E --> QUIC_S[QUIC Datagram<br/>Send]

        QUIC_R[QUIC Datagram<br/>Recv] --> CRYPT_D[ChaCha20-Poly1305<br/>Decrypt]
        CRYPT_D --> FEC_D[FEC Decoder<br/>RaptorQ]
        FEC_D --> JIT[JitterBuffer<br/>Adaptive Playout]
        JIT --> DEC[CallDecoder<br/>Opus / Codec2]
        DEC --> SPK[Speaker]
    end

    subgraph "Relay (SFU)"
        ACCEPT[Accept QUIC] --> AUTH{Auth?}
        AUTH -->|token| VALIDATE[POST /v1/auth/validate]
        AUTH -->|no auth| HS
        VALIDATE --> HS[Crypto Handshake<br/>X25519 + Ed25519]
        HS --> ROOM[Room Manager<br/>Named Rooms via SNI]
        ROOM --> FWD[Forward to<br/>Other Participants]
    end

    subgraph "Client B"
        B_SPK[Speaker]
        B_MIC[Microphone]
    end

    QUIC_S -->|UDP / QUIC| ACCEPT
    FWD -->|UDP / QUIC| QUIC_R
    B_MIC -.->|same pipeline| ACCEPT
    FWD -.->|same pipeline| B_SPK

    style MIC fill:#4a9eff,color:#fff
    style SPK fill:#4a9eff,color:#fff
    style B_MIC fill:#4a9eff,color:#fff
    style B_SPK fill:#4a9eff,color:#fff
    style ROOM fill:#ff9f43,color:#fff
    style CRYPT_E fill:#ee5a24,color:#fff
    style CRYPT_D fill:#ee5a24,color:#fff
```

## Crate Dependency Graph

```mermaid
graph TD
    PROTO["wzp-proto<br/>Types, Traits, Wire Format"]

    CODEC["wzp-codec<br/>Opus + Codec2 + RNNoise"]
    FEC["wzp-fec<br/>RaptorQ FEC"]
    CRYPTO["wzp-crypto<br/>ChaCha20 + Identity"]
    TRANSPORT["wzp-transport<br/>QUIC / Quinn"]
    VIDEO["wzp-video<br/>H.264 + H.265 + AV1"]

    RELAY["wzp-relay<br/>Relay Daemon"]
    CLIENT["wzp-client<br/>CLI + Call Engine"]
    WEB["wzp-web<br/>Browser Bridge"]

    PROTO --> CODEC
    PROTO --> FEC
    PROTO --> CRYPTO
    PROTO --> TRANSPORT
    PROTO --> VIDEO

    CODEC --> CLIENT
    FEC --> CLIENT
    CRYPTO --> CLIENT
    TRANSPORT --> CLIENT
    VIDEO --> CLIENT

    CODEC --> RELAY
    FEC --> RELAY
    CRYPTO --> RELAY
    TRANSPORT --> RELAY
    VIDEO --> RELAY

    CLIENT --> WEB
    TRANSPORT --> WEB
    CRYPTO --> WEB

    FC["warzone-protocol<br/>featherChat Identity"] -.->|path dep| CRYPTO

    style PROTO fill:#6c5ce7,color:#fff
    style RELAY fill:#ff9f43,color:#fff
    style CLIENT fill:#00b894,color:#fff
    style WEB fill:#0984e3,color:#fff
    style FC fill:#fd79a8,color:#fff
    style VIDEO fill:#a29bfe,color:#fff
```

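**Star pattern**: Within the workspace, each leaf crate (`wzp-codec`, `wzp-fec`, `wzp-crypto`, `wzp-transport`, `wzp-video`) depends only on `wzp-proto`, and no leaf depends on another leaf. The integration crates build on the leaves: `wzp-relay` and `wzp-client` depend on all five, while `wzp-web` depends on `wzp-client`, `wzp-transport`, and `wzp-crypto`. The one external edge is the path dependency from `wzp-crypto` on `warzone-protocol` (featherChat identity).
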
## Audio Encode Pipeline

```mermaid
sequenceDiagram
    participant Mic as Microphone<br/>(48kHz)
    participant Ring as SPSC Ring<br/>(lock-free)
    participant RNN as RNNoise<br/>(2 x 480)
    participant VAD as SilenceDetector
    participant Codec as Opus / Codec2
    participant DT as DredTuner<br/>(wzp-proto)
    participant FEC as RaptorQ FEC
    participant INT as Interleaver<br/>(depth=3)
    participant HDR as MediaHeader<br/>(16B or Mini 5B)
    participant Enc as ChaCha20-Poly1305
    participant QUIC as QUIC Datagram
    participant QPS as QuinnPathSnapshot

    Mic->>Ring: f32 x 512 (macOS callback)
    Ring->>Ring: Accumulate to 960 samples
    Ring->>RNN: PCM i16 x 960 (20ms frame)
    RNN->>VAD: Denoised audio
    alt Speech active (or hangover)
        VAD->>Codec: Encode active frame
    else Silence (>100ms)
        VAD->>Codec: ComfortNoise (every 200ms)
    end

    Note over QPS,DT: Every 25 frames (~500ms)
    QPS->>DT: loss_pct, rtt_ms, jitter_ms
    DT->>Codec: set_dred_duration() + set_expected_loss()

    alt Opus tier (any bitrate)
        Codec->>HDR: Compressed bytes + DRED side-channel (no RaptorQ)
    else Codec2 tier
        Codec->>FEC: Compressed bytes (pad to 256B symbol)
        FEC->>FEC: Accumulate block (5-10 symbols)
        FEC->>INT: Source + repair symbols
        INT->>HDR: Interleaved packets
    end
    HDR->>Enc: Header as AAD
    Enc->>QUIC: Encrypted payload + 16B tag
```

### Key Details

- macOS delivers **512 f32** samples per callback (not configurable to 960)
- Ring buffer accumulates to **960 samples** (20ms at 48 kHz) for codec frame
- RNNoise processes **2 x 480** samples (ML-based noise suppression via nnnoiseless)
- Silence detection uses VAD + 100ms hangover before switching to ComfortNoise
- FEC symbols are padded to **256 bytes** with a 2-byte LE length prefix
- MiniHeaders (5 bytes) replace full headers (16 bytes) for 49 of every 50 audio frames; video always uses full headers
- DRED tuner polls quinn path stats every 25 frames (~500ms) and adjusts DRED lookback duration continuously
- Opus tiers bypass RaptorQ entirely -- DRED handles loss recovery at the codec layer
- Opus6k DRED window: 1040ms (maximum libopus allows)

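
The symbol padding described in the list above could look roughly like the sketch below. It assumes the 2-byte little-endian length prefix counts toward the 256-byte symbol (leaving 254 bytes of payload) and that oversized frames are rejected; `pad_symbol` and `unpad_symbol` are hypothetical names, not the functions in `wzp-fec`.

```rust
/// Fixed symbol size taken from the list above.
const SYMBOL_SIZE: usize = 256;
/// Two-byte little-endian length prefix.
const PREFIX_LEN: usize = 2;

/// Pads one compressed frame into a fixed-size symbol.
/// Returns `None` when the frame does not fit (assumed policy).
fn pad_symbol(frame: &[u8]) -> Option<[u8; SYMBOL_SIZE]> {
    if frame.len() > SYMBOL_SIZE - PREFIX_LEN {
        return None;
    }
    let mut symbol = [0u8; SYMBOL_SIZE];
    symbol[..PREFIX_LEN].copy_from_slice(&(frame.len() as u16).to_le_bytes());
    symbol[PREFIX_LEN..PREFIX_LEN + frame.len()].copy_from_slice(frame);
    Some(symbol)
}

/// Recovers the original frame bytes from a padded symbol.
/// Returns `None` if the stored length points past the symbol.
fn unpad_symbol(symbol: &[u8; SYMBOL_SIZE]) -> Option<&[u8]> {
    let len = u16::from_le_bytes([symbol[0], symbol[1]]) as usize;
    symbol.get(PREFIX_LEN..PREFIX_LEN + len)
}
```

The length prefix is what lets the decoder strip the zero padding again after RaptorQ reconstructs a symbol.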
## Audio Decode Pipeline

```mermaid
sequenceDiagram
    participant QUIC as QUIC Datagram
    participant Dec as ChaCha20-Poly1305
    participant AR as Anti-Replay<br/>(sliding window)
    participant HDR as Header Parse
    participant DEINT as De-interleaver
    participant FEC as RaptorQ FEC<br/>(reconstruct)
    participant JIT as JitterBuffer<br/>(BTreeMap)
    participant Codec as Opus / Codec2
    participant Ring as SPSC Ring<br/>(lock-free)
    participant SPK as Speaker

    QUIC->>Dec: Encrypted packet
    Dec->>AR: Decrypt (header = AAD)
    AR->>AR: Check seq window (reject replay)
    AR->>HDR: Verified packet

    alt Opus packet
        HDR->>JIT: Direct to jitter buffer (no FEC/interleave)
    else Codec2 packet
        HDR->>DEINT: MediaHeader + payload
        DEINT->>FEC: Reordered symbols by block
        FEC->>FEC: Attempt decode (need K of K+R)
        FEC->>JIT: Recovered audio frames
    end

    JIT->>JIT: BTreeMap ordered by seq
    JIT->>JIT: Wait until depth >= target

    alt Packet present
        JIT->>Codec: Pop lowest seq frame
    else Packet missing (Opus)
        JIT->>Codec: DRED reconstruction (neural)
        alt DRED fails or unavailable
            Codec->>Codec: Classical PLC fallback
        end
    else Packet missing (Codec2)
        Codec->>Codec: Classical PLC
    end

    Codec->>Ring: PCM i16 x 960
    Ring->>SPK: Audio callback pulls samples
```

### Key Details

- Anti-replay uses a **64-packet sliding window** to reject duplicates
- FEC decoder needs any **K of K+R** symbols to reconstruct a block
- Jitter buffer target: **10 packets (200ms)** for client, **50 packets (1s)** for relay
- Desktop client uses **direct playout** (no jitter buffer) with lock-free ring
- Codec2 frames at 8 kHz are resampled to 48 kHz transparently
- DRED reconstruction: on packet loss, decoder tries neural DRED reconstruction before falling back to classical PLC
- Jitter-spike detection pre-emptively boosts DRED to ceiling when jitter variance spikes >30%

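
A 64-packet sliding window like the one in the list above is commonly kept as a 64-bit bitmap anchored at the highest sequence number seen. The sketch below illustrates that general technique under the assumption of a wrapping `u32` sequence; `AntiReplay` and `check_and_update` are hypothetical names and may not match `anti_replay.rs`.

```rust
/// 64-packet sliding-window replay filter (illustrative sketch).
struct AntiReplay {
    highest: u32,  // highest sequence number accepted so far
    bitmap: u64,   // bit i set => sequence (highest - i) was already seen
    started: bool, // false until the first packet arrives
}

impl AntiReplay {
    fn new() -> Self {
        AntiReplay { highest: 0, bitmap: 0, started: false }
    }

    /// Returns true and records `seq` if it is fresh; returns false for
    /// duplicates and for packets that fell out of the 64-packet window.
    fn check_and_update(&mut self, seq: u32) -> bool {
        if !self.started {
            self.started = true;
            self.highest = seq;
            self.bitmap = 1;
            return true;
        }
        let ahead = seq.wrapping_sub(self.highest);
        if ahead != 0 && ahead < (1u32 << 31) {
            // Newer than anything seen: slide the window forward.
            self.bitmap = if ahead >= 64 { 0 } else { self.bitmap << ahead };
            self.bitmap |= 1;
            self.highest = seq;
            return true;
        }
        let behind = self.highest.wrapping_sub(seq);
        if behind >= 64 {
            return false; // too old to track
        }
        let mask = 1u64 << behind;
        if self.bitmap & mask != 0 {
            return false; // replay
        }
        self.bitmap |= mask;
        true
    }
}
```

Late but unseen packets inside the window are still accepted, which matters because reordering is normal on lossy paths.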
## Relay SFU Forwarding

```mermaid
graph TB
    subgraph "Room Mode (Default SFU)"
        C1[Client 1<br/>Alice] -->|"QUIC SNI=room-hash"| RM[Room Manager]
        C2[Client 2<br/>Bob] -->|"QUIC SNI=room-hash"| RM
        C3[Client 3<br/>Charlie] -->|"QUIC SNI=room-hash"| RM
        RM --> R1["Room 'podcast'"]
        R1 -->|"fan-out (skip sender)"| C1
        R1 -->|"fan-out (skip sender)"| C2
        R1 -->|"fan-out (skip sender)"| C3
    end

    subgraph "Forward Mode (--remote)"
        C4[Client] -->|QUIC| RA[Relay A]
        RA -->|"FEC decode<br/>jitter buffer<br/>FEC re-encode"| RB[Relay B<br/>--remote]
        RB -->|QUIC| C5[Client]
    end

    subgraph "Probe Mode (--probe)"
        PA[Relay A] -->|"Ping 1/s<br/>~50 bytes"| PB[Relay B]
        PB -->|Pong| PA
        PA --> PM[Prometheus<br/>RTT / Loss / Jitter]
    end

    style RM fill:#ff9f43,color:#fff
    style R1 fill:#fdcb6e
    style PM fill:#0984e3,color:#fff
```

### SFU Fan-out Rules

1. Each incoming datagram is forwarded to all other participants in the room
2. The sender is excluded from fan-out (no echo)
3. If one send fails, the relay continues to the next participant (best-effort)
4. The relay never decodes or re-encodes audio (preserves E2E encryption)
5. With trunking enabled, packets to the same receiver are batched into TrunkFrames (flushed every 5ms)
6. Relay tracks per-participant quality from QualityReport trailers and broadcasts `QualityDirective` when the room-wide tier degrades (coordinated codec switching)

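
Rules 1 to 4 above amount to a small loop. The following is a minimal sketch, assuming participants are identified by a `u32` and that the transport is passed in as a callback; `fan_out` and its signature are hypothetical and only illustrate the skip-sender, best-effort behavior.

```rust
/// Forwards one opaque (still encrypted) datagram to every participant
/// except the sender. `send` is a hypothetical transport callback.
/// Returns the number of failed sends; a failure never stops the loop.
fn fan_out<F>(participants: &[u32], sender: u32, datagram: &[u8], mut send: F) -> usize
where
    F: FnMut(u32, &[u8]) -> Result<(), String>,
{
    let mut failures = 0;
    for &id in participants {
        if id == sender {
            continue; // rule 2: no echo back to the sender
        }
        if send(id, datagram).is_err() {
            failures += 1; // rule 3: best-effort, keep going
        }
    }
    failures
}
```

The datagram is passed through as raw bytes, so nothing in this path needs a session key, in line with rule 4.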
## Federation Topology

```mermaid
graph TB
    subgraph "Relay A (EU)"
        A_R["Room Manager"]
        A_F["Federation<br/>Manager"]
        A1["Alice (local)"]
        A2["Bob (local)"]
    end

    subgraph "Relay B (US)"
        B_R["Room Manager"]
        B_F["Federation<br/>Manager"]
        B1["Charlie (local)"]
    end

    subgraph "Relay C (APAC)"
        C_R["Room Manager"]
        C_F["Federation<br/>Manager"]
        C1["Dave (local)"]
    end

    A1 -->|media| A_R
    A2 -->|media| A_R
    B1 -->|media| B_R
    C1 -->|media| C_R

    A_F <-->|"SNI='_federation'<br/>GlobalRoomActive<br/>media forward"| B_F
    A_F <-->|"SNI='_federation'<br/>GlobalRoomActive<br/>media forward"| C_F
    B_F <-->|"SNI='_federation'<br/>GlobalRoomActive<br/>media forward"| C_F

    A_R --> A_F
    B_R --> B_F
    C_R --> C_F

    style A_F fill:#6c5ce7,color:#fff
    style B_F fill:#6c5ce7,color:#fff
    style C_F fill:#6c5ce7,color:#fff
    style A_R fill:#ff9f43,color:#fff
    style B_R fill:#ff9f43,color:#fff
    style C_R fill:#ff9f43,color:#fff
```

### Federation Protocol Flow

```mermaid
sequenceDiagram
    participant RA as Relay A
    participant RB as Relay B

    Note over RA: Startup: connect to configured peers

    RA->>RB: QUIC connect (SNI="_federation")
    RA->>RB: FederationHello { tls_fingerprint }
    RB->>RB: Verify fingerprint against [[trusted]]

    Note over RA,RB: Federation link established

    Note over RA: Alice joins global room "podcast"
    RA->>RB: GlobalRoomActive { room: "podcast" }

    Note over RB: Charlie joins global room "podcast"
    RB->>RA: GlobalRoomActive { room: "podcast" }

    Note over RA,RB: Media bridging active

    loop Every media packet in global room
        RA->>RB: [room_hash:8][encrypted_media]
        RB->>RA: [room_hash:8][encrypted_media]
    end

    Note over RA: Last local participant leaves
    RA->>RB: GlobalRoomInactive { room: "podcast" }
```

## Wire Formats

### `MediaHeader` v2 (16 bytes, byte-aligned)

```
Byte 0:       version       (u8)      0x02
Byte 1:       flags         (u8)      [T:1][Q:1][KeyFrame:1][FrameEnd:1][reserved:4]
                                      T = FEC repair, Q = QualityReport trailer
                                      KeyFrame = packet belongs to an I-frame (video)
                                      FrameEnd = last packet of an access unit (video)
Byte 2:       media_type    (u8)      0=audio, 1=video, 2=data, 3=control
Byte 3:       codec_id      (u8)      widened from 4-bit (room for 256 codec IDs)
Byte 4:       stream_id     (u8)      simulcast layer; 0=base
Byte 5:       fec_ratio     (u8)      0..200 → 0.0..2.0
Bytes 6-9:    sequence      (u32 BE)  wrapping packet sequence number
Bytes 10-13:  timestamp_ms  (u32 BE)  milliseconds since session start
Bytes 14-15:  fec_block_id  (u16 BE)
                                      audio: low 8 bits = block_id, high 8 bits = symbol_idx
                                      video: full u16 block_id (large blocks for I-frames)
```

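
To make the layout concrete, here is a minimal serialization sketch for the 16-byte header. The struct and method names are hypothetical (the real type lives in `packet.rs`), and the flag masks assume the bit list is written most-significant bit first, which the layout above does not state explicitly.

```rust
const VERSION: u8 = 0x02;
const HEADER_LEN: usize = 16;

// Flag masks, assuming [T][Q][KeyFrame][FrameEnd] are listed MSB first.
const FLAG_FEC_REPAIR: u8 = 0x80;
const FLAG_QUALITY_REPORT: u8 = 0x40;
const FLAG_KEYFRAME: u8 = 0x20;
const FLAG_FRAME_END: u8 = 0x10;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct MediaHeader {
    flags: u8,
    media_type: u8,
    codec_id: u8,
    stream_id: u8,
    fec_ratio: u8,
    sequence: u32,
    timestamp_ms: u32,
    fec_block_id: u16,
}

impl MediaHeader {
    /// Serializes the header; multi-byte fields are big-endian.
    fn to_bytes(&self) -> [u8; HEADER_LEN] {
        let mut b = [0u8; HEADER_LEN];
        b[0] = VERSION;
        b[1] = self.flags;
        b[2] = self.media_type;
        b[3] = self.codec_id;
        b[4] = self.stream_id;
        b[5] = self.fec_ratio;
        b[6..10].copy_from_slice(&self.sequence.to_be_bytes());
        b[10..14].copy_from_slice(&self.timestamp_ms.to_be_bytes());
        b[14..16].copy_from_slice(&self.fec_block_id.to_be_bytes());
        b
    }

    /// Parses a header; rejects short buffers and unknown versions.
    fn from_bytes(b: &[u8]) -> Option<MediaHeader> {
        if b.len() < HEADER_LEN || b[0] != VERSION {
            return None;
        }
        Some(MediaHeader {
            flags: b[1],
            media_type: b[2],
            codec_id: b[3],
            stream_id: b[4],
            fec_ratio: b[5],
            sequence: u32::from_be_bytes([b[6], b[7], b[8], b[9]]),
            timestamp_ms: u32::from_be_bytes([b[10], b[11], b[12], b[13]]),
            fec_block_id: u16::from_be_bytes([b[14], b[15]]),
        })
    }
}
```

Because every field sits on a byte boundary, parsing needs no bit shifting beyond testing the flag masks.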
#### CodecID Values

**Audio codecs (media_type = 0)**

| Value | Codec | Bitrate | Sample Rate | Frame Duration |
|-------|-------|---------|-------------|---------------|
| 0 | Opus 24k | 24 kbps | 48 kHz | 20ms |
| 1 | Opus 16k | 16 kbps | 48 kHz | 20ms |
| 2 | Opus 6k | 6 kbps | 48 kHz | 40ms |
| 3 | Codec2 3200 | 3.2 kbps | 8 kHz | 20ms |
| 4 | Codec2 1200 | 1.2 kbps | 8 kHz | 40ms |
| 5 | ComfortNoise | 0 | 48 kHz | 20ms |
| 6 | Opus 32k | 32 kbps | 48 kHz | 20ms |
| 7 | Opus 48k | 48 kbps | 48 kHz | 20ms |
| 8 | Opus 64k | 64 kbps | 48 kHz | 20ms |

**Video codecs (media_type = 1)**

| Value | Codec | Notes |
|-------|-------|-------|
| 9 | H.264 Baseline | Universal HW encode coverage |
| 10 | H.264 Main | Slight quality win over baseline |
| 11 | H.265 Main | Apple A10+, Snapdragon ~2017, NVENC GTX 9xx+; ~30% better than H.264 |
| 12 | AV1 Main | Apple M3/A17+, Snapdragon 8 Gen 3+, RTX 40+; best efficiency, narrow HW |

### `MiniHeader` v2 (5 bytes)

```
[FRAME_TYPE_MINI = 0x01]
Byte 0:     seq_delta           (u8)      delta from last full header's seq
Bytes 1-2:  timestamp_delta_ms  (u16 BE)
Bytes 3-4:  payload_len         (u16 BE)
```

Used for audio only (49 of every 50 frames). Saves 11 bytes per audio packet vs the full 16B header. Full header is sent every 50th frame to resynchronize state. Video always uses full 16B headers.

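
The cadence and the resulting overhead can be checked with a few lines. This sketch assumes the full header goes on frame indices 0, 50, 100, and so on, and it encodes only the five listed fields, leaving out the `FRAME_TYPE_MINI` marker because the layout above does not say how that byte is counted. All names are hypothetical.

```rust
const FULL_HEADER_LEN: usize = 16;
const MINI_HEADER_LEN: usize = 5;
/// One full header per this many audio frames.
const FULL_HEADER_INTERVAL: u32 = 50;

/// True when this audio frame should carry the full 16-byte header
/// (assumed to be the first frame of each 50-frame cycle).
fn uses_full_header(frame_index: u32) -> bool {
    frame_index % FULL_HEADER_INTERVAL == 0
}

/// Encodes the five MiniHeader bytes; multi-byte fields are big-endian.
fn encode_mini(seq_delta: u8, timestamp_delta_ms: u16, payload_len: u16) -> [u8; MINI_HEADER_LEN] {
    let ts = timestamp_delta_ms.to_be_bytes();
    let len = payload_len.to_be_bytes();
    [seq_delta, ts[0], ts[1], len[0], len[1]]
}

/// Header bytes spent over one full cycle: one full header plus 49 minis.
fn header_bytes_per_cycle() -> usize {
    FULL_HEADER_LEN + (FULL_HEADER_INTERVAL as usize - 1) * MINI_HEADER_LEN
}
```

One cycle costs 16 + 49 x 5 = 261 header bytes instead of 800, or about 5.2 bytes per packet on average.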
### TrunkFrame (batched datagrams)

```
[count: u16]
[session_id: 2][len: u16][payload: len] x count
```

Packs multiple session packets into one QUIC datagram. A frame holds at most 10 entries and is capped at the PMTUD-discovered MTU (starts at 1200 bytes, grows to ~1452 on Ethernet); pending entries are flushed every 5ms.

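
A possible encoding of this layout is sketched below. The byte order of `count` and `len` is not given above, so big-endian is assumed here to match the other wire formats; `encode_trunk`, `decode_trunk`, and the `max_bytes` parameter are hypothetical names.

```rust
/// Entry limit per frame, from the paragraph above.
const MAX_ENTRIES: usize = 10;

/// Encodes entries as [count][session_id][len][payload]...
/// Returns `None` if the entry limit or the `max_bytes` budget is exceeded.
fn encode_trunk(entries: &[([u8; 2], Vec<u8>)], max_bytes: usize) -> Option<Vec<u8>> {
    if entries.len() > MAX_ENTRIES {
        return None;
    }
    let mut out = Vec::new();
    out.extend_from_slice(&(entries.len() as u16).to_be_bytes());
    for (session_id, payload) in entries {
        if payload.len() > u16::MAX as usize {
            return None;
        }
        out.extend_from_slice(&session_id[..]);
        out.extend_from_slice(&(payload.len() as u16).to_be_bytes());
        out.extend_from_slice(payload);
    }
    if out.len() > max_bytes {
        return None;
    }
    Some(out)
}

/// Decodes a frame; returns `None` on any truncated entry.
fn decode_trunk(buf: &[u8]) -> Option<Vec<([u8; 2], Vec<u8>)>> {
    let count = u16::from_be_bytes([*buf.get(0)?, *buf.get(1)?]) as usize;
    let mut pos = 2;
    let mut entries = Vec::with_capacity(count.min(MAX_ENTRIES));
    for _ in 0..count {
        let head = buf.get(pos..pos + 4)?;
        let len = u16::from_be_bytes([head[2], head[3]]) as usize;
        let payload = buf.get(pos + 4..pos + 4 + len)?;
        entries.push(([head[0], head[1]], payload.to_vec()));
        pos += 4 + len;
    }
    Some(entries)
}
```

Each entry adds 4 bytes of framing, so batching pays off once the per-datagram overhead it replaces is larger than that.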
### QualityReport (4 bytes, optional trailer)

```
Byte 0:  loss_pct          (0-255 maps to 0-100%)
Byte 1:  rtt_4ms           (0-255 maps to 0-1020ms, resolution 4ms)
Byte 2:  jitter_ms         (0-255ms)
Byte 3:  bitrate_cap_kbps  (0-255 kbps)
```

Appended to a media packet when the Q flag is set in the MediaHeader.

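
The scaling of the four bytes can be expressed as below. This is a sketch that assumes out-of-range measurements saturate at 255 rather than wrap; the struct and method names are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct QualityReport {
    loss_pct: u8,         // 0-255 maps linearly to 0-100 %
    rtt_4ms: u8,          // RTT in units of 4 ms
    jitter_ms: u8,        // jitter in ms
    bitrate_cap_kbps: u8, // bitrate cap in kbps
}

impl QualityReport {
    /// Quantizes raw measurements; values above the range saturate (assumed).
    fn from_measurements(loss_percent: f32, rtt_ms: u32, jitter_ms: u32, cap_kbps: u32) -> Self {
        let loss = (loss_percent.max(0.0).min(100.0) * 255.0 / 100.0).round() as u8;
        QualityReport {
            loss_pct: loss,
            rtt_4ms: (rtt_ms / 4).min(255) as u8,
            jitter_ms: jitter_ms.min(255) as u8,
            bitrate_cap_kbps: cap_kbps.min(255) as u8,
        }
    }

    /// Serializes the trailer in wire order.
    fn to_bytes(&self) -> [u8; 4] {
        [self.loss_pct, self.rtt_4ms, self.jitter_ms, self.bitrate_cap_kbps]
    }

    /// Loss as a percentage in 0.0..=100.0.
    fn loss_percent(&self) -> f32 {
        self.loss_pct as f32 * 100.0 / 255.0
    }

    /// RTT in milliseconds, at 4 ms resolution.
    fn rtt_ms(&self) -> u32 {
        self.rtt_4ms as u32 * 4
    }
}
```

With 4 ms resolution the RTT byte tops out at 1020 ms, so any slower path simply reads as the maximum.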
## Path MTU Discovery

Quinn's PLPMTUD is enabled with:
- `initial_mtu`: 1200 bytes (QUIC minimum, always safe)
- `upper_bound`: 1452 bytes (1500-byte Ethernet MTU minus the 40-byte IPv6 and 8-byte UDP headers)
- `interval`: 300s (re-probe every 5 minutes)
- `black_hole_cooldown`: 30s (faster retry on lossy links)

The discovered MTU is exposed via `QuinnPathSnapshot::current_mtu` and used by:
- `TrunkedForwarder`: refreshes `max_bytes` on every send to fill larger datagrams
- Future video framer: larger MTU = fewer application-layer fragments per frame

## Continuous DRED Tuning

Instead of locking DRED duration to 3 discrete quality tiers, the `DredTuner` (in `wzp-proto::dred_tuner`) maps live path quality to a continuous DRED duration:

| Input | Source | Update Rate |
|-------|--------|-------------|
| Loss % | `QuinnPathSnapshot::loss_pct` (from quinn ACK frames) | Every 25 packets (~500ms) |
| RTT ms | `QuinnPathSnapshot::rtt_ms` (quinn congestion controller) | Every 25 packets |
| Jitter ms | `PathMonitor::jitter_ms` (EWMA of RTT variance) | Every 25 packets |

### Mapping Logic

- **Baseline**: codec-tier default (Studio=100ms, Good=200ms, Degraded=500ms)
- **Ceiling**: codec-tier max (Studio=300ms, Good=500ms, Degraded=1040ms)
- **Continuous**: linear interpolation between baseline and ceiling based on loss (0%->baseline, 40%->ceiling)
- **RTT phantom loss**: high RTT (>200ms) adds phantom loss contribution to keep DRED generous
- **Jitter spike**: >30% EWMA spike pre-emptively boosts to ceiling for ~5s cooldown

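
The interpolation in the list above can be sketched as follows. The baseline and ceiling values come from the list; the function names, the `jitter_spike` flag, and the rounding are assumptions, and the RTT phantom-loss term is left out because its weighting is not specified here.

```rust
#[derive(Debug, Clone, Copy)]
enum DredTier {
    Studio,
    Good,
    Degraded,
}

/// (baseline_ms, ceiling_ms) per tier, taken from the list above.
fn dred_bounds_ms(tier: DredTier) -> (u32, u32) {
    match tier {
        DredTier::Studio => (100, 300),
        DredTier::Good => (200, 500),
        DredTier::Degraded => (500, 1040),
    }
}

/// Linear interpolation: 0 % loss gives the baseline, 40 % or more gives
/// the ceiling. An active jitter spike jumps straight to the ceiling.
fn dred_duration_ms(tier: DredTier, loss_pct: f32, jitter_spike: bool) -> u32 {
    let (baseline, ceiling) = dred_bounds_ms(tier);
    if jitter_spike {
        return ceiling;
    }
    let t = (loss_pct / 40.0).max(0.0).min(1.0);
    baseline + ((ceiling - baseline) as f32 * t).round() as u32
}
```

For example, the Good tier at 20 % loss sits halfway between 200 ms and 500 ms, so this sketch yields 350 ms.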
### Output

`DredTuning { dred_frames: u8, expected_loss_pct: u8 }` -> fed to `CallEncoder::apply_dred_tuning()` -> `OpusEncoder::set_dred_duration()` + `set_expected_loss()`

## Signal Message Handshake Flow

```mermaid
sequenceDiagram
    participant C as Client
    participant R as Relay

    C->>R: QUIC Connect (SNI = hashed room name)

    alt Auth enabled (--auth-url)
        C->>R: SignalMessage::AuthToken { token }
        R->>R: POST auth_url to validate
        R-->>C: (connection closed if invalid)
    end

    C->>R: CallOffer { identity_pub, ephemeral_pub, signature, supported_profiles }
    R->>R: Verify Ed25519 signature
    R->>R: Generate ephemeral X25519
    R->>R: shared_secret = DH(eph_relay, eph_client)
    R->>R: session_key = HKDF(shared_secret, "warzone-session-key")
    R->>C: CallAnswer { identity_pub, ephemeral_pub, signature, chosen_profile }

    C->>C: Verify signature
    C->>C: Derive same session_key

    Note over C,R: Session established -- both have ChaCha20-Poly1305 key

    C->>R: RoomUpdate (join notification broadcast)

    loop Media exchange
        C->>R: QUIC Datagram (encrypted media)
        R->>C: QUIC Datagram (forwarded from others)
    end

    opt Every 65,536 packets
        C->>R: Rekey { new_ephemeral_pub, signature }
        R->>C: Rekey { new_ephemeral_pub, signature }
        Note over C,R: New session key via fresh DH
    end

    C->>R: Hangup { reason: Normal }
    R->>R: Remove from room, broadcast RoomUpdate
```

## Relay Concurrency Model

### Threading
- Multi-threaded Tokio runtime (all available cores, work-stealing scheduler)
- Task-per-connection: each QUIC connection gets a dedicated `tokio::spawn`
- Task-per-participant-per-room: each participant's media forwarding loop is independent

### Shared State & Locking

The `RoomManager` stores `DashMap<String, Arc<RwLock<Room>>>`. The DashMap guard is held only long enough to clone the `Arc`; all per-room operations then acquire the room-level `RwLock`. Concurrent fan-out calls share a read lock; join and leave acquire the write lock.

| Lock | Protected Data | Hold Duration | Contention |
|------|---------------|---------------|------------|
| `DashMap<room_id, Arc<RwLock<Room>>>` | Room registry | Instant (clone Arc only) | Near-zero |
| `Room` (RwLock) | Participants, quality tiers | ~1ms/packet (read); ~1ms (write on join/leave) | Low (concurrent reads) |
| `PresenceRegistry` (Mutex) | Fingerprint registrations | ~1ms | Low (join/leave only) |
| `SessionManager` (Mutex) | Active session tracking | ~1ms | Low |
| `FederationManager.peer_links` (Mutex) | Peer connections | ~10ms during forward | Per-federation-packet |

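
The two-level locking pattern can be illustrated with the standard library alone. In this sketch a `RwLock<HashMap<..>>` stands in for the `DashMap` so the example has no external dependencies; the `Room` fields and method names are hypothetical and the real relay uses async tasks rather than blocking locks.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

struct Room {
    participants: Vec<u32>,
}

/// Registry of rooms. `RwLock<HashMap>` is a stand-in for `DashMap`.
struct RoomManager {
    rooms: RwLock<HashMap<String, Arc<RwLock<Room>>>>,
}

impl RoomManager {
    fn new() -> Self {
        RoomManager { rooms: RwLock::new(HashMap::new()) }
    }

    /// Registry lock is held only long enough to clone the `Arc`.
    fn room(&self, name: &str) -> Option<Arc<RwLock<Room>>> {
        self.rooms.read().unwrap().get(name).cloned()
    }

    /// Join takes the room-level write lock after releasing the registry.
    fn join(&self, name: &str, id: u32) {
        let room = {
            let mut rooms = self.rooms.write().unwrap();
            rooms
                .entry(name.to_string())
                .or_insert_with(|| Arc::new(RwLock::new(Room { participants: Vec::new() })))
                .clone()
        }; // registry guard dropped here
        room.write().unwrap().participants.push(id);
    }

    /// Fan-out only needs the room-level read lock, so many senders can
    /// compute their targets concurrently.
    fn fan_out_targets(&self, name: &str, sender: u32) -> Vec<u32> {
        match self.room(name) {
            Some(room) => {
                let guard = room.read().unwrap();
                guard.participants.iter().copied().filter(|&p| p != sender).collect()
            }
            None => Vec::new(),
        }
    }
}
```

Dropping the registry guard before touching the room is the step that keeps one busy room from stalling lookups for every other room.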
### Scaling Characteristics

- **Many small rooms**: Scales well across all cores (rooms are independent)
- **Large single room (100+ participants)**: Fan-out reads share RwLock (non-blocking); only join/leave serializes
- **Federation**: Per-peer tasks scale; `peer_links` lock held during send loop

## Client Architecture

### Desktop Engine (Tauri)

```mermaid
graph TB
    subgraph "Tauri Frontend (HTML/JS)"
        UI[Connect / Call UI]
        SET[Settings Panel]
    end

    subgraph "Tauri Rust Backend"
        CMD[Tauri Commands<br/>connect/disconnect/toggle]
        ENG[WzpEngine<br/>State Machine]
    end

    subgraph "Audio I/O"
        CPAL_C[CPAL Capture<br/>or VoiceProcessingIO]
        RING_C[SPSC Ring<br/>Capture]
        RING_P[SPSC Ring<br/>Playout]
        CPAL_P[CPAL Playback<br/>or VoiceProcessingIO]
    end

    subgraph "Network Tasks (tokio)"
        SEND[Send Loop<br/>encode + encrypt]
        RECV[Recv Loop<br/>decrypt + decode]
        SIG[Signal Handler<br/>room updates]
    end

    UI --> CMD
    SET --> CMD
    CMD --> ENG
    ENG --> SEND
    ENG --> RECV
    ENG --> SIG

    CPAL_C --> RING_C --> SEND
    RECV --> RING_P --> CPAL_P

    style ENG fill:#00b894,color:#fff
    style SEND fill:#0984e3,color:#fff
    style RECV fill:#0984e3,color:#fff
```

Key design decisions:
- **Lock-free SPSC rings** between audio callbacks and network tasks (no mutex on audio thread)
- **VoiceProcessingIO** on macOS for OS-level AEC (CPAL uses HalOutput which has no AEC)
- **Direct playout** -- no jitter buffer on client; audio callback pulls from ring
- **Release builds required** -- debug builds too slow for real-time audio

### Android Engine (Kotlin + JNI)

> **Note (2026-05-12):** The Kotlin+JNI Android app (`android/app/`) described below is superseded by the **Tauri 2.x mobile build** (`desktop/src-tauri/` + `crates/wzp-native/`). The Tauri approach uses the same Rust call engine as desktop, with Oboe audio via `wzp-native` cdylib. The Kotlin codebase is maintained for reference but the Tauri build is the live production app.

```mermaid
graph TB
    subgraph "Compose UI"
        CALL[CallActivity]
        SET[SettingsScreen]
        VM[CallViewModel]
    end

    subgraph "Service Layer"
        SVC[CallService<br/>Foreground Service]
        PIPE[AudioPipeline<br/>AudioTrack + AudioRecord]
    end

    subgraph "Rust Engine (JNI)"
        JNI[WzpEngine.kt<br/>JNI bridge]
        NATIVE[libwzp_android.so<br/>Rust call engine]
    end

    subgraph "Android Audio"
        REC[AudioRecord<br/>+ AEC effect]
        TRK[AudioTrack<br/>low-latency]
    end

    CALL --> VM
    SET --> VM
    VM --> SVC
    SVC --> PIPE
    PIPE --> JNI
    JNI --> NATIVE

    REC --> PIPE
    PIPE --> TRK

    style NATIVE fill:#00b894,color:#fff
    style SVC fill:#ff9f43,color:#fff
    style PIPE fill:#0984e3,color:#fff
```

Key design decisions:
- **Foreground service** keeps audio alive when the screen is off
- **AudioRecord + AudioTrack** with Android's built-in AEC (AudioEffect)
- **Lock-free AudioRing** with preallocated Vec (not push/pop) to avoid allocation on audio thread
- **JNI bridge** marshals PCM frames between Kotlin and Rust

### CLI Architecture

```mermaid
graph TB
    subgraph "CLI Modes"
        LIVE[--live<br/>Mic + Speaker]
        TONE[--send-tone<br/>Sine Generator]
        FILE[--send-file<br/>PCM Reader]
        ECHO[--echo-test<br/>Quality Analysis]
        DRIFT[--drift-test<br/>Clock Analysis]
        SWEEP[--sweep<br/>Buffer Sweep]
    end

    subgraph "Call Engine"
        ENCODE[CallEncoder<br/>codec + FEC]
        DECODE[CallDecoder<br/>FEC + codec]
        QA[QualityAdapter<br/>adaptive switching]
    end

    subgraph "Transport"
        QUIC[QuinnTransport<br/>send/recv media + signal]
        HS[Handshake<br/>X25519 + Ed25519]
    end

    LIVE --> ENCODE
    TONE --> ENCODE
    FILE --> ENCODE
    ENCODE --> QUIC
    QUIC --> DECODE
    ECHO --> ENCODE
    ECHO --> DECODE
    DRIFT --> ENCODE
    HS --> QUIC

    style ENCODE fill:#00b894,color:#fff
    style DECODE fill:#00b894,color:#fff
    style QUIC fill:#0984e3,color:#fff
```

## Adaptive Quality System

```mermaid
graph LR
    subgraph GOOD ["GOOD (28.8 kbps)"]
        G_C[Opus 24kbps]
        G_F[FEC 20%]
        G_FR[20ms frames]
    end

    subgraph DEGRADED ["DEGRADED (9.0 kbps)"]
        D_C[Opus 6kbps]
        D_F[FEC 50%]
        D_FR[40ms frames]
    end

    subgraph CATASTROPHIC ["CATASTROPHIC (2.4 kbps)"]
        C_C[Codec2 1200bps]
        C_F[FEC 100%]
        C_FR[40ms frames]
    end

    GOOD -->|"loss>10% or RTT>400ms<br/>3 consecutive reports"| DEGRADED
    DEGRADED -->|"loss>40% or RTT>600ms<br/>3 consecutive"| CATASTROPHIC
    CATASTROPHIC -->|"loss<10% and RTT<400ms<br/>10 consecutive"| DEGRADED
    DEGRADED -->|"loss<10% and RTT<400ms<br/>10 consecutive"| GOOD

    style GOOD fill:#00b894,color:#fff
    style DEGRADED fill:#fdcb6e
    style CATASTROPHIC fill:#e17055,color:#fff
```

Hysteresis prevents tier flapping: **fast downgrade** (3 reports, or 2 on cellular) and **slow upgrade** (10 reports, one tier at a time).

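
The hysteresis can be modeled as two streak counters. This is a minimal sketch using the thresholds from the diagram; the type and method names are hypothetical, the cellular variant (2 reports) is omitted, and any report that matches neither condition is assumed to reset both streaks.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tier {
    Good,
    Degraded,
    Catastrophic,
}

const DOWNGRADE_REPORTS: u32 = 3;
const UPGRADE_REPORTS: u32 = 10;

struct TierController {
    tier: Tier,
    bad_streak: u32,
    good_streak: u32,
}

impl TierController {
    fn new() -> Self {
        TierController { tier: Tier::Good, bad_streak: 0, good_streak: 0 }
    }

    /// Feeds one quality report and returns the tier in effect afterwards.
    fn on_report(&mut self, loss_pct: f32, rtt_ms: u32) -> Tier {
        let should_downgrade = match self.tier {
            Tier::Good => loss_pct > 10.0 || rtt_ms > 400,
            Tier::Degraded => loss_pct > 40.0 || rtt_ms > 600,
            Tier::Catastrophic => false,
        };
        let can_upgrade = self.tier != Tier::Good && loss_pct < 10.0 && rtt_ms < 400;

        if should_downgrade {
            self.bad_streak += 1;
            self.good_streak = 0;
        } else if can_upgrade {
            self.good_streak += 1;
            self.bad_streak = 0;
        } else {
            self.bad_streak = 0;
            self.good_streak = 0;
        }

        if self.bad_streak >= DOWNGRADE_REPORTS {
            self.tier = match self.tier {
                Tier::Good => Tier::Degraded,
                _ => Tier::Catastrophic,
            };
            self.bad_streak = 0;
        } else if self.good_streak >= UPGRADE_REPORTS {
            // Upgrades move one tier at a time.
            self.tier = match self.tier {
                Tier::Catastrophic => Tier::Degraded,
                _ => Tier::Good,
            };
            self.good_streak = 0;
        }
        self.tier
    }
}
```

The asymmetry (3 reports down, 10 up) means a single clean report in the middle of a bad stretch cannot bounce the call back to a higher tier.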
## Cryptographic Handshake

```mermaid
sequenceDiagram
    participant C as Caller
    participant R as Relay / Callee

    Note over C: Derive identity from seed<br/>Ed25519 + X25519 via HKDF

    C->>C: Generate ephemeral X25519
    C->>C: Sign(ephemeral_pub || "call-offer")
    C->>R: CallOffer { identity_pub, ephemeral_pub, signature, profiles }

    R->>R: Verify Ed25519 signature
    R->>R: Generate ephemeral X25519
    R->>R: shared_secret = DH(eph_b, eph_a)
    R->>R: session_key = HKDF(shared_secret, "warzone-session-key")
    R->>R: Sign(ephemeral_pub || "call-answer")
    R->>C: CallAnswer { identity_pub, ephemeral_pub, signature, profile }

    C->>C: Verify signature
    C->>C: shared_secret = DH(eph_a, eph_b)
    C->>C: session_key = HKDF(shared_secret)

    Note over C,R: Both have identical ChaCha20-Poly1305 session key
    C->>R: Encrypted media (QUIC datagrams)
    R->>C: Encrypted media (QUIC datagrams)

    Note over C,R: Rekey every 65,536 packets<br/>New ephemeral DH + HKDF mix
```

## Identity Model

```mermaid
graph TD
    SEED["32-byte Seed<br/>(BIP39 Mnemonic: 24 words)"] --> HKDF1["HKDF<br/>salt=None<br/>info='warzone-ed25519'"]
    SEED --> HKDF2["HKDF<br/>salt=None<br/>info='warzone-x25519'"]

    HKDF1 --> ED["Ed25519 SigningKey<br/>Digital Signatures"]
    HKDF2 --> X25519["X25519 StaticSecret<br/>Key Agreement"]

    ED --> VKEY["Ed25519 VerifyingKey<br/>(Public)"]
    X25519 --> XPUB["X25519 PublicKey<br/>(Public)"]

    VKEY --> FP["Fingerprint<br/>SHA-256(pubkey) truncated 16 bytes<br/>xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx"]

    style SEED fill:#6c5ce7,color:#fff
    style FP fill:#fd79a8,color:#fff
    style ED fill:#ee5a24,color:#fff
    style X25519 fill:#00b894,color:#fff
```

## Adaptive Jitter Buffer

```mermaid
graph TD
    PKT[Incoming Packet] --> SEQ{Sequence Check}
    SEQ -->|Duplicate| DROP[Drop + AntiReplay]
    SEQ -->|Valid| BUF["BTreeMap Buffer<br/>(ordered by seq)"]

    BUF --> ADAPT["AdaptivePlayoutDelay<br/>(EMA jitter tracking)"]
    ADAPT --> TARGET["target_delay =<br/>ceil(jitter_ema / 20ms) + 2"]

    BUF --> READY{"depth >= target?"}
    READY -->|No| WAIT["Wait (Underrun++)"]
    READY -->|Yes| POP[Pop lowest seq]
    POP --> DECODE[Decode to PCM]
    DECODE --> PLAY[Playout]

    BUF --> OVERFLOW{"depth > max?"}
    OVERFLOW -->|Yes| EVICT["Drop oldest (Overrun++)"]

    style ADAPT fill:#fdcb6e
    style DROP fill:#e17055,color:#fff
    style EVICT fill:#e17055,color:#fff
```

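
The flow in the diagram can be sketched with a `BTreeMap` keyed by sequence number. The target formula is the one shown above, interpreted here as a depth in 20 ms frames; the struct layout and method names are hypothetical, and sequence wrap-around is ignored for brevity.

```rust
use std::collections::BTreeMap;

const FRAME_MS: f32 = 20.0;

/// target = ceil(jitter_ema / 20ms) + 2, measured in frames.
fn target_depth(jitter_ema_ms: f32) -> usize {
    (jitter_ema_ms / FRAME_MS).ceil() as usize + 2
}

struct JitterBuffer {
    frames: BTreeMap<u32, Vec<u8>>, // ordered by sequence number
    max_depth: usize,
    underruns: u64,
    overruns: u64,
}

impl JitterBuffer {
    fn new(max_depth: usize) -> Self {
        JitterBuffer { frames: BTreeMap::new(), max_depth, underruns: 0, overruns: 0 }
    }

    /// Inserts a frame (duplicates are ignored) and evicts the oldest
    /// frames while the buffer is deeper than `max_depth`.
    fn push(&mut self, seq: u32, payload: Vec<u8>) {
        self.frames.entry(seq).or_insert(payload);
        while self.frames.len() > self.max_depth {
            let oldest = *self.frames.keys().next().unwrap();
            self.frames.remove(&oldest);
            self.overruns += 1;
        }
    }

    /// Pops the lowest-sequence frame once the buffer is deep enough;
    /// otherwise counts an underrun and returns `None`.
    fn pop(&mut self, jitter_ema_ms: f32) -> Option<(u32, Vec<u8>)> {
        if self.frames.len() < target_depth(jitter_ema_ms) {
            self.underruns += 1;
            return None;
        }
        let seq = *self.frames.keys().next()?;
        self.frames.remove(&seq).map(|payload| (seq, payload))
    }
}
```

The constant `+ 2` keeps a small cushion even on a perfectly steady path, so a single late packet does not immediately cause an underrun.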
## FEC Protection (RaptorQ)

```mermaid
graph LR
    subgraph "Encoder"
        F1[Frame 1] --> BLK["Source Block<br/>(5-10 frames)"]
        F2[Frame 2] --> BLK
        F3[Frame 3] --> BLK
        F4[Frame 4] --> BLK
        F5[Frame 5] --> BLK
        BLK --> SRC[5 Source Symbols]
        BLK --> REP["1-10 Repair Symbols<br/>(ratio dependent)"]
        SRC --> INT["Interleaver<br/>(depth=3)"]
        REP --> INT
    end

    subgraph "Network"
        INT --> LOSS{Packet Loss}
        LOSS -->|some lost| RCV[Received Symbols]
    end

    subgraph "Decoder"
        RCV --> DEINT[De-interleaver]
        DEINT --> RAPTORQ["RaptorQ Decoder<br/>Reconstruct from<br/>any K of K+R symbols"]
        RAPTORQ --> OUT[Original Frames]
    end

    style LOSS fill:#e17055,color:#fff
    style RAPTORQ fill:#00b894,color:#fff
```

## Telemetry Stack

```mermaid
graph TB
    subgraph "Relay"
        RM["RelayMetrics<br/>sessions, rooms, packets"]
        SM["SessionMetrics<br/>per-session jitter, loss, RTT"]
        PM["ProbeMetrics<br/>inter-relay RTT, loss"]
        RM --> PROM1["GET /metrics :9090"]
        SM --> PROM1
        PM --> PROM1
    end

    subgraph "Web Bridge"
        WM["WebMetrics<br/>connections, frames, latency"]
        WM --> PROM2["GET /metrics :8080"]
    end

    subgraph "Client"
        CM["JitterStats + QualityAdapter"]
        CM --> JSONL["--metrics-file<br/>JSONL 1 line/sec"]
    end

    PROM1 --> GRAF["Grafana Dashboard<br/>4 rows, 18 panels"]
    PROM2 --> GRAF
    JSONL --> ANALYSIS[Offline Analysis]

    style GRAF fill:#ff6b6b,color:#fff
    style PROM1 fill:#0984e3,color:#fff
    style PROM2 fill:#0984e3,color:#fff
```

## Deployment Topology

```mermaid
graph TB
    subgraph "Region A"
        RA["wzp-relay A<br/>:4433 UDP"]
        WA["wzp-web A<br/>:8080 HTTPS"]
        WA --> RA
    end

    subgraph "Region B"
        RB["wzp-relay B<br/>:4433 UDP"]
        WB["wzp-web B<br/>:8080 HTTPS"]
        WB --> RB
    end

    RA <-->|"Probe 1/s + Federation"| RB

    BA[Browser A] -->|WSS| WA
    BB[Browser B] -->|WSS| WB
    CA[CLI Client] -->|QUIC| RA
    DA[Desktop Client] -->|QUIC| RA
    MA[Android Client] -->|QUIC| RB

    PROM[Prometheus] -->|scrape| RA
    PROM -->|scrape| RB
    PROM -->|scrape| WA
    PROM --> GRAF[Grafana]

    FC[featherChat Server] -->|auth validate| RA
    FC -->|auth validate| RB

    style RA fill:#ff9f43,color:#fff
    style RB fill:#ff9f43,color:#fff
    style GRAF fill:#ff6b6b,color:#fff
    style FC fill:#fd79a8,color:#fff
```

## Session State Machine

```mermaid
stateDiagram-v2
    [*] --> Idle
    Idle --> Connecting: connect()
    Connecting --> Handshaking: QUIC established
    Handshaking --> Active: CallOffer/Answer complete
    Active --> Rekeying: 65,536 packets
    Rekeying --> Active: new key derived
    Active --> Closed: Hangup / Error / Timeout
    Rekeying --> Closed: Error
    Connecting --> Closed: Timeout
    Handshaking --> Closed: Signature fail

    note right of Active: Media flows (encrypted)
    note right of Rekeying: Media continues while rekeying
```

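
The transitions in the diagram map naturally onto a pure function over `(state, event)` pairs. The enum and function names below are hypothetical and may differ from `session.rs`; only the transitions drawn above are accepted, and everything else returns `None`.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SessionState {
    Idle,
    Connecting,
    Handshaking,
    Active,
    Rekeying,
    Closed,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Event {
    Connect,
    QuicEstablished,
    HandshakeComplete,
    RekeyDue,
    KeyDerived,
    Hangup,
    Error,
    Timeout,
    SignatureFail,
}

/// Packets sent under one key before a rekey is triggered.
const REKEY_INTERVAL: u64 = 65_536;

fn rekey_due(packets_since_key: u64) -> bool {
    packets_since_key >= REKEY_INTERVAL
}

/// Returns the next state, or `None` if the diagram has no such edge.
fn next_state(state: SessionState, event: Event) -> Option<SessionState> {
    match (state, event) {
        (SessionState::Idle, Event::Connect) => Some(SessionState::Connecting),
        (SessionState::Connecting, Event::QuicEstablished) => Some(SessionState::Handshaking),
        (SessionState::Connecting, Event::Timeout) => Some(SessionState::Closed),
        (SessionState::Handshaking, Event::HandshakeComplete) => Some(SessionState::Active),
        (SessionState::Handshaking, Event::SignatureFail) => Some(SessionState::Closed),
        (SessionState::Active, Event::RekeyDue) => Some(SessionState::Rekeying),
        (SessionState::Active, Event::Hangup)
        | (SessionState::Active, Event::Error)
        | (SessionState::Active, Event::Timeout) => Some(SessionState::Closed),
        (SessionState::Rekeying, Event::KeyDerived) => Some(SessionState::Active),
        (SessionState::Rekeying, Event::Error) => Some(SessionState::Closed),
        _ => None,
    }
}
```

Keeping the table as a single `match` makes it easy to compare against the diagram edge by edge when either one changes.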
## Project Structure
|
|
|
|
```
warzonePhone/
├── Cargo.toml  # Workspace root
├── crates/
│   ├── wzp-proto/  # Protocol types, traits, wire format
│   │   └── src/
│   │       ├── codec_id.rs  # CodecId, QualityProfile
│   │       ├── error.rs  # Error types
│   │       ├── jitter.rs  # JitterBuffer, AdaptivePlayoutDelay
│   │       ├── packet.rs  # MediaHeader, MiniHeader, TrunkFrame, SignalMessage
│   │       ├── quality.rs  # Tier, AdaptiveQualityController
│   │       ├── session.rs  # SessionState machine
│   │       └── traits.rs  # AudioEncoder, FecEncoder, CryptoSession, etc.
│   ├── wzp-codec/  # Audio codecs
│   │   └── src/
│   │       ├── adaptive.rs  # AdaptiveEncoder/Decoder (Opus + Codec2)
│   │       ├── denoise.rs  # NoiseSuppressor (RNNoise / nnnoiseless)
│   │       └── silence.rs  # SilenceDetector, ComfortNoise
│   ├── wzp-fec/  # Forward error correction
│   │   └── src/
│   │       ├── encoder.rs  # RaptorQFecEncoder
│   │       ├── decoder.rs  # RaptorQFecDecoder
│   │       └── interleave.rs  # Interleaver (burst protection)
│   ├── wzp-crypto/  # Cryptography + identity
│   │   └── src/
│   │       ├── identity.rs  # Seed, Fingerprint, hash_room_name
│   │       ├── handshake.rs  # WarzoneKeyExchange (X25519 + Ed25519)
│   │       ├── session.rs  # ChaChaSession (ChaCha20-Poly1305)
│   │       ├── nonce.rs  # Deterministic nonce construction
│   │       ├── anti_replay.rs  # Sliding window replay protection
│   │       └── rekey.rs  # Forward secrecy rekeying
│   ├── wzp-transport/  # QUIC transport layer
│   │   └── src/lib.rs  # QuinnTransport, send/recv media/signal/trunk
│   ├── wzp-video/  # Video codecs + framer
│   │   └── src/
│   │       ├── factory.rs  # VideoEncoder factory (platform dispatch)
│   │       ├── framer.rs  # NAL fragmentation (H.264/H.265)
│   │       ├── depacketizer.rs  # NAL reassembly, access unit emit
│   │       ├── controller.rs  # VideoQualityController
│   │       ├── simulcast.rs  # Simulcast layer management
│   │       ├── encoder_mode.rs  # Encoder mode selection
│   │       ├── av1_obu.rs  # AV1 OBU framing + depacketizer
│   │       ├── dav1d.rs  # dav1d AV1 software decoder
│   │       ├── svt_av1.rs  # SVT-AV1 software encoder (non-Android)
│   │       ├── videotoolbox.rs  # VideoToolbox H.265 + AV1 (macOS)
│   │       ├── mediacodec.rs  # MediaCodec H.264/H.265/AV1 (Android, NDK 0.9 migration pending)
│   │       └── nack.rs  # NACK sender/receiver framework
│   ├── wzp-relay/  # Relay daemon
│   │   └── src/
│   │       ├── main.rs  # CLI, connection loop, auth + handshake
│   │       ├── config.rs  # RelayConfig, TOML parsing
│   │       ├── room.rs  # RoomManager, TrunkedForwarder
│   │       ├── pipeline.rs  # RelayPipeline (forward mode)
│   │       ├── session_mgr.rs  # SessionManager (limits, lifecycle)
│   │       ├── auth.rs  # featherChat token validation
│   │       ├── handshake.rs  # Relay-side accept_handshake
│   │       ├── metrics.rs  # Prometheus RelayMetrics + per-session
│   │       ├── probe.rs  # Inter-relay probes + ProbeMesh
│   │       ├── federation.rs  # FederationManager, global rooms
│   │       ├── presence.rs  # PresenceRegistry
│   │       ├── route.rs  # RouteResolver
│   │       ├── trunk.rs  # TrunkBatcher
│   │       ├── audio_scorer.rs  # Per-stream audio quality scoring
│   │       ├── response_policy.rs  # Relay response policy (rate-limit, drop)
│   │       ├── verdict.rs  # Verdict enum (Allow/RateLimit/Drop/Malicious)
│   │       ├── video_scorer.rs  # VideoScorer (legitimacy scoring, keyframe regularity)
│   │       └── ws.rs  # WebSocket handler for browser clients
│   ├── wzp-client/  # Call engine + CLI
│   │   └── src/
│   │       ├── cli.rs  # CLI arg parsing + main
│   │       ├── call.rs  # CallEncoder, CallDecoder, QualityAdapter
│   │       ├── handshake.rs  # Client-side perform_handshake
│   │       ├── featherchat.rs  # CallSignal bridge
│   │       ├── echo_test.rs  # Automated echo quality test
│   │       ├── drift_test.rs  # Clock drift measurement
│   │       ├── sweep.rs  # Jitter buffer parameter sweep
│   │       ├── metrics.rs  # JSONL telemetry writer
│   │       └── bench.rs  # Component benchmarks
│   └── wzp-web/  # Browser bridge
│       ├── src/
│       │   ├── main.rs  # Axum server, WS handler, TLS
│       │   └── metrics.rs  # Prometheus WebMetrics
│       └── static/
│           ├── index.html  # SPA UI (room, PTT, level meter)
│           └── audio-processor.js  # AudioWorklet (capture + playback)
├── android/  # Android app (Kotlin + JNI)
│   └── app/src/main/java/com/wzp/
│       ├── audio/  # AudioPipeline, AudioRouteManager
│       ├── engine/  # WzpEngine (JNI), CallStats, WzpCallback
│       ├── ui/  # CallActivity, SettingsScreen, Identicon
│       ├── data/  # SettingsRepository
│       ├── net/  # RelayPinger
│       ├── service/  # CallService (foreground)
│       └── debug/  # DebugReporter
├── desktop/  # Desktop app (Tauri)
│   └── dist/  # Built frontend (HTML/JS/CSS)
├── deps/featherchat/  # Git submodule
├── docs/  # Documentation
├── scripts/  # Build scripts
│   └── build-linux.sh  # Hetzner VM build
└── tools/  # Development tools
```
|
|
|
|
## Test Coverage
|
|
|
|
702 tests across all crates (excluding wzp-android), 0 failures:
|
|
|
|
| Crate | Tests | Key Coverage |
|-------|-------|-------------|
| wzp-proto | 112 | Wire format, jitter buffer, quality tiers, mini-frames, trunking |
| wzp-codec | 69 | Opus/Codec2 roundtrip, silence detection, noise suppression |
| wzp-fec | 21 | RaptorQ encode/decode, loss recovery, interleaving |
| wzp-crypto | 64 | Encrypt/decrypt, handshake, anti-replay, featherChat identity |
| wzp-transport | 11 | QUIC connection setup, path monitoring |
| wzp-relay | 137 | Room ACL, session mgmt, metrics, probes, mesh, trunking, scoring, verdict |
| wzp-video | 88 | NAL framing, AV1 OBU, simulcast, quality controller, NACK |
| wzp-client | 170 | Encoder/decoder, quality adapter, silence, drift, sweep |
| wzp-web | 2 | Metrics |
| wzp-native | 0 | Native platform bindings (no unit tests) |
|
|
|
|
## Audio Backend Architecture (Platform Matrix)
|
|
|
|
WarzonePhone's audio I/O goes through one of four backends depending on the target platform and feature flags. All backends expose the same public API (`AudioCapture::start() → AudioCapture { ring(), stop() }`) via conditional re-exports in `crates/wzp-client/src/lib.rs`, so the `CallEngine` above the audio layer doesn't know or care which backend is running.
|
|
|
|
```
         ┌─────────────────────────────────────────────┐
         │       CallEngine (platform-agnostic)        │
         │     reads PCM from AudioCapture::ring()     │
         │     writes PCM to AudioPlayback::ring()     │
         └────────────────────┬────────────────────────┘
                              │
        ┌─────────────────────┼─────────────────────┐
        │                     │                     │
        ▼                     ▼                     ▼
┌───────────────┐    ┌────────────────┐     ┌───────────────┐
│ audio_io      │    │ audio_vpio     │     │ audio_wasapi  │
│ (CPAL)        │    │ (Core Audio    │     │ (Windows      │
│               │    │ VoiceProc IO)  │     │  IAudioClient2│
│ All platforms │    │ macOS only     │     │ Windows       │
│ (baseline)    │    │ feature=vpio   │     │ feature=      │
│               │    │                │     │  windows-aec  │
└───────────────┘    └────────────────┘     └───────────────┘
                              │
                              ▼ on Android only
                      ┌───────────────┐
                      │ wzp-native    │
                      │ (Oboe bridge  │
                      │  via dlopen)  │
                      │               │
                      │ Android only  │
                      │ libloading    │
                      └───────────────┘
```
|
|
|
|
### Backend selection matrix
|
|
|
|
| Platform | Capture | Playback | OS AEC | Feature flags |
|---|---|---|---|---|
| macOS | VoiceProcessingIO (native Core Audio) | CPAL | **Yes**: Apple's hardware-accelerated AEC (same AEC as FaceTime, iMessage audio, Voice Memos) | `audio`, `vpio` |
| Windows (AEC build) | Direct WASAPI with `AudioCategory_Communications` | CPAL | **Yes**: Windows routes the capture stream through the driver's communications APO chain (AEC + NS + AGC), driver-dependent quality | `audio`, `windows-aec` |
| Windows (baseline) | CPAL (WASAPI shared mode) | CPAL | No | `audio` |
| Linux | CPAL (ALSA / PulseAudio) | CPAL | No | `audio` |
| Android (Tauri Mobile) | Oboe via `wzp-native` cdylib, `Usage::VoiceCommunication` + `MODE_IN_COMMUNICATION` | Same Oboe stream | Depends on device (some Android devices apply AEC to the voice-communication stream, most do not) | none (`wzp-client` compiled with `default-features = false`) |
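
The capture column of the matrix can be read as a pure function of platform and feature flags. The sketch below is illustrative only: in the real crate the choice is made at compile time through conditional re-exports, and the `Platform`, `Features`, `CaptureBackend`, and `capture_backend` names are hypothetical. It also assumes that macOS without `vpio` falls back to CPAL, which the "All platforms (baseline)" box suggests but the table does not state.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Platform { MacOs, Windows, Linux, Android }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CaptureBackend {
    Cpal,                 // audio_io
    VoiceProcessingIo,    // audio_vpio
    WasapiCommunications, // audio_wasapi
    OboeNative,           // wzp-native, loaded at runtime
}

/// Mirrors the `vpio` and `windows-aec` Cargo features from the matrix.
#[derive(Debug, Clone, Copy, Default)]
struct Features { vpio: bool, windows_aec: bool }

/// Runtime restatement of the compile-time selection in the matrix.
fn capture_backend(platform: Platform, features: Features) -> CaptureBackend {
    match platform {
        Platform::Android => CaptureBackend::OboeNative,
        Platform::MacOs if features.vpio => CaptureBackend::VoiceProcessingIo,
        Platform::Windows if features.windows_aec => CaptureBackend::WasapiCommunications,
        // Assumed fallback: CPAL is the baseline wherever no AEC feature is on.
        Platform::MacOs | Platform::Windows | Platform::Linux => CaptureBackend::Cpal,
    }
}
```

For example, `capture_backend(Platform::Windows, Features::default())` yields `CaptureBackend::Cpal`, matching the "Windows (baseline)" row.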
|
|
|
|
### Why `wzp-native` is a standalone cdylib
|
|
|
|
On Android, the audio backend lives in a separate cdylib crate (`crates/wzp-native`) that `wzp-desktop`'s lib crate loads at runtime via `libloading`. It is **not** linked as a regular Rust dep.
|
|
|
|
This is deliberate. rust-lang/rust#104707 documents that a crate with `crate-type = ["cdylib", "staticlib"]` leaks non-exported symbols from the staticlib into the cdylib. On Android, that caused Bionic's private `__init_tcb` / `pthread_create` symbols to be bound LOCALLY inside our `.so` instead of resolved dynamically against `libc.so` at `dlopen` time — which crashed the app at launch as soon as `tao` tried to `std::thread::spawn()` from the JNI `onCreate` callback.
|
|
|
|
Keeping `wzp-native` in its own cdylib and loading it via `libloading` means:
|
|
|
|
1. The app's own `.so` has `crate-type = ["cdylib", "rlib"]` only: no `staticlib`, no symbol leak.
2. `libwzp_native.so` is loaded via `System.loadLibrary` from the JVM side (or `dlopen` from Rust), which triggers the normal Bionic resolver and binds all private symbols against `libc.so` at load time.
3. The C/C++ Oboe bridge is fully isolated inside `libwzp_native.so`'s symbol space, so its archives cannot leak into `wzp-desktop`'s `.so`.
|
|
|
|
See `docs/BRANCH-android-rewrite.md` for the full incident postmortem and `docs/incident-tauri-android-init-tcb.md` for the debug log.
|
|
|
|
### Vendored `audiopus_sys` for libopus / clang-cl cross-compile
|
|
|
|
The workspace root carries a vendored copy of `audiopus_sys` at `vendor/audiopus_sys/` with a patched `opus/CMakeLists.txt`. This is needed because libopus 1.3.1 gates its per-file `-msse4.1` / `-mssse3` `COMPILE_FLAGS` behind `if(NOT MSVC)`, and under `clang-cl` (used by `cargo-xwin` for Windows cross-compiles) CMake sets `MSVC=1` unconditionally — so the SIMD source files compile without the required target feature and fail to link the intrinsic `always_inline` functions.
|
|
|
|
The patch introduces an `MSVC_CL` variable that is true only for real `cl.exe` (distinguished via `CMAKE_C_COMPILER_ID STREQUAL "MSVC"`), and flips the eight `if(NOT MSVC)` SIMD guards to `if(NOT MSVC_CL)` so clang-cl gets the GCC-style per-file flags. Wired in via `[patch.crates-io] audiopus_sys = { path = "vendor/audiopus_sys" }` at the workspace root.
|
|
|
|
This does not affect macOS or Linux builds — on those platforms `MSVC=0` everywhere so the patched logic behaves identically to upstream.
|
|
|
|
Upstream tracking: xiph/opus#256, xiph/opus PR #257 (both stale).
|
|
|
|
## Network Awareness (Android)
|
|
|
|
The adaptive quality controller (`AdaptiveQualityController` in `wzp-proto`) supports proactive network-aware adaptation via `signal_network_change(NetworkContext)`. On Android, this is fed by `NetworkMonitor.kt` which wraps `ConnectivityManager.NetworkCallback`.
|
|
|
|
```
ConnectivityManager
    │ onCapabilitiesChanged / onLost
    ▼
NetworkMonitor.kt ──classify──► type: Int (WiFi=0, LTE=1, 5G=2, 3G=3)
    │ onNetworkChanged(type, bw)
    ▼
CallViewModel ──► WzpEngine.onNetworkChanged()
                      │ JNI
                      ▼
                  jni_bridge.rs
                      │
                      ▼
                  EngineState.pending_network_type (AtomicU8, lock-free)
                      │ polled every ~20ms
                      ▼
                  recv task: quality_ctrl.signal_network_change(ctx)
                      │
                      ├─ WiFi → Cellular: preemptive 1-tier downgrade
                      ├─ Any change: 10s FEC boost (+0.2 ratio)
                      └─ Cellular: faster downgrade thresholds (2 vs 3)
```
|
|
|
|
Cellular generation is approximated from `getLinkDownstreamBandwidthKbps()` to avoid requiring `READ_PHONE_STATE` permission.
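
The lock-free handoff between the JNI thread and the recv task can be sketched with a single `AtomicU8` slot. This is an assumption-laden illustration rather than the code in `jni_bridge.rs`: the `NO_CHANGE` sentinel, the `PendingNetwork` wrapper, and the `publish` / `take` / `preemptive_downgrade` names are hypothetical. Only the integer codes (WiFi=0, LTE=1, 5G=2, 3G=3), the `AtomicU8` type, and the one-tier WiFi-to-cellular downgrade come from the flow above.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

/// Hypothetical sentinel meaning "nothing pending".
const NO_CHANGE: u8 = u8::MAX;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum NetworkType { WiFi, Lte, FiveG, ThreeG }

impl NetworkType {
    /// Decodes the integer codes used by NetworkMonitor.kt.
    fn from_code(code: u8) -> Option<Self> {
        match code {
            0 => Some(Self::WiFi),
            1 => Some(Self::Lte),
            2 => Some(Self::FiveG),
            3 => Some(Self::ThreeG),
            _ => None,
        }
    }

    fn is_cellular(self) -> bool {
        !matches!(self, Self::WiFi)
    }
}

/// One-slot mailbox: the JNI thread writes, the recv task drains.
struct PendingNetwork(AtomicU8);

impl PendingNetwork {
    fn new() -> Self {
        Self(AtomicU8::new(NO_CHANGE))
    }

    /// Called from the JNI callback; a newer value overwrites an unread one.
    fn publish(&self, code: u8) {
        self.0.store(code, Ordering::Release);
    }

    /// Called from the recv task on each poll; clears the slot atomically.
    fn take(&self) -> Option<NetworkType> {
        NetworkType::from_code(self.0.swap(NO_CHANGE, Ordering::AcqRel))
    }
}

/// Number of tiers to drop immediately on a WiFi to cellular handover.
fn preemptive_downgrade(prev: NetworkType, next: NetworkType) -> u8 {
    if !prev.is_cellular() && next.is_cellular() { 1 } else { 0 }
}
```

Using `swap` rather than a separate load and store means a change published between two polls is observed exactly once, and only the latest change survives if several arrive within one poll interval.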
|
|
|
|
## Audio Routing (Android)
|
|
|
|
Both Android app variants support 3-way audio routing: **Earpiece → Speaker → Bluetooth SCO**.
|
|
|
|
### Audio Mode Lifecycle
|
|
|
|
`MODE_IN_COMMUNICATION` is set by the Rust call engine (via JNI `AudioManager.setMode()`) right before Oboe streams open — NOT at app launch. Restored to `MODE_NORMAL` when the call ends. This prevents hijacking system audio routing (music, BT A2DP) before a call is active.
|
|
|
|
### Native Kotlin App
|
|
|
|
`AudioRouteManager.kt` handles device detection (via `AudioDeviceCallback`), SCO lifecycle, and auto-fallback on BT disconnect. `CallViewModel.cycleAudioRoute()` cycles through available routes.
|
|
|
|
### Tauri Desktop App
|
|
|
|
`android_audio.rs` provides JNI bridges to `AudioManager` for speakerphone and Bluetooth SCO control. After each route change, Oboe streams are stopped and restarted via `spawn_blocking`.
|
|
|
|
```
User tap ──► cycleAudioRoute()
                 │
                 ├─ Earpiece: setSpeakerphoneOn(false) + clearCommunicationDevice()
                 ├─ Speaker:  setSpeakerphoneOn(true)
                 └─ BT SCO:   setCommunicationDevice(bt_device) [API 31+]
                 │            fallback: startBluetoothSco() [API < 31]
                 ▼
             Oboe stop + start_bt() for BT / start() for others
```
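
The cycling order can be sketched as a small pure function. This is a hypothetical illustration: `AudioRoute`, `next_route`, and `needs_bt_stream` are not names from `android_audio.rs`, and skipping Bluetooth when no headset is connected is an assumption based on "cycles through available routes".

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AudioRoute { Earpiece, Speaker, BluetoothSco }

/// Earpiece → Speaker → Bluetooth SCO → Earpiece, skipping Bluetooth
/// when no SCO-capable device is connected.
fn next_route(current: AudioRoute, bt_available: bool) -> AudioRoute {
    match current {
        AudioRoute::Earpiece => AudioRoute::Speaker,
        AudioRoute::Speaker if bt_available => AudioRoute::BluetoothSco,
        AudioRoute::Speaker | AudioRoute::BluetoothSco => AudioRoute::Earpiece,
    }
}

/// After a route change the Oboe streams restart; only the Bluetooth
/// route uses the `start_bt()` variant named in the flow above.
fn needs_bt_stream(route: AudioRoute) -> bool {
    route == AudioRoute::BluetoothSco
}
```

Three taps starting from `Earpiece` with a headset connected return to `Earpiece`; without a headset the cycle has only two stops.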
|
|
|
|
### BT SCO and Oboe
|
|
|
|
BT SCO only supports 8/16kHz. When `bt_active=1`, Oboe capture skips `setSampleRate(48000)` and `setInputPreset(VoiceCommunication)`, letting the system choose the native BT rate. Oboe's `SampleRateConversionQuality::Best` bridges to our 48kHz ring buffers. Playout uses `Usage::Media` in BT mode to avoid conflicts with the communication device routing.
|
|
|
|
### Hangup Signal Fix
|
|
|
|
`SignalMessage::Hangup` now carries an optional `call_id` field. The relay uses it to end only the specific call instead of broadcasting to all active calls for the user — preventing a race where a hangup for call 1 kills a newly-placed call 2.
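
A minimal sketch of the scoped hangup, under stated assumptions: call IDs are modelled as `u64`, users as plain strings, and a hangup without `call_id` is assumed to keep the older end-everything behaviour for compatibility. The `calls_to_end` helper is hypothetical and is not the relay's actual API.

```rust
use std::collections::HashMap;

/// Decides which calls a hangup from `user` should end.
/// `active` maps a call ID to the user that owns the call.
fn calls_to_end(active: &HashMap<u64, String>, user: &str, call_id: Option<u64>) -> Vec<u64> {
    let mut ended: Vec<u64> = match call_id {
        // Scoped hangup: end only the named call, and only if `user` owns it.
        Some(id) => active
            .get(&id)
            .filter(|owner| owner.as_str() == user)
            .map(|_| id)
            .into_iter()
            .collect(),
        // Assumed legacy path: no call_id, so every call of this user ends.
        None => active
            .iter()
            .filter(|(_, owner)| owner.as_str() == user)
            .map(|(id, _)| *id)
            .collect(),
    };
    ended.sort_unstable();
    ended
}
```

With calls 1 and 2 both owned by the same user, a hangup carrying `Some(1)` leaves call 2 untouched, which is the race the fix addresses.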
|
|
|
|
## Phase 8: Tailscale-Inspired NAT Traversal (2026-04-14)
|
|
|
|
Five new modules in `wzp-client` bring NAT traversal capability close to Tailscale's approach:
|
|
|
|
```
┌──────────────────────────────────────────────────────────────────────┐
│                    wzp-client NAT Traversal Stack                    │
│                                                                      │
│  ┌─────────────┐  ┌──────────────┐  ┌──────────────────────────┐     │
│  │ stun.rs     │  │ portmap.rs   │  │ reflect.rs (existing)    │     │
│  │ RFC 5389    │  │ NAT-PMP      │  │ Relay-based STUN         │     │
│  │ Public      │  │ PCP          │  │ Multi-relay NAT detect   │     │
│  │ STUN        │  │ UPnP IGD     │  │                          │     │
│  └──────┬──────┘  └──────┬───────┘  └────────────┬─────────────┘     │
│         │                │                       │                   │
│         └────────────────┼───────────────────────┘                   │
│                          │                                           │
│                  ┌───────▼────────┐                                  │
│                  │ ice_agent.rs   │                                  │
│                  │ Gather / Re-   │                                  │
│                  │ gather / Apply │                                  │
│                  └───────┬────────┘                                  │
│                          │                                           │
│              ┌───────────┼───────────┐                               │
│              │           │           │                               │
│      ┌───────▼───┐   ┌───▼───┐   ┌───▼──────────┐                    │
│      │ netcheck  │   │ dual_ │   │ relay_map.rs │                    │
│      │ .rs       │   │ path  │   │ RTT-sorted   │                    │
│      │ Diagnostic│   │ .rs   │   │ relay list   │                    │
│      └───────────┘   │ Race  │   └──────────────┘                    │
│                      └───────┘                                       │
└──────────────────────────────────────────────────────────────────────┘
```
|
|
|
|
### Candidate Types
|
|
|
|
| Type | Source | Priority | When Used |
|------|--------|----------|-----------|
| Host | `local_host_candidates()` | 1 (highest) | Same-LAN peers |
| Port-mapped | `portmap::acquire_port_mapping()` | 2 | Router supports NAT-PMP/PCP/UPnP |
| Server-reflexive | `stun::discover_reflexive()` or relay Reflect | 3 | Cone NAT |
| Relay | Relay address (fallback) | 4 (lowest) | Always available |
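
The priority column translates directly into a sort key. The sketch below is illustrative: `CandidateKind`, `Candidate`, and `order_candidates` are hypothetical names, and the real `ice_agent.rs` may represent candidates differently. Only the four types and their 1 to 4 ordering come from the table.

```rust
use std::net::SocketAddr;

/// Discriminants match the Priority column; lower is tried first.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum CandidateKind {
    Host = 1,
    PortMapped = 2,
    ServerReflexive = 3,
    Relay = 4,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Candidate {
    kind: CandidateKind,
    addr: SocketAddr,
}

/// Orders candidates from most to least preferred. The sort is stable,
/// so candidates of the same kind keep their gathering order.
fn order_candidates(mut candidates: Vec<Candidate>) -> Vec<Candidate> {
    candidates.sort_by_key(|c| c.kind);
    candidates
}
```

Because the relay candidate is always available, the ordered list always ends with a usable fallback.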
|
|
|
|
### Signal Flow for Mid-Call Re-Gathering
|
|
|
|
```
Network change (WiFi → cellular)
    │
    ▼
IceAgent::re_gather()
    ├── stun::discover_reflexive()
    ├── portmap::acquire_port_mapping()
    └── local_host_candidates()
    │
    ▼
SignalMessage::CandidateUpdate { generation: N+1, ... }
    │
    ▼ (via relay)
Peer's IceAgent::apply_peer_update()
    │
    ▼
PeerCandidates { reflexive, local, mapped }
    │
    ▼
dual_path::race() with new candidates (TODO: transport hot-swap)
```
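
One plausible reading of the `generation` counter is that the peer uses it to discard late or reordered updates. That behaviour is an assumption, not something this section states, and the sketch below is hypothetical throughout: `PeerView`, the `u32` counter width, and the field types of `PeerCandidates` are invented for illustration. Only the names `CandidateUpdate`, `generation`, `apply_peer_update`, and the `reflexive` / `local` / `mapped` fields appear in the flow above.

```rust
use std::net::SocketAddr;

#[derive(Debug, Clone, Default, PartialEq, Eq)]
struct PeerCandidates {
    reflexive: Option<SocketAddr>,
    local: Vec<SocketAddr>,
    mapped: Option<SocketAddr>,
}

#[derive(Debug, Clone)]
struct CandidateUpdate {
    generation: u32,
    candidates: PeerCandidates,
}

/// Hypothetical holder for what we currently know about the peer.
#[derive(Debug, Default)]
struct PeerView {
    generation: u32,
    candidates: PeerCandidates,
}

impl PeerView {
    /// Applies the update only if it is newer than what we hold.
    /// Returns `true` when the candidate set changed.
    fn apply_peer_update(&mut self, update: CandidateUpdate) -> bool {
        if update.generation <= self.generation {
            return false; // stale or duplicate: keep the current set
        }
        self.generation = update.generation;
        self.candidates = update.candidates;
        true
    }
}
```

Under this reading, a re-gather triggered by a second network change cannot be undone by a delayed update from the first one.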
|
|
|
|
### New SignalMessage Variants & Fields
|
|
|
|
| Signal | New Fields | Purpose |
|--------|-----------|---------|
| `DirectCallOffer` | `caller_mapped_addr` | Port-mapped address from NAT-PMP/PCP/UPnP |
| `DirectCallAnswer` | `callee_mapped_addr` | Same, callee side |
| `CallSetup` | `peer_mapped_addr` | Relay cross-wires mapped addr to peer |
| `CandidateUpdate` | (new variant) | Mid-call candidate re-gathering |
| `RegisterPresenceAck` | `relay_region`, `available_relays` | Relay mesh metadata for auto-selection |
|
|
|
|
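All new fields carry `#[serde(default)]` together with a `skip_serializing_if` predicate (for example `skip_serializing_if = "Option::is_none"` on optional fields), so a missing field deserializes to its default and an empty field is omitted on the wire. This keeps the messages backward compatible with older clients and relays.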
|
|
|
|
### Hard NAT Port Prediction
|
|
|
|
For symmetric NATs that don't support port mapping, the system detects the NAT's port allocation pattern:
|
|
|
|
```
Single socket → 5 STUN servers (sequential probes)
    │
    ▼
Observed ports: [40001, 40002, 40003, 40004, 40005]
    │
    ▼
classify_port_allocation() → Sequential { delta: 1 }
    │
    ▼
predict_ports(last=40005, delta=1, offset=0, spread=2)
    → [40004, 40005, 40006, 40007, 40008]
    │
    ▼
HardNatProbe signal → peer
    │
    ▼
Peer dials predicted port range in parallel
```
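
The `predict_ports` step can be reproduced with a few lines of arithmetic. The parameter semantics below are an assumption chosen to match the example in the flow: the centre of the window is `last + delta * (offset + 1)` (the next port the NAT is expected to hand out) and `spread` widens the window by that many ports on each side. The real function may differ.

```rust
/// Predicts a window of candidate ports for a sequential NAT.
/// Arithmetic wraps modulo 65,536; port 0 is never dialled.
fn predict_ports(last: u16, delta: i32, offset: i32, spread: i32) -> Vec<u16> {
    const PORT_SPACE: i32 = 65_536;
    // Assumed centre: the allocation after `last`, shifted by `offset` steps.
    let center = last as i32 + delta * (offset + 1);
    (-spread..=spread)
        .map(|k| (center + k).rem_euclid(PORT_SPACE) as u16)
        .filter(|port| *port != 0)
        .collect()
}
```

With the values from the flow, `predict_ports(40005, 1, 0, 2)` yields the five ports 40004 through 40008, centred on 40006.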
|
|
|
|
| Pattern | Detection | Traversal Strategy |
|---------|-----------|-------------------|
| Port-preserving | All probes return same port | Standard hole-punch |
| Sequential (delta=N) | Consistent N-increment | Predict next port, dial range |
| Random | No pattern | Birthday attack or relay |
| Unknown | < 3 probes succeeded | Relay fallback |
|
|
|
|
The classifier tolerates:

- **Jitter**: ±1 from dominant delta (concurrent flow grabbed a port)
- **Wraparound**: 65535 → 1 treated as delta=+2, not -65534
- **Noise**: 60% threshold; if most deltas agree, call it sequential
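
The classification rules and the three tolerances can be combined into one short function. This is a sketch under stated assumptions, not the project's implementation: the `PortAllocation` enum shape is inferred from `Sequential { delta: 1 }` in the flow above, `wrapped_delta` is a hypothetical helper, the dominant delta is taken to be the most frequent one, and a dominant delta of 0 on a non-constant sequence is treated as `Random`.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PortAllocation {
    PortPreserving,
    Sequential { delta: i32 },
    Random,
    Unknown,
}

/// Signed difference modulo 65,536, so 65535 → 1 counts as +2.
fn wrapped_delta(prev: u16, next: u16) -> i32 {
    let d = (next as i32 - prev as i32).rem_euclid(65_536);
    if d > 32_768 { d - 65_536 } else { d }
}

fn classify_port_allocation(ports: &[u16]) -> PortAllocation {
    // Fewer than 3 successful probes: not enough evidence.
    if ports.len() < 3 {
        return PortAllocation::Unknown;
    }
    if ports.iter().all(|p| *p == ports[0]) {
        return PortAllocation::PortPreserving;
    }
    let deltas: Vec<i32> = ports.windows(2).map(|w| wrapped_delta(w[0], w[1])).collect();

    // Dominant delta = most frequent one (BTreeMap keeps ties deterministic).
    let mut counts: BTreeMap<i32, usize> = BTreeMap::new();
    for d in &deltas {
        *counts.entry(*d).or_insert(0) += 1;
    }
    let dominant = counts
        .iter()
        .max_by_key(|&(_, &n)| n)
        .map(|(&d, _)| d)
        .expect("at least two deltas exist when len >= 3");

    // Jitter: deltas within ±1 of the dominant one still count as agreeing.
    let agreeing = deltas.iter().filter(|&&d| (d - dominant).abs() <= 1).count();

    // Noise: require at least 60% agreement.
    if dominant != 0 && agreeing * 10 >= deltas.len() * 6 {
        PortAllocation::Sequential { delta: dominant }
    } else {
        PortAllocation::Random
    }
}
```

The observed ports from the flow, `[40001, 40002, 40003, 40004, 40005]`, classify as `Sequential { delta: 1 }`; a sequence with one skipped port, such as `[40001, 40002, 40004, 40005, 40006]`, still does because the single delta of 2 is within the jitter tolerance.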
|