WarzonePhone Architecture
Custom lossy VoIP protocol built in Rust. E2E encrypted, FEC-protected, adaptive quality, designed for hostile network conditions.
System Overview
graph TB
subgraph "Client A (Desktop / Android / CLI)"
MIC[Microphone] --> DN[NoiseSuppressor<br/>RNNoise ML]
DN --> SD[SilenceDetector<br/>VAD + Hangover]
SD --> ENC[CallEncoder<br/>Opus / Codec2]
ENC --> FEC_E[FEC Encoder<br/>RaptorQ]
FEC_E --> CRYPT_E[ChaCha20-Poly1305<br/>Encrypt]
CRYPT_E --> QUIC_S[QUIC Datagram<br/>Send]
QUIC_R[QUIC Datagram<br/>Recv] --> CRYPT_D[ChaCha20-Poly1305<br/>Decrypt]
CRYPT_D --> FEC_D[FEC Decoder<br/>RaptorQ]
FEC_D --> JIT[JitterBuffer<br/>Adaptive Playout]
JIT --> DEC[CallDecoder<br/>Opus / Codec2]
DEC --> SPK[Speaker]
end
subgraph "Relay (SFU)"
ACCEPT[Accept QUIC] --> AUTH{Auth?}
AUTH -->|token| VALIDATE[POST /v1/auth/validate]
AUTH -->|no auth| HS
VALIDATE --> HS[Crypto Handshake<br/>X25519 + Ed25519]
HS --> ROOM[Room Manager<br/>Named Rooms via SNI]
ROOM --> FWD[Forward to<br/>Other Participants]
end
subgraph "Client B"
B_SPK[Speaker]
B_MIC[Microphone]
end
QUIC_S -->|UDP / QUIC| ACCEPT
FWD -->|UDP / QUIC| QUIC_R
B_MIC -.->|same pipeline| ACCEPT
FWD -.->|same pipeline| B_SPK
style MIC fill:#4a9eff,color:#fff
style SPK fill:#4a9eff,color:#fff
style B_MIC fill:#4a9eff,color:#fff
style B_SPK fill:#4a9eff,color:#fff
style ROOM fill:#ff9f43,color:#fff
style CRYPT_E fill:#ee5a24,color:#fff
style CRYPT_D fill:#ee5a24,color:#fff
Crate Dependency Graph
graph TD
PROTO["wzp-proto<br/>Types, Traits, Wire Format"]
CODEC["wzp-codec<br/>Opus + Codec2 + RNNoise"]
FEC["wzp-fec<br/>RaptorQ FEC"]
CRYPTO["wzp-crypto<br/>ChaCha20 + Identity"]
TRANSPORT["wzp-transport<br/>QUIC / Quinn"]
RELAY["wzp-relay<br/>Relay Daemon"]
CLIENT["wzp-client<br/>CLI + Call Engine"]
WEB["wzp-web<br/>Browser Bridge"]
PROTO --> CODEC
PROTO --> FEC
PROTO --> CRYPTO
PROTO --> TRANSPORT
CODEC --> CLIENT
FEC --> CLIENT
CRYPTO --> CLIENT
TRANSPORT --> CLIENT
CODEC --> RELAY
FEC --> RELAY
CRYPTO --> RELAY
TRANSPORT --> RELAY
CLIENT --> WEB
TRANSPORT --> WEB
CRYPTO --> WEB
FC["warzone-protocol<br/>featherChat Identity"] -.->|path dep| CRYPTO
style PROTO fill:#6c5ce7,color:#fff
style RELAY fill:#ff9f43,color:#fff
style CLIENT fill:#00b894,color:#fff
style WEB fill:#0984e3,color:#fff
style FC fill:#fd79a8,color:#fff
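Star pattern: Within the workspace, each leaf crate (wzp-codec, wzp-fec, wzp-crypto, wzp-transport) depends only on wzp-proto, and no leaf depends on another leaf. wzp-crypto additionally takes warzone-protocol (featherChat identity) as an external path dependency. The integration crates wzp-relay and wzp-client depend on all four leaves; wzp-web depends on wzp-client, wzp-transport, and wzp-crypto, and reaches the codec and FEC leaves through wzp-client.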
Audio Encode Pipeline
sequenceDiagram
participant Mic as Microphone<br/>(48kHz)
participant Ring as SPSC Ring<br/>(lock-free)
participant RNN as RNNoise<br/>(2 x 480)
participant VAD as SilenceDetector
participant Codec as Opus / Codec2
participant FEC as RaptorQ FEC
participant INT as Interleaver<br/>(depth=3)
participant HDR as MediaHeader<br/>(12B or Mini 4B)
participant Enc as ChaCha20-Poly1305
participant QUIC as QUIC Datagram
Mic->>Ring: f32 x 512 (macOS callback)
Ring->>Ring: Accumulate to 960 samples
Ring->>RNN: PCM i16 x 960 (20ms frame)
RNN->>VAD: Denoised audio
alt Speech active (or hangover)
VAD->>Codec: Encode active frame
else Silence (>100ms)
VAD->>Codec: ComfortNoise (every 200ms)
end
Codec->>FEC: Compressed bytes (pad to 256B symbol)
FEC->>FEC: Accumulate block (5-10 symbols)
FEC->>INT: Source + repair symbols
INT->>HDR: Interleaved packets
HDR->>Enc: Header as AAD
Enc->>QUIC: Encrypted payload + 16B tag
Key Details
- macOS delivers 512 f32 samples per callback (not configurable to 960)
- Ring buffer accumulates to 960 samples (20ms at 48 kHz) for codec frame
- RNNoise processes 2 x 480 samples (ML-based noise suppression via nnnoiseless)
- Silence detection uses VAD + 100ms hangover before switching to ComfortNoise
- FEC symbols are padded to 256 bytes with a 2-byte LE length prefix
- MiniHeaders (4 bytes) replace full headers (12 bytes) for 49 of every 50 frames
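The symbol padding rule in the list above can be sketched as follows. This is a minimal illustration, not the code in wzp-fec: the function names are hypothetical, and it assumes the 2-byte length prefix counts toward the 256-byte symbol (the text does not say whether the prefix is inside or outside that budget).

```rust
/// Fixed FEC symbol size from the pipeline notes.
const SYMBOL_SIZE: usize = 256;
/// Two bytes of little-endian length precede the payload.
/// Assumption: the prefix is counted inside the 256 bytes.
const LEN_PREFIX: usize = 2;

/// Pads one compressed frame into a fixed-size symbol.
/// Returns `None` if the frame does not fit next to the prefix.
fn pad_symbol(frame: &[u8]) -> Option<[u8; SYMBOL_SIZE]> {
    if frame.len() > SYMBOL_SIZE - LEN_PREFIX {
        return None;
    }
    let mut symbol = [0u8; SYMBOL_SIZE];
    symbol[..LEN_PREFIX].copy_from_slice(&(frame.len() as u16).to_le_bytes());
    symbol[LEN_PREFIX..LEN_PREFIX + frame.len()].copy_from_slice(frame);
    Some(symbol)
}

/// Recovers the original frame bytes from a padded symbol.
/// Returns `None` if the stored length exceeds the symbol.
fn unpad_symbol(symbol: &[u8; SYMBOL_SIZE]) -> Option<&[u8]> {
    let len = u16::from_le_bytes([symbol[0], symbol[1]]) as usize;
    symbol.get(LEN_PREFIX..LEN_PREFIX + len)
}
```

Padding every frame to the same size is what lets RaptorQ treat frames as equal-length symbols; the prefix lets the decoder strip the zero fill after a block is reconstructed.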
Audio Decode Pipeline
sequenceDiagram
participant QUIC as QUIC Datagram
participant Dec as ChaCha20-Poly1305
participant AR as Anti-Replay<br/>(sliding window)
participant HDR as Header Parse
participant DEINT as De-interleaver
participant FEC as RaptorQ FEC<br/>(reconstruct)
participant JIT as JitterBuffer<br/>(BTreeMap)
participant Codec as Opus / Codec2
participant Ring as SPSC Ring<br/>(lock-free)
participant SPK as Speaker
QUIC->>Dec: Encrypted packet
Dec->>AR: Decrypt (header = AAD)
AR->>AR: Check seq window (reject replay)
AR->>HDR: Verified packet
HDR->>DEINT: MediaHeader + payload
DEINT->>FEC: Reordered symbols by block
FEC->>FEC: Attempt decode (need K of K+R)
FEC->>JIT: Recovered audio frames
JIT->>JIT: BTreeMap ordered by seq
JIT->>JIT: Wait until depth >= target
JIT->>Codec: Pop lowest seq frame
Codec->>Ring: PCM i16 x 960
Ring->>SPK: Audio callback pulls samples
Key Details
- Anti-replay uses a 64-packet sliding window to reject duplicates
- FEC decoder needs any K of K+R symbols to reconstruct a block
- Jitter buffer target: 10 packets (200ms) for client, 50 packets (1s) for relay
- Desktop client uses direct playout (no jitter buffer) with lock-free ring
- Codec2 frames at 8 kHz are resampled to 48 kHz transparently
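The 64-packet anti-replay window can be illustrated with a bitmap over wrapping 16-bit sequence numbers. This is a sketch under assumptions: the type name `AntiReplaySketch` and the half-range rule for deciding "newer" versus "older" are illustrative choices, not taken from anti_replay.rs.

```rust
/// Sliding-window replay filter over wrapping u16 sequence numbers.
struct AntiReplaySketch {
    highest: u16, // highest sequence accepted so far
    bitmap: u64,  // bit i set => (highest - i) has been seen
    started: bool,
}

impl AntiReplaySketch {
    const WINDOW: u16 = 64;

    fn new() -> Self {
        Self { highest: 0, bitmap: 0, started: false }
    }

    /// Returns true if `seq` is fresh, and records it as seen.
    fn accept(&mut self, seq: u16) -> bool {
        if !self.started {
            self.started = true;
            self.highest = seq;
            self.bitmap = 1;
            return true;
        }
        let ahead = seq.wrapping_sub(self.highest);
        if ahead != 0 && ahead < 0x8000 {
            // Newer than anything seen so far: slide the window forward.
            self.bitmap = if ahead >= Self::WINDOW { 0 } else { self.bitmap << ahead };
            self.bitmap |= 1;
            self.highest = seq;
            return true;
        }
        let behind = self.highest.wrapping_sub(seq);
        if behind >= Self::WINDOW {
            return false; // older than the window: reject
        }
        let mask = 1u64 << behind;
        if self.bitmap & mask != 0 {
            return false; // already seen: replay or duplicate
        }
        self.bitmap |= mask;
        true
    }
}
```

Packets that arrive slightly out of order are still accepted once, while anything older than 64 packets behind the newest accepted sequence is dropped outright.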
Relay SFU Forwarding
graph TB
subgraph "Room Mode (Default SFU)"
C1[Client 1<br/>Alice] -->|"QUIC SNI=room-hash"| RM[Room Manager]
C2[Client 2<br/>Bob] -->|"QUIC SNI=room-hash"| RM
C3[Client 3<br/>Charlie] -->|"QUIC SNI=room-hash"| RM
RM --> R1["Room 'podcast'"]
R1 -->|"fan-out (skip sender)"| C1
R1 -->|"fan-out (skip sender)"| C2
R1 -->|"fan-out (skip sender)"| C3
end
subgraph "Forward Mode (--remote)"
C4[Client] -->|QUIC| RA[Relay A]
RA -->|"FEC decode<br/>jitter buffer<br/>FEC re-encode"| RB[Relay B<br/>--remote]
RB -->|QUIC| C5[Client]
end
subgraph "Probe Mode (--probe)"
PA[Relay A] -->|"Ping 1/s<br/>~50 bytes"| PB[Relay B]
PB -->|Pong| PA
PA --> PM[Prometheus<br/>RTT / Loss / Jitter]
end
style RM fill:#ff9f43,color:#fff
style R1 fill:#fdcb6e
style PM fill:#0984e3,color:#fff
SFU Fan-out Rules
- Each incoming datagram is forwarded to all other participants in the room
- The sender is excluded from fan-out (no echo)
- If one send fails, the relay continues to the next participant (best-effort)
- The relay never decodes or re-encodes audio (preserves E2E encryption)
- With trunking enabled, packets to the same receiver are batched into TrunkFrames (flushed every 5ms)
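The first three fan-out rules can be sketched as a loop over room members. The `fan_out` function and the `send` callback are hypothetical stand-ins for the relay's per-connection QUIC send; the datagram is treated as opaque bytes, matching the rule that the relay never decodes audio.

```rust
type ParticipantId = u32;

/// Forwards one opaque datagram to every participant except the sender.
/// Returns how many sends succeeded. A failed send is skipped, not fatal.
fn fan_out<F>(
    participants: &[ParticipantId],
    sender: ParticipantId,
    datagram: &[u8],
    mut send: F,
) -> usize
where
    F: FnMut(ParticipantId, &[u8]) -> Result<(), String>,
{
    let mut delivered = 0;
    for &peer in participants {
        if peer == sender {
            continue; // no echo back to the sender
        }
        // Best-effort: one broken connection must not starve the others.
        if send(peer, datagram).is_ok() {
            delivered += 1;
        }
    }
    delivered
}
```

Because the payload is never inspected, the same loop works for any codec or quality tier, and the relay holds no media keys.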
Federation Topology
graph TB
subgraph "Relay A (EU)"
A_R["Room Manager"]
A_F["Federation<br/>Manager"]
A1["Alice (local)"]
A2["Bob (local)"]
end
subgraph "Relay B (US)"
B_R["Room Manager"]
B_F["Federation<br/>Manager"]
B1["Charlie (local)"]
end
subgraph "Relay C (APAC)"
C_R["Room Manager"]
C_F["Federation<br/>Manager"]
C1["Dave (local)"]
end
A1 -->|media| A_R
A2 -->|media| A_R
B1 -->|media| B_R
C1 -->|media| C_R
A_F <-->|"SNI='_federation'<br/>GlobalRoomActive<br/>media forward"| B_F
A_F <-->|"SNI='_federation'<br/>GlobalRoomActive<br/>media forward"| C_F
B_F <-->|"SNI='_federation'<br/>GlobalRoomActive<br/>media forward"| C_F
A_R --> A_F
B_R --> B_F
C_R --> C_F
style A_F fill:#6c5ce7,color:#fff
style B_F fill:#6c5ce7,color:#fff
style C_F fill:#6c5ce7,color:#fff
style A_R fill:#ff9f43,color:#fff
style B_R fill:#ff9f43,color:#fff
style C_R fill:#ff9f43,color:#fff
Federation Protocol Flow
sequenceDiagram
participant RA as Relay A
participant RB as Relay B
Note over RA: Startup: connect to configured peers
RA->>RB: QUIC connect (SNI="_federation")
RA->>RB: FederationHello { tls_fingerprint }
RB->>RB: Verify fingerprint against [[trusted]]
Note over RA,RB: Federation link established
Note over RA: Alice joins global room "podcast"
RA->>RB: GlobalRoomActive { room: "podcast" }
Note over RB: Charlie joins global room "podcast"
RB->>RA: GlobalRoomActive { room: "podcast" }
Note over RA,RB: Media bridging active
loop Every media packet in global room
RA->>RB: [room_hash:8][encrypted_media]
RB->>RA: [room_hash:8][encrypted_media]
end
Note over RA: Last local participant leaves
RA->>RB: GlobalRoomInactive { room: "podcast" }
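The per-packet framing in the loop above, [room_hash:8][encrypted_media], can be sketched as a simple prefix. The helper names are hypothetical, and the sketch assumes the 8 denotes 8 bytes of room hash.

```rust
/// Assumption: `room_hash:8` in the flow above means an 8-byte hash.
const ROOM_HASH_LEN: usize = 8;

/// Prefixes an already-encrypted media packet with the room hash
/// so the peer relay can route it without decrypting anything.
fn wrap_federated(room_hash: [u8; ROOM_HASH_LEN], encrypted_media: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(ROOM_HASH_LEN + encrypted_media.len());
    out.extend_from_slice(&room_hash);
    out.extend_from_slice(encrypted_media);
    out
}

/// Splits a federated datagram back into (room hash, encrypted media).
/// Returns `None` if the datagram is too short to carry the hash.
fn unwrap_federated(datagram: &[u8]) -> Option<([u8; ROOM_HASH_LEN], &[u8])> {
    if datagram.len() < ROOM_HASH_LEN {
        return None;
    }
    let mut hash = [0u8; ROOM_HASH_LEN];
    hash.copy_from_slice(&datagram[..ROOM_HASH_LEN]);
    Some((hash, &datagram[ROOM_HASH_LEN..]))
}
```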
Wire Formats
MediaHeader (12 bytes)
Byte 0: [V:1][T:1][CodecID:4][Q:1][FecRatioHi:1]
Byte 1: [FecRatioLo:6][unused:2]
Bytes 2-3: sequence (u16 BE)
Bytes 4-7: timestamp_ms (u32 BE)
Byte 8: fec_block_id (u8)
Byte 9: fec_symbol_idx (u8)
Byte 10: reserved
Byte 11: csrc_count
| Field | Bits | Description |
|---|---|---|
| V (version) | 1 | Protocol version (0 = v1) |
| T (is_repair) | 1 | 1 = FEC repair packet, 0 = source media |
| CodecID | 4 | Codec identifier (0-8, see table below) |
| Q | 1 | 1 = QualityReport trailer appended |
| FecRatio | 7 | FEC ratio encoded as 0-127 mapping to 0.0-2.0 |
| sequence | 16 | Wrapping packet sequence number |
| timestamp_ms | 32 | Milliseconds since session start |
| fec_block_id | 8 | FEC source block ID (wrapping) |
| fec_symbol_idx | 8 | Symbol index within FEC block |
| reserved | 8 | Reserved flags |
| csrc_count | 8 | Contributing source count (future mixing) |
CodecID Values
| Value | Codec | Bitrate | Sample Rate | Frame Duration |
|---|---|---|---|---|
| 0 | Opus 24k | 24 kbps | 48 kHz | 20ms |
| 1 | Opus 16k | 16 kbps | 48 kHz | 20ms |
| 2 | Opus 6k | 6 kbps | 48 kHz | 40ms |
| 3 | Codec2 3200 | 3.2 kbps | 8 kHz | 20ms |
| 4 | Codec2 1200 | 1.2 kbps | 8 kHz | 40ms |
| 5 | ComfortNoise | 0 | 48 kHz | 20ms |
| 6 | Opus 32k | 32 kbps | 48 kHz | 20ms |
| 7 | Opus 48k | 48 kbps | 48 kHz | 20ms |
| 8 | Opus 64k | 64 kbps | 48 kHz | 20ms |
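To make the bit layout concrete, the following sketch packs and parses the 12-byte header. It is an illustration rather than the code in packet.rs: the struct name is hypothetical, fields are assumed to be packed most-significant-bit first within each byte (so V is bit 7 of byte 0), and the 0-127 to 0.0-2.0 FEC ratio mapping is assumed to be linear.

```rust
/// Size of the full header in bytes, per the layout above.
const MEDIA_HEADER_LEN: usize = 12;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct MediaHeaderSketch {
    version: u8,      // 1 bit
    is_repair: bool,  // T flag
    codec_id: u8,     // 4 bits
    has_report: bool, // Q flag
    fec_ratio: u8,    // 7 bits, 0-127
    sequence: u16,
    timestamp_ms: u32,
    fec_block_id: u8,
    fec_symbol_idx: u8,
    reserved: u8,
    csrc_count: u8,
}

impl MediaHeaderSketch {
    fn to_bytes(&self) -> [u8; MEDIA_HEADER_LEN] {
        let mut b = [0u8; MEDIA_HEADER_LEN];
        // Assumption: MSB-first packing within bytes 0 and 1.
        b[0] = ((self.version & 0x01) << 7)
            | ((self.is_repair as u8) << 6)
            | ((self.codec_id & 0x0F) << 2)
            | ((self.has_report as u8) << 1)
            | ((self.fec_ratio >> 6) & 0x01);
        b[1] = (self.fec_ratio & 0x3F) << 2; // low 2 bits unused
        b[2..4].copy_from_slice(&self.sequence.to_be_bytes());
        b[4..8].copy_from_slice(&self.timestamp_ms.to_be_bytes());
        b[8] = self.fec_block_id;
        b[9] = self.fec_symbol_idx;
        b[10] = self.reserved;
        b[11] = self.csrc_count;
        b
    }

    fn from_bytes(b: &[u8; MEDIA_HEADER_LEN]) -> Self {
        Self {
            version: b[0] >> 7,
            is_repair: (b[0] >> 6) & 1 == 1,
            codec_id: (b[0] >> 2) & 0x0F,
            has_report: (b[0] >> 1) & 1 == 1,
            fec_ratio: ((b[0] & 0x01) << 6) | (b[1] >> 2),
            sequence: u16::from_be_bytes([b[2], b[3]]),
            timestamp_ms: u32::from_be_bytes([b[4], b[5], b[6], b[7]]),
            fec_block_id: b[8],
            fec_symbol_idx: b[9],
            reserved: b[10],
            csrc_count: b[11],
        }
    }

    /// Maps the 7-bit wire value onto 0.0-2.0 (linear mapping assumed).
    fn fec_ratio_f32(&self) -> f32 {
        f32::from(self.fec_ratio) * 2.0 / 127.0
    }
}
```

Note that the 7-bit FecRatio straddles the byte boundary: its top bit is the last bit of byte 0 and the remaining six bits lead byte 1.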
MiniHeader (4 bytes, compressed)
[FRAME_TYPE_MINI: 0x01]
Bytes 0-1: timestamp_delta_ms (u16 BE)
Bytes 2-3: payload_len (u16 BE)
Used for 49 of every 50 frames (~1s cycle). Saves 8 bytes per packet (67% header reduction). Full header is sent every 50th frame to resynchronize state.
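The header cadence and the resulting average overhead can be worked through in a few lines. The function names are hypothetical, and the sketch assumes the full header lands on frame indices that are multiples of 50 and that the header sizes are exactly the 12 and 4 bytes quoted above.

```rust
const FULL_HEADER_LEN: usize = 12;
const MINI_HEADER_LEN: usize = 4;
/// One full header per 50 frames (about 1 s at 20 ms per frame).
const FULL_HEADER_INTERVAL: u64 = 50;

/// Header size used for a given frame index.
/// Assumption: the full header is sent when the index is a multiple of 50.
fn header_len_for_frame(frame_index: u64) -> usize {
    if frame_index % FULL_HEADER_INTERVAL == 0 {
        FULL_HEADER_LEN
    } else {
        MINI_HEADER_LEN
    }
}

/// Mean header bytes per packet over the first `frames` frames (`frames` > 0).
fn average_header_len(frames: u64) -> f64 {
    let total: usize = (0..frames).map(header_len_for_frame).sum();
    total as f64 / frames as f64
}
```

Over one 50-frame cycle this gives (12 + 49 x 4) / 50 = 4.16 bytes of header per packet on average, compared with 12 bytes when every packet carries the full header.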
TrunkFrame (batched datagrams)
[count: u16]
[session_id: 2][len: u16][payload: len] x count
Packs multiple session packets into one QUIC datagram. Maximum 10 entries or 1200 bytes, flushed every 5ms.
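A batcher that enforces the two limits might look like the sketch below. `TrunkBatchSketch` is a hypothetical name and not the TrunkBatcher in trunk.rs. The layout above does not state endianness, so big-endian is assumed to match the other headers; the 2-byte session_id is modeled as a u16, and the 1200-byte cap is assumed to cover the whole encoded frame including the count field. The 5 ms flush timer is left to the caller.

```rust
const MAX_ENTRIES: usize = 10;
const MAX_BYTES: usize = 1200;

/// Accumulates (session_id, payload) entries for one TrunkFrame.
struct TrunkBatchSketch {
    entries: Vec<(u16, Vec<u8>)>,
    encoded_len: usize, // bytes the frame would occupy if flushed now
}

impl TrunkBatchSketch {
    fn new() -> Self {
        Self { entries: Vec::new(), encoded_len: 2 } // 2 bytes for `count`
    }

    /// Adds an entry, or returns false if it would exceed either limit.
    /// On false the caller is expected to flush and retry.
    fn try_push(&mut self, session_id: u16, payload: &[u8]) -> bool {
        let added = 2 + 2 + payload.len(); // session_id + len + payload
        if self.entries.len() >= MAX_ENTRIES || self.encoded_len + added > MAX_BYTES {
            return false;
        }
        self.entries.push((session_id, payload.to_vec()));
        self.encoded_len += added;
        true
    }

    /// Encodes and clears the batch; `None` if there is nothing to send.
    fn flush(&mut self) -> Option<Vec<u8>> {
        if self.entries.is_empty() {
            return None;
        }
        let mut out = Vec::with_capacity(self.encoded_len);
        out.extend_from_slice(&(self.entries.len() as u16).to_be_bytes());
        for (session_id, payload) in self.entries.drain(..) {
            out.extend_from_slice(&session_id.to_be_bytes());
            out.extend_from_slice(&(payload.len() as u16).to_be_bytes());
            out.extend_from_slice(&payload);
        }
        self.encoded_len = 2;
        Some(out)
    }
}
```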
QualityReport (4 bytes, optional trailer)
Byte 0: loss_pct (0-255 maps to 0-100%)
Byte 1: rtt_4ms (0-255 maps to 0-1020ms, resolution 4ms)
Byte 2: jitter_ms (0-255ms)
Byte 3: bitrate_cap_kbps (0-255 kbps)
Appended to a media packet when the Q flag is set in the MediaHeader.
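The four scaled fields can be sketched as below. The struct and constructor names are hypothetical; rounding for loss and saturation at 255 for out-of-range values are assumptions, since the layout above only gives the ranges.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct QualityReportSketch {
    loss_pct: u8,         // 0-255 maps to 0-100 %
    rtt_4ms: u8,          // units of 4 ms, 0-1020 ms
    jitter_ms: u8,        // 0-255 ms
    bitrate_cap_kbps: u8, // 0-255 kbps
}

impl QualityReportSketch {
    /// Builds a report from natural units, saturating at each field's maximum.
    fn from_measurements(loss_percent: f32, rtt_ms: u32, jitter_ms: u32, cap_kbps: u32) -> Self {
        Self {
            loss_pct: (loss_percent.clamp(0.0, 100.0) / 100.0 * 255.0).round() as u8,
            rtt_4ms: (rtt_ms / 4).min(255) as u8,
            jitter_ms: jitter_ms.min(255) as u8,
            bitrate_cap_kbps: cap_kbps.min(255) as u8,
        }
    }

    fn loss_percent(&self) -> f32 {
        f32::from(self.loss_pct) * 100.0 / 255.0
    }

    fn rtt_ms(&self) -> u32 {
        u32::from(self.rtt_4ms) * 4
    }

    fn to_bytes(&self) -> [u8; 4] {
        [self.loss_pct, self.rtt_4ms, self.jitter_ms, self.bitrate_cap_kbps]
    }

    fn from_bytes(b: [u8; 4]) -> Self {
        Self { loss_pct: b[0], rtt_4ms: b[1], jitter_ms: b[2], bitrate_cap_kbps: b[3] }
    }
}
```

The 4 ms RTT resolution means any round trip above 1020 ms is reported as 1020 ms.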
Signal Message Handshake Flow
sequenceDiagram
participant C as Client
participant R as Relay
C->>R: QUIC Connect (SNI = hashed room name)
alt Auth enabled (--auth-url)
C->>R: SignalMessage::AuthToken { token }
R->>R: POST auth_url to validate
R-->>C: (connection closed if invalid)
end
C->>R: CallOffer { identity_pub, ephemeral_pub, signature, supported_profiles }
R->>R: Verify Ed25519 signature
R->>R: Generate ephemeral X25519
R->>R: shared_secret = DH(eph_relay, eph_client)
R->>R: session_key = HKDF(shared_secret, "warzone-session-key")
R->>C: CallAnswer { identity_pub, ephemeral_pub, signature, chosen_profile }
C->>C: Verify signature
C->>C: Derive same session_key
Note over C,R: Session established -- both have ChaCha20-Poly1305 key
C->>R: RoomUpdate (join notification broadcast)
loop Media exchange
C->>R: QUIC Datagram (encrypted media)
R->>C: QUIC Datagram (forwarded from others)
end
opt Every 65,536 packets
C->>R: Rekey { new_ephemeral_pub, signature }
R->>C: Rekey { new_ephemeral_pub, signature }
Note over C,R: New session key via fresh DH
end
C->>R: Hangup { reason: Normal }
R->>R: Remove from room, broadcast RoomUpdate
Client Architecture
Desktop Engine (Tauri)
graph TB
subgraph "Tauri Frontend (HTML/JS)"
UI[Connect / Call UI]
SET[Settings Panel]
end
subgraph "Tauri Rust Backend"
CMD[Tauri Commands<br/>connect/disconnect/toggle]
ENG[WzpEngine<br/>State Machine]
end
subgraph "Audio I/O"
CPAL_C[CPAL Capture<br/>or VoiceProcessingIO]
RING_C[SPSC Ring<br/>Capture]
RING_P[SPSC Ring<br/>Playout]
CPAL_P[CPAL Playback<br/>or VoiceProcessingIO]
end
subgraph "Network Tasks (tokio)"
SEND[Send Loop<br/>encode + encrypt]
RECV[Recv Loop<br/>decrypt + decode]
SIG[Signal Handler<br/>room updates]
end
UI --> CMD
SET --> CMD
CMD --> ENG
ENG --> SEND
ENG --> RECV
ENG --> SIG
CPAL_C --> RING_C --> SEND
RECV --> RING_P --> CPAL_P
style ENG fill:#00b894,color:#fff
style SEND fill:#0984e3,color:#fff
style RECV fill:#0984e3,color:#fff
Key design decisions:
- Lock-free SPSC rings between audio callbacks and network tasks (no mutex on audio thread)
- VoiceProcessingIO on macOS for OS-level AEC (CPAL uses HalOutput which has no AEC)
- Direct playout -- no jitter buffer on client; audio callback pulls from ring
- Release builds required -- debug builds too slow for real-time audio
Android Engine (Kotlin + JNI)
graph TB
subgraph "Compose UI"
CALL[CallActivity]
SET[SettingsScreen]
VM[CallViewModel]
end
subgraph "Service Layer"
SVC[CallService<br/>Foreground Service]
PIPE[AudioPipeline<br/>AudioTrack + AudioRecord]
end
subgraph "Rust Engine (JNI)"
JNI[WzpEngine.kt<br/>JNI bridge]
NATIVE[libwzp_android.so<br/>Rust call engine]
end
subgraph "Android Audio"
REC[AudioRecord<br/>+ AEC effect]
TRK[AudioTrack<br/>low-latency]
end
CALL --> VM
SET --> VM
VM --> SVC
SVC --> PIPE
PIPE --> JNI
JNI --> NATIVE
REC --> PIPE
PIPE --> TRK
style NATIVE fill:#00b894,color:#fff
style SVC fill:#ff9f43,color:#fff
style PIPE fill:#0984e3,color:#fff
Key design decisions:
- Foreground service keeps audio alive when the screen is off
- AudioRecord + AudioTrack with Android's built-in AEC (AudioEffect)
- Lock-free AudioRing with preallocated Vec (not push/pop) to avoid allocation on audio thread
- JNI bridge marshals PCM frames between Kotlin and Rust
CLI Architecture
graph TB
subgraph "CLI Modes"
LIVE[--live<br/>Mic + Speaker]
TONE[--send-tone<br/>Sine Generator]
FILE[--send-file<br/>PCM Reader]
ECHO[--echo-test<br/>Quality Analysis]
DRIFT[--drift-test<br/>Clock Analysis]
SWEEP[--sweep<br/>Buffer Sweep]
end
subgraph "Call Engine"
ENCODE[CallEncoder<br/>codec + FEC]
DECODE[CallDecoder<br/>FEC + codec]
QA[QualityAdapter<br/>adaptive switching]
end
subgraph "Transport"
QUIC[QuinnTransport<br/>send/recv media + signal]
HS[Handshake<br/>X25519 + Ed25519]
end
LIVE --> ENCODE
TONE --> ENCODE
FILE --> ENCODE
ENCODE --> QUIC
QUIC --> DECODE
ECHO --> ENCODE
ECHO --> DECODE
DRIFT --> ENCODE
HS --> QUIC
style ENCODE fill:#00b894,color:#fff
style DECODE fill:#00b894,color:#fff
style QUIC fill:#0984e3,color:#fff
Adaptive Quality System
graph LR
subgraph GOOD ["GOOD (28.8 kbps)"]
G_C[Opus 24kbps]
G_F[FEC 20%]
G_FR[20ms frames]
end
subgraph DEGRADED ["DEGRADED (9.0 kbps)"]
D_C[Opus 6kbps]
D_F[FEC 50%]
D_FR[40ms frames]
end
subgraph CATASTROPHIC ["CATASTROPHIC (2.4 kbps)"]
C_C[Codec2 1200bps]
C_F[FEC 100%]
C_FR[40ms frames]
end
GOOD -->|"loss>10% or RTT>400ms<br/>3 consecutive reports"| DEGRADED
DEGRADED -->|"loss>40% or RTT>600ms<br/>3 consecutive"| CATASTROPHIC
CATASTROPHIC -->|"loss<10% and RTT<400ms<br/>10 consecutive"| DEGRADED
DEGRADED -->|"loss<10% and RTT<400ms<br/>10 consecutive"| GOOD
style GOOD fill:#00b894,color:#fff
style DEGRADED fill:#fdcb6e
style CATASTROPHIC fill:#e17055,color:#fff
Hysteresis prevents tier flapping: fast downgrade (3 reports, or 2 on cellular) and slow upgrade (10 reports, one tier at a time).
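The hysteresis rule can be sketched as a pair of streak counters. The thresholds are copied from the diagram above; the type name `TierControllerSketch` and the choice to reset both streaks on a report that is neither clearly bad nor clearly good are assumptions, not a description of AdaptiveQualityController.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tier {
    Good,
    Degraded,
    Catastrophic,
}

struct TierControllerSketch {
    tier: Tier,
    bad_streak: u32,
    good_streak: u32,
    downgrade_after: u32, // 3 reports, or 2 on cellular
    upgrade_after: u32,   // 10 reports
}

impl TierControllerSketch {
    fn new(cellular: bool) -> Self {
        Self {
            tier: Tier::Good,
            bad_streak: 0,
            good_streak: 0,
            downgrade_after: if cellular { 2 } else { 3 },
            upgrade_after: 10,
        }
    }

    /// Feeds one quality report and returns the tier now in effect.
    fn on_report(&mut self, loss_pct: f32, rtt_ms: u32) -> Tier {
        let worse = match self.tier {
            Tier::Good => loss_pct > 10.0 || rtt_ms > 400,
            Tier::Degraded => loss_pct > 40.0 || rtt_ms > 600,
            Tier::Catastrophic => false,
        };
        let better = self.tier != Tier::Good && loss_pct < 10.0 && rtt_ms < 400;
        if worse {
            self.good_streak = 0;
            self.bad_streak += 1;
            if self.bad_streak >= self.downgrade_after {
                self.tier = match self.tier {
                    Tier::Good => Tier::Degraded,
                    _ => Tier::Catastrophic,
                };
                self.bad_streak = 0;
            }
        } else if better {
            self.bad_streak = 0;
            self.good_streak += 1;
            if self.good_streak >= self.upgrade_after {
                // Upgrades move one tier at a time.
                self.tier = match self.tier {
                    Tier::Catastrophic => Tier::Degraded,
                    _ => Tier::Good,
                };
                self.good_streak = 0;
            }
        } else {
            // Assumption: an in-between report breaks both streaks.
            self.bad_streak = 0;
            self.good_streak = 0;
        }
        self.tier
    }
}
```

The asymmetry is the point: three bad reports are enough to drop a tier, but ten consecutive good reports are needed to climb back one step.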
Cryptographic Handshake
sequenceDiagram
participant C as Caller
participant R as Relay / Callee
Note over C: Derive identity from seed<br/>Ed25519 + X25519 via HKDF
C->>C: Generate ephemeral X25519
C->>C: Sign(ephemeral_pub || "call-offer")
C->>R: CallOffer { identity_pub, ephemeral_pub, signature, profiles }
R->>R: Verify Ed25519 signature
R->>R: Generate ephemeral X25519
R->>R: shared_secret = DH(eph_b, eph_a)
R->>R: session_key = HKDF(shared_secret, "warzone-session-key")
R->>R: Sign(ephemeral_pub || "call-answer")
R->>C: CallAnswer { identity_pub, ephemeral_pub, signature, profile }
C->>C: Verify signature
C->>C: shared_secret = DH(eph_a, eph_b)
C->>C: session_key = HKDF(shared_secret)
Note over C,R: Both have identical ChaCha20-Poly1305 session key
C->>R: Encrypted media (QUIC datagrams)
R->>C: Encrypted media (QUIC datagrams)
Note over C,R: Rekey every 65,536 packets<br/>New ephemeral DH + HKDF mix
Identity Model
graph TD
SEED["32-byte Seed<br/>(BIP39 Mnemonic: 24 words)"] --> HKDF1["HKDF<br/>salt=None<br/>info='warzone-ed25519'"]
SEED --> HKDF2["HKDF<br/>salt=None<br/>info='warzone-x25519'"]
HKDF1 --> ED["Ed25519 SigningKey<br/>Digital Signatures"]
HKDF2 --> X25519["X25519 StaticSecret<br/>Key Agreement"]
ED --> VKEY["Ed25519 VerifyingKey<br/>(Public)"]
X25519 --> XPUB["X25519 PublicKey<br/>(Public)"]
VKEY --> FP["Fingerprint<br/>SHA-256(pubkey) truncated 16 bytes<br/>xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx"]
style SEED fill:#6c5ce7,color:#fff
style FP fill:#fd79a8,color:#fff
style ED fill:#ee5a24,color:#fff
style X25519 fill:#00b894,color:#fff
Adaptive Jitter Buffer
graph TD
PKT[Incoming Packet] --> SEQ{Sequence Check}
SEQ -->|Duplicate| DROP[Drop + AntiReplay]
SEQ -->|Valid| BUF["BTreeMap Buffer<br/>(ordered by seq)"]
BUF --> ADAPT["AdaptivePlayoutDelay<br/>(EMA jitter tracking)"]
ADAPT --> TARGET["target_delay =<br/>ceil(jitter_ema / 20ms) + 2"]
BUF --> READY{"depth >= target?"}
READY -->|No| WAIT["Wait (Underrun++)"]
READY -->|Yes| POP[Pop lowest seq]
POP --> DECODE[Decode to PCM]
DECODE --> PLAY[Playout]
BUF --> OVERFLOW{"depth > max?"}
OVERFLOW -->|Yes| EVICT["Drop oldest (Overrun++)"]
style ADAPT fill:#fdcb6e
style DROP fill:#e17055,color:#fff
style EVICT fill:#e17055,color:#fff
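The target-delay formula in the diagram, ceil(jitter_ema / 20ms) + 2, can be sketched as follows. The jitter estimate here is an exponential moving average of the change in transit time between consecutive packets; that estimator, the smoothing factor `alpha`, and the type name are assumptions, since the diagram only says "EMA jitter tracking".

```rust
/// Frame duration used to convert jitter into a packet count.
const FRAME_MS: f64 = 20.0;

struct AdaptiveDelaySketch {
    jitter_ema_ms: f64,
    alpha: f64, // EMA smoothing factor (assumed; e.g. 1/16)
    last_transit_ms: Option<f64>,
}

impl AdaptiveDelaySketch {
    fn new(alpha: f64) -> Self {
        Self { jitter_ema_ms: 0.0, alpha, last_transit_ms: None }
    }

    /// Feeds one packet's sender timestamp and local arrival time.
    fn observe(&mut self, sent_ms: f64, arrived_ms: f64) {
        let transit = arrived_ms - sent_ms;
        if let Some(prev) = self.last_transit_ms {
            let deviation = (transit - prev).abs();
            self.jitter_ema_ms += self.alpha * (deviation - self.jitter_ema_ms);
        }
        self.last_transit_ms = Some(transit);
    }

    /// target_delay = ceil(jitter_ema / 20 ms) + 2, in packets.
    fn target_delay_packets(&self) -> u32 {
        (self.jitter_ema_ms / FRAME_MS).ceil() as u32 + 2
    }
}
```

With no measured jitter the target is 2 packets (40 ms); a smoothed jitter of 25 ms rounds up to 2 frames and yields a target of 4 packets (80 ms).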
FEC Protection (RaptorQ)
graph LR
subgraph "Encoder"
F1[Frame 1] --> BLK["Source Block<br/>(5-10 frames)"]
F2[Frame 2] --> BLK
F3[Frame 3] --> BLK
F4[Frame 4] --> BLK
F5[Frame 5] --> BLK
BLK --> SRC[5 Source Symbols]
BLK --> REP["1-10 Repair Symbols<br/>(ratio dependent)"]
SRC --> INT["Interleaver<br/>(depth=3)"]
REP --> INT
end
subgraph "Network"
INT --> LOSS{Packet Loss}
LOSS -->|some lost| RCV[Received Symbols]
end
subgraph "Decoder"
RCV --> DEINT[De-interleaver]
DEINT --> RAPTORQ["RaptorQ Decoder<br/>Reconstruct from<br/>any K of K+R symbols"]
RAPTORQ --> OUT[Original Frames]
end
style LOSS fill:#e17055,color:#fff
style RAPTORQ fill:#00b894,color:#fff
Telemetry Stack
graph TB
subgraph "Relay"
RM["RelayMetrics<br/>sessions, rooms, packets"]
SM["SessionMetrics<br/>per-session jitter, loss, RTT"]
PM["ProbeMetrics<br/>inter-relay RTT, loss"]
RM --> PROM1["GET /metrics :9090"]
SM --> PROM1
PM --> PROM1
end
subgraph "Web Bridge"
WM["WebMetrics<br/>connections, frames, latency"]
WM --> PROM2["GET /metrics :8080"]
end
subgraph "Client"
CM["JitterStats + QualityAdapter"]
CM --> JSONL["--metrics-file<br/>JSONL 1 line/sec"]
end
PROM1 --> GRAF["Grafana Dashboard<br/>4 rows, 18 panels"]
PROM2 --> GRAF
JSONL --> ANALYSIS[Offline Analysis]
style GRAF fill:#ff6b6b,color:#fff
style PROM1 fill:#0984e3,color:#fff
style PROM2 fill:#0984e3,color:#fff
Deployment Topology
graph TB
subgraph "Region A"
RA["wzp-relay A<br/>:4433 UDP"]
WA["wzp-web A<br/>:8080 HTTPS"]
WA --> RA
end
subgraph "Region B"
RB["wzp-relay B<br/>:4433 UDP"]
WB["wzp-web B<br/>:8080 HTTPS"]
WB --> RB
end
RA <-->|"Probe 1/s + Federation"| RB
BA[Browser A] -->|WSS| WA
BB[Browser B] -->|WSS| WB
CA[CLI Client] -->|QUIC| RA
DA[Desktop Client] -->|QUIC| RA
MA[Android Client] -->|QUIC| RB
PROM[Prometheus] -->|scrape| RA
PROM -->|scrape| RB
PROM -->|scrape| WA
PROM --> GRAF[Grafana]
FC[featherChat Server] -->|auth validate| RA
FC -->|auth validate| RB
style RA fill:#ff9f43,color:#fff
style RB fill:#ff9f43,color:#fff
style GRAF fill:#ff6b6b,color:#fff
style FC fill:#fd79a8,color:#fff
Session State Machine
stateDiagram-v2
[*] --> Idle
Idle --> Connecting: connect()
Connecting --> Handshaking: QUIC established
Handshaking --> Active: CallOffer/Answer complete
Active --> Rekeying: 65,536 packets
Rekeying --> Active: new key derived
Active --> Closed: Hangup / Error / Timeout
Rekeying --> Closed: Error
Connecting --> Closed: Timeout
Handshaking --> Closed: Signature fail
note right of Active: Media flows (encrypted)
note right of Rekeying: Media continues while rekeying
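The transitions in the state diagram can be written as a total function over (state, event) pairs. The event names are hypothetical labels for the diagram's edge annotations, and `rekey_due` assumes the 65,536-packet threshold is counted since the current key was installed.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SessionState {
    Idle,
    Connecting,
    Handshaking,
    Active,
    Rekeying,
    Closed,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Event {
    Connect,
    QuicEstablished,
    HandshakeComplete,
    RekeyDue,
    NewKeyDerived,
    Hangup,
    Error,
    Timeout,
    SignatureFail,
}

/// Returns the next state, or `None` if the diagram has no such edge.
fn next_state(state: SessionState, event: Event) -> Option<SessionState> {
    match (state, event) {
        (SessionState::Idle, Event::Connect) => Some(SessionState::Connecting),
        (SessionState::Connecting, Event::QuicEstablished) => Some(SessionState::Handshaking),
        (SessionState::Connecting, Event::Timeout) => Some(SessionState::Closed),
        (SessionState::Handshaking, Event::HandshakeComplete) => Some(SessionState::Active),
        (SessionState::Handshaking, Event::SignatureFail) => Some(SessionState::Closed),
        (SessionState::Active, Event::RekeyDue) => Some(SessionState::Rekeying),
        (SessionState::Active, Event::Hangup)
        | (SessionState::Active, Event::Error)
        | (SessionState::Active, Event::Timeout) => Some(SessionState::Closed),
        (SessionState::Rekeying, Event::NewKeyDerived) => Some(SessionState::Active),
        (SessionState::Rekeying, Event::Error) => Some(SessionState::Closed),
        _ => None,
    }
}

/// Rekey threshold from the diagram.
const REKEY_INTERVAL: u64 = 65_536;

/// Assumption: the counter restarts each time a new key is derived.
fn rekey_due(packets_since_key: u64) -> bool {
    packets_since_key >= REKEY_INTERVAL
}
```

Returning `None` for unlisted pairs makes illegal transitions, such as a Connect event while Active, explicit at the call site instead of silently ignored.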
Project Structure
warzonePhone/
├── Cargo.toml # Workspace root
├── crates/
│ ├── wzp-proto/ # Protocol types, traits, wire format
│ │ └── src/
│ │ ├── codec_id.rs # CodecId, QualityProfile
│ │ ├── error.rs # Error types
│ │ ├── jitter.rs # JitterBuffer, AdaptivePlayoutDelay
│ │ ├── packet.rs # MediaHeader, MiniHeader, TrunkFrame, SignalMessage
│ │ ├── quality.rs # Tier, AdaptiveQualityController
│ │ ├── session.rs # SessionState machine
│ │ └── traits.rs # AudioEncoder, FecEncoder, CryptoSession, etc.
│ ├── wzp-codec/ # Audio codecs
│ │ └── src/
│ │ ├── adaptive.rs # AdaptiveEncoder/Decoder (Opus + Codec2)
│ │ ├── denoise.rs # NoiseSuppressor (RNNoise / nnnoiseless)
│ │ └── silence.rs # SilenceDetector, ComfortNoise
│ ├── wzp-fec/ # Forward error correction
│ │ └── src/
│ │ ├── encoder.rs # RaptorQFecEncoder
│ │ ├── decoder.rs # RaptorQFecDecoder
│ │ └── interleave.rs # Interleaver (burst protection)
│ ├── wzp-crypto/ # Cryptography + identity
│ │ └── src/
│ │ ├── identity.rs # Seed, Fingerprint, hash_room_name
│ │ ├── handshake.rs # WarzoneKeyExchange (X25519 + Ed25519)
│ │ ├── session.rs # ChaChaSession (ChaCha20-Poly1305)
│ │ ├── nonce.rs # Deterministic nonce construction
│ │ ├── anti_replay.rs # Sliding window replay protection
│ │ └── rekey.rs # Forward secrecy rekeying
│ ├── wzp-transport/ # QUIC transport layer
│ │ └── src/lib.rs # QuinnTransport, send/recv media/signal/trunk
│ ├── wzp-relay/ # Relay daemon
│ │ └── src/
│ │ ├── main.rs # CLI, connection loop, auth + handshake
│ │ ├── config.rs # RelayConfig, TOML parsing
│ │ ├── room.rs # RoomManager, TrunkedForwarder
│ │ ├── pipeline.rs # RelayPipeline (forward mode)
│ │ ├── session_mgr.rs # SessionManager (limits, lifecycle)
│ │ ├── auth.rs # featherChat token validation
│ │ ├── handshake.rs # Relay-side accept_handshake
│ │ ├── metrics.rs # Prometheus RelayMetrics + per-session
│ │ ├── probe.rs # Inter-relay probes + ProbeMesh
│ │ ├── federation.rs # FederationManager, global rooms
│ │ ├── presence.rs # PresenceRegistry
│ │ ├── route.rs # RouteResolver
│ │ ├── trunk.rs # TrunkBatcher
│ │ └── ws.rs # WebSocket handler for browser clients
│ ├── wzp-client/ # Call engine + CLI
│ │ └── src/
│ │ ├── cli.rs # CLI arg parsing + main
│ │ ├── call.rs # CallEncoder, CallDecoder, QualityAdapter
│ │ ├── handshake.rs # Client-side perform_handshake
│ │ ├── featherchat.rs # CallSignal bridge
│ │ ├── echo_test.rs # Automated echo quality test
│ │ ├── drift_test.rs # Clock drift measurement
│ │ ├── sweep.rs # Jitter buffer parameter sweep
│ │ ├── metrics.rs # JSONL telemetry writer
│ │ └── bench.rs # Component benchmarks
│ └── wzp-web/ # Browser bridge
│ ├── src/
│ │ ├── main.rs # Axum server, WS handler, TLS
│ │ └── metrics.rs # Prometheus WebMetrics
│ └── static/
│ ├── index.html # SPA UI (room, PTT, level meter)
│ └── audio-processor.js # AudioWorklet (capture + playback)
├── android/ # Android app (Kotlin + JNI)
│ └── app/src/main/java/com/wzp/
│ ├── audio/ # AudioPipeline, AudioRouteManager
│ ├── engine/ # WzpEngine (JNI), CallStats, WzpCallback
│ ├── ui/ # CallActivity, SettingsScreen, Identicon
│ ├── data/ # SettingsRepository
│ ├── net/ # RelayPinger
│ ├── service/ # CallService (foreground)
│ └── debug/ # DebugReporter
├── desktop/ # Desktop app (Tauri)
│ └── dist/ # Built frontend (HTML/JS/CSS)
├── deps/featherchat/ # Git submodule
├── docs/ # Documentation
├── scripts/ # Build scripts
│ └── build-linux.sh # Hetzner VM build
└── tools/ # Development tools
Test Coverage
272 tests across all crates, 0 failures:
| Crate | Tests | Key Coverage |
|---|---|---|
| wzp-proto | 41 | Wire format, jitter buffer, quality tiers, mini-frames, trunking |
| wzp-codec | 31 | Opus/Codec2 roundtrip, silence detection, noise suppression |
| wzp-fec | 22 | RaptorQ encode/decode, loss recovery, interleaving |
| wzp-crypto | 34 + 28 compat | Encrypt/decrypt, handshake, anti-replay, featherChat identity |
| wzp-transport | 2 | QUIC connection setup |
| wzp-relay | 40 + 4 integration | Room ACL, session mgmt, metrics, probes, mesh, trunking |
| wzp-client | 30 + 2 integration | Encoder/decoder, quality adapter, silence, drift, sweep |
| wzp-web | 2 | Metrics |
Audio Backend Architecture (Platform Matrix)
WarzonePhone's audio I/O goes through one of four backends depending on the target platform and feature flags. All backends expose the same public API (AudioCapture::start() → AudioCapture { ring(), stop() }) via conditional re-exports in crates/wzp-client/src/lib.rs, so the CallEngine above the audio layer doesn't know or care which backend is running.
┌─────────────────────────────────────────────┐
│ CallEngine (platform-agnostic) │
│ reads PCM from AudioCapture::ring() │
│ writes PCM to AudioPlayback::ring() │
└────────────────────┬────────────────────────┘
│
┌─────────────────────┼─────────────────────┐
│ │ │
▼ ▼ ▼
┌───────────────┐ ┌────────────────┐ ┌───────────────┐
│ audio_io │ │ audio_vpio │ │ audio_wasapi │
│ (CPAL) │ │ (Core Audio │ │ (Windows │
│ │ │ VoiceProc IO) │ │ IAudioClient2│
│ All platforms │ │ macOS only │ │ Windows │
│ (baseline) │ │ feature=vpio │ │ feature= │
│ │ │ │ │ windows-aec │
└───────────────┘ └────────────────┘ └───────────────┘
│
▼ on Android only
┌───────────────┐
│ wzp-native │
│ (Oboe bridge │
│ via dlopen) │
│ │
│ Android only │
│ libloading │
└───────────────┘
Backend selection matrix
| Platform | Capture | Playback | OS AEC | Feature flags |
|---|---|---|---|---|
| macOS | VoiceProcessingIO (native Core Audio) | CPAL | Yes — Apple's hardware-accelerated AEC (same AEC as FaceTime, iMessage audio, Voice Memos) | audio, vpio |
| Windows (AEC build) | Direct WASAPI with AudioCategory_Communications | CPAL | Yes: Windows routes the capture stream through the driver's communications APO chain (AEC + NS + AGC), driver-dependent quality | audio, windows-aec |
| Windows (baseline) | CPAL (WASAPI shared mode) | CPAL | No | audio |
| Linux | CPAL (ALSA / PulseAudio) | CPAL | No | audio |
| Android (Tauri Mobile) | Oboe via wzp-native cdylib, Usage::VoiceCommunication + MODE_IN_COMMUNICATION | Same Oboe stream | Depends on device (some Android devices apply AEC to the voice-communication stream, most do not) | none (wzp-client compiled with default-features = false) |
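The conditional re-export pattern behind the matrix can be sketched as below. The module names follow the diagram above, but the module bodies, the `backend_name` method, and `describe_backend` are hypothetical placeholders; the real cfg expressions in crates/wzp-client/src/lib.rs may differ. Built without feature flags, the sketch selects the CPAL baseline on every platform.

```rust
// Baseline backend; stands in for the CPAL-based audio_io module.
#[allow(dead_code)]
mod audio_io {
    pub struct AudioCapture;
    impl AudioCapture {
        pub fn start() -> Self { AudioCapture }
        pub fn backend_name(&self) -> &'static str { "cpal" }
        pub fn stop(self) {}
    }
}

// Compiled only on macOS with the `vpio` feature enabled.
#[cfg(all(target_os = "macos", feature = "vpio"))]
mod audio_vpio {
    pub struct AudioCapture;
    impl AudioCapture {
        pub fn start() -> Self { AudioCapture }
        pub fn backend_name(&self) -> &'static str { "vpio" }
        pub fn stop(self) {}
    }
}

// Compiled only on Windows with the `windows-aec` feature enabled.
#[cfg(all(target_os = "windows", feature = "windows-aec"))]
mod audio_wasapi {
    pub struct AudioCapture;
    impl AudioCapture {
        pub fn start() -> Self { AudioCapture }
        pub fn backend_name(&self) -> &'static str { "wasapi-communications" }
        pub fn stop(self) {}
    }
}

// Exactly one re-export is active for any target and feature combination.
#[cfg(all(target_os = "macos", feature = "vpio"))]
pub use audio_vpio::AudioCapture;
#[cfg(all(target_os = "windows", feature = "windows-aec"))]
pub use audio_wasapi::AudioCapture;
#[cfg(not(any(
    all(target_os = "macos", feature = "vpio"),
    all(target_os = "windows", feature = "windows-aec")
)))]
pub use audio_io::AudioCapture;

/// Platform-agnostic caller: it names `AudioCapture` and nothing else.
fn describe_backend() -> &'static str {
    let capture = AudioCapture::start();
    let name = capture.backend_name();
    capture.stop();
    name
}
```

Because every backend exposes the same type name and method set, code above the audio layer compiles unchanged regardless of which module the re-export resolves to.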
Why wzp-native is a standalone cdylib
On Android, the audio backend lives in a separate cdylib crate (crates/wzp-native) that wzp-desktop's lib crate loads at runtime via libloading. It is not linked as a regular Rust dep.
This is deliberate. rust-lang/rust#104707 documents that a crate with crate-type = ["cdylib", "staticlib"] leaks non-exported symbols from the staticlib into the cdylib. On Android, that caused Bionic's private __init_tcb / pthread_create symbols to be bound LOCALLY inside our .so instead of resolved dynamically against libc.so at dlopen time — which crashed the app at launch as soon as tao tried to std::thread::spawn() from the JNI onCreate callback.
Keeping wzp-native in its own cdylib and loading it via libloading means:
- The app's own .so has crate-type = ["cdylib", "rlib"] only: no staticlib, no symbol leak.
- libwzp_native.so is loaded via System.loadLibrary from the JVM side (or dlopen from Rust), which triggers the normal Bionic resolver and binds all private symbols against libc.so at load time.
- The C/C++ Oboe bridge is fully isolated inside libwzp_native.so's symbol space, so there is no chance of its archives leaking into wzp-desktop's .so.
See docs/BRANCH-android-rewrite.md for the full incident postmortem and docs/incident-tauri-android-init-tcb.md for the debug log.
Vendored audiopus_sys for libopus / clang-cl cross-compile
The workspace root carries a vendored copy of audiopus_sys at vendor/audiopus_sys/ with a patched opus/CMakeLists.txt. The patch is needed because libopus 1.3.1 gates its per-file -msse4.1 / -mssse3 COMPILE_FLAGS behind if(NOT MSVC), while CMake sets MSVC=1 unconditionally under clang-cl (the compiler cargo-xwin uses for Windows cross-compiles). As a result, the SIMD source files compile without the required target features and fail to link the intrinsic always_inline functions.
The patch introduces an MSVC_CL variable that is true only for real cl.exe (distinguished via CMAKE_C_COMPILER_ID STREQUAL "MSVC"), and flips the eight if(NOT MSVC) SIMD guards to if(NOT MSVC_CL) so clang-cl gets the GCC-style per-file flags. Wired in via [patch.crates-io] audiopus_sys = { path = "vendor/audiopus_sys" } at the workspace root.
This does not affect macOS or Linux builds — on those platforms MSVC=0 everywhere so the patched logic behaves identically to upstream.
Upstream tracking: xiph/opus#256, xiph/opus PR #257 (both stale).