Audit:
- docs/AUDIT-2026-05-25.md: full protocol audit covering 8 findings (4 critical, 2 high, 5 medium, 4 low) with code references and fix effort estimates
- vault/Audit/Tasks.md: Obsidian Tasks plugin file tracking all audit items with priorities, due dates, and per-step checklists

Architecture docs updated for Wire format v2 and Wave 5/6 features:
- ARCHITECTURE.md: adds wzp-video to dependency graph and project structure; wire format updated to v2 (16B header, 5B MiniHeader); relay concurrency section corrected (DashMap+RwLock is current, not a future optimization); test count 571→702; Android note
- PROGRESS.md: Wave 5 and Wave 6 sections appended; test count 372→702; current status and open blockers as of 2026-05-25
- ROAD-TO-VIDEO.md: implementation status table inserted (✅/🟡/🔴/🔲 per phase); 6-step critical path to first video call
- WZP-SPEC.md: MediaHeader updated to v2 (16B byte-aligned); MiniHeader updated to 5B with seq_delta; codec IDs 9-12 added (H.264/H.265/AV1); version negotiation section added

Obsidian vault (vault/):
- 114 files across Architecture/, PRDs/, Reports/, Android/, Reference/, Audit/ with YAML frontmatter
- 00 - Home.md index note with wiki links
- .obsidian/app.json config

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
WarzonePhone Architecture
Custom lossy VoIP protocol built in Rust. E2E encrypted, FEC-protected, adaptive quality, designed for hostile network conditions.
System Overview
graph TB
subgraph "Client A (Desktop / Android / CLI)"
MIC[Microphone] --> DN[NoiseSuppressor<br/>RNNoise ML]
DN --> SD[SilenceDetector<br/>VAD + Hangover]
SD --> ENC[CallEncoder<br/>Opus / Codec2]
ENC --> FEC_E[FEC Encoder<br/>RaptorQ]
FEC_E --> CRYPT_E[ChaCha20-Poly1305<br/>Encrypt]
CRYPT_E --> QUIC_S[QUIC Datagram<br/>Send]
QUIC_R[QUIC Datagram<br/>Recv] --> CRYPT_D[ChaCha20-Poly1305<br/>Decrypt]
CRYPT_D --> FEC_D[FEC Decoder<br/>RaptorQ]
FEC_D --> JIT[JitterBuffer<br/>Adaptive Playout]
JIT --> DEC[CallDecoder<br/>Opus / Codec2]
DEC --> SPK[Speaker]
end
subgraph "Relay (SFU)"
ACCEPT[Accept QUIC] --> AUTH{Auth?}
AUTH -->|token| VALIDATE[POST /v1/auth/validate]
AUTH -->|no auth| HS
VALIDATE --> HS[Crypto Handshake<br/>X25519 + Ed25519]
HS --> ROOM[Room Manager<br/>Named Rooms via SNI]
ROOM --> FWD[Forward to<br/>Other Participants]
end
subgraph "Client B"
B_SPK[Speaker]
B_MIC[Microphone]
end
QUIC_S -->|UDP / QUIC| ACCEPT
FWD -->|UDP / QUIC| QUIC_R
B_MIC -.->|same pipeline| ACCEPT
FWD -.->|same pipeline| B_SPK
style MIC fill:#4a9eff,color:#fff
style SPK fill:#4a9eff,color:#fff
style B_MIC fill:#4a9eff,color:#fff
style B_SPK fill:#4a9eff,color:#fff
style ROOM fill:#ff9f43,color:#fff
style CRYPT_E fill:#ee5a24,color:#fff
style CRYPT_D fill:#ee5a24,color:#fff
Crate Dependency Graph
graph TD
PROTO["wzp-proto<br/>Types, Traits, Wire Format"]
CODEC["wzp-codec<br/>Opus + Codec2 + RNNoise"]
FEC["wzp-fec<br/>RaptorQ FEC"]
CRYPTO["wzp-crypto<br/>ChaCha20 + Identity"]
TRANSPORT["wzp-transport<br/>QUIC / Quinn"]
VIDEO["wzp-video<br/>H.264 + H.265 + AV1"]
RELAY["wzp-relay<br/>Relay Daemon"]
CLIENT["wzp-client<br/>CLI + Call Engine"]
WEB["wzp-web<br/>Browser Bridge"]
PROTO --> CODEC
PROTO --> FEC
PROTO --> CRYPTO
PROTO --> TRANSPORT
PROTO --> VIDEO
CODEC --> CLIENT
FEC --> CLIENT
CRYPTO --> CLIENT
TRANSPORT --> CLIENT
VIDEO --> CLIENT
CODEC --> RELAY
FEC --> RELAY
CRYPTO --> RELAY
TRANSPORT --> RELAY
VIDEO --> RELAY
CLIENT --> WEB
TRANSPORT --> WEB
CRYPTO --> WEB
FC["warzone-protocol<br/>featherChat Identity"] -.->|path dep| CRYPTO
style PROTO fill:#6c5ce7,color:#fff
style RELAY fill:#ff9f43,color:#fff
style CLIENT fill:#00b894,color:#fff
style WEB fill:#0984e3,color:#fff
style FC fill:#fd79a8,color:#fff
style VIDEO fill:#a29bfe,color:#fff
Star pattern: Each leaf crate (wzp-codec, wzp-fec, wzp-crypto, wzp-transport, wzp-video) depends only on wzp-proto. No leaf depends on another leaf. The integration crates wzp-relay and wzp-client depend on all leaves; wzp-web depends on wzp-client, wzp-transport, and wzp-crypto.
Audio Encode Pipeline
sequenceDiagram
participant Mic as Microphone<br/>(48kHz)
participant Ring as SPSC Ring<br/>(lock-free)
participant RNN as RNNoise<br/>(2 x 480)
participant VAD as SilenceDetector
participant Codec as Opus / Codec2
participant DT as DredTuner<br/>(wzp-proto)
participant FEC as RaptorQ FEC
participant INT as Interleaver<br/>(depth=3)
participant HDR as MediaHeader<br/>(16B or Mini 5B)
participant Enc as ChaCha20-Poly1305
participant QUIC as QUIC Datagram
participant QPS as QuinnPathSnapshot
Mic->>Ring: f32 x 512 (macOS callback)
Ring->>Ring: Accumulate to 960 samples
Ring->>RNN: PCM i16 x 960 (20ms frame)
RNN->>VAD: Denoised audio
alt Speech active (or hangover)
VAD->>Codec: Encode active frame
else Silence (>100ms)
VAD->>Codec: ComfortNoise (every 200ms)
end
Note over QPS,DT: Every 25 frames (~500ms)
QPS->>DT: loss_pct, rtt_ms, jitter_ms
DT->>Codec: set_dred_duration() + set_expected_loss()
alt Opus tier (any bitrate)
Codec->>HDR: Compressed bytes + DRED side-channel (no RaptorQ)
else Codec2 tier
Codec->>FEC: Compressed bytes (pad to 256B symbol)
FEC->>FEC: Accumulate block (5-10 symbols)
FEC->>INT: Source + repair symbols
INT->>HDR: Interleaved packets
end
HDR->>Enc: Header as AAD
Enc->>QUIC: Encrypted payload + 16B tag
Key Details
- macOS delivers 512 f32 samples per callback (not configurable to 960)
- Ring buffer accumulates to 960 samples (20ms at 48 kHz) for codec frame
- RNNoise processes 2 x 480 samples (ML-based noise suppression via nnnoiseless)
- Silence detection uses VAD + 100ms hangover before switching to ComfortNoise
- FEC symbols are padded to 256 bytes with a 2-byte LE length prefix
- MiniHeaders (5 bytes) replace full headers (16 bytes) for 49 of every 50 audio frames; video always uses full headers
- DRED tuner polls quinn path stats every 25 frames (~500ms) and adjusts DRED lookback duration continuously
- Opus tiers bypass RaptorQ entirely -- DRED handles loss recovery at the codec layer
- Opus6k DRED window: 1040ms (maximum libopus allows)
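The symbol padding rule in the list above can be sketched as follows. This is a minimal illustration, not the wzp-fec implementation: the function names are hypothetical, and it assumes the 2-byte length prefix is counted inside the 256-byte symbol.

```rust
/// Symbol size used for Codec2 FEC blocks, per the list above.
const SYMBOL_SIZE: usize = 256;

/// Pads a compressed frame to one fixed-size symbol:
/// [len: u16 LE][payload][zero padding].
/// Returns None when the payload does not fit in a single symbol.
/// Assumption: the length prefix is part of the 256 bytes.
fn pad_symbol(payload: &[u8]) -> Option<[u8; SYMBOL_SIZE]> {
    if payload.len() > SYMBOL_SIZE - 2 {
        return None;
    }
    let mut symbol = [0u8; SYMBOL_SIZE];
    symbol[..2].copy_from_slice(&(payload.len() as u16).to_le_bytes());
    symbol[2..2 + payload.len()].copy_from_slice(payload);
    Some(symbol)
}

/// Recovers the original payload from a padded symbol.
/// Returns None if the stored length points past the end of the symbol.
fn unpad_symbol(symbol: &[u8; SYMBOL_SIZE]) -> Option<&[u8]> {
    let len = u16::from_le_bytes([symbol[0], symbol[1]]) as usize;
    symbol.get(2..2 + len)
}
```

The prefix lets the decoder strip the zero padding after RaptorQ reconstruction, since a recovered symbol carries no other record of the original frame length.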
Audio Decode Pipeline
sequenceDiagram
participant QUIC as QUIC Datagram
participant Dec as ChaCha20-Poly1305
participant AR as Anti-Replay<br/>(sliding window)
participant HDR as Header Parse
participant DEINT as De-interleaver
participant FEC as RaptorQ FEC<br/>(reconstruct)
participant JIT as JitterBuffer<br/>(BTreeMap)
participant Codec as Opus / Codec2
participant Ring as SPSC Ring<br/>(lock-free)
participant SPK as Speaker
QUIC->>Dec: Encrypted packet
Dec->>AR: Decrypt (header = AAD)
AR->>AR: Check seq window (reject replay)
AR->>HDR: Verified packet
alt Opus packet
HDR->>JIT: Direct to jitter buffer (no FEC/interleave)
else Codec2 packet
HDR->>DEINT: MediaHeader + payload
DEINT->>FEC: Reordered symbols by block
FEC->>FEC: Attempt decode (need K of K+R)
FEC->>JIT: Recovered audio frames
end
JIT->>JIT: BTreeMap ordered by seq
JIT->>JIT: Wait until depth >= target
alt Packet present
JIT->>Codec: Pop lowest seq frame
else Packet missing (Opus)
JIT->>Codec: DRED reconstruction (neural)
alt DRED fails or unavailable
Codec->>Codec: Classical PLC fallback
end
else Packet missing (Codec2)
Codec->>Codec: Classical PLC
end
Codec->>Ring: PCM i16 x 960
Ring->>SPK: Audio callback pulls samples
Key Details
- Anti-replay uses a 64-packet sliding window to reject duplicates
- FEC decoder needs any K of K+R symbols to reconstruct a block
- Jitter buffer target: 10 packets (200ms) for client, 50 packets (1s) for relay
- Desktop client uses direct playout (no jitter buffer) with lock-free ring
- Codec2 frames at 8 kHz are resampled to 48 kHz transparently
- DRED reconstruction: on packet loss, decoder tries neural DRED reconstruction before falling back to classical PLC
- Jitter-spike detection pre-emptively boosts DRED to ceiling when jitter variance spikes >30%
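A 64-packet sliding window like the one described above is commonly built from a highest-seen sequence number plus a 64-bit bitmap. The sketch below shows that general technique; the type name is hypothetical, and rejecting packets older than the window is an assumption (the text above only states that duplicates are rejected).

```rust
/// Sliding-window replay filter over u32 sequence numbers (64-packet window).
#[derive(Default)]
struct ReplayWindow {
    highest: u32,  // highest sequence number accepted so far
    bitmap: u64,   // bit i set => sequence (highest - i) was already seen
    started: bool, // false until the first packet arrives
}

impl ReplayWindow {
    /// Returns true if `seq` is new and records it; returns false for
    /// duplicates and (by assumption) for packets older than the window.
    fn accept(&mut self, seq: u32) -> bool {
        if !self.started {
            self.started = true;
            self.highest = seq;
            self.bitmap = 1;
            return true;
        }
        // Wrapping distance read as signed, so u32 wraparound is handled.
        let ahead = seq.wrapping_sub(self.highest) as i32;
        if ahead > 0 {
            let shift = ahead as u32;
            self.bitmap = if shift >= 64 { 0 } else { self.bitmap << shift };
            self.bitmap |= 1;
            self.highest = seq;
            return true;
        }
        let behind = ahead.unsigned_abs();
        if behind >= 64 {
            return false; // too old to verify against the window
        }
        let bit = 1u64 << behind;
        if self.bitmap & bit != 0 {
            return false; // duplicate
        }
        self.bitmap |= bit;
        true
    }
}
```

Late packets that arrive out of order but inside the window are still accepted once, which matters for the Codec2 path where interleaving reorders symbols on purpose.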
Relay SFU Forwarding
graph TB
subgraph "Room Mode (Default SFU)"
C1[Client 1<br/>Alice] -->|"QUIC SNI=room-hash"| RM[Room Manager]
C2[Client 2<br/>Bob] -->|"QUIC SNI=room-hash"| RM
C3[Client 3<br/>Charlie] -->|"QUIC SNI=room-hash"| RM
RM --> R1["Room 'podcast'"]
R1 -->|"fan-out (skip sender)"| C1
R1 -->|"fan-out (skip sender)"| C2
R1 -->|"fan-out (skip sender)"| C3
end
subgraph "Forward Mode (--remote)"
C4[Client] -->|QUIC| RA[Relay A]
RA -->|"FEC decode<br/>jitter buffer<br/>FEC re-encode"| RB[Relay B<br/>--remote]
RB -->|QUIC| C5[Client]
end
subgraph "Probe Mode (--probe)"
PA[Relay A] -->|"Ping 1/s<br/>~50 bytes"| PB[Relay B]
PB -->|Pong| PA
PA --> PM[Prometheus<br/>RTT / Loss / Jitter]
end
style RM fill:#ff9f43,color:#fff
style R1 fill:#fdcb6e
style PM fill:#0984e3,color:#fff
SFU Fan-out Rules
- Each incoming datagram is forwarded to all other participants in the room
- The sender is excluded from fan-out (no echo)
- If one send fails, the relay continues to the next participant (best-effort)
- The relay never decodes or re-encodes audio (preserves E2E encryption)
- With trunking enabled, packets to the same receiver are batched into TrunkFrames (flushed every 5ms)
- Relay tracks per-participant quality from QualityReport trailers and broadcasts QualityDirective when the room-wide tier degrades (coordinated codec switching)
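The first three fan-out rules can be sketched as below. `Participant` and its `send` method are hypothetical stand-ins for a QUIC datagram send, not relay types; the point is the sender exclusion and the best-effort loop over an opaque, still-encrypted datagram.

```rust
/// Hypothetical participant handle; `send` stands in for a QUIC datagram send.
struct Participant {
    id: u32,
    healthy: bool,
    delivered: Vec<Vec<u8>>,
}

impl Participant {
    fn send(&mut self, datagram: &[u8]) -> Result<(), &'static str> {
        if !self.healthy {
            return Err("send failed");
        }
        self.delivered.push(datagram.to_vec());
        Ok(())
    }
}

/// Forwards one opaque datagram to everyone except the sender.
/// A failed send is counted and skipped; it does not abort the loop.
/// Returns (sent, failed).
fn fan_out(room: &mut [Participant], sender_id: u32, datagram: &[u8]) -> (usize, usize) {
    let (mut sent, mut failed) = (0, 0);
    for p in room.iter_mut().filter(|p| p.id != sender_id) {
        match p.send(datagram) {
            Ok(()) => sent += 1,
            Err(_) => failed += 1,
        }
    }
    (sent, failed)
}
```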
Federation Topology
graph TB
subgraph "Relay A (EU)"
A_R["Room Manager"]
A_F["Federation<br/>Manager"]
A1["Alice (local)"]
A2["Bob (local)"]
end
subgraph "Relay B (US)"
B_R["Room Manager"]
B_F["Federation<br/>Manager"]
B1["Charlie (local)"]
end
subgraph "Relay C (APAC)"
C_R["Room Manager"]
C_F["Federation<br/>Manager"]
C1["Dave (local)"]
end
A1 -->|media| A_R
A2 -->|media| A_R
B1 -->|media| B_R
C1 -->|media| C_R
A_F <-->|"SNI='_federation'<br/>GlobalRoomActive<br/>media forward"| B_F
A_F <-->|"SNI='_federation'<br/>GlobalRoomActive<br/>media forward"| C_F
B_F <-->|"SNI='_federation'<br/>GlobalRoomActive<br/>media forward"| C_F
A_R --> A_F
B_R --> B_F
C_R --> C_F
style A_F fill:#6c5ce7,color:#fff
style B_F fill:#6c5ce7,color:#fff
style C_F fill:#6c5ce7,color:#fff
style A_R fill:#ff9f43,color:#fff
style B_R fill:#ff9f43,color:#fff
style C_R fill:#ff9f43,color:#fff
Federation Protocol Flow
sequenceDiagram
participant RA as Relay A
participant RB as Relay B
Note over RA: Startup: connect to configured peers
RA->>RB: QUIC connect (SNI="_federation")
RA->>RB: FederationHello { tls_fingerprint }
RB->>RB: Verify fingerprint against [[trusted]]
Note over RA,RB: Federation link established
Note over RA: Alice joins global room "podcast"
RA->>RB: GlobalRoomActive { room: "podcast" }
Note over RB: Charlie joins global room "podcast"
RB->>RA: GlobalRoomActive { room: "podcast" }
Note over RA,RB: Media bridging active
loop Every media packet in global room
RA->>RB: [room_hash:8][encrypted_media]
RB->>RA: [room_hash:8][encrypted_media]
end
Note over RA: Last local participant leaves
RA->>RB: GlobalRoomInactive { room: "podcast" }
Wire Formats
MediaHeader v2 (16 bytes, byte-aligned)
Byte 0: version (u8) 0x02
Byte 1: flags (u8) [T:1][Q:1][KeyFrame:1][FrameEnd:1][reserved:4]
T = FEC repair, Q = QualityReport trailer
KeyFrame = packet belongs to an I-frame (video)
FrameEnd = last packet of an access unit (video)
Byte 2: media_type (u8) 0=audio, 1=video, 2=data, 3=control
Byte 3: codec_id (u8) widened from 4-bit (room for 256 codec IDs)
Byte 4: stream_id (u8) simulcast layer; 0=base
Byte 5: fec_ratio (u8) 0..200 → 0.0..2.0
Bytes 6-9: sequence (u32 BE) wrapping packet sequence number
Bytes 10-13: timestamp_ms (u32 BE) milliseconds since session start
Bytes 14-15: fec_block_id (u16 BE)
audio: low 8 bits = block_id, high 8 bits = symbol_idx
video: full u16 block_id (large blocks for I-frames)
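The byte layout above maps directly onto a fixed 16-byte array. The following is a sketch of serialization and parsing under that layout; the struct and method names are illustrative and may differ from those in wzp-proto's packet.rs. The flags byte is kept opaque because the bit positions within it are not spelled out here.

```rust
/// MediaHeader v2 fields, following the byte layout above.
#[derive(Debug, Clone, PartialEq, Eq)]
struct MediaHeader {
    flags: u8,
    media_type: u8,
    codec_id: u8,
    stream_id: u8,
    fec_ratio: u8,
    sequence: u32,
    timestamp_ms: u32,
    fec_block_id: u16,
}

const VERSION: u8 = 0x02;
const HEADER_LEN: usize = 16;

impl MediaHeader {
    /// Serializes to the 16-byte wire form (multi-byte fields big-endian).
    fn to_bytes(&self) -> [u8; HEADER_LEN] {
        let mut b = [0u8; HEADER_LEN];
        b[0] = VERSION;
        b[1] = self.flags;
        b[2] = self.media_type;
        b[3] = self.codec_id;
        b[4] = self.stream_id;
        b[5] = self.fec_ratio;
        b[6..10].copy_from_slice(&self.sequence.to_be_bytes());
        b[10..14].copy_from_slice(&self.timestamp_ms.to_be_bytes());
        b[14..16].copy_from_slice(&self.fec_block_id.to_be_bytes());
        b
    }

    /// Parses a header; returns None for short input or an unexpected version.
    fn from_bytes(b: &[u8]) -> Option<Self> {
        if b.len() < HEADER_LEN || b[0] != VERSION {
            return None;
        }
        Some(Self {
            flags: b[1],
            media_type: b[2],
            codec_id: b[3],
            stream_id: b[4],
            fec_ratio: b[5],
            sequence: u32::from_be_bytes([b[6], b[7], b[8], b[9]]),
            timestamp_ms: u32::from_be_bytes([b[10], b[11], b[12], b[13]]),
            fec_block_id: u16::from_be_bytes([b[14], b[15]]),
        })
    }
}
```

Because every field sits on a byte boundary, no bit shifting is needed outside the flags byte.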
CodecID Values
Audio codecs (media_type = 0)
| Value | Codec | Bitrate | Sample Rate | Frame Duration |
|---|---|---|---|---|
| 0 | Opus 24k | 24 kbps | 48 kHz | 20ms |
| 1 | Opus 16k | 16 kbps | 48 kHz | 20ms |
| 2 | Opus 6k | 6 kbps | 48 kHz | 40ms |
| 3 | Codec2 3200 | 3.2 kbps | 8 kHz | 20ms |
| 4 | Codec2 1200 | 1.2 kbps | 8 kHz | 40ms |
| 5 | ComfortNoise | 0 | 48 kHz | 20ms |
| 6 | Opus 32k | 32 kbps | 48 kHz | 20ms |
| 7 | Opus 48k | 48 kbps | 48 kHz | 20ms |
| 8 | Opus 64k | 64 kbps | 48 kHz | 20ms |
Video codecs (media_type = 1)
| Value | Codec | Notes |
|---|---|---|
| 9 | H.264 Baseline | Universal HW encode coverage |
| 10 | H.264 Main | Slight quality win over baseline |
| 11 | H.265 Main | Apple A10+, Snapdragon ~2017, NVENC GTX 9xx+; ~30% better than H.264 |
| 12 | AV1 Main | Apple M3/A17+, Snapdragon 8 Gen 3+, RTX 40+; best efficiency, narrow HW |
MiniHeader v2 (5 bytes)
[FRAME_TYPE_MINI = 0x01]
Byte 0: seq_delta (u8) delta from last full header's seq
Bytes 1-2: timestamp_delta_ms (u16 BE)
Bytes 3-4: payload_len (u16 BE)
Used for audio only (49 of every 50 frames). Saves 11 bytes per audio packet vs the full 16B header. Full header is sent every 50th frame to resynchronize state. Video always uses full 16B headers.
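A sketch of the MiniHeader cadence and delta resolution follows. Names are illustrative. Two points are assumptions rather than facts from the layout above: that the full header goes on frame indices 0, 50, 100, and so on, and that timestamp_delta_ms is relative to the last full header in the same way seq_delta is. The FRAME_TYPE_MINI tag is left out of the 5 bytes.

```rust
/// One full header per 50 audio frames; the other 49 use a MiniHeader.
const FULL_HEADER_INTERVAL: u32 = 50;

/// Assumption: frame indices 0, 50, 100, ... carry the full header.
fn needs_full_header(frame_index: u32) -> bool {
    frame_index % FULL_HEADER_INTERVAL == 0
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct MiniHeader {
    seq_delta: u8,
    timestamp_delta_ms: u16,
    payload_len: u16,
}

impl MiniHeader {
    /// Serializes to the 5-byte wire form (u16 fields big-endian).
    fn to_bytes(self) -> [u8; 5] {
        let ts = self.timestamp_delta_ms.to_be_bytes();
        let len = self.payload_len.to_be_bytes();
        [self.seq_delta, ts[0], ts[1], len[0], len[1]]
    }

    fn from_bytes(b: [u8; 5]) -> Self {
        Self {
            seq_delta: b[0],
            timestamp_delta_ms: u16::from_be_bytes([b[1], b[2]]),
            payload_len: u16::from_be_bytes([b[3], b[4]]),
        }
    }

    /// Rebuilds absolute (sequence, timestamp_ms) from the last full header.
    fn resolve(self, base_seq: u32, base_timestamp_ms: u32) -> (u32, u32) {
        (
            base_seq.wrapping_add(u32::from(self.seq_delta)),
            base_timestamp_ms.wrapping_add(u32::from(self.timestamp_delta_ms)),
        )
    }
}
```

With a full header every 50 frames, seq_delta never exceeds 49, so a single byte is enough.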
TrunkFrame (batched datagrams)
[count: u16]
[session_id: 2][len: u16][payload: len] x count
Packs multiple session packets into one QUIC datagram. A frame holds at most 10 entries and never exceeds the PMTUD-discovered MTU (starts at 1200 bytes, grows to ~1452 on Ethernet); pending entries are flushed every 5ms.
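The framing can be sketched as a pack/unpack pair. The function names are hypothetical, and the byte order of the count and len fields is not stated above, so big-endian is assumed here to match the media headers. The 5ms flush timer is out of scope for this sketch.

```rust
const MAX_ENTRIES: usize = 10;

/// Packs (session_id, payload) pairs into one TrunkFrame:
/// [count: u16][session_id: 2][len: u16][payload] x count.
/// Stops at MAX_ENTRIES or before exceeding `max_bytes` (the current MTU).
/// Returns the frame and the number of entries consumed.
/// Assumption: count and len are big-endian.
fn pack_trunk(entries: &[([u8; 2], Vec<u8>)], max_bytes: usize) -> (Vec<u8>, usize) {
    let mut frame = vec![0u8, 0u8]; // placeholder for count
    let mut count = 0usize;
    for (session_id, payload) in entries.iter().take(MAX_ENTRIES) {
        let needed = 2 + 2 + payload.len();
        if payload.len() > u16::MAX as usize || frame.len() + needed > max_bytes {
            break;
        }
        frame.extend_from_slice(session_id);
        frame.extend_from_slice(&(payload.len() as u16).to_be_bytes());
        frame.extend_from_slice(payload);
        count += 1;
    }
    frame[..2].copy_from_slice(&(count as u16).to_be_bytes());
    (frame, count)
}

/// Splits a TrunkFrame back into its entries; None if the frame is truncated.
fn unpack_trunk(frame: &[u8]) -> Option<Vec<([u8; 2], Vec<u8>)>> {
    let count = u16::from_be_bytes([*frame.first()?, *frame.get(1)?]) as usize;
    let mut pos = 2;
    let mut out = Vec::with_capacity(count);
    for _ in 0..count {
        let head = frame.get(pos..pos + 4)?;
        let len = u16::from_be_bytes([head[2], head[3]]) as usize;
        let payload = frame.get(pos + 4..pos + 4 + len)?;
        out.push(([head[0], head[1]], payload.to_vec()));
        pos += 4 + len;
    }
    Some(out)
}
```

Entries that do not fit stay queued for the next frame; the caller uses the returned count to know how many to drop from its pending list.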
QualityReport (4 bytes, optional trailer)
Byte 0: loss_pct (0-255 maps to 0-100%)
Byte 1: rtt_4ms (0-255 maps to 0-1020ms, resolution 4ms)
Byte 2: jitter_ms (0-255ms)
Byte 3: bitrate_cap_kbps (0-255 kbps)
Appended to a media packet when the Q flag is set in the MediaHeader.
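The field scaling in the trailer can be illustrated as below. This sketch assumes the loss byte scales linearly (255 = 100%) and that values are rounded or saturated at the field limits; the constructor and accessor names are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct QualityReport {
    loss_pct: u8,
    rtt_4ms: u8,
    jitter_ms: u8,
    bitrate_cap_kbps: u8,
}

impl QualityReport {
    /// Builds a report from natural units, saturating at each field's range.
    /// Assumption: loss maps linearly, 0..=100% onto 0..=255.
    fn from_measurements(loss_percent: f32, rtt_ms: u32, jitter_ms: u32, cap_kbps: u32) -> Self {
        Self {
            loss_pct: (loss_percent.clamp(0.0, 100.0) * 255.0 / 100.0).round() as u8,
            rtt_4ms: (rtt_ms / 4).min(255) as u8,
            jitter_ms: jitter_ms.min(255) as u8,
            bitrate_cap_kbps: cap_kbps.min(255) as u8,
        }
    }

    /// Loss as a percentage in 0.0..=100.0.
    fn loss_percent(&self) -> f32 {
        f32::from(self.loss_pct) * 100.0 / 255.0
    }

    /// RTT in milliseconds at 4ms resolution (0..=1020).
    fn rtt_ms(&self) -> u32 {
        u32::from(self.rtt_4ms) * 4
    }

    fn to_bytes(&self) -> [u8; 4] {
        [self.loss_pct, self.rtt_4ms, self.jitter_ms, self.bitrate_cap_kbps]
    }

    fn from_bytes(b: [u8; 4]) -> Self {
        Self { loss_pct: b[0], rtt_4ms: b[1], jitter_ms: b[2], bitrate_cap_kbps: b[3] }
    }
}
```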
Path MTU Discovery
Quinn's PLPMTUD is enabled with:
- initial_mtu: 1200 bytes (QUIC minimum, always safe)
- upper_bound: 1452 bytes (Ethernet minus IP/UDP/QUIC headers)
- interval: 300s (re-probe every 5 minutes)
- black_hole_cooldown: 30s (faster retry on lossy links)
The discovered MTU is exposed via QuinnPathSnapshot::current_mtu and used by:
- TrunkedForwarder: refreshes max_bytes on every send to fill larger datagrams
- Future video framer: larger MTU = fewer application-layer fragments per frame
Continuous DRED Tuning
Instead of locking DRED duration to 3 discrete quality tiers, the DredTuner (in wzp-proto::dred_tuner) maps live path quality to a continuous DRED duration:
| Input | Source | Update Rate |
|---|---|---|
| Loss % | QuinnPathSnapshot::loss_pct (from quinn ACK frames) | Every 25 packets (~500ms) |
| RTT ms | QuinnPathSnapshot::rtt_ms (quinn congestion controller) | Every 25 packets |
| Jitter ms | PathMonitor::jitter_ms (EWMA of RTT variance) | Every 25 packets |
Mapping Logic
- Baseline: codec-tier default (Studio=100ms, Good=200ms, Degraded=500ms)
- Ceiling: codec-tier max (Studio=300ms, Good=500ms, Degraded=1040ms)
- Continuous: linear interpolation between baseline and ceiling based on loss (0%->baseline, 40%->ceiling)
- RTT phantom loss: high RTT (>200ms) adds phantom loss contribution to keep DRED generous
- Jitter spike: >30% EWMA spike pre-emptively boosts to ceiling for ~5s cooldown
Output
DredTuning { dred_frames: u8, expected_loss_pct: u8 } -> fed to CallEncoder::apply_dred_tuning() -> OpusEncoder::set_dred_duration() + set_expected_loss()
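The baseline-to-ceiling interpolation in the mapping logic can be written out as below. This is a sketch of the stated rule only, with hypothetical names; the RTT phantom-loss term and the ~5s cooldown timer are omitted because their exact form is not given above.

```rust
#[derive(Clone, Copy)]
struct DredBounds {
    baseline_ms: u32,
    ceiling_ms: u32,
}

// Per-tier bounds from the mapping logic above.
const STUDIO: DredBounds = DredBounds { baseline_ms: 100, ceiling_ms: 300 };
const GOOD: DredBounds = DredBounds { baseline_ms: 200, ceiling_ms: 500 };
const DEGRADED: DredBounds = DredBounds { baseline_ms: 500, ceiling_ms: 1040 };

/// Loss percentage at which the ceiling is reached.
const LOSS_AT_CEILING_PCT: f32 = 40.0;

/// Linear interpolation: 0% loss -> baseline, 40% loss or more -> ceiling.
/// A detected jitter spike jumps straight to the ceiling.
fn dred_duration_ms(bounds: DredBounds, loss_pct: f32, jitter_spike: bool) -> u32 {
    if jitter_spike {
        return bounds.ceiling_ms;
    }
    let t = (loss_pct / LOSS_AT_CEILING_PCT).clamp(0.0, 1.0);
    let span = (bounds.ceiling_ms - bounds.baseline_ms) as f32;
    bounds.baseline_ms + (span * t).round() as u32
}
```

For example, the Good tier at 20% loss sits halfway between 200ms and 500ms, giving 350ms of DRED lookback.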
Signal Message Handshake Flow
sequenceDiagram
participant C as Client
participant R as Relay
C->>R: QUIC Connect (SNI = hashed room name)
alt Auth enabled (--auth-url)
C->>R: SignalMessage::AuthToken { token }
R->>R: POST auth_url to validate
R-->>C: (connection closed if invalid)
end
C->>R: CallOffer { identity_pub, ephemeral_pub, signature, supported_profiles }
R->>R: Verify Ed25519 signature
R->>R: Generate ephemeral X25519
R->>R: shared_secret = DH(eph_relay, eph_client)
R->>R: session_key = HKDF(shared_secret, "warzone-session-key")
R->>C: CallAnswer { identity_pub, ephemeral_pub, signature, chosen_profile }
C->>C: Verify signature
C->>C: Derive same session_key
Note over C,R: Session established -- both have ChaCha20-Poly1305 key
C->>R: RoomUpdate (join notification broadcast)
loop Media exchange
C->>R: QUIC Datagram (encrypted media)
R->>C: QUIC Datagram (forwarded from others)
end
opt Every 65,536 packets
C->>R: Rekey { new_ephemeral_pub, signature }
R->>C: Rekey { new_ephemeral_pub, signature }
Note over C,R: New session key via fresh DH
end
C->>R: Hangup { reason: Normal }
R->>R: Remove from room, broadcast RoomUpdate
Relay Concurrency Model
Threading
- Multi-threaded Tokio runtime (all available cores, work-stealing scheduler)
- Task-per-connection: each QUIC connection gets a dedicated tokio::spawn
- Task-per-participant-per-room: each participant's media forwarding loop is independent
Shared State & Locking
The RoomManager stores DashMap<String, Arc<RwLock<Room>>>. The DashMap guard is held only long enough to clone the Arc; all per-room operations then acquire the room-level RwLock. Concurrent fan-out calls share a read lock; join and leave take the write lock.
| Lock | Protected Data | Hold Duration | Contention |
|---|---|---|---|
| DashMap<room_id, Arc<RwLock<Room>>> | Room registry | Instant (clone Arc only) | Near-zero |
| Room (RwLock) | Participants, quality tiers | ~1ms/packet (read); ~1ms (write on join/leave) | Low (concurrent reads) |
| PresenceRegistry (Mutex) | Fingerprint registrations | ~1ms | Low (join/leave only) |
| SessionManager (Mutex) | Active session tracking | ~1ms | Low |
| FederationManager.peer_links (Mutex) | Peer connections | ~10ms during forward | Per-federation-packet |
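The two-level locking pattern can be sketched with the standard library alone. This is a stand-in, not the relay code: a `RwLock<HashMap>` replaces DashMap so the example has no external dependencies, and `RoomRegistry`, `join`, and `fan_out_targets` are hypothetical names.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

#[derive(Default)]
struct Room {
    participants: Vec<u32>,
}

/// Std-only stand-in for the DashMap-based room registry.
#[derive(Default)]
struct RoomRegistry {
    rooms: RwLock<HashMap<String, Arc<RwLock<Room>>>>,
}

impl RoomRegistry {
    /// Holds the registry lock only long enough to clone the Arc.
    fn room(&self, name: &str) -> Arc<RwLock<Room>> {
        // The read guard is a temporary and is released at the end of this line.
        let existing = self.rooms.read().unwrap().get(name).cloned();
        if let Some(room) = existing {
            return room;
        }
        // Slow path: create the room (or reuse one inserted by a racing task).
        let mut rooms = self.rooms.write().unwrap();
        rooms.entry(name.to_string()).or_default().clone()
    }

    /// Join takes the room-level write lock.
    fn join(&self, name: &str, id: u32) {
        self.room(name).write().unwrap().participants.push(id);
    }

    /// Fan-out takes only a room-level read lock, so calls run concurrently.
    fn fan_out_targets(&self, name: &str, sender: u32) -> Vec<u32> {
        let room = self.room(name);
        let guard = room.read().unwrap();
        let targets: Vec<u32> = guard
            .participants
            .iter()
            .copied()
            .filter(|&p| p != sender)
            .collect();
        targets
    }
}
```

Because the registry lock is dropped before the room lock is taken, a slow fan-out in one room never blocks lookups for other rooms.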
Scaling Characteristics
- Many small rooms: Scales well across all cores (rooms are independent)
- Large single room (100+ participants): Fan-out reads share RwLock (non-blocking); only join/leave serializes
- Federation: Per-peer tasks scale; peer_links lock held during send loop
Client Architecture
Desktop Engine (Tauri)
graph TB
subgraph "Tauri Frontend (HTML/JS)"
UI[Connect / Call UI]
SET[Settings Panel]
end
subgraph "Tauri Rust Backend"
CMD[Tauri Commands<br/>connect/disconnect/toggle]
ENG[WzpEngine<br/>State Machine]
end
subgraph "Audio I/O"
CPAL_C[CPAL Capture<br/>or VoiceProcessingIO]
RING_C[SPSC Ring<br/>Capture]
RING_P[SPSC Ring<br/>Playout]
CPAL_P[CPAL Playback<br/>or VoiceProcessingIO]
end
subgraph "Network Tasks (tokio)"
SEND[Send Loop<br/>encode + encrypt]
RECV[Recv Loop<br/>decrypt + decode]
SIG[Signal Handler<br/>room updates]
end
UI --> CMD
SET --> CMD
CMD --> ENG
ENG --> SEND
ENG --> RECV
ENG --> SIG
CPAL_C --> RING_C --> SEND
RECV --> RING_P --> CPAL_P
style ENG fill:#00b894,color:#fff
style SEND fill:#0984e3,color:#fff
style RECV fill:#0984e3,color:#fff
Key design decisions:
- Lock-free SPSC rings between audio callbacks and network tasks (no mutex on audio thread)
- VoiceProcessingIO on macOS for OS-level AEC (CPAL uses HalOutput which has no AEC)
- Direct playout -- no jitter buffer on client; audio callback pulls from ring
- Release builds required -- debug builds too slow for real-time audio
Android Engine (Kotlin + JNI)
Note (2026-05-12): The Kotlin+JNI Android app (android/app/) described below is superseded by the Tauri 2.x mobile build (desktop/src-tauri/ + crates/wzp-native/). The Tauri approach uses the same Rust call engine as desktop, with Oboe audio via the wzp-native cdylib. The Kotlin codebase is maintained for reference but the Tauri build is the live production app.
graph TB
subgraph "Compose UI"
CALL[CallActivity]
SET[SettingsScreen]
VM[CallViewModel]
end
subgraph "Service Layer"
SVC[CallService<br/>Foreground Service]
PIPE[AudioPipeline<br/>AudioTrack + AudioRecord]
end
subgraph "Rust Engine (JNI)"
JNI[WzpEngine.kt<br/>JNI bridge]
NATIVE[libwzp_android.so<br/>Rust call engine]
end
subgraph "Android Audio"
REC[AudioRecord<br/>+ AEC effect]
TRK[AudioTrack<br/>low-latency]
end
CALL --> VM
SET --> VM
VM --> SVC
SVC --> PIPE
PIPE --> JNI
JNI --> NATIVE
REC --> PIPE
PIPE --> TRK
style NATIVE fill:#00b894,color:#fff
style SVC fill:#ff9f43,color:#fff
style PIPE fill:#0984e3,color:#fff
Key design decisions:
- Foreground service keeps audio alive when the screen is off
- AudioRecord + AudioTrack with Android's built-in AEC (AudioEffect)
- Lock-free AudioRing with preallocated Vec (not push/pop) to avoid allocation on audio thread
- JNI bridge marshals PCM frames between Kotlin and Rust
CLI Architecture
graph TB
subgraph "CLI Modes"
LIVE[--live<br/>Mic + Speaker]
TONE[--send-tone<br/>Sine Generator]
FILE[--send-file<br/>PCM Reader]
ECHO[--echo-test<br/>Quality Analysis]
DRIFT[--drift-test<br/>Clock Analysis]
SWEEP[--sweep<br/>Buffer Sweep]
end
subgraph "Call Engine"
ENCODE[CallEncoder<br/>codec + FEC]
DECODE[CallDecoder<br/>FEC + codec]
QA[QualityAdapter<br/>adaptive switching]
end
subgraph "Transport"
QUIC[QuinnTransport<br/>send/recv media + signal]
HS[Handshake<br/>X25519 + Ed25519]
end
LIVE --> ENCODE
TONE --> ENCODE
FILE --> ENCODE
ENCODE --> QUIC
QUIC --> DECODE
ECHO --> ENCODE
ECHO --> DECODE
DRIFT --> ENCODE
HS --> QUIC
style ENCODE fill:#00b894,color:#fff
style DECODE fill:#00b894,color:#fff
style QUIC fill:#0984e3,color:#fff
Adaptive Quality System
graph LR
subgraph GOOD ["GOOD (28.8 kbps)"]
G_C[Opus 24kbps]
G_F[FEC 20%]
G_FR[20ms frames]
end
subgraph DEGRADED ["DEGRADED (9.0 kbps)"]
D_C[Opus 6kbps]
D_F[FEC 50%]
D_FR[40ms frames]
end
subgraph CATASTROPHIC ["CATASTROPHIC (2.4 kbps)"]
C_C[Codec2 1200bps]
C_F[FEC 100%]
C_FR[40ms frames]
end
GOOD -->|"loss>10% or RTT>400ms<br/>3 consecutive reports"| DEGRADED
DEGRADED -->|"loss>40% or RTT>600ms<br/>3 consecutive"| CATASTROPHIC
CATASTROPHIC -->|"loss<10% and RTT<400ms<br/>10 consecutive"| DEGRADED
DEGRADED -->|"loss<10% and RTT<400ms<br/>10 consecutive"| GOOD
style GOOD fill:#00b894,color:#fff
style DEGRADED fill:#fdcb6e
style CATASTROPHIC fill:#e17055,color:#fff
Hysteresis prevents tier flapping: fast downgrade (3 reports, or 2 on cellular) and slow upgrade (10 reports, one tier at a time).
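The hysteresis rule can be sketched as a small controller driven by quality reports. Thresholds are taken from the diagram above; the type and method names are hypothetical, and resetting both streaks on a report that is neither clearly bad nor clearly good is this sketch's reading of "consecutive".

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tier {
    Good,
    Degraded,
    Catastrophic,
}

/// Reports needed before stepping up one tier.
const UPGRADE_AFTER: u32 = 10;

struct TierController {
    tier: Tier,
    bad_streak: u32,
    good_streak: u32,
    downgrade_after: u32,
}

impl TierController {
    /// Downgrade needs 3 consecutive bad reports, or 2 on cellular.
    fn new(cellular: bool) -> Self {
        Self {
            tier: Tier::Good,
            bad_streak: 0,
            good_streak: 0,
            downgrade_after: if cellular { 2 } else { 3 },
        }
    }

    /// Feeds one quality report and returns the tier now in effect.
    fn on_report(&mut self, loss_pct: f32, rtt_ms: u32) -> Tier {
        let worse = match self.tier {
            Tier::Good => loss_pct > 10.0 || rtt_ms > 400,
            Tier::Degraded => loss_pct > 40.0 || rtt_ms > 600,
            Tier::Catastrophic => false,
        };
        let better = self.tier != Tier::Good && loss_pct < 10.0 && rtt_ms < 400;
        if worse {
            self.bad_streak += 1;
            self.good_streak = 0;
        } else if better {
            self.good_streak += 1;
            self.bad_streak = 0;
        } else {
            self.bad_streak = 0;
            self.good_streak = 0;
        }
        if self.bad_streak >= self.downgrade_after {
            self.tier = match self.tier {
                Tier::Good => Tier::Degraded,
                _ => Tier::Catastrophic,
            };
            self.bad_streak = 0;
        } else if self.good_streak >= UPGRADE_AFTER {
            // Upgrades move one tier at a time.
            self.tier = match self.tier {
                Tier::Catastrophic => Tier::Degraded,
                _ => Tier::Good,
            };
            self.good_streak = 0;
        }
        self.tier
    }
}
```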
Cryptographic Handshake
sequenceDiagram
participant C as Caller
participant R as Relay / Callee
Note over C: Derive identity from seed<br/>Ed25519 + X25519 via HKDF
C->>C: Generate ephemeral X25519
C->>C: Sign(ephemeral_pub || "call-offer")
C->>R: CallOffer { identity_pub, ephemeral_pub, signature, profiles }
R->>R: Verify Ed25519 signature
R->>R: Generate ephemeral X25519
R->>R: shared_secret = DH(eph_b, eph_a)
R->>R: session_key = HKDF(shared_secret, "warzone-session-key")
R->>R: Sign(ephemeral_pub || "call-answer")
R->>C: CallAnswer { identity_pub, ephemeral_pub, signature, profile }
C->>C: Verify signature
C->>C: shared_secret = DH(eph_a, eph_b)
C->>C: session_key = HKDF(shared_secret)
Note over C,R: Both have identical ChaCha20-Poly1305 session key
C->>R: Encrypted media (QUIC datagrams)
R->>C: Encrypted media (QUIC datagrams)
Note over C,R: Rekey every 65,536 packets<br/>New ephemeral DH + HKDF mix
Identity Model
graph TD
SEED["32-byte Seed<br/>(BIP39 Mnemonic: 24 words)"] --> HKDF1["HKDF<br/>salt=None<br/>info='warzone-ed25519'"]
SEED --> HKDF2["HKDF<br/>salt=None<br/>info='warzone-x25519'"]
HKDF1 --> ED["Ed25519 SigningKey<br/>Digital Signatures"]
HKDF2 --> X25519["X25519 StaticSecret<br/>Key Agreement"]
ED --> VKEY["Ed25519 VerifyingKey<br/>(Public)"]
X25519 --> XPUB["X25519 PublicKey<br/>(Public)"]
VKEY --> FP["Fingerprint<br/>SHA-256(pubkey) truncated 16 bytes<br/>xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx"]
style SEED fill:#6c5ce7,color:#fff
style FP fill:#fd79a8,color:#fff
style ED fill:#ee5a24,color:#fff
style X25519 fill:#00b894,color:#fff
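The fingerprint display format in the diagram (16 bytes shown as eight groups of four hex digits) can be sketched as below. The function name is hypothetical and lowercase hex is an assumption. The SHA-256 step is left to the caller because it needs a hashing crate; the input here is the already-computed 32-byte digest of the Ed25519 public key.

```rust
/// Formats the first 16 bytes of a SHA-256 digest as
/// xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx.
/// Assumption: lowercase hex digits.
fn format_fingerprint(digest: &[u8; 32]) -> String {
    digest[..16]
        .chunks(2)
        .map(|pair| format!("{:02x}{:02x}", pair[0], pair[1]))
        .collect::<Vec<_>>()
        .join(":")
}
```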
Adaptive Jitter Buffer
graph TD
PKT[Incoming Packet] --> SEQ{Sequence Check}
SEQ -->|Duplicate| DROP[Drop + AntiReplay]
SEQ -->|Valid| BUF["BTreeMap Buffer<br/>(ordered by seq)"]
BUF --> ADAPT["AdaptivePlayoutDelay<br/>(EMA jitter tracking)"]
ADAPT --> TARGET["target_delay =<br/>ceil(jitter_ema / 20ms) + 2"]
BUF --> READY{"depth >= target?"}
READY -->|No| WAIT["Wait (Underrun++)"]
READY -->|Yes| POP[Pop lowest seq]
POP --> DECODE[Decode to PCM]
DECODE --> PLAY[Playout]
BUF --> OVERFLOW{"depth > max?"}
OVERFLOW -->|Yes| EVICT["Drop oldest (Overrun++)"]
style ADAPT fill:#fdcb6e
style DROP fill:#e17055,color:#fff
style EVICT fill:#e17055,color:#fff
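The flow in the diagram can be sketched with a BTreeMap keyed by sequence number. This is an illustration, not the JitterBuffer in wzp-proto's jitter.rs: names are hypothetical, sequence wraparound is ignored, and the jitter EMA is passed in rather than tracked.

```rust
use std::collections::BTreeMap;

/// Frame duration used to convert jitter into a packet count.
const FRAME_MS: f32 = 20.0;

/// target_delay = ceil(jitter_ema / 20ms) + 2, in packets.
fn target_depth(jitter_ema_ms: f32) -> usize {
    (jitter_ema_ms.max(0.0) / FRAME_MS).ceil() as usize + 2
}

struct JitterBuffer {
    frames: BTreeMap<u32, Vec<u8>>, // ordered by sequence number
    max_depth: usize,
    underruns: u64,
    overruns: u64,
}

impl JitterBuffer {
    fn new(max_depth: usize) -> Self {
        Self { frames: BTreeMap::new(), max_depth, underruns: 0, overruns: 0 }
    }

    /// Inserts a frame; returns false for a duplicate sequence number.
    /// On overflow the oldest frame is dropped and counted as an overrun.
    fn push(&mut self, seq: u32, frame: Vec<u8>) -> bool {
        if self.frames.contains_key(&seq) {
            return false;
        }
        self.frames.insert(seq, frame);
        if self.frames.len() > self.max_depth {
            self.frames.pop_first();
            self.overruns += 1;
        }
        true
    }

    /// Pops the lowest-sequence frame once depth reaches the target;
    /// otherwise counts an underrun and returns None.
    fn pop(&mut self, jitter_ema_ms: f32) -> Option<(u32, Vec<u8>)> {
        if self.frames.len() < target_depth(jitter_ema_ms) {
            self.underruns += 1;
            return None;
        }
        self.frames.pop_first()
    }
}
```

The constant +2 keeps a small cushion even on a perfectly steady link, so a single late packet does not cause an underrun.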
FEC Protection (RaptorQ)
graph LR
subgraph "Encoder"
F1[Frame 1] --> BLK["Source Block<br/>(5-10 frames)"]
F2[Frame 2] --> BLK
F3[Frame 3] --> BLK
F4[Frame 4] --> BLK
F5[Frame 5] --> BLK
BLK --> SRC[5 Source Symbols]
BLK --> REP["1-10 Repair Symbols<br/>(ratio dependent)"]
SRC --> INT["Interleaver<br/>(depth=3)"]
REP --> INT
end
subgraph "Network"
INT --> LOSS{Packet Loss}
LOSS -->|some lost| RCV[Received Symbols]
end
subgraph "Decoder"
RCV --> DEINT[De-interleaver]
DEINT --> RAPTORQ["RaptorQ Decoder<br/>Reconstruct from<br/>any K of K+R symbols"]
RAPTORQ --> OUT[Original Frames]
end
style LOSS fill:#e17055,color:#fff
style RAPTORQ fill:#00b894,color:#fff
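The relationship between the fec_ratio header byte and the repair symbol count can be worked through in a few lines. Function names are hypothetical and rounding the repair count up is an assumption; with 5 source symbols it reproduces the 1 to 10 repair symbol range shown in the diagram.

```rust
/// Decodes the fec_ratio header byte (0..=200) into a ratio (0.0..=2.0).
fn fec_ratio_from_byte(b: u8) -> f32 {
    f32::from(b.min(200)) / 100.0
}

/// Repair symbols for a block of `source` symbols.
/// Assumption: ceil(source * ratio), computed in integers.
fn repair_symbols(source: usize, fec_ratio_byte: u8) -> usize {
    (source * usize::from(fec_ratio_byte.min(200)) + 99) / 100
}

/// A block is treated as decodable once any K of its K+R symbols arrive,
/// following the "any K of K+R" description above.
fn block_recoverable(source: usize, received: usize) -> bool {
    received >= source
}
```

At the Catastrophic tier (FEC 100%), a 5-symbol block gets 5 repair symbols, so the block survives the loss of any 5 of its 10 packets.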
Telemetry Stack
graph TB
subgraph "Relay"
RM["RelayMetrics<br/>sessions, rooms, packets"]
SM["SessionMetrics<br/>per-session jitter, loss, RTT"]
PM["ProbeMetrics<br/>inter-relay RTT, loss"]
RM --> PROM1["GET /metrics :9090"]
SM --> PROM1
PM --> PROM1
end
subgraph "Web Bridge"
WM["WebMetrics<br/>connections, frames, latency"]
WM --> PROM2["GET /metrics :8080"]
end
subgraph "Client"
CM["JitterStats + QualityAdapter"]
CM --> JSONL["--metrics-file<br/>JSONL 1 line/sec"]
end
PROM1 --> GRAF["Grafana Dashboard<br/>4 rows, 18 panels"]
PROM2 --> GRAF
JSONL --> ANALYSIS[Offline Analysis]
style GRAF fill:#ff6b6b,color:#fff
style PROM1 fill:#0984e3,color:#fff
style PROM2 fill:#0984e3,color:#fff
Deployment Topology
graph TB
subgraph "Region A"
RA["wzp-relay A<br/>:4433 UDP"]
WA["wzp-web A<br/>:8080 HTTPS"]
WA --> RA
end
subgraph "Region B"
RB["wzp-relay B<br/>:4433 UDP"]
WB["wzp-web B<br/>:8080 HTTPS"]
WB --> RB
end
RA <-->|"Probe 1/s + Federation"| RB
BA[Browser A] -->|WSS| WA
BB[Browser B] -->|WSS| WB
CA[CLI Client] -->|QUIC| RA
DA[Desktop Client] -->|QUIC| RA
MA[Android Client] -->|QUIC| RB
PROM[Prometheus] -->|scrape| RA
PROM -->|scrape| RB
PROM -->|scrape| WA
PROM --> GRAF[Grafana]
FC[featherChat Server] -->|auth validate| RA
FC -->|auth validate| RB
style RA fill:#ff9f43,color:#fff
style RB fill:#ff9f43,color:#fff
style GRAF fill:#ff6b6b,color:#fff
style FC fill:#fd79a8,color:#fff
Session State Machine
stateDiagram-v2
[*] --> Idle
Idle --> Connecting: connect()
Connecting --> Handshaking: QUIC established
Handshaking --> Active: CallOffer/Answer complete
Active --> Rekeying: 65,536 packets
Rekeying --> Active: new key derived
Active --> Closed: Hangup / Error / Timeout
Rekeying --> Closed: Error
Connecting --> Closed: Timeout
Handshaking --> Closed: Signature fail
note right of Active: Media flows (encrypted)
note right of Rekeying: Media continues while rekeying
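The transitions in the state diagram can be encoded as a pure function. This sketch uses hypothetical names (the SessionState in wzp-proto's session.rs may be shaped differently) and returns None for any transition the diagram does not list.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SessionState {
    Idle,
    Connecting,
    Handshaking,
    Active,
    Rekeying,
    Closed,
}

#[derive(Debug, Clone, Copy)]
enum Event {
    Connect,
    QuicEstablished,
    HandshakeComplete,
    RekeyDue,
    KeyDerived,
    Hangup,
    Error,
    Timeout,
    SignatureFail,
}

/// Packets sent under one key before a rekey is due.
const REKEY_INTERVAL: u64 = 65_536;

fn rekey_due(packets_since_key: u64) -> bool {
    packets_since_key >= REKEY_INTERVAL
}

/// Returns the next state, or None if the diagram has no such transition.
fn next_state(state: SessionState, event: Event) -> Option<SessionState> {
    use SessionState::*;
    match (state, event) {
        (Idle, Event::Connect) => Some(Connecting),
        (Connecting, Event::QuicEstablished) => Some(Handshaking),
        (Connecting, Event::Timeout) => Some(Closed),
        (Handshaking, Event::HandshakeComplete) => Some(Active),
        (Handshaking, Event::SignatureFail) => Some(Closed),
        (Active, Event::RekeyDue) => Some(Rekeying),
        (Active, Event::Hangup | Event::Error | Event::Timeout) => Some(Closed),
        (Rekeying, Event::KeyDerived) => Some(Active),
        (Rekeying, Event::Error) => Some(Closed),
        _ => None,
    }
}
```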
Project Structure
warzonePhone/
├── Cargo.toml # Workspace root
├── crates/
│ ├── wzp-proto/ # Protocol types, traits, wire format
│ │ └── src/
│ │ ├── codec_id.rs # CodecId, QualityProfile
│ │ ├── error.rs # Error types
│ │ ├── jitter.rs # JitterBuffer, AdaptivePlayoutDelay
│ │ ├── packet.rs # MediaHeader, MiniHeader, TrunkFrame, SignalMessage
│ │ ├── quality.rs # Tier, AdaptiveQualityController
│ │ ├── session.rs # SessionState machine
│ │ └── traits.rs # AudioEncoder, FecEncoder, CryptoSession, etc.
│ ├── wzp-codec/ # Audio codecs
│ │ └── src/
│ │ ├── adaptive.rs # AdaptiveEncoder/Decoder (Opus + Codec2)
│ │ ├── denoise.rs # NoiseSuppressor (RNNoise / nnnoiseless)
│ │ └── silence.rs # SilenceDetector, ComfortNoise
│ ├── wzp-fec/ # Forward error correction
│ │ └── src/
│ │ ├── encoder.rs # RaptorQFecEncoder
│ │ ├── decoder.rs # RaptorQFecDecoder
│ │ └── interleave.rs # Interleaver (burst protection)
│ ├── wzp-crypto/ # Cryptography + identity
│ │ └── src/
│ │ ├── identity.rs # Seed, Fingerprint, hash_room_name
│ │ ├── handshake.rs # WarzoneKeyExchange (X25519 + Ed25519)
│ │ ├── session.rs # ChaChaSession (ChaCha20-Poly1305)
│ │ ├── nonce.rs # Deterministic nonce construction
│ │ ├── anti_replay.rs # Sliding window replay protection
│ │ └── rekey.rs # Forward secrecy rekeying
│ ├── wzp-transport/ # QUIC transport layer
│ │ └── src/lib.rs # QuinnTransport, send/recv media/signal/trunk
│ ├── wzp-video/ # Video codecs + framer
│ │ └── src/
│ │ ├── factory.rs # VideoEncoder factory (platform dispatch)
│ │ ├── framer.rs # NAL fragmentation (H.264/H.265)
│ │ ├── depacketizer.rs # NAL reassembly, access unit emit
│ │ ├── controller.rs # VideoQualityController
│ │ ├── simulcast.rs # Simulcast layer management
│ │ ├── encoder_mode.rs # Encoder mode selection
│ │ ├── av1_obu.rs # AV1 OBU framing + depacketizer
│ │ ├── dav1d.rs # dav1d AV1 software decoder
│ │ ├── svt_av1.rs # SVT-AV1 software encoder (non-Android)
│ │ ├── videotoolbox.rs # VideoToolbox H.265 + AV1 (macOS)
│ │ ├── mediacodec.rs # MediaCodec H.264/H.265/AV1 (Android, NDK 0.9 migration pending)
│ │ └── nack.rs # NACK sender/receiver framework
│ ├── wzp-relay/ # Relay daemon
│ │ └── src/
│ │ ├── main.rs # CLI, connection loop, auth + handshake
│ │ ├── config.rs # RelayConfig, TOML parsing
│ │ ├── room.rs # RoomManager, TrunkedForwarder
│ │ ├── pipeline.rs # RelayPipeline (forward mode)
│ │ ├── session_mgr.rs # SessionManager (limits, lifecycle)
│ │ ├── auth.rs # featherChat token validation
│ │ ├── handshake.rs # Relay-side accept_handshake
│ │ ├── metrics.rs # Prometheus RelayMetrics + per-session
│ │ ├── probe.rs # Inter-relay probes + ProbeMesh
│ │ ├── federation.rs # FederationManager, global rooms
│ │ ├── presence.rs # PresenceRegistry
│ │ ├── route.rs # RouteResolver
│ │ ├── trunk.rs # TrunkBatcher
│ │ ├── audio_scorer.rs # Per-stream audio quality scoring
│ │ ├── response_policy.rs # Relay response policy (rate-limit, drop)
│ │ ├── verdict.rs # Verdict enum (Allow/RateLimit/Drop/Malicious)
│ │ ├── video_scorer.rs # VideoScorer (legitimacy scoring, keyframe regularity)
│ │ └── ws.rs # WebSocket handler for browser clients
│ ├── wzp-client/ # Call engine + CLI
│ │ └── src/
│ │ ├── cli.rs # CLI arg parsing + main
│ │ ├── call.rs # CallEncoder, CallDecoder, QualityAdapter
│ │ ├── handshake.rs # Client-side perform_handshake
│ │ ├── featherchat.rs # CallSignal bridge
│ │ ├── echo_test.rs # Automated echo quality test
│ │ ├── drift_test.rs # Clock drift measurement
│ │ ├── sweep.rs # Jitter buffer parameter sweep
│ │ ├── metrics.rs # JSONL telemetry writer
│ │ └── bench.rs # Component benchmarks
│ └── wzp-web/ # Browser bridge
│ ├── src/
│ │ ├── main.rs # Axum server, WS handler, TLS
│ │ └── metrics.rs # Prometheus WebMetrics
│ └── static/
│ ├── index.html # SPA UI (room, PTT, level meter)
│ └── audio-processor.js # AudioWorklet (capture + playback)
├── android/ # Android app (Kotlin + JNI)
│ └── app/src/main/java/com/wzp/
│ ├── audio/ # AudioPipeline, AudioRouteManager
│ ├── engine/ # WzpEngine (JNI), CallStats, WzpCallback
│ ├── ui/ # CallActivity, SettingsScreen, Identicon
│ ├── data/ # SettingsRepository
│ ├── net/ # RelayPinger
│ ├── service/ # CallService (foreground)
│ └── debug/ # DebugReporter
├── desktop/ # Desktop app (Tauri)
│ └── dist/ # Built frontend (HTML/JS/CSS)
├── deps/featherchat/ # Git submodule
├── docs/ # Documentation
├── scripts/ # Build scripts
│ └── build-linux.sh # Hetzner VM build
└── tools/ # Development tools
Test Coverage
702 tests across all crates (excluding wzp-android), 0 failures:
| Crate | Tests | Key Coverage |
|---|---|---|
| wzp-proto | 112 | Wire format, jitter buffer, quality tiers, mini-frames, trunking |
| wzp-codec | 69 | Opus/Codec2 roundtrip, silence detection, noise suppression |
| wzp-fec | 21 | RaptorQ encode/decode, loss recovery, interleaving |
| wzp-crypto | 64 | Encrypt/decrypt, handshake, anti-replay, featherChat identity |
| wzp-transport | 11 | QUIC connection setup, path monitoring |
| wzp-relay | 137 | Room ACL, session mgmt, metrics, probes, mesh, trunking, scoring, verdict |
| wzp-video | 88 | NAL framing, AV1 OBU, simulcast, quality controller, NACK |
| wzp-client | 170 | Encoder/decoder, quality adapter, silence, drift, sweep |
| wzp-web | 2 | Metrics |
| wzp-native | 0 | Native platform bindings (no unit tests) |
Audio Backend Architecture (Platform Matrix)
WarzonePhone's audio I/O goes through one of four backends depending on the target platform and feature flags. All backends expose the same public API (AudioCapture::start() → AudioCapture { ring(), stop() }) via conditional re-exports in crates/wzp-client/src/lib.rs, so the CallEngine above the audio layer doesn't know or care which backend is running.
         ┌─────────────────────────────────────────────┐
         │       CallEngine (platform-agnostic)        │
         │     reads PCM from AudioCapture::ring()     │
         │     writes PCM to AudioPlayback::ring()     │
         └────────────────────┬────────────────────────┘
                              │
        ┌─────────────────────┼─────────────────────┐
        │                     │                     │
        ▼                     ▼                     ▼
┌───────────────┐     ┌────────────────┐    ┌───────────────┐
│ audio_io      │     │ audio_vpio     │    │ audio_wasapi  │
│ (CPAL)        │     │ (Core Audio    │    │ (Windows      │
│               │     │  VoiceProc IO) │    │  IAudioClient2│
│ All platforms │     │ macOS only     │    │ Windows       │
│ (baseline)    │     │ feature=vpio   │    │ feature=      │
│               │     │                │    │ windows-aec   │
└───────────────┘     └────────────────┘    └───────────────┘
        │
        ▼ on Android only
┌───────────────┐
│ wzp-native    │
│ (Oboe bridge  │
│  via dlopen)  │
│               │
│ Android only  │
│ libloading    │
└───────────────┘
Backend selection matrix
| Platform | Capture | Playback | OS AEC | Feature flags |
|---|---|---|---|---|
| macOS | VoiceProcessingIO (native Core Audio) | CPAL | Yes — Apple's hardware-accelerated AEC (same AEC as FaceTime, iMessage audio, Voice Memos) | audio, vpio |
| Windows (AEC build) | Direct WASAPI with AudioCategory_Communications | CPAL | Yes: Windows routes the capture stream through the driver's communications APO chain (AEC + NS + AGC), driver-dependent quality | audio, windows-aec |
| Windows (baseline) | CPAL (WASAPI shared mode) | CPAL | No | audio |
| Linux | CPAL (ALSA / PulseAudio) | CPAL | No | audio |
| Android (Tauri Mobile) | Oboe via wzp-native cdylib, Usage::VoiceCommunication + MODE_IN_COMMUNICATION | Same Oboe stream | Depends on device (some Android devices apply AEC to the voice-communication stream, most do not) | none (wzp-client compiled with default-features = false) |
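The matrix above can be read as a small lookup. The sketch below models it as a runtime function purely for illustration; in the project the choice is made at compile time through conditional re-exports, and the `Platform`, `Backend`, `Features`, and `select_backends` names are hypothetical. The CPAL fallback for a macOS build without `vpio` is an assumption drawn from the "All platforms (baseline)" label in the diagram.

```rust
/// Target platforms from the backend selection matrix.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Platform {
    MacOs,
    Windows,
    Linux,
    Android,
}

/// Audio backends named in the matrix.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Cpal,
    VoiceProcessingIo,
    Wasapi,
    Oboe,
}

/// Hypothetical stand-in for the `vpio` and `windows-aec` Cargo features.
#[derive(Debug, Clone, Copy, Default)]
pub struct Features {
    pub vpio: bool,
    pub windows_aec: bool,
}

/// Returns `(capture, playback)` for a platform and feature set.
pub fn select_backends(platform: Platform, features: Features) -> (Backend, Backend) {
    match platform {
        // OS-level AEC paths replace capture only; playback stays on CPAL.
        Platform::MacOs if features.vpio => (Backend::VoiceProcessingIo, Backend::Cpal),
        Platform::Windows if features.windows_aec => (Backend::Wasapi, Backend::Cpal),
        // Android uses one Oboe stream pair for both directions.
        Platform::Android => (Backend::Oboe, Backend::Oboe),
        // Baseline: CPAL for capture and playback (assumed for macOS without vpio).
        _ => (Backend::Cpal, Backend::Cpal),
    }
}
```

Only the capture side changes when an OS AEC feature is enabled, which matches the Playback column staying on CPAL for every desktop row.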
Why wzp-native is a standalone cdylib
On Android, the audio backend lives in a separate cdylib crate (crates/wzp-native) that wzp-desktop's lib crate loads at runtime via libloading. It is not linked as a regular Rust dep.
This is deliberate. rust-lang/rust#104707 documents that a crate with crate-type = ["cdylib", "staticlib"] leaks non-exported symbols from the staticlib into the cdylib. On Android, that caused Bionic's private __init_tcb / pthread_create symbols to be bound LOCALLY inside our .so instead of resolved dynamically against libc.so at dlopen time — which crashed the app at launch as soon as tao tried to std::thread::spawn() from the JNI onCreate callback.
Keeping wzp-native in its own cdylib and loading it via libloading means:
- The app's own .so has crate-type = ["cdylib", "rlib"] only: no staticlib, no symbol leak.
- libwzp_native.so is loaded via System.loadLibrary from the JVM side (or dlopen from Rust), which triggers the normal Bionic resolver and binds all private symbols against libc.so at load time.
- The C/C++ Oboe bridge is fully isolated inside libwzp_native.so's symbol space, so its archives cannot leak into wzp-desktop's .so.
See docs/BRANCH-android-rewrite.md for the full incident postmortem and docs/incident-tauri-android-init-tcb.md for the debug log.
Vendored audiopus_sys for libopus / clang-cl cross-compile
The workspace root carries a vendored copy of audiopus_sys at vendor/audiopus_sys/ with a patched opus/CMakeLists.txt. This is needed because libopus 1.3.1 gates its per-file -msse4.1 / -mssse3 COMPILE_FLAGS behind if(NOT MSVC), and under clang-cl (used by cargo-xwin for Windows cross-compiles) CMake sets MSVC=1 unconditionally — so the SIMD source files compile without the required target feature and fail to link the intrinsic always_inline functions.
The patch introduces an MSVC_CL variable that is true only for real cl.exe (distinguished via CMAKE_C_COMPILER_ID STREQUAL "MSVC"), and flips the eight if(NOT MSVC) SIMD guards to if(NOT MSVC_CL) so clang-cl gets the GCC-style per-file flags. Wired in via [patch.crates-io] audiopus_sys = { path = "vendor/audiopus_sys" } at the workspace root.
This does not affect macOS or Linux builds — on those platforms MSVC=0 everywhere so the patched logic behaves identically to upstream.
Upstream tracking: xiph/opus#256, xiph/opus PR #257 (both stale).
Network Awareness (Android)
The adaptive quality controller (AdaptiveQualityController in wzp-proto) supports proactive network-aware adaptation via signal_network_change(NetworkContext). On Android, this is fed by NetworkMonitor.kt which wraps ConnectivityManager.NetworkCallback.
ConnectivityManager
│ onCapabilitiesChanged / onLost
▼
NetworkMonitor.kt ──classify──► type: Int (WiFi=0, LTE=1, 5G=2, 3G=3)
│ onNetworkChanged(type, bw)
▼
CallViewModel ──► WzpEngine.onNetworkChanged()
│ JNI
▼
jni_bridge.rs
│
▼
EngineState.pending_network_type (AtomicU8, lock-free)
│ polled every ~20ms
▼
recv task: quality_ctrl.signal_network_change(ctx)
│
├─ WiFi → Cellular: preemptive 1-tier downgrade
├─ Any change: 10s FEC boost (+0.2 ratio)
└─ Cellular: faster downgrade thresholds (2 vs 3)
Cellular generation is approximated from getLinkDownstreamBandwidthKbps() to avoid requiring READ_PHONE_STATE permission.
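The lock-free handoff between the JNI thread and the recv task can be sketched with a single `AtomicU8`. This is a minimal illustration, not the actual `EngineState` code: the `PendingNetwork` type, its method names, and the `NO_CHANGE` sentinel value are assumptions.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

/// Sentinel meaning "no network change pending" (hypothetical value).
pub const NO_CHANGE: u8 = u8::MAX;

/// One-slot mailbox: the JNI side publishes, the recv task polls.
pub struct PendingNetwork {
    slot: AtomicU8,
}

impl PendingNetwork {
    pub const fn new() -> Self {
        Self { slot: AtomicU8::new(NO_CHANGE) }
    }

    /// Called from the JNI bridge when the platform reports a new network type.
    /// A later change overwrites an unread earlier one, so only the newest wins.
    pub fn publish(&self, network_type: u8) {
        self.slot.store(network_type, Ordering::Release);
    }

    /// Called by the recv task on each poll. Returns `Some` at most once per
    /// published change because the swap resets the slot to the sentinel.
    pub fn take(&self) -> Option<u8> {
        match self.slot.swap(NO_CHANGE, Ordering::AcqRel) {
            NO_CHANGE => None,
            network_type => Some(network_type),
        }
    }
}
```

Because the slot holds a single byte, neither side ever blocks, and a burst of callbacks between two polls collapses into the most recent network type.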
Audio Routing (Android)
Both Android app variants support 3-way audio routing: Earpiece → Speaker → Bluetooth SCO.
Audio Mode Lifecycle
MODE_IN_COMMUNICATION is set by the Rust call engine (via JNI AudioManager.setMode()) right before Oboe streams open — NOT at app launch. Restored to MODE_NORMAL when the call ends. This prevents hijacking system audio routing (music, BT A2DP) before a call is active.
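One way to guarantee the restore step is a scope guard. The sketch below is an assumption about how such a lifecycle could be structured, not the engine's actual code: `AudioModeSetter` is a hypothetical abstraction over the JNI call to AudioManager.setMode(), and `RecordingSetter` is a test double that only records calls.

```rust
use std::cell::RefCell;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AudioMode {
    Normal,
    InCommunication,
}

/// Hypothetical abstraction over the JNI call to AudioManager.setMode().
pub trait AudioModeSetter {
    fn set_mode(&self, mode: AudioMode);
}

/// Guard that holds the communication mode for the duration of a call.
pub struct CallAudioMode<'a, S: AudioModeSetter> {
    setter: &'a S,
}

impl<'a, S: AudioModeSetter> CallAudioMode<'a, S> {
    /// Switch to communication mode; call this right before opening streams.
    pub fn enter(setter: &'a S) -> Self {
        setter.set_mode(AudioMode::InCommunication);
        Self { setter }
    }
}

impl<S: AudioModeSetter> Drop for CallAudioMode<'_, S> {
    /// Restore the normal mode when the call ends, including on early return.
    fn drop(&mut self) {
        self.setter.set_mode(AudioMode::Normal);
    }
}

/// Test double that records every mode change instead of calling into JNI.
#[derive(Default)]
pub struct RecordingSetter {
    pub calls: RefCell<Vec<AudioMode>>,
}

impl AudioModeSetter for RecordingSetter {
    fn set_mode(&self, mode: AudioMode) {
        self.calls.borrow_mut().push(mode);
    }
}
```

Tying the restore to `Drop` means an error path that tears down the call early still returns the device to MODE_NORMAL.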
Native Kotlin App
AudioRouteManager.kt handles device detection (via AudioDeviceCallback), SCO lifecycle, and auto-fallback on BT disconnect. CallViewModel.cycleAudioRoute() cycles through available routes.
Tauri Desktop App
android_audio.rs provides JNI bridges to AudioManager for speakerphone and Bluetooth SCO control. After each route change, Oboe streams are stopped and restarted via spawn_blocking.
User tap ──► cycleAudioRoute()
│
├─ Earpiece: setSpeakerphoneOn(false) + clearCommunicationDevice()
├─ Speaker: setSpeakerphoneOn(true)
└─ BT SCO: setCommunicationDevice(bt_device) [API 31+]
│ fallback: startBluetoothSco() [API < 31]
▼
Oboe stop + start_bt() for BT / start() for others
BT SCO and Oboe
BT SCO only supports 8/16kHz. When bt_active=1, Oboe capture skips setSampleRate(48000) and setInputPreset(VoiceCommunication), letting the system choose the native BT rate. Oboe's SampleRateConversionQuality::Best bridges to our 48kHz ring buffers. Playout uses Usage::Media in BT mode to avoid conflicts with the communication device routing.
Hangup Signal Fix
SignalMessage::Hangup now carries an optional call_id field. The relay uses it to end only the specific call instead of broadcasting to all active calls for the user — preventing a race where a hangup for call 1 kills a newly-placed call 2.
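The targeted behaviour can be sketched as follows. The `CallTable` type and its methods are hypothetical relay state invented for illustration, and treating a missing call_id as "end every call for the user" is an assumption about how older clients are handled.

```rust
use std::collections::HashMap;

/// Simplified hangup signal carrying the optional call_id.
pub struct Hangup {
    pub call_id: Option<String>,
}

/// Hypothetical relay-side table: user -> active call ids.
#[derive(Default)]
pub struct CallTable {
    calls: HashMap<String, Vec<String>>,
}

impl CallTable {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn add(&mut self, user: &str, call_id: &str) {
        self.calls.entry(user.to_string()).or_default().push(call_id.to_string());
    }

    pub fn active(&self, user: &str) -> &[String] {
        self.calls.get(user).map(Vec::as_slice).unwrap_or(&[])
    }

    /// Ends calls for `user` and returns the ids that were ended.
    pub fn end_calls(&mut self, user: &str, hangup: &Hangup) -> Vec<String> {
        let Some(active) = self.calls.get_mut(user) else {
            return Vec::new();
        };
        match &hangup.call_id {
            // Targeted hangup: end only the named call, leave the rest alone.
            Some(id) => match active.iter().position(|c| c == id) {
                Some(index) => vec![active.remove(index)],
                None => Vec::new(),
            },
            // No call_id (assumed legacy client): end every call for the user.
            None => std::mem::take(active),
        }
    }
}
```

With two calls active, a hangup for the first returns only that id, so a newly placed second call survives a late hangup for the first.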
Phase 8: Tailscale-Inspired NAT Traversal (2026-04-14)
Five new modules in wzp-client bring NAT traversal capability close to Tailscale's approach:
┌──────────────────────────────────────────────────────────────────────┐
│ wzp-client NAT Traversal Stack │
│ │
│ ┌─────────────┐ ┌──────────────┐ ┌──────────────────────────┐ │
│ │ stun.rs │ │ portmap.rs │ │ reflect.rs (existing) │ │
│ │ RFC 5389 │ │ NAT-PMP │ │ Relay-based STUN │ │
│ │ Public │ │ PCP │ │ Multi-relay NAT detect │ │
│ │ STUN │ │ UPnP IGD │ │ │ │
│ └──────┬──────┘ └──────┬───────┘ └────────────┬─────────────┘ │
│ │ │ │ │
│ └────────────────┼────────────────────────┘ │
│ │ │
│ ┌───────▼────────┐ │
│ │ ice_agent.rs │ │
│ │ Gather / Re- │ │
│ │ gather / Apply│ │
│ └───────┬────────┘ │
│ │ │
│ ┌───────────┼───────────┐ │
│ │ │ │ │
│ ┌───────▼───┐ ┌───▼───┐ ┌───▼──────────┐ │
│ │ netcheck │ │ dual_ │ │ relay_map.rs │ │
│ │ .rs │ │ path │ │ RTT-sorted │ │
│ │ Diagnostic│ │ .rs │ │ relay list │ │
│ └───────────┘ │ Race │ └──────────────┘ │
│ └───────┘ │
└──────────────────────────────────────────────────────────────────────┘
Candidate Types
| Type | Source | Priority | When Used |
|---|---|---|---|
| Host | local_host_candidates() | 1 (highest) | Same-LAN peers |
| Port-mapped | portmap::acquire_port_mapping() | 2 | Router supports NAT-PMP/PCP/UPnP |
| Server-reflexive | stun::discover_reflexive() or relay Reflect | 3 | Cone NAT |
| Relay | Relay address (fallback) | 4 (lowest) | Always available |
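The priority column maps naturally onto an ordered enum. The following is a minimal sketch of that ordering; the `CandidateKind`, `Candidate`, and `dial_order` names are hypothetical and not taken from ice_agent.rs.

```rust
use std::net::SocketAddr;

/// Candidate types in priority order; a lower discriminant is tried first.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum CandidateKind {
    Host = 1,
    PortMapped = 2,
    ServerReflexive = 3,
    Relay = 4,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Candidate {
    pub kind: CandidateKind,
    pub addr: SocketAddr,
}

/// Sorts candidates so the highest-priority kind comes first.
/// The sort is stable, so candidates of the same kind keep their gather order.
pub fn dial_order(mut candidates: Vec<Candidate>) -> Vec<Candidate> {
    candidates.sort_by_key(|c| c.kind);
    candidates
}
```

Since the relay candidate is always available, it always sorts last and acts as the fallback when every direct path fails.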
Signal Flow for Mid-Call Re-Gathering
Network change (WiFi → cellular)
│
▼
IceAgent::re_gather()
├── stun::discover_reflexive()
├── portmap::acquire_port_mapping()
└── local_host_candidates()
│
▼
SignalMessage::CandidateUpdate { generation: N+1, ... }
│
▼ (via relay)
Peer's IceAgent::apply_peer_update()
│
▼
PeerCandidates { reflexive, local, mapped }
│
▼
dual_path::race() with new candidates (TODO: transport hot-swap)
New SignalMessage Variants & Fields
| Signal | New Fields | Purpose |
|---|---|---|
| DirectCallOffer | caller_mapped_addr | Port-mapped address from NAT-PMP/PCP/UPnP |
| DirectCallAnswer | callee_mapped_addr | Same, callee side |
| CallSetup | peer_mapped_addr | Relay cross-wires mapped addr to peer |
| CandidateUpdate | (new variant) | Mid-call candidate re-gathering |
| RegisterPresenceAck | relay_region, available_relays | Relay mesh metadata for auto-selection |
All new fields use #[serde(default, skip_serializing_if)] for backward compatibility with older clients/relays.
Hard NAT Port Prediction
For symmetric NATs that don't support port mapping, the system detects the NAT's port allocation pattern:
Single socket → 5 STUN servers (sequential probes)
│
▼
Observed ports: [40001, 40002, 40003, 40004, 40005]
│
▼
classify_port_allocation() → Sequential { delta: 1 }
│
▼
predict_ports(last=40005, delta=1, offset=0, spread=2)
→ [40004, 40005, 40006, 40007, 40008]
│
▼
HardNatProbe signal → peer
│
▼
Peer dials predicted port range in parallel
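The prediction step in the flow above can be sketched as a pure function. The interpretation of the parameters is an assumption inferred from the single worked example: the window is centred on `last + delta * (offset + 1)` and extends `spread` ports to either side. The real `predict_ports` may differ in signature and semantics.

```rust
/// Predicts the ports a sequential NAT is likely to allocate next.
///
/// Assumed semantics: `offset` counts extra allocations expected before the
/// peer dials, and `spread` is the half-width of the window in ports.
pub fn predict_ports(last: u16, delta: i32, offset: u32, spread: u16) -> Vec<u16> {
    // Work in i64 so the arithmetic cannot overflow before wrapping.
    let center = i64::from(last) + i64::from(delta) * (i64::from(offset) + 1);
    let half = i64::from(spread);
    (-half..=half)
        // Wrap into the 16-bit port space.
        .map(|d| (center + d).rem_euclid(65_536) as u16)
        // Port 0 is not dialable, so drop it.
        .filter(|&port| port != 0)
        .collect()
}
```

With the values from the diagram, `predict_ports(40005, 1, 0, 2)` yields the five ports 40004 through 40008, matching the predicted range shown.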
| Pattern | Detection | Traversal Strategy |
|---|---|---|
| Port-preserving | All probes return same port | Standard hole-punch |
| Sequential (delta=N) | Consistent N-increment | Predict next port, dial range |
| Random | No pattern | Birthday attack or relay |
| Unknown | < 3 probes succeeded | Relay fallback |
The classifier tolerates:
- Jitter: ±1 from dominant delta (concurrent flow grabbed a port)
- Wraparound: 65535 → 1 treated as delta=+2, not -65534
- Noise: 60% threshold — if most deltas agree, call it sequential
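The three tolerances can be combined in a short classifier. This is a sketch under stated assumptions, not the code behind `classify_port_allocation()`: the `PortAllocation` enum is hypothetical, the dominant delta is taken to be the most frequent one, and a delta within ±1 of it counts toward the 60% agreement threshold.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PortAllocation {
    PortPreserving,
    Sequential { delta: i32 },
    Random,
    Unknown,
}

/// Signed port difference with 16-bit wraparound, so 65535 -> 1 is +2.
fn wrapped_delta(from: u16, to: u16) -> i32 {
    i32::from(to.wrapping_sub(from) as i16)
}

pub fn classify_port_allocation(ports: &[u16]) -> PortAllocation {
    // Fewer than three successful probes: not enough evidence.
    if ports.len() < 3 {
        return PortAllocation::Unknown;
    }
    // Every probe saw the same external port.
    if ports.iter().all(|&p| p == ports[0]) {
        return PortAllocation::PortPreserving;
    }
    let deltas: Vec<i32> = ports
        .windows(2)
        .map(|pair| wrapped_delta(pair[0], pair[1]))
        .collect();
    // Dominant delta: the most frequent difference between consecutive probes.
    let dominant = deltas
        .iter()
        .copied()
        .max_by_key(|d| deltas.iter().filter(|x| *x == d).count())
        .expect("at least two deltas exist when three ports were observed");
    // Jitter tolerance: a delta within ±1 of the dominant one still agrees.
    let agreeing = deltas.iter().filter(|&&d| (d - dominant).abs() <= 1).count();
    // Noise tolerance: at least 60% of deltas must agree (integer comparison).
    if dominant != 0 && agreeing * 10 >= deltas.len() * 6 {
        PortAllocation::Sequential { delta: dominant }
    } else {
        PortAllocation::Random
    }
}
```

Casting the wrapped difference through `i16` is what turns the 65535 → 1 step into +2 rather than a large negative number.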