wz-phone/docs/ARCHITECTURE.md
Siavash Sameni ed8a7ae5aa docs: protocol audit 2026-05-25, update architecture + Obsidian vault
Audit:
- docs/AUDIT-2026-05-25.md: full protocol audit covering 8 findings
  (4 critical, 2 high, 5 medium, 4 low) with code references and fix
  effort estimates
- vault/Audit/Tasks.md: Obsidian Tasks plugin file tracking all audit
  items with priorities, due dates, and per-step checklists

Architecture docs updated for Wire format v2 and Wave 5/6 features:
- ARCHITECTURE.md: adds wzp-video to dependency graph and project
  structure; wire format updated to v2 (16B header, 5B MiniHeader);
  relay concurrency section corrected (DashMap+RwLock is current, not
  a future optimization); test count 571→702; Android note
- PROGRESS.md: Wave 5 and Wave 6 sections appended; test count 372→702;
  current status and open blockers as of 2026-05-25
- ROAD-TO-VIDEO.md: implementation status table inserted (/🟡/🔴/🔲
  per phase); 6-step critical path to first video call
- WZP-SPEC.md: MediaHeader updated to v2 (16B byte-aligned); MiniHeader
  updated to 5B with seq_delta; codec IDs 9-12 added (H.264/H.265/AV1);
  version negotiation section added

Obsidian vault (vault/):
- 114 files across Architecture/, PRDs/, Reports/, Android/,
  Reference/, Audit/ with YAML frontmatter
- 00 - Home.md index note with wiki links
- .obsidian/app.json config

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-25 06:00:17 +04:00


WarzonePhone Architecture

Custom lossy VoIP protocol built in Rust. E2E encrypted, FEC-protected, adaptive quality, designed for hostile network conditions.

System Overview

graph TB
    subgraph "Client A (Desktop / Android / CLI)"
        MIC[Microphone] --> DN[NoiseSuppressor<br/>RNNoise ML]
        DN --> SD[SilenceDetector<br/>VAD + Hangover]
        SD --> ENC[CallEncoder<br/>Opus / Codec2]
        ENC --> FEC_E[FEC Encoder<br/>RaptorQ]
        FEC_E --> CRYPT_E[ChaCha20-Poly1305<br/>Encrypt]
        CRYPT_E --> QUIC_S[QUIC Datagram<br/>Send]

        QUIC_R[QUIC Datagram<br/>Recv] --> CRYPT_D[ChaCha20-Poly1305<br/>Decrypt]
        CRYPT_D --> FEC_D[FEC Decoder<br/>RaptorQ]
        FEC_D --> JIT[JitterBuffer<br/>Adaptive Playout]
        JIT --> DEC[CallDecoder<br/>Opus / Codec2]
        DEC --> SPK[Speaker]
    end

    subgraph "Relay (SFU)"
        ACCEPT[Accept QUIC] --> AUTH{Auth?}
        AUTH -->|token| VALIDATE[POST /v1/auth/validate]
        AUTH -->|no auth| HS
        VALIDATE --> HS[Crypto Handshake<br/>X25519 + Ed25519]
        HS --> ROOM[Room Manager<br/>Named Rooms via SNI]
        ROOM --> FWD[Forward to<br/>Other Participants]
    end

    subgraph "Client B"
        B_SPK[Speaker]
        B_MIC[Microphone]
    end

    QUIC_S -->|UDP / QUIC| ACCEPT
    FWD -->|UDP / QUIC| QUIC_R
    B_MIC -.->|same pipeline| ACCEPT
    FWD -.->|same pipeline| B_SPK

    style MIC fill:#4a9eff,color:#fff
    style SPK fill:#4a9eff,color:#fff
    style B_MIC fill:#4a9eff,color:#fff
    style B_SPK fill:#4a9eff,color:#fff
    style ROOM fill:#ff9f43,color:#fff
    style CRYPT_E fill:#ee5a24,color:#fff
    style CRYPT_D fill:#ee5a24,color:#fff

Crate Dependency Graph

graph TD
    PROTO["wzp-proto<br/>Types, Traits, Wire Format"]

    CODEC["wzp-codec<br/>Opus + Codec2 + RNNoise"]
    FEC["wzp-fec<br/>RaptorQ FEC"]
    CRYPTO["wzp-crypto<br/>ChaCha20 + Identity"]
    TRANSPORT["wzp-transport<br/>QUIC / Quinn"]
    VIDEO["wzp-video<br/>H.264 + H.265 + AV1"]

    RELAY["wzp-relay<br/>Relay Daemon"]
    CLIENT["wzp-client<br/>CLI + Call Engine"]
    WEB["wzp-web<br/>Browser Bridge"]

    PROTO --> CODEC
    PROTO --> FEC
    PROTO --> CRYPTO
    PROTO --> TRANSPORT
    PROTO --> VIDEO

    CODEC --> CLIENT
    FEC --> CLIENT
    CRYPTO --> CLIENT
    TRANSPORT --> CLIENT
    VIDEO --> CLIENT

    CODEC --> RELAY
    FEC --> RELAY
    CRYPTO --> RELAY
    TRANSPORT --> RELAY
    VIDEO --> RELAY

    CLIENT --> WEB
    TRANSPORT --> WEB
    CRYPTO --> WEB

    FC["warzone-protocol<br/>featherChat Identity"] -.->|path dep| CRYPTO

    style PROTO fill:#6c5ce7,color:#fff
    style RELAY fill:#ff9f43,color:#fff
    style CLIENT fill:#00b894,color:#fff
    style WEB fill:#0984e3,color:#fff
    style FC fill:#fd79a8,color:#fff
    style VIDEO fill:#a29bfe,color:#fff

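Star pattern: Within the workspace, each leaf crate (wzp-codec, wzp-fec, wzp-crypto, wzp-transport, wzp-video) depends only on wzp-proto; wzp-crypto additionally takes warzone-protocol as an external path dependency. No leaf depends on another leaf. The integration crates wzp-relay and wzp-client depend on all leaves, while wzp-web builds on wzp-client plus wzp-transport and wzp-crypto.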

Audio Encode Pipeline

sequenceDiagram
    participant Mic as Microphone<br/>(48kHz)
    participant Ring as SPSC Ring<br/>(lock-free)
    participant RNN as RNNoise<br/>(2 x 480)
    participant VAD as SilenceDetector
    participant Codec as Opus / Codec2
    participant DT as DredTuner<br/>(wzp-proto)
    participant FEC as RaptorQ FEC
    participant INT as Interleaver<br/>(depth=3)
    participant HDR as MediaHeader<br/>(16B or Mini 5B)
    participant Enc as ChaCha20-Poly1305
    participant QUIC as QUIC Datagram
    participant QPS as QuinnPathSnapshot

    Mic->>Ring: f32 x 512 (macOS callback)
    Ring->>Ring: Accumulate to 960 samples
    Ring->>RNN: PCM i16 x 960 (20ms frame)
    RNN->>VAD: Denoised audio
    alt Speech active (or hangover)
        VAD->>Codec: Encode active frame
    else Silence (>100ms)
        VAD->>Codec: ComfortNoise (every 200ms)
    end

    Note over QPS,DT: Every 25 frames (~500ms)
    QPS->>DT: loss_pct, rtt_ms, jitter_ms
    DT->>Codec: set_dred_duration() + set_expected_loss()

    alt Opus tier (any bitrate)
        Codec->>HDR: Compressed bytes + DRED side-channel (no RaptorQ)
    else Codec2 tier
        Codec->>FEC: Compressed bytes (pad to 256B symbol)
        FEC->>FEC: Accumulate block (5-10 symbols)
        FEC->>INT: Source + repair symbols
        INT->>HDR: Interleaved packets
    end
    HDR->>Enc: Header as AAD
    Enc->>QUIC: Encrypted payload + 16B tag

Key Details

  • macOS delivers 512 f32 samples per callback (not configurable to 960)
  • Ring buffer accumulates callbacks until 960 samples (20ms at 48 kHz) are available, which form one codec frame
  • RNNoise processes 2 x 480 samples (ML-based noise suppression via nnnoiseless)
  • Silence detection uses VAD + 100ms hangover before switching to ComfortNoise
  • FEC symbols are padded to 256 bytes with a 2-byte LE length prefix
  • MiniHeaders (5 bytes) replace full headers (16 bytes) for 49 of every 50 audio frames; video always uses full headers
  • DRED tuner polls quinn path stats every 25 frames (~500ms) and adjusts DRED lookback duration continuously
  • Opus tiers bypass RaptorQ entirely -- DRED handles loss recovery at the codec layer
  • Opus6k DRED window: 1040ms (maximum libopus allows)
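
The symbol padding rule above (256-byte symbols with a 2-byte little-endian length prefix) can be sketched as follows. This is a minimal illustration, not the wzp-fec implementation: the function names are hypothetical, and it assumes the prefix is counted inside the 256 bytes and that oversized frames are simply rejected.

```rust
/// Fixed FEC symbol size taken from the text above.
const SYMBOL_SIZE: usize = 256;
/// Two bytes of little-endian length prefix (assumed to be part of the 256 bytes).
const PREFIX_LEN: usize = 2;

/// Pads one compressed frame into a symbol laid out as [len: u16 LE][payload][zeros].
/// Returns None when the frame does not fit (hypothetical policy).
fn pad_symbol(frame: &[u8]) -> Option<[u8; SYMBOL_SIZE]> {
    if frame.len() > SYMBOL_SIZE - PREFIX_LEN {
        return None;
    }
    let mut symbol = [0u8; SYMBOL_SIZE];
    symbol[..PREFIX_LEN].copy_from_slice(&(frame.len() as u16).to_le_bytes());
    symbol[PREFIX_LEN..PREFIX_LEN + frame.len()].copy_from_slice(frame);
    Some(symbol)
}

/// Recovers the original frame from a padded symbol.
/// Returns None if the stored length points past the end of the symbol.
fn unpad_symbol(symbol: &[u8; SYMBOL_SIZE]) -> Option<&[u8]> {
    let len = u16::from_le_bytes([symbol[0], symbol[1]]) as usize;
    symbol.get(PREFIX_LEN..PREFIX_LEN + len)
}
```

The length prefix is what lets the decoder strip the zero padding again after RaptorQ has reconstructed a symbol, since the codec frames themselves vary in size.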

Audio Decode Pipeline

sequenceDiagram
    participant QUIC as QUIC Datagram
    participant Dec as ChaCha20-Poly1305
    participant AR as Anti-Replay<br/>(sliding window)
    participant HDR as Header Parse
    participant DEINT as De-interleaver
    participant FEC as RaptorQ FEC<br/>(reconstruct)
    participant JIT as JitterBuffer<br/>(BTreeMap)
    participant Codec as Opus / Codec2
    participant Ring as SPSC Ring<br/>(lock-free)
    participant SPK as Speaker

    QUIC->>Dec: Encrypted packet
    Dec->>AR: Decrypt (header = AAD)
    AR->>AR: Check seq window (reject replay)
    AR->>HDR: Verified packet

    alt Opus packet
        HDR->>JIT: Direct to jitter buffer (no FEC/interleave)
    else Codec2 packet
        HDR->>DEINT: MediaHeader + payload
        DEINT->>FEC: Reordered symbols by block
        FEC->>FEC: Attempt decode (need K of K+R)
        FEC->>JIT: Recovered audio frames
    end

    JIT->>JIT: BTreeMap ordered by seq
    JIT->>JIT: Wait until depth >= target

    alt Packet present
        JIT->>Codec: Pop lowest seq frame
    else Packet missing (Opus)
        JIT->>Codec: DRED reconstruction (neural)
        alt DRED fails or unavailable
            Codec->>Codec: Classical PLC fallback
        end
    else Packet missing (Codec2)
        Codec->>Codec: Classical PLC
    end

    Codec->>Ring: PCM i16 x 960
    Ring->>SPK: Audio callback pulls samples

Key Details

  • Anti-replay uses a 64-packet sliding window to reject duplicates
  • FEC decoder needs any K of K+R symbols to reconstruct a block
  • Jitter buffer target: 10 packets (200ms) for client, 50 packets (1s) for relay
  • Desktop client uses direct playout (no jitter buffer) with lock-free ring
  • Codec2 frames at 8 kHz are resampled to 48 kHz transparently
  • DRED reconstruction: on packet loss, decoder tries neural DRED reconstruction before falling back to classical PLC
  • Jitter-spike detection pre-emptively boosts DRED to ceiling when jitter variance spikes >30%
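
A 64-packet sliding window such as the one described above is commonly built from a highest-seen sequence number plus a 64-bit bitmap. The sketch below shows that general technique under stated assumptions: the type name and method are hypothetical, and the real anti_replay.rs may treat wrap-around and out-of-window packets differently.

```rust
/// Sliding-window replay filter over u32 sequence numbers.
/// Bit i of `bitmap` records whether `highest - i` has been seen.
struct AntiReplay {
    highest: u32,
    bitmap: u64,
    started: bool,
}

/// Window size from the text above.
const WINDOW: u32 = 64;

impl AntiReplay {
    fn new() -> Self {
        Self { highest: 0, bitmap: 0, started: false }
    }

    /// Returns true if `seq` is fresh and records it; returns false for
    /// duplicates and for packets that fell behind the window.
    fn check_and_update(&mut self, seq: u32) -> bool {
        if !self.started {
            self.started = true;
            self.highest = seq;
            self.bitmap = 1;
            return true;
        }
        // Wrapping distance, so the u32 sequence number may roll over.
        let ahead = seq.wrapping_sub(self.highest);
        if ahead != 0 && ahead < u32::MAX / 2 {
            // Newer than anything seen so far: slide the window forward.
            self.bitmap = if ahead >= WINDOW { 0 } else { self.bitmap << ahead };
            self.bitmap |= 1;
            self.highest = seq;
            return true;
        }
        let behind = self.highest.wrapping_sub(seq);
        if behind >= WINDOW {
            return false; // too old to judge, reject
        }
        let mask = 1u64 << behind;
        if self.bitmap & mask != 0 {
            return false; // duplicate
        }
        self.bitmap |= mask;
        true
    }
}
```

Reordered packets inside the window are still accepted once, which matters on the lossy, jittery paths the protocol targets.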

Relay SFU Forwarding

graph TB
    subgraph "Room Mode (Default SFU)"
        C1[Client 1<br/>Alice] -->|"QUIC SNI=room-hash"| RM[Room Manager]
        C2[Client 2<br/>Bob] -->|"QUIC SNI=room-hash"| RM
        C3[Client 3<br/>Charlie] -->|"QUIC SNI=room-hash"| RM
        RM --> R1["Room 'podcast'"]
        R1 -->|"fan-out (skip sender)"| C1
        R1 -->|"fan-out (skip sender)"| C2
        R1 -->|"fan-out (skip sender)"| C3
    end

    subgraph "Forward Mode (--remote)"
        C4[Client] -->|QUIC| RA[Relay A]
        RA -->|"FEC decode<br/>jitter buffer<br/>FEC re-encode"| RB[Relay B<br/>--remote]
        RB -->|QUIC| C5[Client]
    end

    subgraph "Probe Mode (--probe)"
        PA[Relay A] -->|"Ping 1/s<br/>~50 bytes"| PB[Relay B]
        PB -->|Pong| PA
        PA --> PM[Prometheus<br/>RTT / Loss / Jitter]
    end

    style RM fill:#ff9f43,color:#fff
    style R1 fill:#fdcb6e
    style PM fill:#0984e3,color:#fff

SFU Fan-out Rules

  1. Each incoming datagram is forwarded to all other participants in the room
  2. The sender is excluded from fan-out (no echo)
  3. If one send fails, the relay continues to the next participant (best-effort)
  4. The relay never decodes or re-encodes audio (preserves E2E encryption)
  5. With trunking enabled, packets to the same receiver are batched into TrunkFrames (flushed every 5ms)
  6. Relay tracks per-participant quality from QualityReport trailers and broadcasts QualityDirective when the room-wide tier degrades (coordinated codec switching)
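
Rules 1 to 3 can be sketched as a small loop. This is an illustration only: participant ids are plain integers and the `send` closure stands in for the per-connection QUIC datagram send, both of which are assumptions rather than the relay's real types.

```rust
/// Forwards one datagram to every participant except the sender.
/// Returns the number of successful sends; failed sends are skipped.
fn fan_out<F>(participants: &[u32], sender: u32, datagram: &[u8], mut send: F) -> usize
where
    F: FnMut(u32, &[u8]) -> Result<(), String>,
{
    let mut delivered = 0;
    for &id in participants {
        if id == sender {
            continue; // rule 2: no echo back to the sender
        }
        // Rule 3: a failed send must not stop delivery to the others.
        if send(id, datagram).is_ok() {
            delivered += 1;
        }
    }
    delivered
}
```

The datagram is passed through as opaque bytes, which mirrors rule 4: the relay forwards ciphertext and never needs the media key.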

Federation Topology

graph TB
    subgraph "Relay A (EU)"
        A_R["Room Manager"]
        A_F["Federation<br/>Manager"]
        A1["Alice (local)"]
        A2["Bob (local)"]
    end

    subgraph "Relay B (US)"
        B_R["Room Manager"]
        B_F["Federation<br/>Manager"]
        B1["Charlie (local)"]
    end

    subgraph "Relay C (APAC)"
        C_R["Room Manager"]
        C_F["Federation<br/>Manager"]
        C1["Dave (local)"]
    end

    A1 -->|media| A_R
    A2 -->|media| A_R
    B1 -->|media| B_R
    C1 -->|media| C_R

    A_F <-->|"SNI='_federation'<br/>GlobalRoomActive<br/>media forward"| B_F
    A_F <-->|"SNI='_federation'<br/>GlobalRoomActive<br/>media forward"| C_F
    B_F <-->|"SNI='_federation'<br/>GlobalRoomActive<br/>media forward"| C_F

    A_R --> A_F
    B_R --> B_F
    C_R --> C_F

    style A_F fill:#6c5ce7,color:#fff
    style B_F fill:#6c5ce7,color:#fff
    style C_F fill:#6c5ce7,color:#fff
    style A_R fill:#ff9f43,color:#fff
    style B_R fill:#ff9f43,color:#fff
    style C_R fill:#ff9f43,color:#fff

Federation Protocol Flow

sequenceDiagram
    participant RA as Relay A
    participant RB as Relay B

    Note over RA: Startup: connect to configured peers

    RA->>RB: QUIC connect (SNI="_federation")
    RA->>RB: FederationHello { tls_fingerprint }
    RB->>RB: Verify fingerprint against [[trusted]]

    Note over RA,RB: Federation link established

    Note over RA: Alice joins global room "podcast"
    RA->>RB: GlobalRoomActive { room: "podcast" }

    Note over RB: Charlie joins global room "podcast"
    RB->>RA: GlobalRoomActive { room: "podcast" }

    Note over RA,RB: Media bridging active

    loop Every media packet in global room
        RA->>RB: [room_hash:8][encrypted_media]
        RB->>RA: [room_hash:8][encrypted_media]
    end

    Note over RA: Last local participant leaves
    RA->>RB: GlobalRoomInactive { room: "podcast" }

Wire Formats

MediaHeader v2 (16 bytes, byte-aligned)

Byte 0:    version       (u8)   0x02
Byte 1:    flags         (u8)   [T:1][Q:1][KeyFrame:1][FrameEnd:1][reserved:4]
                                T = FEC repair, Q = QualityReport trailer
                                KeyFrame = packet belongs to an I-frame (video)
                                FrameEnd = last packet of an access unit (video)
Byte 2:    media_type    (u8)   0=audio, 1=video, 2=data, 3=control
Byte 3:    codec_id      (u8)   widened from 4-bit (room for 256 codec IDs)
Byte 4:    stream_id     (u8)   simulcast layer; 0=base
Byte 5:    fec_ratio     (u8)   0..200 → 0.0..2.0
Bytes 6-9:   sequence    (u32 BE)   wrapping packet sequence number
Bytes 10-13: timestamp_ms (u32 BE)  milliseconds since session start
Bytes 14-15: fec_block_id (u16 BE)
                                audio: low 8 bits = block_id, high 8 bits = symbol_idx
                                video: full u16 block_id (large blocks for I-frames)
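
The layout above maps directly onto a fixed 16-byte encode/decode pair. The sketch below is hedged: the struct and method names are illustrative rather than the definitions in packet.rs, and the flag bit positions assume the flags are packed from the most significant bit down in the order listed.

```rust
/// MediaHeader v2 fields as laid out above (multi-byte fields are big-endian).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct MediaHeader {
    flags: u8,
    media_type: u8,
    codec_id: u8,
    stream_id: u8,
    fec_ratio: u8,
    sequence: u32,
    timestamp_ms: u32,
    fec_block_id: u16,
}

const VERSION: u8 = 0x02;
// Assumption: [T][Q][KeyFrame][FrameEnd][reserved:4] reads MSB first.
const FLAG_FEC_REPAIR: u8 = 0x80;
const FLAG_QUALITY_REPORT: u8 = 0x40;

impl MediaHeader {
    fn is_fec_repair(&self) -> bool {
        self.flags & FLAG_FEC_REPAIR != 0
    }

    fn has_quality_report(&self) -> bool {
        self.flags & FLAG_QUALITY_REPORT != 0
    }

    /// Serializes the header into exactly 16 bytes.
    fn encode(&self) -> [u8; 16] {
        let mut b = [0u8; 16];
        b[0] = VERSION;
        b[1] = self.flags;
        b[2] = self.media_type;
        b[3] = self.codec_id;
        b[4] = self.stream_id;
        b[5] = self.fec_ratio;
        b[6..10].copy_from_slice(&self.sequence.to_be_bytes());
        b[10..14].copy_from_slice(&self.timestamp_ms.to_be_bytes());
        b[14..16].copy_from_slice(&self.fec_block_id.to_be_bytes());
        b
    }

    /// Parses a header; rejects short buffers and unknown versions.
    fn decode(buf: &[u8]) -> Option<Self> {
        if buf.len() < 16 || buf[0] != VERSION {
            return None;
        }
        Some(Self {
            flags: buf[1],
            media_type: buf[2],
            codec_id: buf[3],
            stream_id: buf[4],
            fec_ratio: buf[5],
            sequence: u32::from_be_bytes([buf[6], buf[7], buf[8], buf[9]]),
            timestamp_ms: u32::from_be_bytes([buf[10], buf[11], buf[12], buf[13]]),
            fec_block_id: u16::from_be_bytes([buf[14], buf[15]]),
        })
    }
}
```

Because every field is byte-aligned, parsing needs no bit shifting beyond the flag masks, which is the main practical benefit of the v2 layout.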

CodecID Values

Audio codecs (media_type = 0)

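| Value | Codec | Bitrate | Sample Rate | Frame Duration |
|-------|-------|---------|-------------|----------------|
| 0 | Opus 24k | 24 kbps | 48 kHz | 20ms |
| 1 | Opus 16k | 16 kbps | 48 kHz | 20ms |
| 2 | Opus 6k | 6 kbps | 48 kHz | 40ms |
| 3 | Codec2 3200 | 3.2 kbps | 8 kHz | 20ms |
| 4 | Codec2 1200 | 1.2 kbps | 8 kHz | 40ms |
| 5 | ComfortNoise | 0 | 48 kHz | 20ms |
| 6 | Opus 32k | 32 kbps | 48 kHz | 20ms |
| 7 | Opus 48k | 48 kbps | 48 kHz | 20ms |
| 8 | Opus 64k | 64 kbps | 48 kHz | 20ms |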

Video codecs (media_type = 1)

| Value | Codec | Notes |
|-------|-------|-------|
| 9 | H.264 Baseline | Universal HW encode coverage |
| 10 | H.264 Main | Slight quality win over baseline |
| 11 | H.265 Main | Apple A10+, Snapdragon ~2017, NVENC GTX 9xx+; ~30% better than H.264 |
| 12 | AV1 Main | Apple M3/A17+, Snapdragon 8 Gen 3+, RTX 40+; best efficiency, narrow HW |

MiniHeader v2 (5 bytes)

[FRAME_TYPE_MINI = 0x01]
Byte 0:    seq_delta            (u8)       delta from last full header's seq
Bytes 1-2: timestamp_delta_ms   (u16 BE)
Bytes 3-4: payload_len          (u16 BE)

Used for audio only (49 of every 50 frames). Saves 11 bytes per audio packet vs the full 16B header. Full header is sent every 50th frame to resynchronize state. Video always uses full 16B headers.
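
A MiniHeader only makes sense relative to the last full header, so the receiver has to expand it. The sketch below illustrates that idea with hypothetical names. It does not model the FRAME_TYPE_MINI marker, and it assumes the timestamp delta is also measured from the last full header, which the layout above does not state explicitly.

```rust
/// MiniHeader v2 body (5 bytes), fields as listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct MiniHeader {
    seq_delta: u8,
    timestamp_delta_ms: u16,
    payload_len: u16,
}

impl MiniHeader {
    /// Serializes to 5 bytes with big-endian u16 fields.
    fn encode(&self) -> [u8; 5] {
        let ts = self.timestamp_delta_ms.to_be_bytes();
        let len = self.payload_len.to_be_bytes();
        [self.seq_delta, ts[0], ts[1], len[0], len[1]]
    }

    fn decode(b: &[u8; 5]) -> Self {
        Self {
            seq_delta: b[0],
            timestamp_delta_ms: u16::from_be_bytes([b[1], b[2]]),
            payload_len: u16::from_be_bytes([b[3], b[4]]),
        }
    }

    /// Expands the deltas against the last full header.
    /// Returns (sequence, timestamp_ms), both wrapping like the full header fields.
    fn resolve(&self, full_seq: u32, full_timestamp_ms: u32) -> (u32, u32) {
        (
            full_seq.wrapping_add(self.seq_delta as u32),
            full_timestamp_ms.wrapping_add(self.timestamp_delta_ms as u32),
        )
    }
}
```

With a full header every 50th frame, seq_delta never exceeds 49, so a single byte is sufficient.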

TrunkFrame (batched datagrams)

[count: u16]
  [session_id: 2][len: u16][payload: len]  x count

Packs multiple session packets into one QUIC datagram. A frame holds at most 10 entries and never exceeds the PMTUD-discovered MTU (starts at 1200 bytes, grows to ~1452 on Ethernet); pending entries are flushed every 5ms.
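
The batching limits can be sketched as a packer that stops at whichever bound is hit first. This is an assumption-laden illustration: the function name is hypothetical, the byte order of the u16 fields is assumed to be big-endian (the layout above does not say), and the 5ms flush timer is left out.

```rust
/// Entry limit from the text above.
const MAX_ENTRIES: usize = 10;

/// Packs (session_id, payload) pairs into one TrunkFrame:
/// [count: u16] then [session_id: 2][len: u16][payload] per entry.
/// Stops before exceeding `max_bytes` or 10 entries.
/// Returns the frame and the number of entries consumed.
fn pack_trunk(entries: &[([u8; 2], &[u8])], max_bytes: usize) -> (Vec<u8>, usize) {
    let mut frame = vec![0u8, 0u8]; // placeholder for the count field
    let mut count = 0usize;
    for (session_id, payload) in entries {
        let needed = 2 + 2 + payload.len();
        if count == MAX_ENTRIES
            || frame.len() + needed > max_bytes
            || payload.len() > u16::MAX as usize
        {
            break;
        }
        frame.extend_from_slice(session_id);
        frame.extend_from_slice(&(payload.len() as u16).to_be_bytes());
        frame.extend_from_slice(payload);
        count += 1;
    }
    frame[..2].copy_from_slice(&(count as u16).to_be_bytes());
    (frame, count)
}
```

A caller would pass the current path MTU as `max_bytes` and carry any entries that were not consumed over to the next frame.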

QualityReport (4 bytes, optional trailer)

Byte 0: loss_pct    (0-255 maps to 0-100%)
Byte 1: rtt_4ms     (0-255 maps to 0-1020ms, resolution 4ms)
Byte 2: jitter_ms   (0-255ms)
Byte 3: bitrate_cap_kbps (0-255 kbps)

Appended to a media packet when the Q flag is set in the MediaHeader.
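
The four byte-wide fields quantize the measurements, so a short sketch of the scaling may help. The names are illustrative, and the rounding and saturation choices (round to nearest for loss, clamp at 255 elsewhere) are assumptions, not confirmed behavior.

```rust
/// QualityReport trailer fields as listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct QualityReport {
    loss_pct: u8,
    rtt_4ms: u8,
    jitter_ms: u8,
    bitrate_cap_kbps: u8,
}

impl QualityReport {
    /// Quantizes raw measurements into the 4-byte trailer representation.
    fn from_measurements(loss_percent: f32, rtt_ms: u32, jitter_ms: u32, bitrate_cap_kbps: u32) -> Self {
        Self {
            // 0..100% is stretched over 0..255.
            loss_pct: (loss_percent.clamp(0.0, 100.0) / 100.0 * 255.0).round() as u8,
            // 4ms resolution, saturating at 255 (1020ms).
            rtt_4ms: (rtt_ms / 4).min(255) as u8,
            jitter_ms: jitter_ms.min(255) as u8,
            bitrate_cap_kbps: bitrate_cap_kbps.min(255) as u8,
        }
    }

    /// Loss in percent, recovered from the 0..255 scale.
    fn loss_percent(&self) -> f32 {
        self.loss_pct as f32 * 100.0 / 255.0
    }

    /// RTT in milliseconds, recovered from 4ms units.
    fn rtt_ms(&self) -> u32 {
        self.rtt_4ms as u32 * 4
    }

    fn to_bytes(&self) -> [u8; 4] {
        [self.loss_pct, self.rtt_4ms, self.jitter_ms, self.bitrate_cap_kbps]
    }
}
```

Any RTT above 1020ms reports as 1020ms, which still sits well above the 600ms downgrade threshold used by the adaptive quality system.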

Path MTU Discovery

Quinn's PLPMTUD is enabled with:

  • initial_mtu: 1200 bytes (QUIC minimum, always safe)
  • upper_bound: 1452 bytes (1500-byte Ethernet MTU minus the 40-byte IPv6 and 8-byte UDP headers)
  • interval: 300s (re-probe every 5 minutes)
  • black_hole_cooldown: 30s (faster retry on lossy links)

The discovered MTU is exposed via QuinnPathSnapshot::current_mtu and used by:

  • TrunkedForwarder: refreshes max_bytes on every send to fill larger datagrams
  • Future video framer: larger MTU = fewer application-layer fragments per frame

Continuous DRED Tuning

Instead of locking DRED duration to 3 discrete quality tiers, the DredTuner (in wzp-proto::dred_tuner) maps live path quality to a continuous DRED duration:

| Input | Source | Update Rate |
|-------|--------|-------------|
| Loss % | QuinnPathSnapshot::loss_pct (from quinn ACK frames) | Every 25 packets (~500ms) |
| RTT ms | QuinnPathSnapshot::rtt_ms (quinn congestion controller) | Every 25 packets |
| Jitter ms | PathMonitor::jitter_ms (EWMA of RTT variance) | Every 25 packets |

Mapping Logic

  • Baseline: codec-tier default (Studio=100ms, Good=200ms, Degraded=500ms)
  • Ceiling: codec-tier max (Studio=300ms, Good=500ms, Degraded=1040ms)
  • Continuous: linear interpolation between baseline and ceiling based on loss (0%->baseline, 40%->ceiling)
  • RTT phantom loss: high RTT (>200ms) adds phantom loss contribution to keep DRED generous
  • Jitter spike: >30% EWMA spike pre-emptively boosts to ceiling for ~5s cooldown

Output

DredTuning { dred_frames: u8, expected_loss_pct: u8 } -> fed to CallEncoder::apply_dred_tuning() -> OpusEncoder::set_dred_duration() + set_expected_loss()
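
The baseline-to-ceiling interpolation can be sketched as below. This is a simplified illustration, not the DredTuner itself: the function and enum are hypothetical, the RTT phantom-loss term and the ~5s cooldown are not modelled, and the result stays in milliseconds rather than being converted to dred_frames.

```rust
/// Quality tiers named in the mapping above.
#[derive(Clone, Copy)]
enum Tier {
    Studio,
    Good,
    Degraded,
}

impl Tier {
    /// (baseline_ms, ceiling_ms) taken from the lists above.
    fn dred_range_ms(self) -> (u32, u32) {
        match self {
            Tier::Studio => (100, 300),
            Tier::Good => (200, 500),
            Tier::Degraded => (500, 1040),
        }
    }
}

/// Loss percentage at which the ceiling is reached.
const CEILING_LOSS_PCT: f32 = 40.0;

/// Maps live loss to a DRED duration by linear interpolation.
/// A jitter spike jumps straight to the ceiling.
fn dred_duration_ms(tier: Tier, loss_pct: f32, jitter_spike: bool) -> u32 {
    let (baseline, ceiling) = tier.dred_range_ms();
    if jitter_spike {
        return ceiling;
    }
    // 0% loss maps to t = 0 (baseline), 40% and above to t = 1 (ceiling).
    let t = (loss_pct / CEILING_LOSS_PCT).clamp(0.0, 1.0);
    baseline + ((ceiling - baseline) as f32 * t).round() as u32
}
```

For example, the Good tier at 20% loss sits halfway between 200ms and 500ms, giving 350ms of DRED lookback.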

Signal Message Handshake Flow

sequenceDiagram
    participant C as Client
    participant R as Relay

    C->>R: QUIC Connect (SNI = hashed room name)

    alt Auth enabled (--auth-url)
        C->>R: SignalMessage::AuthToken { token }
        R->>R: POST auth_url to validate
        R-->>C: (connection closed if invalid)
    end

    C->>R: CallOffer { identity_pub, ephemeral_pub, signature, supported_profiles }
    R->>R: Verify Ed25519 signature
    R->>R: Generate ephemeral X25519
    R->>R: shared_secret = DH(eph_relay, eph_client)
    R->>R: session_key = HKDF(shared_secret, "warzone-session-key")
    R->>C: CallAnswer { identity_pub, ephemeral_pub, signature, chosen_profile }

    C->>C: Verify signature
    C->>C: Derive same session_key

    Note over C,R: Session established -- both have ChaCha20-Poly1305 key

    C->>R: RoomUpdate (join notification broadcast)

    loop Media exchange
        C->>R: QUIC Datagram (encrypted media)
        R->>C: QUIC Datagram (forwarded from others)
    end

    opt Every 65,536 packets
        C->>R: Rekey { new_ephemeral_pub, signature }
        R->>C: Rekey { new_ephemeral_pub, signature }
        Note over C,R: New session key via fresh DH
    end

    C->>R: Hangup { reason: Normal }
    R->>R: Remove from room, broadcast RoomUpdate

Relay Concurrency Model

Threading

  • Multi-threaded Tokio runtime (all available cores, work-stealing scheduler)
  • Task-per-connection: each QUIC connection gets a dedicated tokio::spawn
  • Task-per-participant-per-room: each participant's media forwarding loop is independent

Shared State & Locking

The RoomManager stores DashMap<String, Arc<RwLock<Room>>>. The DashMap guard is held only long enough to clone the Arc; all per-room operations then acquire the room-level RwLock. Concurrent fan-out calls share the read lock; join and leave acquire the write lock.

| Lock | Protected Data | Hold Duration | Contention |
|------|----------------|---------------|------------|
| DashMap<room_id, Arc<RwLock<Room>>> | Room registry | Instant (clone Arc only) | Near-zero |
| Room (RwLock) | Participants, quality tiers | ~1ms/packet (read); ~1ms (write on join/leave) | Low (concurrent reads) |
| PresenceRegistry (Mutex) | Fingerprint registrations | ~1ms | Low (join/leave only) |
| SessionManager (Mutex) | Active session tracking | ~1ms | Low |
| FederationManager.peer_links (Mutex) | Peer connections | ~10ms during forward | Per-federation-packet |
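
The two-level locking pattern (registry lookup, clone the Arc, then lock only the room) can be shown with the standard library alone. Note the substitution: this sketch uses RwLock<HashMap<..>> as a stand-in for DashMap so that it has no external dependencies, and the Room fields and method names are simplified assumptions.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Simplified room: only a participant list.
#[derive(Default)]
struct Room {
    participants: Vec<u32>,
}

/// Stand-in registry: std RwLock<HashMap> replaces DashMap here.
#[derive(Default)]
struct RoomManager {
    rooms: RwLock<HashMap<String, Arc<RwLock<Room>>>>,
}

impl RoomManager {
    /// Clones the room's Arc; the registry guard is released before returning.
    fn room(&self, name: &str) -> Arc<RwLock<Room>> {
        // The read guard is a temporary and is dropped at the end of this statement.
        let existing = self.rooms.read().unwrap().get(name).cloned();
        if let Some(room) = existing {
            return room;
        }
        let mut rooms = self.rooms.write().unwrap();
        rooms.entry(name.to_string()).or_default().clone()
    }

    /// Join takes the room-level write lock.
    fn join(&self, name: &str, id: u32) {
        let room = self.room(name);
        room.write().unwrap().participants.push(id);
    }

    /// Fan-out targets are computed under the shared room-level read lock.
    fn targets(&self, name: &str, sender: u32) -> Vec<u32> {
        let room = self.room(name);
        let guard = room.read().unwrap();
        guard.participants.iter().copied().filter(|&p| p != sender).collect()
    }
}
```

Since no registry guard is held while a room lock is taken, a slow join in one room cannot stall lookups for other rooms.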

Scaling Characteristics

  • Many small rooms: Scales well across all cores (rooms are independent)
  • Large single room (100+ participants): Fan-out reads share RwLock (non-blocking); only join/leave serializes
  • Federation: Per-peer tasks scale; peer_links lock held during send loop

Client Architecture

Desktop Engine (Tauri)

graph TB
    subgraph "Tauri Frontend (HTML/JS)"
        UI[Connect / Call UI]
        SET[Settings Panel]
    end

    subgraph "Tauri Rust Backend"
        CMD[Tauri Commands<br/>connect/disconnect/toggle]
        ENG[WzpEngine<br/>State Machine]
    end

    subgraph "Audio I/O"
        CPAL_C[CPAL Capture<br/>or VoiceProcessingIO]
        RING_C[SPSC Ring<br/>Capture]
        RING_P[SPSC Ring<br/>Playout]
        CPAL_P[CPAL Playback<br/>or VoiceProcessingIO]
    end

    subgraph "Network Tasks (tokio)"
        SEND[Send Loop<br/>encode + encrypt]
        RECV[Recv Loop<br/>decrypt + decode]
        SIG[Signal Handler<br/>room updates]
    end

    UI --> CMD
    SET --> CMD
    CMD --> ENG
    ENG --> SEND
    ENG --> RECV
    ENG --> SIG

    CPAL_C --> RING_C --> SEND
    RECV --> RING_P --> CPAL_P

    style ENG fill:#00b894,color:#fff
    style SEND fill:#0984e3,color:#fff
    style RECV fill:#0984e3,color:#fff

Key design decisions:

  • Lock-free SPSC rings between audio callbacks and network tasks (no mutex on audio thread)
  • VoiceProcessingIO on macOS for OS-level AEC (CPAL uses HalOutput which has no AEC)
  • Direct playout -- no jitter buffer on client; audio callback pulls from ring
  • Release builds required -- debug builds too slow for real-time audio

Android Engine (Kotlin + JNI)

Note (2026-05-12): The Kotlin+JNI Android app (android/app/) described below is superseded by the Tauri 2.x mobile build (desktop/src-tauri/ + crates/wzp-native/). The Tauri approach uses the same Rust call engine as desktop, with Oboe audio via wzp-native cdylib. The Kotlin codebase is maintained for reference but the Tauri build is the live production app.

graph TB
    subgraph "Compose UI"
        CALL[CallActivity]
        SET[SettingsScreen]
        VM[CallViewModel]
    end

    subgraph "Service Layer"
        SVC[CallService<br/>Foreground Service]
        PIPE[AudioPipeline<br/>AudioTrack + AudioRecord]
    end

    subgraph "Rust Engine (JNI)"
        JNI[WzpEngine.kt<br/>JNI bridge]
        NATIVE[libwzp_android.so<br/>Rust call engine]
    end

    subgraph "Android Audio"
        REC[AudioRecord<br/>+ AEC effect]
        TRK[AudioTrack<br/>low-latency]
    end

    CALL --> VM
    SET --> VM
    VM --> SVC
    SVC --> PIPE
    PIPE --> JNI
    JNI --> NATIVE

    REC --> PIPE
    PIPE --> TRK

    style NATIVE fill:#00b894,color:#fff
    style SVC fill:#ff9f43,color:#fff
    style PIPE fill:#0984e3,color:#fff

Key design decisions:

  • Foreground service keeps audio alive when the screen is off
  • AudioRecord + AudioTrack with Android's built-in AEC (AudioEffect)
  • Lock-free AudioRing with preallocated Vec (not push/pop) to avoid allocation on audio thread
  • JNI bridge marshals PCM frames between Kotlin and Rust

CLI Architecture

graph TB
    subgraph "CLI Modes"
        LIVE[--live<br/>Mic + Speaker]
        TONE[--send-tone<br/>Sine Generator]
        FILE[--send-file<br/>PCM Reader]
        ECHO[--echo-test<br/>Quality Analysis]
        DRIFT[--drift-test<br/>Clock Analysis]
        SWEEP[--sweep<br/>Buffer Sweep]
    end

    subgraph "Call Engine"
        ENCODE[CallEncoder<br/>codec + FEC]
        DECODE[CallDecoder<br/>FEC + codec]
        QA[QualityAdapter<br/>adaptive switching]
    end

    subgraph "Transport"
        QUIC[QuinnTransport<br/>send/recv media + signal]
        HS[Handshake<br/>X25519 + Ed25519]
    end

    LIVE --> ENCODE
    TONE --> ENCODE
    FILE --> ENCODE
    ENCODE --> QUIC
    QUIC --> DECODE
    ECHO --> ENCODE
    ECHO --> DECODE
    DRIFT --> ENCODE
    HS --> QUIC

    style ENCODE fill:#00b894,color:#fff
    style DECODE fill:#00b894,color:#fff
    style QUIC fill:#0984e3,color:#fff

Adaptive Quality System

graph LR
    subgraph GOOD ["GOOD (28.8 kbps)"]
        G_C[Opus 24kbps]
        G_F[FEC 20%]
        G_FR[20ms frames]
    end

    subgraph DEGRADED ["DEGRADED (9.0 kbps)"]
        D_C[Opus 6kbps]
        D_F[FEC 50%]
        D_FR[40ms frames]
    end

    subgraph CATASTROPHIC ["CATASTROPHIC (2.4 kbps)"]
        C_C[Codec2 1200bps]
        C_F[FEC 100%]
        C_FR[40ms frames]
    end

    GOOD -->|"loss>10% or RTT>400ms<br/>3 consecutive reports"| DEGRADED
    DEGRADED -->|"loss>40% or RTT>600ms<br/>3 consecutive"| CATASTROPHIC
    CATASTROPHIC -->|"loss<10% and RTT<400ms<br/>10 consecutive"| DEGRADED
    DEGRADED -->|"loss<10% and RTT<400ms<br/>10 consecutive"| GOOD

    style GOOD fill:#00b894,color:#fff
    style DEGRADED fill:#fdcb6e
    style CATASTROPHIC fill:#e17055,color:#fff

Hysteresis prevents tier flapping: fast downgrade (3 reports, or 2 on cellular) and slow upgrade (10 reports, one tier at a time).
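
The asymmetric hysteresis can be sketched as two streak counters. This is a hedged illustration with a hypothetical type name: thresholds are copied from the diagram above, a report that neither qualifies for downgrade nor upgrade is assumed to reset both streaks, and the real controller may count differently.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tier {
    Good,
    Degraded,
    Catastrophic,
}

/// Consecutive good reports required before stepping up one tier.
const UPGRADE_AFTER: u32 = 10;

struct TierHysteresis {
    tier: Tier,
    bad_streak: u32,
    good_streak: u32,
    downgrade_after: u32,
}

impl TierHysteresis {
    /// Cellular links downgrade after 2 bad reports instead of 3.
    fn new(cellular: bool) -> Self {
        Self {
            tier: Tier::Good,
            bad_streak: 0,
            good_streak: 0,
            downgrade_after: if cellular { 2 } else { 3 },
        }
    }

    /// Feeds one quality report and returns the tier in effect afterwards.
    fn on_report(&mut self, loss_pct: f32, rtt_ms: u32) -> Tier {
        let should_downgrade = match self.tier {
            Tier::Good => loss_pct > 10.0 || rtt_ms > 400,
            Tier::Degraded => loss_pct > 40.0 || rtt_ms > 600,
            Tier::Catastrophic => false,
        };
        let may_upgrade = self.tier != Tier::Good && loss_pct < 10.0 && rtt_ms < 400;

        if should_downgrade {
            self.good_streak = 0;
            self.bad_streak += 1;
            if self.bad_streak >= self.downgrade_after {
                self.tier = match self.tier {
                    Tier::Good => Tier::Degraded,
                    _ => Tier::Catastrophic,
                };
                self.bad_streak = 0;
            }
        } else if may_upgrade {
            self.bad_streak = 0;
            self.good_streak += 1;
            if self.good_streak >= UPGRADE_AFTER {
                // Upgrades move one tier at a time.
                self.tier = match self.tier {
                    Tier::Catastrophic => Tier::Degraded,
                    _ => Tier::Good,
                };
                self.good_streak = 0;
            }
        } else {
            self.bad_streak = 0;
            self.good_streak = 0;
        }
        self.tier
    }
}
```

With 10 reports required per upward step, recovering from CATASTROPHIC to GOOD takes at least 20 consecutive clean reports, while a collapse takes as few as 6 (or 4 on cellular).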

Cryptographic Handshake

sequenceDiagram
    participant C as Caller
    participant R as Relay / Callee

    Note over C: Derive identity from seed<br/>Ed25519 + X25519 via HKDF

    C->>C: Generate ephemeral X25519
    C->>C: Sign(ephemeral_pub || "call-offer")
    C->>R: CallOffer { identity_pub, ephemeral_pub, signature, profiles }

    R->>R: Verify Ed25519 signature
    R->>R: Generate ephemeral X25519
    R->>R: shared_secret = DH(eph_b, eph_a)
    R->>R: session_key = HKDF(shared_secret, "warzone-session-key")
    R->>R: Sign(ephemeral_pub || "call-answer")
    R->>C: CallAnswer { identity_pub, ephemeral_pub, signature, profile }

    C->>C: Verify signature
    C->>C: shared_secret = DH(eph_a, eph_b)
    C->>C: session_key = HKDF(shared_secret, "warzone-session-key")

    Note over C,R: Both have identical ChaCha20-Poly1305 session key
    C->>R: Encrypted media (QUIC datagrams)
    R->>C: Encrypted media (QUIC datagrams)

    Note over C,R: Rekey every 65,536 packets<br/>New ephemeral DH + HKDF mix

Identity Model

graph TD
    SEED["32-byte Seed<br/>(BIP39 Mnemonic: 24 words)"] --> HKDF1["HKDF<br/>salt=None<br/>info='warzone-ed25519'"]
    SEED --> HKDF2["HKDF<br/>salt=None<br/>info='warzone-x25519'"]

    HKDF1 --> ED["Ed25519 SigningKey<br/>Digital Signatures"]
    HKDF2 --> X25519["X25519 StaticSecret<br/>Key Agreement"]

    ED --> VKEY["Ed25519 VerifyingKey<br/>(Public)"]
    X25519 --> XPUB["X25519 PublicKey<br/>(Public)"]

    VKEY --> FP["Fingerprint<br/>SHA-256(pubkey) truncated 16 bytes<br/>xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx"]

    style SEED fill:#6c5ce7,color:#fff
    style FP fill:#fd79a8,color:#fff
    style ED fill:#ee5a24,color:#fff
    style X25519 fill:#00b894,color:#fff

Adaptive Jitter Buffer

graph TD
    PKT[Incoming Packet] --> SEQ{Sequence Check}
    SEQ -->|Duplicate| DROP[Drop + AntiReplay]
    SEQ -->|Valid| BUF["BTreeMap Buffer<br/>(ordered by seq)"]

    BUF --> ADAPT["AdaptivePlayoutDelay<br/>(EMA jitter tracking)"]
    ADAPT --> TARGET["target_delay =<br/>ceil(jitter_ema / 20ms) + 2"]

    BUF --> READY{"depth >= target?"}
    READY -->|No| WAIT["Wait (Underrun++)"]
    READY -->|Yes| POP[Pop lowest seq]
    POP --> DECODE[Decode to PCM]
    DECODE --> PLAY[Playout]

    BUF --> OVERFLOW{"depth > max?"}
    OVERFLOW -->|Yes| EVICT["Drop oldest (Overrun++)"]

    style ADAPT fill:#fdcb6e
    style DROP fill:#e17055,color:#fff
    style EVICT fill:#e17055,color:#fff
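
The target formula in the diagram, target_delay = ceil(jitter_ema / 20ms) + 2, can be sketched as follows. The type name is hypothetical and the EMA smoothing factor is an assumed value chosen only for illustration; the diagram does not specify it.

```rust
/// Frame duration used by the formula in the diagram.
const FRAME_MS: f32 = 20.0;
/// Fixed safety margin in frames, also from the diagram.
const SAFETY_FRAMES: u32 = 2;
/// Assumed EMA smoothing factor (not given in the source).
const ALPHA: f32 = 0.125;

/// Tracks an exponential moving average of observed jitter.
struct PlayoutDelaySketch {
    jitter_ema_ms: f32,
}

impl PlayoutDelaySketch {
    /// Folds one jitter sample (in ms) into the moving average.
    fn observe(&mut self, jitter_sample_ms: f32) {
        self.jitter_ema_ms += ALPHA * (jitter_sample_ms - self.jitter_ema_ms);
    }

    /// Target buffer depth in frames: ceil(jitter_ema / 20ms) + 2.
    fn target_delay_frames(&self) -> u32 {
        (self.jitter_ema_ms / FRAME_MS).ceil() as u32 + SAFETY_FRAMES
    }
}
```

For instance, a smoothed jitter of 45ms rounds up to 3 frames and yields a target depth of 5 frames (100ms).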

FEC Protection (RaptorQ)

graph LR
    subgraph "Encoder"
        F1[Frame 1] --> BLK["Source Block<br/>(5-10 frames)"]
        F2[Frame 2] --> BLK
        F3[Frame 3] --> BLK
        F4[Frame 4] --> BLK
        F5[Frame 5] --> BLK
        BLK --> SRC[5 Source Symbols]
        BLK --> REP["1-10 Repair Symbols<br/>(ratio dependent)"]
        SRC --> INT["Interleaver<br/>(depth=3)"]
        REP --> INT
    end

    subgraph "Network"
        INT --> LOSS{Packet Loss}
        LOSS -->|some lost| RCV[Received Symbols]
    end

    subgraph "Decoder"
        RCV --> DEINT[De-interleaver]
        DEINT --> RAPTORQ["RaptorQ Decoder<br/>Reconstruct from<br/>any K of K+R symbols"]
        RAPTORQ --> OUT[Original Frames]
    end

    style LOSS fill:#e17055,color:#fff
    style RAPTORQ fill:#00b894,color:#fff
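
The relationship between the header's fec_ratio byte and the number of repair symbols in the diagram can be sketched with a few helper functions. These names are hypothetical and rounding up is an assumption; the sketch also takes the document's "any K of K+R" rule at face value, whereas RaptorQ decoding from exactly K symbols succeeds with high probability rather than certainty.

```rust
/// Decodes the header's fec_ratio byte (0..200) to a ratio of 0.0..2.0.
fn fec_ratio_from_wire(byte: u8) -> f32 {
    byte.min(200) as f32 / 100.0
}

/// Repair symbols for a block of `source_symbols`, rounded up (assumption).
fn repair_symbols(source_symbols: u32, fec_ratio_byte: u8) -> u32 {
    // Integer form of ceil(source * ratio) avoids float rounding issues.
    (source_symbols * fec_ratio_byte.min(200) as u32 + 99) / 100
}

/// Recoverability rule from the diagram: at least K of the K+R symbols arrived.
fn block_recoverable(source_symbols: u32, received: u32) -> bool {
    received >= source_symbols
}
```

For a 5-symbol block this gives 1 repair symbol at a ratio of 20% and 10 at the maximum ratio of 200%, matching the 1-10 range shown in the diagram.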

Telemetry Stack

graph TB
    subgraph "Relay"
        RM["RelayMetrics<br/>sessions, rooms, packets"]
        SM["SessionMetrics<br/>per-session jitter, loss, RTT"]
        PM["ProbeMetrics<br/>inter-relay RTT, loss"]
        RM --> PROM1["GET /metrics :9090"]
        SM --> PROM1
        PM --> PROM1
    end

    subgraph "Web Bridge"
        WM["WebMetrics<br/>connections, frames, latency"]
        WM --> PROM2["GET /metrics :8080"]
    end

    subgraph "Client"
        CM["JitterStats + QualityAdapter"]
        CM --> JSONL["--metrics-file<br/>JSONL 1 line/sec"]
    end

    PROM1 --> GRAF["Grafana Dashboard<br/>4 rows, 18 panels"]
    PROM2 --> GRAF
    JSONL --> ANALYSIS[Offline Analysis]

    style GRAF fill:#ff6b6b,color:#fff
    style PROM1 fill:#0984e3,color:#fff
    style PROM2 fill:#0984e3,color:#fff

Deployment Topology

graph TB
    subgraph "Region A"
        RA["wzp-relay A<br/>:4433 UDP"]
        WA["wzp-web A<br/>:8080 HTTPS"]
        WA --> RA
    end

    subgraph "Region B"
        RB["wzp-relay B<br/>:4433 UDP"]
        WB["wzp-web B<br/>:8080 HTTPS"]
        WB --> RB
    end

    RA <-->|"Probe 1/s + Federation"| RB

    BA[Browser A] -->|WSS| WA
    BB[Browser B] -->|WSS| WB
    CA[CLI Client] -->|QUIC| RA
    DA[Desktop Client] -->|QUIC| RA
    MA[Android Client] -->|QUIC| RB

    PROM[Prometheus] -->|scrape| RA
    PROM -->|scrape| RB
    PROM -->|scrape| WA
    PROM --> GRAF[Grafana]

    FC[featherChat Server] -->|auth validate| RA
    FC -->|auth validate| RB

    style RA fill:#ff9f43,color:#fff
    style RB fill:#ff9f43,color:#fff
    style GRAF fill:#ff6b6b,color:#fff
    style FC fill:#fd79a8,color:#fff

Session State Machine

stateDiagram-v2
    [*] --> Idle
    Idle --> Connecting: connect()
    Connecting --> Handshaking: QUIC established
    Handshaking --> Active: CallOffer/Answer complete
    Active --> Rekeying: 65,536 packets
    Rekeying --> Active: new key derived
    Active --> Closed: Hangup / Error / Timeout
    Rekeying --> Closed: Error
    Connecting --> Closed: Timeout
    Handshaking --> Closed: Signature fail

    note right of Active: Media flows (encrypted)
    note right of Rekeying: Media continues while rekeying
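
The transitions in the state diagram can be written as a single match over (state, event). This is a sketch only: the Event enum and the function name are hypothetical, and the actual SessionState type in session.rs may be shaped differently.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SessionState {
    Idle,
    Connecting,
    Handshaking,
    Active,
    Rekeying,
    Closed,
}

/// Hypothetical event names derived from the edge labels in the diagram.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Event {
    Connect,
    QuicEstablished,
    HandshakeComplete,
    RekeyDue, // 65,536 packets sent under the current key
    KeyDerived,
    Hangup,
    Error,
    Timeout,
    SignatureFail,
}

/// Returns the next state, or None if the diagram has no such edge.
fn next_state(state: SessionState, event: Event) -> Option<SessionState> {
    use SessionState::*;
    match (state, event) {
        (Idle, Event::Connect) => Some(Connecting),
        (Connecting, Event::QuicEstablished) => Some(Handshaking),
        (Connecting, Event::Timeout) => Some(Closed),
        (Handshaking, Event::HandshakeComplete) => Some(Active),
        (Handshaking, Event::SignatureFail) => Some(Closed),
        (Active, Event::RekeyDue) => Some(Rekeying),
        (Active, Event::Hangup | Event::Error | Event::Timeout) => Some(Closed),
        (Rekeying, Event::KeyDerived) => Some(Active),
        (Rekeying, Event::Error) => Some(Closed),
        _ => None,
    }
}
```

Returning None for unlisted pairs makes illegal transitions, such as reconnecting from Closed, explicit at the call site.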

Project Structure

warzonePhone/
├── Cargo.toml                    # Workspace root
├── crates/
│   ├── wzp-proto/                # Protocol types, traits, wire format
│   │   └── src/
│   │       ├── codec_id.rs       # CodecId, QualityProfile
│   │       ├── error.rs          # Error types
│   │       ├── jitter.rs         # JitterBuffer, AdaptivePlayoutDelay
│   │       ├── packet.rs         # MediaHeader, MiniHeader, TrunkFrame, SignalMessage
│   │       ├── quality.rs        # Tier, AdaptiveQualityController
│   │       ├── session.rs        # SessionState machine
│   │       └── traits.rs         # AudioEncoder, FecEncoder, CryptoSession, etc.
│   ├── wzp-codec/                # Audio codecs
│   │   └── src/
│   │       ├── adaptive.rs       # AdaptiveEncoder/Decoder (Opus + Codec2)
│   │       ├── denoise.rs        # NoiseSuppressor (RNNoise / nnnoiseless)
│   │       └── silence.rs        # SilenceDetector, ComfortNoise
│   ├── wzp-fec/                  # Forward error correction
│   │   └── src/
│   │       ├── encoder.rs        # RaptorQFecEncoder
│   │       ├── decoder.rs        # RaptorQFecDecoder
│   │       └── interleave.rs     # Interleaver (burst protection)
│   ├── wzp-crypto/               # Cryptography + identity
│   │   └── src/
│   │       ├── identity.rs       # Seed, Fingerprint, hash_room_name
│   │       ├── handshake.rs      # WarzoneKeyExchange (X25519 + Ed25519)
│   │       ├── session.rs        # ChaChaSession (ChaCha20-Poly1305)
│   │       ├── nonce.rs          # Deterministic nonce construction
│   │       ├── anti_replay.rs    # Sliding window replay protection
│   │       └── rekey.rs          # Forward secrecy rekeying
│   ├── wzp-transport/            # QUIC transport layer
│   │   └── src/lib.rs            # QuinnTransport, send/recv media/signal/trunk
│   ├── wzp-video/                # Video codecs + framer
│   │   └── src/
│   │       ├── factory.rs        # VideoEncoder factory (platform dispatch)
│   │       ├── framer.rs         # NAL fragmentation (H.264/H.265)
│   │       ├── depacketizer.rs   # NAL reassembly, access unit emit
│   │       ├── controller.rs     # VideoQualityController
│   │       ├── simulcast.rs      # Simulcast layer management
│   │       ├── encoder_mode.rs   # Encoder mode selection
│   │       ├── av1_obu.rs        # AV1 OBU framing + depacketizer
│   │       ├── dav1d.rs          # dav1d AV1 software decoder
│   │       ├── svt_av1.rs        # SVT-AV1 software encoder (non-Android)
│   │       ├── videotoolbox.rs   # VideoToolbox H.265 + AV1 (macOS)
│   │       ├── mediacodec.rs     # MediaCodec H.264/H.265/AV1 (Android, NDK 0.9 migration pending)
│   │       └── nack.rs           # NACK sender/receiver framework
│   ├── wzp-relay/                # Relay daemon
│   │   └── src/
│   │       ├── main.rs           # CLI, connection loop, auth + handshake
│   │       ├── config.rs         # RelayConfig, TOML parsing
│   │       ├── room.rs           # RoomManager, TrunkedForwarder
│   │       ├── pipeline.rs       # RelayPipeline (forward mode)
│   │       ├── session_mgr.rs    # SessionManager (limits, lifecycle)
│   │       ├── auth.rs           # featherChat token validation
│   │       ├── handshake.rs      # Relay-side accept_handshake
│   │       ├── metrics.rs        # Prometheus RelayMetrics + per-session
│   │       ├── probe.rs          # Inter-relay probes + ProbeMesh
│   │       ├── federation.rs     # FederationManager, global rooms
│   │       ├── presence.rs       # PresenceRegistry
│   │       ├── route.rs          # RouteResolver
│   │       ├── trunk.rs          # TrunkBatcher
│   │       ├── audio_scorer.rs   # Per-stream audio quality scoring
│   │       ├── response_policy.rs # Relay response policy (rate-limit, drop)
│   │       ├── verdict.rs        # Verdict enum (Allow/RateLimit/Drop/Malicious)
│   │       ├── video_scorer.rs   # VideoScorer (legitimacy scoring, keyframe regularity)
│   │       └── ws.rs             # WebSocket handler for browser clients
│   ├── wzp-client/               # Call engine + CLI
│   │   └── src/
│   │       ├── cli.rs            # CLI arg parsing + main
│   │       ├── call.rs           # CallEncoder, CallDecoder, QualityAdapter
│   │       ├── handshake.rs      # Client-side perform_handshake
│   │       ├── featherchat.rs    # CallSignal bridge
│   │       ├── echo_test.rs      # Automated echo quality test
│   │       ├── drift_test.rs     # Clock drift measurement
│   │       ├── sweep.rs          # Jitter buffer parameter sweep
│   │       ├── metrics.rs        # JSONL telemetry writer
│   │       └── bench.rs          # Component benchmarks
│   └── wzp-web/                  # Browser bridge
│       ├── src/
│       │   ├── main.rs           # Axum server, WS handler, TLS
│       │   └── metrics.rs        # Prometheus WebMetrics
│       └── static/
│           ├── index.html        # SPA UI (room, PTT, level meter)
│           └── audio-processor.js # AudioWorklet (capture + playback)
├── android/                      # Android app (Kotlin + JNI)
│   └── app/src/main/java/com/wzp/
│       ├── audio/                # AudioPipeline, AudioRouteManager
│       ├── engine/               # WzpEngine (JNI), CallStats, WzpCallback
│       ├── ui/                   # CallActivity, SettingsScreen, Identicon
│       ├── data/                 # SettingsRepository
│       ├── net/                  # RelayPinger
│       ├── service/              # CallService (foreground)
│       └── debug/                # DebugReporter
├── desktop/                      # Desktop app (Tauri)
│   └── dist/                     # Built frontend (HTML/JS/CSS)
├── deps/featherchat/             # Git submodule
├── docs/                         # Documentation
├── scripts/                      # Build scripts
│   └── build-linux.sh            # Hetzner VM build
└── tools/                        # Development tools

Test Coverage

702 tests across all crates (excluding wzp-android), 0 failures:

| Crate | Tests | Key Coverage |
|---|---|---|
| wzp-proto | 112 | Wire format, jitter buffer, quality tiers, mini-frames, trunking |
| wzp-codec | 69 | Opus/Codec2 roundtrip, silence detection, noise suppression |
| wzp-fec | 21 | RaptorQ encode/decode, loss recovery, interleaving |
| wzp-crypto | 64 | Encrypt/decrypt, handshake, anti-replay, featherChat identity |
| wzp-transport | 11 | QUIC connection setup, path monitoring |
| wzp-relay | 137 | Room ACL, session mgmt, metrics, probes, mesh, trunking, scoring, verdict |
| wzp-video | 88 | NAL framing, AV1 OBU, simulcast, quality controller, NACK |
| wzp-client | 170 | Encoder/decoder, quality adapter, silence, drift, sweep |
| wzp-web | 2 | Metrics |
| wzp-native | 0 | Native platform bindings (no unit tests) |

Audio Backend Architecture (Platform Matrix)

WarzonePhone's audio I/O goes through one of four backends depending on the target platform and feature flags. All backends expose the same public API (AudioCapture::start() → AudioCapture { ring(), stop() }) via conditional re-exports in crates/wzp-client/src/lib.rs, so the CallEngine above the audio layer doesn't know or care which backend is running.

            ┌─────────────────────────────────────────────┐
            │         CallEngine (platform-agnostic)       │
            │    reads PCM from AudioCapture::ring()       │
            │    writes PCM to   AudioPlayback::ring()     │
            └────────────────────┬────────────────────────┘
                                 │
           ┌─────────────────────┼─────────────────────┐
           │                     │                     │
           ▼                     ▼                     ▼
   ┌───────────────┐    ┌────────────────┐    ┌───────────────┐
   │   audio_io    │    │  audio_vpio    │    │ audio_wasapi  │
   │   (CPAL)      │    │ (Core Audio    │    │   (Windows    │
   │               │    │  VoiceProc IO) │    │  IAudioClient2│
   │ All platforms │    │   macOS only   │    │   Windows     │
   │  (baseline)   │    │   feature=vpio │    │ feature=      │
   │               │    │                │    │  windows-aec  │
   └───────────────┘    └────────────────┘    └───────────────┘
                                                       │
                                                       ▼ on Android only
                                               ┌───────────────┐
                                               │  wzp-native   │
                                               │ (Oboe bridge  │
                                               │  via dlopen)  │
                                               │               │
                                               │ Android only  │
                                               │  libloading   │
                                               └───────────────┘

Backend selection matrix

| Platform | Capture | Playback | OS AEC | Feature flags |
|---|---|---|---|---|
| macOS | VoiceProcessingIO (native Core Audio) | CPAL | Yes: Apple's hardware-accelerated AEC (same AEC as FaceTime, iMessage audio, Voice Memos) | audio, vpio |
| Windows (AEC build) | Direct WASAPI with AudioCategory_Communications | CPAL | Yes: Windows routes the capture stream through the driver's communications APO chain (AEC + NS + AGC), driver-dependent quality | audio, windows-aec |
| Windows (baseline) | CPAL (WASAPI shared mode) | CPAL | No | audio |
| Linux | CPAL (ALSA / PulseAudio) | CPAL | No | audio |
| Android (Tauri Mobile) | Oboe via wzp-native cdylib, Usage::VoiceCommunication + MODE_IN_COMMUNICATION | Same Oboe stream | Depends on device (some Android devices apply AEC to the voice-communication stream, most do not) | none (wzp-client compiled with default-features = false) |

Why wzp-native is a standalone cdylib

On Android, the audio backend lives in a separate cdylib crate (crates/wzp-native) that wzp-desktop's lib crate loads at runtime via libloading. It is not linked as a regular Rust dep.

This is deliberate. rust-lang/rust#104707 documents that a crate with crate-type = ["cdylib", "staticlib"] leaks non-exported symbols from the staticlib into the cdylib. On Android, that caused Bionic's private __init_tcb / pthread_create symbols to be bound LOCALLY inside our .so instead of resolved dynamically against libc.so at dlopen time — which crashed the app at launch as soon as tao tried to std::thread::spawn() from the JNI onCreate callback.

Keeping wzp-native in its own cdylib and loading it via libloading means:

  1. The app's own .so has crate-type = ["cdylib", "rlib"] only — no staticlib, no symbol leak.
  2. libwzp_native.so is loaded via System.loadLibrary from the JVM side (or dlopen from Rust), which triggers the normal Bionic resolver and binds all private symbols against libc.so at load time.
  3. The C/C++ Oboe bridge is fully isolated inside libwzp_native.so's symbol space — no chance of its archives leaking into wzp-desktop's .so.

See docs/BRANCH-android-rewrite.md for the full incident postmortem and docs/incident-tauri-android-init-tcb.md for the debug log.

Vendored audiopus_sys for libopus / clang-cl cross-compile

The workspace root carries a vendored copy of audiopus_sys at vendor/audiopus_sys/ with a patched opus/CMakeLists.txt. This is needed because libopus 1.3.1 gates its per-file -msse4.1 / -mssse3 COMPILE_FLAGS behind if(NOT MSVC), and under clang-cl (used by cargo-xwin for Windows cross-compiles) CMake sets MSVC=1 unconditionally — so the SIMD source files compile without the required target feature and fail to link the intrinsic always_inline functions.

The patch introduces an MSVC_CL variable that is true only for real cl.exe (distinguished via CMAKE_C_COMPILER_ID STREQUAL "MSVC"), and flips the eight if(NOT MSVC) SIMD guards to if(NOT MSVC_CL) so clang-cl gets the GCC-style per-file flags. Wired in via [patch.crates-io] audiopus_sys = { path = "vendor/audiopus_sys" } at the workspace root.

This does not affect macOS or Linux builds — on those platforms MSVC=0 everywhere so the patched logic behaves identically to upstream.

Upstream tracking: xiph/opus#256, xiph/opus PR #257 (both stale).

Network Awareness (Android)

The adaptive quality controller (AdaptiveQualityController in wzp-proto) supports proactive network-aware adaptation via signal_network_change(NetworkContext). On Android, this is fed by NetworkMonitor.kt which wraps ConnectivityManager.NetworkCallback.

ConnectivityManager
       │ onCapabilitiesChanged / onLost
       ▼
NetworkMonitor.kt  ──classify──►  type: Int (WiFi=0, LTE=1, 5G=2, 3G=3)
       │ onNetworkChanged(type, bw)
       ▼
CallViewModel  ──►  WzpEngine.onNetworkChanged()
                        │ JNI
                        ▼
                    jni_bridge.rs
                        │
                        ▼
                    EngineState.pending_network_type  (AtomicU8, lock-free)
                        │ polled every ~20ms
                        ▼
                    recv task: quality_ctrl.signal_network_change(ctx)
                        │
                        ├─ WiFi → Cellular: preemptive 1-tier downgrade
                        ├─ Any change: 10s FEC boost (+0.2 ratio)
                        └─ Cellular: faster downgrade thresholds (2 vs 3)

Cellular generation is approximated from getLinkDownstreamBandwidthKbps() to avoid requiring READ_PHONE_STATE permission.
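The lock-free handoff between the JNI thread and the recv task can be sketched as a single-slot mailbox. This is a minimal illustration, not the actual contents of jni_bridge.rs: the `PendingNetwork` wrapper, the `NO_CHANGE` sentinel, and the `publish` / `take` names are assumptions made for the example; only the `AtomicU8` storage and the integer codes come from the diagram above.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

/// Sentinel meaning "no network change pending" (hypothetical value).
const NO_CHANGE: u8 = u8::MAX;

/// Network classes, using the integer codes shown in the diagram.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NetworkType {
    WiFi = 0,
    Lte = 1,
    FiveG = 2,
    ThreeG = 3,
}

impl NetworkType {
    fn from_code(code: u8) -> Option<Self> {
        match code {
            0 => Some(Self::WiFi),
            1 => Some(Self::Lte),
            2 => Some(Self::FiveG),
            3 => Some(Self::ThreeG),
            _ => None, // includes the NO_CHANGE sentinel
        }
    }
}

/// Single-slot mailbox: the JNI side writes, the recv task polls.
pub struct PendingNetwork(AtomicU8);

impl PendingNetwork {
    pub const fn new() -> Self {
        Self(AtomicU8::new(NO_CHANGE))
    }

    /// Called from the JNI thread. A newer change overwrites an unread one,
    /// so the poller only ever sees the latest network type.
    pub fn publish(&self, ty: NetworkType) {
        self.0.store(ty as u8, Ordering::Release);
    }

    /// Called from the poll loop. Swapping the sentinel back in means each
    /// change is returned at most once.
    pub fn take(&self) -> Option<NetworkType> {
        NetworkType::from_code(self.0.swap(NO_CHANGE, Ordering::AcqRel))
    }
}
```

Because the slot holds only the most recent value, a burst of callbacks between two polls collapses into one adaptation step, which is usually what a quality controller wants.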

Audio Routing (Android)

Both Android app variants support 3-way audio routing: Earpiece → Speaker → Bluetooth SCO.

Audio Mode Lifecycle

MODE_IN_COMMUNICATION is set by the Rust call engine (via JNI AudioManager.setMode()) right before Oboe streams open — NOT at app launch. Restored to MODE_NORMAL when the call ends. This prevents hijacking system audio routing (music, BT A2DP) before a call is active.

Native Kotlin App

AudioRouteManager.kt handles device detection (via AudioDeviceCallback), SCO lifecycle, and auto-fallback on BT disconnect. CallViewModel.cycleAudioRoute() cycles through available routes.

Tauri Desktop App

android_audio.rs provides JNI bridges to AudioManager for speakerphone and Bluetooth SCO control. After each route change, Oboe streams are stopped and restarted via spawn_blocking.

User tap ──► cycleAudioRoute()
                │
                ├─ Earpiece: setSpeakerphoneOn(false) + clearCommunicationDevice()
                ├─ Speaker:  setSpeakerphoneOn(true)
                └─ BT SCO:   setCommunicationDevice(bt_device)  [API 31+]
                │              fallback: startBluetoothSco()     [API < 31]
                ▼
            Oboe stop + start_bt() for BT / start() for others

BT SCO and Oboe

BT SCO only supports 8/16kHz. When bt_active=1, Oboe capture skips setSampleRate(48000) and setInputPreset(VoiceCommunication), letting the system choose the native BT rate. Oboe's SampleRateConversionQuality::Best bridges to our 48kHz ring buffers. Playout uses Usage::Media in BT mode to avoid conflicts with the communication device routing.

Hangup Signal Fix

SignalMessage::Hangup now carries an optional call_id field. The relay uses it to end only the specific call instead of broadcasting to all active calls for the user — preventing a race where a hangup for call 1 kills a newly-placed call 2.
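A minimal sketch of the scoped hangup, assuming a relay-side table keyed by call ID. The `ActiveCalls` type and its methods are hypothetical, and treating a missing call_id as "end every call for the user" (the old behaviour, kept for older clients) is an assumption of this example rather than something the relay code is confirmed to do.

```rust
use std::collections::HashMap;

/// Hypothetical relay-side table of active calls: call_id -> user id.
#[derive(Default)]
pub struct ActiveCalls {
    calls: HashMap<String, String>,
}

impl ActiveCalls {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn insert(&mut self, call_id: &str, user: &str) {
        self.calls.insert(call_id.to_string(), user.to_string());
    }

    /// Ends calls for `user` and returns the IDs that were ended (sorted).
    /// With `Some(id)` only that call is ended; with `None` every call for
    /// the user is ended, mirroring the pre-fix behaviour.
    pub fn hangup(&mut self, user: &str, call_id: Option<&str>) -> Vec<String> {
        let mut ended: Vec<String> = self
            .calls
            .iter()
            .filter(|(id, owner)| {
                owner.as_str() == user && call_id.map_or(true, |want| want == id.as_str())
            })
            .map(|(id, _)| id.clone())
            .collect();
        ended.sort();
        for id in &ended {
            self.calls.remove(id);
        }
        ended
    }
}
```

With this shape, a late hangup for call 1 that arrives after call 2 has been placed matches only call 1 and leaves the new call untouched.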

Phase 8: Tailscale-Inspired NAT Traversal (2026-04-14)

Five new modules in wzp-client bring NAT traversal capability close to Tailscale's approach:

┌──────────────────────────────────────────────────────────────────────┐
│  wzp-client NAT Traversal Stack                                      │
│                                                                      │
│  ┌─────────────┐  ┌──────────────┐  ┌──────────────────────────┐    │
│  │  stun.rs    │  │  portmap.rs  │  │  reflect.rs (existing)   │    │
│  │  RFC 5389   │  │  NAT-PMP     │  │  Relay-based STUN        │    │
│  │  Public     │  │  PCP         │  │  Multi-relay NAT detect  │    │
│  │  STUN       │  │  UPnP IGD   │  │                          │    │
│  └──────┬──────┘  └──────┬───────┘  └────────────┬─────────────┘    │
│         │                │                        │                  │
│         └────────────────┼────────────────────────┘                  │
│                          │                                           │
│                  ┌───────▼────────┐                                  │
│                  │  ice_agent.rs  │                                  │
│                  │  Gather / Re-  │                                  │
│                  │  gather / Apply│                                  │
│                  └───────┬────────┘                                  │
│                          │                                           │
│              ┌───────────┼───────────┐                               │
│              │           │           │                               │
│      ┌───────▼───┐  ┌───▼───┐  ┌───▼──────────┐                    │
│      │ netcheck  │  │ dual_ │  │ relay_map.rs │                    │
│      │ .rs       │  │ path  │  │ RTT-sorted   │                    │
│      │ Diagnostic│  │ .rs   │  │ relay list   │                    │
│      └───────────┘  │ Race  │  └──────────────┘                    │
│                     └───────┘                                       │
└──────────────────────────────────────────────────────────────────────┘

Candidate Types

| Type | Source | Priority | When Used |
|---|---|---|---|
| Host | local_host_candidates() | 1 (highest) | Same-LAN peers |
| Port-mapped | portmap::acquire_port_mapping() | 2 | Router supports NAT-PMP/PCP/UPnP |
| Server-reflexive | stun::discover_reflexive() or relay Reflect | 3 | Cone NAT |
| Relay | Relay address (fallback) | 4 (lowest) | Always available |
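The priority order in the table can be expressed as an ordered enum. This is an illustrative sketch only: `CandidateKind`, `Candidate`, and `order_candidates` are hypothetical names, not the types used in ice_agent.rs.

```rust
use std::net::SocketAddr;

/// Candidate kinds from the table; a lower discriminant is tried first.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum CandidateKind {
    Host = 1,
    PortMapped = 2,
    ServerReflexive = 3,
    Relay = 4,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Candidate {
    pub kind: CandidateKind,
    pub addr: SocketAddr,
}

/// Sorts candidates so the most direct path comes first. The sort is
/// stable, so candidates of the same kind keep their gathering order.
pub fn order_candidates(mut candidates: Vec<Candidate>) -> Vec<Candidate> {
    candidates.sort_by_key(|c| c.kind);
    candidates
}
```

Keeping the relay candidate in the list at the lowest priority means a connection attempt always has a fallback, matching the "Always available" row.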

Signal Flow for Mid-Call Re-Gathering

Network change (WiFi → cellular)
    │
    ▼
IceAgent::re_gather()
    ├── stun::discover_reflexive()
    ├── portmap::acquire_port_mapping()
    └── local_host_candidates()
    │
    ▼
SignalMessage::CandidateUpdate { generation: N+1, ... }
    │
    ▼ (via relay)
Peer's IceAgent::apply_peer_update()
    │
    ▼
PeerCandidates { reflexive, local, mapped }
    │
    ▼
dual_path::race() with new candidates (TODO: transport hot-swap)
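One way the generation counter in CandidateUpdate could be used on the receiving side is sketched below. The `PeerCandidates` field names follow the flow above, but the field types, the `PeerState` wrapper, and the rule that an update with a non-increasing generation is dropped are assumptions of this example.

```rust
use std::net::SocketAddr;

/// Peer candidate set; field names follow the flow above, types are assumed.
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct PeerCandidates {
    pub reflexive: Option<SocketAddr>,
    pub local: Vec<SocketAddr>,
    pub mapped: Option<SocketAddr>,
}

/// Hypothetical holder for the latest candidate set seen from the peer.
#[derive(Debug, Default)]
pub struct PeerState {
    generation: u32,
    candidates: PeerCandidates,
}

impl PeerState {
    /// Applies an update only if it is newer than what is already stored.
    /// Returns true when the candidates changed and paths should be re-raced.
    pub fn apply_peer_update(&mut self, generation: u32, update: PeerCandidates) -> bool {
        if generation <= self.generation {
            return false; // stale or duplicate update delivered late via the relay
        }
        self.generation = generation;
        self.candidates = update;
        true
    }

    pub fn candidates(&self) -> &PeerCandidates {
        &self.candidates
    }
}
```

Gating on the generation keeps a reordered or retransmitted update from replacing fresher candidates after a second network change.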

New SignalMessage Variants & Fields

| Signal | New Fields | Purpose |
|---|---|---|
| DirectCallOffer | caller_mapped_addr | Port-mapped address from NAT-PMP/PCP/UPnP |
| DirectCallAnswer | callee_mapped_addr | Same, callee side |
| CallSetup | peer_mapped_addr | Relay cross-wires mapped addr to peer |
| CandidateUpdate | (new variant) | Mid-call candidate re-gathering |
| RegisterPresenceAck | relay_region, available_relays | Relay mesh metadata for auto-selection |

All new fields use #[serde(default, skip_serializing_if)] for backward compatibility with older clients/relays.

Hard NAT Port Prediction

For symmetric NATs that don't support port mapping, the system detects the NAT's port allocation pattern:

Single socket → 5 STUN servers (sequential probes)
    │
    ▼
Observed ports: [40001, 40002, 40003, 40004, 40005]
    │
    ▼
classify_port_allocation() → Sequential { delta: 1 }
    │
    ▼
predict_ports(last=40005, delta=1, offset=0, spread=2)
    → [40004, 40005, 40006, 40007, 40008]
    │
    ▼
HardNatProbe signal → peer
    │
    ▼
Peer dials predicted port range in parallel

| Pattern | Detection | Traversal Strategy |
|---|---|---|
| Port-preserving | All probes return same port | Standard hole-punch |
| Sequential (delta=N) | Consistent N-increment | Predict next port, dial range |
| Random | No pattern | Birthday attack or relay |
| Unknown | < 3 probes succeeded | Relay fallback |

The classifier tolerates:

  • Jitter: ±1 from dominant delta (concurrent flow grabbed a port)
  • Wraparound: 65535 → 1 treated as delta=+2, not -65534
  • Noise: 60% threshold — if most deltas agree, call it sequential
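
The classification and prediction steps above can be sketched as follows. This is a hedged approximation, not the code in wzp-client: the function names match the flow above, but the `PortAllocation` enum, the signatures, the way the dominant delta is chosen (most frequent exact value), and the prediction centre of `last + delta * (offset + 1)` are assumptions chosen to reproduce the documented example.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PortAllocation {
    PortPreserving,
    Sequential { delta: i32 },
    Random,
    Unknown,
}

/// Signed step between two ports, taking the short way around the 16-bit
/// port space so that 65535 -> 1 reads as +2 rather than -65534.
fn wrapped_delta(from: u16, to: u16) -> i32 {
    to.wrapping_sub(from) as i16 as i32
}

pub fn classify_port_allocation(ports: &[u16]) -> PortAllocation {
    if ports.len() < 3 {
        return PortAllocation::Unknown; // fewer than 3 successful probes
    }
    let deltas: Vec<i32> = ports
        .windows(2)
        .map(|w| wrapped_delta(w[0], w[1]))
        .collect();
    if deltas.iter().all(|&d| d == 0) {
        return PortAllocation::PortPreserving;
    }
    // Dominant delta: the most frequent exact step (first one wins ties).
    let mut dominant = (0i32, 0usize);
    for &d in &deltas {
        let count = deltas.iter().filter(|&&x| x == d).count();
        if count > dominant.1 {
            dominant = (d, count);
        }
    }
    // Jitter tolerance: steps within +/-1 of the dominant delta still agree.
    let agreeing = deltas
        .iter()
        .filter(|&&x| (x - dominant.0).abs() <= 1)
        .count();
    // Noise tolerance: at least 60% of the steps must agree.
    if dominant.0 != 0 && agreeing * 10 >= deltas.len() * 6 {
        PortAllocation::Sequential { delta: dominant.0 }
    } else {
        PortAllocation::Random
    }
}

/// Predicts a window of `2 * spread + 1` ports centred on the next expected
/// allocation, wrapping around the port space if needed.
pub fn predict_ports(last: u16, delta: i32, offset: i32, spread: i32) -> Vec<u16> {
    let center = last as i32 + delta * (offset + 1);
    (center - spread..=center + spread)
        .map(|p| p.rem_euclid(65536) as u16)
        .collect()
}
```

Casting the wrapped difference through `i16` is what makes the wraparound case fall out naturally; a real implementation would probably also filter out port 0 and reserved ranges from the predicted window.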