wz-phone/docs/android/architecture.md
Claude 8d5f6fe044 feat: wire QUIC transport, JNI bridge, connect UI + add docs
- Replace raw FFI with proper `jni` crate for string marshalling
- Wire QUIC transport in engine: connect to relay, crypto handshake
  (CallOffer/CallAnswer, X25519+Ed25519), send/recv MediaPackets
- Feed received packets into jitter buffer (was previously ignored)
- Add connect screen UI with CALL button (idle state) and in-call
  controls (mute, speaker, hang up, live stats)
- Hardcode relay 172.16.81.125:4433, room "android"
- Add comprehensive docs in docs/android/:
  architecture.md (8 mermaid diagrams), build-guide.md,
  debugging.md, maintenance.md, roadmap.md

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 04:43:49 +00:00

# Architecture
## System Overview
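The Android client is a four-layer stack: Kotlin UI, JNI bridge, Rust engine, and C++ audio I/O. Layers talk to each other only through narrow interfaces: JNI calls and JSON stats between Kotlin and Rust, lock-free ring buffers between the Rust codec thread and the Oboe callbacks, and mpsc channels between the codec thread and the Tokio network tasks.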
```mermaid
graph TB
subgraph "Kotlin (Main Thread)"
CA[CallActivity]
VM[CallViewModel]
UI[InCallScreen<br/>Compose UI]
CA --> VM
VM --> UI
end
subgraph "JNI Bridge"
JB[jni_bridge.rs<br/>panic-safe FFI]
end
subgraph "Rust Engine"
ENG[WzpEngine<br/>Orchestrator]
CT[Codec Thread<br/>20ms real-time loop]
NET[Tokio Runtime<br/>2 async workers]
PIPE[Pipeline<br/>Encode/Decode/FEC/Jitter]
end
subgraph "C++ Audio"
OBOE[Oboe Bridge<br/>Capture + Playout callbacks]
RB[Ring Buffers<br/>Lock-free SPSC]
end
subgraph "Network"
QUIC[QUIC Connection<br/>quinn]
RELAY[WZP Relay<br/>SFU Room]
end
VM <-->|"JNI calls<br/>+ JSON stats"| JB
JB <--> ENG
ENG --> CT
ENG --> NET
CT <--> PIPE
CT <-->|"Atomic R/W"| RB
OBOE <-->|"Atomic R/W"| RB
CT <-->|"mpsc channels"| NET
NET <-->|"QUIC datagrams<br/>+ streams"| QUIC
QUIC <--> RELAY
```
## Thread Model
The engine uses four distinct thread contexts, each with specific responsibilities and real-time constraints.
```mermaid
graph LR
subgraph "Android Main Thread"
UI_T["UI + JNI calls<br/>startCall / stopCall / getStats"]
end
subgraph "Oboe Audio Thread (system)"
AUD["Capture callback: mic → ring buf<br/>Playout callback: ring buf → speaker<br/>⚡ Highest priority, no allocations"]
end
subgraph "Codec Thread (wzp-codec)"
COD["20ms loop:<br/>1. Read capture ring buf<br/>2. AEC → AGC → Encode → FEC<br/>3. Send to network channel<br/>4. Recv from network channel<br/>5. FEC → Jitter → Decode<br/>6. Write playout ring buf<br/>⚡ Pinned to big core, RT priority"]
end
subgraph "Tokio Runtime (2 workers)"
NET_S["Send task:<br/>Channel → MediaPacket → QUIC datagram"]
NET_R["Recv task:<br/>QUIC datagram → MediaPacket → Channel"]
HS["Handshake:<br/>CallOffer → CallAnswer"]
end
UI_T -->|"mpsc command channel"| COD
COD -->|"tokio::mpsc send_tx"| NET_S
NET_R -->|"tokio::mpsc recv_tx"| COD
AUD <-->|"Atomic ring buffers"| COD
```
### Thread Priorities and Constraints
| Thread | Priority | Allocations | Blocking | Lock-free |
|--------|----------|-------------|----------|-----------|
| Oboe audio | SCHED_FIFO (system) | None | Never | Yes |
| Codec | RT priority, big core | Pre-allocated buffers | sleep(remainder of 20ms) | Ring buf: yes, Stats: Mutex |
| Tokio workers | Normal | Allowed | Async only | N/A |
| Main/JNI | Normal | Allowed | Allowed | N/A |
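The codec thread's "sleep(remainder of 20ms)" entry can be sketched as deadline-based pacing. This is a minimal illustration, not the engine's loop: `run_paced_loop` and its closure parameter are hypothetical names, and core pinning and RT priority are left out.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Frame period from the table above.
const FRAME_PERIOD: Duration = Duration::from_millis(20);

/// Runs `work` once per `period` for `ticks` iterations and returns how many
/// iterations overran their deadline. Hypothetical helper, not engine code.
fn run_paced_loop<F: FnMut(u32)>(ticks: u32, period: Duration, mut work: F) -> u32 {
    let mut overruns = 0;
    let mut deadline = Instant::now() + period;
    for tick in 0..ticks {
        work(tick);
        let now = Instant::now();
        if now < deadline {
            // Sleep only for what is left of this period.
            thread::sleep(deadline - now);
        } else {
            // The work took longer than one period: count it, do not sleep.
            overruns += 1;
        }
        // Advance from the previous deadline rather than from `now`, so
        // sleep jitter does not accumulate as drift.
        deadline += period;
    }
    overruns
}
```

A caller would pass `FRAME_PERIOD` and a closure that performs one capture/encode/decode/playout pass. Advancing the deadline by a fixed step keeps the average rate at one frame per 20 ms even when individual sleeps wake late.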
## Call Lifecycle
```mermaid
sequenceDiagram
participant User
participant UI as InCallScreen
participant VM as CallViewModel
participant ENG as WzpEngine (JNI)
participant NET as Tokio Network
participant RELAY as WZP Relay
User->>UI: Tap CALL
UI->>VM: startCall()
VM->>ENG: init() + startCall(relay, room)
ENG->>ENG: Create tokio runtime
ENG->>NET: Spawn network task
NET->>RELAY: QUIC connect (SNI = room name)
RELAY-->>NET: Connection established
Note over NET,RELAY: Crypto Handshake
NET->>RELAY: CallOffer {identity_pub, ephemeral_pub, signature, profiles}
RELAY-->>NET: CallAnswer {ephemeral_pub, chosen_profile, signature}
NET->>NET: Derive ChaCha20-Poly1305 session
ENG->>ENG: Spawn codec thread
Note over ENG: State → Active
loop Every 20ms
ENG->>ENG: Read mic → AEC → AGC → Encode → FEC
ENG->>NET: Encoded frame via channel
NET->>RELAY: MediaPacket via QUIC DATAGRAM
RELAY->>NET: MediaPacket from other peer
NET->>ENG: MediaPacket via channel
ENG->>ENG: FEC → Jitter → Decode → Speaker
end
User->>UI: Tap END
UI->>VM: stopCall()
VM->>ENG: stopCall()
ENG->>ENG: Set running=false, send Stop command
ENG->>ENG: Join codec thread
ENG->>NET: Drop tokio runtime
NET->>RELAY: Connection close
```
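The teardown half of the sequence above (set `running=false`, send a Stop command, join the codec thread) can be sketched with standard-library primitives. The `Command` variants, `CodecHandle`, `spawn_codec`, and `stop_codec` are hypothetical names for illustration; the real command enum lives in `commands.rs` and may differ.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{self, Sender};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical command set, for illustration only.
enum Command {
    SetMute(bool),
    Stop,
}

struct CodecHandle {
    running: Arc<AtomicBool>,
    commands: Sender<Command>,
    thread: JoinHandle<u32>,
}

fn spawn_codec() -> CodecHandle {
    let running = Arc::new(AtomicBool::new(true));
    let (commands, rx) = mpsc::channel::<Command>();
    let flag = Arc::clone(&running);
    let thread = thread::spawn(move || {
        let mut frames = 0u32;
        while flag.load(Ordering::Acquire) {
            // recv_timeout stands in for the 20 ms frame tick.
            match rx.recv_timeout(Duration::from_millis(20)) {
                Ok(Command::Stop) => break,
                Ok(Command::SetMute(_)) => {} // a real loop would gate capture here
                Err(mpsc::RecvTimeoutError::Timeout) => frames += 1,
                Err(mpsc::RecvTimeoutError::Disconnected) => break,
            }
        }
        frames
    });
    CodecHandle { running, commands, thread }
}

/// Mirrors the stop sequence: clear the flag, send Stop, join the thread.
fn stop_codec(handle: CodecHandle) -> u32 {
    handle.running.store(false, Ordering::Release);
    // The send can fail only if the thread has already exited; ignore that.
    let _ = handle.commands.send(Command::Stop);
    handle.thread.join().expect("codec thread panicked")
}
```

Using both the flag and the Stop command means the thread exits whether it is blocked on the channel or in the middle of a frame. Joining before the runtime is dropped ensures no frame is sent into a closed channel.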
## Audio Pipeline Detail
```mermaid
graph LR
subgraph "Capture Path"
MIC[Microphone] -->|"48kHz i16"| OBOE_C[Oboe Capture<br/>Callback]
OBOE_C -->|"ring_write()"| RB_C[Capture<br/>Ring Buffer]
RB_C -->|"read_capture()"| AEC[Echo<br/>Canceller]
AEC --> AGC[Auto Gain<br/>Control]
AGC --> ENC[AdaptiveEncoder<br/>Opus 24k]
ENC -->|"Vec u8"| FEC_E[RaptorQ<br/>FEC Encoder]
FEC_E -->|"send_tx"| CHAN_S[Send Channel]
end
subgraph "Network"
CHAN_S --> PKT_S[MediaPacket<br/>Header + Payload]
PKT_S -->|"QUIC DATAGRAM"| RELAY[Relay SFU]
RELAY -->|"QUIC DATAGRAM"| PKT_R[MediaPacket<br/>Deserialize]
PKT_R -->|"recv_tx"| CHAN_R[Recv Channel]
end
subgraph "Playout Path"
CHAN_R --> FEC_D[RaptorQ<br/>FEC Decoder]
FEC_D --> JB[Jitter Buffer<br/>10-250 pkts]
JB --> DEC[AdaptiveDecoder<br/>Opus 24k]
DEC -->|"48kHz i16"| AEC_REF[AEC Far-End<br/>Reference]
DEC -->|"write_playout()"| RB_P[Playout<br/>Ring Buffer]
RB_P -->|"ring_read()"| OBOE_P[Oboe Playout<br/>Callback]
OBOE_P --> SPK[Speaker]
end
```
### Audio Parameters
| Parameter | Value | Notes |
|-----------|-------|-------|
| Sample rate | 48,000 Hz | Opus native rate |
| Channels | 1 (mono) | VoIP only |
| Frame size | 960 samples | 20ms at 48kHz |
| Ring buffer | 7,680 samples | 160ms (8 frames) |
| Bit depth | 16-bit signed int | PCM format |
| AEC tail | 100ms | Echo canceller filter length |
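The frame and ring buffer sizes follow from the sample rate: 48,000 Hz × 20 ms = 960 samples per frame, and 8 frames = 7,680 samples = 160 ms. The sketch below derives those constants and shows one possible shape for a lock-free single-producer/single-consumer ring. It is an illustration under assumptions, not the code in `audio_android.rs` or `oboe_bridge.cpp`: `SpscRing` and its methods are hypothetical, and `AtomicI16` slots are used only to keep the example free of `unsafe`.

```rust
use std::sync::atomic::{AtomicI16, AtomicU64, Ordering};

const SAMPLE_RATE_HZ: usize = 48_000;
const FRAME_MS: usize = 20;
const FRAME_SAMPLES: usize = SAMPLE_RATE_HZ * FRAME_MS / 1000; // 960
const RING_FRAMES: usize = 8;
const RING_SAMPLES: usize = FRAME_SAMPLES * RING_FRAMES; // 7,680 samples = 160 ms

/// Single-producer/single-consumer ring of i16 samples.
/// `head` and `tail` are free-running 64-bit counters: only the producer
/// writes `head`, only the consumer writes `tail`.
struct SpscRing {
    slots: Vec<AtomicI16>,
    head: AtomicU64, // total samples written
    tail: AtomicU64, // total samples read
}

impl SpscRing {
    fn new(capacity: usize) -> Self {
        SpscRing {
            slots: (0..capacity).map(|_| AtomicI16::new(0)).collect(),
            head: AtomicU64::new(0),
            tail: AtomicU64::new(0),
        }
    }

    /// Producer side: copies as many samples as fit and returns the count.
    fn write(&self, samples: &[i16]) -> usize {
        let len = self.slots.len() as u64;
        let head = self.head.load(Ordering::Relaxed);
        let tail = self.tail.load(Ordering::Acquire);
        let free = (len - (head - tail)) as usize;
        let n = samples.len().min(free);
        for (i, &s) in samples[..n].iter().enumerate() {
            let idx = ((head + i as u64) % len) as usize;
            self.slots[idx].store(s, Ordering::Relaxed);
        }
        // Release publishes the sample stores to the consumer.
        self.head.store(head + n as u64, Ordering::Release);
        n
    }

    /// Consumer side: fills `out` with available samples and returns the count.
    fn read(&self, out: &mut [i16]) -> usize {
        let len = self.slots.len() as u64;
        let tail = self.tail.load(Ordering::Relaxed);
        let head = self.head.load(Ordering::Acquire);
        let available = (head - tail) as usize;
        let n = out.len().min(available);
        for (i, slot) in out[..n].iter_mut().enumerate() {
            let idx = ((tail + i as u64) % len) as usize;
            *slot = self.slots[idx].load(Ordering::Relaxed);
        }
        // Release hands the consumed slots back to the producer.
        self.tail.store(tail + n as u64, Ordering::Release);
        n
    }
}
```

Neither side blocks or allocates after construction, which is the property the Oboe callbacks need. A full ring makes `write` return a short count instead of waiting, so the caller decides whether to drop samples.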
## Crypto Handshake
```mermaid
sequenceDiagram
participant Client as Android Client
participant Relay as WZP Relay
Note over Client: Identity seed (32 bytes, random per launch)
Note over Client: HKDF → Ed25519 signing key + X25519 static key
Client->>Client: Generate ephemeral X25519 keypair
Client->>Client: Sign(ephemeral_pub || "call-offer") with Ed25519
Client->>Relay: SignalMessage::CallOffer<br/>{identity_pub, ephemeral_pub, signature, [GOOD, DEGRADED, CATASTROPHIC]}
Relay->>Relay: Verify Ed25519 signature
Relay->>Relay: Generate own ephemeral X25519
Relay->>Relay: Sign(ephemeral_pub || "call-answer")
Relay->>Relay: DH(relay_ephemeral, client_ephemeral) → shared secret
Relay->>Relay: HKDF(shared_secret) → ChaCha20-Poly1305 key
Relay->>Client: SignalMessage::CallAnswer<br/>{identity_pub, ephemeral_pub, signature, chosen_profile=GOOD}
Client->>Client: Verify relay signature
Client->>Client: DH(client_ephemeral, relay_ephemeral) → same shared secret
Client->>Client: HKDF(shared_secret) → same ChaCha20-Poly1305 key
Note over Client,Relay: Both sides now have identical session key
Note over Client,Relay: Media encryption with this key is not yet applied
```
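The two `Sign(...)` steps differ only in their context label, which keeps an offer signature from being replayed as an answer. The sketch below builds the bytes that would be signed. It assumes `||` means plain byte concatenation of the 32-byte ephemeral public key and the ASCII label with no length prefix or terminator, which the diagram does not specify; `signing_input` is a hypothetical helper, and the Ed25519 signing itself is omitted.

```rust
/// Context labels taken from the diagram above.
const OFFER_CONTEXT: &[u8] = b"call-offer";
const ANSWER_CONTEXT: &[u8] = b"call-answer";

/// Builds `ephemeral_pub || context`.
/// Assumption: plain concatenation, no length prefix or separator.
fn signing_input(ephemeral_pub: &[u8; 32], context: &[u8]) -> Vec<u8> {
    let mut msg = Vec::with_capacity(ephemeral_pub.len() + context.len());
    msg.extend_from_slice(ephemeral_pub);
    msg.extend_from_slice(context);
    msg
}
```

The client would sign `signing_input(&ephemeral_pub, OFFER_CONTEXT)` and the relay `signing_input(&ephemeral_pub, ANSWER_CONTEXT)`, each with its own Ed25519 identity key.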
### Key Derivation Chain
```
Identity Seed (32 bytes, random)
├── HKDF(seed, info="warzone-ed25519") → Ed25519 signing key
│ └── Public key = identity_pub (32 bytes)
│ └── SHA-256(identity_pub)[:16] = fingerprint (16 bytes)
└── HKDF(seed, info="warzone-x25519") → X25519 static key (unused currently)
Per-Call Ephemeral:
Random X25519 keypair → ephemeral_pub (sent in CallOffer)
Session Key:
DH(our_ephemeral_secret, peer_ephemeral_pub) → shared_secret
HKDF(shared_secret, info="warzone-session-key") → ChaCha20-Poly1305 key (32 bytes)
```
## QUIC Transport
```mermaid
graph TB
subgraph "QUIC Connection"
EP[Client Endpoint<br/>0.0.0.0:0 UDP]
CONN[Connection to Relay<br/>SNI = room name]
subgraph "Unreliable Channel"
DG_S[Send DATAGRAM<br/>MediaPacket serialized]
DG_R[Recv DATAGRAM<br/>MediaPacket deserialized]
end
subgraph "Reliable Channel"
ST_S[Open bidi stream<br/>JSON length-prefixed<br/>SignalMessage]
ST_R[Accept bidi stream<br/>JSON length-prefixed<br/>SignalMessage]
end
EP --> CONN
CONN --> DG_S
CONN --> DG_R
CONN --> ST_S
CONN --> ST_R
end
```
### QUIC Configuration (VoIP-tuned)
| Setting | Value | Rationale |
|---------|-------|-----------|
| ALPN | `wzp` | Protocol identification |
| Idle timeout | 30s | Keep connection alive during silence |
| Keep-alive | 5s | Prevent NAT timeout |
| Datagram receive buffer | 65 KB | Buffer for burst arrivals |
| Flow control (recv) | 256 KB | Conservative for VoIP |
| Flow control (send) | 128 KB | Prevent bufferbloat |
| TLS | Self-signed certs | Development mode |
| Certificate verification | Disabled | Client accepts any cert |
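The table can be mirrored as plain data with one sanity check: the keep-alive interval has to be shorter than the idle timeout, or a silent connection would time out before the first keep-alive is sent. `QuicSettings` is a hypothetical struct for illustration and does not reproduce quinn's configuration API. Treating KB as 1,024 bytes is also an assumption.

```rust
use std::time::Duration;

/// Hypothetical plain-data mirror of the table above.
#[derive(Debug, Clone, PartialEq)]
struct QuicSettings {
    alpn: &'static [u8],
    idle_timeout: Duration,
    keep_alive: Duration,
    datagram_recv_buffer: usize,
    recv_window: usize,
    send_window: usize,
    verify_certificates: bool,
}

impl QuicSettings {
    fn voip_defaults() -> Self {
        QuicSettings {
            alpn: b"wzp",
            idle_timeout: Duration::from_secs(30),
            keep_alive: Duration::from_secs(5),
            datagram_recv_buffer: 65 * 1024, // assumption: KB = 1,024 bytes
            recv_window: 256 * 1024,
            send_window: 128 * 1024,
            verify_certificates: false, // development mode only
        }
    }

    /// Returns a description of the first violated invariant, if any.
    fn validate(&self) -> Result<(), String> {
        if self.alpn.is_empty() {
            return Err("ALPN must not be empty".to_string());
        }
        if self.keep_alive >= self.idle_timeout {
            return Err(format!(
                "keep-alive ({:?}) must be shorter than idle timeout ({:?})",
                self.keep_alive, self.idle_timeout
            ));
        }
        Ok(())
    }
}
```

With the values from the table, six keep-alives fit into one idle window, so several can be lost before the connection times out.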
## MediaPacket Wire Format
```
12-byte header:
┌─────────────────────────────────────────────────┐
│ Byte 0: V(1) T(1) CodecID(4) Q(1) FecHi(1) │
│ Byte 1: FecLo(6) unused(2) │
│ Byte 2-3: Sequence number (u16 BE) │
│ Byte 4-7: Timestamp ms (u32 BE) │
│ Byte 8: FEC block ID │
│ Byte 9: FEC symbol index │
│ Byte 10: Reserved │
│ Byte 11: CSRC count │
├─────────────────────────────────────────────────┤
│ Payload: Opus-encoded audio frame │
├─────────────────────────────────────────────────┤
│ Optional: QualityReport (4 bytes, if Q=1) │
│ loss_pct(u8) rtt_4ms(u8) jitter_ms(u8) │
│ bitrate_cap_kbps(u8) │
└─────────────────────────────────────────────────┘
```
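A minimal sketch of packing and unpacking the 12-byte header follows. It assumes the fields in bytes 0 and 1 are laid out most-significant bit first in the order listed, and that FecHi and FecLo together form one 7-bit value; the diagram states neither. `MediaHeader` and its field names are hypothetical, and the meaning of the `T` bit is left open because the diagram does not define it.

```rust
const HEADER_LEN: usize = 12;

/// Hypothetical in-memory form of the header shown above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct MediaHeader {
    version: u8,       // 1 bit
    t_flag: u8,        // 1 bit, meaning not given in the diagram
    codec_id: u8,      // 4 bits
    has_quality: bool, // Q bit: a 4-byte QualityReport trails the payload
    fec_ratio: u8,     // 7 bits, split as FecHi(1) + FecLo(6)
    sequence: u16,
    timestamp_ms: u32,
    fec_block_id: u8,
    fec_symbol_index: u8,
    csrc_count: u8,
}

impl MediaHeader {
    fn encode(&self) -> [u8; HEADER_LEN] {
        let mut b = [0u8; HEADER_LEN];
        // Byte 0, MSB first (assumed): V | T | CodecID(4) | Q | FecHi
        b[0] = ((self.version & 0x01) << 7)
            | ((self.t_flag & 0x01) << 6)
            | ((self.codec_id & 0x0F) << 2)
            | ((self.has_quality as u8) << 1)
            | ((self.fec_ratio >> 6) & 0x01);
        // Byte 1: FecLo(6) in the upper bits, 2 unused low bits.
        b[1] = (self.fec_ratio & 0x3F) << 2;
        b[2..4].copy_from_slice(&self.sequence.to_be_bytes());
        b[4..8].copy_from_slice(&self.timestamp_ms.to_be_bytes());
        b[8] = self.fec_block_id;
        b[9] = self.fec_symbol_index;
        b[10] = 0; // reserved
        b[11] = self.csrc_count;
        b
    }

    /// Returns `None` if fewer than 12 bytes are available.
    fn decode(b: &[u8]) -> Option<Self> {
        if b.len() < HEADER_LEN {
            return None;
        }
        Some(MediaHeader {
            version: b[0] >> 7,
            t_flag: (b[0] >> 6) & 0x01,
            codec_id: (b[0] >> 2) & 0x0F,
            has_quality: (b[0] >> 1) & 0x01 == 1,
            fec_ratio: ((b[0] & 0x01) << 6) | (b[1] >> 2),
            sequence: u16::from_be_bytes([b[2], b[3]]),
            timestamp_ms: u32::from_be_bytes([b[4], b[5], b[6], b[7]]),
            fec_block_id: b[8],
            fec_symbol_index: b[9],
            csrc_count: b[11],
        })
    }
}
```

A receiver would call `MediaHeader::decode` on the datagram, treat the bytes after offset 12 as payload, and split off the last 4 bytes as the QualityReport when `has_quality` is set.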
## Relay Room Mode (SFU)
```mermaid
graph LR
subgraph "Room: android"
P1[Phone A<br/>QUIC conn] -->|MediaPacket| RELAY[Relay SFU]
RELAY -->|MediaPacket| P2[Phone B<br/>QUIC conn]
P2 -->|MediaPacket| RELAY
RELAY -->|MediaPacket| P1
end
Note1["Room name from QUIC TLS SNI<br/>No auth required<br/>Packets forwarded to all others"]
```
The relay operates as a Selective Forwarding Unit:
1. Client connects via QUIC, room name extracted from TLS SNI
2. Crypto handshake completes (relay has its own ephemeral identity)
3. Client joins named room
4. All received media packets are forwarded to every other participant in the room
5. Signaling messages are not forwarded; they are exchanged only between each client and the relay
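
The forwarding rule in step 4 can be sketched with in-memory queues standing in for QUIC connections. `Room`, `ParticipantId`, and the outbox representation are hypothetical; the relay's actual data structures are not described here.

```rust
use std::collections::HashMap;

type ParticipantId = u64;

/// One named room. Each outbox stands in for a participant's QUIC connection.
#[derive(Default)]
struct Room {
    outboxes: HashMap<ParticipantId, Vec<Vec<u8>>>,
}

impl Room {
    fn join(&mut self, id: ParticipantId) {
        self.outboxes.entry(id).or_default();
    }

    fn leave(&mut self, id: ParticipantId) {
        self.outboxes.remove(&id);
    }

    /// Forwards `packet` to every participant except `sender` and returns
    /// the number of recipients. The packet bytes are copied unchanged.
    fn forward(&mut self, sender: ParticipantId, packet: &[u8]) -> usize {
        let mut delivered = 0;
        for (id, outbox) in self.outboxes.iter_mut() {
            if *id != sender {
                outbox.push(packet.to_vec());
                delivered += 1;
            }
        }
        delivered
    }
}

/// Rooms keyed by the name taken from the TLS SNI.
#[derive(Default)]
struct Relay {
    rooms: HashMap<String, Room>,
}

impl Relay {
    fn room(&mut self, name: &str) -> &mut Room {
        self.rooms.entry(name.to_string()).or_default()
    }
}
```

Because the relay copies packets without decoding or mixing them, its per-packet cost grows with the number of participants in the room rather than with codec complexity.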
## Adaptive Quality System
```mermaid
graph TD
QR[QualityReport<br/>loss%, RTT, jitter] --> AQC[AdaptiveQualityController]
AQC -->|"loss<10%, RTT<400ms"| GOOD[GOOD<br/>Opus 24kbps<br/>FEC 20%<br/>20ms frames]
AQC -->|"loss 10-40%<br/>RTT 400-600ms"| DEG[DEGRADED<br/>Opus 6kbps<br/>FEC 50%<br/>40ms frames]
AQC -->|"loss>40%<br/>RTT>600ms"| CAT[CATASTROPHIC<br/>Codec2 1.2kbps<br/>FEC 100%<br/>40ms frames]
GOOD -->|"Hysteresis:<br/>sustained degradation"| DEG
DEG -->|"Sustained improvement"| GOOD
DEG -->|"Further degradation"| CAT
CAT -->|"Improvement"| DEG
```
| Profile | Codec | Bitrate | FEC Ratio | Frame Size | FEC Block |
|---------|-------|---------|-----------|------------|-----------|
| GOOD | Opus 24k | 24 kbps | 20% | 20ms | 5 frames |
| DEGRADED | Opus 6k | 6 kbps | 50% | 40ms | 10 frames |
| CATASTROPHIC | Codec2 1.2k | 1.2 kbps | 100% | 40ms | 8 frames |
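
The profile selection and hysteresis can be sketched as a classifier plus a streak counter. Several details are assumptions not stated above: GOOD requires both thresholds to hold, either threshold alone is enough to move to a worse profile, a switch needs `required` consecutive reports, and profiles change one level at a time (the diagram has no direct GOOD/CATASTROPHIC edge). `Profile`, `classify`, and `Controller` are hypothetical names, not the `AdaptiveQualityController` API.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Profile {
    Good,
    Degraded,
    Catastrophic,
}

/// Maps one report to the profile its numbers call for.
fn classify(loss_pct: u8, rtt_ms: u32) -> Profile {
    if loss_pct > 40 || rtt_ms > 600 {
        Profile::Catastrophic
    } else if loss_pct >= 10 || rtt_ms >= 400 {
        Profile::Degraded
    } else {
        Profile::Good
    }
}

/// Moves one level from `current` toward `target`.
fn step_toward(current: Profile, target: Profile) -> Profile {
    match (current, target) {
        (Profile::Good, Profile::Catastrophic) => Profile::Degraded,
        (Profile::Catastrophic, Profile::Good) => Profile::Degraded,
        (_, t) => t,
    }
}

struct Controller {
    current: Profile,
    candidate: Profile,
    streak: u32,
    required: u32, // consecutive reports needed before switching (assumed)
}

impl Controller {
    fn new(required: u32) -> Self {
        Controller {
            current: Profile::Good,
            candidate: Profile::Good,
            streak: 0,
            required,
        }
    }

    /// Feeds one report and returns the profile in effect afterwards.
    fn on_report(&mut self, loss_pct: u8, rtt_ms: u32) -> Profile {
        let target = classify(loss_pct, rtt_ms);
        if target == self.current {
            self.streak = 0;
            return self.current;
        }
        if target == self.candidate {
            self.streak += 1;
        } else {
            self.candidate = target;
            self.streak = 1;
        }
        if self.streak >= self.required {
            self.current = step_toward(self.current, target);
            self.streak = 0;
        }
        self.current
    }
}
```

With `required = 3`, two bad reports in a row leave the profile unchanged and the third triggers the switch, so an isolated loss burst does not cause a codec change.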
## Module Dependency Graph
```mermaid
graph BT
PROTO[wzp-proto<br/>Types, traits, jitter,<br/>quality, session]
CODEC[wzp-codec<br/>Opus, Codec2, AEC,<br/>AGC, resampling]
FEC[wzp-fec<br/>RaptorQ fountain codes]
CRYPTO[wzp-crypto<br/>Ed25519, X25519,<br/>ChaCha20-Poly1305]
TRANSPORT[wzp-transport<br/>QUIC, datagrams,<br/>signaling streams]
ANDROID[wzp-android<br/>Engine, JNI bridge,<br/>Oboe audio, pipeline]
RELAY[wzp-relay<br/>SFU, rooms, auth,<br/>metrics, probes]
CODEC --> PROTO
FEC --> PROTO
CRYPTO --> PROTO
TRANSPORT --> PROTO
ANDROID --> PROTO
ANDROID --> CODEC
ANDROID --> FEC
ANDROID --> CRYPTO
ANDROID --> TRANSPORT
RELAY --> PROTO
RELAY --> CRYPTO
RELAY --> TRANSPORT
```
## File Map
### Kotlin (`android/app/src/main/java/com/wzp/`)
| File | Purpose |
|------|---------|
| `WzpApplication.kt` | App entry, notification channel creation |
| `engine/WzpEngine.kt` | JNI wrapper for native engine |
| `engine/WzpCallback.kt` | Callback interface for engine events |
| `engine/CallStats.kt` | Stats data class with JSON deserialization |
| `ui/call/CallActivity.kt` | Activity host, permissions, theme |
| `ui/call/CallViewModel.kt` | MVVM state holder, stats polling |
| `ui/call/InCallScreen.kt` | Compose UI (idle + in-call states) |
| `service/CallService.kt` | Foreground service, wake/wifi locks |
| `audio/AudioRouteManager.kt` | Speaker/earpiece/Bluetooth routing |
### Rust (`crates/wzp-android/src/`)
| File | Purpose |
|------|---------|
| `lib.rs` | Module declarations |
| `jni_bridge.rs` | JNI entry points (panic-safe, built on the `jni` crate) |
| `engine.rs` | Call orchestrator (threads, channels, lifecycle) |
| `pipeline.rs` | Codec pipeline (AEC, AGC, encode, FEC, jitter, decode) |
| `audio_android.rs` | Oboe backend, SPSC ring buffers, RT scheduling |
| `commands.rs` | Engine command enum |
| `stats.rs` | CallState/CallStats types (serde) |
### C++ (`crates/wzp-android/cpp/`)
| File | Purpose |
|------|---------|
| `oboe_bridge.h` | FFI header for Rust-C++ audio interface |
| `oboe_bridge.cpp` | Oboe capture/playout callbacks, ring buffer I/O |
| `oboe_stub.cpp` | No-op stub for non-Android builds |
### Build
| File | Purpose |
|------|---------|
| `android/app/build.gradle.kts` | Android build config, cargo-ndk task |
| `crates/wzp-android/Cargo.toml` | Rust dependencies (cdylib output) |
| `crates/wzp-android/build.rs` | C++ compilation, Oboe fetch |