docs: protocol audit 2026-05-25, update architecture + Obsidian vault

Audit:
- docs/AUDIT-2026-05-25.md: full protocol audit covering 8 findings
  (4 critical, 2 high, 5 medium, 4 low) with code references and fix
  effort estimates
- vault/Audit/Tasks.md: Obsidian Tasks plugin file tracking all audit
  items with priorities, due dates, and per-step checklists

Architecture docs updated for Wire format v2 and Wave 5/6 features:
- ARCHITECTURE.md: adds wzp-video to dependency graph and project
  structure; wire format updated to v2 (16B header, 5B MiniHeader);
  relay concurrency section corrected (DashMap+RwLock is current, not
  a future optimization); test count 571→702; Android note
- PROGRESS.md: Wave 5 and Wave 6 sections appended; test count 372→702;
  current status and open blockers as of 2026-05-25
- ROAD-TO-VIDEO.md: implementation status table inserted (/🟡/🔴/🔲
  per phase); 6-step critical path to first video call
- WZP-SPEC.md: MediaHeader updated to v2 (16B byte-aligned); MiniHeader
  updated to 5B with seq_delta; codec IDs 9-12 added (H.264/H.265/AV1);
  version negotiation section added

Obsidian vault (vault/):
- 114 files across Architecture/, PRDs/, Reports/, Android/,
  Reference/, Audit/ with YAML frontmatter
- 00 - Home.md index note with wiki links
- .obsidian/app.json config

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
Siavash Sameni
2026-05-25 06:00:17 +04:00
parent 12b0d9738f
commit ed8a7ae5aa
120 changed files with 22781 additions and 65 deletions

View File

@@ -0,0 +1,405 @@
---
tags: [android, wzp]
type: reference
---
# Architecture
## System Overview
The Android client is a four-layer stack: Kotlin UI, JNI bridge, Rust engine, and C++ audio I/O. Each layer communicates through well-defined interfaces with minimal coupling.
```mermaid
graph TB
subgraph "Kotlin (Main Thread)"
CA[CallActivity]
VM[CallViewModel]
UI[InCallScreen<br/>Compose UI]
CA --> VM
VM --> UI
end
subgraph "JNI Bridge"
JB[jni_bridge.rs<br/>panic-safe FFI]
end
subgraph "Rust Engine"
ENG[WzpEngine<br/>Orchestrator]
CT[Codec Thread<br/>20ms real-time loop]
NET[Tokio Runtime<br/>2 async workers]
PIPE[Pipeline<br/>Encode/Decode/FEC/Jitter]
end
subgraph "C++ Audio"
OBOE[Oboe Bridge<br/>Capture + Playout callbacks]
RB[Ring Buffers<br/>Lock-free SPSC]
end
subgraph "Network"
QUIC[QUIC Connection<br/>quinn]
RELAY[WZP Relay<br/>SFU Room]
end
VM <-->|"JNI calls<br/>+ JSON stats"| JB
JB <--> ENG
ENG --> CT
ENG --> NET
CT <--> PIPE
CT <-->|"Atomic R/W"| RB
OBOE <-->|"Atomic R/W"| RB
CT <-->|"mpsc channels"| NET
NET <-->|"QUIC datagrams<br/>+ streams"| QUIC
QUIC <--> RELAY
```
## Thread Model
The engine uses four distinct thread contexts, each with specific responsibilities and real-time constraints.
```mermaid
graph LR
subgraph "Android Main Thread"
UI_T["UI + JNI calls<br/>startCall / stopCall / getStats"]
end
subgraph "Oboe Audio Thread (system)"
AUD["Capture callback: mic → ring buf<br/>Playout callback: ring buf → speaker<br/>⚡ Highest priority, no allocations"]
end
subgraph "Codec Thread (wzp-codec)"
COD["20ms loop:<br/>1. Read capture ring buf<br/>2. AEC → AGC → Encode<br/>3. Send to network channel<br/>4. Recv from network channel<br/>5. FEC → Jitter → Decode<br/>6. Write playout ring buf<br/>⚡ Pinned to big core, RT priority"]
end
subgraph "Tokio Runtime (2 workers)"
NET_S["Send task:<br/>Channel → MediaPacket → QUIC datagram"]
NET_R["Recv task:<br/>QUIC datagram → MediaPacket → Channel"]
HS["Handshake:<br/>CallOffer → CallAnswer"]
end
UI_T -->|"mpsc command channel"| COD
COD -->|"tokio::mpsc send_tx"| NET_S
NET_R -->|"tokio::mpsc recv_tx"| COD
AUD <-->|"Atomic ring buffers"| COD
```
### Thread Priorities and Constraints
| Thread | Priority | Allocations | Blocking | Lock-free |
|--------|----------|-------------|----------|-----------|
| Oboe audio | SCHED_FIFO (system) | None | Never | Yes |
| Codec | RT priority, big core | Pre-allocated buffers | sleep(remainder of 20ms) | Ring buf: yes, Stats: Mutex |
| Tokio workers | Normal | Allowed | Async only | N/A |
| Main/JNI | Normal | Allowed | Allowed | N/A |
## Call Lifecycle
```mermaid
sequenceDiagram
participant User
participant UI as InCallScreen
participant VM as CallViewModel
participant ENG as WzpEngine (JNI)
participant NET as Tokio Network
participant RELAY as WZP Relay
User->>UI: Tap CALL
UI->>VM: startCall()
VM->>ENG: init() + startCall(relay, room)
ENG->>ENG: Create tokio runtime
ENG->>NET: Spawn network task
NET->>RELAY: QUIC connect (SNI = room name)
RELAY-->>NET: Connection established
Note over NET,RELAY: Crypto Handshake
NET->>RELAY: CallOffer {identity_pub, ephemeral_pub, signature, profiles}
RELAY-->>NET: CallAnswer {ephemeral_pub, chosen_profile, signature}
NET->>NET: Derive ChaCha20-Poly1305 session
ENG->>ENG: Spawn codec thread
Note over ENG: State → Active
loop Every 20ms
ENG->>ENG: Read mic → AEC → AGC → Encode
ENG->>NET: Encoded frame via channel
NET->>RELAY: MediaPacket via QUIC DATAGRAM
RELAY->>NET: MediaPacket from other peer
NET->>ENG: MediaPacket via channel
ENG->>ENG: FEC → Jitter → Decode → Speaker
end
User->>UI: Tap END
UI->>VM: stopCall()
VM->>ENG: stopCall()
ENG->>ENG: Set running=false, send Stop command
ENG->>ENG: Join codec thread
ENG->>NET: Drop tokio runtime
NET->>RELAY: Connection close
```
## Audio Pipeline Detail
```mermaid
graph LR
subgraph "Capture Path"
MIC[Microphone] -->|"48kHz i16"| OBOE_C[Oboe Capture<br/>Callback]
OBOE_C -->|"ring_write()"| RB_C[Capture<br/>Ring Buffer]
RB_C -->|"read_capture()"| AEC[Echo<br/>Canceller]
AEC --> AGC[Auto Gain<br/>Control]
AGC --> ENC[AdaptiveEncoder<br/>Opus 24k]
ENC -->|"Vec u8"| FEC_E[RaptorQ<br/>FEC Encoder]
FEC_E -->|"send_tx"| CHAN_S[Send Channel]
end
subgraph "Network"
CHAN_S --> PKT_S[MediaPacket<br/>Header + Payload]
PKT_S -->|"QUIC DATAGRAM"| RELAY[Relay SFU]
RELAY -->|"QUIC DATAGRAM"| PKT_R[MediaPacket<br/>Deserialize]
PKT_R -->|"recv_tx"| CHAN_R[Recv Channel]
end
subgraph "Playout Path"
CHAN_R --> FEC_D[RaptorQ<br/>FEC Decoder]
FEC_D --> JB[Jitter Buffer<br/>10-250 pkts]
JB --> DEC[AdaptiveDecoder<br/>Opus 24k]
DEC -->|"48kHz i16"| AEC_REF[AEC Far-End<br/>Reference]
DEC -->|"write_playout()"| RB_P[Playout<br/>Ring Buffer]
RB_P -->|"ring_read()"| OBOE_P[Oboe Playout<br/>Callback]
OBOE_P --> SPK[Speaker]
end
```
### Audio Parameters
| Parameter | Value | Notes |
|-----------|-------|-------|
| Sample rate | 48,000 Hz | Opus native rate |
| Channels | 1 (mono) | VoIP only |
| Frame size | 960 samples | 20ms at 48kHz |
| Ring buffer | 7,680 samples | 160ms (8 frames) |
| Bit depth | 16-bit signed int | PCM format |
| AEC tail | 100ms | Echo canceller filter length |
## Crypto Handshake
```mermaid
sequenceDiagram
participant Client as Android Client
participant Relay as WZP Relay
Note over Client: Identity seed (32 bytes, random per launch)
Note over Client: HKDF → Ed25519 signing key + X25519 static key
Client->>Client: Generate ephemeral X25519 keypair
Client->>Client: Sign(ephemeral_pub || "call-offer") with Ed25519
Client->>Relay: SignalMessage::CallOffer<br/>{identity_pub, ephemeral_pub, signature, [GOOD, DEGRADED, CATASTROPHIC]}
Relay->>Relay: Verify Ed25519 signature
Relay->>Relay: Generate own ephemeral X25519
Relay->>Relay: Sign(ephemeral_pub || "call-answer")
Relay->>Relay: DH(relay_ephemeral, client_ephemeral) → shared secret
Relay->>Relay: HKDF(shared_secret) → ChaCha20-Poly1305 key
Relay->>Client: SignalMessage::CallAnswer<br/>{identity_pub, ephemeral_pub, signature, chosen_profile=GOOD}
Client->>Client: Verify relay signature
Client->>Client: DH(client_ephemeral, relay_ephemeral) → same shared secret
Client->>Client: HKDF(shared_secret) → same ChaCha20-Poly1305 key
Note over Client,Relay: Both sides now have identical session key
Note over Client,Relay: Media packets can be encrypted (not yet applied)
```
### Key Derivation Chain
```
Identity Seed (32 bytes, random)
├── HKDF(seed, info="warzone-ed25519") → Ed25519 signing key
│ └── Public key = identity_pub (32 bytes)
│ └── SHA-256(identity_pub)[:16] = fingerprint (16 bytes)
└── HKDF(seed, info="warzone-x25519") → X25519 static key (unused currently)
Per-Call Ephemeral:
Random X25519 keypair → ephemeral_pub (sent in CallOffer)
Session Key:
DH(our_ephemeral_secret, peer_ephemeral_pub) → shared_secret
HKDF(shared_secret, info="warzone-session-key") → ChaCha20-Poly1305 key (32 bytes)
```
## QUIC Transport
```mermaid
graph TB
subgraph "QUIC Connection"
EP[Client Endpoint<br/>0.0.0.0:0 UDP]
CONN[Connection to Relay<br/>SNI = room name]
subgraph "Unreliable Channel"
DG_S[Send DATAGRAM<br/>MediaPacket serialized]
DG_R[Recv DATAGRAM<br/>MediaPacket deserialized]
end
subgraph "Reliable Channel"
ST_S[Open bidi stream<br/>JSON length-prefixed<br/>SignalMessage]
ST_R[Accept bidi stream<br/>JSON length-prefixed<br/>SignalMessage]
end
EP --> CONN
CONN --> DG_S
CONN --> DG_R
CONN --> ST_S
CONN --> ST_R
end
```
### QUIC Configuration (VoIP-tuned)
| Setting | Value | Rationale |
|---------|-------|-----------|
| ALPN | `wzp` | Protocol identification |
| Idle timeout | 30s | Keep connection alive during silence |
| Keep-alive | 5s | Prevent NAT timeout |
| Datagram receive buffer | 65 KB | Buffer for burst arrivals |
| Flow control (recv) | 256 KB | Conservative for VoIP |
| Flow control (send) | 128 KB | Prevent bufferbloat |
| TLS | Self-signed certs | Development mode |
| Certificate verification | Disabled | Client accepts any cert |
## MediaPacket Wire Format
```
12-byte header:
┌─────────────────────────────────────────────────┐
│ Byte 0: V(1) T(1) CodecID(4) Q(1) FecHi(1) │
│ Byte 1: FecLo(6) unused(2) │
│ Byte 2-3: Sequence number (u16 BE) │
│ Byte 4-7: Timestamp ms (u32 BE) │
│ Byte 8: FEC block ID │
│ Byte 9: FEC symbol index │
│ Byte 10: Reserved │
│ Byte 11: CSRC count │
├─────────────────────────────────────────────────┤
│ Payload: Opus-encoded audio frame │
├─────────────────────────────────────────────────┤
│ Optional: QualityReport (4 bytes, if Q=1) │
│ loss_pct(u8) rtt_4ms(u8) jitter_ms(u8) │
│ bitrate_cap_kbps(u8) │
└─────────────────────────────────────────────────┘
```
## Relay Room Mode (SFU)
```mermaid
graph LR
subgraph "Room: android"
P1[Phone A<br/>QUIC conn] -->|MediaPacket| RELAY[Relay SFU]
RELAY -->|MediaPacket| P2[Phone B<br/>QUIC conn]
P2 -->|MediaPacket| RELAY
RELAY -->|MediaPacket| P1
end
Note1["Room name from QUIC TLS SNI<br/>No auth required<br/>Packets forwarded to all others"]
```
The relay operates as a Selective Forwarding Unit:
1. Client connects via QUIC, room name extracted from TLS SNI
2. Crypto handshake completes (relay has its own ephemeral identity)
3. Client joins named room
4. All received media packets are forwarded to every other participant in the room
5. Signaling messages are not forwarded (point-to-point with relay)
## Adaptive Quality System
```mermaid
graph TD
QR[QualityReport<br/>loss%, RTT, jitter] --> AQC[AdaptiveQualityController]
AQC -->|"loss<10%, RTT<400ms"| GOOD[GOOD<br/>Opus 24kbps<br/>FEC 20%<br/>20ms frames]
AQC -->|"loss 10-40%<br/>RTT 400-600ms"| DEG[DEGRADED<br/>Opus 6kbps<br/>FEC 50%<br/>40ms frames]
AQC -->|"loss>40%<br/>RTT>600ms"| CAT[CATASTROPHIC<br/>Codec2 1.2kbps<br/>FEC 100%<br/>40ms frames]
GOOD -->|"Hysteresis:<br/>sustained degradation"| DEG
DEG -->|"Sustained improvement"| GOOD
DEG -->|"Further degradation"| CAT
CAT -->|"Improvement"| DEG
```
| Profile | Codec | Bitrate | FEC Ratio | Frame Size | FEC Block |
|---------|-------|---------|-----------|------------|-----------|
| GOOD | Opus 24k | 24 kbps | 20% | 20ms | 5 frames |
| DEGRADED | Opus 6k | 6 kbps | 50% | 40ms | 10 frames |
| CATASTROPHIC | Codec2 1.2k | 1.2 kbps | 100% | 40ms | 8 frames |
## Module Dependency Graph
```mermaid
graph BT
PROTO[wzp-proto<br/>Types, traits, jitter,<br/>quality, session]
CODEC[wzp-codec<br/>Opus, Codec2, AEC,<br/>AGC, resampling]
FEC[wzp-fec<br/>RaptorQ fountain codes]
CRYPTO[wzp-crypto<br/>Ed25519, X25519,<br/>ChaCha20-Poly1305]
TRANSPORT[wzp-transport<br/>QUIC, datagrams,<br/>signaling streams]
ANDROID[wzp-android<br/>Engine, JNI bridge,<br/>Oboe audio, pipeline]
RELAY[wzp-relay<br/>SFU, rooms, auth,<br/>metrics, probes]
CODEC --> PROTO
FEC --> PROTO
CRYPTO --> PROTO
TRANSPORT --> PROTO
ANDROID --> PROTO
ANDROID --> CODEC
ANDROID --> FEC
ANDROID --> CRYPTO
ANDROID --> TRANSPORT
RELAY --> PROTO
RELAY --> CRYPTO
RELAY --> TRANSPORT
```
## File Map
### Kotlin (`android/app/src/main/java/com/wzp/`)
| File | Purpose |
|------|---------|
| `WzpApplication.kt` | App entry, notification channel creation |
| `engine/WzpEngine.kt` | JNI wrapper for native engine |
| `engine/WzpCallback.kt` | Callback interface for engine events |
| `engine/CallStats.kt` | Stats data class with JSON deserialization |
| `ui/call/CallActivity.kt` | Activity host, permissions, theme |
| `ui/call/CallViewModel.kt` | MVVM state holder, stats polling |
| `ui/call/InCallScreen.kt` | Compose UI (idle + in-call states) |
| `service/CallService.kt` | Foreground service, wake/wifi locks |
| `audio/AudioRouteManager.kt` | Speaker/earpiece/Bluetooth routing |
### Rust (`crates/wzp-android/src/`)
| File | Purpose |
|------|---------|
| `lib.rs` | Module declarations |
| `jni_bridge.rs` | JNI FFI (panic-safe, proper jni crate) |
| `engine.rs` | Call orchestrator (threads, channels, lifecycle) |
| `pipeline.rs` | Codec pipeline (AEC, AGC, encode, FEC, jitter, decode) |
| `audio_android.rs` | Oboe backend, SPSC ring buffers, RT scheduling |
| `commands.rs` | Engine command enum |
| `stats.rs` | CallState/CallStats types (serde) |
### C++ (`crates/wzp-android/cpp/`)
| File | Purpose |
|------|---------|
| `oboe_bridge.h` | FFI header for Rust-C++ audio interface |
| `oboe_bridge.cpp` | Oboe capture/playout callbacks, ring buffer I/O |
| `oboe_stub.cpp` | No-op stub for non-Android builds |
### Build
| File | Purpose |
|------|---------|
| `android/app/build.gradle.kts` | Android build config, cargo-ndk task |
| `crates/wzp-android/Cargo.toml` | Rust dependencies (cdylib output) |
| `crates/wzp-android/build.rs` | C++ compilation, Oboe fetch |

View File

@@ -0,0 +1,160 @@
---
tags: [android, wzp]
type: reference
---
# Build Guide
## Prerequisites
| Tool | Version | Purpose |
|------|---------|---------|
| JDK | 17 | Android Gradle builds |
| Android SDK | 34 | Compile SDK |
| Android NDK | 26.1.10909125 | Native C++/Rust compilation |
| Rust | 1.85+ | Native engine (edition 2024) |
| cargo-ndk | latest | Cross-compile Rust → Android |
| `aarch64-linux-android` target | - | Rust target for ARM64 |
### Install Rust Android target
```bash
rustup target add aarch64-linux-android
cargo install cargo-ndk
```
### Environment Variables
```bash
export JAVA_HOME="/usr/lib/jvm/java-17-openjdk-amd64"
export ANDROID_HOME="$HOME/android-sdk"
export ANDROID_NDK_HOME="$ANDROID_HOME/ndk/26.1.10909125"
# For manual cargo-ndk builds (Gradle sets these automatically):
export CC_aarch64_linux_android="$ANDROID_NDK_HOME/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android21-clang"
export CXX_aarch64_linux_android="$ANDROID_NDK_HOME/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android21-clang++"
export AR_aarch64_linux_android="$ANDROID_NDK_HOME/toolchains/llvm/prebuilt/linux-x86_64/bin/llvm-ar"
```
## Build Commands
### Full Build (Gradle drives everything)
```bash
cd android
./gradlew assembleRelease
```
This runs:
1. `cargoNdkBuild` task: invokes `cargo ndk -t arm64-v8a -o app/src/main/jniLibs build --release -p wzp-android`
2. Compiles Kotlin/Compose code
3. Packages APK with signing
### Native Library Only
```bash
cargo ndk -t arm64-v8a -o android/app/src/main/jniLibs build --release -p wzp-android
```
Output: `android/app/src/main/jniLibs/arm64-v8a/libwzp_android.so`
### Skip Native Rebuild
If the `.so` hasn't changed:
```bash
cd android
./gradlew assembleRelease -x cargoNdkBuild
```
### Debug Build
```bash
cd android
./gradlew assembleDebug
```
Debug APK is ~8.9 MB (unstripped `.so`), release is ~6.9 MB.
## Signing
### Debug
```
Keystore: android/keystore/wzp-debug.jks
Password: android
Key alias: wzp-debug
```
### Release
```
Keystore: android/keystore/wzp-release.jks
Password: wzphone2024
Key alias: wzp-release
```
Both keystores are checked into the repo for development convenience. For production, replace with proper key management.
## Build Artifacts
| Artifact | Path | Size |
|----------|------|------|
| Debug APK | `android/app/build/outputs/apk/debug/app-debug.apk` | ~8.9 MB |
| Release APK | `android/app/build/outputs/apk/release/app-release.apk` | ~6.9 MB |
| Native lib | `android/app/src/main/jniLibs/arm64-v8a/libwzp_android.so` | ~5 MB |
## ABI Support
Currently only `arm64-v8a` (ARM64) is built. This covers 95%+ of modern Android devices.
To add more ABIs, edit `build.gradle.kts`:
```kotlin
ndk { abiFilters += listOf("arm64-v8a", "armeabi-v7a") }
```
And update the cargo-ndk command in `cargoNdkBuild` task:
```kotlin
commandLine("cargo", "ndk", "-t", "arm64-v8a", "-t", "armeabi-v7a", ...)
```
## Oboe Dependency
The Oboe C++ audio library is fetched at build time by `build.rs`:
1. Attempts `git clone` of Oboe 1.8.1 into `$OUT_DIR/oboe`
2. If successful, compiles `oboe_bridge.cpp` with Oboe headers
3. If clone fails (no network), falls back to `oboe_stub.cpp` (no-op audio)
This means **first build requires internet** to fetch Oboe. Subsequent builds use the cached checkout.
## Common Build Issues
### `cargo ndk` not found
```bash
cargo install cargo-ndk
```
### Missing Android target
```bash
rustup target add aarch64-linux-android
```
### NDK not found
Ensure `ANDROID_NDK_HOME` points to the NDK directory containing `toolchains/llvm/`.
### C++ compilation errors
Check that `CXX_aarch64_linux_android` points to a valid clang++ from the NDK.
### Gradle daemon issues
```bash
./gradlew --stop
./gradlew assembleRelease --no-daemon
```

219
vault/Android/Debugging.md Normal file
View File

@@ -0,0 +1,219 @@
---
tags: [android, wzp]
type: reference
---
# Debugging Guide
## Crash on Launch
### Symptom: App crashes immediately after opening
**Most likely cause: Namespace mismatch in AndroidManifest.xml**
The Gradle namespace is `com.wzp.phone` but all Kotlin classes are in package `com.wzp.*`. If the manifest uses shorthand names (`.WzpApplication`, `.ui.call.CallActivity`), Android resolves them as `com.wzp.phone.WzpApplication` which doesn't exist.
**Fix**: Always use fully-qualified class names in the manifest:
```xml
<!-- WRONG -->
<application android:name=".WzpApplication">
<activity android:name=".ui.call.CallActivity">
<!-- CORRECT -->
<application android:name="com.wzp.WzpApplication">
<activity android:name="com.wzp.ui.call.CallActivity">
```
### Symptom: Crash in `System.loadLibrary("wzp_android")`
The native `.so` is missing or incompatible. Check:
```bash
# Verify the .so exists in the APK
unzip -l app-release.apk | grep libwzp
# Should show: lib/arm64-v8a/libwzp_android.so
# Verify ABI matches device
adb shell getprop ro.product.cpu.abi
# Should return: arm64-v8a
```
### Symptom: Crash when calling `nativeGetStats()` (returns null jstring)
The JNI bridge must return a valid `jstring`, not a null pointer. The Kotlin side declares the return as `String?` (nullable) and wraps in try/catch:
```kotlin
fun getStats(): String {
if (nativeHandle == 0L) return "{}"
return try {
nativeGetStats(nativeHandle) ?: "{}"
} catch (_: Exception) {
"{}"
}
}
```
### Symptom: Tracing subscriber panic
`tracing_subscriber::fmt()` writes to stdout, which doesn't exist on Android. The init was removed. If you need logging, use `android_logger` crate instead.
## Logcat Filters
### View all WZP logs
```bash
adb logcat -s wzp-android:V wzp-codec:V wzp-net:V
```
### View Rust tracing output (if android_logger is added)
```bash
adb logcat | grep -E "(wzp|WzpEngine|CallActivity)"
```
### View Oboe audio logs
```bash
adb logcat -s AAudio:V oboe:V
```
### View native crashes
```bash
adb logcat -s DEBUG:V libc:V
```
Look for `signal 11 (SIGSEGV)` or `signal 6 (SIGABRT)` with a backtrace in `libwzp_android.so`.
### Symbolicate native crash
```bash
# Find the .so with debug symbols (before stripping)
SO_PATH="target/aarch64-linux-android/release/libwzp_android.so"
# Use addr2line from NDK
$ANDROID_NDK_HOME/toolchains/llvm/prebuilt/linux-x86_64/bin/llvm-addr2line \
-e $SO_PATH -f 0x<address_from_crash>
```
## Network Issues
### Call stuck on "Connecting..."
The QUIC handshake to the relay is failing. Common causes:
1. **Relay not running**: Verify the relay is listening:
```bash
nc -zvu 172.16.81.125 4433
```
2. **Wrong relay address**: Hardcoded in `CallViewModel.kt`:
```kotlin
const val DEFAULT_RELAY = "172.16.81.125:4433"
```
3. **QUIC blocked by firewall**: QUIC uses UDP. Many networks block UDP traffic. Ensure UDP port 4433 is open.
4. **TLS handshake failure**: The client uses `client_config()` which disables certificate verification. If the relay's QUIC config changed, this may fail.
### Connected but no audio
1. **Microphone permission denied**: Check Android settings. The app requests `RECORD_AUDIO` on first launch.
2. **Oboe failed to start**: The codec thread logs this. Check logcat for "failed to start audio".
3. **Ring buffer underrun**: The stats overlay shows "Under" count. High underruns mean the codec thread isn't keeping up.
4. **Network not forwarding**: If both phones show "Active" but frame counters aren't increasing, the relay may not be forwarding. Check relay logs.
### High packet loss
The stats overlay shows loss percentage. Common causes:
- Wi-Fi congestion (try cellular or move closer to AP)
- UDP throttling by carrier/ISP
- Relay overloaded (check relay metrics)
## Audio Issues
### Echo
AEC (Acoustic Echo Cancellation) is enabled by default with a 100ms tail. If echo persists:
- The AEC may need a longer tail for the specific acoustic environment
- Speaker volume too high overwhelms the canceller
- Check that `last_decoded_farend` is being set (playout path working)
### Robot voice / glitching
Usually caused by jitter buffer underruns. The jitter buffer adapts between 10-250 packets. Check:
- `jitter_buffer_depth` in stats (should be > 0 during active call)
- `underruns` counter (should not climb rapidly)
- Network jitter (high jitter_ms causes adaptation)
### No sound from speaker
1. Check `isSpeaker` state in the UI
2. Oboe playout stream may have failed — check logcat for Oboe errors
3. Ring buffer might be empty — check `framesDecoded` counter
## JNI Issues
### `UnsatisfiedLinkError: No implementation found for...`
The JNI function name doesn't match. JNI names must follow the pattern:
```
Java_com_wzp_engine_WzpEngine_<methodName>
```
If the package structure changes, all JNI function names must be updated in `jni_bridge.rs`.
### Panic across FFI boundary
All JNI functions wrap their body in `panic::catch_unwind()`. If a Rust panic escapes to Java, it causes a `SIGABRT`. The catch_unwind returns safe defaults:
| Function | Panic return |
|----------|--------------|
| `nativeInit` | 0 (null handle) |
| `nativeStartCall` | -1 (error) |
| `nativeGetStats` | `JObject::null()` |
| Others | void (silently swallowed) |
### Thread safety
All JNI methods must be called from the same thread (Android main thread). The `EngineHandle` is a raw pointer — concurrent access is undefined behavior.
## Stats JSON Format
The `nativeGetStats()` returns JSON matching this Rust struct:
```json
{
"state": "Active",
"duration_secs": 42.5,
"quality_tier": 0,
"loss_pct": 0.5,
"rtt_ms": 45,
"jitter_ms": 12,
"jitter_buffer_depth": 3,
"frames_encoded": 2125,
"frames_decoded": 2100,
"underruns": 5
}
```
Kotlin deserializes this via `CallStats.fromJson()` using `org.json.JSONObject` (Android built-in, no library needed).
## Diagnostic Checklist
When something doesn't work, check in this order:
1. **APK installed for correct ABI?** (`arm64-v8a` only)
2. **Manifest class names fully qualified?** (no dots prefix)
3. **Relay running and reachable?** (`nc -zvu <host> <port>`)
4. **Microphone permission granted?**
5. **Stats polling working?** (check if frame counters increment)
6. **Logcat for native crashes?** (`adb logcat -s DEBUG:V`)
7. **Network connectivity?** (UDP port open, no firewall)

View File

@@ -0,0 +1,399 @@
---
tags: [android, wzp]
type: reference
---
# Fix: AudioRing SPSC Buffer Cursor Desync
## Problem
A critical bug causes 10-16 seconds of bidirectional audio silence mid-call (~25-30s in). Both participants go silent at the exact same moment. The QUIC transport, relay, Opus codec, and FEC are all healthy — the bug is in the lock-free ring buffer that transfers decoded PCM from the Rust recv task to the Kotlin AudioTrack playout thread.
**Root cause:** `AudioRing::write()` modifies `read_pos` from the producer thread during overflow handling (lines 68-72 of `audio_ring.rs`). This violates the SPSC invariant — only the consumer should own `read_pos`. When both threads write to `read_pos`, a race corrupts the cursor state, causing the reader to see an empty or stale buffer for 12-16 seconds.
**Full forensics:** `debug/INCIDENT-2026-04-06-playout-ring-desync.md`
---
## Solution: Reader-Detects-Lap Architecture
The writer NEVER touches `read_pos`. On overflow, the writer simply overwrites old buffer data and advances `write_pos`. The reader detects it was lapped and self-corrects by snapping its own `read_pos` forward.
---
## Implementation Steps
### Step 1: Rewrite `AudioRing`
**File:** `crates/wzp-android/src/audio_ring.rs`
Replace the entire implementation with:
**Constants:**
```rust
/// Ring buffer capacity — must be a power of 2 for bitmask indexing.
/// 16384 samples = 341.3ms at 48kHz mono. Provides 70% more headroom
/// than the previous 9600 (200ms) for surviving Android GC pauses.
const RING_CAPACITY: usize = 16384; // 2^14
const RING_MASK: usize = RING_CAPACITY - 1;
```
**Struct:**
```rust
pub struct AudioRing {
buf: Box<[i16; RING_CAPACITY]>,
write_pos: AtomicUsize, // monotonically increasing, ONLY written by producer
read_pos: AtomicUsize, // monotonically increasing, ONLY written by consumer
overflow_count: AtomicU64, // incremented by reader when it detects a lap
underrun_count: AtomicU64, // incremented by reader when ring is empty
}
```
**`write()` — producer. Does NOT touch `read_pos`:**
```rust
pub fn write(&self, samples: &[i16]) -> usize {
let count = samples.len().min(RING_CAPACITY);
let w = self.write_pos.load(Ordering::Relaxed);
for i in 0..count {
unsafe {
let ptr = self.buf.as_ptr() as *mut i16;
*ptr.add((w + i) & RING_MASK) = samples[i];
}
}
self.write_pos.store(w.wrapping_add(count), Ordering::Release);
count
}
```
**`read()` — consumer. Detects lap, self-corrects:**
```rust
pub fn read(&self, out: &mut [i16]) -> usize {
let w = self.write_pos.load(Ordering::Acquire);
let mut r = self.read_pos.load(Ordering::Relaxed);
let mut avail = w.wrapping_sub(r);
// Lap detection: writer has overwritten our unread data.
// Snap read_pos forward to oldest valid data in the buffer.
// Safe because we (the reader) are the sole owner of read_pos.
if avail > RING_CAPACITY {
r = w.wrapping_sub(RING_CAPACITY);
avail = RING_CAPACITY;
self.overflow_count.fetch_add(1, Ordering::Relaxed);
}
let count = out.len().min(avail);
if count == 0 {
if w == r {
self.underrun_count.fetch_add(1, Ordering::Relaxed);
}
return 0;
}
for i in 0..count {
out[i] = unsafe { *self.buf.as_ptr().add((r + i) & RING_MASK) };
}
self.read_pos.store(r.wrapping_add(count), Ordering::Release);
count
}
```
**`available()` — clamped for external callers:**
```rust
pub fn available(&self) -> usize {
let w = self.write_pos.load(Ordering::Acquire);
let r = self.read_pos.load(Ordering::Relaxed);
w.wrapping_sub(r).min(RING_CAPACITY)
}
```
**`free_space()` — keep for API compat:**
```rust
pub fn free_space(&self) -> usize {
RING_CAPACITY.saturating_sub(self.available())
}
```
**Diagnostic accessors:**
```rust
pub fn overflow_count(&self) -> u64 {
self.overflow_count.load(Ordering::Relaxed)
}
pub fn underrun_count(&self) -> u64 {
self.underrun_count.load(Ordering::Relaxed)
}
```
**Constructor:**
```rust
pub fn new() -> Self {
debug_assert!(RING_CAPACITY.is_power_of_two());
Self {
buf: Box::new([0i16; RING_CAPACITY]),
write_pos: AtomicUsize::new(0),
read_pos: AtomicUsize::new(0),
overflow_count: AtomicU64::new(0),
underrun_count: AtomicU64::new(0),
}
}
```
**Imports to add:** `use std::sync::atomic::AtomicU64;`
**Safety comment update:**
```rust
// SAFETY: AudioRing is SPSC — one thread writes (producer), one reads (consumer).
// The producer only writes write_pos. The consumer only writes read_pos.
// Neither thread writes the other's cursor. Buffer indices are derived from
// the owning thread's cursor, ensuring no concurrent access to the same index.
```
---
### Step 2: Add counter fields to `CallStats`
**File:** `crates/wzp-android/src/stats.rs`
Add three fields to the `CallStats` struct (after `fec_recovered`):
```rust
/// Playout ring overflow count (reader was lapped by writer).
pub playout_overflows: u64,
/// Playout ring underrun count (reader found empty buffer).
pub playout_underruns: u64,
/// Capture ring overflow count.
pub capture_overflows: u64,
```
These derive `Default` (= 0) automatically via the existing `#[derive(Default)]`.
---
### Step 3: Wire ring diagnostics into engine stats + logging
**File:** `crates/wzp-android/src/engine.rs`
**3a.** In `get_stats()` (~line 181), populate the new fields:
```rust
stats.playout_overflows = self.state.playout_ring.overflow_count();
stats.playout_underruns = self.state.playout_ring.underrun_count();
stats.capture_overflows = self.state.capture_ring.overflow_count();
```
**3b.** In the recv task periodic stats log, add ring health:
```rust
info!(
frames_decoded,
fec_recovered,
recv_errors,
max_recv_gap_ms,
playout_avail = state.playout_ring.available(),
playout_overflows = state.playout_ring.overflow_count(),
playout_underruns = state.playout_ring.underrun_count(),
"recv stats"
);
```
**3c.** In the send task periodic stats log, add capture ring health:
```rust
info!(
seq = s,
block_id,
frames_sent,
frames_dropped,
send_errors,
ring_avail = state.capture_ring.available(),
capture_overflows = state.capture_ring.overflow_count(),
"send stats"
);
```
---
### Step 4: Parse new stats in Kotlin
**File:** `android/app/src/main/java/com/wzp/engine/CallStats.kt`
Add fields to the data class:
```kotlin
val playoutOverflows: Long = 0,
val playoutUnderruns: Long = 0,
val captureOverflows: Long = 0,
```
Add parsing in `fromJson()`:
```kotlin
playoutOverflows = obj.optLong("playout_overflows", 0),
playoutUnderruns = obj.optLong("playout_underruns", 0),
captureOverflows = obj.optLong("capture_overflows", 0),
```
No UI changes needed — these fields will appear in debug report JSON automatically.
---
### Step 5: Unit tests
**File:** `crates/wzp-android/src/audio_ring.rs` — add `#[cfg(test)] mod tests`
```rust
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn capacity_is_power_of_two() {
assert!(RING_CAPACITY.is_power_of_two());
}
#[test]
fn basic_write_read() {
let ring = AudioRing::new();
let input: Vec<i16> = (0..960).map(|i| i as i16).collect();
ring.write(&input);
assert_eq!(ring.available(), 960);
let mut output = vec![0i16; 960];
let read = ring.read(&mut output);
assert_eq!(read, 960);
assert_eq!(output, input);
assert_eq!(ring.available(), 0);
}
#[test]
fn wraparound() {
let ring = AudioRing::new();
let frame = vec![42i16; 960];
// Write enough to wrap the buffer multiple times
for _ in 0..20 {
ring.write(&frame);
let mut out = vec![0i16; 960];
ring.read(&mut out);
assert!(out.iter().all(|&s| s == 42));
}
}
#[test]
fn overflow_detected_by_reader() {
let ring = AudioRing::new();
// Write more than RING_CAPACITY without reading
let big = vec![7i16; RING_CAPACITY + 960];
ring.write(&big[..RING_CAPACITY]);
ring.write(&big[RING_CAPACITY..]);
// Reader should detect lap
let mut out = vec![0i16; 960];
let read = ring.read(&mut out);
assert!(read > 0);
assert_eq!(ring.overflow_count(), 1);
// Data should be from the most recent writes
assert!(out.iter().all(|&s| s == 7));
}
#[test]
fn writer_never_modifies_read_pos() {
let ring = AudioRing::new();
// Read pos should stay at 0 until read() is called
let data = vec![1i16; RING_CAPACITY + 960];
ring.write(&data);
// read_pos is private, but we can check available() > CAPACITY
// which proves write() didn't advance read_pos
let w = ring.write_pos.load(std::sync::atomic::Ordering::Relaxed);
let r = ring.read_pos.load(std::sync::atomic::Ordering::Relaxed);
assert_eq!(r, 0, "write() must not modify read_pos");
assert!(w.wrapping_sub(r) > RING_CAPACITY);
}
#[test]
fn underrun_counted() {
let ring = AudioRing::new();
let mut out = vec![0i16; 960];
let read = ring.read(&mut out);
assert_eq!(read, 0);
assert_eq!(ring.underrun_count(), 1);
}
#[test]
fn overflow_recovery_reads_recent_data() {
let ring = AudioRing::new();
// Fill with old data
let old = vec![1i16; RING_CAPACITY];
ring.write(&old);
// Overwrite with new data (lapping the reader)
let new_data = vec![99i16; 960];
ring.write(&new_data);
// Reader should snap forward and get recent data
let mut out = vec![0i16; RING_CAPACITY];
let read = ring.read(&mut out);
assert_eq!(read, RING_CAPACITY);
// The last 960 samples should be 99
assert!(out[RING_CAPACITY - 960..].iter().all(|&s| s == 99));
assert_eq!(ring.overflow_count(), 1);
}
}
```
---
## Memory Ordering Reference
| Operation | Ordering | Rationale |
|-----------|----------|-----------|
| `write_pos.store` in `write()` | Release | Buffer writes visible before cursor advances |
| `write_pos.load` in `read()` | Acquire | Pairs with Release above — sees all buffer writes |
| `write_pos.load` in `write()` | Relaxed | Writer is sole owner of write_pos |
| `read_pos.load` in `read()` | Relaxed | Reader is sole owner of read_pos |
| `read_pos.store` in `read()` | Release | Makes available() consistent from any thread |
| `read_pos.load` in `available()` | Relaxed | Informational only, slight staleness OK |
| All counters | Relaxed | Diagnostic only |
---
## Capacity Tradeoff
| Capacity | Duration | Memory | Verdict |
|----------|----------|--------|---------|
| 8192 (2^13) | 170ms | 16KB | Less than current 200ms — risky |
| **16384 (2^14)** | **341ms** | **32KB** | **70% more headroom, bitmask indexing** |
| 32768 (2^15) | 682ms | 64KB | Excessive latency on overflow recovery |
---
## Verification
1. `cargo test -p wzp-android` — new unit tests pass
2. `cargo ndk -t arm64-v8a build --release -p wzp-android` — ARM cross-compile succeeds
3. Build APK, install on both test devices (Nothing A059 + Pixel 6)
4. 2+ minute call — verify no audio gaps
5. Check debug report JSON: `playout_overflows` should be 0 or very small
6. Check logcat `wzp_android` tag: send/recv stats show healthy ring state
7. Stress test: play music through one device speaker while on call — forces high ring throughput
---
## Files to Modify
| File | What changes |
|------|-------------|
| `crates/wzp-android/src/audio_ring.rs` | Complete rewrite — the core fix |
| `crates/wzp-android/src/stats.rs` | Add 3 counter fields |
| `crates/wzp-android/src/engine.rs` | Wire counters into get_stats() + periodic logs |
| `android/app/src/main/java/com/wzp/engine/CallStats.kt` | Parse 3 new JSON fields |
## What Does NOT Change
- `AudioPipeline.kt` — calls `readAudio()`/`writeAudio()` unchanged; ring fix is transparent
- `jni_bridge.rs` — JNI bridge passes through unchanged
- `audio_android.rs` — separate Oboe-based ring, currently unused, different design
- Relay code — relay is confirmed healthy
- Desktop client — uses `Mutex + mpsc`, not `AudioRing`

View File

@@ -0,0 +1,154 @@
---
tags: [android, wzp]
type: reference
---
# Fix: Capture/Playout Thread Use-After-Free on Hangup
## Problem
App crashes (SIGSEGV) when hanging up a call. The capture thread (`wzp-capture`) calls `engine.writeAudio()` via JNI after `teardown()` has freed the native engine handle. Same race exists for the playout thread's `readAudio()`.
**Root cause:** TOCTOU race between the `nativeHandle == 0L` check in `WzpEngine.writeAudio()`/`readAudio()` and `destroy()` freeing the native memory on the ViewModel thread. Audio threads can't be joined (libcrypto TLS destructor crash), so there's no synchronization between `stopAudio()` and `destroy()`.
**Full forensics:** `debug/INCIDENT-2026-04-06-capture-thread-use-after-free.md`
---
## Solution: Destroy Latch
Add a `CountDownLatch(2)` that both audio threads count down after exiting their loops. `teardown()` awaits the latch (with timeout) before calling `destroy()`, guaranteeing no in-flight JNI calls.
---
## Implementation Steps
### Step 1: Add a drain latch to `AudioPipeline`
**File:** `android/app/src/main/java/com/wzp/audio/AudioPipeline.kt`
Add a `CountDownLatch` field:
```kotlin
import java.util.concurrent.CountDownLatch
import java.util.concurrent.TimeUnit
class AudioPipeline(private val context: Context) {
// ... existing fields ...
/** Latch counted down by each audio thread after exiting its loop.
* stop() does NOT wait on this — teardown waits via awaitDrain(). */
private var drainLatch: CountDownLatch? = null
```
In `start()`, create the latch before spawning threads:
```kotlin
fun start(engine: WzpEngine) {
if (running) return
running = true
drainLatch = CountDownLatch(2) // one for capture, one for playout
captureThread = Thread({
runCapture(engine)
drainLatch?.countDown() // signal: capture loop exited
parkThread()
}, "wzp-capture").apply { ... }
playoutThread = Thread({
runPlayout(engine)
drainLatch?.countDown() // signal: playout loop exited
parkThread()
}, "wzp-playout").apply { ... }
// ...
}
```
Add `awaitDrain()` — called by ViewModel before `destroy()`:
```kotlin
/** Block until both audio threads have exited their loops (max 200ms).
* After this returns, no more JNI calls to the engine will be made. */
fun awaitDrain(): Boolean {
return drainLatch?.await(200, TimeUnit.MILLISECONDS) ?: true
}
```
`stop()` remains unchanged (non-blocking, sets `running = false`).
### Step 2: Update `CallViewModel.teardown()` to await drain
**File:** `android/app/src/main/java/com/wzp/ui/call/CallViewModel.kt`
Change teardown to wait for audio threads before destroying:
```kotlin
private fun teardown(stopService: Boolean = true) {
Log.i(TAG, "teardown: stopping audio, stopService=$stopService")
val hadCall = audioStarted
CallService.onStopFromNotification = null
stopAudio() // sets running=false (non-blocking)
stopStatsPolling()
// Wait for audio threads to exit their loops before destroying the engine.
// This guarantees no in-flight JNI calls to writeAudio/readAudio.
val drained = audioPipeline?.awaitDrain() ?: true
if (!drained) {
Log.w(TAG, "teardown: audio threads did not drain in time")
}
audioPipeline = null
Log.i(TAG, "teardown: stopping engine")
try { engine?.stopCall() } catch (e: Exception) { Log.w(TAG, "stopCall err: $e") }
try { engine?.destroy() } catch (e: Exception) { Log.w(TAG, "destroy err: $e") }
engine = null
engineInitialized = false
// ... rest unchanged
}
```
**Key change:** `awaitDrain()` is called AFTER `stopAudio()` (which sets `running=false`) but BEFORE `engine?.destroy()`. The latch guarantees both threads have exited their `while(running)` loops and will never call `writeAudio`/`readAudio` again.
Also move `audioPipeline = null` to after `awaitDrain()` to keep the reference alive for the latch call.
### Step 3: Move `stopAudio()` pipeline nulling
**File:** `android/app/src/main/java/com/wzp/ui/call/CallViewModel.kt`
In `stopAudio()`, do NOT null out the pipeline — let `teardown()` handle it after drain:
```kotlin
private fun stopAudio() {
if (!audioStarted) return
audioPipeline?.stop() // sets running=false
// DON'T null audioPipeline here — teardown() needs it for awaitDrain()
audioRouteManager?.unregister()
audioRouteManager?.setSpeaker(false)
_isSpeaker.value = false
audioStarted = false
}
```
---
## Files to Modify
| File | What changes |
|------|-------------|
| `android/.../audio/AudioPipeline.kt` | Add `CountDownLatch`, `countDown()` in threads, `awaitDrain()` method |
| `android/.../ui/call/CallViewModel.kt` | `teardown()` calls `awaitDrain()` before `destroy()`; `stopAudio()` doesn't null pipeline |
## What Does NOT Change
- `WzpEngine.kt` — the `nativeHandle == 0L` guard stays as defense-in-depth
- `jni_bridge.rs``panic::catch_unwind` stays as last resort
- `AudioPipeline.stop()` — remains non-blocking
- Thread parking — still needed to avoid libcrypto TLS crash
## Verification
1. Build APK, install on test device
2. Make a call, hang up — verify no crash in logcat (`adb logcat -s AndroidRuntime:E DEBUG:F`)
3. Rapid call/hangup/call/hangup cycles — stress the teardown path
4. Check logcat for `teardown: audio threads did not drain in time` — should never appear under normal conditions
5. Verify debug report still works after hangup (latch doesn't interfere with report collection)

View File

@@ -0,0 +1,195 @@
---
tags: [android, wzp]
type: reference
---
# Maintenance Guide
## Code Map — Where to Change Things
### Changing the relay address or room
Edit `CallViewModel.kt`:
```kotlin
companion object {
const val DEFAULT_RELAY = "172.16.81.125:4433"
const val DEFAULT_ROOM = "android"
}
```
For a proper settings screen, add a new Composable in `ui/` that persists to `SharedPreferences` and passes values to `viewModel.startCall(relay, room)`.
### Adding authentication
1. In `CallViewModel.startCall()`, pass a token parameter
2. In `engine.rs`, after QUIC connect but before CallOffer, send:
```rust
transport.send_signal(&SignalMessage::AuthToken { token: auth_token }).await?;
```
3. Wait for the relay to accept before proceeding to handshake
4. Start relay with `--auth-url <featherchat-endpoint>`
### Enabling media encryption
The crypto session is already derived in `engine.rs` but not applied to packets. To enable:
1. Pass `_session` (currently unused) to the send/recv tasks
2. Before `transport.send_media()`, encrypt the payload:
```rust
let mut ciphertext = Vec::new();
session.encrypt(&header_bytes, &payload, &mut ciphertext)?;
packet.payload = Bytes::from(ciphertext);
```
3. After `transport.recv_media()`, decrypt:
```rust
let mut plaintext = Vec::new();
session.decrypt(&header_bytes, &pkt.payload, &mut plaintext)?;
pkt.payload = Bytes::from(plaintext);
```
### Adding a new codec / quality profile
1. Define the profile in `wzp-proto/src/codec_id.rs`
2. Implement `AudioEncoder`/`AudioDecoder` traits in `wzp-codec`
3. Register in `AdaptiveEncoder`/`AdaptiveDecoder` switch logic
4. Add to `supported_profiles` in the CallOffer (engine.rs)
### Changing audio parameters
- **Sample rate**: Change `FRAME_SAMPLES` in `audio_android.rs` and `WzpOboeConfig.sample_rate` in `oboe_bridge.cpp`. Must match the codec's expected rate.
- **Frame duration**: Change `FRAME_SAMPLES` (960 = 20ms at 48kHz, 1920 = 40ms)
- **Ring buffer size**: Change `RING_CAPACITY` in `audio_android.rs`
- **AEC tail length**: Change the `100` in `Pipeline::new()` → `EchoCanceller::new(48000, 100)`
### Adding x86_64 support (emulator)
1. `build.gradle.kts`: add `"x86_64"` to `abiFilters`
2. `cargoNdkBuild` task: add `-t x86_64`
3. `build.rs`: handle `x86_64-linux-android` target for Oboe
4. Note: Oboe in the emulator uses a different audio HAL — audio quality will differ
## Dependency Overview
### Rust Crate Dependencies (wzp-android)
| Crate | Version | Purpose | Upgrade risk |
|-------|---------|---------|--------------|
| `jni` | 0.21 | Java FFI | Low — stable API |
| `tokio` | 1.x | Async runtime | Low |
| `quinn` | 0.11 | QUIC transport | Medium — breaking changes between 0.x |
| `rustls` | 0.23 | TLS for QUIC | Medium — tied to quinn version |
| `serde_json` | 1.x | Stats serialization | Low |
| `anyhow` | 1.x | Error handling | Low |
| `tracing` | 0.1 | Logging | Low |
| `rand` | 0.8 | Random seed generation | Low |
### Workspace Crate Dependencies
| Crate | Purpose | Key trait |
|-------|---------|-----------|
| `wzp-proto` | Shared types and traits | `MediaTransport`, `AudioEncoder`, `KeyExchange` |
| `wzp-codec` | Opus + Codec2 + signal processing | `AdaptiveEncoder`, `EchoCanceller` |
| `wzp-fec` | RaptorQ FEC | `RaptorQFecEncoder` |
| `wzp-crypto` | Key exchange + encryption | `WarzoneKeyExchange`, `ChaChaSession` |
| `wzp-transport` | QUIC connection management | `QuinnTransport`, `connect()` |
### Android/Kotlin Dependencies
| Library | Version | Purpose |
|---------|---------|---------|
| `compose-bom` | 2024.01.00 | Compose version alignment |
| `material3` | (from BOM) | UI components |
| `activity-compose` | 1.8.2 | Activity integration |
| `lifecycle-runtime-ktx` | 2.7.0 | ViewModel + coroutines |
| `core-ktx` | 1.12.0 | Kotlin extensions |
## Updating Dependencies
### Rust
```bash
cargo update -p wzp-android
cargo ndk -t arm64-v8a build --release -p wzp-android
```
Watch for `quinn`/`rustls` version coupling. They must be compatible:
- quinn 0.11 requires rustls 0.23
### Android/Kotlin
Update versions in `android/app/build.gradle.kts`. Key compatibility:
- `kotlinCompilerExtensionVersion` must match the Kotlin version
- `compose-bom` version determines all Compose library versions
- `compileSdk` and `targetSdk` should stay in sync
### NDK
If upgrading the NDK:
1. Update `ndkVersion` in `build.gradle.kts`
2. Update `ANDROID_NDK_HOME` environment variable
3. Update `CC_aarch64_linux_android` and friends
4. Verify Oboe still builds with the new toolchain
## Key Invariants to Preserve
1. **JNI function names must match package structure**: If the Kotlin package changes, all `Java_com_wzp_engine_WzpEngine_*` functions in `jni_bridge.rs` must be renamed.
2. **Manifest uses fully-qualified class names**: Never use `.ClassName` shorthand because the Gradle namespace (`com.wzp.phone`) differs from the Kotlin package (`com.wzp`).
3. **Stats JSON field names are snake_case**: Rust serializes with serde defaults (snake_case). Kotlin's `CallStats.fromJson()` expects `duration_secs`, `loss_pct`, etc.
4. **Ring buffer ordering**: Producer uses Release store on write index, consumer uses Acquire load. Breaking this causes torn reads.
5. **Codec thread owns Pipeline**: Pipeline is `!Send` (Opus encoder state). It must never be accessed from another thread.
6. **panic::catch_unwind on all JNI functions**: Rust panics unwinding across the FFI boundary is UB. Every JNI-exposed function must catch panics.
7. **Channel capacity (64)**: Both `send_tx` and `recv_tx` are bounded at 64 packets. If the network is slow, packets are dropped (`try_send` best-effort).
## Testing
### Unit Tests (Rust)
```bash
# Run all workspace tests (host, not Android)
cargo test
# Run only wzp-android tests (uses oboe_stub.cpp on host)
cargo test -p wzp-android
```
Note: Pipeline, codec, FEC, crypto tests run on the host. Audio tests use stubs.
### On-Device Testing
1. Build and install debug APK
2. Open app, tap CALL
3. Verify in logcat:
- `WzpEngine created via JNI`
- `connecting to relay...`
- `QUIC connected to relay`
- `CallOffer sent`
- `handshake complete, call active`
- `codec thread started`
4. Check stats overlay: frame counters should increment
5. Speak into mic — other connected device should hear audio
### Stress Testing
- Run a call for 30+ minutes — check for memory leaks (stats should be stable)
- Kill and restart the relay — client should eventually get a connection error
- Toggle mute rapidly — verify no crashes
- Switch speaker on/off — verify audio route changes
## Performance Monitoring
Key metrics to watch during a call:
| Metric | Healthy Range | Warning | Critical |
|--------|--------------|---------|----------|
| frames_encoded | Increasing ~50/sec | Stalled | 0 |
| frames_decoded | Increasing ~50/sec | Stalled | 0 |
| underruns | < 5/min | > 20/min | > 100/min |
| jitter_buffer_depth | 2-5 | 0 or >10 | N/A |
| loss_pct | < 5% | 5-20% | > 20% |
| rtt_ms | < 100ms | 100-300ms | > 500ms |

46
vault/Android/README.md Normal file
View File

@@ -0,0 +1,46 @@
---
tags: [android, wzp]
type: reference
---
# WarzonePhone Android Client
The WZP Android client is a native VoIP application built with Kotlin/Jetpack Compose on top of a Rust audio engine. It connects to WZP relay servers over QUIC, providing encrypted voice calls with adaptive quality, forward error correction, and acoustic echo cancellation.
## Quick Start
1. **Build**: `cd android && ./gradlew assembleRelease` (requires NDK 26.1, cargo-ndk)
2. **Install**: `adb install app/build/outputs/apk/release/app-release.apk`
3. **Run**: Open "WZ Phone", tap **CALL** to connect to the hardcoded relay
4. **Relay**: Must be running at the configured address (default `172.16.81.125:4433`)
## Current State (April 2025)
| Feature | Status |
|---------|--------|
| QUIC transport to relay | Working |
| Crypto handshake (X25519 + Ed25519) | Working |
| Opus 24k encoding/decoding | Working |
| Oboe audio I/O (48kHz mono) | Working |
| AEC / AGC signal processing | Working |
| RaptorQ FEC | Wired (repair symbols not sent yet) |
| Jitter buffer | Working |
| Adaptive quality switching | Codec-ready, not network-driven yet |
| Authentication (featherChat) | Skipped (relay has no --auth-url) |
| Media encryption (ChaCha20-Poly1305) | Session derived but not applied to packets |
| Foreground service / wake locks | Implemented, not started from UI |
## Documentation Index
- [Architecture](architecture.md) - System design, data flow diagrams, thread model
- [Build Guide](build-guide.md) - Build environment setup, dependencies, signing
- [Debugging](debugging.md) - Crash diagnosis, logcat filters, common issues
- [Maintenance](maintenance.md) - Code map, dependency management, upgrade paths
- [Roadmap](roadmap.md) - Planned work and known gaps
## Key Design Decisions
- **Rust native engine**: All audio processing, codecs, FEC, crypto, and networking run in Rust. Kotlin is UI-only.
- **Lock-free audio**: SPSC ring buffers with atomic ordering between Oboe C++ callbacks and the Rust codec thread. No mutexes in the audio path.
- **cargo-ndk**: The native library (`libwzp_android.so`) is cross-compiled for `arm64-v8a` using cargo-ndk, invoked automatically by Gradle's `cargoNdkBuild` task.
- **Single-activity Compose**: One `CallActivity` hosts all UI via Jetpack Compose with `CallViewModel` as the state holder.

117
vault/Android/Roadmap.md Normal file
View File

@@ -0,0 +1,117 @@
---
tags: [android, wzp]
type: reference
---
# Roadmap & Known Gaps
## Current State Summary
The Android client can connect to a WZP relay, complete the crypto handshake, and exchange audio in real-time. Two phones on the same network can talk to each other through the relay.
## What Works (April 2025)
- QUIC transport to relay with room-based SFU
- Full crypto handshake (X25519 ephemeral + Ed25519 signatures)
- Opus 24kbps encoding/decoding at 48kHz
- Lock-free audio I/O via Oboe (capture + playout)
- AEC (acoustic echo cancellation) with 100ms tail
- AGC (automatic gain control)
- RaptorQ FEC encoder/decoder (wired to pipeline)
- Adaptive jitter buffer (10-250 packets)
- UI with connect/disconnect, mute, speaker, live stats
- Random identity seed per app launch
## Known Gaps
### P0 — Must fix for usable calls
| Gap | Impact | Where to fix |
|-----|--------|--------------|
| **Media encryption not applied** | Audio sent in cleartext over QUIC | `engine.rs` — pass `_session` to send/recv, encrypt/decrypt payloads |
| **FEC repair symbols not sent** | No loss recovery — audio gaps on packet loss | `engine.rs` send task — call `fec_encoder.generate_repair()` and send repair packets |
| **Quality reports not sent** | Relay can't monitor quality, no adaptive switching | `engine.rs` — periodically attach `QualityReport` to MediaPacket header |
| **CallService not started** | Call dies when app is backgrounded | `CallViewModel.startCall()` — call `CallService.start(context)` |
### P1 — Important for production
| Gap | Impact | Where to fix |
|-----|--------|--------------|
| **Hardcoded relay address** | Can't change server without rebuild | Add settings screen with `SharedPreferences` |
| **No reconnection logic** | Connection drop = call over | `engine.rs` network task — detect disconnect, retry with backoff |
| **No adaptive quality switching** | Stays on GOOD profile even in bad conditions | Wire `AdaptiveQualityController` to network path quality from `QuinnTransport` |
| **Identity seed not persisted** | New identity every launch | Save seed to Android Keystore or SharedPreferences |
| **No Bluetooth audio routing** | `AudioRouteManager` exists but not wired to UI | Add Bluetooth button to InCallScreen, call `AudioRouteManager` methods |
| **No ringtone/notification for incoming** | Only outgoing calls supported | Need signaling for call setup (currently both sides initiate independently) |
### P2 — Nice to have
| Gap | Impact | Where to fix |
|-----|--------|--------------|
| **No android_logger** | Rust tracing output lost on Android | Add `android_logger` crate, init in `nativeInit()` |
| **Stats don't include network metrics** | Loss/RTT/jitter always 0 | Feed `QuinnTransport.path_quality()` back to stats |
| **No ProGuard/R8 minification** | Release APK larger than necessary | Enable `isMinifyEnabled = true` in build.gradle.kts |
| **Single ABI (arm64-v8a)** | No support for older 32-bit devices or emulators | Add `armeabi-v7a` and `x86_64` to cargo-ndk build |
| **No call history** | Can't see past calls | Add Room database for call log |
| **No contact integration** | Manual relay/room entry | Add contacts with fingerprint-based identity |
## Architecture Evolution Plan
### Phase 1: Make Calls Reliable (current → next)
```
[x] QUIC connection to relay
[x] Crypto handshake
[x] Audio encode/decode pipeline
[ ] Media encryption (ChaCha20-Poly1305)
[ ] FEC repair packet transmission
[ ] Foreground service for background calls
[ ] Reconnection on network change
```
### Phase 2: Quality & Polish
```
[ ] Adaptive quality (GOOD → DEGRADED → CATASTROPHIC switching)
[ ] Quality reports in MediaPacket headers
[ ] Network path quality display (real RTT, loss, jitter)
[ ] Settings screen (relay, room, seed persistence)
[ ] Bluetooth/wired headset audio routing
[ ] Rust android_logger for debugging
```
### Phase 3: Production Features
```
[ ] featherChat authentication
[ ] Persistent identity (Android Keystore)
[ ] Push notifications for incoming calls
[ ] Multi-party rooms (already supported by relay)
[ ] Call transfer
[ ] End-to-end encryption (bypass relay decryption)
```
## Dependency Upgrade Path
### quinn 0.11 → 0.12 (when released)
Quinn 0.12 will likely require rustls 0.24. Update both together:
1. `Cargo.toml`: bump quinn and rustls versions
2. Check `client_config()` and `server_config()` in wzp-transport for API changes
3. DATAGRAM API may change — check `send_datagram()` / `read_datagram()`
### Compose BOM 2024.01 → 2025.x
The `LinearProgressIndicator` `progress` parameter changed from `Float` to `() -> Float` in Material3 1.2+. If upgrading the BOM:
```kotlin
// Old (current):
LinearProgressIndicator(progress = level, ...)
// New (Material3 1.2+):
LinearProgressIndicator(progress = { level }, ...)
```
### Kotlin 1.9 → 2.x
Kotlin 2.0 changed the Compose compiler plugin. Update `kotlinCompilerExtensionVersion` in `composeOptions` and the Kotlin Gradle plugin version together.