feat: direct calling UI for desktop Tauri app + merge android branch
Tauri backend:
- register_signal: persistent _signal connection, presence registration
- place_call: send DirectCallOffer by fingerprint
- answer_call: accept/reject incoming calls
- get_signal_status: poll signal state

Frontend:
- Mode toggle: "Room" vs "Direct Call"
- Register button → registers on relay signal channel
- Incoming call panel with Accept/Reject
- Fingerprint input + Call button
- Auto-connect to media room on CallSetup event

Also merges feat/android-voip-client into desktop branch:
- Federation fixes, time-based dedup, FEC stale blocks
- Direct calling protocol types
- ACL + SAS verification

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
627
docs/ADMINISTRATION.md
Normal file
@@ -0,0 +1,627 @@
# WarzonePhone Relay Administration Guide

This document covers deploying, configuring, and operating wzp-relay instances, including federation setup, monitoring, and troubleshooting.

## Relay Deployment

### Binary

Build and run the relay directly:

```bash
# Build release binary
cargo build --release --bin wzp-relay

# Run with defaults (listen on 0.0.0.0:4433, room mode, no auth)
./target/release/wzp-relay

# Run with config file
./target/release/wzp-relay --config /etc/wzp/relay.toml
```

### Remote Build (Linux)

The included build script provisions a temporary Hetzner Cloud VPS, builds all binaries, and downloads them:

```bash
# Requires: hcloud CLI authenticated, SSH key "wz" registered
./scripts/build-linux.sh
# Outputs to: target/linux-x86_64/
```

Produces: `wzp-relay`, `wzp-client`, `wzp-client-audio`, `wzp-web`, `wzp-bench`.

### Docker

```dockerfile
FROM rust:1.85 AS builder
WORKDIR /src
COPY . .
RUN cargo build --release --bin wzp-relay

FROM debian:bookworm-slim
RUN apt-get update && apt-get install -y ca-certificates && rm -rf /var/lib/apt/lists/*
COPY --from=builder /src/target/release/wzp-relay /usr/local/bin/
EXPOSE 4433/udp
EXPOSE 9090/tcp
VOLUME /data
ENV HOME=/data
ENTRYPOINT ["wzp-relay"]
CMD ["--config", "/data/relay.toml", "--metrics-port", "9090"]
```

Build and run:

```bash
docker build -t wzp-relay .
docker run -d \
  --name wzp-relay \
  -p 4433:4433/udp \
  -p 9090:9090/tcp \
  -v /opt/wzp:/data \
  wzp-relay
```

### systemd

Create `/etc/systemd/system/wzp-relay.service`:

```ini
[Unit]
Description=WarzonePhone Relay
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
User=wzp
Group=wzp
ExecStart=/usr/local/bin/wzp-relay --config /etc/wzp/relay.toml
Restart=always
RestartSec=5
LimitNOFILE=65536

# Security hardening
NoNewPrivileges=yes
ProtectSystem=strict
ProtectHome=yes
ReadWritePaths=/var/lib/wzp
PrivateTmp=yes

Environment=HOME=/var/lib/wzp
Environment=RUST_LOG=info

[Install]
WantedBy=multi-user.target
```

Setup:

```bash
# Create service user
useradd --system --home-dir /var/lib/wzp --create-home wzp

# Install binary and config
cp target/release/wzp-relay /usr/local/bin/
mkdir -p /etc/wzp
cp relay.toml /etc/wzp/

# Enable and start
systemctl daemon-reload
systemctl enable --now wzp-relay
journalctl -u wzp-relay -f
```

## TOML Configuration Reference

All fields have defaults. A minimal config file only needs the fields you want to override.

### Core Settings

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `listen_addr` | string (socket addr) | `"0.0.0.0:4433"` | UDP address to listen on for incoming QUIC connections |
| `remote_relay` | string (socket addr) | none | Remote relay address for forward mode. Disables room mode when set |
| `max_sessions` | integer | `100` | Maximum concurrent client sessions |
| `log_level` | string | `"info"` | Logging level: trace, debug, info, warn, error |

### Jitter Buffer

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `jitter_target_depth` | integer | `50` | Target buffer depth in packets (50 = 1 second at 20ms frames) |
| `jitter_max_depth` | integer | `250` | Maximum buffer depth in packets (250 = 5 seconds) |

### Authentication

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `auth_url` | string | none | featherChat auth validation URL. When set, clients must send a bearer token as their first signal message. The relay validates it via `POST <auth_url>` |

### Metrics and Monitoring

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `metrics_port` | integer | none | Port for the Prometheus HTTP metrics endpoint. Disabled if not set |
| `probe_targets` | array of socket addrs | `[]` | Peer relay addresses to probe for health monitoring (1 Ping/s each) |
| `probe_mesh` | boolean | `false` | Enable mesh mode for probe targets |

### Media Processing

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `trunking_enabled` | boolean | `false` | Enable trunk batching for outgoing media. Packs multiple session packets into one QUIC datagram, reducing overhead |

### WebSocket / Browser Support

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `ws_port` | integer | none | Port for WebSocket listener (browser clients). Disabled if not set |
| `static_dir` | string | none | Directory to serve static files (HTML/JS/WASM) |

### Federation

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `peers` | array of PeerConfig | `[]` | Outbound federation peer relays |
| `trusted` | array of TrustedConfig | `[]` | Inbound federation trust list |
| `global_rooms` | array of GlobalRoomConfig | `[]` | Room names to bridge across federation |

### Debugging

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `debug_tap` | string | none | Log packet headers for matching rooms. Use `"*"` for all rooms, or a specific room name |

### PeerConfig Fields

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `url` | string | yes | Address of the peer relay (e.g., `"193.180.213.68:4433"`) |
| `fingerprint` | string | yes | Expected TLS certificate fingerprint (hex with colons) |
| `label` | string | no | Human-readable label for logging |

### TrustedConfig Fields

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `fingerprint` | string | yes | Expected TLS certificate fingerprint (hex with colons) |
| `label` | string | no | Human-readable label for logging |

### GlobalRoomConfig Fields

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `name` | string | yes | Room name to bridge across federation (e.g., `"android"`) |

## CLI Flags Reference

```
wzp-relay [--config <path>] [--listen <addr>] [--remote <addr>]
          [--auth-url <url>] [--metrics-port <port>]
          [--probe <addr>]... [--probe-mesh] [--mesh-status]
          [--trunking] [--global-room <name>]...
          [--debug-tap <room>]
          [--ws-port <port>] [--static-dir <dir>]
```

| Flag | Description |
|------|-------------|
| `--config <path>` | Load configuration from TOML file. CLI flags override config file values |
| `--listen <addr>` | Listen address (default: `0.0.0.0:4433`) |
| `--remote <addr>` | Remote relay for forwarding mode. Disables room mode |
| `--auth-url <url>` | featherChat auth endpoint (e.g., `https://chat.example.com/v1/auth/validate`) |
| `--metrics-port <port>` | Prometheus metrics HTTP port (e.g., `9090`) |
| `--probe <addr>` | Peer relay to probe for health monitoring. Repeatable |
| `--probe-mesh` | Enable mesh mode for probes |
| `--mesh-status` | Print mesh health table and exit (diagnostic) |
| `--trunking` | Enable trunk batching for outgoing media |
| `--global-room <name>` | Declare a room as global (bridged across federation). Repeatable |
| `--debug-tap <room>` | Log packet headers for a room (`"*"` for all rooms) |
| `--event-log <path>` | Write JSONL protocol event log for federation debugging |
| `--version`, `-V` | Print build git hash and exit |
| `--ws-port <port>` | WebSocket listener port for browser clients |
| `--static-dir <dir>` | Directory to serve static files from |
| `--help`, `-h` | Print help and exit |

CLI flags always override config file values when both are specified.
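
The precedence rule can be pictured as a three-layer merge: built-in defaults, then TOML values, then CLI flags. The sketch below is illustrative only. `effective_config` and the dictionary layout are hypothetical and not taken from the relay source; the default values are the ones listed in the configuration tables above.

```python
# Illustrative sketch of the documented precedence: defaults < TOML < CLI.
# `effective_config` is a hypothetical helper, not part of wzp-relay.
DEFAULTS = {
    "listen_addr": "0.0.0.0:4433",
    "max_sessions": 100,
    "log_level": "info",
    "metrics_port": None,  # disabled unless set
}


def effective_config(toml_values, cli_values):
    """Merge config layers; later layers win, and unset CLI flags (None) are ignored."""
    merged = dict(DEFAULTS)
    merged.update(toml_values)
    merged.update({k: v for k, v in cli_values.items() if v is not None})
    return merged


cfg = effective_config(
    toml_values={"metrics_port": 9090, "max_sessions": 200},
    cli_values={"metrics_port": 9100, "listen_addr": None},
)
print(cfg)
```

Here the CLI value for `metrics_port` replaces the TOML one, `max_sessions` comes from the TOML layer, and `listen_addr` falls back to its default because the flag was not given.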

## Federation Setup

### Concepts

- **`[[peers]]`** -- outbound: relays we connect TO. Requires address + fingerprint
- **`[[trusted]]`** -- inbound: relays we accept connections FROM. Requires fingerprint only (they connect to us)
- **`[[global_rooms]]`** -- rooms bridged across all federated peers. Participants on different relays in the same global room hear each other

### Getting Your Relay's Fingerprint

When a relay starts, it logs its TLS fingerprint:

```
INFO TLS certificate (deterministic from relay identity) tls_fingerprint="a5d6:e3c6:5ae7:185c:4eb1:af89:daed:4a43"
INFO federation: to peer with this relay, add to relay.toml:
INFO [[peers]]
INFO url = "193.180.213.68:4433"
INFO fingerprint = "a5d6:e3c6:5ae7:185c:4eb1:af89:daed:4a43"
```

Share this information with the administrator of the peer relay.

### Unknown Peer Connections

When an unknown relay tries to federate, the log shows:

```
WARN unknown relay wants to federate addr=10.0.0.5:12345 fp="7f2a:b391:0c44:..."
INFO to accept, add to relay.toml:
INFO [[trusted]]
INFO fingerprint = "7f2a:b391:0c44:..."
INFO label = "Relay at 10.0.0.5:12345"
```
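
Before pasting a logged fingerprint into `relay.toml`, it can help to sanity-check its shape, since a truncated copy fails silently until the peer connects. The check below is a sketch under one assumption: that fingerprints are eight colon-separated groups of four hex digits, as in the full examples in this guide. That shape is inferred from the examples, not from a documented format, and `is_plausible_fingerprint` is a hypothetical helper.

```python
import re

# Assumed shape: eight colon-separated groups of four hex digits,
# inferred from the examples in this guide (not a documented format).
FINGERPRINT_RE = re.compile(r"[0-9a-f]{4}(?::[0-9a-f]{4}){7}")


def is_plausible_fingerprint(text):
    """Return True if text looks like a complete relay fingerprint."""
    return FINGERPRINT_RE.fullmatch(text.strip().lower()) is not None


print(is_plausible_fingerprint("a5d6:e3c6:5ae7:185c:4eb1:af89:daed:4a43"))
print(is_plausible_fingerprint("7f2a:b391:0c44:..."))
```

The second call uses the abbreviated form shown in the log excerpt, which this check rejects as incomplete.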

## Example Configurations

### Single Relay (Minimal)

```toml
# /etc/wzp/relay.toml
# Minimal config -- all defaults, just enable metrics
metrics_port = 9090
```

Run:

```bash
wzp-relay --config /etc/wzp/relay.toml
```

### Single Relay (Full Featured)

```toml
# /etc/wzp/relay.toml
listen_addr = "0.0.0.0:4433"
max_sessions = 200
log_level = "info"

# Metrics
metrics_port = 9090

# Authentication
auth_url = "https://chat.example.com/v1/auth/validate"

# Browser support
ws_port = 8080
static_dir = "/opt/wzp/web"

# Performance
trunking_enabled = true

# Jitter buffer tuning
jitter_target_depth = 50
jitter_max_depth = 250
```

### Two-Relay Federation

**Relay A** (`relay-a.toml` on 193.180.213.68):

```toml
listen_addr = "0.0.0.0:4433"
metrics_port = 9090

# Outbound: connect to Relay B
[[peers]]
url = "10.0.0.5:4433"
fingerprint = "7f2a:b391:0c44:9e1d:a8b2:c5d7:e3f0:1234"
label = "Relay B (US)"

# Accept inbound from Relay B
[[trusted]]
fingerprint = "7f2a:b391:0c44:9e1d:a8b2:c5d7:e3f0:1234"
label = "Relay B (US)"

# Bridge these rooms
[[global_rooms]]
name = "android"

[[global_rooms]]
name = "general"
```

**Relay B** (`relay-b.toml` on 10.0.0.5):

```toml
listen_addr = "0.0.0.0:4433"
metrics_port = 9090

# Outbound: connect to Relay A
[[peers]]
url = "193.180.213.68:4433"
fingerprint = "a5d6:e3c6:5ae7:185c:4eb1:af89:daed:4a43"
label = "Relay A (EU)"

# Accept inbound from Relay A
[[trusted]]
fingerprint = "a5d6:e3c6:5ae7:185c:4eb1:af89:daed:4a43"
label = "Relay A (EU)"

# Same global rooms
[[global_rooms]]
name = "android"

[[global_rooms]]
name = "general"
```

### Three-Relay Full Mesh

For three relays (A, B, C) in full mesh federation, each relay needs `[[peers]]` and `[[trusted]]` entries for the other two:

**Relay A** (EU):

```toml
listen_addr = "0.0.0.0:4433"
metrics_port = 9090

# Probe all peers
probe_targets = ["10.0.0.5:4433", "10.0.0.9:4433"]
probe_mesh = true

# Peers
[[peers]]
url = "10.0.0.5:4433"
fingerprint = "7f2a:b391:0c44:9e1d:a8b2:c5d7:e3f0:1234"
label = "Relay B (US)"

[[peers]]
url = "10.0.0.9:4433"
fingerprint = "3c8e:d2a1:f7b5:6049:81c3:e9d4:a2f6:5678"
label = "Relay C (APAC)"

# Trust
[[trusted]]
fingerprint = "7f2a:b391:0c44:9e1d:a8b2:c5d7:e3f0:1234"
label = "Relay B (US)"

[[trusted]]
fingerprint = "3c8e:d2a1:f7b5:6049:81c3:e9d4:a2f6:5678"
label = "Relay C (APAC)"

# Global rooms
[[global_rooms]]
name = "android"

[[global_rooms]]
name = "general"
```

**Relay B** and **Relay C** follow the same pattern, listing the other two relays in their `[[peers]]` and `[[trusted]]` sections.
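
Because every mesh member repeats the same pattern with a different "self", the federation sections can be generated rather than hand-edited. The following is a minimal sketch, not a tool shipped with the project: `mesh_sections` is a hypothetical helper, and the addresses and fingerprints are the example values used in this guide.

```python
# Hypothetical helper, not part of wzp-relay: render the federation sections
# of relay.toml for one member of a full mesh.
RELAYS = [
    {"label": "Relay A (EU)", "url": "193.180.213.68:4433",
     "fingerprint": "a5d6:e3c6:5ae7:185c:4eb1:af89:daed:4a43"},
    {"label": "Relay B (US)", "url": "10.0.0.5:4433",
     "fingerprint": "7f2a:b391:0c44:9e1d:a8b2:c5d7:e3f0:1234"},
    {"label": "Relay C (APAC)", "url": "10.0.0.9:4433",
     "fingerprint": "3c8e:d2a1:f7b5:6049:81c3:e9d4:a2f6:5678"},
]
ROOMS = ["android", "general"]


def mesh_sections(own_label, relays, rooms):
    """Return TOML text listing every other relay as both peer and trusted."""
    others = [r for r in relays if r["label"] != own_label]
    out = []
    for r in others:  # outbound: relays we connect to
        out += ["[[peers]]",
                'url = "{}"'.format(r["url"]),
                'fingerprint = "{}"'.format(r["fingerprint"]),
                'label = "{}"'.format(r["label"]),
                ""]
    for r in others:  # inbound: relays we accept connections from
        out += ["[[trusted]]",
                'fingerprint = "{}"'.format(r["fingerprint"]),
                'label = "{}"'.format(r["label"]),
                ""]
    for room in rooms:  # rooms bridged across the federation
        out += ["[[global_rooms]]", 'name = "{}"'.format(room), ""]
    return "\n".join(out)


relay_b_toml = mesh_sections("Relay B (US)", RELAYS, ROOMS)
print(relay_b_toml)
```

The output for Relay B lists Relay A and Relay C under both `[[peers]]` and `[[trusted]]`; `listen_addr`, `metrics_port`, and the probe settings still have to be added by hand.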

## Monitoring

### Prometheus Metrics

Enable with `--metrics-port <port>` or `metrics_port` in TOML. The relay exposes metrics at `GET /metrics` on the specified HTTP port.

#### Relay Metrics

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `wzp_relay_active_sessions` | Gauge | -- | Current active sessions |
| `wzp_relay_active_rooms` | Gauge | -- | Current active rooms |
| `wzp_relay_packets_forwarded_total` | Counter | `room` | Total packets forwarded |
| `wzp_relay_bytes_forwarded_total` | Counter | `room` | Total bytes forwarded |
| `wzp_relay_auth_attempts_total` | Counter | `result` (ok/fail) | Auth validation attempts |
| `wzp_relay_handshake_duration_seconds` | Histogram | -- | Crypto handshake time |

#### Per-Session Metrics

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `wzp_relay_session_jitter_buffer_depth` | Gauge | `session_id` | Buffer depth per session |
| `wzp_relay_session_loss_pct` | Gauge | `session_id` | Packet loss percentage |
| `wzp_relay_session_rtt_ms` | Gauge | `session_id` | Round-trip time |
| `wzp_relay_session_underruns_total` | Counter | `session_id` | Jitter buffer underruns |
| `wzp_relay_session_overruns_total` | Counter | `session_id` | Jitter buffer overruns |

#### Inter-Relay Probe Metrics

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `wzp_probe_rtt_ms` | Gauge | `target` | RTT to peer relay |
| `wzp_probe_loss_pct` | Gauge | `target` | Loss to peer relay |
| `wzp_probe_jitter_ms` | Gauge | `target` | Jitter to peer relay |
| `wzp_probe_up` | Gauge | `target` | 1 if reachable, 0 if not |

### Prometheus Scrape Config

```yaml
# prometheus.yml
scrape_configs:
  - job_name: 'wzp-relay'
    static_configs:
      - targets:
          - 'relay-a:9090'
          - 'relay-b:9090'
    scrape_interval: 10s
```

### Grafana Dashboard

A pre-built dashboard is available at `docs/grafana-dashboard.json`. Import it into Grafana for:

1. **Relay Health** -- active sessions, rooms, packets/s, bytes/s
2. **Call Quality** -- per-session jitter depth, loss%, RTT, underruns over time
3. **Inter-Relay Mesh** -- latency heatmap, probe status, loss trends
4. **Web Bridge** -- active connections, frames bridged, auth failures

### Event Log (Protocol Analyzer)

Use `--event-log` to write a JSONL event log that traces every federation media packet through the relay pipeline. Essential for debugging federation audio issues.

```bash
wzp-relay --config relay.toml --event-log /tmp/events.jsonl
```

Each media packet emits events at every decision point:
- `federation_ingress` -- packet arrived from a peer relay
- `local_deliver` -- packet delivered to local participants
- `dedup_drop` -- packet dropped as duplicate
- `rate_limit_drop` -- packet dropped by rate limiter
- `room_not_found` -- packet for unknown room
- `local_deliver_error` -- delivery to local client failed

Analyze with:
```bash
# Count events by type (reads the log written by the command above)
python3 -c "
import json, collections, sys
c = collections.Counter()
for l in sys.stdin: c[json.loads(l)['event']] += 1
for k,v in sorted(c.items(), key=lambda x:-x[1]): print(f' {k}: {v}')
" < /tmp/events.jsonl
```

### Remote Version Check

Verify a deployed relay's version without SSH:

```bash
wzp-client --version-check <relay-addr:port>
```

### Debug Tap

Use `--debug-tap` to log packet headers for debugging:

```bash
# Log headers for room "android"
wzp-relay --debug-tap android

# Log headers for all rooms
wzp-relay --debug-tap '*'
```

Or in TOML:

```toml
debug_tap = "android"
```

### Mesh Status

Print the current mesh health table (diagnostic):

```bash
wzp-relay --mesh-status
```

## Authentication

### featherChat Token Validation

When `--auth-url` is set, the relay requires clients to send an `AuthToken` signal message as their first message after QUIC connection. The relay validates the token by calling:

```
POST <auth_url>
Content-Type: application/json
Authorization: Bearer <token>
```

Expected response:

```json
{
  "valid": true,
  "fingerprint": "a5d6:e3c6:...",
  "alias": "username"
}
```

If validation fails, the client is disconnected.
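
To test an auth endpoint independently of the relay, the exchange above can be reproduced from a script. This is a sketch of the documented request and response shape, not the relay's actual code: `build_validation_request` and `is_accepted` are hypothetical names, the empty request body is an assumption (the guide does not say what body is sent), and treating a missing `valid` field as a rejection is a conservative guess.

```python
import json
import urllib.request


def build_validation_request(auth_url, token):
    """Build (but do not send) the POST described above."""
    return urllib.request.Request(
        auth_url,
        data=b"",  # assumption: the request body is not specified in this guide
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + token,
        },
    )


def is_accepted(response_body):
    """Accept only a JSON object whose "valid" field is literally true."""
    try:
        payload = json.loads(response_body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return False
    return isinstance(payload, dict) and payload.get("valid") is True


req = build_validation_request("https://chat.example.com/v1/auth/validate", "abc123")
print(req.get_method(), req.full_url)
print(is_accepted('{"valid": true, "fingerprint": "a5d6:e3c6:...", "alias": "username"}'))
print(is_accepted('{"valid": false}'))
```

Passing `req` to `urllib.request.urlopen` would send it; that call is left out so the sketch runs without network access.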

### Without Authentication

When `--auth-url` is not set, any client can connect. The relay logs:

```
INFO auth disabled -- any client can connect (use --auth-url to enable)
```

## Identity Persistence

### Relay Identity File

The relay stores its identity seed at `~/.wzp/relay-identity` (a 64-character hex string). This seed:

- Is generated automatically on first run
- Persists across restarts
- Derives the relay's Ed25519 signing key and X25519 key agreement key
- Derives the TLS certificate deterministically (same seed = same cert = same fingerprint)

If the identity file is corrupted, the relay generates a new one and logs a warning. This will change the relay's TLS fingerprint, requiring federation peers to update their config.
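
Since a corrupted file silently results in a new fingerprint, it may be worth validating a restored identity file before starting the relay. The check below is a sketch based only on the stated format (64 hex characters, which decode to 32 bytes). `parse_identity_seed` is a hypothetical helper, and tolerating surrounding whitespace such as a trailing newline is an assumption about how the relay reads the file.

```python
import string


def parse_identity_seed(text):
    """Return the 32-byte seed, or raise ValueError if text is not 64 hex characters."""
    cleaned = text.strip()  # assumption: a trailing newline is harmless
    if len(cleaned) != 64:
        raise ValueError("expected 64 hex characters, got {}".format(len(cleaned)))
    if any(c not in string.hexdigits for c in cleaned):
        raise ValueError("identity seed contains non-hex characters")
    return bytes.fromhex(cleaned)


seed = parse_identity_seed("0123456789abcdef" * 4 + "\n")
print(len(seed))
```

In practice the text would come from reading `~/.wzp/relay-identity`; a `ValueError` here suggests restoring from backup rather than letting the relay regenerate its identity.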

### Backup

Back up the identity file to preserve the relay's fingerprint:

```bash
cp ~/.wzp/relay-identity /secure/backup/relay-identity
```

To restore, copy the file back before starting the relay.

## Troubleshooting

### Common Issues

| Problem | Cause | Solution |
|---------|-------|---------|
| "unknown argument" on startup | Unrecognized CLI flag | Check `wzp-relay --help` for valid flags |
| "failed to load config" | Invalid TOML syntax | Validate TOML file with `toml-cli` or similar |
| "auth failed" for all clients | Wrong `auth_url` or featherChat server down | Verify URL is reachable: `curl -X POST <auth_url>` |
| "session rejected" | Max sessions reached | Increase `max_sessions` in config |
| Clients cannot connect | Firewall blocking UDP 4433 | Open UDP port 4433 in firewall |
| Federation "unknown relay wants to federate" | Peer's fingerprint not in `[[trusted]]` | Add the logged fingerprint to `[[trusted]]` |
| Federation "fingerprint mismatch" | Peer relay restarted with new identity | Update the fingerprint in `[[peers]]` config |
| Federation audio silent on consecutive connects | Dedup filter or jitter buffer state | Verify relay is running latest build with time-based dedup |
| Federation participant shows wrong relay label | Hub relay not propagating original labels | Update relay to latest build (label preservation fix) |
| Federation disconnect takes >15 seconds | QUIC idle timeout + stale sweeper | Normal: sweeper runs every 5s with 15s TTL. Use latest client with SIGTERM handler for instant disconnect |
| High packet loss between relays | Network congestion or misconfiguration | Check `wzp_probe_loss_pct` metric; consider relay chaining |
| Jitter buffer overruns | Packets arriving faster than playout | Increase `jitter_max_depth` |
| Jitter buffer underruns | Packets arriving too slowly or lost | Check network quality; increase `jitter_target_depth` |
| "probe connection closed" | Peer relay unreachable or crashed | Check peer relay status; will auto-reconnect |
| WebSocket clients cannot connect | `ws_port` not set | Add `--ws-port <port>` or `ws_port` in TOML |
| Browser mic access denied | Not using HTTPS | Use TLS termination in front of the relay or serve via `wzp-web --tls` |

### Log Level Tuning

Set the `RUST_LOG` environment variable for fine-grained control:

```bash
# All relay logs at debug level
RUST_LOG=debug wzp-relay

# Only federation at trace, everything else at info
RUST_LOG=info,wzp_relay::federation=trace wzp-relay

# Quiet mode -- only warnings and errors
RUST_LOG=warn wzp-relay
```

### Health Checks

```bash
# Check if relay is listening
# (UDP is connectionless: success only means no ICMP "port unreachable" came back)
nc -zu relay-host 4433

# Check metrics endpoint
curl -s http://relay-host:9090/metrics | head -20

# Check active sessions
curl -s http://relay-host:9090/metrics | grep wzp_relay_active_sessions

# Check federation probe health
curl -s http://relay-host:9090/metrics | grep wzp_probe_up
```
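
For scripted checks, the `grep wzp_probe_up` step can be replaced by a small parser that lists unreachable peers. This is a sketch under stated assumptions: it expects the standard Prometheus text exposition format with a single `target` label, as the metrics table describes, and the sample text is made up for illustration rather than captured from a relay. `down_targets` is a hypothetical helper.

```python
import re

# Matches lines such as: wzp_probe_up{target="10.0.0.5:4433"} 1
PROBE_UP = re.compile(r'^wzp_probe_up\{target="(?P<target>[^"]+)"\}\s+(?P<value>\S+)')


def down_targets(metrics_text):
    """Return probe targets whose wzp_probe_up gauge is 0."""
    down = []
    for line in metrics_text.splitlines():
        m = PROBE_UP.match(line)
        if m and float(m.group("value")) == 0.0:
            down.append(m.group("target"))
    return down


# Made-up sample in Prometheus text format, for illustration only.
sample = """\
# TYPE wzp_probe_up gauge
wzp_probe_up{target="10.0.0.5:4433"} 1
wzp_probe_up{target="10.0.0.9:4433"} 0
wzp_relay_active_sessions 3
"""
print(down_targets(sample))
```

In a real check the text would come from `http://relay-host:9090/metrics`, and a non-empty result could drive an alert or a non-zero exit code.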
665
docs/DESIGN.md
@@ -1,168 +1,591 @@
|
||||
# WarzonePhone Detailed Design Decisions
|
||||
# WarzonePhone Design Document
|
||||
|
||||
## Why Opus + Codec2 (Not Just One)
|
||||
> Custom encrypted VoIP protocol built in Rust. Designed for hostile network conditions: 5-70% packet loss, 100-500 kbps throughput, 300-800 ms RTT. Multi-platform: Desktop (Tauri), Android, CLI, Web.
|
||||
|
||||
The dual-codec architecture is driven by the extreme range of network conditions WarzonePhone targets:
|
||||
## System Overview
|
||||
|
||||
**Opus** (24/16/6 kbps) is the clear choice for normal to degraded conditions. It offers excellent quality at moderate bitrates, has built-in inband FEC and DTX (discontinuous transmission), and the `audiopus` crate provides mature Rust bindings to libopus. Opus operates at 48 kHz natively.
|
||||
WarzonePhone is a voice-over-IP system built from scratch in Rust, targeting reliable encrypted voice communication over severely degraded networks. The protocol uses adaptive codecs (Opus + Codec2), fountain-code FEC (RaptorQ), and end-to-end ChaCha20-Poly1305 encryption over a QUIC transport layer.
|
||||
|
||||
**Codec2** (3200/1200 bps) is a narrowband vocoder designed specifically for HF radio links with extreme bandwidth constraints. At 1200 bps (1.2 kbps), it produces intelligible speech in only 6 bytes per 40ms frame -- 5x below the lowest Opus tier used here (6 kbps) and 20x below the 24 kbps tier. The pure-Rust `codec2` crate means no C dependencies for this codec. Codec2 operates at 8 kHz, so the adaptive layer handles 48 kHz <-> 8 kHz resampling transparently.
|
||||
The system comprises three categories of components:
|
||||
|
||||
The `AdaptiveEncoder`/`AdaptiveDecoder` in `crates/wzp-codec/src/adaptive.rs` hold both codec instances and switch between them based on the active `QualityProfile`. This avoids codec re-initialization latency during tier transitions.
|
||||
1. **Protocol crates** -- a Rust workspace of 7 library crates with a star dependency graph enabling parallel development
|
||||
2. **Client applications** -- Desktop (Tauri), Android (Kotlin + JNI), CLI, and Web (browser bridge)
|
||||
3. **Relay infrastructure** -- SFU relay daemons with federation, health probing, and Prometheus metrics
|
||||
|
||||
**Bandwidth comparison with FEC overhead:**
|
||||
### Design Principles
|
||||
|
||||
| Tier | Codec Bitrate | FEC Ratio | Total Bandwidth |
|
||||
|------|--------------|-----------|----------------|
|
||||
| GOOD | 24 kbps | 20% | ~28.8 kbps |
|
||||
| DEGRADED | 6 kbps | 50% | ~9.0 kbps |
|
||||
| CATASTROPHIC | 1.2 kbps | 100% | ~2.4 kbps |
|
||||
- **User sovereignty** -- client-driven route selection, BIP39 identity backup, no central authority
|
||||
- **End-to-end encryption** -- relays never see plaintext audio; SFU forwarding preserves E2E encryption
|
||||
- **Adaptive resilience** -- automatic codec and FEC switching based on observed network quality
|
||||
- **Parallel development** -- star dependency graph allows 5 agents/developers to work simultaneously with zero merge conflicts
|
||||
|
||||
At the catastrophic tier, the entire call (audio + FEC + headers) fits within approximately 3 kbps, which is viable even over severely degraded links.
|
||||
## Architecture
|
||||
|
||||
## Why RaptorQ Over Reed-Solomon
|
||||
### Crate Overview
|
||||
|
||||
**Reed-Solomon** is a classical block erasure code. It works well but has fixed-rate overhead: you must decide in advance how many repair symbols to generate, and decoding requires receiving exactly K of any K+R symbols.
|
||||
The workspace contains 7 core crates plus integration binaries:
|
||||
|
||||
**RaptorQ** (RFC 6330) is a fountain code with several advantages for VoIP:
|
||||
| Crate | Purpose | Key Dependencies |
|
||||
|-------|---------|-----------------|
|
||||
| `wzp-proto` | Protocol types, traits, wire format | serde, bytes |
|
||||
| `wzp-codec` | Audio codecs (Opus, Codec2, RNNoise) | audiopus, codec2, nnnoiseless |
|
||||
| `wzp-fec` | Forward error correction | raptorq |
|
||||
| `wzp-crypto` | Cryptography and identity | ed25519-dalek, x25519-dalek, chacha20poly1305, bip39 |
|
||||
| `wzp-transport` | QUIC transport layer | quinn, rustls |
|
||||
| `wzp-relay` | Relay daemon (SFU, federation, metrics) | tokio, prometheus |
|
||||
| `wzp-client` | Call engine and CLI | All above |
|
||||
|
||||
1. **Rateless**: You can generate an arbitrary number of repair symbols on the fly. If conditions worsen mid-block, you can generate additional repair without re-encoding.
|
||||
Additional integration targets: `wzp-web` (browser bridge via WebSocket), Android native library (JNI), Desktop (Tauri).
|
||||
|
||||
2. **Efficient decoding**: RaptorQ decodes from any K received symbols with high probability, and K + 1 or K + 2 almost always suffice, so the reception overhead relative to Reed-Solomon's any-K guarantee is negligible.
|
||||
### Dependency Graph
|
||||
|
||||
3. **Lower computational complexity**: O(K) encoding and decoding time, compared to O(K^2) for Reed-Solomon. This matters for real-time audio at 50 frames/second.
|
||||
```mermaid
|
||||
graph TD
|
||||
PROTO["wzp-proto<br/>(Types, Traits, Wire Format)"]
|
||||
|
||||
4. **Variable block sizes**: The encoder handles 1-56403 source symbols per block (the WZP implementation uses 5-10, but the flexibility is there).
|
||||
CODEC["wzp-codec<br/>(Opus + Codec2 + RNNoise)"]
|
||||
FEC["wzp-fec<br/>(RaptorQ FEC)"]
|
||||
CRYPTO["wzp-crypto<br/>(ChaCha20 + Identity)"]
|
||||
TRANSPORT["wzp-transport<br/>(QUIC / Quinn)"]
|
||||
|
||||
The `raptorq` crate (v2) provides a well-tested pure-Rust implementation. The WZP FEC layer adds length-prefixed padding (2-byte LE prefix + zero-pad to 256 bytes) so that variable-length audio frames can be recovered exactly.
|
||||
RELAY["wzp-relay<br/>(Relay Daemon)"]
|
||||
CLIENT["wzp-client<br/>(CLI + Call Engine)"]
|
||||
WEB["wzp-web<br/>(Browser Bridge)"]
|
||||
DESKTOP["Desktop<br/>(Tauri + CPAL)"]
|
||||
ANDROID["Android<br/>(Kotlin + JNI)"]
|
||||
|
||||
**FEC bandwidth math at different loss rates:**
|
||||
PROTO --> CODEC
|
||||
PROTO --> FEC
|
||||
PROTO --> CRYPTO
|
||||
PROTO --> TRANSPORT
|
||||
|
||||
CODEC --> CLIENT
|
||||
FEC --> CLIENT
|
||||
CRYPTO --> CLIENT
|
||||
TRANSPORT --> CLIENT
|
||||
|
||||
CODEC --> RELAY
|
||||
FEC --> RELAY
|
||||
CRYPTO --> RELAY
|
||||
TRANSPORT --> RELAY
|
||||
|
||||
CLIENT --> WEB
|
||||
CLIENT --> DESKTOP
|
||||
CLIENT --> ANDROID
|
||||
TRANSPORT --> WEB
|
||||
|
||||
FC["warzone-protocol<br/>(featherChat Identity)"] -.->|path dep| CRYPTO
|
||||
|
||||
style PROTO fill:#6c5ce7,color:#fff
|
||||
style RELAY fill:#ff9f43,color:#fff
|
||||
style CLIENT fill:#00b894,color:#fff
|
||||
style WEB fill:#0984e3,color:#fff
|
||||
style DESKTOP fill:#0984e3,color:#fff
|
||||
style ANDROID fill:#0984e3,color:#fff
|
||||
style FC fill:#fd79a8,color:#fff
|
||||
```
|
||||
|
||||
The star pattern ensures each leaf crate (`wzp-codec`, `wzp-fec`, `wzp-crypto`, `wzp-transport`) depends only on `wzp-proto` and never on each other. This enables:
|
||||
|
||||
- **Parallel development** -- 5 agents work on 5 crates with no merge conflicts
|
||||
- **Independent testing** -- each crate has self-contained tests
|
||||
- **Pluggability** -- any implementation can be swapped by implementing the same trait
|
||||
- **Fast compilation** -- changing one leaf only recompiles that leaf and integration crates
|
||||
|
||||
## Audio Pipeline
|
||||
|
||||
### Encode Pipeline (Mic to Network)
|
||||
|
||||
```mermaid
|
||||
sequenceDiagram
|
||||
participant Mic as Microphone
|
||||
participant RNN as RNNoise Denoise
|
||||
participant VAD as Silence Detector
|
||||
participant ENC as Opus/Codec2 Encode
|
||||
participant FEC as RaptorQ FEC Encode
|
||||
participant INT as Interleaver
|
||||
participant HDR as Header Assembly
|
||||
participant CRYPT as ChaCha20-Poly1305
|
||||
participant QUIC as QUIC Datagram
|
||||
|
||||
Mic->>RNN: PCM i16 x 960 (20ms @ 48kHz)
|
||||
RNN->>VAD: Denoised samples (2 x 480)
|
||||
alt Silence detected (>100ms)
|
||||
VAD->>ENC: ComfortNoise packet (every 200ms)
|
||||
else Active speech or hangover
|
||||
VAD->>ENC: Active audio frame
|
||||
end
|
||||
ENC->>FEC: Compressed frame (padded to 256 bytes)
|
||||
FEC->>FEC: Accumulate block (5-10 frames)
|
||||
FEC->>INT: Source + repair symbols
|
||||
INT->>HDR: Interleaved packets (depth=3)
|
||||
HDR->>CRYPT: MediaHeader (12B) or MiniHeader (4B)
|
||||
CRYPT->>QUIC: Header=AAD, Payload=encrypted
|
||||
```
|
||||
|
||||
### Decode Pipeline (Network to Speaker)
|
||||
|
||||
```mermaid
|
||||
sequenceDiagram
|
||||
participant QUIC as QUIC Datagram
|
||||
participant CRYPT as ChaCha20-Poly1305
|
||||
participant HDR as Header Parse
|
||||
participant DEINT as De-interleaver
|
||||
participant FEC as RaptorQ FEC Decode
|
||||
participant JIT as Jitter Buffer
|
||||
participant DEC as Opus/Codec2 Decode
|
||||
participant SPK as Speaker
|
||||
|
||||
QUIC->>CRYPT: Encrypted packet
|
||||
CRYPT->>HDR: Decrypt (header=AAD)
|
||||
HDR->>DEINT: Parsed MediaHeader + payload
|
||||
DEINT->>FEC: Reordered symbols
|
||||
FEC->>FEC: Reconstruct from any K of K+R symbols
|
||||
FEC->>JIT: Recovered audio frames
|
||||
JIT->>JIT: Sequence-ordered BTreeMap
|
||||
JIT->>DEC: Pop when depth >= target
|
||||
DEC->>SPK: PCM i16 x 960
|
||||
```
|
||||
|
||||
## Codec System
|
||||
|
||||
WarzonePhone uses a dual-codec architecture to cover the full range of network conditions:
|
||||
|
||||
### Opus (Primary)
|
||||
|
||||
Opus is the primary codec for normal to degraded conditions. It operates at 48 kHz natively with built-in inband FEC and DTX (discontinuous transmission). The `audiopus` crate provides mature Rust bindings to libopus.
|
||||
|
||||
| Profile | Bitrate | Frame Duration | FEC Ratio | Total Bandwidth | Use Case |
|---------|---------|---------------|-----------|----------------|----------|
| Studio 64k | 64 kbps | 20ms | 10% | 70.4 kbps | LAN, excellent WiFi |
| Studio 48k | 48 kbps | 20ms | 10% | 52.8 kbps | Good WiFi, wired |
| Studio 32k | 32 kbps | 20ms | 10% | 35.2 kbps | WiFi, LTE |
| Good (24k) | 24 kbps | 20ms | 20% | 28.8 kbps | WiFi, LTE, decent links |
| Opus 16k | 16 kbps | 20ms | 20% | 19.2 kbps | 3G, moderate congestion |
| Degraded (6k) | 6 kbps | 40ms | 50% | 9.0 kbps | 3G, congested WiFi |
|
||||
|
||||
### Codec2 (Fallback)
|
||||
|
||||
Codec2 is a narrowband vocoder designed for HF radio links with extreme bandwidth constraints. It operates at 8 kHz, and the adaptive layer handles 48 kHz <-> 8 kHz resampling transparently. The pure-Rust `codec2` crate means no C dependencies.
|
||||
|
||||
| Profile | Bitrate | Frame Duration | FEC Ratio | Total Bandwidth | Use Case |
|---------|---------|---------------|-----------|----------------|----------|
| Codec2 3200 | 3.2 kbps | 20ms | 50% | 4.8 kbps | Poor conditions |
| Catastrophic (1200) | 1.2 kbps | 40ms | 100% | 2.4 kbps | Satellite, extreme loss |
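
The Total Bandwidth column in both codec tables is the codec bitrate scaled by the FEC ratio. A minimal sketch of that arithmetic, assuming header and QUIC overhead are excluded (the function name `total_bandwidth_kbps` is hypothetical, not taken from the codebase):

```rust
/// Audio bandwidth in kbps including FEC repair overhead.
/// `fec_ratio` is the repair fraction, e.g. 0.20 for 20%.
/// Hypothetical helper for illustration; packet headers and QUIC
/// framing are deliberately ignored.
fn total_bandwidth_kbps(bitrate_kbps: f64, fec_ratio: f64) -> f64 {
    bitrate_kbps * (1.0 + fec_ratio)
}
```

For example, Good (24k) at 20% FEC gives 24 x 1.2 = 28.8 kbps, and Catastrophic (1200) at 100% FEC gives 1.2 x 2 = 2.4 kbps, matching the tables.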
|
||||
|
||||
### ComfortNoise
|
||||
|
||||
When the silence detector identifies no speech activity for over 100ms, the encoder switches to emitting a ComfortNoise packet every 200ms instead of encoding silence. This provides approximately 50% bandwidth savings in typical conversations.
|
||||
|
||||
### Adaptive Switching
|
||||
|
||||
The `AdaptiveEncoder`/`AdaptiveDecoder` in `wzp-codec` hold both codec instances and switch between them based on the active `QualityProfile`. This avoids codec re-initialization latency during tier transitions. The `AdaptiveQualityController` in `wzp-proto` manages tier transitions with hysteresis:
|
||||
|
||||
- **Downgrade**: 3 consecutive bad reports (2 on cellular networks)
|
||||
- **Upgrade**: 10 consecutive good reports (one tier at a time)
|
||||
- **Network handoff**: WiFi-to-cellular switch triggers preemptive one-tier downgrade plus a temporary 10-second FEC boost (+20%)
|
||||
|
||||
Quality tier classification thresholds:
|
||||
|
||||
| Tier | WiFi/Unknown | Cellular |
|------|-------------|----------|
| Good | loss < 10%, RTT < 400ms | loss < 8%, RTT < 300ms |
| Degraded | loss 10-40%, RTT 400-600ms | loss 8-25%, RTT 300-500ms |
| Catastrophic | loss > 40%, RTT > 600ms | loss > 25%, RTT > 500ms |
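
The hysteresis rules above (3 bad reports to downgrade, 2 on cellular, 10 good reports to upgrade) can be sketched as a small state machine. This is an illustration, not the actual `AdaptiveQualityController`: the `Hysteresis` type is hypothetical, a "bad" report is assumed to mean one classified worse than the current tier, and downgrades are assumed to move one tier at a time like upgrades.

```rust
/// Quality tiers, ordered from best to worst.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Tier {
    Good,
    Degraded,
    Catastrophic,
}

/// Hypothetical hysteresis tracker (not the real controller).
struct Hysteresis {
    tier: Tier,
    bad_streak: u32,
    good_streak: u32,
    downgrade_after: u32,
    upgrade_after: u32,
}

impl Hysteresis {
    fn new(cellular: bool) -> Self {
        Self {
            tier: Tier::Good,
            bad_streak: 0,
            good_streak: 0,
            downgrade_after: if cellular { 2 } else { 3 },
            upgrade_after: 10,
        }
    }

    /// Feed the tier that one report was classified as.
    /// Returns the new tier when a transition happens.
    fn on_report(&mut self, observed: Tier) -> Option<Tier> {
        if observed > self.tier {
            // Worse than the current tier: counts as a bad report.
            self.bad_streak += 1;
            self.good_streak = 0;
            if self.bad_streak >= self.downgrade_after {
                self.bad_streak = 0;
                self.tier = match self.tier {
                    Tier::Good => Tier::Degraded,
                    _ => Tier::Catastrophic,
                };
                return Some(self.tier);
            }
        } else if observed < self.tier {
            // Better than the current tier: counts as a good report.
            self.good_streak += 1;
            self.bad_streak = 0;
            if self.good_streak >= self.upgrade_after {
                self.good_streak = 0;
                self.tier = match self.tier {
                    Tier::Catastrophic => Tier::Degraded,
                    _ => Tier::Good,
                };
                return Some(self.tier);
            }
        } else {
            // Report matches the current tier: both streaks reset.
            self.bad_streak = 0;
            self.good_streak = 0;
        }
        None
    }
}
```

The asymmetric thresholds make the controller quick to protect the call and slow to risk a higher bitrate again, which is what prevents oscillation between tiers.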
|
||||
|
||||
## Forward Error Correction (FEC)
|
||||
|
||||
### Why RaptorQ Over Reed-Solomon
|
||||
|
||||
WarzonePhone uses RaptorQ (RFC 6330) fountain codes via the `raptorq` crate:
|
||||
|
||||
1. **Rateless** -- generate arbitrary repair symbols on the fly; if conditions worsen mid-block, generate additional repair without re-encoding
|
||||
2. **Efficient decoding** -- decode from any K received symbols with high probability (receiving K + 1 or K + 2 makes decode failure vanishingly unlikely)
|
||||
3. **Lower complexity** -- O(K) encoding/decoding time vs O(K^2) for Reed-Solomon
|
||||
4. **Variable block sizes** -- 1-56,403 source symbols per block (WZP uses 5-10)
|
||||
|
||||
### FEC Block Structure
|
||||
|
||||
Each FEC block consists of 5-10 audio frames padded to 256-byte symbols with a 2-byte LE length prefix:
|
||||
|
||||
```
[len:u16 LE][audio_frame][zero_padding_to_256_bytes]
```
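
A minimal sketch of this symbol layout, assuming the 256 bytes include the 2-byte length prefix (so a frame can be at most 254 bytes). The names `pad_frame` and `unpad_frame` are hypothetical and are not taken from `wzp-fec`.

```rust
const SYMBOL_SIZE: usize = 256;

/// Pack one encoded audio frame into a fixed-size FEC symbol:
/// [len:u16 LE][audio_frame][zero padding up to 256 bytes].
/// Returns None if the frame does not fit.
fn pad_frame(frame: &[u8]) -> Option<[u8; SYMBOL_SIZE]> {
    if frame.len() > SYMBOL_SIZE - 2 {
        return None;
    }
    let mut sym = [0u8; SYMBOL_SIZE];
    sym[..2].copy_from_slice(&(frame.len() as u16).to_le_bytes());
    sym[2..2 + frame.len()].copy_from_slice(frame);
    Some(sym)
}

/// Recover the original frame from a symbol.
/// Returns None if the stored length is larger than the symbol allows.
fn unpad_frame(sym: &[u8; SYMBOL_SIZE]) -> Option<&[u8]> {
    let len = u16::from_le_bytes([sym[0], sym[1]]) as usize;
    sym.get(2..2 + len)
}
```

The length prefix is what lets the decoder strip the zero padding again after RaptorQ recovery, since the FEC layer only ever sees equal-sized symbols.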
|
||||
|
||||
### Loss Survival by FEC Ratio
|
||||
|
||||
With 5 source frames per block:
|
||||
- 20% repair (GOOD): 1 repair symbol. Survives loss of 1 out of 6 packets (16.7% loss).
|
||||
- 50% repair (DEGRADED): 3 repair symbols. Survives loss of 3 out of 8 packets (37.5% loss).
|
||||
- 100% repair (CATASTROPHIC): 5 repair symbols. Survives loss of 5 out of 10 packets (50% loss).
|
||||
|
||||
The benchmark (`wzp-bench --fec --loss 30`) dynamically scales the FEC ratio to survive the requested loss percentage.
|
||||
| FEC Ratio | Repair Symbols | Survives Loss | Profile |
|-----------|---------------|---------------|---------|
| 10% | 1 | 1 of 6 (16.7%) | Studio |
| 20% | 1 | 1 of 6 (16.7%) | Good |
| 50% | 3 | 3 of 8 (37.5%) | Degraded |
| 100% | 5 | 5 of 10 (50.0%) | Catastrophic |
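
The repair counts in this table are consistent with rounding the repair symbol count up to the next whole symbol. A sketch of that arithmetic, with hypothetical function names and the rounding rule inferred from the table rather than confirmed from `wzp-fec`:

```rust
/// Repair symbols for `k` source symbols at `fec_pct` percent, rounded up.
/// Integer math avoids floating-point rounding surprises.
fn repair_symbols(k: u32, fec_pct: u32) -> u32 {
    (k * fec_pct + 99) / 100
}

/// Largest fraction of a block's packets that can be lost while still
/// leaving `k` symbols to decode from. This is the idealized bound;
/// RaptorQ occasionally needs one or two extra symbols.
fn max_loss_fraction(k: u32, fec_pct: u32) -> f64 {
    let r = repair_symbols(k, fec_pct);
    r as f64 / (k + r) as f64
}
```

With 5 source frames, 10% and 20% both round up to a single repair symbol, which is why the Studio and Good rows survive the same 16.7% loss.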
|
||||
|
||||
## Why QUIC Over Raw UDP
|
||||
### Interleaving
|
||||
|
||||
Raw UDP would be simpler and lower-latency, but QUIC (via the `quinn` crate) provides:
|
||||
Burst loss protection via depth-3 interleaving: packets from 3 consecutive FEC blocks are interleaved before transmission. A burst of 3 consecutive lost packets affects 3 different blocks (1 loss each) rather than destroying 1 block entirely.
|
||||
|
||||
1. **DATAGRAM frames**: Unreliable delivery without head-of-line blocking (RFC 9221). Media packets use this path, so they behave like UDP datagrams but benefit from QUIC's connection management.
|
||||
```mermaid
|
||||
graph LR
|
||||
subgraph "FEC Encoder"
|
||||
F1[Frame 1] --> BLK[Source Block<br/>5-10 frames]
|
||||
F2[Frame 2] --> BLK
|
||||
F3[Frame 3] --> BLK
|
||||
F4[Frame 4] --> BLK
|
||||
F5[Frame 5] --> BLK
|
||||
BLK --> SRC[Source Symbols]
|
||||
BLK --> REP[Repair Symbols<br/>ratio-dependent]
|
||||
SRC --> INT[Interleaver<br/>depth=3]
|
||||
REP --> INT
|
||||
end
|
||||
|
||||
2. **Reliable streams**: Signaling messages (CallOffer, CallAnswer, Rekey, Hangup) require reliable delivery. QUIC provides multiplexed streams without needing a separate TCP connection.
|
||||
subgraph "Network"
|
||||
INT --> LOSS{Packet Loss}
|
||||
LOSS -->|some lost| RCV[Received Symbols]
|
||||
end
|
||||
|
||||
3. **Built-in congestion control**: QUIC's congestion control prevents overwhelming degraded links, which is important when chaining relays.
|
||||
subgraph "FEC Decoder"
|
||||
RCV --> DEINT[De-interleaver]
|
||||
DEINT --> RAPTORQ[RaptorQ Decode<br/>Any K of K+R]
|
||||
RAPTORQ --> OUT[Original Frames]
|
||||
end
|
||||
|
||||
4. **Connection migration**: QUIC connections survive IP address changes (e.g., WiFi to cellular handoff), which is valuable for mobile clients.
|
||||
|
||||
5. **TLS 1.3 built-in**: The QUIC handshake provides encryption at the transport level. While WZP has its own end-to-end ChaCha20 layer, the QUIC TLS protects the header and signaling from eavesdroppers.
|
||||
|
||||
6. **NAT keepalive**: QUIC's built-in keep-alive (configured at 5-second intervals) maintains NAT bindings without application-level pings.
|
||||
|
||||
7. **Firewall traversal**: QUIC runs on UDP port 443 by default, which is commonly allowed through firewalls. The `wzp` ALPN protocol identifier distinguishes WZP traffic.
|
||||
|
||||
The tradeoff is approximately 20-40 bytes of additional per-packet overhead compared to raw UDP (QUIC short header + DATAGRAM frame overhead).
|
||||
|
||||
## Why ChaCha20-Poly1305 Over AES-GCM
|
||||
|
||||
1. **Software performance**: ChaCha20-Poly1305 is faster than AES-GCM on hardware without AES-NI instructions. This matters for ARM devices (Android phones, Raspberry Pi relays, embedded systems) where AES hardware acceleration may be absent.
|
||||
|
||||
2. **Constant-time by design**: ChaCha20 uses only add-rotate-XOR operations, making it inherently resistant to timing side-channel attacks. AES-GCM implementations without hardware support often require careful constant-time implementation.
|
||||
|
||||
3. **Warzone messenger compatibility**: The existing Warzone messenger uses ChaCha20-Poly1305 for message encryption. Reusing the same primitive simplifies the security audit and allows key material to be shared across messaging and calling.
|
||||
|
||||
4. **16-byte overhead**: Both ChaCha20-Poly1305 and AES-128-GCM produce a 16-byte authentication tag. There is no size advantage to AES-GCM.
|
||||
|
||||
5. **AEAD with AAD**: The MediaHeader is used as Additional Authenticated Data (AAD), ensuring the header is authenticated but not encrypted. This allows relays to read routing information (block ID, sequence number) without decrypting the payload.
|
||||
|
||||
## Why Star Dependency Graph (Parallel Development)
|
||||
|
||||
The workspace follows a strict star dependency pattern:
|
||||
|
||||
```
|
||||
wzp-proto (hub)
|
||||
/ | \ \
|
||||
wzp-codec wzp-fec wzp-crypto wzp-transport
|
||||
\ | / /
|
||||
wzp-relay
|
||||
wzp-client
|
||||
wzp-web
|
||||
style LOSS fill:#e17055,color:#fff
|
||||
style RAPTORQ fill:#00b894,color:#fff
|
||||
```
|
||||
|
||||
- `wzp-proto` defines all trait interfaces and wire format types
|
||||
- Each "leaf" crate (codec, fec, crypto, transport) depends only on `wzp-proto`
|
||||
- No leaf crate depends on another leaf crate
|
||||
- Integration crates (relay, client, web) depend on all leaves
|
||||
## Transport Layer
|
||||
|
||||
This enables:
|
||||
1. **Parallel development**: 5 agents/developers can work on 5 crates simultaneously with zero merge conflicts
|
||||
2. **Independent testing**: Each crate has comprehensive tests that run without requiring other implementations
|
||||
3. **Pluggability**: Any implementation can be swapped (e.g., replace RaptorQ with Reed-Solomon) by implementing the same trait
|
||||
4. **Fast compilation**: Changes to one leaf only recompile that leaf and the integration crates, not other leaves
|
||||
### Why QUIC Over Raw UDP
|
||||
|
||||
## Jitter Buffer Trade-offs
|
||||
WarzonePhone uses QUIC (via the `quinn` crate) rather than raw UDP for several reasons:
|
||||
|
||||
The jitter buffer must balance two competing goals:
|
||||
| Feature | Benefit |
|---------|---------|
| DATAGRAM frames (RFC 9221) | Unreliable delivery without head-of-line blocking -- behaves like UDP for media |
| Reliable streams | Multiplexed signaling (CallOffer, Hangup, Rekey) without a separate TCP connection |
| Congestion control | Prevents overwhelming degraded links, important when chaining relays |
| Connection migration | Connections survive IP address changes (WiFi to cellular handoff) |
| TLS 1.3 built-in | Transport-level encryption protects headers and signaling |
| NAT keepalive | 5-second interval maintains NAT bindings without application-level pings |
| Firewall traversal | Runs on UDP port 443 with `wzp` ALPN identifier |
|
||||
|
||||
**Lower latency** (smaller buffer):
|
||||
- Better conversational interactivity
|
||||
- Less memory usage
|
||||
- But more vulnerable to jitter and reordering
|
||||
The tradeoff is approximately 20-40 bytes of additional per-packet overhead compared to raw UDP.
|
||||
|
||||
**Higher quality** (larger buffer):
|
||||
- More time to receive out-of-order packets
|
||||
- More time for FEC recovery (repair packets may arrive after source packets)
|
||||
- But adds perceptible delay to the conversation
|
||||
### Wire Formats
|
||||
|
||||
The default configuration:
|
||||
- Target: 10 packets (200ms) for the client, 50 packets (1s) for the relay
|
||||
- Minimum: 3 packets (60ms) before playout begins (client), 25 packets (500ms) for relay
|
||||
- Maximum: 250 packets (5s) absolute cap
|
||||
#### MediaHeader (12 bytes)
|
||||
|
||||
The relay uses a deeper buffer because it needs to absorb jitter from the lossy inter-relay link. The client uses a shallower buffer for lower latency since it is on the last hop.
|
||||
```
|
||||
Byte 0: [V:1][T:1][CodecID:4][Q:1][FecRatioHi:1]
|
||||
Byte 1: [FecRatioLo:6][unused:2]
|
||||
Bytes 2-3: sequence (u16 BE)
|
||||
Bytes 4-7: timestamp_ms (u32 BE)
|
||||
Byte 8: fec_block_id (u8)
|
||||
Byte 9: fec_symbol_idx (u8)
|
||||
Byte 10: reserved
|
||||
Byte 11: csrc_count
|
||||
|
||||
**Known issue**: The current jitter buffer does not adapt its depth based on observed jitter. It uses sequence-number ordering only, without timestamp-based playout scheduling. This can lead to drift during long calls, as observed in echo tests.
|
||||
V = version (0), T = is_repair, CodecID = codec, Q = quality_report appended
|
||||
```
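
A sketch of packing and parsing the 12-byte MediaHeader layout. It assumes MSB-first bit order within bytes 0 and 1, which the layout does not state explicitly, and treats the FEC ratio as a raw 7-bit value. The struct and method names here are illustrative and may differ from the real types in `wzp-proto`.

```rust
/// Fields of the 12-byte MediaHeader (illustrative, MSB-first packing assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct MediaHeader {
    version: u8,       // V, 1 bit
    is_repair: bool,   // T
    codec_id: u8,      // 4 bits
    has_quality: bool, // Q
    fec_ratio: u8,     // 7 bits, split across bytes 0 and 1
    sequence: u16,
    timestamp_ms: u32,
    fec_block_id: u8,
    fec_symbol_idx: u8,
    csrc_count: u8,
}

impl MediaHeader {
    fn to_bytes(&self) -> [u8; 12] {
        let mut b = [0u8; 12];
        b[0] = ((self.version & 0x01) << 7)
            | ((self.is_repair as u8) << 6)
            | ((self.codec_id & 0x0F) << 2)
            | ((self.has_quality as u8) << 1)
            | ((self.fec_ratio >> 6) & 0x01);
        b[1] = (self.fec_ratio & 0x3F) << 2; // low 2 bits unused
        b[2..4].copy_from_slice(&self.sequence.to_be_bytes());
        b[4..8].copy_from_slice(&self.timestamp_ms.to_be_bytes());
        b[8] = self.fec_block_id;
        b[9] = self.fec_symbol_idx;
        // b[10] is reserved and left as zero.
        b[11] = self.csrc_count;
        b
    }

    fn from_bytes(b: &[u8; 12]) -> Self {
        Self {
            version: b[0] >> 7,
            is_repair: (b[0] & 0x40) != 0,
            codec_id: (b[0] >> 2) & 0x0F,
            has_quality: (b[0] & 0x02) != 0,
            fec_ratio: ((b[0] & 0x01) << 6) | (b[1] >> 2),
            sequence: u16::from_be_bytes([b[2], b[3]]),
            timestamp_ms: u32::from_be_bytes([b[4], b[5], b[6], b[7]]),
            fec_block_id: b[8],
            fec_symbol_idx: b[9],
            csrc_count: b[11],
        }
    }
}
```

Because the header is sent in the clear as AEAD associated data, a relay can run a parser like `from_bytes` on it without holding any session key.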
|
||||
|
||||
## Browser Audio: AudioWorklet vs ScriptProcessorNode
|
||||
#### MiniHeader (4 bytes, compressed)
|
||||
|
||||
The web bridge (`crates/wzp-web/static/`) uses AudioWorklet as the primary audio I/O mechanism, with ScriptProcessorNode as a fallback.
|
||||
```
|
||||
Bytes 0-1: timestamp_delta_ms (u16 BE)
|
||||
Bytes 2-3: payload_len (u16 BE)
|
||||
|
||||
**AudioWorklet** (preferred):
|
||||
- Runs on a dedicated audio rendering thread
|
||||
- Lower latency (no main-thread round-trip)
|
||||
- Consistent 128-sample callback timing
|
||||
- Supported in Chrome 66+, Firefox 76+, Safari 14.1+
|
||||
Preceded by FRAME_TYPE_MINI (0x01). Full header every 50 frames (~1s).
|
||||
Saves 8 bytes/packet (67% header reduction).
|
||||
```
|
||||
|
||||
**ScriptProcessorNode** (fallback):
|
||||
- Runs on the main thread via `onaudioprocess` callback
|
||||
- Higher latency, potential glitches from main-thread GC pauses
|
||||
- Deprecated by the Web Audio specification
|
||||
- Used when AudioWorklet is not available
|
||||
#### TrunkFrame (batched datagrams)
|
||||
|
||||
Both paths accumulate Float32 samples into 960-sample (20ms) Int16 frames before sending via WebSocket, matching the WZP codec frame size.
|
||||
```
|
||||
[count:u16]
|
||||
[session_id:2][len:u16][payload:len] x count
|
||||
|
||||
**Playback** uses an AudioWorklet with a ring buffer capped at 200ms (9600 samples at 48 kHz). When the buffer exceeds this limit, old samples are dropped to prevent unbounded drift. The fallback path uses scheduled `AudioBufferSourceNode` instances.
|
||||
Packs multiple session packets into one QUIC datagram.
|
||||
Max 10 entries or 1200 bytes, flushed every 5ms.
|
||||
```
|
||||
|
||||
## Room Mode: SFU vs MCU Trade-offs
|
||||
#### QualityReport (4 bytes, optional trailer)
|
||||
|
||||
WarzonePhone implements an **SFU** (Selective Forwarding Unit) architecture:
|
||||
```
Byte 0: loss_pct (0-255 maps to 0-100%)
Byte 1: rtt_4ms (0-255 maps to 0-1020ms)
Byte 2: jitter_ms
Byte 3: bitrate_cap_kbps
```
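
A sketch of decoding the QualityReport trailer into human-readable units. The linear scaling of `loss_pct` is an assumption based on the "0-255 maps to 0-100%" note, and the accessor names are hypothetical.

```rust
/// The 4-byte QualityReport trailer, stored as raw wire bytes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct QualityReport {
    loss_pct: u8,
    rtt_4ms: u8,
    jitter_ms: u8,
    bitrate_cap_kbps: u8,
}

impl QualityReport {
    fn from_bytes(b: [u8; 4]) -> Self {
        Self {
            loss_pct: b[0],
            rtt_4ms: b[1],
            jitter_ms: b[2],
            bitrate_cap_kbps: b[3],
        }
    }

    /// Packet loss as a percentage (0-255 scaled linearly onto 0-100%).
    fn loss_percent(&self) -> f32 {
        self.loss_pct as f32 * 100.0 / 255.0
    }

    /// Round-trip time in milliseconds (wire value is in 4 ms units).
    fn rtt_ms(&self) -> u32 {
        self.rtt_4ms as u32 * 4
    }
}
```

Storing RTT in 4 ms units is what lets a single byte cover the 0-1020 ms range quoted above.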
|
||||
|
||||
**SFU** (implemented):
|
||||
- Relay forwards each participant's packets to all other participants unchanged
|
||||
- No transcoding -- the relay never decodes or re-encodes audio
|
||||
- O(N) relay fan-out per packet for N participants (each packet is sent N-1 times), so aggregate relay egress grows roughly as N² when everyone talks at once
|
||||
- Each client receives separate streams from each other participant
|
||||
- Client must mix/decode multiple streams locally
|
||||
- Lower relay CPU usage (no transcoding)
|
||||
- End-to-end encryption is preserved (relay never sees plaintext)
|
||||
### Bandwidth Summary
|
||||
|
||||
**MCU** (not implemented, for comparison):
|
||||
- Relay would decode all streams, mix them, and re-encode a single combined stream
|
||||
- O(1) bandwidth to each client (receives one mixed stream)
|
||||
- Requires the relay to have codec keys (breaks E2E encryption)
|
||||
- Higher relay CPU (decoding N streams + mixing + re-encoding)
|
||||
- Audio quality loss from re-encoding
|
||||
| Profile | Audio | FEC Overhead | Total | Silence Savings |
|---------|-------|-------------|-------|----------------|
| Studio 64k | 64 kbps | 10% = 6.4 kbps | **70.4 kbps** | ~50% with DTX |
| Studio 48k | 48 kbps | 10% = 4.8 kbps | **52.8 kbps** | ~50% with DTX |
| Studio 32k | 32 kbps | 10% = 3.2 kbps | **35.2 kbps** | ~50% with DTX |
| Good (24k) | 24 kbps | 20% = 4.8 kbps | **28.8 kbps** | ~50% with DTX |
| Degraded (6k) | 6 kbps | 50% = 3.0 kbps | **9.0 kbps** | ~50% with DTX |
| Catastrophic (1.2k) | 1.2 kbps | 100% = 1.2 kbps | **2.4 kbps** | ~50% with DTX |
|
||||
|
||||
The SFU choice is driven by the E2E encryption requirement: since relays never have access to the audio codec keys, they cannot decode, mix, or re-encode. The current room implementation in `crates/wzp-relay/src/room.rs` forwards received datagrams to all other participants in the room with best-effort delivery -- if one send fails, the relay continues to the next participant.
|
||||
Additional savings: MiniHeaders save 8 bytes/packet (67% header reduction). Trunking shares QUIC overhead across multiplexed sessions.
|
||||
|
||||
## Security
|
||||
|
||||
### Identity Model
|
||||
|
||||
Every user has a persistent identity derived from a 32-byte seed:
|
||||
|
||||
```mermaid
|
||||
graph TD
|
||||
SEED["32-byte Seed<br/>(BIP39 Mnemonic: 24 words)"] --> HKDF1["HKDF<br/>info='warzone-ed25519'"]
|
||||
SEED --> HKDF2["HKDF<br/>info='warzone-x25519'"]
|
||||
|
||||
HKDF1 --> ED["Ed25519 SigningKey<br/>(Digital Signatures)"]
|
||||
HKDF2 --> X25519["X25519 StaticSecret<br/>(Key Agreement)"]
|
||||
|
||||
ED --> VKEY["Ed25519 VerifyingKey<br/>(Public)"]
|
||||
X25519 --> XPUB["X25519 PublicKey<br/>(Public)"]
|
||||
|
||||
VKEY --> FP["Fingerprint<br/>SHA-256(pubkey), truncated 16 bytes<br/>xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx"]
|
||||
|
||||
style SEED fill:#6c5ce7,color:#fff
|
||||
style FP fill:#fd79a8,color:#fff
|
||||
style ED fill:#ee5a24,color:#fff
|
||||
style X25519 fill:#00b894,color:#fff
|
||||
```
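
The fingerprint display format in the diagram (16 bytes shown as eight colon-separated groups) can be sketched as follows. The SHA-256 step is omitted to keep the example dependency-free, so the function takes an already computed 32-byte digest. Lowercase hex is an assumption, and `format_fingerprint` is a hypothetical name.

```rust
/// Format the first 16 bytes of a 32-byte digest as
/// xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx (lowercase hex assumed).
fn format_fingerprint(digest: &[u8; 32]) -> String {
    digest[..16]
        .chunks(2)
        .map(|pair| format!("{:02x}{:02x}", pair[0], pair[1]))
        .collect::<Vec<_>>()
        .join(":")
}
```

Truncating to 16 bytes keeps the fingerprint short enough to read aloud or type into the Direct Call input while retaining 128 bits of the hash.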
|
||||
|
||||
**BIP39 Mnemonic Backup**: The 32-byte seed can be encoded as a 24-word BIP39 mnemonic for human-readable backup. The same seed produces the same identity on any platform.
|
||||
|
||||
**featherChat Compatibility**: The identity derivation is compatible with the Warzone messenger (featherChat), allowing a shared identity across messaging and calling.
|
||||
|
||||
### Cryptographic Handshake
|
||||
|
||||
```mermaid
|
||||
sequenceDiagram
|
||||
participant C as Caller
|
||||
participant R as Relay / Callee
|
||||
|
||||
Note over C: Derive identity from seed<br/>Ed25519 + X25519 via HKDF
|
||||
|
||||
C->>C: Generate ephemeral X25519 keypair
|
||||
C->>C: Sign(ephemeral_pub || "call-offer")
|
||||
C->>R: CallOffer { identity_pub, ephemeral_pub, signature, profiles }
|
||||
|
||||
R->>R: Verify Ed25519 signature
|
||||
R->>R: Generate ephemeral X25519 keypair
|
||||
R->>R: shared_secret = DH(eph_b, eph_a)
|
||||
R->>R: session_key = HKDF(shared_secret, "warzone-session-key")
|
||||
R->>R: Sign(ephemeral_pub || "call-answer")
|
||||
R->>C: CallAnswer { identity_pub, ephemeral_pub, signature, profile }
|
||||
|
||||
C->>C: Verify signature
|
||||
C->>C: shared_secret = DH(eph_a, eph_b)
|
||||
C->>C: session_key = HKDF(shared_secret)
|
||||
|
||||
Note over C,R: Both have identical ChaCha20-Poly1305 session key
|
||||
C->>R: Encrypted media (QUIC datagrams)
|
||||
R->>C: Encrypted media (QUIC datagrams)
|
||||
|
||||
Note over C,R: Rekey every 65,536 packets<br/>New ephemeral DH + HKDF mix
|
||||
```
|
||||
|
||||
### Encryption Details
|
||||
|
||||
| Component | Algorithm | Purpose |
|-----------|-----------|---------|
| Identity signing | Ed25519 | Authenticate handshake messages |
| Key agreement | X25519 (ephemeral) | Derive shared secret |
| Key derivation | HKDF-SHA256 | Derive session key from shared secret |
| Media encryption | ChaCha20-Poly1305 | Encrypt audio payloads (16-byte tag) |
| Nonce construction | Deterministic from sequence number | No nonce reuse, no state sync needed |
| Anti-replay | Sliding window (64-packet) | Reject duplicate/old packets |
| Forward secrecy | Rekey every 65,536 packets | New ephemeral DH + HKDF mix |
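
The 64-packet anti-replay window can be illustrated with the classic bitmap approach. This is a sketch under stated assumptions, not the `wzp-crypto` implementation: it tracks a widened `u64` packet counter (the on-wire sequence is a `u16`, and wraparound handling is left out), and the `ReplayWindow` type is hypothetical.

```rust
/// 64-packet sliding replay window over an increasing packet counter.
/// Bit i of `bitmap` records whether counter `highest - i` was seen.
struct ReplayWindow {
    highest: u64,
    bitmap: u64,
    seen_any: bool,
}

impl ReplayWindow {
    fn new() -> Self {
        Self { highest: 0, bitmap: 0, seen_any: false }
    }

    /// Returns true if `seq` is fresh (and records it),
    /// false if it is a duplicate or older than the window.
    fn accept(&mut self, seq: u64) -> bool {
        if !self.seen_any {
            self.seen_any = true;
            self.highest = seq;
            self.bitmap = 1;
            return true;
        }
        if seq > self.highest {
            // Advance the window; anything shifted out is forgotten.
            let shift = seq - self.highest;
            self.bitmap = if shift >= 64 { 0 } else { self.bitmap << shift };
            self.bitmap |= 1;
            self.highest = seq;
            true
        } else {
            let offset = self.highest - seq;
            if offset >= 64 {
                return false; // too old to verify
            }
            let mask = 1u64 << offset;
            if (self.bitmap & mask) != 0 {
                return false; // duplicate
            }
            self.bitmap |= mask;
            true
        }
    }
}
```

A window like this tolerates the reordering introduced by interleaving and jitter while still rejecting replays of captured packets.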
|
||||
|
||||
**Why ChaCha20-Poly1305 over AES-GCM**:
|
||||
- Faster on hardware without AES-NI (ARM phones, Raspberry Pi relays)
|
||||
- Inherently constant-time (add-rotate-XOR only)
|
||||
- Compatible with Warzone messenger (featherChat)
|
||||
- Same 16-byte authentication tag overhead as AES-GCM
|
||||
|
||||
**AEAD with AAD**: The MediaHeader is used as Additional Authenticated Data. The header is authenticated but not encrypted, allowing relays to read routing information (block ID, sequence number) without decrypting the payload.
|
||||
|
||||
### Trust on First Use (TOFU)
|
||||
|
||||
Clients remember the relay's TLS certificate fingerprint after first connection. If the fingerprint changes on a subsequent connection, the desktop client shows a "Server Key Changed" warning dialog. The relay derives its TLS certificate deterministically from its persisted identity seed, so the fingerprint is stable across restarts.
|
||||
|
||||
## Relay Architecture
|
||||
|
||||
### Room Mode (Default SFU)
|
||||
|
||||
In room mode, the relay acts as a Selective Forwarding Unit. Clients join named rooms via the QUIC SNI (Server Name Indication) field. The relay forwards each participant's encrypted packets to all other participants in the room without decoding or re-encoding.
|
||||
|
||||
```mermaid
|
||||
graph TB
|
||||
subgraph "Room Mode (SFU)"
|
||||
C1[Client 1] -->|"QUIC SNI=room-hash"| RM[Room Manager]
|
||||
C2[Client 2] -->|"QUIC SNI=room-hash"| RM
|
||||
C3[Client 3] -->|"QUIC SNI=room-hash"| RM
|
||||
RM --> R1[Room 'podcast']
|
||||
R1 -->|fan-out| C1
|
||||
R1 -->|fan-out| C2
|
||||
R1 -->|fan-out| C3
|
||||
end
|
||||
|
||||
style RM fill:#ff9f43,color:#fff
|
||||
style R1 fill:#fdcb6e
|
||||
```
|
||||
|
||||
**SFU vs MCU trade-off**: SFU was chosen because it preserves end-to-end encryption (the relay never sees plaintext audio). An MCU would need to decode, mix, and re-encode, breaking E2E encryption. The trade-off is O(N) bandwidth at the relay for N participants.
|
||||
|
||||
### Forward Mode
|
||||
|
||||
With `--remote`, the relay forwards all traffic to a remote relay. Used for chaining relays across lossy or censored links:
|
||||
|
||||
```
|
||||
Client --> Relay A (--remote B) --> Relay B --> Destination Client
|
||||
```
|
||||
|
||||
The relay pipeline in forward mode: FEC decode, jitter buffer, then FEC re-encode for the next hop.
|
||||
|
||||
## Federation
|
||||
|
||||
### Overview
|
||||
|
||||
Two or more relays form a federation mesh. Each relay is an independent SFU. When configured to trust each other, they bridge **global rooms** -- participants on relay A in a global room hear participants on relay B in the same room.
|
||||
|
||||
### Configuration
|
||||
|
||||
Federation uses three TOML configuration sections:
|
||||
|
||||
- `[[peers]]` -- outbound connections to peer relays (url + TLS fingerprint)
|
||||
- `[[trusted]]` -- inbound connections accepted from relays (TLS fingerprint only)
|
||||
- `[[global_rooms]]` -- room names to bridge across all federated peers
|
||||
|
||||
### Federation Topology
|
||||
|
||||
```mermaid
|
||||
graph TB
|
||||
subgraph "Relay A (EU)"
|
||||
A_RM[Room Manager]
|
||||
A_FM[Federation Manager]
|
||||
A1[Alice - local]
|
||||
A2[Bob - local]
|
||||
A_RM --> A_FM
|
||||
end
|
||||
|
||||
subgraph "Relay B (US)"
|
||||
B_RM[Room Manager]
|
||||
B_FM[Federation Manager]
|
||||
B1[Charlie - local]
|
||||
B_RM --> B_FM
|
||||
end
|
||||
|
||||
A_FM <-->|"QUIC SNI='_federation'<br/>GlobalRoomActive/Inactive<br/>Media forwarding"| B_FM
|
||||
|
||||
A1 -->|media| A_RM
|
||||
A2 -->|media| A_RM
|
||||
B1 -->|media| B_RM
|
||||
|
||||
A_RM -->|"federated fan-out"| A1
|
||||
A_RM -->|"federated fan-out"| A2
|
||||
B_RM -->|"federated fan-out"| B1
|
||||
|
||||
style A_FM fill:#6c5ce7,color:#fff
|
||||
style B_FM fill:#6c5ce7,color:#fff
|
||||
style A_RM fill:#ff9f43,color:#fff
|
||||
style B_RM fill:#ff9f43,color:#fff
|
||||
```
|
||||
|
||||
### Protocol
|
||||
|
||||
1. On startup, each relay connects to all configured `[[peers]]` via QUIC with SNI `"_federation"`
2. After the QUIC handshake, it sends `FederationHello { tls_fingerprint }` for identity verification
3. The peer verifies the fingerprint against its `[[trusted]]` or `[[peers]]` list
4. When a local participant joins a global room, the relay sends `GlobalRoomActive { room }` to all peers
5. When the last local participant leaves, it sends `GlobalRoomInactive { room }`
6. Media is forwarded as `[room_hash:8][original_media_packet]` -- the relay does not decrypt
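
The federated media framing in step 6 amounts to prepending an 8-byte room hash to the untouched packet. A sketch with hypothetical function names; how the room hash itself is derived is not specified here, so it is taken as an input.

```rust
/// Wrap a media packet for a federation link:
/// [room_hash:8][original_media_packet]. The packet bytes are not modified.
fn wrap_federated(room_hash: [u8; 8], packet: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(8 + packet.len());
    out.extend_from_slice(&room_hash);
    out.extend_from_slice(packet);
    out
}

/// Split a federated frame back into (room_hash, original packet).
/// Returns None if the frame is too short to carry a room hash.
fn unwrap_federated(frame: &[u8]) -> Option<([u8; 8], &[u8])> {
    if frame.len() < 8 {
        return None;
    }
    let mut hash = [0u8; 8];
    hash.copy_from_slice(&frame[..8]);
    Some((hash, &frame[8..]))
}
```

Since the payload is copied verbatim, the receiving relay can fan it out to local participants with end-to-end encryption intact.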
|
||||
|
||||
### What Relays Do NOT Do
|
||||
|
||||
- **No transcoding** -- media passes through as-is
|
||||
- **No re-encryption** -- packets are already encrypted E2E
|
||||
- **No central coordinator** -- each relay independently connects to configured peers
|
||||
- **No automatic peer discovery** -- peers must be explicitly configured
|
||||
|
||||
### Failure Handling
|
||||
|
||||
- If a peer goes down, local rooms continue working; federated participants disappear from presence
- Reconnection: retried starting at 30 seconds, with exponential backoff up to 5 minutes
- If a peer restarts with a different identity, the fingerprint check fails with a clear log message
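
The reconnection schedule can be sketched as a capped exponential backoff. This assumes the delay starts at 30 seconds and doubles on each failed attempt, since the growth factor is not stated in this guide; `reconnect_delay` is a hypothetical name.

```rust
use std::time::Duration;

/// Delay before reconnect attempt number `attempt` (0-based):
/// 30 s, doubling each attempt (assumed), capped at 5 minutes.
fn reconnect_delay(attempt: u32) -> Duration {
    const BASE_SECS: u64 = 30;
    const MAX_SECS: u64 = 300;
    // checked_shl returns None for shifts of 64 or more; treat that as "huge".
    let factor = 1u64.checked_shl(attempt).unwrap_or(u64::MAX);
    Duration::from_secs(BASE_SECS.saturating_mul(factor).min(MAX_SECS))
}
```

Under these assumptions the sequence is 30 s, 60 s, 120 s, 240 s, and then 300 s for every later attempt.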
|
||||
|
||||
## Jitter Buffer
|
||||
|
||||
The jitter buffer balances latency vs quality:
|
||||
|
||||
| Setting | Client | Relay |
|---------|--------|-------|
| Target depth | 10 packets (200ms) | 50 packets (1s) |
| Minimum before playout | 3 packets (60ms) | 25 packets (500ms) |
| Maximum cap | 250 packets (5s) | 250 packets (5s) |
|
||||
|
||||
The relay uses a deeper buffer to absorb jitter from lossy inter-relay links. The client uses a shallower buffer for lower latency.
|
||||
|
||||
The adaptive playout delay tracks jitter via exponential moving average and adjusts the target depth:
|
||||
|
||||
```
target_delay = ceil(jitter_ema / 20ms) + 2
```
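
A sketch of the adaptive target computation, assuming `target_delay` is measured in 20 ms packets and the moving average uses a simple smoothing factor `alpha`. The smoothing factor and the `PlayoutDelay` type are assumptions for illustration; the real values are not given in this guide.

```rust
const FRAME_MS: f64 = 20.0;

/// Tracks jitter as an exponential moving average and derives
/// the jitter buffer target depth in packets.
struct PlayoutDelay {
    jitter_ema_ms: f64,
    alpha: f64, // smoothing factor in (0, 1]; assumed, not from the source
}

impl PlayoutDelay {
    fn new(alpha: f64) -> Self {
        Self { jitter_ema_ms: 0.0, alpha }
    }

    /// Fold one jitter sample (in ms) into the moving average.
    fn observe(&mut self, jitter_ms: f64) {
        self.jitter_ema_ms += self.alpha * (jitter_ms - self.jitter_ema_ms);
    }

    /// target_delay = ceil(jitter_ema / 20ms) + 2, in packets.
    fn target_packets(&self) -> u32 {
        (self.jitter_ema_ms / FRAME_MS).ceil() as u32 + 2
    }
}
```

The constant 2 keeps a small safety margin even on a perfectly steady link, where the jitter average is close to zero.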
|
||||
|
||||
**Known limitation**: The current jitter buffer does not use timestamp-based playout scheduling. It relies on sequence-number ordering only, which can lead to drift during long calls.
|
||||
|
||||
## Signal Messages
|
||||
|
||||
Signal messages are sent over reliable QUIC streams as length-prefixed JSON:
|
||||
|
||||
```
[4-byte length prefix][serde_json payload]
```
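
A sketch of this framing using only the standard library. The byte order of the length prefix is not stated here, so big-endian is assumed. The JSON is passed in as an already serialized string, and the function names are hypothetical.

```rust
/// Frame a serialized JSON payload: 4-byte length prefix, then the bytes.
/// Big-endian prefix is an assumption.
fn frame_signal(json: &str) -> Vec<u8> {
    let body = json.as_bytes();
    let mut out = Vec::with_capacity(4 + body.len());
    out.extend_from_slice(&(body.len() as u32).to_be_bytes());
    out.extend_from_slice(body);
    out
}

/// Try to split one complete frame off the front of `buf`.
/// Returns (payload, remaining bytes), or None if more data is needed.
fn split_frame(buf: &[u8]) -> Option<(&[u8], &[u8])> {
    if buf.len() < 4 {
        return None;
    }
    let len = u32::from_be_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
    let rest = &buf[4..];
    if rest.len() < len {
        return None;
    }
    Some(rest.split_at(len))
}
```

Length prefixing is needed because a QUIC stream is a byte stream with no message boundaries, so the reader must know where each JSON document ends.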
|
||||
|
||||
| Message | Purpose |
|---------|---------|
| `CallOffer` | Identity, ephemeral key, signature, supported profiles |
| `CallAnswer` | Identity, ephemeral key, signature, chosen profile |
| `AuthToken` | featherChat bearer token for relay authentication |
| `Hangup` | Reason: Normal, Busy, Declined, Timeout, Error |
| `Hold` / `Unhold` | Call hold state |
| `Mute` / `Unmute` | Mic mute state |
| `Transfer` | Call transfer to another relay/fingerprint |
| `Rekey` | New ephemeral key for forward secrecy |
| `QualityUpdate` | Quality report + recommended profile |
| `Ping` / `Pong` | Latency measurement (timestamp_ms) |
| `RoomUpdate` | Participant list changes |
| `PresenceUpdate` | Federation presence gossip |
| `RouteQuery` / `RouteResponse` | Presence discovery for routing |
| `FederationHello` | Relay identity during federation setup |
| `GlobalRoomActive` / `GlobalRoomInactive` | Federation room bridging |
|
||||
|
||||
## Test Coverage
|
||||
|
||||
272 tests across all crates, 0 failures:
|
||||
|
||||
| Crate | Tests | Key Coverage |
|-------|-------|-------------|
| wzp-proto | 41 | Wire format, jitter buffer, quality tiers, mini-frames, trunking |
| wzp-codec | 31 | Opus/Codec2 roundtrip, silence detection, noise suppression |
| wzp-fec | 22 | RaptorQ encode/decode, loss recovery, interleaving |
| wzp-crypto | 34 + 28 compat | Encrypt/decrypt, handshake, anti-replay, featherChat identity |
| wzp-transport | 2 | QUIC connection setup |
| wzp-relay | 40 + 4 integration | Room ACL, session mgmt, metrics, probes, mesh, trunking |
| wzp-client | 30 + 2 integration | Encoder/decoder, quality adapter, silence, drift, sweep |
| wzp-web | 2 | Metrics |
|
||||
|
||||
## Build Requirements
|
||||
|
||||
- **Rust** 1.85+ (2024 edition)
|
||||
- **Linux**: cmake, pkg-config, libasound2-dev (for audio feature)
|
||||
- **macOS**: Xcode command line tools (CoreAudio included)
|
||||
- **Android**: NDK r27c, cmake 3.28+ (from pip)
|
||||
|
||||
201
docs/PRD-adaptive-quality.md
Normal file
@@ -0,0 +1,201 @@
|
||||
# PRD: Adaptive Quality Control (Auto Codec)
|
||||
|
||||
## Problem
|
||||
|
||||
When a user selects "Auto" quality, the system currently just starts at Opus 24k (GOOD) and never changes. There is no runtime adaptation — if the network degrades mid-call, audio breaks up instead of gracefully stepping down to a lower bitrate codec. Conversely, if the network is excellent, the user stays on 24k when they could have studio-quality 64k.
|
||||
|
||||
The relay already sends `QualityReport` messages with loss % and RTT, and a `QualityAdapter` exists in `call.rs` that classifies network conditions into GOOD/DEGRADED/CATASTROPHIC — but none of this is wired into the Android or desktop engines.
|
||||
|
||||
## Solution
|
||||
|
||||
Wire the existing `QualityAdapter` into both engines so that "Auto" mode continuously monitors network quality and switches codecs mid-call. The full quality range should be used:
|
||||
|
||||
```
|
||||
Excellent network → Studio 64k (best quality)
|
||||
Good network → Opus 24k (default)
|
||||
Degraded network → Opus 6k (lower bitrate, more FEC)
|
||||
Poor network → Codec2 3.2k (vocoder, heavy FEC)
|
||||
Catastrophic → Codec2 1.2k (minimum viable voice)
|
||||
```
|
||||
|
||||
## Architecture
|
||||
|
||||
```
|
||||
┌─────────────────────┐
|
||||
Relay ──────────► │ QualityReport │ loss %, RTT, jitter
|
||||
│ (every ~1s) │
|
||||
└────────┬────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────┐
|
||||
│ QualityAdapter │ classify + hysteresis
|
||||
│ (3-report window) │
|
||||
└────────┬────────────┘
|
||||
│ recommend new profile
|
||||
▼
|
||||
┌──────────────┴──────────────┐
|
||||
│ │
|
||||
▼ ▼
|
||||
┌────────────────┐ ┌────────────────┐
|
||||
│ Encoder │ │ Decoder │
|
||||
│ set_profile() │ │ (auto-switch │
|
||||
│ + FEC update │ │ already works)│
|
||||
└────────────────┘ └────────────────┘
|
||||
```
|
||||
|
||||
## Existing Infrastructure
|
||||
|
||||
### What already exists (in `crates/wzp-client/src/call.rs`)
|
||||
|
||||
1. **`QualityAdapter`** (lines 97-196):
|
||||
- Sliding window of `QualityReport` messages
|
||||
- `classify()`: loss > 15% or RTT > 200ms → CATASTROPHIC, loss > 5% or RTT > 100ms → DEGRADED, else → GOOD
|
||||
- `should_switch()`: hysteresis — requires 3 consecutive reports recommending the same profile before switching
|
||||
- Prevents oscillation between profiles
|
||||
|
||||
2. **`QualityReport`** (in `wzp-proto/src/packet.rs`):
|
||||
- Sent by relay piggy-backed on media packets
|
||||
- Fields: `loss_pct` (u8, 0-255 scaled), `rtt_4ms` (u8, RTT in 4ms units), `jitter_ms`, `bitrate_cap_kbps`
|
||||
|
||||
3. **`CallEncoder::set_profile()`** / **`CallDecoder` auto-switch**:
|
||||
- Encoder can switch codec mid-stream
|
||||
- Decoder already auto-detects incoming codec from packet headers
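
The adapter behaviour listed above can be sketched as follows. This is a minimal illustration, not the code in `call.rs`: `Tier`, `decode_report`, and `Hysteresis` are hypothetical names, and the wire scaling (loss as `value * 100 / 255`, RTT as `value * 4` ms) is an assumption read off the field descriptions.

```rust
/// Hypothetical three-tier classification mirroring the thresholds listed above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Tier {
    Good,
    Degraded,
    Catastrophic,
}

/// Convert the wire fields to (loss %, RTT ms).
/// Assumes `loss_pct` maps 0..=255 onto 0..=100 % and `rtt_4ms` counts 4 ms units.
pub fn decode_report(loss_pct: u8, rtt_4ms: u8) -> (f32, u32) {
    (loss_pct as f32 * 100.0 / 255.0, rtt_4ms as u32 * 4)
}

pub fn classify(loss: f32, rtt_ms: u32) -> Tier {
    if loss > 15.0 || rtt_ms > 200 {
        Tier::Catastrophic
    } else if loss > 5.0 || rtt_ms > 100 {
        Tier::Degraded
    } else {
        Tier::Good
    }
}

/// Requires `needed` consecutive identical recommendations before switching.
pub struct Hysteresis {
    candidate: Option<Tier>,
    streak: u32,
    needed: u32,
}

impl Hysteresis {
    pub fn new(needed: u32) -> Self {
        Self { candidate: None, streak: 0, needed }
    }

    /// Feed one recommendation; returns `Some(tier)` once a switch is warranted.
    pub fn observe(&mut self, current: Tier, recommended: Tier) -> Option<Tier> {
        if recommended == current {
            // Conditions match the active tier: reset any pending switch.
            self.candidate = None;
            self.streak = 0;
            return None;
        }
        if self.candidate == Some(recommended) {
            self.streak += 1;
        } else {
            self.candidate = Some(recommended);
            self.streak = 1;
        }
        if self.streak >= self.needed {
            self.candidate = None;
            self.streak = 0;
            Some(recommended)
        } else {
            None
        }
    }
}
```

With `Hysteresis::new(3)`, a single bad report never triggers a switch; only the third consecutive report recommending the same tier does.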
|
||||
|
||||
### What's missing
|
||||
|
||||
1. **QualityReport ingestion** — neither Android engine nor desktop engine reads quality reports from the relay
|
||||
2. **Profile switch loop** — no periodic check that feeds reports to `QualityAdapter` and applies recommended switches
|
||||
3. **Upward adaptation** — `QualityAdapter` only classifies into 3 tiers (GOOD/DEGRADED/CATASTROPHIC). Needs extension to recommend studio tiers when conditions are excellent (loss < 1%, RTT < 50ms)
|
||||
4. **Notification to UI** — when quality changes, the UI should show the current active codec
|
||||
|
||||
## Requirements
|
||||
|
||||
### Phase 1: Basic Adaptive (3-tier)
|
||||
|
||||
**Both Android and Desktop:**
|
||||
|
||||
1. **Ingest QualityReports**: In the recv loop, extract `quality_report` from incoming `MediaPacket`s when present. Feed to `QualityAdapter`.
|
||||
|
||||
2. **Periodic quality check**: Every 1 second (or on each QualityReport), call `adapter.should_switch(&current_profile)`. If it returns `Some(new_profile)`:
|
||||
- Switch the encoder: `encoder.set_profile(new_profile)`
|
||||
- Update FEC encoder: `fec_enc = create_encoder(&new_profile)`
|
||||
- Update frame size if changed (e.g., 20ms → 40ms)
|
||||
- Log the switch
|
||||
|
||||
3. **Frame size adaptation on switch**: When switching from 20ms to 40ms frames (or vice versa):
|
||||
- Android: update `frame_samples` variable, resize `capture_buf`
|
||||
- Desktop: same — the send loop reads `frame_samples` dynamically
|
||||
|
||||
4. **UI indicator**: Show current active codec in the call screen stats line.
|
||||
- Android: add to `CallStats` and display in stats text
|
||||
- Desktop: add to `get_status` response and display in stats div
|
||||
|
||||
5. **Only in Auto mode**: Adaptive switching should only happen when the user selected "Auto". If they manually selected a profile, respect their choice.
|
||||
|
||||
### Phase 2: Extended Range (5-tier)
|
||||
|
||||
Extend `QualityAdapter::classify()` to use the full codec range:
|
||||
|
||||
| Condition | Profile | Codec |
|-----------|---------|-------|
| loss < 1% AND RTT < 30ms | STUDIO_64K | Opus 64k |
| loss < 1% AND RTT < 50ms | STUDIO_48K | Opus 48k |
| loss < 2% AND RTT < 80ms | STUDIO_32K | Opus 32k |
| loss < 5% AND RTT < 100ms | GOOD | Opus 24k |
| loss < 15% AND RTT < 200ms | DEGRADED | Opus 6k |
| loss >= 15% OR RTT >= 200ms | CATASTROPHIC | Codec2 1.2k |
|
||||
|
||||
With hysteresis:
|
||||
- **Downgrade**: 3 consecutive reports (fast reaction to degradation)
|
||||
- **Upgrade**: 5 consecutive reports (slow, cautious improvement)
|
||||
- **Studio upgrade**: 10 consecutive reports (very conservative — avoid bouncing to 64k on brief good patches)
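
One possible shape for the extended classification and its asymmetric hysteresis is sketched below. The `Profile` enum and `required_reports` helper are hypothetical; the thresholds and report counts are taken from the table and list above, with the table rows evaluated top to bottom.

```rust
/// Hypothetical profile ladder, declared from lowest to highest quality so that
/// the derived ordering can be used to tell upgrades from downgrades.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum Profile {
    Catastrophic, // Codec2 1.2k
    Degraded,     // Opus 6k
    Good,         // Opus 24k
    Studio32k,    // Opus 32k
    Studio48k,    // Opus 48k
    Studio64k,    // Opus 64k
}

/// Rows of the table above, evaluated top to bottom.
pub fn classify(loss: f32, rtt_ms: u32) -> Profile {
    if loss < 1.0 && rtt_ms < 30 {
        Profile::Studio64k
    } else if loss < 1.0 && rtt_ms < 50 {
        Profile::Studio48k
    } else if loss < 2.0 && rtt_ms < 80 {
        Profile::Studio32k
    } else if loss < 5.0 && rtt_ms < 100 {
        Profile::Good
    } else if loss < 15.0 && rtt_ms < 200 {
        Profile::Degraded
    } else {
        Profile::Catastrophic
    }
}

/// Consecutive reports required before moving from `current` to `target`.
/// Callers are expected to skip the case `target == current`.
pub fn required_reports(current: Profile, target: Profile) -> u32 {
    if target < current {
        3 // downgrade: react quickly
    } else if target >= Profile::Studio32k {
        10 // upgrade into a studio tier: very conservative
    } else {
        5 // ordinary upgrade
    }
}
```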
|
||||
|
||||
### Phase 3: Bandwidth Probing
|
||||
|
||||
Rather than relying solely on loss/RTT:
|
||||
1. Start at GOOD
|
||||
2. After 10 seconds of stable call, probe upward by switching to STUDIO_32K
|
||||
3. If no quality degradation after 5 seconds, probe to STUDIO_48K
|
||||
4. If degradation detected, immediately fall back
|
||||
5. This discovers the true available bandwidth rather than guessing from loss stats
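
The probing steps above could be modelled as a small state machine. The sketch below is illustrative only: `Prober` is a hypothetical type, the `degraded` flag stands in for whatever quality signal the engine uses, and falling back by exactly one rung is an assumption (the text only says "immediately fall back").

```rust
use std::time::Duration;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Profile {
    Good,
    Studio32k,
    Studio48k,
}

/// Hypothetical prober: steps up one profile after a stable dwell time and
/// steps back down as soon as degradation is reported.
pub struct Prober {
    current: Profile,
    stable_for: Duration,
}

impl Prober {
    pub fn new() -> Self {
        Self { current: Profile::Good, stable_for: Duration::ZERO }
    }

    /// Advance by `dt`. Returns the profile to encode with after this tick.
    pub fn tick(&mut self, dt: Duration, degraded: bool) -> Profile {
        if degraded {
            // Immediate fallback to the previous rung (assumed policy).
            self.current = match self.current {
                Profile::Studio48k => Profile::Studio32k,
                _ => Profile::Good,
            };
            self.stable_for = Duration::ZERO;
            return self.current;
        }
        self.stable_for += dt;
        // Dwell time before probing upward: 10 s at GOOD, 5 s at STUDIO_32K.
        let (dwell, next) = match self.current {
            Profile::Good => (Duration::from_secs(10), Some(Profile::Studio32k)),
            Profile::Studio32k => (Duration::from_secs(5), Some(Profile::Studio48k)),
            Profile::Studio48k => (Duration::ZERO, None),
        };
        if let Some(next) = next {
            if self.stable_for >= dwell {
                self.current = next;
                self.stable_for = Duration::ZERO;
            }
        }
        self.current
    }
}
```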
|
||||
|
||||
## Implementation Plan
|
||||
|
||||
### Android (`crates/wzp-android/src/engine.rs`)
|
||||
|
||||
```rust
// In the recv loop, after decoding:
if let Some(ref qr) = pkt.quality_report {
    quality_adapter.ingest(qr);
}

// Periodic check (every 50 frames ≈ 1 second at 20 ms per frame):
if auto_profile && frames_decoded % 50 == 0 {
    if let Some(new_profile) = quality_adapter.should_switch(&current_profile) {
        info!(from = ?current_profile.codec, to = ?new_profile.codec, "auto: switching quality");
        let _ = encoder_ref.lock().set_profile(new_profile);
        // Dereference the lock guard to replace the FEC encoder in place.
        *fec_enc_ref.lock() = create_encoder(&new_profile);
        current_profile = new_profile;
        frame_samples = frame_samples_for(&new_profile);
        // Resize capture buffer if needed
    }
}
```
|
||||
|
||||
**Challenge**: The encoder is in the send task and the quality reports arrive in the recv task. Need shared state (AtomicU8 for profile index, or a channel).
|
||||
|
||||
**Recommended approach**: Use an `AtomicU8` that the recv task writes and the send task reads:
|
||||
```rust
use std::sync::atomic::{AtomicU8, Ordering};
use std::sync::Arc;

let pending_profile = Arc::new(AtomicU8::new(0xFF)); // 0xFF = no change

// Recv task: when adapter recommends switch
pending_profile.store(new_profile_index, Ordering::Release);

// Send task: check at frame boundary
let p = pending_profile.swap(0xFF, Ordering::Acquire);
if p != 0xFF { /* apply switch */ }
```
|
||||
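
To make the hand-off concrete, here is a self-contained sketch of the same pattern with an explicit index mapping. The `Profile` enum, the index values, and the helper names are hypothetical; only the `0xFF` sentinel and the store/swap idea come from the snippet above.

```rust
use std::sync::atomic::{AtomicU8, Ordering};
use std::sync::Arc;

/// Sentinel meaning "no switch pending".
pub const NO_CHANGE: u8 = 0xFF;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Profile {
    Catastrophic,
    Degraded,
    Good,
}

impl Profile {
    pub fn to_index(self) -> u8 {
        match self {
            Profile::Catastrophic => 0,
            Profile::Degraded => 1,
            Profile::Good => 2,
        }
    }

    pub fn from_index(i: u8) -> Option<Profile> {
        match i {
            0 => Some(Profile::Catastrophic),
            1 => Some(Profile::Degraded),
            2 => Some(Profile::Good),
            _ => None, // includes NO_CHANGE
        }
    }
}

pub fn new_slot() -> Arc<AtomicU8> {
    Arc::new(AtomicU8::new(NO_CHANGE))
}

/// Recv task side: publish a recommendation (the latest one wins).
pub fn request_switch(slot: &AtomicU8, profile: Profile) {
    slot.store(profile.to_index(), Ordering::Release);
}

/// Send task side: take the pending recommendation, if any, at a frame boundary.
pub fn take_pending(slot: &AtomicU8) -> Option<Profile> {
    Profile::from_index(slot.swap(NO_CHANGE, Ordering::AcqRel))
}
```

Because the slot holds a single byte, a newer recommendation simply overwrites an older one that the send task has not yet consumed, which is usually the desired behaviour for quality switches.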
|
||||
### Desktop (`desktop/src-tauri/src/engine.rs`)
|
||||
|
||||
Same pattern. The desktop engine already has separate send/recv tasks with shared atomics for mic_muted, etc. Add a `pending_profile: Arc<AtomicU8>` following the same pattern.
|
||||
|
||||
### Desktop CLI (`crates/wzp-client/src/call.rs`)
|
||||
|
||||
The `CallEncoder` already has `set_profile()`. The `CallDecoder` already auto-switches. Just need to:
|
||||
1. Add `QualityAdapter` to `CallDecoder`
|
||||
2. Feed quality reports in `ingest()`
|
||||
3. Check `should_switch()` in `decode_next()`
|
||||
4. Emit the recommendation via a callback or return value
|
||||
|
||||
## Testing
|
||||
|
||||
1. **Local test with tc/netem**: Use Linux traffic control to simulate loss/latency:
|
||||
   ```bash
   # Simulate 10% loss, 150ms RTT
   tc qdisc add dev lo root netem loss 10% delay 75ms
   # Run 2 clients in auto mode, verify they switch to DEGRADED
   ```
|
||||
|
||||
2. **CLI test**: Run `wzp-client --profile auto` between two instances with simulated network conditions
|
||||
|
||||
3. **Relay quality reports**: Verify the relay actually sends QualityReport messages. If it doesn't yet, that needs to be implemented first (check relay code).
|
||||
|
||||
## Open Questions
|
||||
|
||||
1. **Does the relay currently send QualityReports?** If not, Phase 1 is blocked until the relay implements per-client loss/RTT tracking and report generation. The relay sees all packets and can compute loss % per sender.
|
||||
|
||||
2. **Codec2 3.2k placement**: Should auto mode use Codec2 3.2k between DEGRADED and CATASTROPHIC? It's 20ms frames (lower latency than Opus 6k's 40ms) but speech-only quality.
|
||||
|
||||
3. **Cross-client adaptation**: If client A is on GOOD and client B auto-adapts to CATASTROPHIC, client A still sends Opus 24k. Client B can decode it fine (auto-switch on recv). But should A also be told to lower quality to save B's bandwidth? This requires signaling between clients.
|
||||
|
||||
## Milestones
|
||||
|
||||
| Phase | Scope | Effort | Dependency |
|-------|-------|--------|------------|
| 0 | Verify relay sends QualityReports | 0.5 day | None |
| 1a | Wire QualityAdapter in Android engine | 1 day | Phase 0 |
| 1b | Wire QualityAdapter in desktop engine | 1 day | Phase 0 |
| 1c | UI indicator (current codec) | 0.5 day | Phase 1a/1b |
| 2 | Extended 5-tier classification | 0.5 day | Phase 1 |
| 3 | Bandwidth probing | 2 days | Phase 2 |
|
||||
198
docs/PRD-coordinated-codec.md
Normal file
@@ -0,0 +1,198 @@
|
||||
# PRD: Coordinated Codec Switching (Relay-Judged Quality)
|
||||
|
||||
## Problem
|
||||
|
||||
The current adaptive quality system (`QualityAdapter` in call.rs) exists but isn't wired into either engine. Clients encode at a fixed quality chosen at call start. When network conditions change mid-call, audio degrades instead of gracefully stepping down. When conditions improve, clients stay on low quality unnecessarily.
|
||||
|
||||
Additionally, in SFU mode with multiple participants, uncoordinated codec switching creates asymmetry: if client A upgrades to 64k while B stays on 24k, bandwidth is wasted. Participants should switch together.
|
||||
|
||||
## Solution
|
||||
|
||||
The **relay acts as the quality judge** since it sees both sides of every connection. It monitors packet loss, jitter, and RTT per participant, then signals quality recommendations. Clients react to these signals with coordinated codec switches.
|
||||
|
||||
## Architecture
|
||||
|
||||
```
┌──────────┐        ┌──────────┐        ┌──────────┐
│ Client A │◄──────►│  Relay   │◄──────►│ Client B │
│          │        │ (judge)  │        │          │
│ Encoder  │        │          │        │ Encoder  │
│ Decoder  │        │ Monitor  │        │ Decoder  │
└──────────┘        │ per-peer │        └──────────┘
                    │ quality  │
                    └────┬─────┘
                         │
                 Quality Signals:
                 - StableSignal (conditions good)
                 - DegradeSignal (conditions bad)
                 - UpgradeProposal (try higher quality?)
                 - UpgradeConfirm (all agreed, switch at T)
```
|
||||
|
||||
## Quality Classification (Relay-Side)
|
||||
|
||||
The relay monitors each participant's connection quality:
|
||||
|
||||
| Condition | Classification | Action |
|-----------|---------------|--------|
| loss >= 15% OR RTT >= 200ms | Critical | Immediate downgrade signal |
| loss >= 5% OR RTT >= 100ms | Degraded | Downgrade signal after 3 reports |
| loss < 2% AND RTT < 80ms | Good | Stable signal |
| loss < 1% AND RTT < 50ms for 30s | Excellent | Upgrade proposal |
| loss < 0.5% AND RTT < 30ms for 60s | Studio | Studio upgrade proposal |
|
||||
|
||||
## Coordinated Switching Protocol
|
||||
|
||||
### Downgrade (fast, safety-first)
|
||||
|
||||
1. Relay detects degradation for ANY participant
|
||||
2. Relay sends `QualityDirective { recommended_profile: DEGRADED, reason: CoordinatedDowngrade }` to ALL participants
|
||||
3. ALL participants immediately switch encoder to the recommended profile
|
||||
4. No negotiation — downgrade is mandatory and instant
|
||||
|
||||
### Upgrade (slow, consensual)
|
||||
|
||||
1. Relay detects sustained good conditions for ALL participants (threshold: 30s stable)
|
||||
2. Relay sends `UpgradeProposal { target_profile, switch_delay_ms }` to all
|
||||
3. Each client responds with `UpgradeResponse { accepted }` (accept or reject)
|
||||
4. If ALL accept within 5s → Relay sends `UpgradeConfirm { profile, switch_at_session_ms }`
|
||||
5. All clients switch encoder at the agreed timestamp (relative to session clock)
|
||||
6. If ANY rejects or times out → upgrade cancelled, stay on current profile
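
The relay-side bookkeeping for one upgrade round might look like the sketch below. `UpgradeRound`, `Outcome`, and the `u64` participant ids are hypothetical; the rules (all must accept within the timeout, any reject or timeout cancels) follow the steps above.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    Pending,
    Confirmed,
    Cancelled,
}

/// Hypothetical relay-side state for a single upgrade proposal.
pub struct UpgradeRound {
    deadline: Instant,
    /// Participant id -> response (`None` until the participant answers).
    votes: HashMap<u64, Option<bool>>,
}

impl UpgradeRound {
    pub fn new(participants: &[u64], now: Instant, timeout: Duration) -> Self {
        Self {
            deadline: now + timeout,
            votes: participants.iter().map(|&p| (p, None)).collect(),
        }
    }

    /// Record a response; late or unknown responses are ignored.
    pub fn record(&mut self, participant: u64, accepted: bool, now: Instant) {
        if now > self.deadline {
            return;
        }
        if let Some(vote) = self.votes.get_mut(&participant) {
            *vote = Some(accepted);
        }
    }

    pub fn outcome(&self, now: Instant) -> Outcome {
        if self.votes.values().any(|v| *v == Some(false)) {
            return Outcome::Cancelled; // any reject cancels immediately
        }
        if self.votes.values().all(|v| *v == Some(true)) {
            return Outcome::Confirmed; // every vote arrived before the deadline
        }
        if now >= self.deadline {
            Outcome::Cancelled // timed out waiting for someone
        } else {
            Outcome::Pending
        }
    }
}
```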
|
||||
|
||||
### Asymmetric Encoding (SFU optimization)
|
||||
|
||||
In SFU mode, each client encodes independently. The relay could allow:
|
||||
- Client A (strong connection): encode at 64k
|
||||
- Client B (weak connection): encode at 6k
|
||||
- Relay forwards A's 64k to B's decoder (auto-switch handles it)
|
||||
- B benefits from A's quality without needing to send at 64k
|
||||
|
||||
This requires NO protocol changes — just each client independently following the relay's recommendation for their own encoding quality. The decoder already handles any codec.
|
||||
|
||||
### Split Network Consideration
|
||||
|
||||
If participant A has great quality but participant C has terrible quality:
|
||||
- Option 1: **Match weakest link** — everyone encodes at C's level (current approach, simple)
|
||||
- Option 2: **Per-participant recommendations** — A encodes at 64k, C encodes at 6k. B (good connection) receives and decodes both. Works because decoders auto-switch per packet.
|
||||
- Option 3: **Relay transcoding** — relay re-encodes A's 64k as 6k for C. Adds CPU on relay, but saves bandwidth for C. Future feature.
|
||||
|
||||
Recommended: start with Option 1 (match weakest), add Option 2 later.
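
Under Option 1 the relay's decision reduces to taking the minimum over per-participant classifications. A minimal sketch, assuming a hypothetical `Profile` enum declared from lowest to highest quality:

```rust
/// Hypothetical profile ladder; the derived ordering follows declaration order.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum Profile {
    Catastrophic,
    Degraded,
    Good,
    Studio32k,
    Studio48k,
    Studio64k,
}

/// Option 1: every participant encodes at the lowest profile any participant
/// can sustain. Returns `None` for an empty room.
pub fn match_weakest(per_participant: &[Profile]) -> Option<Profile> {
    per_participant.iter().copied().min()
}
```

Option 2 would skip this reduction and send each participant its own classification instead.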
|
||||
|
||||
## Signal Messages (New/Modified)
|
||||
|
||||
```rust
/// Quality signal from relay to client
QualityDirective {
    /// Recommended profile to use for encoding
    recommended_profile: QualityProfile,
    /// Reason for the recommendation
    reason: QualityReason,
}

enum QualityReason {
    /// Network conditions require this quality level
    NetworkCondition,
    /// Coordinated upgrade — all participants agreed
    CoordinatedUpgrade,
    /// Coordinated downgrade — weakest link determines level
    CoordinatedDowngrade,
}

/// Upgrade proposal from relay
UpgradeProposal {
    target_profile: QualityProfile,
    /// Milliseconds from now when the switch would happen
    switch_delay_ms: u32,
}

/// Client response to upgrade proposal
UpgradeResponse {
    accepted: bool,
}

/// Confirmed upgrade — all clients switch at this time
UpgradeConfirm {
    profile: QualityProfile,
    /// Session-relative timestamp to switch (ms since call start)
    switch_at_session_ms: u64,
}
```
|
||||
|
||||
## Relay-Side Implementation
|
||||
|
||||
### Per-Participant Quality Tracking
|
||||
|
||||
```rust
struct ParticipantQuality {
    /// Sliding window of recent observations
    loss_samples: VecDeque<f32>,  // last 30 seconds
    rtt_samples: VecDeque<u32>,   // last 30 seconds
    jitter_samples: VecDeque<u32>,
    /// Current classification
    classification: QualityClass,
    /// How long current classification has been stable
    stable_since: Instant,
}
```
|
||||
|
||||
### Quality Monitor Task (on relay)
|
||||
|
||||
Runs alongside the SFU forwarding loop:
|
||||
1. Every 1 second, compute per-participant quality from QUIC connection stats
|
||||
2. Classify each participant
|
||||
3. If ANY participant degrades → send downgrade to ALL
|
||||
4. If ALL participants stable for threshold → propose upgrade
|
||||
5. Track upgrade negotiation state
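
The per-participant sliding windows can be kept bounded with a small helper like the one below. `Window` is a hypothetical type; with one sample per second, a capacity of 30 corresponds to the "last 30 seconds" noted in the struct above.

```rust
use std::collections::VecDeque;

/// Fixed-capacity window of samples; the oldest sample is evicted when full.
pub struct Window {
    samples: VecDeque<f32>,
    capacity: usize,
}

impl Window {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "window capacity must be non-zero");
        Self { samples: VecDeque::with_capacity(capacity), capacity }
    }

    pub fn push(&mut self, value: f32) {
        if self.samples.len() == self.capacity {
            self.samples.pop_front();
        }
        self.samples.push_back(value);
    }

    /// Mean of the samples currently in the window, or `None` if empty.
    pub fn mean(&self) -> Option<f32> {
        if self.samples.is_empty() {
            None
        } else {
            Some(self.samples.iter().sum::<f32>() / self.samples.len() as f32)
        }
    }

    /// True once the window spans its full duration, which is useful for
    /// "stable for N seconds" checks.
    pub fn is_full(&self) -> bool {
        self.samples.len() == self.capacity
    }
}
```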
|
||||
|
||||
### Integration with Existing Code
|
||||
|
||||
The relay already has access to:
|
||||
- `QuinnTransport::path_quality()` → loss, RTT, jitter, bandwidth estimates
|
||||
- `QualityReport` embedded in media packet headers
|
||||
- Per-session metrics in `RelayMetrics`
|
||||
|
||||
The quality monitor just needs to read these existing metrics and produce signals.
|
||||
|
||||
## Client-Side Implementation
|
||||
|
||||
### Handling Quality Signals
|
||||
|
||||
In the recv loop (both Android engine and desktop engine):
|
||||
```rust
SignalMessage::QualityDirective { recommended_profile, .. } => {
    // Immediate: switch encoder to recommended profile
    encoder.set_profile(recommended_profile)?;
    fec_enc = create_encoder(&recommended_profile);
    frame_samples = frame_samples_for(&recommended_profile);
    info!(codec = ?recommended_profile.codec, "quality directive: switched");
}
```
|
||||
|
||||
### P2P Quality (simpler case)
|
||||
|
||||
For P2P calls (no relay), both clients directly observe quality:
|
||||
1. Each client runs its own `QualityAdapter` on the direct connection
|
||||
2. When quality changes, client proposes to peer via signal
|
||||
3. Simpler negotiation: only 2 parties, no relay middleman
|
||||
4. Same coordinated switching logic, just peer-to-peer signals
|
||||
|
||||
## Backporting P2P → Relay
|
||||
|
||||
The quality monitoring and codec switching logic is identical:
|
||||
- **P2P**: client observes quality directly → proposes switch to peer
|
||||
- **Relay**: relay observes quality → proposes switch to all clients
|
||||
|
||||
The only difference is WHO makes the decision (client vs relay) and HOW many participants need to agree (2 vs N).
|
||||
|
||||
Implementation strategy: build for P2P first (simpler, 2 parties), then wrap the same logic with relay-mediated signals for SFU mode.
|
||||
|
||||
## Milestones
|
||||
|
||||
| Phase | Scope | Effort |
|-------|-------|--------|
| 1 | Relay-side quality monitor (per-participant tracking) | 1 day |
| 2 | Downgrade signal (immediate, match weakest) | 1 day |
| 3 | Client handling of QualityDirective | 1 day (both engines) |
| 4 | Upgrade proposal + negotiation protocol | 2 days |
| 5 | P2P quality adaptation (direct observation) | 1 day |
| 6 | Per-participant asymmetric encoding (Option 2) | 1 day |
|
||||
170
docs/PRD-delegated-trust.md
Normal file
@@ -0,0 +1,170 @@
|
||||
# PRD: Delegated Trust for Relay Federation
|
||||
|
||||
## Problem
|
||||
|
||||
In the current federation model, when Relay 1 trusts Relay 2, and Relay 2 forwards media from Relay 3, Relay 1 has no way to know or control that Relay 3's traffic is reaching it. This is a trust gap — any relay in the chain can introduce untrusted traffic.
|
||||
|
||||
**Example:** Relay 1 (trusted zone) ←→ Relay 2 (hub) ←→ Relay 3 (unknown)
|
||||
|
||||
Relay 1 explicitly trusts Relay 2. But Relay 2 forwards Relay 3's media to Relay 1 without Relay 1's consent. Relay 1 receives media that originated from an entity it never approved.
|
||||
|
||||
## Solution
|
||||
|
||||
Add a `delegate` flag to `[[trusted]]` entries. When `delegate = true`, the relay accepts media forwarded through the trusted peer from relays that the trusted peer vouches for. When `delegate = false` (default), only media originating from explicitly trusted/peered relays is accepted.
|
||||
|
||||
## Trust Levels
|
||||
|
||||
| Config | Meaning |
|--------|---------|
| `[[peers]]` | "I connect to you and trust your identity" |
| `[[trusted]]` | "I accept connections from you" |
| `[[trusted]] delegate = true` | "I accept connections from you AND from relays you vouch for" |
| No entry | "I reject your connections and drop your forwarded media" |
|
||||
|
||||
## Configuration
|
||||
|
||||
```toml
# Relay 1: trusts Relay 2 and delegates trust
[[trusted]]
fingerprint = "relay-2-tls-fingerprint"
label = "Relay 2 (Hub)"
delegate = true  # Accept relays that Relay 2 forwards from

# Without delegate (default = false):
[[trusted]]
fingerprint = "relay-4-tls-fingerprint"
label = "Relay 4"
# delegate = false (implicit default)
# Only direct media from Relay 4 is accepted
```
|
||||
|
||||
## Protocol Changes
|
||||
|
||||
### Relay-to-Relay Media Authorization
|
||||
|
||||
When Relay 2 forwards media from Relay 3 to Relay 1, the datagram needs to carry origin information so Relay 1 can decide whether to accept it.
|
||||
|
||||
**Option A: Origin tag in datagram** (recommended)
|
||||
|
||||
Extend the federation datagram format:
|
||||
```
[room_hash: 8 bytes][origin_relay_fp: 8 bytes][media_packet]
```
|
||||
|
||||
The 8-byte origin fingerprint identifies which relay originally produced the media. The forwarding relay (Relay 2) sets this to the source relay's fingerprint. Relay 1 checks:
|
||||
1. Is the origin relay directly trusted? → accept
|
||||
2. Is the forwarding relay trusted with `delegate = true`? → accept
|
||||
3. Otherwise → drop
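
The Option A check can be sketched as below. The 16-byte header layout comes from the format above; `FederationDatagram`, `parse`, `is_authorized`, and the use of 8-byte fingerprint prefixes as map keys are assumptions made for illustration.

```rust
use std::collections::HashMap;

pub type Fp8 = [u8; 8];

/// Parsed view of `[room_hash: 8][origin_relay_fp: 8][media_packet]`.
pub struct FederationDatagram<'a> {
    pub room_hash: Fp8,
    pub origin_fp: Fp8,
    pub media: &'a [u8],
}

pub fn parse(buf: &[u8]) -> Option<FederationDatagram<'_>> {
    if buf.len() < 16 {
        return None; // too short to carry both header fields
    }
    let mut room_hash = [0u8; 8];
    let mut origin_fp = [0u8; 8];
    room_hash.copy_from_slice(&buf[..8]);
    origin_fp.copy_from_slice(&buf[8..16]);
    Some(FederationDatagram { room_hash, origin_fp, media: &buf[16..] })
}

/// `trusted` maps a relay fingerprint to its `delegate` flag.
pub fn is_authorized(trusted: &HashMap<Fp8, bool>, forwarder: &Fp8, origin: &Fp8) -> bool {
    // 1. Origin relay is directly trusted: accept.
    if trusted.contains_key(origin) {
        return true;
    }
    // 2. Forwarding relay is trusted with delegate = true: accept.
    // 3. Otherwise: drop.
    trusted.get(forwarder).copied().unwrap_or(false)
}
```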
|
||||
|
||||
**Option B: Trust announcement signal**
|
||||
|
||||
When Relay 2 connects to Relay 1, it sends a `FederationTrustChain` signal listing which relays it will forward from:
|
||||
```rust
FederationTrustChain {
    /// Fingerprints of relays this peer may forward media from
    vouched_relays: Vec<String>,
}
```
|
||||
|
||||
Relay 1 checks each fingerprint against its policy:
|
||||
- If Relay 2 has `delegate = true` in Relay 1's config → accept all listed relays
|
||||
- If Relay 2 has `delegate = false` → reject, only accept direct media from Relay 2
|
||||
|
||||
Option B is simpler to implement (no datagram format change) but less granular.
|
||||
|
||||
### Recommended: Option B for v1, Option A for v2
|
||||
|
||||
Option B is simpler — the trust chain is established at connection time, not per-datagram. The forwarding relay announces what it will forward, and the receiving relay approves or rejects upfront.
|
||||
|
||||
## Implementation
|
||||
|
||||
### Config Changes
|
||||
|
||||
```rust
#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct TrustedConfig {
    pub fingerprint: String,
    #[serde(default)]
    pub label: Option<String>,
    /// When true, also accept media forwarded through this relay from
    /// relays it vouches for. Default: false.
    #[serde(default)]
    pub delegate: bool,
}
```
|
||||
|
||||
### Federation Signal
|
||||
|
||||
```rust
/// Sent after FederationHello — lists relays this peer will forward from.
FederationTrustChain {
    /// TLS fingerprints of relays whose media may be forwarded through us.
    vouched_relays: Vec<String>,
}
```
|
||||
|
||||
### Forwarding Authorization
|
||||
|
||||
In `handle_datagram`, before forwarding media to local participants:
|
||||
|
||||
```rust
// Check if we should accept this forwarded media
let is_authorized = if source_is_direct_peer {
    true // Direct peer, always accepted
} else {
    // Check if the forwarding peer has delegate=true
    let forwarding_peer = fm.find_trusted_by_fingerprint(forwarding_peer_fp);
    forwarding_peer.map(|t| t.delegate).unwrap_or(false)
};

if !is_authorized {
    warn!("dropping forwarded media from unauthorized relay chain");
    return;
}
```
|
||||
|
||||
### Relay 2 (Hub) Behavior
|
||||
|
||||
When peers connect, Relay 2 announces its `FederationTrustChain` as follows:
|
||||
1. Collect all directly connected peer fingerprints
|
||||
2. Send `FederationTrustChain { vouched_relays }` to each peer
|
||||
3. When a new relay connects, update all peers' trust chains
|
||||
|
||||
### Anti-Spam Properties
|
||||
|
||||
| Attack | Mitigation |
|--------|-----------|
| Unknown relay connects to hub | Hub rejects (not in `[[trusted]]`) |
| Hub forwards spam relay's media | Receiving relay checks delegate flag, drops if false |
| Relay spoofs origin fingerprint | Origin tag is set by the forwarding relay, not the source. The forwarding relay is trusted, so if it lies about origin, the trust is misplaced at the config level. |
| Chain amplification (A→B→C→D→...) | TTL on forwarded datagrams (decrement at each hop, drop at 0). Default TTL=2 (one intermediate relay). |
|
||||
|
||||
## TTL for Chain Length
|
||||
|
||||
Add a TTL byte to the federation datagram to limit chain depth:
|
||||
|
||||
```
[room_hash: 8 bytes][ttl: 1 byte][media_packet]
```
|
||||
|
||||
- Default TTL = 2 (allows one intermediate relay: A→B→C)
|
||||
- Each forwarding relay decrements TTL
|
||||
- When TTL = 0, don't forward further (only deliver to local participants)
|
||||
- Configurable per-relay: `max_federation_hops = 2`
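
A per-hop TTL check consistent with the rules above might look like this. The function name and return convention are hypothetical, and the sketch assumes the TTL byte sits at offset 8, directly after the room hash. With a starting TTL of 2 in the chain A→B→C, relay B decrements to 1 and forwards, while relay C decrements to 0 and only delivers locally.

```rust
/// 8-byte room hash followed by a 1-byte TTL.
pub const HEADER_LEN: usize = 9;

/// Decrement the TTL in place on a received federation datagram.
///
/// Returns `Some(true)` if the datagram may be forwarded to further relays,
/// `Some(false)` if it should only be delivered to local participants,
/// and `None` if the datagram is too short to contain the header.
pub fn decrement_ttl(datagram: &mut [u8]) -> Option<bool> {
    if datagram.len() < HEADER_LEN {
        return None;
    }
    // saturating_sub keeps a TTL of 0 at 0 instead of wrapping to 255.
    let ttl = datagram[8].saturating_sub(1);
    datagram[8] = ttl;
    Some(ttl > 0)
}
```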
|
||||
|
||||
## Milestones
|
||||
|
||||
| Phase | Scope | Effort |
|-------|-------|--------|
| 1 | Add `delegate` field to `TrustedConfig` | 0.5 day |
| 2 | `FederationTrustChain` signal + announcement | 1 day |
| 3 | Authorization check in `handle_datagram` | 0.5 day |
| 4 | TTL in federation datagrams | 0.5 day |
| 5 | Testing: authorized vs unauthorized forwarding | 0.5 day |
|
||||
|
||||
## Non-Goals (v1)
|
||||
|
||||
- Per-room trust policies (trust Relay X only for room "android")
|
||||
- Dynamic trust negotiation (relays negotiate trust level at runtime)
|
||||
- Revocation (removing a relay from trust chain requires config edit + restart)
|
||||
- Cryptographic proof of origin (signed datagrams from source relay)
|
||||
59
docs/PRD-mtu-discovery.md
Normal file
@@ -0,0 +1,59 @@
|
||||
# PRD: QUIC Path MTU Discovery
|
||||
|
||||
## Problem
|
||||
|
||||
WarzonePhone uses conservative 1200-byte QUIC datagrams. Some network paths support larger MTUs (1400+), wasting bandwidth. Some broken paths (VPNs, tunnels, double-NAT, cellular) have MTU < 1200, causing silent packet drops — this may explain why Opus 64k fails on some paths while 24k works (larger encoded frames + FEC repair packets).
|
||||
|
||||
## Solution
|
||||
|
||||
Enable Quinn's built-in Path MTU Discovery (PMTUD) and handle edge cases:
|
||||
1. PMTUD probes larger packet sizes and discovers the actual path MTU
|
||||
2. Graceful fallback when datagrams exceed discovered MTU
|
||||
3. Expose MTU in metrics for debugging
|
||||
|
||||
## Implementation
|
||||
|
||||
### Phase 1: Enable PMTUD in Quinn
|
||||
|
||||
`crates/wzp-transport/src/config.rs` — update `transport_config()`:
|
||||
|
||||
```rust
// Enable PMTUD (Quinn default is enabled, but we should ensure it)
config.mtu_discovery_config(Some(quinn::MtuDiscoveryConfig::default()));

// Set minimum MTU for safety (some paths can't handle 1200)
// Quinn default min is 1200, which is the QUIC spec minimum
```
|
||||
|
||||
Quinn's `MtuDiscoveryConfig` has:
|
||||
- `interval`: how often to probe (default: 600s)
|
||||
- `upper_bound`: max MTU to probe (default: 1452 for IPv4)
|
||||
- `minimum_change`: min MTU increase to be worth probing (default: 20)
|
||||
|
||||
### Phase 2: Handle MTU-related Failures
|
||||
|
||||
In federation forwarding (`send_raw_datagram`), if the datagram exceeds the connection's current MTU, Quinn returns an error. Handle gracefully:
|
||||
- Log warning with packet size vs MTU
|
||||
- Drop the packet (don't crash)
|
||||
- Track in metrics: `wzp_relay_mtu_exceeded_total`
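
A pre-send guard along these lines could implement the drop-and-count behaviour. Everything here is a sketch: `check_datagram` and `SendDecision` are hypothetical, the static counter merely stands in for the `wzp_relay_mtu_exceeded_total` metric, and the `Option<usize>` limit is assumed to come from something like Quinn's `Connection::max_datagram_size()`.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Stand-in for the `wzp_relay_mtu_exceeded_total` counter.
pub static MTU_EXCEEDED_TOTAL: AtomicU64 = AtomicU64::new(0);

#[derive(Debug, PartialEq, Eq)]
pub enum SendDecision {
    Send,
    /// Datagram is larger than the current limit; drop it and log both sizes.
    DropTooLarge { size: usize, limit: usize },
    /// Peer does not accept datagrams at all.
    DropUnsupported,
}

/// Decide whether a datagram of `size` bytes fits the connection's current limit.
pub fn check_datagram(size: usize, max_datagram_size: Option<usize>) -> SendDecision {
    match max_datagram_size {
        None => SendDecision::DropUnsupported,
        Some(limit) if size > limit => {
            MTU_EXCEEDED_TOTAL.fetch_add(1, Ordering::Relaxed);
            SendDecision::DropTooLarge { size, limit }
        }
        Some(_) => SendDecision::Send,
    }
}
```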
|
||||
|
||||
### Phase 3: Codec-Aware MTU
|
||||
|
||||
When the path MTU is small, the relay or client should:
|
||||
- Prefer lower-bitrate codecs (smaller packets)
|
||||
- Reduce FEC ratio (fewer repair packets)
|
||||
- This feeds into the adaptive quality system
|
||||
|
||||
### Phase 4: Expose MTU in Stats
|
||||
|
||||
- Add `path_mtu` to relay metrics (per peer)
|
||||
- Add `path_mtu` to client stats (visible in UI)
|
||||
- Log MTU on connection establishment
|
||||
|
||||
## Non-Goals (v1)
|
||||
|
||||
- Datagram fragmentation (QUIC datagrams are atomic — either fit or don't)
|
||||
- Manual MTU override per relay config
|
||||
- MTU-based codec selection (future, needs adaptive quality)
|
||||
|
||||
## Effort: 1 day
|
||||
146
docs/PRD-p2p-direct.md
Normal file
@@ -0,0 +1,146 @@
|
||||
# PRD: Peer-to-Peer Direct Calls (No Relay)
|
||||
|
||||
## Problem
|
||||
|
||||
All calls currently route through a relay, even 1-on-1 calls between clients that could reach each other directly. This adds latency (2x hop), creates a single point of failure, and requires trusting the relay operator (even though media is encrypted, the relay sees metadata).
|
||||
|
||||
## Solution
|
||||
|
||||
For 1-on-1 calls, clients attempt a direct QUIC connection using STUN-discovered addresses. If NAT traversal succeeds, media flows directly between peers. If it fails, fall back to relay-assisted mode (current behavior).
|
||||
|
||||
## Architecture
|
||||
|
||||
```
Preferred (P2P):
  Client A ←──QUIC direct──→ Client B
  (no relay in media path, true E2E)

Fallback (Relay):
  Client A ──→ Relay ──→ Client B
  (current model)

Hybrid discovery:
  Client A → Relay (signaling only) → Client B
       ↓                                  ↓
  STUN server                        STUN server
       ↓                                  ↓
  Discover public IP:port      Discover public IP:port
       ↓                                  ↓
  Exchange candidates via relay signaling
       ↓                                  ↓
  Attempt direct QUIC connection ←──→
```
|
||||
|
||||
## Why P2P = True E2E
|
||||
|
||||
- QUIC TLS handshake establishes encrypted tunnel directly between A and B
|
||||
- No third party sees the traffic
|
||||
- Certificate pinning via identity fingerprints: each client derives their TLS cert from their Ed25519 seed (same as relay identity). During QUIC handshake, both sides verify the peer's cert fingerprint against the known identity
|
||||
- MITM elimination: if A knows B's fingerprint (from prior call, QR code, or identity server), any interceptor presents a different cert → fingerprint mismatch → connection rejected
|
||||
- Stronger guarantee than relay-assisted: user doesn't need to trust relay operator
|
||||
|
||||
## Requirements
|
||||
|
||||
### Phase 1: STUN Discovery
|
||||
|
||||
1. **STUN client**: lightweight UDP-based STUN client to discover public IP:port
|
||||
- Use existing public STUN servers (stun.l.google.com:19302, etc.)
|
||||
- Or run a STUN server alongside the relay
|
||||
- Discover: local addresses, server-reflexive addresses (STUN), relay candidates (TURN/relay fallback)
|
||||
|
||||
2. **Candidate gathering**: on call initiation, gather all candidates:
|
||||
- Host candidates: local network interfaces
|
||||
- Server-reflexive: STUN-discovered public IP:port
|
||||
- Relay candidate: the relay's address (fallback)
|
||||
|
||||
3. **Candidate exchange**: via relay signaling channel (existing `IceCandidate` signal message)
|
||||
- A sends candidates to relay → relay forwards to B
|
||||
- B sends candidates to relay → relay forwards to A
|
||||
|
||||
### Phase 2: Direct Connection
|
||||
|
||||
1. **QUIC hole punching**: both clients simultaneously attempt QUIC connections to each other's candidates
|
||||
- Quinn supports connecting to multiple addresses
|
||||
- First successful connection wins
|
||||
- Timeout after 3 seconds, fall back to relay
|
||||
|
||||
2. **Identity verification**: during QUIC handshake, verify peer's TLS cert fingerprint
|
||||
- `server_config_from_seed()` already exists — derive client cert from identity seed
|
||||
- Both sides present certs (mutual TLS)
|
||||
- Verify fingerprint matches expected identity
|
||||
|
||||
3. **Media flow**: once connected, use existing `QuinnTransport` for media + signals
|
||||
- Same `send_media()` / `recv_media()` API
|
||||
- Same codec pipeline, FEC, jitter buffer
|
||||
- No code changes needed in the call engine
|
||||
|
||||
### Phase 3: Adaptive Quality (P2P)
|
||||
|
||||
P2P connections have direct quality visibility — no relay middleman:
|
||||
|
||||
1. Both clients observe RTT, loss, jitter directly from QUIC stats
|
||||
2. Adapt codec quality based on direct observations
|
||||
3. Since only 2 participants, coordinated switching is simple: propose → ack → switch
|
||||
|
||||
This is the simplest case for adaptive quality. Once proven, backport the logic to relay-assisted mode.
|
||||
|
||||
### Phase 4: Hybrid Mode
|
||||
|
||||
1. **Call initiation**: always connect to relay for signaling
|
||||
2. **Parallel attempt**: while relay call is active, attempt P2P in background
|
||||
3. **Seamless migration**: if P2P succeeds, migrate media path from relay to direct
|
||||
- Both clients switch simultaneously
|
||||
- Relay connection kept alive for signaling (presence, room updates)
|
||||
4. **Fallback**: if P2P connection drops, seamlessly fall back to relay
|
||||
|
||||
## Security Properties
|
||||
|
||||
| Property | Relay Mode | P2P Mode |
|----------|-----------|----------|
| Encryption | ChaCha20-Poly1305 (app layer) | QUIC TLS 1.3 + ChaCha20-Poly1305 |
| Key exchange | Via relay signaling | Direct QUIC handshake |
| Identity verification | TOFU (server fingerprint) | Mutual TLS cert pinning |
| Metadata privacy | Relay sees who talks to whom | No third party sees anything |
| MITM resistance | Depends on relay trust | Strong (cert pinning) |
| Forward secrecy | ECDH ephemeral keys | QUIC built-in + app-layer rekey |
|
||||
|
||||
## Implementation Notes
|
||||
|
||||
### STUN in Rust
|
||||
|
||||
Use `stun-rs` or `webrtc-rs` crate for STUN client. Minimal: just need Binding Request/Response to discover server-reflexive address.
|
||||
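
For a sense of how little is needed, here is a sketch of the two pieces of wire format involved, written against RFC 5389 with only the standard library. It is not the API of `stun-rs` or `webrtc-rs`; the function names are invented for this example, and sending the request over UDP and walking the attribute list of the response are left out.

```rust
use std::net::{Ipv4Addr, SocketAddrV4};

/// Fixed magic cookie from RFC 5389.
const MAGIC_COOKIE: u32 = 0x2112_A442;

/// Build a 20-byte STUN Binding Request with no attributes.
fn binding_request(transaction_id: [u8; 12]) -> [u8; 20] {
    let mut msg = [0u8; 20];
    msg[0..2].copy_from_slice(&0x0001u16.to_be_bytes()); // type: Binding Request
    msg[2..4].copy_from_slice(&0u16.to_be_bytes()); // body length: 0
    msg[4..8].copy_from_slice(&MAGIC_COOKIE.to_be_bytes());
    msg[8..20].copy_from_slice(&transaction_id);
    msg
}

/// Decode the value of an IPv4 XOR-MAPPED-ADDRESS attribute (type 0x0020).
/// The result is the server-reflexive address seen by the STUN server.
fn parse_xor_mapped_v4(value: &[u8]) -> Option<SocketAddrV4> {
    if value.len() != 8 || value[1] != 0x01 {
        return None; // wrong size or not the IPv4 family
    }
    let xport = u16::from_be_bytes([value[2], value[3]]);
    let xaddr = u32::from_be_bytes([value[4], value[5], value[6], value[7]]);
    // Port is XORed with the top 16 bits of the cookie, address with all 32.
    let port = xport ^ (MAGIC_COOKIE >> 16) as u16;
    let addr = Ipv4Addr::from(xaddr ^ MAGIC_COOKIE);
    Some(SocketAddrV4::new(addr, port))
}
```

The transaction ID should be 12 random bytes per request so that responses can be matched to requests.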
|
||||
### Quinn Hole Punching
|
||||
|
||||
Quinn's `Endpoint` can both listen and connect. For hole punching:
|
||||
```rust
let endpoint = create_endpoint(bind_addr, Some(server_config))?;
// Send connect to peer's address (opens NAT pinhole)
let conn = connect(&endpoint, peer_addr, "peer", client_config).await?;
// Simultaneously, peer connects to our address
// First successful handshake wins
```
|
||||
|
||||
### Client TLS Certificate
|
||||
|
||||
Already have `server_config_from_seed()` for relays. Create `client_config_from_seed()` that presents a TLS client certificate derived from the identity seed. The peer verifies this cert's fingerprint.
|
||||
|
||||
### Signaling via Relay
|
||||
|
||||
The existing relay connection carries `IceCandidate` signals. No new infrastructure needed — just use the relay as a dumb signaling pipe for candidate exchange.
|
||||
|
||||
## Non-Goals (v1)
|
||||
|
||||
- SFU over P2P (P2P is 1-on-1 only; multi-party uses relay SFU)
|
||||
- TURN server (relay acts as the fallback, no separate TURN)
|
||||
- mDNS local discovery (future)
|
||||
- Mesh P2P for multi-party (future, complex)
|
||||
|
||||
## Milestones
|
||||
|
||||
| Phase | Scope | Effort |
|-------|-------|--------|
| 1 | STUN client + candidate gathering | 2 days |
| 2 | QUIC hole punching + identity verification | 3 days |
| 3 | Adaptive quality on P2P connection | 2 days |
| 4 | Hybrid mode (relay + P2P, seamless migration) | 3 days |
|
||||
178
docs/PRD-protocol-analyzer.md
Normal file
@@ -0,0 +1,178 @@
|
||||
# PRD: Protocol Analyzer & Debug Tap
|
||||
|
||||
## 1. Relay-Side Metadata Tap (`--debug-tap`)
|
||||
|
||||
### Problem
|
||||
|
||||
When debugging federation, codec issues, or packet flow problems, there's no visibility into what's actually flowing through the relay. You have to guess from client-side logs.
|
||||
|
||||
### Solution
|
||||
|
||||
A `--debug-tap <room>` flag on the relay that logs every packet's **header metadata** for a specific room (or all rooms with `--debug-tap *`). No decryption needed — the MediaHeader is not encrypted, only the audio payload is.
|
||||
|
||||
### Output Format
|
||||
|
||||
```
[12:00:00.123] TAP room=test dir=in src=192.168.1.5:54321 seq=1234 codec=Opus24k ts=24000 fec_block=5 fec_sym=2 repair=false len=87
[12:00:00.123] TAP room=test dir=out dst=192.168.1.6:54322 seq=1234 codec=Opus24k ts=24000 fec_block=5 fec_sym=2 repair=false len=87 fan_out=2
[12:00:00.143] TAP room=test dir=in src=192.168.1.5:54321 seq=1235 codec=Opus24k ts=24960 fec_block=5 fec_sym=3 repair=false len=91
[12:00:00.500] TAP room=test dir=in src=192.168.1.6:54322 seq=0042 codec=Codec2_1200 ts=40000 fec_block=1 fec_sym=0 repair=false len=6
[12:00:01.000] TAP room=test SIGNAL type=RoomUpdate count=3 participants=[Alice,Bob,Charlie]
[12:00:05.000] TAP room=test STATS period=5s in_pkts=250 out_pkts=500 fan_out_avg=2.0 loss_detected=0 codecs_seen=[Opus24k,Codec2_1200]
```
|
||||
|
||||
### What it shows
|
||||
|
||||
- **Per-packet**: direction, source/dest, sequence number, codec ID, timestamp, FEC block/symbol, repair flag, payload size
|
||||
- **Signals**: RoomUpdate, FederationRoomJoin/Leave, handshake events
|
||||
- **Periodic stats**: packets in/out, average fan-out, codecs seen, detected sequence gaps (loss)
|
||||
- **Federation**: room-hash tagged datagrams with source/dest relay
|
||||
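
The "detected sequence gaps (loss)" figure in the periodic stats can be derived from the `seq` values alone. The sketch below assumes a 16-bit sequence number that wraps around; the actual width of `MediaHeader.seq` is not stated here, and `count_lost` is a name invented for this example.

```rust
/// Count packets missing from a stream of received sequence numbers.
/// Assumes a wrapping 16-bit sequence counter (an assumption, see above).
fn count_lost(seqs: &[u16]) -> u32 {
    let mut lost = 0u32;
    let mut prev: Option<u16> = None;
    for &seq in seqs {
        if let Some(p) = prev {
            // Wrapping subtraction gives the forward distance even across 65535 -> 0.
            let delta = seq.wrapping_sub(p);
            if delta == 0 || delta >= 0x8000 {
                // Duplicate or late (reordered) packet: do not move the cursor.
                continue;
            }
            lost += u32::from(delta) - 1;
        }
        prev = Some(seq);
    }
    lost
}
```

A late packet that arrives after its gap was already counted is not subtracted again, so this slightly overestimates loss on reordering links. That is usually acceptable for a debug tap.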
|
||||
### Implementation
|
||||
|
||||
**File:** `crates/wzp-relay/src/room.rs` — in `run_participant_plain()` and `run_participant_trunked()`
|
||||
|
||||
After receiving a packet and before forwarding:
|
||||
```rust
// Requires `use tracing::info;` in scope.
if debug_tap_enabled {
    let h = &pkt.header;
    // One structured log line per received packet: header fields only, never the payload.
    info!(
        room = %room_name,
        dir = "in",
        src = %addr,
        seq = h.seq,
        codec = ?h.codec_id,
        ts = h.timestamp,
        fec_block = h.fec_block,
        fec_sym = h.fec_symbol,
        repair = h.is_repair,
        len = pkt.payload.len(),
        "TAP"
    );
}
```
|
||||
|
||||
**Activation:** `--debug-tap <room_name>` CLI flag, or `debug_tap = "test"` / `debug_tap = "*"` in TOML config.
|
||||
|
||||
**Performance:** Only active when enabled. When enabled, it adds one `info!()` log line per packet per direction. At 50 fps × 5 participants × 2 directions (in and out), that is 500 log lines/sec: acceptable for debugging, not for production.
|
||||
|
||||
**Output options:**
|
||||
- Default: tracing log (stderr)
|
||||
- `--debug-tap-file <path>`: write to a dedicated file (JSONL format for machine parsing)
|
||||
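
The activation rule (one named room, or `*` for all rooms) and the log-volume estimate can be sketched as follows. `TapFilter` and `tap_lines_per_sec` are hypothetical names used only for this illustration, not existing relay code.

```rust
/// Which rooms the tap should log, parsed from `--debug-tap <room>`
/// or `debug_tap = "..."` in the config.
#[derive(Debug, PartialEq)]
enum TapFilter {
    AllRooms,
    Room(String),
}

impl TapFilter {
    /// `*` selects every room; anything else is an exact room name.
    fn parse(value: &str) -> TapFilter {
        if value == "*" {
            TapFilter::AllRooms
        } else {
            TapFilter::Room(value.to_string())
        }
    }

    fn matches(&self, room: &str) -> bool {
        match self {
            TapFilter::AllRooms => true,
            TapFilter::Room(name) => name.as_str() == room,
        }
    }
}

/// Estimated log volume: one line per packet per direction (in and out).
fn tap_lines_per_sec(frames_per_sec: u32, participants: u32) -> u32 {
    frames_per_sec * participants * 2
}
```

Checking `filter.matches(room_name)` once per room when the participant task starts, rather than per packet, keeps the disabled path free of string comparisons.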
|
||||
### Effort: 0.5 day
|
||||
|
||||
---
|
||||
|
||||
## 2. Full Protocol Analyzer (Standalone Tool)
|
||||
|
||||
### Problem
|
||||
|
||||
The metadata tap shows packet flow but can't inspect audio content, verify encryption, or measure audio quality. For deep debugging (codec issues, resampling bugs, encryption mismatches), you need to see the actual decrypted audio.
|
||||
|
||||
### Solution
|
||||
|
||||
A standalone `wzp-analyzer` binary that either:
|
||||
- **A)** Acts as a transparent proxy between client and relay (MITM mode)
|
||||
- **B)** Reads a pcap/capture file with QUIC session keys (passive mode)
|
||||
- **C)** Runs as a special "observer" client that joins a room in listen-only mode with all participants' consent
|
||||
|
||||
### Architecture
|
||||
|
||||
**Option C (recommended — simplest, no MITM):**
|
||||
|
||||
```
                  ┌──────────────┐
Client A ────────►│    Relay     │◄──────── Client B
                  │              │
                  │    (SFU)     │◄──────── wzp-analyzer
                  └──────────────┘          (observer mode)
                                                  │
                                                  ▼
                                        ┌──────────────────┐
                                        │ Decode + Analyze │
                                        │ - Packet timing  │
                                        │ - Codec decode   │
                                        │ - Audio quality  │
                                        │ - Jitter stats   │
                                        │ - Waveform plot  │
                                        └──────────────────┘
```
|
||||
|
||||
The analyzer joins the room as a regular participant (receives all media via SFU forwarding) but doesn't send audio. It decodes everything it receives and produces analysis.
|
||||
|
||||
**Limitation:** End-to-end encrypted payloads can't be decoded without session keys. The analyzer would either:
1. Receive the session key (shared out-of-band for debugging), or
2. Analyze only unencrypted headers and timing (the same data as the relay tap, but from the client's perspective, with jitter buffer simulation)
|
||||
|
||||
For now, encryption is not fully enforced in the current codebase: the crypto session is established, but the actual ChaCha20 encryption of payloads is still TODO in some paths. On those paths the analyzer can decode raw Opus/Codec2 payloads directly.
|
||||
|
||||
### Features
|
||||
|
||||
**Real-time display (TUI):**
|
||||
```
|
||||
┌─ wzp-analyzer: room "podcast" on 193.180.213.68:4433 ─────────────┐
|
||||
│ │
|
||||
│ Participants: Alice (Opus24k), Bob (Codec2_3200) │
|
||||
│ │
|
||||
│ Alice ──────────────────────────────────────── │
|
||||
│ seq: 5234 codec: Opus24k ts: 125760 loss: 0.2% jitter: 3ms │
|
||||
│ RMS: 4521 peak: 15280 silence: no │
|
||||
│ FEC blocks: 1046/1046 complete (0 recovered) │
|
||||
│ ▁▂▃▅▇█▇▅▃▂▁▁▂▃▅▇█▇▅▃▂▁ (waveform last 1s) │
|
||||
│ │
|
||||
│ Bob ────────────────────────────────────── │
|
||||
│ seq: 2617 codec: Codec2_3200 ts: 62800 loss: 1.5% jitter: 8ms│
|
||||
│ RMS: 1250 peak: 6800 silence: no │
|
||||
│ FEC blocks: 523/525 complete (4 recovered) │
|
||||
│ ▁▁▂▃▅▇▅▃▂▁▁▁▂▃▅▇▅▃▂▁▁ (waveform last 1s) │
|
||||
│ │
|
||||
│ Total: 7851 pkts recv, 0 pkts sent, 2 participants │
|
||||
│ Uptime: 2m 35s │
|
||||
└──────────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
**Recorded analysis:**
|
||||
- Save all received packets to a capture file
|
||||
- Post-session report: per-participant stats, quality timeline, codec switches, packet loss patterns
|
||||
- Export decoded audio as WAV per participant (if decryptable)
|
||||
|
||||
**Quality metrics per participant:**
|
||||
- Packet loss % (from sequence gaps)
|
||||
- Jitter (inter-arrival time variance)
|
||||
- Codec switches (timestamps + reasons)
|
||||
- RMS audio level over time
|
||||
- Silence detection
|
||||
- FEC recovery rate
|
||||
- Round-trip estimates (from Ping/Pong if available)
|
||||
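
The audio-level metrics (RMS, peak, silence) are simple to compute once a frame has been decoded to 16-bit PCM. A minimal sketch, with invented function names and a caller-chosen silence threshold (no threshold is specified in this PRD):

```rust
/// Root-mean-square level of a block of 16-bit PCM samples.
fn rms(samples: &[i16]) -> f64 {
    if samples.is_empty() {
        return 0.0;
    }
    let sum_sq: f64 = samples
        .iter()
        .map(|&s| {
            let v = f64::from(s);
            v * v
        })
        .sum();
    (sum_sq / samples.len() as f64).sqrt()
}

/// Largest absolute sample value, widened to i32 so i16::MIN cannot overflow.
fn peak(samples: &[i16]) -> i32 {
    samples.iter().map(|&s| i32::from(s).abs()).max().unwrap_or(0)
}

/// Treat a block as silence when its RMS falls below `threshold`.
fn is_silence(samples: &[i16], threshold: f64) -> bool {
    rms(samples) < threshold
}
```

Running these per 20 ms frame and keeping the last 50 results would give the "waveform last 1s" strip shown in the TUI mock-up.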
|
||||
### Implementation
|
||||
|
||||
**Binary:** `wzp-analyzer` (new crate or subcommand of `wzp-client`)
|
||||
|
||||
```
wzp-analyzer 193.180.213.68:4433 --room podcast
wzp-analyzer 193.180.213.68:4433 --room podcast --record capture.wzp
wzp-analyzer --replay capture.wzp --report report.html
```
|
||||
|
||||
**Dependencies:**
|
||||
- Existing: `wzp-transport`, `wzp-proto`, `wzp-codec`, `wzp-crypto`
|
||||
- New: `ratatui` for TUI display (optional)
|
||||
|
||||
### Phases
|
||||
|
||||
| Phase | Scope | Effort |
|-------|-------|--------|
| 1 | Header-only analysis: join room, log packet metadata, show per-participant stats (TUI) | 2 days |
| 2 | Audio decode: decode Opus/Codec2 payloads (unencrypted path), show waveform + RMS | 1-2 days |
| 3 | Capture/replay: save packets to file, replay offline with full analysis | 1 day |
| 4 | HTML report: post-session quality report with charts | 2 days |
| 5 | Encrypted payload support: accept session keys, decrypt ChaCha20 | 1 day |
|
||||
|
||||
### Non-Goals (v1)
|
||||
|
||||
- Active probing (sending test patterns)
|
||||
- Modifying packets in transit
|
||||
- Automated quality scoring (MOS estimation)
|
||||
- Video support
|
||||
170
docs/PRD-relay-federation.md
Normal file
@@ -0,0 +1,170 @@
|
||||
# PRD: Relay Federation (Multi-Relay Mesh)
|
||||
|
||||
## Problem
|
||||
|
||||
Currently all participants in a call must connect to the same relay. This creates:
|
||||
- **Single point of failure** — if the relay goes down, the entire call drops
|
||||
- **Geographic latency** — users far from the relay get high RTT
|
||||
- **Capacity limits** — one relay handles all traffic
|
||||
|
||||
Users should be able to connect to their nearest/preferred relay and still talk to users on other relays, as long as the relays are federated.
|
||||
|
||||
## Prerequisite: Fix Relay Identity Persistence
|
||||
|
||||
### Bug: TLS certificate regenerates on every restart
|
||||
|
||||
**Root cause:** `wzp-transport/src/config.rs:17` calls `rcgen::generate_simple_self_signed()` which creates a new keypair every time. The relay's Ed25519 identity seed IS persisted to `~/.wzp/relay-identity`, but the TLS certificate is not derived from it.
|
||||
|
||||
**Impact:** Clients see a different server fingerprint after every relay restart, triggering the "Server Key Changed" warning. This also breaks federation since relays identify each other by certificate fingerprint.
|
||||
|
||||
**Fix:** Derive the TLS certificate from the persisted relay seed:
|
||||
1. Add `server_config_from_seed(seed: &[u8; 32])` to `wzp-transport`
|
||||
2. Use the seed to create a deterministic keypair (e.g., derive an ECDSA key via HKDF from the Ed25519 seed)
|
||||
3. Generate a self-signed cert with that keypair — same seed = same cert = same fingerprint
|
||||
4. The relay passes its loaded seed to `server_config_from_seed()` instead of `server_config()`
|
||||
|
||||
**Effort:** 0.5 day
|
||||
|
||||
## Federation Design
|
||||
|
||||
### Core Concept
|
||||
|
||||
Two or more relays form a **federation mesh**. Each relay is an independent SFU. When relays are configured to trust each other, they bridge rooms with matching names — participants on relay A in room "podcast" hear participants on relay B in room "podcast" as if everyone were on the same relay.
|
||||
|
||||
### Configuration
|
||||
|
||||
Each relay reads a YAML config file (e.g., `~/.wzp/relay.yaml` or `--config relay.yaml`):
|
||||
|
||||
```yaml
# Relay identity (auto-generated if missing)
listen: 0.0.0.0:4433

# Federation peers: other relays we trust and bridge rooms with
# Both sides must configure each other for federation to work
peers:
  - url: "193.180.213.68:4433"
    fingerprint: "a5d6:e3c6:5ae7:185c:4eb1:af89:daed:4a43"
    label: "Pangolin EU"

  - url: "10.0.0.5:4433"
    fingerprint: "7f2a:b391:0c44:..."
    label: "Office LAN"
```
|
||||
|
||||
**Key rules:**
|
||||
- Both relays must configure each other — **mutual trust** required
|
||||
- A relay that receives a connection from an unknown peer logs: `"Relay a5d6:e3c6:... (193.180.213.68) wants to federate. To accept, add to peers config: url: 193.180.213.68:4433, fingerprint: a5d6:e3c6:..."`
|
||||
- Fingerprints are verified via the TLS certificate (requires the identity fix above)
|
||||
|
||||
### Protocol
|
||||
|
||||
#### Peer Connection
|
||||
|
||||
1. On startup, each relay attempts QUIC connections to all configured peers
|
||||
2. The connection uses SNI `"_federation"` (reserved room name prefix) to distinguish from client connections
|
||||
3. After QUIC handshake, verify the peer's certificate fingerprint matches the configured fingerprint
|
||||
4. If fingerprint mismatch → reject, log warning
|
||||
5. If peer connects but isn't in our config → log the helpful "add to config" message, reject
|
||||
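
Steps 3 to 5 boil down to comparing a formatted fingerprint against the configured peers. The sketch below assumes the 16-byte fingerprint has already been extracted from the peer's certificate (hashing is out of scope here). `PeerConfig` mirrors a `peers:` entry minus the `url`, and all names are invented for this example.

```rust
/// Format a 16-byte fingerprint as eight colon-separated groups of four hex digits.
fn format_fingerprint(bytes: &[u8; 16]) -> String {
    bytes
        .chunks(2)
        .map(|pair| format!("{:02x}{:02x}", pair[0], pair[1]))
        .collect::<Vec<_>>()
        .join(":")
}

/// One configured federation peer (hypothetical; `url` omitted for brevity).
#[derive(Debug)]
struct PeerConfig {
    fingerprint: String,
    label: String,
}

/// Outbound: we dialed a configured peer, so the fingerprint must match exactly.
fn verify_outbound(peer: &PeerConfig, presented: &[u8; 16]) -> bool {
    peer.fingerprint == format_fingerprint(presented)
}

/// Inbound: accept only fingerprints we have configured; otherwise return
/// the "add to config" hint so the operator can decide to trust the peer.
fn check_inbound<'a>(
    peers: &'a [PeerConfig],
    presented: &[u8; 16],
    remote: &str,
) -> Result<&'a PeerConfig, String> {
    let fp = format_fingerprint(presented);
    peers.iter().find(|p| p.fingerprint == fp).ok_or_else(|| {
        format!(
            "Relay {} ({}) wants to federate. To accept, add to peers config: fingerprint: {}",
            fp, remote, fp
        )
    })
}
```

Inbound peers are matched by fingerprint rather than by address, since the source port of an incoming connection will generally not be the peer's listen port.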
|
||||
#### Room Bridging
|
||||
|
||||
Once two relays are connected:
|
||||
|
||||
1. **Room discovery**: When a local participant joins room "T", the relay sends a `FederationRoomJoin { room: "T" }` signal to all connected peers
|
||||
2. **Room leave**: When the last local participant leaves room "T", send `FederationRoomLeave { room: "T" }`
|
||||
3. **Media forwarding**: For each room that exists on both relays:
|
||||
- Relay A forwards all media packets from its local participants to relay B
|
||||
- Relay B forwards all media packets from its local participants to relay A
|
||||
- Each relay then fans out received federated media to its local participants (same as local SFU forwarding)
|
||||
4. **Participant presence**: `RoomUpdate` signals are merged — local participants + federated participants from all peers
|
||||
|
||||
```
Relay A (2 local users)          Relay B (1 local user)
┌─────────────────────┐          ┌─────────────────────┐
│ Room "T"            │          │ Room "T"            │
│  Alice (local)  ────┼──media──►│  Charlie (local)    │
│  Bob   (local)  ────┼──media──►│                     │
│                     │◄──media──┼── Charlie           │
│  Charlie (federated)│          │  Alice (federated)  │
│                     │          │  Bob   (federated)  │
└─────────────────────┘          └─────────────────────┘
```
|
||||
|
||||
#### Signal Messages (new)
|
||||
|
||||
```rust
enum FederationSignal {
    /// A room exists on this relay with active participants
    RoomJoin { room: String, participants: Vec<ParticipantInfo> },
    /// Room is empty on this relay
    RoomLeave { room: String },
    /// Participant update for a federated room
    ParticipantUpdate { room: String, participants: Vec<ParticipantInfo> },
}
```
|
||||
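
To show how these signals could feed the merged presence list, here is a self-contained sketch. `ParticipantInfo` is reduced to a hypothetical one-field stand-in (its real definition is not shown in this document), and `FederatedPresence` is an invented name.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the real `ParticipantInfo`.
#[derive(Debug, Clone, PartialEq)]
struct ParticipantInfo {
    alias: String,
}

enum FederationSignal {
    RoomJoin { room: String, participants: Vec<ParticipantInfo> },
    RoomLeave { room: String },
    ParticipantUpdate { room: String, participants: Vec<ParticipantInfo> },
}

/// Remote participants per (peer, room), as last reported by each peer.
#[derive(Default)]
struct FederatedPresence {
    remote: BTreeMap<(String, String), Vec<ParticipantInfo>>,
}

impl FederatedPresence {
    /// Apply one signal received from `peer`.
    fn apply(&mut self, peer: &str, signal: FederationSignal) {
        match signal {
            // Join and update both replace the peer's list for that room.
            FederationSignal::RoomJoin { room, participants }
            | FederationSignal::ParticipantUpdate { room, participants } => {
                self.remote.insert((peer.to_string(), room), participants);
            }
            FederationSignal::RoomLeave { room } => {
                self.remote.remove(&(peer.to_string(), room));
            }
        }
    }

    /// Local participants first, then federated ones from every peer.
    fn merged_aliases(&self, room: &str, local: &[ParticipantInfo]) -> Vec<String> {
        let federated = self
            .remote
            .iter()
            .filter(|((_, r), _)| r.as_str() == room)
            .flat_map(|(_, list)| list.iter());
        local.iter().chain(federated).map(|p| p.alias.clone()).collect()
    }
}
```

When a federation link drops, removing every `(peer, room)` entry for that peer makes its participants disappear from presence, matching the failure behavior described later.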
|
||||
#### Media Forwarding
|
||||
|
||||
Federated media is forwarded as raw QUIC datagrams — the relay doesn't decode/re-encode. Each packet is prefixed with a room identifier so the receiving relay knows which room to fan it out to:
|
||||
|
||||
```
[room_hash: 8 bytes][original_media_packet]
```
|
||||
|
||||
The 8-byte room hash is computed once when the federation room bridge is established.
|
||||
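
Framing and unframing such a datagram is a fixed 8-byte prefix operation. The hash function is not specified in this PRD, so the sketch below uses 64-bit FNV-1a purely as a stand-in; whatever function is chosen must produce the same bytes on every relay. All function names are invented for this example.

```rust
/// 64-bit FNV-1a over the room name. A stand-in only: the real room hash
/// is unspecified here.
fn room_hash(room: &str) -> [u8; 8] {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for b in room.bytes() {
        h ^= u64::from(b);
        h = h.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    h.to_be_bytes()
}

/// Prefix a media packet with the precomputed room hash.
fn frame(hash: [u8; 8], media: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(8 + media.len());
    out.extend_from_slice(&hash);
    out.extend_from_slice(media);
    out
}

/// Split a federated datagram into (room hash, original media packet).
fn unframe(datagram: &[u8]) -> Option<([u8; 8], &[u8])> {
    if datagram.len() < 8 {
        return None; // too short to carry a room prefix
    }
    let (prefix, media) = datagram.split_at(8);
    let mut hash = [0u8; 8];
    hash.copy_from_slice(prefix);
    Some((hash, media))
}
```

The receiving relay would look up the returned hash in a map of bridged rooms and fan the untouched `media` slice out to local participants.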
|
||||
### What Relays DON'T Do
|
||||
|
||||
- **No transcoding** — media passes through as-is. If Alice sends Opus 64k, Charlie receives Opus 64k
|
||||
- **No re-encryption** — packets are already encrypted end-to-end between participants. Relays just forward opaque bytes
|
||||
- **No central coordinator** — each relay independently connects to its configured peers. No master/slave, no consensus protocol
|
||||
- **No automatic peer discovery** — peers must be explicitly configured in YAML
|
||||
|
||||
### Failure Handling
|
||||
|
||||
- If a peer relay goes down, the federation link drops. Local rooms continue to work. Federated participants disappear from presence.
|
||||
- Reconnection: retry after 30 seconds, then back off exponentially up to a maximum of 5 minutes between attempts
|
||||
- If a peer relay restarts with a new identity (bug not fixed), the fingerprint check fails and federation is rejected with a clear error log
|
||||
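
One reading of the reconnection rule is a delay that starts at 30 seconds and doubles per failed attempt until it reaches the 5 minute cap. The sketch below encodes that interpretation; the doubling factor and the function name are assumptions, not something this PRD pins down.

```rust
use std::time::Duration;

const INITIAL_SECS: u64 = 30;
const MAX_SECS: u64 = 300; // 5 minutes

/// Delay before reconnect attempt number `attempt` (0-based).
fn reconnect_delay(attempt: u32) -> Duration {
    // 2^attempt, saturating instead of overflowing for very large attempt counts.
    let factor = 1u64.checked_shl(attempt).unwrap_or(u64::MAX);
    Duration::from_secs(INITIAL_SECS.saturating_mul(factor).min(MAX_SECS))
}
```

This yields 30 s, 60 s, 120 s, 240 s, then 300 s for every later attempt. Resetting `attempt` to zero after a successful handshake keeps a briefly flapping link from being penalized.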
|
||||
## Implementation Plan
|
||||
|
||||
### Phase 0: Fix Relay Identity (prerequisite)
|
||||
- Derive TLS cert from persisted seed
|
||||
- Same seed → same cert → same fingerprint across restarts
|
||||
|
||||
### Phase 1: YAML Config + Peer Connection
|
||||
- Add `--config relay.yaml` CLI flag
|
||||
- Parse peers config
|
||||
- On startup, connect to all configured peers via QUIC
|
||||
- Verify certificate fingerprints
|
||||
- Log helpful message for unconfigured peers
|
||||
- Reconnect on disconnect
|
||||
|
||||
### Phase 2: Room Bridging
|
||||
- Track which rooms exist on each peer
|
||||
- Forward media for shared rooms
|
||||
- Merge participant presence across peers
|
||||
- Handle room join/leave signals
|
||||
|
||||
### Phase 3: Resilience
|
||||
- Graceful handling of peer disconnect/reconnect
|
||||
- Don't duplicate packets if a participant is reachable via multiple paths
|
||||
- Rate limiting on federation links (prevent amplification)
|
||||
- Metrics: federated rooms, packets forwarded, peer latency
|
||||
|
||||
## Effort Estimates
|
||||
|
||||
| Phase | Scope | Effort |
|-------|-------|--------|
| 0 | Fix relay TLS identity from seed | 0.5 day |
| 1 | YAML config + peer QUIC connections | 2 days |
| 2 | Room bridging + media forwarding + presence merge | 3-4 days |
| 3 | Resilience + metrics | 2 days |
|
||||
|
||||
## Non-Goals (v1)
|
||||
|
||||
- Automatic peer discovery (mDNS, DHT, etc.)
|
||||
- Cascading federation (relay A ↔ B ↔ C where A doesn't know C)
|
||||
- Load balancing across relays
|
||||
- Encryption between relays (QUIC provides transport encryption; e2e encryption between participants is orthogonal)
|
||||
- Different rooms on different relays (all federated rooms are bridged by name)
|
||||
459
docs/USER_GUIDE.md
Normal file
@@ -0,0 +1,459 @@
|
||||
# WarzonePhone User Guide
|
||||
|
||||
This guide covers all WarzonePhone client applications: Desktop (Tauri), Android, CLI, and Web.
|
||||
|
||||
## Desktop Client (Tauri)
|
||||
|
||||
The desktop client is a Tauri application with a native Rust audio engine and a web-based UI. It runs on macOS, Windows, and Linux.
|
||||
|
||||
### Connect Screen
|
||||
|
||||
When you launch the desktop client, you see the connect screen with:
|
||||
|
||||
- **Relay selector** -- click the relay button to open the Manage Relays dialog. Shows relay name, address, connection status (verified/new/changed/offline), and RTT latency
|
||||
- **Room** -- enter a room name. Clients in the same room hear each other. Room names are hashed before being sent to the relay for privacy
|
||||
- **Alias** -- your display name shown to other participants
|
||||
- **OS Echo Cancel** -- checkbox to enable macOS VoiceProcessingIO (Apple's FaceTime-grade AEC). Strongly recommended when using speakers
|
||||
- **Connect button** -- connects to the selected relay and joins the room
|
||||
- **Identity info** -- your identicon and fingerprint are shown at the bottom. Click to copy
|
||||
|
||||
Recent rooms are displayed below the form for quick reconnection. Click any recent room to select it and its associated relay.
|
||||
|
||||
### In-Call Screen
|
||||
|
||||
Once connected, the in-call screen shows:
|
||||
|
||||
- **Room name** and **call timer** at the top
|
||||
- **Status indicator** -- green when connected, yellow when reconnecting
|
||||
- **Audio level meter** -- real-time visualization of outgoing audio
|
||||
- **Participant list** -- identicon, alias, and fingerprint for each participant. Your own entry is highlighted with a badge
|
||||
- **Controls** -- Mic toggle, Hang Up, Speaker toggle
|
||||
- **Stats bar** -- TX and RX frame rates
|
||||
|
||||
### Settings Panel
|
||||
|
||||
Open with the gear icon or **Cmd+,** (Ctrl+, on Windows/Linux). Contains:
|
||||
|
||||
#### Connection
|
||||
|
||||
- **Default Room** -- room name used on next connect
|
||||
- **Alias** -- display name
|
||||
|
||||
#### Audio
|
||||
|
||||
- **Quality slider** -- 5 levels:
|
||||
|
||||
| Position | Profile | Description |
|----------|---------|-------------|
| 0 | Auto | Adaptive quality based on network conditions |
| 1 | Opus 24k | Good conditions (28.8 kbps with FEC) |
| 2 | Opus 6k | Degraded conditions (9.0 kbps with FEC) |
| 3 | Codec2 3.2k | Poor conditions (4.8 kbps with FEC) |
| 4 | Codec2 1.2k | Catastrophic conditions (2.4 kbps with FEC) |
|
||||
|
||||
- **OS Echo Cancellation** -- macOS VoiceProcessingIO toggle
|
||||
- **Automatic Gain Control** -- normalize mic volume
|
||||
|
||||
#### Identity
|
||||
|
||||
- **Fingerprint** -- your public identity fingerprint
|
||||
- **Identity file** -- stored at `~/.wzp/identity`
|
||||
|
||||
#### Recent Rooms
|
||||
|
||||
- History of recently joined rooms with relay association
|
||||
- Clear History button
|
||||
|
||||
### Manage Relays Dialog
|
||||
|
||||
Open by clicking the relay selector button on the connect screen:
|
||||
|
||||
- **Relay list** -- each entry shows name, address, identicon (from server fingerprint), lock status, and RTT
|
||||
- **Select** -- click a relay to make it the default
|
||||
- **Remove** -- click the X button to delete a relay
|
||||
- **Add Relay** -- enter name and host:port to add a new relay
|
||||
- **Ping** -- relays are automatically pinged when the dialog opens. RTT and server fingerprint are updated
|
||||
|
||||
### Key Change Warning Dialog
|
||||
|
||||
If a relay's TLS fingerprint has changed since your last connection, a warning dialog appears:
|
||||
|
||||
- Shows the previously known fingerprint and the new fingerprint
|
||||
- **Accept New Key** -- trust the new fingerprint and proceed
|
||||
- **Cancel** -- abort the connection
|
||||
|
||||
This is the TOFU (Trust on First Use) model. Fingerprint changes typically mean the relay was restarted with a new identity. However, they could also indicate a man-in-the-middle attack.
|
||||
|
||||
### Keyboard Shortcuts
|
||||
|
||||
| Shortcut | Action | Context |
|----------|--------|---------|
| **m** | Toggle microphone | In-call |
| **s** | Toggle speaker | In-call |
| **q** | Hang up | In-call |
| **Cmd+,** (Ctrl+,) | Open/close settings | Any |
| **Escape** | Close dialog/settings | Any |
| **Enter** | Connect | Connect screen (when room/alias field is focused) |
|
||||
|
||||
### Audio Engine
|
||||
|
||||
The desktop audio engine uses:
|
||||
|
||||
- **CPAL** for audio I/O (CoreAudio on macOS, WASAPI on Windows, ALSA on Linux)
|
||||
- **VoiceProcessingIO** on macOS for OS-level echo cancellation (opt-in via checkbox)
|
||||
- **Lock-free SPSC ring buffers** between audio threads and network threads
|
||||
- **Direct playout** -- no jitter buffer on the client (the relay buffers instead)
|
||||
- Audio callbacks deliver 512 f32 samples at 48 kHz on macOS (accumulated to 960-sample frames for codec)
|
||||
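
The 512-to-960 sample accumulation mentioned in the last bullet can be sketched as below. This illustrates only the arithmetic: `FrameAccumulator` is an invented name, and the real engine uses lock-free ring buffers rather than allocating `Vec`s on the audio path.

```rust
/// Samples per codec frame: 20 ms at 48 kHz.
const FRAME_SAMPLES: usize = 960;

/// Collects variable-sized callback buffers into fixed 960-sample frames.
#[derive(Default)]
struct FrameAccumulator {
    pending: Vec<f32>,
}

impl FrameAccumulator {
    /// Append one callback's worth of samples and return every complete frame.
    fn push(&mut self, samples: &[f32]) -> Vec<Vec<f32>> {
        self.pending.extend_from_slice(samples);
        let mut frames = Vec::new();
        while self.pending.len() >= FRAME_SAMPLES {
            // Keep the first 960 samples as a frame, carry the rest over.
            let rest = self.pending.split_off(FRAME_SAMPLES);
            frames.push(std::mem::replace(&mut self.pending, rest));
        }
        frames
    }
}
```

With 512-sample callbacks, the first call yields no frame and the second yields one frame with 64 samples carried over, so frames come out on an uneven cadence.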
|
||||
#### Audio Quality Notes
|
||||
|
||||
- Always use **Release builds** for real-time audio. Debug builds are too slow for wzp-codec, nnnoiseless, audiopus, and raptorq
|
||||
- VoiceProcessingIO is strongly recommended on macOS. Software AEC does not work well with the round-trip latency (~35-45ms)
|
||||
- The quality slider only affects the **encode** side. Decoding always accepts all codecs
|
||||
|
||||
### Auto-Reconnect
|
||||
|
||||
If the connection drops, the client automatically attempts to reconnect with exponential backoff (1s, 2s, 4s, 8s, capped at 10s). After 5 failed attempts, the client returns to the connect screen. The status dot shows yellow during reconnection.
|
||||
|
||||
## Android Client
|
||||
|
||||
The Android client is built with Kotlin and Jetpack Compose, using JNI to call the Rust audio engine.
|
||||
|
||||
### Call Screen
|
||||
|
||||
The main call screen shows:
|
||||
|
||||
- **Server selector** -- tap to choose from configured servers
|
||||
- **Room name** -- enter the room to join
|
||||
- **Connect/Disconnect** button
|
||||
- **Participant list** with identicons and aliases
|
||||
- **Audio level visualization**
|
||||
- **Mute/Unmute** button
|
||||
|
||||
### Settings Screen
|
||||
|
||||
The settings screen is organized into sections:
|
||||
|
||||
#### Identity
|
||||
|
||||
- **Display Name** -- your alias shown to other participants
|
||||
- **Fingerprint** -- displayed with an identicon. Tap to copy
|
||||
- **Copy Key** -- copy the 64-character hex seed to clipboard for backup
|
||||
- **Restore Key** -- paste a previously backed-up hex seed to restore your identity
|
||||
|
||||
#### Audio Defaults
|
||||
|
||||
- **Voice Volume** -- playout gain slider (-20 dB to +20 dB)
|
||||
- **Mic Gain** -- capture gain slider (-20 dB to +20 dB)
|
||||
- **Echo Cancellation (AEC)** -- toggle Android's built-in AEC. Disable if audio sounds distorted
|
||||
- **Quality slider** -- 8 levels from best to lowest:
|
||||
|
||||
| Position | Profile | Bitrate | Color |
|----------|---------|---------|-------|
| 0 | Studio 64k | 70.4 kbps | Green |
| 1 | Studio 48k | 52.8 kbps | Green |
| 2 | Studio 32k | 35.2 kbps | Green |
| 3 | Auto | Adaptive | Yellow-green |
| 4 | Opus 24k | 28.8 kbps | Yellow-green |
| 5 | Opus 6k | 9.0 kbps | Yellow |
| 6 | Codec2 3.2k | 4.8 kbps | Orange |
| 7 | Codec2 1.2k | 2.4 kbps | Red |
|
||||
|
||||
Note: "Decode always accepts all codecs" -- the quality setting only affects encoding.
|
||||
|
||||
#### Servers
|
||||
|
||||
- **Server chips** -- tap to select, X to remove (built-in servers cannot be removed)
|
||||
- **Add Server** -- enter host, port (default 4433), and optional label
|
||||
- **Force Ping** -- servers are pinged on dialog open to measure RTT
|
||||
|
||||
#### Network
|
||||
|
||||
- **Prefer IPv6** -- toggle to prefer IPv6 connections when available
|
||||
|
||||
#### Room
|
||||
|
||||
- **Default Room** -- the room name pre-filled on the call screen
|
||||
|
||||
### Identity Backup and Restore
|
||||
|
||||
Your identity is a 32-byte seed stored as a 64-character hex string. To back up:
|
||||
|
||||
1. Go to Settings > Identity
|
||||
2. Tap **Copy Key**
|
||||
3. Store the hex string securely
|
||||
|
||||
To restore on a new device:
|
||||
|
||||
1. Go to Settings > Identity
|
||||
2. Tap **Restore Key**
|
||||
3. Paste the 64-character hex string
|
||||
4. Tap **Restore** (key is staged)
|
||||
5. Tap **Save** to apply
|
||||
|
||||
The same seed produces the same fingerprint on any device or platform.
|
||||
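
For anyone scripting backups, the 64-character hex string maps to the 32-byte seed as sketched below. This is a hypothetical helper, not the app's actual restore code; it only shows the validation a restore step would reasonably perform.

```rust
/// Parse a 64-character hex string into a 32-byte identity seed.
fn parse_seed_hex(input: &str) -> Result<[u8; 32], String> {
    let hex = input.trim();
    if hex.len() != 64 {
        return Err(format!("expected 64 hex characters, got {}", hex.len()));
    }
    // Reject anything that is not 0-9, a-f, or A-F (this also rules out non-ASCII).
    if !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return Err("seed contains non-hex characters".to_string());
    }
    let mut seed = [0u8; 32];
    for (i, byte) in seed.iter_mut().enumerate() {
        // Safe to slice by byte index: the input is known to be ASCII here.
        *byte = u8::from_str_radix(&hex[2 * i..2 * i + 2], 16).map_err(|e| e.to_string())?;
    }
    Ok(seed)
}
```

Trimming whitespace first is a deliberate choice, since pasted keys often carry a trailing newline.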
|
||||
## CLI Client (wzp-client)
|
||||
|
||||
The CLI client is a command-line tool for testing, recording, and live audio.
|
||||
|
||||
### Usage
|
||||
|
||||
```
wzp-client [options] [relay-addr]
```
|
||||
|
||||
Default relay address: `127.0.0.1:4433`
|
||||
|
||||
### Flags Reference
|
||||
|
||||
| Flag | Description |
|------|-------------|
| `--live` | Live mic/speaker mode. Requires `--features audio` at build time |
| `--send-tone <secs>` | Send a 440 Hz test tone for N seconds |
| `--send-file <file>` | Send a raw PCM file (48 kHz mono s16le) |
| `--record <file.raw>` | Record received audio to raw PCM file |
| `--echo-test <secs>` | Run automated echo quality test for N seconds. Produces a windowed analysis with loss%, SNR, correlation |
| `--drift-test <secs>` | Run automated clock-drift measurement for N seconds |
| `--sweep` | Run jitter buffer parameter sweep (local, no network). Tests different buffer configurations |
| `--seed <hex>` | Identity seed as 64 hex characters. Compatible with featherChat |
| `--mnemonic <words...>` | Identity seed as BIP39 mnemonic (24 words). All remaining non-flag words are consumed |
| `--room <name>` | Room name. Hashed before sending for privacy |
| `--token <token>` | featherChat bearer token for relay authentication |
| `--metrics-file <path>` | Write JSONL telemetry to file (1 line/sec) |
| `--help`, `-h` | Print help and exit |
|
||||
|
||||
### Common Usage Patterns
|
||||
|
||||
#### Connectivity Test (Silence)
|
||||
|
||||
```bash
# Send 250 silence frames (5 seconds) and exit
wzp-client 127.0.0.1:4433
```
|
||||
|
||||
#### Live Audio Call
|
||||
|
||||
```bash
# Terminal 1
wzp-relay

# Terminal 2: Alice
wzp-client --live --room myroom 127.0.0.1:4433

# Terminal 3: Bob
wzp-client --live --room myroom 127.0.0.1:4433
```
|
||||
|
||||
Both capture from mic and play received audio. Press Ctrl+C to stop.
|
||||
|
||||
#### Send Test Tone and Record
|
||||
|
||||
```bash
# Terminal 1
wzp-relay

# Terminal 2: Send 10 seconds of 440 Hz tone
wzp-client --send-tone 10 127.0.0.1:4433

# Terminal 3: Record what is received
wzp-client --record call.raw 127.0.0.1:4433
```
|
||||
|
||||
Play the recording:
|
||||
|
||||
```bash
ffplay -f s16le -ar 48000 -ac 1 call.raw
```
|
||||
|
||||
#### Send Audio File
|
||||
|
||||
```bash
# Convert to raw PCM first
ffmpeg -i song.mp3 -f s16le -ar 48000 -ac 1 song.raw

# Send through relay
wzp-client --send-file song.raw 127.0.0.1:4433
```
|
||||
|
||||
#### Echo Quality Test
|
||||
|
||||
```bash
wzp-relay &
wzp-client --echo-test 30 127.0.0.1:4433
```
|
||||
|
||||
Produces a windowed analysis showing loss percentage, SNR, correlation, and quality degradation trends.
|
||||
|
||||
#### Clock Drift Test
|
||||
|
||||
```bash
wzp-relay &
wzp-client --drift-test 60 127.0.0.1:4433
```
|
||||
|
||||
Measures clock drift between the send and receive paths over the specified duration.
|
||||
|
||||
#### Jitter Buffer Sweep
|
||||
|
||||
```bash
# Runs locally, no network needed
wzp-client --sweep
```
|
||||
|
||||
Tests different jitter buffer configurations and prints results.
|
||||
|
||||
#### With Identity and Auth
|
||||
|
||||
```bash
# Using hex seed
wzp-client --seed 0123456789abcdef...64chars --room secure-room --token my-bearer-token relay.example.com:4433

# Using BIP39 mnemonic
wzp-client --mnemonic abandon abandon abandon ... zoo --room secure-room relay.example.com:4433
```
|
||||
|
||||
#### With JSONL Telemetry
|
||||
|
||||
```bash
wzp-client --live --metrics-file /tmp/call.jsonl relay.example.com:4433
```
|
||||
|
||||
Writes one JSON object per second:
|
||||
|
||||
```json
{
  "ts": "2026-04-07T12:00:00Z",
  "buffer_depth": 45,
  "underruns": 0,
  "overruns": 0,
  "loss_pct": 1.2,
  "rtt_ms": 34,
  "jitter_ms": 8,
  "frames_sent": 50,
  "frames_received": 49,
  "quality_profile": "GOOD"
}
```
|
||||
|
||||
### Audio File Format
|
||||
|
||||
All raw PCM files use:
|
||||
|
||||
| Property | Value |
|
||||
|----------|-------|
|
||||
| Sample rate | 48 kHz |
|
||||
| Channels | 1 (mono) |
|
||||
| Sample format | signed 16-bit little-endian (s16le) |
|
||||
|
||||
Conversion commands:
|
||||
|
||||
```bash
|
||||
# WAV to raw PCM
|
||||
ffmpeg -i input.wav -f s16le -ar 48000 -ac 1 output.raw
|
||||
|
||||
# MP3 to raw PCM
|
||||
ffmpeg -i input.mp3 -f s16le -ar 48000 -ac 1 output.raw
|
||||
|
||||
# Raw PCM to WAV
|
||||
ffmpeg -f s16le -ar 48000 -ac 1 -i input.raw output.wav
|
||||
|
||||
# Play raw PCM
|
||||
ffplay -f s16le -ar 48000 -ac 1 file.raw
|
||||
```
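
As a quick sanity check on file sizes, the format above implies 48,000 samples × 2 bytes × 1 channel = 96,000 bytes per second of audio. A minimal sketch of that arithmetic follows; the function names are illustrative and not part of any wzp tool.

```rust
/// Format constants taken from the "Audio File Format" table.
const SAMPLE_RATE_HZ: u64 = 48_000;
const CHANNELS: u64 = 1;
const BYTES_PER_SAMPLE: u64 = 2; // signed 16-bit

/// Bytes of raw PCM produced per second of audio.
fn pcm_bytes_per_second() -> u64 {
    SAMPLE_RATE_HZ * CHANNELS * BYTES_PER_SAMPLE
}

/// Duration in milliseconds of a raw PCM file that is `len_bytes` long.
fn pcm_duration_ms(len_bytes: u64) -> u64 {
    len_bytes * 1000 / pcm_bytes_per_second()
}
```

For example, a 20 ms frame of 960 samples occupies 1920 bytes, and a one-minute recording is about 5.76 MB.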
|
||||
|
||||
## Web Client (Browser)
|
||||
|
||||
The web client runs in a browser via the wzp-web bridge server.
|
||||
|
||||
### Setup
|
||||
|
||||
```bash
|
||||
# Start relay
|
||||
wzp-relay
|
||||
|
||||
# Start web bridge
|
||||
wzp-web --port 8080 --relay 127.0.0.1:4433
|
||||
|
||||
# For remote access (browsers only allow microphone capture over HTTPS or on localhost)
|
||||
wzp-web --port 8443 --relay 127.0.0.1:4433 --tls
|
||||
```
|
||||
|
||||
Open `http://localhost:8080/room-name` (or `https://...` with TLS).
|
||||
|
||||
### Features
|
||||
|
||||
- **Open mic** (default) and **push-to-talk** modes
|
||||
- PTT via on-screen button, mouse hold, or spacebar
|
||||
- Audio level meter
|
||||
- Auto-reconnection on disconnect
|
||||
|
||||
### Audio Processing
|
||||
|
||||
The web client uses AudioWorklet (preferred) with a ScriptProcessorNode fallback:
|
||||
|
||||
- **Capture**: Accumulates Float32 samples into 960-sample (20ms) Int16 frames
|
||||
- **Playback**: Ring buffer capped at 200ms (9600 samples at 48 kHz)
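
The capture step can be sketched as follows. The real web client is JavaScript; this Rust version only illustrates the accumulation logic, and the clamp-then-scale-by-32767 conversion is an assumption about how Float32 samples are mapped to Int16. `FrameAccumulator` is a hypothetical name.

```rust
/// Samples per 20 ms frame at 48 kHz (48_000 * 20 / 1000 = 960).
const FRAME_SAMPLES: usize = 960;

/// Accumulates f32 samples and emits complete 960-sample i16 frames.
struct FrameAccumulator {
    pending: Vec<i16>,
}

impl FrameAccumulator {
    fn new() -> Self {
        Self { pending: Vec::with_capacity(FRAME_SAMPLES) }
    }

    /// Push one capture callback's worth of samples; returns any frames completed.
    fn push(&mut self, input: &[f32]) -> Vec<Vec<i16>> {
        let mut frames = Vec::new();
        for &s in input {
            // Clamp to [-1.0, 1.0] and scale to the i16 range (assumed mapping).
            let scaled = (s.clamp(-1.0, 1.0) * 32767.0).round() as i16;
            self.pending.push(scaled);
            if self.pending.len() == FRAME_SAMPLES {
                // Hand out the full frame and start a fresh buffer.
                frames.push(std::mem::replace(
                    &mut self.pending,
                    Vec::with_capacity(FRAME_SAMPLES),
                ));
            }
        }
        frames
    }
}
```

Since 960 is not a multiple of typical callback sizes, leftover samples stay in `pending` until the next callback completes the frame.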
|
||||
|
||||
## Identity System
|
||||
|
||||
### Overview
|
||||
|
||||
Your identity is a 32-byte cryptographic seed that derives:
|
||||
|
||||
- **Ed25519 signing key** -- authenticates handshake messages
|
||||
- **X25519 key agreement key** -- derives shared session encryption keys
|
||||
- **Fingerprint** -- SHA-256 of the public key, truncated to 16 bytes, displayed as `xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx`
|
||||
- **Identicon** -- deterministic visual avatar generated from the fingerprint
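
To illustrate the display format only, here is a sketch that renders an already-computed 16-byte fingerprint as eight colon-separated groups. The SHA-256 and truncation steps are omitted because they need a hashing crate; `format_fingerprint` and the choice of lowercase hex are assumptions.

```rust
/// Format a 16-byte fingerprint as eight colon-separated groups of four hex digits.
fn format_fingerprint(fp: &[u8; 16]) -> String {
    fp.chunks(2)
        // Each pair of bytes becomes one four-digit group.
        .map(|pair| format!("{:02x}{:02x}", pair[0], pair[1]))
        .collect::<Vec<_>>()
        .join(":")
}
```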
|
||||
|
||||
### Seed Sources
|
||||
|
||||
| Source | Description |
|
||||
|--------|-------------|
|
||||
| Auto-generated | Created on first run, stored in `~/.wzp/identity` (desktop/CLI) or app storage (Android) |
|
||||
| `--seed <hex>` | 64-character hex string (CLI) |
|
||||
| `--mnemonic <words>` | 24-word BIP39 mnemonic (CLI) |
|
||||
| Copy Key / Restore Key | Hex backup/restore (Android settings) |
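
A 64-character hex seed decodes to exactly 32 bytes. A minimal sketch of that validation and decoding is shown below, with `parse_seed_hex` as a hypothetical helper rather than the CLI's actual parser.

```rust
/// Decode a 64-character hex string into a 32-byte seed.
/// Returns `None` if the length is wrong or a non-hex character is present.
fn parse_seed_hex(s: &str) -> Option<[u8; 32]> {
    if s.len() != 64 || !s.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let mut seed = [0u8; 32];
    for (i, byte) in seed.iter_mut().enumerate() {
        // Two hex digits per byte; the input is known to be ASCII at this point.
        *byte = u8::from_str_radix(&s[2 * i..2 * i + 2], 16).ok()?;
    }
    Some(seed)
}
```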
|
||||
|
||||
### BIP39 Mnemonic Backup
|
||||
|
||||
The 32-byte seed can be represented as a 24-word BIP39 mnemonic for human-readable backup. The same mnemonic produces the same identity on any platform or device.
|
||||
|
||||
### featherChat Compatibility
|
||||
|
||||
The identity derivation uses the same HKDF scheme as featherChat (Warzone messenger). The same seed produces the same fingerprint in both systems, allowing a unified identity across messaging and calling.
|
||||
|
||||
### Trust on First Use (TOFU)
|
||||
|
||||
Clients remember the fingerprints of relays and peers they connect to. On subsequent connections, if a fingerprint changes, the client warns the user. This protects against man-in-the-middle attacks on later connections, but the first contact is unprotected unless the fingerprint is verified manually.
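
The TOFU check reduces to comparing a presented fingerprint against a remembered one. A minimal in-memory sketch follows; `PinStore`, `TofuResult`, and the string-keyed map are illustrative assumptions, and a real client would persist the pins.

```rust
use std::collections::HashMap;

/// Outcome of checking a presented fingerprint against the pinned one.
#[derive(Debug, PartialEq)]
enum TofuResult {
    FirstUse, // nothing pinned yet: the fingerprint is now remembered
    Match,    // same fingerprint as before
    Mismatch, // fingerprint changed: the user should be warned
}

/// In-memory pin store keyed by relay address or peer name.
struct PinStore {
    pins: HashMap<String, String>,
}

impl PinStore {
    fn new() -> Self {
        Self { pins: HashMap::new() }
    }

    fn check(&mut self, id: &str, fingerprint: &str) -> TofuResult {
        match self.pins.get(id).cloned() {
            None => {
                self.pins.insert(id.to_string(), fingerprint.to_string());
                TofuResult::FirstUse
            }
            Some(pinned) if pinned.as_str() == fingerprint => TofuResult::Match,
            // The old pin is kept so that the warning repeats until resolved.
            Some(_) => TofuResult::Mismatch,
        }
    }
}
```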
|
||||
|
||||
## Quality Profiles Explained
|
||||
|
||||
### When to Use Each Profile
|
||||
|
||||
| Profile | Total Bandwidth | Best For | Trade-offs |
|
||||
|---------|----------------|----------|------------|
|
||||
| **Studio 64k** | 70.4 kbps | LAN calls, music, podcasting | Highest quality, needs good network |
|
||||
| **Studio 48k** | 52.8 kbps | Good WiFi, wired connections | Near-studio quality |
|
||||
| **Studio 32k** | 35.2 kbps | Reliable WiFi, LTE | Very good quality with lower bandwidth |
|
||||
| **Auto** | Adaptive | Most users | Automatically switches based on network conditions |
|
||||
| **Opus 24k** | 28.8 kbps | General use, moderate networks | Good speech quality, reasonable bandwidth |
|
||||
| **Opus 6k** | 9.0 kbps | 3G networks, congested WiFi | Intelligible speech, some artifacts |
|
||||
| **Codec2 3.2k** | 4.8 kbps | Poor connections | Robotic but intelligible, narrowband |
|
||||
| **Codec2 1.2k** | 2.4 kbps | Satellite links, extreme loss | Minimal intelligibility, last resort |
|
||||
|
||||
### Auto Mode
|
||||
|
||||
Auto mode starts at the **Good (Opus 24k)** profile and adapts based on observed network quality:
|
||||
|
||||
- **Downgrade** -- 3 consecutive bad quality reports (2 on cellular) trigger a step down
|
||||
- **Upgrade** -- 10 consecutive good quality reports trigger a step up (one tier at a time)
|
||||
- **Network handoff** -- switching from WiFi to cellular triggers a preemptive one-tier downgrade plus a 10-second FEC boost
|
||||
|
||||
Auto mode uses three tiers (Good, Degraded, Catastrophic). It does not use the Studio profiles, which must be selected manually.
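
The downgrade and upgrade rules amount to a streak counter with hysteresis. The following is a rough sketch under simplifying assumptions, not the actual controller: reports are reduced to a good/bad boolean, streaks reset after each tier change, and the handoff downgrade and FEC boost are left out. `AutoMode`, `Tier`, and `on_report` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tier {
    Good,
    Degraded,
    Catastrophic,
}

struct AutoMode {
    tier: Tier,
    bad_streak: u32,
    good_streak: u32,
    cellular: bool,
}

impl AutoMode {
    fn new(cellular: bool) -> Self {
        Self { tier: Tier::Good, bad_streak: 0, good_streak: 0, cellular }
    }

    /// Feed one quality report; returns the tier to use afterwards.
    fn on_report(&mut self, good: bool) -> Tier {
        if good {
            self.good_streak += 1;
            self.bad_streak = 0;
            // 10 consecutive good reports: step up one tier.
            if self.good_streak >= 10 {
                self.tier = match self.tier {
                    Tier::Catastrophic => Tier::Degraded,
                    _ => Tier::Good,
                };
                self.good_streak = 0;
            }
        } else {
            self.bad_streak += 1;
            self.good_streak = 0;
            // 3 consecutive bad reports (2 on cellular): step down one tier.
            let threshold = if self.cellular { 2 } else { 3 };
            if self.bad_streak >= threshold {
                self.tier = match self.tier {
                    Tier::Good => Tier::Degraded,
                    _ => Tier::Catastrophic,
                };
                self.bad_streak = 0;
            }
        }
        self.tier
    }
}
```

The asymmetric thresholds make the controller quick to protect a failing call and slow to climb back, which avoids oscillating between tiers on a marginal link.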
|
||||
|
||||
### Manual Override
|
||||
|
||||
When you select a specific profile (not Auto), adaptive switching is disabled. The encoder stays at the selected profile regardless of network conditions. This is useful when you know your network quality and want consistent encoding, or when you want to force a specific bitrate.
|
||||
|
||||
Note: The decoder always accepts all codecs. A manual quality selection only affects what you send, not what you receive.
|
||||
394
docs/android/fix-audio-ring-desync.md
Normal file
@@ -0,0 +1,394 @@
|
||||
# Fix: AudioRing SPSC Buffer Cursor Desync
|
||||
|
||||
## Problem
|
||||
|
||||
A critical bug causes 10-16 seconds of bidirectional audio silence mid-call (~25-30s in). Both participants go silent at the exact same moment. The QUIC transport, relay, Opus codec, and FEC are all healthy — the bug is in the lock-free ring buffer that transfers decoded PCM from the Rust recv task to the Kotlin AudioTrack playout thread.
|
||||
|
||||
**Root cause:** `AudioRing::write()` modifies `read_pos` from the producer thread during overflow handling (lines 68-72 of `audio_ring.rs`). This violates the SPSC invariant — only the consumer should own `read_pos`. When both threads write to `read_pos`, a race corrupts the cursor state, causing the reader to see an empty or stale buffer for 12-16 seconds.
|
||||
|
||||
**Full forensics:** `debug/INCIDENT-2026-04-06-playout-ring-desync.md`
|
||||
|
||||
---
|
||||
|
||||
## Solution: Reader-Detects-Lap Architecture
|
||||
|
||||
The writer NEVER touches `read_pos`. On overflow, the writer simply overwrites old buffer data and advances `write_pos`. The reader detects it was lapped and self-corrects by snapping its own `read_pos` forward.
|
||||
|
||||
---
|
||||
|
||||
## Implementation Steps
|
||||
|
||||
### Step 1: Rewrite `AudioRing`
|
||||
|
||||
**File:** `crates/wzp-android/src/audio_ring.rs`
|
||||
|
||||
Replace the entire implementation with:
|
||||
|
||||
**Constants:**
|
||||
```rust
|
||||
/// Ring buffer capacity — must be a power of 2 for bitmask indexing.
|
||||
/// 16384 samples = 341.3ms at 48kHz mono. Provides 70% more headroom
|
||||
/// than the previous 9600 (200ms) for surviving Android GC pauses.
|
||||
const RING_CAPACITY: usize = 16384; // 2^14
|
||||
const RING_MASK: usize = RING_CAPACITY - 1;
|
||||
```
|
||||
|
||||
**Struct:**
|
||||
```rust
|
||||
pub struct AudioRing {
|
||||
buf: Box<[i16; RING_CAPACITY]>,
|
||||
write_pos: AtomicUsize, // monotonically increasing, ONLY written by producer
|
||||
read_pos: AtomicUsize, // monotonically increasing, ONLY written by consumer
|
||||
overflow_count: AtomicU64, // incremented by reader when it detects a lap
|
||||
underrun_count: AtomicU64, // incremented by reader when ring is empty
|
||||
}
|
||||
```
|
||||
|
||||
**`write()` — producer. Does NOT touch `read_pos`:**
|
||||
```rust
|
||||
pub fn write(&self, samples: &[i16]) -> usize {
|
||||
let count = samples.len().min(RING_CAPACITY);
|
||||
let w = self.write_pos.load(Ordering::Relaxed);
|
||||
|
||||
for i in 0..count {
|
||||
unsafe {
|
||||
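// NOTE: writing through a pointer derived from `&self` is only sound if `buf`
// is wrapped in `UnsafeCell`; as declared above, this cast is undefined behavior.
let ptr = self.buf.as_ptr() as *mut i16;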
|
||||
*ptr.add(w.wrapping_add(i) & RING_MASK) = samples[i];
|
||||
}
|
||||
}
|
||||
|
||||
self.write_pos.store(w.wrapping_add(count), Ordering::Release);
|
||||
count
|
||||
}
|
||||
```
|
||||
|
||||
**`read()` — consumer. Detects lap, self-corrects:**
|
||||
```rust
|
||||
pub fn read(&self, out: &mut [i16]) -> usize {
|
||||
let w = self.write_pos.load(Ordering::Acquire);
|
||||
let mut r = self.read_pos.load(Ordering::Relaxed);
|
||||
|
||||
let mut avail = w.wrapping_sub(r);
|
||||
|
||||
// Lap detection: writer has overwritten our unread data.
|
||||
// Snap read_pos forward to oldest valid data in the buffer.
|
||||
// Safe because we (the reader) are the sole owner of read_pos.
|
||||
if avail > RING_CAPACITY {
|
||||
r = w.wrapping_sub(RING_CAPACITY);
|
||||
avail = RING_CAPACITY;
|
||||
self.overflow_count.fetch_add(1, Ordering::Relaxed);
|
||||
}
|
||||
|
||||
let count = out.len().min(avail);
|
||||
if count == 0 {
|
||||
if w == r {
|
||||
self.underrun_count.fetch_add(1, Ordering::Relaxed);
|
||||
}
|
||||
return 0;
|
||||
}
|
||||
|
||||
for i in 0..count {
|
||||
out[i] = unsafe { *self.buf.as_ptr().add(r.wrapping_add(i) & RING_MASK) };
|
||||
}
|
||||
|
||||
self.read_pos.store(r.wrapping_add(count), Ordering::Release);
|
||||
count
|
||||
}
|
||||
```
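
The lap check relies on wrapping subtraction of two monotonically increasing cursors. A minimal sketch of just that arithmetic, pulled out into a pure function, is shown here; `resolve_read_window` is an illustrative name and is not part of `audio_ring.rs`.

```rust
const RING_CAPACITY: usize = 16384;

/// Given the write and read cursors, return the read cursor to use, the number
/// of readable samples, and whether the reader was lapped by the writer.
fn resolve_read_window(w: usize, r: usize) -> (usize, usize, bool) {
    let avail = w.wrapping_sub(r);
    if avail > RING_CAPACITY {
        // Lapped: only the most recent RING_CAPACITY samples still exist.
        (w.wrapping_sub(RING_CAPACITY), RING_CAPACITY, true)
    } else {
        (r, avail, false)
    }
}
```

Because only the difference between the cursors matters, `wrapping_sub` gives the right answer even after a cursor has wrapped past `usize::MAX`.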
|
||||
|
||||
**`available()` — clamped for external callers:**
|
||||
```rust
|
||||
pub fn available(&self) -> usize {
|
||||
let w = self.write_pos.load(Ordering::Acquire);
|
||||
let r = self.read_pos.load(Ordering::Relaxed);
|
||||
w.wrapping_sub(r).min(RING_CAPACITY)
|
||||
}
|
||||
```
|
||||
|
||||
**`free_space()` — keep for API compat:**
|
||||
```rust
|
||||
pub fn free_space(&self) -> usize {
|
||||
RING_CAPACITY.saturating_sub(self.available())
|
||||
}
|
||||
```
|
||||
|
||||
**Diagnostic accessors:**
|
||||
```rust
|
||||
pub fn overflow_count(&self) -> u64 {
|
||||
self.overflow_count.load(Ordering::Relaxed)
|
||||
}
|
||||
|
||||
pub fn underrun_count(&self) -> u64 {
|
||||
self.underrun_count.load(Ordering::Relaxed)
|
||||
}
|
||||
```
|
||||
|
||||
**Constructor:**
|
||||
```rust
|
||||
pub fn new() -> Self {
|
||||
debug_assert!(RING_CAPACITY.is_power_of_two());
|
||||
Self {
|
||||
buf: Box::new([0i16; RING_CAPACITY]),
|
||||
write_pos: AtomicUsize::new(0),
|
||||
read_pos: AtomicUsize::new(0),
|
||||
overflow_count: AtomicU64::new(0),
|
||||
underrun_count: AtomicU64::new(0),
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Imports to add:** `use std::sync::atomic::AtomicU64;`
|
||||
|
||||
**Safety comment update:**
|
||||
```rust
|
||||
// SAFETY: AudioRing is SPSC — one thread writes (producer), one reads (consumer).
|
||||
// The producer only writes write_pos. The consumer only writes read_pos.
|
||||
// Neither thread writes the other's cursor. Buffer indices are derived from
|
||||
// the owning thread's cursor, so the two threads never access the same index unless the writer laps the reader.
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Step 2: Add counter fields to `CallStats`
|
||||
|
||||
**File:** `crates/wzp-android/src/stats.rs`
|
||||
|
||||
Add three fields to the `CallStats` struct (after `fec_recovered`):
|
||||
|
||||
```rust
|
||||
/// Playout ring overflow count (reader was lapped by writer).
|
||||
pub playout_overflows: u64,
|
||||
/// Playout ring underrun count (reader found empty buffer).
|
||||
pub playout_underruns: u64,
|
||||
/// Capture ring overflow count.
|
||||
pub capture_overflows: u64,
|
||||
```
|
||||
|
||||
These derive `Default` (= 0) automatically via the existing `#[derive(Default)]`.
|
||||
|
||||
---
|
||||
|
||||
### Step 3: Wire ring diagnostics into engine stats + logging
|
||||
|
||||
**File:** `crates/wzp-android/src/engine.rs`
|
||||
|
||||
**3a.** In `get_stats()` (~line 181), populate the new fields:
|
||||
|
||||
```rust
|
||||
stats.playout_overflows = self.state.playout_ring.overflow_count();
|
||||
stats.playout_underruns = self.state.playout_ring.underrun_count();
|
||||
stats.capture_overflows = self.state.capture_ring.overflow_count();
|
||||
```
|
||||
|
||||
**3b.** In the recv task periodic stats log, add ring health:
|
||||
|
||||
```rust
|
||||
info!(
|
||||
frames_decoded,
|
||||
fec_recovered,
|
||||
recv_errors,
|
||||
max_recv_gap_ms,
|
||||
playout_avail = state.playout_ring.available(),
|
||||
playout_overflows = state.playout_ring.overflow_count(),
|
||||
playout_underruns = state.playout_ring.underrun_count(),
|
||||
"recv stats"
|
||||
);
|
||||
```
|
||||
|
||||
**3c.** In the send task periodic stats log, add capture ring health:
|
||||
|
||||
```rust
|
||||
info!(
|
||||
seq = s,
|
||||
block_id,
|
||||
frames_sent,
|
||||
frames_dropped,
|
||||
send_errors,
|
||||
ring_avail = state.capture_ring.available(),
|
||||
capture_overflows = state.capture_ring.overflow_count(),
|
||||
"send stats"
|
||||
);
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Step 4: Parse new stats in Kotlin
|
||||
|
||||
**File:** `android/app/src/main/java/com/wzp/engine/CallStats.kt`
|
||||
|
||||
Add fields to the data class:
|
||||
|
||||
```kotlin
|
||||
val playoutOverflows: Long = 0,
|
||||
val playoutUnderruns: Long = 0,
|
||||
val captureOverflows: Long = 0,
|
||||
```
|
||||
|
||||
Add parsing in `fromJson()`:
|
||||
|
||||
```kotlin
|
||||
playoutOverflows = obj.optLong("playout_overflows", 0),
|
||||
playoutUnderruns = obj.optLong("playout_underruns", 0),
|
||||
captureOverflows = obj.optLong("capture_overflows", 0),
|
||||
```
|
||||
|
||||
No UI changes needed — these fields will appear in debug report JSON automatically.
|
||||
|
||||
---
|
||||
|
||||
### Step 5: Unit tests
|
||||
|
||||
**File:** `crates/wzp-android/src/audio_ring.rs` — add `#[cfg(test)] mod tests`
|
||||
|
||||
```rust
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
|
||||
#[test]
|
||||
fn capacity_is_power_of_two() {
|
||||
assert!(RING_CAPACITY.is_power_of_two());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn basic_write_read() {
|
||||
let ring = AudioRing::new();
|
||||
let input: Vec<i16> = (0..960).map(|i| i as i16).collect();
|
||||
ring.write(&input);
|
||||
assert_eq!(ring.available(), 960);
|
||||
|
||||
let mut output = vec![0i16; 960];
|
||||
let read = ring.read(&mut output);
|
||||
assert_eq!(read, 960);
|
||||
assert_eq!(output, input);
|
||||
assert_eq!(ring.available(), 0);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn wraparound() {
|
||||
let ring = AudioRing::new();
|
||||
let frame = vec![42i16; 960];
|
||||
// 20 frames x 960 = 19200 samples, enough to wrap the 16384-sample buffer once
|
||||
for _ in 0..20 {
|
||||
ring.write(&frame);
|
||||
let mut out = vec![0i16; 960];
|
||||
ring.read(&mut out);
|
||||
assert!(out.iter().all(|&s| s == 42));
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn overflow_detected_by_reader() {
|
||||
let ring = AudioRing::new();
|
||||
// Write more than RING_CAPACITY without reading
|
||||
let big = vec![7i16; RING_CAPACITY + 960];
|
||||
ring.write(&big[..RING_CAPACITY]);
|
||||
ring.write(&big[RING_CAPACITY..]);
|
||||
|
||||
// Reader should detect lap
|
||||
let mut out = vec![0i16; 960];
|
||||
let read = ring.read(&mut out);
|
||||
assert!(read > 0);
|
||||
assert_eq!(ring.overflow_count(), 1);
|
||||
// Data should be from the most recent writes
|
||||
assert!(out.iter().all(|&s| s == 7));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn writer_never_modifies_read_pos() {
|
||||
let ring = AudioRing::new();
|
||||
// Read pos should stay at 0 until read() is called
|
||||
let data = vec![1i16; RING_CAPACITY + 960];
|
||||
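// Two writes: a single write() is clamped to RING_CAPACITY samples.
ring.write(&data[..RING_CAPACITY]);
ring.write(&data[RING_CAPACITY..]);
// Tests live in the same module, so the private cursors can be read directly.
// read_pos still being 0 proves write() did not advance it.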
|
||||
let w = ring.write_pos.load(std::sync::atomic::Ordering::Relaxed);
|
||||
let r = ring.read_pos.load(std::sync::atomic::Ordering::Relaxed);
|
||||
assert_eq!(r, 0, "write() must not modify read_pos");
|
||||
assert!(w.wrapping_sub(r) > RING_CAPACITY);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn underrun_counted() {
|
||||
let ring = AudioRing::new();
|
||||
let mut out = vec![0i16; 960];
|
||||
let read = ring.read(&mut out);
|
||||
assert_eq!(read, 0);
|
||||
assert_eq!(ring.underrun_count(), 1);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn overflow_recovery_reads_recent_data() {
|
||||
let ring = AudioRing::new();
|
||||
// Fill with old data
|
||||
let old = vec![1i16; RING_CAPACITY];
|
||||
ring.write(&old);
|
||||
// Overwrite with new data (lapping the reader)
|
||||
let new_data = vec![99i16; 960];
|
||||
ring.write(&new_data);
|
||||
|
||||
// Reader should snap forward and get recent data
|
||||
let mut out = vec![0i16; RING_CAPACITY];
|
||||
let read = ring.read(&mut out);
|
||||
assert_eq!(read, RING_CAPACITY);
|
||||
// The last 960 samples should be 99
|
||||
assert!(out[RING_CAPACITY - 960..].iter().all(|&s| s == 99));
|
||||
assert_eq!(ring.overflow_count(), 1);
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Memory Ordering Reference
|
||||
|
||||
| Operation | Ordering | Rationale |
|
||||
|-----------|----------|-----------|
|
||||
| `write_pos.store` in `write()` | Release | Buffer writes visible before cursor advances |
|
||||
| `write_pos.load` in `read()` | Acquire | Pairs with Release above — sees all buffer writes |
|
||||
| `write_pos.load` in `write()` | Relaxed | Writer is sole owner of write_pos |
|
||||
| `read_pos.load` in `read()` | Relaxed | Reader is sole owner of read_pos |
|
||||
| `read_pos.store` in `read()` | Release | Makes available() consistent from any thread |
|
||||
| `read_pos.load` in `available()` | Relaxed | Informational only, slight staleness OK |
|
||||
| All counters | Relaxed | Diagnostic only |
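
The first two rows of the table are the pair that matters for correctness. As an illustration outside the ring itself, the sketch below publishes a payload with a Release store and observes it after an Acquire load. `publish_and_read` and its variables are made up for this example, with `ready` standing in for `write_pos` and `data` for the sample buffer.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Publish a value from a producer thread and read it from the caller,
/// using a Release store paired with an Acquire load.
fn publish_and_read(value: usize) -> usize {
    let data = Arc::new(AtomicUsize::new(0));
    let ready = Arc::new(AtomicBool::new(false));

    let (d, f) = (Arc::clone(&data), Arc::clone(&ready));
    let producer = thread::spawn(move || {
        d.store(value, Ordering::Relaxed); // payload write
        f.store(true, Ordering::Release); // publish: payload is ordered before this
    });

    // The Acquire load pairs with the Release store above.
    while !ready.load(Ordering::Acquire) {
        std::hint::spin_loop();
    }
    // Once `ready` is observed as true, the payload write is visible here.
    let seen = data.load(Ordering::Relaxed);
    producer.join().unwrap();
    seen
}
```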
|
||||
|
||||
---
|
||||
|
||||
## Capacity Tradeoff
|
||||
|
||||
| Capacity | Duration | Memory | Verdict |
|
||||
|----------|----------|--------|---------|
|
||||
| 8192 (2^13) | 170ms | 16KB | Less than current 200ms — risky |
|
||||
| **16384 (2^14)** | **341ms** | **32KB** | **70% more headroom, bitmask indexing** |
|
||||
| 32768 (2^15) | 682ms | 64KB | Excessive latency on overflow recovery |
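
The durations and sizes in the table follow directly from 48 kHz mono 16-bit samples. A small sketch for reproducing them is given below; the helper names are illustrative.

```rust
const SAMPLE_RATE_HZ: f64 = 48_000.0;

/// Playout time held by a ring of `capacity` mono samples, in milliseconds.
fn ring_duration_ms(capacity: usize) -> f64 {
    capacity as f64 / SAMPLE_RATE_HZ * 1000.0
}

/// Memory used by a ring of `capacity` 16-bit samples, in bytes.
fn ring_bytes(capacity: usize) -> usize {
    capacity * std::mem::size_of::<i16>()
}
```

With these, 16384 samples come to about 341.3 ms and 32 KiB, and 16384 / 9600 ≈ 1.71 is where the "70% more headroom" figure comes from.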
|
||||
|
||||
---
|
||||
|
||||
## Verification
|
||||
|
||||
1. `cargo test -p wzp-android` — new unit tests pass
|
||||
2. `cargo ndk -t arm64-v8a build --release -p wzp-android` — ARM cross-compile succeeds
|
||||
3. Build APK, install on both test devices (Nothing A059 + Pixel 6)
|
||||
4. 2+ minute call — verify no audio gaps
|
||||
5. Check debug report JSON: `playout_overflows` should be 0 or very small
|
||||
6. Check logcat `wzp_android` tag: send/recv stats show healthy ring state
|
||||
7. Stress test: play music through one device speaker while on call — forces high ring throughput
|
||||
|
||||
---
|
||||
|
||||
## Files to Modify
|
||||
|
||||
| File | What changes |
|
||||
|------|-------------|
|
||||
| `crates/wzp-android/src/audio_ring.rs` | Complete rewrite — the core fix |
|
||||
| `crates/wzp-android/src/stats.rs` | Add 3 counter fields |
|
||||
| `crates/wzp-android/src/engine.rs` | Wire counters into get_stats() + periodic logs |
|
||||
| `android/app/src/main/java/com/wzp/engine/CallStats.kt` | Parse 3 new JSON fields |
|
||||
|
||||
## What Does NOT Change
|
||||
|
||||
- `AudioPipeline.kt` — calls `readAudio()`/`writeAudio()` unchanged; ring fix is transparent
|
||||
- `jni_bridge.rs` — JNI bridge passes through unchanged
|
||||
- `audio_android.rs` — separate Oboe-based ring, currently unused, different design
|
||||
- Relay code — relay is confirmed healthy
|
||||
- Desktop client — uses `Mutex + mpsc`, not `AudioRing`
|
||||
149
docs/android/fix-capture-thread-crash.md
Normal file
@@ -0,0 +1,149 @@
|
||||
# Fix: Capture/Playout Thread Use-After-Free on Hangup
|
||||
|
||||
## Problem
|
||||
|
||||
App crashes (SIGSEGV) when hanging up a call. The capture thread (`wzp-capture`) calls `engine.writeAudio()` via JNI after `teardown()` has freed the native engine handle. Same race exists for the playout thread's `readAudio()`.
|
||||
|
||||
**Root cause:** TOCTOU race between the `nativeHandle == 0L` check in `WzpEngine.writeAudio()`/`readAudio()` and `destroy()` freeing the native memory on the ViewModel thread. Audio threads can't be joined (libcrypto TLS destructor crash), so there's no synchronization between `stopAudio()` and `destroy()`.
|
||||
|
||||
**Full forensics:** `debug/INCIDENT-2026-04-06-capture-thread-use-after-free.md`
|
||||
|
||||
---
|
||||
|
||||
## Solution: Destroy Latch
|
||||
|
||||
Add a `CountDownLatch(2)` that both audio threads count down after exiting their loops. `teardown()` awaits the latch (with a timeout) before calling `destroy()`, so that no JNI calls are in flight when the engine is freed, provided the wait does not time out.
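
The latch idea is not specific to Kotlin. As a language-neutral illustration, here is a rough Rust sketch of a count-down latch with a timed wait, built from `Mutex` and `Condvar`. The `Latch` type and `drain_two_workers` are hypothetical and are not part of the Android code, which uses `java.util.concurrent.CountDownLatch`.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Minimal count-down latch: workers decrement, one waiter blocks with a timeout.
struct Latch {
    remaining: Mutex<u32>,
    zero: Condvar,
}

impl Latch {
    fn new(count: u32) -> Self {
        Self { remaining: Mutex::new(count), zero: Condvar::new() }
    }

    fn count_down(&self) {
        let mut n = self.remaining.lock().unwrap();
        *n = n.saturating_sub(1);
        if *n == 0 {
            self.zero.notify_all();
        }
    }

    /// Returns true if the count reached zero before the timeout elapsed.
    fn await_zero(&self, timeout: Duration) -> bool {
        let guard = self.remaining.lock().unwrap();
        let (_guard, result) = self
            .zero
            .wait_timeout_while(guard, timeout, |n| *n > 0)
            .unwrap();
        !result.timed_out()
    }
}

/// Spawn two workers that count down when done, then wait for both.
fn drain_two_workers(timeout: Duration) -> bool {
    let latch = Arc::new(Latch::new(2));
    let handles: Vec<_> = (0..2)
        .map(|_| {
            let l = Arc::clone(&latch);
            thread::spawn(move || l.count_down())
        })
        .collect();
    let drained = latch.await_zero(timeout);
    for h in handles {
        h.join().unwrap();
    }
    drained
}
```

A `false` return corresponds to the timeout case, in which the caller has no assurance that the workers have stopped.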
|
||||
|
||||
---
|
||||
|
||||
## Implementation Steps
|
||||
|
||||
### Step 1: Add a drain latch to `AudioPipeline`
|
||||
|
||||
**File:** `android/app/src/main/java/com/wzp/audio/AudioPipeline.kt`
|
||||
|
||||
Add a `CountDownLatch` field:
|
||||
|
||||
```kotlin
|
||||
import java.util.concurrent.CountDownLatch
|
||||
import java.util.concurrent.TimeUnit
|
||||
|
||||
class AudioPipeline(private val context: Context) {
|
||||
// ... existing fields ...
|
||||
|
||||
/** Latch counted down by each audio thread after exiting its loop.
|
||||
* stop() does NOT wait on this — teardown waits via awaitDrain(). */
|
||||
private var drainLatch: CountDownLatch? = null
|
||||
```
|
||||
|
||||
In `start()`, create the latch before spawning threads:
|
||||
|
||||
```kotlin
|
||||
fun start(engine: WzpEngine) {
|
||||
if (running) return
|
||||
running = true
|
||||
drainLatch = CountDownLatch(2) // one for capture, one for playout
|
||||
|
||||
captureThread = Thread({
|
||||
runCapture(engine)
|
||||
drainLatch?.countDown() // signal: capture loop exited
|
||||
parkThread()
|
||||
}, "wzp-capture").apply { ... }
|
||||
|
||||
playoutThread = Thread({
|
||||
runPlayout(engine)
|
||||
drainLatch?.countDown() // signal: playout loop exited
|
||||
parkThread()
|
||||
}, "wzp-playout").apply { ... }
|
||||
// ...
|
||||
}
|
||||
```
|
||||
|
||||
Add `awaitDrain()` — called by ViewModel before `destroy()`:
|
||||
|
||||
```kotlin
|
||||
/** Block until both audio threads have exited their loops (max 200ms).
|
||||
* When this returns true, no more JNI calls to the engine will be made. */
|
||||
fun awaitDrain(): Boolean {
|
||||
return drainLatch?.await(200, TimeUnit.MILLISECONDS) ?: true
|
||||
}
|
||||
```
|
||||
|
||||
`stop()` remains unchanged (non-blocking, sets `running = false`).
|
||||
|
||||
### Step 2: Update `CallViewModel.teardown()` to await drain
|
||||
|
||||
**File:** `android/app/src/main/java/com/wzp/ui/call/CallViewModel.kt`
|
||||
|
||||
Change teardown to wait for audio threads before destroying:
|
||||
|
||||
```kotlin
|
||||
private fun teardown(stopService: Boolean = true) {
|
||||
Log.i(TAG, "teardown: stopping audio, stopService=$stopService")
|
||||
val hadCall = audioStarted
|
||||
CallService.onStopFromNotification = null
|
||||
stopAudio() // sets running=false (non-blocking)
|
||||
stopStatsPolling()
|
||||
|
||||
// Wait for audio threads to exit their loops before destroying the engine.
|
||||
// This guarantees no in-flight JNI calls to writeAudio/readAudio.
|
||||
val drained = audioPipeline?.awaitDrain() ?: true
|
||||
if (!drained) {
|
||||
Log.w(TAG, "teardown: audio threads did not drain in time")
|
||||
}
|
||||
audioPipeline = null
|
||||
|
||||
Log.i(TAG, "teardown: stopping engine")
|
||||
try { engine?.stopCall() } catch (e: Exception) { Log.w(TAG, "stopCall err: $e") }
|
||||
try { engine?.destroy() } catch (e: Exception) { Log.w(TAG, "destroy err: $e") }
|
||||
engine = null
|
||||
engineInitialized = false
|
||||
// ... rest unchanged
|
||||
}
|
||||
```
|
||||
|
||||
**Key change:** `awaitDrain()` is called AFTER `stopAudio()` (which sets `running=false`) but BEFORE `engine?.destroy()`. When it returns `true`, both threads have exited their `while(running)` loops and will never call `writeAudio`/`readAudio` again; on timeout, `teardown()` logs a warning and destroys the engine anyway.
|
||||
|
||||
Also move `audioPipeline = null` to after `awaitDrain()` to keep the reference alive for the latch call.
|
||||
|
||||
### Step 3: Move `stopAudio()` pipeline nulling
|
||||
|
||||
**File:** `android/app/src/main/java/com/wzp/ui/call/CallViewModel.kt`
|
||||
|
||||
In `stopAudio()`, do NOT null out the pipeline — let `teardown()` handle it after drain:
|
||||
|
||||
```kotlin
|
||||
private fun stopAudio() {
|
||||
if (!audioStarted) return
|
||||
audioPipeline?.stop() // sets running=false
|
||||
// DON'T null audioPipeline here — teardown() needs it for awaitDrain()
|
||||
audioRouteManager?.unregister()
|
||||
audioRouteManager?.setSpeaker(false)
|
||||
_isSpeaker.value = false
|
||||
audioStarted = false
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Files to Modify
|
||||
|
||||
| File | What changes |
|
||||
|------|-------------|
|
||||
| `android/.../audio/AudioPipeline.kt` | Add `CountDownLatch`, `countDown()` in threads, `awaitDrain()` method |
|
||||
| `android/.../ui/call/CallViewModel.kt` | `teardown()` calls `awaitDrain()` before `destroy()`; `stopAudio()` doesn't null pipeline |
|
||||
|
||||
## What Does NOT Change
|
||||
|
||||
- `WzpEngine.kt` — the `nativeHandle == 0L` guard stays as defense-in-depth
|
||||
- `jni_bridge.rs` — `panic::catch_unwind` stays as last resort
|
||||
- `AudioPipeline.stop()` — remains non-blocking
|
||||
- Thread parking — still needed to avoid libcrypto TLS crash
|
||||
|
||||
## Verification
|
||||
|
||||
1. Build APK, install on test device
|
||||
2. Make a call, hang up — verify no crash in logcat (`adb logcat -s AndroidRuntime:E DEBUG:F`)
|
||||
3. Rapid call/hangup/call/hangup cycles — stress the teardown path
|
||||
4. Check logcat for `teardown: audio threads did not drain in time` — should never appear under normal conditions
|
||||
5. Verify debug report still works after hangup (latch doesn't interfere with report collection)
|
||||