Files
wz-phone/docs/PRD-network-awareness.md
Siavash Sameni 8fcf1be341
Some checks failed
Mirror to GitHub / mirror (push) Failing after 23s
Build Release Binaries / build-amd64 (push) Failing after 6m8s
feat(nat): Tailscale-inspired STUN/ICE + port mapping + mid-call re-gathering (#28)
Phase 8: 5 new modules bringing NAT traversal close to Tailscale's approach.

- stun.rs: RFC 5389 STUN client — public server reflexive discovery,
  XOR-MAPPED-ADDRESS parsing, parallel probe with retry, STUN fallback
  in desktop try_reflect_own_addr()
- portmap.rs: NAT-PMP (RFC 6886) + PCP (RFC 6887) + UPnP IGD port
  mapping — gateway discovery, acquire/release/refresh lifecycle,
  new PeerCandidates.mapped candidate type in dial order
- ice_agent.rs: candidate lifecycle — gather(), re_gather(),
  apply_peer_update() with monotonic generation counter,
  CandidateUpdate signal message forwarded by relay
- netcheck.rs: comprehensive diagnostic — NAT type, IPv4/v6,
  port mapping availability, relay latencies, CLI --netcheck
- relay_map.rs: RTT-sorted relay map, preferred() selection,
  populate_from_ack() for RegisterPresenceAck.available_relays

Relay: CallRegistry stores + cross-wires caller/callee_mapped_addr
into CallSetup.peer_mapped_addr. Region config + available_relays
populated from federation peers in RegisterPresenceAck.

Desktop: place_call/answer_call call acquire_port_mapping() and
fill caller/callee_mapped_addr. STUN+relay combined NAT detection.

571 tests pass (66 new), 0 regressions, 0 warnings.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 10:17:17 +04:00

140 lines
7.4 KiB
Markdown

# PRD: Network Awareness
> Phase: Implemented (core path)
> Status: Ready for testing
> Platform: Android native Kotlin app (com.wzp)
## Problem
WarzonePhone's quality controller (`AdaptiveQualityController`) had a `signal_network_change()` API for proactive adaptation to WiFi↔cellular transitions, but nothing called it. Network handoffs during calls were only detected reactively via jitter spikes — by which time the user had already experienced degraded audio.
## Solution
Integrate Android's `ConnectivityManager.NetworkCallback` to detect network transport changes in real-time and feed them to the quality controller. This enables:
1. **Preemptive quality downgrade** when switching from WiFi to cellular
2. **FEC boost** (10-second window with +0.2 ratio) after any network change
3. **Faster downgrade thresholds** on cellular (2 consecutive reports vs 3 on WiFi)
## Architecture
```
┌──────────────────────────────────────────────────────────────┐
│ Android │
│ │
│ ConnectivityManager │
│ │ NetworkCallback │
│ ▼ │
│ NetworkMonitor.kt │
│ │ onNetworkChanged(type, bandwidthKbps) │
│ ▼ │
│ CallViewModel.kt ──► WzpEngine.onNetworkChanged() │
│ │ JNI │
│ ▼ │
│ jni_bridge.rs: nativeOnNetworkChanged(handle, type, bw) │
│ │ │
│ ▼ │
│ engine.rs: state.pending_network_type.store(type) │
│ │ AtomicU8 (lock-free) │
│ ▼ │
│ recv task: quality_ctrl.signal_network_change(ctx) │
│ │ │
│ ├─ Preemptive downgrade (WiFi → cellular) │
│ ├─ FEC boost 10s │
│ └─ Faster cellular thresholds │
└──────────────────────────────────────────────────────────────┘
```
## Network Classification
`NetworkMonitor` classifies the active transport without requiring `READ_PHONE_STATE` permission by using bandwidth heuristics:
| Downstream Bandwidth | Classification | Rust `NetworkContext` |
|----------------------|---------------|----------------------|
| N/A (WiFi transport) | WiFi | `WiFi` |
| >= 100 Mbps | 5G NR | `Cellular5g` |
| >= 10 Mbps | LTE | `CellularLte` |
| < 10 Mbps | 3G or worse | `Cellular3g` |
| Ethernet | WiFi (equivalent) | `WiFi` |
| Network lost | None | `Unknown` |
## Cross-Task Signaling
The network type is communicated from the JNI thread to the recv task via `AtomicU8` — the same pattern used for `pending_profile` (adaptive quality profile switches):
```
JNI thread recv task (tokio)
│ │
│ store(type, Release) │
│──────────────────────────────►│
│ │ swap(0xFF, Acquire)
│ │ if != 0xFF:
│ │ quality_ctrl.signal_network_change(ctx)
│ │
```
Sentinel value `0xFF` means "no change pending". The recv task polls on every received packet (~20-40ms), so latency is bounded by the inter-packet interval.
## Components
### New File
| File | Purpose |
|------|---------|
| `android/.../net/NetworkMonitor.kt` | ConnectivityManager callback, transport classification, deduplication |
### Modified Files
| File | Change |
|------|--------|
| `android/.../engine/WzpEngine.kt` | Added `onNetworkChanged()` method + `nativeOnNetworkChanged` external |
| `android/.../ui/call/CallViewModel.kt` | Instantiates NetworkMonitor, wires callback, register/unregister lifecycle |
| `crates/wzp-android/src/jni_bridge.rs` | Added `Java_com_wzp_engine_WzpEngine_nativeOnNetworkChanged` JNI entry |
| `crates/wzp-android/src/engine.rs` | Added `pending_network_type: AtomicU8` to EngineState, recv task polls it |
### Unchanged (already implemented)
| File | API |
|------|-----|
| `crates/wzp-proto/src/quality.rs` | `AdaptiveQualityController::signal_network_change(NetworkContext)` |
| `crates/wzp-transport/src/path_monitor.rs` | `PathMonitor::detect_handoff()` (available for future use) |
## Deferred Work
### Tauri Desktop App (com.wzp.desktop)
~~The Tauri engine doesn't use `AdaptiveQualityController` — quality is resolved once at call start.~~ **Update (2026-04-13):** Desktop now has `AdaptiveQualityController` wired into the recv task with `pending_profile` AtomicU8 bridge. Network monitoring on desktop is now feasible — the blocker was adaptive quality, which is done. Remaining work: platform-specific network change detection (macOS: `SCNetworkReachability` or `NWPathMonitor`; Linux: `netlink` socket).
### Mid-Call ICE Re-gathering — PARTIALLY IMPLEMENTED (2026-04-14)
When the device's IP address changes, the system now:
1. Re-gather local host candidates (`local_host_candidates()`) ✅
2. Re-probe STUN (`stun::discover_reflexive()` + `portmap::acquire_port_mapping()`) ✅
3. Send updated candidates to the peer (`CandidateUpdate` signal message) ✅
4. Relay forwards `CandidateUpdate` to peer (same pattern as `MediaPathReport`) ✅
5. Peer receives and can parse via `IceAgent::apply_peer_update()`
6. Attempt new dual-path race for path upgrade — **NOT YET WIRED** (transport hot-swap)
`NetworkMonitor.onIpChanged` fires on `onLinkPropertiesChanged` — the hook is ready.
The signaling plane is fully implemented via `IceAgent` + `CandidateUpdate`.
Remaining: wire `onIpChanged` → JNI → `pending_ice_regather` AtomicBool → recv task → `ice_agent.re_gather()` → transport swap.
New modules added in Phase 8 (Tailscale-inspired):
- `crates/wzp-client/src/ice_agent.rs` — candidate lifecycle management
- `crates/wzp-client/src/stun.rs` — public STUN server probing (independent of relay)
- `crates/wzp-client/src/portmap.rs` — NAT-PMP/PCP/UPnP port mapping
- `crates/wzp-client/src/netcheck.rs` — comprehensive network diagnostic
## Testing
1. Build native APK
2. Start a call on WiFi
3. Verify logcat: `quality controller: network context updated` with `ctx=WiFi`
4. Disable WiFi → device falls to cellular
5. Verify logcat: `ctx=CellularLte` (or `Cellular5g`/`Cellular3g`)
6. Verify FEC boost activates (check quality_ctrl logs)
7. Verify preemptive quality downgrade (tier drops one level on WiFi→cellular)
8. Re-enable WiFi → verify transition back
9. Rapid WiFi toggle (5x in 10s) → verify no crashes, deduplication works
10. Airplane mode → verify `onLost` fires with `TYPE_NONE`