docs: PRDs for P2P direct calls and coordinated codec switching

PRD-p2p-direct.md: STUN-based NAT traversal for direct QUIC
connections between clients. True E2E with mutual TLS cert pinning
via identity fingerprints. Hybrid mode: try P2P, fall back to relay.
4 phases: STUN discovery, hole punching, P2P adaptive quality,
seamless relay-to-P2P migration.

PRD-coordinated-codec.md: Relay acts as quality judge — monitors
per-participant loss/RTT/jitter, sends quality directives. Downgrade
is immediate (match weakest link), upgrade is consensual (all
participants must agree, synchronized switch at agreed timestamp).
Covers asymmetric encoding in SFU and P2P→relay backporting strategy.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Siavash Sameni
2026-04-08 08:12:12 +04:00
parent b00db5dfdc
commit 898c1ea32b
2 changed files with 344 additions and 0 deletions

docs/PRD-coordinated-codec.md Normal file
@@ -0,0 +1,198 @@
# PRD: Coordinated Codec Switching (Relay-Judged Quality)
## Problem
The current adaptive quality system (`QualityAdapter` in call.rs) exists but isn't wired into either engine. Clients encode at a fixed quality chosen at call start. When network conditions change mid-call, audio degrades instead of gracefully stepping down. When conditions improve, clients stay on low quality unnecessarily.
Additionally, in SFU mode with multiple participants, uncoordinated codec switching creates asymmetry: if client A upgrades to 64k while B stays on 24k, bandwidth is wasted. Participants should switch together.
## Solution
The **relay acts as the quality judge** since it sees both sides of every connection. It monitors packet loss, jitter, and RTT per participant, then signals quality recommendations. Clients react to these signals with coordinated codec switches.
## Architecture
```
┌──────────┐         ┌─────────┐         ┌──────────┐
│ Client A │◄───────►│  Relay  │◄───────►│ Client B │
│          │         │ (judge) │         │          │
│ Encoder  │         │         │         │ Encoder  │
│ Decoder  │         │ Monitor │         │ Decoder  │
└──────────┘         │ per-peer│         └──────────┘
                     │ quality │
                     └────┬────┘
                          │
               Quality Signals:
               - StableSignal    (conditions good)
               - DegradeSignal   (conditions bad)
               - UpgradeProposal (try higher quality?)
               - UpgradeConfirm  (all agreed, switch at T)
```
## Quality Classification (Relay-Side)
The relay monitors each participant's connection quality:
| Condition | Classification | Action |
|-----------|---------------|--------|
| loss >= 15% OR RTT >= 200ms | Critical | Immediate downgrade signal |
| loss >= 5% OR RTT >= 100ms | Degraded | Downgrade signal after 3 reports |
| loss < 2% AND RTT < 80ms | Good | Stable signal |
| loss < 1% AND RTT < 50ms for 30s | Excellent | Upgrade proposal |
| loss < 0.5% AND RTT < 30ms for 60s | Studio | Studio upgrade proposal |
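Read as a pure function, the table above might look like the following sketch (`QualityClass` and `classify` are illustrative names, not the relay's actual API; the "for 30s/60s" conditions are passed in as a stability duration):

```rust
/// Minimal sketch of the relay-side classifier implied by the table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum QualityClass {
    Critical,
    Degraded,
    Good,
    Excellent,
    Studio,
    /// Between thresholds: keep the previous classification.
    Indeterminate,
}

/// Classify one participant from windowed loss (fraction, 0.0..=1.0),
/// RTT (ms), and how long conditions have held steady (seconds).
pub fn classify(loss: f32, rtt_ms: u32, stable_secs: u64) -> QualityClass {
    if loss >= 0.15 || rtt_ms >= 200 {
        QualityClass::Critical
    } else if loss >= 0.05 || rtt_ms >= 100 {
        QualityClass::Degraded
    } else if loss < 0.005 && rtt_ms < 30 && stable_secs >= 60 {
        QualityClass::Studio
    } else if loss < 0.01 && rtt_ms < 50 && stable_secs >= 30 {
        QualityClass::Excellent
    } else if loss < 0.02 && rtt_ms < 80 {
        QualityClass::Good
    } else {
        QualityClass::Indeterminate
    }
}
```

Note the thresholds leave gaps (e.g. 2-5% loss); the sketch keeps the previous classification there rather than flapping.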
## Coordinated Switching Protocol
### Downgrade (fast, safety-first)
1. Relay detects degradation for ANY participant
2. Relay sends `QualityDirective { recommended_profile, reason: CoordinatedDowngrade }` to ALL participants
3. ALL participants immediately switch encoder to the recommended profile
4. No negotiation — downgrade is mandatory and instant
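The "match weakest link" rule behind the mandatory downgrade reduces to a `min` over an ordered profile ladder; a sketch with illustrative variants (not the codebase's real `QualityProfile`):

```rust
/// Illustrative profile ladder, ordered worst-to-best so that
/// `min` picks the most conservative option.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Profile {
    Low6k,
    Mid24k,
    High64k,
}

/// The relay recommends the lowest profile any participant can sustain;
/// everyone switches to it immediately, with no negotiation.
pub fn downgrade_target(sustainable: &[Profile]) -> Option<Profile> {
    sustainable.iter().copied().min()
}
```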
### Upgrade (slow, consensual)
1. Relay detects sustained good conditions for ALL participants (threshold: 30s stable)
2. Relay sends `UpgradeProposal { target_profile, switch_delay_ms }` to all
3. Each client responds with `UpgradeResponse { accepted: true/false }`
4. If ALL accept within 5s → Relay sends `UpgradeConfirm { profile, switch_at_session_ms }`
5. All clients switch encoder at the agreed timestamp (relative to session clock)
6. If ANY rejects or times out → upgrade cancelled, stay on current profile
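A minimal sketch of the relay-side negotiation state for steps 3-6 (names are assumptions; enforcing the 5 s response window is left to the caller):

```rust
use std::collections::HashSet;

/// Tracks one in-flight upgrade proposal.
pub struct UpgradeVote {
    pending: HashSet<u64>, // participant ids that have not yet responded
    rejected: bool,
}

impl UpgradeVote {
    pub fn new(participants: &[u64]) -> Self {
        Self {
            pending: participants.iter().copied().collect(),
            rejected: false,
        }
    }

    /// Record one `UpgradeResponse`. Returns `Some(true)` once ALL have
    /// accepted (relay may then send `UpgradeConfirm`), `Some(false)` on
    /// the first rejection, and `None` while responses are outstanding.
    pub fn record(&mut self, id: u64, accepted: bool) -> Option<bool> {
        self.pending.remove(&id);
        if !accepted {
            self.rejected = true;
        }
        if self.rejected {
            Some(false)
        } else if self.pending.is_empty() {
            Some(true)
        } else {
            None
        }
    }
}
```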
### Asymmetric Encoding (SFU optimization)
In SFU mode, each client encodes independently. The relay could allow:
- Client A (strong connection): encode at 64k
- Client B (weak connection): encode at 6k
- Relay forwards A's 64k to B's decoder (auto-switch handles it)
- B benefits from A's quality without needing to send at 64k
This requires NO protocol changes — just each client independently following the relay's recommendation for their own encoding quality. The decoder already handles any codec.
### Split Network Consideration
If participant A has great quality but participant C has terrible quality:
- Option 1: **Match weakest link** — everyone encodes at C's level (current approach, simple)
- Option 2: **Per-participant recommendations** — A encodes at 64k, C encodes at 6k. B (good connection) receives and decodes both. Works because decoders auto-switch per packet.
- Option 3: **Relay transcoding** — relay re-encodes A's 64k as 6k for C. Adds CPU on relay, but saves bandwidth for C. Future feature.
Recommended: start with Option 1 (match weakest), add Option 2 later.
## Signal Messages (New/Modified)
```rust
/// Quality signal from relay to client
QualityDirective {
/// Recommended profile to use for encoding
recommended_profile: QualityProfile,
/// Reason for the recommendation
reason: QualityReason,
}
enum QualityReason {
/// Network conditions require this quality level
NetworkCondition,
/// Coordinated upgrade — all participants agreed
CoordinatedUpgrade,
/// Coordinated downgrade — weakest link determines level
CoordinatedDowngrade,
}
/// Upgrade proposal from relay
UpgradeProposal {
target_profile: QualityProfile,
/// Milliseconds from now when the switch would happen
switch_delay_ms: u32,
}
/// Client response to upgrade proposal
UpgradeResponse {
accepted: bool,
}
/// Confirmed upgrade — all clients switch at this time
UpgradeConfirm {
profile: QualityProfile,
/// Session-relative timestamp to switch (ms since call start)
switch_at_session_ms: u64,
}
```
## Relay-Side Implementation
### Per-Participant Quality Tracking
```rust
struct ParticipantQuality {
/// Sliding window of recent observations
loss_samples: VecDeque<f32>, // last 30 seconds
rtt_samples: VecDeque<u32>, // last 30 seconds
jitter_samples: VecDeque<u32>,
/// Current classification
classification: QualityClass,
/// How long current classification has been stable
stable_since: Instant,
}
```
### Quality Monitor Task (on relay)
Runs alongside the SFU forwarding loop:
1. Every 1 second, compute per-participant quality from QUIC connection stats
2. Classify each participant
3. If ANY participant degrades → send downgrade to ALL
4. If ALL participants stable for threshold → propose upgrade
5. Track upgrade negotiation state
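One tick of that loop, sketched with illustrative stand-in types (the real monitor would read `path_quality()` and `QualityReport` data instead of taking arguments):

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Class { Critical, Degraded, Good, Excellent }

#[derive(Debug, PartialEq)]
pub enum MonitorAction { SendDowngrade, ProposeUpgrade, Hold }

/// `all_stable_secs` is how long EVERY participant has been at least Good.
pub fn tick(
    classes: &[Class],
    all_stable_secs: u64,
    upgrade_threshold_secs: u64,
) -> MonitorAction {
    if classes.iter().any(|c| matches!(c, Class::Critical | Class::Degraded)) {
        MonitorAction::SendDowngrade // step 3: ANY degraded → downgrade ALL
    } else if !classes.is_empty()
        && all_stable_secs >= upgrade_threshold_secs
        && classes.iter().all(|c| matches!(c, Class::Good | Class::Excellent))
    {
        MonitorAction::ProposeUpgrade // step 4: ALL stable long enough
    } else {
        MonitorAction::Hold
    }
}
```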
### Integration with Existing Code
The relay already has access to:
- `QuinnTransport::path_quality()` → loss, RTT, jitter, bandwidth estimates
- `QualityReport` embedded in media packet headers
- Per-session metrics in `RelayMetrics`
The quality monitor just needs to read these existing metrics and produce signals.
## Client-Side Implementation
### Handling Quality Signals
In the recv loop (both Android engine and desktop engine):
```rust
SignalMessage::QualityDirective { recommended_profile, .. } => {
// Immediate: switch encoder to recommended profile
encoder.set_profile(recommended_profile)?;
fec_enc = create_encoder(&recommended_profile);
frame_samples = frame_samples_for(&recommended_profile);
info!(codec = ?recommended_profile.codec, "quality directive: switched");
}
```
### P2P Quality (simpler case)
For P2P calls (no relay), both clients directly observe quality:
1. Each client runs its own `QualityAdapter` on the direct connection
2. When quality changes, client proposes to peer via signal
3. Simpler negotiation: only 2 parties, no relay middleman
4. Same coordinated switching logic, just peer-to-peer signals
## Backporting P2P → Relay
The quality monitoring and codec switching logic is identical:
- **P2P**: client observes quality directly → proposes switch to peer
- **Relay**: relay observes quality → proposes switch to all clients
The only difference is WHO makes the decision (client vs relay) and HOW many participants need to agree (2 vs N).
Implementation strategy: build for P2P first (simpler, 2 parties), then wrap the same logic with relay-mediated signals for SFU mode.
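That shared shape could be captured with a small trait, so the switching logic is written once and parameterized over who judges and how many must agree (purely illustrative, not an existing abstraction in the codebase):

```rust
/// The switching logic only needs to know how many acks confirm a switch.
pub trait QualityJudge {
    fn required_acks(&self) -> usize;
}

/// P2P: each client judges its own direct connection; both peers must agree.
pub struct P2pJudge;
impl QualityJudge for P2pJudge {
    fn required_acks(&self) -> usize { 2 }
}

/// SFU: the relay judges; all N participants must agree.
pub struct RelayJudge { pub participants: usize }
impl QualityJudge for RelayJudge {
    fn required_acks(&self) -> usize { self.participants }
}
```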
## Milestones
| Phase | Scope | Effort |
|-------|-------|--------|
| 1 | Relay-side quality monitor (per-participant tracking) | 1 day |
| 2 | Downgrade signal (immediate, match weakest) | 1 day |
| 3 | Client handling of QualityDirective | 1 day (both engines) |
| 4 | Upgrade proposal + negotiation protocol | 2 days |
| 5 | P2P quality adaptation (direct observation) | 1 day |
| 6 | Per-participant asymmetric encoding (Option 2) | 1 day |

docs/PRD-p2p-direct.md Normal file

@@ -0,0 +1,146 @@
# PRD: Peer-to-Peer Direct Calls (No Relay)
## Problem
All calls currently route through a relay, even 1-on-1 calls between clients that could reach each other directly. This adds latency (two hops instead of one), creates a single point of failure, and requires trusting the relay operator (even though media is encrypted, the relay sees metadata).
## Solution
For 1-on-1 calls, clients attempt a direct QUIC connection using STUN-discovered addresses. If NAT traversal succeeds, media flows directly between peers. If it fails, fall back to relay-assisted mode (current behavior).
## Architecture
```
Preferred (P2P):
    Client A ←──QUIC direct──→ Client B
    (no relay in media path, true E2E)

Fallback (Relay):
    Client A ──→ Relay ──→ Client B
    (current model)

Hybrid discovery:
    Client A ──→ Relay (signaling only) ──→ Client B
        ↓                                       ↓
    STUN server                            STUN server
        ↓                                       ↓
    Discover public IP:port    Discover public IP:port
        ↓                                       ↓
      Exchange candidates via relay signaling
                        ↓
      Attempt direct QUIC connection ←──→
```
## Why P2P = True E2E
- QUIC TLS handshake establishes encrypted tunnel directly between A and B
- No third party sees the traffic
- Certificate pinning via identity fingerprints: each client derives their TLS cert from their Ed25519 seed (same as relay identity). During QUIC handshake, both sides verify the peer's cert fingerprint against the known identity
- MITM elimination: if A knows B's fingerprint (from prior call, QR code, or identity server), any interceptor presents a different cert → fingerprint mismatch → connection rejected
- Stronger guarantee than relay-assisted: user doesn't need to trust relay operator
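The pinning check itself reduces to a constant-time fingerprint comparison; a sketch, assuming a 32-byte fingerprint (e.g. SHA-256 over the cert DER) computed elsewhere:

```rust
/// Compare the fingerprint of the cert the peer presented against the
/// fingerprint expected for that identity. Illustrative names.
pub fn verify_peer_fingerprint(
    expected: &[u8; 32],
    presented: &[u8; 32],
) -> Result<(), &'static str> {
    // Constant-time compare: OR together byte differences so timing
    // doesn't reveal how long a matching prefix was.
    let diff = expected
        .iter()
        .zip(presented.iter())
        .fold(0u8, |acc, (a, b)| acc | (a ^ b));
    if diff == 0 {
        Ok(())
    } else {
        Err("fingerprint mismatch: possible MITM, reject the connection")
    }
}
```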
## Requirements
### Phase 1: STUN Discovery
1. **STUN client**: lightweight UDP-based STUN client to discover public IP:port
- Use existing public STUN servers (stun.l.google.com:19302, etc.)
- Or run a STUN server alongside the relay
- Discover: local addresses, server-reflexive addresses (STUN), relay candidates (TURN/relay fallback)
2. **Candidate gathering**: on call initiation, gather all candidates:
- Host candidates: local network interfaces
- Server-reflexive: STUN-discovered public IP:port
- Relay candidate: the relay's address (fallback)
3. **Candidate exchange**: via relay signaling channel (existing `IceCandidate` signal message)
- A sends candidates to relay → relay forwards to B
- B sends candidates to relay → relay forwards to A
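A sketch of the candidate model and preference order assumed above (the real `IceCandidate` message may carry different fields):

```rust
use std::net::SocketAddr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CandidateKind { Host, ServerReflexive, Relay }

#[derive(Debug, Clone)]
pub struct Candidate {
    pub kind: CandidateKind,
    pub addr: SocketAddr,
}

/// Try direct routes first, the relay last.
pub fn sort_by_preference(candidates: &mut [Candidate]) {
    candidates.sort_by_key(|c| match c.kind {
        CandidateKind::Host => 0,            // same LAN: cheapest
        CandidateKind::ServerReflexive => 1, // STUN-discovered public addr
        CandidateKind::Relay => 2,           // guaranteed fallback
    });
}
```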
### Phase 2: Direct Connection
1. **QUIC hole punching**: both clients simultaneously attempt QUIC connections to each other's candidates
- Quinn supports connecting to multiple addresses
- First successful connection wins
- Timeout after 3 seconds, fall back to relay
2. **Identity verification**: during QUIC handshake, verify peer's TLS cert fingerprint
- `server_config_from_seed()` already exists — derive client cert from identity seed
- Both sides present certs (mutual TLS)
- Verify fingerprint matches expected identity
3. **Media flow**: once connected, use existing `QuinnTransport` for media + signals
- Same `send_media()` / `recv_media()` API
- Same codec pipeline, FEC, jitter buffer
- No code changes needed in the call engine
### Phase 3: Adaptive Quality (P2P)
P2P connections have direct quality visibility — no relay middleman:
1. Both clients observe RTT, loss, jitter directly from QUIC stats
2. Adapt codec quality based on direct observations
3. Since only 2 participants, coordinated switching is simple: propose → ack → switch
This is the simplest case for adaptive quality. Once proven, backport the logic to relay-assisted mode.
### Phase 4: Hybrid Mode
1. **Call initiation**: always connect to relay for signaling
2. **Parallel attempt**: while relay call is active, attempt P2P in background
3. **Seamless migration**: if P2P succeeds, migrate media path from relay to direct
- Both clients switch simultaneously
- Relay connection kept alive for signaling (presence, room updates)
4. **Fallback**: if P2P connection drops, seamlessly fall back to relay
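The migration/fallback rule above can be reduced to a tiny sketch (names illustrative; the relay connection stays up for signaling either way):

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MediaPath { Relay, Direct }

#[derive(Debug, Clone, Copy)]
pub enum PathEvent { P2pEstablished, P2pLost }

/// Migrate media to the direct path when hole punching succeeds; fall
/// back to the still-connected relay the moment the direct path drops.
pub fn next_path(_current: MediaPath, event: PathEvent) -> MediaPath {
    match event {
        PathEvent::P2pEstablished => MediaPath::Direct,
        PathEvent::P2pLost => MediaPath::Relay,
    }
}
```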
## Security Properties
| Property | Relay Mode | P2P Mode |
|----------|-----------|----------|
| Encryption | ChaCha20-Poly1305 (app layer) | QUIC TLS 1.3 + ChaCha20-Poly1305 |
| Key exchange | Via relay signaling | Direct QUIC handshake |
| Identity verification | TOFU (server fingerprint) | Mutual TLS cert pinning |
| Metadata privacy | Relay sees who talks to whom | No third party in media path (STUN learns only IP:port) |
| MITM resistance | Depends on relay trust | Strong (cert pinning) |
| Forward secrecy | ECDH ephemeral keys | QUIC built-in + app-layer rekey |
## Implementation Notes
### STUN in Rust
Use `stun-rs` or `webrtc-rs` crate for STUN client. Minimal: just need Binding Request/Response to discover server-reflexive address.
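For scale: a Binding Request is just a 20-byte header with no attributes. A raw-bytes sketch of the wire format per RFC 5389 (real code would still use a crate and parse the XOR-MAPPED-ADDRESS attribute from the server's response):

```rust
/// Build a STUN Binding Request: 20-byte header, zero attributes.
pub fn binding_request(transaction_id: [u8; 12]) -> [u8; 20] {
    let mut msg = [0u8; 20];
    msg[0..2].copy_from_slice(&0x0001u16.to_be_bytes()); // type: Binding Request
    msg[2..4].copy_from_slice(&0u16.to_be_bytes());      // length: no attributes
    msg[4..8].copy_from_slice(&0x2112_A442u32.to_be_bytes()); // STUN magic cookie
    msg[8..20].copy_from_slice(&transaction_id);         // random transaction id
    msg
}
```

Send this over UDP to the STUN server; the Binding Response echoes the transaction id and carries the server-reflexive address.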
### Quinn Hole Punching
Quinn's `Endpoint` can both listen and connect. For hole punching:
```rust
let endpoint = create_endpoint(bind_addr, Some(server_config))?;
// Send connect to peer's address (opens NAT pinhole)
let conn = connect(&endpoint, peer_addr, "peer", client_config).await?;
// Simultaneously, peer connects to our address
// First successful handshake wins
```
### Client TLS Certificate
Already have `server_config_from_seed()` for relays. Create `client_config_from_seed()` that presents a TLS client certificate derived from the identity seed. The peer verifies this cert's fingerprint.
### Signaling via Relay
The existing relay connection carries `IceCandidate` signals. No new infrastructure needed — just use the relay as a dumb signaling pipe for candidate exchange.
## Non-Goals (v1)
- SFU over P2P (P2P is 1-on-1 only; multi-party uses relay SFU)
- TURN server (relay acts as the fallback, no separate TURN)
- mDNS local discovery (future)
- Mesh P2P for multi-party (future, complex)
## Milestones
| Phase | Scope | Effort |
|-------|-------|--------|
| 1 | STUN client + candidate gathering | 2 days |
| 2 | QUIC hole punching + identity verification | 3 days |
| 3 | Adaptive quality on P2P connection | 2 days |
| 4 | Hybrid mode (relay + P2P, seamless migration) | 3 days |