---
tags: [prd, wzp]
type: prd
---

# PRD: Transport Feedback & Bandwidth Estimator

> **Status:** proposed
> **Resolves:** Audit W6 (no BWE), W14 (no receiver→sender feedback channel).
> **Depends on:** PRD #1 (wire format v2, for u32 seq).

## Problem

`AdaptiveQualityController` decides tier transitions from loss% and RTT only. Quinn exposes the congestion window and bytes in flight, but we don't consume them. There is no receiver→sender feedback channel beyond the inline 4-byte `QualityReport`.

Consequences:
- On stable links with spare capacity, we never upgrade past the declared profile (audio stays stuck at Opus 24 k even when 64 k is available).
- Quality oscillates between adjacent tiers when conditions sit near a tier boundary.
- **No bandwidth-aware adaptation = no usable video.** Video without BWE either oscillates wildly or never uses the available capacity.

## Goals

- Continuous bandwidth estimate per session, surfaced to the adaptation controllers.
- Receiver→sender feedback at a ~50 ms cadence carrying ACK/NACK/REMB.
- Audio benefits immediately (smarter upgrades, fewer oscillations).
- Video uses BWE as its primary input (PRD #7).

## Non-goals

- Replacing Quinn's congestion controller; we ride on top of it.
- Cross-stream BWE (each session estimates independently in v1).

## Design

### `SignalMessage::TransportFeedback`

This is a new signal variant, sent on the existing signal stream every 50 ms or every N media packets, whichever comes first:

```rust
pub struct TransportFeedback {
    pub version: u8,           // PRD #4 W12: always present
    pub stream_id: u8,         // 0 for session-wide; >0 for per-stream
    pub acked_seqs: Vec<u32>,  // recent seqs received OK (RLE-compressed on the wire)
    pub nacked_seqs: Vec<u32>, // recent seqs missing (RLE-compressed on the wire)
    pub remb_bps: u32,         // receiver's estimated max bandwidth
    pub recv_time_us: u64,     // arrival time for sender-side jitter calc
}
```

RLE compression keeps the wire size bounded (typical payload ~50 B).

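The PRD leaves the RLE layout open. Below is a minimal sketch. It assumes the receiver sends each seq list sorted and collapses consecutive seqs into `(start, len)` runs. `Run`, `encode_runs`, and `decode_runs` are hypothetical names, not existing `wzp-proto` API, and the sketch ignores u32 wraparound.

```rust
/// One run of consecutive seqs: start, start + 1, ..., start + len - 1.
/// Hypothetical wire unit; the PRD does not specify the RLE layout.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Run {
    pub start: u32,
    pub len: u32,
}

/// Collapse a sorted, de-duplicated seq list into runs (no u32 wraparound handling).
pub fn encode_runs(seqs: &[u32]) -> Vec<Run> {
    let mut runs: Vec<Run> = Vec::new();
    for &seq in seqs {
        if let Some(run) = runs.last_mut() {
            // Extend the current run if `seq` directly follows it.
            if run.start.checked_add(run.len) == Some(seq) {
                run.len += 1;
                continue;
            }
        }
        runs.push(Run { start: seq, len: 1 });
    }
    runs
}

/// Expand runs back into individual seqs.
pub fn decode_runs(runs: &[Run]) -> Vec<u32> {
    runs.iter()
        .flat_map(|r| (0..r.len).map(move |i| r.start + i))
        .collect()
}

fn main() {
    let acked: [u32; 7] = [100, 101, 102, 103, 107, 108, 200];
    let runs = encode_runs(&acked);
    // 7 seqs collapse into 3 runs: (100, 4), (107, 2), (200, 1).
    println!("{:?}", runs);
    assert_eq!(decode_runs(&runs), acked.to_vec());
}
```

A window in which every packet arrived collapses to a single run, which is what keeps the typical payload small.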
### `BandwidthEstimator` (in `wzp-proto`)

```rust
use std::sync::atomic::AtomicU64;

pub struct BandwidthEstimator {
    cwnd_bps: AtomicU64,        // from Quinn path stats
    bytes_in_flight: AtomicU64, // from Quinn path stats
    peer_remb_bps: AtomicU64,   // from TransportFeedback
    smoothed_bps: AtomicU64,    // EWMA output
}

impl BandwidthEstimator {
    pub fn update_from_quinn(&self, stats: &QuinnPathStats) {
        todo!("store cwnd_bps and bytes_in_flight")
    }

    pub fn update_from_peer(&self, fb: &TransportFeedback) {
        todo!("store fb.remb_bps")
    }

    pub fn target_send_bps(&self) -> u64 {
        // 0.9 × min(cwnd_bps, peer_remb_bps), EWMA-smoothed
        todo!()
    }
}
```

The estimator fuses two signals and smooths the result:
1. **Quinn cwnd.** Conservative ceiling: sending faster than the congestion window allows only queues or drops packets.
2. **Peer REMB.** The receiver's view of what it can actually consume (after its own jitter buffer, decode budget, etc.).
3. **EWMA smoothing.** Half-life ~2 s; damps oscillation.

Target = 90 % of `min(cwnd, remb)`, leaving headroom for probing upward.

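Here is a minimal single-threaded sketch of the fusion rule. It rests on three assumptions:
- `cwnd_bps` is derived from Quinn's window in bytes divided by RTT, since a window-limited sender can push at most about one cwnd per RTT.
- The EWMA weight comes from the ~2 s half-life and the time elapsed since the previous sample.
- The 0.9 factor is applied after smoothing.

`BweSketch`, `cwnd_to_bps`, and `on_sample` are hypothetical names. The proposed struct's `AtomicU64` fields are left out for clarity.

```rust
use std::time::Duration;

/// Hypothetical single-threaded stand-in for `BandwidthEstimator`.
pub struct BweSketch {
    half_life: Duration,
    smoothed_bps: Option<f64>, // EWMA of min(cwnd rate, peer REMB)
}

impl BweSketch {
    pub fn new(half_life: Duration) -> Self {
        Self { half_life, smoothed_bps: None }
    }

    /// Assumed conversion: a congestion window (bytes) allows ~one cwnd per RTT.
    pub fn cwnd_to_bps(cwnd_bytes: u64, rtt: Duration) -> f64 {
        cwnd_bytes as f64 * 8.0 / rtt.as_secs_f64().max(1e-3)
    }

    /// Feed one sample; `dt` is the time since the previous sample.
    pub fn on_sample(&mut self, cwnd_bps: f64, peer_remb_bps: f64, dt: Duration) {
        // Conservative fusion: never exceed the sender's or the receiver's ceiling.
        let raw = cwnd_bps.min(peer_remb_bps);
        // Time-aware weight: after one half-life, the old estimate keeps 50 % weight.
        let alpha = 1.0 - 0.5_f64.powf(dt.as_secs_f64() / self.half_life.as_secs_f64());
        self.smoothed_bps = Some(match self.smoothed_bps {
            None => raw, // first sample seeds the filter
            Some(prev) => prev + alpha * (raw - prev),
        });
    }

    /// 90 % of the smoothed estimate, leaving headroom for probing.
    pub fn target_send_bps(&self) -> u64 {
        (0.9 * self.smoothed_bps.unwrap_or(0.0)) as u64
    }
}

fn main() {
    let mut bwe = BweSketch::new(Duration::from_secs(2));
    let cwnd_bps = BweSketch::cwnd_to_bps(62_500, Duration::from_millis(100)); // 5 Mbps
    bwe.on_sample(cwnd_bps, 2_000_000.0, Duration::from_millis(50));
    // prints "target = 1800000 bps": REMB (2 Mbps) is the binding limit
    println!("target = {} bps", bwe.target_send_bps());
}
```

Applying 0.9 after smoothing gives the same result as smoothing `0.9 × min(...)`, because the EWMA is linear.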
### Adaptation controller integration

`AdaptiveQualityController::tick()` already consumes loss/RTT/jitter. Add BWE input:

```rust
// Cast to f64: Rust has no implicit u64 × f64 arithmetic, so `u64 * 1.3` won't compile.
if self.bwe.target_send_bps() as f64 > self.current_tier_ceiling_bps() as f64 * 1.3
    && consecutive_upgrade_reports >= UPGRADE_THRESHOLD
{
    self.upgrade_one_tier();
}
```

Upgrades are gated on BWE *headroom* (a target more than 30 % above the current tier's ceiling), not just on clean reports. This eliminates the "always at Opus 24 k on a fiber link" pathology.

### Probing

To detect unused capacity, the sender occasionally adds 5–10 % padding/FEC during otherwise-clean windows. If `cwnd` doesn't drop and `remb` doesn't fall, the headroom is real, so upgrade. If the signals degrade, back off. This is cheap and standard.

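The following sketches the probe gate and verdict. It assumes the sender snapshots cwnd and REMB before a probe and compares them at the end of the probe window. The 1 %-over-10 s loss gate comes from the Risks section, and the padding ratio comes from the 5–10 % range above. `ProbeController`, `ProbeVerdict`, and the small `drop_tolerance` are hypothetical; the tolerance exists so that measurement noise doesn't count as a drop.

```rust
/// Hypothetical outcome of one probing window.
#[derive(Debug, PartialEq, Eq)]
pub enum ProbeVerdict {
    HeadroomConfirmed, // neither cwnd nor REMB fell: upgrade one tier
    BackOff,           // a signal degraded: stop probing, keep the current tier
}

pub struct ProbeController {
    pub padding_ratio: f64,  // extra padding/FEC as a fraction of media rate (0.05..=0.10)
    pub max_loss_pct: f64,   // probe only when smoothed loss is below this (1 %)
    pub drop_tolerance: f64, // fractional dip tolerated before a signal counts as "fell"
}

impl ProbeController {
    /// Gate: probe only during clean windows (smoothed loss over the last ~10 s).
    pub fn may_probe(&self, smoothed_loss_pct_10s: f64) -> bool {
        smoothed_loss_pct_10s < self.max_loss_pct
    }

    /// Extra bits per second to send while probing.
    pub fn padding_bps(&self, media_bps: u64) -> u64 {
        (media_bps as f64 * self.padding_ratio) as u64
    }

    /// Compare cwnd and REMB before the probe with their values at its end.
    pub fn verdict(&self, cwnd_before: f64, cwnd_after: f64, remb_before: f64, remb_after: f64) -> ProbeVerdict {
        let held = |before: f64, after: f64| after >= before * (1.0 - self.drop_tolerance);
        if held(cwnd_before, cwnd_after) && held(remb_before, remb_after) {
            ProbeVerdict::HeadroomConfirmed
        } else {
            ProbeVerdict::BackOff
        }
    }
}

fn main() {
    let probe = ProbeController { padding_ratio: 0.05, max_loss_pct: 1.0, drop_tolerance: 0.02 };
    if probe.may_probe(0.3) {
        println!("probe with {} bps of padding", probe.padding_bps(64_000));
        println!("{:?}", probe.verdict(5e6, 5.1e6, 4e6, 4e6));
    }
}
```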
## Implementation outline

1. New `wzp-proto::bwe::BandwidthEstimator`.
2. `wzp-transport` exposes `QuinnPathStats { cwnd_bps, bytes_in_flight, rtt_ms }`; this partially exists already via `QuinnPathSnapshot`.
3. `SignalMessage::TransportFeedback` variant + serde.
4. Receiver side: track recent seqs in a ring buffer; emit feedback every 50 ms.
5. Sender side: BWE consumes its own Quinn stats plus incoming feedback.
6. `AdaptiveQualityController::set_bwe(&BandwidthEstimator)`.
7. Prometheus metrics: `wzp_session_bwe_bps`, `wzp_session_remb_bps`, `wzp_session_cwnd_bps`.
8. Probing logic behind a flag for the first deployment.

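Step 4's receiver side can be sketched as follows. For brevity, this uses a `BTreeSet` window instead of the ring buffer the outline calls for. It reports every seq from the window floor up to the highest seq seen and ignores u32 wraparound. `SeqWindow` and the 64-seq `WINDOW` are hypothetical. A real receiver would probably also wait out some reordering tolerance before NACKing a gap.

```rust
use std::collections::BTreeSet;

/// Hypothetical window size: how many seqs up to the highest one are reported.
const WINDOW: u32 = 64;

/// Hypothetical receiver-side tracker of recent media seqs (ignores u32 wraparound).
#[derive(Default)]
pub struct SeqWindow {
    first: Option<u32>,      // first seq ever seen; nothing earlier is reported missing
    highest: Option<u32>,    // highest seq seen so far
    received: BTreeSet<u32>, // seqs received inside the current window
}

impl SeqWindow {
    /// Lowest seq still inside the window.
    fn floor(&self, highest: u32) -> u32 {
        highest.saturating_sub(WINDOW - 1).max(self.first.unwrap_or(0))
    }

    pub fn on_packet(&mut self, seq: u32) {
        self.first.get_or_insert(seq);
        let highest = self.highest.map_or(seq, |h| h.max(seq));
        self.highest = Some(highest);
        self.received.insert(seq);
        // Keep only seqs >= floor; older ones slid out of the window.
        let floor = self.floor(highest);
        self.received = self.received.split_off(&floor);
    }

    /// (acked, nacked) seqs from the window floor up to the highest seq seen.
    pub fn snapshot(&self) -> (Vec<u32>, Vec<u32>) {
        let Some(highest) = self.highest else {
            return (Vec::new(), Vec::new());
        };
        (self.floor(highest)..=highest).partition(|s| self.received.contains(s))
    }
}

fn main() {
    let mut win = SeqWindow::default();
    for seq in [10, 11, 13, 14, 17] {
        win.on_packet(seq);
    }
    let (acked, nacked) = win.snapshot();
    println!("acked={acked:?} nacked={nacked:?}");
}
```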
## Acceptance criteria

- On a shaped 5 Mbps link starting at Opus 24 k, the controller upgrades to Opus 64 k within 30 s.
- On a shaped 50 kbps link, the controller stays at Opus 6 k and does not oscillate.
- Feedback wire size < 100 B per 50 ms (i.e. < 2 kB/s, or < 16 kbps of overhead).
- Probing finds headroom on a 10 Mbps link in < 60 s.

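Working through the feedback overhead bound at one report per 50 ms (20 reports/s):

$$
\frac{100\ \text{B}}{50\ \text{ms}} = 100\ \text{B} \times 20\ \text{s}^{-1} = 2000\ \text{B/s} = 16\,000\ \text{bit/s} = 16\ \text{kbps}
$$

At the ~50 B typical payload quoted for `TransportFeedback`, the same cadence costs about 8 kbps.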
## Risks

- **Probing-induced loss on already-saturated links.** Mitigation: probe only when smoothed loss < 1 % over 10 s.
- **Feedback storm under heavy loss.** Mitigation: cap the feedback rate at 20 Hz, independent of the media rate.
- **Quinn's cwnd can be misleading when QUIC runs over some VPNs.** Mitigation: REMB serves as a cross-check; take the min of the two.

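Combining the emission rule from the `TransportFeedback` section with the 20 Hz cap gives a small scheduler. The rule is "every 50 ms or every N media packets, whichever comes first". In this sketch, `FeedbackTimer`, `max_packets`, and `min_gap` are hypothetical names. Time is passed in explicitly so the logic is testable.

```rust
use std::time::{Duration, Instant};

/// Hypothetical scheduler for TransportFeedback emission on the receiver.
pub struct FeedbackTimer {
    interval: Duration, // regular cadence (50 ms)
    max_packets: u32,   // "every N media packets" trigger
    min_gap: Duration,  // rate cap: 20 Hz means at least 50 ms between reports
    last_sent: Instant,
    packets_since: u32,
}

impl FeedbackTimer {
    pub fn new(interval: Duration, max_packets: u32, min_gap: Duration, now: Instant) -> Self {
        Self { interval, max_packets, min_gap, last_sent: now, packets_since: 0 }
    }

    /// Count a media packet received since the last report.
    pub fn on_packet(&mut self) {
        self.packets_since += 1;
    }

    /// True if a report should go out at `now`; resets the counters when it fires.
    pub fn poll(&mut self, now: Instant) -> bool {
        let elapsed = now.saturating_duration_since(self.last_sent);
        let due = elapsed >= self.interval || self.packets_since >= self.max_packets;
        if due && elapsed >= self.min_gap {
            self.last_sent = now;
            self.packets_since = 0;
            true
        } else {
            false
        }
    }
}

fn main() {
    let t0 = Instant::now();
    let ms = Duration::from_millis;
    let mut timer = FeedbackTimer::new(ms(50), 8, ms(50), t0);
    for _ in 0..10 {
        timer.on_packet();
    }
    // The packet trigger is due, but the 20 Hz cap holds the report until 50 ms have passed.
    assert!(!timer.poll(t0 + ms(20)));
    assert!(timer.poll(t0 + ms(50)));
}
```

If `min_gap` equals the 50 ms interval, as the 20 Hz cap implies, the packet trigger can never fire early. It only changes behavior when the cap is looser than the timer.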
## Effort

~4 engineer-days (Wave 2 tasks T2.1–T2.3).