wz-phone/vault/PRDs/PRD-transport-feedback-bwe.md
Siavash Sameni ed8a7ae5aa docs: protocol audit 2026-05-25, update architecture + Obsidian vault
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-25 06:00:17 +04:00

---
tags: [prd, wzp]
type: prd
---
# PRD: Transport Feedback & Bandwidth Estimator
> **Status:** proposed
> **Resolves:** Audit W6 (no BWE), W14 (no receiver→sender feedback channel).
> **Depends on:** PRD #1 (wire format v2 — for u32 seq).
## Problem
`AdaptiveQualityController` decides tier transitions from loss% and RTT only. Quinn exposes congestion-window and bytes-in-flight, but we don't consume them. There is no receiver→sender feedback channel beyond the inline 4-byte `QualityReport`.
Consequences:
- On stable links with spare capacity, we never upgrade past the declared profile (audio stuck at Opus 24 k when 64 k is available).
- Oscillation between adjacent tiers on the boundary.
- **No bandwidth-aware adaptation = no usable video.** Video without BWE either oscillates wildly or never uses available capacity.
## Goals
- Continuous bandwidth estimate per session, surfaced to adaptation controllers.
- Receiver→sender feedback at ~50 ms cadence carrying ack/nack/remb.
- Audio benefits immediately (smarter upgrades, fewer oscillations).
- Video uses BWE as its primary input (PRD #7).
## Non-goals
- Replacing Quinn's congestion controller — we ride on top.
- Cross-stream BWE (each session estimates independently for v1).
## Design
### `SignalMessage::TransportFeedback`
New signal variant, sent on the existing signal stream every 50 ms or every N media packets, whichever comes first:
```rust
pub struct TransportFeedback {
    pub version: u8,           // PRD #4 W12: always present
    pub stream_id: u8,         // 0 for session-wide; >0 for per-stream
    pub acked_seqs: Vec<u32>,  // recent seqs received OK (RLE-compressed)
    pub nacked_seqs: Vec<u32>, // recent seqs missing (RLE-compressed)
    pub remb_bps: u32,         // receiver's estimated max bandwidth
    pub recv_time_us: u64,     // arrival-time for sender-side jitter calc
}
```
RLE compression keeps the wire size bounded (typical payload ~50 B).
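The PRD does not fix the RLE encoding, so the following is only a minimal sketch of one possible scheme: consecutive sequence numbers collapse into `(start, len)` runs. The `SeqRun` type and the `rle_encode`/`rle_decode` helpers are hypothetical names introduced for illustration, and the input is assumed to be sorted and de-duplicated.

```rust
/// Hypothetical run of consecutive sequence numbers:
/// `start, start + 1, ..., start + len - 1`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SeqRun {
    pub start: u32,
    pub len: u16,
}

/// Collapse a sorted, de-duplicated slice of sequence numbers into runs.
pub fn rle_encode(seqs: &[u32]) -> Vec<SeqRun> {
    let mut runs: Vec<SeqRun> = Vec::new();
    for &seq in seqs {
        // Try to extend the last run when `seq` is the next consecutive number.
        let extended = match runs.last_mut() {
            Some(run)
                if run.len < u16::MAX
                    && run.start.checked_add(u32::from(run.len)) == Some(seq) =>
            {
                run.len += 1;
                true
            }
            _ => false,
        };
        if !extended {
            runs.push(SeqRun { start: seq, len: 1 });
        }
    }
    runs
}

/// Expand runs back into individual sequence numbers.
pub fn rle_decode(runs: &[SeqRun]) -> Vec<u32> {
    runs.iter()
        .flat_map(|r| (0..u32::from(r.len)).map(move |i| r.start + i))
        .collect()
}
```

With this layout each run costs 6 bytes before framing, so a feedback interval with only a handful of loss gaps stays well under the ~50 B typical payload. Sequence-number wraparound simply starts a new run in this sketch.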
### `BandwidthEstimator` (in `wzp-proto`)
```rust
pub struct BandwidthEstimator {
    cwnd_bps: AtomicU64,        // from Quinn path stats
    bytes_in_flight: AtomicU64, // from Quinn path stats
    peer_remb_bps: AtomicU64,   // from TransportFeedback
    smoothed_bps: AtomicU64,    // EWMA output
}

impl BandwidthEstimator {
    pub fn update_from_quinn(&self, stats: &QuinnPathStats);
    pub fn update_from_peer(&self, fb: &TransportFeedback);
    pub fn target_send_bps(&self) -> u64 {
        // 0.9 × min(cwnd_bps, peer_remb_bps), EWMA-smoothed
    }
}
```
Three signals fused:
1. **Quinn cwnd.** Conservative ceiling: data sent faster than cwnd allows is just queued or dropped.
2. **Peer REMB.** Receiver's perspective on what they can actually consume (after their own jitter buffer, decode budget, etc.).
3. **EWMA smoothing.** Half-life ~2 s; avoids oscillation.
Target = 90 % of `min(cwnd, remb)`, leaving headroom for probing upward.
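The fusion rule can be sketched as follows. This is a simplified, single-threaded illustration, not the proposed `BandwidthEstimator`: it uses `&mut self` and an `f64` instead of atomics, takes the elapsed time as an explicit argument, and the name `BweSketch` and its setters are hypothetical. The EWMA weight is derived from the ~2 s half-life as `alpha = 1 - 0.5^(dt / half_life)`.

```rust
/// Hypothetical single-threaded estimator used only to illustrate the math.
pub struct BweSketch {
    cwnd_bps: u64,
    peer_remb_bps: u64,
    smoothed_bps: f64,
}

const HEADROOM: f64 = 0.9; // target = 90 % of min(cwnd, remb)
const HALF_LIFE_SECS: f64 = 2.0; // EWMA half-life of ~2 s

impl BweSketch {
    pub fn new(initial_bps: u64) -> Self {
        Self {
            cwnd_bps: initial_bps,
            peer_remb_bps: initial_bps,
            smoothed_bps: initial_bps as f64,
        }
    }

    pub fn set_cwnd_bps(&mut self, bps: u64) {
        self.cwnd_bps = bps;
    }

    pub fn set_peer_remb_bps(&mut self, bps: u64) {
        self.peer_remb_bps = bps;
    }

    /// Move the smoothed estimate toward min(cwnd, remb) after `dt_secs`.
    pub fn tick(&mut self, dt_secs: f64) {
        let raw = self.cwnd_bps.min(self.peer_remb_bps) as f64;
        // After exactly one half-life, alpha is 0.5: half the gap is closed.
        let alpha = 1.0 - 0.5f64.powf(dt_secs / HALF_LIFE_SECS);
        self.smoothed_bps += alpha * (raw - self.smoothed_bps);
    }

    /// 90 % of the smoothed estimate, truncated to whole bits per second.
    pub fn target_send_bps(&self) -> u64 {
        (HEADROOM * self.smoothed_bps) as u64
    }
}
```

Starting at 100 kbps with cwnd at 1 Mbps and REMB at 500 kbps, one 2 s tick moves the smoothed value halfway to 500 kbps (300 kbps), giving a target of about 270 kbps. Because the scaling is linear, applying the 0.9 factor before or after smoothing gives the same result.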
### Adaptation controller integration
`AdaptiveQualityController::tick()` already consumes loss/RTT/jitter. Add BWE input:
```rust
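// Assumes both rates are integer bits per second; compare as f64 because
// Rust has no implicit integer-to-float conversion.
let has_headroom =
    self.bwe.target_send_bps() as f64 > self.current_tier_ceiling_bps() as f64 * 1.3;
if has_headroom && consecutive_upgrade_reports >= UPGRADE_THRESHOLD {
    self.upgrade_one_tier();
}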
```
Upgrade gated on BWE *headroom*, not just clean reports. Eliminates the "always at Opus 24 k on a fiber link" pathology.
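As a self-contained illustration of the gating rule, the sketch below models only the upgrade path. `TierGate`, the three-entry tier ladder (limited to the bitrates named in this PRD), and the `UPGRADE_THRESHOLD` value of 3 are assumptions made for the example, not the actual `AdaptiveQualityController` internals.

```rust
/// Hypothetical tier ladder; only bitrates mentioned in this PRD are listed.
const TIER_CEILINGS_BPS: [u64; 3] = [6_000, 24_000, 64_000];
const HEADROOM_FACTOR: f64 = 1.3;
const UPGRADE_THRESHOLD: u32 = 3; // assumed value for illustration

pub struct TierGate {
    tier: usize,
    consecutive_clean: u32,
}

impl TierGate {
    /// `tier` must be a valid index into `TIER_CEILINGS_BPS`.
    pub fn new(tier: usize) -> Self {
        Self { tier, consecutive_clean: 0 }
    }

    pub fn tier_ceiling_bps(&self) -> u64 {
        TIER_CEILINGS_BPS[self.tier]
    }

    /// Feed one quality report; returns true if it triggered an upgrade.
    pub fn on_report(&mut self, clean: bool, target_send_bps: u64) -> bool {
        // A single bad report resets the clean streak.
        self.consecutive_clean = if clean { self.consecutive_clean + 1 } else { 0 };
        let has_headroom =
            target_send_bps as f64 > self.tier_ceiling_bps() as f64 * HEADROOM_FACTOR;
        let can_go_up = self.tier + 1 < TIER_CEILINGS_BPS.len();
        if can_go_up && has_headroom && self.consecutive_clean >= UPGRADE_THRESHOLD {
            self.tier += 1;
            self.consecutive_clean = 0;
            return true;
        }
        false
    }
}
```

At the 24 k tier the BWE target must exceed 31.2 kbps before an upgrade is considered, so a session with a 30 kbps estimate stays put no matter how many clean reports arrive.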
### Probing
To detect unused capacity, the sender occasionally adds 5–10 % padding/FEC during otherwise-clean windows. If `cwnd` doesn't drop and `remb` doesn't fall, the headroom is real and the controller upgrades. If signals degrade, it backs off. Cheap and standard.
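One way to express the probe decision, as a hedged sketch: snapshot `cwnd` and `remb` when the probe starts, then compare after the probe window. The `Probe` and `ProbeOutcome` types and the `probe_padding_bps` helper are hypothetical; the 1 % loss gate is taken from the Risks section, and the exact `>=` comparison is a simplification (a real implementation would likely allow some tolerance for noise).

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum ProbeOutcome {
    HeadroomConfirmed,
    BackOff,
}

/// Snapshot of the two signals taken when a probe starts.
pub struct Probe {
    baseline_cwnd_bps: u64,
    baseline_remb_bps: u64,
}

/// Extra padding/FEC rate for a probe; `percent` is clamped to 5..=10.
pub fn probe_padding_bps(media_bps: u64, percent: u64) -> u64 {
    media_bps * percent.clamp(5, 10) / 100
}

impl Probe {
    /// Start a probe only on a clean link (smoothed loss below 1 %).
    pub fn start(cwnd_bps: u64, remb_bps: u64, smoothed_loss: f64) -> Option<Probe> {
        if smoothed_loss < 0.01 {
            Some(Probe {
                baseline_cwnd_bps: cwnd_bps,
                baseline_remb_bps: remb_bps,
            })
        } else {
            None
        }
    }

    /// Compare the signals after the probe window against the baseline.
    pub fn evaluate(&self, cwnd_bps: u64, remb_bps: u64) -> ProbeOutcome {
        if cwnd_bps >= self.baseline_cwnd_bps && remb_bps >= self.baseline_remb_bps {
            ProbeOutcome::HeadroomConfirmed
        } else {
            ProbeOutcome::BackOff
        }
    }
}
```

For a 24 kbps stream, a 10 % probe adds 2.4 kbps of padding, which is small enough that a failed probe costs little.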
## Implementation outline
1. New `wzp-proto::bwe::BandwidthEstimator`.
2. `wzp-transport` exposes `QuinnPathStats { cwnd_bps, bytes_in_flight, rtt_ms }`; already partially there via `QuinnPathSnapshot`.
3. `SignalMessage::TransportFeedback` variant + serde.
4. Receiver-side: track recent seqs in a ring buffer; emit feedback every 50 ms.
5. Sender-side: BWE consumes own Quinn stats + incoming feedback.
6. `AdaptiveQualityController::set_bwe(&BandwidthEstimator)`.
7. Prometheus: `wzp_session_bwe_bps`, `wzp_session_remb_bps`, `wzp_session_cwnd_bps`.
8. Probing logic behind a flag for first deployment.
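Step 4 (receiver-side tracking) could look roughly like the sketch below. `RecvTracker` and the 64-slot window are assumptions for illustration; the 50 ms timer that calls `drain` is omitted, late packets that arrive after their sequence number was already reported are ignored, and u32 wraparound is out of scope here.

```rust
const WINDOW: usize = 64; // assumed ring size

/// Hypothetical receiver-side tracker backed by a fixed-size ring buffer.
pub struct RecvTracker {
    slots: [Option<u32>; WINDOW], // indexed by seq % WINDOW
    base: Option<u32>,            // oldest seq not yet covered by a report
    highest: u32,                 // highest seq seen so far
}

impl RecvTracker {
    pub fn new() -> Self {
        Self { slots: [None; WINDOW], base: None, highest: 0 }
    }

    /// Record an arriving media packet.
    pub fn on_packet(&mut self, seq: u32) {
        match self.base {
            None => {
                self.base = Some(seq);
                self.highest = seq;
            }
            Some(base) => {
                if seq < base {
                    return; // too late: this seq was already reported
                }
                self.highest = self.highest.max(seq);
            }
        }
        self.slots[seq as usize % WINDOW] = Some(seq);
    }

    /// Build (acked, nacked) for everything since the last report.
    pub fn drain(&mut self) -> (Vec<u32>, Vec<u32>) {
        let Some(base) = self.base else {
            return (Vec::new(), Vec::new());
        };
        if self.highest < base {
            return (Vec::new(), Vec::new()); // nothing new since last report
        }
        // Only the most recent WINDOW seqs are still present in the ring.
        let start = base.max(self.highest.saturating_sub(WINDOW as u32 - 1));
        let (mut acked, mut nacked) = (Vec::new(), Vec::new());
        for seq in start..=self.highest {
            if self.slots[seq as usize % WINDOW] == Some(seq) {
                acked.push(seq);
            } else {
                nacked.push(seq);
            }
        }
        self.base = Some(self.highest + 1);
        (acked, nacked)
    }
}
```

The two vectors returned by `drain` map onto `acked_seqs` and `nacked_seqs`. Storing the full sequence number in each slot, rather than a flag, lets the tracker distinguish a fresh arrival from a stale entry left over from an earlier lap of the ring.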
## Acceptance criteria
- On a shaped 5 Mbps link with Opus 24 k, controller upgrades to Opus 64 k within 30 s.
- On a shaped 50 kbps link, controller stays at Opus 6 k and does not oscillate.
- Feedback wire size < 100 B per 50 ms (= < 2 kB/s, i.e. < 16 kbit/s overhead).
- Probing finds headroom on a 10 Mbps link in < 60 s.
## Risks
- **Probing-induced loss on already-saturated links.** Mitigation: probe only when smoothed loss < 1 % over 10 s.
- **Feedback storm under heavy loss.** Mitigation: feedback rate capped at 20 Hz independent of media rate.
- **Quinn cwnd lies on QUIC-over-some-VPNs.** Mitigation: REMB serves as cross-check; take min of the two.
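The 20 Hz cap from the feedback-storm mitigation can be enforced with a small pacer. This is a minimal sketch under stated assumptions: `FeedbackPacer` is a hypothetical name, and the caller passes in the current `Instant` so the logic is independent of the timer source.

```rust
use std::time::{Duration, Instant};

/// Hypothetical pacer that allows at most `max_hz` feedback messages per second.
pub struct FeedbackPacer {
    min_interval: Duration,
    last_sent: Option<Instant>,
}

impl FeedbackPacer {
    /// `max_hz` must be non-zero; 20 Hz yields a 50 ms minimum interval.
    pub fn with_max_hz(max_hz: u32) -> Self {
        Self {
            min_interval: Duration::from_secs(1) / max_hz,
            last_sent: None,
        }
    }

    /// Returns true (and records `now`) if a feedback message may be sent.
    pub fn try_send(&mut self, now: Instant) -> bool {
        let allowed = match self.last_sent {
            Some(prev) => now.saturating_duration_since(prev) >= self.min_interval,
            None => true,
        };
        if allowed {
            self.last_sent = Some(now);
        }
        allowed
    }
}
```

Because the check depends only on elapsed time, a burst of media packets or loss events cannot raise the feedback rate above the cap.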
## Effort
~4 engineer-days (Wave 2 tasks T2.1–T2.3).