wz-phone/vault/PRDs/PRD-transport-feedback-bwe.md
Siavash Sameni ed8a7ae5aa docs: protocol audit 2026-05-25, update architecture + Obsidian vault
Audit:
- docs/AUDIT-2026-05-25.md: full protocol audit covering 8 findings
  (4 critical, 2 high, 5 medium, 4 low) with code references and fix
  effort estimates
- vault/Audit/Tasks.md: Obsidian Tasks plugin file tracking all audit
  items with priorities, due dates, and per-step checklists

Architecture docs updated for Wire format v2 and Wave 5/6 features:
- ARCHITECTURE.md: adds wzp-video to dependency graph and project
  structure; wire format updated to v2 (16B header, 5B MiniHeader);
  relay concurrency section corrected (DashMap+RwLock is current, not
  a future optimization); test count 571→702; Android note
- PROGRESS.md: Wave 5 and Wave 6 sections appended; test count 372→702;
  current status and open blockers as of 2026-05-25
- ROAD-TO-VIDEO.md: implementation status table inserted (/🟡/🔴/🔲
  per phase); 6-step critical path to first video call
- WZP-SPEC.md: MediaHeader updated to v2 (16B byte-aligned); MiniHeader
  updated to 5B with seq_delta; codec IDs 9-12 added (H.264/H.265/AV1);
  version negotiation section added

Obsidian vault (vault/):
- 114 files across Architecture/, PRDs/, Reports/, Android/,
  Reference/, Audit/ with YAML frontmatter
- 00 - Home.md index note with wiki links
- .obsidian/app.json config

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-25 06:00:17 +04:00


tags: prd, wzp
type: prd

PRD: Transport Feedback & Bandwidth Estimator

Status: proposed
Resolves: Audit W6 (no BWE), W14 (no receiver→sender feedback channel).
Depends on: PRD #1 (wire format v2, for u32 seq).

Problem

AdaptiveQualityController decides tier transitions from loss% and RTT only. Quinn exposes congestion-window and bytes-in-flight, but we don't consume them. There is no receiver→sender feedback channel beyond the inline 4-byte QualityReport.

Consequences:

  • On stable links with spare capacity, we never upgrade past the declared profile (audio stuck at Opus 24 k when 64 k is available).
  • Oscillation between adjacent tiers on the boundary.
  • No bandwidth-aware adaptation = no usable video. Video without BWE either oscillates wildly or never uses available capacity.

Goals

  • Continuous bandwidth estimate per session, surfaced to adaptation controllers.
  • Receiver→sender feedback at ~50 ms cadence carrying ack/nack/remb.
  • Audio benefits immediately (smarter upgrades, fewer oscillations).
  • Video uses BWE as its primary input (PRD #7).

Non-goals

  • Replacing Quinn's congestion controller — we ride on top.
  • Cross-stream BWE (each session estimates independently for v1).

Design

SignalMessage::TransportFeedback

New signal variant, sent on the existing signal stream every 50 ms or every N media packets, whichever comes first:

pub struct TransportFeedback {
    pub version: u8,            // PRD #4 W12: always present
    pub stream_id: u8,           // 0 for session-wide; >0 for per-stream
    pub acked_seqs: Vec<u32>,    // recent seqs received OK (RLE-compressed)
    pub nacked_seqs: Vec<u32>,   // recent seqs missing (RLE-compressed)
    pub remb_bps: u32,           // receiver's estimated max bandwidth
    pub recv_time_us: u64,       // arrival-time for sender-side jitter calc
}

RLE compression keeps the wire size bounded (typical payload ~50 B).
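The PRD does not fix the run-length layout. A minimal sketch, assuming each run is a (start, length) pair over a sorted, de-duplicated seq list; `SeqRun`, `rle_encode` and `rle_decode` are hypothetical names, not existing wzp-proto items:

```rust
/// One run of consecutive sequence numbers: start, start+1, ..., start+len-1.
/// Hypothetical layout; the PRD only says the seq lists are RLE-compressed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SeqRun {
    pub start: u32,
    pub len: u16,
}

/// Collapse a sorted, de-duplicated list of seqs into runs.
pub fn rle_encode(seqs: &[u32]) -> Vec<SeqRun> {
    let mut runs: Vec<SeqRun> = Vec::new();
    for &seq in seqs {
        if let Some(run) = runs.last_mut() {
            // Extend the current run when `seq` is the next consecutive number.
            if run.len < u16::MAX && run.start.wrapping_add(u32::from(run.len)) == seq {
                run.len += 1;
                continue;
            }
        }
        // Otherwise start a new run of length 1.
        runs.push(SeqRun { start: seq, len: 1 });
    }
    runs
}

/// Expand runs back into the individual sequence numbers.
pub fn rle_decode(runs: &[SeqRun]) -> Vec<u32> {
    runs.iter()
        .flat_map(|r| (0..u32::from(r.len)).map(move |i| r.start.wrapping_add(i)))
        .collect()
}
```

With this layout a loss-free 50 ms window collapses to a single run, so the payload grows with the number of gaps rather than the number of packets.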

BandwidthEstimator (in wzp-proto)

pub struct BandwidthEstimator {
    cwnd_bps: AtomicU64,         // from Quinn path stats
    bytes_in_flight: AtomicU64,  // from Quinn path stats
    peer_remb_bps: AtomicU64,    // from TransportFeedback
    smoothed_bps: AtomicU64,     // EWMA output
}

impl BandwidthEstimator {
    pub fn update_from_quinn(&self, stats: &QuinnPathStats);
    pub fn update_from_peer(&self, fb: &TransportFeedback);
    pub fn target_send_bps(&self) -> u64 {
        // 0.9 × min(cwnd_bps, peer_remb_bps), EWMA-smoothed
    }
}

Three signals fused:

  1. Quinn cwnd. Conservative ceiling — sending faster than cwnd just drops or queues.
  2. Peer REMB. Receiver's perspective on what they can actually consume (after their own jitter buffer, decode budget, etc.).
  3. EWMA smoothing. Half-life ~2 s; avoids oscillation.

Target = 90 % of min(cwnd, remb), leaving headroom for probing upward.
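One way the fusion could look, as a sketch rather than the final design. Assumptions: the Quinn congestion window arrives in bytes and is converted to a rate as cwnd × 8 / RTT; the EWMA is advanced by an explicit `tick(dt)` call; `BweSketch`, `cwnd_to_bps` and `tick` are illustrative names not defined by this PRD.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Duration;

const HEADROOM: f64 = 0.9; // target = 90 % of min(cwnd, remb)
const HALF_LIFE_SECS: f64 = 2.0; // EWMA half-life from the design

/// Convert a congestion window in bytes to a bits-per-second ceiling (cwnd * 8 / rtt).
/// Assumption: the transport reports cwnd in bytes, so an RTT is needed for a rate.
pub fn cwnd_to_bps(cwnd_bytes: u64, rtt: Duration) -> u64 {
    let rtt_us = rtt.as_micros().max(1); // avoid division by zero
    let bps = u128::from(cwnd_bytes) * 8 * 1_000_000 / rtt_us;
    bps.min(u128::from(u64::MAX)) as u64
}

pub struct BweSketch {
    cwnd_bps: AtomicU64,
    peer_remb_bps: AtomicU64,
    smoothed_bps: AtomicU64,
}

impl BweSketch {
    pub fn new(initial_bps: u64) -> Self {
        Self {
            cwnd_bps: AtomicU64::new(initial_bps),
            peer_remb_bps: AtomicU64::new(initial_bps),
            smoothed_bps: AtomicU64::new(initial_bps),
        }
    }

    pub fn update_from_quinn(&self, cwnd_bytes: u64, rtt: Duration) {
        self.cwnd_bps.store(cwnd_to_bps(cwnd_bytes, rtt), Ordering::Relaxed);
    }

    pub fn update_from_peer(&self, remb_bps: u32) {
        self.peer_remb_bps.store(u64::from(remb_bps), Ordering::Relaxed);
    }

    /// Advance the EWMA by `dt` toward 0.9 * min(cwnd, remb) and return the new value.
    pub fn tick(&self, dt: Duration) -> u64 {
        let cwnd = self.cwnd_bps.load(Ordering::Relaxed);
        let remb = self.peer_remb_bps.load(Ordering::Relaxed);
        let raw = HEADROOM * cwnd.min(remb) as f64;
        // After one half-life, half of the gap to `raw` has been closed.
        let alpha = 1.0 - 0.5_f64.powf(dt.as_secs_f64() / HALF_LIFE_SECS);
        let prev = self.smoothed_bps.load(Ordering::Relaxed) as f64;
        let next = (prev + alpha * (raw - prev)).round() as u64;
        self.smoothed_bps.store(next, Ordering::Relaxed);
        next
    }

    pub fn target_send_bps(&self) -> u64 {
        self.smoothed_bps.load(Ordering::Relaxed)
    }
}
```

Starting at 100 kbps with cwnd at 2 Mbps and REMB at 1 Mbps, the raw target is 900 kbps; one 2 s tick moves the smoothed value to 500 kbps, a second to 700 kbps.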

Adaptation controller integration

AdaptiveQualityController::tick() already consumes loss/RTT/jitter. Add BWE input:

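// Upgrade only when the estimate clears the current tier ceiling by 30 %.
// Integer math (×10 vs ×13) avoids comparing a u64 against an f64 product;
// this assumes current_tier_ceiling_bps() also returns u64.
if self.bwe.target_send_bps() * 10 > self.current_tier_ceiling_bps() * 13
    && consecutive_upgrade_reports >= UPGRADE_THRESHOLD {
    self.upgrade_one_tier();
}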

Upgrade gated on BWE headroom, not just clean reports. Eliminates the "always at Opus 24 k on a fiber link" pathology.
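To make the gate concrete, a self-contained sketch of the two conditions acting together. `TierGate`, its fields, and the threshold value of 3 are assumptions for illustration; the real controller state lives in AdaptiveQualityController.

```rust
/// Hypothetical value; the PRD references UPGRADE_THRESHOLD without defining it.
const UPGRADE_THRESHOLD: u32 = 3;

pub struct TierGate {
    ceilings_bps: Vec<u64>, // per-tier bitrate ceilings, lowest tier first
    tier: usize,
    clean_reports: u32,
}

impl TierGate {
    pub fn new(ceilings_bps: Vec<u64>, tier: usize) -> Self {
        assert!(tier < ceilings_bps.len(), "tier index out of range");
        Self { ceilings_bps, tier, clean_reports: 0 }
    }

    pub fn tier(&self) -> usize {
        self.tier
    }

    /// Feed one quality report plus the current BWE target.
    /// Returns true when this report triggered an upgrade.
    pub fn on_report(&mut self, clean: bool, target_send_bps: u64) -> bool {
        // A single bad report resets the clean streak.
        self.clean_reports = if clean { self.clean_reports.saturating_add(1) } else { 0 };

        let has_next = self.tier + 1 < self.ceilings_bps.len();
        let ceiling = self.ceilings_bps[self.tier];
        // target > 1.3 * ceiling, in integer arithmetic.
        let headroom = target_send_bps.saturating_mul(10) > ceiling.saturating_mul(13);

        if has_next && headroom && self.clean_reports >= UPGRADE_THRESHOLD {
            self.tier += 1;
            self.clean_reports = 0;
            true
        } else {
            false
        }
    }
}
```

At a 24 kbps ceiling the estimate has to exceed 31.2 kbps, so a run of clean reports at 30 kbps never upgrades, while 40 kbps does once the streak is long enough.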

Probing

To detect unused capacity, the sender occasionally adds 5–10 % padding/FEC during otherwise-clean windows. If cwnd doesn't drop and remb doesn't fall, the headroom is real, so upgrade. If the signals degrade, back off. Cheap and standard.
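A sketch of the probe decision, assuming the padding share is clamped to 5 to 10 % of the media rate, that probing is allowed only below 1 % smoothed loss (the mitigation listed under Risks), and that "degrade" means either signal ending lower than it started. All names here are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProbeOutcome {
    HeadroomConfirmed,
    BackOff,
}

/// Signals sampled immediately before and after a probe window.
#[derive(Debug, Clone, Copy)]
pub struct Snapshot {
    pub cwnd_bps: u64,
    pub remb_bps: u64,
}

/// Probe only on clean links: smoothed loss below 1 %.
pub fn may_probe(smoothed_loss_pct: f32) -> bool {
    smoothed_loss_pct < 1.0
}

/// Extra padding/FEC bitrate for a probe; `pct` is clamped to the 5..=10 range.
pub fn padding_bps(media_bps: u64, pct: u64) -> u64 {
    media_bps * pct.clamp(5, 10) / 100
}

/// Headroom is real only if neither cwnd nor REMB fell during the probe.
pub fn evaluate(before: Snapshot, after: Snapshot) -> ProbeOutcome {
    if after.cwnd_bps >= before.cwnd_bps && after.remb_bps >= before.remb_bps {
        ProbeOutcome::HeadroomConfirmed
    } else {
        ProbeOutcome::BackOff
    }
}
```

A production version would likely tolerate small dips in either signal; the strict comparison keeps the sketch short.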

Implementation outline

  1. New wzp-proto::bwe::BandwidthEstimator.
  2. wzp-transport exposes QuinnPathStats { cwnd_bps, bytes_in_flight, rtt_ms }; already partially there via QuinnPathSnapshot.
  3. SignalMessage::TransportFeedback variant + serde.
  4. Receiver-side: track recent seqs in a ring buffer; emit feedback every 50 ms.
  5. Sender-side: BWE consumes own Quinn stats + incoming feedback.
  6. AdaptiveQualityController::set_bwe(&BandwidthEstimator).
  7. Prometheus: wzp_session_bwe_bps, wzp_session_remb_bps, wzp_session_cwnd_bps.
  8. Probing logic behind a flag for first deployment.
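Step 4 could be sketched as follows. Assumptions: a bounded queue stands in for the ring buffer, N is 32 packets, sequence wraparound is ignored, and emission is driven by packet arrival only (a real receiver would also need a timer for idle periods). `FeedbackTracker` and `Report` are illustrative names.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

const FEEDBACK_INTERVAL: Duration = Duration::from_millis(50);
const MAX_PACKETS_PER_REPORT: usize = 32; // hypothetical value for "N"

pub struct Report {
    pub acked: Vec<u32>,
    pub nacked: Vec<u32>,
}

pub struct FeedbackTracker {
    pending: VecDeque<u32>,     // seqs received since the last report
    next_expected: Option<u32>, // first seq not covered by an earlier report
    last_emit: Instant,
}

impl FeedbackTracker {
    pub fn new(now: Instant) -> Self {
        Self {
            pending: VecDeque::with_capacity(MAX_PACKETS_PER_REPORT),
            next_expected: None,
            last_emit: now,
        }
    }

    /// Record an arrival; returns a report when 50 ms elapsed or N packets queued.
    pub fn on_packet(&mut self, seq: u32, now: Instant) -> Option<Report> {
        self.pending.push_back(seq);
        let due = now.duration_since(self.last_emit) >= FEEDBACK_INTERVAL
            || self.pending.len() >= MAX_PACKETS_PER_REPORT;
        if due { Some(self.flush(now)) } else { None }
    }

    fn flush(&mut self, now: Instant) -> Report {
        let mut acked: Vec<u32> = self.pending.drain(..).collect();
        acked.sort_unstable();
        acked.dedup();
        // `acked` is never empty here: flush runs right after a push.
        let last = *acked.last().unwrap();
        let first = self.next_expected.unwrap_or(acked[0]);
        // Every seq in [first, last] that did not arrive counts as missing.
        let nacked = (first..=last)
            .filter(|s| acked.binary_search(s).is_err())
            .collect();
        self.next_expected = Some(last.wrapping_add(1));
        self.last_emit = now;
        Report { acked, nacked }
    }
}
```

The `acked` and `nacked` vectors map onto the `acked_seqs` and `nacked_seqs` fields of TransportFeedback before compression.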

Acceptance criteria

  • On a shaped 5 Mbps link with Opus 24 k, controller upgrades to Opus 64 k within 30 s.
  • On a shaped 50 kbps link, controller stays at Opus 6 k and does not oscillate.
  • Feedback wire size < 100 B per 50 ms (= < 2 kB/s, i.e. < 16 kbps overhead).
  • Probing finds headroom on a 10 Mbps link in < 60 s.

Risks

  • Probing-induced loss on already-saturated links. Mitigation: probe only when smoothed loss < 1 % over 10 s.
  • Feedback storm under heavy loss. Mitigation: feedback rate capped at 20 Hz independent of media rate.
  • Quinn cwnd lies on QUIC-over-some-VPNs. Mitigation: REMB serves as cross-check; take min of the two.
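The 20 Hz cap from the feedback-storm mitigation could be enforced with a minimum-gap check in front of the send path. A sketch, with `FeedbackRateCap` as a hypothetical helper:

```rust
use std::time::{Duration, Instant};

/// 20 Hz cap: at most one feedback message per 50 ms.
const MIN_FEEDBACK_GAP: Duration = Duration::from_millis(50);

pub struct FeedbackRateCap {
    last_sent: Option<Instant>,
}

impl FeedbackRateCap {
    pub fn new() -> Self {
        Self { last_sent: None }
    }

    /// Returns true if a feedback message may be sent at `now`, and records the send.
    /// Calls inside the 50 ms gap return false regardless of how many media
    /// packets triggered them.
    pub fn try_send(&mut self, now: Instant) -> bool {
        let allowed = match self.last_sent {
            None => true,
            Some(prev) => now.duration_since(prev) >= MIN_FEEDBACK_GAP,
        };
        if allowed {
            self.last_sent = Some(now);
        }
        allowed
    }
}
```

This keeps the "every N media packets" trigger from exceeding the cap when the media rate spikes or retransmissions pile up.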

Effort

~4 engineer-days (Wave 2 tasks T2.1–T2.3).