T4.4 — SignalMessage::Nack variant + RTT-gated NACK loop

Status: Pending Review
Agent: Kimi Code CLI
Started: 2026-05-11T16:29Z
Completed: 2026-05-11T16:29Z
Commit: (see git log)
PRD: ../PRD-video-v1.md

What I changed

  • crates/wzp-proto/src/packet.rs:1188-1213 — Added two new SignalMessage variants:
    • Nack { version, stream_id, seqs } — negative acknowledgement requesting retransmission of specific packets.
    • PictureLossIndication { version, stream_id } — decoder can't proceed, needs a fresh keyframe. Used when RTT is too high for NACK to help.
  • crates/wzp-video/src/nack.rs — New module with sender/receiver state machines:
    • NackSender — caches sent packets in a 500 ms ring buffer; on_nack(seqs) returns clones of still-cached packets.
    • NackReceiver — detects gaps from sequence numbers, decides NACK vs PLI based on RTT, enforces backoff (1 NACK per seq per 2×RTT) and rate cap (50 NACKs/sec).
    • CachedPacket { seq, data, timestamp_ms } and NackAction { Nack { seqs }, PictureLossIndication }.
  • crates/wzp-video/src/lib.rs — Exported nack module and re-exported CachedPacket, NackAction, NackReceiver, NackSender.
  • crates/wzp-client/src/featherchat.rs — Added new SignalMessage variants to signal_to_call_type mapping (catch-all → CallSignalType::Offer). Fixed unused default_signal_version import warning.
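
To illustrate the sender side described above, here is a minimal sketch of a 500 ms packet cache whose on_nack returns clones of still-cached packets. It is not the code in crates/wzp-video/src/nack.rs. The u16 sequence type, the on_send method, the explicit now_ms parameter, and the VecDeque storage are all assumptions made for the example; only the names NackSender, CachedPacket, on_nack and the 500 ms lifetime come from this report.

```rust
use std::collections::VecDeque;

/// Cache lifetime from the report: packets older than 500 ms are evicted.
const CACHE_TTL_MS: u64 = 500;

/// Field names follow the report; the concrete types are assumptions.
#[derive(Clone, Debug, PartialEq)]
pub struct CachedPacket {
    pub seq: u16,
    pub data: Vec<u8>,
    pub timestamp_ms: u64,
}

/// Hypothetical sender-side cache, oldest packet at the front.
#[derive(Default)]
pub struct NackSender {
    cache: VecDeque<CachedPacket>,
}

impl NackSender {
    /// Record a packet that was just sent (hypothetical helper).
    pub fn on_send(&mut self, seq: u16, data: Vec<u8>, now_ms: u64) {
        self.evict(now_ms);
        self.cache.push_back(CachedPacket { seq, data, timestamp_ms: now_ms });
    }

    /// Return clones of the requested packets that are still cached.
    /// Sequence numbers that were never sent or already evicted are skipped.
    pub fn on_nack(&mut self, seqs: &[u16], now_ms: u64) -> Vec<CachedPacket> {
        self.evict(now_ms);
        seqs.iter()
            .filter_map(|s| self.cache.iter().find(|p| p.seq == *s).cloned())
            .collect()
    }

    /// Drop packets from the front until the oldest one is within the TTL.
    fn evict(&mut self, now_ms: u64) {
        while let Some(front) = self.cache.front() {
            if now_ms.saturating_sub(front.timestamp_ms) > CACHE_TTL_MS {
                self.cache.pop_front();
            } else {
                break;
            }
        }
    }
}
```

Passing the clock in as now_ms keeps the sketch deterministic in tests; a real implementation might read a monotonic clock instead.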

Why these choices

  • Two signals instead of one: The PRD explicitly describes both NACK (low-RTT retransmission) and PLI (high-RTT keyframe request) as a unified loss-recovery loop. Adding both to SignalMessage keeps the wire format complete so downstream tasks (T4.6, T4.7) don't need to touch wzp-proto again.
  • Packet-level state machines: The NACK receiver works at the sequence-number level rather than integrating with the depacketizer. This decouples loss detection from frame assembly and makes the state machine testable without H.264 payloads.
  • Rate cap as batch truncation: When a large gap exceeds the 50/sec budget, the receiver emits a NACK for only as many packets as the remaining budget allows and defers the rest to the next tick. This avoids a single burst consuming the entire second's budget.
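
The receiver-side choices above (sequence-level gap detection, RTT-gated NACK vs PLI, and budget truncation) can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual NackReceiver: the GapTracker type, its methods, the u16 sequence type, the VecDeque storage, and the 200 ms RTT threshold are hypothetical, and the 2×RTT per-sequence backoff is left out for brevity. Only the NackAction variants and the 50 per second cap are taken from this report.

```rust
use std::collections::VecDeque;

/// Rate cap from the report: at most 50 NACKed sequence numbers per second.
const MAX_NACKS_PER_SEC: usize = 50;
/// Hypothetical RTT threshold; the report does not state the real value.
const PLI_RTT_THRESHOLD_MS: u32 = 200;

#[derive(Debug, PartialEq)]
pub enum NackAction {
    Nack { seqs: Vec<u16> },
    PictureLossIndication,
}

/// Hypothetical receiver state: pending gaps plus the remaining budget.
pub struct GapTracker {
    next_expected: u16,
    missing: VecDeque<u16>,
    budget_left: usize,
}

impl GapTracker {
    pub fn new(first_seq: u16) -> Self {
        Self { next_expected: first_seq, missing: VecDeque::new(), budget_left: MAX_NACKS_PER_SEC }
    }

    /// Record an arriving packet; skipped sequence numbers become missing.
    pub fn on_packet(&mut self, seq: u16) {
        // Wrapping distance: small values mean "at or ahead of expected".
        let ahead = seq.wrapping_sub(self.next_expected);
        if ahead < 0x8000 {
            for i in 0..ahead {
                self.missing.push_back(self.next_expected.wrapping_add(i));
            }
            self.next_expected = seq.wrapping_add(1);
        } else {
            // A late packet fills an earlier gap.
            self.missing.retain(|m| *m != seq);
        }
    }

    /// Assumed to be called once per second to restore the NACK budget.
    pub fn refill_budget(&mut self) {
        self.budget_left = MAX_NACKS_PER_SEC;
    }

    /// Decide what to send on this tick, if anything.
    pub fn poll(&mut self, rtt_ms: u32) -> Option<NackAction> {
        if self.missing.is_empty() {
            return None;
        }
        if rtt_ms > PLI_RTT_THRESHOLD_MS {
            // A retransmission would arrive too late; request a keyframe.
            self.missing.clear();
            return Some(NackAction::PictureLossIndication);
        }
        // Truncate the batch to the remaining budget; the rest waits.
        let take = self.missing.len().min(self.budget_left);
        if take == 0 {
            return None;
        }
        self.budget_left -= take;
        Some(NackAction::Nack { seqs: self.missing.drain(..take).collect() })
    }
}
```

Because the tracker only sees sequence numbers, it can be exercised without any H.264 payloads, which is the testability point made above.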

Deviations from the task spec

  • The TASKS.md entry for T4.4 was a skeleton ("expand before claiming"). I fleshed it out based on the PRD-video-v1.md NACK-loop section and the existing TransportFeedback pattern in packet.rs.
  • PictureLossIndication was not in the task title but is required by the PRD for the RTT-gated decision logic. Added it as a peer variant to keep the loop complete.

Verification output

$ cargo test -p wzp-video nack
running 8 tests
test nack::tests::receiver_backoff_respects_2x_rtt ... ok
test nack::tests::receiver_detects_gap_and_nacks ... ok
test nack::tests::receiver_late_packet_fills_gap ... ok
test nack::tests::receiver_rate_cap_falls_back_to_pli ... ok
test nack::tests::receiver_uses_pli_when_rtt_is_high ... ok
test nack::tests::receiver_wraparound_ok ... ok
test nack::tests::sender_caches_and_retransmits ... ok
test nack::tests::sender_evicts_after_500ms ... ok

test result: ok. 8 passed; 0 failed; 0 ignored; 0 measured; 20 filtered out; finished in 0.00s
$ cargo test -p wzp-proto nack
running 2 tests
test packet::tests::nack_default_version ... ok
test packet::tests::nack_roundtrip ... ok

test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 123 filtered out; finished in 0.00s
$ cargo test -p wzp-proto picture_loss
running 2 tests
test packet::tests::picture_loss_indication_default_version ... ok
test packet::tests::picture_loss_indication_roundtrip ... ok

test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 123 filtered out; finished in 0.00s
$ cargo test --workspace --exclude wzp-android --no-fail-fast
... (all crates pass)
Total: 677 passed; 0 failed
$ cargo clippy -p wzp-video --all-targets -- -D warnings
    Finished dev profile [unoptimized + debuginfo] target(s) in 0.73s

$ cargo clippy -p wzp-proto --all-targets -- -D warnings
    Finished dev profile [unoptimized + debuginfo] target(s) in 1.68s

$ cargo fmt --all -- --check
# pass

Test summary

  • Tests added: 12
    • wzp-proto: nack_roundtrip, nack_default_version, picture_loss_indication_roundtrip, picture_loss_indication_default_version
    • wzp-video: sender_caches_and_retransmits, sender_evicts_after_500ms, receiver_detects_gap_and_nacks, receiver_uses_pli_when_rtt_is_high, receiver_backoff_respects_2x_rtt, receiver_late_packet_fills_gap, receiver_rate_cap_falls_back_to_pli, receiver_wraparound_ok
  • Tests modified: 0
  • Workspace test count before: 618 / after: 677 (difference is +59: 12 from T4.4, the rest from other accumulated changes; wzp-video now has 28 tests)
  • cargo clippy -p wzp-video --all-targets -- -D warnings: clean
  • cargo clippy -p wzp-proto --all-targets -- -D warnings: clean
  • cargo fmt --all -- --check: pass

Risks / follow-ups

  • NackSender buffer is unbounded within the 500 ms TTL. Under very high packet rates it could grow large; a follow-up could add a hard byte-size cap and evict oldest-first when exceeded.
  • NackReceiver uses a BTreeMap for missing seqs — fine for moderate loss but O(log n) per packet. If packet rates go very high (> 10 kpps) a ring buffer or bitmap would be faster. Not a concern for 720p30 (~60 packets/sec).
  • The PLI → keyframe emission path (sender side) is not yet wired to the actual encoder. That integration happens in T4.6/T4.7 when the SFU keyframe cache lands.
  • wzp-client/src/featherchat.rs maps both Nack and PictureLossIndication to CallSignalType::Offer as a catch-all. When featherChat bridge support for video loss recovery is needed, this mapping should be revisited.
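
As a sketch of the first follow-up, a hard byte-size cap with oldest-first eviction could look roughly like this. BoundedCache, its methods, and the max_bytes parameter are hypothetical names for illustration; the report does not propose a specific cap value, and a real change would combine this with the existing 500 ms TTL.

```rust
use std::collections::VecDeque;

/// Hypothetical byte-capped packet cache, oldest packet at the front.
pub struct BoundedCache {
    packets: VecDeque<(u16, Vec<u8>)>,
    bytes: usize,
    max_bytes: usize,
}

impl BoundedCache {
    pub fn new(max_bytes: usize) -> Self {
        Self { packets: VecDeque::new(), bytes: 0, max_bytes }
    }

    /// Insert a packet, then evict oldest-first until the cap holds again.
    /// A single packet larger than the cap ends up evicting itself.
    pub fn push(&mut self, seq: u16, data: Vec<u8>) {
        self.bytes += data.len();
        self.packets.push_back((seq, data));
        while self.bytes > self.max_bytes {
            match self.packets.pop_front() {
                Some((_, old)) => self.bytes -= old.len(),
                None => break,
            }
        }
    }

    /// Total payload bytes currently cached.
    pub fn bytes(&self) -> usize {
        self.bytes
    }

    /// Whether a packet with this sequence number is still cached.
    pub fn contains(&self, seq: u16) -> bool {
        self.packets.iter().any(|(s, _)| *s == seq)
    }
}
```

Tracking the running byte total avoids re-summing payload sizes on every insert, so the cap check stays constant-time apart from the evictions themselves.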

Reviewer checklist (filled in by reviewer)

  • Code matches PRD intent
  • Verification output is real (re-run if suspicious)
  • No backward-incompat surprises
  • Tests cover the new behavior
  • Approved