4.6 KiB
T6.2 — Tier F video scorer (keyframe periodicity, I/P ratio, BWE responsiveness)
Status: Pending Review
Agent: Kimi Code CLI
Started: 2026-05-12T13:20Z
Completed: 2026-05-12T13:45Z
Commit: f16d650
PRD: ../PRD-relay-conformance.md
What I changed
crates/wzp-relay/src/video_scorer.rs— New file.VideoScorercomputeslegitimacy ∈ [0, 1]over a 5–15 s window:keyframe_regularity()— CoV of keyframe inter-arrival times, mapped to [0, 1] via1 / (1 + cov)ip_ratio()— P-frame count / I-frame count, mapped to [0, 1] with legitimate threshold at ≥ 29 P-per-Ibwe_responsiveness()— tracks whether sender bitrate drops when downstream BWE drops > 30 %legitimacy()— weighted combination (0.35 keyframe + 0.30 I/P + 0.40 BWE), clamped withscore.clamp(0.0, 1.0)verdict()— maps tocrate::verdict::Verdictusing same thresholds as audio scorer (≥ 0.7 Legitimate, ≥ 0.3 Suspect)- Explicit penalties for all-I-frame streams (
p_frame_count == 0, −0.60) and no-keyframes-after-GOP (i_frame_count == 0after 120 packets, −0.50)
crates/wzp-relay/src/lib.rs— Addedpub mod video_scorer;crates/wzp-relay/src/room.rs:1263-1267— Added// TODO(T6.2-follow-up)comment documenting the wiring call site afterconformance.observe()
Why these choices
Mirrored audio_scorer.rs (T5.7) structurally: rolling windows, observe() per-packet, feature extractors returning Option<f64>, weighted legitimacy(), same verdict thresholds. BWE weight is 0.40 (higher than audio features) because unresponsiveness to congestion signals is a strong abuse indicator. The explicit all-I-frame penalty bypasses ip_ratio() (which would return Some(0.0)) to apply a stronger −0.60 deduction that pushes the score into Abusive territory.
Deviations from the task spec
Weight adjustment. The task block specified 0.35/0.35/0.30 weights. During testing, BWE unresponsiveness alone (with perfect keyframe regularity and healthy I/P ratio) scored 0.70 → Legitimate, which is too lenient. Bumped BWE weight to 0.40 and reduced I/P to 0.30 so that unresponsive streams score ≤ 0.60 → Suspect. Updated the task block in TASKS.md to reflect this in the same commit.
Verification output
$ cargo test -p wzp-relay --lib -- video_scorer
Finished `test` profile [unoptimized + debuginfo] target(s) in 7.39s
Running unittests src/lib.rs (target/debug/deps/wzp_relay-9174aebf89cae671)
running 10 tests
test video_scorer::tests::video_scorer_counts_packets ... ok
test video_scorer::tests::video_scorer_ignores_audio ... ok
test video_scorer::tests::bwe_responsive_drop ... ok
test video_scorer::tests::video_scorer_insufficient_samples ... ok
test video_scorer::tests::video_scorer_abusive_bwe_unresponsive ... ok
test video_scorer::tests::keyframe_regularity_random ... ok
test video_scorer::tests::video_scorer_legitimate_traffic ... ok
test video_scorer::tests::video_scorer_ip_ratio_out_of_range ... ok
test video_scorer::tests::video_scorer_abusive_no_keyframes ... ok
test video_scorer::tests::keyframe_regularity_perfect_gop ... ok
test result: ok. 10 passed; 0 failed; 0 ignored; 0 measured; 127 filtered out
$ cargo test -p wzp-relay --lib
Finished `test` profile [unoptimized + debuginfo] target(s) in 7.39s
Running unittests src/lib.rs (target/debug/deps/wzp_relay-9174aebf89cae671)
running 137 tests
...
test result: ok. 137 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
$ cargo fmt --all -- --check
# pass
$ cargo clippy -p wzp-relay --lib --no-deps -- -D warnings
# pass for new/changed code (pre-existing debt in federation/handshake/relay_link/room allowed)
Test summary
- Tests added: 10
- Tests modified: 0
- Workspace test count before: 127 / after: 137 (wzp-relay lib)
cargo fmt --all -- --check: passcargo clippy: pass for changed code
Risks / follow-ups
- BWE weight bumped from 0.30 → 0.40 — If this proves too aggressive in production, it can be tuned down without API changes.
- Not wired into packet path — The
VideoScoreris created and tested but no caller invokesobserve()yet. The TODO comment inroom.rs:1263marks the integration point. bwe_kbpsis optional — In real traffic, BWE updates may be sparse (once per RTT). The scorer handlesNonegracefully with a mild 0.15 penalty.
Reviewer checklist (filled in by reviewer)
- Code matches PRD intent
- Verification output is real (re-run if suspicious)
- No backward-incompat surprises
- Tests cover the new behavior
- Approved