Files
wz-phone/vault/Reports/T6.2-report.md
Siavash Sameni ed8a7ae5aa docs: protocol audit 2026-05-25, update architecture + Obsidian vault
Audit:
- docs/AUDIT-2026-05-25.md: full protocol audit covering 8 findings
  (4 critical, 2 high, 5 medium, 4 low) with code references and fix
  effort estimates
- vault/Audit/Tasks.md: Obsidian Tasks plugin file tracking all audit
  items with priorities, due dates, and per-step checklists

Architecture docs updated for Wire format v2 and Wave 5/6 features:
- ARCHITECTURE.md: adds wzp-video to dependency graph and project
  structure; wire format updated to v2 (16B header, 5B MiniHeader);
  relay concurrency section corrected (DashMap+RwLock is current, not
  a future optimization); test count 571→702; Android note
- PROGRESS.md: Wave 5 and Wave 6 sections appended; test count 372→702;
  current status and open blockers as of 2026-05-25
- ROAD-TO-VIDEO.md: implementation status table inserted (/🟡/🔴/🔲
  per phase); 6-step critical path to first video call
- WZP-SPEC.md: MediaHeader updated to v2 (16B byte-aligned); MiniHeader
  updated to 5B with seq_delta; codec IDs 9-12 added (H.264/H.265/AV1);
  version negotiation section added

Obsidian vault (vault/):
- 114 files across Architecture/, PRDs/, Reports/, Android/,
  Reference/, Audit/ with YAML frontmatter
- 00 - Home.md index note with wiki links
- .obsidian/app.json config

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-25 06:00:17 +04:00

99 lines
4.6 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
---
tags: [report, wzp]
type: report
status: Pending Review
---
# T6.2 — Tier F video scorer (keyframe periodicity, I/P ratio, BWE responsiveness)
**Status:** Pending Review
**Agent:** Kimi Code CLI
**Started:** 2026-05-12T13:20Z
**Completed:** 2026-05-12T13:45Z
**Commit:** f16d650
**PRD:** ../PRD-relay-conformance.md
## What I changed
- `crates/wzp-relay/src/video_scorer.rs` — New file. `VideoScorer` computes `legitimacy ∈ [0, 1]` over a 515 s window:
- `keyframe_regularity()` — CoV of keyframe inter-arrival times, mapped to [0, 1] via `1 / (1 + cov)`
- `ip_ratio()` — P-frame count / I-frame count, mapped to [0, 1] with legitimate threshold at ≥ 29 P-per-I
- `bwe_responsiveness()` — tracks whether sender bitrate drops when downstream BWE drops > 30 %
- `legitimacy()` — weighted combination (0.35 keyframe + 0.30 I/P + 0.40 BWE), clamped with `score.clamp(0.0, 1.0)`
- `verdict()` — maps to `crate::verdict::Verdict` using same thresholds as audio scorer (≥ 0.7 Legitimate, ≥ 0.3 Suspect)
- Explicit penalties for all-I-frame streams (`p_frame_count == 0`, 0.60) and no-keyframes-after-GOP (`i_frame_count == 0` after 120 packets, 0.50)
- `crates/wzp-relay/src/lib.rs` — Added `pub mod video_scorer;`
- `crates/wzp-relay/src/room.rs:1263-1267` — Added `// TODO(T6.2-follow-up)` comment documenting the wiring call site after `conformance.observe()`
## Why these choices
Mirrored `audio_scorer.rs` (T5.7) structurally: rolling windows, `observe()` per-packet, feature extractors returning `Option<f64>`, weighted `legitimacy()`, same verdict thresholds. BWE weight is 0.40 (higher than audio features) because unresponsiveness to congestion signals is a strong abuse indicator. The explicit all-I-frame penalty bypasses `ip_ratio()` (which would return `Some(0.0)`) to apply a stronger 0.60 deduction that pushes the score into `Abusive` territory.
## Deviations from the task spec
**Weight adjustment.** The task block specified 0.35/0.35/0.30 weights. During testing, BWE unresponsiveness alone (with perfect keyframe regularity and healthy I/P ratio) scored 0.70 → `Legitimate`, which is too lenient. Bumped BWE weight to 0.40 and reduced I/P to 0.30 so that unresponsive streams score ≤ 0.60 → `Suspect`. Updated the task block in `TASKS.md` to reflect this in the same commit.
## Verification output
```bash
$ cargo test -p wzp-relay --lib -- video_scorer
Finished `test` profile [unoptimized + debuginfo] target(s) in 7.39s
Running unittests src/lib.rs (target/debug/deps/wzp_relay-9174aebf89cae671)
running 10 tests
test video_scorer::tests::video_scorer_counts_packets ... ok
test video_scorer::tests::video_scorer_ignores_audio ... ok
test video_scorer::tests::bwe_responsive_drop ... ok
test video_scorer::tests::video_scorer_insufficient_samples ... ok
test video_scorer::tests::video_scorer_abusive_bwe_unresponsive ... ok
test video_scorer::tests::keyframe_regularity_random ... ok
test video_scorer::tests::video_scorer_legitimate_traffic ... ok
test video_scorer::tests::video_scorer_ip_ratio_out_of_range ... ok
test video_scorer::tests::video_scorer_abusive_no_keyframes ... ok
test video_scorer::tests::keyframe_regularity_perfect_gop ... ok
test result: ok. 10 passed; 0 failed; 0 ignored; 0 measured; 127 filtered out
```
```bash
$ cargo test -p wzp-relay --lib
Finished `test` profile [unoptimized + debuginfo] target(s) in 7.39s
Running unittests src/lib.rs (target/debug/deps/wzp_relay-9174aebf89cae671)
running 137 tests
...
test result: ok. 137 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
```
```bash
$ cargo fmt --all -- --check
# pass
```
```bash
$ cargo clippy -p wzp-relay --lib --no-deps -- -D warnings
# pass for new/changed code (pre-existing debt in federation/handshake/relay_link/room allowed)
```
## Test summary
- Tests added: 10
- Tests modified: 0
- Workspace test count before: 127 / after: 137 (wzp-relay lib)
- `cargo fmt --all -- --check`: pass
- `cargo clippy`: pass for changed code
## Risks / follow-ups
1. **BWE weight bumped from 0.30 → 0.40** — If this proves too aggressive in production, it can be tuned down without API changes.
2. **Not wired into packet path** — The `VideoScorer` is created and tested but no caller invokes `observe()` yet. The TODO comment in `room.rs:1263` marks the integration point.
3. **`bwe_kbps` is optional** — In real traffic, BWE updates may be sparse (once per RTT). The scorer handles `None` gracefully with a mild 0.15 penalty.
## Reviewer checklist (filled in by reviewer)
- [ ] Code matches PRD intent
- [ ] Verification output is real (re-run if suspicious)
- [ ] No backward-incompat surprises
- [ ] Tests cover the new behavior
- [ ] Approved