# T5.7 — Tier F audio scorer (entropy/IAT/silence-fraction)
**Status:** Pending Review

**Agent:** Kimi Code CLI

**Started:** 2026-05-12T19:15Z

**Completed:** 2026-05-12T19:45Z

**Commit:** 5fda5ec

**PRD:** ../PRD-relay-conformance.md
## What I changed
- `crates/wzp-relay/src/audio_scorer.rs` — New file. `AudioScorer` computes `legitimacy ∈ [0, 1]` from:
  - **IAT CoV** (`iat_cov()`) — legitimate traffic 0.1–0.4; abusive uniform IAT > 1.0
  - **Silence fraction** (`silence_fraction()`) — legitimate 10–40%; abusive < 2%
  - **Bitrate ratio** (`bitrate_ratio()`) — actual vs. nominal codec bitrate
  - **Q-flag cadence CV** (`q_flag_cv()`) — regularity of quality-flag spacing
  - **Payload-size bimodality** (`size_bimodality()`) — speech and silence frames form a bimodal size distribution
  - `legitimacy()` combines the features into a weighted score clamped to [0, 1]
  - `verdict()` maps the score to `Verdict::Legitimate`, `Verdict::Suspect`, or `Verdict::Abusive`
- `crates/wzp-relay/src/lib.rs` — Added `pub mod audio_scorer;`.
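To make the `legitimacy()` / `verdict()` split concrete, here is a minimal sketch of how a weighted, clamped score could be mapped onto the three verdict classes, using the 0.7 / 0.3 boundaries this report cites under Risks. It is not the code in `audio_scorer.rs`: the `(weight, sub_score)` pair input, the assumption that each feature is already normalised so that 1.0 means "looks like real speech", and the inclusive boundaries are all illustrative assumptions.

```rust
// Hypothetical sketch: names follow the report, but the (weight, sub_score)
// input shape, the pre-normalised sub-scores and the inclusive 0.7 / 0.3
// boundaries are assumptions, not the real audio_scorer.rs implementation.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Legitimate,
    Suspect,
    Abusive,
}

/// Weighted sum of (weight, sub_score) pairs, clamped to [0, 1].
/// Each sub_score is assumed to be a feature already mapped so that
/// 1.0 means "looks like real speech".
pub fn legitimacy(weighted_features: &[(f64, f64)]) -> f64 {
    weighted_features
        .iter()
        .map(|&(weight, sub_score)| weight * sub_score)
        .sum::<f64>()
        .clamp(0.0, 1.0)
}

/// Maps a legitimacy score onto the three verdict classes.
pub fn verdict(score: f64) -> Verdict {
    if score >= 0.7 {
        Verdict::Legitimate
    } else if score <= 0.3 {
        Verdict::Abusive
    } else {
        Verdict::Suspect
    }
}
```

Clamping after the weighted sum keeps the score in range even if weights do not sum to 1, which matters while the weights are still being tuned by hand.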
## Why these choices
IAT CoV is the strongest single discriminator: real VoIP has jittery arrival times, while synthetic flood traffic tends to be perfectly periodic. Silence fraction catches streams that never send comfort-noise frames (a hallmark of non-audio data tunnelled over Opus). Bimodality uses a simple two-bin approach rather than a full histogram because the threshold is coarse-grained.
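The three discriminators discussed above can be sketched as plain functions over one window of packets. This is an illustration under stated assumptions, not the crate's code: the function names mirror the report's methods, but the signatures, the millisecond timestamps, and the `silence_max_bytes` / `split_bytes` cut-offs are hypothetical. The two-bin bimodality score here is one simple choice (twice the share of the smaller bin); the real `size_bimodality()` may weigh the bins differently.

```rust
// Hypothetical free-function sketches of three AudioScorer features.
// Names mirror the report's methods; signatures, cut-offs and the exact
// bimodality formula are assumptions, not the real audio_scorer.rs code.

/// Coefficient of variation (std dev / mean) of packet inter-arrival times.
/// `arrivals_ms` are receive timestamps in milliseconds, in arrival order.
/// Returns `None` with fewer than two gaps or a non-positive mean gap.
pub fn iat_cov(arrivals_ms: &[f64]) -> Option<f64> {
    if arrivals_ms.len() < 3 {
        return None;
    }
    let gaps: Vec<f64> = arrivals_ms.windows(2).map(|w| w[1] - w[0]).collect();
    let n = gaps.len() as f64;
    let mean = gaps.iter().sum::<f64>() / n;
    if mean <= 0.0 {
        return None;
    }
    // Population variance of the gaps.
    let var = gaps.iter().map(|g| (g - mean).powi(2)).sum::<f64>() / n;
    Some(var.sqrt() / mean)
}

/// Share of packets whose payload is at most `silence_max_bytes`
/// (an assumed size cut-off for DTX / comfort-noise frames).
pub fn silence_fraction(payload_sizes: &[usize], silence_max_bytes: usize) -> Option<f64> {
    if payload_sizes.is_empty() {
        return None;
    }
    let silent = payload_sizes.iter().filter(|&&s| s <= silence_max_bytes).count();
    Some(silent as f64 / payload_sizes.len() as f64)
}

/// Two-bin bimodality in [0, 1]: 1.0 when sizes split evenly between a
/// "small" bin (below `split_bytes`) and a "large" bin, 0.0 when every
/// packet falls into the same bin.
pub fn size_bimodality(payload_sizes: &[usize], split_bytes: usize) -> f64 {
    if payload_sizes.is_empty() {
        return 0.0;
    }
    let small = payload_sizes.iter().filter(|&&s| s < split_bytes).count();
    let p_small = small as f64 / payload_sizes.len() as f64;
    2.0 * p_small.min(1.0 - p_small)
}
```

For example, a perfectly periodic stream (one packet every 20 ms) gives an IAT CoV of exactly 0, while gaps alternating between 15 ms and 25 ms give 0.25, inside the legitimate band quoted above.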
## Deviations from the task spec
None.
## Verification output
```bash
$ cargo test -p wzp-relay --lib -- audio_scorer
Compiling wzp-relay v0.1.0
Finished `test` profile [unoptimized + debuginfo] target(s) in 6.85s
Running unittests src/lib.rs (target/debug/deps/wzp_relay-9174aebf89cae671)
running 11 tests
test audio_scorer::tests::audio_scorer_insufficient_samples ... ok
test audio_scorer::tests::bitrate_ratio_saturates_when_no_codec ... ok
test audio_scorer::tests::audio_scorer_ignores_video ... ok
test audio_scorer::tests::q_flag_cv_regular_spacing ... ok
test audio_scorer::tests::audio_scorer_abusive_uniform_iat ... ok
test audio_scorer::tests::audio_scorer_abusive_no_silence ... ok
test audio_scorer::tests::audio_scorer_legitimate_traffic ... ok
test audio_scorer::tests::audio_scorer_counts_packets ... ok
test audio_scorer::tests::silence_fraction_computed_correctly ... ok
test audio_scorer::tests::size_bimodality_for_mixed_traffic ... ok
test audio_scorer::tests::size_bimodality_for_uniform_traffic ... ok
test result: ok. 11 passed; 0 failed; 0 ignored; 0 measured; 116 filtered out
```
```bash
$ cargo fmt --all -- --check
# pass
```
```bash
$ cargo clippy -p wzp-relay --lib -- -D warnings
# pass for new code (pre-existing debt in other modules allowed)
```
## Test summary
- Tests added: 11
- Tests modified: 0
- `wzp-relay` lib test count before: 116 / after: 127
- `cargo clippy -p wzp-relay --lib -- -D warnings`: pass for new code
- `cargo fmt --all -- --check`: pass
## Risks / follow-ups
1. **Thresholds are heuristic** — The 0.7 / 0.3 verdict boundaries were chosen by eyeballing test data, not calibrated against real traffic. May need tuning in production.
2. **Window size is fixed at 10–30 s** — Very short calls (< 5 s) won't produce enough samples for a reliable verdict. Consider falling back to Tier A/B/C metering for short sessions.
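The short-call fallback suggested in follow-up 2 could take the shape of a minimum-sample guard that refuses to score and tells the caller to use Tier A/B/C metering instead. This is a hypothetical sketch: `WindowScore`, `score_window`, and both constants are invented for illustration, and the 250-packet floor assumes 20 ms Opus frames (50 packets/s) over the 5 s threshold mentioned above.

```rust
// Hypothetical guard for short calls. Assumptions (not from audio_scorer.rs):
// 20 ms Opus frames (50 packets/s) and a 5 s floor, i.e. 250 packets.
pub const PACKETS_PER_SEC: usize = 50;
pub const MIN_WINDOW_SECS: usize = 5;

/// Outcome of scoring one window.
#[derive(Debug, PartialEq)]
pub enum WindowScore {
    /// Enough packets: legitimacy score in [0, 1].
    Scored(f64),
    /// Too few packets: caller should fall back to Tier A/B/C metering.
    InsufficientSamples { have: usize, need: usize },
}

/// Runs `score_fn` only when the window holds at least the minimum packet count.
pub fn score_window(packet_count: usize, score_fn: impl FnOnce() -> f64) -> WindowScore {
    let need = PACKETS_PER_SEC * MIN_WINDOW_SECS;
    if packet_count < need {
        WindowScore::InsufficientSamples { have: packet_count, need }
    } else {
        WindowScore::Scored(score_fn())
    }
}
```

Taking the scoring step as a closure means the feature computations are skipped entirely for windows that are too short, and the caller routes `InsufficientSamples` to the existing metering path rather than issuing a `Suspect` verdict on thin data.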
## Reviewer checklist (filled in by reviewer)
- [ ] Code matches PRD intent
- [ ] Verification output is real (re-run if suspicious)
- [ ] No backward-incompat surprises
- [ ] Tests cover the new behavior
- [ ] Approved