wz-phone/docs/PRD/reports/T5.7-report.md

T5.7 — Tier F audio scorer (entropy/IAT/silence-fraction)

Status: Pending Review
Agent: Kimi Code CLI
Started: 2026-05-12T19:15Z
Completed: 2026-05-12T19:45Z
Commit: 5fda5ec
PRD: ../PRD-relay-conformance.md

What I changed

  • crates/wzp-relay/src/audio_scorer.rs — New file. AudioScorer computes legitimacy ∈ [0, 1] from:
    • IAT CoV (iat_cov()): coefficient of variation of packet inter-arrival times; legitimate traffic 0.1–0.4; abusive uniform IAT > 1.0
    • Silence fraction (silence_fraction()): legitimate 10–40%; abusive < 2%
    • Bitrate ratio (bitrate_ratio()): actual vs nominal codec bitrate
    • Q-flag cadence CV (q_flag_cv()): measures regularity of quality-flag spacing
    • Payload-size bimodality (size_bimodality()): bimodal distribution of speech vs silence payload sizes
    • legitimacy() combines features into a weighted score clamped to [0, 1]
    • verdict() maps score to Verdict::Legitimate / Suspect / Abusive
  • crates/wzp-relay/src/lib.rs — Added pub mod audio_scorer;.
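
To make the first two features concrete, here is a minimal sketch of how an inter-arrival-time coefficient of variation and a silence fraction could be computed. It is not the code in audio_scorer.rs: the free-function signatures, the millisecond timestamps, the use of population variance, and the `silence_max_bytes` cutoff are all assumptions made for illustration.

```rust
/// Coefficient of variation (standard deviation / mean) of inter-arrival times.
/// `arrivals_ms` holds packet arrival timestamps in milliseconds, in arrival order.
/// Returns None with fewer than two intervals or a non-positive mean interval.
/// Assumption: population variance (divide by n), not sample variance.
pub fn iat_cov(arrivals_ms: &[f64]) -> Option<f64> {
    if arrivals_ms.len() < 3 {
        return None;
    }
    // Differences between consecutive timestamps are the inter-arrival times.
    let iats: Vec<f64> = arrivals_ms.windows(2).map(|w| w[1] - w[0]).collect();
    let n = iats.len() as f64;
    let mean = iats.iter().sum::<f64>() / n;
    if mean <= 0.0 {
        return None;
    }
    let variance = iats.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n;
    Some(variance.sqrt() / mean)
}

/// Fraction of packets whose payload is at most `silence_max_bytes`.
/// Assumption: silence / comfort-noise frames are identified purely by size;
/// the cutoff is a hypothetical parameter, not a value from the report.
pub fn silence_fraction(payload_sizes: &[usize], silence_max_bytes: usize) -> f64 {
    if payload_sizes.is_empty() {
        return 0.0;
    }
    let silent = payload_sizes
        .iter()
        .filter(|&&size| size <= silence_max_bytes)
        .count();
    silent as f64 / payload_sizes.len() as f64
}
```

With these definitions a perfectly periodic stream (one packet every 20 ms) yields a CoV of 0, while a stream with a few milliseconds of jitter around the same mean lands in the 0.1–0.4 band listed for legitimate traffic.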

Why these choices

IAT CoV is the strongest single discriminator: real VoIP has jittery arrival times, while synthetic flood traffic tends to be perfectly periodic. Silence fraction catches streams that never send comfort-noise frames (a hallmark of non-audio data tunnelled over Opus). Bimodality uses a simple two-bin approach rather than a full histogram because the threshold is coarse-grained.
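
The two-bin idea can be sketched as follows. This is an illustration under stated assumptions rather than the implementation in the commit: the `split_bytes` parameter and the `2 * min(p, 1 - p)` scoring formula are hypothetical choices that map "all packets in one bin" to 0 and "evenly split" to 1.

```rust
/// Two-bin bimodality score in [0, 1] for a set of payload sizes.
/// Packets are split into "small" (<= split_bytes) and "large" (> split_bytes).
/// The score is 2 * min(p_small, p_large): 0.0 when every packet falls in one
/// bin, 1.0 when the two bins are equally populated.
/// Assumption: both the split point and the scoring formula are illustrative.
pub fn size_bimodality(payload_sizes: &[usize], split_bytes: usize) -> f64 {
    if payload_sizes.is_empty() {
        return 0.0;
    }
    let small = payload_sizes
        .iter()
        .filter(|&&size| size <= split_bytes)
        .count() as f64;
    let p_small = small / payload_sizes.len() as f64;
    2.0 * p_small.min(1.0 - p_small)
}
```

A stream of constant-size payloads scores 0 regardless of where the split sits, which is the signal the coarse threshold needs; a full histogram would add resolution that the verdict does not use.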

Deviations from the task spec

None.

Verification output

$ cargo test -p wzp-relay --lib -- audio_scorer
   Compiling wzp-relay v0.1.0
    Finished `test` profile [unoptimized + debuginfo] target(s) in 6.85s
     Running unittests src/lib.rs (target/debug/deps/wzp_relay-9174aebf89cae671)

running 11 tests
test audio_scorer::tests::audio_scorer_insufficient_samples ... ok
test audio_scorer::tests::bitrate_ratio_saturates_when_no_codec ... ok
test audio_scorer::tests::audio_scorer_ignores_video ... ok
test audio_scorer::tests::q_flag_cv_regular_spacing ... ok
test audio_scorer::tests::audio_scorer_abusive_uniform_iat ... ok
test audio_scorer::tests::audio_scorer_abusive_no_silence ... ok
test audio_scorer::tests::audio_scorer_legitimate_traffic ... ok
test audio_scorer::tests::audio_scorer_counts_packets ... ok
test audio_scorer::tests::silence_fraction_computed_correctly ... ok
test audio_scorer::tests::size_bimodality_for_mixed_traffic ... ok
test audio_scorer::tests::size_bimodality_for_uniform_traffic ... ok

test result: ok. 11 passed; 0 failed; 0 ignored; 0 measured; 116 filtered out
$ cargo fmt --all -- --check
# pass
$ cargo clippy -p wzp-relay --lib -- -D warnings
# pass for new code (pre-existing debt in other modules allowed)

Test summary

  • Tests added: 11
  • Tests modified: 0
  • Workspace test count before: 116 / after: 127 (wzp-relay lib)
  • cargo clippy -p wzp-relay --lib -- -D warnings: pass for new code
  • cargo fmt --all -- --check: pass

Risks / follow-ups

  1. Thresholds are heuristic: the 0.7 / 0.3 verdict boundaries were chosen by eyeballing test data, not calibrated against real traffic, and may need tuning in production.
  2. Window size is fixed at 10–30 s: very short calls (< 5 s) won't produce enough samples for a reliable verdict. Consider falling back to Tier A/B/C metering for short sessions.
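
Both follow-ups can be pictured with a small sketch of the score-to-verdict mapping. The Verdict variant names and the 0.7 / 0.3 boundaries come from this report; everything else is an assumption: that higher scores mean more legitimate, that the boundaries are inclusive, the `MIN_SAMPLES` value, and returning None so a caller can fall back to another tier.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Legitimate,
    Suspect,
    Abusive,
}

/// Verdict boundaries named in the report (heuristic, not calibrated).
pub const LEGITIMATE_MIN: f64 = 0.7;
pub const ABUSIVE_MAX: f64 = 0.3;
/// Hypothetical minimum packet count before a verdict is attempted.
pub const MIN_SAMPLES: usize = 50;

/// Maps a legitimacy score to a verdict.
/// Returns None when too few samples were seen, so the caller can fall back
/// to a different metering tier for short sessions.
/// Assumption: boundaries are inclusive and higher scores are more legitimate.
pub fn verdict(score: f64, samples: usize) -> Option<Verdict> {
    if samples < MIN_SAMPLES {
        return None;
    }
    let s = score.clamp(0.0, 1.0);
    Some(if s >= LEGITIMATE_MIN {
        Verdict::Legitimate
    } else if s <= ABUSIVE_MAX {
        Verdict::Abusive
    } else {
        Verdict::Suspect
    })
}
```

Keeping the boundaries as named constants means a later calibration pass against real traffic only has to change two values.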

Reviewer checklist (filled in by reviewer)

  • Code matches PRD intent
  • Verification output is real (re-run if suspicious)
  • No backward-incompat surprises
  • Tests cover the new behavior
  • Approved