Adds four Criterion.rs benchmark suites to measure hot-path performance
and demonstrate the impact of Sprints 1–3 optimizations:
- benches/protocol.rs — Command & StatusMessage serialize/deserialize
- benches/bandwidth.rs — BandwidthState atomics, budget, interval math
- benches/tcp_rx_scan.rs — memchr SIMD scan vs naive byte-by-byte loop (55× faster
on 256KB buffers with status at end)
- benches/ecsrp5.rs — WCurve::new() heavy math vs cached LazyLock
(~123,000× faster access)
Also adds BENCHMARKS.md with usage instructions and example results.
Visibility changes (bench-only):
- scan_status_message is now pub (was #[cfg(test)] only)
- WCurve and WCURVE are now pub in ecsrp5.rs
dev-dependencies: criterion + pprof (optional flamegraph support)
Benchmarks
This project uses Criterion.rs for performance benchmarking and regression detection.
Running Benchmarks
Run all benchmarks:
cargo bench
Run a specific benchmark suite:
cargo bench --bench protocol
cargo bench --bench bandwidth
cargo bench --bench tcp_rx_scan
cargo bench --bench ecsrp5
Run in "quick" mode (fewer iterations, useful for development):
cargo bench --bench tcp_rx_scan -- --quick
Benchmark Suites
protocol — Protocol Serialization
Measures the zero-allocation serialization/deserialization of Command (16 bytes) and StatusMessage (12 bytes) structs.
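As a rough illustration of what "zero-allocation" means here, the sketch below serializes a 16-byte struct into a stack array and reads it back. The field names, field order, and little-endian encoding are assumptions made for the example; the real Command layout is not shown in this document.

```rust
/// Wire size taken from the text above (16 bytes).
pub const COMMAND_SIZE: usize = 16;

/// Hypothetical 16-byte command layout, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Command {
    pub proto: u8,
    pub direction: u8,
    pub flags: u16,
    pub tx_size: u32,
    pub local_speed: u32,
    pub remote_speed: u32,
}

impl Command {
    /// Writes the fields into a fixed-size array on the stack, so no heap
    /// allocation happens on the hot path.
    pub fn serialize(&self) -> [u8; COMMAND_SIZE] {
        let mut buf = [0u8; COMMAND_SIZE];
        buf[0] = self.proto;
        buf[1] = self.direction;
        buf[2..4].copy_from_slice(&self.flags.to_le_bytes());
        buf[4..8].copy_from_slice(&self.tx_size.to_le_bytes());
        buf[8..12].copy_from_slice(&self.local_speed.to_le_bytes());
        buf[12..16].copy_from_slice(&self.remote_speed.to_le_bytes());
        buf
    }

    /// Inverse of `serialize`; taking a fixed-size array reference means the
    /// length is checked at compile time.
    pub fn deserialize(buf: &[u8; COMMAND_SIZE]) -> Self {
        Command {
            proto: buf[0],
            direction: buf[1],
            flags: u16::from_le_bytes([buf[2], buf[3]]),
            tx_size: u32::from_le_bytes([buf[4], buf[5], buf[6], buf[7]]),
            local_speed: u32::from_le_bytes([buf[8], buf[9], buf[10], buf[11]]),
            remote_speed: u32::from_le_bytes([buf[12], buf[13], buf[14], buf[15]]),
        }
    }
}
```

Returning `[u8; 16]` by value keeps the whole round trip on the stack, which is why timings in the single-digit nanosecond range are plausible for this kind of code.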
bandwidth — Bandwidth State Atomics
Measures BandwidthState hot-path operations: fetch_add, spend_budget, calc_send_interval, advance_next_send, and summary.
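The sketch below shows the general shape such hot-path operations can take with standard-library atomics. It is a hypothetical stand-in, not the project's BandwidthState: the field names, the `record_sent` helper, the memory orderings, and the interval formula are all assumptions.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Duration;

/// Hypothetical stand-in for `BandwidthState`; fields are assumptions.
pub struct BandwidthState {
    bytes_sent: AtomicU64,
    budget: AtomicU64,
}

impl BandwidthState {
    pub fn new(budget: u64) -> Self {
        BandwidthState {
            bytes_sent: AtomicU64::new(0),
            budget: AtomicU64::new(budget),
        }
    }

    /// Lock-free counter bump; returns the new running total.
    pub fn record_sent(&self, n: u64) -> u64 {
        self.bytes_sent.fetch_add(n, Ordering::Relaxed) + n
    }

    /// Atomically takes `n` bytes from the budget. Returns false and leaves
    /// the budget untouched if fewer than `n` bytes remain.
    pub fn spend_budget(&self, n: u64) -> bool {
        self.budget
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |cur| cur.checked_sub(n))
            .is_ok()
    }
}

/// Gap between packets of `packet_bytes` needed to hit `bits_per_sec`.
/// Returns None when no rate limit is set (rate of zero).
pub fn calc_send_interval(packet_bytes: u64, bits_per_sec: u64) -> Option<Duration> {
    if bits_per_sec == 0 {
        return None;
    }
    // Widen to u128 so bits * 1e9 cannot overflow.
    let nanos = (packet_bytes as u128 * 8 * 1_000_000_000) / bits_per_sec as u128;
    Some(Duration::from_nanos(nanos as u64))
}
```

For example, 1500-byte packets at 12 Mbit/s give 12,000 bits per packet, so the interval is 12,000 / 12,000,000 s = 1 ms.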
tcp_rx_scan — TCP RX Status Message Scan
Compares the optimized memchr-based scan against the previous naive byte-by-byte loop on 256KB buffers. Key scenarios:
- All zeros (common case — data packets contain no status)
- Status at start
- Status at end (worst case for naive scan)
- Split messages (status spans two TCP reads)
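To make the comparison concrete, here is a standard-library-only sketch of the two strategies. It does not use the memchr crate; the word-skipping loop is a simplified stand-in for a SIMD search. The marker byte, the rule that a match needs a full message after it, and both function names are assumptions, and the split-read case is left out.

```rust
/// Hypothetical marker byte; the real protocol value is not given here.
pub const STATUS_MARKER: u8 = 0x07;
/// Status message length, taken from the 12-byte figure in this document.
pub const STATUS_LEN: usize = 12;

/// Baseline: test every candidate offset, one byte at a time.
pub fn scan_naive(buf: &[u8]) -> Option<usize> {
    let mut i = 0;
    while i + STATUS_LEN <= buf.len() {
        if buf[i] == STATUS_MARKER {
            return Some(i);
        }
        i += 1;
    }
    None
}

/// Stand-in for a memchr-style search: skip 8-byte words that are entirely
/// zero and only look at individual bytes inside a non-zero word.
pub fn scan_fast(buf: &[u8]) -> Option<usize> {
    // Only offsets followed by a full message are candidates.
    let limit = buf.len().checked_sub(STATUS_LEN)? + 1;
    let mut chunks = buf[..limit].chunks_exact(8);
    let mut base = 0;
    for chunk in chunks.by_ref() {
        let mut word = [0u8; 8];
        word.copy_from_slice(chunk);
        // A zero word cannot contain the (non-zero) marker.
        if u64::from_ne_bytes(word) != 0 {
            if let Some(pos) = chunk.iter().position(|&b| b == STATUS_MARKER) {
                return Some(base + pos);
            }
        }
        base += 8;
    }
    // Fewer than 8 candidate bytes left: check them individually.
    chunks
        .remainder()
        .iter()
        .position(|&b| b == STATUS_MARKER)
        .map(|pos| base + pos)
}
```

Both functions return the same offsets; the difference is how much work the all-zero and status-at-end cases cost, which is what the benchmark scenarios above are designed to expose.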
ecsrp5 — EC-SRP5 Curve Construction
Compares WCurve::new() (heavy BigUint modular arithmetic) against the cached &*WCURVE access to demonstrate the Sprint 1 optimization.
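The caching pattern being measured can be sketched with `std::sync::LazyLock` (stable since Rust 1.80). The `Curve` type, its placeholder arithmetic, and the `BUILDS` counter below are hypothetical; the real WCurve performs BigUint modular arithmetic that is not reproduced here.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::LazyLock;

/// Counts how many times the expensive constructor actually runs.
pub static BUILDS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for `WCurve`.
pub struct Curve {
    pub checksum: u64,
}

impl Curve {
    /// Placeholder for the heavy math done by the real constructor.
    pub fn new() -> Self {
        BUILDS.fetch_add(1, Ordering::SeqCst);
        let mut acc: u64 = 1;
        for i in 1..=10_000u64 {
            acc = acc.wrapping_mul(6364136223846793005).wrapping_add(i);
        }
        Curve { checksum: acc }
    }
}

/// Built once on first access; later accesses are an initialization check
/// plus a pointer dereference.
pub static CURVE: LazyLock<Curve> = LazyLock::new(Curve::new);
```

Calling `Curve::new()` repeats the full computation each time, while `&*CURVE` pays that cost only on first use. That asymmetry is why the cached access can be orders of magnitude cheaper than construction.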
Interpreting Results
Criterion generates HTML reports in target/criterion/. Open target/criterion/report/index.html after running benchmarks to view interactive charts.
Example results (Apple M3 Pro, release profile):
| Benchmark | Naive/Uncached | Optimized/Cached | Speedup |
|---|---|---|---|
| TCP RX scan 256KB (status at end) | 251 µs | 4.5 µs | ~55× |
| WCurve construction | 126 µs | 1.0 ns | ~123,000× |
| Command serialize | — | 7.7 ns | — |
| Bandwidth fetch_add | — | ~1 ns | — |