bench: add criterion benchmarks for protocol, bandwidth, TCP RX scan, and EC-SRP5

Adds four Criterion.rs benchmark suites to measure hot-path performance and demonstrate the impact of Sprints 1–3 optimizations:

- benches/protocol.rs: Command & StatusMessage serialize/deserialize
- benches/bandwidth.rs: BandwidthState atomics, budget, interval math
- benches/tcp_rx_scan.rs: memchr SIMD scan vs naive O(n) loop (55× faster on 256KB buffers with status at end)
- benches/ecsrp5.rs: WCurve::new() heavy math vs cached LazyLock (~123,000× faster access)

Also adds BENCHMARKS.md with usage instructions and example results.

Visibility changes (bench-only):

- scan_status_message is now pub (was #[cfg(test)] only)
- WCurve and WCURVE are now pub in ecsrp5.rs

dev-dependencies: criterion + pprof (optional flamegraph support)
2026-04-30 21:01:38 +04:00
parent bba9b0512c
commit 3afbfb42cf
9 changed files with 969 additions and 18 deletions
--- a/BENCHMARKS.md
+++ b/BENCHMARKS.md
@@ -0,0 +1,54 @@
+# Benchmarks
+
+This project uses [Criterion.rs](https://bheisler.github.io/criterion.rs/book/) for performance benchmarking and regression detection.
+
+## Running Benchmarks
+
+Run all benchmarks:
+```bash
+cargo bench
+```
+
+Run a specific benchmark suite:
+```bash
+cargo bench --bench protocol
+cargo bench --bench bandwidth
+cargo bench --bench tcp_rx_scan
+cargo bench --bench ecsrp5
+```
+
+Run in "quick" mode (fewer iterations, useful for development):
+```bash
+cargo bench --bench tcp_rx_scan -- --quick
+```
+
+## Benchmark Suites
+
+### `protocol` — Protocol Serialization
+Measures the zero-allocation serialization/deserialization of `Command` (16 bytes) and `StatusMessage` (12 bytes) structs.
+
+### `bandwidth` — Bandwidth State Atomics
+Measures `BandwidthState` hot-path operations: `fetch_add`, `spend_budget`, `calc_send_interval`, `advance_next_send`, and `summary`.
+
+### `tcp_rx_scan` — TCP RX Status Message Scan
+Compares the optimized `memchr`-based scan against the old naive O(n) byte-by-byte loop on 256KB buffers. Key scenarios:
+- **All zeros** (common case — data packets contain no status)
+- **Status at start**
+- **Status at end** (worst case for naive scan)
+- **Split messages** (status spans two TCP reads)
+
+### `ecsrp5` — EC-SRP5 Curve Construction
+Compares `WCurve::new()` (heavy `BigUint` modular arithmetic) against the cached `&*WCURVE` access to demonstrate the Sprint 1 optimization.
+
+## Interpreting Results
+
+Criterion generates HTML reports in `target/criterion/`. Open `target/criterion/report/index.html` after running benchmarks to view interactive charts.
+
+Example results (Apple M3 Pro, release profile):
+
+| Benchmark | Naive/Uncached | Optimized/Cached | Speedup |
+|-----------|---------------|------------------|---------|
+| TCP RX scan 256KB (status at end) | 251 µs | 4.5 µs | **~55×** |
+| WCurve construction | 126 µs | 1.0 ns | **~123,000×** |
+| Command serialize | — | 7.7 ns | — |
+| Bandwidth `fetch_add` | — | ~1 ns | — |
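
Reviewer note: the `ecsrp5` comparison above (fresh construction vs. cached access) can be illustrated with a minimal, std-only sketch. It assumes the cache is a `static` wrapped in `std::sync::LazyLock`, as the commit message suggests. `DemoCurve`, `DEMO_CURVE`, `BUILD_COUNT`, and `build_count` are hypothetical names invented for this illustration, and the arithmetic is a cheap stand-in for the real `BigUint` work, not the project's `WCurve` code.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::LazyLock;

/// Counts constructor runs so the caching behaviour can be observed.
/// (Illustration only; a real cache would not need this.)
static BUILD_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for a curve-parameter struct (not the project's `WCurve`).
pub struct DemoCurve {
    pub modulus: u64,
    pub generator_power: u64,
}

impl DemoCurve {
    /// Deliberately slow constructor: computes 3^100000 mod p by repeated
    /// multiplication, standing in for heavy modular arithmetic.
    pub fn new() -> Self {
        BUILD_COUNT.fetch_add(1, Ordering::SeqCst);
        let modulus: u64 = 1_000_000_007;
        let mut acc: u64 = 1;
        for _ in 0..100_000 {
            // acc < modulus, so acc * 3 cannot overflow a u64.
            acc = acc * 3 % modulus;
        }
        DemoCurve { modulus, generator_power: acc }
    }
}

/// Built on first dereference, then shared by every later access.
pub static DEMO_CURVE: LazyLock<DemoCurve> = LazyLock::new(|| DemoCurve::new());

/// Number of times `DemoCurve::new` has run so far.
pub fn build_count() -> usize {
    BUILD_COUNT.load(Ordering::SeqCst)
}
```

Dereferencing `DEMO_CURVE` repeatedly runs the constructor only once. Later accesses reduce to an initialization check plus a pointer read, which is the kind of gap the cached column in the table reflects.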