bench: add criterion benchmarks for protocol, bandwidth, TCP RX scan, and EC-SRP5
Some checks failed
CI / test (push) Failing after 1m42s

Adds four Criterion.rs benchmark suites to measure hot-path performance
and demonstrate the impact of Sprints 1–3 optimizations:

- benches/protocol.rs    — Command & StatusMessage serialize/deserialize
- benches/bandwidth.rs   — BandwidthState atomics, budget, interval math
- benches/tcp_rx_scan.rs — memchr SIMD scan vs naive O(n) loop (55× faster
                           on 256KB buffers with status at end)
- benches/ecsrp5.rs      — WCurve::new() heavy math vs cached LazyLock
                           (~123,000× faster access)

Also adds BENCHMARKS.md with usage instructions and example results.

Visibility changes (bench-only):
- scan_status_message is now pub (was #[cfg(test)] only)
- WCurve and WCURVE are now pub in ecsrp5.rs

dev-dependencies: criterion + pprof (optional flamegraph support)
This commit is contained in:
Siavash Sameni
2026-04-30 21:01:38 +04:00
parent bba9b0512c
commit 3afbfb42cf
9 changed files with 969 additions and 18 deletions

54
BENCHMARKS.md Normal file
View File

@@ -0,0 +1,54 @@
# Benchmarks
This project uses [Criterion.rs](https://bheisler.github.io/criterion.rs/book/) for performance benchmarking and regression detection.
## Running Benchmarks
Run all benchmarks:
```bash
cargo bench
```
Run a specific benchmark suite:
```bash
cargo bench --bench protocol
cargo bench --bench bandwidth
cargo bench --bench tcp_rx_scan
cargo bench --bench ecsrp5
```
Run in "quick" mode (fewer iterations, useful for development):
```bash
cargo bench --bench tcp_rx_scan -- --quick
```
## Benchmark Suites
### `protocol` — Protocol Serialization
Measures the zero-allocation serialization/deserialization of `Command` (16 bytes) and `StatusMessage` (12 bytes) structs.
### `bandwidth` — Bandwidth State Atomics
Measures `BandwidthState` hot-path operations: `fetch_add`, `spend_budget`, `calc_send_interval`, `advance_next_send`, and `summary`.
### `tcp_rx_scan` — TCP RX Status Message Scan
Compares the optimized `memchr`-based scan against the old naive O(n) byte-by-byte loop on 256KB buffers. Key scenarios:
- **All zeros** (common case — data packets contain no status)
- **Status at start**
- **Status at end** (worst case for naive scan)
- **Split messages** (status spans two TCP reads)
### `ecsrp5` — EC-SRP5 Curve Construction
Compares `WCurve::new()` (heavy `BigUint` modular arithmetic) against the cached `&*WCURVE` access to demonstrate the Sprint 1 optimization.
## Interpreting Results
Criterion generates HTML reports in `target/criterion/`. Open `target/criterion/report/index.html` after running benchmarks to view interactive charts.
Example results (Apple M3 Pro, release profile):
| Benchmark | Naive/Uncached | Optimized/Cached | Speedup |
|-----------|---------------|------------------|---------|
| TCP RX scan 256KB (status at end) | 251 µs | 4.5 µs | **~55×** |
| WCurve construction | 126 µs | 1.0 ns | **~123,000×** |
| Command serialize | — | 7.7 ns | — |
| Bandwidth `fetch_add` | — | ~1 ns | — |