bench: add criterion benchmarks for protocol, bandwidth, TCP RX scan, and EC-SRP5
Some checks failed
CI / test (push) Failing after 1m42s
Adds four Criterion.rs benchmark suites to measure hot-path performance
and demonstrate the impact of Sprints 1–3 optimizations:
- benches/protocol.rs — Command & StatusMessage serialize/deserialize
- benches/bandwidth.rs — BandwidthState atomics, budget, interval math
- benches/tcp_rx_scan.rs — memchr SIMD scan vs naive O(n) loop (55× faster
on 256KB buffers with status at end)
- benches/ecsrp5.rs — WCurve::new() heavy math vs cached LazyLock
(~123,000× faster access)
Also adds BENCHMARKS.md with usage instructions and example results.
Visibility changes (bench-only):
- scan_status_message is now pub (was #[cfg(test)] only)
- WCurve and WCURVE are now pub in ecsrp5.rs
dev-dependencies: criterion + pprof (optional flamegraph support)
54
BENCHMARKS.md
Normal file
@@ -0,0 +1,54 @@
# Benchmarks

This project uses [Criterion.rs](https://bheisler.github.io/criterion.rs/book/) for performance benchmarking and regression detection.

## Running Benchmarks

Run all benchmarks:
```bash
cargo bench
```

Run a specific benchmark suite:
```bash
cargo bench --bench protocol
cargo bench --bench bandwidth
cargo bench --bench tcp_rx_scan
cargo bench --bench ecsrp5
```

Run in "quick" mode (fewer iterations, useful for development):
```bash
cargo bench --bench tcp_rx_scan -- --quick
```

## Benchmark Suites

### `protocol` — Protocol Serialization
Measures the zero-allocation serialization/deserialization of `Command` (16 bytes) and `StatusMessage` (12 bytes) structs.

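To illustrate what "zero-allocation" means here, the following is a minimal sketch that writes a 16-byte command into a fixed-size stack array instead of a heap-allocated `Vec`. The field names, their widths, and the little-endian byte order are assumptions made for illustration; the project's real `Command` layout may differ.

```rust
/// Size of the serialized command, taken from the description above.
const COMMAND_SIZE: usize = 16;

/// Hypothetical field layout; the real `Command` may use different fields.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Command {
    opcode: u32,
    flags: u32,
    value: u64,
}

impl Command {
    /// Writes the command into a stack buffer, so no heap allocation occurs.
    fn serialize(&self) -> [u8; COMMAND_SIZE] {
        let mut buf = [0u8; COMMAND_SIZE];
        buf[0..4].copy_from_slice(&self.opcode.to_le_bytes());
        buf[4..8].copy_from_slice(&self.flags.to_le_bytes());
        buf[8..16].copy_from_slice(&self.value.to_le_bytes());
        buf
    }

    /// Reads a command back from the same (assumed) little-endian layout.
    fn deserialize(buf: &[u8; COMMAND_SIZE]) -> Self {
        let mut opcode = [0u8; 4];
        let mut flags = [0u8; 4];
        let mut value = [0u8; 8];
        opcode.copy_from_slice(&buf[0..4]);
        flags.copy_from_slice(&buf[4..8]);
        value.copy_from_slice(&buf[8..16]);
        Command {
            opcode: u32::from_le_bytes(opcode),
            flags: u32::from_le_bytes(flags),
            value: u64::from_le_bytes(value),
        }
    }
}

fn main() {
    let cmd = Command { opcode: 1, flags: 0, value: 1_000_000 };
    let bytes = cmd.serialize();
    assert_eq!(Command::deserialize(&bytes), cmd);
    println!("{} bytes: {:?}", bytes.len(), bytes);
}
```

Returning an array by value keeps the whole round trip on the stack, which is the property a benchmark of this kind would be expected to measure.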
### `bandwidth` — Bandwidth State Atomics
Measures `BandwidthState` hot-path operations: `fetch_add`, `spend_budget`, `calc_send_interval`, `advance_next_send`, and `summary`.

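The operations listed above might look roughly like the sketch below. It is not the project's `BandwidthState`: the struct fields, the method signatures, and the interval formula (packet bits divided by the target bit rate) are assumptions chosen to show the kind of lock-free work being timed.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Duration;

/// Hypothetical stand-in for `BandwidthState`; field names are assumptions.
struct BandwidthState {
    bytes_sent: AtomicU64,
    budget: AtomicU64,
}

impl BandwidthState {
    fn new(budget: u64) -> Self {
        BandwidthState {
            bytes_sent: AtomicU64::new(0),
            budget: AtomicU64::new(budget),
        }
    }

    /// Lock-free counter update; returns the new running total.
    fn record_sent(&self, n: u64) -> u64 {
        self.bytes_sent.fetch_add(n, Ordering::Relaxed) + n
    }

    /// Deducts `n` bytes from the budget, or returns false if too little is left.
    fn spend_budget(&self, n: u64) -> bool {
        self.budget
            .fetch_update(Ordering::Relaxed, Ordering::Relaxed, |b| b.checked_sub(n))
            .is_ok()
    }
}

/// Time between packets of `packet_bytes` needed to hit `bits_per_sec`.
fn calc_send_interval(packet_bytes: u64, bits_per_sec: u64) -> Duration {
    if bits_per_sec == 0 {
        return Duration::ZERO; // assumed meaning: unlimited, no pacing
    }
    let nanos = packet_bytes as u128 * 8 * 1_000_000_000 / bits_per_sec as u128;
    Duration::from_nanos(nanos as u64)
}

fn main() {
    let state = BandwidthState::new(3_000);
    state.record_sent(1_500);
    println!("sent so far: {}", state.bytes_sent.load(Ordering::Relaxed));
    println!("can spend 2000: {}", state.spend_budget(2_000));
    println!("interval: {:?}", calc_send_interval(1_500, 100_000_000));
}
```

As a worked instance of the assumed formula, a 1500-byte packet at 100 Mbit/s is 12,000 bits, which gives a 120 µs interval.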
### `tcp_rx_scan` — TCP RX Status Message Scan
Compares the optimized `memchr`-based scan against the old naive byte-by-byte loop on 256 KB buffers. Both scans are O(n); the speedup comes from `memchr` searching with SIMD. Key scenarios:
- **All zeros** (common case — data packets contain no status)
- **Status at start**
- **Status at end** (worst case for naive scan)
- **Split messages** (status spans two TCP reads)

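The four scenarios above can be sketched with a small scanner. This is an illustration under assumptions, not the project's `scan_status_message`: the marker byte `0x07` is hypothetical, and `Iterator::position` from the standard library stands in for `memchr::memchr(needle, haystack)` so the example needs no external crate. Only the 12-byte status length comes from the text above.

```rust
/// Hypothetical first byte of a status message; the real marker may differ.
const STATUS_MARKER: u8 = 0x07;
/// Status message length, as stated for `StatusMessage`.
const STATUS_LEN: usize = 12;

#[derive(Debug, PartialEq, Eq)]
enum Scan {
    /// No marker in the buffer (the all-zeros data case).
    NotFound,
    /// A whole status message starts at this offset.
    Complete(usize),
    /// A marker was found, but the message continues in the next TCP read.
    Partial(usize),
}

/// Decides whether the message starting at `start` fits in this buffer.
fn classify(buf: &[u8], start: usize) -> Scan {
    if buf.len() - start >= STATUS_LEN {
        Scan::Complete(start)
    } else {
        Scan::Partial(start)
    }
}

/// Naive scan: inspect one byte per loop iteration.
fn scan_naive(buf: &[u8]) -> Scan {
    let mut i = 0;
    while i < buf.len() {
        if buf[i] == STATUS_MARKER {
            return classify(buf, i);
        }
        i += 1;
    }
    Scan::NotFound
}

/// Search-based scan: find the marker first, then classify.
/// With the `memchr` crate, the `position` call would be replaced by
/// `memchr::memchr(STATUS_MARKER, buf)`, which uses SIMD where available.
fn scan_fast(buf: &[u8]) -> Scan {
    match buf.iter().position(|&b| b == STATUS_MARKER) {
        Some(i) => classify(buf, i),
        None => Scan::NotFound,
    }
}

fn main() {
    let mut buf = vec![0u8; 256 * 1024];
    println!("all zeros:     {:?}", scan_fast(&buf));
    let end = buf.len() - STATUS_LEN;
    buf[end] = STATUS_MARKER;
    println!("status at end: {:?}", scan_fast(&buf));
    assert_eq!(scan_naive(&buf), scan_fast(&buf));
}
```

Both functions return the same answer for every input; a benchmark like the one described would differ only in how quickly each gets there.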
### `ecsrp5` — EC-SRP5 Curve Construction
Compares `WCurve::new()` (heavy `BigUint` modular arithmetic) against the cached `&*WCURVE` access to demonstrate the Sprint 1 optimization.

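The caching pattern being measured can be sketched with `std::sync::LazyLock` (stable since Rust 1.80). The `Curve` type, the `mod_pow` helper, and the build counter are hypothetical stand-ins: the real `WCurve` uses `BigUint` arithmetic, which is replaced here by a small `u64` modular exponentiation so the example stays within the standard library.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::LazyLock;

/// Counts constructions so the caching effect is observable.
static BUILD_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for `WCurve`.
struct Curve {
    value: u64,
}

impl Curve {
    /// Stand-in for the expensive construction step.
    fn new() -> Self {
        BUILD_COUNT.fetch_add(1, Ordering::Relaxed);
        // 2^61 - 1 is used only as a convenient large modulus.
        Curve { value: mod_pow(5, 1_000_003, 2_305_843_009_213_693_951) }
    }
}

/// Square-and-multiply modular exponentiation, widening to u128 for products.
fn mod_pow(mut base: u64, mut exp: u64, modulus: u64) -> u64 {
    let m = modulus as u128;
    let mut result = 1u64 % modulus;
    base %= modulus;
    while exp > 0 {
        if exp & 1 == 1 {
            result = ((result as u128 * base as u128) % m) as u64;
        }
        base = ((base as u128 * base as u128) % m) as u64;
        exp >>= 1;
    }
    result
}

/// Built on first access, then shared for the rest of the process.
static CURVE: LazyLock<Curve> = LazyLock::new(Curve::new);

fn main() {
    let first: &Curve = &*CURVE; // runs Curve::new() once
    let second: &Curve = &*CURVE; // returns the already-built value
    assert!(std::ptr::eq(first, second));
    println!("value = {}, builds = {}", first.value, BUILD_COUNT.load(Ordering::Relaxed));
}
```

After the first access, dereferencing the static amounts to an initialization check plus a pointer read, which is consistent with a cached access being far cheaper than rebuilding the curve.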
## Interpreting Results

Criterion generates HTML reports in `target/criterion/`. Open `target/criterion/report/index.html` after running benchmarks to view interactive charts.

Example results (Apple M3 Pro, release profile):

| Benchmark | Naive/Uncached | Optimized/Cached | Speedup |
|-----------|---------------|------------------|---------|
| TCP RX scan 256KB (status at end) | 251 µs | 4.5 µs | **~55×** |
| WCurve construction | 126 µs | 1.0 ns | **~123,000×** |
| Command serialize | — | 7.7 ns | — |
| Bandwidth `fetch_add` | — | ~1 ns | — |