btest-rs/BENCHMARKS.md
Siavash Sameni 3afbfb42cf
bench: add criterion benchmarks for protocol, bandwidth, TCP RX scan, and EC-SRP5
Adds four Criterion.rs benchmark suites to measure hot-path performance
and demonstrate the impact of Sprints 1–3 optimizations:

- benches/protocol.rs    — Command & StatusMessage serialize/deserialize
- benches/bandwidth.rs   — BandwidthState atomics, budget, interval math
- benches/tcp_rx_scan.rs — memchr SIMD scan vs naive O(n) loop (55× faster
                           on 256KB buffers with status at end)
- benches/ecsrp5.rs      — WCurve::new() heavy math vs cached LazyLock
                           (~123,000× faster access)

Also adds BENCHMARKS.md with usage instructions and example results.

Visibility changes (bench-only):
- scan_status_message is now pub (was #[cfg(test)] only)
- WCurve and WCURVE are now pub in ecsrp5.rs

dev-dependencies: criterion + pprof (optional flamegraph support)
2026-04-30 21:01:38 +04:00

# Benchmarks
This project uses [Criterion.rs](https://bheisler.github.io/criterion.rs/book/) for performance benchmarking and regression detection.
## Running Benchmarks
Run all benchmarks:
```bash
cargo bench
```
Run a specific benchmark suite:
```bash
cargo bench --bench protocol
cargo bench --bench bandwidth
cargo bench --bench tcp_rx_scan
cargo bench --bench ecsrp5
```
Run in Criterion's `--quick` mode (fewer iterations and less precise results, useful during development):
```bash
cargo bench --bench tcp_rx_scan -- --quick
```
## Benchmark Suites
### `protocol` — Protocol Serialization
Measures the zero-allocation serialization/deserialization of `Command` (16 bytes) and `StatusMessage` (12 bytes) structs.
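As a rough illustration of what "zero-allocation" means here, the sketch below serializes a fixed-size message into a stack array instead of a heap-allocated `Vec`. The field names (`kind`, `seq`, `bytes_received`), the reserved padding, and the big-endian byte order are assumptions chosen only to total 12 bytes; the crate's actual `StatusMessage` layout may differ.

```rust
const STATUS_LEN: usize = 12;

/// Hypothetical 12-byte status message; the real field layout may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct StatusMessage {
    kind: u8,            // byte 0: message tag (assumed)
    seq: u32,            // bytes 4..8: sequence number (assumed)
    bytes_received: u32, // bytes 8..12: byte counter (assumed)
}

impl StatusMessage {
    /// Writes into a stack-allocated array, so no heap allocation occurs.
    fn serialize(&self) -> [u8; STATUS_LEN] {
        let mut buf = [0u8; STATUS_LEN];
        buf[0] = self.kind;
        // Bytes 1..4 stay zero (reserved padding in this sketch).
        buf[4..8].copy_from_slice(&self.seq.to_be_bytes());
        buf[8..12].copy_from_slice(&self.bytes_received.to_be_bytes());
        buf
    }

    /// Reads the fields back from a fixed-size buffer, again without allocating.
    fn deserialize(buf: &[u8; STATUS_LEN]) -> Self {
        Self {
            kind: buf[0],
            seq: u32::from_be_bytes([buf[4], buf[5], buf[6], buf[7]]),
            bytes_received: u32::from_be_bytes([buf[8], buf[9], buf[10], buf[11]]),
        }
    }
}

fn main() {
    let msg = StatusMessage { kind: 0x07, seq: 42, bytes_received: 1_000_000 };
    let wire = msg.serialize();
    // Round-tripping must reproduce the original message.
    assert_eq!(StatusMessage::deserialize(&wire), msg);
    println!("{wire:02x?}");
}
```

Returning `[u8; STATUS_LEN]` by value keeps the buffer size in the type, so a caller cannot pass a wrongly sized slice to `deserialize`.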
### `bandwidth` — Bandwidth State Atomics
Measures `BandwidthState` hot-path operations: `fetch_add`, `spend_budget`, `calc_send_interval`, `advance_next_send`, and `summary`.
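The sketch below shows the kind of lock-free operations this suite exercises. `BandwidthSketch`, its fields, and the signatures of `record_tx`, `spend_budget`, and `calc_send_interval` are hypothetical stand-ins; the real `BandwidthState` API is not shown in this document and may look different.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Duration;

/// Hypothetical stand-in for `BandwidthState`; fields and signatures are assumed.
struct BandwidthSketch {
    tx_bytes: AtomicU64,
    budget: AtomicU64,
}

impl BandwidthSketch {
    fn new(budget: u64) -> Self {
        Self { tx_bytes: AtomicU64::new(0), budget: AtomicU64::new(budget) }
    }

    /// Counter update as a single atomic read-modify-write; returns the new total.
    fn record_tx(&self, n: u64) -> u64 {
        self.tx_bytes.fetch_add(n, Ordering::Relaxed) + n
    }

    /// Tries to take `n` bytes from the budget; returns false if too little is left.
    fn spend_budget(&self, n: u64) -> bool {
        self.budget
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |b| b.checked_sub(n))
            .is_ok()
    }
}

/// Gap between packets needed to hit a target rate (zero rate treated as unlimited).
fn calc_send_interval(packet_bytes: u64, bits_per_sec: u64) -> Duration {
    if bits_per_sec == 0 {
        return Duration::ZERO;
    }
    // nanoseconds = bits per packet * 1e9 / bits per second
    Duration::from_nanos(packet_bytes * 8 * 1_000_000_000 / bits_per_sec)
}

fn main() {
    let state = BandwidthSketch::new(3_000);
    assert_eq!(state.record_tx(1_500), 1_500);
    assert!(state.spend_budget(1_500));
    assert!(state.spend_budget(1_500));
    assert!(!state.spend_budget(1)); // budget exhausted
    println!("{:?}", calc_send_interval(1_500, 100_000_000));
}
```

For a 1,500-byte packet at 100 Mbit/s, the interval works out to 12,000 bits / 10^8 bits per second = 120 µs.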
### `tcp_rx_scan` — TCP RX Status Message Scan
Compares the optimized `memchr`-based (SIMD) scan against the previous naive byte-by-byte loop on 256KB buffers. Both scans are O(n); the gain comes from a much lower constant factor. Key scenarios:
- **All zeros** (common case — data packets contain no status)
- **Status at start**
- **Status at end** (worst case for naive scan)
- **Split messages** (status spans two TCP reads)
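The four scenarios above can be sketched with the standard library alone. This is an illustration, not the crate's `scan_status_message`: the marker byte `0x07` is an assumption, and `Iterator::position` stands in for the `memchr` call so the snippet has no dependencies.

```rust
const BUF_LEN: usize = 256 * 1024; // buffer size used by the benchmark
const STATUS_LEN: usize = 12;      // `StatusMessage` size from the `protocol` suite
const MARKER: u8 = 0x07;           // ASSUMED tag byte, for illustration only

#[derive(Debug, PartialEq)]
enum Found {
    /// No marker byte in this read.
    Nothing,
    /// A whole message fits in the buffer, starting at this offset.
    Complete(usize),
    /// The message starts here but `missing` bytes arrive in the next read.
    Split { offset: usize, missing: usize },
}

/// Finds the first marker byte and classifies what follows it.
fn scan(buf: &[u8]) -> Found {
    match buf.iter().position(|&b| b == MARKER) {
        None => Found::Nothing,
        Some(off) if off + STATUS_LEN <= buf.len() => Found::Complete(off),
        Some(off) => Found::Split { offset: off, missing: off + STATUS_LEN - buf.len() },
    }
}

/// Builds a zero-filled buffer, optionally with a marker byte at `offset`.
fn buffer_with_marker_at(offset: Option<usize>) -> Vec<u8> {
    let mut buf = vec![0u8; BUF_LEN];
    if let Some(off) = offset {
        buf[off] = MARKER;
    }
    buf
}

fn main() {
    let scenarios = [
        ("all zeros", None),
        ("status at start", Some(0)),
        ("status at end", Some(BUF_LEN - STATUS_LEN)),
        ("split message", Some(BUF_LEN - 5)),
    ];
    for (name, offset) in scenarios {
        println!("{name}: {:?}", scan(&buffer_with_marker_at(offset)));
    }
}
```

In the all-zeros and status-at-end cases, the scan must touch nearly the whole buffer, so those are the cases where a faster byte search matters most.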
### `ecsrp5` — EC-SRP5 Curve Construction
Compares `WCurve::new()` (heavy `BigUint` modular arithmetic) against access to the cached `&*WCURVE` static (a `LazyLock`), demonstrating the Sprint 1 optimization.
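The caching pattern being measured can be sketched with `std::sync::LazyLock` (stable since Rust 1.80). `CurveSketch` and its placeholder arithmetic are hypothetical; the real `WCurve` computes elliptic-curve parameters with `BigUint`.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::LazyLock;

/// Counts how many times the expensive constructor actually runs.
static BUILDS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for `WCurve`.
struct CurveSketch {
    param: u64,
}

impl CurveSketch {
    fn new() -> Self {
        BUILDS.fetch_add(1, Ordering::Relaxed);
        // Placeholder for the heavy modular arithmetic: 3^100000 mod p.
        let p: u64 = 1_000_000_007;
        let mut acc: u64 = 1;
        for _ in 0..100_000 {
            acc = acc * 3 % p;
        }
        Self { param: acc }
    }
}

/// Built on first access, then shared; later accesses only dereference.
static CURVE: LazyLock<CurveSketch> = LazyLock::new(CurveSketch::new);

fn main() {
    let a = &*CURVE; // first access runs `CurveSketch::new`
    let b = &*CURVE; // second access reuses the cached value
    assert!(std::ptr::eq(a, b));
    assert_eq!(a.param, b.param);
    assert_eq!(BUILDS.load(Ordering::Relaxed), 1);
    println!("constructor ran {} time(s)", BUILDS.load(Ordering::Relaxed));
}
```

After initialization, each access amounts to an initialization check plus a pointer dereference, which is why the cached path is measured in nanoseconds.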
## Interpreting Results
Criterion generates HTML reports in `target/criterion/`. Open `target/criterion/report/index.html` after running benchmarks to view interactive charts.
Example results (Apple M3 Pro; `cargo bench` uses the optimized `bench` profile, which inherits from `release`):
| Benchmark | Naive/Uncached | Optimized/Cached | Speedup |
|-----------|---------------|------------------|---------|
| TCP RX scan 256KB (status at end) | 251 µs | 4.5 µs | **~55×** |
| WCurve construction | 126 µs | 1.0 ns | **~123,000×** |
| Command serialize | — | 7.7 ns | — |
| Bandwidth `fetch_add` | — | ~1 ns | — |
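The Speedup column is the baseline time divided by the optimized time, with both in the same unit. The sketch below recomputes it from the rounded table values; `speedup` is a helper introduced here for illustration. Because the inputs are rounded, the recomputed WCurve ratio differs slightly from the table's ~123,000×, which presumably came from unrounded measurements.

```rust
/// Speedup = baseline time / optimized time, both in nanoseconds.
fn speedup(baseline_ns: f64, optimized_ns: f64) -> f64 {
    baseline_ns / optimized_ns
}

fn main() {
    // 251 µs vs 4.5 µs, converted to nanoseconds.
    let scan = speedup(251_000.0, 4_500.0);
    // 126 µs vs 1.0 ns.
    let curve = speedup(126_000.0, 1.0);
    println!("TCP RX scan: ~{scan:.1}x");
    println!("WCurve:      ~{curve:.0}x");
}
```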