btest-rs/BENCHMARKS.md
Siavash Sameni 3afbfb42cf
bench: add criterion benchmarks for protocol, bandwidth, TCP RX scan, and EC-SRP5
Adds four Criterion.rs benchmark suites to measure hot-path performance
and demonstrate the impact of Sprints 1–3 optimizations:

- benches/protocol.rs    — Command & StatusMessage serialize/deserialize
- benches/bandwidth.rs   — BandwidthState atomics, budget, interval math
- benches/tcp_rx_scan.rs — memchr SIMD scan vs naive O(n) loop (55× faster
                           on 256KB buffers with status at end)
- benches/ecsrp5.rs      — WCurve::new() heavy math vs cached LazyLock
                           (~123,000× faster access)

Also adds BENCHMARKS.md with usage instructions and example results.

Visibility changes (bench-only):
- scan_status_message is now pub (was #[cfg(test)] only)
- WCurve and WCURVE are now pub in ecsrp5.rs

dev-dependencies: criterion + pprof (optional flamegraph support)
2026-04-30 21:01:38 +04:00


Benchmarks

This project uses Criterion.rs for performance benchmarking and regression detection.

Running Benchmarks

Run all benchmarks:

cargo bench

Run a specific benchmark suite:

cargo bench --bench protocol
cargo bench --bench bandwidth
cargo bench --bench tcp_rx_scan
cargo bench --bench ecsrp5

Run in "quick" mode (fewer iterations, useful for development):

cargo bench --bench tcp_rx_scan -- --quick

Benchmark Suites

protocol — Protocol Serialization

Measures the zero-allocation serialization/deserialization of Command (16 bytes) and StatusMessage (12 bytes) structs.
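As a rough illustration of what "zero-allocation" means here, the sketch below writes a 16-byte command into a fixed-size stack array instead of a heap-allocated `Vec`. The field names, their order, and the little-endian encoding are assumptions made for this example; they are not taken from the btest-rs wire format.

```rust
// Hypothetical 16-byte command layout. Field names, order, and endianness
// are illustrative assumptions, not the actual btest-rs protocol.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Command {
    proto: u8,
    direction: u8,
    random_data: u8,
    conn_count: u8,
    tx_size: u16,
    client_buf_size: u16,
    remote_tx_speed: u32,
    local_tx_speed: u32,
}

impl Command {
    const SIZE: usize = 16;

    // Serializes into a stack array, so no heap allocation takes place.
    fn serialize(&self) -> [u8; Self::SIZE] {
        let mut buf = [0u8; Self::SIZE];
        buf[0] = self.proto;
        buf[1] = self.direction;
        buf[2] = self.random_data;
        buf[3] = self.conn_count;
        buf[4..6].copy_from_slice(&self.tx_size.to_le_bytes());
        buf[6..8].copy_from_slice(&self.client_buf_size.to_le_bytes());
        buf[8..12].copy_from_slice(&self.remote_tx_speed.to_le_bytes());
        buf[12..16].copy_from_slice(&self.local_tx_speed.to_le_bytes());
        buf
    }

    // Reads the fields back from a fixed-size buffer, again without allocating.
    fn deserialize(buf: &[u8; Self::SIZE]) -> Self {
        Command {
            proto: buf[0],
            direction: buf[1],
            random_data: buf[2],
            conn_count: buf[3],
            tx_size: u16::from_le_bytes([buf[4], buf[5]]),
            client_buf_size: u16::from_le_bytes([buf[6], buf[7]]),
            remote_tx_speed: u32::from_le_bytes([buf[8], buf[9], buf[10], buf[11]]),
            local_tx_speed: u32::from_le_bytes([buf[12], buf[13], buf[14], buf[15]]),
        }
    }
}

fn main() {
    let cmd = Command {
        proto: 1,
        direction: 2,
        random_data: 0,
        conn_count: 4,
        tx_size: 1500,
        client_buf_size: 0,
        remote_tx_speed: 100_000_000,
        local_tx_speed: 50_000_000,
    };
    let bytes = cmd.serialize();
    let back = Command::deserialize(&bytes);
    println!("{} bytes, round trip ok: {}", bytes.len(), back == cmd);
}
```

A benchmark over functions shaped like these mostly measures a handful of byte copies, which is consistent with timings in the nanosecond range.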

bandwidth — Bandwidth State Atomics

Measures BandwidthState hot-path operations: fetch_add, spend_budget, calc_send_interval, advance_next_send, and summary.
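The following is a minimal sketch of the kind of operations this suite exercises, using only `std` atomics. The struct fields, the method signatures, and the interval formula are assumptions for illustration; the real `BandwidthState` in btest-rs may be organized quite differently.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Duration;

// Hypothetical stand-in for BandwidthState; the fields are assumptions.
struct BandwidthState {
    tx_bytes: AtomicU64,
    budget: AtomicU64,
}

impl BandwidthState {
    fn new(budget: u64) -> Self {
        BandwidthState {
            tx_bytes: AtomicU64::new(0),
            budget: AtomicU64::new(budget),
        }
    }

    // Lock-free counter update; returns the value before the addition.
    fn add_tx(&self, n: u64) -> u64 {
        self.tx_bytes.fetch_add(n, Ordering::Relaxed)
    }

    // Atomically subtracts `n` from the budget if enough remains.
    // Returns false and leaves the budget unchanged otherwise.
    fn spend_budget(&self, n: u64) -> bool {
        self.budget
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |b| b.checked_sub(n))
            .is_ok()
    }
}

// Assumed formula: time needed to send one packet at the target bit rate.
// Returns None for a rate of zero (treated here as "no pacing").
fn calc_send_interval(packet_bytes: u64, rate_bps: u64) -> Option<Duration> {
    if rate_bps == 0 {
        return None;
    }
    // Use u128 for the intermediate product to avoid overflow.
    let nanos = packet_bytes as u128 * 8 * 1_000_000_000 / rate_bps as u128;
    Some(Duration::from_nanos(nanos as u64))
}

fn main() {
    let state = BandwidthState::new(1000);
    state.add_tx(1500);
    println!("spend 600: {}", state.spend_budget(600));
    println!("spend 600 again: {}", state.spend_budget(600));
    println!("interval: {:?}", calc_send_interval(1500, 100_000_000));
}
```

Because each operation is a single atomic instruction or a short compare-and-swap loop, per-call costs around a nanosecond are plausible on an uncontended core.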

tcp_rx_scan — TCP RX Status Message Scan

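Compares the optimized memchr-based scan against the previous naive byte-by-byte loop on 256 KB buffers. Both scans are linear in the buffer length; the gain comes from memchr's SIMD search examining many bytes per step. Key scenarios: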

  • All zeros (common case — data packets contain no status)
  • Status at start
  • Status at end (worst case for naive scan)
  • Split messages (status spans two TCP reads)
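The scenarios above can be sketched with a toy scanner. This example assumes a status message is 12 bytes long and starts with a marker byte of `0x07`; the marker value, the `Scan` enum, and both function names are hypothetical. To stay dependency-free it uses `Iterator::position` from the standard library as a stand-in for the `memchr` crate, so it shows the shape of the comparison rather than the SIMD speedup itself.

```rust
// Assumed marker byte for illustration only.
const STATUS_MARKER: u8 = 0x07;
// Status message length as stated for StatusMessage (12 bytes).
const STATUS_LEN: usize = 12;

#[derive(Debug, PartialEq, Eq)]
enum Scan {
    // No marker byte anywhere in the buffer.
    NotFound,
    // A full message starts at this offset.
    Complete(usize),
    // A message starts at this offset but runs past the end of the buffer,
    // so the remainder is expected in the next TCP read.
    Partial(usize),
}

fn classify(buf_len: usize, offset: usize) -> Scan {
    if offset + STATUS_LEN <= buf_len {
        Scan::Complete(offset)
    } else {
        Scan::Partial(offset)
    }
}

// Naive version: inspect one byte per iteration.
fn scan_naive(buf: &[u8]) -> Scan {
    for i in 0..buf.len() {
        if buf[i] == STATUS_MARKER {
            return classify(buf.len(), i);
        }
    }
    Scan::NotFound
}

// Stand-in for the optimized version. With the memchr crate, the search
// would be `memchr::memchr(STATUS_MARKER, buf)` instead of `position`.
fn scan_fast(buf: &[u8]) -> Scan {
    match buf.iter().position(|&b| b == STATUS_MARKER) {
        Some(i) => classify(buf.len(), i),
        None => Scan::NotFound,
    }
}

fn main() {
    let len = 256 * 1024;

    let zeros = vec![0u8; len];
    println!("all zeros: {:?}", scan_fast(&zeros));

    let mut at_end = vec![0u8; len];
    at_end[len - STATUS_LEN] = STATUS_MARKER;
    println!("status at end: {:?}", scan_fast(&at_end));

    let mut split = vec![0u8; len];
    split[len - 4] = STATUS_MARKER;
    println!("split message: {:?}", scan_fast(&split));

    // Both scanners should agree on every input.
    assert_eq!(scan_naive(&at_end), scan_fast(&at_end));
}
```

The "status at end" input forces either scanner to walk the whole buffer, which is why it is the scenario where a faster byte search matters most.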

ecsrp5 — EC-SRP5 Curve Construction

Compares WCurve::new() (heavy BigUint modular arithmetic) against the cached &*WCURVE access to demonstrate the Sprint 1 optimization.
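The caching pattern being measured can be sketched with `std::sync::LazyLock` (stable since Rust 1.80). The `Curve` type and its precomputed table below are hypothetical stand-ins for `WCurve` and its BigUint arithmetic; only the pattern of building once and reading many times is the point.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::LazyLock;

// Counts how many times the expensive constructor actually runs.
static BUILD_COUNT: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in for WCurve; the contents are illustrative only.
struct Curve {
    p: u64,
    table: Vec<u64>,
}

impl Curve {
    // Simulates an expensive setup step with repeated modular squaring.
    fn new() -> Self {
        BUILD_COUNT.fetch_add(1, Ordering::Relaxed);
        let p: u64 = 1_000_000_007;
        let mut table = Vec::with_capacity(1024);
        let mut x: u64 = 3;
        for _ in 0..1024 {
            table.push(x);
            // x < p < 2^30, so x * x fits comfortably in a u64.
            x = x * x % p;
        }
        Curve { p, table }
    }
}

// Built on first access, then shared for the lifetime of the process.
static CURVE: LazyLock<Curve> = LazyLock::new(Curve::new);

fn main() {
    let a: &Curve = &*CURVE; // first access runs Curve::new()
    let b: &Curve = &*CURVE; // later accesses return the cached value
    println!("same instance: {}", std::ptr::eq(a, b));
    println!("modulus: {}, table entries: {}", a.p, a.table.len());
    println!("constructor runs: {}", BUILD_COUNT.load(Ordering::Relaxed));
}
```

After the first access, dereferencing the static amounts to an initialization check plus a pointer read, which is why the cached path can be orders of magnitude faster than rebuilding the value.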

Interpreting Results

Criterion generates HTML reports in target/criterion/. Open target/criterion/report/index.html after running benchmarks to view interactive charts.

Example results (Apple M3 Pro, release profile):

| Benchmark | Naive/Uncached | Optimized/Cached | Speedup |
|---|---|---|---|
| TCP RX scan 256KB (status at end) | 251 µs | 4.5 µs | ~55× |
| WCurve construction | 126 µs | 1.0 ns | ~123,000× |
| Command serialize | n/a | 7.7 ns | n/a |
| Bandwidth fetch_add | n/a | ~1 ns | n/a |