
Siavash Sameni 3afbfb42cf


bench: add criterion benchmarks for protocol, bandwidth, TCP RX scan, and EC-SRP5

Adds four Criterion.rs benchmark suites to measure hot-path performance
and demonstrate the impact of Sprints 1–3 optimizations:

- benches/protocol.rs    — Command & StatusMessage serialize/deserialize
- benches/bandwidth.rs   — BandwidthState atomics, budget, interval math
- benches/tcp_rx_scan.rs — memchr SIMD scan vs naive O(n) loop (55× faster
                           on 256KB buffers with status at end)
- benches/ecsrp5.rs      — WCurve::new() heavy math vs cached LazyLock
                           (~123,000× faster access)

Also adds BENCHMARKS.md with usage instructions and example results.

Visibility changes (bench-only):
- scan_status_message is now pub (was #[cfg(test)] only)
- WCurve and WCURVE are now pub in ecsrp5.rs

dev-dependencies: criterion + pprof (optional flamegraph support)

2026-04-30 21:01:38 +04:00


Benchmarks

This project uses Criterion.rs for performance benchmarking and regression detection.

Running Benchmarks

Run all benchmarks:

cargo bench

Run a specific benchmark suite:

cargo bench --bench protocol
cargo bench --bench bandwidth
cargo bench --bench tcp_rx_scan
cargo bench --bench ecsrp5

Run in "quick" mode (fewer iterations, useful for development):

cargo bench --bench tcp_rx_scan -- --quick

Benchmark Suites

`protocol` — Protocol Serialization

Measures the zero-allocation serialization/deserialization of Command (16 bytes) and StatusMessage (12 bytes) structs.
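As a rough illustration of what "zero-allocation" means here, the sketch below serializes into a fixed-size stack array instead of a `Vec`. The field names and little-endian layout are assumptions made for this example; only the 16-byte size of Command comes from the description above, and the real struct may look quite different.

```rust
/// Hypothetical 16-byte command. Field names and byte layout are
/// assumptions for illustration, not the project's actual wire format.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Command {
    pub flags: u32,
    pub tx_size: u32,
    pub local_speed: u32,
    pub remote_speed: u32,
}

impl Command {
    pub const SIZE: usize = 16;

    /// Serialize into a stack array: no Vec, no heap allocation.
    pub fn to_bytes(&self) -> [u8; Self::SIZE] {
        let mut buf = [0u8; Self::SIZE];
        buf[0..4].copy_from_slice(&self.flags.to_le_bytes());
        buf[4..8].copy_from_slice(&self.tx_size.to_le_bytes());
        buf[8..12].copy_from_slice(&self.local_speed.to_le_bytes());
        buf[12..16].copy_from_slice(&self.remote_speed.to_le_bytes());
        buf
    }

    /// Deserialize from a fixed-size array. The array type guarantees the
    /// length, so no runtime length check or error path is needed.
    pub fn from_bytes(buf: &[u8; Self::SIZE]) -> Self {
        // Read one little-endian u32 starting at byte offset `i`.
        let word = |i: usize| u32::from_le_bytes([buf[i], buf[i + 1], buf[i + 2], buf[i + 3]]);
        Command {
            flags: word(0),
            tx_size: word(4),
            local_speed: word(8),
            remote_speed: word(12),
        }
    }
}
```

A round trip such as `Command::from_bytes(&cmd.to_bytes())` should return a value equal to `cmd`, which is the kind of operation a serialize/deserialize benchmark would time.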

`bandwidth` — Bandwidth State Atomics

Measures BandwidthState hot-path operations: fetch_add, spend_budget, calc_send_interval, advance_next_send, and summary.
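The operations named above can be pictured with a small lock-free sketch. The struct below is a hypothetical stand-in, not the project's BandwidthState: the names `spend_budget` and `calc_send_interval` are taken from the text, but their signatures and semantics here are assumptions.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Duration;

/// Hypothetical stand-in for BandwidthState (fields are assumptions).
pub struct BandwidthState {
    total_bytes: AtomicU64,
    budget: AtomicU64,
}

impl BandwidthState {
    pub fn new(budget: u64) -> Self {
        BandwidthState {
            total_bytes: AtomicU64::new(0),
            budget: AtomicU64::new(budget),
        }
    }

    /// Lock-free byte counter: one atomic read-modify-write.
    /// Returns the running total after adding `n`.
    pub fn add_bytes(&self, n: u64) -> u64 {
        self.total_bytes.fetch_add(n, Ordering::Relaxed) + n
    }

    /// Try to take `n` bytes from the budget.
    /// Returns false (and leaves the budget untouched) if too little remains.
    pub fn spend_budget(&self, n: u64) -> bool {
        self.budget
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |cur| cur.checked_sub(n))
            .is_ok()
    }
}

/// Interval between packets of `packet_bytes` needed to hit `bits_per_sec`.
/// Returns None for a zero rate, which this sketch treats as "unlimited".
pub fn calc_send_interval(packet_bytes: u64, bits_per_sec: u64) -> Option<Duration> {
    if bits_per_sec == 0 {
        return None;
    }
    // Widen to u128 so the multiplication cannot overflow.
    let nanos = (packet_bytes as u128 * 8 * 1_000_000_000) / bits_per_sec as u128;
    Some(Duration::from_nanos(nanos as u64))
}
```

Under these assumptions, a 1500-byte packet at 12 Mbit/s is 12,000 bits, so the interval works out to 1 ms. Uncontended atomic operations like `fetch_add` typically complete in about a nanosecond on current hardware, which is consistent with the figure reported in the results table.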

`tcp_rx_scan` — TCP RX Status Message Scan

Compares the optimized memchr-based scan against the previous naive byte-by-byte loop on 256 KB buffers. Both scans are O(n); the memchr version is faster because it searches with SIMD instructions. Key scenarios:

- All zeros (common case: data packets contain no status)
- Status at start
- Status at end (worst case for naive scan)
- Split messages (status spans two TCP reads)
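The difference between the two strategies can be sketched as follows. The marker byte and message framing are assumptions invented for this example (the real status format is not described here), and `find_byte` is a standard-library stand-in for `memchr::memchr(needle, haystack)` so that the sketch has no external dependencies.

```rust
/// Assumed for illustration only: a status message is 12 bytes long and
/// begins with this marker byte. The real wire format may differ.
const STATUS_MARKER: u8 = 0x07;
const STATUS_LEN: usize = 12;

/// Naive scan: inspect every byte offset in turn.
pub fn scan_naive(buf: &[u8]) -> Option<usize> {
    let mut i = 0;
    while i + STATUS_LEN <= buf.len() {
        if buf[i] == STATUS_MARKER {
            return Some(i);
        }
        i += 1;
    }
    None
}

/// Skip-ahead scan: hand the search for the marker byte to a byte-search
/// primitive, then check that a whole message fits after the candidate.
pub fn scan_fast(buf: &[u8]) -> Option<usize> {
    let pos = find_byte(STATUS_MARKER, buf)?;
    if pos + STATUS_LEN <= buf.len() {
        Some(pos)
    } else {
        // Marker found, but the message is cut off by the end of the buffer.
        None
    }
}

/// Stand-in for `memchr::memchr`. The memchr crate exposes the same
/// (needle, haystack) -> Option<usize> shape but uses SIMD internally.
fn find_byte(needle: u8, haystack: &[u8]) -> Option<usize> {
    haystack.iter().position(|&b| b == needle)
}
```

Both functions return `None` when a marker sits too close to the end of the buffer. In the split-message scenario a caller would presumably keep those trailing bytes and rescan once the next TCP read arrives; that bookkeeping is left out of this sketch.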

`ecsrp5` — EC-SRP5 Curve Construction

Compares WCurve::new() (heavy BigUint modular arithmetic) against the cached &*WCURVE access to demonstrate the Sprint 1 optimization.
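The caching pattern being measured can be sketched with `std::sync::LazyLock` (stable since Rust 1.80). `Curve` below is a hypothetical stand-in for WCurve, and its constructor does some arbitrary modular arithmetic purely to have something expensive to cache; it is not the EC-SRP5 curve math.

```rust
use std::sync::LazyLock;

/// Hypothetical stand-in for WCurve. The table contents are arbitrary.
pub struct Curve {
    pub table: Vec<u64>,
}

impl Curve {
    /// Deliberately non-trivial constructor: fills a table with powers of 5
    /// modulo a prime. This only simulates "heavy math".
    pub fn new() -> Self {
        const P: u64 = 1_000_000_007;
        let mut table = Vec::with_capacity(4096);
        let mut x: u64 = 1;
        for _ in 0..4096 {
            x = x * 5 % P;
            table.push(x);
        }
        Curve { table }
    }
}

/// Built once, on first access; later accesses only check an
/// "already initialized" flag and return the stored value.
pub static CURVE: LazyLock<Curve> = LazyLock::new(|| Curve::new());

/// Cached access: every call returns a reference to the same instance.
pub fn cached() -> &'static Curve {
    &*CURVE
}
```

Calling `cached()` repeatedly returns the same reference, so the constructor cost is paid only once per process. That is why a benchmark comparing `Curve::new()` against `cached()` would show a gap of several orders of magnitude.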

Interpreting Results

Criterion generates HTML reports in target/criterion/. Open target/criterion/report/index.html after running benchmarks to view interactive charts.

Example results (Apple M3 Pro, release profile):

| Benchmark | Naive/Uncached | Optimized/Cached | Speedup |
|---|---|---|---|
| TCP RX scan 256KB (status at end) | 251 µs | 4.5 µs | ~55× |
| WCurve construction | 126 µs | 1.0 ns | ~123,000× |
| Command serialize | — | 7.7 ns | — |
| Bandwidth `fetch_add` | — | ~1 ns | — |
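The Speedup column is the ratio of the two timings once both are expressed in the same unit. The helper below is a hypothetical convenience for checking such ratios by hand and is not part of the benchmark suites.

```rust
/// Speedup = baseline time / optimized time, both in nanoseconds.
pub fn speedup(baseline_ns: f64, optimized_ns: f64) -> f64 {
    baseline_ns / optimized_ns
}
```

With the rounded values from the table, `speedup(251_000.0, 4_500.0)` is about 55.8 and `speedup(126_000.0, 1.0)` is 126,000. The table's ~123,000× figure presumably comes from the unrounded measurements.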
