btest-rs/PERFORMANCE_PRDS.md
Siavash Sameni bba9b0512c
perf: replace O(n) TCP RX buffer scan with SIMD memchr + carry buffer (Sprint 3)
This commit fixes the most significant hot-path bottleneck in the
client: the tcp_client_rx_loop was scanning up to 256KB byte-by-byte
on every read() call looking for interleaved 12-byte status messages.

Changes:
- client.rs (tcp_client_rx_loop): Replace the O(n) for-loop scan
  with a three-stage approach:

  1. Split-message check: An 11-byte carry buffer stores trailing
     bytes from the previous read. We check every possible alignment
     where a status message (0x07 + cpu_byte) could span the carry
     and the start of the current buffer. This fixes a latent bug
     where the old code would miss status messages split across TCP
     read boundaries.

  2. Fast scan: memchr::memchr (AVX2/NEON SIMD) finds 0x07 bytes
     in the 256KB buffer. On all-zero data packets this exits in
     ~4096 SIMD-width operations instead of 262,144 byte compares.
     ~64x faster scan path.

  3. Carry save: Save up to 11 trailing bytes for the next read.

- client.rs (unit tests): Add scan_status_message() helper and
  five unit tests covering:
  - Status message fully within buffer
  - Status message split across reads (5+7 bytes)
  - Status message split at boundary (1+11 bytes)
  - All-zero buffer (no false positive)
  - Short buffer (no panic)

- Cargo.toml / Cargo.lock: Add memchr as an explicit dependency.

Verified against live MikroTik RouterOS (TCP both + receive modes
with EC-SRP5 auth). Status messages detected correctly. No wire
protocol changes — 100% MikroTik compatible.
2026-04-30 20:46:34 +04:00

# Performance Improvement PRDs
**Project:** btest-rs
**Constraint:** 100% MikroTik BTest protocol compatibility — no wire-format or behavioral changes visible to MikroTik devices
**Date:** 2026-04-30
---
## How to Read This Document
Each PRD is sorted by **recommended execution order**, which balances:
- **Effort** (development + review + test time)
- **Risk** (probability of regression or compatibility break)
- **Performance Effect** (measured or estimated throughput/latency improvement)
- **MikroTik Compatibility Risk** (whether the change could affect interoperability)
**Sorting rationale:** Execute *quick wins* first to build velocity and reduce risk surface, then tackle *high-impact* items with full attention.
---
## Summary Matrix
| # | PRD | Effort | Risk | Perf Impact | MikroTik Risk | Tier |
|---|-----|--------|------|-------------|---------------|------|
| 1 | WCurve Global Cache | 30 min | None | Medium | None | Quick Win |
| 2 | Redundant `Instant::now()` | 15 min | None | Low | None | Quick Win |
| 3 | `hash_password` Hex Fix | 30 min | None | Low | None | Quick Win |
| 4 | CSV File Handle Cache | 30 min | None | Low | None | Quick Win |
| 5 | Error String Matching | 30 min | None | Low | None | Quick Win |
| 6 | `chrono_date_today` Replace | 1 hr | Low | Low | None | Quick Win |
| 7 | Syslog Mutex + Timestamp | 1 hr | Low | Low | None | Quick Win |
| 8 | `ip.to_string()` Cache | 1 hr | Low | Low | None | Quick Win |
| 9 | FreeBSD CPU FFI | 3 hrs | Medium | Medium | None | Platform Fix |
| 10 | Multi-Conn Notify Wake | 2 hrs | Medium | Medium | None | Latency Fix |
| 11 | UDP Timer Reuse | 2 hrs | Medium | Medium | None | Throughput Fix |
| 12 | TCP RX Scan Optimization | 4 hrs | Medium | **High** | Low | Hot Path Fix |
| 13 | SQLite Connection Pool | 1-2 days | High | **High** | None | Scalability Fix |
---
## Tier 1: Quick Wins (Do These First)
---
### PRD-001: Cache `WCurve` in Global `LazyLock`
**Background:**
`WCurve::new()` is called on every EC-SRP5 authentication (client and server). It recomputes the Weierstrass curve generator point via `lift_x(9)` → `prime_mod_sqrt()`, which performs heavy `BigUint` modular arithmetic. The result is deterministic and immutable.
**MikroTik Compatibility:**
- **100% safe.** This is pure internal mathematics. The wire bytes, auth handshake order, and hash outputs are identical. No protocol-visible change.
**Objective:**
Eliminate redundant `BigUint` modular square root computation per authentication.
**Design:**
```rust
// src/ecsrp5.rs
static WCURVE: std::sync::LazyLock<WCurve> = std::sync::LazyLock::new(WCurve::new);
```
Replace all call sites:
- `src/ecsrp5.rs:363` (`client_authenticate`)
- `src/ecsrp5.rs:499` (`server_authenticate`)
Change `let w = WCurve::new();` to `let w = &*WCURVE;`. Update any `WCurve` methods that take `self` to take `&self` if they don't already.
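A minimal sketch of the caching pattern, assuming Rust 1.80+ for `std::sync::LazyLock`. `Curve`, `authenticate`, and the call counter are hypothetical stand-ins for illustration; the real `WCurve` holds `BigUint` curve parameters.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::LazyLock;

/// Counts constructor calls so the once-only property can be checked.
static INIT_CALLS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for `WCurve`.
pub struct Curve {
    pub generator_x: u64,
}

impl Curve {
    pub fn new() -> Self {
        INIT_CALLS.fetch_add(1, Ordering::SeqCst);
        Curve { generator_x: 9 }
    }
}

/// Built on first access, then shared read-only by every caller.
static CURVE: LazyLock<Curve> = LazyLock::new(Curve::new);

/// Call sites borrow the cached value instead of constructing a new one.
pub fn authenticate() -> u64 {
    let w: &Curve = &CURVE;
    w.generator_x
}

/// Number of times `Curve::new()` has run in this process.
pub fn init_calls() -> usize {
    INIT_CALLS.load(Ordering::SeqCst)
}
```

Because the value is only ever borrowed, methods called through the cache need to take `&self`, which matches the note above about updating any `self`-taking methods.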
**Acceptance Criteria:**
- [ ] `ecsrp5_test.rs` passes unchanged.
- [ ] `full_integration_test.rs` EC-SRP5 tests pass unchanged.
- [ ] `WCurve::new()` is called exactly once per process lifetime.
- [ ] No change to serialized auth bytes on the wire.
**Effort:** 30 min
**Risk:** None — stateless deterministic cache
**Performance Impact:** Medium — reduces per-auth CPU time by ~30-50% (estimated), especially noticeable under concurrent logins.
---
### PRD-002: Deduplicate `Instant::now()` in `tcp_tx_loop_inner`
**Background:**
The TCP TX loop calls `Instant::now()` twice per iteration (status check and interval scheduling). Monotonic clock reads are cheap but not free, and occur in the hottest loop in the system.
**MikroTik Compatibility:**
- **100% safe.** Timing granularity remains identical.
**Objective:**
Remove the redundant clock read from the per-packet hot path.
**Design:**
```rust
let now = Instant::now();
if send_status && now >= next_status { ... next_status = now + Duration::from_secs(1); }
// ... reuse `now` for interval math
```
**Acceptance Criteria:**
- [ ] TCP send/receive/both integration tests pass.
- [ ] No behavioral change in status injection timing.
**Effort:** 15 min
**Risk:** None
**Performance Impact:** Low — micro-optimization, but trivial.
---
### PRD-003: Fix `hash_password()` Hex Encoding Allocations
**Background:**
`user_db.rs:614` allocates one `String` per byte when hex-encoding a 32-byte SHA256 hash:
```rust
result.iter().map(|b| format!("{:02x}", b)).collect()
```
**MikroTik Compatibility:**
- **100% safe.** Output string is identical.
**Objective:**
Replace N-allocation hex encoding with a single-allocation approach.
**Design:**
Use the `hex` crate (already in the dependency tree via `ecsrp5.rs` debug logging), or write each byte with `write!` into a `String::with_capacity(64)` so the whole digest costs a single allocation.
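A minimal sketch of the `write!` variant, using only the standard library. The function name `hex_encode` is an assumption for illustration, not the name used in `user_db.rs`.

```rust
use std::fmt::Write;

/// Hex-encodes `bytes` into one pre-sized `String` (a single heap allocation).
pub fn hex_encode(bytes: &[u8]) -> String {
    let mut out = String::with_capacity(bytes.len() * 2);
    for b in bytes {
        // Formatting into a `String` does not fail, so `expect` never fires here.
        write!(out, "{:02x}", b).expect("writing to a String is infallible");
    }
    out
}
```

For a 32-byte SHA256 digest this reserves exactly 64 bytes up front and produces the same lowercase hex string as the `format!`-per-byte version.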
**Acceptance Criteria:**
- [ ] Same hex string output for all inputs.
- [ ] `pro` feature tests pass.
**Effort:** 30 min
**Risk:** None
**Performance Impact:** Low — removes 32 allocations per password hash.
---
### PRD-004: Cache CSV File Handle
**Background:**
`csv_output::write_result()` re-opens the file via `OpenOptions::new().append(true).open(path)` on every call (once per test). Safe but wasteful.
**MikroTik Compatibility:**
- **100% safe.** No protocol involvement.
**Objective:**
Hold the file handle open for the process lifetime.
**Design:**
Change `static CSV_FILE: Mutex<Option<String>>` to `Mutex<Option<(String, std::fs::File)>>`, or open once during `init()` and store `Mutex<Option<File>>`.
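A minimal sketch of the `Mutex<Option<File>>` variant. The header columns and the `&str` row argument are assumptions for illustration; the real `write_result()` signature may differ.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, Write};
use std::path::Path;
use std::sync::Mutex;

/// Handle opened once in `init()` and reused by every `write_result()` call.
static CSV_FILE: Mutex<Option<File>> = Mutex::new(None);

/// Opens (or creates) the CSV file and writes the header if the file is empty.
pub fn init(path: &Path) -> io::Result<()> {
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    if file.metadata()?.len() == 0 {
        // Hypothetical header; the real column set is defined in csv_output.
        file.write_all(b"timestamp,direction,mbps\n")?;
    }
    *CSV_FILE.lock().unwrap() = Some(file);
    Ok(())
}

/// Appends one row through the cached handle; a no-op if CSV output is disabled.
pub fn write_result(row: &str) -> io::Result<()> {
    let mut guard = CSV_FILE.lock().unwrap();
    match guard.as_mut() {
        Some(file) => writeln!(file, "{row}"),
        None => Ok(()),
    }
}
```

Opening in append mode keeps each row write positioned at the end of the file, so the behavior matches the current reopen-per-call version.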
**Acceptance Criteria:**
- [ ] CSV tests in `full_integration_test.rs` pass.
- [ ] File is created with headers on `init()`.
- [ ] Multiple `write_result` calls append correctly.
**Effort:** 30 min
**Risk:** None
**Performance Impact:** Low — removes one `open()` syscall per test.
---
### PRD-005: Remove Allocating Error String Matching
**Background:**
`src/server_pro/enforcer.rs:157-161` does:
```rust
match format!("{}", e).as_str() {
s if s.contains("daily") => ...
}
```
This allocates a `String` from the error just for substring matching.
**MikroTik Compatibility:**
- **100% safe.** Server-pro internal logic only.
**Objective:**
Match without allocation.
**Design:**
Use `e.to_string().contains("daily")` (still allocates but clearer) or, better, downcast the `rusqlite::Error` or match on structured error variants. If the error is `anyhow::Error`, use `.downcast_ref::<rusqlite::Error>()`.
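A minimal sketch of matching on structured variants instead of formatted text. `QuotaError`, its variants, and `quota_period` are hypothetical names; the source only shows the `"daily"` substring check, so the other periods are assumptions.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical structured error for the enforcer.
#[derive(Debug)]
pub enum QuotaError {
    DailyExceeded,
    WeeklyExceeded,
    MonthlyExceeded,
}

impl fmt::Display for QuotaError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let period = match self {
            QuotaError::DailyExceeded => "daily",
            QuotaError::WeeklyExceeded => "weekly",
            QuotaError::MonthlyExceeded => "monthly",
        };
        write!(f, "{period} quota exceeded")
    }
}

impl Error for QuotaError {}

/// Classifies an error by variant; no `String` is built on this path.
/// Returns `None` for errors that are not quota errors.
pub fn quota_period(e: &(dyn Error + 'static)) -> Option<&'static str> {
    match e.downcast_ref::<QuotaError>()? {
        QuotaError::DailyExceeded => Some("daily"),
        QuotaError::WeeklyExceeded => Some("weekly"),
        QuotaError::MonthlyExceeded => Some("monthly"),
    }
}
```

The same shape applies to `anyhow::Error`, which offers its own `downcast_ref`; the display text can then change without breaking the enforcer logic.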
**Acceptance Criteria:**
- [ ] Quota enforcement behavior unchanged.
- [ ] Enforcer tests pass.
**Effort:** 30 min
**Risk:** None
**Performance Impact:** Low — removes one allocation per enforcer tick.
---
### PRD-006: Replace `chrono_date_today()` with `chrono` Crate
**Background:**
`user_db.rs:617-638` contains a hand-rolled Gregorian calendar converter that loops from 1970 to compute today's date. Called before almost every DB write. The `chrono` crate is already pulled in transitively by `rusqlite`.
**MikroTik Compatibility:**
- **100% safe.** No protocol involvement.
**Objective:**
Replace 30 lines of error-prone manual date math with one `chrono` call.
**Design:**
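Add `chrono = { version = "0.4", optional = true }` to `Cargo.toml`, gated behind the `pro` feature. A crate that is only a transitive dependency cannot be imported from our code, so the explicit entry is needed even if the crate is already present in `Cargo.lock`. Replace `chrono_date_today()` with: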
```rust
chrono::Local::now().format("%Y-%m-%d").to_string()
```
**Acceptance Criteria:**
- [ ] `pro` feature compiles.
- [ ] Date strings match format `YYYY-MM-DD`.
- [ ] DB write tests pass.
**Effort:** 1 hr
**Risk:** Low — adds explicit dep that already exists transitively
**Performance Impact:** Low — eliminates loop overhead, but called infrequently.
---
### PRD-007: Optimize Syslog Mutex and Timestamp Formatting
**Background:**
`syslog_logger.rs` holds a global `std::sync::Mutex` while formatting a timestamp (manual calendar math) and sending UDP. `std::sync::Mutex` is relatively slow, and the timestamp logic duplicates `chrono_date_today()` issues.
**MikroTik Compatibility:**
- **100% safe.** No protocol involvement.
**Objective:**
Reduce lock contention and allocation in logging path.
**Design:**
1. Use `parking_lot::Mutex` (faster, no poisoning) OR keep `std::sync::Mutex` but hold it only long enough to clone the `SyslogSender` config, then format and send outside the lock.
2. Replace `bsd_timestamp()` with `chrono::Local::now().format("%b %e %H:%M:%S")`.
3. Pre-allocate the `String` with `with_capacity(256)`.
**Acceptance Criteria:**
- [ ] Syslog output format remains RFC 3164 compliant.
- [ ] `test_syslog_events` in `full_integration_test.rs` passes.
**Effort:** 1 hr
**Risk:** Low
**Performance Impact:** Low — logging is not a hot path, but reduces global lock hold time.
---
### PRD-008: Cache `ip.to_string()` in Quota Checks
**Background:**
`quota.rs:389` calls `ip.to_string()` and then passes `&ip_str` to multiple DB methods, allocating a new `String` on every `remaining_budget()` call.
**MikroTik Compatibility:**
- **100% safe.** Server-pro internal logic.
**Objective:**
Eliminate redundant IP stringification.
**Design:**
Change DB methods to accept `&std::net::IpAddr` directly and stringify inside only when needed for SQL parameter binding (which `rusqlite` may already handle via `ToSql`). Alternatively, pass `ip_str: &str` from a single `to_string()` call and avoid re-stringifying in sub-calls.
**Acceptance Criteria:**
- [ ] Quota checks return identical results.
- [ ] `pro` feature tests pass.
**Effort:** 1 hr
**Risk:** Low
**Performance Impact:** Low — one allocation removed per quota check.
---
## Tier 2: Moderate Fixes (Platform & Latency)
---
### PRD-009: FreeBSD CPU Sampling via `libc::sysctl` FFI
**Background:**
On FreeBSD, `cpu.rs` spawns `sysctl -n kern.cp_time` as a child process every second. `fork()` + `exec()` is orders of magnitude slower than a direct syscall.
**MikroTik Compatibility:**
- **100% safe.** No protocol involvement. Platform-specific internal code.
**Objective:**
Replace subprocess with direct `sysctl(3)` syscall.
**Design:**
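```rust
#[cfg(target_os = "freebsd")]
fn get_cpu_times() -> (u64, u64) {
    // FreeBSD registers kern.cp_time as a dynamically numbered OID, so it is
    // looked up by name with sysctlbyname(3) rather than through a fixed MIB array.
    // Layout: [user, nice, sys, intr, idle] (CPUSTATES = 5).
    let mut cp_time: [libc::c_long; 5] = [0; 5];
    let mut len = std::mem::size_of_val(&cp_time);
    // SAFETY: `cp_time` and `len` are valid for writes, and `len` holds the
    // buffer size in bytes as sysctlbyname expects.
    let rc = unsafe {
        libc::sysctlbyname(
            c"kern.cp_time".as_ptr(),
            cp_time.as_mut_ptr() as *mut libc::c_void,
            &mut len,
            std::ptr::null_mut(),
            0,
        )
    };
    if rc != 0 {
        return (0, 0);
    }
    let total: u64 = cp_time.iter().map(|&t| t as u64).sum();
    (total, cp_time[4] as u64) // (total ticks, idle ticks)
}
```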
**Acceptance Criteria:**
- [ ] Compiles on FreeBSD.
- [ ] Returns same values as previous `sysctl` command approach.
- [ ] No child process spawned (verify with `ktrace` or `ps`).
**Effort:** 3 hrs
**Risk:** Medium — requires FreeBSD test environment; FFI is unsafe
**Performance Impact:** Medium — eliminates 1 fork/exec per second on FreeBSD.
---
### PRD-010: Replace 100ms Poll with `tokio::sync::Notify`
**Background:**
In `server.rs:313-332`, the primary connection of a multi-connection TCP test busy-polls the session map every 100ms waiting for secondary connections to join.
**MikroTik Compatibility:**
- **100% safe.** This is internal server-side coordination. The wire behavior (waiting for connections, then starting the test) is unchanged. MikroTik clients will not observe a difference except potentially faster test startup.
**Objective:**
Eliminate polling latency and unnecessary mutex acquisitions.
**Design:**
1. Add a shared `tokio::sync::Notify` to `TcpSession`:
```rust
struct TcpSession {
    peer_ip: IpAddr,
    streams: Vec<OwnedTcpStream>,
    expected: u8,
    // `Arc` so the primary can clone it out and await without holding the map lock
    notify: Arc<tokio::sync::Notify>,
}
```
2. In the secondary connection handler, after pushing to `streams`, call `session.notify.notify_one()`.
3. In the primary wait loop, replace the sleep loop with:
```rust
// `notify` is the `Arc<Notify>` cloned out of the session while the lock was held.
let deadline = tokio::time::sleep(Duration::from_secs(10));
tokio::pin!(deadline);
loop {
    let count = { /* lock, get count, drop lock */ };
    if count + 1 >= conn_count { break; }
    tokio::select! {
        _ = notify.notified() => {}      // a secondary joined: re-check the count
        _ = &mut deadline => { break; }  // 10s deadline reached
    }
}
```
Re-checking the count at the top of every iteration closes the race where a secondary joins between the check and the wait: `notify_one()` stores a permit when nobody is waiting, so the next `notified()` returns immediately.
**Acceptance Criteria:**
- [ ] Multi-connection TCP tests pass.
- [ ] Test startup latency is ≤ 1ms after last connection joins (was up to 100ms).
- [ ] No deadlock under concurrent multi-connection tests.
**Effort:** 2 hrs
**Risk:** Medium — concurrency change; must carefully manage lock/notify ordering to avoid races
**Performance Impact:** Medium — improves multi-conn test startup latency by up to 100ms per test.
---
### PRD-011: Reuse UDP RX Timer Instead of Per-Call Timeout
**Background:**
Both client and server UDP RX loops create a new `tokio::time::timeout` on every `recv`/`recv_from` call:
```rust
tokio::time::timeout(Duration::from_secs(5), socket.recv(&mut buf)).await
```
At high packet rates, this registers and cancels timers on Tokio's timer wheel constantly.
**MikroTik Compatibility:**
- **100% safe.** Internal async timing only. UDP packet processing is unchanged.
**Objective:**
Reduce timer wheel churn in high-rate UDP RX loops.
**Design:**
Option A — `tokio::select!` with a pinned sleep future:
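```rust
use tokio::time::{sleep, Duration, Instant}; // `reset` takes tokio's `Instant`

const RX_TIMEOUT: Duration = Duration::from_secs(5);
let timeout = sleep(RX_TIMEOUT);
tokio::pin!(timeout);
loop {
    tokio::select! {
        biased; // prioritize recv
        res = socket.recv(&mut buf) => {
            /* handle */
            timeout.as_mut().reset(Instant::now() + RX_TIMEOUT);
        }
        _ = &mut timeout => {
            tracing::debug!("UDP RX timeout");
            // Re-arm (or break): an elapsed `Sleep` stays ready and would otherwise spin.
            timeout.as_mut().reset(Instant::now() + RX_TIMEOUT);
        }
    }
}
```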
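Option B: Use `socket2` to set `SO_RCVTIMEO` on the underlying socket and drop the Tokio timeout, which moves timeout handling into the kernel. Note that `SO_RCVTIMEO` only affects blocking receives, and Tokio sockets are non-blocking, so this option implies a blocking `recv` on a dedicated thread.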
**Recommendation:** Start with Option A (pure Tokio, no platform risk). Option B can be a follow-up.
**Acceptance Criteria:**
- [ ] UDP send/receive/both tests pass.
- [ ] UDP RX still times out correctly when no packets arrive.
- [ ] No change to packet parsing or sequence tracking.
**Effort:** 2 hrs
**Risk:** Medium — changes timeout behavior; must ensure test abortion still works correctly
**Performance Impact:** Medium — reduces timer wheel registration overhead, noticeable at >50K pps.
---
## Tier 3: High Impact (Do These With Full Focus)
---
### PRD-012: Optimize TCP Client RX Status Message Scan
**Background:**
`tcp_client_rx_loop` (`client.rs:210-216`) scans up to 256KB byte-by-byte on every `read()` call looking for a 12-byte status marker (`0x07` + `0x80|cpu`). Since data is all zeros, this is almost always a full scan.
**MikroTik Compatibility Consideration:**
- **High confidence of safety.** The protocol is: MikroTik injects 12-byte status messages into the TCP stream. Our client must detect them. Changing *how* we detect them (faster scan) does not change:
- What bytes are sent on the wire
- What bytes we expect
- How we respond to status messages
- **One edge case to handle:** TCP is a stream. A status message may be split across two `read()` calls. The current code does **not** handle this correctly (it scans each buffer independently). The optimized version *should* handle split messages to be strictly more correct than the current implementation.
**Objective:**
Replace the scalar byte-by-byte scan with SIMD-accelerated or state-machine-based detection (still O(n), but with a far smaller constant), while correctly handling split messages.
**Design — Recommended: Ring Buffer Approach**
Since status messages are 12 bytes and all other bytes are zeros, carry the last `STATUS_MSG_SIZE - 1` (11) bytes of each read into the next one, so that a message split across two reads can be reassembled:
```rust
use std::sync::{atomic::Ordering, Arc};
use tokio::{io::AsyncReadExt, net::tcp::OwnedReadHalf};

const STATUS_MSG_SIZE: usize = 12;
const CARRY_MAX: usize = STATUS_MSG_SIZE - 1; // longest message prefix a read can end with

// `BandwidthState` and `STATUS_MSG_TYPE` (0x07) are defined elsewhere in the crate.
async fn tcp_client_rx_loop(mut reader: OwnedReadHalf, state: Arc<BandwidthState>) {
    let mut buf = vec![0u8; 256 * 1024];
    let mut carry = [0u8; CARRY_MAX]; // trailing bytes of the previous read
    let mut carry_len = 0usize;
    while state.running.load(Ordering::Relaxed) {
        match reader.read(&mut buf).await {
            Ok(0) | Err(_) => break,
            Ok(n) => {
                state.rx_bytes.fetch_add(n as u64, Ordering::Relaxed);

                // 1. Split-message check: try every offset in the carry at which a
                //    message could start and still finish inside the new buffer.
                let head = n.min(CARRY_MAX);
                let mut window = [0u8; 2 * CARRY_MAX];
                window[..carry_len].copy_from_slice(&carry[..carry_len]);
                window[carry_len..carry_len + head].copy_from_slice(&buf[..head]);
                for start in 0..carry_len {
                    if start + STATUS_MSG_SIZE > carry_len + head {
                        break; // not enough bytes to complete a message
                    }
                    if window[start] == STATUS_MSG_TYPE && window[start + 1] >= 0x80 {
                        state.remote_cpu.store(window[start + 1] & 0x7F, Ordering::Relaxed);
                        break;
                    }
                }

                // 2. Fast scan: payload is zeros, so memchr jumps straight to 0x07 candidates.
                if n >= STATUS_MSG_SIZE {
                    let search_end = n - STATUS_MSG_SIZE + 1;
                    let mut offset = 0;
                    while let Some(pos) = memchr::memchr(STATUS_MSG_TYPE, &buf[offset..search_end]) {
                        let i = offset + pos;
                        if buf[i + 1] >= 0x80 {
                            state.remote_cpu.store(buf[i + 1] & 0x7F, Ordering::Relaxed);
                            break;
                        }
                        offset = i + 1;
                    }
                }

                // 3. Carry save: keep up to 11 trailing bytes for the next read.
                //    (A read shorter than 11 bytes discards the older carry.)
                carry_len = n.min(CARRY_MAX);
                carry[..carry_len].copy_from_slice(&buf[n - carry_len..n]);
            }
        }
    }
}
```
**Alternative: `memchr` crate only**
If we determine split messages are extremely rare and the current behavior is "good enough," simply replace the `for` loop with:
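```rust
// Guard against short reads: `n - STATUS_MSG_SIZE` would underflow for n < 12.
if n >= STATUS_MSG_SIZE {
    if let Some(pos) = memchr::memchr(STATUS_MSG_TYPE, &buf[..n - STATUS_MSG_SIZE + 1]) {
        if buf[pos + 1] >= 0x80 { /* ... */ }
    }
}
```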
This is a 5-line change with massive speedup (SIMD scan). However, the ring buffer approach is strictly more correct and not much more complex.
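The detection logic can be pulled out of the async loop into a pure helper so that split messages are unit-testable without sockets. The sketch below is one possible shape, not the project's implementation: `StatusScanner`, `feed`, `parse`, and `find_byte` are hypothetical names, and `find_byte` is a standard-library stand-in for `memchr::memchr` so the example has no external dependencies. Unlike the loop above, it also preserves the carry across reads shorter than 11 bytes.

```rust
const STATUS_MSG_SIZE: usize = 12;
const STATUS_MSG_TYPE: u8 = 0x07;
const CARRY_MAX: usize = STATUS_MSG_SIZE - 1;

/// Stand-in for `memchr::memchr` (scalar here, SIMD in the real crate).
fn find_byte(needle: u8, haystack: &[u8]) -> Option<usize> {
    haystack.iter().position(|&b| b == needle)
}

/// Returns the CPU value if `msg` starts with a status header (0x07, 0x80|cpu).
fn parse(msg: &[u8]) -> Option<u8> {
    (msg.len() >= 2 && msg[0] == STATUS_MSG_TYPE && msg[1] >= 0x80).then(|| msg[1] & 0x7F)
}

/// Scanner state kept between reads: the last (up to) 11 bytes of the stream.
#[derive(Default)]
pub struct StatusScanner {
    carry: [u8; CARRY_MAX],
    carry_len: usize,
}

impl StatusScanner {
    /// Feeds the bytes of one read; returns the most recent CPU value found, if any.
    pub fn feed(&mut self, buf: &[u8]) -> Option<u8> {
        // Stage 1: messages that start in the carry and finish in `buf`.
        let head = buf.len().min(CARRY_MAX);
        let total = self.carry_len + head;
        let mut window = [0u8; 2 * CARRY_MAX];
        window[..self.carry_len].copy_from_slice(&self.carry[..self.carry_len]);
        window[self.carry_len..total].copy_from_slice(&buf[..head]);
        let mut split = None;
        for start in 0..self.carry_len {
            if start + STATUS_MSG_SIZE > total {
                break; // message cannot be complete yet
            }
            if let Some(cpu) = parse(&window[start..total]) {
                split = Some(cpu);
                break;
            }
        }

        // Stage 2: messages that lie fully inside `buf`.
        let mut inner = None;
        if buf.len() >= STATUS_MSG_SIZE {
            let search_end = buf.len() - STATUS_MSG_SIZE + 1;
            let mut offset = 0;
            while let Some(pos) = find_byte(STATUS_MSG_TYPE, &buf[offset..search_end]) {
                let i = offset + pos;
                if let Some(cpu) = parse(&buf[i..]) {
                    inner = Some(cpu);
                    break;
                }
                offset = i + 1;
            }
        }

        // Stage 3: keep the last 11 bytes of (carry ++ buf) for the next call.
        if buf.len() >= CARRY_MAX {
            self.carry.copy_from_slice(&buf[buf.len() - CARRY_MAX..]);
            self.carry_len = CARRY_MAX;
        } else {
            let keep = (self.carry_len + buf.len()).min(CARRY_MAX) - buf.len();
            let dropped = self.carry_len - keep;
            self.carry.copy_within(dropped..self.carry_len, 0);
            self.carry[keep..keep + buf.len()].copy_from_slice(buf);
            self.carry_len = keep + buf.len();
        }

        // A message inside `buf` is newer than one that began in the carry.
        inner.or(split)
    }
}
```

A caller would keep one `StatusScanner` per connection and store the returned value into `remote_cpu` after each `read()`.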
**Acceptance Criteria:**
- [ ] TCP bidirectional tests pass.
- [ ] Remote CPU reporting still works.
- [ ] Status messages split across reads are correctly detected (unit test for this).
- [ ] `memchr` crate added to deps (very lightweight).
- [ ] No change to wire bytes or server behavior.
**Effort:** 4 hrs
**Risk:** Medium — hot path change; must be carefully reviewed and tested
**Performance Impact:** **High.** Replaces a byte-by-byte pass over up to 256KB per read with a SIMD scan. At 10K full-buffer reads/sec, the old loop walked ~2.5GB of memory per second one byte at a time.
---
### PRD-013: SQLite Connection Pool / Channel-Based Writer
**Background:**
`server_pro` uses a single `Arc<Mutex<Connection>>`. All quota checks, usage recordings, and auth lookups serialize through one lock. `remaining_budget()` issues 15 queries, locking 15+ times. This is the primary scalability bottleneck for the pro server.
**MikroTik Compatibility:**
- **100% safe.** Server-side infrastructure only. No protocol change.
**Objective:**
Enable concurrent quota checks and usage recording without mutex contention.
**Design — Option A: Connection Pool (Recommended for reads)**
Use `r2d2_sqlite` or `deadpool-sqlite`:
1. Open a pool of ~4-8 connections to the same SQLite file (WAL mode supports this).
2. Read-only operations (`remaining_budget`, `get_user`, `check_user`) borrow a connection from the pool.
3. Write operations (`record_usage`, `record_session`) also borrow from the pool (WAL allows concurrent readers + one writer).
**Design — Option B: Channel-Based Writer (Recommended for writes)**
1. Keep one dedicated `Connection` owned by a single Tokio task.
2. Expose an `mpsc::channel` where other tasks send write requests (`RecordUsage { user, tx, rx }`).
3. The writer task batches or sequentially executes writes without any mutex.
4. Reads use a separate read-only connection or pool.
**Hybrid Recommendation:**
- **Reads:** Small connection pool (4 connections) for quota checks and auth lookups.
- **Writes:** Single dedicated async task with an `mpsc::unbounded_channel` for usage recording.
- **Cache:** Add a 5-second TTL cache for `remaining_budget()` results per user+IP to avoid redundant DB hits during test setup.
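The single-writer half of the hybrid design can be sketched with the standard library alone. This is an illustration under stated assumptions, not the proposed implementation: it uses `std::sync::mpsc` and an OS thread in place of a Tokio task and `mpsc::unbounded_channel`, a `HashMap` in place of the SQLite `Connection`, and the names `WriteReq` and `spawn_writer` are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// Write request sent to the writer; fields follow `RecordUsage { user, tx, rx }`.
pub enum WriteReq {
    RecordUsage { user: String, tx: u64, rx: u64 },
}

/// Per-user (tx, rx) totals; stands in for the usage table.
pub type Ledger = HashMap<String, (u64, u64)>;

/// Spawns the single writer. It exclusively owns the store, so writes need
/// no mutex; producers only ever touch the channel.
pub fn spawn_writer() -> (Sender<WriteReq>, JoinHandle<Ledger>) {
    let (sender, receiver) = mpsc::channel();
    let handle = thread::spawn(move || {
        let mut ledger = Ledger::new();
        // The loop ends once every `Sender` clone has been dropped.
        for req in receiver {
            match req {
                WriteReq::RecordUsage { user, tx, rx } => {
                    let entry = ledger.entry(user).or_insert((0, 0));
                    entry.0 += tx;
                    entry.1 += rx;
                }
            }
        }
        ledger
    });
    (sender, handle)
}
```

Each session clones the `Sender` and calls `send(WriteReq::RecordUsage { .. })`; dropping the last sender shuts the writer down cleanly. In the real server the loop body would execute the SQL statements, and could batch several requests into one transaction.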
**Acceptance Criteria:**
- [ ] `pro` feature compiles and all tests pass.
- [ ] Concurrent test launches scale linearly up to at least 50 concurrent sessions.
- [ ] Quota enforcement remains correct (no over-quota usage).
- [ ] Session logging and interval recording remain accurate.
- [ ] No SQLite "database is locked" errors under load.
**Effort:** 1-2 days
**Risk:** High — touches every DB interaction in `server_pro`; potential for data races, quota leaks, or connection exhaustion
**Performance Impact:** **High** — enables horizontal scaling of concurrent tests; removes the primary pro server bottleneck.
---
## Execution Roadmap
### Sprint 1: Quick Wins + Foundation (1 day)
- [ ] PRD-001: WCurve cache
- [ ] PRD-002: `Instant::now()` dedup
- [ ] PRD-003: `hash_password` hex fix
- [ ] PRD-004: CSV file handle cache
- [ ] PRD-005: Error string matching
- [ ] PRD-006: `chrono` date replacement
- [ ] PRD-007: Syslog optimization
- [ ] PRD-008: `ip.to_string()` cache
**Deliverable:** Low-risk PR with 8 clean commits. Run full integration tests.
### Sprint 2: Platform & Async Fixes (1 day)
- [ ] PRD-009: FreeBSD CPU FFI
- [ ] PRD-010: Multi-conn Notify wake
- [ ] PRD-011: UDP timer reuse
**Deliverable:** PR with platform + latency improvements.
### Sprint 3: Hot Path Optimization (1-2 days)
- [ ] PRD-012: TCP RX scan optimization
- [ ] Add unit test for split status messages
- [ ] Benchmark before/after with `criterion` (or manual throughput test)
**Deliverable:** PR with benchmark numbers proving improvement.
### Sprint 4: Scalability (2-3 days)
- [ ] PRD-013: SQLite connection pool / channel writer
- [ ] Load test: 50 concurrent tests, verify no DB lock contention
- [ ] Add `remaining_budget` cache
**Deliverable:** PR with load test results.
---
## Testing Requirements for All PRDs
Since **no wire protocol changes** are made, the existing integration test suite is the primary validation tool. However, for PRD-012 and PRD-013, additional tests are required:
### New Tests to Add
1. **Split Status Message Unit Test (for PRD-012)**
```rust
#[test]
fn test_status_message_split_across_reads() {
// Feed first 5 bytes, then remaining 7 bytes
// Assert CPU value is extracted correctly
}
```
2. **Concurrent Quota Load Test (for PRD-013)**
```rust
#[tokio::test]
async fn test_concurrent_quota_checks() {
// Spawn 50 tasks doing remaining_budget() + record_usage()
// Assert no panics, no SQLite locked errors
}
```
3. **FreeBSD CPU Parity Test (for PRD-009)**
Manual verification on FreeBSD that FFI `sysctl` returns same values as command.
---
## Appendix: MikroTik Compatibility Checklist
For every PRD, verify:
- [ ] No change to `Command` or `StatusMessage` struct layouts or serialization
- [ ] No change to MD5 challenge-response handshake order
- [ ] No change to EC-SRP5 handshake order or byte values
- [ ] No change to TCP packet sizes or UDP payload format
- [ ] No change to status injection timing (1-second interval)
- [ ] No change to NAT probe behavior
- [ ] Client can still authenticate against stock RouterOS `btest` server
- [ ] Server can still accept connections from stock RouterOS `btest` client
All PRDs in this document satisfy the above checklist by construction.