This commit fixes the most significant hot-path bottleneck in the
client: the tcp_client_rx_loop was scanning up to 256KB byte-by-byte
on every read() call looking for interleaved 12-byte status messages.
Changes:
- client.rs (tcp_client_rx_loop): Replace the byte-by-byte for-loop scan
with a three-stage approach:
1. Split-message check: An 11-byte carry buffer stores trailing
bytes from the previous read. We check every possible alignment
where a status message (0x07 + cpu_byte) could span the carry
and the start of the current buffer. This fixes a latent bug
where the old code would miss status messages split across TCP
read boundaries.
2. Fast scan: memchr::memchr (AVX2/NEON SIMD) finds 0x07 bytes
in the 256KB buffer. On all-zero data packets there are no
candidates, so the scan costs ~8,192 32-byte AVX2 compares
(~16,384 16-byte NEON compares) instead of 262,144 byte compares,
i.e. 16-32x fewer compare operations on the scan path.
3. Carry save: Save up to 11 trailing bytes for the next read.
- client.rs (unit tests): Add scan_status_message() helper and
five unit tests covering:
- Status message fully within buffer
- Status message split across reads (5+7 bytes)
- Status message split at boundary (1+11 bytes)
- All-zero buffer (no false positive)
- Short buffer (no panic)
- Cargo.toml / Cargo.lock: Add memchr as an explicit dependency.
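A minimal Rust sketch of the three stages, for reviewers. It is not the
code in client.rs: StatusScanner, is_status and find_byte are hypothetical
names. find_byte is a scalar stand-in with the same signature as
memchr::memchr(needle, haystack) so the example builds without the crate.
is_status only checks the 0x07 tag, since the cpu_byte validity rule isn't
spelled out here. Keeping older carry bytes on reads shorter than 11 bytes
is an assumption beyond "save up to 11 trailing bytes".

```rust
// Hypothetical sketch of the three-stage scan; names are illustrative, not the commit's.
// `find_byte` is a scalar stand-in for `memchr::memchr(needle, haystack)` (same signature).
const STATUS_LEN: usize = 12; // status message size
const CARRY_LEN: usize = STATUS_LEN - 1; // most bytes a split message can leave in a read
const STATUS_TAG: u8 = 0x07; // first byte of a status message

fn find_byte(needle: u8, haystack: &[u8]) -> Option<usize> {
    haystack.iter().position(|&b| b == needle)
}

// Assumption: only the 0x07 tag is checked; the real code also inspects cpu_byte.
fn is_status(msg: &[u8]) -> bool {
    msg.len() == STATUS_LEN && msg[0] == STATUS_TAG
}

struct StatusScanner {
    carry: [u8; CARRY_LEN],
    carry_len: usize,
}

impl StatusScanner {
    fn new() -> Self {
        StatusScanner { carry: [0; CARRY_LEN], carry_len: 0 }
    }

    /// Scans one read() result; returns the first status message that ends in it.
    fn scan(&mut self, buf: &[u8]) -> Option<[u8; STATUS_LEN]> {
        let found = self.scan_split(buf).or_else(|| scan_full(buf));
        self.save_carry(buf);
        found
    }

    /// Stage 1: every alignment where a message starts in the carry and ends in `buf`.
    fn scan_split(&self, buf: &[u8]) -> Option<[u8; STATUS_LEN]> {
        let c = self.carry_len;
        for start in 0..c {
            let from_carry = c - start; // 1..=11 bytes come from the carry
            let from_buf = STATUS_LEN - from_carry; // the rest come from `buf`
            if self.carry[start] != STATUS_TAG || buf.len() < from_buf {
                continue;
            }
            let mut msg = [0u8; STATUS_LEN];
            msg[..from_carry].copy_from_slice(&self.carry[start..c]);
            msg[from_carry..].copy_from_slice(&buf[..from_buf]);
            if is_status(&msg) {
                return Some(msg);
            }
        }
        None
    }

    /// Stage 3: keep the last CARRY_LEN bytes of the stream (old carry, then `buf`).
    fn save_carry(&mut self, buf: &[u8]) {
        if buf.len() >= CARRY_LEN {
            self.carry.copy_from_slice(&buf[buf.len() - CARRY_LEN..]);
            self.carry_len = CARRY_LEN;
        } else {
            // Short read (an assumption beyond the commit text): keep older carry bytes too.
            let keep = (CARRY_LEN - buf.len()).min(self.carry_len);
            let skip = self.carry_len - keep;
            self.carry.copy_within(skip..self.carry_len, 0);
            self.carry[keep..keep + buf.len()].copy_from_slice(buf);
            self.carry_len = keep + buf.len();
        }
    }
}

/// Stage 2: jump between 0x07 candidates instead of testing every byte.
fn scan_full(buf: &[u8]) -> Option<[u8; STATUS_LEN]> {
    let mut pos = 0;
    while let Some(off) = find_byte(STATUS_TAG, &buf[pos..]) {
        let start = pos + off;
        if start + STATUS_LEN > buf.len() {
            break; // runs past this read; stage 1 sees it on the next read
        }
        let cand = &buf[start..start + STATUS_LEN];
        if is_status(cand) {
            let mut msg = [0u8; STATUS_LEN];
            msg.copy_from_slice(cand);
            return Some(msg);
        }
        pos = start + 1;
    }
    None
}
```

In the rx loop this would be called once per read(), e.g.
scanner.scan(&buf[..n]) with n the byte count read() returned. The split
check runs first because a message starting in the carry precedes anything
found in the current buffer.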
Verified against live MikroTik RouterOS (TCP "both" and "receive" modes
with EC-SRP5 auth): status messages are detected correctly. No wire
protocol changes; remains 100% MikroTik compatible.