Complete spec for fixing the playout ring buffer cursor race that causes 12-16s bidirectional silence mid-call. Includes exact code, memory ordering rationale, unit tests, and verification steps. Any agent can implement from this document alone. See also: debug/INCIDENT-2026-04-06-playout-ring-desync.md Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
12 KiB
Fix: AudioRing SPSC Buffer Cursor Desync
Problem
A critical bug causes 10-16 seconds of bidirectional audio silence mid-call (~25-30s in). Both participants go silent at the exact same moment. The QUIC transport, relay, Opus codec, and FEC are all healthy — the bug is in the lock-free ring buffer that transfers decoded PCM from the Rust recv task to the Kotlin AudioTrack playout thread.
Root cause: AudioRing::write() modifies read_pos from the producer thread during overflow handling (lines 68-72 of audio_ring.rs). This violates the SPSC invariant — only the consumer should own read_pos. When both threads write to read_pos, a race corrupts the cursor state, causing the reader to see an empty or stale buffer for 12-16 seconds.
Full forensics: debug/INCIDENT-2026-04-06-playout-ring-desync.md
Solution: Reader-Detects-Lap Architecture
The writer NEVER touches read_pos. On overflow, the writer simply overwrites old buffer data and advances write_pos. The reader detects it was lapped and self-corrects by snapping its own read_pos forward.
Implementation Steps
Step 1: Rewrite AudioRing
File: crates/wzp-android/src/audio_ring.rs
Replace the entire implementation with:
Constants:
/// Ring buffer capacity — must be a power of 2 for bitmask indexing.
/// 16384 samples = 341.3ms at 48kHz mono. Provides 70% more headroom
/// than the previous 9600 (200ms) for surviving Android GC pauses.
const RING_CAPACITY: usize = 16384; // 2^14
const RING_MASK: usize = RING_CAPACITY - 1;
Struct:
pub struct AudioRing {
buf: Box<[i16; RING_CAPACITY]>,
write_pos: AtomicUsize, // monotonically increasing, ONLY written by producer
read_pos: AtomicUsize, // monotonically increasing, ONLY written by consumer
overflow_count: AtomicU64, // incremented by reader when it detects a lap
underrun_count: AtomicU64, // incremented by reader when ring is empty
}
write() — producer. Does NOT touch read_pos:
pub fn write(&self, samples: &[i16]) -> usize {
let count = samples.len().min(RING_CAPACITY);
let w = self.write_pos.load(Ordering::Relaxed);
for i in 0..count {
unsafe {
let ptr = self.buf.as_ptr() as *mut i16;
*ptr.add((w + i) & RING_MASK) = samples[i];
}
}
self.write_pos.store(w.wrapping_add(count), Ordering::Release);
count
}
read() — consumer. Detects lap, self-corrects:
pub fn read(&self, out: &mut [i16]) -> usize {
let w = self.write_pos.load(Ordering::Acquire);
let mut r = self.read_pos.load(Ordering::Relaxed);
let mut avail = w.wrapping_sub(r);
// Lap detection: writer has overwritten our unread data.
// Snap read_pos forward to oldest valid data in the buffer.
// Safe because we (the reader) are the sole owner of read_pos.
if avail > RING_CAPACITY {
r = w.wrapping_sub(RING_CAPACITY);
avail = RING_CAPACITY;
self.overflow_count.fetch_add(1, Ordering::Relaxed);
}
let count = out.len().min(avail);
if count == 0 {
if w == r {
self.underrun_count.fetch_add(1, Ordering::Relaxed);
}
return 0;
}
for i in 0..count {
out[i] = unsafe { *self.buf.as_ptr().add((r + i) & RING_MASK) };
}
self.read_pos.store(r.wrapping_add(count), Ordering::Release);
count
}
available() — clamped for external callers:
pub fn available(&self) -> usize {
let w = self.write_pos.load(Ordering::Acquire);
let r = self.read_pos.load(Ordering::Relaxed);
w.wrapping_sub(r).min(RING_CAPACITY)
}
free_space() — keep for API compat:
pub fn free_space(&self) -> usize {
RING_CAPACITY.saturating_sub(self.available())
}
Diagnostic accessors:
pub fn overflow_count(&self) -> u64 {
self.overflow_count.load(Ordering::Relaxed)
}
pub fn underrun_count(&self) -> u64 {
self.underrun_count.load(Ordering::Relaxed)
}
Constructor:
pub fn new() -> Self {
debug_assert!(RING_CAPACITY.is_power_of_two());
Self {
buf: Box::new([0i16; RING_CAPACITY]),
write_pos: AtomicUsize::new(0),
read_pos: AtomicUsize::new(0),
overflow_count: AtomicU64::new(0),
underrun_count: AtomicU64::new(0),
}
}
Imports to add: use std::sync::atomic::AtomicU64;
Safety comment update:
// SAFETY: AudioRing is SPSC — one thread writes (producer), one reads (consumer).
// The producer only writes write_pos. The consumer only writes read_pos.
// Neither thread writes the other's cursor. Buffer indices are derived from
// the owning thread's cursor, ensuring no concurrent access to the same index.
Step 2: Add counter fields to CallStats
File: crates/wzp-android/src/stats.rs
Add three fields to the CallStats struct (after fec_recovered):
/// Playout ring overflow count (reader was lapped by writer).
pub playout_overflows: u64,
/// Playout ring underrun count (reader found empty buffer).
pub playout_underruns: u64,
/// Capture ring overflow count.
pub capture_overflows: u64,
These derive Default (= 0) automatically via the existing #[derive(Default)].
Step 3: Wire ring diagnostics into engine stats + logging
File: crates/wzp-android/src/engine.rs
3a. In get_stats() (~line 181), populate the new fields:
stats.playout_overflows = self.state.playout_ring.overflow_count();
stats.playout_underruns = self.state.playout_ring.underrun_count();
stats.capture_overflows = self.state.capture_ring.overflow_count();
3b. In the recv task periodic stats log, add ring health:
info!(
frames_decoded,
fec_recovered,
recv_errors,
max_recv_gap_ms,
playout_avail = state.playout_ring.available(),
playout_overflows = state.playout_ring.overflow_count(),
playout_underruns = state.playout_ring.underrun_count(),
"recv stats"
);
3c. In the send task periodic stats log, add capture ring health:
info!(
seq = s,
block_id,
frames_sent,
frames_dropped,
send_errors,
ring_avail = state.capture_ring.available(),
capture_overflows = state.capture_ring.overflow_count(),
"send stats"
);
Step 4: Parse new stats in Kotlin
File: android/app/src/main/java/com/wzp/engine/CallStats.kt
Add fields to the data class:
val playoutOverflows: Long = 0,
val playoutUnderruns: Long = 0,
val captureOverflows: Long = 0,
Add parsing in fromJson():
playoutOverflows = obj.optLong("playout_overflows", 0),
playoutUnderruns = obj.optLong("playout_underruns", 0),
captureOverflows = obj.optLong("capture_overflows", 0),
No UI changes needed — these fields will appear in debug report JSON automatically.
Step 5: Unit tests
File: crates/wzp-android/src/audio_ring.rs — add #[cfg(test)] mod tests
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn capacity_is_power_of_two() {
assert!(RING_CAPACITY.is_power_of_two());
}
#[test]
fn basic_write_read() {
let ring = AudioRing::new();
let input: Vec<i16> = (0..960).map(|i| i as i16).collect();
ring.write(&input);
assert_eq!(ring.available(), 960);
let mut output = vec![0i16; 960];
let read = ring.read(&mut output);
assert_eq!(read, 960);
assert_eq!(output, input);
assert_eq!(ring.available(), 0);
}
#[test]
fn wraparound() {
let ring = AudioRing::new();
let frame = vec![42i16; 960];
// Write enough to wrap the buffer multiple times
for _ in 0..20 {
ring.write(&frame);
let mut out = vec![0i16; 960];
ring.read(&mut out);
assert!(out.iter().all(|&s| s == 42));
}
}
#[test]
fn overflow_detected_by_reader() {
let ring = AudioRing::new();
// Write more than RING_CAPACITY without reading
let big = vec![7i16; RING_CAPACITY + 960];
ring.write(&big[..RING_CAPACITY]);
ring.write(&big[RING_CAPACITY..]);
// Reader should detect lap
let mut out = vec![0i16; 960];
let read = ring.read(&mut out);
assert!(read > 0);
assert_eq!(ring.overflow_count(), 1);
// Data should be from the most recent writes
assert!(out.iter().all(|&s| s == 7));
}
#[test]
fn writer_never_modifies_read_pos() {
let ring = AudioRing::new();
// Read pos should stay at 0 until read() is called
let data = vec![1i16; RING_CAPACITY + 960];
ring.write(&data);
// read_pos is private, but we can check available() > CAPACITY
// which proves write() didn't advance read_pos
let w = ring.write_pos.load(std::sync::atomic::Ordering::Relaxed);
let r = ring.read_pos.load(std::sync::atomic::Ordering::Relaxed);
assert_eq!(r, 0, "write() must not modify read_pos");
assert!(w.wrapping_sub(r) > RING_CAPACITY);
}
#[test]
fn underrun_counted() {
let ring = AudioRing::new();
let mut out = vec![0i16; 960];
let read = ring.read(&mut out);
assert_eq!(read, 0);
assert_eq!(ring.underrun_count(), 1);
}
#[test]
fn overflow_recovery_reads_recent_data() {
let ring = AudioRing::new();
// Fill with old data
let old = vec![1i16; RING_CAPACITY];
ring.write(&old);
// Overwrite with new data (lapping the reader)
let new_data = vec![99i16; 960];
ring.write(&new_data);
// Reader should snap forward and get recent data
let mut out = vec![0i16; RING_CAPACITY];
let read = ring.read(&mut out);
assert_eq!(read, RING_CAPACITY);
// The last 960 samples should be 99
assert!(out[RING_CAPACITY - 960..].iter().all(|&s| s == 99));
assert_eq!(ring.overflow_count(), 1);
}
}
Memory Ordering Reference
| Operation | Ordering | Rationale |
|---|---|---|
write_pos.store in write() |
Release | Buffer writes visible before cursor advances |
write_pos.load in read() |
Acquire | Pairs with Release above — sees all buffer writes |
write_pos.load in write() |
Relaxed | Writer is sole owner of write_pos |
read_pos.load in read() |
Relaxed | Reader is sole owner of read_pos |
read_pos.store in read() |
Release | Makes available() consistent from any thread |
read_pos.load in available() |
Relaxed | Informational only, slight staleness OK |
| All counters | Relaxed | Diagnostic only |
Capacity Tradeoff
| Capacity | Duration | Memory | Verdict |
|---|---|---|---|
| 8192 (2^13) | 170ms | 16KB | Less than current 200ms — risky |
| 16384 (2^14) | 341ms | 32KB | 70% more headroom, bitmask indexing |
| 32768 (2^15) | 682ms | 64KB | Excessive latency on overflow recovery |
Verification
cargo test -p wzp-android— new unit tests passcargo ndk -t arm64-v8a build --release -p wzp-android— ARM cross-compile succeeds- Build APK, install on both test devices (Nothing A059 + Pixel 6)
- 2+ minute call — verify no audio gaps
- Check debug report JSON:
playout_overflowsshould be 0 or very small - Check logcat
wzp_androidtag: send/recv stats show healthy ring state - Stress test: play music through one device speaker while on call — forces high ring throughput
Files to Modify
| File | What changes |
|---|---|
crates/wzp-android/src/audio_ring.rs |
Complete rewrite — the core fix |
crates/wzp-android/src/stats.rs |
Add 3 counter fields |
crates/wzp-android/src/engine.rs |
Wire counters into get_stats() + periodic logs |
android/app/src/main/java/com/wzp/engine/CallStats.kt |
Parse 3 new JSON fields |
What Does NOT Change
AudioPipeline.kt— callsreadAudio()/writeAudio()unchanged; ring fix is transparentjni_bridge.rs— JNI bridge passes through unchangedaudio_android.rs— separate Oboe-based ring, currently unused, different design- Relay code — relay is confirmed healthy
- Desktop client — uses
Mutex + mpsc, notAudioRing