From 3b85604b4184377dc3ce453e50f3e3f894ded7ec Mon Sep 17 00:00:00 2001
From: Siavash Sameni
Date: Tue, 7 Apr 2026 18:32:24 +0400
Subject: [PATCH] docs: PRDs for local recording + mixer and studio quality tiers
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

PRD-local-recording.md: Dual-path architecture for podcast-quality
interviews: local lossless WAV recording alongside the live call, with
sync markers for post-session alignment, and resumable upload to a
self-hosted mixer service that produces normalized multi-track output.

PRD-studio-quality.md: Documents the Opus 32k/48k/64k studio tiers, when
to use them, cross-codec interop, and backward compatibility.

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 docs/PRD-local-recording.md | 141 ++++++++++++++++++++++++++++++++++++
 docs/PRD-studio-quality.md  |  56 ++++++++++++++
 2 files changed, 197 insertions(+)
 create mode 100644 docs/PRD-local-recording.md
 create mode 100644 docs/PRD-studio-quality.md

diff --git a/docs/PRD-local-recording.md b/docs/PRD-local-recording.md
new file mode 100644
index 0000000..02aa7c4
--- /dev/null
+++ b/docs/PRD-local-recording.md
@@ -0,0 +1,141 @@
+# PRD: Local Recording + Cloud Mixer for Podcast-Quality Interviews
+
+## Problem
+
+WarzonePhone delivers real-time encrypted voice, but the audio quality of the live call is limited by what the network allows (lossy codec compression, packet loss, jitter). Podcasters and interviewers need pristine, studio-grade recordings of each participant, independent of what the network delivers.
+
+## Solution
+
+**Dual-path architecture**: each client simultaneously (1) participates in the live call at whatever codec quality the network supports, and (2) records its own microphone locally as lossless PCM. After the session, participants upload their local recordings to a self-hosted mixer service that aligns and normalizes them and outputs a final multi-track or mixed file.
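A minimal sketch of the local half of the dual path, assuming a hypothetical `WavWriter` type (not taken from the WarzonePhone codebase). It writes a 44-byte PCM WAV header with placeholder sizes, appends 16-bit samples as frames arrive, and patches the sizes on close. The in-memory `Cursor` stands in for the recording file, and `record_demo` is an illustrative helper only.

```rust
use std::io::{self, Cursor, Seek, SeekFrom, Write};

/// Hypothetical streaming WAV writer for 16-bit PCM (illustrative only).
struct WavWriter<W: Write + Seek> {
    inner: W,
    data_bytes: u32,
}

impl<W: Write + Seek> WavWriter<W> {
    /// Writes the canonical 44-byte header with placeholder sizes.
    fn new(mut inner: W, sample_rate: u32, channels: u16) -> io::Result<Self> {
        let bits: u16 = 16;
        let block_align = channels * bits / 8;
        let byte_rate = sample_rate * u32::from(block_align);
        inner.write_all(b"RIFF")?;
        inner.write_all(&0u32.to_le_bytes())?; // RIFF size, patched in finish()
        inner.write_all(b"WAVE")?;
        inner.write_all(b"fmt ")?;
        inner.write_all(&16u32.to_le_bytes())?; // fmt chunk length
        inner.write_all(&1u16.to_le_bytes())?; // format tag 1 = PCM
        inner.write_all(&channels.to_le_bytes())?;
        inner.write_all(&sample_rate.to_le_bytes())?;
        inner.write_all(&byte_rate.to_le_bytes())?;
        inner.write_all(&block_align.to_le_bytes())?;
        inner.write_all(&bits.to_le_bytes())?;
        inner.write_all(b"data")?;
        inner.write_all(&0u32.to_le_bytes())?; // data size, patched in finish()
        Ok(Self { inner, data_bytes: 0 })
    }

    /// Appends one frame of little-endian 16-bit samples.
    fn write_frame(&mut self, samples: &[i16]) -> io::Result<()> {
        for s in samples {
            self.inner.write_all(&s.to_le_bytes())?;
        }
        self.data_bytes += (samples.len() * 2) as u32;
        Ok(())
    }

    /// Seeks back, fills in the real sizes, and returns the underlying sink.
    fn finish(mut self) -> io::Result<W> {
        self.inner.seek(SeekFrom::Start(4))?;
        self.inner.write_all(&(36 + self.data_bytes).to_le_bytes())?;
        self.inner.seek(SeekFrom::Start(40))?;
        self.inner.write_all(&self.data_bytes.to_le_bytes())?;
        self.inner.flush()?;
        Ok(self.inner)
    }
}

/// Records a single 20 ms frame of silence at 48 kHz mono into memory.
fn record_demo() -> io::Result<Vec<u8>> {
    let mut wav = WavWriter::new(Cursor::new(Vec::new()), 48_000, 1)?;
    wav.write_frame(&[0i16; 960])?;
    Ok(wav.finish()?.into_inner())
}
```

In a real client the sink would be a `std::fs::File`, and a crash before `finish()` would leave the placeholder sizes in the header, so a recovery step that recomputes them from the file length may be worth considering.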
+
+## Architecture
+
+```
+                            ┌──────────────────┐
+  Mic ──┬── Opus/Codec2 ──►   Network (live)   │  ← real-time call
+        │                   └──────────────────┘
+        │
+        └── WAV 48kHz ────►   Local File       │  ← pristine recording
+                              (timestamped)
+                                     │
+                                     ▼  (after hangup)
+                            ┌──────────────────┐
+                            │  Mixer Service   │  ← self-hosted
+                            │  (align + mix)   │
+                            └──────────────────┘
+                                     │
+                                     ▼
+                            Final MP3/WAV/FLAC
+```
+
+## Requirements
+
+### Phase 1: Local Recording (MVP)
+
+**All clients (Desktop, Android, Web):**
+
+1. **Record toggle**: User can enable "Record this call" before or during a call
+2. **Recording pipeline**: Tap uncompressed PCM from the microphone capture path *after* AGC but *before* it enters the codec encoder (see "Recording tap point")
+3. **File format**: WAV (48kHz, 16-bit, mono): simple, universally supported, lossless
+4. **Sync markers**: Embed a monotonic timestamp (ms since call start) at the beginning of the recording, and periodically (every 10s) append a sync marker to a sidecar JSON file:
+   ```json
+   {"ts_ms": 30000, "seq": 1500, "wall_clock_utc": "2026-04-07T12:00:30Z"}
+   ```
+   This allows the mixer to align recordings from different participants even if they join at different times.
+5. **Storage**:
+   - Desktop: `~/.wzp/recordings/{room}_{timestamp}.wav`
+   - Android: `Documents/WarzonePhone/{room}_{timestamp}.wav`
+   - Web: IndexedDB blob or File System Access API
+6. **File size estimate**: 48kHz * 16-bit * mono = 96 KB/s = ~5.8 MB/min = ~345 MB/hour
+7. **UI indicator**: Red dot + timer showing that recording is active, plus the growing file size
+8. **On hangup**: Close the WAV file, show "Recording saved" with file path/size
+
+### Phase 2: Upload to Mixer
+
+1. **Upload endpoint**: Self-hosted HTTP service (Rust or Go) that accepts WAV uploads with metadata
+2. **Chunked/resumable upload**: Large files need resumable uploads (tus protocol or simple chunked POST)
+3. **Upload metadata**:
+   ```json
+   {
+     "session_id": "uuid",
+     "participant_fingerprint": "xxxx:xxxx:...",
+     "alias": "Alice",
+     "room": "podcast-ep-42",
+     "duration_secs": 3600,
+     "sync_markers": [...],
+     "sample_rate": 48000,
+     "channels": 1,
+     "bit_depth": 16
+   }
+   ```
+4. **Upload UI**: Progress bar after hangup, option to upload now or later
+5. **Retry on failure**: Queue uploads for retry if network is unavailable
+
+### Phase 3: Mixer Service
+
+1. **Alignment**: Use sync markers (wall clock + sequence numbers) to align recordings from all participants to a common timeline
+2. **Silence trimming**: Detect and optionally trim leading/trailing silence
+3. **Normalization**: Per-track loudness normalization (LUFS-based)
+4. **Noise reduction**: Optional per-track noise gate or RNNoise pass
+5. **Output formats**:
+   - Multi-track: ZIP of individual WAVs (aligned, normalized)
+   - Mixed: Single stereo or mono WAV/MP3/FLAC with all participants
+   - Podcast-ready: Loudness-normalized to -16 LUFS (a common podcast target)
+6. **Web UI**: Simple dashboard to see sessions, download outputs, preview waveforms
+7. **Self-hosted**: Docker image, single binary, SQLite for metadata
+
+## Implementation Notes
+
+### Recording tap point
+
+The recording must tap *after* AGC (so levels are normalized) but *before* the codec encoder (to avoid compression artifacts). In the current architecture:
+
+```
+Mic → Ring Buffer → AGC → [TAP HERE for recording] → Opus/Codec2 → Network
+```
+
+**Desktop** (`engine.rs`): After `capture_agc.process_frame()`, before `encoder.encode()`
+**Android** (`engine.rs`): Same location: after AGC, before encode
+**CLI** (`call.rs`): After `self.agc.process_frame()` in `CallEncoder::encode_frame()`
+
+### WAV writer
+
+Use a simple streaming WAV writer that:
+- Writes the WAV header with placeholder data length
+- Appends PCM samples as they come
+- On close, seeks back to update the data length in the header
+
+### Sync mechanism
+
+Wall-clock UTC alone is insufficient (device clocks drift and can be offset from one another). The sync strategy:
+1. Each participant records their local monotonic time + wall clock at call start
+2. Periodically (every 10s), each participant writes a marker with the same fields as the sidecar example: `{ts_ms, seq, wall_clock_utc}`
+3. The mixer uses sequence numbers (which are shared via the wire protocol) as ground truth for alignment, with wall clock as a fallback
+
+### Privacy
+
+- Local recordings never leave the device without explicit user action
+- Upload is manual, not automatic
+- The mixer service processes files and can delete originals after mixing
+- No recording data flows through the relay, and each client records only its own microphone
+
+## Non-Goals (v1)
+
+- Live transcription (future)
+- Video recording (audio only)
+- Automatic upload without user consent
+- Recording other participants' audio (only your own mic)
+- Real-time mixing (post-session only)
+
+## Milestones
+
+| Phase | Scope | Effort |
+|-------|-------|--------|
+| 1a | Local WAV recording on Desktop | 1-2 days |
+| 1b | Local WAV recording on Android | 1-2 days |
+| 1c | Sync markers + metadata sidecar | 1 day |
+| 2a | Upload service (HTTP + storage) | 2-3 days |
+| 2b | Upload UI in clients | 1-2 days |
+| 3a | Mixer: alignment + normalization | 2-3 days |
+| 3b | Mixer: web dashboard | 2-3 days |
+| 3c | Docker packaging | 1 day |
diff --git a/docs/PRD-studio-quality.md b/docs/PRD-studio-quality.md
new file mode 100644
index 0000000..1ebdf4d
--- /dev/null
+++ b/docs/PRD-studio-quality.md
@@ -0,0 +1,56 @@
+# PRD: Studio Quality Tiers (Opus 32k/48k/64k)
+
+## Status: Implemented
+
+Studio quality tiers have been added to the wire protocol and all clients.
+
+## What Was Added
+
+### Wire Protocol (codec_id.rs)
+
+Three new `CodecId` variants using the 4-bit header space (values 6-8):
+
+| CodecId | Wire Value | Bitrate | Frame | Use Case |
+|---------|-----------|---------|-------|----------|
+| Opus32k | 6 | 32 kbps | 20ms | Studio low: noticeable improvement over 24k for voice |
+| Opus48k | 7 | 48 kbps | 20ms | Studio: excellent voice, captures nuance |
+| Opus64k | 8 | 64 kbps | 20ms | Studio high: near-transparent quality |
+
+### Quality Profiles
+
+| Profile | Codec | FEC | Bandwidth (with FEC) |
+|---------|-------|-----|---------------------|
+| STUDIO_32K | Opus 32k | 10% | ~35 kbps |
+| STUDIO_48K | Opus 48k | 10% | ~53 kbps |
+| STUDIO_64K | Opus 64k | 10% | ~70 kbps |
+
+FEC is set to 10% (vs 20% for GOOD) because the studio profiles assume a good network.
+
+### Client Support
+
+| Client | Selection | Status |
+|--------|-----------|--------|
+| Desktop (Tauri) | Quality slider in Settings (8 levels) | Done |
+| CLI | `--profile studio-64k` / `studio-48k` / `studio-32k` | Done |
+| Android | Needs codec picker update in SettingsScreen.kt | TODO |
+| Web | Needs UI | TODO |
+
+### Cross-Codec Interop
+
+All decoder auto-switch paths (call.rs, desktop engine.rs) handle the new codec IDs. A studio-64k client can talk to a codec2-1200 client because each receiver auto-switches to the codec ID of the packets it receives.
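The auto-switch behaviour described above could look roughly like the sketch below. `Receiver`, `on_packet`, `decoder_rebuilds`, and the `Opus24k` and `Codec2At1200` variants are hypothetical names for illustration; only the three studio variants come from this PRD, and the real logic in call.rs and the desktop engine.rs may differ.

```rust
/// Illustrative subset of codec identifiers (names partly assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CodecId {
    Codec2At1200,
    Opus24k,
    Opus32k,
    Opus48k,
    Opus64k,
}

/// Hypothetical receive-side state: remembers which codec the decoder
/// is currently configured for.
struct Receiver {
    active: Option<CodecId>,
    decoder_rebuilds: u32,
}

impl Receiver {
    fn new() -> Self {
        Self { active: None, decoder_rebuilds: 0 }
    }

    /// Called with the codec ID from each packet header. Returns true
    /// when the decoder had to be switched for this packet.
    fn on_packet(&mut self, codec: CodecId) -> bool {
        if self.active == Some(codec) {
            return false; // decoder already matches, nothing to do
        }
        // A real client would tear down and rebuild its decoder here.
        self.active = Some(codec);
        self.decoder_rebuilds += 1;
        true
    }
}
```

Because the switch is driven per packet by the sender's codec ID, each direction of a call can use a different codec without any negotiation step in this sketch.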
+
+## When to Use Studio Tiers
+
+- **Podcast recording sessions**: Use studio-64k for the best live quality (combined with local WAV recording for pristine output)
+- **Music collaboration**: Opus at 48-64k captures instrument harmonics much better than 24k
+- **Good network conditions**: Only useful when bandwidth isn't constrained; the extra bits are wasted on lossy networks
+
+## When NOT to Use
+
+- **Mobile data**: Stick with Auto/GOOD; studio tiers use up to 2-3x the bandwidth
+- **High packet loss**: Studio profiles use minimal FEC (10%); degraded networks need DEGRADED or CATASTROPHIC profiles with 50-100% FEC
+- **Large group calls**: Each participant's stream multiplies bandwidth; ten incoming 64k streams add up to 640 kbps (~700 kbps with 10% FEC)
+
+## Backward Compatibility
+
+Old clients (before this change) do not recognize CodecId 6/7/8. Their `from_wire()` returns `None` for unknown values, so those packets are dropped and an old client hears nothing from a studio-tier sender. Old clients can still *send* to new clients fine (they use CodecId 0-5). This is acceptable for a pre-release protocol.
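A minimal sketch of the compatibility behaviour, under stated assumptions: the `Legacy(u8)` variant is a stand-in for the existing codecs with wire values 0-5 (their real names are not given in this PRD), and the `studio_aware` flag is a hypothetical way to model old and new clients in one function. The actual `from_wire()` in codec_id.rs may be shaped differently.

```rust
/// Studio variants and their wire values (6-8) are from this PRD;
/// `Legacy` is an illustrative placeholder for wire values 0-5.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CodecId {
    Legacy(u8),
    Opus32k,
    Opus48k,
    Opus64k,
}

/// Maps a wire value to a codec. `studio_aware = false` models a client
/// built before the studio tiers existed.
fn from_wire(value: u8, studio_aware: bool) -> Option<CodecId> {
    match value {
        0..=5 => Some(CodecId::Legacy(value)),
        6 if studio_aware => Some(CodecId::Opus32k),
        7 if studio_aware => Some(CodecId::Opus48k),
        8 if studio_aware => Some(CodecId::Opus64k),
        _ => None, // unknown codec ID
    }
}

/// A packet is decoded only if its codec ID is recognized;
/// otherwise it is dropped.
fn accepts_packet(codec_wire_value: u8, studio_aware: bool) -> bool {
    from_wire(codec_wire_value, studio_aware).is_some()
}
```

With this model, `accepts_packet(8, false)` is false (an old client drops Opus64k packets) while `accepts_packet(3, true)` is true (a new client still accepts the older codecs).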