Branch: feat/desktop-audio-rewrite
Home of the Tauri desktop client for macOS, Windows, and Linux. Named "audio-rewrite" because the original driver was replacing a CPAL-only audio pipeline with platform-native backends that support OS-level echo cancellation (VoiceProcessingIO on macOS, WASAPI Communications on Windows), but the branch has grown into the full desktop story — Windows cross-compilation, vendored dependencies, history UI, direct calling, the whole thing.
Purpose
The desktop client shares 100% of its frontend (desktop/src/) and Tauri command layer (desktop/src-tauri/src/lib.rs, engine.rs, history.rs) with the Android build on android-rewrite. Differences are limited to:
- Audio backends, which are platform-gated via Cargo target-dep sections in `desktop/src-tauri/Cargo.toml` and feature flags in `crates/wzp-client/Cargo.toml`.
- Identity storage paths, which resolve via Tauri's `app_data_dir()` (`~/Library/Application Support/…` on macOS, `%APPDATA%\…` on Windows, `~/.local/share/…` on Linux).
- Build toolchains: native `cargo build` on macOS/Linux, `cargo xwin` cross-compile from Linux for Windows via Docker on SepehrHomeserverdk.
Audio backend matrix
| Target | Capture | Playback | AEC |
|---|---|---|---|
| macOS | CPAL (CoreAudio via cpal crate) OR VoiceProcessingIO (native Core Audio) | CPAL | VoiceProcessingIO native AEC (when the vpio feature is enabled) |
| Windows (default) | CPAL → WASAPI shared mode | CPAL → WASAPI shared mode | None |
| Windows (AEC build) | Direct WASAPI with IAudioClient2::SetClientProperties(AudioCategory_Communications) | CPAL → WASAPI shared mode | OS-level: Windows routes the capture stream through the driver's communications APO chain (AEC + NS + AGC) |
| Linux | CPAL → ALSA/PulseAudio | CPAL → ALSA/PulseAudio | None |
The macOS VPIO path is gated behind the vpio feature in wzp-client and the coreaudio-rs dep is itself cfg(target_os = "macos"), so enabling the feature on Windows or Linux is a no-op.
The Windows AEC path is gated behind the windows-aec feature, also target-gated (the windows crate dep is only pulled in on Windows), and re-exports WasapiAudioCapture as AudioCapture when enabled so downstream code doesn't need to know which backend is active. The current Windows build at target/windows-exe/wzp-desktop.exe has windows-aec on; a baseline noAEC build is preserved at target/windows-exe/wzp-desktop-noAEC.exe for A/B comparison on real hardware.
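The feature-plus-target gating described above can be sketched in a few lines of Rust. This is a minimal illustration, not the code in wzp-client: the unit structs, the `backend_name` helper, and `active_capture_backend` are hypothetical stand-ins, and a type alias plays the role of the re-export. Only the `windows-aec` feature name and the `WasapiAudioCapture` / `AudioCapture` names come from the text.

```rust
// Baseline capture type, always compiled. Hypothetical stand-in for the CPAL backend.
pub struct CpalAudioCapture;

impl CpalAudioCapture {
    pub fn backend_name() -> &'static str {
        "cpal"
    }
}

// Compiled only on Windows with the `windows-aec` feature enabled.
#[cfg(all(target_os = "windows", feature = "windows-aec"))]
pub struct WasapiAudioCapture;

#[cfg(all(target_os = "windows", feature = "windows-aec"))]
impl WasapiAudioCapture {
    pub fn backend_name() -> &'static str {
        "wasapi-communications"
    }
}

// Downstream code only ever names `AudioCapture`; the cfg attributes pick
// which concrete type that resolves to at compile time.
#[cfg(all(target_os = "windows", feature = "windows-aec"))]
pub type AudioCapture = WasapiAudioCapture;

#[cfg(not(all(target_os = "windows", feature = "windows-aec")))]
pub type AudioCapture = CpalAudioCapture;

/// Reports which backend the alias resolved to in this build.
pub fn active_capture_backend() -> &'static str {
    AudioCapture::backend_name()
}
```

Because the selection happens at compile time, enabling the feature on a non-Windows target leaves the alias pointing at the baseline type, which mirrors the no-op behavior described for `vpio` off macOS.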
See BRANCH-android-rewrite.md for Oboe audio on Android, which is its own story.
Recent major work
1. Desktop direct calling feature (commit 2fd9465 and neighbors)
Brought direct 1:1 calls to macOS with full parity to the Android client:
- Identity path fix: the desktop `CallEngine::start` was loading the seed from `$HOME/.wzp/identity` while `register_signal` used Tauri's `app_data_dir()`, producing two different fingerprints per run. Both now route through `load_or_create_seed()`, which uses `app_data_dir()` everywhere.
- Call history with dedup: `history.rs` stores a `Vec<CallHistoryEntry>` with a `CallDirection` enum (`Placed | Received | Missed`). The `log` function dedupes by `call_id` so an outgoing call isn't logged twice, once as "missed" (when the signal loop's `DirectCallOffer` handler fires) and then again as "placed" (when `place_call` returns). Instead the entry is updated in place.
- Recent contacts row: a horizontal chip UI in the direct-call panel showing the last N peers with friendly aliases, clickable to re-dial.
- Deregister button: lets a user drop their signal registration without quitting the app, useful when switching identities.
- Random alias derivation: a new client sees a human-friendly alias like "silent-forest-41" derived deterministically from its seed, so it's identifiable in the UI before manual naming.
- Default room "general" instead of "android", since the desktop client is not Android.
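The history dedup in the list above can be sketched as follows. This is an illustration under assumptions rather than the contents of `history.rs`: the `CallHistory` wrapper, the `peer` field, the `entries` accessor, and the exact `log` signature are hypothetical; only `CallHistoryEntry`, `CallDirection`, its three variants, and the dedupe-by-`call_id` behavior come from the text.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CallDirection {
    Placed,
    Received,
    Missed,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CallHistoryEntry {
    pub call_id: String,
    pub peer: String, // hypothetical field, for illustration only
    pub direction: CallDirection,
}

/// Hypothetical wrapper around the `Vec<CallHistoryEntry>` store.
#[derive(Debug, Default)]
pub struct CallHistory {
    entries: Vec<CallHistoryEntry>,
}

impl CallHistory {
    /// Records a call. If an entry with the same `call_id` already exists,
    /// it is updated in place instead of a second entry being appended.
    pub fn log(&mut self, call_id: &str, peer: &str, direction: CallDirection) {
        if let Some(existing) = self.entries.iter_mut().find(|e| e.call_id == call_id) {
            existing.peer = peer.to_string();
            existing.direction = direction;
        } else {
            self.entries.push(CallHistoryEntry {
                call_id: call_id.to_string(),
                peer: peer.to_string(),
                direction,
            });
        }
    }

    pub fn entries(&self) -> &[CallHistoryEntry] {
        &self.entries
    }
}
```

With this shape, logging the same `call_id` first as `Missed` and then as `Placed` leaves a single entry whose direction is `Placed`.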
2. macOS VoiceProcessingIO integration
crates/wzp-client/src/audio_vpio.rs — a native Core Audio implementation using AUGraph + AudioComponentInstance with the VPIO audio unit. Gives you hardware-accelerated AEC (same AEC Apple ships in FaceTime / iMessage audio / voice memos) at the cost of tight coupling to Apple frameworks. Lock-free ring pattern matches the CPAL path so the upper layers don't notice the difference.
Enabled by features = ["audio", "vpio"] in the macOS target section of desktop/src-tauri/Cargo.toml.
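For readers unfamiliar with the lock-free ring pattern mentioned above, here is a minimal single-producer / single-consumer sketch. It is not the ring used in `audio_vpio.rs` or the CPAL path; the `SampleRing` type, its method names, and the choice of `i16` samples are assumptions made for illustration. It uses atomic slots so that no `unsafe` is needed.

```rust
use std::sync::atomic::{AtomicI16, AtomicUsize, Ordering};

/// Hypothetical SPSC sample ring: one thread calls `push`, another calls `pop`.
pub struct SampleRing {
    slots: Vec<AtomicI16>,
    head: AtomicUsize, // total samples written (monotonic counter)
    tail: AtomicUsize, // total samples read (monotonic counter)
}

impl SampleRing {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self {
            slots: (0..capacity).map(|_| AtomicI16::new(0)).collect(),
            head: AtomicUsize::new(0),
            tail: AtomicUsize::new(0),
        }
    }

    /// Producer side: writes as many samples as fit and returns that count.
    /// Never blocks; samples that do not fit are dropped by the caller.
    pub fn push(&self, samples: &[i16]) -> usize {
        let cap = self.slots.len();
        let head = self.head.load(Ordering::Relaxed);
        let tail = self.tail.load(Ordering::Acquire);
        let free = cap - head.wrapping_sub(tail);
        let n = samples.len().min(free);
        for (i, &s) in samples[..n].iter().enumerate() {
            self.slots[head.wrapping_add(i) % cap].store(s, Ordering::Relaxed);
        }
        // Release publishes the slot writes to the consumer.
        self.head.store(head.wrapping_add(n), Ordering::Release);
        n
    }

    /// Consumer side: reads up to `out.len()` samples and returns the count.
    pub fn pop(&self, out: &mut [i16]) -> usize {
        let cap = self.slots.len();
        let tail = self.tail.load(Ordering::Relaxed);
        let head = self.head.load(Ordering::Acquire);
        let available = head.wrapping_sub(tail);
        let n = out.len().min(available);
        for (i, slot) in out[..n].iter_mut().enumerate() {
            *slot = self.slots[tail.wrapping_add(i) % cap].load(Ordering::Relaxed);
        }
        // Release hands the consumed slots back to the producer.
        self.tail.store(tail.wrapping_add(n), Ordering::Release);
        n
    }
}
```

The sketch ignores `usize` counter wraparound for non-power-of-two capacities, which is unreachable in practice at audio sample rates. The point is that neither side takes a lock, so a real-time audio callback can call `push` or `pop` without risking priority inversion.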
3. Windows cross-compilation via cargo-xwin
Cross-compiling Rust + Tauri to x86_64-pc-windows-msvc from Linux using cargo-xwin, which downloads the Microsoft CRT + Windows SDK on demand and drives clang-cl as the compiler. No Windows machine is needed for the build itself — only for runtime testing.
Build infrastructure:
- `scripts/Dockerfile.windows-builder`: Debian bookworm + Rust + cargo-xwin + Node 20 + cmake + ninja + llvm + clang + lld + nasm. Pre-warms the xwin MSVC CRT cache at image build time (saves ~4 minutes per cold build).
- `scripts/build-windows-docker.sh`: fire-and-forget remote build via Docker on SepehrHomeserverdk. Same pattern as `build-tauri-android.sh`. Uploads the `.exe` to rustypaste and fires an `ntfy.sh/wzp` notification on start and on completion.
- `scripts/build-windows-cloud.sh`: alternative pipeline using a temporary Hetzner Cloud VPS. Slower (full VM spin-up), more expensive, but useful when Docker image rebuilds would be disruptive.
Two critical blockers resolved on the way to a working .exe:
- libopus SSE4.1 / SSSE3 intrinsic compile failure. `audiopus_sys` vendors libopus 1.3.1, whose `CMakeLists.txt` gates the per-file `-msse4.1` `COMPILE_FLAGS` behind `if(NOT MSVC)`. Under `clang-cl`, CMake sets `MSVC=1` (because `CMAKE_C_COMPILER_FRONTEND_VARIANT=MSVC` triggers `Platform/Windows-MSVC.cmake`, which unconditionally sets the variable), so the per-file flag is never set and the SSE4.1 source files compile without the target feature, then fail with 20+ "always_inline function '_mm_cvtepi16_epi32' requires target feature 'sse4.1'" errors.

  Fixed by vendoring audiopus_sys into `vendor/audiopus_sys/` and patching its bundled libopus to introduce an `MSVC_CL` variable that is true only for real `cl.exe` (distinguished via `CMAKE_C_COMPILER_ID STREQUAL "MSVC"`). The eight `if(NOT MSVC)` SIMD guards are flipped to `if(NOT MSVC_CL)` and the global `/arch` block at line 445 becomes `if(MSVC_CL)`, so clang-cl gets the GCC-style per-file flags while real cl.exe keeps the `/arch:AVX` / `/arch:SSE2` globals.

  Wired in via `[patch.crates-io] audiopus_sys = { path = "vendor/audiopus_sys" }` at the workspace root.

  Upstream tracking: xiph/opus#256, xiph/opus PR #257 (both stale).

- tauri-build needs `icons/icon.ico` for the Windows PE resource. The desktop only had `icon.png`. Generated a multi-size ICO (16/24/32/48/64/128/256) from the existing placeholder via Pillow and committed it. Placeholder quality; real branded icons can replace it later.
4. Windows AudioCategory_Communications capture path (task #24)
crates/wzp-client/src/audio_wasapi.rs — direct WASAPI capture via IMMDeviceEnumerator → IAudioClient2 → SetClientProperties with AudioCategory_Communications. This tells Windows "this is a VoIP call" and Windows routes the capture stream through the driver's registered communications APO chain, which on most Win10/11 consumer hardware includes AEC, NS, and AGC.
Caveat: quality is driver-dependent. On a machine with a good communications APO (Intel Smart Sound, Dolby, modern Realtek on Win11 24H2+, anything with Voice Clarity enabled) it's excellent. On generic class-compliant drivers with no communications APO registered, it's a no-op. For a guaranteed AEC regardless of driver, see task #26 which tracks implementing the classic Voice Capture DSP (CLSID_CWMAudioAEC) as a fallback.
Gated behind the windows-aec feature in wzp-client. Enabled by default in the Windows target section of desktop/src-tauri/Cargo.toml.
Build pipelines
Native macOS / Linux
```sh
cd desktop
npm install
npm run build
cd src-tauri
cargo build --release --bin wzp-desktop
```
Windows x86_64 via Docker on SepehrHomeserverdk
```sh
./scripts/build-windows-docker.sh                # Full: pull + build + download
./scripts/build-windows-docker.sh --no-pull      # Skip git fetch
./scripts/build-windows-docker.sh --rust         # Force-clean Rust target
./scripts/build-windows-docker.sh --image-build  # (Re)build the Docker image (fire-and-forget)
```
Output lands at target/windows-exe/wzp-desktop.exe. Both wzp-desktop.exe and wzp-desktop-noAEC.exe can coexist in that directory; the script writes wzp-desktop.exe so renaming the prior build to -noAEC.exe (or any other name) before rebuilding preserves it.
Windows x86_64 via Hetzner Cloud (alternative)
```sh
./scripts/build-windows-cloud.sh                 # Full: create VM → build → download → destroy
./scripts/build-windows-cloud.sh --prepare       # Create VM and install deps only
./scripts/build-windows-cloud.sh --build         # Build on existing VM
./scripts/build-windows-cloud.sh --destroy       # Delete the VM
WZP_KEEP_VM=1 ./scripts/build-windows-cloud.sh   # Keep VM alive after build for debug
```
Remember to destroy the VM at end of day with --destroy.
Linux x86_64 (relay + CLI + bench)
```sh
./scripts/build-linux-docker.sh             # Fire-and-forget remote Docker build
./scripts/build-linux-docker.sh --install   # Wait for completion and download
```
Uses the same wzp-android-builder Docker image as Android (not a separate image), since the deps (Rust + cmake + ring prereqs) are the same.
Testing
Direct calling parity
- Build on two machines (macOS + Windows, or two macOS, or any combination).
- Both machines register on the same relay.
- Copy one machine's fingerprint into the other's direct-call panel.
- Place the call. Confirm ringing UI on the callee and "calling…" UI on the caller.
- Answer. Confirm audio flows both ways.
- Hang up from either side. Confirm call-history entries are labeled correctly (`Outgoing` on caller, `Incoming` on callee, never `Missed` on a successful call).
Windows AEC A/B
- Install `wzp-desktop-noAEC.exe` and `wzp-desktop.exe` on the same Windows box.
- Join a call from each (separately) while a second machine plays known audio through the first machine's speakers.
- On the remote (listening) side: the `noAEC` call should have clear audible echo; the AEC call should have minimal or no echo after a 1–2 s convergence period.
- If both builds sound identical (with echo) → the `AudioCategory_Communications` switch isn't triggering the driver's APO chain. Investigate via task #26 (Voice Capture DSP fallback).
Known quirks
- libopus vendor path is workspace-relative. `[patch.crates-io] audiopus_sys = { path = "vendor/audiopus_sys" }` works from any crate in the workspace because Cargo resolves it against the root `Cargo.toml`'s directory. If the workspace is moved or vendored into another workspace, update the path.
- `cargo xwin` overwrites `override.cmake` on every invocation. Any attempt to patch `~/.cache/cargo-xwin/cmake/clang-cl/override.cmake` at Docker image build time is inert because `src/compiler/clang_cl.rs` line ~444 writes the bundled file fresh on every run. All real fixes must land in the source tree (via the vendored audiopus_sys, as done here), not in the cargo-xwin cache.
- WebView2 runtime is a prerequisite on Windows 10. Windows 11 ships with it. If the `.exe` launches and immediately exits with no error on a Win10 machine, that's the missing runtime; install it from Microsoft's Evergreen bootstrapper.
- Rust 2024 edition `unsafe_op_in_unsafe_fn` lint. The WASAPI backend in `audio_wasapi.rs` emits ~18 of these warnings because the lint is warn-by-default in Rust 2024 and expects unsafe operations inside `unsafe fn` bodies to be wrapped in explicit `unsafe { ... }` blocks. The warnings don't block the build and don't affect runtime behavior; cleaning them up is tracked informally as tech debt.
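As a reference for the `unsafe_op_in_unsafe_fn` cleanup, the pattern the lint asks for looks like this. The function is a hypothetical example, not code from `audio_wasapi.rs`; it only shows where the explicit block goes.

```rust
/// Sums `len` samples starting at `ptr`.
///
/// # Safety
/// `ptr` must be non-null, aligned, and valid for reads of `len` `i16` values.
#[warn(unsafe_op_in_unsafe_fn)]
pub unsafe fn sum_samples(ptr: *const i16, len: usize) -> i32 {
    // Without this explicit block, the call below triggers the lint even
    // though the enclosing function is already `unsafe fn`.
    // SAFETY: the caller upholds the contract documented above.
    let samples = unsafe { std::slice::from_raw_parts(ptr, len) };
    samples.iter().map(|&s| i32::from(s)).sum()
}
```

Each unsafe operation gets its own block and `SAFETY` comment, so the body of an `unsafe fn` is checked like safe code everywhere else.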
Files of interest
| Path | Purpose |
|---|---|
| `desktop/src/` | Shared frontend (TypeScript + HTML + CSS) |
| `desktop/src-tauri/src/lib.rs` | Tauri commands shared with Android |
| `desktop/src-tauri/src/engine.rs` | CallEngine wrapper |
| `desktop/src-tauri/src/history.rs` | Persistent call history store with dedup |
| `crates/wzp-client/src/audio_io.rs` | CPAL capture + playback (baseline) |
| `crates/wzp-client/src/audio_vpio.rs` | macOS VoiceProcessingIO capture (AEC) |
| `crates/wzp-client/src/audio_wasapi.rs` | Windows WASAPI communications capture (AEC) |
| `vendor/audiopus_sys/opus/CMakeLists.txt` | Patched libopus for clang-cl SIMD |
| `scripts/Dockerfile.windows-builder` | Windows cross-compile Docker image |
| `scripts/build-windows-docker.sh` | Remote Docker build pipeline |
| `scripts/build-windows-cloud.sh` | Hetzner VPS alternative pipeline |
| `scripts/build-linux-docker.sh` | Linux x86_64 relay/CLI build pipeline |