feat(windows): WASAPI capture backend with OS-level AEC

Adds a direct WASAPI microphone capture path for the Windows desktop
build that opens the default communications endpoint via
IMMDeviceEnumerator -> IAudioClient2 -> SetClientProperties with
AudioCategory_Communications, turning on Windows's communications
audio processing chain (AEC, noise suppression, automatic gain
control). The communications AEC operates at the OS level and uses
the system render mix as the reference signal, so echo from our
existing CPAL playback stream is cancelled automatically with no
per-process reference plumbing.

Architecture:
- New crates/wzp-client/src/audio_wasapi.rs module (~280 lines).
  Event-driven capture loop on a dedicated thread; pushes PCM into
  the same lock-free AudioRing used by the CPAL path. Same public
  API as audio_io::AudioCapture so downstream code is unchanged.
- New `windows-aec` feature in wzp-client that pulls in the
  `windows` crate (Microsoft's official Rust COM bindings) gated to
  target_os = "windows" only. Enabling the feature on non-Windows
  targets is a no-op since both the module and the dep are
  cfg(target_os = "windows").
- lib.rs re-exports WasapiAudioCapture as AudioCapture when the
  feature is on, otherwise falls back to the CPAL AudioCapture.
  AudioPlayback is always the CPAL one — no reason to swap it.
- desktop/src-tauri/Cargo.toml Windows target enables the new
  feature: `features = ["audio", "windows-aec"]`.
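
The re-export arrangement above can be sketched roughly as follows. This is a minimal illustration, not the crate's actual code: `cpal_backend`, `wasapi_backend`, `backend_name` and `active_backend` are hypothetical names, and the sketch keys only on `target_os` so that it compiles standalone (the real crate additionally gates on the `windows-aec` feature).

```rust
// Stand-in for the CPAL capture path (hypothetical, for illustration only).
mod cpal_backend {
    pub struct AudioCapture;

    impl AudioCapture {
        pub fn start() -> Self {
            AudioCapture
        }
        // Hypothetical helper so the example can show which backend was picked.
        pub fn backend_name(&self) -> &'static str {
            "cpal"
        }
        pub fn stop(&self) {}
    }
}

// Stand-in for the WASAPI capture path; only compiled on Windows.
#[cfg(target_os = "windows")]
mod wasapi_backend {
    pub struct WasapiAudioCapture;

    impl WasapiAudioCapture {
        pub fn start() -> Self {
            WasapiAudioCapture
        }
        pub fn backend_name(&self) -> &'static str {
            "wasapi"
        }
        pub fn stop(&self) {}
    }
}

// Exactly one of these aliases is active per target, so downstream code
// only ever names `AudioCapture`.
#[cfg(not(target_os = "windows"))]
pub use cpal_backend::AudioCapture;
#[cfg(target_os = "windows")]
pub use wasapi_backend::WasapiAudioCapture as AudioCapture;

// Downstream-style code: identical on every platform.
pub fn active_backend() -> &'static str {
    let capture = AudioCapture::start();
    let name = capture.backend_name();
    capture.stop();
    name
}
```

Because both stand-ins expose the same method names, `active_backend` needs no `cfg` of its own; that is the property the alias is meant to preserve.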

Implementation notes:
- Uses the eCommunications role (not eConsole) for
  GetDefaultAudioEndpoint. This selects the user-configured
  "communications" device, the one Teams/Zoom pick up and the one
  Windows's AEC is tuned for.
- Requests 48 kHz mono i16 with AUDCLNT_STREAMFLAGS_AUTOCONVERTPCM |
  AUDCLNT_STREAMFLAGS_SRC_DEFAULT_QUALITY so the Windows audio engine
  converts to our format instead of rejecting it.
- Event-driven with SetEventHandle / WaitForSingleObject, so the
  capture thread sleeps between packets instead of busy-polling.
- 200 ms wait timeout so the capture thread rechecks `running` often
  enough for Drop to stop cleanly even if the audio engine stalls
  (e.g. device unplug).
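
A rough sketch of the timeout-bounded wait described in the last two notes, using only the standard library. Everything here is an assumption for illustration: `CaptureSketch` is a hypothetical type, and an `mpsc` channel with `recv_timeout` stands in for the WASAPI event handle and `WaitForSingleObject(event, 200)`. The point is only that a bounded wait lets the thread notice `running == false` even when no packets arrive.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{self, RecvTimeoutError, Sender};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical stand-in for the capture handle (not the real WasapiAudioCapture).
pub struct CaptureSketch {
    running: Arc<AtomicBool>,
    worker: Option<JoinHandle<usize>>,
}

impl CaptureSketch {
    /// Spawns the worker thread. The returned sender plays the role of the
    /// audio engine signalling that a packet is ready.
    pub fn start() -> (Self, Sender<Vec<i16>>) {
        let (tx, rx) = mpsc::channel::<Vec<i16>>();
        let running = Arc::new(AtomicBool::new(true));
        let flag = Arc::clone(&running);
        let worker = thread::spawn(move || {
            let mut samples = 0usize;
            while flag.load(Ordering::Acquire) {
                // Bounded wait: wake on a packet, or after 200 ms to recheck `flag`.
                match rx.recv_timeout(Duration::from_millis(200)) {
                    Ok(packet) => samples += packet.len(),
                    Err(RecvTimeoutError::Timeout) => continue,
                    Err(RecvTimeoutError::Disconnected) => break,
                }
            }
            samples
        });
        (Self { running, worker: Some(worker) }, tx)
    }

    /// Clears the flag and joins the worker; returns the sample count it saw.
    /// A second call finds no worker left and returns 0.
    pub fn stop(&mut self) -> usize {
        self.running.store(false, Ordering::Release);
        self.worker
            .take()
            .map(|handle| handle.join().unwrap_or(0))
            .unwrap_or(0)
    }
}

impl Drop for CaptureSketch {
    fn drop(&mut self) {
        // Worst case this blocks for roughly one wait timeout.
        self.stop();
    }
}
```

With this shape, dropping the handle blocks for at most about one timeout period even if the sender never fires again, which is the trade-off the 200 ms figure is balancing against wake-up overhead.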

Task #24.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Siavash Sameni
2026-04-10 14:35:36 +04:00
parent 7fecf285ea
commit 03a80a3196
5 changed files with 453 additions and 10 deletions


@@ -15,6 +15,12 @@ pub mod audio_ring;
// feature on Windows/Linux was previously silently broken.
#[cfg(all(feature = "vpio", target_os = "macos"))]
pub mod audio_vpio;
// WASAPI-direct capture with Windows's OS-level AEC (AudioCategory_Communications).
// Only compiled when `windows-aec` feature is on AND target is Windows. The
// `windows` dependency is itself gated to Windows in Cargo.toml, so enabling
// this feature on non-Windows targets is a no-op.
#[cfg(all(feature = "windows-aec", target_os = "windows"))]
pub mod audio_wasapi;
pub mod bench;
pub mod call;
pub mod drift_test;
@@ -24,7 +30,24 @@ pub mod handshake;
pub mod metrics;
pub mod sweep;
// AudioPlayback always comes from the CPAL path (`audio_io`). We do not
// need OS-level processing on the playback side because Windows's
// communications AEC, once engaged on the capture stream, uses the system
// render mix as the reference signal — it cancels echo from CPAL playback
// (and any other app's audio) without special handling.
#[cfg(feature = "audio")]
-pub use audio_io::{AudioCapture, AudioPlayback};
+pub use audio_io::AudioPlayback;
// AudioCapture: two possible backends. Windows-AEC path when compiled in,
// otherwise the plain CPAL path. The two types share the same public API
// (`start`, `ring`, `stop`, `Drop`) so downstream code is identical.
#[cfg(all(
    feature = "audio",
    any(not(feature = "windows-aec"), not(target_os = "windows"))
))]
pub use audio_io::AudioCapture;
#[cfg(all(feature = "windows-aec", target_os = "windows"))]
pub use audio_wasapi::WasapiAudioCapture as AudioCapture;
pub use call::{CallConfig, CallDecoder, CallEncoder};
pub use handshake::perform_handshake;