Phase 1 of the big refactor. Escape the Tauri Android
__init_tcb+4 symbol leak (rust-lang/rust#104707) by making
wzp-desktop's Android .so pure Rust — ZERO cc::Build, no cpp/ files,
no C++ in the rustc link step. All future C++ (Oboe audio bridge)
lives in a new standalone cdylib crate `wzp-native` which is built
with cargo-ndk (the same path the legacy wzp-android crate uses
successfully on the same phone + same NDK), copied into Tauri's
gen/android/app/src/main/jniLibs at build time, and dlopened by
wzp-desktop at runtime via libloading.
Changes in this commit:
- NEW crate crates/wzp-native/ with crate-type = ["cdylib"] only
(no staticlib, no rlib — rust#104707 shows mixing staticlib with
cdylib leaks non-exported symbols, which is the original bug
source). Phase 1 scaffold has TWO extern "C" functions:
wzp_native_version() -> i32 (returns 42)
wzp_native_hello(buf, cap) -> usize (writes a string)
So we can verify dlopen + dlsym + cross-.so FFI end-to-end
before adding any real C++.
- desktop/src-tauri/cpp/ directory DELETED (7 files gone).
- desktop/src-tauri/build.rs reduced to just the git hash capture
+ tauri_build::build(). No more cc::Build of any kind.
- desktop/src-tauri/Cargo.toml: drop cc from build-dependencies,
add libloading = "0.8" as an Android-only runtime dep.
- desktop/src-tauri/src/lib.rs Builder::setup() now (on Android only)
dlopens libwzp_native.so, calls wzp_native_version() and
wzp_native_hello(), and logs the result:
"wzp-native dlopen OK: version=42 msg=\"hello from wzp-native\""
If this log appears in logcat when the app launches and the home
screen still renders, the split-cdylib pipeline is validated and
Phase 2 (port the Oboe bridge into wzp-native) can proceed.
- scripts/build-tauri-android.sh: insert a `cargo ndk -t arm64-v8a
build --release -p wzp-native` step before `cargo tauri android
build`, with `-o desktop/src-tauri/gen/android/app/src/main/jniLibs`
so the resulting libwzp_native.so lands in the place gradle will
package into the final APK.
- Workspace Cargo.toml: add crates/wzp-native to [workspace] members.
Phase 2 (separate commit, only if Phase 1 works):
- Copy cpp/oboe_bridge.{h,cpp} + getauxval_fix.c from the legacy
wzp-android crate into crates/wzp-native/cpp/.
- Add cc = "1" as a build-dependency on wzp-native (safe: it's a
single-cdylib crate with no staticlib, so no symbol leak).
- Add build.rs that compiles the Oboe C++ and the wzp-native Rust
FFI exposes the audio start/stop/read/write functions.
- wzp-desktop::engine.rs dlopens wzp-native at CallEngine::start,
uses its audio functions instead of CPAL on Android.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Incremental Step E (commit 4250f1b) proved that merely compiling the
Oboe C++ bridge into libwzp_desktop_lib.so — with NO Rust-side FFI
bindings, no function calls — resurrects the __init_tcb+4 / pthread_
create SIGSEGV at WryActivity.onCreate. Bisection:
build #17 (baseline) ✓
build #18 (Step A, hello.c) ✓
build #19 (Step B, wzp-client dep) ✓
build #21 (Step C, engine mod compiled) ✓
build #22 (Step D, getauxval_fix.c) ✓
build #23 (Step E, Oboe C++ compiled) ✗ — __init_tcb+4 crash
Root cause: tauri-cli hard-codes `aarch64-linux-android24-clang` as the
Rust linker. Without any C++ code in the .so, libstd's pthread_create
reference gets resolved against the dynamic libc.so. The moment we add
a C++ static library that links against libc++_shared, the link-time
resolution pulls in the API-24 libc.a static pthread_create stub — and
Rust's libstd then also calls that stub instead of libc.so's real one.
The stub calls __init_tcb which SIGSEGVs because bionic's TCB state
only exists for static-libc main executables, not .so's loaded via
dlopen. API-26 NDK has proper dynamic bindings that resolve correctly.
Option 3 fix: at image build time, replace every NDK
aarch64-linux-android24-clang (and armv7/x86_64/i686, clang/clang++)
binary with a one-line shell script that exec()s the corresponding
android26-clang. Since tauri-cli invokes the linker via absolute path,
PATH and env var overrides fail — but replacing the binary on disk
inside the image is guaranteed to take effect. The legacy wzp-android
crate doesn't need this because cargo-ndk respects .cargo/config.toml
where a crate-level linker override is set.
Only changing the Dockerfile here. Next: rebuild the image no-cache,
retry Step E, and if the baseline holds, proceed to Steps F/G.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Spent 10+ builds chasing a __init_tcb+4 / pthread_create SIGSEGV after
adding the oboe audio backend. Every "fix" made things worse. Reverting
all Android-specific files to the state at 35642d1 (build #6), which
was the last commit where the Tauri Android app actually launched,
rendered the home screen, and successfully registered on a relay.
Reverted files (all back to their 35642d1 content):
- desktop/src-tauri/Cargo.toml (no build-dep cc, no tracing-android)
- desktop/src-tauri/build.rs (git hash only, no Oboe / cc build)
- desktop/src-tauri/src/lib.rs (engine cfg-gated on non-android)
- desktop/src-tauri/src/main.rs (two-line desktop entry)
- desktop/src-tauri/src/engine.rs (desktop-only audio setup)
- scripts/Dockerfile.android-builder (no android24→26 clang shim)
- scripts/build-tauri-android.sh (no linker env vars / manifest patch)
Deleted (were added between b314138 and e2e023d):
- desktop/src-tauri/cpp/getauxval_fix.c
- desktop/src-tauri/cpp/oboe_bridge.{h,cpp}
- desktop/src-tauri/cpp/oboe_stub.cpp
- desktop/src-tauri/src/oboe_audio.rs
Next: rebuild image on remote (to drop the baked-in clang shim), build
an APK, install on Pixel 6, verify the UI renders the same way build #6
did. From there we add features back ONE at a time so we can actually
bisect which one triggers the tao::ndk_glue crash. User's rule:
"if you want to change stack, change incrementally, so we can debug".
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Build #13's PATH wrapper trick failed because tauri-cli invokes the linker
with an absolute path (/opt/android-sdk/ndk/.../bin/aarch64-linux-android24-
clang), which bypasses \$PATH entirely. The pthread_shim logs confirmed the
broken API-24 stubs were still being linked:
WZP_pthread_shim: dlsym(RTLD_DEFAULT, pthread_create) returned NULL:
libdl.a is a stub --- use libdl.so instead
Move the fix up a level — into the Dockerfile itself. On image build, for
each of the four android ABIs × {clang, clang++}, rename
`${abi}24-${suffix}` to `${abi}24-${suffix}.orig` and replace it with a
shell wrapper that exec()s `${abi}26-${suffix}`. Any call to the API-24
wrapper — via PATH, absolute path, or otherwise — now transparently runs
the API-26 wrapper, which uses the real libc.so/libdl.so bindings.
The old bash-c /tmp/wrappers workaround in build-tauri-android.sh is
removed now that the image handles it at the right layer.
Also add `--shell` to build-tauri-android.sh: opens an interactive docker
container on the remote with the same mounts/env as the build, so I can
iterate on cargo tauri android build / manually patch files / etc.
without the full git push → ssh → rebuild → install loop.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Build #12's instrumented pthread_shim gave us the definitive diagnosis:
WZP_pthread_shim: dlsym(RTLD_DEFAULT, pthread_create) returned NULL:
libdl.a is a stub --- use libdl.so instead
Tauri-cli invokes `aarch64-linux-android24-clang` as the linker and the
API-24 NDK sysroot ships *stub* libdl.a / libc.a: they compile fine but
every symbol crashes if called, because they're meant to coexist with a
separate dynamic .so that the dynamic linker provides at runtime. Rust's
pre-built libstd.rlib has static calls into those stubs baked in, so no
matter what we do at link time the broken code lands in the .so.
Env-var overrides of CARGO_TARGET_AARCH64_LINUX_ANDROID_LINKER don't
stick — tauri-cli resets them before invoking cargo. So instead of
fighting the env, we put a wrapper on $PATH, literally named
`aarch64-linux-android24-clang`, that exec()s the android26 version.
When tauri-cli looks up android24-clang via PATH, it gets our wrapper,
our wrapper runs android26-clang, and suddenly the whole build is using
the API-26 NDK sysroot with real dynamic bindings to libc.so / libdl.so.
Wrappers are installed for all four ABIs (aarch64, armv7, x86_64, i686)
× both suffixes (clang, clang++) directly inside the docker bash -c
preamble before any cargo invocation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The previous commit bumped minSdk from 24 to 26 in build.gradle.kts
hoping tauri-cli would pick it up and use the android26-clang linker,
but the crash recurred at exactly the same frame (__init_tcb via
pthread_create via std::thread::spawn). That means tauri-cli is
ignoring the gradle minSdk value and sticking with its hardcoded
aarch64-linux-android24-clang.
The android24 linker resolves __init_tcb against the broken static
stub in libc.a (API 24 does NOT export __init_tcb as a dynamic symbol
from libc.so — it only exists in the static archive, and the stub
expects the TCB to be initialised by a running static init path,
which never happens in a dlopen-loaded .so).
Override the linker env vars directly in the docker run invocation
for all four ABIs. These take precedence over anything tauri-cli or
.cargo/config.toml might set.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Build #7 crashed at launch on the Pixel 6 with SIGSEGV in
__init_tcb / pthread_create called from tao::ndk_glue::create in
WryActivity.onCreate:
#00 __init_tcb(bionic_tcb*, pthread_internal_t*)+4
#01 pthread_create+360
#02 std::sys::thread::unix::Thread::new
#04 tao::platform_impl::platform::ndk_glue::create
#05 Java_com_wzp_desktop_WryActivity_create
Tauri scaffolds build.gradle.kts with `minSdk = 24`, which makes the
tauri-cli invoke `aarch64-linux-android24-clang` as the Rust linker. That
linker transitively pulls broken static stubs from libc.a for getauxval,
__init_tcb and pthread_create — these stubs only work in statically-
linked executables because they read bionic state (__libc_auxv, TCB) that
only the libc init path sets up. In a .so loaded via dlopen they SIGSEGV
the moment anything spawns a thread.
API 26+ has the real runtime symbols and the NDK-26 linker resolves them
against libc.so instead of the static fallback. This is also the minimum
Oboe supports. Patch the generated build.gradle.kts post-init to swap
`minSdk = 24` for `minSdk = 26` — the legacy wzp-android crate solved
the same issue with a .cargo/config.toml linker override plus a
getauxval_fix.c shim.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This is the big one — the Tauri Android app now has a real audio stack
capable of full-duplex VoIP, reusing the proven C++ Oboe bridge from the
legacy wzp-android crate.
Architecture:
- desktop/src-tauri/cpp/ — copies of oboe_bridge.{h,cpp}, oboe_stub.cpp,
and getauxval_fix.c from crates/wzp-android/cpp/. build.rs clones
google/oboe@1.8.1 into OUT_DIR and compiles the bridge + all Oboe
sources as "oboe_bridge" static lib, linking against shared libc++
(static would pull broken libc stubs that SIGSEGV in .so libraries).
- src/oboe_audio.rs — Rust side: an SPSC ring buffer matching the C++
bridge's AtomicI32 layout, plus OboeHandle::start() which returns
(capture_ring, playout_ring, owning_handle). The ring exposes the same
(available / read / write) methods as wzp_client::audio_ring::AudioRing
so CallEngine treats both backends interchangeably.
- src/engine.rs — compiled on every platform now. A cfg-switched type
alias picks wzp_client::audio_ring::AudioRing on desktop and
crate::oboe_audio::AudioRing on Android. The audio setup block has
three branches: VPIO/CPAL on macOS, CPAL on Linux/Windows, Oboe on
Android. Send/recv tasks are identical across platforms.
- src/lib.rs — removes all the "step 3 not done" Android stubs. The
engine module is no longer cfg-gated; connect / disconnect / toggle_mic
/ toggle_speaker / get_status are single implementations used by both
desktop and Android. Identity path resolves via app.path().app_data_dir()
from the Tauri setup() callback (already wired in step 1).
Runtime mic permission:
- scripts/build-tauri-android.sh now injects RECORD_AUDIO + MODIFY_AUDIO_
SETTINGS into gen/android/app/src/main/AndroidManifest.xml after init,
and overwrites MainActivity.kt with a version that calls
ActivityCompat.requestPermissions in onCreate. This is idempotent:
every build re-applies the patches so tauri re-init can't regress them.
Cargo.toml:
- cc is now an unconditional build-dep (build.rs runs on the host, so
target-gating build-deps doesn't work).
- wzp-client is now a dep on every platform. On Android it gets default
features only (no "audio"/"vpio") so CPAL isn't dragged in — oboe_audio
provides the capture/playout rings instead.
- tracing-android is added on Android so tracing events flow into logcat.
build.rs also gained embedded git hash (WZP_GIT_HASH) capture, which is
shown under the fingerprint on the home screen — already committed in
7639aaf, reinstated here alongside the Oboe build logic.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two related Android-only papercuts found while testing build #4 on a Pixel 6:
1. Frontend was crashing in the WebView with:
Tauri/Console: Uncaught (in promise) event.listen not allowed.
Permissions associated with this command: core:event:allow-listen,
core:event:default
The desktop build worked fine because Tauri's default capability set
covers the desktop side. On Android (and iOS) Tauri 2.x is much stricter
about ACL — without an explicit capabilities/default.json that lists
"android" in its platforms, the WebView gets zero permissions. Add a
default capability granting core:default + the event listener perms
across all five platforms (linux/macOS/windows/android/iOS).
2. Every fresh docker run produced a new ~/.android/debug.keystore, so
`adb install -r` of a freshly built APK over an already-installed one
failed with INSTALL_FAILED_UPDATE_INCOMPATIBLE. Mount a persistent host
volume at /home/builder/.android in build-tauri-android.sh so the same
debug keystore is reused across builds and `install -r` keeps working.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Dockerfile.android-builder: install Android API 36 platform + build-tools
35.0.0 alongside the existing API 34 set. Tauri 2.x mobile defaults to
compileSdk 36 / build-tools 35; without these the gradle build fails with
"SDK directory is not writable" because the read-only /opt/android-sdk
volume can't grow at build time. Adding Node.js 20, all four Rust android
targets, and tauri-cli 2.x was already in place.
scripts/build-tauri-android.sh: new build wrapper for the desktop/ Tauri
project (parallel to scripts/build-and-notify.sh which targets the legacy
android/ Kotlin app). Pulls the branch on remote, runs cargo tauri android
build inside the docker image, and sends three ntfy.sh/wzp notifications
that all carry the short git hash:
- STARTED [hash] — <commit subject>
- OK [hash] (size) — <rustypaste apk url>
- FAILED [hash] (line N) — <rustypaste log url>
On failure the full /tmp/wzp-tauri-build.log is uploaded to rustypaste so
the URL in the failure ntfy is directly downloadable, same place as the
APK.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
git pull fails when refs are stale from concurrent builds. Switch to
git gc + git fetch + git reset --hard origin/branch for robustness.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Cargo.lock changes from Docker builds caused pull conflicts. Now uses
reset --hard + clean -fd to guarantee clean state before pulling.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Dedup key now includes source peer fingerprint hash, preventing
packets from different senders with same room+seq from being dropped
as duplicates (was silently killing all multi-hop audio)
- Build scripts default to --pull (use --no-pull to skip)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The hash was read inside Docker (/build/source) where .git doesn't
exist. Now reads from $BASE_DIR/data/source before Docker runs.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ntfy messages now show: "WZP Linux [abc1234] ready!" and
"WZP Android [abc1234] done! APK: url" so you can verify which
commit was built without checking relay version remotely.
Also added PRD-mtu-discovery.md for QUIC Path MTU Discovery.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
wzp-relay --version prints "wzp-relay <short-git-hash>".
Build hash also logged on startup: version=abc1234.
Enables verifying deployed relay matches expected build.
Also fixed federation-test.sh: use kill -INT (not SIGTERM) so
clients save recordings before exit. Added save delay.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
cargo-ndk doesn't always copy libc++_shared.so into jniLibs. The
build script now finds it in the NDK and copies it manually if
missing, preventing the build check from failing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Both Android and Linux build scripts now send ntfy notification
when build fails, not just on success.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Same Docker image as Android build. Separate cache dirs (cache-linux/)
to avoid conflicts when running both builds simultaneously.
Builds: wzp-relay, wzp-client, wzp-client-audio, wzp-web, wzp-bench
Uploads tar.gz to rustypaste, notifies ntfy.sh/wzp.
Usage:
./scripts/build-linux-docker.sh --pull # fire and forget
./scripts/build-linux-docker.sh --pull --install # wait + download
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Relay identity:
- Stored in ~/.wzp/relay-identity (hex-encoded 32-byte seed)
- Generated on first run, reused on restart
- Fingerprint stays consistent across relay restarts
Linux build script (scripts/build-linux-notify.sh):
- Fire and forget: Hetzner VM → build all binaries → upload to rustypaste → ntfy notify → destroy VM
- Builds: wzp-relay, wzp-client, wzp-client-audio, wzp-web, wzp-bench
- Packages as tar.gz, uploads to rustypaste
- --keep flag to preserve VM
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- AudioPipeline.debugRecording defaults to false (was true)
- SettingsRepository: persist debug_recording preference
- CallViewModel: debugRecording StateFlow + setter, wired to AudioPipeline
- Only records PCM + RMS when explicitly enabled in settings
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The jni crate emits VERBOSE logs for every JNI method lookup (~10 lines
per call, 100+ calls/sec on audio threads). This floods logcat, consumes
CPU, and triggers system kills. Filter to only show INFO+ for our crates
and WARN+ for everything else.
Also fix build script: clean full Rust target to ensure libc++_shared.so
is always copied by cargo-ndk.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Filter hcloud by SERVER_NAME to avoid touching other servers
- Use rsync instead of tar (handles submodules, no macOS xattr spam)
- Default server type cx33
- Release APK failure is non-fatal (debug APK still produced)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
cmake 3.28 works when ANDROID_NDK is set (not just ANDROID_NDK_HOME).
Relaxed version check from <=3.26 to <=3.30.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Documents WHY each version is pinned:
- cmake 3.25: 3.27+ rewrote Android-Determine.cmake with bugs
- NDK 26.1: NDK 27 scudo crashes on MTE devices (Nothing A059)
- JDK 17: Gradle 8.5 + AGP 8.2.0 official support
- ANDROID_NDK: cmake checks this, not ANDROID_NDK_HOME
Idempotent, works from clone or existing tree.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replaces the single-shot ephemeral VM approach:
- --prepare: create VM, install deps (Rust, cmake, etc), upload source
- --build: build on VM with full output (iterate on errors)
- --transfer: download binaries to target/linux-x86_64/
- --destroy: delete VM when done
- --upload: re-upload source to existing VM
- --all: prepare + build + transfer (VM persists)
VM reuse: --prepare detects existing wzp-builder VM and just re-uploads.
All steps get VM IP from hcloud server list (last created).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Web playback rewritten from push-scheduling to pull-based ring buffer:
- ScriptProcessorNode pulls frames from buffer every ~21ms
- Buffer capped at 10 frames (~200ms) — drops oldest on overflow
- Latency permanently bounded, no drift over time
Also: install ring crypto provider for rustls TLS on Linux,
build on debian-12 to match mequ glibc.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- cpal is now behind an 'audio' feature flag (off by default)
- --live mode requires --features audio at build time
- --send-tone and --record work on headless servers without audio libs
- Linux build script no longer installs libasound2-dev
Build for headless: cargo build --release
Build with mic/speakers: cargo build --release --features audio
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CLI modes:
- --send-tone <secs>: send 440Hz test tone (no mic needed)
- --record <file.raw>: save received audio to raw PCM file
- --help: usage info
- Combine: --send-tone 10 --record out.raw
Raw PCM format: 48kHz mono s16le
Play with: ffplay -f s16le -ar 48000 -ac 1 out.raw
Build scripts:
- scripts/build-linux.sh: Hetzner VPS build with auto-cleanup
- scripts/cleanup-builder.sh: kill stale builders
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>