Commit Graph

44 Commits

Author SHA1 Message Date
Siavash Sameni
daf7bcd9ba chore(warnings): sweep the workspace — zero warnings on lib + bin targets
Addressed every rustc warning surfaced by `cargo check --workspace
--release --lib --bins` on opus-DRED-v2. The changes fall into five
categories:

## Real bugs surfaced by the audit (fix, don't silence)

- **crates/wzp-relay/src/federation.rs**: the per-peer RTT monitor
  task computed `rtt_ms` every 5 s and threw it on the floor. The
  `wzp_federation_peer_rtt_ms` gauge has been registered in
  metrics.rs the whole time but was never receiving samples, leaving
  the Grafana panel blank. Wired it up: the task now calls
  `fm_rtt.metrics.federation_peer_rtt_ms.with_label_values(&[&label_rtt]).set(rtt_ms)`
  on every sample. Fixes three warnings (`rtt_ms`, `fm_rtt`,
  `label_rtt` were all captured for this task and all dead).

## Dead code removal

- **crates/wzp-relay/src/federation.rs**: removed `local_delivery_seq:
  AtomicU16` field and its initializer. It was described in comments
  as "per-room seq counter for federation media delivered to local
  clients" but was declared, initialized to 0, and never read or
  written anywhere else. Genuine half-wired feature; deletable with
  zero behavior change.
- **crates/wzp-relay/src/room.rs**: removed the never-read `let
  recv_start = Instant::now()` at the top of a recv loop.
  Separate variable `last_recv_instant` already measures the actual
  gap that's used for the `max_recv_gap_ms` stat.
- **crates/wzp-client/src/cli.rs**: removed `let my_fp = fp.clone()`
  from the signal loop setup. Cloned but never used in any match arm.

## Stub-intent warnings (underscore + explanatory comment)

- **crates/wzp-relay/src/handshake.rs**: `choose_profile` hardcodes
  `QualityProfile::GOOD` and ignores its `supported` parameter.
  Comment already documented "Cap at GOOD (24k) for now — studio
  tiers not yet tested for federation reliability". Renamed to
  `_supported`, expanded the comment to explicitly note the future
  plan (pick highest supported ≤ relay ceiling).
- **crates/wzp-relay/src/federation.rs**: `forward_to_peers` takes
  `room_name: &str` but only uses `room_hash`. The caller
  (handle_datagram) passes the name for caller-site symmetry with
  other helpers; kept the param shape and underscored the binding
  with a comment noting it's reserved for future per-name logging.
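
The future plan noted for `choose_profile` (pick the highest supported profile at or below the relay ceiling) could be sketched roughly as follows. The `Profile` enum, its ordering, and the `ceiling` parameter are hypothetical stand-ins for illustration, not the relay's actual `QualityProfile` API.

```rust
/// Hypothetical stand-in for the relay's quality tiers, declared from
/// lowest to highest so the derived `Ord` compares quality.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Profile {
    Catastrophic,
    Degraded,
    Good,
    Studio32k,
    Studio48k,
    Studio64k,
}

/// Pick the highest profile the client supports that does not exceed
/// the relay ceiling. Returns `None` when nothing qualifies, leaving
/// the fallback policy to the caller.
pub fn choose_profile(supported: &[Profile], ceiling: Profile) -> Option<Profile> {
    supported.iter().copied().filter(|p| *p <= ceiling).max()
}
```

With a ceiling of `Profile::Good`, a client advertising studio, good, and degraded tiers would get `Good`, which matches today's hardcoded result for that case.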

## Cosmetic fixes

- **crates/wzp-relay/src/event_log.rs**: dropped `use std::sync::Arc`
  (unused).
- **crates/wzp-relay/src/signal_hub.rs**: trimmed `use tracing::{info,
  warn}` to `use tracing::info`. Also removed unnecessary `mut` on
  `hub` binding in the `register_unregister` test.
- **crates/wzp-relay/src/room.rs**: trimmed `use tracing::{debug,
  error, info, trace, warn}` to `{error, info, warn}`. Also removed
  unnecessary `mut` on `mgr` binding in the `room_join_leave` test.
- **crates/wzp-relay/src/main.rs**: removed unnecessary `mut` on the
  `config` destructured binding from `parse_args()`; and dropped
  `ref caller_alias` from the `DirectCallOffer` match pattern since
  the relay just forwards the full `msg` (caller_alias is preserved
  end-to-end, we don't need to read it on the relay).
- **crates/wzp-crypto/tests/featherchat_compat.rs**: dropped
  `CallSignalType` from a `use wzp_client::featherchat::{...}`
  (unused in the test body). Note: this test file has pre-existing
  compile errors from SignalMessage schema drift unrelated to this
  sweep; that's tracked separately.

## Crate-level annotation

- **crates/wzp-android/src/lib.rs**: added
  `#![allow(dead_code, unused_imports, unused_variables, unused_mut)]`
  with a doc block explaining the crate is dead code since the Tauri
  mobile rewrite. The legacy Kotlin+JNI Android app that consumed
  this crate was replaced by desktop/src-tauri (live Android recv
  path) + crates/wzp-native (Oboe bridge). Rather than piecemeal
  cleanup of a crate that shouldn't be maintained, the whole-crate
  allow keeps CI clean until someone removes the crate entirely. Kills
  all 6 wzp-android warnings (4 unused imports/vars, 1 unused `mut`
  on a JNI env param, 1 dead `command_rx` field) in one line.

## Not touched

- **deps/featherchat/warzone/crates/warzone-protocol/src/x3dh.rs**:
  3 unused-variable warnings in `alice_spk_secret`, `alice_bundle`,
  `bob_bundle_bytes`. This is a vendored third-party submodule;
  upstream's problem, not ours. Would need to be reported to
  featherchat upstream if we care.

## Verification

- `cargo check --workspace --release --lib --bins` → 0 warnings, 0 errors
- `cargo check --workspace --release --all-targets` → only the 3
  featherchat submodule warnings remain, plus the pre-existing 3
  broken integration tests (SignalMessage schema drift from Phase 2,
  tracked separately and explicitly out of scope).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 08:28:26 +04:00
Siavash Sameni
505a834c5b feat(codec): Phase 3c — Android engine.rs DRED reconstruction on packet loss
Phase 3c mirrors Phase 3b on the Android receive path. With Phase 0-3b
landed on desktop + Android encoder, this commit completes codec-layer
loss recovery on the Android decoder side.

Architectural difference vs desktop: engine.rs has NO jitter buffer.
The recv task reads packets directly from the transport via
recv_media().await and writes decoded audio straight into the playout
ring. There is no PlayoutResult::Missing equivalent. Gap detection
therefore has to be done via sequence-number tracking — when a packet
arrives with seq > expected_seq, the frames in between are missing and
we attempt to reconstruct them via DRED before decoding the newly-
arrived packet.

Implementation:

  Imports & types:
    - Added wzp_codec::AdaptiveDecoder, wzp_codec::dred_ffi::{
      DredDecoderHandle, DredState} imports.
    - Changed the `decoder` local from Box<dyn AudioDecoder> (via
      wzp_codec::create_decoder) to concrete AdaptiveDecoder::new(profile).
      Same reasoning as Phase 3b: reconstruct_from_dred is an inherent
      method, not a trait method, so we need the concrete type.

  Recv task state (all task-local, no new struct fields):
    - dred_decoder: DredDecoderHandle
    - dred_parse_scratch: DredState (reused, overwritten per parse)
    - last_good_dred: DredState (cached most-recent valid state)
    - last_good_dred_seq: Option<u16>
    - expected_seq: Option<u16> (for gap detection)
    - dred_reconstructions: u64 (telemetry)
    - classical_plc_invocations: u64 (telemetry)

  Recv loop body (Opus source packets only):
    1. Parse DRED from the new packet first so last_good_dred reflects
       the freshest state available for gap recovery.
    2. Detect a gap: gap = pkt.seq.wrapping_sub(expected_seq). Cap at
       MAX_GAP_FRAMES = 16 (320 ms) to avoid huge wraparound scenarios.
    3. For each missing seq in the gap:
         offset = (last_good_dred_seq - missing_seq) * frame_samples
         if 0 < offset <= last_good_dred.samples_available():
             reconstruct_from_dred + write to playout ring
             bump dred_reconstructions
         else:
             decoder.decode_lost (classical PLC) + write + bump plc counter
    4. Decode the current packet normally and write to playout ring
       (unchanged from Phase 2).
    5. Update expected_seq = pkt.seq.wrapping_add(1).
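
Step 3 above (DRED versus classical PLC for each missing frame) can be sketched as a pure function. This is a minimal sketch under stated assumptions: `Recovery` and `plan_recovery` are hypothetical names, and `samples_available` is passed in as a plain number rather than read from a real `DredState`.

```rust
/// How a single missing frame should be filled in.
#[derive(Debug, PartialEq, Eq)]
pub enum Recovery {
    /// Reconstruct from the cached DRED state at this sample offset.
    Dred { offset_samples: usize },
    /// Fall back to classical packet-loss concealment.
    Plc,
}

/// Decide how to recover `missing_seq`, given the sequence number of the
/// packet that carried the most recent valid DRED state.
pub fn plan_recovery(
    last_good_dred_seq: Option<u16>,
    missing_seq: u16,
    frame_samples: usize,
    samples_available: usize,
) -> Recovery {
    let dred_seq = match last_good_dred_seq {
        Some(seq) => seq,
        None => return Recovery::Plc, // no DRED state cached yet
    };
    // Wrapping subtraction keeps the distance correct across the u16
    // rollover; a missing frame *after* the DRED packet yields a huge
    // value and falls through to PLC.
    let frames_back = dred_seq.wrapping_sub(missing_seq) as usize;
    let offset_samples = frames_back * frame_samples;
    if offset_samples > 0 && offset_samples <= samples_available {
        Recovery::Dred { offset_samples }
    } else {
        Recovery::Plc
    }
}
```

For example, with 960-sample frames and 4800 samples of DRED history, a frame two packets behind the DRED carrier is reconstructable, while one seven packets behind is not.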

  Profile-switch handling: when the incoming codec changes (triggering
  decoder.set_profile), reset last_good_dred_seq and expected_seq to
  None. The cached DRED state is tied to the old profile's frame rate
  and would produce wrong offsets after the switch; starting fresh is
  correct.

  Decode-error fallback: the existing `Err(e) => decode_lost` branch
  now also increments classical_plc_invocations so the counter
  accurately reflects all PLC invocations (gap-detected AND decode-
  error-triggered).

Telemetry (CallStats additions):
  - stats.dred_reconstructions: u64
  - stats.classical_plc_invocations: u64
  Both updated on every packet arrival in the existing stats.lock()
  block alongside frames_decoded/fec_recovered, so the Android UI and
  JNI bridge already have these values without any further plumbing.
  The periodic recv stats log now includes both counters.

Ordering note: DRED gap reconstruction happens BEFORE decoding the new
packet's audio because the playout ring is FIFO. Gap samples must be
written before the new packet's samples so temporal order is preserved.
Out-of-order late arrivals (seq < expected_seq) are naturally dropped
as stale by the gap detection (gap would be a large wraparound value
exceeding MAX_GAP_FRAMES).
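
The sequence-number classification described above can be illustrated in isolation. The sketch below assumes hypothetical names (`Arrival`, `classify`) and treats any distance above `MAX_GAP_FRAMES` as out of window; what the real recv loop does with such packets beyond skipping gap recovery is not modeled here.

```rust
/// 16 frames x 20 ms = 320 ms of recoverable gap.
pub const MAX_GAP_FRAMES: u16 = 16;

#[derive(Debug, PartialEq, Eq)]
pub enum Arrival {
    /// First packet of the stream, or exactly the expected sequence number.
    InOrder,
    /// This many frames are missing before the packet that just arrived.
    Gap(u16),
    /// Late (seq < expected, seen as a large wrapped distance) or too far ahead.
    OutOfWindow,
}

pub fn classify(expected_seq: Option<u16>, seq: u16) -> Arrival {
    let expected = match expected_seq {
        Some(e) => e,
        None => return Arrival::InOrder, // nothing to compare against yet
    };
    match seq.wrapping_sub(expected) {
        0 => Arrival::InOrder,
        gap if gap <= MAX_GAP_FRAMES => Arrival::Gap(gap),
        _ => Arrival::OutOfWindow,
    }
}
```

A packet with seq 99 arriving when 100 is expected wraps to a distance of 65535, so it lands in `OutOfWindow` without any special-case comparison.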

Verification:
- cargo check --workspace: zero errors
- cargo test -p wzp-codec --lib: 68 passing (unchanged from Phase 3b)
- cargo test -p wzp-client --lib: 35 passing (unchanged from Phase 3b)
- cargo check -p wzp-android --lib: zero errors
- cargo test -p wzp-android cannot run on macOS host (pre-existing
  -llog linker dep, unrelated). Real end-to-end verification happens
  via the Android APK build on the remote Docker builder
  (scripts/build-and-notify.sh).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 20:03:31 +04:00
Siavash Sameni
6db5c25b54 feat(codec): Phase 2 — remove RaptorQ from Opus tiers, Codec2 unchanged
Phase 2 of the DRED integration (docs/PRD-dred-integration.md). With
Phase 1 having enabled DRED on every Opus profile, the app-level RaptorQ
layer is now redundant overhead on those tiers: +20% bitrate, +40–100 ms
receive-side latency (block wait), +CPU for stats we never used. This
phase removes RaptorQ from the Opus encode and decode paths on both the
desktop (wzp-client/call.rs) and Android (wzp-android/engine.rs) sides.
Codec2 tiers keep RaptorQ with their current ratios unchanged — DRED is
libopus-only and Codec2 has no neural equivalent.

Encoder changes (the real bandwidth / CPU win):
- CallEncoder::encode_frame and engine.rs encode loop now gate the
  RaptorQ path on !codec.is_opus():
    - Opus source packets emit fec_block=0, fec_symbol=0,
      fec_ratio_encoded=0 in the MediaHeader
    - fec_enc.add_source_symbol is skipped on Opus
    - generate_repair + repair packet emission is skipped on Opus
    - block_id and frame_in_block counters stay frozen at 0 for Opus
- Codec2 path is byte-for-byte identical to pre-Phase-2 behavior.
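
The `!codec.is_opus()` gate on the header fields could look roughly like this. The `CodecId` variants, the `FecFields` struct, and the `u8` field widths are assumptions made for the sketch; only the field names and the zeroing rule come from the description above.

```rust
/// Hypothetical subset of codec identifiers.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CodecId {
    Opus24k,
    Opus6k,
    Codec2,
}

impl CodecId {
    pub fn is_opus(self) -> bool {
        matches!(self, CodecId::Opus24k | CodecId::Opus6k)
    }
}

/// The FEC-related header fields only (widths assumed).
#[derive(Debug, Default, PartialEq, Eq)]
pub struct FecFields {
    pub fec_block: u8,
    pub fec_symbol: u8,
    pub fec_ratio_encoded: u8,
}

/// Opus packets always carry zeroed FEC fields; Codec2 packets keep
/// whatever the RaptorQ block state dictates.
pub fn fec_fields_for(
    codec: CodecId,
    block_id: u8,
    frame_in_block: u8,
    ratio_encoded: u8,
) -> FecFields {
    if codec.is_opus() {
        FecFields::default()
    } else {
        FecFields {
            fec_block: block_id,
            fec_symbol: frame_in_block,
            fec_ratio_encoded: ratio_encoded,
        }
    }
}
```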

Decoder changes (mostly cleanup, since both live decoder paths were
already reading audio directly from source packets and only using the
RaptorQ decoder output for stats):
- CallDecoder::ingest skips fec_dec.add_symbol on Opus packets. Source
  packets still flow to the jitter buffer; Opus repair packets from old
  senders are dropped cleanly (repair packets never hit the jitter
  buffer either).
- engine.rs recv loop skips fec_dec.add_symbol, fec_dec.try_decode, and
  fec_dec.expire_before on Opus packets. The `fec_recovered` stat
  counter becomes Codec2-only (a separate DRED reconstruction counter
  lands in Phase 4).

Wire-format backward compat verified at pre-flight:
- Old receiver + new sender: engine.rs pipeline.rs path gates on
  non-zero fec_block/fec_symbol which now never fire for Opus, so the
  RaptorQ decoder simply isn't fed. Audio flows normally. Desktop
  CallDecoder's old path accumulated packets into the stale-eviction
  HashMap, which cleans up after 2s — harmless.
- New receiver + old sender: new receiver skips RaptorQ on Opus so
  old-sender repair packets are ignored entirely (no crash, no double-
  decode). Loses the (previously vestigial) RaptorQ recovery benefit,
  which was never actually active in the audio path. Source packets
  still decode normally.
- No wire format version bump required. MediaHeader is unchanged; we
  just zero the FEC fields on Opus packets.

Test changes:
- Removed `encoder_generates_repair_on_full_block` — asserted the old
  (pre-Phase-2) RaptorQ-on-Opus behavior and is now incorrect. Replaced
  with three tests:
    - `opus_source_packets_have_zero_fec_header_fields` — verifies
      Phase 2 invariants on Opus packets
    - `opus_encoder_never_emits_repair_packets` — runs 20 frames of
      non-silent sine wave through a GOOD-profile encoder, asserts
      exactly 20 output packets, zero repair
    - `codec2_encoder_generates_repair_on_full_block` — same shape as
      the old test but on CATASTROPHIC profile (Codec2 1200, 8
      frames/block, ratio 1.0) to verify Codec2 path still emits
      repairs as before

Verification:
- cargo check --workspace: zero errors
- cargo test -p wzp-codec --lib: 61 passing (Phase 1 baseline held)
- cargo test -p wzp-client --lib: 32 passing (+3 new Phase 2 tests,
  -1 old test removed)
- cargo check -p wzp-android --lib: zero errors (host link of
  wzp-android tests fails on -llog per pre-existing Android-only
  build.rs, unrelated to this work; integration build via
  build-and-notify.sh will validate Android end-to-end)
- Pre-existing broken integration test in
  crates/wzp-client/tests/handshake_integration.rs (SignalMessage
  schema drift) is NOT caused by this commit — baseline had the same
  3 compile errors before Phase 2. Flagged as a separate cleanup task.

Expected observable effects on a real call:
- Opus 24k outgoing bitrate drops from ~28.8 kbps (ratio 0.2 RaptorQ)
  to ~25 kbps (base 24 kbps + DRED ~1–10 kbps signal-dependent)
- Opus receive-side latency drops ~40 ms on clean network (no more
  block wait — jitter buffer emits as soon as a source packet arrives)
- Codec2 calls show no latency or bitrate change

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 20:02:42 +04:00
Siavash Sameni
5d8e743cbf feat: Android engine + Kotlin API for direct 1:1 calling
Rust engine:
- start_signaling(): persistent _signal connection, presence registration
- Signal recv loop: handles DirectCallOffer, CallRinging, CallSetup, Hangup
- New CallState variants: Registered, Ringing, IncomingCall
- Stats expose incoming_call_id, incoming_caller_fp, incoming_caller_alias, sas_code
- New EngineCommands: PlaceCall, AnswerCall, RejectCall

JNI bridge:
- nativeStartSignaling(relay, seed, token, alias)
- nativePlaceCall(targetFp)
- nativeAnswerCall(callId, mode)

Kotlin API (WzpEngine.kt):
- startSignaling(relay, seed, token, alias)
- placeCall(targetFingerprint)
- answerCall(callId, mode) — 0=Reject, 1=AcceptTrusted, 2=AcceptGeneric

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 06:02:48 +04:00
Siavash Sameni
3d76acf528 fix: multi-hop federation — hub relay forwards without local participants
Three fixes for 3-relay chain (R1→R2→R3):

1. Room lookup in handle_datagram: hub relay (R2) has no local
   participants, so active_rooms() was empty and datagrams were
   silently dropped. Now also checks global_rooms config directly,
   allowing hub relays to forward without local clients.

2. Multi-hop forwarding: removed active_rooms filter — forward to
   ALL connected peers except source. The receiving peer decides
   whether to deliver or forward further.

3. Android relay_label: native RoomMember now includes relay_label
   from RoomUpdate signal. Kotlin UI reads it for relay grouping.
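
The forwarding rule from fix 2 (every connected peer except the source) reduces to a simple filter. A minimal sketch, assuming peers are identified by string labels; `forward_targets` is a hypothetical helper, not the relay's `forward_to_peers`.

```rust
/// Return every connected peer except the one the datagram came from.
/// The receiving peer then decides whether to deliver locally or
/// forward further.
pub fn forward_targets<'a>(connected_peers: &'a [String], source_peer: &str) -> Vec<&'a str> {
    connected_peers
        .iter()
        .map(String::as_str)
        .filter(|peer| *peer != source_peer)
        .collect()
}
```

On hub relay R2 in the R1→R2→R3 chain, a datagram from R1 yields R3 as the only target even though R2 has no local participants.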

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 13:33:44 +04:00
Siavash Sameni
0abecf7fd8 feat: adaptive quality engine + codec indicator UI
Wire AdaptiveQualityController into Android engine for auto codec
switching based on network quality reports. Add color-coded TX/RX
codec badges to the in-call screen showing active codecs and Auto mode.

- Recv task: ingest QualityReports, feed to controller, signal profile
  changes via AtomicU8 to send task
- Send task: check for pending profile switch at frame boundaries,
  update encoder/FEC/frame size
- Track peer codec from incoming packet headers
- Kotlin UI: codec badges (blue=studio, green=good, amber=degraded,
  red=catastrophic) with Auto tag
- Add .taskmaster to .gitignore
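
The recv-to-send profile handoff via `AtomicU8` could be sketched like this. The sentinel value and the `PendingProfile` wrapper are assumptions for illustration; the engine's actual encoding of profiles into a byte is not shown in this log.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

/// Sentinel meaning "no switch requested" (assumed; any value outside
/// the valid profile indices would do).
pub const NO_PENDING_SWITCH: u8 = u8::MAX;

pub struct PendingProfile(AtomicU8);

impl PendingProfile {
    pub fn new() -> Self {
        Self(AtomicU8::new(NO_PENDING_SWITCH))
    }

    /// Called by the recv task when the controller picks a new profile.
    /// A later request simply overwrites an earlier, unconsumed one.
    pub fn request(&self, profile_index: u8) {
        self.0.store(profile_index, Ordering::Release);
    }

    /// Called by the send task at a frame boundary. Atomically takes the
    /// pending request, if any, and resets the slot.
    pub fn take(&self) -> Option<u8> {
        match self.0.swap(NO_PENDING_SWITCH, Ordering::AcqRel) {
            NO_PENDING_SWITCH => None,
            index => Some(index),
        }
    }
}
```

Using `swap` rather than a separate load and store means a request that arrives between the two steps cannot be lost.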

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 10:19:11 +04:00
Siavash Sameni
d06cf66538 fix: auto codec, force-ping button, relay delete button
1. Auto codec: new "Auto" position on quality slider (JNI index 7).
   When selected, the engine uses the relay's chosen_profile from
   CallAnswer instead of the local preference. Slider now has 8
   positions: Studio 64k → Auto → Codec2 1.2k.

2. Force ping: added refresh button (↻) in Manage Relays dialog
   header. Calls pingAllServers() to re-check all relays on demand.

3. Delete relay fix: the X button was inside a Surface(onClick=...)
   which swallowed the touch event. Replaced with a separate Surface
   that properly intercepts the click.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 21:22:24 +04:00
Siavash Sameni
c8bcc5c974 fix: advertise studio profiles in handshake supported_profiles
The CallOffer only advertised GOOD/DEGRADED/CATASTROPHIC. When a
client uses a studio profile, the relay's choose_profile couldn't
pick it. Now advertises all 6 profiles (studio 64k/48k/32k + good +
degraded + catastrophic) in both Android engine and shared handshake.

Also: the relay MUST be rebuilt with the new CodecId variants,
otherwise it will fail to deserialize CallOffer messages containing
studio QualityProfiles in supported_profiles.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 19:39:31 +04:00
Siavash Sameni
53f8bf8fff feat: full quality tiers + slider UI + key-change warning on Android
1. Wire protocol: add Opus 32k/48k/64k (CodecId 6/7/8) + STUDIO
   profiles with is_opus() helper. Opus enc/dec accept all Opus variants.

2. JNI bridge: expand profile_from_int to 7 levels (0-6) mapping to
   GOOD, DEGRADED, CATASTROPHIC, Codec2_3200, STUDIO_32K/48K/64K.

3. Settings UI: replace radio buttons with Material3 Slider — 7 stops
   from Studio 64k (green) to Codec2 1.2k (dark red), matching desktop.

4. Key-change warning: AlertDialog on connect when server fingerprint
   has changed. Shows old vs new fingerprint, Accept New Key or Cancel.
   Accepting saves the new fingerprint and proceeds with the call.

5. Engine recv: handle studio codec IDs in auto-switch path.
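
The 7-level mapping from item 2 might look like the sketch below. The enum is a hypothetical stand-in for `QualityProfile`, and the assignment of indices 4 to 6 to the studio tiers follows the order in which they are listed above, which is an assumption.

```rust
/// Hypothetical stand-in for the engine's quality profiles.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Profile {
    Good,
    Degraded,
    Catastrophic,
    Codec2At3200,
    Studio32k,
    Studio48k,
    Studio64k,
}

/// Map the integer passed over JNI to a profile. Unknown values return
/// `None` so the caller can choose a default.
pub fn profile_from_int(level: i32) -> Option<Profile> {
    match level {
        0 => Some(Profile::Good),
        1 => Some(Profile::Degraded),
        2 => Some(Profile::Catastrophic),
        3 => Some(Profile::Codec2At3200),
        4 => Some(Profile::Studio32k), // order of 4..=6 assumed
        5 => Some(Profile::Studio48k),
        6 => Some(Profile::Studio64k),
        _ => None,
    }
}
```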

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 19:11:29 +04:00
Siavash Sameni
fa3c7f1cef fix: dynamic frame sizing for non-default quality profiles on Android
The send loop was hardcoded to 960 samples (20ms/Opus24k), causing
DEGRADED (Opus 6k, 40ms) and CATASTROPHIC (Codec2 1200, 40ms) to
fail — the encoder needed 1920 samples but only got 960.

Changes:
- capture_buf, ring read threshold, and timestamp increment are now
  computed from profile.frame_duration_ms (960 for 20ms, 1920 for 40ms)
- decode_buf sized to MAX_FRAME_SAMPLES (1920) to handle any incoming codec
- recv codec switch now uses correct QualityProfile per codec (was
  inheriting original profile's frame_duration_ms, breaking cross-codec)
- added ComfortNoise guard on recv path
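
The frame-size arithmetic behind this fix is small enough to show directly. The 48 kHz sample rate is inferred from the figures above (960 samples per 20 ms); the function name is hypothetical.

```rust
/// Sample rate implied by 960 samples per 20 ms frame.
pub const SAMPLE_RATE_HZ: usize = 48_000;

/// Largest frame any supported codec produces (40 ms).
pub const MAX_FRAME_SAMPLES: usize = 1920;

/// Number of samples the encoder needs for one frame of this duration.
pub fn frame_samples(frame_duration_ms: u32) -> usize {
    SAMPLE_RATE_HZ * frame_duration_ms as usize / 1000
}
```

Sizing `capture_buf` from `frame_samples(profile.frame_duration_ms)` gives 960 for 20 ms profiles and 1920 for 40 ms profiles, while the decode buffer stays at `MAX_FRAME_SAMPLES` regardless of the incoming codec.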

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 18:00:27 +04:00
Siavash Sameni
68b56d9172 fix: ping every 5min (was 5s), clean endpoint on failure, never block connect
- Ping interval: 5 minutes (was 5 seconds — too aggressive)
- Rust ping_relay: explicitly close endpoint + shutdown runtime on failure
- Connect button works regardless of ping status (never blocked)
- Ping failure doesn't corrupt engine state

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 11:40:14 +04:00
Siavash Sameni
a8dc350a65 feat: codec selection in settings (Opus / Opus Low / Codec2)
- Settings UI: radio buttons for encode codec selection
- Persisted via SettingsRepository
- Passed through WzpEngine.startCall(profile=) → JNI → Rust CallStartConfig
- Decode always accepts all codecs (per-packet codec_id switch)
- 0 = Opus 24k (GOOD), 1 = Opus 6k (DEGRADED), 2 = Codec2 1.2k (CATASTROPHIC)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 10:50:01 +04:00
Siavash Sameni
00fa109f07 feat: codec2 support — adaptive encoder/decoder, per-packet codec switch
Android engine:
- Use wzp_codec::create_encoder/create_decoder (factory) instead of
  hardcoded OpusEncoder/OpusDecoder
- Recv path: auto-switch decoder based on incoming packet's codec_id
- Supports mixed-codec rooms (one client Opus, another Codec2)

Desktop client already uses factory functions — no changes needed.

Codec selection via QualityProfile:
- GOOD: Opus 24kbps
- DEGRADED: Opus 6kbps
- CATASTROPHIC: Codec2 1200bps

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 10:34:14 +04:00
Siavash Sameni
18f7faa279 fix: ping as engine instance method — same lifecycle as call
Ping was a static JNI method that loaded the .so before nativeInit,
crashing jemalloc. Now ping is an instance method on WzpEngine:

- Engine is created once (nativeInit), reused for both ping and call
- pingRelay() uses same tokio runtime pattern as startCall()
- Auto-pings all servers on app launch (after engine init)
- No process restart needed
- TOFU fingerprints saved on first successful ping
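
Trust-on-first-use for relay fingerprints can be sketched with an in-memory map standing in for the persisted settings. The `LockStatus` variants and `check_tofu` are hypothetical; the real persistence lives in Kotlin's SettingsRepository, and this sketch only shows the decision logic.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq, Eq)]
pub enum LockStatus {
    /// No fingerprint was stored; the presented one has now been saved.
    FirstSeen,
    /// The presented fingerprint matches the stored one.
    Verified,
    /// The fingerprint differs; the stored value is left untouched so
    /// the caller can ask the user before accepting the new key.
    Changed { previous: String },
}

pub fn check_tofu(
    store: &mut HashMap<String, String>,
    address: &str,
    fingerprint: &str,
) -> LockStatus {
    // One key per relay address, using a `tofu_` prefix.
    let key = format!("tofu_{}", address);
    match store.get(&key).cloned() {
        None => {
            store.insert(key, fingerprint.to_string());
            LockStatus::FirstSeen
        }
        Some(saved) if saved == fingerprint => LockStatus::Verified,
        Some(saved) => LockStatus::Changed { previous: saved },
    }
}
```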

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 09:49:33 +04:00
Siavash Sameni
264ef9c4d4 feat: relay ping with RTT, server TOFU, lock icons (Phase 2 backport)
Rust JNI:
- nativePingRelay: QUIC connect with 3s timeout, returns RTT + server
  certificate fingerprint as JSON. Static method, no engine needed.

Kotlin:
- WzpEngine.pingRelay() static wrapper
- SettingsRepository: TOFU fingerprint persistence (tofu_{address} keys)
- CallViewModel: pingAllServers() coroutine, lockStatus() helper,
  PingResult/LockStatus data types
- InCallScreen: server chips show lock icon + RTT color (green/yellow),
  "Ping All" button

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 22:43:53 +04:00
Siavash Sameni
5e93cb74f2 fix: filter tracing to INFO for wzp crates, WARN for jni crate
The jni crate emits VERBOSE logs for every JNI method lookup (~10 lines
per call, 100+ calls/sec on audio threads). This floods logcat, consumes
CPU, and triggers system kills. Filter to only show INFO+ for our crates
and WARN+ for everything else.

Also fix build script: clean full Rust target to ensure libc++_shared.so
is always copied by cargo-ndk.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 21:37:29 +04:00
Siavash Sameni
9eed94850d fix: DirectByteBuffer audio path — eliminate JNI array copies
Adds nativeWriteAudioDirect / nativeReadAudioDirect JNI functions
that accept a DirectByteBuffer instead of ShortArray. The buffer's
native memory is accessed directly by Rust via pointer — no
GetShortArrayRegion / SetShortArrayRegion, no GC-managed array
copies on the audio hot path.

This fixes SIGBUS crashes on Android 16 where ART's concurrent
mark-compact GC crashes when flipping thread roots during JNI
array operations on MAX_PRIORITY audio threads.

Old ShortArray methods kept for backward compatibility.
AudioPipeline switched to use Direct variants.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 19:29:08 +04:00
Siavash Sameni
33fab9a049 fix: vec allocation for AudioRing, catch_unwind on tracing init, profiling
- AudioRing: use vec![].into_boxed_slice() instead of Box::new([]) to
  avoid 32KB stack allocation that crashes scudo on Android
- JNI bridge: wrap tracing_subscriber init in catch_unwind to survive
  sharded_slab allocation failures on some devices
- Engine: per-step encode profiling (avg_agc_us, avg_opus_us, avg_fec_us,
  avg_send_us) logged every 5 seconds in send stats
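
The AudioRing allocation change can be illustrated on its own. `Box::new([0i16; N])` may build the array on the stack before moving it to the heap, whereas going through `vec!` allocates on the heap from the start. The capacity of 16384 samples is taken from the ring's stated size; the function name is hypothetical.

```rust
/// 16384 i16 samples = 32 KiB.
pub const RING_CAPACITY: usize = 16_384;

/// Allocate zeroed ring storage directly on the heap, avoiding a large
/// temporary array on the calling thread's stack.
pub fn alloc_ring_storage() -> Box<[i16]> {
    vec![0i16; RING_CAPACITY].into_boxed_slice()
}
```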

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 15:41:46 +04:00
Siavash Sameni
31d2306915 feat: per-step encode profiling in send task stats
Adds average microsecond timings for each encode step:
- avg_agc_us: AGC processing
- avg_opus_us: Opus encoding
- avg_fec_us: FEC encode + repair generation
- avg_send_us: QUIC send_media
- avg_total_us: sum of above

Logged every 5 seconds in send stats. Resets each interval.
Use to identify which step is bottlenecking the encode loop
on devices where fps drops below 50.
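
A per-step accumulator that averages and resets each interval might look like this. `StepTimer` is a hypothetical name; the engine's actual bookkeeping is not shown in this log.

```rust
use std::time::Duration;

/// Accumulates elapsed time for one encode step (AGC, Opus, FEC, send).
#[derive(Default)]
pub struct StepTimer {
    total: Duration,
    samples: u32,
}

impl StepTimer {
    /// Record one measurement, e.g. `timer.record(start.elapsed())`.
    pub fn record(&mut self, elapsed: Duration) {
        self.total += elapsed;
        self.samples += 1;
    }

    /// Return the average in microseconds for the interval just ended
    /// and reset the accumulator for the next one.
    pub fn take_avg_us(&mut self) -> u64 {
        let avg = if self.samples == 0 {
            0
        } else {
            (self.total.as_micros() / self.samples as u128) as u64
        };
        *self = Self::default();
        avg
    }
}
```

One timer per step, drained every 5 seconds when the send stats are logged, would produce the `avg_*_us` values listed above.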

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 14:18:33 +04:00
Siavash Sameni
4af7c5f94c fix: AudioRing cursor desync + capture thread use-after-free
AudioRing (reader-detects-lap architecture):
- Writer NEVER touches read_pos — fixes SPSC invariant violation
- Reader self-corrects when lapped (snaps read_pos forward)
- Power-of-2 capacity (16384 = 341ms) with bitmask indexing
- Added overflow_count and underrun_count diagnostics
- Wired ring health into engine stats and periodic logging
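
The reader-detects-lap idea can be sketched with monotonic counters and bitmask indexing. This is a simplified illustration, not the engine's AudioRing: it stores samples in `AtomicI16` cells to avoid `unsafe`, tracks only `overflow_count`, and does not re-check for a lap that happens while a read is in progress.

```rust
use std::sync::atomic::{AtomicI16, AtomicUsize, Ordering};

pub const CAPACITY: usize = 16_384; // power of two
const MASK: usize = CAPACITY - 1;

pub struct AudioRing {
    buf: Box<[AtomicI16]>,
    write_pos: AtomicUsize, // total samples written; writer-owned
    read_pos: AtomicUsize,  // total samples consumed; reader-owned
    overflow_count: AtomicUsize,
}

impl AudioRing {
    pub fn new() -> Self {
        let buf: Vec<AtomicI16> = (0..CAPACITY).map(|_| AtomicI16::new(0)).collect();
        Self {
            buf: buf.into_boxed_slice(),
            write_pos: AtomicUsize::new(0),
            read_pos: AtomicUsize::new(0),
            overflow_count: AtomicUsize::new(0),
        }
    }

    /// Writer side: never reads or modifies `read_pos`.
    pub fn write(&self, samples: &[i16]) {
        let w = self.write_pos.load(Ordering::Relaxed);
        for (i, s) in samples.iter().enumerate() {
            self.buf[w.wrapping_add(i) & MASK].store(*s, Ordering::Relaxed);
        }
        self.write_pos.store(w.wrapping_add(samples.len()), Ordering::Release);
    }

    /// Reader side: if the writer has lapped us, snap forward to the
    /// oldest sample still in the buffer and count the overflow.
    pub fn read(&self, out: &mut [i16]) -> usize {
        let w = self.write_pos.load(Ordering::Acquire);
        let mut r = self.read_pos.load(Ordering::Relaxed);
        let mut available = w.wrapping_sub(r);
        if available > CAPACITY {
            r = w.wrapping_sub(CAPACITY);
            available = CAPACITY;
            self.overflow_count.fetch_add(1, Ordering::Relaxed);
        }
        let n = available.min(out.len());
        for (i, slot) in out.iter_mut().enumerate().take(n) {
            *slot = self.buf[r.wrapping_add(i) & MASK].load(Ordering::Relaxed);
        }
        self.read_pos.store(r.wrapping_add(n), Ordering::Release);
        n
    }

    pub fn overflow_count(&self) -> usize {
        self.overflow_count.load(Ordering::Relaxed)
    }
}
```

Because each cursor has exactly one writer, neither side needs a compare-and-swap loop; the reader alone decides how to recover from being lapped.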

Capture thread use-after-free (drain latch):
- Added CountDownLatch(2) to AudioPipeline
- Audio threads count down after exiting their loops
- teardown() awaits latch (200ms timeout) before destroy()
- Guarantees no in-flight JNI calls when native handle is freed
- stopAudio() no longer nulls pipeline (teardown handles it)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 13:28:34 +04:00
Claude
2b3bdae440 fix: enable Rust tracing → Android logcat via tracing-android
Rust tracing subscriber was never initialized — all info!/warn!/error!
calls in the engine went to /dev/null. This meant our send/recv health
logging was invisible and we couldn't confirm the congestion fix was
active.

Now initializes tracing-android layer on first nativeInit(), routing
all Rust logs to logcat under tag "wzp_android". Also expanded logcat
filter in DebugReporter to capture engine-level log lines.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 08:03:28 +00:00
Claude
20922455bd fix: send task crash on QUIC congestion + AEC toggle + debug reporter
Root cause: send_media() returns Err(Blocked) when QUIC congestion
window is full. The send task treated ANY send error as fatal (break),
killing the entire call. Now send errors drop the packet and continue.
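
The drop-and-continue behavior can be sketched against a stand-in send function. `SendError`, `SendStats`, and `send_all` are hypothetical; the real `send_media()` is async and belongs to the transport, which is not reproduced here.

```rust
/// Hypothetical error type; `Blocked` mirrors the congestion case.
#[derive(Debug, PartialEq, Eq)]
pub enum SendError {
    Blocked,
    Other,
}

#[derive(Debug, Default, PartialEq, Eq)]
pub struct SendStats {
    pub sent: u64,
    pub dropped: u64,
}

/// Send every packet, counting failures instead of aborting the loop.
pub fn send_all<F>(packets: &[Vec<u8>], mut send_media: F) -> SendStats
where
    F: FnMut(&[u8]) -> Result<(), SendError>,
{
    let mut stats = SendStats::default();
    for packet in packets {
        match send_media(packet) {
            Ok(()) => stats.sent += 1,
            // Previously a `break` here ended the call; now the packet
            // is dropped and the loop moves on to the next frame.
            Err(_) => stats.dropped += 1,
        }
    }
    stats
}
```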

Also hardened recv task to survive transient errors and added health
logging (recv gap tracking, periodic stats) to both send and recv.

Relay: added comprehensive debug logging — recv gaps, lock contention,
forward latency, send errors — all per-participant with 5s stats.

Other changes:
- AEC toggle in Settings (persisted, applied on next call)
- Debug report: records call audio (WAV), RMS histogram (CSV), logcat,
  stats. Emailed as zip via Android share intent after call ends.
- Replaced LinearProgressIndicator with Box (compose version compat)
- FileProvider for sharing debug zip attachments

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 07:38:56 +00:00
Claude
e6564bab57 fix: mic mute crackling + add AEC/NoiseSuppressor + dedup room participants
Mic mute: the send loop now zeros the capture buffer when muted instead
of relying on write_audio() to skip writes. Previously stale ring data
and AGC amplification of near-silence caused crackling artifacts.

AEC: attach Android's hardware AcousticEchoCanceler to the AudioRecord
session. Also attach NoiseSuppressor when available. Both are released
on capture stop.

Room UI: deduplicate participants by fingerprint so ghost entries from
stale relay state don't show duplicate names.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 06:06:35 +00:00
Claude
aebf9156c0 fix: dedup participants in UI, wait for QUIC close ack before exiting
UI: deduplicate room participants by fingerprint so ghost entries from
stale relay state don't show duplicates.

Engine: after select! ends, call close_now() + connection.closed() with
500ms timeout to wait for the relay to acknowledge the CONNECTION_CLOSE.
Previously the close frame was queued but the runtime died before quinn
could retransmit if the first packet was lost.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 05:40:06 +00:00
Claude
9bbaec6b35 fix: use shutdown_timeout so QUIC CONNECTION_CLOSE actually gets sent
shutdown_background() killed the tokio runtime before quinn could send the
CONNECTION_CLOSE frame on the wire, so the relay never knew the client left.
Now use shutdown_timeout(500ms) to give quinn time to flush the close frame,
matching the desktop client pattern (which uses 2s timeout).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 05:20:20 +00:00
Claude
a9c4260b4e fix: close QUIC connection on hangup so relay removes participant immediately
stop_call() now calls close_now() on the stored transport handle before
killing the tokio runtime. This sends a QUIC CONNECTION_CLOSE frame so
the relay's recv loop breaks immediately, triggering leave() + RoomUpdate
broadcast. Previously the runtime was killed first, so transport.close()
never ran and the relay kept stale participants until idle timeout.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 04:58:24 +00:00
Claude
0835c36d0f feat: settings page with persistence, client alias in handshake, fix null fingerprints
- Add SettingsScreen with identity (alias, key backup/restore), audio defaults,
  server management, network prefs, and default room
- SettingsRepository persists all settings via SharedPreferences
- Auto-generate random display names on first launch (e.g. "Swift Wolf")
- Thread alias through CallOffer → relay handshake → RoomUpdate broadcast
- Derive caller fingerprint from identity key in relay handshake (fixes null
  fingerprints when --auth-url is not set)
- Persist identity seed for stable fingerprints across reconnects
- Add alias field to SignalMessage::CallOffer (serde default for backward compat)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 03:56:33 +00:00
Claude
2d4b8eebd5 feat: RoomUpdate protocol — broadcast participant list on join/leave
- Add RoomUpdate signal message to wzp-proto with participant count + list
- Add RoomParticipant struct (fingerprint + optional alias)
- Store fingerprint/alias in relay Participant struct
- Broadcast RoomUpdate to all room members on join and leave
- Add signal recv task in Android engine to handle RoomUpdate
- Surface room_participant_count + room_participants in CallStats JSON
- Show "X in room" with participant names in Android in-call UI
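
For illustration, the message shape described above might look roughly like the following. This is a minimal sketch, not the wzp-proto definition: only the names `RoomUpdate` and `RoomParticipant` come from the commit, while the field names, the `Room` type, and `room_label` are assumptions.

```rust
/// One entry in the participant list: fingerprint plus optional alias.
#[derive(Debug, Clone, PartialEq)]
pub struct RoomParticipant {
    pub fingerprint: String,
    pub alias: Option<String>,
}

/// Signal message carrying the participant count and list.
/// Field names are assumed; the real wzp-proto layout may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct RoomUpdate {
    pub count: usize,
    pub participants: Vec<RoomParticipant>,
}

/// Hypothetical room state used only to show when an update is produced.
#[derive(Default)]
pub struct Room {
    members: Vec<RoomParticipant>,
}

impl Room {
    /// Build the message that would be broadcast to every member.
    fn snapshot(&self) -> RoomUpdate {
        RoomUpdate {
            count: self.members.len(),
            participants: self.members.clone(),
        }
    }

    /// Add a participant and return the update to broadcast.
    pub fn join(&mut self, fingerprint: &str, alias: Option<&str>) -> RoomUpdate {
        self.members.push(RoomParticipant {
            fingerprint: fingerprint.to_string(),
            alias: alias.map(str::to_string),
        });
        self.snapshot()
    }

    /// Remove a participant and return the update to broadcast.
    pub fn leave(&mut self, fingerprint: &str) -> RoomUpdate {
        self.members.retain(|p| p.fingerprint != fingerprint);
        self.snapshot()
    }
}

/// Render an "X in room" label, falling back to the fingerprint
/// when a participant has no alias.
pub fn room_label(update: &RoomUpdate) -> String {
    let names: Vec<&str> = update
        .participants
        .iter()
        .map(|p| p.alias.as_deref().unwrap_or(p.fingerprint.as_str()))
        .collect();
    format!("{} in room: {}", update.count, names.join(", "))
}
```

Returning the snapshot from both `join` and `leave` keeps the broadcast payload identical for the two events, so a client only needs one handler.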

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 18:12:24 +00:00
Claude
a23d9f5e41 feat: foreground service, dB gain sliders, speaker routing, live network stats
- Wire CallService foreground service for background calls (microphone type)
- Add Voice Volume + Mic Gain sliders (-20 to +20 dB) applied in Kotlin
- Connect AudioRouteManager for real speaker toggle via AudioManager
- Feed quinn QUIC RTT into PathMonitor, display Loss/RTT/Jitter from live data
- Nuclear teardown between calls — recreate engine + audio pipeline each call
- Fix re-entrant teardown loop from CallService notification callback
- Park audio threads as daemons to avoid libcrypto TLS destructor crash on exit
- Remove duplicate wakelocks from Activity (service owns them now)
- Strip AEC + denoise from capture path, keep AGC only (incremental approach)
- Fix .so copy target: libwzp_android.so not libwzp.so
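
The slider arithmetic can be sketched as follows. The commit applies the gain in Kotlin; this Rust version only illustrates the same dB-to-linear conversion, and the function names plus the saturating behaviour are assumptions rather than the app's code.

```rust
/// Convert a gain in decibels to a linear amplitude factor: 10^(dB/20).
pub fn db_to_linear(gain_db: f32) -> f32 {
    10f32.powf(gain_db / 20.0)
}

/// Apply a gain to 16-bit PCM in place. The gain is clamped to the
/// slider range of -20..=+20 dB, and samples saturate at the i16
/// limits instead of wrapping (an assumed design choice).
pub fn apply_gain(samples: &mut [i16], gain_db: f32) {
    let factor = db_to_linear(gain_db.clamp(-20.0, 20.0));
    for s in samples.iter_mut() {
        let scaled = (*s as f32 * factor).round();
        *s = scaled.clamp(i16::MIN as f32, i16::MAX as f32) as i16;
    }
}
```

At the slider extremes the factor is 10x (+20 dB) or 0.1x (-20 dB), so loud input clips quickly at high gain; saturating keeps that clipping from turning into wrap-around noise.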

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 17:45:00 +00:00
Claude
b3e56ecbd8 feat: add AGC to capture + playout paths, add server UI, DNS resolve
- Wire AutoGainControl on both capture (mic → encode) and playout
  (decode → speaker) paths to normalize volume levels
- Add server list with add/remove custom server dialog
- Add IPv4/IPv6 preference toggle for DNS resolution
- Resolve DNS hostnames to IP in Kotlin before passing to Rust engine
- Revert to IP addresses for default servers (DNS still broken on QUIC)

AGC confirmed working — voice levels noticeably improved in testing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 14:02:33 +00:00
Claude
bf91cf25bd feat: add real audio pipeline with Opus + RaptorQ FEC
- AudioPipeline: Kotlin AudioRecord/AudioTrack on JVM threads, PCM
  shuttled to Rust via lock-free ring buffers + JNI
- FEC: RaptorQ fountain codes on encode (5 frames/block, 20% repair
  ratio for GOOD profile), decoder feeds repair symbols for recovery
- Real audio level meter from mic RMS (replaces fake animation)
- Room name editable in UI (default: "android")
- Relay changed to pangolin.manko.yoga:4433
- Stats overlay shows FEC recovered count
- CallState now synced from polled stats (fixes "Connecting" stuck bug)
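
A level meter driven by mic RMS could be computed along these lines. This is a sketch under assumptions: the function name and the 0.0 to 1.0 normalisation are illustrative, and the commit does not say where in the pipeline the real computation runs.

```rust
/// Root-mean-square level of a 16-bit PCM frame, normalised so that a
/// full-scale signal maps to roughly 1.0. An empty frame reports 0.0.
pub fn rms_level(frame: &[i16]) -> f32 {
    if frame.is_empty() {
        return 0.0;
    }
    // Accumulate in f64 so long frames do not lose precision.
    let sum_sq: f64 = frame
        .iter()
        .map(|&s| {
            let x = s as f64 / 32768.0;
            x * x
        })
        .sum();
    (sum_sq / frame.len() as f64).sqrt() as f32
}
```

A square wave at half of full scale yields 0.5, which makes the function easy to sanity-check before wiring it to a UI meter.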

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 12:33:59 +00:00
Claude
af85a49e86 fix: eliminate all native thread creation — run everything single-threaded
pthread_create crashes on Android due to static bionic __init_tcb stubs
in the Rust std prebuilt rlibs. This is unfixable without rebuilding std.

Solution: run the entire call (QUIC connect, handshake, media send/recv)
on a single tokio current_thread runtime. The JNI startCall() now blocks,
so Kotlin dispatches it to Dispatchers.IO (JVM thread, not pthread).

Audio pipeline temporarily simplified to silence frames — will restore
once threading is solved (either via Java Thread or rebuilding std).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 09:52:28 +00:00
Claude
bae03365da fix: restore getauxval_fix.c + current_thread tokio — both needed
The getauxval override (dlsym wrapper) fixes SIGSEGV in
init_have_lse_atomics at library load time. The current_thread
tokio runtime avoids SEGV_ACCERR in pthread_create/__init_tcb.
Both fixes are required together.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 09:37:57 +00:00
Claude
9d9ce4706d fix: use current_thread tokio runtime — avoid pthread_create SEGV on Android
Multi-thread tokio runtime crashes with SEGV_ACCERR in __init_tcb
during pthread_create on Android (static bionic stubs from CRT).
Switch to current_thread runtime which runs network I/O on the
calling thread without spawning additional OS threads.

Also: clean up build.rs — use only libc++_shared.so (dynamic),
remove getauxval_fix.c hack, remove static c++/c++abi linking.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 09:27:46 +00:00
Claude
9098e28a1f fix: SIGSEGV in getauxval — override broken CRT stub with dlsym wrapper
compiler-rt's init_have_lse_atomics calls getauxval(AT_HWCAP) at
library load time. The static getauxval from the CRT reads from
__libc_auxv which is NULL in shared libraries → SIGSEGV at 0x0.

Fix: compile getauxval_fix.c that provides a getauxval() which uses
dlsym(RTLD_DEFAULT) to find the real bionic getauxval at runtime.
Also switch to libc++_shared.so (bundled in APK) to avoid pulling
in static libc stubs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 08:39:57 +00:00
Claude
a8dd0c2f57 fix: also link libc++abi for RTTI — resolve missing __class_type_info vtable
- Compile all 62 Oboe source files (was headers-only, missing symbols)
- Link libc++_static + libc++abi with NDK sysroot search path
- Bump linker target from android21 to android26 (fixes pthread_atfork)
- Link liblog + libOpenSLES for Oboe runtime deps

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 05:48:49 +00:00
Claude
778f4dd428 fix: link libc++ statically — crash on launch due to missing libc++_shared.so
- Set cpp_link_stdlib(None) to suppress cc crate's automatic linking
- Explicitly link both c++_static and c++abi with NDK sysroot search path
- Fixes RTTI vtable symbol (_ZTVN10__cxxabiv117__class_type_infoE) error
- Verified: only liblog.so remains as dynamic dependency

Closes #001

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 05:07:25 +00:00
Siavash Sameni
622fdee51f fix: also link libc++abi for RTTI — resolve missing __class_type_info vtable
Previous fix linked c++_static but not c++abi. Android NDK splits the
static C++ runtime into two archives: libc++_static.a (STL) and
libc++abi.a (RTTI/exceptions). Without c++abi, dlopen fails on
_ZTVN10__cxxabiv117__class_type_infoE.

Now using cpp_link_stdlib(None) to suppress cc crate auto-linking, then
explicitly linking both c++_static and c++abi via cargo:rustc-link-lib.
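
The link step described above might be expressed in a build script roughly as below. This is a sketch, not the project's build.rs: the helper names and the `static=` link kind are assumptions, and `sysroot_lib_dir` stands in for whatever NDK sysroot path the real script computes.

```rust
/// Cargo directives a build script would print to link the NDK's static
/// C++ runtime. `sysroot_lib_dir` is a hypothetical path to the directory
/// holding libc++_static.a and libc++abi.a.
pub fn static_cxx_link_directives(sysroot_lib_dir: &str) -> Vec<String> {
    vec![
        // Tell rustc where to find the two archives.
        format!("cargo:rustc-link-search=native={sysroot_lib_dir}"),
        // STL implementation.
        "cargo:rustc-link-lib=static=c++_static".to_string(),
        // RTTI and exception support (the archive the first fix missed).
        "cargo:rustc-link-lib=static=c++abi".to_string(),
    ]
}

/// Print the directives on stdout, which is how a build script hands them
/// to Cargo. In a real build.rs this would follow a `cc::Build` configured
/// with `.cpp(true).cpp_link_stdlib(None)` so that the cc crate does not
/// add a C++ runtime of its own.
pub fn emit_static_cxx_link(sysroot_lib_dir: &str) {
    for line in static_cxx_link_directives(sysroot_lib_dir) {
        println!("{line}");
    }
}
```

Keeping the directive list in a pure function makes it possible to unit-test the link configuration without invoking the NDK toolchain.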

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 09:00:14 +04:00
Siavash Sameni
e751af7e38 fix: link libc++ statically — crash on launch due to missing libc++_shared.so
The app crashed immediately when loading libwzp_android.so because the
cc crate's default dynamic linking produced a runtime dependency on
libc++_shared.so, which was never packaged into the APK.

Adding .cpp_link_stdlib(Some("c++_static")) to build.rs bakes the C++
runtime into libwzp_android.so directly, eliminating the missing .so.

See issues/001-libc++-shared-crash.md for full diagnosis and logcat trace.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 08:52:55 +04:00
Claude
8d5f6fe044 feat: wire QUIC transport, JNI bridge, connect UI + add docs
- Replace raw FFI with proper `jni` crate for string marshalling
- Wire QUIC transport in engine: connect to relay, crypto handshake
  (CallOffer/CallAnswer, X25519+Ed25519), send/recv MediaPackets
- Feed received packets into jitter buffer (was previously ignored)
- Add connect screen UI with CALL button (idle state) and in-call
  controls (mute, speaker, hang up, live stats)
- Hardcode relay 172.16.81.125:4433, room "android"
- Add comprehensive docs in docs/android/:
  architecture.md (8 mermaid diagrams), build-guide.md,
  debugging.md, maintenance.md, roadmap.md

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 04:43:49 +00:00
Claude
780309fede fix: crash on launch — don't auto-start call, handle null JNI strings, remove stdout tracing
- CallActivity no longer auto-starts a call on launch
- CallViewModel lazily inits engine only when startCall() is called
- nativeGetStats nullable return handled safely in Kotlin
- Removed tracing_subscriber::fmt() which panics on Android (no stdout)
- All JNI calls wrapped in try/catch on Kotlin side

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 02:04:23 +00:00
Claude
73ebcdd869 build: Android APK builds working — debug (8.9MB) and release (2.0MB)
- Fix C++ std::std:: double namespace in oboe_bridge.cpp
- Auto-fetch Oboe headers from GitHub in build.rs
- Configure cargo cross-compilation (.cargo/config.toml) with NDK linkers
- Fix Gradle settings (dependencyResolutionManagement), signing configs,
  Compose LinearProgressIndicator API, and Android manifest theme
- Add Gradle wrapper, .gitignore for build artifacts
- arm64-v8a only (raptorq crate incompatible with armv7 32-bit)
- Release APK: 2.0MB signed with wzp-release key
- Debug APK: 8.9MB signed with wzp-debug key

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 19:37:08 +00:00
Claude
e7b1c3372a feat: Android VoIP client — Phase 2 (JNI bridge, Compose UI, AEC pipeline wiring)
- JNI bridge with 8 extern functions (init, startCall, stopCall, setMute,
  setSpeaker, getStats, forceProfile, destroy) with panic catching
- Kotlin engine layer: WzpEngine JNI wrapper, WzpCallback interface,
  CallStats data class with JSON deserialization
- Jetpack Compose UI: InCallScreen with quality indicator (green/yellow/red),
  mute/speaker/hangup buttons, stats overlay, duration timer
- CallActivity with RECORD_AUDIO permission handling, Material3 theme
- CallService foreground service with WakeLock, WiFi lock, notification
- AudioRouteManager for speaker/earpiece/Bluetooth SCO switching
- AEC wired into CallEncoder pipeline: AEC → AGC → denoise → silence → encode
- AEC farend reference fed from decode path to encode path in pipeline
- Engine exposes set_aec_enabled/set_agc_enabled via AtomicBool flags
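
The "panic catching" mentioned for the JNI functions can be sketched with the standard library alone. Everything named here is hypothetical (`guard_ffi`, `wzp_set_mute_sketch`, the -1 error code), and the real bridge uses JNI types rather than plain integers. The sketch also assumes the crate is built with unwinding panics, since `catch_unwind` cannot intercept a panic under `panic = "abort"`.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Run `body` and convert a panic into the error code -1, so that no
/// unwind crosses the FFI boundary into the JVM.
pub fn guard_ffi<F: FnOnce() -> i32>(body: F) -> i32 {
    panic::catch_unwind(AssertUnwindSafe(body)).unwrap_or(-1)
}

/// Hypothetical exported entry point showing the wrapper in use.
/// Returns 0 on success and -1 if the body panicked.
pub extern "C" fn wzp_set_mute_sketch(muted: i32) -> i32 {
    guard_ffi(|| {
        if muted != 0 && muted != 1 {
            panic!("invalid mute flag: {muted}");
        }
        0
    })
}
```

Routing every exported function through one wrapper keeps the error convention consistent, which is what lets the Kotlin side treat a failure as an ordinary return value.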

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 18:16:38 +00:00
Claude
26e9c55f1f feat: Android VoIP client — Phase 1 (audio quality, network adaptation, crate skeleton)
- New wzp-android crate with Oboe C++ backend, lock-free SPSC ring buffers,
  engine orchestrator, codec pipeline, and Android Gradle project structure
- AEC (NLMS adaptive filter), AGC (two-stage with fast attack/slow release),
  windowed-sinc FIR resampler replacing linear interpolation (wzp-codec)
- Opus encoder tuning: complexity 7 default, set_expected_loss support
- Mobile jitter buffer: asymmetric EMA (fast up/slow down), handoff spike
  detection with 2s cooldown, configurable safety margin
- Network-aware quality control: cellular-specific thresholds, faster
  downgrade on cellular, proactive tier drop on WiFi→cellular handoff,
  FEC ratio boost during network transitions
- Handoff detection in PathMonitor via RTT jitter spike analysis
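
The "asymmetric EMA (fast up/slow down)" idea from the jitter buffer item can be illustrated as follows. This is a minimal sketch: the type name, the field names, and the smoothing factors used in the example are assumptions, not values taken from the crate.

```rust
/// Jitter estimate that reacts quickly to increases and decays slowly,
/// so one late packet grows the buffer but a brief calm does not shrink it.
pub struct AsymmetricEma {
    value_ms: f64,
    /// Smoothing factor used when the new sample is above the estimate.
    alpha_up: f64,
    /// Smoothing factor used when the new sample is at or below it.
    alpha_down: f64,
}

impl AsymmetricEma {
    pub fn new(initial_ms: f64, alpha_up: f64, alpha_down: f64) -> Self {
        Self { value_ms: initial_ms, alpha_up, alpha_down }
    }

    /// Fold one jitter sample into the estimate and return the new value.
    pub fn update(&mut self, sample_ms: f64) -> f64 {
        let alpha = if sample_ms > self.value_ms {
            self.alpha_up
        } else {
            self.alpha_down
        };
        self.value_ms += alpha * (sample_ms - self.value_ms);
        self.value_ms
    }
}
```

With `alpha_up = 0.5` and `alpha_down = 0.125`, a spike from 20 ms to 100 ms moves the estimate to 60 ms in one step, while the return to 20 ms only brings it down to 55 ms.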

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 18:07:55 +00:00