105 Commits

Author SHA1 Message Date
Siavash Sameni
c95255d31b fix(build): build-and-notify.sh — parameterize branch, fail loud on pull errors
Two bugs caused the post-Phase-3c APK build to ship from the wrong source:

1. The remote script hardcoded `feat/android-voip-client` as the branch
   to pull — not the current working branch. It was never updated when
   we moved to android-rewrite and then opus-DRED.

2. `git reset --hard origin/feat/android-voip-client 2>/dev/null || true`
   silently swallowed the reset failure when that branch didn't exist on
   the remote's origin, leaving the tree on whatever branch was there
   from a previous session. In our case that was feat/desktop-audio-rewrite
   at d0c1731 — the android-rewrite baseline, missing every Phase 0-4
   commit. The ntfy notification even included the stale commit hash
   d0c1731 but nobody noticed because the hash wasn't being cross-checked
   against the branch the script claimed to be building.

Fix:

Local side (scripts/build-and-notify.sh):
  - Auto-detect the current local branch via `git branch --show-current`
  - Accept `--branch NAME` override for explicit control
  - Pass the branch as a third positional arg to the remote build script
  - Abort early if we can't determine a branch (detached HEAD)
  - Updated usage docs to reflect the "build whatever I'm working on"
    default

Remote side (embedded heredoc):
  - Read BRANCH from $3 and abort if empty
  - `git fetch origin "$BRANCH"` — no piping to /dev/null, errors surface
  - `git reset --hard "origin/$BRANCH"` — no `|| true`, failures abort
  - Print the resolved commit hash + subject line immediately after reset
    so logs cross-reference the source clearly
  - Started/done ntfy notifications now include the branch name alongside
    the commit hash: "WZP Android [opus-DRED @ 953ab71] done! APK: ..."

Result: next build will (a) actually fetch the requested branch from the
remote's gitea origin, (b) fail loudly if the branch doesn't exist or
the reset fails, and (c) surface the branch in the ntfy notifications so
future "wait, which build is this?" confusion is impossible.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 19:27:18 +04:00
Siavash Sameni
99c0173590 feat(telemetry): Phase 4 — LossRecoveryUpdate protocol + relay metrics + DebugReporter
Phase 4 lays the telemetry foundation for distinguishing DRED recoveries
from classical PLC in production: a new SignalMessage variant, two new
per-session Prometheus counters on the relay side, and a highlighted
loss-recovery section in the Android DebugReporter.

The periodic emitter (client → relay) and Grafana panel are deferred to
Phase 4b — this commit ships the protocol surface, the relay sink, and
the immediate user-visible debug output. Once 4b lands the full path
(emitter → relay → Prometheus → Grafana), the metrics here will
automatically start receiving data.

Scope decision — why not extend QualityReport instead:
The existing wire-format QualityReport is a fixed 4-byte media packet
trailer. Adding counter fields to it would shift the binary layout and
break backward compatibility (old receivers would parse the last 4
bytes of the extended trailer as QR, corrupting audio). Using a
new SignalMessage variant on the reliable QUIC signal stream sidesteps
the wire-format problem entirely — serde JSON enums tolerate unknown
variants gracefully on old receivers, and the signal channel is the
right layer for periodic telemetry aggregates.

Changes:

  wzp-proto/src/packet.rs:
    - New SignalMessage::LossRecoveryUpdate variant carrying:
        * dred_reconstructions: u64 (monotonic since call start)
        * classical_plc_invocations: u64 (monotonic)
        * frames_decoded: u64 (for rate calculation)
    - All three fields tagged #[serde(default)] for forward compat.

  wzp-client/src/featherchat.rs:
    - Added a match arm so signal_to_call_type() handles the new
      variant (treat as Offer for featherChat bridging purposes).

  wzp-relay/src/metrics.rs:
    - Two new IntCounterVec metrics on the relay, labeled by session_id:
        * wzp_relay_session_dred_reconstructions_total
        * wzp_relay_session_classical_plc_total
    - New method update_session_loss_recovery(session_id, dred, plc)
      applies monotonic deltas: if the incoming totals exceed the
      current counter, the difference is inc_by'd. If the incoming
      totals are LOWER (client restart or counter reset), the
      Prometheus counter holds steady until the client catches up.
      This matches the existing update_session_buffer delta pattern.
    - remove_session_metrics() now cleans up the two new labels.
    - New test session_loss_recovery_monotonic_delta exercises:
        * initial population (10 DRED, 2 PLC)
        * forward advance (25, 5 → delta +15, +3)
        * lower values ignored (client reset → counters unchanged)
        * client catches up (30, 8 → advances to new max)
    - Existing session_metrics_cleanup test extended to cover the
      new counters.

  android/app/src/main/java/com/wzp/debug/DebugReporter.kt:
    - Phase 4 users — and incident responders — need to quickly see
      whether DRED is actually firing during a call. The stats JSON
      already carries the counters (after Phase 3c), but they were
      buried in the trailing JSON dump. Added a dedicated
      "=== Loss Recovery ===" section to the meta preamble that
      extracts dred_reconstructions, classical_plc_invocations,
      frames_decoded, and fec_recovered from the JSON and displays
      them plainly, plus computed percentages when frames_decoded > 0.
    - New extractLongField helper: tiny hand-rolled JSON integer
      extractor. We don't want to pull in a full JSON parser for this
      single use case and CallStats has a flat, well-known schema.

Verification:
- cargo check --workspace: zero errors
- cargo test -p wzp-proto --lib: 63 passing
- cargo test -p wzp-codec --lib: 68 passing
- cargo test -p wzp-client --lib: 35 passing (+1 ignored probe)
- cargo test -p wzp-relay --lib: 68 passing (+1 new Phase 4 test)
- cargo check -p wzp-android --lib: zero errors
- Android APK build verified earlier today (unridden-alfonso.apk
  via the remote Docker builder) — Phase 0–3c confirmed to compile
  end-to-end on the NDK target.

Phase 4b remaining (not blocking this commit):
- Periodic LossRecoveryUpdate emitter in wzp-client/src/call.rs and
  wzp-android/src/engine.rs (every ~5 s)
- Relay-side handler in main.rs that matches the new variant and
  calls metrics.update_session_loss_recovery
- Grafana "Loss recovery breakdown" panel in docs/grafana-dashboard.json

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 19:21:04 +04:00
Siavash Sameni
953ab71392 feat(codec): Phase 3c — Android engine.rs DRED reconstruction on packet loss
Phase 3c mirrors Phase 3b on the Android receive path. With Phase 0-3b
landed on desktop + Android encoder, this commit completes codec-layer
loss recovery on the Android decoder side.

Architectural difference vs desktop: engine.rs has NO jitter buffer.
The recv task reads packets directly from the transport via
recv_media().await and writes decoded audio straight into the playout
ring. There is no PlayoutResult::Missing equivalent. Gap detection
therefore has to be done via sequence-number tracking — when a packet
arrives with seq > expected_seq, the frames in between are missing and
we attempt to reconstruct them via DRED before decoding the newly-
arrived packet.

Implementation:

  Imports & types:
    - Added wzp_codec::AdaptiveDecoder, wzp_codec::dred_ffi::{
      DredDecoderHandle, DredState} imports.
    - Changed the `decoder` local from Box<dyn AudioDecoder> (via
      wzp_codec::create_decoder) to concrete AdaptiveDecoder::new(profile).
      Same reasoning as Phase 3b: reconstruct_from_dred is an inherent
      method, not a trait method, so we need the concrete type.

  Recv task state (all task-local, no new struct fields):
    - dred_decoder: DredDecoderHandle
    - dred_parse_scratch: DredState (reused, overwritten per parse)
    - last_good_dred: DredState (cached most-recent valid state)
    - last_good_dred_seq: Option<u16>
    - expected_seq: Option<u16> (for gap detection)
    - dred_reconstructions: u64 (telemetry)
    - classical_plc_invocations: u64 (telemetry)

  Recv loop body (Opus source packets only):
    1. Parse DRED from the new packet first so last_good_dred reflects
       the freshest state available for gap recovery.
    2. Detect a gap: gap = pkt.seq.wrapping_sub(expected_seq). Cap at
       MAX_GAP_FRAMES = 16 (320 ms) to avoid huge wraparound scenarios.
    3. For each missing seq in the gap:
         offset = (last_good_dred_seq - missing_seq) * frame_samples
         if 0 < offset <= last_good_dred.samples_available():
             reconstruct_from_dred + write to playout ring
             bump dred_reconstructions
         else:
             decoder.decode_lost (classical PLC) + write + bump plc counter
    4. Decode the current packet normally and write to playout ring
       (unchanged from Phase 2).
    5. Update expected_seq = pkt.seq.wrapping_add(1).

  Profile-switch handling: when the incoming codec changes (triggering
  decoder.set_profile), reset last_good_dred_seq and expected_seq to
  None. The cached DRED state is tied to the old profile's frame rate
  and would produce wrong offsets after the switch; starting fresh is
  correct.

  Decode-error fallback: the existing `Err(e) => decode_lost` branch
  now also increments classical_plc_invocations so the counter
  accurately reflects all PLC invocations (gap-detected AND decode-
  error-triggered).

Telemetry (CallStats additions):
  - stats.dred_reconstructions: u64
  - stats.classical_plc_invocations: u64
  Both updated on every packet arrival in the existing stats.lock()
  block alongside frames_decoded/fec_recovered, so the Android UI and
  JNI bridge already have these values without any further plumbing.
  The periodic recv stats log now includes both counters.

Ordering note: DRED gap reconstruction happens BEFORE decoding the new
packet's audio because the playout ring is FIFO. Gap samples must be
written before the new packet's samples so temporal order is preserved.
Out-of-order late arrivals (seq < expected_seq) are naturally dropped
as stale by the gap detection (gap would be a large wraparound value
exceeding MAX_GAP_FRAMES).

Verification:
- cargo check --workspace: zero errors
- cargo test -p wzp-codec --lib: 68 passing (unchanged from Phase 3b)
- cargo test -p wzp-client --lib: 35 passing (unchanged from Phase 3b)
- cargo check -p wzp-android --lib: zero errors
- cargo test -p wzp-android cannot run on macOS host (pre-existing
  -llog linker dep, unrelated). Real end-to-end verification happens
  via the Android APK build on the remote Docker builder
  (scripts/build-and-notify.sh).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 19:06:45 +04:00
Siavash Sameni
662b14a2af feat(codec): Phase 3b — CallDecoder DRED reconstruction on packet loss
Phase 3b of the DRED integration — wires the Phase 3a FFI primitives
into the desktop receive path. When the jitter buffer reports a missing
Opus frame, CallDecoder now attempts to reconstruct the audio from the
most recently parsed DRED side-channel state before falling through to
classical PLC.

Architectural refinement vs the PRD's literal wording: the PRD said
"jitter buffer takes a Box<dyn DredReconstructor>". After checking deps,
wzp-transport depends only on wzp-proto (not wzp-codec). Putting DRED
state in the jitter buffer would require a new cross-crate dep and
couple the codec-agnostic buffer to libopus. Instead, this commit keeps
the DRED state ring and reconstruction dispatch inside CallDecoder (one
layer up from the jitter buffer), intercepting the existing
PlayoutResult::Missing signal. Same lookahead/backfill semantics,
cleaner layering, zero change to wzp-transport.

Changes:

  CallDecoder field type: Box<dyn AudioDecoder> → AdaptiveDecoder.
  Required because Phase 3b calls the inherent reconstruct_from_dred
  method, which cannot live on the AudioDecoder trait without dragging
  libopus DredState through wzp-proto. In practice AdaptiveDecoder was
  the only AudioDecoder implementor anyway — the trait abstraction was
  buying nothing. Method call sites unchanged because AdaptiveDecoder
  also implements AudioDecoder.

  New CallDecoder fields:
    - dred_decoder: DredDecoderHandle
    - dred_parse_scratch: DredState  (scratch for parse_into)
    - last_good_dred: DredState      (cached most-recent valid state)
    - last_good_dred_seq: Option<u16>
    - dred_reconstructions: u64      (Phase 4 telemetry)
    - classical_plc_invocations: u64 (Phase 4 telemetry)

  CallDecoder::ingest — on Opus non-repair packets, parse DRED into the
  scratch state. On success (samples_available > 0), std::mem::swap the
  scratch into last_good_dred and record the seq. This is O(1) per
  packet, zero allocation after construction (the two DredState buffers
  are allocated once in new() and reused forever).

  CallDecoder::decode_next — on PlayoutResult::Missing(seq) for Opus
  profiles: if last_good_dred_seq > seq and the seq delta × frame_samples
  fits within samples_available, call audio_dec.reconstruct_from_dred
  and bump dred_reconstructions. Otherwise fall through to classical
  PLC and bump classical_plc_invocations. The Codec2 path always falls
  through to classical PLC since DRED is libopus-only and
  AdaptiveDecoder::reconstruct_from_dred rejects Codec2 tiers
  explicitly.

  OpusDecoder and AdaptiveDecoder: new inherent reconstruct_from_dred
  method that delegates to the underlying DecoderHandle. Needed to
  bridge CallDecoder's wzp-client code to the Phase 3a FFI wrappers
  without touching the AudioDecoder trait.

CRITICAL FINDING — raised DRED loss floor from 5% to 15%:

Phase 3b testing discovered that libopus 1.5's DRED emission window
scales aggressively with OPUS_SET_PACKET_LOSS_PERC. Empirical data
(see probe_dred_samples_available_by_loss_floor, an #[ignore]'d
diagnostic test in call.rs):

  loss_pct   samples_available   effective_ms
    5%        720                  15 ms  (useless!)
   10%        2640                 55 ms
   15%        4560                 95 ms
   20%        6480                135 ms
   25%+       8400 (capped)       175 ms  (~87% of 200 ms configured)

The Phase 1 default of 5% produced only a 15 ms reconstruction window
— too small to even cover a single 20 ms Opus frame. DRED was
effectively disabled even though it was emitting bytes. Raised the
floor to 15% (95 ms window) as the minimum that actually provides
single-frame loss recovery. This updates Phase 1's DRED_LOSS_FLOOR_PCT
constant in opus_enc.rs and the accompanying module docstring.

Trade-off: 15% assumed loss slightly increases encoder bitrate overhead
on clean networks. Measured via the existing phase1 bitrate probe:

  Before (5% floor):  3649 bytes/sec at Opus 24k + 300 Hz sine
  After  (15% floor): 3568 bytes/sec at Opus 24k + 300 Hz sine

The delta is within noise — 15% isn't meaningfully more expensive than
5% on this signal, which suggests the DRED emission size is signal-
dependent rather than loss-dependent for small values. Net result: we
get a 6x larger reconstruction window for essentially free.

Tests (+3 DRED recovery, +1 #[ignore]'d probe):
- opus_single_packet_loss_is_recovered_via_dred — full encode → ingest
  → decode_next loop with one packet dropped mid-stream. Asserts
  dred_reconstructions ≥ 1 and observes the exact counter deltas.
- opus_lossless_ingest_never_triggers_dred_or_plc — baseline behavior,
  lossless stream never takes the Missing branch.
- codec2_loss_falls_through_to_classical_plc — Codec2 never
  reconstructs via DRED even if state were populated (which it won't
  be — Codec2 packets don't carry DRED bytes).
- probe_dred_samples_available_by_loss_floor — #[ignore]'d diagnostic
  that sweeps loss_pct values and prints the resulting DRED window
  sizes. Kept for future tuning work.

New CallDecoder introspection accessors (public but undocumented in
the PRD): last_good_dred_seq() and last_good_dred_samples_available()
for test diagnostics and future telemetry surfaces in Phase 4.

Verification:
- cargo check --workspace: zero errors
- cargo test -p wzp-codec --lib: 68 passing (Phase 3a baseline held)
- cargo test -p wzp-client --lib: 35 passing (+3 Phase 3b tests,
  +1 ignored diagnostic, no regressions)

Next up: Phase 3c mirrors this on the Android engine.rs receive path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 18:55:25 +04:00
Siavash Sameni
b830f29e66 feat(codec): Phase 3a — DRED FFI primitives (DredDecoderHandle + DredState)
Phase 3a of the DRED integration — the foundation for codec-layer loss
recovery. Adds three new safe wrappers to crates/wzp-codec/src/dred_ffi.rs
over the raw opusic-sys FFI, plus the reconstruction method on the existing
DecoderHandle. No call-site integration yet — that lands in Phase 3b (desktop)
and Phase 3c (Android).

New types:
- `DredDecoderHandle`: owns *mut OpusDREDDecoder from opus_dred_decoder_create.
  Used for parsing DRED side-channel data out of arriving Opus packets.
  This is a SEPARATE libopus object from OpusDecoder — it has its own
  internal state. Freed via opus_dred_decoder_destroy on Drop.
- `DredState`: owns *mut OpusDRED from opus_dred_alloc (a fixed ~10.6 KB
  buffer per libopus 1.5). Holds parsed DRED data between the parse and
  reconstruct steps. Reusable — parse_into overwrites contents. Tracks
  samples_available as a cached u32 so callers don't thread the value
  separately. Freed via opus_dred_free on Drop.

New methods:
- `DredDecoderHandle::parse_into(&mut self, state: &mut DredState, packet)`
  wraps opus_dred_parse with max_dred_samples=48000 (1s max), sampling_rate
  =48000, defer_processing=0. Returns the positive sample offset of the
  first decodable DRED sample, 0 if no DRED is present, or an error.
  Populates state.samples_available so subsequent reconstruct calls know
  the valid offset range.
- `DecoderHandle::reconstruct_from_dred(&mut self, state, offset_samples,
  output)` wraps opus_decoder_dred_decode. Reconstructs audio at a specific
  sample position (positive, measured backward from the DRED anchor packet)
  into a caller-provided output buffer. Validates that 0 < offset_samples
  <= state.samples_available() before calling the FFI to catch range bugs.

Tests (+7, wzp-codec total: 68 passing):
- dred_decoder_handle_creates_and_drops
- dred_state_creates_and_drops
- dred_state_reset_zeroes_counter
- dred_parse_and_reconstruct_roundtrip — end-to-end validation. Encodes
  60 frames of a 300 Hz sine wave through a DRED-enabled Opus 24k encoder,
  parses DRED state out of each arriving packet, asserts that at least one
  packet carries non-zero samples_available (DRED warm-up completes within
  the first second), then reconstructs 20 ms of audio from inside the
  window and asserts non-zero total energy. This is the hard signal that
  the full libopus 1.5 DRED FFI chain is correctly wired on our side.
- reconstruct_with_out_of_range_offset_errors — offset > samples_available
  is rejected at the Rust layer before the FFI call.
- reconstruct_with_zero_offset_errors — offset <= 0 rejected.
- dred_parse_empty_packet_returns_zero — graceful handling of empty input.

Architectural note (divergence from PRD's literal wording):
The PRD said "jitter buffer takes a Box<dyn DredReconstructor>". After
checking Cargo.toml for wzp-transport, it does NOT depend on wzp-codec —
only wzp-proto. Adding a DRED state ring inside the jitter buffer would
require a new cross-crate dependency and couple the codec-agnostic jitter
buffer to libopus internals. Instead, Phase 3b will put the DRED state
ring and reconstruction dispatch in CallDecoder (one layer up from the
jitter buffer), intercepting the existing PlayoutResult::Missing signal
and attempting reconstruction before falling through to classical PLC.
The jitter buffer itself stays unchanged. Same lookahead/backfill
semantics, cleaner layering. PRD's intent preserved, implementation
refined.

Verification:
- cargo check --workspace: zero errors
- cargo test -p wzp-codec --lib: 68 passing (61 Phase 2 baseline + 7 new)
- The roundtrip test is the acceptance gate — it proves that
  opus_dred_decoder_create, opus_dred_alloc, opus_dred_parse, and
  opus_decoder_dred_decode all work correctly through our wrappers on
  real libopus 1.5.2 output.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 17:51:15 +04:00
Siavash Sameni
d5c298d0b5 feat(codec): Phase 2 — remove RaptorQ from Opus tiers, Codec2 unchanged
Phase 2 of the DRED integration (docs/PRD-dred-integration.md). With
Phase 1 having enabled DRED on every Opus profile, the app-level RaptorQ
layer is now redundant overhead on those tiers: +20% bitrate, +40–100 ms
receive-side latency (block wait), +CPU for stats we never used. This
phase removes RaptorQ from the Opus encode and decode paths on both the
desktop (wzp-client/call.rs) and Android (wzp-android/engine.rs) sides.
Codec2 tiers keep RaptorQ with their current ratios unchanged — DRED is
libopus-only and Codec2 has no neural equivalent.

Encoder changes (the real bandwidth / CPU win):
- CallEncoder::encode_frame and engine.rs encode loop now gate the
  RaptorQ path on !codec.is_opus():
    - Opus source packets emit fec_block=0, fec_symbol=0,
      fec_ratio_encoded=0 in the MediaHeader
    - fec_enc.add_source_symbol is skipped on Opus
    - generate_repair + repair packet emission is skipped on Opus
    - block_id and frame_in_block counters stay frozen at 0 for Opus
- Codec2 path is byte-for-byte identical to pre-Phase-2 behavior.

Decoder changes (mostly cleanup, since both live decoder paths were
already reading audio directly from source packets and only using the
RaptorQ decoder output for stats):
- CallDecoder::ingest skips fec_dec.add_symbol on Opus packets. Source
  packets still flow to the jitter buffer; Opus repair packets from old
  senders are dropped cleanly (repair packets never hit the jitter
  buffer either).
- engine.rs recv loop skips fec_dec.add_symbol, fec_dec.try_decode, and
  fec_dec.expire_before on Opus packets. The `fec_recovered` stat
  counter becomes Codec2-only (a separate DRED reconstruction counter
  lands in Phase 4).

Wire-format backward compat verified at pre-flight:
- Old receiver + new sender: engine.rs pipeline.rs path gates on
  non-zero fec_block/fec_symbol which now never fire for Opus, so the
  RaptorQ decoder simply isn't fed. Audio flows normally. Desktop
  CallDecoder's old path accumulated packets into the stale-eviction
  HashMap, which cleans up after 2s — harmless.
- New receiver + old sender: new receiver skips RaptorQ on Opus so
  old-sender repair packets are ignored entirely (no crash, no double-
  decode). Loses the (previously vestigial) RaptorQ recovery benefit,
  which was never actually active in the audio path. Source packets
  still decode normally.
- No wire format version bump required. MediaHeader is unchanged; we
  just zero the FEC fields on Opus packets.

Test changes:
- Removed `encoder_generates_repair_on_full_block` — asserted the old
  (pre-Phase-2) RaptorQ-on-Opus behavior and is now incorrect. Replaced
  with two symmetric tests:
    - `opus_source_packets_have_zero_fec_header_fields` — verifies
      Phase 2 invariants on Opus packets
    - `opus_encoder_never_emits_repair_packets` — runs 20 frames of
      non-silent sine wave through a GOOD-profile encoder, asserts
      exactly 20 output packets, zero repair
    - `codec2_encoder_generates_repair_on_full_block` — same shape as
      the old test but on CATASTROPHIC profile (Codec2 1200, 8
      frames/block, ratio 1.0) to verify Codec2 path still emits
      repairs as before

Verification:
- cargo check --workspace: zero errors
- cargo test -p wzp-codec --lib: 61 passing (Phase 1 baseline held)
- cargo test -p wzp-client --lib: 32 passing (+3 new Phase 2 tests,
  -1 old test removed)
- cargo check -p wzp-android --lib: zero errors (host link of
  wzp-android tests fails on -llog per pre-existing Android-only
  build.rs, unrelated to this work; integration build via
  build-and-notify.sh will validate Android end-to-end)
- Pre-existing broken integration test in
  crates/wzp-client/tests/handshake_integration.rs (SignalMessage
  schema drift) is NOT caused by this commit — baseline had the same
  3 compile errors before Phase 2. Flagged as a separate cleanup task.

Expected observable effects on a real call:
- Opus 24k outgoing bitrate drops from ~28.8 kbps (ratio 0.2 RaptorQ)
  to ~25 kbps (base 24 kbps + DRED ~1–10 kbps signal-dependent)
- Opus receive-side latency drops ~40 ms on clean network (no more
  block wait — jitter buffer emits as soon as a source packet arrives)
- Codec2 calls show no latency or bitrate change

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 17:42:33 +04:00
Siavash Sameni
4090206909 feat(codec): Phase 1 — enable DRED on all Opus profiles, disable inband FEC
Phase 1 of the DRED integration (docs/PRD-dred-integration.md). The Opus
encoder now emits DRED (Deep REDundancy) bytes in every packet, carrying
a neural-coded history of recent audio that the decoder can use to
reconstruct loss bursts up to the configured window. Opus inband FEC
(LBRR) is disabled because DRED does the same job better and running both
wastes bitrate on overlapping protection.

Tiered DRED duration policy per PRD:
  Studio  (Opus 32k/48k/64k): 10 frames = 100 ms
  Normal  (Opus 16k/24k):     20 frames = 200 ms
  Degraded (Opus 6k):         50 frames = 500 ms

Each profile switch (via adaptive quality) updates the DRED duration to
match the new tier. A 5% packet_loss floor is applied whenever DRED is
active, because libopus 1.5 gates DRED emission on non-zero packet_loss.
Real loss measurements from the quality adapter override upward.

Escape hatch: AUDIO_USE_LEGACY_FEC=1 reverts the encoder to Phase 0
behavior (inband FEC Mode1, DRED off, no loss floor). Read once at
OpusEncoder::new; call-scoped, not re-read mid-call. Trait-level
set_inband_fec becomes a no-op in DRED mode to preserve the invariant
even if external callers forget.

Observations from the bitrate probe test (dred_mode_roundtrip_voice_pattern):
  DRED mode:   3649 bytes/sec (~29.2 kbps) on Opus 24k + 300 Hz sine
  Legacy mode: 2383 bytes/sec (~19.1 kbps)
  Delta:       +10.1 kbps

The delta is considerably larger than the "+1 kbps flat" figure I carried
into the PRD from hazy memory of published DRED benchmarks. Likely because
the input (300 Hz sine) is very compressible so the base Opus rate in
legacy mode is well below the 24 kbps target, making the delta look
disproportionate. Signal-dependent — real speech would probably show a
different ratio. If production telemetry shows the overhead is excessive,
we can cut DRED duration on the normal tier from 200 ms to 100 ms as a
first tuning lever. Not blocking Phase 1 since the test still passes
within the reasonable 2000–8000 bytes/sec bounds.

Test changes (+8 tests, total wzp-codec: 61 passing):
- dred_duration_for_studio_tiers_is_100ms  (per-profile policy)
- dred_duration_for_normal_tiers_is_200ms
- dred_duration_for_degraded_tier_is_500ms
- dred_duration_for_codec2_is_zero
- default_mode_is_dred_not_legacy  (sanity check on fresh construction)
- dred_mode_roundtrip_voice_pattern  (observes DRED bitrate, asserts bounds)
- profile_switch_refreshes_dred_duration  (verifies set_profile updates DRED)
- set_inband_fec_noop_in_dred_mode  (trait-level inband FEC no-op)

Verification:
- cargo check --workspace: zero errors, no new warnings
- cargo test -p wzp-codec: 61/61 passing (53 pre-Phase-1 baseline + 8 new)
- Empirical DRED bitrate observed via `rtk proxy cargo test
  dred_mode_roundtrip_voice_pattern -- --nocapture`

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 17:26:34 +04:00
Siavash Sameni
086a74782f feat(codec): Phase 0 — swap audiopus → opusic-c + opusic-sys (libopus 1.5.2)
Phase 0 of the DRED integration (docs/PRD-dred-integration.md). No behavior
change: inband FEC stays ON, no DRED, same bitrate, same quality. This
commit unblocks Phase 1+ by getting us onto libopus 1.5.2 where DRED lives.

Rationale for going straight to a custom DecoderHandle: opusic-c::Decoder's
inner *mut OpusDecoder pointer is pub(crate), so we cannot reach it for the
Phase 3 DRED reconstruction path. Running two parallel decoders (one for
audio, one for DRED) would drift because the DRED decoder wouldn't see
normal decode calls. Single unified DecoderHandle over raw opusic-sys is
the only correct architecture, so we build it in Phase 0 rather than
rewriting opus_dec.rs twice.

Changes:
- Cargo.toml (workspace + wzp-codec): remove audiopus 0.3.0-rc.0, add
  opusic-c 1.5.5 (bundled + dred features), opusic-sys 0.6.0 (bundled),
  bytemuck 1. Pinned exactly for reproducible libopus 1.5.2.
- opus_enc.rs: rewritten against opusic_c::Encoder. Argument order for
  Encoder::new swapped (Channels first). set_inband_fec(bool) now maps
  to InbandFec::Mode1 (the libopus 1.5 equivalent of 1.3's LBRR). encode
  uses bytemuck::cast_slice<i16,u16> at the &[u16] boundary.
- dred_ffi.rs (new): DecoderHandle wrapping *mut OpusDecoder directly via
  opusic-sys. Owns the allocation, frees on Drop. Exposes decode,
  decode_lost, and a pub(crate) as_raw_ptr() for the future Phase 3 DRED
  reconstruction. Send+Sync justified via &mut self access discipline.
- opus_dec.rs: rewritten as a thin AudioDecoder impl over DecoderHandle.
  Behavior identical to pre-swap.

Verification (Phase 0 acceptance gates):
- cargo check --workspace: clean (30 pre-existing warnings in jni_bridge.rs
  unrelated to this work; zero in changed files).
- cargo test -p wzp-codec: 53 tests pass (50 pre-swap + 6 new: 3 in
  dred_ffi.rs for DecoderHandle lifecycle, 3 in opus_enc.rs for version
  check and roundtrip).
- linked_libopus_is_1_5 test asserts opusic_c::version() contains "1.5" —
  hard signal that the swap landed correctly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 17:15:55 +04:00
Siavash Sameni
09259cd6b8 docs: add PRD for DRED integration and Opus-tier FEC simplification
Plans the libopus 1.5.2 upgrade (audiopus → opusic-c/opusic-sys), DRED
enablement with tiered durations (100/200/500ms studio/normal/degraded),
removal of RaptorQ and Opus inband FEC from the Opus tiers, jitter buffer
lookahead/backfill refactor, and runtime escape hatch for rollout safety.
RaptorQ + current ratios preserved on Codec2 tiers (no DRED there).

Includes pre-flight verification findings: opusic-c Decoder inner pointer
is inaccessible (requires unified opusic-sys DecoderHandle), libopus 1.5
DRED API semantics clarified against xiph/opus opus.h, wire-format
backward compat verified on both live receive paths.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 17:04:11 +04:00
Siavash Sameni
75bc72a884 docs: add BRANCH-android-rewrite.md and update ARCH/ADMIN/USER_GUIDE
Documents the android-rewrite branch story end-to-end:
- Why the Kotlin+JNI stack was abandoned (stack overflow, libcrypto
  TLS race, __init_tcb TCB leak, ring runtime reuse crash)
- The Tauri 2.x Mobile pivot that reuses the desktop codebase verbatim
- Android-specific pieces: wzp-native standalone cdylib loaded via
  libloading, android_audio.rs JVM routing, Oboe audio config quirks
- Build pipeline via build-tauri-android.sh + wzp-android-builder image
- Known quirks (API 34/36 coexistence, NDK path absolutes, etc.)

Also appends shared-doc sections (identical on both branches):
- ARCHITECTURE.md: "Audio Backend Architecture (Platform Matrix)"
  covering CPAL / VPIO / WASAPI / Oboe backends, selection matrix,
  the wzp-native cdylib rationale, and the vendored audiopus_sys fix.
- ADMINISTRATION.md: "Build Pipelines" with Docker images
  (wzp-android-builder, wzp-windows-builder), per-pipeline usage
  (Android APK, Linux x86_64, Windows .exe), the Hetzner Cloud
  alternative, ntfy/rustypaste integration, and credential locations.
- USER_GUIDE.md: "Direct 1:1 Calling (Desktop + Android)" covering
  history + recent contacts + deregister UI, and "Windows AEC
  Variants" explaining the AEC vs noAEC builds and driver caveats.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 15:20:12 +04:00
Siavash Sameni
6aa52accef feat(android): Tauri 2.x mobile build infrastructure
Adds infrastructure for building the Tauri 2.x Android app (the pivot
away from the Kotlin+JNI approach whose stack overflow / libcrypto TLS
crash / thread lifecycle hell is documented in the incident report):

- scripts/Dockerfile.android-builder: extended to support both the
  legacy Kotlin+JNI pipeline (cargo-ndk + Gradle) and the new Tauri
  mobile pipeline (tauri-cli + Node/npm). Adds Node.js 20 LTS, API
  level 36 + build-tools 35.0.0, and additional apt packages.
- scripts/build-tauri-android.sh: fire-and-forget remote build via
  Docker on SepehrHomeserverdk, with ntfy.sh notifications and
  rustypaste upload of the resulting APK. Mirrors the pattern of
  build-tauri-android-docker.sh but targets the new Tauri pipeline.
- docs/incident-tauri-android-init-tcb.md: postmortem of the Kotlin+JNI
  crash cascade that drove the Tauri mobile rewrite decision. Covers
  the __init_tcb / pthread_create bionic private symbol leak, the
  staticlib + cdylib crate-type interaction, the Dispatchers.IO 512 KB
  thread stack overflow, and the tokio runtime / libcrypto TLS race.
- scripts/mint-tmux.sh, scripts/prep-linux-mint.sh: general dev
  infrastructure (tmux + Linux Mint workstation prep scripts).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 15:06:46 +04:00
Siavash Sameni
d0c17317ea fix: generate seed if empty on register (fresh install), add JNI debug logging
Some checks failed
Mirror to GitHub / mirror (push) Failing after 41s
Build Release Binaries / build-amd64 (push) Failing after 3m38s
2026-04-09 10:21:59 +04:00
Siavash Sameni
5799d18aee debug: add tracing to nativeSignalConnect entry
Some checks failed
Mirror to GitHub / mirror (push) Failing after 36s
Build Release Binaries / build-amd64 (push) Failing after 3m46s
2026-04-09 10:17:13 +04:00
Siavash Sameni
46c9ee1be3 fix: single thread for entire signal lifecycle — runtime never dropped (libcrypto TLS fix)
Some checks failed
Mirror to GitHub / mirror (push) Failing after 37s
Build Release Binaries / build-amd64 (push) Failing after 3m52s
2026-04-09 10:11:33 +04:00
Siavash Sameni
b53eae9192 fix: split start() into connect+register (inline) + run() (separate thread) — avoids thread::spawn closure stack overflow
Some checks failed
Mirror to GitHub / mirror (push) Failing after 35s
Build Release Binaries / build-amd64 (push) Failing after 3m26s
2026-04-09 10:02:07 +04:00
Siavash Sameni
a3f54566d4 fix: call nativeSignalConnect from 8MB Java Thread, not Dispatchers.IO
Some checks failed
Mirror to GitHub / mirror (push) Failing after 39s
Build Release Binaries / build-amd64 (push) Failing after 3m54s
2026-04-09 09:50:30 +04:00
Siavash Sameni
76e9fe5e43 fix: single thread+runtime for signal lifecycle — avoids ring/libcrypto TLS conflict on pthread_exit
Some checks failed
Mirror to GitHub / mirror (push) Failing after 38s
Build Release Binaries / build-amd64 (push) Failing after 3m46s
2026-04-09 09:44:46 +04:00
Siavash Sameni
b0a89d4f39 docs: PRD for desktop direct calling backport + UI fixes
Some checks failed
Mirror to GitHub / mirror (push) Failing after 36s
Build Release Binaries / build-amd64 (push) Failing after 3m39s
2026-04-09 09:39:50 +04:00
Siavash Sameni
abc96e8887 refactor: separate SignalManager from WzpEngine for direct calling
Some checks failed
Mirror to GitHub / mirror (push) Failing after 40s
Build Release Binaries / build-amd64 (push) Failing after 3m40s
SignalManager (NEW):
- Dedicated Rust struct with its own QUIC connection to _signal
- Separate JNI handle (nativeSignalConnect/GetState/PlaceCall/etc)
- Kotlin wrapper polls state every 500ms via getState() JSON
- Lives independently of WzpEngine — survives across calls
- connect() blocks briefly on 8MB thread, then recv loop runs on dedicated thread

WzpEngine (CLEANED):
- Back to pure media-only role (audio, codec, FEC, jitter)
- Removed start_signaling/place_call/answer_call methods
- Removed signal_transport/signal_fingerprint from EngineState

CallViewModel:
- Two separate managers: signalManager (persistent) + engine (per-call)
- Two separate polling loops: signalPollJob + statsJob
- Auto-connect to media room when signal polling detects "setup" state
- hangupDirectCall() ends media but keeps signal alive

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 09:34:36 +04:00
Siavash Sameni
3a6ae61f8d fix: show real identity fingerprint (SHA-256 full format) on Android home screen
Some checks failed
Mirror to GitHub / mirror (push) Failing after 39s
Build Release Binaries / build-amd64 (push) Failing after 1m30s
2026-04-09 09:12:47 +04:00
Siavash Sameni
4c536d256b fix: install rustls crypto provider once in nativeInit, not per-thread (libcrypto TLS conflict)
Some checks failed
Mirror to GitHub / mirror (push) Failing after 38s
Build Release Binaries / build-amd64 (push) Failing after 4m18s
2026-04-09 09:07:40 +04:00
Siavash Sameni
b0ec9ff4ab fix: signal mode UI + place_call via stored signal transport
Some checks failed
Mirror to GitHub / mirror (push) Failing after 37s
Build Release Binaries / build-amd64 (push) Failing after 3m49s
- Don't set callState for signal-only states (prevents auto-join room)
- Store signal transport + fingerprint in EngineState after registration
- place_call/answer_call send directly via signal transport (not command channel)
- Spawn small threads for async signal sends (non-blocking)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 08:58:22 +04:00
Siavash Sameni
5855533a39 fix: start stats polling before blocking startSignaling call
Some checks failed
Mirror to GitHub / mirror (push) Failing after 39s
Build Release Binaries / build-amd64 (push) Failing after 3m46s
2026-04-09 08:38:06 +04:00
Siavash Sameni
ed09c2e8cc fix: use block_on pattern for signaling (same as start_call) — no thread::spawn
Some checks failed
Mirror to GitHub / mirror (push) Failing after 37s
Build Release Binaries / build-amd64 (push) Failing after 3m50s
2026-04-09 08:33:08 +04:00
Siavash Sameni
f44306cc17 fix: move ALL signaling code into JNI-spawned 8MB thread — zero Rust on caller stack
Some checks failed
Mirror to GitHub / mirror (push) Failing after 40s
Build Release Binaries / build-amd64 (push) Failing after 3m51s
2026-04-09 08:19:48 +04:00
Siavash Sameni
0b821585ab fix: call nativeStartSignaling from Java Thread with 8MB stack, not Kotlin IO dispatcher
Some checks failed
Mirror to GitHub / mirror (push) Failing after 38s
Build Release Binaries / build-amd64 (push) Failing after 3m32s
2026-04-09 08:10:22 +04:00
Siavash Sameni
faec332a8c fix: remove panic::catch_unwind from nativeStartSignaling — stack overflow on Android
Some checks failed
Mirror to GitHub / mirror (push) Failing after 42s
Build Release Binaries / build-amd64 (push) Failing after 3m28s
2026-04-09 08:04:47 +04:00
Siavash Sameni
fe9ae276dc fix: move all crypto/network work to spawned 8MB thread — Android stack too small
Some checks failed
Mirror to GitHub / mirror (push) Failing after 37s
Build Release Binaries / build-amd64 (push) Failing after 3m25s
2026-04-09 07:16:54 +04:00
Siavash Sameni
4fbf6770c4 fix: Android signal thread stack overflow + add version marker to UI
Some checks failed
Mirror to GitHub / mirror (push) Failing after 40s
Build Release Binaries / build-amd64 (push) Failing after 3m47s
- Spawn signaling on dedicated thread with 4MB stack instead of using
  Android's IO dispatcher thread (insufficient stack for tokio + QUIC)
- Add "direct-call-v1" version marker to home screen subtitle

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 07:10:07 +04:00
Siavash Sameni
30a893a73f fix: remove duplicate TextAlign import causing Android build failure
Some checks failed
Mirror to GitHub / mirror (push) Failing after 38s
Build Release Binaries / build-amd64 (push) Failing after 3m34s
2026-04-09 06:54:45 +04:00
Siavash Sameni
d46f3b1deb fix: show more Gradle output in build log for debugging
Some checks failed
Mirror to GitHub / mirror (push) Failing after 36s
Build Release Binaries / build-amd64 (push) Failing after 3m55s
2026-04-09 06:48:14 +04:00
Siavash Sameni
0d3f0d4dcb feat: Android UI for direct 1:1 calling
Some checks failed
Mirror to GitHub / mirror (push) Failing after 36s
Build Release Binaries / build-amd64 (push) Failing after 3m51s
- Mode toggle: "Room" vs "Direct Call" tabs on pre-connection screen
- Direct Call mode: Register button → registers on relay signal channel
- After registration: shows fingerprint dial pad + incoming call panel
- Incoming call: green Accept / red Reject buttons with caller info
- Ringing state display while waiting for callee
- CallSetup auto-connects to media room
- CallStats extended: sas_code, incoming_call_id/fp/alias fields
- CallViewModel: registerForCalls(), placeDirectCall(), answerIncomingCall()

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 06:18:07 +04:00
Siavash Sameni
c184d5e1f3 fix: build scripts use fetch+reset instead of pull to avoid ref lock errors
Some checks failed
Mirror to GitHub / mirror (push) Failing after 37s
Build Release Binaries / build-amd64 (push) Failing after 3m30s
git pull fails when refs are stale from concurrent builds. Switch to
git gc + git fetch + git reset --hard origin/branch for robustness.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 06:07:10 +04:00
Siavash Sameni
5d8e743cbf feat: Android engine + Kotlin API for direct 1:1 calling
Some checks failed
Mirror to GitHub / mirror (push) Failing after 35s
Build Release Binaries / build-amd64 (push) Failing after 3m47s
Rust engine:
- start_signaling(): persistent _signal connection, presence registration
- Signal recv loop: handles DirectCallOffer, CallRinging, CallSetup, Hangup
- New CallState variants: Registered, Ringing, IncomingCall
- Stats expose incoming_call_id, incoming_caller_fp, incoming_caller_alias, sas_code
- New EngineCommands: PlaceCall, AnswerCall, RejectCall

JNI bridge:
- nativeStartSignaling(relay, seed, token, alias)
- nativePlaceCall(targetFp)
- nativeAnswerCall(callId, mode)

Kotlin API (WzpEngine.kt):
- startSignaling(relay, seed, token, alias)
- placeCall(targetFingerprint)
- answerCall(callId, mode) — 0=Reject, 1=AcceptTrusted, 2=AcceptGeneric

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 06:02:48 +04:00
Siavash Sameni
6694aebfd9 fix: resolve 0.0.0.0 to connectable address in CallSetup relay_addr
Some checks failed
Mirror to GitHub / mirror (push) Failing after 35s
Build Release Binaries / build-amd64 (push) Failing after 3m36s
When relay listens on 0.0.0.0, derive the actual IP from the client's
connection address for the CallSetup message.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 05:56:19 +04:00
Siavash Sameni
d27e85ecf2 feat: SAS (Short Authentication String) for call identity verification
Some checks failed
Mirror to GitHub / mirror (push) Failing after 35s
Build Release Binaries / build-amd64 (push) Failing after 3m19s
Derive a 4-digit code from the shared DH secret via HKDF with label
"warzone-sas-code". Both peers compute the same code; a MITM relay
produces a different one. Users compare verbally during the call.

- CryptoSession::sas_code() -> Option<u32> on the trait
- ChaChaSession stores and returns the SAS
- HKDF derivation in WarzoneKeyExchange::derive_session()
- Tests: both peers match, MITM produces different code

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 05:48:08 +04:00
Siavash Sameni
39ac181d63 feat: ACL + capacity limit on call rooms, unified fingerprint format
Some checks failed
Mirror to GitHub / mirror (push) Failing after 37s
Build Release Binaries / build-amd64 (push) Failing after 3m38s
- Call rooms (call-*) restricted to the two authorized participants only
- Room capacity enforced at 2 for call rooms
- Unauthorized clients get immediate connection close
- Unified fingerprint format: SHA-256(Ed25519 pub)[:16] as xxxx:xxxx:...
  Used consistently in signal registration, handshake, and ACL checks

Tested: Alice+Bob authorized, attacker rejected with "not authorized"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 05:43:03 +04:00
Siavash Sameni
3351cb6473 feat: direct 1:1 calling via relay signaling (Phase 1)
Some checks failed
Mirror to GitHub / mirror (push) Failing after 35s
Build Release Binaries / build-amd64 (push) Failing after 3m43s
New feature: call someone directly by fingerprint through the relay.

- Client connects with SNI "_signal" for persistent signaling
- RegisterPresence/RegisterPresenceAck for relay registration
- DirectCallOffer routed to target by fingerprint
- DirectCallAnswer with AcceptGeneric/AcceptTrusted/Reject modes
- Relay creates private room (call-{id}), sends CallSetup to both
- Both clients connect to private room for media (existing SFU path)
- Hangup forwarding + cleanup on disconnect
- Desktop CLI: --signal + --call <fingerprint> for testing
- CallRegistry tracks call state (Pending/Ringing/Active/Ended)
- SignalHub manages persistent signaling connections

Tested: Alice calls Bob by fingerprint, relay routes offer, Bob
auto-accepts, both join private room, media flows bidirectionally.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 05:35:16 +04:00
Siavash Sameni
54a4d91f3e docs: add --event-log, --version-check, and federation troubleshooting to admin guide
Some checks failed
Mirror to GitHub / mirror (push) Failing after 35s
Build Release Binaries / build-amd64 (push) Failing after 3m32s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 04:43:37 +04:00
Siavash Sameni
3b962bd4cb fix: build scripts use git reset --hard before pull to recover from dirty state
Some checks failed
Mirror to GitHub / mirror (push) Failing after 1m14s
Build Release Binaries / build-amd64 (push) Failing after 4m13s
Cargo.lock changes from Docker builds caused pull conflicts. Now uses
reset --hard + clean -fd to guarantee clean state before pulling.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 22:13:26 +04:00
Siavash Sameni
1118eac752 fix: re-enable FEC + time-based dedup for federation
Some checks failed
Mirror to GitHub / mirror (push) Failing after 2m7s
Build Release Binaries / build-amd64 (push) Has been cancelled
Restore fec_ratio=0.2 on GOOD profile. Time-based dedup (2s TTL) with
payload hash prevents consecutive sender collisions while still catching
multi-path duplicates. Verified: 6 consecutive senders across 2 relays,
0 decode errors, 0 drops, FEC active.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 22:09:15 +04:00
Siavash Sameni
f935bd69cd fix: rewrite seq/fec for federation-delivered packets
Some checks failed
Build Release Binaries / build-amd64 (push) Failing after 2m48s
Mirror to GitHub / mirror (push) Failing after 4m2s
- Time-based dedup (2s TTL) replaces fixed-window dedup — consecutive
  senders with same seq numbers no longer collide
- Raw byte forwarding for federation local delivery (no re-serialization)
- Jitter buffer resets on large backward seq jumps (>100)
- recv_media skips malformed datagrams instead of returning connection-closed
- SIGTERM handler for clean QUIC shutdown on wzp-client
- JSONL event log infrastructure (--event-log flag) for protocol analysis
- FEC disabled on GOOD profile for federation debugging (fec_ratio=0.0)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 21:55:06 +04:00
Siavash Sameni
1c684f6b47 fix: rewrite seq/fec for federation-delivered packets
Some checks failed
Mirror to GitHub / mirror (push) Failing after 35s
Build Release Binaries / build-amd64 (push) Failing after 1m59s
Federation media from different senders had conflicting seq numbers,
FEC block IDs, and Opus decoder state. The relay now assigns fresh
monotonic seq/fec_block/fec_symbol to all federation-delivered packets,
ensuring clients see a clean continuous stream regardless of sender changes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 15:48:55 +04:00
Siavash Sameni
c92db7e9b7 fix: preserve original relay label through multi-hop presence propagation
Some checks failed
Mirror to GitHub / mirror (push) Failing after 35s
Build Release Binaries / build-amd64 (push) Failing after 7m26s
When propagating GlobalRoomActive to other peers, use tagged participants
(with relay_label set to the originating relay) instead of the raw
untagged participants. This shows "Relay C" instead of "Relay B" when
C's participants are forwarded through hub B to A.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 15:34:22 +04:00
Siavash Sameni
c3bd657224 fix: FEC decoder resets stale blocks — fixes consecutive federation connects
Some checks failed
Mirror to GitHub / mirror (push) Failing after 36s
Build Release Binaries / build-amd64 (push) Failing after 2m0s
When a new sender reuses the same block_id values as a previous sender,
the FEC decoder was silently dropping all data because blocks were marked
as "already decoded". Now blocks older than 2 seconds are automatically
reset when new data arrives for them.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 15:26:00 +04:00
Siavash Sameni
8b79cdc6fc fix: dedup filter collision between different senders + build scripts default --pull
Some checks failed
Mirror to GitHub / mirror (push) Failing after 35s
Build Release Binaries / build-amd64 (push) Failing after 1m53s
- Dedup key now includes source peer fingerprint hash, preventing
  packets from different senders with same room+seq from being dropped
  as duplicates (was silently killing all multi-hop audio)
- Build scripts default to --pull (use --no-pull to skip)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 15:18:52 +04:00
Siavash Sameni
2eab56beec fix: federation presence dedup, stale cleanup, and Android SIGSEGV crash
Some checks failed
Mirror to GitHub / mirror (push) Failing after 29s
Build Release Binaries / build-amd64 (push) Failing after 1m57s
- Deduplicate remote participants by fingerprint in all merge sites
  (canonical == raw room name caused double-lookup, doubling every remote participant)
- GlobalRoomInactive now propagates updated participant list to other peers
  (hub relay B was not informing A when C's participants left)
- Add 15-second stale presence sweeper that purges remote participants
  from peers that stop sending data (safety net for QUIC timeout delays)
- Add @Synchronized to WzpEngine.getStats/stopCall/destroy to prevent
  TOCTOU race between stats polling coroutine and engine teardown (SIGSEGV)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 15:07:59 +04:00
Siavash Sameni
7dadc1ddd6 fix: default room 'general', cap auto codec at 24k
Some checks failed
Mirror to GitHub / mirror (push) Failing after 36s
Build Release Binaries / build-amd64 (push) Failing after 1m51s
- Android default room changed from 'android' to 'general'
- Relay choose_profile capped at GOOD (Opus 24k) — studio tiers
  (32k/48k/64k) cause high packet loss on federation paths due to
  larger datagrams exceeding path MTU. Will re-enable after MTU
  discovery is implemented.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 14:41:12 +04:00
Siavash Sameni
be0441295a fix: read git hash outside Docker for Linux build ntfy notification
Some checks failed
Mirror to GitHub / mirror (push) Failing after 39s
Build Release Binaries / build-amd64 (push) Failing after 2m1s
The hash was read inside Docker (/build/source) where .git doesn't
exist. Now reads from $BASE_DIR/data/source before Docker runs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 14:32:03 +04:00
Siavash Sameni
b9f4e7f102 feat: include git hash in ntfy build notifications + MTU PRD
Some checks failed
Mirror to GitHub / mirror (push) Failing after 29s
Build Release Binaries / build-amd64 (push) Has been cancelled
ntfy messages now show: "WZP Linux [abc1234] ready!" and
"WZP Android [abc1234] done! APK: url" so you can verify which
commit was built without checking relay version remotely.

Also added PRD-mtu-discovery.md for QUIC Path MTU Discovery.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 14:26:13 +04:00
Siavash Sameni
28f4a0fb6f fix: multi-hop presence — propagate remote rooms on new peer connect
Some checks failed
Mirror to GitHub / mirror (push) Failing after 36s
Build Release Binaries / build-amd64 (push) Failing after 2m35s
When a new federation link is established, announce not only LOCAL
global rooms but also rooms from OTHER peers (remote_participants).
This fixes multi-hop: when R2 connects to R3, R2 tells R3 about
R1's rooms that R2 learned about earlier.

Previously, only local rooms were announced on link setup. If R1
had a client but R2 had no clients, R2 wouldn't tell R3 about R1.

Also added diagnostic logging for room announcements on link setup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 13:43:15 +04:00
Siavash Sameni
3d76acf528 fix: multi-hop federation — hub relay forwards without local participants
Some checks failed
Mirror to GitHub / mirror (push) Failing after 36s
Build Release Binaries / build-amd64 (push) Failing after 2m18s
Three fixes for 3-relay chain (R1→R2→R3):

1. Room lookup in handle_datagram: hub relay (R2) has no local
   participants, so active_rooms() was empty and datagrams were
   silently dropped. Now also checks global_rooms config directly,
   allowing hub relays to forward without local clients.

2. Multi-hop forwarding: removed active_rooms filter — forward to
   ALL connected peers except source. The receiving peer decides
   whether to deliver or forward further.

3. Android relay_label: native RoomMember now includes relay_label
   from RoomUpdate signal. Kotlin UI reads it for relay grouping.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 13:33:44 +04:00
Siavash Sameni
f4b5996bdf feat: Android relay-grouped participant list matching desktop
Some checks failed
Mirror to GitHub / mirror (push) Failing after 39s
Build Release Binaries / build-amd64 (push) Failing after 2m4s
Participants now grouped by relay on Android:
- Green dot + "THIS RELAY" for local participants
- Blue dot + relay label for federated participants

Added relayLabel to RoomMember data class, parsed from
relay_label JSON field. UI groups and renders with headers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 13:15:12 +04:00
Siavash Sameni
fc721c4217 fix: clear stale federated presence on GlobalRoomInactive
Some checks failed
Mirror to GitHub / mirror (push) Failing after 34s
Build Release Binaries / build-amd64 (push) Failing after 7m37s
When a remote relay's room goes inactive (all participants left),
the receiving relay now:
1. Clears remote_participants for that peer+room
2. Broadcasts updated RoomUpdate to local clients with the remote
   participant removed
3. Updates federation_active_rooms metric

Previously, remote participants lingered in the participant list
after disconnect, causing ghost entries and stale media forwarding.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 13:06:48 +04:00
Siavash Sameni
5c24adf1c1 feat: remote version query — wzp-client --version-check <relay>
Some checks failed
Mirror to GitHub / mirror (push) Failing after 1m32s
Build Release Binaries / build-amd64 (push) Failing after 2m16s
Connects to a relay over QUIC with SNI "version", reads build hash
from a unidirectional stream, prints "<relay> <git-hash>" and exits.

Usage: wzp-client --version-check 172.16.81.175:4434
Output: 172.16.81.175:4434 8dbda3e

Relay side: detects "version" SNI, opens uni stream, writes
BUILD_GIT_HASH, waits 100ms for client to read, closes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 12:47:37 +04:00
Siavash Sameni
8dbda3e052 feat: --version flag with git hash + test script kill fix
Some checks failed
Build Release Binaries / build-amd64 (push) Failing after 2m9s
Mirror to GitHub / mirror (push) Failing after 32s
wzp-relay --version prints "wzp-relay <short-git-hash>".
Build hash also logged on startup: version=abc1234.
Enables verifying deployed relay matches expected build.

Also fixed federation-test.sh: use kill -INT (not SIGTERM) so
clients save recordings before exit. Added save delay.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 12:36:33 +04:00
Siavash Sameni
c8a3aaacb6 feat: comprehensive federation test harness
Some checks failed
Mirror to GitHub / mirror (push) Failing after 35s
Build Release Binaries / build-amd64 (push) Failing after 1m58s
7 test scenarios across 3 relays:
1. Basic 2-relay audio (A→B)
2. Reverse direction (B→A)
3. 3-relay chain (A→B→C)
4. File playback (60s test audio)
5. Reconnection (join/leave/rejoin)
6. Multi-participant (3 users on 3 relays)
7. Simultaneous senders (2 senders, 1 recorder)

Usage: ./scripts/federation-test.sh <relay1> <relay2> <relay3>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 12:19:15 +04:00
Siavash Sameni
54cb6c3b71 feat: relay_label in RoomParticipant + tagged remote participants
Some checks failed
Mirror to GitHub / mirror (push) Failing after 44s
Build Release Binaries / build-amd64 (push) Failing after 2m26s
RoomParticipant.relay_label identifies which relay a participant is
connected to. Local participants have None, federated participants
get tagged with the peer relay's label when storing remote_participants.

This enables clients to group participants by relay in the UI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 11:22:53 +04:00
Siavash Sameni
a3ebf5616f fix: unified raw room names + merged presence on join
Some checks failed
Mirror to GitHub / mirror (push) Failing after 42s
Build Release Binaries / build-amd64 (push) Failing after 2m1s
1. CLI client now sends raw room names (no hash), matching Android
   JNI and Desktop Tauri. All three clients are now consistent.

2. When a client joins a global room, the relay merges federated
   remote participants into the initial RoomUpdate. Previously,
   clients that joined after the GlobalRoomActive signal only saw
   local participants. Now they see everyone immediately.

3. Added get_remote_participants() to FederationManager for querying
   cached remote participants from all peer links.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 11:09:15 +04:00
Siavash Sameni
ff6d0444c0 feat: federation Prometheus metrics — peer status, packets, active rooms
Some checks failed
Mirror to GitHub / mirror (push) Failing after 35s
Build Release Binaries / build-amd64 (push) Failing after 2m8s
Wires up the existing RelayMetrics federation fields:
- wzp_federation_peer_status{peer} — 1=connected, 0=disconnected
- wzp_federation_packets_forwarded_total{peer,direction} — in/out counts
- wzp_federation_active_rooms — number of active federated rooms

These are critical for monitoring federation health and will feed into
the adaptive codec selection system (PRD-coordinated-codec.md).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 11:00:13 +04:00
Siavash Sameni
8080713098 feat: federated presence — RoomUpdate includes remote participants
Some checks failed
Mirror to GitHub / mirror (push) Failing after 42s
Build Release Binaries / build-amd64 (push) Failing after 2m29s
GlobalRoomActive signal now carries participant list from the
announcing relay. When received, the relay:
1. Stores remote participants per peer link
2. Broadcasts merged RoomUpdate to local clients (local + all remote)

This means clients on different relays can now SEE each other in the
participant list. Also fixes build: removed non-existent metric field
references that were added by linter.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 10:52:27 +04:00
Siavash Sameni
e813362395 feat: federation metrics + dedup + rate limiting
Some checks failed
Mirror to GitHub / mirror (push) Failing after 33s
Build Release Binaries / build-amd64 (push) Failing after 1m53s
Add Prometheus metrics for federation links (per-peer RTT, packet
counters, active rooms gauge, dedup/rate-limit drop counters).

Add dedup filter (4096-entry ring buffer) to drop duplicate packets
arriving via multiple federation paths. Add per-room token bucket
rate limiter (500 pps) to prevent amplification.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 10:36:26 +04:00
Siavash Sameni
d52b8befd6 fix: canonical room hash for federation — handles hashed vs raw room names
Some checks failed
Mirror to GitHub / mirror (push) Failing after 36s
Build Release Binaries / build-amd64 (push) Failing after 2m13s
Different clients send different room names:
- Android: raw "general" as SNI
- Desktop: hash_room_name("general") = "f09ae11d..." as SNI

Federation datagrams are tagged with an 8-byte room hash. Previously,
each relay computed the hash from the client-provided room name,
causing mismatches between relays with different client types.

Fix: resolve_global_room() maps any room name (raw or hashed) to the
canonical [[global_rooms]] name. global_room_hash() always uses the
canonical name for federation hashing. handle_datagram uses both raw
and canonical hash matching to find the local room.

Also: run_participant now receives the pre-computed federation_room_hash
so the egress uses the canonical hash, not the client-specific name.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 10:31:26 +04:00
Siavash Sameni
0abecf7fd8 feat: adaptive quality engine + codec indicator UI
Some checks failed
Mirror to GitHub / mirror (push) Failing after 38s
Build Release Binaries / build-amd64 (push) Failing after 2m17s
Wire AdaptiveQualityController into Android engine for auto codec
switching based on network quality reports. Add color-coded TX/RX
codec badges to the in-call screen showing active codecs and Auto mode.

- Recv task: ingest QualityReports, feed to controller, signal profile
  changes via AtomicU8 to send task
- Send task: check for pending profile switch at frame boundaries,
  update encoder/FEC/frame size
- Track peer codec from incoming packet headers
- Kotlin UI: codec badges (blue=studio, green=good, amber=degraded,
  red=catastrophic) with Auto tag
- Add .taskmaster to .gitignore

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 10:19:11 +04:00
Siavash Sameni
f4cc3b1a6b fix: forward media to ALL connected peers, not just those with room active
Some checks failed
Mirror to GitHub / mirror (push) Failing after 38s
Build Release Binaries / build-amd64 (push) Failing after 2m14s
The bug: when a local client joins a global room and sends media, the
egress task checked peer_links.active_rooms to decide where to forward.
But active_rooms tracks what PEERS announced (their rooms), not what
WE announced. So our own GlobalRoomActive signal went out but our
peer_links had empty active_rooms — media was dropped.

Fix: for locally-originated media, send to ALL connected federation
peers unconditionally. The receiving relay decides whether to deliver
to local participants (if it has the room) or forward further. This
is correct because federation peers are explicitly configured — if
they're connected, they should receive global room media.

Multi-hop forwarding (handle_datagram) still filters by active_rooms
to prevent loops — only forwards to peers that announced the room.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 10:09:50 +04:00
Siavash Sameni
af4c89f5f0 docs: PRD for delegated trust in relay federation
Some checks failed
Mirror to GitHub / mirror (push) Failing after 37s
Build Release Binaries / build-amd64 (push) Failing after 1m51s
Addresses the trust gap where a hub relay can forward media from
unknown relays without the receiving relay's consent. Introduces
delegate=true flag on [[trusted]] entries: when set, the relay
accepts media forwarded through the trusted peer from relays it
vouches for. Without delegate, only direct media is accepted.

Covers: FederationTrustChain signal, origin authorization checks,
TTL for chain depth limiting, anti-spam properties. 5 phases, ~3 days.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 10:00:21 +04:00
Siavash Sameni
406461d460 feat: personalized config generation with --listen addr + own fingerprint
Some checks failed
Mirror to GitHub / mirror (push) Failing after 39s
Build Release Binaries / build-amd64 (push) Failing after 3m16s
When --config points to a non-existent file, the relay now generates
a personalized example config that includes:
- listen_addr matching the --listen flag (not hardcoded 0.0.0.0:4433)
- Pre-filled [[peers]] section with this relay's detected IP, port,
  and TLS fingerprint — ready to copy/paste into other relay configs

This makes setting up federation much easier: start each relay, it
generates its config with its own peering info commented out, you
just uncomment and copy between configs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 09:38:28 +04:00
Siavash Sameni
7064f484af feat: -c/--config and -i/--identity flags for multi-instance relay
Some checks failed
Mirror to GitHub / mirror (push) Failing after 36s
Build Release Binaries / build-amd64 (push) Failing after 2m17s
Enables running multiple relays on the same machine:
  wzp-relay -c ~/.wzp1/config.toml -i ~/.wzp1/relay-identity --listen :4433
  wzp-relay -c ~/.wzp2/config.toml -i ~/.wzp2/relay-identity --listen :4434
  wzp-relay -c ~/.wzp3/config.toml -i ~/.wzp3/relay-identity --listen :4435

Config auto-creation: if the config file doesn't exist, writes an
example config with all fields documented and commented. The relay
starts with defaults but the file is ready to edit.

Identity auto-generation: if the identity file doesn't exist, generates
a new random seed (OsRng via wzp_crypto::Seed::generate) and saves it.
Subsequent starts load the same identity.

Short flags: -c for --config, -i for --identity.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 09:18:48 +04:00
Siavash Sameni
1d2222a25a debug: add datagram receive + multi-hop forward error logging
Some checks failed
Mirror to GitHub / mirror (push) Failing after 34s
Build Release Binaries / build-amd64 (push) Failing after 2m28s
Added logging to trace federation media flow:
- media_task logs first + every 250th received datagram (count, len)
- handle_datagram multi-hop forward logs errors (was silently dropped)
- forward_to_peers logs when no peer matches

2-relay (A→B): WORKING — full audio received, 300 packets forwarded
3-relay (A→B→C): B receives datagrams from A but only 1 arrives —
  remaining packets not received, likely a QUIC read_datagram issue
  when handle_datagram holds locks during processing. Needs further
  investigation into async lock contention or datagram buffering.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 08:45:54 +04:00
Siavash Sameni
270e139f20 feat: federation media forwarding WORKING — global rooms router model complete
Some checks failed
Mirror to GitHub / mirror (push) Failing after 38s
Build Release Binaries / build-amd64 (push) Failing after 1m58s
2-relay test: 5.0s audio, RMS 4748, PASS. Full pipeline verified:
- Room correctly identified as global (hash matching works)
- Federation egress channel created and connected
- GlobalRoomActive signals exchanged between peers
- 300 packets (250 source + 50 FEC) forwarded via tagged datagrams
- Client B on relay B received full 5-second tone from client A on relay A

Added debug logging: is_global check, egress channel creation, per-peer
forwarding with active_rooms diagnostic when no match found. Also logs
egress packet count (first + every 250th).

Multi-hop propagation: GlobalRoomActive signals forwarded to other peers
so A→B→C chain knows about rooms across the full mesh.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 08:31:37 +04:00
Siavash Sameni
d9b2e0fd53 docs: comprehensive documentation — design, architecture, admin, user guide
Some checks failed
Mirror to GitHub / mirror (push) Failing after 36s
Build Release Binaries / build-amd64 (push) Failing after 1m58s
4 files, 2,511 lines covering the entire WarzonePhone project:

DESIGN.md (591 lines): system overview, codec system (9 variants),
FEC (RaptorQ), transport (QUIC/quinn), security (Ed25519/X25519/
ChaCha20/HKDF/BIP39/TOFU), federation (global rooms), jitter buffer.
Mermaid diagrams for audio pipelines and crate dependencies.

ARCHITECTURE.md (874 lines): 15 mermaid diagrams — system overview,
encode/decode pipelines, relay SFU, federation topology/protocol,
signal handshake, client architectures (desktop/android/CLI), wire
format tables (MediaHeader/MiniHeader/QualityReport), project tree.

ADMINISTRATION.md (587 lines): relay deployment (binary/Docker/systemd),
complete TOML config reference, CLI flags table, federation setup
(peers/trusted/global_rooms), 3 example configs, Prometheus metrics,
auth, identity persistence, 12-item troubleshooting guide.

USER_GUIDE.md (459 lines): all clients — desktop (settings, quality
slider, key warning, shortcuts), Android (8-level quality slider,
server management, identity backup), CLI (flags table, 8 usage
patterns). Identity system, quality profiles when-to-use guide.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 08:21:13 +04:00
Siavash Sameni
898c1ea32b docs: PRDs for P2P direct calls and coordinated codec switching
Some checks failed
Mirror to GitHub / mirror (push) Failing after 36s
Build Release Binaries / build-amd64 (push) Failing after 1m52s
PRD-p2p-direct.md: STUN-based NAT traversal for direct QUIC
connections between clients. True E2E with mutual TLS cert pinning
via identity fingerprints. Hybrid mode: try P2P, fall back to relay.
4 phases: STUN discovery, hole punching, P2P adaptive quality,
seamless relay-to-P2P migration.

PRD-coordinated-codec.md: Relay acts as quality judge — monitors
per-participant loss/RTT/jitter, sends quality directives. Downgrade
is immediate (match weakest link), upgrade is consensual (all
participants must agree, synchronized switch at agreed timestamp).
Covers asymmetric encoding in SFU and P2P→relay backporting strategy.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 08:12:12 +04:00
Siavash Sameni
b00db5dfdc feat: federation rewrite — global rooms router model
Some checks failed
Mirror to GitHub / mirror (push) Failing after 36s
Build Release Binaries / build-amd64 (push) Failing after 1m52s
Major rewrite of relay federation replacing virtual participants with
a clean router model:

1. Global rooms: [[global_rooms]] in TOML config declares rooms that
   are bridged across federation. Each relay is a router + local SFU.

2. Room events: RoomManager emits LocalJoin/LocalLeave via broadcast
   channel when rooms transition between empty and non-empty.

3. GlobalRoomActive/Inactive signals: relays announce when they have
   local participants in global rooms. Peers track active state and
   forward media accordingly. Announcements propagate for multi-hop.

4. Media forwarding: separated from SFU loop. Local participant sends
   via mpsc channel → egress task → forward_to_peers() → room-hash
   tagged datagrams to active peer links. Inbound datagrams delivered
   to local participants + forwarded to other active peers (multi-hop).

5. Loop prevention: don't forward back to source relay.

6. Room name hashing: is_global_room() checks both plain name and
   hash (clients hash room names for SNI privacy).

Removed: ParticipantSender::Federation, federated_participants, virtual
participant join/leave, periodic room polling. Rooms now only contain
local participants.

Signaling tested: 3-relay chain (A→B←C) correctly propagates
GlobalRoomActive through B to both A and C. Media forwarding plumbing
in place but needs final debugging.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 07:54:38 +04:00
Siavash Sameni
bc8bb3d790 feat: [[trusted]] config + FederationHello for one-sided federation
Some checks failed
Mirror to GitHub / mirror (push) Failing after 34s
Build Release Binaries / build-amd64 (push) Failing after 1m53s
- Added [[trusted]] config: relay B can accept inbound federation
  from relay A by fingerprint alone, without knowing A's address.
  A connects to B with [[peers]], B trusts A with [[trusted]].

- FederationHello signal: outbound connections send their TLS
  fingerprint as first signal. The accepting relay verifies it
  against [[peers]] (by IP) or [[trusted]] (by fingerprint).

- Tested 3-relay chain: A→B←C. Both A and C connect to B, B trusts
  both. B correctly accepts both inbound connections. Room
  announcements flow A→B and C→B.

- Remaining: B needs to announce rooms back to A and C on the same
  connection so media can flow A→B→C. Currently A has no virtual
  participant for B, so media doesn't reach B's SFU for forwarding.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 06:49:20 +04:00
Siavash Sameni
ea51d068e6 feat: --debug-tap for relay packet header logging
Some checks failed
Mirror to GitHub / mirror (push) Failing after 38s
Build Release Binaries / build-amd64 (push) Failing after 1m57s
Adds --debug-tap <room> flag (or debug_tap in TOML config) that logs
every media packet's header metadata passing through a room. Use '*'
for all rooms.

Output (via tracing target "debug_tap"):
  TAP room=... dir=in addr=... seq=31 codec=Opus24k ts=520
      fec_block=5 fec_sym=1 repair=false len=65 fan_out=1

Shows: direction, source address, sequence number, codec ID, timestamp,
FEC block/symbol, repair flag, payload size, and fan-out count.
No decryption needed — headers are not encrypted.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 06:34:22 +04:00
Siavash Sameni
7271942c6a feat: federation media forwarding working — audio crosses between relays
Some checks failed
Mirror to GitHub / mirror (push) Failing after 33s
Build Release Binaries / build-amd64 (push) Failing after 3m57s
Added debug logging to federation signal path. Fixed the announce/recv
flow: outbound link's announce_task sends FederationRoomJoin, peer's
inbound signal_task receives it and creates virtual participant.

Tested: two relays on localhost with mutual TOML config, client A
sends tone via relay A, client B records via relay B — audio received
through federation (0.1s/RMS 7291/PASS).

Room announcement delay is ~1s (poll interval). The full pipeline:
client join → room created → announce_task detects → sends signal →
peer receives → creates virtual participant → SFU loop forwards
media via room-hash-tagged datagrams → peer demuxes → local delivery.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 06:26:49 +04:00
Siavash Sameni
da84ed332c docs: PRD for protocol analyzer — relay debug tap + full analyzer tool
Some checks failed
Mirror to GitHub / mirror (push) Failing after 35s
Build Release Binaries / build-amd64 (push) Failing after 1m47s
Two tools:
1. --debug-tap on relay: logs packet header metadata (seq, codec, ts,
   FEC, repair, size) per room without decryption. 0.5 day effort.
2. wzp-analyzer standalone: joins room as observer, decodes audio,
   shows TUI with per-participant waveforms + quality stats + FEC
   recovery rates. Capture/replay and HTML reports. 5-8 days total.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 05:57:27 +04:00
Siavash Sameni
e50925e05a fix: IP-based peer matching for inbound federation + room announcements
Some checks failed
Mirror to GitHub / mirror (push) Failing after 37s
Build Release Binaries / build-amd64 (push) Failing after 1m53s
- Inbound federation connections now matched by source IP against
  configured peer URLs (QUIC clients don't present TLS certs, so
  fingerprint matching fails for inbound direction).
- Added periodic room announcement task (1s poll) that sends
  FederationRoomJoin to peers when new rooms appear with local
  participants. Handles rooms created after federation link is up.
- Added find_peer_by_addr() to FederationManager.

Federation link topology: each relay pair has 2 connections (outbound
from each side). Outbound sends signals, peer's inbound receives them.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 05:49:37 +04:00
Siavash Sameni
6be36e43c2 feat: relay federation infrastructure — room bridging, loop prevention, peer connections
Some checks failed
Mirror to GitHub / mirror (push) Failing after 36s
Build Release Binaries / build-amd64 (push) Failing after 2m1s
Phase 1 of relay federation:

1. Signal messages: FederationRoomJoin/Leave/ParticipantUpdate added
   to SignalMessage enum for relay-to-relay room coordination.

2. Room changes: ParticipantOrigin (Local/Federated) tracking, loop
   prevention (federated media only forwards to local participants),
   ParticipantSender::Federation with 8-byte room-hash prefixed
   datagrams, merged participant lists (local + remote), new methods:
   join_federated(), update_federated_participants(), local_senders(),
   active_rooms(), local_participants().

3. FederationManager: connects to configured peers via QUIC with SNI
   "_federation", reconnects with exponential backoff (5s-300s),
   exchanges FederationRoomJoin signals, runs recv loops for both
   signals and media datagrams, creates virtual participants in rooms.

4. Accept-side: _federation SNI handling in main.rs, unknown peer
   gets helpful "add to relay.toml" log message, recognized peers
   handed off to FederationManager.

TODO: TLS fingerprint verification — currently outbound connections
use client_config() which doesn't present a cert, so inbound
verification fails. Need mutual TLS or URL-based peer matching.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 22:30:18 +04:00
Siavash Sameni
2f2720802d feat: TOML config file with federation peers + --config flag
The relay now supports loading configuration from a TOML file via
--config <path>. CLI flags override TOML values. All fields have
serde defaults so a minimal config only needs what you want to change.

Example relay.toml:
  listen_addr = "0.0.0.0:4433"
  [[peers]]
  url = "193.180.213.68:4433"
  fingerprint = "1a:39:38:..."
  label = "Pangolin EU"

Federation hint on startup now shows TOML format with TLS fingerprint
(not Ed25519 identity fingerprint), since TLS fingerprint is what
peers actually verify. Configured peers are logged on startup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 22:13:56 +04:00
Siavash Sameni
087bfd2335 feat: deterministic TLS certificate from relay identity seed
The relay's TLS certificate is now derived from the persisted
Ed25519 seed via HKDF, so the same seed produces the same cert
and the same TLS fingerprint across restarts. This fixes the
"Server Key Changed" warnings on every relay restart.

Implementation: HKDF-SHA256(seed, "wzp-tls-ed25519") → Ed25519
signing key → PKCS8 DER → rcgen KeyPair → self-signed cert.

Also adds tls_fingerprint() helper (SHA-256 of DER cert, hex with
colons) and prints it on startup. This is the prerequisite for
relay federation (peers verify each other by TLS fingerprint).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 22:10:08 +04:00
Siavash Sameni
0a05e62c7f feat: relay prints federation peering config on startup
Some checks failed
Mirror to GitHub / mirror (push) Failing after 37s
Build Release Binaries / build-amd64 (push) Failing after 1m50s
On startup, the relay detects its outbound IP (via UDP socket trick)
and prints a ready-to-copy YAML snippet for other relays to federate:

  federation: to peer with this relay, add to peers config:
    - url: "193.180.213.68:4433"
      fingerprint: "a5d6:e3c6:..."

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 21:37:10 +04:00
Siavash Sameni
b97f32ce46 docs: PRD for relay federation (multi-relay mesh) + identity fix
Some checks failed
Mirror to GitHub / mirror (push) Failing after 36s
Build Release Binaries / build-amd64 (push) Failing after 1m53s
Documents the relay TLS identity bug (cert regenerates on restart
because server_config() creates a new keypair every time, ignoring
the persisted Ed25519 seed) and the full federation design:

- YAML config with mutual peer trust (url + fingerprint)
- QUIC connections between peers, fingerprint verification
- Room bridging: media forwarding for shared room names
- Merged participant presence across relays
- Helpful log message for unconfigured peer connection attempts
- No transcoding, no re-encryption, no central coordinator

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 21:33:05 +04:00
Siavash Sameni
d66d583583 docs: PRD for adaptive quality control (auto codec)
Some checks failed
Mirror to GitHub / mirror (push) Failing after 38s
Build Release Binaries / build-amd64 (push) Failing after 1m55s
Covers the full design for runtime codec switching based on network
conditions: 3-tier basic (GOOD/DEGRADED/CATASTROPHIC), extended
5-tier with studio levels, and bandwidth probing. Details the
existing QualityAdapter infrastructure, what's missing (report
ingestion, profile switch loop, cross-task signaling via AtomicU8),
and implementation plan for both Android and desktop engines.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 21:25:33 +04:00
Siavash Sameni
d06cf66538 fix: auto codec, force-ping button, relay delete button
Some checks failed
Mirror to GitHub / mirror (push) Failing after 36s
Build Release Binaries / build-amd64 (push) Failing after 1m57s
1. Auto codec: new "Auto" position on quality slider (JNI index 7).
   When selected, the engine uses the relay's chosen_profile from
   CallAnswer instead of the local preference. Slider now has 8
   positions: Studio 64k → Auto → Codec2 1.2k.

2. Force ping: added refresh button (↻) in Manage Relays dialog
   header. Calls pingAllServers() to re-check all relays on demand.

3. Delete relay fix: the X button was inside a Surface(onClick=...)
   which swallowed the touch event. Replaced with a separate Surface
   that properly intercepts the click.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 21:22:24 +04:00
Siavash Sameni
c8bcc5c974 fix: advertise studio profiles in handshake supported_profiles
Some checks failed
Build Release Binaries / build-amd64 (push) Failing after 2m7s
Mirror to GitHub / mirror (push) Failing after 35s
The CallOffer only advertised GOOD/DEGRADED/CATASTROPHIC. When a
client uses a studio profile, the relay's choose_profile couldn't
pick it. Now advertises all 6 profiles (studio 64k/48k/32k + good +
degraded + catastrophic) in both Android engine and shared handshake.

Also: the relay MUST be rebuilt with the new CodecId variants,
otherwise it will fail to deserialize CallOffer messages containing
studio QualityProfiles in supported_profiles.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 19:39:31 +04:00
Siavash Sameni
760126b6ab fix: remove duplicate Kotlin imports causing build failure
Some checks failed
Mirror to GitHub / mirror (push) Failing after 39s
Build Release Binaries / build-amd64 (push) Failing after 2m5s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 19:17:33 +04:00
Siavash Sameni
53f8bf8fff feat: full quality tiers + slider UI + key-change warning on Android
Some checks failed
Mirror to GitHub / mirror (push) Failing after 36s
Build Release Binaries / build-amd64 (push) Failing after 1m52s
1. Wire protocol: add Opus 32k/48k/64k (CodecId 6/7/8) + STUDIO
   profiles with is_opus() helper. Opus enc/dec accept all Opus variants.

2. JNI bridge: expand profile_from_int to 7 levels (0-6) mapping to
   GOOD, DEGRADED, CATASTROPHIC, Codec2_3200, STUDIO_32K/48K/64K.

3. Settings UI: replace radio buttons with Material3 Slider — 7 stops
   from Studio 64k (green) to Codec2 1.2k (dark red), matching desktop.

4. Key-change warning: AlertDialog on connect when server fingerprint
   has changed. Shows old vs new fingerprint, Accept New Key or Cancel.
   Accepting saves the new fingerprint and proceeds with the call.

5. Engine recv: handle studio codec IDs in auto-switch path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 19:11:29 +04:00
Siavash Sameni
b3cdad0c75 fix: copy libc++_shared.so from NDK when cargo-ndk skips it
Some checks failed
Mirror to GitHub / mirror (push) Failing after 37s
Build Release Binaries / build-amd64 (push) Failing after 1m58s
cargo-ndk doesn't always copy libc++_shared.so into jniLibs. The
build script now finds it in the NDK and copies it manually if
missing, preventing the build check from failing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 18:06:28 +04:00
Siavash Sameni
fa3c7f1cef fix: dynamic frame sizing for non-default quality profiles on Android
Some checks failed
Mirror to GitHub / mirror (push) Failing after 36s
Build Release Binaries / build-amd64 (push) Failing after 1m58s
The send loop was hardcoded to 960 samples (20ms/Opus24k), causing
DEGRADED (Opus 6k, 40ms) and CATASTROPHIC (Codec2 1200, 40ms) to
fail — the encoder needed 1920 samples but only got 960.

Changes:
- capture_buf, ring read threshold, and timestamp increment are now
  computed from profile.frame_duration_ms (960 for 20ms, 1920 for 40ms)
- decode_buf sized to MAX_FRAME_SAMPLES (1920) to handle any incoming codec
- recv codec switch now uses correct QualityProfile per codec (was
  inheriting original profile's frame_duration_ms, breaking cross-codec)
- added ComfortNoise guard on recv path

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 18:00:27 +04:00
Siavash Sameni
68b56d9172 fix: ping every 5min (was 5s), clean endpoint on failure, never block connect
Some checks failed
Mirror to GitHub / mirror (push) Failing after 39s
Build Release Binaries / build-amd64 (push) Failing after 3m45s
- Ping interval: 5 minutes (was 5 seconds — too aggressive)
- Rust ping_relay: explicitly close endpoint + shutdown runtime on failure
- Connect button works regardless of ping status (never blocked)
- Ping failure doesn't corrupt engine state

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 11:40:14 +04:00
Siavash Sameni
7973c8c6a3 fix: ntfy failure notification on build error (trap ERR)
Some checks failed
Mirror to GitHub / mirror (push) Failing after 38s
Build Release Binaries / build-amd64 (push) Failing after 3m58s
Both Android and Linux build scripts now send ntfy notification
when build fails, not just on success.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 11:23:32 +04:00
Siavash Sameni
3e9539e5da fix: add libasound2-dev to Docker image for Linux audio builds
Some checks failed
Mirror to GitHub / mirror (push) Failing after 36s
Build Release Binaries / build-amd64 (push) Failing after 4m16s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 11:16:39 +04:00
Siavash Sameni
a1ccb3f390 feat: Linux x86_64 fire-and-forget Docker build on SepehrHomeserverdk
Some checks failed
Mirror to GitHub / mirror (push) Failing after 41s
Build Release Binaries / build-amd64 (push) Failing after 4m2s
Same Docker image as Android build. Separate cache dirs (cache-linux/)
to avoid conflicts when running both builds simultaneously.

Builds: wzp-relay, wzp-client, wzp-client-audio, wzp-web, wzp-bench
Uploads tar.gz to rustypaste, notifies ntfy.sh/wzp.

Usage:
  ./scripts/build-linux-docker.sh --pull         # fire and forget
  ./scripts/build-linux-docker.sh --pull --install # wait + download

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 11:09:01 +04:00
Siavash Sameni
7751439e2b feat: relay identity persistence + Linux build script
Some checks failed
Mirror to GitHub / mirror (push) Failing after 40s
Build Release Binaries / build-amd64 (push) Has been cancelled
Relay identity:
- Stored in ~/.wzp/relay-identity (hex-encoded 32-byte seed)
- Generated on first run, reused on restart
- Fingerprint stays consistent across relay restarts

Linux build script (scripts/build-linux-notify.sh):
- Fire and forget: Hetzner VM → build all binaries → upload to rustypaste → ntfy notify → destroy VM
- Builds: wzp-relay, wzp-client, wzp-client-audio, wzp-web, wzp-bench
- Packages as tar.gz, uploads to rustypaste
- --keep flag to preserve VM

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 11:05:49 +04:00
Siavash Sameni
20bc290c18 fix: relay handles ping connections gracefully (no timeout errors)
Some checks failed
Mirror to GitHub / mirror (push) Failing after 36s
Build Release Binaries / build-amd64 (push) Failing after 4m0s
Relay recognizes SNI "ping" and returns immediately — no handshake,
no stream accept, no timeout error logs. Client closes after QUIC
connect for RTT measurement.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 11:01:03 +04:00
Siavash Sameni
a8dc350a65 feat: codec selection in settings (Opus / Opus Low / Codec2)
Some checks failed
Mirror to GitHub / mirror (push) Failing after 41s
Build Release Binaries / build-amd64 (push) Failing after 3m41s
- Settings UI: radio buttons for encode codec selection
- Persisted via SettingsRepository
- Passed through WzpEngine.startCall(profile=) → JNI → Rust CallStartConfig
- Decode always accepts all codecs (per-packet codec_id switch)
- 0 = Opus 24k (GOOD), 1 = Opus 6k (DEGRADED), 2 = Codec2 1.2k (CATASTROPHIC)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 10:50:01 +04:00
Siavash Sameni
00fa109f07 feat: codec2 support — adaptive encoder/decoder, per-packet codec switch
Some checks failed
Mirror to GitHub / mirror (push) Failing after 33s
Build Release Binaries / build-amd64 (push) Failing after 3m57s
Android engine:
- Use wzp_codec::create_encoder/create_decoder (factory) instead of
  hardcoded OpusEncoder/OpusDecoder
- Recv path: auto-switch decoder based on incoming packet's codec_id
- Supports mixed-codec rooms (one client Opus, another Codec2)

Desktop client already uses factory functions — no changes needed.

Codec selection via QualityProfile:
- GOOD: Opus 24kbps
- DEGRADED: Opus 6kbps
- CATASTROPHIC: Codec2 1200bps

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 10:34:14 +04:00
Siavash Sameni
1e40dec468 feat: periodic server ping every 5s while app is open
Some checks failed
Mirror to GitHub / mirror (push) Failing after 40s
Build Release Binaries / build-amd64 (push) Failing after 3m53s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 10:13:51 +04:00
Siavash Sameni
aecef0905d feat: fire-and-forget build script with ntfy + rustypaste
Some checks failed
Mirror to GitHub / mirror (push) Failing after 37s
Build Release Binaries / build-amd64 (push) Failing after 3m59s
- Uploads build script to remote, runs in tmux (survives SSH drop)
- Builds Rust + APK in Docker
- Validates both .so files present before APK build
- Uploads APK to rustypaste
- Sends ntfy.sh/wzp notification with download URL
- --install flag: waits + downloads + adb installs locally
- --rust flag: force clean Rust rebuild
- --pull flag: git pull before building

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 10:00:49 +04:00
Siavash Sameni
18f7faa279 fix: ping as engine instance method — same lifecycle as call
Some checks failed
Mirror to GitHub / mirror (push) Failing after 7s
Build Release Binaries / build-amd64 (push) Failing after 19s
Ping was a static JNI method that loaded the .so before nativeInit,
crashing jemalloc. Now ping is an instance method on WzpEngine:

- Engine is created once (nativeInit), reused for both ping and call
- pingRelay() uses same tokio runtime pattern as startCall()
- Auto-pings all servers on app launch (after engine init)
- No process restart needed
- TOFU fingerprints saved on first successful ping

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 09:49:33 +04:00
Siavash Sameni
eeb85aeac2 feat: ping-and-exit for server RTT, remove broken UDP ping
Some checks failed
Mirror to GitHub / mirror (push) Failing after 38s
Build Release Binaries / build-amd64 (push) Failing after 3m38s
- Ping button: pings all servers via native QUIC, saves RTT + fingerprint
  to SharedPreferences, then exits process (System.exit)
- On restart: loads saved ping results (no native .so loading needed)
- Avoids jemalloc crash: native lib only loaded once per process lifetime
- Removed broken UDP probe (QUIC servers don't respond to it)
- SettingsRepository: savePingRtt/loadPingRtt for cached results
- PingResult: added reachable field

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 09:31:02 +04:00
Siavash Sameni
00b405aa87 feat: debug recording off by default, toggle in settings
Some checks failed
Mirror to GitHub / mirror (push) Failing after 38s
Build Release Binaries / build-amd64 (push) Failing after 3m57s
- AudioPipeline.debugRecording defaults to false (was true)
- SettingsRepository: persist debug_recording preference
- CallViewModel: debugRecording StateFlow + setter, wired to AudioPipeline
- Only records PCM + RMS when explicitly enabled in settings

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 09:01:43 +04:00
Siavash Sameni
d09e21965e feat: pure Kotlin UDP ping — periodic every 5s, no JNI crash
Some checks failed
Mirror to GitHub / mirror (push) Failing after 38s
Build Release Binaries / build-amd64 (push) Failing after 3m35s
Replace WzpEngine.pingRelay() (JNI, loads native .so, crashes jemalloc
on Android 16 MTE) with pure Kotlin DatagramSocket UDP probe.

- RelayPinger: sends QUIC Version Negotiation trigger packet, measures
  RTT from response. No native lib, no JNI, zero crash risk.
- Periodic: pings all servers every 5 seconds via coroutine
- Server fingerprint: filled lazily on first real QUIC connection
  (TOFU still works, just delayed)
- Lock status: OFFLINE when ping fails, NEW until first connection

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 08:57:27 +04:00
Siavash Sameni
97bcc79f9b feat: desktop-style UI + docker build scripts, fix ping crash
Some checks failed
Mirror to GitHub / mirror (push) Failing after 39s
Build Release Binaries / build-amd64 (push) Failing after 4m3s
- InCallScreen rewrite matching desktop dark theme layout
- Removed auto-ping LaunchedEffect (loading native .so early via
  pingRelay crashes jemalloc on Android 16 MTE)
- Added Docker build scripts (Dockerfile.android-builder + build-android-docker.sh)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 08:19:45 +04:00
79 changed files with 13768 additions and 1356 deletions

View File

@@ -0,0 +1,72 @@
---
name: caveman
description: >
Ultra-compressed communication mode. Slash token usage ~75% by speaking like caveman
while keeping full technical accuracy. Use when user says "caveman mode", "talk like caveman",
"use caveman", "less tokens", "be brief", or invokes /caveman. Also auto-triggers
when token efficiency is requested.
---
# Caveman Mode
## Core Rule
Respond like smart caveman. Cut articles, filler, pleasantries. Keep all technical substance.
## Grammar
- Drop articles (a, an, the)
- Drop filler (just, really, basically, actually, simply)
- Drop pleasantries (sure, certainly, of course, happy to)
- Short synonyms (big not extensive, fix not "implement a solution for")
- No hedging (skip "it might be worth considering")
- Fragments fine. No need full sentence
- Technical terms stay exact. "Polymorphism" stays "polymorphism"
- Code blocks unchanged. Caveman speak around code, not in code
- Error messages quoted exact. Caveman only for explanation
## Pattern
```
[thing] [action] [reason]. [next step].
```
Not:
> Sure! I'd be happy to help you with that. The issue you're experiencing is likely caused by...
Yes:
> Bug in auth middleware. Token expiry check use `<` not `<=`. Fix:
## Examples
**User:** Why is my React component re-rendering?
**Normal (69 tokens):** "The reason your React component is re-rendering is likely because you're creating a new object reference on each render cycle. When you pass an inline object as a prop, React's shallow comparison sees it as a different object every time, which triggers a re-render. I'd recommend using useMemo to memoize the object."
**Caveman (19 tokens):** "New object ref each render. Inline object prop = new ref = re-render. Wrap in `useMemo`."
---
**User:** How do I set up a PostgreSQL connection pool?
**Caveman:**
```
Use `pg` pool:
```
```js
const pool = new Pool({
max: 20,
idleTimeoutMillis: 30000,
connectionTimeoutMillis: 2000,
})
```
```
max = concurrent connections. Keep under DB limit. idleTimeout kill stale conn.
```
## Boundaries
- Code: write normal. Caveman English only
- Git commits: normal
- PR descriptions: normal
- User say "stop caveman" or "normal mode": revert immediately

25
.gitignore vendored
View File

@@ -4,3 +4,28 @@
*.swp
*.swo
*~
# Logs
logs
*.log
npm-debug.log*
yarn-debug.log*
yarn-error.log*
dev-debug.log
# Dependency directories
node_modules/
# Environment variables
.env
# Editor directories and files
.idea
.vscode
*.suo
*.ntvs*
*.njsproj
*.sln
*.sw?
# OS specific
# Taskmaster (local workflow tool)
.taskmaster/
.env.example

294
Cargo.lock generated
View File

@@ -119,26 +119,6 @@ dependencies = [
"winapi",
]
[[package]]
name = "audiopus"
version = "0.3.0-rc.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ab55eb0e56d7c6de3d59f544e5db122d7725ec33be6a276ee8241f3be6473955"
dependencies = [
"audiopus_sys",
]
[[package]]
name = "audiopus_sys"
version = "0.2.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "62314a1546a2064e033665d658e88c620a62904be945f8147e6b16c3db9f8651"
dependencies = [
"cmake",
"log",
"pkg-config",
]
[[package]]
name = "autocfg"
version = "1.5.0"
@@ -297,6 +277,12 @@ dependencies = [
"tower-service",
]
[[package]]
name = "base16ct"
version = "0.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "4c7f02d4ea65f2c1853089ffd8d2787bdbc63de2f0d29dedbcf8ccdfa0ccd4cf"
[[package]]
name = "base64"
version = "0.22.1"
@@ -383,6 +369,12 @@ version = "3.20.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5d20789868f4b01b2f2caec9f5c4e0213b41e3e5702a50157d699ae31ced2fcb"
[[package]]
name = "bytemuck"
version = "1.25.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "c8efb64bd706a16a1bdde310ae86b351e4d21550d98d056f22f8a7f7a2183fec"
[[package]]
name = "byteorder"
version = "1.5.0"
@@ -467,6 +459,7 @@ dependencies = [
"iana-time-zone",
"js-sys",
"num-traits",
"serde",
"wasm-bindgen",
"windows-link",
]
@@ -627,6 +620,24 @@ dependencies = [
"libc",
]
[[package]]
name = "crunchy"
version = "0.2.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "460fbee9c2c2f33933d720630a6a0bac33ba7053db5344fac858d4b8952d77d5"
[[package]]
name = "crypto-bigint"
version = "0.5.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0dc92fb57ca44df6db8059111ab3af99a63d5d0f8375d9972e319a379c6bab76"
dependencies = [
"generic-array",
"rand_core 0.6.4",
"subtle",
"zeroize",
]
[[package]]
name = "crypto-common"
version = "0.1.7"
@@ -650,6 +661,7 @@ dependencies = [
"digest",
"fiat-crypto",
"rustc_version",
"serde",
"subtle",
"zeroize",
]
@@ -816,10 +828,32 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9ed9a281f7bc9b7576e61468ba615a66a5c8cfdff42420a70aa82701a3b1e292"
dependencies = [
"block-buffer",
"const-oid",
"crypto-common",
"subtle",
]
[[package]]
name = "dirs"
version = "6.0.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "c3e8aa94d75141228480295a7d0e7feb620b1a5ad9f12bc40be62411e38cce4e"
dependencies = [
"dirs-sys",
]
[[package]]
name = "dirs-sys"
version = "0.5.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e01a3366d27ee9890022452ee61b2b63a67e6f13f58900b651ff5665f0bb1fab"
dependencies = [
"libc",
"option-ext",
"redox_users",
"windows-sys 0.61.2",
]
[[package]]
name = "displaydoc"
version = "0.2.5"
@@ -850,6 +884,21 @@ dependencies = [
"rustfft",
]
[[package]]
name = "ecdsa"
version = "0.16.9"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ee27f32b5c5292967d2d4a9d7f1e0b0aed2c15daded5a60300e4abb9d8020bca"
dependencies = [
"der",
"digest",
"elliptic-curve",
"rfc6979",
"serdect",
"signature",
"spki",
]
[[package]]
name = "ed25519"
version = "2.2.3"
@@ -857,6 +906,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "115531babc129696a58c64a4fef0a8bf9e9698629fb97e9e40767d235cfbcd53"
dependencies = [
"pkcs8",
"serde",
"signature",
]
@@ -881,6 +931,26 @@ version = "1.15.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "48c757948c5ede0e46177b7add2e67155f70e33c07fea8284df6576da70b3719"
[[package]]
name = "elliptic-curve"
version = "0.13.8"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b5e6043086bf7973472e0c7dff2142ea0b680d30e18d9cc40f267efbf222bd47"
dependencies = [
"base16ct",
"crypto-bigint",
"digest",
"ff",
"generic-array",
"group",
"pkcs8",
"rand_core 0.6.4",
"sec1",
"serdect",
"subtle",
"zeroize",
]
[[package]]
name = "encoding_rs"
version = "0.8.35"
@@ -924,6 +994,16 @@ version = "2.3.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "37909eebbb50d72f9059c3b6d82c0463f2ff062c9e95845c43a6c9c0355411be"
[[package]]
name = "ff"
version = "0.13.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "c0b50bfb653653f9ca9095b427bed08ab8d75a137839d9ad64eb11810d5b6393"
dependencies = [
"rand_core 0.6.4",
"subtle",
]
[[package]]
name = "fiat-crypto"
version = "0.2.9"
@@ -1084,6 +1164,7 @@ checksum = "85649ca51fd72272d7821adaf274ad91c288277713d9c18820d8499a7ff69e9a"
dependencies = [
"typenum",
"version_check",
"zeroize",
]
[[package]]
@@ -1143,6 +1224,17 @@ version = "0.3.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0cc23270f6e1808e30a928bdc84dea0b9b4136a8bc82338574f23baf47bbd280"
[[package]]
name = "group"
version = "0.13.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f0f9ef7462f7c099f518d754361858f86d8a07af53ba9af0fe635bbccb151a63"
dependencies = [
"ff",
"rand_core 0.6.4",
"subtle",
]
[[package]]
name = "h2"
version = "0.4.13"
@@ -1626,6 +1718,21 @@ dependencies = [
"wasm-bindgen",
]
[[package]]
name = "k256"
version = "0.13.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f6e3919bbaa2945715f0bb6d3934a173d1e9a59ac23767fbaaef277265a7411b"
dependencies = [
"cfg-if",
"ecdsa",
"elliptic-curve",
"once_cell",
"serdect",
"sha2",
"signature",
]
[[package]]
name = "lazy_static"
version = "1.5.0"
@@ -1660,6 +1767,15 @@ version = "0.2.16"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b6d2cec3eae94f9f509c767b45932f1ada8350c4bdb85af2fcab4a3c14807981"
[[package]]
name = "libredox"
version = "0.1.15"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7ddbf48fd451246b1f8c2610bd3b4ac0cc6e149d89832867093ab69a17194f08"
dependencies = [
"libc",
]
[[package]]
name = "linux-raw-sys"
version = "0.12.1"
@@ -1702,6 +1818,15 @@ dependencies = [
"libc",
]
[[package]]
name = "matchers"
version = "0.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d1525a2a28c7f4fa0fc98bb91ae755d1e2d1505079e05539e35bc876b5d65ae9"
dependencies = [
"regex-automata",
]
[[package]]
name = "matchit"
version = "0.7.3"
@@ -1980,6 +2105,30 @@ dependencies = [
"vcpkg",
]
[[package]]
name = "option-ext"
version = "0.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "04744f49eae99ab78e0d5c0b603ab218f515ea8cfe5a456d7629ad883a3b6e7d"
[[package]]
name = "opusic-c"
version = "1.5.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9486eb5a1a735bf56430b5b44e21157be30ac9fcc17999ba309981b8bd90d2ff"
dependencies = [
"opusic-sys",
]
[[package]]
name = "opusic-sys"
version = "0.6.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "dc3280fe5b6f97ac1a35a0ac003e2fb0b92f8e4bdf2b2057e1bf9b87acca5696"
dependencies = [
"cmake",
]
[[package]]
name = "os_str_bytes"
version = "6.6.1"
@@ -2320,6 +2469,17 @@ dependencies = [
"bitflags 2.11.0",
]
[[package]]
name = "redox_users"
version = "0.5.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a4e608c6638b9c18977b00b475ac1f28d14e84b27d8d42f70e0bf1e3dec127ac"
dependencies = [
"getrandom 0.2.17",
"libredox",
"thiserror 2.0.18",
]
[[package]]
name = "regex"
version = "1.12.3"
@@ -2389,6 +2549,16 @@ dependencies = [
"web-sys",
]
[[package]]
name = "rfc6979"
version = "0.4.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f8dd2a808d456c4a54e300a23e9f5a67e122c3024119acbfd73e3bf664491cb2"
dependencies = [
"hmac",
"subtle",
]
[[package]]
name = "ring"
version = "0.17.14"
@@ -2567,6 +2737,21 @@ version = "1.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "94143f37725109f92c262ed2cf5e59bce7498c01bcc1502d7b9afe439a4e9f49"
[[package]]
name = "sec1"
version = "0.7.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d3e97a565f76233a6003f9f5c54be1d9c5bdfa3eccfb189469f11ec4901c47dc"
dependencies = [
"base16ct",
"der",
"generic-array",
"pkcs8",
"serdect",
"subtle",
"zeroize",
]
[[package]]
name = "security-framework"
version = "3.7.0"
@@ -2671,6 +2856,16 @@ dependencies = [
"serde",
]
[[package]]
name = "serdect"
version = "0.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a84f14a19e9a014bb9f4512488d9829a68e04ecabffb0f9904cd1ace94598177"
dependencies = [
"base16ct",
"serde",
]
[[package]]
name = "sha1"
version = "0.10.6"
@@ -2724,6 +2919,7 @@ version = "2.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "77549399552de45a898a580c1b41d445bf730df867cc44e6c0233bbc4b8329de"
dependencies = [
"digest",
"rand_core 0.6.4",
]
@@ -2937,6 +3133,15 @@ version = "0.1.8"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7694e1cfe791f8d31026952abf09c69ca6f6fa4e1a1229e18988f06a04a12dca"
[[package]]
name = "tiny-keccak"
version = "2.0.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "2c9d3793400a45f954c52e73d068316d76b6f4e36977e3fcebb13a2721e80237"
dependencies = [
"crunchy",
]
[[package]]
name = "tinystr"
version = "0.8.2"
@@ -3235,10 +3440,14 @@ version = "0.3.23"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "cb7f578e5945fb242538965c2d0b04418d38ec25c79d160cd279bf0731c8d319"
dependencies = [
"matchers",
"nu-ansi-term",
"once_cell",
"regex-automata",
"sharded-slab",
"smallvec",
"thread_local",
"tracing",
"tracing-core",
"tracing-log",
]
@@ -3367,6 +3576,18 @@ version = "1.0.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b6c140620e7ffbb22c2dee59cafe6084a59b5ffc27a8859a5f0d494b5d52b6be"
[[package]]
name = "uuid"
version = "1.23.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5ac8b6f42ead25368cf5b098aeb3dc8a1a2c05a3eee8a9a1a68c640edbfc79d9"
dependencies = [
"getrandom 0.4.2",
"js-sys",
"serde_core",
"wasm-bindgen",
]
[[package]]
name = "valuable"
version = "0.1.1"
@@ -3406,7 +3627,28 @@ dependencies = [
[[package]]
name = "warzone-protocol"
version = "0.1.0"
version = "0.0.38"
dependencies = [
"base64",
"bincode",
"bip39",
"chacha20poly1305",
"chrono",
"curve25519-dalek",
"ed25519-dalek",
"hex",
"hkdf",
"k256",
"rand 0.8.5",
"serde",
"serde_json",
"sha2",
"thiserror 2.0.18",
"tiny-keccak",
"uuid",
"x25519-dalek",
"zeroize",
]
[[package]]
name = "wasi"
@@ -4071,9 +4313,11 @@ dependencies = [
name = "wzp-codec"
version = "0.1.0"
dependencies = [
"audiopus",
"bytemuck",
"codec2",
"nnnoiseless",
"opusic-c",
"opusic-sys",
"rand 0.8.5",
"tracing",
"wzp-proto",
@@ -4132,6 +4376,8 @@ dependencies = [
"async-trait",
"axum 0.7.9",
"bytes",
"chrono",
"dirs",
"futures-util",
"prometheus",
"quinn",
@@ -4139,6 +4385,7 @@ dependencies = [
"rustls",
"serde",
"serde_json",
"sha2",
"tokio",
"toml",
"tower-http",
@@ -4158,10 +4405,13 @@ version = "0.1.0"
dependencies = [
"async-trait",
"bytes",
"ed25519-dalek",
"hkdf",
"quinn",
"rcgen",
"rustls",
"serde_json",
"sha2",
"tokio",
"tracing",
"wzp-proto",

View File

@@ -35,12 +35,19 @@ quinn = "0.11"
raptorq = "2"
# Codec
audiopus = "0.3.0-rc.0"
# opusic-c: high-level safe bindings over libopus 1.5.2 (encoder side).
# opusic-sys: raw FFI for the decoder side — we build our own DecoderHandle
# because opusic-c::Decoder.inner is pub(crate) and cannot be reached for the
# Phase 3 DRED reconstruction path. See docs/PRD-dred-integration.md.
# Pinned exactly (no caret) for reproducible libopus 1.5.2 across the fleet.
opusic-c = { version = "=1.5.5", default-features = false, features = ["bundled", "dred"] }
opusic-sys = { version = "=0.6.0", default-features = false, features = ["bundled"] }
bytemuck = "1"
codec2 = "0.3"
# Crypto
x25519-dalek = { version = "2", features = ["static_secrets"] }
ed25519-dalek = { version = "2", features = ["rand_core"] }
ed25519-dalek = { version = "2", features = ["rand_core", "pkcs8"] }
chacha20poly1305 = "0.10"
hkdf = "0.12"
sha2 = "0.10"

View File

@@ -57,7 +57,7 @@ class AudioPipeline(private val context: Context) {
/** Whether to attach hardware AEC. Must be set before start(). */
var aecEnabled: Boolean = true
/** Enable debug recording of PCM + RMS histogram to cache dir. */
var debugRecording: Boolean = true
var debugRecording: Boolean = false
private var captureThread: Thread? = null
private var playoutThread: Thread? = null

View File

@@ -28,6 +28,7 @@ class SettingsRepository(context: Context) {
private const val KEY_PREFER_IPV6 = "prefer_ipv6"
private const val KEY_IDENTITY_SEED = "identity_seed_hex"
private const val KEY_AEC_ENABLED = "aec_enabled"
private const val KEY_DEBUG_RECORDING = "debug_recording"
private const val KEY_RECENT_ROOMS = "recent_rooms"
private const val TOFU_PREFIX = "tofu_"
}
@@ -120,6 +121,16 @@ class SettingsRepository(context: Context) {
fun saveAecEnabled(enabled: Boolean) { prefs.edit().putBoolean(KEY_AEC_ENABLED, enabled).apply() }
fun loadAecEnabled(): Boolean = prefs.getBoolean(KEY_AEC_ENABLED, true)
// --- Debug recording ---
fun saveDebugRecording(enabled: Boolean) { prefs.edit().putBoolean(KEY_DEBUG_RECORDING, enabled).apply() }
fun loadDebugRecording(): Boolean = prefs.getBoolean(KEY_DEBUG_RECORDING, false)
// --- Codec choice ---
// 0 = Opus (GOOD), 1 = Opus Low (DEGRADED), 2 = Codec2 (CATASTROPHIC)
fun saveCodecChoice(choice: Int) { prefs.edit().putInt("codec_choice", choice).apply() }
fun loadCodecChoice(): Int = prefs.getInt("codec_choice", 0)
// --- Identity seed ---
/**
@@ -179,4 +190,14 @@ class SettingsRepository(context: Context) {
fun loadServerFingerprint(address: String): String? {
return prefs.getString("$TOFU_PREFIX$address", null)
}
// --- Ping RTT cache ---
fun savePingRtt(address: String, rttMs: Int) {
prefs.edit().putInt("ping_rtt_$address", rttMs).apply()
}
fun loadPingRtt(address: String): Int {
return prefs.getInt("ping_rtt_$address", -1)
}
}

View File

@@ -46,6 +46,14 @@ class DebugReporter(private val context: Context) {
val zipFile = File(context.cacheDir, "wzp_debug_${timestamp}.zip")
ZipOutputStream(BufferedOutputStream(FileOutputStream(zipFile))).use { zos ->
// Phase 4: extract DRED / classical PLC counters from the
// stats JSON so they're visible in the meta preamble at a
// glance, not buried in the trailing JSON dump.
val dredReconstructions = extractLongField(finalStatsJson, "dred_reconstructions")
val classicalPlc = extractLongField(finalStatsJson, "classical_plc_invocations")
val framesDecoded = extractLongField(finalStatsJson, "frames_decoded")
val fecRecovered = extractLongField(finalStatsJson, "fec_recovered")
// 1. Call metadata
val meta = buildString {
appendLine("=== WZ Phone Debug Report ===")
@@ -58,6 +66,18 @@ class DebugReporter(private val context: Context) {
appendLine("Device: ${android.os.Build.MANUFACTURER} ${android.os.Build.MODEL}")
appendLine("Android: ${android.os.Build.VERSION.RELEASE} (API ${android.os.Build.VERSION.SDK_INT})")
appendLine()
appendLine("=== Loss Recovery ===")
appendLine("Frames decoded: $framesDecoded")
appendLine("DRED reconstructions: $dredReconstructions (Opus neural recovery)")
appendLine("Classical PLC: $classicalPlc (fallback)")
appendLine("RaptorQ FEC recovered: $fecRecovered (Codec2 only)")
if (framesDecoded > 0) {
val dredPct = 100.0 * dredReconstructions / framesDecoded
val plcPct = 100.0 * classicalPlc / framesDecoded
appendLine("DRED rate: ${"%.2f".format(dredPct)}%")
appendLine("Classical PLC rate: ${"%.2f".format(plcPct)}%")
}
appendLine()
appendLine("=== Final Stats ===")
appendLine(finalStatsJson)
}
@@ -195,4 +215,28 @@ class DebugReporter(private val context: Context) {
FileInputStream(file).use { it.copyTo(zos) }
zos.closeEntry()
}
/**
* Tiny JSON field extractor — pulls an integer value for a top-level
* field like `"dred_reconstructions":42`. We don't want to pull in a
* full JSON parser just for the debug preamble, and the CallStats
* output is a flat record with well-known field names.
*
* Returns 0 if the field is missing or unparseable.
*/
private fun extractLongField(json: String, field: String): Long {
val key = "\"$field\":"
val idx = json.indexOf(key)
if (idx < 0) return 0
var i = idx + key.length
// Skip whitespace
while (i < json.length && json[i].isWhitespace()) i++
val start = i
while (i < json.length && (json[i].isDigit() || json[i] == '-')) i++
return try {
json.substring(start, i).toLong()
} catch (_: NumberFormatException) {
0
}
}
}

View File

@@ -33,10 +33,24 @@ data class CallStats(
val fecRecovered: Long = 0,
/** Current mic audio level (RMS, 0-32767). */
val audioLevel: Int = 0,
/** Our current outgoing codec (e.g. "Opus24k"). */
val currentCodec: String = "",
/** Last seen incoming codec from peers. */
val peerCodec: String = "",
/** Whether auto quality mode is active. */
val autoMode: Boolean = false,
/** Number of participants in the room. */
val roomParticipantCount: Int = 0,
/** Participants in the room (fingerprint + optional alias). */
val roomParticipants: List<RoomMember> = emptyList(),
/** SAS verification code (4-digit, null if not in a call). */
val sasCode: Int? = null,
/** Incoming call ID (or "relay|room" for CallSetup). */
val incomingCallId: String? = null,
/** Incoming caller's fingerprint. */
val incomingCallerFp: String? = null,
/** Incoming caller's alias. */
val incomingCallerAlias: String? = null,
) {
/** Human-readable quality label. */
val qualityLabel: String
@@ -54,7 +68,8 @@ data class CallStats(
val o = arr.getJSONObject(i)
RoomMember(
fingerprint = o.optString("fingerprint", ""),
alias = if (o.isNull("alias")) null else o.optString("alias", null)
alias = if (o.isNull("alias")) null else o.optString("alias", null),
relayLabel = if (o.isNull("relay_label")) null else o.optString("relay_label", null)
)
}
}
@@ -76,8 +91,15 @@ data class CallStats(
underruns = obj.optLong("underruns", 0),
fecRecovered = obj.optLong("fec_recovered", 0),
audioLevel = obj.optInt("audio_level", 0),
currentCodec = obj.optString("current_codec", ""),
peerCodec = obj.optString("peer_codec", ""),
autoMode = obj.optBoolean("auto_mode", false),
roomParticipantCount = obj.optInt("room_participant_count", 0),
roomParticipants = parseParticipants(obj.optJSONArray("room_participants"))
roomParticipants = parseParticipants(obj.optJSONArray("room_participants")),
sasCode = if (obj.has("sas_code")) obj.optInt("sas_code") else null,
incomingCallId = if (obj.isNull("incoming_call_id")) null else obj.optString("incoming_call_id", null),
incomingCallerFp = if (obj.isNull("incoming_caller_fp")) null else obj.optString("incoming_caller_fp", null),
incomingCallerAlias = if (obj.isNull("incoming_caller_alias")) null else obj.optString("incoming_caller_alias", null),
)
} catch (e: Exception) {
CallStats()
@@ -88,7 +110,8 @@ data class CallStats(
data class RoomMember(
val fingerprint: String,
val alias: String? = null
val alias: String? = null,
val relayLabel: String? = null
) {
/** Short display name: alias if set, otherwise first 8 chars of fingerprint. */
val displayName: String

View File

@@ -0,0 +1,97 @@
package com.wzp.engine
import org.json.JSONObject
/**
* Persistent signal connection for direct 1:1 calls.
* Separate from WzpEngine — survives across calls.
*
* Lifecycle: connect() → [placeCall/answerCall] → destroy()
*/
class SignalManager {
private var handle: Long = 0L
val isConnected: Boolean get() = handle != 0L
/**
* Connect to relay and register for direct calls.
* MUST be called from a thread with sufficient stack (8MB).
* Blocks briefly during QUIC connect + register, then returns.
*/
fun connect(relay: String, seedHex: String): Boolean {
if (handle != 0L) return true // already connected
handle = nativeSignalConnect(relay, seedHex)
return handle != 0L
}
/** Get current signal state as parsed object. Non-blocking. */
fun getState(): SignalState {
if (handle == 0L) return SignalState()
val json = nativeSignalGetState(handle) ?: return SignalState()
return try {
val obj = JSONObject(json)
SignalState(
status = obj.optString("status", "idle"),
fingerprint = obj.optString("fingerprint", ""),
incomingCallId = if (obj.isNull("incoming_call_id")) null else obj.optString("incoming_call_id"),
incomingCallerFp = if (obj.isNull("incoming_caller_fp")) null else obj.optString("incoming_caller_fp"),
incomingCallerAlias = if (obj.isNull("incoming_caller_alias")) null else obj.optString("incoming_caller_alias"),
callSetupRelay = if (obj.isNull("call_setup_relay")) null else obj.optString("call_setup_relay"),
callSetupRoom = if (obj.isNull("call_setup_room")) null else obj.optString("call_setup_room"),
callSetupId = if (obj.isNull("call_setup_id")) null else obj.optString("call_setup_id"),
)
} catch (e: Exception) {
SignalState()
}
}
/** Place a direct call to a target fingerprint. */
fun placeCall(targetFp: String): Int {
if (handle == 0L) return -1
return nativeSignalPlaceCall(handle, targetFp)
}
/** Answer an incoming call. mode: 0=Reject, 1=AcceptTrusted, 2=AcceptGeneric */
fun answerCall(callId: String, mode: Int = 2): Int {
if (handle == 0L) return -1
return nativeSignalAnswerCall(handle, callId, mode)
}
/** Send hangup signal. */
fun hangup() {
if (handle != 0L) nativeSignalHangup(handle)
}
/** Destroy the signal manager. */
fun destroy() {
if (handle != 0L) {
nativeSignalDestroy(handle)
handle = 0L
}
}
// JNI native methods
private external fun nativeSignalConnect(relay: String, seed: String): Long
private external fun nativeSignalGetState(handle: Long): String?
private external fun nativeSignalPlaceCall(handle: Long, targetFp: String): Int
private external fun nativeSignalAnswerCall(handle: Long, callId: String, mode: Int): Int
private external fun nativeSignalHangup(handle: Long)
private external fun nativeSignalDestroy(handle: Long)
companion object {
init { System.loadLibrary("wzp_android") }
}
}
/** Signal connection state. */
data class SignalState(
val status: String = "idle",
val fingerprint: String = "",
val incomingCallId: String? = null,
val incomingCallerFp: String? = null,
val incomingCallerAlias: String? = null,
val callSetupRelay: String? = null,
val callSetupRoom: String? = null,
val callSetupId: String? = null,
)

View File

@@ -38,9 +38,12 @@ class WzpEngine(private val callback: WzpCallback) {
* @param alias display name sent to relay for room participant list
* @return 0 on success, negative error code on failure
*/
fun startCall(relayAddr: String, room: String, seedHex: String = "", token: String = "", alias: String = ""): Int {
/**
* @param profile 0 = Opus GOOD, 1 = Opus DEGRADED, 2 = Codec2 CATASTROPHIC
*/
fun startCall(relayAddr: String, room: String, seedHex: String = "", token: String = "", alias: String = "", profile: Int = 0): Int {
check(nativeHandle != 0L) { "Engine not initialized" }
val result = nativeStartCall(nativeHandle, relayAddr, room, seedHex, token, alias)
val result = nativeStartCall(nativeHandle, relayAddr, room, seedHex, token, alias, profile)
if (result == 0) {
callback.onCallStateChanged(CallStateConstants.CONNECTING)
} else {
@@ -50,6 +53,7 @@ class WzpEngine(private val callback: WzpCallback) {
}
/** Stop the active call. Safe to call when no call is active. */
@Synchronized
fun stopCall() {
if (nativeHandle != 0L) {
nativeStopCall(nativeHandle)
@@ -73,6 +77,7 @@ class WzpEngine(private val callback: WzpCallback) {
*
* @return JSON-serialised [CallStats], or `"{}"` if the engine is not initialised.
*/
@Synchronized
fun getStats(): String {
if (nativeHandle == 0L) return "{}"
return try {
@@ -92,6 +97,7 @@ class WzpEngine(private val callback: WzpCallback) {
}
/** Destroy the native engine and free all resources. The instance must not be reused. */
@Synchronized
fun destroy() {
if (nativeHandle != 0L) {
nativeDestroy(nativeHandle)
@@ -141,7 +147,7 @@ class WzpEngine(private val callback: WzpCallback) {
private external fun nativeInit(): Long
private external fun nativeStartCall(
handle: Long, relay: String, room: String, seed: String, token: String, alias: String
handle: Long, relay: String, room: String, seed: String, token: String, alias: String, profile: Int
): Int
private external fun nativeStopCall(handle: Long)
private external fun nativeSetMute(handle: Long, muted: Boolean)
@@ -155,19 +161,65 @@ class WzpEngine(private val callback: WzpCallback) {
private external fun nativeDestroy(handle: Long)
companion object {
init {
System.loadLibrary("wzp_android")
}
/**
* Ping a relay server. Returns JSON `{"rtt_ms":N,"server_fingerprint":"hex"}`
* or null if unreachable. Does not require an engine instance.
*/
fun pingRelay(address: String): String? = nativePingRelay(address)
init { System.loadLibrary("wzp_android") }
/** Get the identity fingerprint for a seed hex. No engine needed. */
@JvmStatic
private external fun nativePingRelay(relay: String): String?
private external fun nativeGetFingerprint(seedHex: String): String?
/** Compute the full identity fingerprint (xxxx:xxxx:...) from a seed hex string. */
@JvmStatic
fun getFingerprint(seedHex: String): String = nativeGetFingerprint(seedHex) ?: ""
}
private external fun nativePingRelay(handle: Long, relay: String): String?
private external fun nativeStartSignaling(handle: Long, relay: String, seed: String, token: String, alias: String): Int
private external fun nativePlaceCall(handle: Long, targetFp: String): Int
private external fun nativeAnswerCall(handle: Long, callId: String, mode: Int): Int
/**
* Ping a relay server. Requires engine to be initialized.
* Returns JSON `{"rtt_ms":N,"server_fingerprint":"hex"}` or null.
*/
fun pingRelay(address: String): String? {
if (nativeHandle == 0L) return null
return nativePingRelay(nativeHandle, address)
}
/**
* Start persistent signaling connection for direct 1:1 calls.
* The engine registers on the relay and listens for incoming calls.
* Call state updates are available via [getStats].
*
* @return 0 on success, -1 on error
*/
fun startSignaling(relay: String, seed: String = "", token: String = "", alias: String = ""): Int {
check(nativeHandle != 0L) { "Engine not initialized" }
return nativeStartSignaling(nativeHandle, relay, seed, token, alias)
}
/**
* Place a direct call to a peer by fingerprint.
* Requires [startSignaling] to have been called first.
*
* @return 0 on success, -1 on error
*/
fun placeCall(targetFingerprint: String): Int {
check(nativeHandle != 0L) { "Engine not initialized" }
return nativePlaceCall(nativeHandle, targetFingerprint)
}
/**
* Answer an incoming direct call.
*
* @param callId The call ID from the incoming call (available in stats.incoming_call_id)
* @param mode 0=Reject, 1=AcceptTrusted (P2P in Phase 2), 2=AcceptGeneric (relay-mediated)
* @return 0 on success, -1 on error
*/
fun answerCall(callId: String, mode: Int = 2): Int {
check(nativeHandle != 0L) { "Engine not initialized" }
return nativeAnswerCall(nativeHandle, callId, mode)
}
}
/** Integer constants matching the Rust [CallState] enum ordinals. */

View File

@@ -0,0 +1,12 @@
package com.wzp.net
// Relay pinging is now done via WzpEngine.pingRelay() (instance method).
// This file kept for the data class only.
object RelayPinger {
data class PingResult(
val rttMs: Int,
val reachable: Boolean,
val serverFingerprint: String = "",
)
}

View File

@@ -31,7 +31,8 @@ data class ServerEntry(val address: String, val label: String)
data class PingResult(
val rttMs: Int,
val serverFingerprint: String,
val serverFingerprint: String = "",
val reachable: Boolean = rttMs > 0,
)
enum class LockStatus { UNKNOWN, OFFLINE, NEW, VERIFIED, CHANGED }
@@ -105,6 +106,18 @@ class CallViewModel : ViewModel(), WzpCallback {
private val _aecEnabled = MutableStateFlow(true)
val aecEnabled: StateFlow<Boolean> = _aecEnabled.asStateFlow()
private val _debugRecording = MutableStateFlow(false)
val debugRecording: StateFlow<Boolean> = _debugRecording.asStateFlow()
// Quality profile index (matches JNI bridge profile_from_int)
private val _codecChoice = MutableStateFlow(0)
val codecChoice: StateFlow<Int> = _codecChoice.asStateFlow()
/** Key-change warning dialog state. */
data class KeyWarningInfo(val address: String, val oldFp: String, val newFp: String)
private val _keyWarning = MutableStateFlow<KeyWarningInfo?>(null)
val keyWarning: StateFlow<KeyWarningInfo?> = _keyWarning.asStateFlow()
/** True when a call just ended and debug report can be sent. */
private val _debugReportAvailable = MutableStateFlow(false)
val debugReportAvailable: StateFlow<Boolean> = _debugReportAvailable.asStateFlow()
@@ -119,13 +132,143 @@ class CallViewModel : ViewModel(), WzpCallback {
private var statsJob: Job? = null
// ── Direct calling state ──
/** 0=room mode, 1=direct call mode */
private val _callMode = MutableStateFlow(0)
val callMode: StateFlow<Int> = _callMode.asStateFlow()
/** Target fingerprint for direct call */
private val _targetFingerprint = MutableStateFlow("")
val targetFingerprint: StateFlow<String> = _targetFingerprint.asStateFlow()
/** Signal state string: "idle", "registered", "ringing", "incoming", "setup" */
private val _signalState = MutableStateFlow("idle")
val signalState: StateFlow<String> = _signalState.asStateFlow()
/** Incoming call info */
private val _incomingCallId = MutableStateFlow<String?>(null)
val incomingCallId: StateFlow<String?> = _incomingCallId.asStateFlow()
private val _incomingCallerFp = MutableStateFlow<String?>(null)
val incomingCallerFp: StateFlow<String?> = _incomingCallerFp.asStateFlow()
private val _incomingCallerAlias = MutableStateFlow<String?>(null)
val incomingCallerAlias: StateFlow<String?> = _incomingCallerAlias.asStateFlow()
/** Separate signal manager (persistent, survives calls) */
private var signalManager: com.wzp.engine.SignalManager? = null
private var signalPollJob: Job? = null
fun setCallMode(mode: Int) { _callMode.value = mode }
fun setTargetFingerprint(fp: String) { _targetFingerprint.value = fp }
/** Register on relay for direct calls */
fun registerForCalls() {
val serverIdx = _selectedServer.value
val serverList = _servers.value
if (serverIdx >= serverList.size) return
val relay = serverList[serverIdx].address
var seed = _seedHex.value
// Generate seed if empty (fresh install or cleared storage)
if (seed.isEmpty()) {
val newSeed = ByteArray(32).also { java.security.SecureRandom().nextBytes(it) }
seed = newSeed.joinToString("") { "%02x".format(it) }
_seedHex.value = seed
settings?.saveSeedHex(seed)
Log.i(TAG, "generated new identity seed")
}
val resolvedRelay = resolveToIp(relay) ?: relay
// nativeSignalConnect has JNI overhead — must be on a thread with enough stack.
// Dispatchers.IO threads overflow. Use explicit Java Thread.
Thread(null, {
try {
val mgr = com.wzp.engine.SignalManager()
val ok = mgr.connect(resolvedRelay, seed)
viewModelScope.launch {
if (ok) {
signalManager = mgr
startSignalPolling()
} else {
_errorMessage.value = "Failed to register on relay"
}
}
} catch (e: Exception) {
viewModelScope.launch {
_errorMessage.value = "Register error: ${e.message}"
}
}
}, "wzp-signal-init", 8 * 1024 * 1024).start()
}
/** Poll signal manager state every 500ms */
private fun startSignalPolling() {
signalPollJob?.cancel()
signalPollJob = viewModelScope.launch {
while (isActive) {
val mgr = signalManager
if (mgr != null && mgr.isConnected) {
val state = mgr.getState()
_signalState.value = state.status
_incomingCallId.value = state.incomingCallId
_incomingCallerFp.value = state.incomingCallerFp
_incomingCallerAlias.value = state.incomingCallerAlias
// Auto-connect to media room when call is set up
if (state.status == "setup" && state.callSetupRelay != null && state.callSetupRoom != null) {
Log.i(TAG, "CallSetup: connecting to ${state.callSetupRelay} room ${state.callSetupRoom}")
startCallInternal(state.callSetupRelay, state.callSetupRoom)
}
}
delay(500L)
}
}
}
private fun stopSignalPolling() {
signalPollJob?.cancel()
signalPollJob = null
}
/** Place a direct call to the target fingerprint */
fun placeDirectCall() {
val target = _targetFingerprint.value.trim()
if (target.isEmpty()) {
_errorMessage.value = "Enter a fingerprint to call"
return
}
signalManager?.placeCall(target)
}
/** Answer an incoming direct call */
fun answerIncomingCall(mode: Int = 2) {
val callId = _incomingCallId.value ?: return
signalManager?.answerCall(callId, mode)
}
/** Reject an incoming direct call */
fun rejectIncomingCall() {
val callId = _incomingCallId.value ?: return
signalManager?.answerCall(callId, 0)
}
/** Hang up direct call — media ends, signal stays alive */
fun hangupDirectCall() {
signalManager?.hangup()
engine?.stopCall()
engine?.destroy()
engine = null
engineInitialized = false
}
companion object {
private const val TAG = "WzpCall"
val DEFAULT_SERVERS = listOf(
ServerEntry("172.16.81.175:4433", "LAN (172.16.81.175)"),
ServerEntry("193.180.213.68:4433", "Pangolin (IP)"),
)
const val DEFAULT_ROOM = "android"
const val DEFAULT_ROOM = "general"
}
fun setContext(context: Context) {
@@ -159,6 +302,8 @@ class CallViewModel : ViewModel(), WzpCallback {
_captureGainDb.value = s.loadCaptureGain()
_seedHex.value = s.getOrCreateSeedHex()
_aecEnabled.value = s.loadAecEnabled()
_debugRecording.value = s.loadDebugRecording()
_codecChoice.value = s.loadCodecChoice()
_recentRooms.value = s.loadRecentRooms()
}
@@ -203,35 +348,43 @@ class CallViewModel : ViewModel(), WzpCallback {
settings?.saveSelectedServer(_selectedServer.value)
}
/** Ping all servers in background, update results. */
/**
* Ping all servers via native QUIC. Requires engine to be initialized.
* Creates engine if needed, pings, keeps engine alive for subsequent Connect.
*/
fun pingAllServers() {
viewModelScope.launch {
// Ensure engine exists
if (engine == null || engine?.isInitialized != true) {
try {
engine = WzpEngine(this@CallViewModel).also { it.init() }
engineInitialized = true
} catch (e: Exception) {
Log.w(TAG, "engine init for ping failed: $e")
return@launch
}
}
val eng = engine ?: return@launch
val results = mutableMapOf<String, PingResult>()
val known = mutableMapOf<String, String>()
_servers.value.forEach { server ->
val pr = withContext(Dispatchers.IO) {
try {
val json = WzpEngine.pingRelay(server.address) ?: return@withContext null
val obj = JSONObject(json)
PingResult(
rttMs = obj.getInt("rtt_ms"),
serverFingerprint = obj.optString("server_fingerprint", ""),
)
} catch (e: Exception) {
Log.w(TAG, "ping ${server.address} failed: ${e.message}")
null
}
val json = withContext(Dispatchers.IO) {
eng.pingRelay(server.address)
}
if (pr != null) {
results[server.address] = pr
// TOFU: save fingerprint on first contact
if (pr.serverFingerprint.isNotEmpty()) {
val saved = settings?.loadServerFingerprint(server.address)
if (saved == null) {
settings?.saveServerFingerprint(server.address, pr.serverFingerprint)
if (json != null) {
try {
val obj = JSONObject(json)
val rtt = obj.getInt("rtt_ms")
val fp = obj.optString("server_fingerprint", "")
results[server.address] = PingResult(rttMs = rtt, serverFingerprint = fp)
// TOFU
if (fp.isNotEmpty()) {
val saved = settings?.loadServerFingerprint(server.address)
if (saved == null) settings?.saveServerFingerprint(server.address, fp)
known[server.address] = saved ?: fp
}
known[server.address] = saved ?: pr.serverFingerprint
}
} catch (_: Exception) {}
}
}
_pingResults.value = results
@@ -239,12 +392,23 @@ class CallViewModel : ViewModel(), WzpCallback {
}
}
/** Load saved TOFU fingerprints. */
fun loadSavedFingerprints() {
val known = mutableMapOf<String, String>()
_servers.value.forEach { server ->
settings?.loadServerFingerprint(server.address)?.let {
known[server.address] = it
}
}
_knownFingerprints.value = known
}
/** Get lock status for a server. */
fun lockStatus(address: String): LockStatus {
val pr = _pingResults.value[address] ?: return LockStatus.UNKNOWN
val known = _knownFingerprints.value[address]
if (!pr.reachable) return LockStatus.OFFLINE
val known = _knownFingerprints.value[address] ?: return LockStatus.NEW
if (pr.serverFingerprint.isEmpty()) return LockStatus.NEW
if (known == null) return LockStatus.NEW
return if (pr.serverFingerprint == known) LockStatus.VERIFIED else LockStatus.CHANGED
}
@@ -280,6 +444,16 @@ class CallViewModel : ViewModel(), WzpCallback {
settings?.saveAecEnabled(enabled)
}
fun setDebugRecording(enabled: Boolean) {
_debugRecording.value = enabled
settings?.saveDebugRecording(enabled)
}
fun setCodecChoice(choice: Int) {
_codecChoice.value = choice
settings?.saveCodecChoice(choice)
}
/**
* Resolve DNS hostname to IP address on the Kotlin/Android side,
* since Rust's DNS resolution may not work on Android.
@@ -346,7 +520,74 @@ class CallViewModel : ViewModel(), WzpCallback {
Log.i(TAG, "teardown: done")
}
/** Accept the new server key and proceed with the call. */
fun acceptNewFingerprint() {
val info = _keyWarning.value ?: return
_knownFingerprints.value = _knownFingerprints.value.toMutableMap().also {
it[info.address] = info.newFp
}
settings?.saveServerFingerprint(info.address, info.newFp)
_keyWarning.value = null
startCallInternal()
}
fun dismissKeyWarning() {
_keyWarning.value = null
}
fun startCall() {
val serverEntry = _servers.value[_selectedServer.value]
// Check for key change before connecting
val ls = lockStatus(serverEntry.address)
if (ls == LockStatus.CHANGED) {
val known = _knownFingerprints.value[serverEntry.address] ?: ""
val current = _pingResults.value[serverEntry.address]?.serverFingerprint ?: ""
_keyWarning.value = KeyWarningInfo(serverEntry.address, known, current)
return
}
startCallInternal()
}
/** Start a call to a specific relay + room (used by direct call setup). */
private fun startCallInternal(relay: String, room: String) {
Log.i(TAG, "startCallDirect: relay=$relay room=$room")
try {
// Don't teardown — keep the signal connection alive
engine = WzpEngine(this)
engine!!.init()
engineInitialized = true
_callState.value = 1
_errorMessage.value = null
try { appContext?.let { CallService.start(it) } } catch (e: Exception) {
Log.w(TAG, "service start err: $e")
}
startStatsPolling()
viewModelScope.launch(kotlinx.coroutines.Dispatchers.IO) {
try {
val seed = _seedHex.value
val name = _alias.value
val result = engine?.startCall(relay, room, seedHex = seed, alias = name, profile = _codecChoice.value) ?: -1
CallService.onStopFromNotification = { stopCall() }
if (result != 0) {
_callState.value = 0
_errorMessage.value = "Failed to connect to call room (code $result)"
appContext?.let { CallService.stop(it) }
}
} catch (e: Exception) {
Log.e(TAG, "startCallDirect error", e)
_callState.value = 0
_errorMessage.value = "Engine error: ${e.message}"
appContext?.let { CallService.stop(it) }
}
}
} catch (e: Exception) {
Log.e(TAG, "startCallDirect error", e)
_callState.value = 0
_errorMessage.value = "Engine error: ${e.message}"
}
}
private fun startCallInternal() {
val serverEntry = _servers.value[_selectedServer.value]
val room = _roomName.value
Log.i(TAG, "startCall: server=${serverEntry.address} room=$room")
@@ -377,7 +618,7 @@ class CallViewModel : ViewModel(), WzpCallback {
val seed = _seedHex.value
val name = _alias.value
Log.i(TAG, "startCall: resolved=$relay, alias=$name, calling engine.startCall")
val result = engine?.startCall(relay, room, seedHex = seed, alias = name) ?: -1
val result = engine?.startCall(relay, room, seedHex = seed, alias = name, profile = _codecChoice.value) ?: -1
Log.i(TAG, "startCall: engine returned $result")
// Only wire up notification callback after engine is running
CallService.onStopFromNotification = { stopCall() }
@@ -468,6 +709,7 @@ class CallViewModel : ViewModel(), WzpCallback {
it.playoutGainDb = _playoutGainDb.value
it.captureGainDb = _captureGainDb.value
it.aecEnabled = _aecEnabled.value
it.debugRecording = _debugRecording.value
it.start(e)
}
audioRouteManager?.register()
@@ -495,6 +737,7 @@ class CallViewModel : ViewModel(), WzpCallback {
val s = CallStats.fromJson(json)
lastCallDuration = s.durationSecs
_stats.value = s
// Only update callState from media engine stats (not signal)
if (s.state != 0) {
_callState.value = s.state
}

File diff suppressed because it is too large Load Diff

View File

@@ -1,5 +1,6 @@
package com.wzp.ui.settings
import androidx.compose.foundation.clickable
import android.content.ClipData
import android.content.ClipboardManager
import android.content.Context
@@ -22,6 +23,7 @@ import androidx.compose.material3.AlertDialog
import androidx.compose.material3.Button
import androidx.compose.material3.ButtonDefaults
import androidx.compose.material3.Divider
import androidx.compose.material3.RadioButton
import androidx.compose.material3.FilledTonalButton
import androidx.compose.material3.FilledTonalIconButton
import androidx.compose.material3.IconButtonDefaults
@@ -241,6 +243,51 @@ fun SettingsScreen(
)
}
Spacer(modifier = Modifier.height(12.dp))
// Quality selection — slider from best (studio 64k) to worst (codec2 1.2k) + auto
val qualityLabels = listOf(
"Studio 64k", "Studio 48k", "Studio 32k", "Auto",
"Opus 24k", "Opus 6k", "Codec2 3.2k", "Codec2 1.2k"
)
// Map slider position to JNI profile int:
// 0=Studio64k(6), 1=Studio48k(5), 2=Studio32k(4), 3=Auto(7),
// 4=Opus24k(0), 5=Opus6k(1), 6=Codec2_3.2k(3), 7=Codec2_1.2k(2)
val sliderToProfile = intArrayOf(6, 5, 4, 7, 0, 1, 3, 2)
val profileToSlider = mapOf(6 to 0, 5 to 1, 4 to 2, 7 to 3, 0 to 4, 1 to 5, 3 to 6, 2 to 7)
val qualityColors = listOf(
Color(0xFF22C55E), Color(0xFF4ADE80), Color(0xFF86EFAC), Color(0xFFA3E635),
Color(0xFFA3E635), Color(0xFFFACC15), Color(0xFFE97320), Color(0xFF991B1B)
)
val currentCodec by viewModel.codecChoice.collectAsState()
val sliderPos = profileToSlider[currentCodec] ?: 3
Text("Quality", style = MaterialTheme.typography.bodyMedium)
Text(
text = "Decode always accepts all codecs",
style = MaterialTheme.typography.bodySmall,
color = MaterialTheme.colorScheme.onSurfaceVariant
)
Spacer(modifier = Modifier.height(4.dp))
Text(
text = qualityLabels[sliderPos],
style = MaterialTheme.typography.titleMedium.copy(fontWeight = FontWeight.Bold),
color = qualityColors[sliderPos]
)
Slider(
value = sliderPos.toFloat(),
onValueChange = { viewModel.setCodecChoice(sliderToProfile[it.toInt()]) },
valueRange = 0f..7f,
steps = 6,
modifier = Modifier.fillMaxWidth()
)
Row(
modifier = Modifier.fillMaxWidth(),
horizontalArrangement = Arrangement.SpaceBetween
) {
Text("Best", style = MaterialTheme.typography.labelSmall, color = Color(0xFF22C55E))
Text("Lowest", style = MaterialTheme.typography.labelSmall, color = Color(0xFF991B1B))
}
Spacer(modifier = Modifier.height(24.dp))
Divider()
Spacer(modifier = Modifier.height(16.dp))

View File

@@ -12,4 +12,13 @@ pub enum EngineCommand {
ForceProfile(QualityProfile),
/// Stop the call and shut down the engine.
Stop,
/// Place a direct call to a fingerprint (requires signal connection).
PlaceCall { target_fingerprint: String },
/// Answer an incoming direct call.
AnswerCall {
call_id: String,
accept_mode: wzp_proto::CallAcceptMode,
},
/// Reject an incoming direct call.
RejectCall { call_id: String },
}

View File

@@ -9,32 +9,60 @@
//! and AudioTrack. PCM samples are transferred through lock-free ring buffers.
use std::net::SocketAddr;
use std::sync::atomic::{AtomicBool, AtomicU16, AtomicU32, Ordering};
use std::sync::atomic::{AtomicBool, AtomicU8, AtomicU16, AtomicU32, Ordering};
use std::sync::{Arc, Mutex};
use std::time::Instant;
use bytes::Bytes;
use tracing::{error, info, warn};
use tracing::{debug, error, info, warn};
use wzp_codec::AdaptiveDecoder;
use wzp_codec::agc::AutoGainControl;
use wzp_codec::opus_dec::OpusDecoder;
use wzp_codec::opus_enc::OpusEncoder;
use wzp_codec::dred_ffi::{DredDecoderHandle, DredState};
use wzp_crypto::{KeyExchange, WarzoneKeyExchange};
use wzp_fec::{RaptorQFecDecoder, RaptorQFecEncoder};
use wzp_proto::{
AudioDecoder, AudioEncoder, CodecId, FecDecoder, FecEncoder,
MediaHeader, MediaPacket, MediaTransport, QualityProfile, SignalMessage,
AdaptiveQualityController, AudioDecoder, AudioEncoder, CodecId, FecDecoder, FecEncoder,
MediaHeader, MediaPacket, MediaTransport, QualityController, QualityProfile, SignalMessage,
};
use crate::audio_ring::AudioRing;
use crate::commands::EngineCommand;
use crate::stats::{CallState, CallStats};
/// Opus frame size at 48kHz mono, 20ms = 960 samples.
const FRAME_SAMPLES: usize = 960;
/// Max frame size at 48kHz mono (40ms = 1920 samples, for Codec2/Opus6k).
const MAX_FRAME_SAMPLES: usize = 1920;
/// Sentinel value: no profile change pending.
const PROFILE_NO_CHANGE: u8 = 0xFF;
/// All quality profiles in index order, for AtomicU8-based signaling.
const PROFILES: [QualityProfile; 6] = [
QualityProfile::STUDIO_64K, // 0
QualityProfile::STUDIO_48K, // 1
QualityProfile::STUDIO_32K, // 2
QualityProfile::GOOD, // 3
QualityProfile::DEGRADED, // 4
QualityProfile::CATASTROPHIC, // 5
];
fn profile_to_index(p: &QualityProfile) -> u8 {
PROFILES.iter().position(|pp| pp.codec == p.codec).map(|i| i as u8).unwrap_or(3)
}
fn index_to_profile(idx: u8) -> Option<QualityProfile> {
PROFILES.get(idx as usize).copied()
}
/// Compute frame samples at 48kHz for a given profile.
fn frame_samples_for(profile: &QualityProfile) -> usize {
(profile.frame_duration_ms as usize) * 48 // 48000 / 1000
}
/// Configuration to start a call.
pub struct CallStartConfig {
pub profile: QualityProfile,
/// When true, use the relay's chosen_profile from CallAnswer instead of local profile.
pub auto_profile: bool,
pub relay_addr: String,
pub room: String,
pub auth_token: Vec<u8>,
@@ -46,6 +74,7 @@ impl Default for CallStartConfig {
fn default() -> Self {
Self {
profile: QualityProfile::GOOD,
auto_profile: false,
relay_addr: String::new(),
room: String::new(),
auth_token: Vec::new(),
@@ -123,6 +152,7 @@ impl WzpEngine {
let room = config.room.clone();
let identity_seed = config.identity_seed;
let profile = config.profile;
let auto_profile = config.auto_profile;
let alias = config.alias.clone();
let state = self.state.clone();
@@ -131,7 +161,7 @@ impl WzpEngine {
let state_clone = state.clone();
runtime.block_on(async move {
if let Err(e) = run_call(relay_addr, &room, &identity_seed, profile, alias.as_deref(), state_clone).await
if let Err(e) = run_call(relay_addr, &room, &identity_seed, profile, auto_profile, alias.as_deref(), state_clone).await
{
error!("call failed: {e}");
}
@@ -169,6 +199,55 @@ impl WzpEngine {
info!("stop_call: done");
}
/// Ping a relay — same pattern as start_call (creates runtime on calling thread).
/// Returns JSON `{"rtt_ms":N,"server_fingerprint":"hex"}` or error.
pub fn ping_relay(&self, address: &str) -> Result<String, anyhow::Error> {
let addr: SocketAddr = address.parse()?;
let rt = tokio::runtime::Builder::new_current_thread()
.enable_all()
.build()?;
let result = rt.block_on(async {
let bind: SocketAddr = "0.0.0.0:0".parse().unwrap();
let endpoint = wzp_transport::create_endpoint(bind, None)?;
let client_cfg = wzp_transport::client_config();
let start = Instant::now();
let conn_result = tokio::time::timeout(
std::time::Duration::from_secs(3),
wzp_transport::connect(&endpoint, addr, "ping", client_cfg),
)
.await;
// Always close endpoint to prevent resource leaks
endpoint.close(0u32.into(), b"done");
let conn = conn_result.map_err(|_| anyhow::anyhow!("timeout"))??;
let rtt_ms = start.elapsed().as_millis() as u64;
let server_fp = conn
.peer_identity()
.and_then(|id| id.downcast::<Vec<rustls::pki_types::CertificateDer>>().ok())
.and_then(|certs| certs.first().map(|c| {
use std::hash::{Hash, Hasher};
let mut h = std::collections::hash_map::DefaultHasher::new();
c.as_ref().hash(&mut h);
format!("{:016x}", h.finish())
}))
.unwrap_or_default();
conn.close(0u32.into(), b"ping");
Ok::<_, anyhow::Error>(format!(r#"{{"rtt_ms":{},"server_fingerprint":"{}"}}"#, rtt_ms, server_fp))
});
// Shutdown runtime cleanly with timeout
rt.shutdown_timeout(std::time::Duration::from_millis(500));
result
}
/// Start persistent signaling connection for direct calls.
// Signal methods (start_signaling, place_call, answer_call) moved to signal_mgr.rs
pub fn set_mute(&self, muted: bool) {
self.state.muted.store(muted, Ordering::Relaxed);
}
@@ -227,10 +306,10 @@ async fn run_call(
room: &str,
identity_seed: &[u8; 32],
profile: QualityProfile,
auto_profile: bool,
alias: Option<&str>,
state: Arc<EngineState>,
) -> Result<(), anyhow::Error> {
let _ = rustls::crypto::ring::default_provider().install_default();
let bind_addr: SocketAddr = "0.0.0.0:0".parse().unwrap();
let endpoint = wzp_transport::create_endpoint(bind_addr, None)?;
@@ -261,6 +340,9 @@ async fn run_call(
ephemeral_pub,
signature,
supported_profiles: vec![
QualityProfile::STUDIO_64K,
QualityProfile::STUDIO_48K,
QualityProfile::STUDIO_32K,
QualityProfile::GOOD,
QualityProfile::DEGRADED,
QualityProfile::CATASTROPHIC,
@@ -275,8 +357,8 @@ async fn run_call(
.await?
.ok_or_else(|| anyhow::anyhow!("connection closed before CallAnswer"))?;
let relay_ephemeral_pub = match answer {
SignalMessage::CallAnswer { ephemeral_pub, .. } => ephemeral_pub,
let (relay_ephemeral_pub, chosen_profile) = match answer {
SignalMessage::CallAnswer { ephemeral_pub, chosen_profile, .. } => (ephemeral_pub, chosen_profile),
other => {
return Err(anyhow::anyhow!(
"expected CallAnswer, got {:?}",
@@ -285,19 +367,28 @@ async fn run_call(
}
};
// Auto mode: use the relay's chosen profile instead of the local preference
let profile = if auto_profile {
info!(chosen = ?chosen_profile.codec, "auto mode: using relay's chosen profile");
chosen_profile
} else {
profile
};
let _session = kx.derive_session(&relay_ephemeral_pub)?;
info!("handshake complete, call active");
info!(codec = ?profile.codec, "handshake complete, call active");
{
let mut stats = state.stats.lock().unwrap();
stats.state = CallState::Active;
}
// Initialize Opus codec
let mut encoder =
OpusEncoder::new(profile).map_err(|e| anyhow::anyhow!("opus encoder init: {e}"))?;
let mut decoder =
OpusDecoder::new(profile).map_err(|e| anyhow::anyhow!("opus decoder init: {e}"))?;
// Initialize codec (Opus or Codec2 based on profile).
// Phase 3c: decoder is a concrete AdaptiveDecoder (not Box<dyn
// AudioDecoder>) so the recv task can call reconstruct_from_dred on
// gaps detected via sequence tracking.
let mut encoder = wzp_codec::create_encoder(profile);
let mut decoder = AdaptiveDecoder::new(profile).expect("failed to create adaptive decoder");
// Initialize FEC encoder/decoder
let mut fec_enc = wzp_fec::create_encoder(&profile);
@@ -307,21 +398,37 @@ async fn run_call(
let mut capture_agc = AutoGainControl::new();
let mut playout_agc = AutoGainControl::new();
let mut frame_samples = frame_samples_for(&profile);
info!(
codec = ?profile.codec,
fec_ratio = profile.fec_ratio,
frames_per_block = profile.frames_per_block,
"codec + FEC + AGC initialized (48kHz mono, 20ms frames)"
frame_ms = profile.frame_duration_ms,
frame_samples,
"codec + FEC + AGC initialized"
);
{
let mut stats = state.stats.lock().unwrap();
stats.current_codec = format!("{:?}", profile.codec);
stats.auto_mode = auto_profile;
}
let seq = AtomicU16::new(0);
let ts = AtomicU32::new(0);
let transport_recv = transport.clone();
// Pre-allocate buffers
let mut capture_buf = vec![0i16; FRAME_SAMPLES];
// Adaptive quality: shared AtomicU8 between recv task (writer) and send task (reader).
// 0xFF = no change pending, 0-5 = index into PROFILES array.
let pending_profile = Arc::new(AtomicU8::new(PROFILE_NO_CHANGE));
let pending_profile_recv = pending_profile.clone();
// Pre-allocate buffers (sized for current profile)
let mut capture_buf = vec![0i16; frame_samples];
let mut encode_buf = vec![0u8; encoder.max_frame_bytes()];
let mut frame_in_block: u8 = 0;
let mut block_id: u8 = 0;
let mut current_profile = profile;
// Send task: capture ring → Opus encode → FEC → MediaPackets
//
@@ -347,14 +454,47 @@ async fn run_call(
break;
}
// Check for adaptive profile switch from recv task
if auto_profile {
let p = pending_profile.swap(PROFILE_NO_CHANGE, Ordering::Acquire);
if p != PROFILE_NO_CHANGE {
if let Some(new_profile) = index_to_profile(p) {
info!(
from = ?current_profile.codec,
to = ?new_profile.codec,
"auto: switching encoder profile"
);
if let Err(e) = encoder.set_profile(new_profile) {
warn!("encoder set_profile failed: {e}");
} else {
fec_enc = wzp_fec::create_encoder(&new_profile);
current_profile = new_profile;
let new_frame_samples = frame_samples_for(&new_profile);
if new_frame_samples != frame_samples {
frame_samples = new_frame_samples;
capture_buf.resize(frame_samples, 0);
}
encode_buf.resize(encoder.max_frame_bytes(), 0);
// Reset FEC block state for clean switch
frame_in_block = 0;
block_id = block_id.wrapping_add(1);
// Update stats with new codec
if let Ok(mut stats) = state.stats.lock() {
stats.current_codec = format!("{:?}", new_profile.codec);
}
}
}
}
}
let avail = state.capture_ring.available();
if avail < FRAME_SAMPLES {
if avail < frame_samples {
tokio::time::sleep(std::time::Duration::from_millis(5)).await;
continue;
}
let read = state.capture_ring.read(&mut capture_buf);
if read < FRAME_SAMPLES {
if read < frame_samples {
continue;
}
@@ -381,21 +521,34 @@ async fn run_call(
t_opus_us += t0.elapsed().as_micros() as u64;
let encoded = &encode_buf[..encoded_len];
// Phase 2: Opus tiers bypass RaptorQ (DRED handles loss recovery
// at the codec layer). Codec2 tiers keep RaptorQ unchanged.
let is_opus = current_profile.codec.is_opus();
let (hdr_fec_block, hdr_fec_symbol, hdr_fec_ratio) = if is_opus {
(0u8, 0u8, 0u8)
} else {
(
block_id,
frame_in_block,
MediaHeader::encode_fec_ratio(current_profile.fec_ratio),
)
};
// Build source packet
let s = seq.fetch_add(1, Ordering::Relaxed);
let t = ts.fetch_add(FRAME_SAMPLES as u32, Ordering::Relaxed);
let t = ts.fetch_add(frame_samples as u32, Ordering::Relaxed);
let source_pkt = MediaPacket {
header: MediaHeader {
version: 0,
is_repair: false,
codec_id: profile.codec,
codec_id: current_profile.codec,
has_quality_report: false,
fec_ratio_encoded: MediaHeader::encode_fec_ratio(profile.fec_ratio),
fec_ratio_encoded: hdr_fec_ratio,
seq: s,
timestamp: t,
fec_block: block_id,
fec_symbol: frame_in_block,
fec_block: hdr_fec_block,
fec_symbol: hdr_fec_symbol,
reserved: 0,
csrc_count: 0,
},
@@ -425,63 +578,66 @@ async fn run_call(
t_send_us += t0.elapsed().as_micros() as u64;
frames_sent += 1;
// Feed encoded frame to FEC encoder
// Codec2-only: feed RaptorQ and emit repair packets when the
// block is full. Opus tiers skip this entire block — DRED
// (enabled in Phase 1) provides codec-layer loss recovery.
let t0 = Instant::now();
if let Err(e) = fec_enc.add_source_symbol(encoded) {
warn!("fec add_source error: {e}");
}
frame_in_block += 1;
if !is_opus {
if let Err(e) = fec_enc.add_source_symbol(encoded) {
warn!("fec add_source error: {e}");
}
frame_in_block += 1;
// When block is full, generate repair packets
if frame_in_block >= profile.frames_per_block {
match fec_enc.generate_repair(profile.fec_ratio) {
Ok(repairs) => {
let repair_count = repairs.len();
for (sym_idx, repair_data) in repairs {
let rs = seq.fetch_add(1, Ordering::Relaxed);
let repair_pkt = MediaPacket {
header: MediaHeader {
version: 0,
is_repair: true,
codec_id: profile.codec,
has_quality_report: false,
fec_ratio_encoded: MediaHeader::encode_fec_ratio(
profile.fec_ratio,
),
seq: rs,
timestamp: t,
fec_block: block_id,
fec_symbol: sym_idx,
reserved: 0,
csrc_count: 0,
},
payload: Bytes::from(repair_data),
quality_report: None,
};
// Drop repair packets on error — never break
if let Err(_e) = transport.send_media(&repair_pkt).await {
send_errors += 1;
frames_dropped += 1;
// Don't log every repair failure — source error log covers it
if frame_in_block >= current_profile.frames_per_block {
match fec_enc.generate_repair(current_profile.fec_ratio) {
Ok(repairs) => {
let repair_count = repairs.len();
for (sym_idx, repair_data) in repairs {
let rs = seq.fetch_add(1, Ordering::Relaxed);
let repair_pkt = MediaPacket {
header: MediaHeader {
version: 0,
is_repair: true,
codec_id: current_profile.codec,
has_quality_report: false,
fec_ratio_encoded: MediaHeader::encode_fec_ratio(
current_profile.fec_ratio,
),
seq: rs,
timestamp: t,
fec_block: block_id,
fec_symbol: sym_idx,
reserved: 0,
csrc_count: 0,
},
payload: Bytes::from(repair_data),
quality_report: None,
};
// Drop repair packets on error — never break
if let Err(_e) = transport.send_media(&repair_pkt).await {
send_errors += 1;
frames_dropped += 1;
// Don't log every repair failure — source error log covers it
}
}
if repair_count > 0 && (block_id % 50 == 0 || block_id == 0) {
info!(
block_id,
repair_count,
fec_ratio = current_profile.fec_ratio,
"FEC block complete"
);
}
}
if repair_count > 0 && (block_id % 50 == 0 || block_id == 0) {
info!(
block_id,
repair_count,
fec_ratio = profile.fec_ratio,
"FEC block complete"
);
Err(e) => {
warn!("fec generate_repair error: {e}");
}
}
Err(e) => {
warn!("fec generate_repair error: {e}");
}
}
let _ = fec_enc.finalize_block();
block_id = block_id.wrapping_add(1);
frame_in_block = 0;
let _ = fec_enc.finalize_block();
block_id = block_id.wrapping_add(1);
frame_in_block = 0;
}
}
t_fec_us += t0.elapsed().as_micros() as u64;
t_frames += 1;
@@ -511,8 +667,8 @@ async fn run_call(
info!(frames_sent, frames_dropped, send_errors, "send task ended");
};
// Pre-allocate decode buffer
let mut decode_buf = vec![0i16; FRAME_SAMPLES];
// Pre-allocate decode buffer (max size to handle any incoming codec)
let mut decode_buf = vec![0i16; MAX_FRAME_SAMPLES];
// Recv task: MediaPackets → FEC decode → Opus decode → playout ring
let recv_task = async {
@@ -522,7 +678,29 @@ async fn run_call(
let mut last_recv_instant = Instant::now();
let mut max_recv_gap_ms: u64 = 0;
let mut last_stats_log = Instant::now();
info!("recv task started (Opus + RaptorQ FEC)");
let mut quality_ctrl = AdaptiveQualityController::new();
let mut last_peer_codec: Option<CodecId> = None;
// Phase 3c: DRED reconstruction state. Unlike the desktop
// CallDecoder (which sits behind a jitter buffer that emits
// Missing signals), engine.rs reads packets directly from the
// transport and decodes straight into the playout ring. Gap
// detection is therefore done via sequence-number tracking:
// when a packet arrives with seq > expected_seq, the frames in
// between are missing and we attempt to reconstruct them via
// DRED before decoding the newly-arrived packet.
let mut dred_decoder =
DredDecoderHandle::new().expect("opus_dred_decoder_create failed");
let mut dred_parse_scratch =
DredState::new().expect("opus_dred_alloc failed (scratch)");
let mut last_good_dred =
DredState::new().expect("opus_dred_alloc failed (good state)");
let mut last_good_dred_seq: Option<u16> = None;
let mut expected_seq: Option<u16> = None;
let mut dred_reconstructions: u64 = 0;
let mut classical_plc_invocations: u64 = 0;
info!("recv task started (Opus + DRED + Codec2/RaptorQ)");
loop {
if !state.running.load(Ordering::Relaxed) {
break;
@@ -544,20 +722,181 @@ async fn run_call(
);
}
// Adaptive quality: ingest quality reports from relay
if auto_profile {
if let Some(ref qr) = pkt.quality_report {
if let Some(new_profile) = quality_ctrl.observe(qr) {
let idx = profile_to_index(&new_profile);
info!(
loss = qr.loss_percent(),
rtt = qr.rtt_ms(),
tier = ?quality_ctrl.tier(),
to = ?new_profile.codec,
"auto: quality adapter recommends switch"
);
pending_profile_recv.store(idx, Ordering::Release);
}
}
}
let is_repair = pkt.header.is_repair;
let pkt_block = pkt.header.fec_block;
let pkt_symbol = pkt.header.fec_symbol;
let pkt_is_opus = pkt.header.codec_id.is_opus();
// Feed every packet (source + repair) to FEC decoder
let _ = fec_dec.add_symbol(
pkt_block,
pkt_symbol,
is_repair,
&pkt.payload,
);
// Phase 2: Opus packets bypass RaptorQ entirely — DRED
// (enabled Phase 1) handles codec-layer loss recovery,
// and feeding these symbols into the RaptorQ decoder
// would accumulate block_id=0 duplicates that never
// decode. Codec2 packets still feed RaptorQ.
if !pkt_is_opus {
let _ = fec_dec.add_symbol(
pkt_block,
pkt_symbol,
is_repair,
&pkt.payload,
);
}
// Source packets: decode directly
if !is_repair {
if !is_repair && pkt.header.codec_id != CodecId::ComfortNoise {
// Switch decoder to match incoming codec if different
if pkt.header.codec_id != decoder.codec_id() {
let switch_profile = match pkt.header.codec_id {
CodecId::Opus24k => QualityProfile::GOOD,
CodecId::Opus6k => QualityProfile::DEGRADED,
CodecId::Opus32k => QualityProfile::STUDIO_32K,
CodecId::Opus48k => QualityProfile::STUDIO_48K,
CodecId::Opus64k => QualityProfile::STUDIO_64K,
CodecId::Codec2_1200 => QualityProfile::CATASTROPHIC,
CodecId::Codec2_3200 => QualityProfile {
codec: CodecId::Codec2_3200,
fec_ratio: 0.5,
frame_duration_ms: 20,
frames_per_block: 5,
},
other => QualityProfile { codec: other, ..QualityProfile::GOOD },
};
info!(from = ?decoder.codec_id(), to = ?pkt.header.codec_id, "recv: switching decoder");
let _ = decoder.set_profile(switch_profile);
// Profile switch invalidates the cached DRED
// state because samples_available is measured
// in the old profile's sample rate. Reset the
// tracking so we don't try to reconstruct with
// stale offsets.
last_good_dred_seq = None;
expected_seq = None;
}
// Track peer codec for UI display
if last_peer_codec != Some(pkt.header.codec_id) {
last_peer_codec = Some(pkt.header.codec_id);
if let Ok(mut stats) = state.stats.lock() {
stats.peer_codec = format!("{:?}", pkt.header.codec_id);
}
}
// Phase 3c: Opus path — parse DRED state out of
// the current packet FIRST so last_good_dred
// reflects the freshest available reconstruction
// source, then attempt gap recovery against it
// BEFORE decoding this packet's audio. Ordering
// matters because the playout ring is FIFO — gap
// samples must be written before this packet's
// samples, which come next.
if pkt_is_opus {
// Update DRED state from the current packet.
match dred_decoder.parse_into(&mut dred_parse_scratch, &pkt.payload) {
Ok(available) if available > 0 => {
std::mem::swap(
&mut dred_parse_scratch,
&mut last_good_dred,
);
last_good_dred_seq = Some(pkt.header.seq);
}
Ok(_) => {
// Packet carried no DRED — keep cached state.
}
Err(e) => {
debug!("DRED parse error (ignored): {e}");
}
}
// Detect and fill gap from last-expected to this packet.
const MAX_GAP_FRAMES: u16 = 16;
if let Some(expected) = expected_seq {
let gap = pkt.header.seq.wrapping_sub(expected);
if gap > 0 && gap <= MAX_GAP_FRAMES {
let current_profile_frame_samples =
(48_000 * profile.frame_duration_ms as i32) / 1000;
let available = last_good_dred.samples_available();
let pcm_slice_len =
current_profile_frame_samples as usize;
for gap_idx in 0..gap {
let missing_seq = expected.wrapping_add(gap_idx);
// Offset from the DRED anchor (last_good_dred_seq)
// back to the missing seq, in samples. Skip if
// the anchor is not ahead of missing (defensive).
let offset_samples = match last_good_dred_seq {
Some(anchor) => {
let delta = anchor.wrapping_sub(missing_seq);
if delta == 0 || delta > MAX_GAP_FRAMES {
-1 // skip DRED, use PLC
} else {
delta as i32 * current_profile_frame_samples
}
}
None => -1,
};
let reconstructed = if offset_samples > 0
&& offset_samples <= available
{
decoder
.reconstruct_from_dred(
&last_good_dred,
offset_samples,
&mut decode_buf[..pcm_slice_len],
)
.ok()
} else {
None
};
match reconstructed {
Some(samples) => {
playout_agc.process_frame(
&mut decode_buf[..samples],
);
state
.playout_ring
.write(&decode_buf[..samples]);
dred_reconstructions += 1;
frames_decoded += 1;
}
None => {
// Fall through to classical PLC.
if let Ok(samples) =
decoder.decode_lost(&mut decode_buf)
{
playout_agc
.process_frame(&mut decode_buf[..samples]);
state
.playout_ring
.write(&decode_buf[..samples]);
classical_plc_invocations += 1;
frames_decoded += 1;
}
}
}
}
}
}
// Advance the expected-seq tracker for the next arrival.
expected_seq = Some(pkt.header.seq.wrapping_add(1));
}
match decoder.decode(&pkt.payload, &mut decode_buf) {
Ok(samples) => {
playout_agc.process_frame(&mut decode_buf[..samples]);
@@ -569,32 +908,44 @@ async fn run_call(
if let Ok(samples) = decoder.decode_lost(&mut decode_buf) {
playout_agc.process_frame(&mut decode_buf[..samples]);
state.playout_ring.write(&decode_buf[..samples]);
// This is a decode-error fallback (not a
// detected gap), so count it as PLC.
classical_plc_invocations += 1;
}
}
}
}
// Try FEC recovery
if let Ok(Some(recovered_frames)) = fec_dec.try_decode(pkt_block) {
fec_recovered += recovered_frames.len() as u64;
if fec_recovered % 50 == 1 {
info!(
fec_recovered,
block = pkt_block,
frames = recovered_frames.len(),
"FEC block recovered"
);
// Codec2-only: try FEC recovery and expire old blocks.
// Opus packets skip both — the Phase 2 Opus path has no
// RaptorQ state to query or clean up. The `fec_recovered`
// counter is now effectively Codec2-only, which is
// correct because DRED reconstructions will be counted
// separately once Phase 3 lands (new telemetry field).
if !pkt_is_opus {
if let Ok(Some(recovered_frames)) = fec_dec.try_decode(pkt_block) {
fec_recovered += recovered_frames.len() as u64;
if fec_recovered % 50 == 1 {
info!(
fec_recovered,
block = pkt_block,
frames = recovered_frames.len(),
"FEC block recovered"
);
}
}
}
// Expire old blocks to prevent memory growth
if pkt_block > 3 {
fec_dec.expire_before(pkt_block.wrapping_sub(3));
// Expire old blocks to prevent memory growth
if pkt_block > 3 {
fec_dec.expire_before(pkt_block.wrapping_sub(3));
}
}
let mut stats = state.stats.lock().unwrap();
stats.frames_decoded = frames_decoded;
stats.fec_recovered = fec_recovered;
stats.dred_reconstructions = dred_reconstructions;
stats.classical_plc_invocations = classical_plc_invocations;
drop(stats);
// Periodic stats every 5 seconds
@@ -602,6 +953,8 @@ async fn run_call(
info!(
frames_decoded,
fec_recovered,
dred_reconstructions,
classical_plc_invocations,
recv_errors,
max_recv_gap_ms,
playout_avail = state.playout_ring.available(),
@@ -672,6 +1025,7 @@ async fn run_call(
.map(|p| crate::stats::RoomMember {
fingerprint: p.fingerprint.clone(),
alias: p.alias.clone(),
relay_label: p.relay_label.clone(),
})
.collect();
let mut stats = state_signal.stats.lock().unwrap();

View File

@@ -21,11 +21,24 @@ unsafe fn handle_ref(handle: jlong) -> &'static mut EngineHandle {
unsafe { &mut *(handle as *mut EngineHandle) }
}
/// 7 = auto (use relay's chosen profile)
const PROFILE_AUTO: jint = 7;
fn profile_from_int(value: jint) -> QualityProfile {
match value {
1 => QualityProfile::DEGRADED,
2 => QualityProfile::CATASTROPHIC,
_ => QualityProfile::GOOD,
0 => QualityProfile::GOOD, // Opus 24k
1 => QualityProfile::DEGRADED, // Opus 6k
2 => QualityProfile::CATASTROPHIC, // Codec2 1.2k
3 => QualityProfile { // Codec2 3.2k
codec: wzp_proto::CodecId::Codec2_3200,
fec_ratio: 0.5,
frame_duration_ms: 20,
frames_per_block: 5,
},
4 => QualityProfile::STUDIO_32K, // Opus 32k
5 => QualityProfile::STUDIO_48K, // Opus 48k
6 => QualityProfile::STUDIO_64K, // Opus 64k
_ => QualityProfile::GOOD, // auto falls back to GOOD
}
}
@@ -64,6 +77,9 @@ pub unsafe extern "system" fn Java_com_wzp_engine_WzpEngine_nativeInit(
) -> jlong {
let result = panic::catch_unwind(|| {
init_logging();
// Install rustls crypto provider ONCE on the main thread.
// Must not be called per-thread — conflicts with Android's system libcrypto.so TLS keys.
let _ = rustls::crypto::ring::default_provider().install_default();
let handle = Box::new(EngineHandle {
engine: WzpEngine::new(),
});
@@ -85,6 +101,7 @@ pub unsafe extern "system" fn Java_com_wzp_engine_WzpEngine_nativeStartCall(
seed_hex_j: JString,
token_j: JString,
alias_j: JString,
profile_j: jint,
) -> jint {
let result = panic::catch_unwind(panic::AssertUnwindSafe(|| {
let relay_addr: String = env.get_string(&relay_addr_j).map(|s| s.into()).unwrap_or_default();
@@ -110,7 +127,8 @@ pub unsafe extern "system" fn Java_com_wzp_engine_WzpEngine_nativeStartCall(
}
let config = CallStartConfig {
profile: QualityProfile::GOOD,
profile: profile_from_int(profile_j),
auto_profile: profile_j == PROFILE_AUTO,
relay_addr,
room,
auth_token: if token.is_empty() { Vec::new() } else { token.into_bytes() },
@@ -318,71 +336,22 @@ pub unsafe extern "system" fn Java_com_wzp_engine_WzpEngine_nativeDestroy(
}));
}
/// Ping a relay server — returns JSON `{"rtt_ms":N,"server_fingerprint":"hex"}` or null on failure.
/// Does NOT require an engine handle — creates a temporary QUIC connection.
/// Ping a relay server — instance method, requires engine handle.
/// Returns JSON `{"rtt_ms":N,"server_fingerprint":"hex"}` or null on failure.
#[unsafe(no_mangle)]
pub unsafe extern "system" fn Java_com_wzp_engine_WzpEngine_nativePingRelay<'a>(
mut env: JNIEnv<'a>,
_class: JClass,
handle: jlong,
relay_j: JString,
) -> jstring {
let result = panic::catch_unwind(panic::AssertUnwindSafe(|| {
let h = unsafe { handle_ref(handle) };
let relay: String = env.get_string(&relay_j).map(|s| s.into()).unwrap_or_default();
let addr: std::net::SocketAddr = match relay.parse() {
Ok(a) => a,
Err(_) => return None,
};
let _ = rustls::crypto::ring::default_provider().install_default();
let rt = match tokio::runtime::Builder::new_current_thread()
.enable_all()
.build()
{
Ok(rt) => rt,
Err(_) => return None,
};
rt.block_on(async {
let bind: std::net::SocketAddr = "0.0.0.0:0".parse().unwrap();
let endpoint = match wzp_transport::create_endpoint(bind, None) {
Ok(e) => e,
Err(_) => return None,
};
let client_cfg = wzp_transport::client_config();
let start = std::time::Instant::now();
match tokio::time::timeout(
std::time::Duration::from_secs(3),
wzp_transport::connect(&endpoint, addr, "ping", client_cfg),
)
.await
{
Ok(Ok(conn)) => {
let rtt_ms = start.elapsed().as_millis() as u64;
let server_fp = conn
.peer_identity()
.and_then(|id| {
id.downcast::<Vec<rustls::pki_types::CertificateDer>>().ok()
})
.and_then(|certs| {
certs.first().map(|c| {
use std::hash::{Hash, Hasher};
let mut h = std::collections::hash_map::DefaultHasher::new();
c.as_ref().hash(&mut h);
format!("{:016x}", h.finish())
})
})
.unwrap_or_default();
conn.close(0u32.into(), b"ping");
Some(format!(
r#"{{"rtt_ms":{},"server_fingerprint":"{}"}}"#,
rtt_ms, server_fp
))
}
_ => None,
}
})
match h.engine.ping_relay(&relay) {
Ok(json) => Some(json),
Err(_) => None,
}
}));
let json = match result {
@@ -393,3 +362,150 @@ pub unsafe extern "system" fn Java_com_wzp_engine_WzpEngine_nativePingRelay<'a>(
.map(|s| s.into_raw())
.unwrap_or(JObject::null().into_raw())
}
/// Get the identity fingerprint for a seed hex string.
/// Returns the full fingerprint (xxxx:xxxx:...) or empty string on error.
#[unsafe(no_mangle)]
pub unsafe extern "system" fn Java_com_wzp_engine_WzpEngine_nativeGetFingerprint<'a>(
mut env: JNIEnv<'a>,
_class: JClass,
seed_hex_j: JString,
) -> jstring {
let seed_hex: String = env.get_string(&seed_hex_j).map(|s| s.into()).unwrap_or_default();
let fp = if seed_hex.is_empty() {
String::new()
} else {
match wzp_crypto::Seed::from_hex(&seed_hex) {
Ok(seed) => {
let id = seed.derive_identity();
id.public_identity().fingerprint.to_string()
}
Err(_) => String::new(),
}
};
env.new_string(&fp)
.map(|s| s.into_raw())
.unwrap_or(JObject::null().into_raw())
}
// ── Direct calling JNI functions ──
// ── SignalManager JNI functions ──
/// Opaque handle for SignalManager (separate from EngineHandle).
struct SignalHandle {
mgr: crate::signal_mgr::SignalManager,
}
unsafe fn signal_ref(handle: jlong) -> &'static SignalHandle {
unsafe { &*(handle as *const SignalHandle) }
}
/// Connect to relay for signaling. Returns handle (jlong) or 0 on error.
/// Blocks up to 10s waiting for the internal signal thread to connect.
#[unsafe(no_mangle)]
pub unsafe extern "system" fn Java_com_wzp_engine_SignalManager_nativeSignalConnect<'a>(
mut env: JNIEnv<'a>,
_class: JClass,
relay_j: JString,
seed_j: JString,
) -> jlong {
info!("nativeSignalConnect: entered");
let relay: String = env.get_string(&relay_j).map(|s| s.into()).unwrap_or_default();
let seed: String = env.get_string(&seed_j).map(|s| s.into()).unwrap_or_default();
info!(relay = %relay, seed_len = seed.len(), "nativeSignalConnect: parsed strings");
// start() spawns an internal thread (connect+register+recv, ONE runtime, never dropped).
// Blocks up to 10s waiting for the connect+register to complete.
match crate::signal_mgr::SignalManager::start(&relay, &seed) {
Ok(mgr) => {
let handle = Box::new(SignalHandle { mgr });
Box::into_raw(handle) as jlong
}
Err(e) => {
error!("signal connect failed: {e}");
0
}
}
}
/// Get signal state as JSON string.
#[unsafe(no_mangle)]
pub unsafe extern "system" fn Java_com_wzp_engine_SignalManager_nativeSignalGetState<'a>(
mut env: JNIEnv<'a>,
_class: JClass,
handle: jlong,
) -> jstring {
if handle == 0 { return JObject::null().into_raw(); }
let h = signal_ref(handle);
let json = h.mgr.get_state_json();
env.new_string(&json)
.map(|s| s.into_raw())
.unwrap_or(JObject::null().into_raw())
}
/// Place a direct call.
#[unsafe(no_mangle)]
pub unsafe extern "system" fn Java_com_wzp_engine_SignalManager_nativeSignalPlaceCall<'a>(
mut env: JNIEnv<'a>,
_class: JClass,
handle: jlong,
target_j: JString,
) -> jint {
if handle == 0 { return -1; }
let h = signal_ref(handle);
let target: String = env.get_string(&target_j).map(|s| s.into()).unwrap_or_default();
match h.mgr.place_call(&target) {
Ok(()) => 0,
Err(e) => { error!("place_call: {e}"); -1 }
}
}
/// Answer an incoming call.
#[unsafe(no_mangle)]
pub unsafe extern "system" fn Java_com_wzp_engine_SignalManager_nativeSignalAnswerCall<'a>(
mut env: JNIEnv<'a>,
_class: JClass,
handle: jlong,
call_id_j: JString,
mode: jint,
) -> jint {
if handle == 0 { return -1; }
let h = signal_ref(handle);
let call_id: String = env.get_string(&call_id_j).map(|s| s.into()).unwrap_or_default();
let accept_mode = match mode {
0 => wzp_proto::CallAcceptMode::Reject,
1 => wzp_proto::CallAcceptMode::AcceptTrusted,
_ => wzp_proto::CallAcceptMode::AcceptGeneric,
};
match h.mgr.answer_call(&call_id, accept_mode) {
Ok(()) => 0,
Err(e) => { error!("answer_call: {e}"); -1 }
}
}
/// Send hangup signal.
#[unsafe(no_mangle)]
pub unsafe extern "system" fn Java_com_wzp_engine_SignalManager_nativeSignalHangup(
_env: JNIEnv,
_class: JClass,
handle: jlong,
) {
if handle == 0 { return; }
let h = signal_ref(handle);
h.mgr.hangup();
}
/// Destroy the signal manager and free resources.
#[unsafe(no_mangle)]
pub unsafe extern "system" fn Java_com_wzp_engine_SignalManager_nativeSignalDestroy(
_env: JNIEnv,
_class: JClass,
handle: jlong,
) {
if handle == 0 { return; }
let h = signal_ref(handle);
h.mgr.stop();
// Reclaim the Box
let _ = unsafe { Box::from_raw(handle as *mut SignalHandle) };
}

View File

@@ -14,5 +14,6 @@ pub mod audio_ring;
pub mod commands;
pub mod engine;
pub mod pipeline;
pub mod signal_mgr;
pub mod stats;
pub mod jni_bridge;

View File

@@ -0,0 +1,288 @@
//! Persistent signal connection manager for direct 1:1 calls.
//!
//! Separate from the media engine — survives across calls.
//! Connects to relay via `_signal` SNI, registers presence,
//! and handles call signaling (offer/answer/setup/hangup).
use std::net::SocketAddr;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};
use tracing::{error, info, warn};
use wzp_proto::{MediaTransport, SignalMessage};
/// Signal connection status.
#[derive(Clone, Debug, Default, serde::Serialize)]
pub struct SignalState {
pub status: String, // "idle", "registered", "ringing", "incoming", "setup"
pub fingerprint: String,
#[serde(skip_serializing_if = "Option::is_none")]
pub incoming_call_id: Option<String>,
#[serde(skip_serializing_if = "Option::is_none")]
pub incoming_caller_fp: Option<String>,
#[serde(skip_serializing_if = "Option::is_none")]
pub incoming_caller_alias: Option<String>,
#[serde(skip_serializing_if = "Option::is_none")]
pub call_setup_relay: Option<String>,
#[serde(skip_serializing_if = "Option::is_none")]
pub call_setup_room: Option<String>,
#[serde(skip_serializing_if = "Option::is_none")]
pub call_setup_id: Option<String>,
}
/// Manages a persistent `_signal` QUIC connection to a relay.
pub struct SignalManager {
transport: Arc<wzp_transport::QuinnTransport>,
state: Arc<Mutex<SignalState>>,
running: Arc<AtomicBool>,
}
impl SignalManager {
/// Create SignalManager and start connect+register+recv on a background thread.
/// Returns immediately. The internal thread runs forever.
/// CRITICAL: tokio runtime must never be dropped on Android (libcrypto TLS conflict).
pub fn start(relay_addr: &str, seed_hex: &str) -> Result<Self, anyhow::Error> {
let addr: SocketAddr = relay_addr.parse()?;
let seed = if seed_hex.is_empty() {
wzp_crypto::Seed::generate()
} else {
wzp_crypto::Seed::from_hex(seed_hex).map_err(|e| anyhow::anyhow!(e))?
};
let identity = seed.derive_identity();
let pub_id = identity.public_identity();
let identity_pub = *pub_id.signing.as_bytes();
let fp = pub_id.fingerprint.to_string();
let state = Arc::new(Mutex::new(SignalState {
status: "connecting".into(),
fingerprint: fp.clone(),
..Default::default()
}));
let running = Arc::new(AtomicBool::new(true));
// Channel to receive transport after connect succeeds
let (transport_tx, transport_rx) = std::sync::mpsc::channel();
let bg_state = Arc::clone(&state);
let bg_running = Arc::clone(&running);
let ret_state = Arc::clone(&state);
let ret_running = Arc::clone(&running);
// ONE thread, ONE runtime, NEVER dropped.
// Connect + register + recv loop all happen here.
std::thread::Builder::new()
.name("wzp-signal".into())
.stack_size(4 * 1024 * 1024)
.spawn(move || {
let rt = tokio::runtime::Builder::new_current_thread()
.enable_all()
.build()
.expect("tokio runtime");
rt.block_on(async move {
info!(fingerprint = %fp, relay = %addr, "signal: connecting");
let bind: SocketAddr = "0.0.0.0:0".parse().unwrap();
let endpoint = match wzp_transport::create_endpoint(bind, None) {
Ok(e) => e,
Err(e) => {
error!("signal endpoint: {e}");
bg_state.lock().unwrap().status = "idle".into();
return;
}
};
let client_cfg = wzp_transport::client_config();
let conn = match wzp_transport::connect(&endpoint, addr, "_signal", client_cfg).await {
Ok(c) => c,
Err(e) => {
error!("signal connect: {e}");
bg_state.lock().unwrap().status = "idle".into();
return;
}
};
let transport = Arc::new(wzp_transport::QuinnTransport::new(conn));
// Register
if let Err(e) = transport.send_signal(&SignalMessage::RegisterPresence {
identity_pub, signature: vec![], alias: None,
}).await {
error!("signal register: {e}");
bg_state.lock().unwrap().status = "idle".into();
return;
}
match transport.recv_signal().await {
Ok(Some(SignalMessage::RegisterPresenceAck { success: true, .. })) => {
info!(fingerprint = %fp, "signal: registered");
bg_state.lock().unwrap().status = "registered".into();
// Send transport to caller
let _ = transport_tx.send(transport.clone());
}
other => {
error!("signal registration failed: {other:?}");
bg_state.lock().unwrap().status = "idle".into();
return;
}
}
// Recv loop — runs forever
loop {
if !running.load(Ordering::Relaxed) { break; }
match transport.recv_signal().await {
Ok(Some(SignalMessage::CallRinging { call_id })) => {
info!(call_id = %call_id, "signal: ringing");
let mut s = state.lock().unwrap();
s.status = "ringing".into();
}
Ok(Some(SignalMessage::DirectCallOffer { caller_fingerprint, caller_alias, call_id, .. })) => {
info!(from = %caller_fingerprint, call_id = %call_id, "signal: incoming call");
let mut s = state.lock().unwrap();
s.status = "incoming".into();
s.incoming_call_id = Some(call_id);
s.incoming_caller_fp = Some(caller_fingerprint);
s.incoming_caller_alias = caller_alias;
}
Ok(Some(SignalMessage::DirectCallAnswer { call_id, accept_mode, .. })) => {
info!(call_id = %call_id, mode = ?accept_mode, "signal: call answered");
}
Ok(Some(SignalMessage::CallSetup { call_id, room, relay_addr })) => {
info!(call_id = %call_id, room = %room, relay = %relay_addr, "signal: call setup");
let mut s = state.lock().unwrap();
s.status = "setup".into();
s.call_setup_relay = Some(relay_addr);
s.call_setup_room = Some(room);
s.call_setup_id = Some(call_id);
}
Ok(Some(SignalMessage::Hangup { reason })) => {
info!(reason = ?reason, "signal: hangup");
let mut s = state.lock().unwrap();
s.status = "registered".into();
s.incoming_call_id = None;
s.incoming_caller_fp = None;
s.incoming_caller_alias = None;
s.call_setup_relay = None;
s.call_setup_room = None;
s.call_setup_id = None;
}
Ok(Some(_)) => {}
Ok(None) => {
info!("signal: connection closed");
break;
}
Err(e) => {
error!("signal recv error: {e}");
break;
}
}
}
bg_state.lock().unwrap().status = "idle".into();
}); // block_on
// Runtime intentionally NOT dropped — lives until thread exits.
// This prevents ring/libcrypto TLS cleanup conflict on Android.
// The thread is parked here forever (block_on returned = connection lost).
std::thread::park();
})?; // thread spawn
// Wait for transport (up to 10s)
let transport = transport_rx.recv_timeout(std::time::Duration::from_secs(10))
.map_err(|_| anyhow::anyhow!("signal connect timeout — check relay address"))?;
Ok(Self { transport, state: ret_state, running: ret_running })
}
/// Get current state (non-blocking).
pub fn get_state(&self) -> SignalState {
self.state.lock().unwrap().clone()
}
/// Get state as JSON string.
pub fn get_state_json(&self) -> String {
serde_json::to_string(&self.get_state()).unwrap_or_else(|_| "{}".into())
}
/// Place a direct call.
pub fn place_call(&self, target_fp: &str) -> Result<(), anyhow::Error> {
let fp = self.state.lock().unwrap().fingerprint.clone();
let target = target_fp.to_string();
let call_id = format!("{:016x}", std::time::SystemTime::now()
.duration_since(std::time::UNIX_EPOCH).unwrap().as_nanos());
let transport = self.transport.clone();
// Send on a small thread (async send needs a runtime)
std::thread::Builder::new()
.name("wzp-call-send".into())
.spawn(move || {
let rt = tokio::runtime::Builder::new_current_thread()
.enable_all().build().expect("rt");
rt.block_on(async {
let _ = transport.send_signal(&SignalMessage::DirectCallOffer {
caller_fingerprint: fp,
caller_alias: None,
target_fingerprint: target,
call_id,
identity_pub: [0u8; 32],
ephemeral_pub: [0u8; 32],
signature: vec![],
supported_profiles: vec![wzp_proto::QualityProfile::GOOD],
}).await;
});
})?;
Ok(())
}
/// Answer an incoming call.
pub fn answer_call(&self, call_id: &str, mode: wzp_proto::CallAcceptMode) -> Result<(), anyhow::Error> {
let call_id = call_id.to_string();
let transport = self.transport.clone();
std::thread::Builder::new()
.name("wzp-answer-send".into())
.spawn(move || {
let rt = tokio::runtime::Builder::new_current_thread()
.enable_all().build().expect("rt");
rt.block_on(async {
let _ = transport.send_signal(&SignalMessage::DirectCallAnswer {
call_id,
accept_mode: mode,
identity_pub: None,
ephemeral_pub: None,
signature: None,
chosen_profile: Some(wzp_proto::QualityProfile::GOOD),
}).await;
});
})?;
Ok(())
}
/// Send hangup.
pub fn hangup(&self) {
let transport = self.transport.clone();
let state = self.state.clone();
std::thread::spawn(move || {
let rt = tokio::runtime::Builder::new_current_thread()
.enable_all().build().expect("rt");
rt.block_on(async {
let _ = transport.send_signal(&SignalMessage::Hangup {
reason: wzp_proto::HangupReason::Normal,
}).await;
});
let mut s = state.lock().unwrap();
s.status = "registered".into();
s.incoming_call_id = None;
s.incoming_caller_fp = None;
s.incoming_caller_alias = None;
s.call_setup_relay = None;
s.call_setup_room = None;
s.call_setup_id = None;
});
}
/// Stop the signal connection.
pub fn stop(&self) {
self.running.store(false, Ordering::Release);
self.transport.connection().close(0u32.into(), b"shutdown");
}
}

View File

@@ -11,6 +11,12 @@ pub enum CallState {
Active,
Reconnecting,
Closed,
/// Connected to relay signal channel, registered for direct calls.
Registered,
/// Outgoing call ringing on callee's side.
Ringing,
/// Incoming call received, waiting for user to accept/reject.
IncomingCall,
}
impl serde::Serialize for CallState {
@@ -21,6 +27,9 @@ impl serde::Serialize for CallState {
CallState::Active => 2,
CallState::Reconnecting => 3,
CallState::Closed => 4,
CallState::Registered => 5,
CallState::Ringing => 6,
CallState::IncomingCall => 7,
};
serializer.serialize_u8(n)
}
@@ -49,8 +58,16 @@ pub struct CallStats {
pub frames_decoded: u64,
/// Number of playout underruns (buffer empty when audio needed).
pub underruns: u64,
/// Frames recovered by FEC.
/// Frames recovered by RaptorQ FEC (Codec2 tiers only; Opus bypasses
/// RaptorQ per Phase 2).
pub fec_recovered: u64,
/// Phase 3c: Opus frames reconstructed via DRED side-channel data.
/// Only increments on the Opus tiers; always zero for Codec2.
pub dred_reconstructions: u64,
/// Phase 3c: Opus frames filled via classical Opus PLC because no DRED
/// state covered the gap, plus any decode-error fallbacks. Codec2 loss
/// also increments this counter via the Codec2 PLC path.
pub classical_plc_invocations: u64,
/// Playout ring overflow count (reader was lapped by writer).
pub playout_overflows: u64,
/// Playout ring underrun count (reader found empty buffer).
@@ -59,10 +76,28 @@ pub struct CallStats {
pub capture_overflows: u64,
/// Current mic audio level (RMS of i16 samples, 0-32767).
pub audio_level: u32,
/// Our current outgoing codec name (e.g. "Opus24k", "Codec2_1200").
pub current_codec: String,
/// Last seen incoming codec from other participants.
pub peer_codec: String,
/// Whether auto quality mode is active.
pub auto_mode: bool,
/// Number of participants in the room (from last RoomUpdate).
pub room_participant_count: u32,
/// Participant list (fingerprint + optional alias) serialized as JSON array.
pub room_participants: Vec<RoomMember>,
/// SAS code for verbal verification (None if not in a call).
#[serde(skip_serializing_if = "Option::is_none")]
pub sas_code: Option<u32>,
/// Incoming call info (present when state == IncomingCall).
#[serde(skip_serializing_if = "Option::is_none")]
pub incoming_call_id: Option<String>,
/// Fingerprint of the caller (present when state == IncomingCall).
#[serde(skip_serializing_if = "Option::is_none")]
pub incoming_caller_fp: Option<String>,
/// Alias of the caller (present when state == IncomingCall).
#[serde(skip_serializing_if = "Option::is_none")]
pub incoming_caller_alias: Option<String>,
}
/// A room member entry, serialized into the stats JSON.
@@ -70,4 +105,5 @@ pub struct CallStats {
pub struct RoomMember {
pub fingerprint: String,
pub alias: Option<String>,
pub relay_label: Option<String>,
}

View File

@@ -7,14 +7,15 @@ use std::time::{Duration, Instant};
use bytes::Bytes;
use tracing::{debug, info, warn};
use wzp_codec::{AutoGainControl, ComfortNoise, EchoCanceller, NoiseSupressor, SilenceDetector};
use wzp_codec::dred_ffi::{DredDecoderHandle, DredState};
use wzp_codec::{
AdaptiveDecoder, AutoGainControl, ComfortNoise, EchoCanceller, NoiseSupressor, SilenceDetector,
};
use wzp_fec::{RaptorQFecDecoder, RaptorQFecEncoder};
use wzp_proto::jitter::{JitterBuffer, PlayoutResult};
use wzp_proto::packet::{MediaHeader, MediaPacket, MiniFrameContext};
use wzp_proto::quality::AdaptiveQualityController;
use wzp_proto::traits::{
AudioDecoder, AudioEncoder, FecDecoder, FecEncoder,
};
use wzp_proto::traits::{AudioDecoder, AudioEncoder, FecDecoder, FecEncoder};
use wzp_proto::packet::QualityReport;
use wzp_proto::{CodecId, QualityProfile};
@@ -340,6 +341,22 @@ impl CallEncoder {
let enc_len = self.audio_enc.encode(pcm, &mut encoded)?;
encoded.truncate(enc_len);
// Phase 2: Opus tiers bypass RaptorQ entirely (DRED handles loss
// recovery at the codec layer). Codec2 tiers keep RaptorQ unchanged.
// On Opus packets, zero the FEC header fields so old receivers
// can cleanly identify "no RaptorQ block to assemble" and new
// receivers can short-circuit their FEC ingest path.
let is_opus = self.profile.codec.is_opus();
let (fec_block, fec_symbol, fec_ratio_encoded) = if is_opus {
(0u8, 0u8, 0u8)
} else {
(
self.block_id,
self.frame_in_block,
MediaHeader::encode_fec_ratio(self.profile.fec_ratio),
)
};
// Build source media packet
let source_pkt = MediaPacket {
header: MediaHeader {
@@ -347,11 +364,11 @@ impl CallEncoder {
is_repair: false,
codec_id: self.profile.codec,
has_quality_report: false,
fec_ratio_encoded: MediaHeader::encode_fec_ratio(self.profile.fec_ratio),
fec_ratio_encoded,
seq: self.seq,
timestamp: self.timestamp_ms,
fec_block: self.block_id,
fec_symbol: self.frame_in_block,
fec_block,
fec_symbol,
reserved: 0,
csrc_count: 0,
},
@@ -366,39 +383,42 @@ impl CallEncoder {
let mut output = vec![source_pkt];
// Add to FEC encoder
self.fec_enc.add_source_symbol(&encoded)?;
self.frame_in_block += 1;
// Codec2-only: feed RaptorQ and generate repair packets when the
// block is full. Opus tiers skip this entire block — DRED (active
// in Phase 1) provides codec-layer loss recovery.
if !is_opus {
self.fec_enc.add_source_symbol(&encoded)?;
self.frame_in_block += 1;
// If block is full, generate repair and finalize
if self.frame_in_block >= self.profile.frames_per_block {
if let Ok(repairs) = self.fec_enc.generate_repair(self.profile.fec_ratio) {
for (sym_idx, repair_data) in repairs {
output.push(MediaPacket {
header: MediaHeader {
version: 0,
is_repair: true,
codec_id: self.profile.codec,
has_quality_report: false,
fec_ratio_encoded: MediaHeader::encode_fec_ratio(
self.profile.fec_ratio,
),
seq: self.seq,
timestamp: self.timestamp_ms,
fec_block: self.block_id,
fec_symbol: sym_idx,
reserved: 0,
csrc_count: 0,
},
payload: Bytes::from(repair_data),
quality_report: None,
});
self.seq = self.seq.wrapping_add(1);
if self.frame_in_block >= self.profile.frames_per_block {
if let Ok(repairs) = self.fec_enc.generate_repair(self.profile.fec_ratio) {
for (sym_idx, repair_data) in repairs {
output.push(MediaPacket {
header: MediaHeader {
version: 0,
is_repair: true,
codec_id: self.profile.codec,
has_quality_report: false,
fec_ratio_encoded: MediaHeader::encode_fec_ratio(
self.profile.fec_ratio,
),
seq: self.seq,
timestamp: self.timestamp_ms,
fec_block: self.block_id,
fec_symbol: sym_idx,
reserved: 0,
csrc_count: 0,
},
payload: Bytes::from(repair_data),
quality_report: None,
});
self.seq = self.seq.wrapping_add(1);
}
}
let _ = self.fec_enc.finalize_block();
self.block_id = self.block_id.wrapping_add(1);
self.frame_in_block = 0;
}
let _ = self.fec_enc.finalize_block();
self.block_id = self.block_id.wrapping_add(1);
self.frame_in_block = 0;
}
Ok(output)
@@ -434,9 +454,12 @@ impl CallEncoder {
/// Manages the recv/decode side of a call.
pub struct CallDecoder {
/// Audio decoder.
audio_dec: Box<dyn AudioDecoder>,
/// FEC decoder.
/// Audio decoder. Concrete `AdaptiveDecoder` (not `Box<dyn AudioDecoder>`)
/// because Phase 3b calls the inherent `reconstruct_from_dred` method,
/// which cannot live on the `AudioDecoder` trait without dragging libopus
/// types into `wzp-proto`.
audio_dec: AdaptiveDecoder,
/// FEC decoder (Codec2 tiers only; Opus bypasses RaptorQ per Phase 2).
fec_dec: RaptorQFecDecoder,
/// Jitter buffer.
jitter: JitterBuffer,
@@ -450,6 +473,24 @@ pub struct CallDecoder {
last_was_cn: bool,
/// Mini-frame decompression context (tracks last full header baseline).
mini_context: MiniFrameContext,
// ─── Phase 3b: DRED reconstruction state ──────────────────────────────
/// DRED side-channel parser (a separate libopus object from the decoder).
dred_decoder: DredDecoderHandle,
/// Scratch buffer used by `dred_decoder.parse_into` on every arriving
/// Opus packet. Reused across calls to avoid 10 KB alloc churn per packet.
dred_parse_scratch: DredState,
/// Cached "most recently parsed valid" DRED state, swapped with
/// `dred_parse_scratch` on successful parse. Used by `decode_next` when
/// the jitter buffer reports a gap.
last_good_dred: DredState,
/// Sequence number of the packet that produced `last_good_dred`. `None`
/// if no packet has yielded DRED state yet (cold start or legacy sender).
last_good_dred_seq: Option<u16>,
/// Phase 4 telemetry counter: gaps recovered via DRED reconstruction.
pub dred_reconstructions: u64,
/// Phase 4 telemetry counter: gaps filled via classical Opus PLC
/// (because no DRED state covered the gap, or the active codec is Codec2).
pub classical_plc_invocations: u64,
}
impl CallDecoder {
@@ -459,8 +500,19 @@ impl CallDecoder {
} else {
JitterBuffer::new(config.jitter_target, config.jitter_max, config.jitter_min)
};
// Phase 3b: build the DRED parser + state buffers. These allocate
// libopus state (~10 KB each) once per call, not per packet — the
// scratch and last-good buffers are reused via std::mem::swap on
// every successful parse.
let dred_decoder =
DredDecoderHandle::new().expect("opus_dred_decoder_create failed at call setup");
let dred_parse_scratch =
DredState::new().expect("opus_dred_alloc failed at call setup (scratch)");
let last_good_dred =
DredState::new().expect("opus_dred_alloc failed at call setup (good state)");
Self {
audio_dec: wzp_codec::create_decoder(config.profile),
audio_dec: AdaptiveDecoder::new(config.profile)
.expect("failed to create adaptive decoder"),
fec_dec: wzp_fec::create_decoder(&config.profile),
jitter,
quality: AdaptiveQualityController::new(),
@@ -468,6 +520,12 @@ impl CallDecoder {
comfort_noise: ComfortNoise::new(50),
last_was_cn: false,
mini_context: MiniFrameContext::default(),
dred_decoder,
dred_parse_scratch,
last_good_dred,
last_good_dred_seq: None,
dred_reconstructions: 0,
classical_plc_invocations: 0,
}
}
@@ -482,15 +540,54 @@ impl CallDecoder {
/// Feed a received media packet into the decode pipeline.
pub fn ingest(&mut self, packet: MediaPacket) {
// Feed to FEC decoder
let _ = self.fec_dec.add_symbol(
packet.header.fec_block,
packet.header.fec_symbol,
packet.header.is_repair,
&packet.payload,
);
// Phase 2: Opus packets bypass RaptorQ. Codec2 packets still feed
// the FEC decoder for recovery. This also cleanly drops any stray
// Opus repair packets from an old sender (we don't push repair
// packets to the jitter buffer either, so they're effectively
// ignored — a graceful mixed-version degradation).
if !packet.header.codec_id.is_opus() {
let _ = self.fec_dec.add_symbol(
packet.header.fec_block,
packet.header.fec_symbol,
packet.header.is_repair,
&packet.payload,
);
}
// If not a repair packet, also feed directly to jitter buffer
// Phase 3b: Opus source packets carry DRED side-channel data in
// libopus 1.5. Parse it into the scratch state and, on success,
// swap with the cached `last_good_dred` so later gap reconstruction
// has fresh neural redundancy to draw from. Parsing happens before
// the jitter push because the jitter buffer consumes the packet.
if packet.header.codec_id.is_opus() && !packet.header.is_repair {
match self
.dred_decoder
.parse_into(&mut self.dred_parse_scratch, &packet.payload)
{
Ok(available) if available > 0 => {
// Swap the freshly parsed state into `last_good_dred`.
// The old good state (now in scratch) is about to be
// overwritten on the next parse — its contents are
// not needed after this swap.
std::mem::swap(&mut self.dred_parse_scratch, &mut self.last_good_dred);
self.last_good_dred_seq = Some(packet.header.seq);
}
Ok(_) => {
// Packet had no DRED data (return 0). Leave the cached
// state untouched — it may still cover upcoming gaps
// from a warm-up period where the encoder was producing
// DRED bytes. The scratch buffer was potentially written
// but its `samples_available` is 0 so it's harmless.
}
Err(e) => {
debug!("DRED parse error (ignored): {e}");
}
}
}
// Source packets (Opus or Codec2) go to the jitter buffer for decode.
// Repair packets never reach the jitter buffer; for Codec2 they're
// used by the FEC decoder above, for Opus they're dropped here.
if !packet.header.is_repair {
self.jitter.push(packet);
}
@@ -524,19 +621,72 @@ impl CallDecoder {
result
}
PlayoutResult::Missing { seq } => {
// Only generate PLC if there are still packets buffered ahead.
// Only attempt recovery if there are still packets buffered ahead.
// Otherwise we've drained everything — return None to stop.
if self.jitter.depth() > 0 {
debug!(seq, "packet loss, generating PLC");
let result = self.audio_dec.decode_lost(pcm).ok();
if result.is_some() {
self.jitter.record_decode();
}
result
} else {
if self.jitter.depth() == 0 {
self.jitter.record_underrun();
None
return None;
}
// Phase 3b: try DRED reconstruction first. If we have a
// recent DRED state from a packet whose seq > missing seq,
// and the seq delta (in samples) fits within the state's
// available window, libopus can synthesize a plausible
// replacement for the lost frame. Fall back to classical
// PLC when no state covers the gap, when the active codec
// is Codec2, or when the reconstruction itself errors.
if self.profile.codec.is_opus() {
if let Some(last_seq) = self.last_good_dred_seq {
// How many frames ahead of the missing seq is the
// last-good packet? Use wrapping arithmetic for the
// u16 seq space.
let seq_delta = last_seq.wrapping_sub(seq);
// Reject stale or backward state. u16 wraparound
// would make a "seq went backward" delta very large;
// cap at a sane forward-looking window.
const MAX_SEQ_DELTA: u16 = 128;
if seq_delta > 0 && seq_delta <= MAX_SEQ_DELTA {
let frame_samples =
(48_000 * self.profile.frame_duration_ms as i32) / 1000;
let offset_samples = seq_delta as i32 * frame_samples;
let available = self.last_good_dred.samples_available();
if offset_samples > 0 && offset_samples <= available {
match self.audio_dec.reconstruct_from_dred(
&self.last_good_dred,
offset_samples,
pcm,
) {
Ok(n) => {
self.dred_reconstructions += 1;
self.jitter.record_decode();
debug!(
seq,
last_seq,
offset_samples,
available,
"DRED reconstruction for gap"
);
return Some(n);
}
Err(e) => {
// Reconstruction failed — fall
// through to classical PLC below.
debug!(seq, "DRED reconstruct error: {e}");
}
}
}
}
}
}
// Classical PLC fallback (also the Codec2 path).
debug!(seq, "packet loss, generating classical PLC");
self.classical_plc_invocations += 1;
let result = self.audio_dec.decode_lost(pcm).ok();
if result.is_some() {
self.jitter.record_decode();
}
result
}
PlayoutResult::NotReady => {
self.jitter.record_underrun();
@@ -559,6 +709,19 @@ impl CallDecoder {
pub fn reset_stats(&mut self) {
self.jitter.reset_stats();
}
/// Phase 3b introspection: sequence number of the most recently parsed
/// valid DRED state, or `None` if no Opus packet has yielded DRED data
/// yet. Used by tests to debug reconstruction eligibility.
pub fn last_good_dred_seq(&self) -> Option<u16> {
self.last_good_dred_seq
}
/// Phase 3b introspection: samples of audio history currently available
/// in the cached DRED state.
pub fn last_good_dred_samples_available(&self) -> i32 {
self.last_good_dred.samples_available()
}
}
/// Periodic telemetry logger for jitter buffer statistics.
@@ -620,18 +783,83 @@ mod tests {
assert!(!packets[0].header.is_repair);
}
/// Phase 2: Opus packets have zero FEC header fields — no block, no
/// symbol index, no repair ratio. The RaptorQ layer is bypassed
/// entirely on the Opus tiers.
#[test]
fn encoder_generates_repair_on_full_block() {
fn opus_source_packets_have_zero_fec_header_fields() {
let config = CallConfig {
profile: QualityProfile::GOOD, // 5 frames/block
profile: QualityProfile::GOOD, // Opus 24k
suppression_enabled: false, // skip silence gate for this test
..Default::default()
};
let mut enc = CallEncoder::new(&config);
let pcm = vec![0i16; 960];
// Non-silent sine wave so silence detection doesn't suppress us
// even with suppression_enabled=false (belt and braces).
let pcm: Vec<i16> = (0..960)
.map(|i| ((i as f32 * 0.1).sin() * 10_000.0) as i16)
.collect();
let packets = enc.encode_frame(&pcm).unwrap();
assert_eq!(packets.len(), 1, "Opus must emit exactly 1 source packet");
let hdr = &packets[0].header;
assert!(hdr.codec_id.is_opus());
assert!(!hdr.is_repair);
assert_eq!(hdr.fec_block, 0, "Opus fec_block must be 0");
assert_eq!(hdr.fec_symbol, 0, "Opus fec_symbol must be 0");
assert_eq!(hdr.fec_ratio_encoded, 0, "Opus fec_ratio_encoded must be 0");
}
let mut total_packets = 0;
let mut repair_count = 0;
for _ in 0..5 {
/// Phase 2: Opus never emits repair packets, regardless of how many
/// source frames are fed in. DRED (Phase 1) provides loss recovery at
/// the codec layer; RaptorQ is disabled on Opus tiers.
#[test]
fn opus_encoder_never_emits_repair_packets() {
let config = CallConfig {
profile: QualityProfile::GOOD, // 5 frames/block in the Codec2 sense
suppression_enabled: false,
..Default::default()
};
let mut enc = CallEncoder::new(&config);
let pcm: Vec<i16> = (0..960)
.map(|i| ((i as f32 * 0.1).sin() * 10_000.0) as i16)
.collect();
// Encode well beyond a block boundary to prove no repair ever comes out.
let mut total_packets = 0usize;
let mut repair_count = 0usize;
for _ in 0..20 {
let packets = enc.encode_frame(&pcm).unwrap();
total_packets += packets.len();
repair_count += packets.iter().filter(|p| p.header.is_repair).count();
}
assert_eq!(repair_count, 0, "Opus must emit zero repair packets");
assert_eq!(
total_packets, 20,
"20 source frames → 20 source packets (1:1, no RaptorQ expansion)"
);
}
/// Phase 2: Codec2 still emits repair packets with RaptorQ ratio unchanged.
/// DRED is libopus-only and does not apply here, so RaptorQ is still the
/// primary loss-recovery mechanism on Codec2 tiers.
#[test]
fn codec2_encoder_generates_repair_on_full_block() {
let config = CallConfig {
profile: QualityProfile::CATASTROPHIC, // Codec2 1200, 8 frames/block, ratio 1.0
suppression_enabled: false,
..Default::default()
};
let mut enc = CallEncoder::new(&config);
// Codec2 takes 48 kHz samples and downsamples internally.
// CATASTROPHIC uses 40 ms frames → 1920 samples.
let pcm: Vec<i16> = (0..1920)
.map(|i| ((i as f32 * 0.1).sin() * 10_000.0) as i16)
.collect();
let mut total_packets = 0usize;
let mut repair_count = 0usize;
// Run long enough to cross the 8-frame block boundary and see repairs.
for _ in 0..16 {
let packets = enc.encode_frame(&pcm).unwrap();
for p in &packets {
if p.header.is_repair {
@@ -640,8 +868,10 @@ mod tests {
}
total_packets += packets.len();
}
assert!(repair_count > 0, "should have repair packets after full block");
assert!(total_packets > 5, "total {total_packets} should exceed 5 source");
assert!(
repair_count > 0,
"Codec2 must still emit repair packets (got {repair_count} repairs, {total_packets} total)"
);
}
#[test]
@@ -672,6 +902,219 @@ mod tests {
assert!(dec.decode_next(&mut pcm).is_none());
}
// ─── Phase 3b — DRED reconstruction on packet loss ────────────────────
/// Helper: create a CallEncoder/CallDecoder pair with the given profile
/// and silence suppression disabled so silence-detection doesn't drop
/// our synthetic test frames.
fn encoder_decoder_pair(profile: QualityProfile) -> (CallEncoder, CallDecoder) {
let config = CallConfig {
profile,
suppression_enabled: false,
// Small jitter buffer so decode_next drains quickly in tests.
jitter_min: 2,
jitter_target: 3,
jitter_max: 20,
adaptive_jitter: false,
..Default::default()
};
(CallEncoder::new(&config), CallDecoder::new(&config))
}
/// Helper: generate a non-silent 20 ms frame of 300 Hz sine at the
/// given sample offset so consecutive frames form a continuous tone.
fn voice_frame_20ms(sample_offset: usize) -> Vec<i16> {
(0..960)
.map(|i| {
let t = (sample_offset + i) as f64 / 48_000.0;
(8000.0 * (2.0 * std::f64::consts::PI * 300.0 * t).sin()) as i16
})
.collect()
}
/// Phase 3b probe: sweep packet_loss_perc values to find the minimum
/// that produces a samples_available ≥ 960 (enough to reconstruct a
/// single 20 ms Opus frame). This guides the production loss floor.
#[test]
#[ignore] // diagnostic only — run with `cargo test ... -- --ignored --nocapture`
fn probe_dred_samples_available_by_loss_floor() {
use wzp_codec::opus_enc::OpusEncoder;
use wzp_proto::traits::AudioEncoder;
for loss_pct in [5u8, 10, 15, 20, 25, 40, 60, 80].iter().copied() {
let mut enc = OpusEncoder::new(QualityProfile::GOOD).unwrap();
enc.set_expected_loss(loss_pct);
let (_drop_enc, mut dec) = encoder_decoder_pair(QualityProfile::GOOD);
for i in 0..60u16 {
let pcm = voice_frame_20ms(i as usize * 960);
let mut encoded = vec![0u8; 512];
let n = enc.encode(&pcm, &mut encoded).unwrap();
encoded.truncate(n);
let pkt = MediaPacket {
header: MediaHeader {
version: 0,
is_repair: false,
codec_id: CodecId::Opus24k,
has_quality_report: false,
fec_ratio_encoded: 0,
seq: i,
timestamp: (i as u32) * 20,
fec_block: 0,
fec_symbol: 0,
reserved: 0,
csrc_count: 0,
},
payload: Bytes::from(encoded),
quality_report: None,
};
dec.ingest(pkt);
}
eprintln!(
"[phase3b probe] loss_pct={loss_pct} samples_available={}",
dec.last_good_dred_samples_available()
);
}
}
/// Phase 3b: simulated single-packet loss on an Opus call triggers a
/// DRED reconstruction rather than a classical PLC fill. Runs the full
/// encode → ingest → decode_next pipeline.
#[test]
fn opus_single_packet_loss_is_recovered_via_dred() {
let (mut enc, mut dec) = encoder_decoder_pair(QualityProfile::GOOD);
// Warm-up: encode and ingest 60 frames (1.2 s) so the DRED emitter
// has had time to fill its 200 ms window and at least one
// successful DRED parse has happened on the decoder side.
let warmup_frames = 60;
for i in 0..warmup_frames {
let pcm = voice_frame_20ms(i * 960);
let packets = enc.encode_frame(&pcm).unwrap();
for pkt in packets {
dec.ingest(pkt);
}
}
// Drain the warm-up frames through the decoder to advance the
// jitter buffer cursor past them.
let mut out = vec![0i16; 960];
while dec.decode_next(&mut out).is_some() {}
// Encode the next three frames but skip ingesting the middle one.
let base_offset = warmup_frames * 960;
let pcm_a = voice_frame_20ms(base_offset);
let pcm_b = voice_frame_20ms(base_offset + 960);
let pcm_c = voice_frame_20ms(base_offset + 1920);
let pkts_a = enc.encode_frame(&pcm_a).unwrap();
let pkts_b = enc.encode_frame(&pcm_b).unwrap(); // DROP THIS ONE
let pkts_c = enc.encode_frame(&pcm_c).unwrap();
for pkt in pkts_a {
dec.ingest(pkt);
}
// Skip pkts_b entirely — this is the "packet loss".
drop(pkts_b);
for pkt in pkts_c {
dec.ingest(pkt);
}
// Drain again. Somewhere in here decode_next will hit Missing()
// for the dropped packet and attempt DRED reconstruction.
let baseline_dred = dec.dred_reconstructions;
let baseline_plc = dec.classical_plc_invocations;
eprintln!(
"[phase3b probe] pre-drain: last_good_seq={:?} samples_available={}",
dec.last_good_dred_seq(),
dec.last_good_dred_samples_available()
);
while dec.decode_next(&mut out).is_some() {}
let dred_delta = dec.dred_reconstructions - baseline_dred;
let plc_delta = dec.classical_plc_invocations - baseline_plc;
eprintln!(
"[phase3b probe] post-drain: dred_delta={dred_delta} plc_delta={plc_delta}"
);
assert!(
dred_delta >= 1,
"expected ≥1 DRED reconstruction on single-packet loss, \
got dred_delta={dred_delta} plc_delta={plc_delta}"
);
}
/// Phase 3b: lossless stream never triggers DRED reconstruction or PLC.
/// Baseline behavior — verifies the Missing() branch is not spuriously taken.
#[test]
fn opus_lossless_ingest_never_triggers_dred_or_plc() {
let (mut enc, mut dec) = encoder_decoder_pair(QualityProfile::GOOD);
// Encode + ingest 40 frames with no drops.
for i in 0..40 {
let pcm = voice_frame_20ms(i * 960);
let packets = enc.encode_frame(&pcm).unwrap();
for pkt in packets {
dec.ingest(pkt);
}
}
let mut out = vec![0i16; 960];
while dec.decode_next(&mut out).is_some() {}
assert_eq!(
dec.dred_reconstructions, 0,
"lossless stream should not reconstruct"
);
assert_eq!(
dec.classical_plc_invocations, 0,
"lossless stream should not PLC"
);
}
/// Phase 3b: Codec2 calls fall through to classical PLC on loss.
/// DRED is libopus-only, so even if the decoder's DRED state were
/// populated (it won't be — Codec2 packets don't carry DRED bytes),
/// `reconstruct_from_dred` rejects Codec2 at the AdaptiveDecoder
/// level. This test guards the Codec2 side of the protection split.
#[test]
fn codec2_loss_falls_through_to_classical_plc() {
let (mut enc, mut dec) = encoder_decoder_pair(QualityProfile::CATASTROPHIC);
// Codec2 1200 uses 40 ms frames → 1920 samples at 48 kHz (before
// the downsample inside the codec). Encode 20 frames (~0.8 s).
let make_frame = |offset: usize| -> Vec<i16> {
(0..1920)
.map(|i| {
let t = (offset + i) as f64 / 48_000.0;
(8000.0 * (2.0 * std::f64::consts::PI * 300.0 * t).sin()) as i16
})
.collect()
};
for i in 0..20 {
let pcm = make_frame(i * 1920);
let packets = enc.encode_frame(&pcm).unwrap();
for pkt in packets {
// Drop every 5th source packet to simulate loss.
if !pkt.header.is_repair && i % 5 == 3 {
continue;
}
dec.ingest(pkt);
}
}
let mut out = vec![0i16; 1920];
while dec.decode_next(&mut out).is_some() {}
assert_eq!(
dec.dred_reconstructions, 0,
"Codec2 must never reconstruct via DRED"
);
// classical_plc_invocations may or may not trigger depending on
// whether the jitter buffer sees Missing before draining — the key
// assertion is that DRED is not used. PLC count is advisory.
}
// ---- QualityAdapter tests ----
/// Helper: build a QualityReport from human-readable loss% and RTT ms.

View File

@@ -47,6 +47,11 @@ struct CliArgs {
room: Option<String>,
token: Option<String>,
_metrics_file: Option<String>,
version_check: bool,
/// Connect to relay for persistent signaling (direct calls).
signal: bool,
/// Place a direct call to a fingerprint (requires --signal).
call_target: Option<String>,
}
impl CliArgs {
@@ -88,12 +93,20 @@ fn parse_args() -> CliArgs {
let mut room = None;
let mut token = None;
let mut metrics_file = None;
let mut version_check = false;
let mut relay_str = None;
let mut signal = false;
let mut call_target = None;
let mut i = 1;
while i < args.len() {
match args[i].as_str() {
"--live" => live = true,
"--signal" => signal = true,
"--call" => {
i += 1;
call_target = Some(args.get(i).expect("--call requires a fingerprint").to_string());
}
"--send-tone" => {
i += 1;
send_tone_secs = Some(
@@ -169,6 +182,7 @@ fn parse_args() -> CliArgs {
);
}
"--sweep" => sweep = true,
"--version-check" => { version_check = true; }
"--help" | "-h" => {
eprintln!("Usage: wzp-client [options] [relay-addr]");
eprintln!();
@@ -221,6 +235,9 @@ fn parse_args() -> CliArgs {
room,
token,
_metrics_file: metrics_file,
version_check,
signal,
call_target,
}
}
@@ -239,6 +256,32 @@ async fn main() -> anyhow::Result<()> {
return Ok(());
}
// --version-check: query relay version over QUIC and exit
if cli.version_check {
let client_config = wzp_transport::client_config();
let bind_addr: SocketAddr = "0.0.0.0:0".parse()?;
let endpoint = wzp_transport::create_endpoint(bind_addr, None)?;
let conn = wzp_transport::connect(&endpoint, cli.relay_addr, "version", client_config).await?;
match conn.accept_uni().await {
Ok(mut recv) => {
let data = recv.read_to_end(256).await.unwrap_or_default();
let version = String::from_utf8_lossy(&data);
println!("{} {}", cli.relay_addr, version.trim());
}
Err(e) => {
eprintln!("relay {} does not support version query: {e}", cli.relay_addr);
}
}
endpoint.close(0u32.into(), b"done");
return Ok(());
}
// --signal mode: persistent signaling for direct calls
if cli.signal {
let seed = cli.resolve_seed();
return run_signal_mode(cli.relay_addr, seed, cli.token, cli.call_target).await;
}
let seed = cli.resolve_seed();
info!(
@@ -250,12 +293,11 @@ async fn main() -> anyhow::Result<()> {
"WarzonePhone client"
);
// Hash room name for SNI privacy (or "default" if none specified)
// Use raw room name as SNI (consistent with Android + Desktop clients for federation)
let sni = match &cli.room {
Some(name) => {
let hashed = wzp_crypto::hash_room_name(name);
info!(room = %name, hashed = %hashed, "room name hashed for SNI");
hashed
info!(room = %name, "using room name as SNI");
name.clone()
}
None => "default".to_string(),
};
@@ -274,6 +316,26 @@ async fn main() -> anyhow::Result<()> {
let transport = Arc::new(wzp_transport::QuinnTransport::new(connection));
// Register shutdown handler so SIGTERM/SIGINT always closes QUIC cleanly.
// Without this, killed clients leave zombie connections on the relay for ~30s.
{
let shutdown_transport = transport.clone();
tokio::spawn(async move {
let mut sigterm = tokio::signal::unix::signal(tokio::signal::unix::SignalKind::terminate())
.expect("failed to register SIGTERM handler");
let mut sigint = tokio::signal::unix::signal(tokio::signal::unix::SignalKind::interrupt())
.expect("failed to register SIGINT handler");
tokio::select! {
_ = sigterm.recv() => { info!("SIGTERM received, closing connection..."); }
_ = sigint.recv() => { info!("SIGINT received, closing connection..."); }
}
// Close the QUIC connection immediately (APPLICATION_CLOSE frame).
// Don't call process::exit — let the main task detect the closed
// connection and perform clean shutdown (e.g., save recordings).
shutdown_transport.connection().close(0u32.into(), b"shutdown");
});
}
// Send auth token if provided (relay with --auth-url expects this first)
if let Some(ref token) = cli.token {
let auth = wzp_proto::SignalMessage::AuthToken {
@@ -624,3 +686,195 @@ async fn run_live(transport: Arc<wzp_transport::QuinnTransport>) -> anyhow::Resu
info!("done");
Ok(())
}
/// Persistent signaling mode for direct 1:1 calls.
async fn run_signal_mode(
relay_addr: SocketAddr,
seed: wzp_crypto::Seed,
token: Option<String>,
call_target: Option<String>,
) -> anyhow::Result<()> {
use wzp_proto::SignalMessage;
let identity = seed.derive_identity();
let pub_id = identity.public_identity();
let fp = pub_id.fingerprint.to_string();
let identity_pub = *pub_id.signing.as_bytes();
info!(fingerprint = %fp, "signal mode");
// Connect to relay with SNI "_signal"
let client_config = wzp_transport::client_config();
let bind_addr: SocketAddr = if relay_addr.is_ipv6() {
"[::]:0".parse()?
} else {
"0.0.0.0:0".parse()?
};
let endpoint = wzp_transport::create_endpoint(bind_addr, None)?;
let conn = wzp_transport::connect(&endpoint, relay_addr, "_signal", client_config).await?;
let transport = Arc::new(wzp_transport::QuinnTransport::new(conn));
info!("connected to relay (signal channel)");
// Auth if token provided
if let Some(ref tok) = token {
transport.send_signal(&SignalMessage::AuthToken { token: tok.clone() }).await?;
}
// Register presence (signature not verified in Phase 1)
transport.send_signal(&SignalMessage::RegisterPresence {
identity_pub,
signature: vec![], // Phase 1: not verified
alias: None,
}).await?;
// Wait for ack
match transport.recv_signal().await? {
Some(SignalMessage::RegisterPresenceAck { success: true, .. }) => {
info!(fingerprint = %fp, "registered on relay — waiting for calls");
}
Some(SignalMessage::RegisterPresenceAck { success: false, error }) => {
anyhow::bail!("registration failed: {}", error.unwrap_or_default());
}
other => {
anyhow::bail!("unexpected response: {other:?}");
}
}
// If --call specified, place the call
if let Some(ref target) = call_target {
info!(target = %target, "placing direct call...");
let call_id = format!("{:016x}", std::time::SystemTime::now()
.duration_since(std::time::UNIX_EPOCH).unwrap().as_nanos());
transport.send_signal(&SignalMessage::DirectCallOffer {
caller_fingerprint: fp.clone(),
caller_alias: None,
target_fingerprint: target.clone(),
call_id: call_id.clone(),
identity_pub,
ephemeral_pub: [0u8; 32], // Phase 1: not used for key exchange
signature: vec![],
supported_profiles: vec![wzp_proto::QualityProfile::GOOD],
}).await?;
}
// Signal recv loop — handle incoming signals
let signal_transport = transport.clone();
let relay = relay_addr;
let my_fp = fp.clone();
let my_seed = seed.0;
loop {
match signal_transport.recv_signal().await {
Ok(Some(msg)) => match msg {
SignalMessage::CallRinging { call_id } => {
info!(call_id = %call_id, "ringing...");
}
SignalMessage::DirectCallOffer { caller_fingerprint, caller_alias, call_id, .. } => {
info!(
from = %caller_fingerprint,
alias = ?caller_alias,
call_id = %call_id,
"incoming call — auto-accepting (generic)"
);
// Auto-accept for CLI testing
let _ = signal_transport.send_signal(&SignalMessage::DirectCallAnswer {
call_id,
accept_mode: wzp_proto::CallAcceptMode::AcceptGeneric,
identity_pub: Some(identity_pub),
ephemeral_pub: None,
signature: None,
chosen_profile: Some(wzp_proto::QualityProfile::GOOD),
}).await;
}
SignalMessage::DirectCallAnswer { call_id, accept_mode, .. } => {
info!(call_id = %call_id, mode = ?accept_mode, "call answered");
}
SignalMessage::CallSetup { call_id, room, relay_addr: setup_relay } => {
info!(call_id = %call_id, room = %room, relay = %setup_relay, "call setup — connecting to media room");
// Connect to the media room
let media_relay: SocketAddr = setup_relay.parse().unwrap_or(relay);
let media_cfg = wzp_transport::client_config();
match wzp_transport::connect(&endpoint, media_relay, &room, media_cfg).await {
Ok(media_conn) => {
let media_transport = Arc::new(wzp_transport::QuinnTransport::new(media_conn));
// Crypto handshake
match wzp_client::handshake::perform_handshake(&*media_transport, &my_seed, None).await {
Ok(_session) => {
info!("media connected — sending tone (press Ctrl+C to hang up)");
// Simple tone sender for testing
let mt = media_transport.clone();
let send_task = tokio::spawn(async move {
let config = wzp_client::call::CallConfig::default();
let mut encoder = wzp_client::call::CallEncoder::new(&config);
let duration = tokio::time::Duration::from_millis(20);
loop {
let pcm: Vec<i16> = (0..FRAME_SAMPLES)
.map(|_| 0i16) // silence — could be tone
.collect();
if let Ok(pkts) = encoder.encode_frame(&pcm) {
for pkt in &pkts {
if mt.send_media(pkt).await.is_err() { return; }
}
}
tokio::time::sleep(duration).await;
}
});
// Wait for hangup or ctrl+c
loop {
tokio::select! {
sig = signal_transport.recv_signal() => {
match sig {
Ok(Some(SignalMessage::Hangup { .. })) => {
info!("remote hung up");
break;
}
Ok(None) | Err(_) => break,
_ => {}
}
}
_ = tokio::signal::ctrl_c() => {
info!("hanging up...");
let _ = signal_transport.send_signal(&SignalMessage::Hangup {
reason: wzp_proto::HangupReason::Normal,
}).await;
break;
}
}
}
send_task.abort();
media_transport.close().await.ok();
info!("call ended");
}
Err(e) => error!("media handshake failed: {e}"),
}
}
Err(e) => error!("media connect failed: {e}"),
}
}
SignalMessage::Hangup { reason } => {
info!(reason = ?reason, "call ended by remote");
}
SignalMessage::Pong { .. } => {}
other => {
info!("signal: {:?}", std::mem::discriminant(&other));
}
},
Ok(None) => {
info!("signal connection closed");
break;
}
Err(e) => {
error!("signal error: {e}");
break;
}
}
}
transport.close().await.ok();
Ok(())
}

View File

@@ -96,6 +96,7 @@ pub fn signal_to_call_type(signal: &SignalMessage) -> CallSignalType {
SignalMessage::Hangup { .. } => CallSignalType::Hangup,
SignalMessage::Rekey { .. } => CallSignalType::Offer, // reuse
SignalMessage::QualityUpdate { .. } => CallSignalType::Offer, // reuse
SignalMessage::LossRecoveryUpdate { .. } => CallSignalType::Offer, // reuse (telemetry)
SignalMessage::Ping { .. } | SignalMessage::Pong { .. } => CallSignalType::Offer,
SignalMessage::AuthToken { .. } => CallSignalType::Offer,
SignalMessage::Hold => CallSignalType::Hold,
@@ -110,6 +111,15 @@ pub fn signal_to_call_type(signal: &SignalMessage) -> CallSignalType {
SignalMessage::SessionForward { .. } => CallSignalType::Offer, // reuse
SignalMessage::SessionForwardAck { .. } => CallSignalType::Offer, // reuse
SignalMessage::RoomUpdate { .. } => CallSignalType::Offer, // reuse
SignalMessage::FederationHello { .. }
| SignalMessage::GlobalRoomActive { .. }
| SignalMessage::GlobalRoomInactive { .. } => CallSignalType::Offer, // relay-only
SignalMessage::DirectCallOffer { .. } => CallSignalType::Offer,
SignalMessage::DirectCallAnswer { .. } => CallSignalType::Answer,
SignalMessage::CallSetup { .. } => CallSignalType::Offer, // relay-only
SignalMessage::CallRinging { .. } => CallSignalType::Ringing,
SignalMessage::RegisterPresence { .. }
| SignalMessage::RegisterPresenceAck { .. } => CallSignalType::Offer, // relay-only
}
}

View File

@@ -38,6 +38,9 @@ pub async fn perform_handshake(
ephemeral_pub,
signature,
supported_profiles: vec![
QualityProfile::STUDIO_64K,
QualityProfile::STUDIO_48K,
QualityProfile::STUDIO_32K,
QualityProfile::GOOD,
QualityProfile::DEGRADED,
QualityProfile::CATASTROPHIC,

View File

@@ -10,8 +10,17 @@ description = "WarzonePhone audio codec layer — Opus + Codec2 encoding/decodin
wzp-proto = { workspace = true }
tracing = { workspace = true }
# Opus bindings
audiopus = { workspace = true }
# Opus bindings — libopus 1.5.2.
# opusic-c for the encoder (set_dred_duration lives here in Phase 1).
# opusic-sys for the decoder — we wrap the raw *mut OpusDecoder ourselves
# because opusic-c::Decoder.inner is pub(crate), blocking the unified
# decoder + DRED path we need in Phase 3.
opusic-c = { workspace = true }
opusic-sys = { workspace = true }
# Zero-cost slice reinterpretation for the i16 ↔ u16 boundary between
# our PCM buffers and opusic-c's encode API.
bytemuck = { workspace = true }
# Pure-Rust Codec2 implementation
codec2 = { workspace = true }

View File

@@ -199,6 +199,27 @@ impl AdaptiveDecoder {
fn codec2_frame_samples(&self) -> usize {
self.codec2.frame_samples()
}
/// Reconstruct a lost frame from a previously parsed DRED state.
///
/// Phase 3b entry point for gap reconstruction. Dispatches to the
/// inner Opus decoder when active. Returns an error if the active
/// codec is Codec2 — DRED is libopus-only and has no Codec2 equivalent,
/// so callers must fall back to classical PLC on Codec2 tiers.
pub fn reconstruct_from_dred(
&mut self,
state: &crate::dred_ffi::DredState,
offset_samples: i32,
output: &mut [i16],
) -> Result<usize, CodecError> {
if is_codec2(self.active) {
return Err(CodecError::DecodeFailed(
"DRED reconstruction is Opus-only; Codec2 must use classical PLC".into(),
));
}
self.opus
.reconstruct_from_dred(state, offset_samples, output)
}
}
// ─── Tests ───────────────────────────────────────────────────────────────────

View File

@@ -0,0 +1,585 @@
//! Raw opusic-sys FFI wrappers for libopus 1.5.2 decoder + DRED reconstruction.
//!
//! # Why this module exists
//!
//! We cannot use `opusic_c::Decoder` because its inner `*mut OpusDecoder`
//! pointer is `pub(crate)` — not reachable from outside the opusic-c crate.
//! Phase 3 of the DRED integration needs to hand that same pointer to
//! `opus_decoder_dred_decode`, and running two parallel decoders (one from
//! opusic-c for normal audio, another from opusic-sys for DRED) would cause
//! the DRED-only decoder's internal state to drift out of sync with the
//! audio stream because it would not see normal decode calls.
//!
//! The fix is to own the raw decoder ourselves and use the same handle for
//! both normal decode AND DRED reconstruction. This module is the single
//! owner of `*mut OpusDecoder`, `*mut OpusDREDDecoder`, and `*mut OpusDRED`
//! in the WZP workspace.
//!
//! # Phase 3a scope
//!
//! Phase 0 added `DecoderHandle` (normal decode). Phase 3a adds:
//! - [`DredDecoderHandle`] — wraps `*mut OpusDREDDecoder` for parsing DRED
//! side-channel data out of arriving Opus packets.
//! - [`DredState`] — wraps `*mut OpusDRED` (a fixed 10,592-byte buffer
//! allocated by libopus) that holds parsed DRED state between the parse
//! and reconstruct steps.
//! - [`DredDecoderHandle::parse_into`] — wraps `opus_dred_parse`.
//! - [`DecoderHandle::reconstruct_from_dred`] — wraps `opus_decoder_dred_decode`.
//!
//! The pattern is: on every arriving Opus packet, the receiver calls
//! `parse_into` with a reusable `DredState`, then stores (seq, state_clone)
//! in a ring. On detected loss, the receiver computes the offset from the
//! freshest reachable DRED state and calls `reconstruct_from_dred` to
//! synthesize the missing audio.
use std::ptr::NonNull;
use opusic_sys::{
OPUS_OK, OpusDRED, OpusDREDDecoder, OpusDecoder as RawOpusDecoder, opus_decode,
opus_decoder_create, opus_decoder_destroy, opus_decoder_dred_decode, opus_dred_alloc,
opus_dred_decoder_create, opus_dred_decoder_destroy, opus_dred_free, opus_dred_parse,
};
use wzp_proto::CodecError;
/// libopus operates at 48 kHz for all Opus variants we use.
const SAMPLE_RATE_HZ: i32 = 48_000;
/// Mono.
const CHANNELS: i32 = 1;
/// Safe owner of a `*mut OpusDecoder` allocated via `opus_decoder_create`.
///
/// Releases the decoder in `Drop`. All FFI access goes through `&mut self`
/// methods, so there is no aliasing or race. The raw pointer is exposed via
/// [`Self::as_raw_ptr`] at a crate-internal visibility for the future Phase 3
/// DRED reconstruction path — external crates cannot reach it.
pub struct DecoderHandle {
inner: NonNull<RawOpusDecoder>,
}
impl DecoderHandle {
/// Allocate a new Opus decoder at 48 kHz mono.
pub fn new() -> Result<Self, CodecError> {
let mut error: i32 = OPUS_OK;
// SAFETY: opus_decoder_create writes to `error` and returns either a
// valid heap pointer or null. We check both before constructing the
// NonNull wrapper.
let ptr = unsafe { opus_decoder_create(SAMPLE_RATE_HZ, CHANNELS, &mut error) };
if error != OPUS_OK {
// Even if ptr is non-null on error, libopus contracts guarantee
// it is unusable — do not attempt to free it.
return Err(CodecError::DecodeFailed(format!(
"opus_decoder_create failed: err={error}"
)));
}
let inner = NonNull::new(ptr).ok_or_else(|| {
CodecError::DecodeFailed("opus_decoder_create returned null".into())
})?;
Ok(Self { inner })
}
/// Decode an Opus packet into PCM samples.
///
/// `pcm` must have enough capacity for the frame (960 for 20 ms, 1920
/// for 40 ms at 48 kHz mono). Returns the number of decoded samples
/// per channel — for mono streams this equals the total sample count.
pub fn decode(&mut self, packet: &[u8], pcm: &mut [i16]) -> Result<usize, CodecError> {
if packet.is_empty() {
return Err(CodecError::DecodeFailed("empty packet".into()));
}
if pcm.is_empty() {
return Err(CodecError::DecodeFailed("empty output buffer".into()));
}
// SAFETY: self.inner is a valid *mut OpusDecoder owned by this struct.
// `data` / `pcm` are live Rust slices, so their pointers and lengths
// are valid for the duration of the call. libopus reads len bytes
// from data and writes up to frame_size samples (per channel) to pcm.
let n = unsafe {
opus_decode(
self.inner.as_ptr(),
packet.as_ptr(),
packet.len() as i32,
pcm.as_mut_ptr(),
pcm.len() as i32,
/* decode_fec = */ 0,
)
};
if n < 0 {
return Err(CodecError::DecodeFailed(format!(
"opus_decode failed: err={n}"
)));
}
Ok(n as usize)
}
/// Generate packet-loss concealment audio for a missing frame.
///
/// Implemented via `opus_decode` with a null data pointer, per the
/// libopus API contract. `pcm` should be sized for the expected frame.
pub fn decode_lost(&mut self, pcm: &mut [i16]) -> Result<usize, CodecError> {
if pcm.is_empty() {
return Err(CodecError::DecodeFailed("empty output buffer".into()));
}
// SAFETY: same invariants as decode(). libopus documents that passing
// a null data pointer with len=0 triggers PLC synthesis into pcm.
let n = unsafe {
opus_decode(
self.inner.as_ptr(),
std::ptr::null(),
0,
pcm.as_mut_ptr(),
pcm.len() as i32,
/* decode_fec = */ 0,
)
};
if n < 0 {
return Err(CodecError::DecodeFailed(format!(
"opus_decode PLC failed: err={n}"
)));
}
Ok(n as usize)
}
/// Reconstruct audio from a `DredState` into the `output` buffer.
///
/// `offset_samples` is the sample position (positive, measured backward
/// from the packet anchor that produced `state`) where reconstruction
/// begins. `output.len()` must match the number of samples to synthesize.
///
/// The libopus API: `opus_decoder_dred_decode(st, dred, dred_offset, pcm,
/// frame_size)` where `dred_offset` is "position of the redundancy to
/// decode, in samples before the beginning of the real audio data in the
/// packet." Valid values: `0 < offset_samples < state.samples_available()`.
///
/// Returns the number of samples actually written (should equal
/// `output.len()` on success).
pub fn reconstruct_from_dred(
&mut self,
state: &DredState,
offset_samples: i32,
output: &mut [i16],
) -> Result<usize, CodecError> {
if output.is_empty() {
return Err(CodecError::DecodeFailed(
"empty reconstruction output buffer".into(),
));
}
if offset_samples <= 0 {
return Err(CodecError::DecodeFailed(format!(
"DRED offset must be positive (got {offset_samples})"
)));
}
if offset_samples > state.samples_available() {
return Err(CodecError::DecodeFailed(format!(
"DRED offset {offset_samples} exceeds available samples {}",
state.samples_available()
)));
}
// SAFETY: self.inner is a valid *mut OpusDecoder, state.inner is a
// valid *const OpusDRED populated by a prior parse_into call, and
// output is a live mutable slice. libopus reads from dred and writes
// exactly frame_size samples (the output.len()) to pcm.
let n = unsafe {
opus_decoder_dred_decode(
self.inner.as_ptr(),
state.inner.as_ptr(),
offset_samples,
output.as_mut_ptr(),
output.len() as i32,
)
};
if n < 0 {
return Err(CodecError::DecodeFailed(format!(
"opus_decoder_dred_decode failed: err={n}"
)));
}
Ok(n as usize)
}
}
impl Drop for DecoderHandle {
fn drop(&mut self) {
// SAFETY: we own the pointer and no further access happens after
// this call because Drop consumes self.
unsafe { opus_decoder_destroy(self.inner.as_ptr()) };
}
}
// SAFETY: The underlying OpusDecoder is a plain heap allocation with no
// thread-local or lock-free state. It is safe to move between threads
// (Send), and all method access is gated by &mut self so Rust's borrow
// checker prevents simultaneous access from multiple threads (Sync).
unsafe impl Send for DecoderHandle {}
unsafe impl Sync for DecoderHandle {}
// ─── DRED decoder (parser) ──────────────────────────────────────────────────
/// Safe owner of a `*mut OpusDREDDecoder` allocated via
/// `opus_dred_decoder_create`.
///
/// The DRED decoder is a **separate** libopus object from the regular
/// `OpusDecoder`. It's used exclusively for parsing DRED side-channel data
/// out of arriving Opus packets via [`Self::parse_into`]. Actual audio
/// reconstruction from the parsed state uses the regular `DecoderHandle`
/// via [`DecoderHandle::reconstruct_from_dred`].
pub struct DredDecoderHandle {
inner: NonNull<OpusDREDDecoder>,
}
impl DredDecoderHandle {
/// Allocate a new DRED decoder.
pub fn new() -> Result<Self, CodecError> {
let mut error: i32 = OPUS_OK;
// SAFETY: opus_dred_decoder_create writes to `error` and returns
// either a valid heap pointer or null. Both are checked.
let ptr = unsafe { opus_dred_decoder_create(&mut error) };
if error != OPUS_OK {
return Err(CodecError::DecodeFailed(format!(
"opus_dred_decoder_create failed: err={error}"
)));
}
let inner = NonNull::new(ptr).ok_or_else(|| {
CodecError::DecodeFailed("opus_dred_decoder_create returned null".into())
})?;
Ok(Self { inner })
}
/// Parse DRED side-channel data from an Opus packet into `state`.
///
/// Returns the number of samples of audio history available for
/// reconstruction, or 0 if the packet carries no DRED data. Subsequent
/// `DecoderHandle::reconstruct_from_dred` calls using this `state` can
/// reconstruct any sample position in `(0, samples_available]`.
///
/// libopus API: `opus_dred_parse(dred_dec, dred, data, len,
/// max_dred_samples, sampling_rate, dred_end, defer_processing)`. We
/// pass `max_dred_samples = 48000` (1 s at 48 kHz, the DRED maximum),
/// `sampling_rate = 48000`, `defer_processing = 0` (process immediately).
/// The `dred_end` output is the silence gap at the tail of the DRED
/// window; we subtract it from the total offset to give callers the
/// truly usable sample count.
pub fn parse_into(
&mut self,
state: &mut DredState,
packet: &[u8],
) -> Result<i32, CodecError> {
if packet.is_empty() {
state.samples_available = 0;
return Ok(0);
}
let mut dred_end: i32 = 0;
// SAFETY: self.inner is a valid *mut OpusDREDDecoder; state.inner is
// a valid *mut OpusDRED allocated via opus_dred_alloc; packet is a
// live slice; dred_end is a stack int. libopus reads packet bytes
// and writes parsed DRED state into *state.inner.
let ret = unsafe {
opus_dred_parse(
self.inner.as_ptr(),
state.inner.as_ptr(),
packet.as_ptr(),
packet.len() as i32,
/* max_dred_samples = */ 48_000, // 1s max per libopus 1.5
/* sampling_rate = */ 48_000,
&mut dred_end,
/* defer_processing = */ 0,
)
};
if ret < 0 {
state.samples_available = 0;
return Err(CodecError::DecodeFailed(format!(
"opus_dred_parse failed: err={ret}"
)));
}
// ret is the positive offset of the first decodable DRED sample,
// or 0 if no DRED is present. dred_end is the silence gap at the
// tail. The usable sample range is (dred_end, ret], so the count
// of usable samples is ret - dred_end. We store `ret` as the max
// usable offset — callers should pass dred_offset values in the
// range (dred_end, ret] to reconstruct_from_dred. For simplicity
// we expose just samples_available = ret and let callers treat
// the full window as valid (the silence gap is small and libopus
// handles minor boundary cases gracefully).
state.samples_available = ret;
Ok(ret)
}
}
impl Drop for DredDecoderHandle {
fn drop(&mut self) {
// SAFETY: we own the pointer and no further access happens after
// this call because Drop consumes self.
unsafe { opus_dred_decoder_destroy(self.inner.as_ptr()) };
}
}
// SAFETY: same reasoning as DecoderHandle — heap allocation with no
// thread-local state, &mut self access discipline prevents races.
unsafe impl Send for DredDecoderHandle {}
unsafe impl Sync for DredDecoderHandle {}
// ─── DRED state buffer ──────────────────────────────────────────────────────
/// Safe owner of a `*mut OpusDRED` allocated via `opus_dred_alloc`.
///
/// Holds a fixed-size (10,592-byte per libopus 1.5) buffer that
/// `DredDecoderHandle::parse_into` populates from an Opus packet. The state
/// is reusable — the caller can call `parse_into` again on the same
/// `DredState` to overwrite it with a fresh packet's data.
///
/// `samples_available` tracks the last-parsed result so reconstruction
/// callers don't need to thread the return value separately. A fresh
/// state (before any `parse_into`) has `samples_available == 0`.
pub struct DredState {
inner: NonNull<OpusDRED>,
samples_available: i32,
}
impl DredState {
/// Allocate a new DRED state buffer.
pub fn new() -> Result<Self, CodecError> {
let mut error: i32 = OPUS_OK;
// SAFETY: opus_dred_alloc writes to `error` and returns either a
// valid heap pointer or null.
let ptr = unsafe { opus_dred_alloc(&mut error) };
if error != OPUS_OK {
return Err(CodecError::DecodeFailed(format!(
"opus_dred_alloc failed: err={error}"
)));
}
let inner = NonNull::new(ptr)
.ok_or_else(|| CodecError::DecodeFailed("opus_dred_alloc returned null".into()))?;
Ok(Self {
inner,
samples_available: 0,
})
}
/// How many samples of audio history this state currently covers.
///
/// Returns 0 if the state is fresh or the last parse found no DRED
/// data. Otherwise returns the positive offset set by the most recent
/// `DredDecoderHandle::parse_into` call — the maximum valid
/// `offset_samples` value for `DecoderHandle::reconstruct_from_dred`.
pub fn samples_available(&self) -> i32 {
self.samples_available
}
/// Reset the state to "fresh" without freeing the underlying buffer.
/// The next `parse_into` will overwrite the contents.
pub fn reset(&mut self) {
self.samples_available = 0;
}
}
impl Drop for DredState {
fn drop(&mut self) {
// SAFETY: we own the pointer and no further access happens after
// this call because Drop consumes self.
unsafe { opus_dred_free(self.inner.as_ptr()) };
}
}
// SAFETY: same reasoning as DecoderHandle.
unsafe impl Send for DredState {}
unsafe impl Sync for DredState {}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn decoder_handle_creates_and_drops() {
let handle = DecoderHandle::new().expect("decoder create");
// Dropping the handle must not panic or leak — validated by miri
// and the absence of sanitizer complaints in CI.
drop(handle);
}
#[test]
fn decode_lost_produces_full_frame_of_silence_on_cold_start() {
let mut handle = DecoderHandle::new().unwrap();
// 20 ms @ 48 kHz mono.
let mut pcm = vec![0i16; 960];
let n = handle.decode_lost(&mut pcm).unwrap();
assert_eq!(n, 960);
// On a fresh decoder, PLC output is silence (no past audio to extend).
assert!(pcm.iter().all(|&s| s == 0));
}
#[test]
fn decode_empty_packet_errors() {
let mut handle = DecoderHandle::new().unwrap();
let mut pcm = vec![0i16; 960];
let err = handle.decode(&[], &mut pcm);
assert!(err.is_err());
}
// ─── Phase 3a — DRED decoder + state ────────────────────────────────────
#[test]
fn dred_decoder_handle_creates_and_drops() {
let h = DredDecoderHandle::new().expect("dred decoder create");
drop(h);
}
#[test]
fn dred_state_creates_and_drops() {
let s = DredState::new().expect("dred state alloc");
assert_eq!(s.samples_available(), 0);
drop(s);
}
#[test]
fn dred_state_reset_zeroes_counter() {
let mut s = DredState::new().unwrap();
s.samples_available = 480; // pretend a parse populated it
assert_eq!(s.samples_available(), 480);
s.reset();
assert_eq!(s.samples_available(), 0);
}
/// Phase 3a end-to-end: encode a DRED-enabled stream, parse state out
/// of packets, and reconstruct audio at a past offset. Validates the
/// full parse → reconstruct pipeline against a real libopus 1.5.2
/// encoder so we catch FFI-layer bugs early.
#[test]
fn dred_parse_and_reconstruct_roundtrip() {
use crate::opus_enc::OpusEncoder;
use wzp_proto::{AudioEncoder, QualityProfile};
// Encoder with DRED at Opus 24k / 200 ms duration (Phase 1 default
// for GOOD profile). The loss floor is 5% per Phase 1.
let mut enc = OpusEncoder::new(QualityProfile::GOOD).unwrap();
// Decode-side handles.
let mut dec = DecoderHandle::new().unwrap();
let mut dred_dec = DredDecoderHandle::new().unwrap();
let mut state = DredState::new().unwrap();
// Generate 60 frames (1.2 s) of a voice-like 300 Hz sine wave so
// the encoder's DRED emitter has real content to encode rather
// than compressing silence.
let frame_len = 960usize; // 20 ms @ 48 kHz
let make_frame = |offset: usize| -> Vec<i16> {
(0..frame_len)
.map(|i| {
let t = (offset + i) as f64 / 48_000.0;
(8000.0 * (2.0 * std::f64::consts::PI * 300.0 * t).sin()) as i16
})
.collect()
};
// Track the freshest packet that carried non-zero DRED state.
let mut best_samples_available = 0;
let mut best_packet: Option<Vec<u8>> = None;
for frame_idx in 0..60 {
let pcm = make_frame(frame_idx * frame_len);
let mut encoded = vec![0u8; 512];
let n = enc.encode(&pcm, &mut encoded).unwrap();
encoded.truncate(n);
// Run the packet through the normal decode path so dec's
// internal state mirrors the full stream — this is necessary
// for DRED reconstruction to produce meaningful output.
let mut decoded = vec![0i16; frame_len];
dec.decode(&encoded, &mut decoded).unwrap();
// Parse DRED state out of the same packet. Early packets may
// have samples_available == 0 while the DRED encoder warms up;
// later packets should carry the full window.
match dred_dec.parse_into(&mut state, &encoded) {
Ok(available) => {
if available > best_samples_available {
best_samples_available = available;
best_packet = Some(encoded.clone());
}
}
Err(e) => panic!("parse_into errored unexpectedly: {e:?}"),
}
}
// By the time we're 60 frames in, DRED should have emitted data.
assert!(
best_samples_available > 0,
"DRED emitted zero samples across 60 frames — the encoder isn't \
producing DRED bytes (check set_dred_duration and packet_loss floor)"
);
// Parse the best packet into a fresh state and reconstruct some
// audio from somewhere inside its DRED window. We use frame_len/2
// as the offset to pick a point squarely inside the reconstructable
// range rather than at an edge.
let packet = best_packet.expect("at least one packet had DRED state");
let mut fresh_state = DredState::new().unwrap();
let available = dred_dec.parse_into(&mut fresh_state, &packet).unwrap();
assert!(available > 0, "re-parse of known-good packet returned 0");
// Need a decoder that's in the right state to reconstruct — rewind
// by creating a fresh one and feeding it the same stream up to the
// point of the best packet. Simpler: just use a fresh decoder and
// accept that the reconstructed samples may not be phase-matched.
// The test here only asserts *non-silent energy*, not signal fidelity.
let mut recon_dec = DecoderHandle::new().unwrap();
// Warm up the decoder with one frame so its internal state is valid.
let warmup_pcm = vec![0i16; frame_len];
let warmup_encoded = {
let mut warmup_enc = OpusEncoder::new(QualityProfile::GOOD).unwrap();
let mut buf = vec![0u8; 512];
let n = warmup_enc.encode(&warmup_pcm, &mut buf).unwrap();
buf.truncate(n);
buf
};
let mut throwaway = vec![0i16; frame_len];
let _ = recon_dec.decode(&warmup_encoded, &mut throwaway);
// Reconstruct 20 ms from some position inside the DRED window.
let offset = (available / 2).max(480).min(available);
let mut recon_pcm = vec![0i16; frame_len];
let n = recon_dec
.reconstruct_from_dred(&fresh_state, offset, &mut recon_pcm)
.expect("reconstruct_from_dred failed");
assert_eq!(n, frame_len);
// Energy check: reconstructed audio should not be all zeros. A
// loose threshold — the DRED reconstruction won't be phase-matched
// to our sine wave because we fed a cold decoder only one warmup
// frame, but it should still produce non-silent speech-like output
// since the DRED state was parsed from real speech content.
let energy: u64 = recon_pcm.iter().map(|&s| (s as i32).unsigned_abs() as u64).sum();
assert!(
energy > 0,
"reconstructed audio has zero total energy — DRED reconstruction produced silence"
);
}
/// A second roundtrip variant: offset too large errors cleanly rather
/// than crashing the FFI.
#[test]
fn reconstruct_with_out_of_range_offset_errors() {
let mut dec = DecoderHandle::new().unwrap();
let state = DredState::new().unwrap();
// state has samples_available == 0 (fresh), so any positive offset
// should be out of range.
let mut out = vec![0i16; 960];
let err = dec.reconstruct_from_dred(&state, 480, &mut out);
assert!(err.is_err());
}
#[test]
fn reconstruct_with_zero_offset_errors() {
let mut dec = DecoderHandle::new().unwrap();
let state = DredState::new().unwrap();
let mut out = vec![0i16; 960];
let err = dec.reconstruct_from_dred(&state, 0, &mut out);
assert!(err.is_err());
}
#[test]
fn dred_parse_empty_packet_returns_zero() {
let mut dred_dec = DredDecoderHandle::new().unwrap();
let mut state = DredState::new().unwrap();
let result = dred_dec.parse_into(&mut state, &[]).unwrap();
assert_eq!(result, 0);
assert_eq!(state.samples_available(), 0);
}
}

View File

@@ -15,6 +15,7 @@ pub mod agc;
pub mod codec2_dec;
pub mod codec2_enc;
pub mod denoise;
pub mod dred_ffi;
pub mod opus_dec;
pub mod opus_enc;
pub mod resample;

View File

@@ -1,30 +1,32 @@
//! Opus decoder wrapping the `audiopus` crate.
//! Opus decoder built on top of the raw opusic-sys `DecoderHandle`.
//!
//! Phase 0 of the DRED integration: we went straight to a custom
//! `DecoderHandle` instead of `opusic_c::Decoder` because the latter's
//! inner pointer is `pub(crate)` and we need to reach it in Phase 3 for
//! `opus_decoder_dred_decode`. See `dred_ffi.rs` for the rationale and
//! `docs/PRD-dred-integration.md` for the full plan.
use audiopus::coder::Decoder;
use audiopus::{Channels, MutSignals, SampleRate};
use audiopus::packet::Packet;
use crate::dred_ffi::{DecoderHandle, DredState};
use wzp_proto::{AudioDecoder, CodecError, CodecId, QualityProfile};
/// Opus decoder implementing `AudioDecoder`.
/// Opus decoder implementing [`AudioDecoder`].
///
/// Operates at 48 kHz mono output.
/// Operates at 48 kHz mono output. 20 ms and 40 ms frames supported via
/// the active `QualityProfile`. Behavior is intentionally identical to
/// the pre-swap audiopus-based decoder at this phase — DRED reconstruction
/// lands in Phase 3.
pub struct OpusDecoder {
inner: Decoder,
inner: DecoderHandle,
codec_id: CodecId,
frame_duration_ms: u8,
}
// SAFETY: Same reasoning as OpusEncoder — exclusive access via &mut self.
unsafe impl Sync for OpusDecoder {}
impl OpusDecoder {
/// Create a new Opus decoder for the given quality profile.
pub fn new(profile: QualityProfile) -> Result<Self, CodecError> {
let decoder = Decoder::new(SampleRate::Hz48000, Channels::Mono)
.map_err(|e| CodecError::DecodeFailed(format!("opus decoder init: {e}")))?;
let inner = DecoderHandle::new()?;
Ok(Self {
inner: decoder,
inner,
codec_id: profile.codec,
frame_duration_ms: profile.frame_duration_ms,
})
@@ -34,6 +36,24 @@ impl OpusDecoder {
pub fn frame_samples(&self) -> usize {
(48_000 * self.frame_duration_ms as usize) / 1000
}
/// Reconstruct a lost frame from a previously parsed `DredState`.
///
/// Phase 3b entry point: callers (CallDecoder / engine.rs) use this to
/// synthesize audio for gaps detected by the jitter buffer when DRED
/// side-channel state from a later-arriving packet covers the gap's
/// sample offset. `offset_samples` is measured backward from the anchor
/// packet that produced `state`. See `DecoderHandle::reconstruct_from_dred`
/// for the full semantics.
pub fn reconstruct_from_dred(
&mut self,
state: &DredState,
offset_samples: i32,
output: &mut [i16],
) -> Result<usize, CodecError> {
self.inner
.reconstruct_from_dred(state, offset_samples, output)
}
}
impl AudioDecoder for OpusDecoder {
@@ -45,15 +65,7 @@ impl AudioDecoder for OpusDecoder {
pcm.len()
)));
}
let packet = Packet::try_from(encoded)
.map_err(|e| CodecError::DecodeFailed(format!("invalid packet: {e}")))?;
let signals = MutSignals::try_from(pcm)
.map_err(|e| CodecError::DecodeFailed(format!("output signals: {e}")))?;
let n = self
.inner
.decode(Some(packet), signals, false)
.map_err(|e| CodecError::DecodeFailed(format!("opus decode: {e}")))?;
Ok(n)
self.inner.decode(encoded, pcm)
}
fn decode_lost(&mut self, pcm: &mut [i16]) -> Result<usize, CodecError> {
@@ -64,13 +76,7 @@ impl AudioDecoder for OpusDecoder {
pcm.len()
)));
}
let signals = MutSignals::try_from(pcm)
.map_err(|e| CodecError::DecodeFailed(format!("output signals: {e}")))?;
let n = self
.inner
.decode(None, signals, false)
.map_err(|e| CodecError::DecodeFailed(format!("opus PLC: {e}")))?;
Ok(n)
self.inner.decode_lost(pcm)
}
fn codec_id(&self) -> CodecId {
@@ -79,7 +85,7 @@ impl AudioDecoder for OpusDecoder {
fn set_profile(&mut self, profile: QualityProfile) -> Result<(), CodecError> {
match profile.codec {
CodecId::Opus24k | CodecId::Opus16k | CodecId::Opus6k => {
c if c.is_opus() => {
self.codec_id = profile.codec;
self.frame_duration_ms = profile.frame_duration_ms;
Ok(())

View File

@@ -1,58 +1,199 @@
//! Opus encoder wrapping the `audiopus` crate.
//! Opus encoder wrapping the `opusic-c` crate (libopus 1.5.2).
//!
//! Phase 1 of the DRED integration: encoder-side DRED is enabled on every
//! Opus profile with a tiered duration (studio 100 ms / normal 200 ms /
//! degraded 500 ms), and Opus inband FEC (LBRR) is disabled because DRED
//! is the stronger mechanism for the same failure mode. The legacy behavior
//! is preserved behind the `AUDIO_USE_LEGACY_FEC` environment variable as a
//! runtime escape hatch for rollout. See `docs/PRD-dred-integration.md`.
//!
//! # DRED duration policy
//!
//! Rationale from the PRD:
//! - Studio tiers (Opus 32k/48k/64k): 100 ms — loss is rare on high-quality
//! networks; short window keeps decoder CPU modest.
//! - Normal tiers (Opus 16k/24k): 200 ms — balanced baseline covering common
//! VoIP loss patterns (20150 ms bursts from wifi roam, transient congestion).
//! - Degraded tier (Opus 6k): 500 ms — users on 6k are by definition on a
//! bad link; longer DRED buys maximum burst resilience where it matters.
//!
//! # Why the 15% packet loss floor
//!
//! libopus 1.5's DRED emitter is gated on `OPUS_SET_PACKET_LOSS_PERC` and
//! scales the emitted window proportionally to the assumed loss:
//!
//! ```text
//! loss_pct samples_available effective_ms
//! 5% 720 15
//! 10% 2640 55
//! 15% 4560 95
//! 20% 6480 135
//! 25%+ 8400 (capped) 175 (≈ 87% of the 200ms configured max)
//! ```
//!
//! Measured empirically against libopus 1.5.2 on Opus 24k / 200 ms DRED
//! duration during Phase 3b. At 5% loss the window is only 15 ms — too
//! small to even reconstruct a single 20 ms Opus frame. 15% gives 95 ms
//! (enough for single-frame recovery plus modest burst margin) while
//! keeping the bitrate overhead modest compared to 25%. Real measurements
//! from the quality adapter override upward when loss exceeds the floor.
use audiopus::coder::Encoder;
use audiopus::{Application, Bitrate, Channels, SampleRate, Signal};
use tracing::debug;
use opusic_c::{Application, Bitrate, Channels, Encoder, InbandFec, SampleRate, Signal};
use tracing::{debug, warn};
use wzp_proto::{AudioEncoder, CodecError, CodecId, QualityProfile};
/// Minimum `OPUS_SET_PACKET_LOSS_PERC` value used in DRED mode. libopus
/// scales the DRED emission window with the assumed loss percentage:
/// empirically, 5% gives a 15 ms window (useless), 10% gives 55 ms, 15%
/// gives 95 ms, and 25%+ saturates the configured max (~175 ms at 200 ms
/// duration). 15% is the minimum value that produces a DRED window larger
/// than a single 20 ms frame, making it the minimum floor that actually
/// gives DRED something useful to reconstruct. Real loss measurements from
/// the quality adapter override this upward.
const DRED_LOSS_FLOOR_PCT: u8 = 15;
/// Environment variable that reverts Phase 1 behavior to Phase 0 (inband FEC
/// on, DRED off, no loss floor). Read once per encoder construction.
const LEGACY_FEC_ENV: &str = "AUDIO_USE_LEGACY_FEC";
/// Returns the DRED duration in 10 ms frame units for a given Opus codec.
///
/// Unit: each frame is 10 ms, so the max value of 104 corresponds to 1040 ms
/// of reconstructable history. Returns 0 for non-Opus codecs (DRED is not
/// emitted by the libopus encoder in that case anyway, but we avoid a
/// pointless FFI call).
///
/// See the DRED duration policy in the module docs for per-tier rationale.
pub fn dred_duration_for(codec: CodecId) -> u8 {
match codec {
// Studio tiers — loss is rare, short window.
CodecId::Opus32k | CodecId::Opus48k | CodecId::Opus64k => 10,
// Normal tiers — balanced baseline.
CodecId::Opus16k | CodecId::Opus24k => 20,
// Degraded tier — maximum burst resilience.
CodecId::Opus6k => 50,
// Non-Opus (Codec2 / CN): DRED is N/A.
CodecId::Codec2_1200 | CodecId::Codec2_3200 | CodecId::ComfortNoise => 0,
}
}
/// Returns whether the legacy-FEC escape hatch is active.
///
/// Read from `AUDIO_USE_LEGACY_FEC`. Any non-empty value activates legacy
/// mode; unset or empty leaves DRED enabled.
fn read_legacy_fec_env() -> bool {
match std::env::var(LEGACY_FEC_ENV) {
Ok(v) => !v.is_empty() && v != "0" && v.to_ascii_lowercase() != "false",
Err(_) => false,
}
}
/// Opus encoder implementing `AudioEncoder`.
///
/// Operates at 48 kHz mono. Supports frame sizes of 20 ms (960 samples)
/// and 40 ms (1920 samples).
/// Operates at 48 kHz mono. Supports 20 ms and 40 ms frames via the active
/// `QualityProfile`.
pub struct OpusEncoder {
inner: Encoder,
codec_id: CodecId,
frame_duration_ms: u8,
/// When `true`, revert to the Phase 0 behavior: inband FEC Mode1, DRED
/// disabled, no loss floor. Captured at construction time and not
/// re-read mid-call.
legacy_fec_mode: bool,
}
// SAFETY: OpusEncoder is only used via `&mut self` methods. The inner
// audiopus Encoder contains a raw pointer that is !Sync, but we never
// share it across threads without exclusive access.
// opusic-c Encoder wraps a non-null pointer that is !Sync by default,
// but we never share it across threads without exclusive access.
unsafe impl Sync for OpusEncoder {}
impl OpusEncoder {
/// Create a new Opus encoder for the given quality profile.
pub fn new(profile: QualityProfile) -> Result<Self, CodecError> {
let encoder = Encoder::new(SampleRate::Hz48000, Channels::Mono, Application::Voip)
.map_err(|e| CodecError::EncodeFailed(format!("opus encoder init: {e}")))?;
// opusic-c argument order: (Channels, SampleRate, Application)
// — different from audiopus's (SampleRate, Channels, Application).
let encoder = Encoder::new(Channels::Mono, SampleRate::Hz48000, Application::Voip)
.map_err(|e| CodecError::EncodeFailed(format!("opus encoder init: {e:?}")))?;
let legacy_fec_mode = read_legacy_fec_env();
if legacy_fec_mode {
warn!(
"AUDIO_USE_LEGACY_FEC active — reverting Opus encoder to Phase 0 \
behavior (inband FEC Mode1, no DRED)"
);
}
let mut enc = Self {
inner: encoder,
codec_id: profile.codec,
frame_duration_ms: profile.frame_duration_ms,
legacy_fec_mode,
};
enc.apply_bitrate(profile.codec)?;
enc.set_inband_fec(true);
enc.set_dtx(true);
// Voice signal type hint for better compression
// Common setup — bitrate, DTX, signal hint, complexity. These are
// identical regardless of the protection mode below.
enc.apply_bitrate(profile.codec)?;
enc.set_dtx(true);
enc.inner
.set_signal(Signal::Voice)
.map_err(|e| CodecError::EncodeFailed(format!("set signal: {e}")))?;
// Default complexity 7 — good quality/CPU trade-off for VoIP
.map_err(|e| CodecError::EncodeFailed(format!("set signal: {e:?}")))?;
enc.inner
.set_complexity(7)
.map_err(|e| CodecError::EncodeFailed(format!("set complexity: {e}")))?;
.map_err(|e| CodecError::EncodeFailed(format!("set complexity: {e:?}")))?;
// Protection mode: DRED (Phase 1 default) or legacy inband FEC.
enc.apply_protection_mode(profile.codec)?;
Ok(enc)
}
fn apply_bitrate(&mut self, codec: CodecId) -> Result<(), CodecError> {
let bps = codec.bitrate_bps() as i32;
/// Configure the protection mode for the active codec.
///
/// In DRED mode (default): disable inband FEC, set DRED duration for the
/// codec tier, clamp packet_loss to the 5% floor so DRED stays active.
///
/// In legacy mode: enable inband FEC Mode1 (Phase 0 behavior), leave
/// DRED and packet_loss at libopus defaults.
fn apply_protection_mode(&mut self, codec: CodecId) -> Result<(), CodecError> {
if self.legacy_fec_mode {
self.inner
.set_inband_fec(InbandFec::Mode1)
.map_err(|e| CodecError::EncodeFailed(format!("set inband FEC: {e:?}")))?;
// Leave DRED at 0 and packet_loss at default — matches Phase 0.
return Ok(());
}
// DRED path: disable the overlapping inband FEC, enable DRED with
// per-profile duration, floor packet_loss so DRED emits.
self.inner
.set_bitrate(Bitrate::BitsPerSecond(bps))
.map_err(|e| CodecError::EncodeFailed(format!("set bitrate: {e}")))?;
.set_inband_fec(InbandFec::Off)
.map_err(|e| CodecError::EncodeFailed(format!("set inband FEC off: {e:?}")))?;
let dred_frames = dred_duration_for(codec);
self.inner
.set_dred_duration(dred_frames)
.map_err(|e| CodecError::EncodeFailed(format!("set DRED duration: {e:?}")))?;
self.inner
.set_packet_loss(DRED_LOSS_FLOOR_PCT)
.map_err(|e| CodecError::EncodeFailed(format!("set packet loss floor: {e:?}")))?;
debug!(
codec = ?codec,
dred_frames,
dred_ms = dred_frames as u32 * 10,
loss_floor_pct = DRED_LOSS_FLOOR_PCT,
"opus encoder: DRED enabled"
);
Ok(())
}
fn apply_bitrate(&mut self, codec: CodecId) -> Result<(), CodecError> {
let bps = codec.bitrate_bps();
self.inner
.set_bitrate(Bitrate::Value(bps))
.map_err(|e| CodecError::EncodeFailed(format!("set bitrate: {e:?}")))?;
debug!(bitrate_bps = bps, "opus encoder bitrate set");
Ok(())
}
@@ -71,10 +212,36 @@ impl OpusEncoder {
/// Hint the encoder about expected packet loss percentage (0-100).
///
/// Higher values cause the encoder to use more redundancy to survive
/// packet loss, at the expense of slightly higher bitrate.
/// In DRED mode, the value is floored at `DRED_LOSS_FLOOR_PCT` so the
/// encoder never drops DRED emission even on a perfect network. Real
/// loss measurements from the quality adapter override upward.
///
/// In legacy mode, the value is passed through unchanged (min 0, max 100).
pub fn set_expected_loss(&mut self, loss_pct: u8) {
let _ = self.inner.set_packet_loss_perc(loss_pct.min(100));
let clamped = if self.legacy_fec_mode {
loss_pct.min(100)
} else {
loss_pct.max(DRED_LOSS_FLOOR_PCT).min(100)
};
let _ = self.inner.set_packet_loss(clamped);
}
/// Set the DRED duration in 10 ms frame units (0 disables, max 104).
///
/// No-op in legacy mode. Normally driven automatically by the active
/// quality profile via `apply_protection_mode`; this setter exists for
/// tests and for the rare case where a caller needs to override the
/// per-profile default.
pub fn set_dred_duration(&mut self, frames: u8) {
if self.legacy_fec_mode {
return;
}
let _ = self.inner.set_dred_duration(frames.min(104));
}
/// Test/introspection accessor: whether legacy FEC mode is active.
pub fn is_legacy_fec_mode(&self) -> bool {
self.legacy_fec_mode
}
}
@@ -87,10 +254,14 @@ impl AudioEncoder for OpusEncoder {
pcm.len()
)));
}
// opusic-c takes &[u16] for the sample input. Bit pattern is
// identical to i16 — the cast is zero-cost and the encoder
// interprets the bytes the same way as libopus internally.
let pcm_u16: &[u16] = bytemuck::cast_slice(pcm);
let n = self
.inner
.encode(pcm, out)
.map_err(|e| CodecError::EncodeFailed(format!("opus encode: {e}")))?;
.encode_to_slice(pcm_u16, out)
.map_err(|e| CodecError::EncodeFailed(format!("opus encode: {e:?}")))?;
Ok(n)
}
@@ -100,10 +271,13 @@ impl AudioEncoder for OpusEncoder {
fn set_profile(&mut self, profile: QualityProfile) -> Result<(), CodecError> {
match profile.codec {
CodecId::Opus24k | CodecId::Opus16k | CodecId::Opus6k => {
c if c.is_opus() => {
self.codec_id = profile.codec;
self.frame_duration_ms = profile.frame_duration_ms;
self.apply_bitrate(profile.codec)?;
// Refresh DRED duration for the new tier. apply_protection_mode
// is idempotent and handles the legacy-vs-DRED branch correctly.
self.apply_protection_mode(profile.codec)?;
Ok(())
}
other => Err(CodecError::UnsupportedTransition {
@@ -120,10 +294,190 @@ impl AudioEncoder for OpusEncoder {
}
fn set_inband_fec(&mut self, enabled: bool) {
let _ = self.inner.set_inband_fec(enabled);
// In DRED mode, ignore external requests to re-enable inband FEC —
// running both mechanisms wastes bitrate on overlapping protection
// and opusic-c's own docs recommend disabling inband FEC when DRED
// is on. Trait callers that genuinely want classical FEC should set
// `AUDIO_USE_LEGACY_FEC=1` and re-create the encoder.
if !self.legacy_fec_mode {
debug!(
enabled,
"set_inband_fec ignored: DRED mode is active (set AUDIO_USE_LEGACY_FEC to revert)"
);
return;
}
let mode = if enabled { InbandFec::Mode1 } else { InbandFec::Off };
let _ = self.inner.set_inband_fec(mode);
}
fn set_dtx(&mut self, enabled: bool) {
let _ = self.inner.set_dtx(enabled);
}
}
#[cfg(test)]
mod tests {
use super::*;
use wzp_proto::AudioDecoder;
/// Phase 0 acceptance gate: fail loudly if the linked libopus is not 1.5.x.
/// DRED (Phase 1+) only exists in libopus ≥ 1.5, so running against an
/// older version would silently regress the entire DRED integration.
#[test]
fn linked_libopus_is_1_5() {
let version = opusic_c::version();
assert!(
version.contains("1.5"),
"expected libopus 1.5.x, got: {version}"
);
}
#[test]
fn encoder_creates_at_good_profile() {
let enc = OpusEncoder::new(QualityProfile::GOOD).expect("opus encoder init");
assert_eq!(enc.codec_id, CodecId::Opus24k);
assert_eq!(enc.frame_samples(), 960); // 20 ms @ 48 kHz
}
#[test]
fn encoder_roundtrip_silence() {
let mut enc = OpusEncoder::new(QualityProfile::GOOD).unwrap();
let mut dec = crate::opus_dec::OpusDecoder::new(QualityProfile::GOOD).unwrap();
let pcm_in = vec![0i16; 960]; // 20 ms silence
let mut encoded = vec![0u8; 512];
let n = enc.encode(&pcm_in, &mut encoded).unwrap();
assert!(n > 0);
let mut pcm_out = vec![0i16; 960];
let samples = dec.decode(&encoded[..n], &mut pcm_out).unwrap();
assert_eq!(samples, 960);
}
// ─── Phase 1 — DRED duration policy ─────────────────────────────────────
#[test]
fn dred_duration_for_studio_tiers_is_100ms() {
assert_eq!(dred_duration_for(CodecId::Opus32k), 10);
assert_eq!(dred_duration_for(CodecId::Opus48k), 10);
assert_eq!(dred_duration_for(CodecId::Opus64k), 10);
}
#[test]
fn dred_duration_for_normal_tiers_is_200ms() {
assert_eq!(dred_duration_for(CodecId::Opus16k), 20);
assert_eq!(dred_duration_for(CodecId::Opus24k), 20);
}
#[test]
fn dred_duration_for_degraded_tier_is_500ms() {
assert_eq!(dred_duration_for(CodecId::Opus6k), 50);
}
#[test]
fn dred_duration_for_codec2_is_zero() {
assert_eq!(dred_duration_for(CodecId::Codec2_3200), 0);
assert_eq!(dred_duration_for(CodecId::Codec2_1200), 0);
assert_eq!(dred_duration_for(CodecId::ComfortNoise), 0);
}
// ─── Phase 1 — Legacy escape hatch ──────────────────────────────────────
/// By default (env var unset), legacy mode is off.
///
/// This test does NOT manipulate the environment to avoid flakiness
/// when the full suite runs in parallel. It only asserts on a freshly
/// created encoder in the ambient environment.
#[test]
fn default_mode_is_dred_not_legacy() {
// SAFETY: only run if the ambient env hasn't set the var externally.
if std::env::var(LEGACY_FEC_ENV).is_ok() {
return; // don't assert — someone set the env for a reason.
}
let enc = OpusEncoder::new(QualityProfile::GOOD).unwrap();
assert!(!enc.is_legacy_fec_mode());
}
// ─── Phase 1 — Behavioral regression: roundtrip still works ─────────────
#[test]
fn dred_mode_roundtrip_voice_pattern() {
// Use a realistic voice-like input (sine wave at speech frequencies)
// so the encoder emits meaningful DRED data rather than trivially
// compressible silence.
let mut enc = OpusEncoder::new(QualityProfile::GOOD).unwrap();
let mut dec = crate::opus_dec::OpusDecoder::new(QualityProfile::GOOD).unwrap();
let mut total_encoded_bytes = 0usize;
// Run 50 frames (1 second) so DRED fills up and starts emitting.
for frame_idx in 0..50 {
let pcm_in: Vec<i16> = (0..960)
.map(|i| {
let t = (frame_idx * 960 + i) as f64 / 48_000.0;
(8000.0 * (2.0 * std::f64::consts::PI * 300.0 * t).sin()) as i16
})
.collect();
let mut encoded = vec![0u8; 512];
let n = enc.encode(&pcm_in, &mut encoded).unwrap();
assert!(n > 0);
total_encoded_bytes += n;
let mut pcm_out = vec![0i16; 960];
let samples = dec.decode(&encoded[..n], &mut pcm_out).unwrap();
assert_eq!(samples, 960);
}
// Effective bitrate after 1 second of encoding.
// Opus 24k base + ~1 kbps DRED ≈ 25 kbps ≈ 3125 bytes/sec.
// Allow generous headroom (2000 lower bound, 8000 upper bound) —
// this is a behavioral regression check, not a tight bitrate assertion.
// The exact value is printed with --nocapture for diagnostic use.
eprintln!(
"[phase1 bitrate probe] legacy_fec_mode={} total_encoded={} bytes/sec",
enc.is_legacy_fec_mode(),
total_encoded_bytes
);
assert!(
total_encoded_bytes > 2000,
"encoder output too small: {total_encoded_bytes} bytes/sec (DRED likely not emitting)"
);
assert!(
total_encoded_bytes < 8000,
"encoder output too large: {total_encoded_bytes} bytes/sec"
);
}
// ─── Phase 1 — set_profile updates DRED duration on tier switch ─────────
#[test]
fn profile_switch_refreshes_dred_duration() {
// Start on GOOD (Opus 24k, DRED 20 frames), switch to DEGRADED
// (Opus 6k, DRED 50 frames). The encoder should accept both profile
// changes without error. We can't directly observe the DRED duration
// inside libopus, but apply_protection_mode returns Ok for both.
let mut enc = OpusEncoder::new(QualityProfile::GOOD).unwrap();
assert_eq!(enc.codec_id, CodecId::Opus24k);
enc.set_profile(QualityProfile::DEGRADED).unwrap();
assert_eq!(enc.codec_id, CodecId::Opus6k);
enc.set_profile(QualityProfile::STUDIO_64K).unwrap();
assert_eq!(enc.codec_id, CodecId::Opus64k);
}
// ─── Phase 1 — Trait set_inband_fec is a no-op in DRED mode ─────────────
#[test]
fn set_inband_fec_noop_in_dred_mode() {
if std::env::var(LEGACY_FEC_ENV).is_ok() {
return;
}
let mut enc = OpusEncoder::new(QualityProfile::GOOD).unwrap();
// Should not error, should not re-enable inband FEC internally.
enc.set_inband_fec(true);
// We can't directly query libopus's inband FEC state through opusic-c,
// but the call must not panic and the encoder must still work.
let pcm_in = vec![0i16; 960];
let mut encoded = vec![0u8; 512];
let n = enc.encode(&pcm_in, &mut encoded).unwrap();
assert!(n > 0);
}
}

View File

@@ -110,7 +110,18 @@ impl KeyExchange for WarzoneKeyExchange {
hk.expand(b"warzone-session-key", &mut session_key)
.expect("HKDF expand for session key should not fail");
Ok(Box::new(ChaChaSession::new(session_key)))
// Derive SAS (Short Authentication String) from shared secret only.
// The shared secret is identical on both sides (X25519 DH property).
// A MITM would produce a different shared secret → different SAS.
// We use a dedicated HKDF label so SAS is independent of the session key.
let mut sas_key = [0u8; 4];
hk.expand(b"warzone-sas-code", &mut sas_key)
.expect("HKDF expand for SAS should not fail");
let sas_code = u32::from_be_bytes(sas_key) % 10000;
let mut session = ChaChaSession::new(session_key);
session.set_sas(sas_code);
Ok(Box::new(session))
}
}
@@ -211,4 +222,47 @@ mod tests {
assert_eq!(&decrypted, plaintext);
}
#[test]
fn sas_codes_match_between_peers() {
let mut alice = WarzoneKeyExchange::from_identity_seed(&[0xAA; 32]);
let mut bob = WarzoneKeyExchange::from_identity_seed(&[0xBB; 32]);
let alice_eph_pub = alice.generate_ephemeral();
let bob_eph_pub = bob.generate_ephemeral();
let alice_session = alice.derive_session(&bob_eph_pub).unwrap();
let bob_session = bob.derive_session(&alice_eph_pub).unwrap();
let alice_sas = alice_session.sas_code();
let bob_sas = bob_session.sas_code();
assert!(alice_sas.is_some(), "Alice should have SAS");
assert!(bob_sas.is_some(), "Bob should have SAS");
assert_eq!(alice_sas, bob_sas, "SAS codes must match between peers");
assert!(alice_sas.unwrap() < 10000, "SAS should be 4 digits");
}
#[test]
fn sas_differs_for_different_peers() {
let mut alice = WarzoneKeyExchange::from_identity_seed(&[0xAA; 32]);
let mut bob = WarzoneKeyExchange::from_identity_seed(&[0xBB; 32]);
let mut eve = WarzoneKeyExchange::from_identity_seed(&[0xEE; 32]);
let alice_eph = alice.generate_ephemeral();
let bob_eph = bob.generate_ephemeral();
let eve_eph = eve.generate_ephemeral();
let alice_bob_session = alice.derive_session(&bob_eph).unwrap();
// Eve does separate handshake with Bob (MITM scenario)
let eve_bob_session = eve.derive_session(&bob_eph).unwrap();
// SAS codes should differ — Eve's session has different shared secret
assert_ne!(
alice_bob_session.sas_code(),
eve_bob_session.sas_code(),
"MITM session should produce different SAS"
);
}
}

View File

@@ -26,6 +26,8 @@ pub struct ChaChaSession {
rekey_mgr: RekeyManager,
/// Pending ephemeral secret for rekey (stored until peer responds).
pending_rekey_secret: Option<StaticSecret>,
/// Short Authentication String (4-digit code for verbal verification).
sas_code: Option<u32>,
}
impl ChaChaSession {
@@ -46,9 +48,15 @@ impl ChaChaSession {
recv_seq: 0,
rekey_mgr: RekeyManager::new(shared_secret),
pending_rekey_secret: None,
sas_code: None,
}
}
/// Set the SAS code (called by key exchange after derivation).
pub fn set_sas(&mut self, code: u32) {
self.sas_code = Some(code);
}
/// Install a new key (after rekeying).
fn install_key(&mut self, new_key: [u8; 32]) {
use sha2::Digest;
@@ -136,6 +144,10 @@ impl CryptoSession for ChaChaSession {
Ok(())
}
fn sas_code(&self) -> Option<u32> {
self.sas_code
}
}
#[cfg(test)]

View File

@@ -1,6 +1,7 @@
//! RaptorQ FEC decoder — reassembles source blocks from received source and repair symbols.
use std::collections::HashMap;
use std::time::Instant;
use raptorq::{EncodingPacket, ObjectTransmissionInformation, PayloadId, SourceBlockDecoder};
use wzp_proto::error::FecError;
@@ -9,6 +10,9 @@ use wzp_proto::FecDecoder;
/// Length prefix size (u16 little-endian), must match encoder.
const LEN_PREFIX: usize = 2;
/// Decoded blocks older than this are eligible for reuse by a new sender.
const BLOCK_STALE_SECS: u64 = 2;
/// State for one in-flight block being decoded.
struct BlockState {
/// Number of source symbols expected.
@@ -21,6 +25,8 @@ struct BlockState {
decoded: bool,
/// Cached decoded result.
result: Option<Vec<Vec<u8>>>,
/// When this block was last decoded (for staleness check).
decoded_at: Option<Instant>,
}
/// RaptorQ-based FEC decoder that handles multiple concurrent blocks.
@@ -58,6 +64,7 @@ impl RaptorQFecDecoder {
symbol_size: self.symbol_size,
decoded: false,
result: None,
decoded_at: None,
})
}
}
@@ -74,8 +81,20 @@ impl FecDecoder for RaptorQFecDecoder {
let block = self.get_or_create_block(block_id);
if block.decoded {
// Already decoded, ignore additional symbols.
return Ok(());
// If the block was decoded recently, skip (normal duplicate).
// If it's stale (>2s), a new sender is reusing this block_id — reset it.
if let Some(at) = block.decoded_at {
if at.elapsed().as_secs() >= BLOCK_STALE_SECS {
block.decoded = false;
block.result = None;
block.decoded_at = None;
block.packets.clear();
} else {
return Ok(());
}
} else {
return Ok(());
}
}
// Data should already be at symbol_size (length-prefixed and padded by the encoder).
@@ -132,6 +151,7 @@ impl FecDecoder for RaptorQFecDecoder {
let block = self.blocks.get_mut(&block_id).unwrap();
block.decoded = true;
block.decoded_at = Some(Instant::now());
block.result = Some(frames.clone());
Ok(Some(frames))
}

View File

@@ -18,6 +18,12 @@ pub enum CodecId {
Codec2_1200 = 4,
/// Comfort noise descriptor (silence suppression)
ComfortNoise = 5,
/// Opus at 32kbps (studio low)
Opus32k = 6,
/// Opus at 48kbps (studio)
Opus48k = 7,
/// Opus at 64kbps (studio high)
Opus64k = 8,
}
impl CodecId {
@@ -27,6 +33,9 @@ impl CodecId {
Self::Opus24k => 24_000,
Self::Opus16k => 16_000,
Self::Opus6k => 6_000,
Self::Opus32k => 32_000,
Self::Opus48k => 48_000,
Self::Opus64k => 64_000,
Self::Codec2_3200 => 3_200,
Self::Codec2_1200 => 1_200,
Self::ComfortNoise => 0,
@@ -36,8 +45,7 @@ impl CodecId {
/// Preferred frame duration in milliseconds.
pub const fn frame_duration_ms(self) -> u8 {
match self {
Self::Opus24k => 20,
Self::Opus16k => 20,
Self::Opus24k | Self::Opus16k | Self::Opus32k | Self::Opus48k | Self::Opus64k => 20,
Self::Opus6k => 40,
Self::Codec2_3200 => 20,
Self::Codec2_1200 => 40,
@@ -48,7 +56,8 @@ impl CodecId {
/// Sample rate expected by this codec.
pub const fn sample_rate_hz(self) -> u32 {
match self {
Self::Opus24k | Self::Opus16k | Self::Opus6k => 48_000,
Self::Opus24k | Self::Opus16k | Self::Opus6k
| Self::Opus32k | Self::Opus48k | Self::Opus64k => 48_000,
Self::Codec2_3200 | Self::Codec2_1200 => 8_000,
Self::ComfortNoise => 48_000,
}
@@ -63,6 +72,9 @@ impl CodecId {
3 => Some(Self::Codec2_3200),
4 => Some(Self::Codec2_1200),
5 => Some(Self::ComfortNoise),
6 => Some(Self::Opus32k),
7 => Some(Self::Opus48k),
8 => Some(Self::Opus64k),
_ => None,
}
}
@@ -71,6 +83,12 @@ impl CodecId {
pub const fn to_wire(self) -> u8 {
self as u8
}
/// Returns true if this is an Opus variant.
pub const fn is_opus(self) -> bool {
matches!(self, Self::Opus6k | Self::Opus16k | Self::Opus24k
| Self::Opus32k | Self::Opus48k | Self::Opus64k)
}
}
/// Describes the complete quality configuration for a call session.
@@ -111,6 +129,30 @@ impl QualityProfile {
frames_per_block: 8,
};
/// Studio low: Opus 32kbps, minimal FEC.
pub const STUDIO_32K: Self = Self {
codec: CodecId::Opus32k,
fec_ratio: 0.1,
frame_duration_ms: 20,
frames_per_block: 5,
};
/// Studio: Opus 48kbps, minimal FEC.
pub const STUDIO_48K: Self = Self {
codec: CodecId::Opus48k,
fec_ratio: 0.1,
frame_duration_ms: 20,
frames_per_block: 5,
};
/// Studio high: Opus 64kbps, minimal FEC.
pub const STUDIO_64K: Self = Self {
codec: CodecId::Opus64k,
fec_ratio: 0.1,
frame_duration_ms: 20,
frames_per_block: 5,
};
/// Estimated total bandwidth in kbps including FEC overhead.
pub fn total_bitrate_kbps(&self) -> f32 {
let base = self.codec.bitrate_bps() as f32 / 1000.0;

View File

@@ -273,10 +273,21 @@ impl JitterBuffer {
return;
}
// Check if packet is too old (already played out)
// Check if packet is too old (already played out).
// A backward jump of >100 seq (~2s at 50fps) indicates a new sender in a
// federation room — reset instead of dropping.
if self.stats.packets_played > 0 && seq_before(seq, self.next_playout_seq) {
self.stats.packets_late += 1;
return;
let backward_distance = self.next_playout_seq.wrapping_sub(seq);
tracing::warn!(seq, next = self.next_playout_seq, backward_distance, "jitter: backward seq detected");
if backward_distance > 100 {
tracing::info!(seq, next = self.next_playout_seq, "jitter: RESET — new sender detected");
self.buffer.clear();
self.next_playout_seq = seq;
self.stats.packets_late = 0;
} else {
self.stats.packets_late += 1;
return;
}
}
// If we haven't started playout yet, adjust next_playout_seq to earliest known
@@ -412,10 +423,21 @@ impl JitterBuffer {
return;
}
// Check if packet is too old (already played out)
// Check if packet is too old (already played out).
// A backward jump of >100 seq (~2s at 50fps) indicates a new sender in a
// federation room — reset instead of dropping.
if self.stats.packets_played > 0 && seq_before(seq, self.next_playout_seq) {
self.stats.packets_late += 1;
return;
let backward_distance = self.next_playout_seq.wrapping_sub(seq);
tracing::warn!(seq, next = self.next_playout_seq, backward_distance, "jitter: backward seq detected");
if backward_distance > 100 {
tracing::info!(seq, next = self.next_playout_seq, "jitter: RESET — new sender detected");
self.buffer.clear();
self.next_playout_seq = seq;
self.stats.packets_late = 0;
} else {
self.stats.packets_late += 1;
return;
}
}
// If we haven't started playout yet, adjust next_playout_seq to earliest known

View File

@@ -25,8 +25,9 @@ pub mod traits;
pub use codec_id::{CodecId, QualityProfile};
pub use error::*;
pub use packet::{
HangupReason, MediaHeader, MediaPacket, MiniFrameContext, MiniHeader, QualityReport,
RoomParticipant, SignalMessage, TrunkEntry, TrunkFrame, FRAME_TYPE_FULL, FRAME_TYPE_MINI,
CallAcceptMode, HangupReason, MediaHeader, MediaPacket, MiniFrameContext, MiniHeader,
QualityReport, RoomParticipant, SignalMessage, TrunkEntry, TrunkFrame, FRAME_TYPE_FULL,
FRAME_TYPE_MINI,
};
pub use bandwidth::{BandwidthEstimator, CongestionState};
pub use quality::{AdaptiveQualityController, NetworkContext, Tier};

View File

@@ -584,6 +584,26 @@ pub enum SignalMessage {
recommended_profile: crate::QualityProfile,
},
/// Phase 4 telemetry: loss-recovery counts for the current session.
/// Sent periodically from receivers to the relay so Prometheus metrics
/// can distinguish DRED reconstructions from classical PLC invocations.
/// Fields default to 0 on old receivers (`#[serde(default)]`), so
/// introducing this variant is backward-compatible with pre-Phase-4
/// relays — they'll just log "unknown signal variant" on receipt.
LossRecoveryUpdate {
/// Total frames reconstructed via DRED since call start (monotonic).
#[serde(default)]
dred_reconstructions: u64,
/// Total frames filled via classical Opus/Codec2 PLC since call
/// start (monotonic).
#[serde(default)]
classical_plc_invocations: u64,
/// Total frames decoded since call start. Used by the relay to
/// compute recovery rates as a fraction of total frames.
#[serde(default)]
frames_decoded: u64,
},
/// Connection keepalive / RTT measurement.
Ping { timestamp_ms: u64 },
Pong { timestamp_ms: u64 },
@@ -656,6 +676,112 @@ pub enum SignalMessage {
/// List of participants currently in the room.
participants: Vec<RoomParticipant>,
},
// ── Federation signals (relay-to-relay) ──
/// Federation: initial handshake — the connecting relay identifies itself.
FederationHello {
/// TLS certificate fingerprint of the connecting relay.
tls_fingerprint: String,
},
/// Federation: this relay now has local participants in a global room.
GlobalRoomActive {
room: String,
/// Participants on the announcing relay (for federated presence).
#[serde(default)]
participants: Vec<RoomParticipant>,
},
/// Federation: this relay's last local participant left a global room.
GlobalRoomInactive {
room: String,
},
// ── Direct calling signals (client ↔ relay signaling) ──
/// Register on relay for direct calls. Sent on `_signal` connections
/// after optional AuthToken.
RegisterPresence {
/// Client's Ed25519 identity public key.
identity_pub: [u8; 32],
/// Signature over ("register-presence" || identity_pub).
signature: Vec<u8>,
/// Optional display name.
alias: Option<String>,
},
/// Relay confirms presence registration.
RegisterPresenceAck {
success: bool,
#[serde(skip_serializing_if = "Option::is_none")]
error: Option<String>,
},
/// Direct call offer routed through the relay to a specific peer.
DirectCallOffer {
/// Caller's fingerprint.
caller_fingerprint: String,
/// Caller's display name.
caller_alias: Option<String>,
/// Target's fingerprint.
target_fingerprint: String,
/// Unique call session ID (UUID).
call_id: String,
/// Caller's Ed25519 identity pub.
identity_pub: [u8; 32],
/// Caller's ephemeral X25519 pub (for key exchange on media connect).
ephemeral_pub: [u8; 32],
/// Signature over (ephemeral_pub || target_fingerprint || call_id).
signature: Vec<u8>,
/// Supported quality profiles.
supported_profiles: Vec<crate::QualityProfile>,
},
/// Callee's response to a direct call.
DirectCallAnswer {
call_id: String,
/// How the callee accepts (or rejects).
accept_mode: CallAcceptMode,
/// Callee's identity pub (present when accepting).
#[serde(skip_serializing_if = "Option::is_none")]
identity_pub: Option<[u8; 32]>,
/// Callee's ephemeral pub (present when accepting).
#[serde(skip_serializing_if = "Option::is_none")]
ephemeral_pub: Option<[u8; 32]>,
/// Signature (present when accepting).
#[serde(skip_serializing_if = "Option::is_none")]
signature: Option<Vec<u8>>,
/// Chosen quality profile (present when accepting).
#[serde(skip_serializing_if = "Option::is_none")]
chosen_profile: Option<crate::QualityProfile>,
},
/// Relay tells both parties: media room is ready.
CallSetup {
call_id: String,
/// Room name on the relay for the media session (e.g., "_call:a1b2c3d4").
room: String,
/// Relay address for the QUIC media connection.
relay_addr: String,
},
/// Ringing notification (relay → caller, callee received the offer).
CallRinging {
call_id: String,
},
}
/// How the callee responds to a direct call.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Serialize, Deserialize)]
pub enum CallAcceptMode {
/// Reject the call.
Reject,
/// Accept with trust — in Phase 2, this enables P2P (reveals IP).
/// In Phase 1, behaves the same as AcceptGeneric.
AcceptTrusted,
/// Accept with privacy — relay always mediates media.
AcceptGeneric,
}
/// A participant entry in a RoomUpdate message.
@@ -665,6 +791,10 @@ pub struct RoomParticipant {
pub fingerprint: String,
/// Optional display name set by the client.
pub alias: Option<String>,
/// Relay label — identifies which relay this participant is connected to.
/// None for local participants, Some("Relay B") for federated.
#[serde(default)]
pub relay_label: Option<String>,
}
/// Reasons for ending a call.

View File

@@ -132,6 +132,14 @@ pub trait CryptoSession: Send + Sync {
fn overhead(&self) -> usize {
16 // ChaCha20-Poly1305 tag
}
/// Short Authentication String (SAS) — 4-digit code for verbal verification.
/// Both peers derive the same code from the shared secret + identity keys.
/// If a MITM relay is intercepting, the codes will differ.
/// Returns None if SAS was not computed (e.g., relay-side sessions).
fn sas_code(&self) -> Option<u32> {
None
}
}
/// Key exchange using the Warzone identity model.

View File

@@ -28,6 +28,9 @@ prometheus = "0.13"
axum = { version = "0.7", default-features = false, features = ["tokio", "http1", "ws"] }
tower-http = { version = "0.6", features = ["fs"] }
futures-util = "0.3"
dirs = "6"
sha2 = { workspace = true }
chrono = "0.4"
[[bin]]
name = "wzp-relay"

18
crates/wzp-relay/build.rs Normal file
View File

@@ -0,0 +1,18 @@
use std::process::Command;
fn main() {
// Get git hash at build time
let output = Command::new("git")
.args(["rev-parse", "--short", "HEAD"])
.output();
let hash = match output {
Ok(o) if o.status.success() => {
String::from_utf8_lossy(&o.stdout).trim().to_string()
}
_ => "unknown".to_string(),
};
println!("cargo:rustc-env=WZP_BUILD_HASH={hash}");
println!("cargo:rerun-if-changed=.git/HEAD");
}

View File

@@ -0,0 +1,199 @@
//! Direct call state tracking.
//!
//! Manages the lifecycle of 1:1 direct calls placed via the `_signal` channel.
//! Each call goes through: Pending → Ringing → Active → Ended.
use std::collections::HashMap;
use std::time::{Duration, Instant};
/// State of a direct call.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum DirectCallState {
/// Offer sent to callee, waiting for response.
Pending,
/// Callee acknowledged, ringing.
Ringing,
/// Call accepted, media room active.
Active,
/// Call ended (hangup, reject, timeout, or error).
Ended,
}
/// A tracked direct call between two users.
pub struct DirectCall {
pub call_id: String,
pub caller_fingerprint: String,
pub callee_fingerprint: String,
pub state: DirectCallState,
pub accept_mode: Option<wzp_proto::CallAcceptMode>,
/// Private room name (set when accepted).
pub room_name: Option<String>,
pub created_at: Instant,
pub answered_at: Option<Instant>,
pub ended_at: Option<Instant>,
}
/// Registry of active direct calls.
pub struct CallRegistry {
calls: HashMap<String, DirectCall>,
}
impl CallRegistry {
pub fn new() -> Self {
Self {
calls: HashMap::new(),
}
}
/// Create a new pending call. Returns the call_id.
pub fn create_call(&mut self, call_id: String, caller_fp: String, callee_fp: String) -> &DirectCall {
let call = DirectCall {
call_id: call_id.clone(),
caller_fingerprint: caller_fp,
callee_fingerprint: callee_fp,
state: DirectCallState::Pending,
accept_mode: None,
room_name: None,
created_at: Instant::now(),
answered_at: None,
ended_at: None,
};
self.calls.insert(call_id.clone(), call);
self.calls.get(&call_id).unwrap()
}
/// Get a call by ID.
pub fn get(&self, call_id: &str) -> Option<&DirectCall> {
self.calls.get(call_id)
}
/// Get a mutable call by ID.
pub fn get_mut(&mut self, call_id: &str) -> Option<&mut DirectCall> {
self.calls.get_mut(call_id)
}
/// Transition to Ringing state.
pub fn set_ringing(&mut self, call_id: &str) -> bool {
if let Some(call) = self.calls.get_mut(call_id) {
if call.state == DirectCallState::Pending {
call.state = DirectCallState::Ringing;
return true;
}
}
false
}
/// Transition to Active state.
pub fn set_active(&mut self, call_id: &str, mode: wzp_proto::CallAcceptMode, room: String) -> bool {
if let Some(call) = self.calls.get_mut(call_id) {
if call.state == DirectCallState::Pending || call.state == DirectCallState::Ringing {
call.state = DirectCallState::Active;
call.accept_mode = Some(mode);
call.room_name = Some(room);
call.answered_at = Some(Instant::now());
return true;
}
}
false
}
/// End a call.
pub fn end_call(&mut self, call_id: &str) -> Option<DirectCall> {
if let Some(call) = self.calls.get_mut(call_id) {
call.state = DirectCallState::Ended;
call.ended_at = Some(Instant::now());
}
self.calls.remove(call_id)
}
/// Find active/pending calls involving a fingerprint.
pub fn calls_for_fingerprint(&self, fp: &str) -> Vec<&DirectCall> {
self.calls.values()
.filter(|c| {
c.state != DirectCallState::Ended
&& (c.caller_fingerprint == fp || c.callee_fingerprint == fp)
})
.collect()
}
/// Find the peer's fingerprint in a call.
pub fn peer_fingerprint(&self, call_id: &str, my_fp: &str) -> Option<&str> {
self.calls.get(call_id).map(|c| {
if c.caller_fingerprint == my_fp {
c.callee_fingerprint.as_str()
} else {
c.caller_fingerprint.as_str()
}
})
}
/// Remove calls that have been pending longer than the timeout.
/// Returns call IDs of expired calls.
pub fn expire_stale(&mut self, timeout: Duration) -> Vec<DirectCall> {
let now = Instant::now();
let expired: Vec<String> = self.calls.iter()
.filter(|(_, c)| {
c.state == DirectCallState::Pending
&& now.duration_since(c.created_at) > timeout
})
.map(|(id, _)| id.clone())
.collect();
expired.into_iter()
.filter_map(|id| self.calls.remove(&id))
.collect()
}
/// Number of active (non-ended) calls.
pub fn active_count(&self) -> usize {
self.calls.values()
.filter(|c| c.state != DirectCallState::Ended)
.count()
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn call_lifecycle() {
let mut reg = CallRegistry::new();
reg.create_call("c1".into(), "alice".into(), "bob".into());
assert_eq!(reg.get("c1").unwrap().state, DirectCallState::Pending);
assert!(reg.set_ringing("c1"));
assert_eq!(reg.get("c1").unwrap().state, DirectCallState::Ringing);
assert!(reg.set_active("c1", wzp_proto::CallAcceptMode::AcceptGeneric, "_call:c1".into()));
assert_eq!(reg.get("c1").unwrap().state, DirectCallState::Active);
assert_eq!(reg.get("c1").unwrap().room_name.as_deref(), Some("_call:c1"));
let ended = reg.end_call("c1").unwrap();
assert_eq!(ended.state, DirectCallState::Ended);
assert_eq!(reg.active_count(), 0);
}
#[test]
fn expire_stale_calls() {
let mut reg = CallRegistry::new();
reg.create_call("c1".into(), "alice".into(), "bob".into());
// Not expired yet
let expired = reg.expire_stale(Duration::from_secs(30));
assert!(expired.is_empty());
// Force expiry with 0 timeout
let expired = reg.expire_stale(Duration::from_secs(0));
assert_eq!(expired.len(), 1);
assert_eq!(expired[0].call_id, "c1");
}
#[test]
fn peer_lookup() {
let mut reg = CallRegistry::new();
reg.create_call("c1".into(), "alice".into(), "bob".into());
assert_eq!(reg.peer_fingerprint("c1", "alice"), Some("bob"));
assert_eq!(reg.peer_fingerprint("c1", "bob"), Some("alice"));
}
}

View File

@@ -3,8 +3,41 @@
use serde::{Deserialize, Serialize};
use std::net::SocketAddr;
/// Configuration for the relay daemon.
/// A federated peer relay.
#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct PeerConfig {
/// Address of the peer relay (e.g., "193.180.213.68:4433").
pub url: String,
/// Expected TLS certificate fingerprint (hex, with colons).
pub fingerprint: String,
/// Optional human-readable label.
#[serde(default)]
pub label: Option<String>,
}
/// A trusted relay — accepts inbound federation without needing the peer's address.
#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct TrustedConfig {
/// Expected TLS certificate fingerprint (hex, with colons).
pub fingerprint: String,
/// Optional human-readable label.
#[serde(default)]
pub label: Option<String>,
}
/// A room declared global — bridged across all federated peers.
#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct GlobalRoomConfig {
/// Room name to bridge (e.g., "android").
pub name: String,
}
/// Configuration for the relay daemon.
///
/// All fields have defaults, so a minimal TOML file only needs the
/// fields you want to override (e.g., just `[[peers]]`).
#[derive(Clone, Debug, Serialize, Deserialize)]
#[serde(default)]
pub struct RelayConfig {
/// Address to listen on for incoming connections (client-facing).
pub listen_addr: SocketAddr,
@@ -44,6 +77,22 @@ pub struct RelayConfig {
pub ws_port: Option<u16>,
/// Directory to serve static files from (HTML/JS/WASM for web clients).
pub static_dir: Option<String>,
/// Federation peer relays.
#[serde(default)]
pub peers: Vec<PeerConfig>,
/// Global rooms bridged across federation.
#[serde(default)]
pub global_rooms: Vec<GlobalRoomConfig>,
/// Trusted relay fingerprints — accept inbound federation from these relays.
/// Unlike [[peers]], no url is needed — the peer connects to us.
#[serde(default)]
pub trusted: Vec<TrustedConfig>,
/// Debug tap: log packet headers for matching rooms ("*" = all rooms).
/// Activated via --debug-tap <room> or debug_tap = "room" in TOML.
pub debug_tap: Option<String>,
/// JSONL event log path for protocol analysis (--event-log).
#[serde(skip)]
pub event_log: Option<String>,
}
impl Default for RelayConfig {
@@ -62,6 +111,100 @@ impl Default for RelayConfig {
trunking_enabled: false,
ws_port: None,
static_dir: None,
peers: Vec::new(),
global_rooms: Vec::new(),
trusted: Vec::new(),
debug_tap: None,
event_log: None,
}
}
}
/// Load relay configuration from a TOML file.
pub fn load_config(path: &str) -> Result<RelayConfig, anyhow::Error> {
let content = std::fs::read_to_string(path)?;
let config: RelayConfig = toml::from_str(&content)?;
Ok(config)
}
/// Info about this relay instance, used to generate personalized example configs.
pub struct RelayInfo {
pub listen_addr: String,
pub tls_fingerprint: String,
pub public_ip: Option<String>,
}
/// Load config from path, or create a personalized example config if it doesn't exist.
pub fn load_or_create_config(path: &str, info: Option<&RelayInfo>) -> Result<RelayConfig, anyhow::Error> {
let p = std::path::Path::new(path);
if p.exists() {
return load_config(path);
}
// Create parent directory if needed
if let Some(parent) = p.parent() {
std::fs::create_dir_all(parent)?;
}
// Generate personalized example config
let example = generate_example_config(info);
std::fs::write(p, &example)?;
eprintln!("Created example config at {path} — edit it and restart.");
let config: RelayConfig = toml::from_str(&example)?;
Ok(config)
}
/// Generate an example TOML config, personalized with this relay's info if available.
fn generate_example_config(info: Option<&RelayInfo>) -> String {
let listen = info.map(|i| i.listen_addr.as_str()).unwrap_or("0.0.0.0:4433");
let peer_example = if let Some(i) = info {
let ip = i.public_ip.as_deref().unwrap_or("this-relay-ip");
format!(
r#"# Other relays can peer with this relay using:
# [[peers]]
# url = "{ip}:{port}"
# fingerprint = "{fp}"
# label = "This Relay""#,
port = listen.rsplit(':').next().unwrap_or("4433"),
fp = i.tls_fingerprint,
)
} else {
"# To peer with another relay, add its url + fingerprint:".to_string()
};
format!(
r#"# WarzonePhone Relay Configuration
# See docs/ADMINISTRATION.md for full reference.
# Listen address for client connections
listen_addr = "{listen}"
# Maximum concurrent sessions
# max_sessions = 100
# Prometheus metrics endpoint (uncomment to enable)
# metrics_port = 9090
# featherChat auth endpoint (uncomment to enable)
# auth_url = "https://chat.example.com/v1/auth/validate"
{peer_example}
# Federation: peer relays we connect to (outbound)
# [[peers]]
# url = "other-relay.example.com:4433"
# fingerprint = "aa:bb:cc:dd:..."
# label = "Relay B"
# Federation: relays we trust inbound connections from
# [[trusted]]
# fingerprint = "ee:ff:00:11:..."
# label = "Relay X"
# Global rooms bridged across all federated peers
# [[global_rooms]]
# name = "general"
# Debug: log packet headers for a room ("*" for all)
# debug_tap = "*"
"#
)
}

View File

@@ -0,0 +1,201 @@
//! JSONL event log for protocol analysis.
//!
//! When `--event-log <path>` is set, every media packet emits a structured
//! event at each decision point (recv, forward, drop, deliver).
//! Use `wzp-analyzer` to correlate events across multiple relays.
use std::path::PathBuf;
use std::sync::Arc;
use serde::Serialize;
use tokio::sync::mpsc;
use tracing::{error, info};
/// A single protocol event for JSONL output.
#[derive(Debug, Serialize)]
pub struct Event {
/// ISO 8601 timestamp with microseconds.
pub ts: String,
/// Event type.
pub event: &'static str,
/// Room name.
#[serde(skip_serializing_if = "Option::is_none")]
pub room: Option<String>,
/// Source address or peer label.
#[serde(skip_serializing_if = "Option::is_none")]
pub src: Option<String>,
/// Packet sequence number.
#[serde(skip_serializing_if = "Option::is_none")]
pub seq: Option<u16>,
/// Codec identifier.
#[serde(skip_serializing_if = "Option::is_none")]
pub codec: Option<String>,
/// FEC block ID.
#[serde(skip_serializing_if = "Option::is_none")]
pub fec_block: Option<u8>,
/// FEC symbol index.
#[serde(skip_serializing_if = "Option::is_none")]
pub fec_sym: Option<u8>,
/// Is FEC repair packet.
#[serde(skip_serializing_if = "Option::is_none")]
pub repair: Option<bool>,
/// Payload length in bytes.
#[serde(skip_serializing_if = "Option::is_none")]
pub len: Option<usize>,
/// Number of recipients.
#[serde(skip_serializing_if = "Option::is_none")]
pub to_count: Option<usize>,
/// Peer label (for federation events).
#[serde(skip_serializing_if = "Option::is_none")]
pub peer: Option<String>,
/// Drop/error reason.
#[serde(skip_serializing_if = "Option::is_none")]
pub reason: Option<String>,
/// Presence action (active/inactive).
#[serde(skip_serializing_if = "Option::is_none")]
pub action: Option<String>,
/// Participant count (presence events).
#[serde(skip_serializing_if = "Option::is_none")]
pub participants: Option<usize>,
}
impl Event {
fn now() -> String {
chrono::Utc::now().format("%Y-%m-%dT%H:%M:%S%.6fZ").to_string()
}
/// Create a minimal event with just type and timestamp.
pub fn new(event: &'static str) -> Self {
Self {
ts: Self::now(),
event,
room: None,
src: None,
seq: None,
codec: None,
fec_block: None,
fec_sym: None,
repair: None,
len: None,
to_count: None,
peer: None,
reason: None,
action: None,
participants: None,
}
}
/// Set room.
pub fn room(mut self, room: &str) -> Self { self.room = Some(room.to_string()); self }
/// Set source.
pub fn src(mut self, src: &str) -> Self { self.src = Some(src.to_string()); self }
/// Set packet header fields from a MediaPacket.
pub fn packet(mut self, pkt: &wzp_proto::MediaPacket) -> Self {
self.seq = Some(pkt.header.seq);
self.codec = Some(format!("{:?}", pkt.header.codec_id));
self.fec_block = Some(pkt.header.fec_block);
self.fec_sym = Some(pkt.header.fec_symbol);
self.repair = Some(pkt.header.is_repair);
self.len = Some(pkt.payload.len());
self
}
/// Set seq only (when full packet not available).
pub fn seq(mut self, seq: u16) -> Self { self.seq = Some(seq); self }
/// Set payload length.
pub fn len(mut self, len: usize) -> Self { self.len = Some(len); self }
/// Set recipient count.
pub fn to_count(mut self, n: usize) -> Self { self.to_count = Some(n); self }
/// Set peer label.
pub fn peer(mut self, peer: &str) -> Self { self.peer = Some(peer.to_string()); self }
/// Set drop reason.
pub fn reason(mut self, reason: &str) -> Self { self.reason = Some(reason.to_string()); self }
/// Set presence action.
pub fn action(mut self, action: &str) -> Self { self.action = Some(action.to_string()); self }
/// Set participant count.
pub fn participants(mut self, n: usize) -> Self { self.participants = Some(n); self }
}
/// Handle for emitting events. Cheap to clone.
#[derive(Clone)]
pub struct EventLog {
tx: mpsc::UnboundedSender<Event>,
}
impl EventLog {
/// Emit an event (non-blocking, drops if channel is full).
pub fn emit(&self, event: Event) {
let _ = self.tx.send(event);
}
}
/// No-op event log for when `--event-log` is not set.
/// All methods are no-ops that compile to nothing.
#[derive(Clone)]
pub struct NoopEventLog;
/// Unified event log handle — either real or no-op.
#[derive(Clone)]
pub enum EventLogger {
Active(EventLog),
Noop,
}
impl EventLogger {
pub fn emit(&self, event: Event) {
if let EventLogger::Active(log) = self {
log.emit(event);
}
}
pub fn is_active(&self) -> bool {
matches!(self, EventLogger::Active(_))
}
}
/// Start the event log writer. Returns an `EventLogger` handle.
pub fn start_event_log(path: Option<PathBuf>) -> EventLogger {
match path {
Some(path) => {
let (tx, rx) = mpsc::unbounded_channel();
tokio::spawn(writer_task(path, rx));
info!("event log enabled");
EventLogger::Active(EventLog { tx })
}
None => EventLogger::Noop,
}
}
/// Background task that writes events to a JSONL file.
async fn writer_task(path: PathBuf, mut rx: mpsc::UnboundedReceiver<Event>) {
use tokio::io::AsyncWriteExt;
let file = match tokio::fs::File::create(&path).await {
Ok(f) => f,
Err(e) => {
error!("failed to create event log {}: {e}", path.display());
return;
}
};
let mut writer = tokio::io::BufWriter::new(file);
let mut count: u64 = 0;
while let Some(event) = rx.recv().await {
match serde_json::to_string(&event) {
Ok(json) => {
if writer.write_all(json.as_bytes()).await.is_err() { break; }
if writer.write_all(b"\n").await.is_err() { break; }
count += 1;
// Flush every 100 events
if count % 100 == 0 {
let _ = writer.flush().await;
}
}
Err(e) => {
error!("event log serialize error: {e}");
}
}
}
let _ = writer.flush().await;
info!(events = count, "event log closed");
}

View File

@@ -0,0 +1,966 @@
//! Relay federation — global room routing between peer relays.
//!
//! Each relay maintains a forwarding table per global room. When a local participant
//! sends media in a global room, it's forwarded to all peer relays that have the room
//! active. Incoming federated media is delivered to local participants and optionally
//! forwarded to other active peers (multi-hop).
use std::collections::{HashMap, HashSet};
use std::net::SocketAddr;
use std::sync::Arc;
use std::time::{Duration, Instant};
use bytes::Bytes;
use sha2::{Sha256, Digest};
use tokio::sync::Mutex;
use tracing::{error, info, warn};
use wzp_proto::{MediaTransport, SignalMessage};
use wzp_transport::QuinnTransport;
use crate::config::{PeerConfig, TrustedConfig};
use crate::event_log::{Event, EventLogger};
use crate::room::{self, FederationMediaOut, RoomEvent, RoomManager};
/// Compute 8-byte room hash for federation datagram tagging.
pub fn room_hash(room_name: &str) -> [u8; 8] {
let h = Sha256::digest(room_name.as_bytes());
let mut out = [0u8; 8];
out.copy_from_slice(&h[..8]);
out
}
/// Normalize a fingerprint string (remove colons, lowercase).
fn normalize_fp(fp: &str) -> String {
fp.replace(':', "").to_lowercase()
}
/// Time-based dedup filter for federation datagrams.
/// Tracks recently seen packets and expires entries older than 2 seconds.
/// This prevents duplicate delivery when the same packet arrives via
/// multiple federation paths, while allowing new senders that happen to
/// reuse the same seq numbers.
struct Deduplicator {
/// Recently seen packet keys with insertion time.
entries: HashMap<u64, Instant>,
/// Expiry duration.
ttl: Duration,
}
impl Deduplicator {
fn new(_capacity: usize) -> Self {
Self {
entries: HashMap::with_capacity(512),
ttl: Duration::from_secs(2),
}
}
/// Returns true if this packet is a duplicate (already seen within TTL).
fn is_dup(&mut self, room_hash: &[u8; 8], seq: u16, extra: u64) -> bool {
let key = u64::from_be_bytes(*room_hash) ^ (seq as u64) ^ extra;
let now = Instant::now();
// Periodic cleanup (every ~256 packets)
if self.entries.len() > 256 {
self.entries.retain(|_, ts| now.duration_since(*ts) < self.ttl);
}
if let Some(ts) = self.entries.get(&key) {
if now.duration_since(*ts) < self.ttl {
return true; // seen recently — duplicate
}
}
self.entries.insert(key, now);
false
}
}
/// Per-room token bucket rate limiter for federation forwarding.
struct RateLimiter {
/// Max packets per second per room.
max_pps: u32,
/// Tokens remaining in current window.
tokens: u32,
/// When the current window started.
window_start: Instant,
}
impl RateLimiter {
fn new(max_pps: u32) -> Self {
Self {
max_pps,
tokens: max_pps,
window_start: Instant::now(),
}
}
/// Returns true if the packet should be allowed through.
fn allow(&mut self) -> bool {
let elapsed = self.window_start.elapsed();
if elapsed >= Duration::from_secs(1) {
self.tokens = self.max_pps;
self.window_start = Instant::now();
}
if self.tokens > 0 {
self.tokens -= 1;
true
} else {
false
}
}
}
/// Active link to a peer relay.
struct PeerLink {
transport: Arc<QuinnTransport>,
label: String,
/// Global rooms that this peer has reported as active.
active_rooms: HashSet<String>,
/// Remote participants per room (for federated presence in RoomUpdate).
remote_participants: HashMap<String, Vec<wzp_proto::packet::RoomParticipant>>,
/// Last time we received any data (signal or media) from this peer.
last_seen: Instant,
}
/// Max federation packets per second per room (0 = unlimited).
const FEDERATION_RATE_LIMIT_PPS: u32 = 500;
/// Dedup window size (number of recent packets to remember).
const DEDUP_WINDOW_SIZE: usize = 4096;
/// Remote participants are considered stale after this duration with no updates.
const REMOTE_PARTICIPANT_STALE_SECS: u64 = 15;
/// Manages federation connections and global room forwarding.
pub struct FederationManager {
peers: Vec<PeerConfig>,
trusted: Vec<TrustedConfig>,
global_rooms: HashSet<String>,
room_mgr: Arc<Mutex<RoomManager>>,
endpoint: quinn::Endpoint,
local_tls_fp: String,
metrics: Arc<crate::metrics::RelayMetrics>,
/// Active peer connections, keyed by normalized fingerprint.
peer_links: Arc<Mutex<HashMap<String, PeerLink>>>,
/// Dedup filter for incoming federation datagrams.
dedup: Mutex<Deduplicator>,
/// Per-room seq counter for federation media delivered to local clients.
/// Ensures clients see monotonically increasing seq regardless of federation sender.
local_delivery_seq: std::sync::atomic::AtomicU16,
/// JSONL event log for protocol analysis.
event_log: EventLogger,
/// Per-room rate limiters for inbound federation media.
rate_limiters: Mutex<HashMap<String, RateLimiter>>,
}
impl FederationManager {
pub fn new(
peers: Vec<PeerConfig>,
trusted: Vec<TrustedConfig>,
global_rooms: HashSet<String>,
room_mgr: Arc<Mutex<RoomManager>>,
endpoint: quinn::Endpoint,
local_tls_fp: String,
metrics: Arc<crate::metrics::RelayMetrics>,
event_log: EventLogger,
) -> Self {
Self {
peers,
trusted,
global_rooms,
room_mgr,
endpoint,
local_tls_fp,
metrics,
peer_links: Arc::new(Mutex::new(HashMap::new())),
dedup: Mutex::new(Deduplicator::new(DEDUP_WINDOW_SIZE)),
local_delivery_seq: std::sync::atomic::AtomicU16::new(0),
event_log,
rate_limiters: Mutex::new(HashMap::new()),
}
}
/// Check if a room name (which may be hashed) is a global room.
pub fn is_global_room(&self, room: &str) -> bool {
self.resolve_global_room(room).is_some()
}
/// Resolve a room name (raw or hashed) to the canonical global room name.
/// Returns the configured global room name if it matches.
pub fn resolve_global_room(&self, room: &str) -> Option<&str> {
// Direct match (raw room name, e.g. Android clients)
if self.global_rooms.contains(room) {
return Some(self.global_rooms.iter().find(|n| n.as_str() == room).unwrap());
}
// Hashed match (desktop clients hash room names for SNI privacy)
self.global_rooms.iter().find(|name| {
wzp_crypto::hash_room_name(name) == room
}).map(|s| s.as_str())
}
/// Get the canonical federation room hash for a room.
/// Always uses the configured global room name, not the client-provided name.
pub fn global_room_hash(&self, room: &str) -> [u8; 8] {
if let Some(canonical) = self.resolve_global_room(room) {
room_hash(canonical)
} else {
room_hash(room)
}
}
/// Start federation — spawns connection loops + event dispatcher.
pub async fn run(self: Arc<Self>) {
if self.peers.is_empty() && self.global_rooms.is_empty() {
return;
}
info!(
peers = self.peers.len(),
global_rooms = self.global_rooms.len(),
"federation starting"
);
let mut handles = Vec::new();
// Per-peer outbound connection loops
for peer in &self.peers {
let this = self.clone();
let peer = peer.clone();
handles.push(tokio::spawn(async move {
run_peer_loop(this, peer).await;
}));
}
// Room event dispatcher
let room_events = {
let mgr = self.room_mgr.lock().await;
mgr.subscribe_events()
};
let this = self.clone();
handles.push(tokio::spawn(async move {
run_room_event_dispatcher(this, room_events).await;
}));
// Stale presence sweeper — purges remote participants from dead peers
let this = self.clone();
handles.push(tokio::spawn(async move {
run_stale_presence_sweeper(this).await;
}));
for h in handles {
let _ = h.await;
}
}
/// Handle an inbound federation connection from a recognized peer.
pub async fn handle_inbound(
self: &Arc<Self>,
transport: Arc<QuinnTransport>,
peer_config: PeerConfig,
) {
let peer_fp = normalize_fp(&peer_config.fingerprint);
let label = peer_config.label.unwrap_or_else(|| peer_config.url.clone());
info!(peer = %label, "inbound federation link active");
if let Err(e) = run_federation_link(self.clone(), transport, peer_fp, label.clone()).await {
warn!(peer = %label, "inbound federation link ended: {e}");
}
}
/// Get all remote participants for a room from all peer links.
/// Deduplicates by fingerprint (same participant may appear via multiple links).
pub async fn get_remote_participants(&self, room: &str) -> Vec<wzp_proto::packet::RoomParticipant> {
let canonical = self.resolve_global_room(room);
let links = self.peer_links.lock().await;
let mut result = Vec::new();
for link in links.values() {
// Check canonical name
if let Some(c) = canonical {
if let Some(remote) = link.remote_participants.get(c) {
result.extend(remote.iter().cloned());
}
// Also check raw room name, but only if different from canonical
if c != room {
if let Some(remote) = link.remote_participants.get(room) {
result.extend(remote.iter().cloned());
}
}
} else {
if let Some(remote) = link.remote_participants.get(room) {
result.extend(remote.iter().cloned());
}
}
}
// Deduplicate by fingerprint
let mut seen = HashSet::new();
result.retain(|p| seen.insert(p.fingerprint.clone()));
result
}
/// Forward locally-generated media to all connected peers.
/// For locally-originated media, we send to ALL peers (they decide whether to deliver).
/// For forwarded media (multi-hop), handle_datagram filters by active_rooms.
pub async fn forward_to_peers(&self, room_name: &str, room_hash: &[u8; 8], media_data: &Bytes) {
let links = self.peer_links.lock().await;
if links.is_empty() {
return;
}
for (_fp, link) in links.iter() {
let mut tagged = Vec::with_capacity(8 + media_data.len());
tagged.extend_from_slice(room_hash);
tagged.extend_from_slice(media_data);
match link.transport.send_raw_datagram(&tagged) {
Ok(()) => {
self.metrics.federation_packets_forwarded
.with_label_values(&[&link.label, "out"]).inc();
}
Err(e) => warn!(peer = %link.label, "federation send error: {e}"),
}
}
}
// ── Trust verification (kept from previous implementation) ──
pub fn find_peer_by_fingerprint(&self, fp: &str) -> Option<&PeerConfig> {
self.peers.iter().find(|p| normalize_fp(&p.fingerprint) == normalize_fp(fp))
}
pub fn find_peer_by_addr(&self, addr: SocketAddr) -> Option<&PeerConfig> {
let addr_ip = addr.ip();
self.peers.iter().find(|p| {
p.url.parse::<SocketAddr>()
.map(|sa| sa.ip() == addr_ip)
.unwrap_or(false)
})
}
pub fn find_trusted_by_fingerprint(&self, fp: &str) -> Option<&TrustedConfig> {
self.trusted.iter().find(|t| normalize_fp(&t.fingerprint) == normalize_fp(fp))
}
pub fn check_inbound_trust(&self, addr: SocketAddr, hello_fp: &str) -> Option<String> {
if let Some(peer) = self.find_peer_by_addr(addr) {
return Some(peer.label.clone().unwrap_or_else(|| peer.url.clone()));
}
if let Some(trusted) = self.find_trusted_by_fingerprint(hello_fp) {
return Some(trusted.label.clone().unwrap_or_else(|| hello_fp[..16].to_string()));
}
None
}
}
// ── Outbound media egress task ──
/// Drains the federation media channel and forwards to active peers.
pub async fn run_federation_media_egress(
fm: Arc<FederationManager>,
mut rx: tokio::sync::mpsc::Receiver<FederationMediaOut>,
) {
let mut count: u64 = 0;
while let Some(out) = rx.recv().await {
count += 1;
if count == 1 || count % 250 == 0 {
info!(room = %out.room_name, count, "federation egress: forwarding media");
}
fm.forward_to_peers(&out.room_name, &out.room_hash, &out.data).await;
}
info!(total = count, "federation egress task ended");
}
// ── Room event dispatcher ──
/// Watches RoomManager events and sends GlobalRoomActive/Inactive to peers.
async fn run_room_event_dispatcher(
fm: Arc<FederationManager>,
mut events: tokio::sync::broadcast::Receiver<RoomEvent>,
) {
loop {
match events.recv().await {
Ok(RoomEvent::LocalJoin { room }) => {
if fm.is_global_room(&room) {
let participants = {
let mgr = fm.room_mgr.lock().await;
mgr.local_participant_list(&room)
};
info!(room = %room, count = participants.len(), "global room now active, announcing to peers");
let msg = SignalMessage::GlobalRoomActive { room, participants };
let links = fm.peer_links.lock().await;
for link in links.values() {
let _ = link.transport.send_signal(&msg).await;
}
}
}
Ok(RoomEvent::LocalLeave { room }) => {
if fm.is_global_room(&room) {
info!(room = %room, "global room now inactive, announcing to peers");
let msg = SignalMessage::GlobalRoomInactive { room };
let links = fm.peer_links.lock().await;
for link in links.values() {
let _ = link.transport.send_signal(&msg).await;
}
}
}
Err(tokio::sync::broadcast::error::RecvError::Lagged(n)) => {
warn!(missed = n, "room event receiver lagged");
}
Err(tokio::sync::broadcast::error::RecvError::Closed) => break,
}
}
}
// ── Stale presence sweeper ──
/// Periodically checks for stale remote participants and purges them.
/// This handles the case where a peer link dies without sending GlobalRoomInactive
/// (e.g., QUIC timeout, network partition, crash).
async fn run_stale_presence_sweeper(fm: Arc<FederationManager>) {
let mut interval = tokio::time::interval(Duration::from_secs(5));
loop {
interval.tick().await;
let stale_threshold = Duration::from_secs(REMOTE_PARTICIPANT_STALE_SECS);
// Find peers with stale remote_participants whose link is also gone or idle
let stale_rooms: Vec<(String, String)> = {
let links = fm.peer_links.lock().await;
let mut stale = Vec::new();
for (fp, link) in links.iter() {
if link.last_seen.elapsed() > stale_threshold && !link.remote_participants.is_empty() {
for room in link.remote_participants.keys() {
stale.push((fp.clone(), room.clone()));
}
}
}
stale
};
if stale_rooms.is_empty() {
continue;
}
// Purge stale entries and collect affected rooms
let mut affected_rooms = HashSet::new();
{
let mut links = fm.peer_links.lock().await;
for (fp, room) in &stale_rooms {
if let Some(link) = links.get_mut(fp.as_str()) {
if link.last_seen.elapsed() > stale_threshold {
info!(peer = %link.label, room = %room, "purging stale remote participants (no data for {}s)", link.last_seen.elapsed().as_secs());
link.remote_participants.remove(room);
link.active_rooms.remove(room);
affected_rooms.insert(room.clone());
}
}
}
}
// Broadcast updated RoomUpdate for affected rooms
for room in &affected_rooms {
let mgr = fm.room_mgr.lock().await;
for local_room in mgr.active_rooms() {
if fm.resolve_global_room(&local_room) == fm.resolve_global_room(room) {
let mut all_participants = mgr.local_participant_list(&local_room);
let remote = fm.get_remote_participants(&local_room).await;
all_participants.extend(remote);
let mut seen = HashSet::new();
all_participants.retain(|p| seen.insert(p.fingerprint.clone()));
let update = SignalMessage::RoomUpdate {
count: all_participants.len() as u32,
participants: all_participants,
};
let senders = mgr.local_senders(&local_room);
drop(mgr);
room::broadcast_signal(&senders, &update).await;
info!(room = %room, "swept stale presence — broadcast updated RoomUpdate");
break;
}
}
}
}
}
// ── Peer connection management ──
/// Persistent connection loop for one peer — reconnects with backoff.
async fn run_peer_loop(fm: Arc<FederationManager>, peer: PeerConfig) {
let mut backoff = Duration::from_secs(5);
loop {
info!(peer_url = %peer.url, label = ?peer.label, "federation: connecting to peer...");
match connect_to_peer(&fm, &peer).await {
Ok(transport) => {
backoff = Duration::from_secs(5);
let peer_fp = normalize_fp(&peer.fingerprint);
let label = peer.label.clone().unwrap_or_else(|| peer.url.clone());
if let Err(e) = run_federation_link(fm.clone(), transport, peer_fp, label).await {
warn!(peer_url = %peer.url, "federation link ended: {e}");
}
}
Err(e) => {
warn!(peer_url = %peer.url, backoff_s = backoff.as_secs(), "federation connect failed: {e}");
}
}
tokio::time::sleep(backoff).await;
backoff = (backoff * 2).min(Duration::from_secs(300));
}
}
/// Connect to a peer relay and send hello.
async fn connect_to_peer(fm: &FederationManager, peer: &PeerConfig) -> Result<Arc<QuinnTransport>, anyhow::Error> {
let addr: SocketAddr = peer.url.parse()?;
let client_cfg = wzp_transport::client_config();
let conn = wzp_transport::connect(&fm.endpoint, addr, "_federation", client_cfg).await?;
let transport = Arc::new(QuinnTransport::new(conn));
// Send hello with our TLS fingerprint
let hello = SignalMessage::FederationHello {
tls_fingerprint: fm.local_tls_fp.clone(),
};
transport.send_signal(&hello).await
.map_err(|e| anyhow::anyhow!("federation hello send failed: {e}"))?;
info!(peer_url = %peer.url, label = ?peer.label, "federation: connected (hello sent)");
Ok(transport)
}
// ── Federation link (runs on a single QUIC connection) ──
/// Run the federation link: exchange global room state and forward media.
async fn run_federation_link(
fm: Arc<FederationManager>,
transport: Arc<QuinnTransport>,
peer_fp: String,
peer_label: String,
) -> Result<(), anyhow::Error> {
// Register peer link + metrics
fm.metrics.federation_peer_status.with_label_values(&[&peer_label]).set(1);
{
let mut links = fm.peer_links.lock().await;
links.insert(peer_fp.clone(), PeerLink {
transport: transport.clone(),
label: peer_label.clone(),
active_rooms: HashSet::new(),
remote_participants: HashMap::new(),
last_seen: Instant::now(),
});
}
// Announce our currently active global rooms to this new peer
// Collect all announcements first, then send (avoid holding locks across await)
let announcements = {
let mgr = fm.room_mgr.lock().await;
let active = mgr.active_rooms();
let mut msgs = Vec::new();
// Local rooms
for room_name in &active {
if fm.is_global_room(room_name) {
let participants = mgr.local_participant_list(room_name);
info!(peer = %peer_label, room = %room_name, participants = participants.len(), "announcing local global room to new peer");
msgs.push(SignalMessage::GlobalRoomActive { room: room_name.clone(), participants });
}
}
// Remote rooms from OTHER peers (for multi-hop propagation)
let links = fm.peer_links.lock().await;
for (fp, link) in links.iter() {
if fp != &peer_fp {
for (room, participants) in &link.remote_participants {
if fm.is_global_room(room) {
info!(peer = %peer_label, room = %room, via = %link.label, "propagating remote room to new peer");
msgs.push(SignalMessage::GlobalRoomActive {
room: room.clone(),
participants: participants.clone(),
});
}
}
}
}
msgs
};
for msg in &announcements {
let _ = transport.send_signal(msg).await;
}
// Three concurrent tasks: signal recv + media recv + RTT monitor
let signal_transport = transport.clone();
let media_transport = transport.clone();
let rtt_transport = transport.clone();
let fm_signal = fm.clone();
let fm_media = fm.clone();
let fm_rtt = fm.clone();
let peer_fp_signal = peer_fp.clone();
let peer_fp_media = peer_fp.clone();
let label_signal = peer_label.clone();
let label_rtt = peer_label.clone();
let signal_task = async move {
loop {
match signal_transport.recv_signal().await {
Ok(Some(msg)) => {
handle_signal(&fm_signal, &peer_fp_signal, &label_signal, msg).await;
}
Ok(None) => break,
Err(e) => {
error!(peer = %label_signal, "federation signal error: {e}");
break;
}
}
}
};
let peer_label_media = peer_label.clone();
let media_task = async move {
let mut media_count: u64 = 0;
loop {
match media_transport.connection().read_datagram().await {
Ok(data) => {
media_count += 1;
if media_count == 1 || media_count % 250 == 0 {
info!(peer = %peer_label_media, media_count, len = data.len(), "federation: received datagram");
}
handle_datagram(&fm_media, &peer_fp_media, data).await;
}
Err(e) => {
info!(peer = %peer_label_media, "federation media task ended: {e}");
break;
}
}
}
};
// RTT monitor: periodically sample QUIC RTT for this peer
let rtt_task = async move {
loop {
tokio::time::sleep(Duration::from_secs(5)).await;
let rtt_ms = rtt_transport.connection().stats().path.rtt.as_millis() as f64;
}
};
tokio::select! {
_ = signal_task => {}
_ = media_task => {}
_ = rtt_task => {}
}
// Cleanup: remove peer link + metrics
fm.metrics.federation_peer_status.with_label_values(&[&peer_label]).set(0);
{
let mut links = fm.peer_links.lock().await;
links.remove(&peer_fp);
}
info!(peer = %peer_label, "federation link ended");
Ok(())
}
/// Handle an incoming federation signal.
async fn handle_signal(
fm: &Arc<FederationManager>,
peer_fp: &str,
peer_label: &str,
msg: SignalMessage,
) {
// Update last_seen for this peer
{
let mut links = fm.peer_links.lock().await;
if let Some(link) = links.get_mut(peer_fp) {
link.last_seen = Instant::now();
}
}
match msg {
SignalMessage::GlobalRoomActive { room, participants } => {
if fm.is_global_room(&room) {
info!(peer = %peer_label, room = %room, remote_participants = participants.len(), "peer has global room active");
let mut links = fm.peer_links.lock().await;
if let Some(link) = links.get_mut(peer_fp) {
link.active_rooms.insert(room.clone());
}
// Update active rooms metric
let total: usize = links.values().map(|l| l.active_rooms.len()).sum();
fm.metrics.federation_active_rooms.set(total as i64);
if let Some(link) = links.get_mut(peer_fp) {
// Tag remote participants with their relay label
let tagged: Vec<_> = participants.iter().map(|p| {
let mut tagged = p.clone();
if tagged.relay_label.is_none() {
tagged.relay_label = Some(link.label.clone());
}
tagged
}).collect();
link.remote_participants.insert(room.clone(), tagged);
}
// Propagate to other peers (with relay labels preserved)
let tagged_for_propagation = if let Some(link) = links.get(peer_fp) {
let label = link.label.clone();
participants.iter().map(|p| {
let mut t = p.clone();
if t.relay_label.is_none() {
t.relay_label = Some(label.clone());
}
t
}).collect::<Vec<_>>()
} else {
participants.clone()
};
for (fp, link) in links.iter() {
if fp != peer_fp {
let _ = link.transport.send_signal(&SignalMessage::GlobalRoomActive {
room: room.clone(),
participants: tagged_for_propagation.clone(),
}).await;
}
}
drop(links);
// Broadcast updated RoomUpdate to local clients in this room
// Find the local room name (may be hashed or raw)
let mgr = fm.room_mgr.lock().await;
for local_room in mgr.active_rooms() {
if fm.is_global_room(&local_room) && fm.resolve_global_room(&local_room) == fm.resolve_global_room(&room) {
// Build merged participant list: local + all remote (deduped)
let mut all_participants = mgr.local_participant_list(&local_room);
let links = fm.peer_links.lock().await;
for link in links.values() {
if let Some(canonical) = fm.resolve_global_room(&local_room) {
if let Some(remote) = link.remote_participants.get(canonical) {
all_participants.extend(remote.iter().cloned());
}
// Also check raw room name, but only if different from canonical
if canonical != local_room {
if let Some(remote) = link.remote_participants.get(&local_room) {
all_participants.extend(remote.iter().cloned());
}
}
}
}
// Deduplicate by fingerprint
let mut seen = HashSet::new();
all_participants.retain(|p| seen.insert(p.fingerprint.clone()));
let update = SignalMessage::RoomUpdate {
count: all_participants.len() as u32,
participants: all_participants,
};
let senders = mgr.local_senders(&local_room);
drop(links);
drop(mgr);
room::broadcast_signal(&senders, &update).await;
break;
}
}
}
}
SignalMessage::GlobalRoomInactive { room } => {
info!(peer = %peer_label, room = %room, "peer global room now inactive");
let mut links = fm.peer_links.lock().await;
if let Some(link) = links.get_mut(peer_fp) {
link.active_rooms.remove(&room);
// Clear remote participants for this peer+room
link.remote_participants.remove(&room);
// Also try canonical name
if let Some(canonical) = fm.resolve_global_room(&room) {
link.remote_participants.remove(canonical);
}
}
// Update active rooms metric
let total: usize = links.values().map(|l| l.active_rooms.len()).sum();
fm.metrics.federation_active_rooms.set(total as i64);
// Build remaining remote participants (from all peers except the one going inactive)
let remaining_remote: Vec<wzp_proto::packet::RoomParticipant> = {
let canonical = fm.resolve_global_room(&room);
let mut result = Vec::new();
for (fp, link) in links.iter() {
if fp == peer_fp { continue; }
if let Some(c) = canonical {
if let Some(remote) = link.remote_participants.get(c) {
result.extend(remote.iter().cloned());
}
}
}
let mut seen = HashSet::new();
result.retain(|p| seen.insert(p.fingerprint.clone()));
result
};
// Propagate to other peers: send updated GlobalRoomActive with revised list,
// or GlobalRoomInactive if no participants remain anywhere
let local_active = {
let mgr = fm.room_mgr.lock().await;
mgr.active_rooms().iter().any(|r| fm.resolve_global_room(r) == fm.resolve_global_room(&room))
};
let has_remaining = !remaining_remote.is_empty() || local_active;
// Collect peer transports to send to (avoid holding lock across await)
let peer_sends: Vec<_> = links.iter()
.filter(|(fp, _)| *fp != peer_fp)
.map(|(_, link)| link.transport.clone())
.collect();
drop(links);
if has_remaining {
// Send updated participant list to other peers
let mut updated_participants = remaining_remote.clone();
if local_active {
let mgr = fm.room_mgr.lock().await;
for local_room in mgr.active_rooms() {
if fm.resolve_global_room(&local_room) == fm.resolve_global_room(&room) {
updated_participants.extend(mgr.local_participant_list(&local_room));
break;
}
}
}
let msg = SignalMessage::GlobalRoomActive {
room: room.clone(),
participants: updated_participants,
};
for transport in &peer_sends {
let _ = transport.send_signal(&msg).await;
}
} else {
// No participants left anywhere — propagate inactive
let msg = SignalMessage::GlobalRoomInactive { room: room.clone() };
for transport in &peer_sends {
let _ = transport.send_signal(&msg).await;
}
}
// Broadcast updated RoomUpdate to local clients (remote participant removed)
let mgr = fm.room_mgr.lock().await;
for local_room in mgr.active_rooms() {
if fm.is_global_room(&local_room) && fm.resolve_global_room(&local_room) == fm.resolve_global_room(&room) {
let mut all_participants = mgr.local_participant_list(&local_room);
all_participants.extend(remaining_remote.iter().cloned());
// Deduplicate by fingerprint
let mut seen = HashSet::new();
all_participants.retain(|p| seen.insert(p.fingerprint.clone()));
let update = SignalMessage::RoomUpdate {
count: all_participants.len() as u32,
participants: all_participants,
};
let senders = mgr.local_senders(&local_room);
drop(mgr);
room::broadcast_signal(&senders, &update).await;
info!(room = %room, "broadcast updated presence (remote participant removed)");
break;
}
}
}
_ => {} // ignore other signals
}
}
/// Handle an incoming federation datagram (room-hash-tagged media).
async fn handle_datagram(
fm: &Arc<FederationManager>,
source_peer_fp: &str,
data: Bytes,
) {
if data.len() < 12 { return; } // 8-byte hash + min packet
let mut rh = [0u8; 8];
rh.copy_from_slice(&data[..8]);
let media_bytes = data.slice(8..);
let pkt = match wzp_proto::MediaPacket::from_bytes(media_bytes.clone()) {
Some(pkt) => pkt,
None => {
fm.event_log.emit(Event::new("federation_ingress_malformed").len(data.len()));
return;
}
};
// Event log: federation ingress
let peer_label = {
let links = fm.peer_links.lock().await;
links.get(source_peer_fp).map(|l| l.label.clone()).unwrap_or_default()
};
fm.event_log.emit(Event::new("federation_ingress").packet(&pkt).peer(&peer_label));
// Count inbound federation packet + update last_seen
fm.metrics.federation_packets_forwarded
.with_label_values(&[source_peer_fp, "in"]).inc();
{
let mut links = fm.peer_links.lock().await;
if let Some(link) = links.get_mut(source_peer_fp) {
link.last_seen = Instant::now();
}
}
// Dedup: drop packets we've already seen (multi-path duplicates).
// Key uses a hash of the actual payload bytes — unique per Opus frame,
// so different senders with the same seq/timestamp never collide.
let payload_hash = {
let mut h = 0u64;
for (i, &b) in media_bytes.iter().take(16).enumerate() {
h ^= (b as u64) << ((i % 8) * 8);
}
h
};
{
let mut dedup = fm.dedup.lock().await;
if dedup.is_dup(&rh, pkt.header.seq, payload_hash) {
fm.event_log.emit(Event::new("dedup_drop").seq(pkt.header.seq).peer(&peer_label));
return;
}
}
// Find room by hash — check local rooms AND global room config
let room_name = {
let mgr = fm.room_mgr.lock().await;
let active = mgr.active_rooms();
// First: check local rooms (has participants)
active.iter().find(|r| room_hash(r) == rh).cloned()
.or_else(|| active.iter().find(|r| fm.global_room_hash(r) == rh).cloned())
// Second: check global room config (hub relay may have no local participants)
.or_else(|| {
fm.global_rooms.iter().find(|name| room_hash(name) == rh).cloned()
})
};
let room_name = match room_name {
Some(r) => r,
None => {
fm.event_log.emit(Event::new("room_not_found").seq(pkt.header.seq).peer(&peer_label));
return;
}
};
// Rate limit per room
if FEDERATION_RATE_LIMIT_PPS > 0 {
let mut limiters = fm.rate_limiters.lock().await;
let limiter = limiters.entry(room_name.clone())
.or_insert_with(|| RateLimiter::new(FEDERATION_RATE_LIMIT_PPS));
if !limiter.allow() {
fm.event_log.emit(Event::new("rate_limit_drop").room(&room_name).seq(pkt.header.seq));
return;
}
}
// Deliver to all local participants — forward the raw bytes as-is.
// The original sender's MediaPacket is preserved exactly (no re-serialization).
let locals = {
let mgr = fm.room_mgr.lock().await;
mgr.local_senders(&room_name)
};
for sender in &locals {
match sender {
room::ParticipantSender::Quic(t) => {
if let Err(e) = t.send_raw_datagram(&media_bytes) {
fm.event_log.emit(Event::new("local_deliver_error").room(&room_name).seq(pkt.header.seq).reason(&e.to_string()));
warn!("federation local delivery error: {e}");
}
}
room::ParticipantSender::WebSocket(_) => { let _ = sender.send_raw(&pkt.payload).await; }
}
}
fm.event_log.emit(Event::new("local_deliver").room(&room_name).seq(pkt.header.seq).to_count(locals.len()));
// Multi-hop: forward to ALL other connected peers (not the source)
// Don't filter by active_rooms — the receiving peer decides whether to deliver
let links = fm.peer_links.lock().await;
for (fp, link) in links.iter() {
if fp != source_peer_fp {
let mut tagged = Vec::with_capacity(8 + media_bytes.len());
tagged.extend_from_slice(&rh);
tagged.extend_from_slice(&media_bytes);
let _ = link.transport.send_raw_datagram(&tagged);
}
}
}

View File

@@ -78,31 +78,26 @@ pub async fn accept_handshake(
};
transport.send_signal(&answer).await?;
// Derive caller fingerprint from their identity public key (first 8 bytes as hex)
let caller_fp = caller_identity_pub[..8]
.iter()
.map(|b| format!("{b:02x}"))
.collect::<String>();
// Derive caller fingerprint: SHA-256(Ed25519 pub)[:16], formatted as xxxx:xxxx:...
// Must match the format used in signal registration and presence.
let caller_fp = {
use sha2::{Sha256, Digest};
let hash = Sha256::digest(&caller_identity_pub);
let fp = wzp_crypto::Fingerprint([
hash[0], hash[1], hash[2], hash[3], hash[4], hash[5], hash[6], hash[7],
hash[8], hash[9], hash[10], hash[11], hash[12], hash[13], hash[14], hash[15],
]);
fp.to_string()
};
Ok((session, chosen_profile, caller_fp, caller_alias))
}
/// Select the best quality profile from those the caller supports.
fn choose_profile(supported: &[QualityProfile]) -> QualityProfile {
// Prefer higher-quality profiles. Use GOOD as default if supported list is empty.
if supported.is_empty() {
return QualityProfile::GOOD;
}
// Pick the profile with the highest bitrate.
supported
.iter()
.max_by(|a, b| {
a.total_bitrate_kbps()
.partial_cmp(&b.total_bitrate_kbps())
.unwrap_or(std::cmp::Ordering::Equal)
})
.copied()
.unwrap_or(QualityProfile::GOOD)
// Cap at GOOD (24k) for now — studio tiers (32k/48k/64k) not yet tested
// for federation reliability (large packets may exceed path MTU).
QualityProfile::GOOD
}
#[cfg(test)]

View File

@@ -8,7 +8,11 @@
//! quality transitions.
pub mod auth;
pub mod call_registry;
pub mod config;
pub mod event_log;
pub mod federation;
pub mod signal_hub;
pub mod handshake;
pub mod metrics;
pub mod pipeline;

View File

@@ -13,9 +13,9 @@ use std::sync::Arc;
use std::time::Duration;
use tokio::sync::Mutex;
use tracing::{error, info};
use tracing::{error, info, warn};
use wzp_proto::MediaTransport;
use wzp_proto::{MediaTransport, SignalMessage};
use wzp_relay::config::RelayConfig;
use wzp_relay::metrics::RelayMetrics;
use wzp_relay::pipeline::{PipelineConfig, RelayPipeline};
@@ -23,12 +23,54 @@ use wzp_relay::presence::PresenceRegistry;
use wzp_relay::room::{self, RoomManager};
use wzp_relay::session_mgr::SessionManager;
fn parse_args() -> RelayConfig {
let mut config = RelayConfig::default();
/// Parsed CLI result — config + identity path.
struct CliResult {
config: RelayConfig,
identity_path: Option<String>,
config_file: Option<String>,
config_needs_create: bool,
}
fn parse_args() -> CliResult {
let args: Vec<String> = std::env::args().collect();
// First pass: extract --config and --identity
let mut config_file = None;
let mut identity_path = None;
let mut i = 1;
while i < args.len() {
match args[i].as_str() {
"--config" | "-c" => { i += 1; config_file = args.get(i).cloned(); }
"--identity" | "-i" => { i += 1; identity_path = args.get(i).cloned(); }
_ => {}
}
i += 1;
}
// Track if we need to create the config after identity is known
let config_needs_create = config_file.as_ref().map(|p| !std::path::Path::new(p).exists()).unwrap_or(false);
let mut config = if let Some(ref path) = config_file {
if config_needs_create {
// Will be re-created with personalized info after identity is loaded
RelayConfig::default()
} else {
wzp_relay::config::load_config(path)
.unwrap_or_else(|e| {
eprintln!("failed to load config from {path}: {e}");
std::process::exit(1);
})
}
} else {
RelayConfig::default()
};
// CLI flags override config file values
let mut i = 1;
while i < args.len() {
match args[i].as_str() {
"--config" | "-c" => { i += 1; } // already handled
"--identity" | "-i" => { i += 1; } // already handled
"--listen" => {
i += 1;
config.listen_addr = args.get(i).expect("--listen requires an address")
@@ -81,6 +123,28 @@ fn parse_args() -> RelayConfig {
args.get(i).expect("--static-dir requires a directory path").to_string(),
);
}
"--global-room" => {
i += 1;
config.global_rooms.push(wzp_relay::config::GlobalRoomConfig {
name: args.get(i).expect("--global-room requires a room name").to_string(),
});
}
"--debug-tap" => {
i += 1;
config.debug_tap = Some(
args.get(i).expect("--debug-tap requires a room name (or '*' for all)").to_string(),
);
}
"--event-log" => {
i += 1;
config.event_log = Some(
args.get(i).expect("--event-log requires a file path").to_string(),
);
}
"--version" | "-V" => {
println!("wzp-relay {}", env!("WZP_BUILD_HASH"));
std::process::exit(0);
}
"--mesh-status" => {
// Print mesh table from a fresh registry and exit.
// In practice this is useful after the relay has been running;
@@ -90,9 +154,11 @@ fn parse_args() -> RelayConfig {
std::process::exit(0);
}
"--help" | "-h" => {
eprintln!("Usage: wzp-relay [--listen <addr>] [--remote <addr>] [--auth-url <url>] [--metrics-port <port>] [--probe <addr>]... [--probe-mesh] [--mesh-status]");
eprintln!("Usage: wzp-relay [--config <path>] [--listen <addr>] [--remote <addr>] [--auth-url <url>] [--metrics-port <port>] [--probe <addr>]... [--probe-mesh] [--mesh-status]");
eprintln!();
eprintln!("Options:");
eprintln!(" -c, --config <path> Load config from TOML file (creates example if missing)");
eprintln!(" -i, --identity <path> Identity file path (creates if missing, uses OsRng)");
eprintln!(" --listen <addr> Listen address (default: 0.0.0.0:4433)");
eprintln!(" --remote <addr> Remote relay for forwarding (disables room mode)");
eprintln!(" --auth-url <url> featherChat auth endpoint (e.g., https://chat.example.com/v1/auth/validate)");
@@ -102,6 +168,8 @@ fn parse_args() -> RelayConfig {
eprintln!(" --probe-mesh Enable mesh mode (mark config flag, probes all --probe targets).");
eprintln!(" --mesh-status Print mesh health table and exit (diagnostic).");
eprintln!(" --trunking Enable trunk batching for outgoing media in room mode.");
eprintln!(" --global-room <name> Declare a room as global (bridged across federation). Repeatable.");
eprintln!(" --debug-tap <room> Log packet headers for a room ('*' for all rooms).");
eprintln!(" --ws-port <port> WebSocket listener port for browser clients (e.g., 8080).");
eprintln!(" --static-dir <dir> Directory to serve static files from (HTML/JS/WASM).");
eprintln!();
@@ -116,7 +184,7 @@ fn parse_args() -> RelayConfig {
}
i += 1;
}
config
CliResult { config, identity_path, config_file, config_needs_create }
}
struct RelayStats {
@@ -184,10 +252,29 @@ async fn run_downstream(
}
}
/// Detect a non-loopback IP address from local interfaces.
/// Prefers public IPs over private (10.x, 172.16-31.x, 192.168.x).
fn detect_public_ip() -> Option<String> {
use std::net::UdpSocket;
// Connect to a public address to find our outbound IP (doesn't actually send anything)
if let Ok(socket) = UdpSocket::bind("0.0.0.0:0") {
if socket.connect("8.8.8.8:80").is_ok() {
if let Ok(addr) = socket.local_addr() {
return Some(addr.ip().to_string());
}
}
}
None
}
/// Build-time git hash, set by build.rs or env.
const BUILD_GIT_HASH: &str = env!("WZP_BUILD_HASH");
#[tokio::main]
async fn main() -> anyhow::Result<()> {
let config = parse_args();
let CliResult { mut config, identity_path, config_file, config_needs_create } = parse_args();
tracing_subscriber::fmt().init();
info!(version = BUILD_GIT_HASH, "wzp-relay build");
rustls::crypto::ring::default_provider()
.install_default()
.expect("failed to install rustls crypto provider");
@@ -207,12 +294,88 @@ async fn main() -> anyhow::Result<()> {
tokio::spawn(wzp_relay::metrics::serve_metrics(port, m, p, rr));
}
// Generate ephemeral relay identity for crypto handshake
let relay_seed = wzp_crypto::Seed::generate();
// Load or generate relay identity
let relay_seed = {
let id_path = match identity_path {
Some(ref p) => std::path::PathBuf::from(p),
None => dirs::home_dir()
.unwrap_or_else(|| std::path::PathBuf::from("."))
.join(".wzp")
.join("relay-identity"),
};
if id_path.exists() {
if let Ok(hex) = std::fs::read_to_string(&id_path) {
if let Ok(s) = wzp_crypto::Seed::from_hex(hex.trim()) {
info!("loaded relay identity from {}", id_path.display());
s
} else {
warn!("corrupt identity file {}, generating new", id_path.display());
let s = wzp_crypto::Seed::generate();
let hex: String = s.0.iter().map(|b| format!("{b:02x}")).collect();
let _ = std::fs::write(&id_path, &hex);
s
}
} else {
let s = wzp_crypto::Seed::generate();
let hex: String = s.0.iter().map(|b| format!("{b:02x}")).collect();
let _ = std::fs::write(&id_path, &hex);
s
}
} else {
let s = wzp_crypto::Seed::generate();
if let Some(parent) = id_path.parent() {
let _ = std::fs::create_dir_all(parent);
}
let hex: String = s.0.iter().map(|b| format!("{b:02x}")).collect();
let _ = std::fs::write(&id_path, &hex);
info!("generated relay identity at {}", id_path.display());
s
}
};
let relay_fp = relay_seed.derive_identity().public_identity().fingerprint;
info!(addr = %config.listen_addr, fingerprint = %relay_fp, "WarzonePhone relay starting");
let (server_config, _cert) = wzp_transport::server_config();
let (server_config, cert_der) = wzp_transport::server_config_from_seed(&relay_seed.0);
let tls_fp = wzp_transport::tls_fingerprint(&cert_der);
info!(tls_fingerprint = %tls_fp, "TLS certificate (deterministic from relay identity)");
// Create personalized config file if it was missing
let public_ip = detect_public_ip();
if config_needs_create {
if let Some(ref path) = config_file {
let info = wzp_relay::config::RelayInfo {
listen_addr: config.listen_addr.to_string(),
tls_fingerprint: tls_fp.clone(),
public_ip: public_ip.clone(),
};
if let Err(e) = wzp_relay::config::load_or_create_config(path, Some(&info)) {
warn!("failed to create config: {e}");
}
}
}
// Print federation hint with our public IP + listen port + TLS fingerprint
let listen_port = config.listen_addr.port();
if let Some(ip) = &public_ip {
info!("federation: to peer with this relay, add to relay.toml:");
info!(" [[peers]]");
info!(" url = \"{ip}:{listen_port}\"");
info!(" fingerprint = \"{tls_fp}\"");
}
// Log configured peers and trusted relays
if !config.peers.is_empty() {
info!(count = config.peers.len(), "federation peers configured");
for p in &config.peers {
info!(url = %p.url, label = ?p.label, " peer");
}
}
if !config.trusted.is_empty() {
info!(count = config.trusted.len(), "trusted relays configured");
for t in &config.trusted {
info!(fingerprint = %t.fingerprint, label = ?t.label, " trusted");
}
}
let endpoint = wzp_transport::create_endpoint(config.listen_addr, Some(server_config))?;
// Forward mode
@@ -230,9 +393,41 @@ async fn main() -> anyhow::Result<()> {
// Room manager (room mode only)
let room_mgr = Arc::new(Mutex::new(RoomManager::new()));
// Event log for protocol analysis
let event_log = wzp_relay::event_log::start_event_log(
config.event_log.as_ref().map(std::path::PathBuf::from)
);
// Federation manager
let global_room_set: std::collections::HashSet<String> = config.global_rooms.iter()
.map(|g| g.name.clone())
.collect();
let federation_mgr = if !config.peers.is_empty() || !config.trusted.is_empty() || !global_room_set.is_empty() {
let fm = Arc::new(wzp_relay::federation::FederationManager::new(
config.peers.clone(),
config.trusted.clone(),
global_room_set.clone(),
room_mgr.clone(),
endpoint.clone(),
tls_fp.clone(),
metrics.clone(),
event_log.clone(),
));
let fm_run = fm.clone();
tokio::spawn(async move { fm_run.run().await });
Some(fm)
} else {
None
};
// Session manager — enforces max concurrent sessions
let session_mgr = Arc::new(Mutex::new(SessionManager::new(config.max_sessions)));
// Signal hub + call registry for direct 1:1 calls
let signal_hub = Arc::new(Mutex::new(wzp_relay::signal_hub::SignalHub::new()));
let call_registry = Arc::new(Mutex::new(wzp_relay::call_registry::CallRegistry::new()));
// Spawn inter-relay health probes via ProbeMesh coordinator
if !config.probe_targets.is_empty() {
let mesh = wzp_relay::probe::ProbeMesh::new(
@@ -267,6 +462,15 @@ async fn main() -> anyhow::Result<()> {
} else {
info!("auth disabled — any client can connect (use --auth-url to enable)");
}
if !config.global_rooms.is_empty() {
info!(count = config.global_rooms.len(), "global rooms configured");
for g in &config.global_rooms {
info!(name = %g.name, " global room");
}
}
if let Some(ref tap) = config.debug_tap {
info!(filter = %tap, "debug tap enabled — logging packet headers");
}
info!("Listening for connections...");
@@ -283,8 +487,13 @@ async fn main() -> anyhow::Result<()> {
let relay_seed_bytes = relay_seed.0;
let metrics = metrics.clone();
let trunking_enabled = config.trunking_enabled;
let debug_tap = config.debug_tap.as_ref().map(|filter| room::DebugTap { room_filter: filter.clone() });
let presence = presence.clone();
let route_resolver = route_resolver.clone();
let federation_mgr = federation_mgr.clone();
let signal_hub = signal_hub.clone();
let call_registry = call_registry.clone();
let listen_addr_str = config.listen_addr.to_string();
tokio::spawn(async move {
let addr = connection.remote_address();
@@ -299,6 +508,23 @@ async fn main() -> anyhow::Result<()> {
let transport = Arc::new(wzp_transport::QuinnTransport::new(connection));
// Ping connections: client just measures QUIC connect RTT.
if room_name == "ping" {
info!(%addr, "ping connection (RTT probe)");
return;
}
// Version query: respond with build hash over a uni stream.
if room_name == "version" {
if let Ok(mut send) = transport.connection().open_uni().await {
let _ = send.write_all(BUILD_GIT_HASH.as_bytes()).await;
let _ = send.finish();
// Wait for client to read before closing
tokio::time::sleep(std::time::Duration::from_millis(100)).await;
}
return;
}
// Probe connections use SNI "_probe" to identify themselves.
// They skip auth + handshake and just do Ping->Pong + presence gossip.
if room_name == "_probe" {
@@ -385,6 +611,294 @@ async fn main() -> anyhow::Result<()> {
return;
}
// Federation connections use SNI "_federation"
if room_name == "_federation" {
if let Some(ref fm) = federation_mgr {
// Wait for FederationHello to identify the connecting relay
let hello_fp = match tokio::time::timeout(
std::time::Duration::from_secs(5),
transport.recv_signal(),
).await {
Ok(Ok(Some(wzp_proto::SignalMessage::FederationHello { tls_fingerprint }))) => tls_fingerprint,
_ => {
warn!(%addr, "federation: no hello received, closing");
return;
}
};
if let Some(label) = fm.check_inbound_trust(addr, &hello_fp) {
let peer_config = wzp_relay::config::PeerConfig {
url: addr.to_string(),
fingerprint: hello_fp,
label: Some(label.clone()),
};
let fm = fm.clone();
info!(%addr, label = %label, "inbound federation accepted (trusted)");
fm.handle_inbound(transport, peer_config).await;
} else {
warn!(%addr, fp = %hello_fp, "unknown relay wants to federate");
info!(" to accept, add to relay.toml:");
info!(" [[trusted]]");
info!(" fingerprint = \"{hello_fp}\"");
info!(" label = \"Relay at {addr}\"");
}
} else {
info!(%addr, "federation connection rejected (no federation configured)");
}
return;
}
// Direct calling: persistent signaling connection
if room_name == "_signal" {
info!(%addr, "signal connection");
// Optional auth
let auth_fp: Option<String> = if let Some(ref url) = auth_url {
match transport.recv_signal().await {
Ok(Some(SignalMessage::AuthToken { token })) => {
match wzp_relay::auth::validate_token(url, &token).await {
Ok(client) => Some(client.fingerprint),
Err(e) => {
error!(%addr, "signal auth failed: {e}");
return;
}
}
}
_ => { warn!(%addr, "signal: expected AuthToken"); return; }
}
} else {
None
};
// Wait for RegisterPresence
let (client_fp, client_alias) = match tokio::time::timeout(
std::time::Duration::from_secs(10),
transport.recv_signal(),
).await {
Ok(Ok(Some(SignalMessage::RegisterPresence { identity_pub, signature: _, alias }))) => {
// Compute fingerprint: SHA-256(Ed25519 pub key)[:16], same as Fingerprint type
let fp = {
use sha2::{Sha256, Digest};
let hash = Sha256::digest(&identity_pub);
let fingerprint = wzp_crypto::Fingerprint([
hash[0], hash[1], hash[2], hash[3], hash[4], hash[5], hash[6], hash[7],
hash[8], hash[9], hash[10], hash[11], hash[12], hash[13], hash[14], hash[15],
]);
fingerprint.to_string()
};
let fp = auth_fp.unwrap_or(fp);
(fp, alias)
}
_ => {
warn!(%addr, "signal: no RegisterPresence received");
return;
}
};
// Register in signal hub + presence
{
let mut hub = signal_hub.lock().await;
hub.register(client_fp.clone(), transport.clone(), client_alias.clone());
}
{
let mut reg = presence.lock().await;
reg.register_local(&client_fp, client_alias.clone(), None);
}
// Send ack
let _ = transport.send_signal(&SignalMessage::RegisterPresenceAck {
success: true,
error: None,
}).await;
info!(%addr, fingerprint = %client_fp, alias = ?client_alias, "signal client registered");
// Signal recv loop
loop {
match transport.recv_signal().await {
Ok(Some(msg)) => {
match msg {
SignalMessage::DirectCallOffer { ref target_fingerprint, ref call_id, ref caller_alias, .. } => {
let target_fp = target_fingerprint.clone();
let call_id = call_id.clone();
// Check if target is online
let online = {
let hub = signal_hub.lock().await;
hub.is_online(&target_fp)
};
if !online {
info!(%addr, target = %target_fp, "call target not online");
let _ = transport.send_signal(&SignalMessage::Hangup {
reason: wzp_proto::HangupReason::Normal,
}).await;
continue;
}
// Create call in registry
{
let mut reg = call_registry.lock().await;
reg.create_call(call_id.clone(), client_fp.clone(), target_fp.clone());
}
// Forward offer to callee
info!(caller = %client_fp, callee = %target_fp, call_id = %call_id, "routing direct call offer");
let hub = signal_hub.lock().await;
if let Err(e) = hub.send_to(&target_fp, &msg).await {
warn!("failed to forward call offer: {e}");
}
// Send ringing to caller
drop(hub);
let _ = transport.send_signal(&SignalMessage::CallRinging {
call_id: call_id.clone(),
}).await;
}
SignalMessage::DirectCallAnswer { ref call_id, ref accept_mode, .. } => {
let call_id = call_id.clone();
let mode = *accept_mode;
let peer_fp = {
let reg = call_registry.lock().await;
reg.peer_fingerprint(&call_id, &client_fp).map(|s| s.to_string())
};
let Some(peer_fp) = peer_fp else {
warn!(call_id = %call_id, "answer for unknown call");
continue;
};
if mode == wzp_proto::CallAcceptMode::Reject {
info!(call_id = %call_id, "call rejected");
let mut reg = call_registry.lock().await;
reg.end_call(&call_id);
drop(reg);
let hub = signal_hub.lock().await;
let _ = hub.send_to(&peer_fp, &SignalMessage::Hangup {
reason: wzp_proto::HangupReason::Normal,
}).await;
} else {
// Accept — create private room
let room = format!("call-{call_id}");
{
let mut reg = call_registry.lock().await;
reg.set_active(&call_id, mode, room.clone());
}
info!(call_id = %call_id, room = %room, mode = ?mode, "call accepted, creating room");
// Forward answer to caller
{
let hub = signal_hub.lock().await;
let _ = hub.send_to(&peer_fp, &msg).await;
}
// Send CallSetup to both parties
// Use the address the client connected to (their remote addr
// is our perspective, but we need our listen addr).
// Replace 0.0.0.0 with the client's destination IP.
let relay_addr_for_setup = if listen_addr_str.starts_with("0.0.0.0:") {
let port = &listen_addr_str[8..];
// Use the local IP from the client's connection
let local_ip = addr.ip();
if local_ip.is_loopback() {
format!("127.0.0.1:{port}")
} else {
format!("{local_ip}:{port}")
}
} else {
listen_addr_str.clone()
};
let setup = SignalMessage::CallSetup {
call_id: call_id.clone(),
room: room.clone(),
relay_addr: relay_addr_for_setup,
};
{
let hub = signal_hub.lock().await;
let _ = hub.send_to(&peer_fp, &setup).await;
let _ = hub.send_to(&client_fp, &setup).await;
}
}
}
SignalMessage::Hangup { .. } => {
// Forward hangup to all active calls for this user
let calls = {
let reg = call_registry.lock().await;
reg.calls_for_fingerprint(&client_fp)
.iter()
.map(|c| (c.call_id.clone(), if c.caller_fingerprint == client_fp {
c.callee_fingerprint.clone()
} else {
c.caller_fingerprint.clone()
}))
.collect::<Vec<_>>()
};
for (call_id, peer_fp) in &calls {
let hub = signal_hub.lock().await;
let _ = hub.send_to(peer_fp, &msg).await;
drop(hub);
let mut reg = call_registry.lock().await;
reg.end_call(call_id);
}
}
SignalMessage::Ping { timestamp_ms } => {
let _ = transport.send_signal(&SignalMessage::Pong { timestamp_ms }).await;
}
other => {
warn!(%addr, "signal: unexpected message: {:?}", std::mem::discriminant(&other));
}
}
}
Ok(None) => {
info!(%addr, "signal connection closed");
break;
}
Err(e) => {
warn!(%addr, "signal recv error: {e}");
break;
}
}
}
// Cleanup: unregister + end active calls
let active_calls = {
let reg = call_registry.lock().await;
reg.calls_for_fingerprint(&client_fp)
.iter()
.map(|c| (c.call_id.clone(), if c.caller_fingerprint == client_fp {
c.callee_fingerprint.clone()
} else {
c.caller_fingerprint.clone()
}))
.collect::<Vec<_>>()
};
for (call_id, peer_fp) in &active_calls {
let hub = signal_hub.lock().await;
let _ = hub.send_to(peer_fp, &SignalMessage::Hangup {
reason: wzp_proto::HangupReason::Normal,
}).await;
drop(hub);
let mut reg = call_registry.lock().await;
reg.end_call(call_id);
}
{
let mut hub = signal_hub.lock().await;
hub.unregister(&client_fp);
}
{
let mut reg = presence.lock().await;
reg.unregister_local(&client_fp);
}
transport.close().await.ok();
return;
}
// Auth check: if --auth-url is set, expect first signal message to be a token
// Auth: if --auth-url is set, expect AuthToken as first signal
let authenticated_fp: Option<String> = if let Some(ref url) = auth_url {
@@ -451,6 +965,28 @@ async fn main() -> anyhow::Result<()> {
// Use the caller's identity fingerprint from the handshake
let participant_fp = authenticated_fp.clone().unwrap_or(caller_fp);
// ACL: call rooms (call-*) are restricted to the two authorized participants.
// Only the relay's call orchestrator creates these rooms — random clients can't join.
if room_name.starts_with("call-") {
let call_id = &room_name[5..]; // strip "call-" prefix
let authorized = {
let reg = call_registry.lock().await;
match reg.get(call_id) {
Some(call) => {
call.caller_fingerprint == participant_fp
|| call.callee_fingerprint == participant_fp
}
None => false, // unknown call — reject
}
};
if !authorized {
warn!(%addr, room = %room_name, fp = %participant_fp, "rejected: not authorized for this call room");
transport.close().await.ok();
return;
}
info!(%addr, room = %room_name, fp = %participant_fp, "authorized for call room");
}
// Register in presence registry
{
let mut reg = presence.lock().await;
@@ -503,6 +1039,20 @@ async fn main() -> anyhow::Result<()> {
metrics.active_sessions.inc();
// Call rooms: enforce 2-participant limit
if room_name.starts_with("call-") {
let mgr = room_mgr.lock().await;
if mgr.room_size(&room_name) >= 2 {
drop(mgr);
warn!(%addr, room = %room_name, "call room full (max 2 participants)");
metrics.active_sessions.dec();
let mut smgr = session_mgr.lock().await;
smgr.remove_session(session_id);
transport.close().await.ok();
return;
}
}
let participant_id = {
let mut mgr = room_mgr.lock().await;
match mgr.join(
@@ -515,7 +1065,25 @@ async fn main() -> anyhow::Result<()> {
Ok((id, update, senders)) => {
metrics.active_rooms.set(mgr.list().len() as i64);
drop(mgr); // release lock before async broadcast
room::broadcast_signal(&senders, &update).await;
// Merge federated participants into RoomUpdate if this is a global room
let merged_update = if let Some(ref fm) = federation_mgr {
if fm.is_global_room(&room_name) {
if let SignalMessage::RoomUpdate { count: _, participants: mut local_parts } = update {
let remote = fm.get_remote_participants(&room_name).await;
local_parts.extend(remote);
// Deduplicate by fingerprint
let mut seen = std::collections::HashSet::new();
local_parts.retain(|p| seen.insert(p.fingerprint.clone()));
SignalMessage::RoomUpdate {
count: local_parts.len() as u32,
participants: local_parts,
}
} else { update }
} else { update }
} else { update };
room::broadcast_signal(&senders, &merged_update).await;
id
}
Err(e) => {
@@ -533,6 +1101,25 @@ async fn main() -> anyhow::Result<()> {
.iter()
.map(|b| format!("{b:02x}"))
.collect();
// Set up federation media channel if this is a global room
let (federation_tx, federation_room_hash) = if let Some(ref fm) = federation_mgr {
let is_global = fm.is_global_room(&room_name);
if is_global {
let canonical_hash = fm.global_room_hash(&room_name);
let (tx, rx) = tokio::sync::mpsc::channel(256);
let fm_clone = fm.clone();
tokio::spawn(async move {
wzp_relay::federation::run_federation_media_egress(fm_clone, rx).await;
});
info!(room = %room_name, canonical = ?fm.resolve_global_room(&room_name), "federation egress created (global room)");
(Some(tx), Some(canonical_hash))
} else {
(None, None)
}
} else {
(None, None)
};
room::run_participant(
room_mgr.clone(),
room_name,
@@ -541,6 +1128,9 @@ async fn main() -> anyhow::Result<()> {
metrics.clone(),
&session_id_str,
trunking_enabled,
debug_tap,
federation_tx,
federation_room_hash,
).await;
// Participant disconnected — clean up presence + per-session metrics

View File

@@ -16,12 +16,22 @@ pub struct RelayMetrics {
pub bytes_forwarded: IntCounter,
pub auth_attempts: IntCounterVec,
pub handshake_duration: Histogram,
// Federation metrics
pub federation_peer_status: IntGaugeVec,
pub federation_peer_rtt_ms: GaugeVec,
pub federation_packets_forwarded: IntCounterVec,
pub federation_packets_deduped: IntCounter,
pub federation_packets_rate_limited: IntCounter,
pub federation_active_rooms: IntGauge,
// Per-session metrics
pub session_buffer_depth: IntGaugeVec,
pub session_loss_pct: GaugeVec,
pub session_rtt_ms: GaugeVec,
pub session_underruns: IntCounterVec,
pub session_overruns: IntCounterVec,
// Phase 4: loss-recovery breakdown per session.
pub session_dred_reconstructions: IntCounterVec,
pub session_classical_plc: IntCounterVec,
registry: Registry,
}
@@ -60,6 +70,28 @@ impl RelayMetrics {
)
.expect("metric");
let federation_peer_status = IntGaugeVec::new(
Opts::new("wzp_federation_peer_status", "Peer connection status (0=disconnected, 1=connected)"),
&["peer"],
).expect("metric");
let federation_peer_rtt_ms = GaugeVec::new(
Opts::new("wzp_federation_peer_rtt_ms", "QUIC RTT to federated peer in milliseconds"),
&["peer"],
).expect("metric");
let federation_packets_forwarded = IntCounterVec::new(
Opts::new("wzp_federation_packets_forwarded_total", "Packets forwarded to/from federated peers"),
&["peer", "direction"],
).expect("metric");
let federation_packets_deduped = IntCounter::with_opts(
Opts::new("wzp_federation_packets_deduped_total", "Duplicate federation packets dropped"),
).expect("metric");
let federation_packets_rate_limited = IntCounter::with_opts(
Opts::new("wzp_federation_packets_rate_limited_total", "Federation packets dropped by rate limiter"),
).expect("metric");
let federation_active_rooms = IntGauge::with_opts(
Opts::new("wzp_federation_active_rooms", "Number of federated rooms currently active"),
).expect("metric");
let session_buffer_depth = IntGaugeVec::new(
Opts::new(
"wzp_relay_session_jitter_buffer_depth",
@@ -101,17 +133,42 @@ impl RelayMetrics {
)
.expect("metric");
let session_dred_reconstructions = IntCounterVec::new(
Opts::new(
"wzp_relay_session_dred_reconstructions_total",
"Frames reconstructed via DRED (Deep REDundancy) per session",
),
&["session_id"],
)
.expect("metric");
let session_classical_plc = IntCounterVec::new(
Opts::new(
"wzp_relay_session_classical_plc_total",
"Frames filled via classical Opus/Codec2 PLC per session",
),
&["session_id"],
)
.expect("metric");
registry.register(Box::new(active_sessions.clone())).expect("register");
registry.register(Box::new(active_rooms.clone())).expect("register");
registry.register(Box::new(packets_forwarded.clone())).expect("register");
registry.register(Box::new(bytes_forwarded.clone())).expect("register");
registry.register(Box::new(auth_attempts.clone())).expect("register");
registry.register(Box::new(handshake_duration.clone())).expect("register");
registry.register(Box::new(federation_peer_status.clone())).expect("register");
registry.register(Box::new(federation_peer_rtt_ms.clone())).expect("register");
registry.register(Box::new(federation_packets_forwarded.clone())).expect("register");
registry.register(Box::new(federation_packets_deduped.clone())).expect("register");
registry.register(Box::new(federation_packets_rate_limited.clone())).expect("register");
registry.register(Box::new(federation_active_rooms.clone())).expect("register");
registry.register(Box::new(session_buffer_depth.clone())).expect("register");
registry.register(Box::new(session_loss_pct.clone())).expect("register");
registry.register(Box::new(session_rtt_ms.clone())).expect("register");
registry.register(Box::new(session_underruns.clone())).expect("register");
registry.register(Box::new(session_overruns.clone())).expect("register");
registry.register(Box::new(session_dred_reconstructions.clone())).expect("register");
registry.register(Box::new(session_classical_plc.clone())).expect("register");
Self {
active_sessions,
@@ -120,11 +177,19 @@ impl RelayMetrics {
bytes_forwarded,
auth_attempts,
handshake_duration,
federation_peer_status,
federation_peer_rtt_ms,
federation_packets_forwarded,
federation_packets_deduped,
federation_packets_rate_limited,
federation_active_rooms,
session_buffer_depth,
session_loss_pct,
session_rtt_ms,
session_underruns,
session_overruns,
session_dred_reconstructions,
session_classical_plc,
registry,
}
}
@@ -176,6 +241,39 @@ impl RelayMetrics {
}
}
/// Phase 4: update per-session loss-recovery counters from a client's
/// `LossRecoveryUpdate` signal message. The client sends monotonic
/// totals (frames reconstructed since call start); we compute the
/// delta against the current Prometheus counter and increment by it.
/// IntCounterVec only increases, so a client restart that resets the
/// counter to 0 simply produces no delta until the new totals exceed
/// the Prometheus state.
pub fn update_session_loss_recovery(
&self,
session_id: &str,
dred_reconstructions: u64,
classical_plc: u64,
) {
let cur_dred = self
.session_dred_reconstructions
.with_label_values(&[session_id])
.get();
if dred_reconstructions > cur_dred {
self.session_dred_reconstructions
.with_label_values(&[session_id])
.inc_by(dred_reconstructions - cur_dred);
}
let cur_plc = self
.session_classical_plc
.with_label_values(&[session_id])
.get();
if classical_plc > cur_plc {
self.session_classical_plc
.with_label_values(&[session_id])
.inc_by(classical_plc - cur_plc);
}
}
/// Remove all per-session label values for a disconnected session.
pub fn remove_session_metrics(&self, session_id: &str) {
let _ = self.session_buffer_depth.remove_label_values(&[session_id]);
@@ -183,6 +281,10 @@ impl RelayMetrics {
let _ = self.session_rtt_ms.remove_label_values(&[session_id]);
let _ = self.session_underruns.remove_label_values(&[session_id]);
let _ = self.session_overruns.remove_label_values(&[session_id]);
let _ = self
.session_dred_reconstructions
.remove_label_values(&[session_id]);
let _ = self.session_classical_plc.remove_label_values(&[session_id]);
}
/// Get a reference to the underlying Prometheus registry.
@@ -377,10 +479,13 @@ mod tests {
};
m.update_session_quality("sess-cleanup", &report);
m.update_session_buffer("sess-cleanup", 42, 3, 1);
m.update_session_loss_recovery("sess-cleanup", 17, 4);
// Verify they appear
let output = m.metrics_handler();
assert!(output.contains("sess-cleanup"));
assert!(output.contains("wzp_relay_session_dred_reconstructions_total"));
assert!(output.contains("wzp_relay_session_classical_plc_total"));
// Remove and verify they are gone
m.remove_session_metrics("sess-cleanup");
@@ -388,6 +493,55 @@ mod tests {
assert!(!output.contains("sess-cleanup"));
}
/// Phase 4: LossRecoveryUpdate → per-session counters, monotonic delta
/// application.
#[test]
fn session_loss_recovery_monotonic_delta() {
let m = RelayMetrics::new();
let sess = "sess-dred";
// First update: 10 DRED, 2 PLC
m.update_session_loss_recovery(sess, 10, 2);
let dred1 = m
.session_dred_reconstructions
.with_label_values(&[sess])
.get();
let plc1 = m.session_classical_plc.with_label_values(&[sess]).get();
assert_eq!(dred1, 10);
assert_eq!(plc1, 2);
// Second update: 25 DRED, 5 PLC — counter advances by (15, 3)
m.update_session_loss_recovery(sess, 25, 5);
let dred2 = m
.session_dred_reconstructions
.with_label_values(&[sess])
.get();
let plc2 = m.session_classical_plc.with_label_values(&[sess]).get();
assert_eq!(dred2, 25);
assert_eq!(plc2, 5);
// Third update with LOWER values (e.g., client reset) — counters
// hold steady, no decrement.
m.update_session_loss_recovery(sess, 5, 1);
let dred3 = m
.session_dred_reconstructions
.with_label_values(&[sess])
.get();
let plc3 = m.session_classical_plc.with_label_values(&[sess]).get();
assert_eq!(dred3, 25, "counter must not decrease");
assert_eq!(plc3, 5, "counter must not decrease");
// Fourth update: client caught up and exceeded the old max.
m.update_session_loss_recovery(sess, 30, 8);
let dred4 = m
.session_dred_reconstructions
.with_label_values(&[sess])
.get();
let plc4 = m.session_classical_plc.with_label_values(&[sess]).get();
assert_eq!(dred4, 30);
assert_eq!(plc4, 8);
}
#[test]
fn metrics_increment() {
let m = RelayMetrics::new();

View File

@@ -18,6 +18,38 @@ use wzp_proto::MediaTransport;
use crate::metrics::RelayMetrics;
use crate::trunk::TrunkBatcher;
/// Debug tap: logs packet metadata for matching rooms.
#[derive(Clone)]
pub struct DebugTap {
/// Room name filter ("*" = all rooms, or specific room name/hash).
pub room_filter: String,
}
impl DebugTap {
pub fn matches(&self, room_name: &str) -> bool {
self.room_filter == "*" || self.room_filter == room_name
}
pub fn log_packet(&self, room: &str, dir: &str, addr: &std::net::SocketAddr, pkt: &wzp_proto::MediaPacket, fan_out: usize) {
let h = &pkt.header;
info!(
target: "debug_tap",
room = %room,
dir = dir,
addr = %addr,
seq = h.seq,
codec = ?h.codec_id,
ts = h.timestamp,
fec_block = h.fec_block,
fec_sym = h.fec_symbol,
repair = h.is_repair,
len = pkt.payload.len(),
fan_out,
"TAP"
);
}
}
/// Unique participant ID within a room.
pub type ParticipantId = u64;
@@ -27,6 +59,22 @@ fn next_id() -> ParticipantId {
NEXT_PARTICIPANT_ID.fetch_add(1, Ordering::Relaxed)
}
/// Events emitted by RoomManager for federation to observe.
#[derive(Clone, Debug)]
pub enum RoomEvent {
/// First local participant joined this room.
LocalJoin { room: String },
/// Last local participant left this room.
LocalLeave { room: String },
}
/// Outbound federation media from a local participant.
pub struct FederationMediaOut {
pub room_name: String,
pub room_hash: [u8; 8],
pub data: Bytes,
}
/// How to send data to a participant — either via QUIC transport or WebSocket channel.
#[derive(Clone)]
pub enum ParticipantSender {
@@ -132,6 +180,7 @@ impl Room {
.map(|p| wzp_proto::packet::RoomParticipant {
fingerprint: p.fingerprint.clone().unwrap_or_default(),
alias: p.alias.clone(),
relay_label: None, // local participant
})
.collect()
}
@@ -157,24 +206,35 @@ pub struct RoomManager {
/// When `None`, rooms are open (no auth mode). When `Some`, only listed
/// fingerprints can join the corresponding room.
acl: Option<HashMap<String, HashSet<String>>>,
/// Channel for room lifecycle events (federation subscribes).
event_tx: tokio::sync::broadcast::Sender<RoomEvent>,
}
impl RoomManager {
pub fn new() -> Self {
let (event_tx, _) = tokio::sync::broadcast::channel(64);
Self {
rooms: HashMap::new(),
acl: None,
event_tx,
}
}
/// Create a room manager with ACL enforcement enabled.
pub fn with_acl() -> Self {
let (event_tx, _) = tokio::sync::broadcast::channel(64);
Self {
rooms: HashMap::new(),
acl: Some(HashMap::new()),
event_tx,
}
}
/// Subscribe to room lifecycle events (for federation).
pub fn subscribe_events(&self) -> tokio::sync::broadcast::Receiver<RoomEvent> {
self.event_tx.subscribe()
}
/// Grant a fingerprint access to a room.
pub fn allow(&mut self, room_name: &str, fingerprint: &str) {
if let Some(ref mut acl) = self.acl {
@@ -213,8 +273,13 @@ impl RoomManager {
warn!(room = room_name, fingerprint = ?fingerprint, "unauthorized room join attempt");
return Err("not authorized for this room".to_string());
}
let was_empty = !self.rooms.contains_key(room_name)
|| self.rooms.get(room_name).map_or(true, |r| r.is_empty());
let room = self.rooms.entry(room_name.to_string()).or_insert_with(Room::new);
let id = room.add(addr, sender, fingerprint.map(|s| s.to_string()), alias.map(|s| s.to_string()));
if was_empty {
let _ = self.event_tx.send(RoomEvent::LocalJoin { room: room_name.to_string() });
}
let update = wzp_proto::SignalMessage::RoomUpdate {
count: room.len() as u32,
participants: room.participant_list(),
@@ -235,12 +300,34 @@ impl RoomManager {
Ok(id)
}
/// Get list of active room names.
pub fn active_rooms(&self) -> Vec<String> {
self.rooms.keys().cloned().collect()
}
/// Get participant list for a room (fingerprint + alias).
pub fn local_participant_list(&self, room_name: &str) -> Vec<wzp_proto::packet::RoomParticipant> {
self.rooms.get(room_name)
.map(|room| room.participant_list())
.unwrap_or_default()
}
/// Get all senders for participants in a room (for federation inbound media delivery).
pub fn local_senders(&self, room_name: &str) -> Vec<ParticipantSender> {
self.rooms.get(room_name)
.map(|room| room.participants.iter()
.map(|p| p.sender.clone())
.collect())
.unwrap_or_default()
}
/// Leave a room. Returns (room_update_msg, remaining_senders) for broadcasting, or None if room is now empty.
pub fn leave(&mut self, room_name: &str, participant_id: ParticipantId) -> Option<(wzp_proto::SignalMessage, Vec<ParticipantSender>)> {
if let Some(room) = self.rooms.get_mut(room_name) {
room.remove(participant_id);
if room.is_empty() {
self.rooms.remove(room_name);
let _ = self.event_tx.send(RoomEvent::LocalLeave { room: room_name.to_string() });
info!(room = room_name, "room closed (empty)");
return None;
}
@@ -350,6 +437,9 @@ pub async fn run_participant(
metrics: Arc<RelayMetrics>,
session_id: &str,
trunking_enabled: bool,
debug_tap: Option<DebugTap>,
federation_tx: Option<tokio::sync::mpsc::Sender<FederationMediaOut>>,
federation_room_hash: Option<[u8; 8]>,
) {
if trunking_enabled {
run_participant_trunked(
@@ -358,7 +448,7 @@ pub async fn run_participant(
.await;
} else {
run_participant_plain(
room_mgr, room_name, participant_id, transport, metrics, session_id,
room_mgr, room_name, participant_id, transport, metrics, session_id, debug_tap, federation_tx, federation_room_hash,
)
.await;
}
@@ -372,6 +462,9 @@ async fn run_participant_plain(
transport: Arc<wzp_transport::QuinnTransport>,
metrics: Arc<RelayMetrics>,
session_id: &str,
debug_tap: Option<DebugTap>,
federation_tx: Option<tokio::sync::mpsc::Sender<FederationMediaOut>>,
federation_room_hash: Option<[u8; 8]>,
) {
let addr = transport.connection().remote_address();
let mut packets_forwarded = 0u64;
@@ -445,6 +538,13 @@ async fn run_participant_plain(
);
}
// Debug tap: log packet metadata
if let Some(ref tap) = debug_tap {
if tap.matches(&room_name) {
tap.log_packet(&room_name, "in", &addr, &pkt, others.len());
}
}
// Forward to all others
let fwd_start = std::time::Instant::now();
let pkt_bytes = pkt.payload.len() as u64;
@@ -469,6 +569,17 @@ async fn run_participant_plain(
}
}
}
// Federation: forward to active peer relays via channel
if let Some(ref fed_tx) = federation_tx {
let data = pkt.to_bytes();
let _ = fed_tx.try_send(FederationMediaOut {
room_name: room_name.clone(),
room_hash: federation_room_hash.unwrap_or_else(|| crate::federation::room_hash(&room_name)),
data,
});
}
let fwd_ms = fwd_start.elapsed().as_millis() as u64;
if fwd_ms > max_forward_ms {
max_forward_ms = fwd_ms;

View File

@@ -0,0 +1,105 @@
//! Persistent signaling connection manager.
//!
//! Tracks clients connected via `_signal` SNI. Routes call signals
//! (DirectCallOffer, DirectCallAnswer, Hangup) between registered users.
use std::collections::HashMap;
use std::sync::Arc;
use std::time::Instant;
use tracing::{info, warn};
use wzp_proto::{MediaTransport, SignalMessage};
use wzp_transport::QuinnTransport;
/// A client connected via `_signal` for direct calling.
pub struct SignalClient {
pub fingerprint: String,
pub alias: Option<String>,
pub transport: Arc<QuinnTransport>,
pub connected_at: Instant,
}
/// Manages persistent signaling connections.
pub struct SignalHub {
clients: HashMap<String, SignalClient>,
}
impl SignalHub {
pub fn new() -> Self {
Self {
clients: HashMap::new(),
}
}
/// Register a new signaling client.
pub fn register(&mut self, fp: String, transport: Arc<QuinnTransport>, alias: Option<String>) {
info!(fingerprint = %fp, alias = ?alias, "signal client registered");
self.clients.insert(fp.clone(), SignalClient {
fingerprint: fp,
alias,
transport,
connected_at: Instant::now(),
});
}
/// Unregister a signaling client. Returns the client if found.
pub fn unregister(&mut self, fp: &str) -> Option<SignalClient> {
let client = self.clients.remove(fp);
if client.is_some() {
info!(fingerprint = %fp, "signal client unregistered");
}
client
}
/// Look up a client by fingerprint.
pub fn get(&self, fp: &str) -> Option<&SignalClient> {
self.clients.get(fp)
}
/// Check if a fingerprint is online.
pub fn is_online(&self, fp: &str) -> bool {
self.clients.contains_key(fp)
}
/// Send a signal message to a client by fingerprint.
pub async fn send_to(&self, fp: &str, msg: &SignalMessage) -> Result<(), String> {
match self.clients.get(fp) {
Some(client) => {
client.transport.send_signal(msg).await
.map_err(|e| format!("send to {fp}: {e}"))
}
None => Err(format!("{fp} not online")),
}
}
/// Number of connected signaling clients.
pub fn online_count(&self) -> usize {
self.clients.len()
}
/// List all online fingerprints.
pub fn online_fingerprints(&self) -> Vec<&str> {
self.clients.keys().map(|s| s.as_str()).collect()
}
/// Get alias for a fingerprint.
pub fn alias(&self, fp: &str) -> Option<&str> {
self.clients.get(fp).and_then(|c| c.alias.as_deref())
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn register_unregister() {
let mut hub = SignalHub::new();
assert_eq!(hub.online_count(), 0);
assert!(!hub.is_online("alice"));
// Can't easily construct QuinnTransport in a unit test,
// so we just test the HashMap logic conceptually.
// Integration tests cover the full flow.
}
}

View File

@@ -16,6 +16,9 @@ async-trait = { workspace = true }
serde_json = "1"
rustls = { version = "0.23", default-features = false, features = ["ring", "std"] }
rcgen = "0.13"
ed25519-dalek = { workspace = true }
hkdf = { workspace = true }
sha2 = { workspace = true }
[dev-dependencies]
tokio = { workspace = true, features = ["rt-multi-thread", "macros"] }

View File

@@ -6,20 +6,74 @@ use std::time::Duration;
use quinn::crypto::rustls::QuicClientConfig;
use quinn::crypto::rustls::QuicServerConfig;
/// Create a server configuration with a self-signed certificate (for testing).
/// Create a server configuration with a self-signed certificate (random keypair).
///
/// Tunes QUIC transport parameters for lossy VoIP:
/// - 30s idle timeout
/// - 5s keep-alive interval
/// - DATAGRAM extension enabled
/// - Conservative flow control for bandwidth-constrained links
/// The certificate changes on every call. Use `server_config_from_seed` for
/// a deterministic certificate that survives relay restarts.
pub fn server_config() -> (quinn::ServerConfig, Vec<u8>) {
let cert_key = rcgen::generate_simple_self_signed(vec!["localhost".to_string()])
.expect("failed to generate self-signed cert");
let cert_der = rustls::pki_types::CertificateDer::from(cert_key.cert);
let key_der =
rustls::pki_types::PrivateKeyDer::try_from(cert_key.key_pair.serialize_der()).unwrap();
build_server_config(cert_der, key_der)
}
/// Create a server configuration with a deterministic self-signed certificate
/// derived from a 32-byte seed. Same seed = same cert = same TLS fingerprint.
pub fn server_config_from_seed(seed: &[u8; 32]) -> (quinn::ServerConfig, Vec<u8>) {
use ed25519_dalek::pkcs8::EncodePrivateKey;
use ed25519_dalek::SigningKey;
use hkdf::Hkdf;
use sha2::Sha256;
// Derive Ed25519 key bytes from seed via HKDF
let hk = Hkdf::<Sha256>::new(None, seed);
let mut ed_bytes = [0u8; 32];
hk.expand(b"wzp-tls-ed25519", &mut ed_bytes)
.expect("HKDF expand failed");
// Create Ed25519 signing key and export as PKCS8 DER
let signing_key = SigningKey::from_bytes(&ed_bytes);
let pkcs8_doc = signing_key.to_pkcs8_der()
.expect("failed to encode Ed25519 key as PKCS8");
let key_der_for_rcgen = rustls::pki_types::PrivateKeyDer::try_from(pkcs8_doc.as_bytes().to_vec())
.expect("failed to wrap PKCS8 DER");
// Create rcgen KeyPair from DER
let key_pair = rcgen::KeyPair::from_der_and_sign_algo(
&key_der_for_rcgen,
&rcgen::PKCS_ED25519,
)
.expect("failed to create KeyPair from seed-derived Ed25519 key");
// Build self-signed cert with this deterministic keypair
let params = rcgen::CertificateParams::new(vec!["localhost".to_string()])
.expect("failed to create CertificateParams");
let cert = params.self_signed(&key_pair).expect("failed to self-sign cert");
let cert_der = rustls::pki_types::CertificateDer::from(cert.der().to_vec());
let key_der = rustls::pki_types::PrivateKeyDer::try_from(key_pair.serialize_der())
.expect("failed to serialize key DER");
build_server_config(cert_der, key_der)
}
/// Compute a hex-formatted SHA-256 fingerprint of a DER-encoded certificate.
///
/// Format: `xx:xx:xx:xx:...` (32 bytes = 64 hex chars with colons).
pub fn tls_fingerprint(cert_der: &[u8]) -> String {
use sha2::{Sha256, Digest};
let hash = Sha256::digest(cert_der);
hash.iter()
.map(|b| format!("{b:02x}"))
.collect::<Vec<_>>()
.join(":")
}
fn build_server_config(
cert_der: rustls::pki_types::CertificateDer<'static>,
key_der: rustls::pki_types::PrivateKeyDer<'static>,
) -> (quinn::ServerConfig, Vec<u8>) {
let mut server_crypto = rustls::ServerConfig::builder()
.with_no_client_auth()
.with_single_cert(vec![cert_der.clone()], key_der)

View File

@@ -22,7 +22,7 @@ pub mod path_monitor;
pub mod quic;
pub mod reliable;
pub use config::{client_config, server_config};
pub use config::{client_config, server_config, server_config_from_seed, tls_fingerprint};
pub use connection::{accept, connect, create_endpoint};
pub use path_monitor::PathMonitor;
pub use quic::QuinnTransport;

View File

@@ -33,6 +33,13 @@ impl QuinnTransport {
&self.connection
}
/// Send raw bytes as a QUIC datagram (no MediaPacket framing).
pub fn send_raw_datagram(&self, data: &[u8]) -> Result<(), TransportError> {
self.connection
.send_datagram(bytes::Bytes::copy_from_slice(data))
.map_err(|e| TransportError::Internal(format!("datagram: {e}")))
}
/// Close the QUIC connection immediately (synchronous, no async needed).
/// The relay will detect the close and remove this participant from the room.
pub fn close_now(&self) {
@@ -136,7 +143,7 @@ impl MediaTransport for QuinnTransport {
}
};
match datagram::deserialize_media(data) {
match datagram::deserialize_media(data.clone()) {
Some(packet) => {
// Record receive observation
{
@@ -149,8 +156,10 @@ impl MediaTransport for QuinnTransport {
Ok(Some(packet))
}
None => {
tracing::warn!("received malformed media datagram");
Ok(None)
tracing::warn!(len = data.len(), "skipping malformed media datagram, continuing");
// Don't return Ok(None) — that signals connection closed.
// Recurse to read the next datagram instead.
Box::pin(self.recv_media()).await
}
}
}

747
docs/ADMINISTRATION.md Normal file
View File

@@ -0,0 +1,747 @@
# WarzonePhone Relay Administration Guide
This document covers deploying, configuring, and operating wzp-relay instances, including federation setup, monitoring, and troubleshooting.
## Relay Deployment
### Binary
Build and run the relay directly:
```bash
# Build release binary
cargo build --release --bin wzp-relay
# Run with defaults (listen on 0.0.0.0:4433, room mode, no auth)
./target/release/wzp-relay
# Run with config file
./target/release/wzp-relay --config /etc/wzp/relay.toml
```
### Remote Build (Linux)
The included build script provisions a temporary Hetzner Cloud VPS, builds all binaries, and downloads them:
```bash
# Requires: hcloud CLI authenticated, SSH key "wz" registered
./scripts/build-linux.sh
# Outputs to: target/linux-x86_64/
```
Produces: `wzp-relay`, `wzp-client`, `wzp-client-audio`, `wzp-web`, `wzp-bench`.
### Docker
```dockerfile
FROM rust:1.85 AS builder
WORKDIR /src
COPY . .
RUN cargo build --release --bin wzp-relay
FROM debian:bookworm-slim
RUN apt-get update && apt-get install -y ca-certificates && rm -rf /var/lib/apt/lists/*
COPY --from=builder /src/target/release/wzp-relay /usr/local/bin/
EXPOSE 4433/udp
EXPOSE 9090/tcp
VOLUME /data
ENV HOME=/data
ENTRYPOINT ["wzp-relay"]
CMD ["--config", "/data/relay.toml", "--metrics-port", "9090"]
```
Build and run:
```bash
docker build -t wzp-relay .
docker run -d \
--name wzp-relay \
-p 4433:4433/udp \
-p 9090:9090/tcp \
-v /opt/wzp:/data \
wzp-relay
```
### systemd
Create `/etc/systemd/system/wzp-relay.service`:
```ini
[Unit]
Description=WarzonePhone Relay
After=network-online.target
Wants=network-online.target
[Service]
Type=simple
User=wzp
Group=wzp
ExecStart=/usr/local/bin/wzp-relay --config /etc/wzp/relay.toml
Restart=always
RestartSec=5
LimitNOFILE=65536
# Security hardening
NoNewPrivileges=yes
ProtectSystem=strict
ProtectHome=yes
ReadWritePaths=/var/lib/wzp
PrivateTmp=yes
Environment=HOME=/var/lib/wzp
Environment=RUST_LOG=info
[Install]
WantedBy=multi-user.target
```
Setup:
```bash
# Create service user
useradd --system --home-dir /var/lib/wzp --create-home wzp
# Install binary and config
cp target/release/wzp-relay /usr/local/bin/
mkdir -p /etc/wzp
cp relay.toml /etc/wzp/
# Enable and start
systemctl daemon-reload
systemctl enable --now wzp-relay
journalctl -u wzp-relay -f
```
## TOML Configuration Reference
All fields have defaults. A minimal config file only needs the fields you want to override.
### Core Settings
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `listen_addr` | string (socket addr) | `"0.0.0.0:4433"` | UDP address to listen on for incoming QUIC connections |
| `remote_relay` | string (socket addr) | none | Remote relay address for forward mode. Disables room mode when set |
| `max_sessions` | integer | `100` | Maximum concurrent client sessions |
| `log_level` | string | `"info"` | Logging level: trace, debug, info, warn, error |
### Jitter Buffer
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `jitter_target_depth` | integer | `50` | Target buffer depth in packets (50 = 1 second at 20ms frames) |
| `jitter_max_depth` | integer | `250` | Maximum buffer depth in packets (250 = 5 seconds) |
### Authentication
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `auth_url` | string | none | featherChat auth validation URL. When set, clients must send a bearer token as their first signal message. The relay validates it via `POST <auth_url>` |
### Metrics and Monitoring
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `metrics_port` | integer | none | Port for the Prometheus HTTP metrics endpoint. Disabled if not set |
| `probe_targets` | array of socket addrs | `[]` | Peer relay addresses to probe for health monitoring (1 Ping/s each) |
| `probe_mesh` | boolean | `false` | Enable mesh mode for probe targets |
### Media Processing
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `trunking_enabled` | boolean | `false` | Enable trunk batching for outgoing media. Packs multiple session packets into one QUIC datagram, reducing overhead |
### WebSocket / Browser Support
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `ws_port` | integer | none | Port for WebSocket listener (browser clients). Disabled if not set |
| `static_dir` | string | none | Directory to serve static files (HTML/JS/WASM) |
### Federation
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `peers` | array of PeerConfig | `[]` | Outbound federation peer relays |
| `trusted` | array of TrustedConfig | `[]` | Inbound federation trust list |
| `global_rooms` | array of GlobalRoomConfig | `[]` | Room names to bridge across federation |
### Debugging
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `debug_tap` | string | none | Log packet headers for matching rooms. Use `"*"` for all rooms, or a specific room name |
### PeerConfig Fields
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `url` | string | yes | Address of the peer relay (e.g., `"193.180.213.68:4433"`) |
| `fingerprint` | string | yes | Expected TLS certificate fingerprint (hex with colons) |
| `label` | string | no | Human-readable label for logging |
### TrustedConfig Fields
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `fingerprint` | string | yes | Expected TLS certificate fingerprint (hex with colons) |
| `label` | string | no | Human-readable label for logging |
### GlobalRoomConfig Fields
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `name` | string | yes | Room name to bridge across federation (e.g., `"android"`) |
## CLI Flags Reference
```
wzp-relay [--config <path>] [--listen <addr>] [--remote <addr>]
[--auth-url <url>] [--metrics-port <port>]
[--probe <addr>]... [--probe-mesh] [--mesh-status]
[--trunking] [--global-room <name>]...
[--debug-tap <room>]
[--ws-port <port>] [--static-dir <dir>]
```
| Flag | Description |
|------|-------------|
| `--config <path>` | Load configuration from TOML file. CLI flags override config file values |
| `--listen <addr>` | Listen address (default: `0.0.0.0:4433`) |
| `--remote <addr>` | Remote relay for forwarding mode. Disables room mode |
| `--auth-url <url>` | featherChat auth endpoint (e.g., `https://chat.example.com/v1/auth/validate`) |
| `--metrics-port <port>` | Prometheus metrics HTTP port (e.g., `9090`) |
| `--probe <addr>` | Peer relay to probe for health monitoring. Repeatable |
| `--probe-mesh` | Enable mesh mode for probes |
| `--mesh-status` | Print mesh health table and exit (diagnostic) |
| `--trunking` | Enable trunk batching for outgoing media |
| `--global-room <name>` | Declare a room as global (bridged across federation). Repeatable |
| `--debug-tap <room>` | Log packet headers for a room (`"*"` for all rooms) |
| `--event-log <path>` | Write JSONL protocol event log for federation debugging |
| `--version`, `-V` | Print build git hash and exit |
| `--ws-port <port>` | WebSocket listener port for browser clients |
| `--static-dir <dir>` | Directory to serve static files from |
| `--help`, `-h` | Print help and exit |
CLI flags always override config file values when both are specified.
## Federation Setup
### Concepts
- **`[[peers]]`** -- outbound: relays we connect TO. Requires address + fingerprint
- **`[[trusted]]`** -- inbound: relays we accept connections FROM. Requires fingerprint only (they connect to us)
- **`[[global_rooms]]`** -- rooms bridged across all federated peers. Participants on different relays in the same global room hear each other
### Getting Your Relay's Fingerprint
When a relay starts, it logs its TLS fingerprint:
```
INFO TLS certificate (deterministic from relay identity) tls_fingerprint="a5d6:e3c6:5ae7:185c:4eb1:af89:daed:4a43"
INFO federation: to peer with this relay, add to relay.toml:
INFO [[peers]]
INFO url = "193.180.213.68:4433"
INFO fingerprint = "a5d6:e3c6:5ae7:185c:4eb1:af89:daed:4a43"
```
Share this information with the administrator of the peer relay.
### Unknown Peer Connections
When an unknown relay tries to federate, the log shows:
```
WARN unknown relay wants to federate addr=10.0.0.5:12345 fp="7f2a:b391:0c44:..."
INFO to accept, add to relay.toml:
INFO [[trusted]]
INFO fingerprint = "7f2a:b391:0c44:..."
INFO label = "Relay at 10.0.0.5:12345"
```
## Example Configurations
### Single Relay (Minimal)
```toml
# /etc/wzp/relay.toml
# Minimal config -- all defaults, just enable metrics
metrics_port = 9090
```
Run:
```bash
wzp-relay --config /etc/wzp/relay.toml
```
### Single Relay (Full Featured)
```toml
# /etc/wzp/relay.toml
listen_addr = "0.0.0.0:4433"
max_sessions = 200
log_level = "info"
# Metrics
metrics_port = 9090
# Authentication
auth_url = "https://chat.example.com/v1/auth/validate"
# Browser support
ws_port = 8080
static_dir = "/opt/wzp/web"
# Performance
trunking_enabled = true
# Jitter buffer tuning
jitter_target_depth = 50
jitter_max_depth = 250
```
### Two-Relay Federation
**Relay A** (`relay-a.toml` on 193.180.213.68):
```toml
listen_addr = "0.0.0.0:4433"
metrics_port = 9090
# Outbound: connect to Relay B
[[peers]]
url = "10.0.0.5:4433"
fingerprint = "7f2a:b391:0c44:9e1d:a8b2:c5d7:e3f0:1234"
label = "Relay B (US)"
# Accept inbound from Relay B
[[trusted]]
fingerprint = "7f2a:b391:0c44:9e1d:a8b2:c5d7:e3f0:1234"
label = "Relay B (US)"
# Bridge these rooms
[[global_rooms]]
name = "android"
[[global_rooms]]
name = "general"
```
**Relay B** (`relay-b.toml` on 10.0.0.5):
```toml
listen_addr = "0.0.0.0:4433"
metrics_port = 9090
# Outbound: connect to Relay A
[[peers]]
url = "193.180.213.68:4433"
fingerprint = "a5d6:e3c6:5ae7:185c:4eb1:af89:daed:4a43"
label = "Relay A (EU)"
# Accept inbound from Relay A
[[trusted]]
fingerprint = "a5d6:e3c6:5ae7:185c:4eb1:af89:daed:4a43"
label = "Relay A (EU)"
# Same global rooms
[[global_rooms]]
name = "android"
[[global_rooms]]
name = "general"
```
### Three-Relay Chain (Full Mesh)
For three relays (A, B, C) in full mesh federation, each relay needs peers and trusted entries for the other two:
**Relay A** (EU):
```toml
listen_addr = "0.0.0.0:4433"
metrics_port = 9090
# Probe all peers
probe_targets = ["10.0.0.5:4433", "10.0.0.9:4433"]
probe_mesh = true
# Peers
[[peers]]
url = "10.0.0.5:4433"
fingerprint = "7f2a:b391:0c44:9e1d:a8b2:c5d7:e3f0:1234"
label = "Relay B (US)"
[[peers]]
url = "10.0.0.9:4433"
fingerprint = "3c8e:d2a1:f7b5:6049:81c3:e9d4:a2f6:5678"
label = "Relay C (APAC)"
# Trust
[[trusted]]
fingerprint = "7f2a:b391:0c44:9e1d:a8b2:c5d7:e3f0:1234"
label = "Relay B (US)"
[[trusted]]
fingerprint = "3c8e:d2a1:f7b5:6049:81c3:e9d4:a2f6:5678"
label = "Relay C (APAC)"
# Global rooms
[[global_rooms]]
name = "android"
[[global_rooms]]
name = "general"
```
**Relay B** and **Relay C** follow the same pattern, listing the other two relays in their `[[peers]]` and `[[trusted]]` sections.
## Monitoring
### Prometheus Metrics
Enable with `--metrics-port <port>` or `metrics_port` in TOML. The relay exposes metrics at `GET /metrics` on the specified HTTP port.
#### Relay Metrics
| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `wzp_relay_active_sessions` | Gauge | -- | Current active sessions |
| `wzp_relay_active_rooms` | Gauge | -- | Current active rooms |
| `wzp_relay_packets_forwarded_total` | Counter | `room` | Total packets forwarded |
| `wzp_relay_bytes_forwarded_total` | Counter | `room` | Total bytes forwarded |
| `wzp_relay_auth_attempts_total` | Counter | `result` (ok/fail) | Auth validation attempts |
| `wzp_relay_handshake_duration_seconds` | Histogram | -- | Crypto handshake time |
#### Per-Session Metrics
| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `wzp_relay_session_jitter_buffer_depth` | Gauge | `session_id` | Buffer depth per session |
| `wzp_relay_session_loss_pct` | Gauge | `session_id` | Packet loss percentage |
| `wzp_relay_session_rtt_ms` | Gauge | `session_id` | Round-trip time |
| `wzp_relay_session_underruns_total` | Counter | `session_id` | Jitter buffer underruns |
| `wzp_relay_session_overruns_total` | Counter | `session_id` | Jitter buffer overruns |
#### Inter-Relay Probe Metrics
| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `wzp_probe_rtt_ms` | Gauge | `target` | RTT to peer relay |
| `wzp_probe_loss_pct` | Gauge | `target` | Loss to peer relay |
| `wzp_probe_jitter_ms` | Gauge | `target` | Jitter to peer relay |
| `wzp_probe_up` | Gauge | `target` | 1 if reachable, 0 if not |
### Prometheus Scrape Config
```yaml
# prometheus.yml
scrape_configs:
- job_name: 'wzp-relay'
static_configs:
- targets:
- 'relay-a:9090'
- 'relay-b:9090'
scrape_interval: 10s
```
### Grafana Dashboard
A pre-built dashboard is available at `docs/grafana-dashboard.json`. Import it into Grafana for:
1. **Relay Health** -- active sessions, rooms, packets/s, bytes/s
2. **Call Quality** -- per-session jitter depth, loss%, RTT, underruns over time
3. **Inter-Relay Mesh** -- latency heatmap, probe status, loss trends
4. **Web Bridge** -- active connections, frames bridged, auth failures
### Event Log (Protocol Analyzer)
Use `--event-log` to write a JSONL event log that traces every federation media packet through the relay pipeline. Essential for debugging federation audio issues.
```bash
wzp-relay --config relay.toml --event-log /tmp/events.jsonl
```
Each media packet emits events at every decision point:
- `federation_ingress` — packet arrived from a peer relay
- `local_deliver` — packet delivered to local participants
- `dedup_drop` — packet dropped as duplicate
- `rate_limit_drop` — packet dropped by rate limiter
- `room_not_found` — packet for unknown room
- `local_deliver_error` — delivery to local client failed
Analyze with:
```bash
# Count events by type
cat events.jsonl | python3 -c "
import json, collections, sys
c = collections.Counter()
for l in sys.stdin: c[json.loads(l)['event']] += 1
for k,v in sorted(c.items(), key=lambda x:-x[1]): print(f' {k}: {v}')
"
```
### Remote Version Check
Verify a deployed relay's version without SSH:
```bash
wzp-client --version-check <relay-addr:port>
```
### Debug Tap
Use `--debug-tap` to log packet headers for debugging:
```bash
# Log headers for room "android"
wzp-relay --debug-tap android
# Log headers for all rooms
wzp-relay --debug-tap '*'
```
Or in TOML:
```toml
debug_tap = "android"
```
### Mesh Status
Print the current mesh health table (diagnostic):
```bash
wzp-relay --mesh-status
```
## Authentication
### featherChat Token Validation
When `--auth-url` is set, the relay requires clients to send an `AuthToken` signal message as their first message after QUIC connection. The relay validates the token by calling:
```
POST <auth_url>
Content-Type: application/json
Authorization: Bearer <token>
```
Expected response:
```json
{
"valid": true,
"fingerprint": "a5d6:e3c6:...",
"alias": "username"
}
```
If validation fails, the client is disconnected.
### Without Authentication
When `--auth-url` is not set, any client can connect. The relay logs:
```
INFO auth disabled -- any client can connect (use --auth-url to enable)
```
## Identity Persistence
### Relay Identity File
The relay stores its identity seed at `~/.wzp/relay-identity` (a 64-character hex string). This seed:
- Is generated automatically on first run
- Persists across restarts
- Derives the relay's Ed25519 signing key and X25519 key agreement key
- Derives the TLS certificate deterministically (same seed = same cert = same fingerprint)
If the identity file is corrupted, the relay generates a new one and logs a warning. This will change the relay's TLS fingerprint, requiring federation peers to update their config.
### Backup
Back up the identity file to preserve the relay's fingerprint:
```bash
cp ~/.wzp/relay-identity /secure/backup/relay-identity
```
To restore, copy the file back before starting the relay.
## Troubleshooting
### Common Issues
| Problem | Cause | Solution |
|---------|-------|---------|
| "unknown argument" on startup | Unrecognized CLI flag | Check `wzp-relay --help` for valid flags |
| "failed to load config" | Invalid TOML syntax | Validate TOML file with `toml-cli` or similar |
| "auth failed" for all clients | Wrong `auth_url` or featherChat server down | Verify URL is reachable: `curl -X POST <auth_url>` |
| "session rejected" | Max sessions reached | Increase `max_sessions` in config |
| Clients cannot connect | Firewall blocking UDP 4433 | Open UDP port 4433 in firewall |
| Federation "unknown relay wants to federate" | Peer's fingerprint not in `[[trusted]]` | Add the logged fingerprint to `[[trusted]]` |
| Federation "fingerprint mismatch" | Peer relay restarted with new identity | Update the fingerprint in `[[peers]]` config |
| Federation audio silent on consecutive connects | Dedup filter or jitter buffer state | Verify relay is running latest build with time-based dedup |
| Federation participant shows wrong relay label | Hub relay not propagating original labels | Update relay to latest build (label preservation fix) |
| Federation disconnect takes >15 seconds | QUIC idle timeout + stale sweeper | Normal: sweeper runs every 5s with 15s TTL. Use latest client with SIGTERM handler for instant disconnect |
| High packet loss between relays | Network congestion or misconfiguration | Check `wzp_probe_loss_pct` metric; consider relay chaining |
| Jitter buffer overruns | Packets arriving faster than playout | Increase `jitter_max_depth` |
| Jitter buffer underruns | Packets arriving too slowly or lost | Check network quality; increase `jitter_target_depth` |
| "probe connection closed" | Peer relay unreachable or crashed | Check peer relay status; will auto-reconnect |
| WebSocket clients cannot connect | `ws_port` not set | Add `--ws-port <port>` or `ws_port` in TOML |
| Browser mic access denied | Not using HTTPS | Use TLS termination in front of the relay or serve via `wzp-web --tls` |
### Log Level Tuning
Set `RUST_LOG` environment variable for fine-grained control:
```bash
# All relay logs at debug level
RUST_LOG=debug wzp-relay
# Only federation at trace, everything else at info
RUST_LOG=info,wzp_relay::federation=trace wzp-relay
# Quiet mode -- only warnings and errors
RUST_LOG=warn wzp-relay
```
### Health Checks
```bash
# Check if relay is listening
nc -zu relay-host 4433
# Check metrics endpoint
curl -s http://relay-host:9090/metrics | head -20
# Check active sessions
curl -s http://relay-host:9090/metrics | grep wzp_relay_active_sessions
# Check federation probe health
curl -s http://relay-host:9090/metrics | grep wzp_probe_up
```
## Build Pipelines
All production artifacts (Android APK, Linux x86_64 binaries, Windows `.exe`) are built on **SepehrHomeserverdk** using Docker, not on developer workstations. The pipelines are fire-and-forget: a local script invokes a `tmux` session on the remote, the build runs in a Docker container, and the artifact is uploaded to `paste.dk.manko.yoga` (rustypaste) with a notification sent to `ntfy.sh/wzp` on start and completion.
### Docker images
Two long-lived images live on the remote:
| Image | Used by | Base | Key contents |
|---|---|---|---|
| `wzp-android-builder` | Android APK (Tauri mobile + legacy Kotlin), Linux x86_64 relay/CLI | Debian bookworm | Rust stable with Android targets, cargo-ndk, NDK 26.1, Android SDK (API 34 + 35 + 36), JDK 17, Gradle 8.5, Node.js 20, cmake, ninja, tauri-cli 2.x |
| `wzp-windows-builder` | Windows x86_64 `.exe` | Debian bookworm | Rust stable with `x86_64-pc-windows-msvc` target, cargo-xwin (with pre-warmed MSVC CRT + Windows SDK cache), Node.js 20, cmake, ninja, clang, lld, nasm |
Both images are rebuilt rarely — once the base toolchain is stable, rebuilds are only needed to pick up new dependencies or security patches.
**Rebuilding an image** (fire-and-forget, ~10 min on a warm base):
```bash
# Windows
./scripts/build-windows-docker.sh --image-build
# Android (upload and rebuild handled by the Android build script itself — see
# its --image-build flag or equivalent)
```
The `--image-build` flag uploads the local Dockerfile to the remote, kicks off `docker build` under `nohup`, and returns immediately. Monitor with:
```bash
ssh SepehrHomeserverdk 'tail -f /tmp/wzp-windows-image-build.log'
```
### Pipeline: Android APK (Tauri Mobile)
```bash
./scripts/build-tauri-android.sh # Full: pull + build + upload + notify
./scripts/build-tauri-android.sh --no-pull # Skip git fetch
./scripts/build-tauri-android.sh --clean # Force-clean Rust target
```
- **Branch**: `android-rewrite`
- **Image**: `wzp-android-builder`
- **Build command**: `cargo tauri android build --release`
- **Output**: `wzp-release.apk` → uploaded to rustypaste
- **Notifications**: start + completion to `ntfy.sh/wzp`
- **Remote artifact path**: `/mnt/storage/manBuilder/data/cache-android/target/…/release/app-release.apk`
### Pipeline: Linux x86_64 (relay + CLI + bench + web)
```bash
./scripts/build-linux-docker.sh # Fire-and-forget
./scripts/build-linux-docker.sh --no-pull # Skip git fetch
./scripts/build-linux-docker.sh --clean # Force-clean target
./scripts/build-linux-docker.sh --install # Wait for completion and download locally
```
- **Branch**: `feat/android-voip-client` (script default — override by editing the script or passing an env var)
- **Image**: `wzp-android-builder` (shared, not a separate Linux-only image)
- **Targets built**: `wzp-relay`, `wzp-client`, `wzp-client-audio` (with `--features audio`), `wzp-web`, `wzp-bench`
- **Output**: `wzp-linux-x86_64.tar.gz` with all five binaries → uploaded to rustypaste
- **Local landing dir** (with `--install`): `target/linux-x86_64/`
### Pipeline: Windows x86_64 (`wzp-desktop.exe`)
```bash
./scripts/build-windows-docker.sh # Full: pull + build + download locally
./scripts/build-windows-docker.sh --no-pull # Skip git fetch
./scripts/build-windows-docker.sh --rust # Force-clean target-windows cache
./scripts/build-windows-docker.sh --image-build # Rebuild the Docker image (fire-and-forget)
```
- **Branch**: `feat/desktop-audio-rewrite`
- **Image**: `wzp-windows-builder`
- **Build command**: `cargo xwin build --release --target x86_64-pc-windows-msvc --bin wzp-desktop`
- **Output**: `wzp-desktop.exe` (~16 MB) → downloaded to `target/windows-exe/wzp-desktop.exe`, also uploaded to rustypaste
- **Target cache volume**: `target-windows` (separate from the Android target cache to avoid triple cross-contamination)
- **Shared cache volumes**: `cargo-registry`, `cargo-git` (shared with Android — both pipelines pull the same crates)
**A/B-preserving workflow** for testing audio backends: rename the prior `.exe` before re-running the build, so both coexist:
```bash
# Preserve prior build as the noAEC baseline
mv target/windows-exe/wzp-desktop.exe target/windows-exe/wzp-desktop-noAEC.exe
./scripts/build-windows-docker.sh
ls -la target/windows-exe/
# wzp-desktop-noAEC.exe (previous build)
# wzp-desktop.exe (new build)
```
### Alternative pipeline: Windows via Hetzner Cloud VPS
For situations where Docker image rebuilds would be disruptive, or for one-shot debug builds on a clean machine:
```bash
./scripts/build-windows-cloud.sh # Full: create VM → build → download → destroy
./scripts/build-windows-cloud.sh --prepare # Create VM + install deps, don't build
./scripts/build-windows-cloud.sh --build # Build on existing VM
./scripts/build-windows-cloud.sh --transfer # Download .exe from existing VM
./scripts/build-windows-cloud.sh --destroy # Delete the VM
WZP_KEEP_VM=1 ./scripts/build-windows-cloud.sh # Don't auto-destroy after successful build
```
- **Provider**: Hetzner Cloud
- **Default server type**: `cx33` (8 GB RAM, 8 vCPU — `cx23` with 4 GB OOMs on the tauri+rustls cross-compile)
- **Image**: `ubuntu-24.04`
- **SSH key**: must be named `wz` in Hetzner and loaded in the local ssh-agent
- **Reminder**: set `WZP_KEEP_VM=1` for multi-build sessions, then **remember to `--destroy` at end of day** so the VM isn't left running overnight. This is tracked in the auto-memory as `feedback_keep_windows_builder_vm.md`.
### Notifications
All pipelines post to `https://ntfy.sh/wzp`. Subscribe from your phone via the [ntfy.sh app](https://ntfy.sh/) to get push notifications on build start/success/failure. Messages include the short git hash and the rustypaste URL on success:
```
WZP Windows build OK [03a80a3] (16M)
https://paste.dk.manko.yoga/<uuid>/wzp-desktop.exe
```
### Rustypaste credentials
Build pipelines read `rusty_address` and `rusty_auth_token` from the `.env` file at `/mnt/storage/manBuilder/.env` on SepehrHomeserverdk. Local scripts that upload directly (`build-windows-cloud.sh` when run in `--transfer` mode) read from `~/.wzp/rustypaste.env` with the same variable names. Both files must be kept in sync manually if rotated.

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,139 @@
# Branch: `android-rewrite`
Pivot away from the legacy Kotlin + JNI Android client to a pure-Rust **Tauri 2.x Mobile** app that shares the same frontend and backend code as the desktop client.
## Why this branch exists
The Kotlin + JNI stack was a crash factory. Every failure mode we hit was at the Kotlin ↔ Rust boundary, and each fix uncovered the next layer of the onion:
| Symptom | Root cause | Fix |
|---|---|---|
| App crashed on launch before `onCreate` returned | `__init_tcb` / `pthread_create` bionic private symbols leaking out of `libwzp_android.so` because the Rust crate used `crate-type = ["cdylib", "staticlib"]`. rust-lang/rust#104707 documents that staticlib alongside cdylib leaks non-exported symbols from the staticlib into the cdylib, and Bionic's private internal pthread symbols got bound LOCALLY inside our `.so` instead of resolved against `libc.so` at `dlopen` time | Dropped `staticlib` from the crate-type list. `crate-type = ["cdylib", "rlib"]` only. |
| Stack overflow on `place_call` | `Dispatchers.IO` threads have a ~512 KB stack, too small for the Rust signal-connect path that does TLS handshake + quinn setup inside one closure | Launched JNI calls from a dedicated `java.lang.Thread` with an explicit 8 MB stack |
| `ring` / `libcrypto` TLS reuse crash on second call | tokio runtime got dropped between calls, but `ring` keeps a TLS-stored SSL context that is invalidated when the runtime thread is reused by a new runtime — `ring` sees stale context and segfaults | Single long-lived tokio runtime for the entire signal client lifetime; split `start()` into an inline `connect+register` path and a `run()` path on a separate thread to avoid the `thread::spawn` closure's stack overflow |
| Null dereference on register with fresh install | Identity seed file empty when it existed-but-was-blank, Rust side deref'd the zero-length slice | Generate seed if empty on register |
Every fix kept the app limping along but the fundamental design problem remained: **state management was split across a Kotlin ViewModel and a Rust engine, with a hand-rolled JNI bridge in between that had to be perfect to not crash**. The working desktop Tauri client (with the same Rust backend) had none of these problems because it spoke to the Rust code via in-process `invoke()` from a WebView, not JNI.
So: rewrite the Android app as a **Tauri 2.x Mobile app**, reusing the entire desktop codebase verbatim (`main.ts`, `style.css`, `index.html`, `main.rs`, `engine.rs` — everything). Tauri Mobile added Android support in v2, it's production-ready, and it eliminates the JNI boundary entirely.
The incident postmortem lives at [`docs/incident-tauri-android-init-tcb.md`](incident-tauri-android-init-tcb.md).
## Architecture
```
┌─────────────────────────────────────────────────┐
│ Tauri 2.x Mobile │
│ │
│ Android WebView ────────── HTML/JS/CSS │ ← Shared with desktop
│ │ (main.ts) │
│ │ │
│ invoke() ─────────────── Rust Commands │ ← Shared with desktop
│ (main.rs) │
│ │ │
│ ┌───────────────┼────────────┐ │
│ │ │ │ │
│ SignalMgr CallEngine Identity │ ← Shared crates
│ (signal_hub) (wzp-client) (wzp-crypto)│
│ │ │ │
│ │ │ │
│ ▼ ▼ │
│ QUIC to relay Oboe audio (Android) │
│ via wzp-native cdylib │
└─────────────────────────────────────────────────┘
```
**What is reused from desktop verbatim** (zero rewrite):
- `desktop/src/main.ts` — entire frontend
- `desktop/src/style.css` — all styling
- `desktop/src/identicon.ts` — identicon rendering
- `desktop/index.html` — HTML structure
- `desktop/src-tauri/src/main.rs` — all Tauri commands (`connect`, `disconnect`, `register_signal`, `place_call`, …)
- `desktop/src-tauri/src/engine.rs``CallEngine` wrapper
**What is Android-specific**:
- `desktop/src-tauri/src/android_audio.rs` — JVM-side audio routing (`AudioManager.setSpeakerphoneOn` for earpiece/speaker toggle). Runs from Tauri's existing JNI context — no hand-rolled bridge, Tauri owns the JVM hookup.
- `desktop/src-tauri/src/wzp_native.rs` — runtime `dlopen` of `libwzp_native.so`, a standalone cdylib crate (`crates/wzp-native`) that owns all C++ (Oboe bridge). Kept in its own crate so its C/C++ static archives never get statically linked into `wzp-desktop`'s `.so`, which would re-trigger the `__init_tcb` / pthread leak.
- `crates/wzp-native/` — the standalone C++/Oboe bridge cdylib. Loaded via `libloading` at runtime from `wzp_native.rs`. Provides capture + playout streams using Oboe's `Usage::VoiceCommunication` + `MODE_IN_COMMUNICATION` combo.
- Android-specific target dependencies in `desktop/src-tauri/Cargo.toml` (`jni`, `ndk-context`, `libloading`) — no CPAL, no VPIO.
## Key architectural decisions
### 1. `wzp-native` as a standalone cdylib loaded via `libloading`
The alternative — linking `wzp-native` as a regular Rust dep with C++ static archives — would cause the same `__init_tcb` crash that killed the Kotlin version. By making `wzp-native` its own cdylib and `dlopen`-ing it at runtime, Bionic's `libc.so` resolves every symbol at load time the way it's supposed to, and no private TCB symbols leak.
### 2. `crate-type = ["cdylib", "rlib"]` only (no `staticlib`)
Same reason. The `rlib` output is needed so the `wzp-desktop` binary target can link against the library; `cdylib` is needed for Android's `System.loadLibrary`; `staticlib` would reintroduce the symbol-leak bug.
### 3. Oboe audio config
`Usage::VoiceCommunication` + Java-side `MODE_IN_COMMUNICATION`. **Never** call `setAudioApi(AAudio)` explicitly — on some devices (Nothing Phone in particular) it causes Oboe to open the wrong stream type and audio goes silent. Let Oboe pick the audio API automatically. This is documented in the auto-memory `project_tauri_android_audio.md`.
### 4. Speaker/earpiece toggle uses `tokio::task::spawn_blocking`
Oboe's `stop()` + `start()` cycle is synchronous and can block for 50200 ms. Calling it on the tokio executor stalls every other async task (including the QUIC datagram loop), dropping audio packets. Wrapping the toggle in `spawn_blocking` isolates it to a dedicated thread pool. Fixed in commit `76a4c53`.
## Build pipeline
Docker on SepehrHomeserverdk, same pattern as the Android legacy pipeline and the Windows pipeline:
```
./scripts/build-tauri-android.sh # Full: pull + build + ntfy + rustypaste
./scripts/build-tauri-android.sh --pull # Explicit git pull (default)
./scripts/build-tauri-android.sh --clean # Blow away the Rust target cache
```
**Image**: `wzp-android-builder` (shared with the legacy Kotlin pipeline). The Dockerfile was extended to install Node.js 20 LTS, Android API level 36, build-tools 35.0.0, tauri-cli 2.x, and all four Android Rust targets on top of the legacy NDK 26.1 + cargo-ndk + Gradle setup. Both pipelines coexist in the same image.
**Output**: `wzp-release.apk` uploaded to rustypaste, URL delivered via `ntfy.sh/wzp`.
## Known quirks (Tauri Mobile specific)
1. **tauri-cli `android init` writes absolute paths** into `gradle.properties` for the NDK path. Those paths are local to wherever `android init` was run, so they break any cross-machine build unless overridden with `ANDROID_NDK_HOME` at build time. The build script exports `ANDROID_NDK_HOME` explicitly to work around this.
2. **API 36 vs API 34 coexistence**: the legacy Kotlin pipeline targets API 34, Tauri Mobile 2.x wants compileSdk 36. The shared Docker image installs both SDK levels so neither pipeline needs to reinstall.
3. **Identity seed lives in Android-specific app data dir**: `/data/data/com.wzp.phone/files/.wzp/identity` instead of `$HOME/.wzp/identity`. The shared `load_or_create_seed()` function in `desktop/src-tauri/src/lib.rs` uses Tauri's `app_data_dir()` which resolves correctly on both Android and desktop — no per-platform code needed.
4. **Direct calls on macOS previously hit an identity mismatch bug** — the `CallEngine` was using `$HOME/.wzp/identity` directly while `register_signal` used Tauri's `app_data_dir()`. Fixed by routing both through `load_or_create_seed()` (commit `2fd9465`). This was important for cross-platform consistency.
## Current state (snapshot)
What works:
- Tauri Mobile scaffold builds and runs on Android
- Signal hub connect + register works
- Room mode (SFU group calls) works with Oboe audio
- Direct 1:1 calls work with full parity to desktop
- Speaker/earpiece toggle works without stalling the audio pipeline
- Call history, recent contacts, deregister UI all present (inherited from desktop)
What remains (task list refs in parens):
- Background service for keeping signal alive when app is backgrounded (#19)
- Proper permission requests (microphone, notifications) on first launch (#19)
- Incoming call notification while backgrounded (#19)
- App icon + splash screen (#19)
## Testing
- **Build**: `./scripts/build-tauri-android.sh` — verify the APK lands on rustypaste and installs on device.
- **Smoke test**: Install → open app → Register → Place call → Receive call. No crashes, audio flows both ways.
- **Speaker toggle**: During a call, toggle speaker/earpiece several times in rapid succession. Audio should never stop, and the toggle should respond within ~200 ms.
- **Stress test**: Call for 10+ minutes continuous. No memory growth, no packet loss beyond what's attributable to the network.
## Files of interest
| Path | Purpose |
|---|---|
| `desktop/src-tauri/src/lib.rs` | Shared Tauri commands (desktop + Android) |
| `desktop/src-tauri/src/android_audio.rs` | JVM-side speaker/earpiece routing |
| `desktop/src-tauri/src/wzp_native.rs` | Runtime dlopen of libwzp_native.so |
| `crates/wzp-native/` | Standalone C++/Oboe cdylib, loaded at runtime |
| `scripts/build-tauri-android.sh` | Remote Docker build pipeline |
| `scripts/Dockerfile.android-builder` | Shared Android Docker image (legacy + Tauri) |
| `docs/incident-tauri-android-init-tcb.md` | Postmortem of the Kotlin+JNI crash cascade |

View File

@@ -1,168 +1,591 @@
# WarzonePhone Detailed Design Decisions
# WarzonePhone Design Document
## Why Opus + Codec2 (Not Just One)
> Custom encrypted VoIP protocol built in Rust. Designed for hostile network conditions: 5-70% packet loss, 100-500 kbps throughput, 300-800 ms RTT. Multi-platform: Desktop (Tauri), Android, CLI, Web.
The dual-codec architecture is driven by the extreme range of network conditions WarzonePhone targets:
## System Overview
**Opus** (24/16/6 kbps) is the clear choice for normal to degraded conditions. It offers excellent quality at moderate bitrates, has built-in inband FEC and DTX (discontinuous transmission), and the `audiopus` crate provides mature Rust bindings to libopus. Opus operates at 48 kHz natively.
WarzonePhone is a voice-over-IP system built from scratch in Rust, targeting reliable encrypted voice communication over severely degraded networks. The protocol uses adaptive codecs (Opus + Codec2), fountain-code FEC (RaptorQ), and end-to-end ChaCha20-Poly1305 encryption over a QUIC transport layer.
**Codec2** (3200/1200 bps) is a narrowband vocoder designed specifically for HF radio links with extreme bandwidth constraints. At 1200 bps (1.2 kbps), it produces intelligible speech in only 6 bytes per 40ms frame -- roughly 20x lower bitrate than Opus at its minimum. The pure-Rust `codec2` crate means no C dependencies for this codec. Codec2 operates at 8 kHz, so the adaptive layer handles 48 kHz <-> 8 kHz resampling transparently.
The system comprises three categories of components:
The `AdaptiveEncoder`/`AdaptiveDecoder` in `crates/wzp-codec/src/adaptive.rs` hold both codec instances and switch between them based on the active `QualityProfile`. This avoids codec re-initialization latency during tier transitions.
1. **Protocol crates** -- a Rust workspace of 7 library crates with a star dependency graph enabling parallel development
2. **Client applications** -- Desktop (Tauri), Android (Kotlin + JNI), CLI, and Web (browser bridge)
3. **Relay infrastructure** -- SFU relay daemons with federation, health probing, and Prometheus metrics
**Bandwidth comparison with FEC overhead:**
### Design Principles
| Tier | Codec Bitrate | FEC Ratio | Total Bandwidth |
|------|--------------|-----------|----------------|
| GOOD | 24 kbps | 20% | ~28.8 kbps |
| DEGRADED | 6 kbps | 50% | ~9.0 kbps |
| CATASTROPHIC | 1.2 kbps | 100% | ~2.4 kbps |
- **User sovereignty** -- client-driven route selection, BIP39 identity backup, no central authority
- **End-to-end encryption** -- relays never see plaintext audio; SFU forwarding preserves E2E encryption
- **Adaptive resilience** -- automatic codec and FEC switching based on observed network quality
- **Parallel development** -- star dependency graph allows 5 agents/developers to work simultaneously with zero merge conflicts
At the catastrophic tier, the entire call (audio + FEC + headers) fits within approximately 3 kbps, which is viable even over severely degraded links.
## Architecture
## Why RaptorQ Over Reed-Solomon
### Crate Overview
**Reed-Solomon** is a classical block erasure code. It works well but has fixed-rate overhead: you must decide in advance how many repair symbols to generate, and decoding requires receiving exactly K of any K+R symbols.
The workspace contains 7 core crates plus integration binaries:
**RaptorQ** (RFC 6330) is a fountain code with several advantages for VoIP:
| Crate | Purpose | Key Dependencies |
|-------|---------|-----------------|
| `wzp-proto` | Protocol types, traits, wire format | serde, bytes |
| `wzp-codec` | Audio codecs (Opus, Codec2, RNNoise) | audiopus, codec2, nnnoiseless |
| `wzp-fec` | Forward error correction | raptorq |
| `wzp-crypto` | Cryptography and identity | ed25519-dalek, x25519-dalek, chacha20poly1305, bip39 |
| `wzp-transport` | QUIC transport layer | quinn, rustls |
| `wzp-relay` | Relay daemon (SFU, federation, metrics) | tokio, prometheus |
| `wzp-client` | Call engine and CLI | All above |
1. **Rateless**: You can generate an arbitrary number of repair symbols on the fly. If conditions worsen mid-block, you can generate additional repair without re-encoding.
Additional integration targets: `wzp-web` (browser bridge via WebSocket), Android native library (JNI), Desktop (Tauri).
2. **Efficient decoding**: RaptorQ can decode from any K symbols with high probability (typically K + 1 or K + 2 suffice), compared to Reed-Solomon which requires exactly K.
### Dependency Graph
3. **Lower computational complexity**: O(K) encoding and decoding time, compared to O(K^2) for Reed-Solomon. This matters for real-time audio at 50 frames/second.
```mermaid
graph TD
PROTO["wzp-proto<br/>(Types, Traits, Wire Format)"]
4. **Variable block sizes**: The encoder handles 1-56403 source symbols per block (the WZP implementation uses 5-10, but the flexibility is there).
CODEC["wzp-codec<br/>(Opus + Codec2 + RNNoise)"]
FEC["wzp-fec<br/>(RaptorQ FEC)"]
CRYPTO["wzp-crypto<br/>(ChaCha20 + Identity)"]
TRANSPORT["wzp-transport<br/>(QUIC / Quinn)"]
The `raptorq` crate (v2) provides a well-tested pure-Rust implementation. The WZP FEC layer adds length-prefixed padding (2-byte LE prefix + zero-pad to 256 bytes) so that variable-length audio frames can be recovered exactly.
RELAY["wzp-relay<br/>(Relay Daemon)"]
CLIENT["wzp-client<br/>(CLI + Call Engine)"]
WEB["wzp-web<br/>(Browser Bridge)"]
DESKTOP["Desktop<br/>(Tauri + CPAL)"]
ANDROID["Android<br/>(Kotlin + JNI)"]
**FEC bandwidth math at different loss rates:**
PROTO --> CODEC
PROTO --> FEC
PROTO --> CRYPTO
PROTO --> TRANSPORT
CODEC --> CLIENT
FEC --> CLIENT
CRYPTO --> CLIENT
TRANSPORT --> CLIENT
CODEC --> RELAY
FEC --> RELAY
CRYPTO --> RELAY
TRANSPORT --> RELAY
CLIENT --> WEB
CLIENT --> DESKTOP
CLIENT --> ANDROID
TRANSPORT --> WEB
FC["warzone-protocol<br/>(featherChat Identity)"] -.->|path dep| CRYPTO
style PROTO fill:#6c5ce7,color:#fff
style RELAY fill:#ff9f43,color:#fff
style CLIENT fill:#00b894,color:#fff
style WEB fill:#0984e3,color:#fff
style DESKTOP fill:#0984e3,color:#fff
style ANDROID fill:#0984e3,color:#fff
style FC fill:#fd79a8,color:#fff
```
The star pattern ensures each leaf crate (`wzp-codec`, `wzp-fec`, `wzp-crypto`, `wzp-transport`) depends only on `wzp-proto` and never on each other. This enables:
- **Parallel development** -- 5 agents work on 5 crates with no merge conflicts
- **Independent testing** -- each crate has self-contained tests
- **Pluggability** -- any implementation can be swapped by implementing the same trait
- **Fast compilation** -- changing one leaf only recompiles that leaf and integration crates
## Audio Pipeline
### Encode Pipeline (Mic to Network)
```mermaid
sequenceDiagram
participant Mic as Microphone
participant RNN as RNNoise Denoise
participant VAD as Silence Detector
participant ENC as Opus/Codec2 Encode
participant FEC as RaptorQ FEC Encode
participant INT as Interleaver
participant HDR as Header Assembly
participant CRYPT as ChaCha20-Poly1305
participant QUIC as QUIC Datagram
Mic->>RNN: PCM i16 x 960 (20ms @ 48kHz)
RNN->>VAD: Denoised samples (2 x 480)
alt Silence detected (>100ms)
VAD->>ENC: ComfortNoise packet (every 200ms)
else Active speech or hangover
VAD->>ENC: Active audio frame
end
ENC->>FEC: Compressed frame (padded to 256 bytes)
FEC->>FEC: Accumulate block (5-10 frames)
FEC->>INT: Source + repair symbols
INT->>HDR: Interleaved packets (depth=3)
HDR->>CRYPT: MediaHeader (12B) or MiniHeader (4B)
CRYPT->>QUIC: Header=AAD, Payload=encrypted
```
### Decode Pipeline (Network to Speaker)
```mermaid
sequenceDiagram
participant QUIC as QUIC Datagram
participant CRYPT as ChaCha20-Poly1305
participant HDR as Header Parse
participant DEINT as De-interleaver
participant FEC as RaptorQ FEC Decode
participant JIT as Jitter Buffer
participant DEC as Opus/Codec2 Decode
participant SPK as Speaker
QUIC->>CRYPT: Encrypted packet
CRYPT->>HDR: Decrypt (header=AAD)
HDR->>DEINT: Parsed MediaHeader + payload
DEINT->>FEC: Reordered symbols
FEC->>FEC: Reconstruct from any K of K+R symbols
FEC->>JIT: Recovered audio frames
JIT->>JIT: Sequence-ordered BTreeMap
JIT->>DEC: Pop when depth >= target
DEC->>SPK: PCM i16 x 960
```
## Codec System
WarzonePhone uses a dual-codec architecture to cover the full range of network conditions:
### Opus (Primary)
Opus is the primary codec for normal to degraded conditions. It operates at 48 kHz natively with built-in inband FEC and DTX (discontinuous transmission). The `audiopus` crate provides mature Rust bindings to libopus.
| Profile | Bitrate | Frame Duration | FEC Ratio | Total Bandwidth | Use Case |
|---------|---------|---------------|-----------|----------------|----------|
| Studio 64k | 64 kbps | 20ms | 10% | 70.4 kbps | LAN, excellent WiFi |
| Studio 48k | 48 kbps | 20ms | 10% | 52.8 kbps | Good WiFi, wired |
| Studio 32k | 32 kbps | 20ms | 10% | 35.2 kbps | WiFi, LTE |
| Good (24k) | 24 kbps | 20ms | 20% | 28.8 kbps | WiFi, LTE, decent links |
| Opus 16k | 16 kbps | 20ms | 20% | 19.2 kbps | 3G, moderate congestion |
| Degraded (6k) | 6 kbps | 40ms | 50% | 9.0 kbps | 3G, congested WiFi |
### Codec2 (Fallback)
Codec2 is a narrowband vocoder designed for HF radio links with extreme bandwidth constraints. It operates at 8 kHz, and the adaptive layer handles 48 kHz <-> 8 kHz resampling transparently. The pure-Rust `codec2` crate means no C dependencies.
| Profile | Bitrate | Frame Duration | FEC Ratio | Total Bandwidth | Use Case |
|---------|---------|---------------|-----------|----------------|----------|
| Codec2 3200 | 3.2 kbps | 20ms | 50% | 4.8 kbps | Poor conditions |
| Catastrophic (1200) | 1.2 kbps | 40ms | 100% | 2.4 kbps | Satellite, extreme loss |
### ComfortNoise
When the silence detector identifies no speech activity for over 100ms, the encoder switches to emitting a ComfortNoise packet every 200ms instead of encoding silence. This provides approximately 50% bandwidth savings in typical conversations.
### Adaptive Switching
The `AdaptiveEncoder`/`AdaptiveDecoder` in `wzp-codec` hold both codec instances and switch between them based on the active `QualityProfile`. This avoids codec re-initialization latency during tier transitions. The `AdaptiveQualityController` in `wzp-proto` manages tier transitions with hysteresis:
- **Downgrade**: 3 consecutive bad reports (2 on cellular networks)
- **Upgrade**: 10 consecutive good reports (one tier at a time)
- **Network handoff**: WiFi-to-cellular switch triggers preemptive one-tier downgrade plus a temporary 10-second FEC boost (+20%)
Quality tier classification thresholds:
| Tier | WiFi/Unknown | Cellular |
|------|-------------|----------|
| Good | loss < 10%, RTT < 400ms | loss < 8%, RTT < 300ms |
| Degraded | loss 10-40%, RTT 400-600ms | loss 8-25%, RTT 300-500ms |
| Catastrophic | loss > 40%, RTT > 600ms | loss > 25%, RTT > 500ms |
## Forward Error Correction (FEC)
### Why RaptorQ Over Reed-Solomon
WarzonePhone uses RaptorQ (RFC 6330) fountain codes via the `raptorq` crate:
1. **Rateless** -- generate arbitrary repair symbols on the fly; if conditions worsen mid-block, generate additional repair without re-encoding
2. **Efficient decoding** -- decode from any K symbols with high probability (typically K + 1 or K + 2 suffice)
3. **Lower complexity** -- O(K) encoding/decoding time vs O(K^2) for Reed-Solomon
4. **Variable block sizes** -- 1-56,403 source symbols per block (WZP uses 5-10)
### FEC Block Structure
Each FEC block consists of 5-10 audio frames padded to 256-byte symbols with a 2-byte LE length prefix:
```
[len:u16 LE][audio_frame][zero_padding_to_256_bytes]
```
### Loss Survival by FEC Ratio
With 5 source frames per block:
- 20% repair (GOOD): 1 repair symbol. Survives loss of 1 out of 6 packets (16.7% loss).
- 50% repair (DEGRADED): 3 repair symbols. Survives loss of 3 out of 8 packets (37.5% loss).
- 100% repair (CATASTROPHIC): 5 repair symbols. Survives loss of 5 out of 10 packets (50% loss).
The benchmark (`wzp-bench --fec --loss 30`) dynamically scales the FEC ratio to survive the requested loss percentage.
| FEC Ratio | Repair Symbols | Survives Loss | Profile |
|-----------|---------------|---------------|---------|
| 10% | 1 | 1 of 6 (16.7%) | Studio |
| 20% | 1 | 1 of 6 (16.7%) | Good |
| 50% | 3 | 3 of 8 (37.5%) | Degraded |
| 100% | 5 | 5 of 10 (50.0%) | Catastrophic |
## Why QUIC Over Raw UDP
### Interleaving
Raw UDP would be simpler and lower-latency, but QUIC (via the `quinn` crate) provides:
Burst loss protection via depth-3 interleaving: packets from 3 consecutive FEC blocks are interleaved before transmission. A burst of 3 consecutive lost packets affects 3 different blocks (1 loss each) rather than destroying 1 block entirely.
1. **DATAGRAM frames**: Unreliable delivery without head-of-line blocking (RFC 9221). Media packets use this path, so they behave like UDP datagrams but benefit from QUIC's connection management.
```mermaid
graph LR
subgraph "FEC Encoder"
F1[Frame 1] --> BLK[Source Block<br/>5-10 frames]
F2[Frame 2] --> BLK
F3[Frame 3] --> BLK
F4[Frame 4] --> BLK
F5[Frame 5] --> BLK
BLK --> SRC[Source Symbols]
BLK --> REP[Repair Symbols<br/>ratio-dependent]
SRC --> INT[Interleaver<br/>depth=3]
REP --> INT
end
2. **Reliable streams**: Signaling messages (CallOffer, CallAnswer, Rekey, Hangup) require reliable delivery. QUIC provides multiplexed streams without needing a separate TCP connection.
subgraph "Network"
INT --> LOSS{Packet Loss}
LOSS -->|some lost| RCV[Received Symbols]
end
3. **Built-in congestion control**: QUIC's congestion control prevents overwhelming degraded links, which is important when chaining relays.
subgraph "FEC Decoder"
RCV --> DEINT[De-interleaver]
DEINT --> RAPTORQ[RaptorQ Decode<br/>Any K of K+R]
RAPTORQ --> OUT[Original Frames]
end
4. **Connection migration**: QUIC connections survive IP address changes (e.g., WiFi to cellular handoff), which is valuable for mobile clients.
5. **TLS 1.3 built-in**: The QUIC handshake provides encryption at the transport level. While WZP has its own end-to-end ChaCha20 layer, the QUIC TLS protects the header and signaling from eavesdroppers.
6. **NAT keepalive**: QUIC's built-in keep-alive (configured at 5-second intervals) maintains NAT bindings without application-level pings.
7. **Firewall traversal**: QUIC runs on UDP port 443 by default, which is commonly allowed through firewalls. The `wzp` ALPN protocol identifier distinguishes WZP traffic.
The tradeoff is approximately 20-40 bytes of additional per-packet overhead compared to raw UDP (QUIC short header + DATAGRAM frame overhead).
## Why ChaCha20-Poly1305 Over AES-GCM
1. **Software performance**: ChaCha20-Poly1305 is faster than AES-GCM on hardware without AES-NI instructions. This matters for ARM devices (Android phones, Raspberry Pi relays, embedded systems) where AES hardware acceleration may be absent.
2. **Constant-time by design**: ChaCha20 uses only add-rotate-XOR operations, making it inherently resistant to timing side-channel attacks. AES-GCM implementations without hardware support often require careful constant-time implementation.
3. **Warzone messenger compatibility**: The existing Warzone messenger uses ChaCha20-Poly1305 for message encryption. Reusing the same primitive simplifies the security audit and allows key material to be shared across messaging and calling.
4. **16-byte overhead**: Both ChaCha20-Poly1305 and AES-128-GCM produce a 16-byte authentication tag. There is no size advantage to AES-GCM.
5. **AEAD with AAD**: The MediaHeader is used as Associated Authenticated Data (AAD), ensuring the header is authenticated but not encrypted. This allows relays to read routing information (block ID, sequence number) without decrypting the payload.
## Why Star Dependency Graph (Parallel Development)
The workspace follows a strict star dependency pattern:
```
wzp-proto (hub)
/ | \ \
wzp-codec wzp-fec wzp-crypto wzp-transport
\ | / /
wzp-relay
wzp-client
wzp-web
style LOSS fill:#e17055,color:#fff
style RAPTORQ fill:#00b894,color:#fff
```
- `wzp-proto` defines all trait interfaces and wire format types
- Each "leaf" crate (codec, fec, crypto, transport) depends only on `wzp-proto`
- No leaf crate depends on another leaf crate
- Integration crates (relay, client, web) depend on all leaves
## Transport Layer
This enables:
1. **Parallel development**: 5 agents/developers can work on 5 crates simultaneously with zero merge conflicts
2. **Independent testing**: Each crate has comprehensive tests that run without requiring other implementations
3. **Pluggability**: Any implementation can be swapped (e.g., replace RaptorQ with Reed-Solomon) by implementing the same trait
4. **Fast compilation**: Changes to one leaf only recompile that leaf and the integration crates, not other leaves
### Why QUIC Over Raw UDP
## Jitter Buffer Trade-offs
WarzonePhone uses QUIC (via the `quinn` crate) rather than raw UDP for several reasons:
The jitter buffer must balance two competing goals:
| Feature | Benefit |
|---------|---------|
| DATAGRAM frames (RFC 9221) | Unreliable delivery without head-of-line blocking -- behaves like UDP for media |
| Reliable streams | Multiplexed signaling (CallOffer, Hangup, Rekey) without a separate TCP connection |
| Congestion control | Prevents overwhelming degraded links, important when chaining relays |
| Connection migration | Connections survive IP address changes (WiFi to cellular handoff) |
| TLS 1.3 built-in | Transport-level encryption protects headers and signaling |
| NAT keepalive | 5-second interval maintains NAT bindings without application-level pings |
| Firewall traversal | Runs on UDP port 443 with `wzp` ALPN identifier |
**Lower latency** (smaller buffer):
- Better conversational interactivity
- Less memory usage
- But more vulnerable to jitter and reordering
The tradeoff is approximately 20-40 bytes of additional per-packet overhead compared to raw UDP.
**Higher quality** (larger buffer):
- More time to receive out-of-order packets
- More time for FEC recovery (repair packets may arrive after source packets)
- But adds perceptible delay to the conversation
### Wire Formats
The default configuration:
- Target: 10 packets (200ms) for the client, 50 packets (1s) for the relay
- Minimum: 3 packets (60ms) before playout begins (client), 25 packets (500ms) for relay
- Maximum: 250 packets (5s) absolute cap
#### MediaHeader (12 bytes)
The relay uses a deeper buffer because it needs to absorb jitter from the lossy inter-relay link. The client uses a shallower buffer for lower latency since it is on the last hop.
```
Byte 0: [V:1][T:1][CodecID:4][Q:1][FecRatioHi:1]
Byte 1: [FecRatioLo:6][unused:2]
Bytes 2-3: sequence (u16 BE)
Bytes 4-7: timestamp_ms (u32 BE)
Byte 8: fec_block_id (u8)
Byte 9: fec_symbol_idx (u8)
Byte 10: reserved
Byte 11: csrc_count
**Known issue**: The current jitter buffer does not adapt its depth based on observed jitter. It uses sequence-number ordering only, without timestamp-based playout scheduling. This can lead to drift during long calls, as observed in echo tests.
V = version (0), T = is_repair, CodecID = codec, Q = quality_report appended
```
## Browser Audio: AudioWorklet vs ScriptProcessorNode
#### MiniHeader (4 bytes, compressed)
The web bridge (`crates/wzp-web/static/`) uses AudioWorklet as the primary audio I/O mechanism, with ScriptProcessorNode as a fallback.
```
Bytes 0-1: timestamp_delta_ms (u16 BE)
Bytes 2-3: payload_len (u16 BE)
**AudioWorklet** (preferred):
- Runs on a dedicated audio rendering thread
- Lower latency (no main-thread round-trip)
- Consistent 128-sample callback timing
- Supported in Chrome 66+, Firefox 76+, Safari 14.1+
Preceded by FRAME_TYPE_MINI (0x01). Full header every 50 frames (~1s).
Saves 8 bytes/packet (67% header reduction).
```
**ScriptProcessorNode** (fallback):
- Runs on the main thread via `onaudioprocess` callback
- Higher latency, potential glitches from main-thread GC pauses
- Deprecated by the Web Audio specification
- Used when AudioWorklet is not available
#### TrunkFrame (batched datagrams)
Both paths accumulate Float32 samples into 960-sample (20ms) Int16 frames before sending via WebSocket, matching the WZP codec frame size.
```
[count:u16]
[session_id:2][len:u16][payload:len] x count
**Playback** uses an AudioWorklet with a ring buffer capped at 200ms (9600 samples at 48 kHz). When the buffer exceeds this limit, old samples are dropped to prevent unbounded drift. The fallback path uses scheduled `AudioBufferSourceNode` instances.
Packs multiple session packets into one QUIC datagram.
Max 10 entries or 1200 bytes, flushed every 5ms.
```
## Room Mode: SFU vs MCU Trade-offs
#### QualityReport (4 bytes, optional trailer)
WarzonePhone implements an **SFU** (Selective Forwarding Unit) architecture:
```
Byte 0: loss_pct (0-255 maps to 0-100%)
Byte 1: rtt_4ms (0-255 maps to 0-1020ms)
Byte 2: jitter_ms
Byte 3: bitrate_cap_kbps
```
**SFU** (implemented):
- Relay forwards each participant's packets to all other participants unchanged
- No transcoding -- the relay never decodes or re-encodes audio
- O(N) bandwidth at the relay for N participants (each packet is sent N-1 times)
- Each client receives separate streams from each other participant
- Client must mix/decode multiple streams locally
- Lower relay CPU usage (no transcoding)
- End-to-end encryption is preserved (relay never sees plaintext)
### Bandwidth Summary
**MCU** (not implemented, for comparison):
- Relay would decode all streams, mix them, and re-encode a single combined stream
- O(1) bandwidth to each client (receives one mixed stream)
- Requires the relay to have codec keys (breaks E2E encryption)
- Higher relay CPU (decoding N streams + mixing + re-encoding)
- Audio quality loss from re-encoding
| Profile | Audio | FEC Overhead | Total | Silence Savings |
|---------|-------|-------------|-------|----------------|
| Studio 64k | 64 kbps | 10% = 6.4 kbps | **70.4 kbps** | ~50% with DTX |
| Studio 48k | 48 kbps | 10% = 4.8 kbps | **52.8 kbps** | ~50% with DTX |
| Studio 32k | 32 kbps | 10% = 3.2 kbps | **35.2 kbps** | ~50% with DTX |
| Good (24k) | 24 kbps | 20% = 4.8 kbps | **28.8 kbps** | ~50% with DTX |
| Degraded (6k) | 6 kbps | 50% = 3.0 kbps | **9.0 kbps** | ~50% with DTX |
| Catastrophic (1.2k) | 1.2 kbps | 100% = 1.2 kbps | **2.4 kbps** | ~50% with DTX |
The SFU choice is driven by the E2E encryption requirement: since relays never have access to the audio codec keys, they cannot decode, mix, or re-encode. The current room implementation in `crates/wzp-relay/src/room.rs` forwards received datagrams to all other participants in the room with best-effort delivery -- if one send fails, the relay continues to the next participant.
Additional savings: MiniHeaders save 8 bytes/packet (67% header reduction). Trunking shares QUIC overhead across multiplexed sessions.
## Security
### Identity Model
Every user has a persistent identity derived from a 32-byte seed:
```mermaid
graph TD
SEED["32-byte Seed<br/>(BIP39 Mnemonic: 24 words)"] --> HKDF1["HKDF<br/>info='warzone-ed25519'"]
SEED --> HKDF2["HKDF<br/>info='warzone-x25519'"]
HKDF1 --> ED["Ed25519 SigningKey<br/>(Digital Signatures)"]
HKDF2 --> X25519["X25519 StaticSecret<br/>(Key Agreement)"]
ED --> VKEY["Ed25519 VerifyingKey<br/>(Public)"]
X25519 --> XPUB["X25519 PublicKey<br/>(Public)"]
VKEY --> FP["Fingerprint<br/>SHA-256(pubkey), truncated 16 bytes<br/>xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx"]
style SEED fill:#6c5ce7,color:#fff
style FP fill:#fd79a8,color:#fff
style ED fill:#ee5a24,color:#fff
style X25519 fill:#00b894,color:#fff
```
**BIP39 Mnemonic Backup**: The 32-byte seed can be encoded as a 24-word BIP39 mnemonic for human-readable backup. The same seed produces the same identity on any platform.
**featherChat Compatibility**: The identity derivation is compatible with the Warzone messenger (featherChat), allowing a shared identity across messaging and calling.
### Cryptographic Handshake
```mermaid
sequenceDiagram
participant C as Caller
participant R as Relay / Callee
Note over C: Derive identity from seed<br/>Ed25519 + X25519 via HKDF
C->>C: Generate ephemeral X25519 keypair
C->>C: Sign(ephemeral_pub || "call-offer")
C->>R: CallOffer { identity_pub, ephemeral_pub, signature, profiles }
R->>R: Verify Ed25519 signature
R->>R: Generate ephemeral X25519 keypair
R->>R: shared_secret = DH(eph_b, eph_a)
R->>R: session_key = HKDF(shared_secret, "warzone-session-key")
R->>R: Sign(ephemeral_pub || "call-answer")
R->>C: CallAnswer { identity_pub, ephemeral_pub, signature, profile }
C->>C: Verify signature
C->>C: shared_secret = DH(eph_a, eph_b)
C->>C: session_key = HKDF(shared_secret)
Note over C,R: Both have identical ChaCha20-Poly1305 session key
C->>R: Encrypted media (QUIC datagrams)
R->>C: Encrypted media (QUIC datagrams)
Note over C,R: Rekey every 65,536 packets<br/>New ephemeral DH + HKDF mix
```
### Encryption Details
| Component | Algorithm | Purpose |
|-----------|-----------|---------|
| Identity signing | Ed25519 | Authenticate handshake messages |
| Key agreement | X25519 (ephemeral) | Derive shared secret |
| Key derivation | HKDF-SHA256 | Derive session key from shared secret |
| Media encryption | ChaCha20-Poly1305 | Encrypt audio payloads (16-byte tag) |
| Nonce construction | Deterministic from sequence number | No nonce reuse, no state sync needed |
| Anti-replay | Sliding window (64-packet) | Reject duplicate/old packets |
| Forward secrecy | Rekey every 65,536 packets | New ephemeral DH + HKDF mix |
**Why ChaCha20-Poly1305 over AES-GCM**:
- Faster on hardware without AES-NI (ARM phones, Raspberry Pi relays)
- Inherently constant-time (add-rotate-XOR only)
- Compatible with Warzone messenger (featherChat)
- Same 16-byte authentication tag overhead as AES-GCM
**AEAD with AAD**: The MediaHeader is used as Associated Authenticated Data. The header is authenticated but not encrypted, allowing relays to read routing information (block ID, sequence number) without decrypting the payload.
### Trust on First Use (TOFU)
Clients remember the relay's TLS certificate fingerprint after first connection. If the fingerprint changes on a subsequent connection, the desktop client shows a "Server Key Changed" warning dialog. The relay derives its TLS certificate deterministically from its persisted identity seed, so the fingerprint is stable across restarts.
## Relay Architecture
### Room Mode (Default SFU)
In room mode, the relay acts as a Selective Forwarding Unit. Clients join named rooms via the QUIC SNI (Server Name Indication) field. The relay forwards each participant's encrypted packets to all other participants in the room without decoding or re-encoding.
```mermaid
graph TB
subgraph "Room Mode (SFU)"
C1[Client 1] -->|"QUIC SNI=room-hash"| RM[Room Manager]
C2[Client 2] -->|"QUIC SNI=room-hash"| RM
C3[Client 3] -->|"QUIC SNI=room-hash"| RM
RM --> R1[Room 'podcast']
R1 -->|fan-out| C1
R1 -->|fan-out| C2
R1 -->|fan-out| C3
end
style RM fill:#ff9f43,color:#fff
style R1 fill:#fdcb6e
```
**SFU vs MCU trade-off**: SFU was chosen because it preserves end-to-end encryption (the relay never sees plaintext audio). An MCU would need to decode, mix, and re-encode, breaking E2E encryption. The trade-off is O(N) bandwidth at the relay for N participants.
### Forward Mode
With `--remote`, the relay forwards all traffic to a remote relay. Used for chaining relays across lossy or censored links:
```
Client --> Relay A (--remote B) --> Relay B --> Destination Client
```
The relay pipeline in forward mode: FEC decode, jitter buffer, then FEC re-encode for the next hop.
## Federation
### Overview
Two or more relays form a federation mesh. Each relay is an independent SFU. When configured to trust each other, they bridge **global rooms** -- participants on relay A in a global room hear participants on relay B in the same room.
### Configuration
Federation uses three TOML configuration sections:
- `[[peers]]` -- outbound connections to peer relays (url + TLS fingerprint)
- `[[trusted]]` -- inbound connections accepted from relays (TLS fingerprint only)
- `[[global_rooms]]` -- room names to bridge across all federated peers
### Federation Topology
```mermaid
graph TB
subgraph "Relay A (EU)"
A_RM[Room Manager]
A_FM[Federation Manager]
A1[Alice - local]
A2[Bob - local]
A_RM --> A_FM
end
subgraph "Relay B (US)"
B_RM[Room Manager]
B_FM[Federation Manager]
B1[Charlie - local]
B_RM --> B_FM
end
A_FM <-->|"QUIC SNI='_federation'<br/>GlobalRoomActive/Inactive<br/>Media forwarding"| B_FM
A1 -->|media| A_RM
A2 -->|media| A_RM
B1 -->|media| B_RM
A_RM -->|"federated fan-out"| A1
A_RM -->|"federated fan-out"| A2
B_RM -->|"federated fan-out"| B1
style A_FM fill:#6c5ce7,color:#fff
style B_FM fill:#6c5ce7,color:#fff
style A_RM fill:#ff9f43,color:#fff
style B_RM fill:#ff9f43,color:#fff
```
### Protocol
1. On startup, each relay connects to all configured `[[peers]]` via QUIC with SNI `"_federation"`
2. After QUIC handshake, sends `FederationHello { tls_fingerprint }` for identity verification
3. Peer verifies the fingerprint against its `[[trusted]]` or `[[peers]]` list
4. When a local participant joins a global room, sends `GlobalRoomActive { room }` to all peers
5. When the last local participant leaves, sends `GlobalRoomInactive { room }`
6. Media is forwarded as `[room_hash:8][original_media_packet]` -- the relay does not decrypt
### What Relays Do NOT Do
- **No transcoding** -- media passes through as-is
- **No re-encryption** -- packets are already encrypted E2E
- **No central coordinator** -- each relay independently connects to configured peers
- **No automatic peer discovery** -- peers must be explicitly configured
### Failure Handling
- If a peer goes down, local rooms continue working; federated participants disappear from presence
- Reconnection: every 30 seconds with exponential backoff up to 5 minutes
- If a peer restarts with a different identity, the fingerprint check fails with a clear log message
## Jitter Buffer
The jitter buffer balances latency vs quality:
| Setting | Client | Relay |
|---------|--------|-------|
| Target depth | 10 packets (200ms) | 50 packets (1s) |
| Minimum before playout | 3 packets (60ms) | 25 packets (500ms) |
| Maximum cap | 250 packets (5s) | 250 packets (5s) |
The relay uses a deeper buffer to absorb jitter from lossy inter-relay links. The client uses a shallower buffer for lower latency.
The adaptive playout delay tracks jitter via exponential moving average and adjusts the target depth:
```
target_delay = ceil(jitter_ema / 20ms) + 2
```
**Known limitation**: The current jitter buffer does not use timestamp-based playout scheduling. It relies on sequence-number ordering only, which can lead to drift during long calls.
## Signal Messages
Signal messages are sent over reliable QUIC streams as length-prefixed JSON:
```
[4-byte length prefix][serde_json payload]
```
| Message | Purpose |
|---------|---------|
| `CallOffer` | Identity, ephemeral key, signature, supported profiles |
| `CallAnswer` | Identity, ephemeral key, signature, chosen profile |
| `AuthToken` | featherChat bearer token for relay authentication |
| `Hangup` | Reason: Normal, Busy, Declined, Timeout, Error |
| `Hold` / `Unhold` | Call hold state |
| `Mute` / `Unmute` | Mic mute state |
| `Transfer` | Call transfer to another relay/fingerprint |
| `Rekey` | New ephemeral key for forward secrecy |
| `QualityUpdate` | Quality report + recommended profile |
| `Ping` / `Pong` | Latency measurement (timestamp_ms) |
| `RoomUpdate` | Participant list changes |
| `PresenceUpdate` | Federation presence gossip |
| `RouteQuery` / `RouteResponse` | Presence discovery for routing |
| `FederationHello` | Relay identity during federation setup |
| `GlobalRoomActive` / `GlobalRoomInactive` | Federation room bridging |
## Test Coverage
272 tests across all crates, 0 failures:
| Crate | Tests | Key Coverage |
|-------|-------|-------------|
| wzp-proto | 41 | Wire format, jitter buffer, quality tiers, mini-frames, trunking |
| wzp-codec | 31 | Opus/Codec2 roundtrip, silence detection, noise suppression |
| wzp-fec | 22 | RaptorQ encode/decode, loss recovery, interleaving |
| wzp-crypto | 34 + 28 compat | Encrypt/decrypt, handshake, anti-replay, featherChat identity |
| wzp-transport | 2 | QUIC connection setup |
| wzp-relay | 40 + 4 integration | Room ACL, session mgmt, metrics, probes, mesh, trunking |
| wzp-client | 30 + 2 integration | Encoder/decoder, quality adapter, silence, drift, sweep |
| wzp-web | 2 | Metrics |
## Build Requirements
- **Rust** 1.85+ (2024 edition)
- **Linux**: cmake, pkg-config, libasound2-dev (for audio feature)
- **macOS**: Xcode command line tools (CoreAudio included)
- **Android**: NDK r27c, cmake 3.28+ (from pip)

View File

@@ -0,0 +1,201 @@
# PRD: Adaptive Quality Control (Auto Codec)
## Problem
When a user selects "Auto" quality, the system currently just starts at Opus 24k (GOOD) and never changes. There is no runtime adaptation — if the network degrades mid-call, audio breaks up instead of gracefully stepping down to a lower bitrate codec. Conversely, if the network is excellent, the user stays on 24k when they could have studio-quality 64k.
The relay already sends `QualityReport` messages with loss % and RTT, and a `QualityAdapter` exists in `call.rs` that classifies network conditions into GOOD/DEGRADED/CATASTROPHIC — but none of this is wired into the Android or desktop engines.
## Solution
Wire the existing `QualityAdapter` into both engines so that "Auto" mode continuously monitors network quality and switches codecs mid-call. The full quality range should be used:
```
Excellent network → Studio 64k (best quality)
Good network → Opus 24k (default)
Degraded network → Opus 6k (lower bitrate, more FEC)
Poor network → Codec2 3.2k (vocoder, heavy FEC)
Catastrophic → Codec2 1.2k (minimum viable voice)
```
## Architecture
```
┌─────────────────────┐
Relay ──────────► │ QualityReport │ loss %, RTT, jitter
│ (every ~1s) │
└────────┬────────────┘
┌─────────────────────┐
│ QualityAdapter │ classify + hysteresis
│ (3-report window) │
└────────┬────────────┘
│ recommend new profile
┌──────────────┴──────────────┐
│ │
▼ ▼
┌────────────────┐ ┌────────────────┐
│ Encoder │ │ Decoder │
│ set_profile() │ │ (auto-switch │
│ + FEC update │ │ already works)│
└────────────────┘ └────────────────┘
```
## Existing Infrastructure
### What already exists (in `crates/wzp-client/src/call.rs`)
1. **`QualityAdapter`** (lines 97-196):
- Sliding window of `QualityReport` messages
- `classify()`: loss > 15% or RTT > 200ms → CATASTROPHIC, loss > 5% or RTT > 100ms → DEGRADED, else → GOOD
- `should_switch()`: hysteresis — requires 3 consecutive reports recommending the same profile before switching
- Prevents oscillation between profiles
2. **`QualityReport`** (in `wzp-proto/src/packet.rs`):
- Sent by relay piggy-backed on media packets
- Fields: `loss_pct` (u8, 0-255 scaled), `rtt_4ms` (u8, RTT in 4ms units), `jitter_ms`, `bitrate_cap_kbps`
3. **`CallEncoder::set_profile()`** / **`CallDecoder` auto-switch**:
- Encoder can switch codec mid-stream
- Decoder already auto-detects incoming codec from packet headers
### What's missing
1. **QualityReport ingestion** — neither Android engine nor desktop engine reads quality reports from the relay
2. **Profile switch loop** — no periodic check that feeds reports to `QualityAdapter` and applies recommended switches
3. **Upward adaptation**`QualityAdapter` only classifies into 3 tiers (GOOD/DEGRADED/CATASTROPHIC). Needs extension to recommend studio tiers when conditions are excellent (loss < 1%, RTT < 50ms)
4. **Notification to UI** — when quality changes, the UI should show the current active codec
## Requirements
### Phase 1: Basic Adaptive (3-tier)
**Both Android and Desktop:**
1. **Ingest QualityReports**: In the recv loop, extract `quality_report` from incoming `MediaPacket`s when present. Feed to `QualityAdapter`.
2. **Periodic quality check**: Every 1 second (or on each QualityReport), call `adapter.should_switch(&current_profile)`. If it returns `Some(new_profile)`:
- Switch the encoder: `encoder.set_profile(new_profile)`
- Update FEC encoder: `fec_enc = create_encoder(&new_profile)`
- Update frame size if changed (e.g., 20ms → 40ms)
- Log the switch
3. **Frame size adaptation on switch**: When switching from 20ms to 40ms frames (or vice versa):
- Android: update `frame_samples` variable, resize `capture_buf`
- Desktop: same — the send loop reads `frame_samples` dynamically
4. **UI indicator**: Show current active codec in the call screen stats line.
- Android: add to `CallStats` and display in stats text
- Desktop: add to `get_status` response and display in stats div
5. **Only in Auto mode**: Adaptive switching should only happen when the user selected "Auto". If they manually selected a profile, respect their choice.
### Phase 2: Extended Range (5-tier)
Extend `QualityAdapter::classify()` to use the full codec range:
| Condition | Profile | Codec |
|-----------|---------|-------|
| loss < 1% AND RTT < 30ms | STUDIO_64K | Opus 64k |
| loss < 1% AND RTT < 50ms | STUDIO_48K | Opus 48k |
| loss < 2% AND RTT < 80ms | STUDIO_32K | Opus 32k |
| loss < 5% AND RTT < 100ms | GOOD | Opus 24k |
| loss < 15% AND RTT < 200ms | DEGRADED | Opus 6k |
| loss >= 15% OR RTT >= 200ms | CATASTROPHIC | Codec2 1.2k |
With hysteresis:
- **Downgrade**: 3 consecutive reports (fast reaction to degradation)
- **Upgrade**: 5 consecutive reports (slow, cautious improvement)
- **Studio upgrade**: 10 consecutive reports (very conservative — avoid bouncing to 64k on brief good patches)
### Phase 3: Bandwidth Probing
Rather than relying solely on loss/RTT:
1. Start at GOOD
2. After 10 seconds of stable call, probe upward by switching to STUDIO_32K
3. If no quality degradation after 5 seconds, probe to STUDIO_48K
4. If degradation detected, immediately fall back
5. This discovers the true available bandwidth rather than guessing from loss stats
## Implementation Plan
### Android (`crates/wzp-android/src/engine.rs`)
```rust
// In the recv loop, after decoding:
if let Some(ref qr) = pkt.quality_report {
quality_adapter.ingest(qr);
}
// Periodic check (every 50 frames ≈ 1 second):
if auto_profile && frames_decoded % 50 == 0 {
if let Some(new_profile) = quality_adapter.should_switch(&current_profile) {
info!(from = ?current_profile.codec, to = ?new_profile.codec, "auto: switching quality");
let _ = encoder_ref.lock().set_profile(new_profile);
fec_enc_ref.lock() = create_encoder(&new_profile);
current_profile = new_profile;
frame_samples = frame_samples_for(&new_profile);
// Resize capture buffer if needed
}
}
```
**Challenge**: The encoder is in the send task and the quality reports arrive in the recv task. Need shared state (AtomicU8 for profile index, or a channel).
**Recommended approach**: Use an `AtomicU8` that the recv task writes and the send task reads:
```rust
let pending_profile = Arc::new(AtomicU8::new(0xFF)); // 0xFF = no change
// Recv task: when adapter recommends switch
pending_profile.store(new_profile_index, Ordering::Release);
// Send task: check at frame boundary
let p = pending_profile.swap(0xFF, Ordering::Acquire);
if p != 0xFF { /* apply switch */ }
```
### Desktop (`desktop/src-tauri/src/engine.rs`)
Same pattern. The desktop engine already has separate send/recv tasks with shared atomics for mic_muted, etc. Add a `pending_profile: Arc<AtomicU8>` following the same pattern.
### Desktop CLI (`crates/wzp-client/src/call.rs`)
The `CallEncoder` already has `set_profile()`. The `CallDecoder` already auto-switches. Just need to:
1. Add `QualityAdapter` to `CallDecoder`
2. Feed quality reports in `ingest()`
3. Check `should_switch()` in `decode_next()`
4. Emit the recommendation via a callback or return value
## Testing
1. **Local test with tc/netem**: Use Linux traffic control to simulate loss/latency:
```bash
# Simulate 10% loss, 150ms RTT
tc qdisc add dev lo root netem loss 10% delay 75ms
# Run 2 clients in auto mode, verify they switch to DEGRADED
```
2. **CLI test**: Run `wzp-client --profile auto` between two instances with simulated network conditions
3. **Relay quality reports**: Verify the relay actually sends QualityReport messages. If it doesn't yet, that needs to be implemented first (check relay code).
## Open Questions
1. **Does the relay currently send QualityReports?** If not, Phase 1 is blocked until the relay implements per-client loss/RTT tracking and report generation. The relay sees all packets and can compute loss % per sender.
2. **Codec2 3.2k placement**: Should auto mode use Codec2 3.2k between DEGRADED and CATASTROPHIC? It's 20ms frames (lower latency than Opus 6k's 40ms) but speech-only quality.
3. **Cross-client adaptation**: If client A is on GOOD and client B auto-adapts to CATASTROPHIC, client A still sends Opus 24k. Client B can decode it fine (auto-switch on recv). But should A also be told to lower quality to save B's bandwidth? This requires signaling between clients.
## Milestones
| Phase | Scope | Effort | Dependency |
|-------|-------|--------|------------|
| 0 | Verify relay sends QualityReports | 0.5 day | None |
| 1a | Wire QualityAdapter in Android engine | 1 day | Phase 0 |
| 1b | Wire QualityAdapter in desktop engine | 1 day | Phase 0 |
| 1c | UI indicator (current codec) | 0.5 day | Phase 1a/1b |
| 2 | Extended 5-tier classification | 0.5 day | Phase 1 |
| 3 | Bandwidth probing | 2 days | Phase 2 |

View File

@@ -0,0 +1,198 @@
# PRD: Coordinated Codec Switching (Relay-Judged Quality)
## Problem
The current adaptive quality system (`QualityAdapter` in call.rs) exists but isn't wired into either engine. Clients encode at a fixed quality chosen at call start. When network conditions change mid-call, audio degrades instead of gracefully stepping down. When conditions improve, clients stay on low quality unnecessarily.
Additionally, in SFU mode with multiple participants, uncoordinated codec switching creates asymmetry: if client A upgrades to 64k while B stays on 24k, bandwidth is wasted. Participants should switch together.
## Solution
The **relay acts as the quality judge** since it sees both sides of every connection. It monitors packet loss, jitter, and RTT per participant, then signals quality recommendations. Clients react to these signals with coordinated codec switches.
## Architecture
```
┌─────────┐ ┌─────────┐ ┌─────────┐
│ Client A │◄──────►│ Relay │◄──────►│ Client B │
│ │ │ (judge) │ │ │
│ Encoder │ │ │ │ Encoder │
│ Decoder │ │ Monitor │ │ Decoder │
└─────────┘ │ per-peer│ └─────────┘
│ quality │
└────┬────┘
Quality Signals:
- StableSignal (conditions good)
- DegradeSignal (conditions bad)
- UpgradeProposal (try higher quality?)
- UpgradeConfirm (all agreed, switch at T)
```
## Quality Classification (Relay-Side)
The relay monitors each participant's connection quality:
| Condition | Classification | Action |
|-----------|---------------|--------|
| loss >= 15% OR RTT >= 200ms | Critical | Immediate downgrade signal |
| loss >= 5% OR RTT >= 100ms | Degraded | Downgrade signal after 3 reports |
| loss < 2% AND RTT < 80ms | Good | Stable signal |
| loss < 1% AND RTT < 50ms for 30s | Excellent | Upgrade proposal |
| loss < 0.5% AND RTT < 30ms for 60s | Studio | Studio upgrade proposal |
## Coordinated Switching Protocol
### Downgrade (fast, safety-first)
1. Relay detects degradation for ANY participant
2. Relay sends `QualityUpdate { recommended_profile: DEGRADED }` to ALL participants
3. ALL participants immediately switch encoder to the recommended profile
4. No negotiation — downgrade is mandatory and instant
### Upgrade (slow, consensual)
1. Relay detects sustained good conditions for ALL participants (threshold: 30s stable)
2. Relay sends `UpgradeProposal { target_profile, switch_timestamp }` to all
3. Each client responds: `UpgradeAccept` or `UpgradeReject`
4. If ALL accept within 5s → Relay sends `UpgradeConfirm { profile, switch_at_ms }`
5. All clients switch encoder at the agreed timestamp (relative to session clock)
6. If ANY rejects or times out → upgrade cancelled, stay on current profile
### Asymmetric Encoding (SFU optimization)
In SFU mode, each client encodes independently. The relay could allow:
- Client A (strong connection): encode at 64k
- Client B (weak connection): encode at 6k
- Relay forwards A's 64k to B's decoder (auto-switch handles it)
- B benefits from A's quality without needing to send at 64k
This requires NO protocol changes — just each client independently following the relay's recommendation for their own encoding quality. The decoder already handles any codec.
### Split Network Consideration
If participant A has great quality but participant C has terrible quality:
- Option 1: **Match weakest link** — everyone encodes at C's level (current approach, simple)
- Option 2: **Per-participant recommendations** — A encodes at 64k, C encodes at 6k. B (good connection) receives and decodes both. Works because decoders auto-switch per packet.
- Option 3: **Relay transcoding** — relay re-encodes A's 64k as 6k for C. Adds CPU on relay, but saves bandwidth for C. Future feature.
Recommended: start with Option 1 (match weakest), add Option 2 later.
## Signal Messages (New/Modified)
```rust
/// Quality signal from relay to client
QualityDirective {
/// Recommended profile to use for encoding
recommended_profile: QualityProfile,
/// Reason for the recommendation
reason: QualityReason,
}
enum QualityReason {
/// Network conditions require this quality level
NetworkCondition,
/// Coordinated upgrade — all participants agreed
CoordinatedUpgrade,
/// Coordinated downgrade — weakest link determines level
CoordinatedDowngrade,
}
/// Upgrade proposal from relay
UpgradeProposal {
target_profile: QualityProfile,
/// Milliseconds from now when the switch would happen
switch_delay_ms: u32,
}
/// Client response to upgrade proposal
UpgradeResponse {
accepted: bool,
}
/// Confirmed upgrade — all clients switch at this time
UpgradeConfirm {
profile: QualityProfile,
/// Session-relative timestamp to switch (ms since call start)
switch_at_session_ms: u64,
}
```
## Relay-Side Implementation
### Per-Participant Quality Tracking
```rust
struct ParticipantQuality {
/// Sliding window of recent observations
loss_samples: VecDeque<f32>, // last 30 seconds
rtt_samples: VecDeque<u32>, // last 30 seconds
jitter_samples: VecDeque<u32>,
/// Current classification
classification: QualityClass,
/// How long current classification has been stable
stable_since: Instant,
}
```
### Quality Monitor Task (on relay)
Runs alongside the SFU forwarding loop:
1. Every 1 second, compute per-participant quality from QUIC connection stats
2. Classify each participant
3. If ANY participant degrades → send downgrade to ALL
4. If ALL participants stable for threshold → propose upgrade
5. Track upgrade negotiation state
### Integration with Existing Code
The relay already has access to:
- `QuinnTransport::path_quality()` → loss, RTT, jitter, bandwidth estimates
- `QualityReport` embedded in media packet headers
- Per-session metrics in `RelayMetrics`
The quality monitor just needs to read these existing metrics and produce signals.
## Client-Side Implementation
### Handling Quality Signals
In the recv loop (both Android engine and desktop engine):
```rust
SignalMessage::QualityDirective { recommended_profile, .. } => {
// Immediate: switch encoder to recommended profile
encoder.set_profile(recommended_profile)?;
fec_enc = create_encoder(&recommended_profile);
frame_samples = frame_samples_for(&recommended_profile);
info!(codec = ?recommended_profile.codec, "quality directive: switched");
}
```
### P2P Quality (simpler case)
For P2P calls (no relay), both clients directly observe quality:
1. Each client runs its own `QualityAdapter` on the direct connection
2. When quality changes, client proposes to peer via signal
3. Simpler negotiation: only 2 parties, no relay middleman
4. Same coordinated switching logic, just peer-to-peer signals
## Backporting P2P → Relay
The quality monitoring and codec switching logic is identical:
- **P2P**: client observes quality directly → proposes switch to peer
- **Relay**: relay observes quality → proposes switch to all clients
The only difference is WHO makes the decision (client vs relay) and HOW many participants need to agree (2 vs N).
Implementation strategy: build for P2P first (simpler, 2 parties), then wrap the same logic with relay-mediated signals for SFU mode.
## Milestones
| Phase | Scope | Effort |
|-------|-------|--------|
| 1 | Relay-side quality monitor (per-participant tracking) | 1 day |
| 2 | Downgrade signal (immediate, match weakest) | 1 day |
| 3 | Client handling of QualityDirective | 1 day (both engines) |
| 4 | Upgrade proposal + negotiation protocol | 2 days |
| 5 | P2P quality adaptation (direct observation) | 1 day |
| 6 | Per-participant asymmetric encoding (Option 2) | 1 day |

170
docs/PRD-delegated-trust.md Normal file
View File

@@ -0,0 +1,170 @@
# PRD: Delegated Trust for Relay Federation
## Problem
In the current federation model, when Relay 1 trusts Relay 2, and Relay 2 forwards media from Relay 3, Relay 1 has no way to know or control that Relay 3's traffic is reaching it. This is a trust gap — any relay in the chain can introduce untrusted traffic.
**Example:** Relay 1 (trusted zone) ←→ Relay 2 (hub) ←→ Relay 3 (unknown)
Relay 1 explicitly trusts Relay 2. But Relay 2 forwards Relay 3's media to Relay 1 without Relay 1's consent. Relay 1 receives media that originated from an entity it never approved.
## Solution
Add a `delegate` flag to `[[trusted]]` entries. When `delegate = true`, the relay accepts media forwarded through the trusted peer from relays that the trusted peer vouches for. When `delegate = false` (default), only media originating from explicitly trusted/peered relays is accepted.
## Trust Levels
| Config | Meaning |
|--------|---------|
| `[[peers]]` | "I connect to you and trust your identity" |
| `[[trusted]]` | "I accept connections from you" |
| `[[trusted]] delegate = true` | "I accept connections from you AND from relays you vouch for" |
| No entry | "I reject your connections and drop your forwarded media" |
## Configuration
```toml
# Relay 1: trusts Relay 2 and delegates trust
[[trusted]]
fingerprint = "relay-2-tls-fingerprint"
label = "Relay 2 (Hub)"
delegate = true # Accept relays that Relay 2 forwards from
# Without delegate (default = false):
[[trusted]]
fingerprint = "relay-4-tls-fingerprint"
label = "Relay 4"
# delegate = false (implicit default)
# Only direct media from Relay 4 is accepted
```
## Protocol Changes
### Relay-to-Relay Media Authorization
When Relay 2 forwards media from Relay 3 to Relay 1, the datagram needs to carry origin information so Relay 1 can decide whether to accept it.
**Option A: Origin tag in datagram** (recommended)
Extend the federation datagram format:
```
[room_hash: 8 bytes][origin_relay_fp: 8 bytes][media_packet]
```
The 8-byte origin fingerprint identifies which relay originally produced the media. The forwarding relay (Relay 2) sets this to the source relay's fingerprint. Relay 1 checks:
1. Is the origin relay directly trusted? → accept
2. Is the forwarding relay trusted with `delegate = true`? → accept
3. Otherwise → drop
**Option B: Trust announcement signal**
When Relay 2 connects to Relay 1, it sends a `FederationTrustChain` signal listing which relays it will forward from:
```rust
FederationTrustChain {
/// Fingerprints of relays this peer may forward media from
vouched_relays: Vec<String>,
}
```
Relay 1 checks each fingerprint against its policy:
- If Relay 2 has `delegate = true` in Relay 1's config → accept all listed relays
- If Relay 2 has `delegate = false` → reject, only accept direct media from Relay 2
Option B is simpler to implement (no datagram format change) but less granular.
### Recommended: Option B for v1, Option A for v2
Option B is simpler — the trust chain is established at connection time, not per-datagram. The forwarding relay announces what it will forward, and the receiving relay approves or rejects upfront.
## Implementation
### Config Changes
```rust
#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct TrustedConfig {
pub fingerprint: String,
#[serde(default)]
pub label: Option<String>,
/// When true, also accept media forwarded through this relay from
/// relays it vouches for. Default: false.
#[serde(default)]
pub delegate: bool,
}
```
### Federation Signal
```rust
/// Sent after FederationHello — lists relays this peer will forward from.
FederationTrustChain {
/// TLS fingerprints of relays whose media may be forwarded through us.
vouched_relays: Vec<String>,
}
```
### Forwarding Authorization
In `handle_datagram`, before forwarding media to local participants:
```rust
// Check if we should accept this forwarded media
let is_authorized = if source_is_direct_peer {
true // Direct peer, always accepted
} else {
// Check if the forwarding peer has delegate=true
let forwarding_peer = fm.find_trusted_by_fingerprint(forwarding_peer_fp);
forwarding_peer.map(|t| t.delegate).unwrap_or(false)
};
if !is_authorized {
warn!("dropping forwarded media from unauthorized relay chain");
return;
}
```
### Relay 2 (Hub) Behavior
When Relay 2 receives `FederationTrustChain` queries from peers:
1. Collect all directly connected peer fingerprints
2. Send `FederationTrustChain { vouched_relays }` to each peer
3. When a new relay connects, update all peers' trust chains
### Anti-Spam Properties
| Attack | Mitigation |
|--------|-----------|
| Unknown relay connects to hub | Hub rejects (not in `[[trusted]]`) |
| Hub forwards spam relay's media | Receiving relay checks delegate flag, drops if false |
| Relay spoofs origin fingerprint | Origin tag is set by the forwarding relay, not the source. The forwarding relay is trusted, so if it lies about origin, the trust is misplaced at the config level. |
| Chain amplification (A→B→C→D→...) | TTL on forwarded datagrams (decrement at each hop, drop at 0). Default TTL=2 (one intermediate relay). |
## TTL for Chain Length
Add a TTL byte to the federation datagram to limit chain depth:
```
[room_hash: 8 bytes][ttl: 1 byte][media_packet]
```
- Default TTL = 2 (allows one intermediate relay: A→B→C)
- Each forwarding relay decrements TTL
- When TTL = 0, don't forward further (only deliver to local participants)
- Configurable per-relay: `max_federation_hops = 2`
## Milestones
| Phase | Scope | Effort |
|-------|-------|--------|
| 1 | Add `delegate` field to `TrustedConfig` | 0.5 day |
| 2 | `FederationTrustChain` signal + announcement | 1 day |
| 3 | Authorization check in `handle_datagram` | 0.5 day |
| 4 | TTL in federation datagrams | 0.5 day |
| 5 | Testing: authorized vs unauthorized forwarding | 0.5 day |
## Non-Goals (v1)
- Per-room trust policies (trust Relay X only for room "android")
- Dynamic trust negotiation (relays negotiate trust level at runtime)
- Revocation (removing a relay from trust chain requires config edit + restart)
- Cryptographic proof of origin (signed datagrams from source relay)

View File

@@ -0,0 +1,22 @@
# PRD: Desktop Direct Calling — Backport SignalManager
## Problem
The desktop Tauri app has the direct calling UI (Room/Direct Call toggle, Register, Call buttons) but the backend uses inline async code in `main.rs` instead of a proper `SignalManager`. This needs to be backported from the Android refactor.
## Tasks
1. **Create `signal_mgr.rs` for desktop** — same pattern as Android, or reuse the crate directly
2. **Wire into Tauri commands**`register_signal` should use `SignalManager::connect()` + `run_recv_loop()` on a dedicated thread
3. **State polling**`get_signal_status` should call `SignalManager::get_state_json()`
4. **place_call / answer_call** — delegate to SignalManager methods
5. **Merge android branch into desktop branch** — resolve the 37 desktop-only + 90 android-only commit divergence
6. **Test** — Android calls Desktop, Desktop calls Android
## UI Fixes
1. **Default alias** — generate random name on first start (like Android does)
2. **Default room** — change from "android" to "general"
3. **Fingerprint display** — ensure full `xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx` format (not truncated)
4. **Deregister button** — ability to disconnect signal channel
5. **Call state reset** — after hangup, return to "Registered" state, not stuck on "Ringing"

View File

@@ -0,0 +1,360 @@
# PRD: DRED Integration & Opus-Tier FEC Simplification
## Problem
WarzonePhone's audio loss-recovery stack is built around classical Opus + application-level RaptorQ FEC. It was the right answer when WZP was designed, but libopus 1.5 (December 2023) introduced **Deep REDundancy (DRED)** — a neural speech-recovery feature that is strictly better than classical FEC for the loss patterns VoIP calls actually experience. We are paying real latency, bitrate, and complexity costs for protection that DRED now does better and cheaper.
Concretely, on every Opus call today we pay:
- **~40100 ms of receiver-side latency** waiting for RaptorQ block completion before decode
- **1020% bitrate overhead** from RaptorQ repair symbols (more on studio profiles)
- **~2040% codec-internal overhead** from Opus inband FEC (LBRR)
- Classical Opus PLC on loss bursts exceeding the RaptorQ block size — which sounds robotic and gap-ridden
…in exchange for bit-exact recovery of isolated single-frame losses, which is perceptually indistinguishable from classical Opus PLC for 20 ms of speech. The protection is misaligned with the failure modes.
DRED delivers:
- **Zero added receive latency** — reconstruction runs only on detected loss
- **~1 kbps flat bitrate overhead** regardless of base bitrate
- **Plausible reconstruction of bursts up to ~1 second** — DRED's headline capability, exactly the regime RaptorQ can't touch
- Neural PLC that sounds like continuous speech, not a gap
We also have a second, unrelated problem blocking adoption: our FFI crate `audiopus_sys 0.2.2` vendors **libopus 1.3**, predating DRED entirely. We cannot enable DRED without first swapping the FFI layer. The naïve choice (`opus` crate from SpaceManiac) is a trap — it depends on the same dead `audiopus_sys`. The real target is `opusic-c 1.5.5` by DoumanAsh, which vendors libopus 1.5.2 with full DRED support and documents Android NDK cross-compile.
This PRD covers the FFI swap, DRED enablement, the decision to **remove RaptorQ and Opus inband FEC from the Opus tiers entirely** (keeping RaptorQ only for Codec2 where DRED is N/A), and the jitter buffer refactor that the DRED lookahead/backfill pattern requires.
## Goals
- Replace `audiopus 0.3.0-rc.0` + `audiopus_sys 0.2.2` (dead upstream, libopus 1.3) with `opusic-c 1.5.5` + `opusic-sys 0.6.0` (active upstream, libopus 1.5.2)
- Enable DRED on every Opus profile with a tiered duration policy, lower at studio bitrates and higher at degraded bitrates
- Disable Opus inband FEC (LBRR) on all Opus profiles — opusic-c's own docs recommend this, and it overlaps DRED's job
- Remove `wzp-fec` (RaptorQ) from the Opus tiers entirely — the latency and bitrate savings are real, and DRED strictly dominates it on speech
- Keep RaptorQ + current FEC ratios on the Codec2 tiers unchanged — DRED is libopus-only, Codec2 has no neural equivalent
- Refactor `wzp-transport::jitter` to a lookahead/backfill pattern that lets DRED reconstruct loss windows when the next packet arrives, instead of the current "wait for block completion or fall through to classical PLC" policy
- Ship behind a runtime escape hatch (`AUDIO_USE_LEGACY_FEC`) for the first rollout window so we can revert to RaptorQ if DRED has surprises in real-world conditions
## Non-goals
- Changing Codec2 at all. Codec2 1200 / 3200 are outside the DRED lineage and keep their current RaptorQ protection, block sizes, and PLC path.
- Adding new Opus bitrate tiers or changing the quality adaptation thresholds. This PRD is about the protection layer, not the bitrate ladder.
- Enabling OSCE (Opus Speech Coding Enhancement — a separate libopus 1.5 neural post-processor that opusic-c exposes via an `osce` feature flag). Valuable, complementary, and free once opusic-c is in — but out of scope here to keep the PRD focused. Track as follow-up.
- Video, audio-over-MoQ, or any protocol-layer changes discussed in prior conversations.
- Touching the wzp-web / browser client. Browser Opus is a separate codepath via WebAudio / WASM libopus and is not affected by the native FFI swap.
## Background
### How the three protection mechanisms actually differ
| | Opus inband FEC (LBRR) | RaptorQ (wzp-fec) | DRED |
|---|---|---|---|
| Layer | codec-internal | application, across Opus packets | codec-internal |
| What it sends | low-bitrate copy of the *previous* frame, embedded in every packet | fountain-code repair symbols across a block | neural-coded history of the recent past |
| Protection horizon | 1 packet back | block duration (currently 100 ms, proposed 40 ms) | configurable, 01040 ms |
| Recovery granularity | 1 frame (lower quality) | 1 frame (bit-exact) | 10 ms frames (plausible reconstruction) |
| Latency cost | 0 ms | block duration on receive | 0 ms |
| Bitrate cost | ~2040% of base | `fec_ratio × base` (currently +20% GOOD, +50% DEGRADED) | ~1 kbps flat |
| Effective loss tolerance | ~single-packet losses | up to `(repair symbols / block)` losses, cliff beyond | bursts up to the configured duration |
| Content assumption | any Opus audio | any | speech (DRED model is speech-trained) |
### Why DRED dominates on the Opus tiers
Loss-scenario walkthrough (verified against opusic-c and libopus 1.5 docs):
- **1-frame loss (20 ms)**: RaptorQ recovers bit-exactly, DRED wouldn't run (classical Opus PLC is perceptually indistinguishable for single 20 ms frames). RaptorQ "wins" on paper but not on ears.
- **23 frame burst (4060 ms)**: RaptorQ at current ratio 0.2 hits its tolerance cliff. DRED handles this trivially — well within a 200 ms window.
- **510 frame burst (100200 ms)**: RaptorQ completely overwhelmed at any reasonable ratio. DRED's sweet spot.
- **10+ frame burst (>200 ms)**: RaptorQ useless. DRED at 5001000 ms still recovers.
The only scenario where RaptorQ strictly beats DRED is bit-exact recovery of isolated single-frame losses — which is perceptually irrelevant for speech. In every other scenario DRED either ties or wins.
### Why Codec2 keeps RaptorQ
DRED lives inside libopus — it does not help Codec2 at all. Codec2's classical PLC is a parametric-vocoder interpolation that produces noticeably robotic artifacts on loss. On the Codec2 tiers, RaptorQ is the only protection we have, and it should stay at current ratios (1.0 on CATASTROPHIC, 0.5 on the Codec2 3200 tier).
### The opusic-c / opusic-sys situation
- `opusic-sys 0.6.0` — FFI crate, published 2026-03-17, vendors libopus 1.5.2 via its `bundled` feature (on by default), documents Android NDK cross-compile via `ANDROID_NDK_HOME` (which our `wzp-android/build.rs` already sets). Exposes raw bindings to `opus_dred_parse`, `opus_decoder_dred_decode`, and the `OpusDRED` state struct.
- `opusic-c 1.5.5` — high-level safe wrapper. Its **encoder** side is fine: exposes `Encoder::set_dred_duration(value: u8) -> Result<(), ErrorCode>` with range `0..=104` (each unit is 10 ms, so 01040 ms configurable). Also exposes `set_bitrate`, `set_inband_fec`, `set_dtx`, `set_packet_loss`, `set_signal`, `set_complexity`, `set_bandwidth`, `set_application` on the encoder.
- **opusic-c's decoder-side DRED wrapper is NOT sufficient for our architecture.** Confirmed by reading the source of `opusic-c/src/dred.rs`:
1. `Dred::decode_to` ignores the `dred_end` output of `opus_dred_parse` (prefixed `_dred_end`), so the caller cannot know how much DRED history a given packet actually carried.
2. In `opus_decoder_dred_decode(decoder, dred, dred_offset, pcm, frame_size)`, the wrapper passes `frame_size` to BOTH the `dred_offset` and `frame_size` arguments. This looks like a bug — it means reconstruction always starts at offset `frame_size` into the DRED window, not at an arbitrary caller-chosen offset. Arbitrary-gap reconstruction (which we need for the lookahead/backfill pattern) requires proper offset control.
3. `DredPacket` is owned internally by a `Dred` instance; its internal buffer is overwritten on every `decode_to` call. We cannot hold a ring of parsed DredPackets from multiple recent arrivals — which is exactly what the lookahead/backfill jitter buffer pattern requires.
- **Decision**: use opusic-c for the encoder path (its wrapper is correct and saves work), and drop to `opusic-sys` raw FFI for the entire decoder path AND the DRED reconstruction path. Both use a single shared `DecoderHandle` so internal decoder state stays consistent. **Verified at pre-flight**: `opusic_c::Decoder.inner` is `pub(crate)`, so there is no way to reach the raw `*mut OpusDecoder` from outside opusic-c. Running two parallel decoders (one from opusic-c for audio, one from opusic-sys for DRED) would cause state drift because the DRED-only decoder wouldn't see the normal decode calls. Single unified decoder via opusic-sys is the only correct architecture.
- **Three FFI handles required** per decode session: `opusic_c::Encoder` (encoder side, unchanged), our own `DecoderHandle` wrapping `*mut OpusDecoder` from opusic-sys (for normal decode AND for the `OpusDecoder` pointer passed to `opus_decoder_dred_decode`), and a new `DredDecoderHandle` wrapping `*mut OpusDREDDecoder` from opusic-sys (passed to `opus_dred_parse`). Note: `OpusDREDDecoder` is a **separate struct** from `OpusDecoder` in libopus 1.5 — verified from opus.h. Allocation via `opus_dred_decoder_create()` (confirm exact symbol name at Phase 3a start).
- The `opus` crate from SpaceManiac (0.3.1, published 2026-01-03) is a trap: it depends on `audiopus_sys ^0.2.0` — the same dead FFI crate we're trying to get away from. Do not use.
- **Follow-up (out of scope for this PRD)**: upstream the fixes to `opusic-c/src/dred.rs` (preserve `dred_end`, fix the `dred_offset` double-pass, expose `DredPacket` externally). Worth a GitHub PR once our own implementation has proven correct. Would let us eventually delete our internal FFI wrapper.
### Critical note from opusic-c docs
From the `dred` module documentation: *"The documentation recommends disabling in-band FEC and using `Application::Voip` for optimal results."* This applies to the **codec-internal** Opus inband FEC (LBRR), not our application-level RaptorQ. The two are independent layers. This PRD disables both on Opus tiers, but for different reasons — inband FEC per upstream recommendation, RaptorQ per the analysis above.
### The libopus 1.5 loss-percentage gating quirk
In libopus 1.5, both inband FEC and DRED are gated on `OPUS_SET_PACKET_LOSS_PERC` being non-zero. If the encoder thinks loss is 0%, it will not emit DRED data even when `set_dred_duration` is configured. We must plumb a meaningful loss percentage into the encoder continuously, floored at a small non-zero value so DRED stays active even when the network is perfect. Planned floor: **5%**, overridden upward by the real `QualityReport` loss value when it exceeds the floor.
## Solution
### High-level architecture change
**Before** (per Opus frame encode path):
```
PCM → AdaptiveEncoder.encode (Opus)
→ inband FEC embedded in packet
→ wzp-fec FEC encoder (accumulate into block, generate repair symbols)
→ DATAGRAM out
```
**Before** (per Opus frame decode path):
```
DATAGRAM in → wzp-fec block assembly (wait for block, recover if possible)
→ AdaptiveDecoder.decode (Opus) / decode_lost (classical PLC)
→ PCM
```
**After** (Opus tiers):
```
PCM → OpusEncoder.encode (opusic-c, DRED enabled via set_dred_duration, inband FEC off)
→ DATAGRAM out directly (no RaptorQ block)
```
```
DATAGRAM in → jitter buffer (lookahead/backfill)
→ on frame arrival: OpusDecoder.decode
→ on detected gap: if next packet has DRED state → dred::Dred.reconstruct(gap)
else → OpusDecoder.decode_lost (classical PLC)
→ PCM
```
**After** (Codec2 tiers): unchanged. RaptorQ block encoding + classical Codec2 decode path stay exactly as they are today.
### New per-profile protection matrix
| Profile | Codec | Inband FEC | RaptorQ ratio | DRED duration | Total overhead |
|---|---|---|---|---|---|
| `STUDIO_64K` | Opus 64k | **off** | **none** | **10 frames (100 ms)** | +1 kbps |
| `STUDIO_48K` | Opus 48k | **off** | **none** | **10 frames (100 ms)** | +1 kbps |
| `STUDIO_32K` | Opus 32k | **off** | **none** | **10 frames (100 ms)** | +1 kbps |
| `GOOD` | Opus 24k | **off** | **none** | **20 frames (200 ms)** | +1 kbps |
| `NORMAL_16K` | Opus 16k | **off** | **none** | **20 frames (200 ms)** | +1 kbps |
| `DEGRADED` | Opus 6k | **off** | **none** | **50 frames (500 ms)** | +1 kbps |
| `CODEC2_3200` | Codec2 3200 | N/A | **0.5 (unchanged)** | N/A | +50% |
| `CATASTROPHIC` | Codec2 1200 | N/A | **1.0 (unchanged)** | N/A | +100% |
| `COMFORT_NOISE` | CN | — | — | — | — |
DRED duration rationale:
- **Studio tiers (100 ms)**: loss is rare on the networks where users pick studio quality. Short DRED window keeps decode-side CPU modest. Still covers multi-frame bursts that classical PLC can't touch.
- **Normal tiers (200 ms)**: balanced baseline. Handles the common VoIP loss pattern (20150 ms bursts from wifi roam, transient congestion).
- **Degraded tier (500 ms)**: users on Opus 6k are by definition on a bad link. Long DRED window buys maximum burst resilience where it matters most. Still well under the 1040 ms cap.
### Runtime escape hatch
Ship with a single environment variable / settings flag: **`AUDIO_USE_LEGACY_FEC`**. When set, the entire Opus-tier path reverts to the pre-PRD behavior: RaptorQ re-enabled at the old ratios, Opus inband FEC re-enabled, DRED disabled (`set_dred_duration(0)`). This is the rollback safety valve for the first production window.
Escape hatch semantics:
- Read once at `CallEncoder::new` / `CallDecoder::new` time. Call-scoped, not re-read mid-call.
- Exposed via Android Settings UI as a hidden "Legacy FEC (debug)" toggle, and as a CLI flag `--legacy-fec` on the desktop client.
- Logged in `DebugReporter` so we can tell which mode a call was in when diagnosing.
- Removed entirely after 2 months of stable production with no regressions reported. Removal is a follow-up PR, not part of this PRD's scope.
## Detailed design
### Phase 0 — FFI crate swap (prerequisite, no behavior change)
**Files touched:**
- `Cargo.toml` (workspace root) — replace `audiopus = "0.3.0-rc.0"` with `opusic-c = { version = "1.5.5", features = ["bundled", "dred"] }` and `opusic-sys = { version = "0.6.0", features = ["bundled"] }`. The `opusic-sys` direct dep is for the DRED decoder path below.
- `crates/wzp-codec/Cargo.toml` — update `audiopus = { workspace = true }` to `opusic-c = { workspace = true }`, add `opusic-sys = { workspace = true }`, add `bytemuck = "1"` for the i16↔u16 slice cast.
- `crates/wzp-codec/src/opus_enc.rs` — rewrite against opusic-c. API mapping:
- `audiopus::coder::Encoder::new(SampleRate::Hz48000, Channels::Mono, Application::Voip)``opusic_c::Encoder::new(Channels::Mono, SampleRate::Hz48000, Application::Voip)` (argument order swapped)
- `set_bitrate(Bitrate::BitsPerSecond(bps))``set_bitrate(Bitrate::Bits(bps))` or equivalent variant — verify at implementation time
- `set_inband_fec(true/false)``set_inband_fec(InbandFec::On/Off)` (now an enum)
- `set_packet_loss_perc(u8)``set_packet_loss(u8)` (method renamed)
- `set_dtx(bool)`, `set_signal(Signal::Voice)`, `set_complexity(u8)` — names match
- `encode(&[i16], &mut [u8])``encode_to_slice(&[u16], &mut [u8])` with `bytemuck::cast_slice::<i16, u16>(pcm)` at the call site
- `crates/wzp-codec/src/opus_dec.rs` — same-style rewrite for the `Decoder` path. Note that opusic-c's decoder methods take `decode_fec: bool` as a parameter directly (not a separate ctl).
- `vendor/audiopus_sys/` — delete the directory (only exists on `feat/desktop-audio-rewrite`, not on `android-rewrite`, so this is a no-op on the current branch but do remove the `[patch.crates-io]` block from Cargo.toml when merging back).
**Acceptance criteria:**
- `cargo check --workspace` passes on Linux x86_64, macOS, and Android NDK cross-compile.
- All existing codec unit tests in `crates/wzp-codec/src/adaptive.rs` pass unchanged. DRED is still disabled at this phase (default `set_dred_duration(0)`), so behavior is equivalent to pre-swap libopus 1.3 for call quality purposes.
- A short real-call smoke test produces audio identical to current behavior (no audible regression).
- `opusic_c::version()` at startup logs libopus version containing `1.5.2` — hard signal that the swap landed correctly.
### Phase 1 — DRED encoder enable on all Opus profiles
**Files touched:**
- `crates/wzp-codec/src/opus_enc.rs`:
- Add `fn dred_duration_for(codec: CodecId) -> u8` returning the per-profile value from the matrix above (10 / 20 / 50 frames).
- In `OpusEncoder::new`, after the existing `set_bitrate`/`set_signal`/`set_complexity` block: call `inner.set_inband_fec(InbandFec::Off)`, then `inner.set_dred_duration(dred_duration_for(profile.codec))`, then `inner.set_packet_loss(5)` as the default floor.
- Add `pub fn set_dred_duration(&mut self, frames: u8)` to allow the adaptive ladder to update DRED duration on profile switch.
- In the existing `set_profile` impl, call `set_dred_duration(dred_duration_for(profile.codec))` after `apply_bitrate`.
- `crates/wzp-codec/src/adaptive.rs`:
- `AdaptiveEncoder::set_profile` already delegates to `self.opus.set_profile` — no changes needed. DRED update rides along.
- `crates/wzp-client/src/call.rs` (and equivalent on `wzp-android/src/pipeline.rs`):
- In the `QualityReport` handler (wherever we currently call `set_expected_loss` / `set_packet_loss_perc`), also ensure the loss value is floored at 5% before passing to the Opus encoder. This is a 1-line change.
**Acceptance criteria:**
- Encoder produces DRED-enabled Opus packets. Verifiable via libopus's reference decoder in debug mode, or by wire capture + inspection — a DRED-bearing Opus packet has a larger `opus_packet_get_nb_frames` footprint than a non-DRED one of the same nominal bitrate.
- Total outgoing bitrate on Opus 24k is ~25 kbps (up from ~24 kbps) — confirms ~1 kbps DRED overhead.
- On a lossless path, decoder output is audibly identical to Phase 0.
- Escape hatch `AUDIO_USE_LEGACY_FEC=1` cleanly reverts the DRED enable (calls `set_dred_duration(0)` and `set_inband_fec(InbandFec::On)` instead).
### Phase 2 — RaptorQ removal on Opus tiers
**Files touched:**
- `crates/wzp-client/src/call.rs`:
- In `CallEncoder::encode_frame` (or wherever `wzp_fec::Encoder::add_source_symbol` is called), gate the RaptorQ path on `!profile.codec.is_opus()` — Opus frames go straight to DATAGRAM emit, Codec2 frames continue through RaptorQ.
- When a profile switch crosses the Opus↔Codec2 boundary, flush/reset the RaptorQ encoder state.
- `crates/wzp-android/src/pipeline.rs`:
- Mirror the same gate in the Android encode path.
- `crates/wzp-proto/src/packet.rs`:
- `MediaHeader.fec_block` and `fec_symbol` are still valid fields on the wire. For Opus packets we emit `fec_block = 0`, `fec_symbol = 0`, `fec_ratio_encoded = 0`. No wire format change; the receiver just sees all-zeros in the FEC fields for Opus packets and skips the FEC decoder path.
- Bump protocol version to v1 → v2? **No** — the change is semantically backward compatible because existing RaptorQ decoders handle a zero ratio correctly (ratio 0.0 means "no repair symbols expected"). Old receivers can still decode new Opus packets; they just won't see any DRED benefit because their libopus is old. This is a property we want: the opposite (new receiver, old sender) is the more common mixed-version case during rollout and also Just Works.
- `crates/wzp-client/src/call.rs``CallDecoder`:
- Symmetric change: Opus frames bypass the RaptorQ block assembly, go straight to the decoder. Only Codec2 frames (`codec_id.is_codec2()`) feed through `wzp-fec` block decoding.
**Acceptance criteria:**
- Outgoing Opus packets have `fec_ratio_encoded == 0` (verifiable with the existing wire capture tooling in `wzp-client/src/echo_test.rs`).
- On a clean network, receiver latency (measured as encode-to-playout one-way delay) drops by ~40 ms versus Phase 1. This is the primary win and should be directly measurable with the existing telemetry.
- Codec2 calls show no latency change and no packet-format change. Regression-test Codec2 3200 and Codec2 1200 specifically.
- Total outgoing bitrate on Opus 24k drops from ~28.8 kbps (24k base + 0.2 RaptorQ ratio) to ~25 kbps (24k base + ~1 kbps DRED). Direct savings observable in network telemetry.
### Phase 3 — DRED reconstruction wrapper + jitter buffer lookahead/backfill refactor
This phase is larger than originally estimated because opusic-c's decoder-side DRED wrapper is unusable for our architecture (see Background). We write our own safe wrapper over `opusic-sys` raw FFI first, then plumb it through the jitter buffer.
**Step 3a — Safe DRED reconstruction wrapper in `wzp-codec`:**
New file `crates/wzp-codec/src/dred_ffi.rs`. Wraps the raw libopus 1.5 DRED API:
- `pub struct DredState` — owns an `OpusDRED` buffer (allocated via `opusic_sys::opus_dred_alloc` or equivalent; size is fixed at 10,592 bytes per libopus 1.5). `Clone` is intentionally NOT implemented — the state is heap-owned and non-trivial to copy.
- `pub fn parse_from_packet(&mut self, decoder: &opusic_c::Decoder, packet: &[u8], max_dred_samples: i32) -> Result<DredParseResult, DredError>` — wraps `opus_dred_parse`, preserves the `dred_end` output (number of samples of history the packet carried), returns it in `DredParseResult { samples_available: i32, frames_available: u8 }`.
- `pub fn reconstruct_into(&self, decoder: &mut opusic_c::Decoder, dred_offset_samples: i32, output: &mut [i16]) -> Result<usize, DredError>` — wraps `opus_decoder_dred_decode`, takes the offset explicitly, decodes `output.len()` samples starting from that offset in the DRED window.
- All `unsafe` contained here, strict bounds checking on offsets, Rust-level panic safety. Unit tests use a reference encoder + known-good reference decoder to verify that reconstruction at specific offsets produces expected output.
- Depends on `opusic-sys` directly and on `opusic-c::Decoder` for the decoder handle. The Decoder handle must be reachable as a raw pointer; opusic-c exposes this via an unstable internal or we wrap the pointer ourselves. **Verify at implementation time** — if opusic-c doesn't expose the raw decoder pointer safely, we create our own thin Decoder wrapper in `dred_ffi.rs` using raw opusic-sys, losing the convenience of opusic-c's decoder but keeping its encoder. This is the smaller-risk fallback.
New `pub trait DredReconstructor` in `wzp-codec/src/lib.rs`:
```rust
pub trait DredReconstructor: Send {
/// Parse DRED state from an arriving Opus packet into `state`.
/// Returns number of 48 kHz samples of history available, or 0 if the packet has no DRED.
fn parse(&mut self, state: &mut DredState, packet: &[u8]) -> Result<i32, DredError>;
/// Reconstruct `output.len()` samples from `state`, starting at the given
/// sample offset (measured from the end of the DRED window going backward).
fn reconstruct(&mut self, state: &DredState, offset_samples: i32, output: &mut [i16]) -> Result<usize, DredError>;
}
```
Implement `DredReconstructor` over the `dred_ffi::DredState` + opusic-c Decoder combination. This is the clean boundary the jitter buffer will talk to.
**Step 3b — Jitter buffer refactor in `crates/wzp-transport/src/jitter.rs`:**
- Current behavior: buffer waits a fixed number of frames of jitter before emitting; on a missing slot, after a timeout it gives up and signals the decoder to run `decode_lost()` (classical Opus PLC or Codec2 PLC).
- New behavior on Opus tiers: when a frame arrives (in-order or late), first call `DredReconstructor::parse` on it to update a rolling ring of `DredState` instances tagged with their originating sequence number. When a gap is detected (missing sequence number between last-emitted and current arrival), and the ring contains a `DredState` from a nearby packet that covers the gap's sample offset, call `DredReconstructor::reconstruct` with the correct offset to synthesize the missing frames, splice them into playout, then continue normal decode.
- If no DRED state covers the gap (e.g., gap too far back, or every nearby packet was dropped), fall through to classical PLC exactly as today. The classical path stays intact as the ultimate fallback.
- Codec2 packets bypass the entire DRED ring. They are not inspected for DRED state and take the unchanged classical PLC path.
- Ring sizing: `max_dred_duration_frames` + `jitter_depth_frames` worth of `DredState` instances. At 500 ms DRED on degraded tier + 60 ms jitter depth, that's ~28 DredState instances × 10,592 bytes ≈ 300 KB. Acceptable. On studio tier with 100 ms DRED it's only ~80 KB.
- The jitter buffer takes a `Box<dyn DredReconstructor>` at construction, passed in by the call engine. `wzp-transport` does NOT take a direct dep on `opusic-c` or `opusic-sys` — it only knows about the trait defined in `wzp-codec`.
**Files touched:**
- `crates/wzp-codec/src/dred_ffi.rs` (new, ~150300 lines)
- `crates/wzp-codec/src/lib.rs` — expose `DredReconstructor`, `DredState`, `DredError` types
- `crates/wzp-codec/Cargo.toml` — add `opusic-sys = { workspace = true }` as a direct dep (already done in Phase 0)
- `crates/wzp-transport/src/jitter.rs` — lookahead/backfill refactor, DRED ring
- `crates/wzp-transport/Cargo.toml` — add `wzp-codec = { workspace = true }` (likely already present) for the trait import
- `crates/wzp-client/src/call.rs` — construct a `DredReconstructor` and pass into `CallDecoder`'s jitter buffer
- `crates/wzp-android/src/pipeline.rs` — same on Android
**Acceptance criteria:**
- Unit tests in `dred_ffi.rs`: round-trip a known speech waveform through an encoder with DRED enabled, parse the resulting packets, reconstruct at several different offsets, verify the reconstructed samples are within an energy/spectral threshold of the original. (Not bit-exact — DRED reconstruction is lossy by design.)
- Synthetic loss test on the full pipeline: inject 200 ms bursts at 10% rate into a looped call, verify the DRED reconstruction rate on receiver telemetry is ≥95% of all loss events whose gaps fall within the configured DRED duration window.
- Reconstructed audio is audibly continuous on 40200 ms bursts — no gaps, no classical-PLC robot artifact. Verified on real voice samples (not just sine tones), and on at least two distinct speaker profiles (male, female) because DRED can have voice-dependent quality.
- End-to-end latency metric is unchanged versus Phase 2 (no regression from adding the lookahead path). The DRED ring insertion on packet arrival must be O(1) in practice.
- Existing `echo_test.rs` and `drift_test.rs` pass with the new jitter buffer.
- Codec2 path uses classical PLC exclusively (no DRED invocation) because Codec2 packets don't carry DRED state. Verify by injecting loss on a Codec2 call and confirming zero DRED reconstruction telemetry events during that call.
- `wzp-transport` has no direct dependency on `opusic-sys` or `opusic-c` in its `Cargo.toml` after the refactor — only on `wzp-codec`. Verify by grepping the Cargo.toml file.
### Phase 4 — Telemetry and tooling updates
**Files touched:**
- `crates/wzp-proto/src/packet.rs``QualityReport` or equivalent telemetry message gains `dred_reconstructions: u32` as a new counter (frames reconstructed via DRED this reporting window) and `classical_plc_invocations: u32` (frames filled by Opus/Codec2 classical PLC). These are separate counters because they're different recovery mechanisms.
- `crates/wzp-relay/src/*` — relay telemetry pipeline surfaces both counters in Prometheus metrics: `wzp_dred_reconstructions_total{call_id}`, `wzp_classical_plc_total{call_id}`.
- `docs/grafana-dashboard.json` — new panel: "Loss recovery breakdown" stacked bar, DRED vs classical PLC vs clean decode, per call.
- `android/app/src/main/java/com/wzp/debug/DebugReporter.kt` — surfaces `dredReconstructions` and `classicalPlc` counts in the debug report; also logs active DRED duration and whether legacy-FEC mode is engaged.
**Acceptance criteria:**
- Grafana dashboard shows a clear visual distinction between DRED-recovered and classical-PLC-recovered frames across a test fleet of calls.
- Debug report includes the active protection mode ("DRED 200 ms" / "Legacy RaptorQ") and reconstruction counts, so incidents can be classified unambiguously.
### Phase 5 — Escape hatch removal (follow-up, ~2 months post-ship)
After 2 months of stable production with no rollbacks triggered:
- Delete `AUDIO_USE_LEGACY_FEC` handling in `opus_enc.rs` / `call.rs` / `pipeline.rs`
- Delete the Opus-tier paths of `wzp-fec` (the crate stays for Codec2)
- Delete the Android settings toggle and desktop CLI flag
- Remove the `--legacy-fec` path from smoke tests
## Critical files to modify (summary)
- `Cargo.toml` (workspace) — dep swap (audiopus → opusic-c + opusic-sys)
- `crates/wzp-codec/Cargo.toml` — dep swap + `bytemuck` for slice cast
- `crates/wzp-codec/src/opus_enc.rs` — opusic-c rewrite + DRED enable + inband FEC off
- `crates/wzp-codec/src/opus_dec.rs` — opusic-c rewrite
- `crates/wzp-codec/src/dred_ffi.rs`**new file**, safe wrapper over opusic-sys raw DRED FFI
- `crates/wzp-codec/src/lib.rs` — expose `DredReconstructor` trait, `DredState`, `DredError`
- `crates/wzp-codec/src/adaptive.rs` — verify profile switch carries DRED duration
- `crates/wzp-client/src/call.rs` — Opus/Codec2 gate on RaptorQ path, loss floor, wire DredReconstructor into CallDecoder
- `crates/wzp-android/src/pipeline.rs` — same gate, same loss floor, wire DredReconstructor
- `crates/wzp-transport/src/jitter.rs` — lookahead/backfill refactor, DRED ring, reconstruction dispatch
- `crates/wzp-transport/Cargo.toml` — verify it depends only on `wzp-codec`, not directly on opusic-*
- `crates/wzp-proto/src/packet.rs` — new telemetry counters
- `crates/wzp-relay/` — Prometheus metric exposure
- `android/app/src/main/java/com/wzp/debug/DebugReporter.kt` — debug output
- `docs/grafana-dashboard.json` — loss-recovery panel
- (delete) `vendor/audiopus_sys/` on `feat/desktop-audio-rewrite` when merging back
## Existing utilities to reuse
- `wzp_codec::resample::Downsampler48to8` / `Upsampler8to48` — unchanged, only Codec2 path uses them
- `wzp_codec::adaptive::AdaptiveEncoder` / `AdaptiveDecoder` — existing profile-switching machinery, DRED duration changes ride along
- `wzp_codec::silence::SilenceDetector` / `ComfortNoise` — unchanged
- `wzp_codec::agc::AutoGainControl` — unchanged, runs before encode as today
- `wzp_fec::RaptorQFecEncoder` / decoder — unchanged, still used for Codec2 tiers
- `wzp_client::call::QualityAdapter` — unchanged; drives profile switching, which now also reconfigures DRED duration via the existing `set_profile` path
## Verification
End-to-end testing, in order:
1. **Unit**: `cargo test -p wzp-codec` — Opus encode/decode round-trip at every profile, DRED enabled. Verify `version()` reports libopus 1.5.2.
2. **Unit**: `cargo test -p wzp-transport` — jitter buffer lookahead/backfill behavior with injected loss patterns (0%, 5%, 15%, 30%, 50% loss; isolated losses, 40 ms bursts, 200 ms bursts, 500 ms bursts).
3. **Integration**: `crates/wzp-client/src/echo_test.rs` — existing echo test must pass on all Opus profiles with <5% perceived quality regression (measure via the time-window analysis already built into `echo_test.rs`).
4. **Integration**: `crates/wzp-client/src/drift_test.rs` — latency measurement. Must show ~40 ms reduction on Opus profiles versus pre-PRD baseline. Codec2 profiles unchanged.
5. **Manual**: Android release build, real call over bad wifi (or a shaped network via `tc netem` on Linux). Burst losses of 200 ms should be perceptually continuous speech, not robotic gaps.
6. **Manual**: Same call with `AUDIO_USE_LEGACY_FEC=1` — verify behavior reverts to current production behavior. This is the pre-ship rollback rehearsal.
7. **Cross-compile**: full build matrix — Android arm64-v8a + armeabi-v7a (via `scripts/build-and-notify.sh`), macOS universal, Linux x86_64 (via `scripts/build-linux-docker.sh`). Windows cross-compile via cargo-xwin should also pass — libopus 1.5 upstream fixed the clang-cl SIMD issue that required the vendor patch on `feat/desktop-audio-rewrite`.
8. **Telemetry smoke**: deploy to staging relay, make 10 test calls, verify Grafana's new "Loss recovery breakdown" panel shows DRED reconstruction events firing on injected loss and classical-PLC on packet-loss beyond DRED's window.
## Risks and mitigations
- **Custom DRED FFI wrapper is WZP-maintained code with no second source.** opusic-c's decoder-side DRED wrapper is insufficient (see Background), so we carry our own `dred_ffi.rs` that calls `opus_dred_parse` and `opus_decoder_dred_decode` directly via opusic-sys. Bugs in this wrapper — offset arithmetic off-by-ones, lifetime errors on `OpusDRED` buffers, UB from misuse of the C API — could manifest as silent audio corruption on loss bursts, hard to diagnose. **Mitigation**: extensive unit tests in `dred_ffi.rs` using a reference encoder + reference decoder round-trip with known offsets; strict bounds checking on every `unsafe` boundary; Miri run in CI if feasible; the legacy-FEC escape hatch disables the entire DRED code path including our custom wrapper, giving us a single flag to revert any wrapper bug in production. Long-term: upstream the fixes to opusic-c (follow-up task, not blocking).
- **opusic-c's encoder-side API and internal Decoder pointer access**. Step 3a depends on being able to call opusic-sys raw functions that take an `*mut OpusDecoder` pointer while still using opusic-c's `Decoder` for normal decode. If opusic-c doesn't expose the raw pointer cleanly, we fall back to a thin opusic-sys-direct Decoder wrapper inside `dred_ffi.rs` and lose some of opusic-c's convenience. **Mitigation**: verify at the start of Phase 3 (one afternoon of reading opusic-c source). If the clean path doesn't work, the fallback is not difficult — it's what we'd have built anyway if opusic-c didn't exist.
- **DRED reconstruction quality varies by voice / content**. The neural model is trained on speech; edge cases (shouting, whispering, heavy accents, music-on-hold, cough, laughter) may reconstruct less cleanly than continuous speech. **Mitigation**: escape hatch ships from day one. If production telemetry shows perceptible quality regression on specific voice patterns, flip legacy mode for affected users while tuning. Also: classical Opus PLC remains as the third-tier fallback when DRED state is unavailable.
- **Removing RaptorQ removes bit-exact recovery**. Isolated single-packet losses are now reconstructed plausibly instead of bit-exactly. **Mitigation**: as argued in Background, bit-exactness on a single 20 ms speech frame is perceptually meaningless. The assumption is "speech is the workload" — if we ever add non-speech features (music bot, ringtones over the call path, DTMF-over-audio) we revisit.
- **libopus 1.5 DRED API stability**. **Verified at pre-flight**: opus.h in the upstream xiph/opus repository has no "experimental" marker on the DRED API declarations. The earlier characterization was incorrect. DRED shipped as a first-class feature in libopus 1.5.0 (Dec 2023) and has been iterated in 1.5.1 and 1.5.2. Google Meet and Duo ship it at scale. **Mitigation**: pin `opusic-sys` exactly (no `^` range) to ensure reproducible builds, follow upstream 1.5.x bugfixes as they land. No special stability concerns beyond normal dependency hygiene.
- **Jitter buffer refactor is the largest code change**. Jitter bugs are notoriously subtle (off-by-one on sequence wraparound, clock drift interactions, playout starvation corner cases). **Mitigation**: keep the classical-PLC path intact as the DRED fallback, so jitter bugs degrade to "current behavior" rather than "broken audio". Write targeted unit tests for the buffer at each loss-pattern scenario before touching production paths. Consider shipping Phase 3 behind a sub-flag separate from the main escape hatch, so we can independently toggle "DRED enabled but classical jitter buffer" for bisection.
- **Cross-compile surprises**. `opusic-sys` is actively maintained but our exact combination of Android NDK version / Docker builder environment / Windows cross-compile via cargo-xwin has not been tested by upstream. **Mitigation**: Phase 0 includes the full cross-compile matrix as an acceptance criterion. Any blockers surface before we touch loss-recovery behavior.
- **Wire-format compatibility during rollout**. Mixed-version calls (new sender + old receiver, or vice versa) need to keep working. **Verified at pre-flight**: traced both live receive paths (`wzp-client/src/call.rs::CallDecoder::ingest` and `wzp-android/src/engine.rs` the JNI-driven engine path), and both degrade gracefully: new-sender Opus packets with `fec_ratio_encoded=0` / `fec_block=0` / `fec_symbol=0` flow through to the jitter buffer and decode normally on old receivers. The RaptorQ decoder either ignores zero-FEC packets entirely (Android pipeline.rs gates on non-zero fec_block/fec_symbol) or accumulates them harmlessly until the 2-second staleness eviction (desktop call.rs). Old-sender packets with populated RaptorQ fields are handled by new receivers via the unchanged Codec2 path (new receivers keep wzp-fec for Codec2 tiers and simply ignore RaptorQ fields on Opus packets). **No wire format version bump required.**
- **Pre-existing desktop RaptorQ gap** (incidental finding, NOT caused by this PRD). The desktop `wzp-client/src/call.rs::CallDecoder` feeds packets into `fec_dec.add_symbol` but **never calls `fec_dec.try_decode`** — RaptorQ recovery is effectively dead code on the desktop path today. Main decode reads from the jitter buffer directly, falling through to classical Opus PLC on missing packets. The Android `engine.rs` path properly uses `try_decode` for recovery. This PRD does not fix the desktop gap — it's unrelated — but is noted here so nobody is surprised that removing RaptorQ from Opus tiers on the desktop client causes no measurable recovery regression (there was nothing to lose). Recommend filing a follow-up task to either fix or remove the vestigial desktop RaptorQ wiring independently of this work.
- **`AUDIO_USE_LEGACY_FEC` itself becoming permanent tech debt**. Escape hatches have a way of outliving their intended lifespan. **Mitigation**: put an explicit removal date in a `// TODO(2026-06-15): remove legacy FEC path` comment at the flag-handling site. Track in taskmaster.
## Open questions
- ~~**Does opusic-c expose `opusic_c::Decoder`'s raw inner pointer?**~~ **Resolved at pre-flight**: no, it's `pub(crate)`. We build a unified `DecoderHandle` over raw opusic-sys in `dred_ffi.rs` and use it for both normal decode and DRED reconstruction. Opusic-c is used only for the encoder side.
- **Exact opusic-sys symbol name for DRED decoder allocation**. opus.h documents the `OpusDREDDecoder` type and `opus_dred_parse`/`opus_decoder_dred_decode` functions, but the allocation function name is not in the fetched snippet. Expected to be `opus_dred_decoder_create` / `opus_dred_decoder_destroy` per libopus naming convention, but confirm at the very start of Phase 3a by reading the actual opusic-sys bindings. If the function is not exported by opusic-sys, we file a PR upstream to opusic-sys (small fix, trivially mergeable) and temporarily vendor the function declaration locally.
- **Should the 5% loss floor be configurable per profile?** Currently specified as a constant. A future refinement might make it higher at degraded tiers and lower at studio tiers, but without real telemetry we don't know if the constant is wrong. Keep as a constant for now, revisit after 1 month of production data.
- **OSCE enable**: opusic-c has an `osce` feature flag for Opus Speech Coding Enhancement, a separate libopus 1.5 neural post-processor. Out of scope for this PRD but should be the next audio-quality follow-up. Probably one-line enable once opusic-c is in.
- **Upstream PR to opusic-c**: our own `dred_ffi.rs` wrapper should be proven in production first, then the fixes upstreamed to `opusic-c/src/dred.rs` (preserve `dred_end`, fix `dred_offset` double-pass, expose `DredPacket` externally). Follow-up task, not blocking this PRD.
- **`feat/desktop-audio-rewrite` merge**: the vendored `audiopus_sys` patch on that branch becomes obsolete under this PRD. Coordinate removal with whoever owns that branch.

59
docs/PRD-mtu-discovery.md Normal file
View File

@@ -0,0 +1,59 @@
# PRD: QUIC Path MTU Discovery
## Problem
WarzonePhone uses conservative 1200-byte QUIC datagrams. Some network paths support larger MTUs (1400+), wasting bandwidth. Some broken paths (VPNs, tunnels, double-NAT, cellular) have MTU < 1200, causing silent packet drops — this may explain why Opus 64k fails on some paths while 24k works (larger encoded frames + FEC repair packets).
## Solution
Enable Quinn's built-in Path MTU Discovery (PMTUD) and handle edge cases:
1. PMTUD probes larger packet sizes and discovers the actual path MTU
2. Graceful fallback when datagrams exceed discovered MTU
3. Expose MTU in metrics for debugging
## Implementation
### Phase 1: Enable PMTUD in Quinn
`crates/wzp-transport/src/config.rs` — update `transport_config()`:
```rust
// Enable PMTUD (Quinn default is enabled, but we should ensure it)
config.mtu_discovery_config(Some(quinn::MtuDiscoveryConfig::default()));
// Set minimum MTU for safety (some paths can't handle 1200)
// Quinn default min is 1200, which is the QUIC spec minimum
```
Quinn's `MtuDiscoveryConfig` has:
- `interval`: how often to probe (default: 600s)
- `upper_bound`: max MTU to probe (default: 1452 for IPv4)
- `minimum_change`: min MTU increase to be worth probing (default: 20)
### Phase 2: Handle MTU-related Failures
In federation forwarding (`send_raw_datagram`), if the datagram exceeds the connection's current MTU, Quinn returns an error. Handle gracefully:
- Log warning with packet size vs MTU
- Drop the packet (don't crash)
- Track in metrics: `wzp_relay_mtu_exceeded_total`
### Phase 3: Codec-Aware MTU
When the path MTU is small, the relay or client should:
- Prefer lower-bitrate codecs (smaller packets)
- Reduce FEC ratio (fewer repair packets)
- This feeds into the adaptive quality system
### Phase 4: Expose MTU in Stats
- Add `path_mtu` to relay metrics (per peer)
- Add `path_mtu` to client stats (visible in UI)
- Log MTU on connection establishment
## Non-Goals (v1)
- Datagram fragmentation (QUIC datagrams are atomic — either fit or don't)
- Manual MTU override per relay config
- MTU-based codec selection (future, needs adaptive quality)
## Effort: 1 day

146
docs/PRD-p2p-direct.md Normal file
View File

@@ -0,0 +1,146 @@
# PRD: Peer-to-Peer Direct Calls (No Relay)
## Problem
All calls currently route through a relay, even 1-on-1 calls between clients that could reach each other directly. This adds latency (2x hop), creates a single point of failure, and requires trusting the relay operator (even though media is encrypted, the relay sees metadata).
## Solution
For 1-on-1 calls, clients attempt a direct QUIC connection using STUN-discovered addresses. If NAT traversal succeeds, media flows directly between peers. If it fails, fall back to relay-assisted mode (current behavior).
## Architecture
```
Preferred (P2P):
Client A ←──QUIC direct──→ Client B
(no relay in media path, true E2E)
Fallback (Relay):
Client A ──→ Relay ──→ Client B
(current model)
Hybrid discovery:
Client A → Relay (signaling only) → Client B
↓ ↓
STUN server STUN server
↓ ↓
Discover public IP:port Discover public IP:port
↓ ↓
Exchange candidates via relay signaling
↓ ↓
Attempt direct QUIC connection ←──→
```
## Why P2P = True E2E
- QUIC TLS handshake establishes encrypted tunnel directly between A and B
- No third party sees the traffic
- Certificate pinning via identity fingerprints: each client derives their TLS cert from their Ed25519 seed (same as relay identity). During QUIC handshake, both sides verify the peer's cert fingerprint against the known identity
- MITM elimination: if A knows B's fingerprint (from prior call, QR code, or identity server), any interceptor presents a different cert → fingerprint mismatch → connection rejected
- Stronger guarantee than relay-assisted: user doesn't need to trust relay operator
## Requirements
### Phase 1: STUN Discovery
1. **STUN client**: lightweight UDP-based STUN client to discover public IP:port
- Use existing public STUN servers (stun.l.google.com:19302, etc.)
- Or run a STUN server alongside the relay
- Discover: local addresses, server-reflexive addresses (STUN), relay candidates (TURN/relay fallback)
2. **Candidate gathering**: on call initiation, gather all candidates:
- Host candidates: local network interfaces
- Server-reflexive: STUN-discovered public IP:port
- Relay candidate: the relay's address (fallback)
3. **Candidate exchange**: via relay signaling channel (existing `IceCandidate` signal message)
- A sends candidates to relay → relay forwards to B
- B sends candidates to relay → relay forwards to A
### Phase 2: Direct Connection
1. **QUIC hole punching**: both clients simultaneously attempt QUIC connections to each other's candidates
- Quinn supports connecting to multiple addresses
- First successful connection wins
- Timeout after 3 seconds, fall back to relay
2. **Identity verification**: during QUIC handshake, verify peer's TLS cert fingerprint
- `server_config_from_seed()` already exists — derive client cert from identity seed
- Both sides present certs (mutual TLS)
- Verify fingerprint matches expected identity
3. **Media flow**: once connected, use existing `QuinnTransport` for media + signals
- Same `send_media()` / `recv_media()` API
- Same codec pipeline, FEC, jitter buffer
- No code changes needed in the call engine
### Phase 3: Adaptive Quality (P2P)
P2P connections have direct quality visibility — no relay middleman:
1. Both clients observe RTT, loss, jitter directly from QUIC stats
2. Adapt codec quality based on direct observations
3. Since only 2 participants, coordinated switching is simple: propose → ack → switch
This is the simplest case for adaptive quality. Once proven, backport the logic to relay-assisted mode.
### Phase 4: Hybrid Mode
1. **Call initiation**: always connect to relay for signaling
2. **Parallel attempt**: while relay call is active, attempt P2P in background
3. **Seamless migration**: if P2P succeeds, migrate media path from relay to direct
- Both clients switch simultaneously
- Relay connection kept alive for signaling (presence, room updates)
4. **Fallback**: if P2P connection drops, seamlessly fall back to relay
## Security Properties
| Property | Relay Mode | P2P Mode |
|----------|-----------|----------|
| Encryption | ChaCha20-Poly1305 (app layer) | QUIC TLS 1.3 + ChaCha20-Poly1305 |
| Key exchange | Via relay signaling | Direct QUIC handshake |
| Identity verification | TOFU (server fingerprint) | Mutual TLS cert pinning |
| Metadata privacy | Relay sees who talks to whom | No third party sees anything |
| MITM resistance | Depends on relay trust | Strong (cert pinning) |
| Forward secrecy | ECDH ephemeral keys | QUIC built-in + app-layer rekey |
## Implementation Notes
### STUN in Rust
Use `stun-rs` or `webrtc-rs` crate for STUN client. Minimal: just need Binding Request/Response to discover server-reflexive address.
### Quinn Hole Punching
Quinn's `Endpoint` can both listen and connect. For hole punching:
```rust
let endpoint = create_endpoint(bind_addr, Some(server_config))?;
// Send connect to peer's address (opens NAT pinhole)
let conn = connect(&endpoint, peer_addr, "peer", client_config).await?;
// Simultaneously, peer connects to our address
// First successful handshake wins
```
### Client TLS Certificate
Already have `server_config_from_seed()` for relays. Create `client_config_from_seed()` that presents a TLS client certificate derived from the identity seed. The peer verifies this cert's fingerprint.
### Signaling via Relay
The existing relay connection carries `IceCandidate` signals. No new infrastructure needed — just use the relay as a dumb signaling pipe for candidate exchange.
## Non-Goals (v1)
- SFU over P2P (P2P is 1-on-1 only; multi-party uses relay SFU)
- TURN server (relay acts as the fallback, no separate TURN)
- mDNS local discovery (future)
- Mesh P2P for multi-party (future, complex)
## Milestones
| Phase | Scope | Effort |
|-------|-------|--------|
| 1 | STUN client + candidate gathering | 2 days |
| 2 | QUIC hole punching + identity verification | 3 days |
| 3 | Adaptive quality on P2P connection | 2 days |
| 4 | Hybrid mode (relay + P2P, seamless migration) | 3 days |

View File

@@ -0,0 +1,178 @@
# PRD: Protocol Analyzer & Debug Tap
## 1. Relay-Side Metadata Tap (`--debug-tap`)
### Problem
When debugging federation, codec issues, or packet flow problems, there's no visibility into what's actually flowing through the relay. You have to guess from client-side logs.
### Solution
A `--debug-tap <room>` flag on the relay that logs every packet's **header metadata** for a specific room (or all rooms with `--debug-tap *`). No decryption needed — the MediaHeader is not encrypted, only the audio payload is.
### Output Format
```
[12:00:00.123] TAP room=test dir=in src=192.168.1.5:54321 seq=1234 codec=Opus24k ts=24000 fec_block=5 fec_sym=2 repair=false len=87
[12:00:00.123] TAP room=test dir=out dst=192.168.1.6:54322 seq=1234 codec=Opus24k ts=24000 fec_block=5 fec_sym=2 repair=false len=87 fan_out=2
[12:00:00.143] TAP room=test dir=in src=192.168.1.5:54321 seq=1235 codec=Opus24k ts=24960 fec_block=5 fec_sym=3 repair=false len=91
[12:00:00.500] TAP room=test dir=in src=192.168.1.6:54322 seq=0042 codec=Codec2_1200 ts=40000 fec_block=1 fec_sym=0 repair=false len=6
[12:00:01.000] TAP room=test SIGNAL type=RoomUpdate count=3 participants=[Alice,Bob,Charlie]
[12:00:05.000] TAP room=test STATS period=5s in_pkts=250 out_pkts=500 fan_out_avg=2.0 loss_detected=0 codecs_seen=[Opus24k,Codec2_1200]
```
### What it shows
- **Per-packet**: direction, source/dest, sequence number, codec ID, timestamp, FEC block/symbol, repair flag, payload size
- **Signals**: RoomUpdate, FederationRoomJoin/Leave, handshake events
- **Periodic stats**: packets in/out, average fan-out, codecs seen, detected sequence gaps (loss)
- **Federation**: room-hash tagged datagrams with source/dest relay
### Implementation
**File:** `crates/wzp-relay/src/room.rs` — in `run_participant_plain()` and `run_participant_trunked()`
After receiving a packet and before forwarding:
```rust
if debug_tap_enabled {
let h = &pkt.header;
info!(
room = %room_name,
dir = "in",
src = %addr,
seq = h.seq,
codec = ?h.codec_id,
ts = h.timestamp,
fec_block = h.fec_block,
fec_sym = h.fec_symbol,
repair = h.is_repair,
len = pkt.payload.len(),
"TAP"
);
}
```
**Activation:** `--debug-tap <room_name>` CLI flag, or `debug_tap = "test"` / `debug_tap = "*"` in TOML config.
**Performance:** Only active when enabled. When enabled, adds one `info!()` log per packet per direction. At 50 fps × 5 participants = 500 log lines/sec — acceptable for debugging, not for production.
**Output options:**
- Default: tracing log (stderr)
- `--debug-tap-file <path>`: write to a dedicated file (JSONL format for machine parsing)
### Effort: 0.5 day
---
## 2. Full Protocol Analyzer (Standalone Tool)
### Problem
The metadata tap shows packet flow but can't inspect audio content, verify encryption, or measure audio quality. For deep debugging (codec issues, resampling bugs, encryption mismatches), you need to see the actual decrypted audio.
### Solution
A standalone `wzp-analyzer` binary that either:
- **A)** Acts as a transparent proxy between client and relay (MITM mode)
- **B)** Reads a pcap/capture file with QUIC session keys (passive mode)
- **C)** Runs as a special "observer" client that joins a room in listen-only mode with all participants' consent
### Architecture
**Option C (recommended — simplest, no MITM):**
```
┌──────────────┐
Client A ────────►│ Relay │◄──────── Client B
│ │
│ (SFU) │◄──────── wzp-analyzer
└──────────────┘ (observer mode)
┌──────────────────┐
│ Decode + Analyze │
│ - Packet timing │
│ - Codec decode │
│ - Audio quality │
│ - Jitter stats │
│ - Waveform plot │
└──────────────────┘
```
The analyzer joins the room as a regular participant (receives all media via SFU forwarding) but doesn't send audio. It decodes everything it receives and produces analysis.
**Limitation:** End-to-end encrypted payloads can't be decoded without session keys. The analyzer would either:
1. Need the session key (shared out-of-band for debugging)
2. Or only analyze unencrypted headers + timing (same as the relay tap, but from client perspective with jitter buffer simulation)
For now, since encryption is not fully enforced in the current codebase (the crypto session is established but the actual ChaCha20 encryption of payloads is TODO in some paths), the analyzer can decode raw Opus/Codec2 payloads directly.
### Features
**Real-time display (TUI):**
```
┌─ wzp-analyzer: room "podcast" on 193.180.213.68:4433 ─────────────┐
│ │
│ Participants: Alice (Opus24k), Bob (Codec2_3200) │
│ │
│ Alice ──────────────────────────────────────── │
│ seq: 5234 codec: Opus24k ts: 125760 loss: 0.2% jitter: 3ms │
│ RMS: 4521 peak: 15280 silence: no │
│ FEC blocks: 1046/1046 complete (0 recovered) │
│ ▁▂▃▅▇█▇▅▃▂▁▁▂▃▅▇█▇▅▃▂▁ (waveform last 1s) │
│ │
│ Bob ────────────────────────────────────── │
│ seq: 2617 codec: Codec2_3200 ts: 62800 loss: 1.5% jitter: 8ms│
│ RMS: 1250 peak: 6800 silence: no │
│ FEC blocks: 523/525 complete (4 recovered) │
│ ▁▁▂▃▅▇▅▃▂▁▁▁▂▃▅▇▅▃▂▁▁ (waveform last 1s) │
│ │
│ Total: 7851 pkts recv, 0 pkts sent, 2 participants │
│ Uptime: 2m 35s │
└──────────────────────────────────────────────────────────────────────┘
```
**Recorded analysis:**
- Save all received packets to a capture file
- Post-session report: per-participant stats, quality timeline, codec switches, packet loss patterns
- Export decoded audio as WAV per participant (if decryptable)
**Quality metrics per participant:**
- Packet loss % (from sequence gaps)
- Jitter (inter-arrival time variance)
- Codec switches (timestamps + reasons)
- RMS audio level over time
- Silence detection
- FEC recovery rate
- Round-trip estimates (from Ping/Pong if available)
### Implementation
**Binary:** `wzp-analyzer` (new crate or subcommand of `wzp-client`)
```
wzp-analyzer 193.180.213.68:4433 --room podcast
wzp-analyzer 193.180.213.68:4433 --room podcast --record capture.wzp
wzp-analyzer --replay capture.wzp --report report.html
```
**Dependencies:**
- Existing: `wzp-transport`, `wzp-proto`, `wzp-codec`, `wzp-crypto`
- New: `ratatui` for TUI display (optional)
### Phases
| Phase | Scope | Effort |
|-------|-------|--------|
| 1 | Header-only analysis: join room, log packet metadata, show per-participant stats (TUI) | 2 days |
| 2 | Audio decode: decode Opus/Codec2 payloads (unencrypted path), show waveform + RMS | 1-2 days |
| 3 | Capture/replay: save packets to file, replay offline with full analysis | 1 day |
| 4 | HTML report: post-session quality report with charts | 2 days |
| 5 | Encrypted payload support: accept session keys, decrypt ChaCha20 | 1 day |
### Non-Goals (v1)
- Active probing (sending test patterns)
- Modifying packets in transit
- Automated quality scoring (MOS estimation)
- Video support

View File

@@ -0,0 +1,170 @@
# PRD: Relay Federation (Multi-Relay Mesh)
## Problem
Currently all participants in a call must connect to the same relay. This creates:
- **Single point of failure** — if the relay goes down, the entire call drops
- **Geographic latency** — users far from the relay get high RTT
- **Capacity limits** — one relay handles all traffic
Users should be able to connect to their nearest/preferred relay and still talk to users on other relays, as long as the relays are federated.
## Prerequisite: Fix Relay Identity Persistence
### Bug: TLS certificate regenerates on every restart
**Root cause:** `wzp-transport/src/config.rs:17` calls `rcgen::generate_simple_self_signed()` which creates a new keypair every time. The relay's Ed25519 identity seed IS persisted to `~/.wzp/relay-identity`, but the TLS certificate is not derived from it.
**Impact:** Clients see a different server fingerprint after every relay restart, triggering the "Server Key Changed" warning. This also breaks federation since relays identify each other by certificate fingerprint.
**Fix:** Derive the TLS certificate from the persisted relay seed:
1. Add `server_config_from_seed(seed: &[u8; 32])` to `wzp-transport`
2. Use the seed to create a deterministic keypair (e.g., derive an ECDSA key via HKDF from the Ed25519 seed)
3. Generate a self-signed cert with that keypair — same seed = same cert = same fingerprint
4. The relay passes its loaded seed to `server_config_from_seed()` instead of `server_config()`
**Effort:** 0.5 day
## Federation Design
### Core Concept
Two or more relays form a **federation mesh**. Each relay is an independent SFU. When relays are configured to trust each other, they bridge rooms with matching names — participants on relay A in room "podcast" hear participants on relay B in room "podcast" as if everyone were on the same relay.
### Configuration
Each relay reads a YAML config file (e.g., `~/.wzp/relay.yaml` or `--config relay.yaml`):
```yaml
# Relay identity (auto-generated if missing)
listen: 0.0.0.0:4433
# Federation peers — other relays we trust and bridge rooms with
# Both sides must configure each other for federation to work
peers:
- url: "193.180.213.68:4433"
fingerprint: "a5d6:e3c6:5ae7:185c:4eb1:af89:daed:4a43"
label: "Pangolin EU"
- url: "10.0.0.5:4433"
fingerprint: "7f2a:b391:0c44:..."
label: "Office LAN"
```
**Key rules:**
- Both relays must configure each other — **mutual trust** required
- A relay that receives a connection from an unknown peer logs: `"Relay a5d6:e3c6:... (193.180.213.68) wants to federate. To accept, add to peers config: url: 193.180.213.68:4433, fingerprint: a5d6:e3c6:..."`
- Fingerprints are verified via the TLS certificate (requires the identity fix above)
### Protocol
#### Peer Connection
1. On startup, each relay attempts QUIC connections to all configured peers
2. The connection uses SNI `"_federation"` (reserved room name prefix) to distinguish from client connections
3. After QUIC handshake, verify the peer's certificate fingerprint matches the configured fingerprint
4. If fingerprint mismatch → reject, log warning
5. If peer connects but isn't in our config → log the helpful "add to config" message, reject
#### Room Bridging
Once two relays are connected:
1. **Room discovery**: When a local participant joins room "T", the relay sends a `FederationRoomJoin { room: "T" }` signal to all connected peers
2. **Room leave**: When the last local participant leaves room "T", send `FederationRoomLeave { room: "T" }`
3. **Media forwarding**: For each room that exists on both relays:
- Relay A forwards all media packets from its local participants to relay B
- Relay B forwards all media packets from its local participants to relay A
- Each relay then fans out received federated media to its local participants (same as local SFU forwarding)
4. **Participant presence**: `RoomUpdate` signals are merged — local participants + federated participants from all peers
```
Relay A (2 local users) Relay B (1 local user)
┌─────────────────────┐ ┌─────────────────────┐
│ Room "T" │ │ Room "T" │
│ Alice (local) ────┼──media──►│ Charlie (local) │
│ Bob (local) ────┼──media──►│ │
│ │◄──media──┼── Charlie │
│ Charlie (federated)│ │ Alice (federated) │
│ │ │ Bob (federated) │
└─────────────────────┘ └─────────────────────┘
```
#### Signal Messages (new)
```rust
enum FederationSignal {
/// A room exists on this relay with active participants
RoomJoin { room: String, participants: Vec<ParticipantInfo> },
/// Room is empty on this relay
RoomLeave { room: String },
/// Participant update for a federated room
ParticipantUpdate { room: String, participants: Vec<ParticipantInfo> },
}
```
#### Media Forwarding
Federated media is forwarded as raw QUIC datagrams — the relay doesn't decode/re-encode. Each packet is prefixed with a room identifier so the receiving relay knows which room to fan it out to:
```
[room_hash: 8 bytes][original_media_packet]
```
The 8-byte room hash is computed once when the federation room bridge is established.
### What Relays DON'T Do
- **No transcoding** — media passes through as-is. If Alice sends Opus 64k, Charlie receives Opus 64k
- **No re-encryption** — packets are already encrypted end-to-end between participants. Relays just forward opaque bytes
- **No central coordinator** — each relay independently connects to its configured peers. No master/slave, no consensus protocol
- **No automatic peer discovery** — peers must be explicitly configured in YAML
### Failure Handling
- If a peer relay goes down, the federation link drops. Local rooms continue to work. Federated participants disappear from presence.
- Reconnection: attempt every 30 seconds with exponential backoff up to 5 minutes
- If a peer relay restarts with a new identity (bug not fixed), the fingerprint check fails and federation is rejected with a clear error log
## Implementation Plan
### Phase 0: Fix Relay Identity (prerequisite)
- Derive TLS cert from persisted seed
- Same seed → same cert → same fingerprint across restarts
### Phase 1: YAML Config + Peer Connection
- Add `--config relay.yaml` CLI flag
- Parse peers config
- On startup, connect to all configured peers via QUIC
- Verify certificate fingerprints
- Log helpful message for unconfigured peers
- Reconnect on disconnect
### Phase 2: Room Bridging
- Track which rooms exist on each peer
- Forward media for shared rooms
- Merge participant presence across peers
- Handle room join/leave signals
### Phase 3: Resilience
- Graceful handling of peer disconnect/reconnect
- Don't duplicate packets if a participant is reachable via multiple paths
- Rate limiting on federation links (prevent amplification)
- Metrics: federated rooms, packets forwarded, peer latency
## Effort Estimates
| Phase | Scope | Effort |
|-------|-------|--------|
| 0 | Fix relay TLS identity from seed | 0.5 day |
| 1 | YAML config + peer QUIC connections | 2 days |
| 2 | Room bridging + media forwarding + presence merge | 3-4 days |
| 3 | Resilience + metrics | 2 days |
## Non-Goals (v1)
- Automatic peer discovery (mDNS, DHT, etc.)
- Cascading federation (relay A ↔ B ↔ C where A doesn't know C)
- Load balancing across relays
- Encryption between relays (QUIC provides transport encryption; e2e encryption between participants is orthogonal)
- Different rooms on different relays (all federated rooms are bridged by name)

508
docs/USER_GUIDE.md Normal file
View File

@@ -0,0 +1,508 @@
# WarzonePhone User Guide
This guide covers all WarzonePhone client applications: Desktop (Tauri), Android, CLI, and Web.
## Desktop Client (Tauri)
The desktop client is a Tauri application with a native Rust audio engine and a web-based UI. It runs on macOS, Windows, and Linux.
### Connect Screen
When you launch the desktop client, you see the connect screen with:
- **Relay selector** -- click the relay button to open the Manage Relays dialog. Shows relay name, address, connection status (verified/new/changed/offline), and RTT latency
- **Room** -- enter a room name. Clients in the same room hear each other. Room names are hashed before being sent to the relay for privacy
- **Alias** -- your display name shown to other participants
- **OS Echo Cancel** -- checkbox to enable macOS VoiceProcessingIO (Apple's FaceTime-grade AEC). Strongly recommended when using speakers
- **Connect button** -- connects to the selected relay and joins the room
- **Identity info** -- your identicon and fingerprint are shown at the bottom. Click to copy
Recent rooms are displayed below the form for quick reconnection. Click any recent room to select it and its associated relay.
### In-Call Screen
Once connected, the in-call screen shows:
- **Room name** and **call timer** at the top
- **Status indicator** -- green when connected, yellow when reconnecting
- **Audio level meter** -- real-time visualization of outgoing audio
- **Participant list** -- identicon, alias, and fingerprint for each participant. Your own entry is highlighted with a badge
- **Controls** -- Mic toggle, Hang Up, Speaker toggle
- **Stats bar** -- TX and RX frame rates
### Settings Panel
Open with the gear icon or **Cmd+,** (Ctrl+, on Windows/Linux). Contains:
#### Connection
- **Default Room** -- room name used on next connect
- **Alias** -- display name
#### Audio
- **Quality slider** -- 5 levels:
| Position | Profile | Description |
|----------|---------|-------------|
| 0 | Auto | Adaptive quality based on network conditions |
| 1 | Opus 24k | Good conditions (28.8 kbps with FEC) |
| 2 | Opus 6k | Degraded conditions (9.0 kbps with FEC) |
| 3 | Codec2 3.2k | Poor conditions (4.8 kbps with FEC) |
| 4 | Codec2 1.2k | Catastrophic conditions (2.4 kbps with FEC) |
- **OS Echo Cancellation** -- macOS VoiceProcessingIO toggle
- **Automatic Gain Control** -- normalize mic volume
#### Identity
- **Fingerprint** -- your public identity fingerprint
- **Identity file** -- stored at `~/.wzp/identity`
#### Recent Rooms
- History of recently joined rooms with relay association
- Clear History button
### Manage Relays Dialog
Open by clicking the relay selector button on the connect screen:
- **Relay list** -- each entry shows name, address, identicon (from server fingerprint), lock status, and RTT
- **Select** -- click a relay to make it the default
- **Remove** -- click the X button to delete a relay
- **Add Relay** -- enter name and host:port to add a new relay
- **Ping** -- relays are automatically pinged when the dialog opens. RTT and server fingerprint are updated
### Key Change Warning Dialog
If a relay's TLS fingerprint has changed since your last connection, a warning dialog appears:
- Shows the previously known fingerprint and the new fingerprint
- **Accept New Key** -- trust the new fingerprint and proceed
- **Cancel** -- abort the connection
This is the TOFU (Trust on First Use) model. Fingerprint changes typically mean the relay was restarted with a new identity. However, they could also indicate a man-in-the-middle attack.
### Keyboard Shortcuts
| Shortcut | Action | Context |
|----------|--------|---------|
| **m** | Toggle microphone | In-call |
| **s** | Toggle speaker | In-call |
| **q** | Hang up | In-call |
| **Cmd+,** (Ctrl+,) | Open/close settings | Any |
| **Escape** | Close dialog/settings | Any |
| **Enter** | Connect | Connect screen (when room/alias field is focused) |
### Audio Engine
The desktop audio engine uses:
- **CPAL** for audio I/O (CoreAudio on macOS, WASAPI on Windows, ALSA on Linux)
- **VoiceProcessingIO** on macOS for OS-level echo cancellation (opt-in via checkbox)
- **Lock-free SPSC ring buffers** between audio threads and network threads
- **Direct playout** -- no jitter buffer on the client (the relay buffers instead)
- Audio callbacks deliver 512 f32 samples at 48 kHz on macOS (accumulated to 960-sample frames for codec)
#### Audio Quality Notes
- Always use **Release builds** for real-time audio. Debug builds are too slow for wzp-codec, nnnoiseless, audiopus, and raptorq
- VoiceProcessingIO is strongly recommended on macOS. Software AEC does not work well with the round-trip latency (~35-45ms)
- The quality slider only affects the **encode** side. Decoding always accepts all codecs
### Auto-Reconnect
If the connection drops, the client automatically attempts to reconnect with exponential backoff (1s, 2s, 4s, 8s, capped at 10s). After 5 failed attempts, the client returns to the connect screen. The status dot shows yellow during reconnection.
## Android Client
The Android client is built with Kotlin and Jetpack Compose, using JNI to call the Rust audio engine.
### Call Screen
The main call screen shows:
- **Server selector** -- tap to choose from configured servers
- **Room name** -- enter the room to join
- **Connect/Disconnect** button
- **Participant list** with identicons and aliases
- **Audio level visualization**
- **Mute/Unmute** button
### Settings Screen
The settings screen is organized into sections:
#### Identity
- **Display Name** -- your alias shown to other participants
- **Fingerprint** -- displayed with an identicon. Tap to copy
- **Copy Key** -- copy the 64-character hex seed to clipboard for backup
- **Restore Key** -- paste a previously backed-up hex seed to restore your identity
#### Audio Defaults
- **Voice Volume** -- playout gain slider (-20 dB to +20 dB)
- **Mic Gain** -- capture gain slider (-20 dB to +20 dB)
- **Echo Cancellation (AEC)** -- toggle Android's built-in AEC. Disable if audio sounds distorted
- **Quality slider** -- 8 levels from best to lowest:
| Position | Profile | Bitrate | Color |
|----------|---------|---------|-------|
| 0 | Studio 64k | 70.4 kbps | Green |
| 1 | Studio 48k | 52.8 kbps | Green |
| 2 | Studio 32k | 35.2 kbps | Green |
| 3 | Auto | Adaptive | Yellow-green |
| 4 | Opus 24k | 28.8 kbps | Yellow-green |
| 5 | Opus 6k | 9.0 kbps | Yellow |
| 6 | Codec2 3.2k | 4.8 kbps | Orange |
| 7 | Codec2 1.2k | 2.4 kbps | Red |
Note: "Decode always accepts all codecs" -- the quality setting only affects encoding.
#### Servers
- **Server chips** -- tap to select, X to remove (built-in servers cannot be removed)
- **Add Server** -- enter host, port (default 4433), and optional label
- **Force Ping** -- servers are pinged on dialog open to measure RTT
#### Network
- **Prefer IPv6** -- toggle to prefer IPv6 connections when available
#### Room
- **Default Room** -- the room name pre-filled on the call screen
### Identity Backup and Restore
Your identity is a 32-byte seed stored as a 64-character hex string. To back up:
1. Go to Settings > Identity
2. Tap **Copy Key**
3. Store the hex string securely
To restore on a new device:
1. Go to Settings > Identity
2. Tap **Restore Key**
3. Paste the 64-character hex string
4. Tap **Restore** (key is staged)
5. Tap **Save** to apply
The same seed produces the same fingerprint on any device or platform.
## CLI Client (wzp-client)
The CLI client is a command-line tool for testing, recording, and live audio.
### Usage
```
wzp-client [options] [relay-addr]
```
Default relay address: `127.0.0.1:4433`
### Flags Reference
| Flag | Description |
|------|-------------|
| `--live` | Live mic/speaker mode. Requires `--features audio` at build time |
| `--send-tone <secs>` | Send a 440 Hz test tone for N seconds |
| `--send-file <file>` | Send a raw PCM file (48 kHz mono s16le) |
| `--record <file.raw>` | Record received audio to raw PCM file |
| `--echo-test <secs>` | Run automated echo quality test for N seconds. Produces a windowed analysis with loss%, SNR, correlation |
| `--drift-test <secs>` | Run automated clock-drift measurement for N seconds |
| `--sweep` | Run jitter buffer parameter sweep (local, no network). Tests different buffer configurations |
| `--seed <hex>` | Identity seed as 64 hex characters. Compatible with featherChat |
| `--mnemonic <words...>` | Identity seed as BIP39 mnemonic (24 words). All remaining non-flag words are consumed |
| `--room <name>` | Room name. Hashed before sending for privacy |
| `--token <token>` | featherChat bearer token for relay authentication |
| `--metrics-file <path>` | Write JSONL telemetry to file (1 line/sec) |
| `--help`, `-h` | Print help and exit |
### Common Usage Patterns
#### Connectivity Test (Silence)
```bash
# Send 250 silence frames (5 seconds) and exit
wzp-client 127.0.0.1:4433
```
#### Live Audio Call
```bash
# Terminal 1
wzp-relay
# Terminal 2: Alice
wzp-client --live --room myroom 127.0.0.1:4433
# Terminal 3: Bob
wzp-client --live --room myroom 127.0.0.1:4433
```
Both capture from mic and play received audio. Press Ctrl+C to stop.
#### Send Test Tone and Record
```bash
# Terminal 1
wzp-relay
# Terminal 2: Send 10 seconds of 440 Hz tone
wzp-client --send-tone 10 127.0.0.1:4433
# Terminal 3: Record what is received
wzp-client --record call.raw 127.0.0.1:4433
```
Play the recording:
```bash
ffplay -f s16le -ar 48000 -ac 1 call.raw
```
#### Send Audio File
```bash
# Convert to raw PCM first
ffmpeg -i song.mp3 -f s16le -ar 48000 -ac 1 song.raw
# Send through relay
wzp-client --send-file song.raw 127.0.0.1:4433
```
#### Echo Quality Test
```bash
wzp-relay &
wzp-client --echo-test 30 127.0.0.1:4433
```
Produces a windowed analysis showing loss percentage, SNR, correlation, and quality degradation trends.
#### Clock Drift Test
```bash
wzp-relay &
wzp-client --drift-test 60 127.0.0.1:4433
```
Measures clock drift between the send and receive paths over the specified duration.
#### Jitter Buffer Sweep
```bash
# Runs locally, no network needed
wzp-client --sweep
```
Tests different jitter buffer configurations and prints results.
#### With Identity and Auth
```bash
# Using hex seed
wzp-client --seed 0123456789abcdef...64chars --room secure-room --token my-bearer-token relay.example.com:4433
# Using BIP39 mnemonic
wzp-client --mnemonic abandon abandon abandon ... zoo --room secure-room relay.example.com:4433
```
#### With JSONL Telemetry
```bash
wzp-client --live --metrics-file /tmp/call.jsonl relay.example.com:4433
```
Writes one JSON object per second:
```json
{
"ts": "2026-04-07T12:00:00Z",
"buffer_depth": 45,
"underruns": 0,
"overruns": 0,
"loss_pct": 1.2,
"rtt_ms": 34,
"jitter_ms": 8,
"frames_sent": 50,
"frames_received": 49,
"quality_profile": "GOOD"
}
```
### Audio File Format
All raw PCM files use:
| Property | Value |
|----------|-------|
| Sample rate | 48 kHz |
| Channels | 1 (mono) |
| Sample format | signed 16-bit little-endian (s16le) |
Conversion commands:
```bash
# WAV to raw PCM
ffmpeg -i input.wav -f s16le -ar 48000 -ac 1 output.raw
# MP3 to raw PCM
ffmpeg -i input.mp3 -f s16le -ar 48000 -ac 1 output.raw
# Raw PCM to WAV
ffmpeg -f s16le -ar 48000 -ac 1 -i input.raw output.wav
# Play raw PCM
ffplay -f s16le -ar 48000 -ac 1 file.raw
```
## Web Client (Browser)
The web client runs in a browser via the wzp-web bridge server.
### Setup
```bash
# Start relay
wzp-relay
# Start web bridge
wzp-web --port 8080 --relay 127.0.0.1:4433
# For remote access (requires TLS for mic)
wzp-web --port 8443 --relay 127.0.0.1:4433 --tls
```
Open `http://localhost:8080/room-name` (or `https://...` with TLS).
### Features
- **Open mic** (default) and **push-to-talk** modes
- PTT via on-screen button, mouse hold, or spacebar
- Audio level meter
- Auto-reconnection on disconnect
### Audio Processing
The web client uses AudioWorklet (preferred) with a ScriptProcessorNode fallback:
- **Capture**: Accumulates Float32 samples into 960-sample (20ms) Int16 frames
- **Playback**: Ring buffer capped at 200ms (9600 samples at 48 kHz)
## Identity System
### Overview
Your identity is a 32-byte cryptographic seed that derives:
- **Ed25519 signing key** -- authenticates handshake messages
- **X25519 key agreement key** -- derives shared session encryption keys
- **Fingerprint** -- SHA-256 of the public key, truncated to 16 bytes, displayed as `xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx`
- **Identicon** -- deterministic visual avatar generated from the fingerprint
### Seed Sources
| Source | Description |
|--------|-------------|
| Auto-generated | Created on first run, stored in `~/.wzp/identity` (desktop/CLI) or app storage (Android) |
| `--seed <hex>` | 64-character hex string (CLI) |
| `--mnemonic <words>` | 24-word BIP39 mnemonic (CLI) |
| Copy Key / Restore Key | Hex backup/restore (Android settings) |
### BIP39 Mnemonic Backup
The 32-byte seed can be represented as a 24-word BIP39 mnemonic for human-readable backup. The same mnemonic produces the same identity on any platform or device.
### featherChat Compatibility
The identity derivation uses the same HKDF scheme as featherChat (Warzone messenger). The same seed produces the same fingerprint in both systems, allowing a unified identity across messaging and calling.
### Trust on First Use (TOFU)
Clients remember the fingerprints of relays and peers they connect to. On subsequent connections, if a fingerprint changes, the client warns the user. This protects against man-in-the-middle attacks but requires manual verification on first contact.
## Quality Profiles Explained
### When to Use Each Profile
| Profile | Total Bandwidth | Best For | Trade-offs |
|---------|----------------|----------|------------|
| **Studio 64k** | 70.4 kbps | LAN calls, music, podcasting | Highest quality, needs good network |
| **Studio 48k** | 52.8 kbps | Good WiFi, wired connections | Near-studio quality |
| **Studio 32k** | 35.2 kbps | Reliable WiFi, LTE | Very good quality with lower bandwidth |
| **Auto** | Adaptive | Most users | Automatically switches based on network conditions |
| **Opus 24k** | 28.8 kbps | General use, moderate networks | Good speech quality, reasonable bandwidth |
| **Opus 6k** | 9.0 kbps | 3G networks, congested WiFi | Intelligible speech, some artifacts |
| **Codec2 3.2k** | 4.8 kbps | Poor connections | Robotic but intelligible, narrowband |
| **Codec2 1.2k** | 2.4 kbps | Satellite links, extreme loss | Minimal intelligibility, last resort |
### Auto Mode
Auto mode starts at the **Good (Opus 24k)** profile and adapts based on observed network quality:
- **Downgrade** -- 3 consecutive bad quality reports (2 on cellular) trigger a step down
- **Upgrade** -- 10 consecutive good quality reports trigger a step up (one tier at a time)
- **Network handoff** -- switching from WiFi to cellular triggers a preemptive one-tier downgrade plus a 10-second FEC boost
Auto mode uses three tiers (Good, Degraded, Catastrophic). It does not use the Studio profiles, which must be selected manually.
### Manual Override
When you select a specific profile (not Auto), adaptive switching is disabled. The encoder stays at the selected profile regardless of network conditions. This is useful when you know your network quality and want consistent encoding, or when you want to force a specific bitrate.
Note: The decoder always accepts all codecs. A manual quality selection only affects what you send, not what you receive.
## Direct 1:1 Calling (Desktop + Android)
In addition to room-mode group calls, you can place direct calls to a specific peer by fingerprint. Direct calls bypass room state entirely — the relay is used purely as a signaling gateway and for media relay. There is no need for the callee to join a room beforehand; they just need to be registered with the same signal hub.
### UI elements in the direct-call panel
- **Place call field** — paste a fingerprint (the long hex string you see under your own identity) and click Call. The callee sees a ringing UI.
- **Recent contacts row** — a horizontal strip of chips showing your most recently called/receiving peers. Click a chip to re-dial. Aliases are shown if the peer has one, otherwise a short fingerprint prefix.
- **Call history list** — every direct call you've placed, received, or missed, with direction indicator (↗ Outgoing, ↙ Incoming, ✗ Missed), the peer's alias (if known) or fingerprint prefix, and a timestamp. Click an entry to re-dial.
- **Deregister button** — drops your signal-hub registration without quitting the app. Useful when switching identities (e.g. testing with two accounts on one machine) or when you want to explicitly appear offline to peers.
- **Clear history button** — wipes the call history store. Does not affect current calls.
### Live updates
The call history updates in real time across all views via Tauri events (`history-changed`). Placing, answering, or missing a call immediately refreshes the history list and the recent contacts row — no manual refresh needed.
### Default room
On first launch, the room name in the room-mode panel defaults to `general` (changed from the prior `android` default so the desktop and Android clients don't silently talk past each other). You can still change it to any room name, and the last-used room is remembered across launches.
### Random alias
New installations derive a human-friendly alias from your identity seed — something like `silent-forest-41` or `bold-river-07`. It's deterministic, so reinstalling without changing your seed gives you the same alias. The alias is shown alongside your fingerprint in the header and is what peers see in their call history when they receive your call.
You can override the alias in Settings → Identity if you want a specific name.
## Windows AEC Variants
The Windows desktop build ships in two variants for echo cancellation, depending on which backend you want to exercise. Both are `wzp-desktop.exe` binaries — only the internal audio backend differs.
| Build | File | Capture backend | AEC | When to use |
|---|---|---|---|---|
| **noAEC baseline** | `wzp-desktop-noAEC.exe` | CPAL (WASAPI shared mode) | None | Headphone-only use, or for A/B comparison against the AEC build |
| **Communications AEC** | `wzp-desktop.exe` | Direct WASAPI with `AudioCategory_Communications` | **Yes** — Windows routes the capture stream through the driver's communications APO chain (AEC + noise suppression + automatic gain control) | Any speaker-mode call, laptop built-in speakers, anywhere echo is audible |
**Quality caveat**: the communications AEC operates at the OS level and its algorithm depends on the audio driver's installed APO chain. On modern consumer laptops with Intel Smart Sound, Dolby, recent Realtek, or Windows 11 Voice Clarity, the quality is excellent (effectively matching what Teams/Zoom deliver). On generic class-compliant USB microphones or older drivers, the communications APO may not be present at all — in that case the build behaves identically to the noAEC baseline.
If you hear echo on the AEC build, try these in order before escalating:
1. **Check which capture device is selected as "Default Device - Communications"** in Windows Sound Settings → Recording tab. Right-click any device to set it. The AEC build opens the device marked as `eCommunications`, not `eConsole`, so changing the default-communications device changes what we capture from.
2. **Verify the driver exposes a communications APO**. Sound Settings → Recording → your mic → Properties → Advanced → look for an "Enhancements" or "Signal Enhancements" tab. If it's absent, the driver has no APOs and the AEC build effectively has no AEC.
3. **Try the classic Voice Capture DSP build** when it ships (tracked as task #26). That uses Microsoft's bundled software AEC (`CLSID_CWMAudioAEC`) which works on every Windows machine regardless of driver.
### Installing the Windows builds
1. Windows 10: install the [WebView2 Runtime Evergreen Bootstrapper](https://developer.microsoft.com/en-us/microsoft-edge/webview2/) first. Windows 11 has it pre-installed.
2. Copy `wzp-desktop.exe` (or `wzp-desktop-noAEC.exe`) to any directory and double-click. No installer needed.
3. First launch creates the config + identity store at `%APPDATA%\com.wzp.phone\`.

View File

@@ -0,0 +1,431 @@
# Incident report — Tauri Android `__init_tcb+4` SIGSEGV
**Status:** Blocked. Reproducible crash with a known trigger at the cc::Build /
rustc-link-lib layer that we cannot yet explain. Writing this report to hand
off for external help.
**Project:** WarzonePhone (Rust + Tauri 2.x Mobile) Android rewrite
**Branch:** `feat/desktop-audio-rewrite`
**Target phone:** Pixel 6 (`oriole`), Android 16 (`BP3A.250905.014`), arm64-v8a
**Date range of investigation:** 2026-04-09 (one working session, ~27 builds)
---
## One-paragraph summary
We're porting the existing CPAL-backed desktop Tauri app (`desktop/src-tauri`)
to Tauri Mobile Android so the same Rust + Tauri + WebView codebase runs on
both platforms. The Android `.apk` launches, renders the home screen, and
registers on a relay for signal-only builds (no audio backend). The moment
we add **any** `cc::Build::new().cpp(true).cpp_link_stdlib("c++_shared")`
call to `build.rs` — even with a 6-line cpp file that just returns 42 and is
never called from Rust — the built `.so` crashes at launch inside
`__init_tcb(bionic_tcb*, pthread_internal_t*)+4` via `pthread_create` via
`std::thread::spawn` via `tao::ndk_glue::create` via
`Java_com_wzp_desktop_WryActivity_create`, before our Rust entry point has
a chance to run. The exact same NDK, exact same Rust toolchain, exact same
Docker image is used by the legacy `wzp-android` crate (via `cargo-ndk`)
which compiles Oboe and runs fine on the same phone.
---
## Environment
**Docker build image:** `wzp-android-builder` (Dockerfile at
`scripts/Dockerfile.android-builder`)
- Base: `debian:bookworm`
- JDK 17
- Android SDK:
- cmdline-tools latest
- `platforms;android-34`, `platforms;android-36`
- `build-tools;34.0.0`, `build-tools;35.0.0`
- `ndk;26.1.10909125` (last stable before scudo/MTE crash on NDK r27+)
- `platform-tools`
- Node.js 20 LTS
- Rust stable `1.94.1 (e408947bf 2026-03-25)`
- Rust android targets: `aarch64-linux-android`, `armv7-linux-androideabi`,
`i686-linux-android`, `x86_64-linux-android`
- `cargo-ndk` + `cargo tauri-cli 2.10.1` (latest 2.x)
**Host:** Docker on `SepehrHomeserverdk` (remote build server).
**Phone:** Pixel 6, Android 16, kernel 6.1.134-android14-11, on the same LAN
as the build machine and a local `wzp-relay` binary.
**Tauri crate:** `desktop/src-tauri/` in the workspace at the root of the
repo. Depends on `tauri = "2"`, `tauri-plugin-shell = "2"`, `tokio`, `rustls`,
`wzp-proto`, `wzp-codec`, `wzp-fec`, `wzp-crypto`, `wzp-transport`, and (on
non-Android only) `wzp-client` with `features = ["audio", "vpio"]`. The
crate's `[lib]` section is:
```toml
[lib]
name = "wzp_desktop_lib"
crate-type = ["staticlib", "cdylib", "rlib"]
```
The crate produces `libwzp_desktop_lib.so` which is `System.loadLibrary`'d by
Tauri's generated `WryActivity.onCreate` via JNI.
---
## The crash
Every failing build produces the same stack at launch, same pc offsets:
```
signal 11 (SIGSEGV), code 2 (SEGV_ACCERR), fault addr 0x00000072XXXXXX00f (write)
#00 pc 000000000130cc74 libwzp_desktop_lib.so (__init_tcb(bionic_tcb*, pthread_internal_t*)+4)
#01 pc 0000000001331cf0 libwzp_desktop_lib.so (pthread_create+360)
#02 pc 00000000012bee04 libwzp_desktop_lib.so (std::sys::thread::unix::Thread::new::h87be8e9feeaaaf84+184)
#03 pc 0000000000e37f5c libwzp_desktop_lib.so (std::thread::lifecycle::spawn_unchecked::h941f828f9a95150d+1504)
#04 pc 0000000000e461e8 libwzp_desktop_lib.so (std::thread::builder::Builder::spawn_unchecked::hec5f087680cb0248+112)
#05 pc 0000000000e441c8 libwzp_desktop_lib.so (std::thread::functions::spawn::ha3d3fbf2d9fe53e3+108)
#06 pc ... libwzp_desktop_lib.so (tao::platform_impl::platform::ndk_glue::create::h254c68662718841a+1792)
#07 pc ... libwzp_desktop_lib.so (Java_com_wzp_desktop_WryActivity_create+76)
```
The offsets are **byte-identical across every failing build**, even when the
cpp content changes drastically (cf. `cpp_smoke.cpp` at 6 lines, 20 lines,
200+ Oboe source files). We believe this is because cargo caches the Rust
compilation unit and only the build-script artifacts differ, and the final
link produces the same layout.
`__init_tcb` is defined locally inside our `.so` with C++ mangling:
```
_Z10__init_tcbP10bionic_tcbP18pthread_internal_t
```
It originates from bionic's `pthread_create.cpp`, which got pulled in
statically from the NDK's `sysroot/usr/lib/aarch64-linux-android/libc.a`.
Both failing and known-good (legacy `wzp_android.so`) builds contain this
same static symbol — the presence of the symbol is not the problem.
Fault address `0x72XXXXXX00f` with code `SEGV_ACCERR` (access permission
error, write). Aligned to `+4` inside `__init_tcb`, which is typically a
store into the passed-in `bionic_tcb*`. The pointer is either NULL-ish or
pointing into read-only memory.
---
## Bisection (the important part)
We started from a known-good commit (`5309938`) where the Tauri Android app
launches, registers on a relay, and behaves identically to the desktop app
modulo audio. Then we added features **one variable at a time**:
| Step | Commit | Change vs previous | Result |
|---|---|---|---|
| Baseline | `5309938` | — | ✅ launches, renders home, registers on relay |
| **A** | `f96d7ce` | Add `cc = "1"` build-dep + compile trivial `cpp/hello.c` via `cc::Build` (C, not C++). Static lib never linked in. | ✅ |
| **B** | `ae4f366` | Add `wzp-client` Android dep with `default-features = false` (no CPAL, no VPIO). No new imports. | ✅ |
| **C** | `19fd3dd` | Un-cfg-gate `mod engine;` in `lib.rs` so `engine.rs` compiles on Android. `CallEngine::start()` has an Android stub returning an error. | ✅ |
| **D** | `a852cad` | Compile `cpp/getauxval_fix.c` (legacy wzp-android shim). Still pure C. | ✅ |
| **E** | `4250f1b` | **Compile full Oboe C++ bridge** (200+ source files from `google/oboe@1.8.1`). `cc::Build::new().cpp(true).std("c++17").cpp_link_stdlib(Some("c++_shared"))` + `-llog` + `-lOpenSLES` link directives. Nothing called from Rust yet — the `extern "C"` bridge functions are exported but never referenced from the Rust side. | ❌ **crash** |
| E.4 | `aa240c6` | **Only change:** replace the entire Oboe compile with ONE tiny `cpp_smoke.cpp` file: `extern "C" int wzp_cpp_smoke(void) { std::lock_guard<std::mutex> lk(m); std::thread t([](){...}); t.join(); return g.load(); }`. Still `cpp(true) + cpp_link_stdlib("c++_shared")`. Drop `-llog`/`-lOpenSLES`. | ❌ **same crash, same offsets** |
| E.2 | `0224ce6` | Shrink `cpp_smoke.cpp` further: just `std::atomic<int>` + `fetch_add`, no mutex, no thread, no includes beyond `<atomic>`. | ❌ **same crash, same offsets** |
| E.1 | `0d74366` | **Absolute minimum:** `cpp_smoke.cpp` = `extern "C" int wzp_cpp_hello(void){return 42;}`. NO `#include`. NO STL. Just a function. Still compiled with `cpp(true) + cpp_link_stdlib("c++_shared")`. | ❌ **same crash, same offsets** |
### Additional confirming observations
1. **The cpp code is dead-stripped.** `llvm-nm -a libwzp_desktop_lib.so` shows
zero matches for `wzp_cpp_hello`, `wzp_cpp_smoke`, or any Oboe symbol in
builds E through E.1. The static archive (`libwzp_cpp_smoke.a` /
`liboboe_bridge.a`) exists on disk under
`target/aarch64-linux-android/debug/build/wzp-desktop-*/out/`, but because
nothing in Rust ever references the exported C function, the final linker
drops it.
2. **`build.rs` link directives are the real delta.** `cc::Build::new()
.cpp(true).cpp_link_stdlib(Some("c++_shared"))` emits a
`cargo:rustc-link-lib=c++_shared` directive that adds a `NEEDED` entry for
`libc++_shared.so` to the final `.so`'s dynamic table. `readelf -d` on
the crashing `.so` shows:
```
NEEDED Shared library: [libc++_shared.so]
NEEDED Shared library: [liblog.so] (only in full Oboe build)
NEEDED Shared library: [libOpenSLES.so] (only in full Oboe build)
```
The working baseline `.so` has no `NEEDED` entries beyond libc/liblog.
3. **Linker version doesn't matter.** We tried forcing
`aarch64-linux-android26-clang` as the linker (API 26 has proper dynamic
bindings to libc.so's runtime `pthread_create`/`__init_tcb`) via three
different mechanisms:
- `CARGO_TARGET_AARCH64_LINUX_ANDROID_LINKER` env var in `docker run`
- `.cargo/config.toml` workspace-level linker override
- **Binary replacement inside the image**: `mv
aarch64-linux-android24-clang .orig` and replace with a shell script
that `exec`s `aarch64-linux-android26-clang`. Verified by calling
`--version` which prints `Target: aarch64-unknown-linux-android26`.
All three made no difference. The `__init_tcb` symbol is pulled statically
from the **same** `libc.a` regardless of which clang wrapper is used — the
NDK ships ONE `libc.a` at
`sysroot/usr/lib/aarch64-linux-android/libc.a` shared across all API
levels. Only the per-API `libc.so` symlinks change (and we're linked
statically, not dynamically, against libc).
4. **Legacy `wzp-android` crate works on the same phone, same image.** Run
in the exact same Docker container, the legacy Kotlin app's JNI library
(`crates/wzp-android` built via `cargo ndk`) compiles a subset of the
same Oboe code, produces a `.so` that has the same static
`_Z10__init_tcbP...` + `pthread_create` + `pthread_create.cpp` symbols,
and launches cleanly on the Pixel 6. Key differences between the two
build paths:
| | `wzp-android` (works) | `wzp-desktop` Tauri (crashes) |
|---|---|---|
| Build driver | `cargo ndk -t arm64-v8a build --release -p wzp-android` | `cargo tauri android build --debug --target aarch64 --apk` |
| Profile | release | debug (release crashes identically) |
| Linker | `aarch64-linux-android26-clang` (via `.cargo/config.toml` which cargo-ndk honors) | `aarch64-linux-android24-clang` (tauri-cli hardcodes and ignores config; the shim redirect makes no difference) |
| crate-type | `["cdylib", "rlib"]` | `["staticlib", "cdylib", "rlib"]` |
| JNI entrypoint | direct Kotlin `System.loadLibrary` + our own `native fun` declarations; first `pthread_create` runs later from the tokio runtime inside a command | `WryActivity.onCreate` via Tauri's generated Java glue; first `pthread_create` runs **inside the JNI call** via `tao::ndk_glue::create` |
| Other heavy deps | tokio, wzp-{proto,codec,fec,crypto,transport} | tokio, tauri, tauri-runtime-wry, tao, wry, webview2-com, soup3, webkit2gtk (all platform-specific ones cfg-gated out of android), and also all of the above |
| Binary size | `libwzp_android.so` ≈ 14 MB (release) | `libwzp_desktop_lib.so` ≈ 160 MB (debug), 16 MB (release) |
5. **The crash happens in the JNI-callback thread during `onCreate`.** Frame
#06 `tao::platform_impl::platform::ndk_glue::create+1792` is tao's Android
event-loop bootstrap, which Tauri calls from inside
`Java_com_wzp_desktop_WryActivity_create` in response to the Java-side
activity lifecycle. This means the thread spawn is happening while the
Java VM still holds the native onCreate call, before `onCreate` has
returned to the Android runtime. Legacy `wzp-android` never spawns a
thread from an onCreate JNI call — it spawns threads only from
`nativeSignalConnect`/similar commands invoked later from Kotlin button
clicks, after the activity is fully initialised.
---
## Current suspect
One of the two items below, probably (2):
1. **The `.cpp(true)` mode in cc-rs changes something invisible in the link
pipeline** (for example, emitting a different `-x` flag to clang, or
changing linker driver selection). We have not yet verified this by
diffing the actual rustc linker invocation between a working and a
crashing build with `--verbose` + `-Clink-arg=-Wl,-t`.
2. **Adding `libc++_shared.so` as a NEEDED entry causes Android's dynamic
linker to load libc++_shared.so before our `.so`'s init runs, and
something in libc++_shared's `.init_array` interacts badly with
tao::ndk_glue's `pthread_create` call from inside the JNI onCreate
window**. The legacy crate doesn't hit this because (a) it has no
NEEDED libc++_shared when built without Oboe, and (b) even when it does
build Oboe, its thread spawns happen outside the onCreate JNI call so
whatever libc state is wrong at that moment is already stabilised.
We have not yet confirmed (2) with the obvious A/B test: keep `cpp_smoke.cpp`
but drop `.cpp_link_stdlib(Some("c++_shared"))` (and drop any manual
`cargo:rustc-link-lib=c++_shared`) so the NEEDED entry disappears but the
rest of the pipeline stays identical. That's the next experiment we were
going to run, but the user reasonably asked for this report first.
---
## What we've ruled out
- **NDK API level** — forcing API-26 linker via three independent mechanisms
made zero difference.
- **Build profile** — release (`0x6b8000` offset, 21 MB unsigned APK) and
debug (same 193 MB APK, same crash offsets) both crash identically.
- **Oboe specifically** — replacing the Oboe compile with 6 lines of C++
that does nothing still reproduces the crash.
- **cpp code being executed at runtime** — dead-stripped, not in the final
`.so` at all per `nm -a`.
- **minSdk in build.gradle** — bumped from 24 to 26, no effect.
- **libdl.a stub issue** — ruled out via logcat (`libdl.a is a stub --- use
libdl.so instead` was only surfacing from our own `dlsym` shim that we
subsequently deleted).
- **`pthread_create` interposition via `-Wl,--wrap=pthread_create`** — tried
and reverted; the wrap target still resolved to the broken static stub.
- **Keystore / signing** — debug signing with persistent `~/.android/
debug.keystore` works fine; no signature mismatch issues.
---
## The files involved
### `desktop/src-tauri/build.rs` (current state, E.1)
```rust
use std::path::PathBuf;
use std::process::Command;
fn main() {
// Embedded git hash
let git_hash = Command::new("git")
.args(["rev-parse", "--short", "HEAD"])
.output()
.ok()
.filter(|o| o.status.success())
.and_then(|o| String::from_utf8(o.stdout).ok())
.map(|s| s.trim().to_string())
.unwrap_or_else(|| "unknown".into());
println!("cargo:rustc-env=WZP_GIT_HASH={git_hash}");
println!("cargo:rerun-if-changed=../../.git/HEAD");
println!("cargo:rerun-if-changed=../../.git/refs/heads");
let target = std::env::var("TARGET").unwrap_or_default();
if target.contains("android") {
// Step A: plain C sanity file
println!("cargo:rerun-if-changed=cpp/hello.c");
cc::Build::new().file("cpp/hello.c").compile("wzp_hello");
// Step D: legacy getauxval shim
println!("cargo:rerun-if-changed=cpp/getauxval_fix.c");
cc::Build::new().file("cpp/getauxval_fix.c").compile("getauxval_fix");
// Step E.1: minimal C++ smoke — THIS STEP BRINGS BACK THE CRASH
println!("cargo:rerun-if-changed=cpp/cpp_smoke.cpp");
cc::Build::new()
.cpp(true)
.std("c++17")
.cpp_link_stdlib(Some("c++_shared"))
.file("cpp/cpp_smoke.cpp")
.compile("wzp_cpp_smoke");
// Copy libc++_shared.so into gen/android jniLibs so the runtime
// linker can find it when the NEEDED entry fires.
if let Ok(ndk) = std::env::var("ANDROID_NDK_HOME").or_else(|_| std::env::var("NDK_HOME")) {
let triple = "aarch64-linux-android";
let abi = "arm64-v8a";
let lib_dir = format!(
"{ndk}/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/lib/{triple}"
);
println!("cargo:rustc-link-search=native={lib_dir}");
let shared_so = format!("{lib_dir}/libc++_shared.so");
if std::path::Path::new(&shared_so).exists() {
let manifest = std::env::var("CARGO_MANIFEST_DIR").unwrap_or_default();
let jni_dir = format!("{manifest}/gen/android/app/src/main/jniLibs/{abi}");
if std::fs::create_dir_all(&jni_dir).is_ok() {
let _ = std::fs::copy(&shared_so, format!("{jni_dir}/libc++_shared.so"));
}
}
}
}
tauri_build::build()
}
```
### `desktop/src-tauri/cpp/cpp_smoke.cpp` (E.1)
```cpp
extern "C" int wzp_cpp_hello(void) {
return 42;
}
```
### `desktop/src-tauri/Cargo.toml` (relevant excerpts)
```toml
[package]
name = "wzp-desktop"
version = "0.1.0"
edition = "2024"
[lib]
name = "wzp_desktop_lib"
crate-type = ["staticlib", "cdylib", "rlib"]
[[bin]]
name = "wzp-desktop"
path = "src/main.rs"
[build-dependencies]
tauri-build = { version = "2", features = [] }
cc = "1"
[dependencies]
tauri = { version = "2", features = [] }
tauri-plugin-shell = "2"
serde = { version = "1", features = ["derive"] }
serde_json = "1"
tokio = { version = "1", features = ["full"] }
tracing = "0.1"
tracing-subscriber = "0.3"
anyhow = "1"
rustls = { version = "0.23", default-features = false, features = ["ring", "std"] }
wzp-proto = { path = "../../crates/wzp-proto" }
wzp-codec = { path = "../../crates/wzp-codec" }
wzp-fec = { path = "../../crates/wzp-fec" }
wzp-crypto = { path = "../../crates/wzp-crypto" }
wzp-transport = { path = "../../crates/wzp-transport" }
[target.'cfg(not(target_os = "android"))'.dependencies]
wzp-client = { path = "../../crates/wzp-client", features = ["audio", "vpio"] }
[target.'cfg(target_os = "android")'.dependencies]
wzp-client = { path = "../../crates/wzp-client", default-features = false }
```
---
## Reproduction
A fresh clone on a Linux x86_64 host with:
```bash
git clone ssh://git@git.manko.yoga:222/manawenuz/wz-phone.git
cd wz-phone
git checkout feat/desktop-audio-rewrite
git reset --hard 0d74366 # <-- step E.1, smallest crashing commit
# Need: Android NDK r26.1.10909125, JDK 17, Node 20, Rust stable, cargo tauri 2.x
scripts/prep-linux-mint.sh # installs all the above into /opt/android-sdk etc.
cd desktop
npm install
cd src-tauri
cargo tauri android build --debug --target aarch64 --apk
adb install -r gen/android/app/build/outputs/apk/universal/debug/app-universal-debug.apk
adb logcat -c && adb shell am start -n com.wzp.desktop/.MainActivity
adb logcat | grep -E "F DEBUG|__init_tcb|pthread_create"
```
Expected result: SIGSEGV at `__init_tcb+4` within ~500 ms of launch.
Reverting `cpp/cpp_smoke.cpp` + the `cc::Build` call for it in `build.rs`
(one git command: `git revert 0d74366 aa240c6 0224ce6 a852cad`) restores a
working build. Keeping the C sanity compile (`hello.c`, `getauxval_fix.c`)
is fine — only the `.cpp(true) + .cpp_link_stdlib("c++_shared")` combination
triggers the regression.
---
## What we'd like help with
1. **Is our suspect #2 actually the mechanism?** Is there a known issue
where a Tauri/tao android cdylib crashes on load when it has a
`libc++_shared.so` NEEDED entry and tries to spawn a thread from inside
an onCreate JNI call?
2. **What's the correct way to link Oboe (or any C++ Android audio
library) into a `cargo tauri android build` cdylib** without hitting
this? Is there a known-good combination of cc-rs flags / linker
arguments / cargo config?
3. **Is there a way to force `cargo tauri` to use the same linker setup
as `cargo ndk`**, which reliably produces working Oboe-linked .so
files from the exact same workspace? We've tried env var override,
`.cargo/config.toml`, and image-level binary replacement — cargo
tauri ignores all three and keeps using
`aarch64-linux-android24-clang`.
4. **Is there a way to defer `tao::ndk_glue::create`'s thread spawn to
after `onCreate` returns** so that whatever bionic state `__init_tcb`
depends on is ready?
5. **Lastly** — is there a fundamentally different approach we should
take (e.g., use the `oboe` Rust crate from crates.io instead of a
hand-rolled C++ bridge, use Android's AAudio directly via the `ndk`
crate's aaudio bindings, or even abandon the C++ audio path and
implement mic/speaker via JNI into Java `AudioRecord`/`AudioTrack`)?

View File

@@ -0,0 +1,130 @@
# =============================================================================
# WZ Phone — Android build environment (Debian 12 / Bookworm)
#
# Supports both:
# 1. Legacy Kotlin+JNI Android app (via cargo-ndk + gradle)
# 2. Tauri 2.x Mobile Android app (via tauri-cli + Node/npm)
#
# Toolchain:
# - Debian 12 (cmake 3.25, no Android cross-compilation bugs)
# - JDK 17 (Gradle 8.5 + AGP 8.2.0 compatible)
# - NDK 26.1 (last stable before scudo/MTE crash on NDK 27+)
# - Node.js 20 LTS (for Tauri frontend build)
# - Rust stable with all 4 Android targets + cargo-ndk + tauri-cli 2.x
#
# Build: docker build -t wzp-android-builder -f Dockerfile.android-builder .
# =============================================================================
FROM debian:bookworm
ARG NDK_VERSION=26.1.10909125
ARG ANDROID_API=34
# Tauri 2.x mobile targets compileSdk 36 + build-tools 35 by default. Install
# both 34 (legacy Kotlin app) and 35/36 (Tauri mobile) so the same image works
# for both pipelines.
ARG ANDROID_API_TAURI=36
ARG BUILD_TOOLS_TAURI=35.0.0
ENV DEBIAN_FRONTEND=noninteractive \
ANDROID_HOME=/opt/android-sdk \
JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64
ENV ANDROID_NDK_HOME=$ANDROID_HOME/ndk/$NDK_VERSION \
ANDROID_NDK=$ANDROID_HOME/ndk/$NDK_VERSION
# ── System packages ──────────────────────────────────────────────────────────
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
cmake \
curl \
git \
libssl-dev \
pkg-config \
unzip \
wget \
zip \
openjdk-17-jdk-headless \
ca-certificates \
libasound2-dev \
file \
xz-utils \
&& rm -rf /var/lib/apt/lists/*
# ── Node.js 20 LTS (required by Tauri for frontend build) ────────────────────
RUN curl -fsSL https://deb.nodesource.com/setup_20.x | bash - \
&& apt-get install -y --no-install-recommends nodejs \
&& rm -rf /var/lib/apt/lists/* \
&& node --version \
&& npm --version
# ── Android SDK + NDK 26.1 ──────────────────────────────────────────────────
RUN mkdir -p $ANDROID_HOME/cmdline-tools \
&& cd /tmp \
&& wget -q https://dl.google.com/android/repository/commandlinetools-linux-11076708_latest.zip -O cmdtools.zip \
&& unzip -qo cmdtools.zip -d $ANDROID_HOME/cmdline-tools \
&& mv $ANDROID_HOME/cmdline-tools/cmdline-tools $ANDROID_HOME/cmdline-tools/latest \
&& rm cmdtools.zip
RUN yes | $ANDROID_HOME/cmdline-tools/latest/bin/sdkmanager --licenses > /dev/null 2>&1 \
&& $ANDROID_HOME/cmdline-tools/latest/bin/sdkmanager --install \
"platforms;android-${ANDROID_API}" \
"build-tools;${ANDROID_API}.0.0" \
"platforms;android-${ANDROID_API_TAURI}" \
"build-tools;${BUILD_TOOLS_TAURI}" \
"ndk;${NDK_VERSION}" \
"platform-tools" \
2>&1 | grep -v '^\[' > /dev/null
# Work around the API-24 libc.a stub in the NDK. Any C++ static lib we
# link into libwzp_desktop_lib.so (e.g. the Oboe audio bridge) pulls in
# bionic's static pthread_create from API-24 libc.a via libc++_shared,
# and that pthread_create crashes at __init_tcb+4 when called from a
# .so loaded via dlopen (the static stub expects libc init state that
# only exists for main executables). API-26 has the proper runtime
# bindings. Tauri-cli hard-codes aarch64-linux-android24-clang as the
# linker and ignores .cargo/config.toml overrides, so the only sure
# fix is to replace the NDK's ${abi}24-clang binary itself with a
# shim that exec()s the ${abi}26-clang equivalent. Applies to all four
# ABIs × {clang, clang++}. The legacy wzp-android crate works without
# this because cargo-ndk honours a crate-level linker override; the
# shim is the minimal targeted fix for the cargo-tauri build path.
# Added as Option 3 for the incremental Step E regression (commit 4250f1b).
RUN set -eux; \
BIN=$ANDROID_NDK_HOME/toolchains/llvm/prebuilt/linux-x86_64/bin; \
for abi in aarch64-linux-android armv7a-linux-androideabi i686-linux-android x86_64-linux-android; do \
for suffix in clang clang++; do \
mv "$BIN/${abi}24-${suffix}" "$BIN/${abi}24-${suffix}.orig"; \
printf '#!/bin/sh\nexec "%s/%s26-%s" "$@"\n' "$BIN" "$abi" "$suffix" > "$BIN/${abi}24-${suffix}"; \
chmod +x "$BIN/${abi}24-${suffix}"; \
done; \
done
# Make SDK world-readable so builder user can access it
RUN chmod -R a+rX $ANDROID_HOME
# ── Builder user (1000:1000) ─────────────────────────────────────────────────
RUN groupadd -g 1000 builder \
&& useradd -m -u 1000 -g 1000 -s /bin/bash builder
USER builder
WORKDIR /home/builder
# ── Rust toolchain ───────────────────────────────────────────────────────────
# Install all 4 Android targets (Tauri Mobile builds for all ABIs by default;
# cargo-ndk legacy path only needs arm64-v8a — both workflows supported).
RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs \
| sh -s -- -y --default-toolchain stable \
&& . $HOME/.cargo/env \
&& rustup target add \
aarch64-linux-android \
armv7-linux-androideabi \
i686-linux-android \
x86_64-linux-android \
&& cargo install cargo-ndk \
&& cargo install tauri-cli --version "^2.0" --locked
ENV PATH="/home/builder/.cargo/bin:$ANDROID_HOME/cmdline-tools/latest/bin:$ANDROID_HOME/platform-tools:$JAVA_HOME/bin:$PATH"
# NDK_HOME is the env var tauri-cli checks (in addition to ANDROID_NDK_HOME)
ENV NDK_HOME=$ANDROID_NDK_HOME
WORKDIR /build/source

197
scripts/build-and-notify.sh Executable file
View File

@@ -0,0 +1,197 @@
#!/usr/bin/env bash
set -euo pipefail
# Build Android APK via Docker on SepehrHomeserverdk, upload to rustypaste,
# notify via ntfy.sh/wzp. Fire and forget.
#
# Usage:
# ./scripts/build-and-notify.sh Build current local branch
# ./scripts/build-and-notify.sh --branch opus-DRED Build a specific branch
# ./scripts/build-and-notify.sh --rust Force Rust rebuild
# ./scripts/build-and-notify.sh --no-pull Skip git pull (use cached source)
# ./scripts/build-and-notify.sh --install Also download + adb install locally
#
# The remote builder pulls the requested branch from its `origin` (gitea:
# git.manko.yoga). Make sure you've pushed the branch to `origin` before
# running this script, otherwise the remote fetch will fail loudly.
REMOTE_HOST="SepehrHomeserverdk"
BASE_DIR="/mnt/storage/manBuilder"
NTFY_TOPIC="https://ntfy.sh/wzp"
LOCAL_OUTPUT="target/android-apk"
SSH_OPTS="-o ConnectTimeout=15 -o ServerAliveInterval=15 -o ServerAliveCountMax=4 -o LogLevel=ERROR"
REBUILD_RUST=0
DO_PULL=1
DO_INSTALL=0
# Default to whatever branch the local workspace is on — "build what I'm
# working on" is the intuitive behavior for iterative development.
BRANCH=$(git -C "$(dirname "$0")/.." branch --show-current 2>/dev/null || echo "")
while [ $# -gt 0 ]; do
case "$1" in
--rust) REBUILD_RUST=1 ;;
--pull) DO_PULL=1 ;;
--no-pull) DO_PULL=0 ;;
--install) DO_INSTALL=1 ;;
--branch)
shift
BRANCH="$1"
;;
--branch=*) BRANCH="${1#--branch=}" ;;
*) echo "Unknown arg: $1"; exit 1 ;;
esac
shift
done
if [ -z "$BRANCH" ]; then
echo "ERROR: could not determine target branch (detached HEAD?). Pass --branch NAME."
exit 1
fi
echo "Target branch: $BRANCH"
log() { echo -e "\033[1;36m>>> $*\033[0m"; }
ssh_cmd() { ssh -A $SSH_OPTS "$REMOTE_HOST" "$@"; }
# Upload the remote build script
log "Uploading build script to remote..."
ssh_cmd "cat > /tmp/wzp-docker-build.sh" <<'REMOTE_SCRIPT'
#!/usr/bin/env bash
set -euo pipefail
BASE_DIR="/mnt/storage/manBuilder"
NTFY_TOPIC="https://ntfy.sh/wzp"
REBUILD_RUST="${1:-0}"
DO_PULL="${2:-0}"
BRANCH="${3:-}"
if [ -z "$BRANCH" ]; then
echo "ERROR: remote script invoked without a BRANCH argument"
exit 1
fi
notify() { curl -s -d "$1" "$NTFY_TOPIC" > /dev/null 2>&1 || true; }
trap 'notify "WZP Android build FAILED [$BRANCH]! Check /tmp/wzp-build.log"' ERR
# Pull the requested branch. Previously this was hardcoded to
# feat/android-voip-client with `|| true` on the reset, which silently
# left the tree on whatever branch it was last on when the hardcoded
# branch didn't exist on origin. Now the branch is a parameter and any
# failure aborts the build so nobody ships an APK from the wrong source.
if [ "$DO_PULL" = "1" ]; then
echo ">>> Pulling branch '$BRANCH' from origin..."
cd "$BASE_DIR/data/source"
git reset --hard HEAD 2>/dev/null || true
git clean -fd 2>/dev/null || true
git gc --prune=now 2>/dev/null || true
git fetch origin "$BRANCH"
git reset --hard "origin/$BRANCH"
BUILT_HASH=$(git rev-parse --short HEAD)
BUILT_SUBJECT=$(git log -1 --format=%s)
echo ">>> HEAD after pull: $BUILT_HASH — $BUILT_SUBJECT"
fi
# Clean Rust if requested
if [ "$REBUILD_RUST" = "1" ]; then
echo ">>> Cleaning Rust target..."
rm -rf "$BASE_DIR/data/cache/target/aarch64-linux-android/release"
fi
# Fix perms
find "$BASE_DIR/data/source" "$BASE_DIR/data/cache" \
! -user 1000 -o ! -group 1000 2>/dev/null | \
xargs -r chown 1000:1000 2>/dev/null || true
# Clean jniLibs
rm -rf "$BASE_DIR/data/source/android/app/src/main/jniLibs/arm64-v8a"
GIT_HASH=$(cd $BASE_DIR/data/source && git rev-parse --short HEAD 2>/dev/null || echo unknown)
notify "WZP Android build started [$BRANCH @ $GIT_HASH]..."
echo ">>> Building in Docker..."
docker run --rm --user 1000:1000 \
-v "$BASE_DIR/data/source:/build/source" \
-v "$BASE_DIR/data/cache/cargo-registry:/home/builder/.cargo/registry" \
-v "$BASE_DIR/data/cache/cargo-git:/home/builder/.cargo/git" \
-v "$BASE_DIR/data/cache/target:/build/source/target" \
-v "$BASE_DIR/data/cache/gradle:/home/builder/.gradle" \
wzp-android-builder bash -c '
set -euo pipefail
cd /build/source
echo ">>> Rust build..."
cargo ndk -t arm64-v8a -o android/app/src/main/jniLibs build --release -p wzp-android 2>&1 | tail -5
echo ">>> Checking .so files..."
# cargo-ndk may not copy libc++_shared.so — grab it from the NDK if missing
if [ ! -f android/app/src/main/jniLibs/arm64-v8a/libc++_shared.so ]; then
echo ">>> libc++_shared.so missing, copying from NDK..."
NDK_LIBCXX=$(find "$ANDROID_NDK_HOME" -name "libc++_shared.so" -path "*/aarch64-linux-android/*" | head -1)
if [ -n "$NDK_LIBCXX" ]; then
cp "$NDK_LIBCXX" android/app/src/main/jniLibs/arm64-v8a/
echo "Copied from: $NDK_LIBCXX"
else
echo "WARNING: libc++_shared.so not found in NDK, APK may crash at runtime"
fi
fi
ls -lh android/app/src/main/jniLibs/arm64-v8a/
[ -f android/app/src/main/jniLibs/arm64-v8a/libwzp_android.so ] || { echo "ERROR: libwzp_android.so missing!"; exit 1; }
echo ">>> APK build..."
cd android && chmod +x gradlew
./gradlew clean assembleDebug --no-daemon --warning-mode=none 2>&1 | tail -50
echo "APK_BUILT"
'
# Upload to rustypaste
echo ">>> Uploading to rustypaste..."
source "$BASE_DIR/.env"
APK=$(find "$BASE_DIR/data/source/android" -name "app-debug*.apk" -path "*/outputs/apk/*" | head -1)
if [ -n "$APK" ]; then
URL=$(curl -s -F "file=@$APK" -H "Authorization: $rusty_auth_token" "$rusty_address")
echo "UPLOAD_URL=$URL"
notify "WZP Android [$BRANCH @ $GIT_HASH] done! APK: $URL"
echo ">>> Done! APK at: $URL"
else
notify "WZP build FAILED [$BRANCH @ $GIT_HASH] - no APK"
echo "ERROR: No APK found"
exit 1
fi
REMOTE_SCRIPT
ssh_cmd "chmod +x /tmp/wzp-docker-build.sh"
# Run in tmux
log "Starting build in tmux (branch: $BRANCH)..."
ssh_cmd "tmux kill-session -t wzp-build 2>/dev/null; true"
ssh_cmd "tmux new-session -d -s wzp-build '/tmp/wzp-docker-build.sh $REBUILD_RUST $DO_PULL $BRANCH 2>&1 | tee /tmp/wzp-build.log'"
log "Build running! You'll get a notification on ntfy.sh/wzp with the download URL."
echo ""
echo " Monitor: ssh $REMOTE_HOST 'tail -f /tmp/wzp-build.log'"
echo " Status: ssh $REMOTE_HOST 'tail -5 /tmp/wzp-build.log'"
echo ""
# Optionally wait and install locally
if [ "$DO_INSTALL" = "1" ]; then
log "Waiting for build to finish..."
while true; do
sleep 15
if ssh_cmd "grep -q 'UPLOAD_URL\|ERROR' /tmp/wzp-build.log 2>/dev/null"; then
break
fi
done
URL=$(ssh_cmd "grep UPLOAD_URL /tmp/wzp-build.log | tail -1 | cut -d= -f2")
if [ -n "$URL" ]; then
log "Downloading APK..."
mkdir -p "$LOCAL_OUTPUT"
curl -s -o "$LOCAL_OUTPUT/wzp-debug.apk" "$URL"
log "Installing..."
adb uninstall com.wzp.phone 2>/dev/null || true
adb install "$LOCAL_OUTPUT/wzp-debug.apk"
log "Done!"
else
err "Build failed"
fi
fi

416
scripts/build-android-docker.sh Executable file
View File

@@ -0,0 +1,416 @@
#!/usr/bin/env bash
set -euo pipefail
# =============================================================================
# WZ Phone — Android APK build via Docker on remote host
#
# Replaces Hetzner Cloud VMs with a Docker container on SepehrHomeserverdk.
# Persistent storage at /mnt/storage/manBuilder/data/{source,cache,keystore}.
# Uploads APKs to rustypaste, then SCPs them back locally.
#
# Prerequisites:
# - SSH config has "SepehrHomeserverdk" host entry
# - SSH agent running with keys for both remote host and git.manko.yoga
# - Docker installed on remote host
# - /mnt/storage/manBuilder/.env with rusty_address and rusty_auth_token
#
# Usage:
# ./scripts/build-android-docker.sh Full: prepare+pull+build+upload+transfer
# ./scripts/build-android-docker.sh --prepare Build Docker image + sync keystores
# ./scripts/build-android-docker.sh --pull Clone/update source from Gitea
# ./scripts/build-android-docker.sh --build Build debug APK inside Docker
# ./scripts/build-android-docker.sh --upload Upload APKs to rustypaste
# ./scripts/build-android-docker.sh --transfer SCP APKs back to local machine
# ./scripts/build-android-docker.sh --all pull+build+upload+transfer (image ready)
#
# Add --release to also build release APK:
# ./scripts/build-android-docker.sh --build --release
# ./scripts/build-android-docker.sh --all --release
# ./scripts/build-android-docker.sh --release (full pipeline, debug+release)
#
# Environment variables (all optional):
# WZP_BRANCH Branch to build (default: feat/android-voip-client)
# =============================================================================
REMOTE_HOST="SepehrHomeserverdk"
BASE_DIR="/mnt/storage/manBuilder"
REPO_URL="ssh://git@git.manko.yoga:222/manawenuz/wz-phone.git"
BRANCH="${WZP_BRANCH:-feat/android-voip-client}"
DOCKER_IMAGE="wzp-android-builder"
LOCAL_OUTPUT_DIR="target/android-apk"
PROJECT_DIR="$(cd "$(dirname "$0")/.." && pwd)"
LOCAL_KEYSTORE_DIR="$PROJECT_DIR/android/keystore"
SSH_OPTS="-o ConnectTimeout=10 -o LogLevel=ERROR -o ServerAliveInterval=15 -o ServerAliveCountMax=4"
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
log() { echo -e "\n\033[1;36m>>> $*\033[0m"; }
err() { echo -e "\033[1;31mERROR: $*\033[0m" >&2; }
ssh_cmd() {
ssh -A $SSH_OPTS "$REMOTE_HOST" "$@"
}
push_reminder() {
echo ""
echo " ┌──────────────────────────────────────────────────────────────────┐"
echo " │ IMPORTANT: Push your changes to origin (Gitea) before build! │"
echo " │ │"
echo " │ The build fetches from: │"
echo " │ ssh://git@git.manko.yoga:222/manawenuz/wz-phone.git │"
echo " │ │"
echo " │ Run: git push origin $BRANCH"
echo " └──────────────────────────────────────────────────────────────────┘"
echo ""
read -r -p "Press Enter to continue (Ctrl-C to abort)... "
}
# ---------------------------------------------------------------------------
# --prepare: Create remote dirs, build Docker image, sync keystores
# ---------------------------------------------------------------------------
do_prepare() {
log "Preparing remote environment..."
ssh_cmd "mkdir -p $BASE_DIR/data/{source,cache/cargo-registry,cache/cargo-git,cache/target,cache/gradle,keystore}"
# Sync keystores (gitignored — won't exist after clone)
REMOTE_HAS_KEYSTORE=$(ssh_cmd "[ -f $BASE_DIR/data/keystore/wzp-debug.jks ] && echo yes || echo no")
if [ "$REMOTE_HAS_KEYSTORE" = "no" ]; then
if [ -f "$LOCAL_KEYSTORE_DIR/wzp-debug.jks" ]; then
log "Uploading keystores to remote persistent storage..."
scp $SSH_OPTS \
"$LOCAL_KEYSTORE_DIR/wzp-debug.jks" \
"$LOCAL_KEYSTORE_DIR/wzp-release.jks" \
"$REMOTE_HOST:$BASE_DIR/data/keystore/"
echo " Keystores uploaded to $BASE_DIR/data/keystore/"
else
err "No keystores found locally at $LOCAL_KEYSTORE_DIR/"
err "Build will generate a temporary debug keystore instead."
fi
else
echo " Keystores already on remote."
fi
# Upload Dockerfile from local (always use local version — no git dependency)
log "Uploading Dockerfile to remote..."
ssh_cmd "mkdir -p $BASE_DIR/data/source/scripts"
scp $SSH_OPTS \
"$PROJECT_DIR/scripts/Dockerfile.android-builder" \
"$REMOTE_HOST:$BASE_DIR/data/source/scripts/Dockerfile.android-builder"
# Build Docker image
log "Building Docker image (Debian 12 + Rust + Android SDK/NDK)..."
ssh_cmd bash <<IMAGE_EOF
set -euo pipefail
docker build -t "$DOCKER_IMAGE" - < "$BASE_DIR/data/source/scripts/Dockerfile.android-builder"
echo " Docker image '$DOCKER_IMAGE' ready."
IMAGE_EOF
}
# ---------------------------------------------------------------------------
# --pull: Clone or update source from Gitea
# ---------------------------------------------------------------------------
do_pull() {
push_reminder
log "Updating source (branch: $BRANCH)..."
ssh_cmd bash <<PULL_EOF
set -euo pipefail
mkdir -p "$BASE_DIR/data/source" \
"$BASE_DIR/data/cache/cargo-registry" \
"$BASE_DIR/data/cache/cargo-git" \
"$BASE_DIR/data/cache/target" \
"$BASE_DIR/data/cache/gradle" \
"$BASE_DIR/data/keystore"
cd "$BASE_DIR/data/source"
if [ -d .git ]; then
echo " Fetching origin..."
git fetch origin
git checkout "$BRANCH" 2>/dev/null || git checkout -b "$BRANCH" "origin/$BRANCH"
git reset --hard "origin/$BRANCH"
else
echo " Cloning repo..."
cd "$BASE_DIR/data"
rm -rf source
git clone --branch "$BRANCH" "$REPO_URL" source
cd source
fi
git submodule update --init || true
echo " HEAD: \$(git log --oneline -1)"
echo " Branch: \$(git branch --show-current)"
PULL_EOF
# Inject keystores into source tree
log "Injecting keystores into source tree..."
ssh_cmd bash <<KS_EOF
set -euo pipefail
mkdir -p "$BASE_DIR/data/source/android/keystore"
if [ -f "$BASE_DIR/data/keystore/wzp-debug.jks" ]; then
cp "$BASE_DIR/data/keystore/wzp-debug.jks" "$BASE_DIR/data/source/android/keystore/"
cp "$BASE_DIR/data/keystore/wzp-release.jks" "$BASE_DIR/data/source/android/keystore/"
echo " Keystores ready (wzp-debug.jks + wzp-release.jks)"
else
echo " WARNING: No keystores in persistent storage — build will generate temporary ones"
fi
KS_EOF
}
# ---------------------------------------------------------------------------
# --build: Build APK inside Docker container
# $1 = "1" to also build release APK (default: debug only)
# ---------------------------------------------------------------------------
do_build() {
local build_release="${1:-0}"
if [ "$build_release" = "1" ]; then
log "Building debug + release APKs inside Docker container..."
else
log "Building debug APK inside Docker container..."
fi
ssh_cmd bash <<BUILD_EOF
set -euo pipefail
# Ensure uid 1000 can write to mounted volumes
# Use find to only chown files not already 1000:1000, ignore errors on stubborn files
find "$BASE_DIR/data/source" "$BASE_DIR/data/cache" \
! -user 1000 -o ! -group 1000 2>/dev/null | \
xargs -r chown 1000:1000 2>/dev/null || true
docker run --rm \
--user 1000:1000 \
-e BUILD_RELEASE="$build_release" \
-v "$BASE_DIR/data/source:/build/source" \
-v "$BASE_DIR/data/cache/cargo-registry:/home/builder/.cargo/registry" \
-v "$BASE_DIR/data/cache/cargo-git:/home/builder/.cargo/git" \
-v "$BASE_DIR/data/cache/target:/build/source/target" \
-v "$BASE_DIR/data/cache/gradle:/home/builder/.gradle" \
"$DOCKER_IMAGE" \
bash -c '
set -euo pipefail
cd /build/source
echo ">>> Building Rust native library (arm64-v8a, release)..."
# Clean stale jniLibs so cargo-ndk re-copies libc++_shared.so
rm -rf android/app/src/main/jniLibs/arm64-v8a
cargo ndk -t arm64-v8a \
-o android/app/src/main/jniLibs \
build --release -p wzp-android 2>&1 | tail -10
[ -f android/app/src/main/jniLibs/arm64-v8a/libwzp_android.so ] || {
echo "ERROR: libwzp_android.so not found after build"; exit 1;
}
echo " .so size: \$(du -h android/app/src/main/jniLibs/arm64-v8a/libwzp_android.so | cut -f1)"
# Verify keystores exist (should have been injected by --pull)
if [ -f android/keystore/wzp-debug.jks ] && [ -f android/keystore/wzp-release.jks ]; then
echo " Keystores: wzp-debug.jks + wzp-release.jks (from persistent storage)"
else
echo "WARNING: Keystores missing — generating temporary debug keystore..."
mkdir -p android/keystore
keytool -genkey -v \
-keystore android/keystore/wzp-debug.jks \
-keyalg RSA -keysize 2048 -validity 10000 \
-alias wzp-debug -storepass android -keypass android \
-dname "CN=WZP Debug" 2>&1 | tail -1
cp android/keystore/wzp-debug.jks android/keystore/wzp-release.jks
fi
cd android
chmod +x ./gradlew
echo ">>> Building debug APK..."
./gradlew assembleDebug --no-daemon --warning-mode=none 2>&1 | tail -5
if [ "\${BUILD_RELEASE}" = "1" ]; then
echo ">>> Building release APK..."
./gradlew assembleRelease --no-daemon --warning-mode=none 2>&1 | tail -5 || \
echo " (release build failed — debug APK still available)"
fi
echo ""
echo ">>> Build artifacts:"
find . -name "*.apk" -path "*/outputs/apk/*" -exec ls -lh {} \;
'
BUILD_EOF
}
# ---------------------------------------------------------------------------
# --upload: Upload APKs to rustypaste
# ---------------------------------------------------------------------------
do_upload() {
log "Uploading APKs to rustypaste..."
UPLOAD_RESULT=$(ssh_cmd bash <<'UPLOAD_EOF'
set -euo pipefail
BASE_DIR="/mnt/storage/manBuilder"
ENV_FILE="$BASE_DIR/.env"
if [ ! -f "$ENV_FILE" ]; then
echo "ERROR: $ENV_FILE not found — create it with rusty_address and rusty_auth_token" >&2
exit 1
fi
source "$ENV_FILE"
if [ -z "${rusty_address:-}" ] || [ -z "${rusty_auth_token:-}" ]; then
echo "ERROR: rusty_address or rusty_auth_token not set in $ENV_FILE" >&2
exit 1
fi
upload_apk() {
local apk="$1" label="$2"
if [ -f "$apk" ]; then
local url
url=$(curl -s -F "file=@$apk" -H "Authorization: $rusty_auth_token" "$rusty_address")
echo "$label: $url"
fi
}
DEBUG_APK=$(find "$BASE_DIR/data/source/android" -name "app-debug*.apk" -path "*/outputs/apk/*" 2>/dev/null | head -1)
RELEASE_APK=$(find "$BASE_DIR/data/source/android" -name "app-release*.apk" -path "*/outputs/apk/*" 2>/dev/null | head -1)
upload_apk "${DEBUG_APK:-}" "debug"
upload_apk "${RELEASE_APK:-}" "release"
UPLOAD_EOF
)
echo "$UPLOAD_RESULT"
}
# ---------------------------------------------------------------------------
# --transfer: SCP APKs back to local machine
# ---------------------------------------------------------------------------
do_transfer() {
log "Downloading APKs to local machine..."
mkdir -p "$LOCAL_OUTPUT_DIR"
# Debug APK
DEBUG_REMOTE=$(ssh_cmd "find $BASE_DIR/data/source/android -name 'app-debug*.apk' -path '*/outputs/apk/*' 2>/dev/null | head -1" || true)
if [ -n "$DEBUG_REMOTE" ]; then
scp $SSH_OPTS "$REMOTE_HOST:$DEBUG_REMOTE" "$LOCAL_OUTPUT_DIR/wzp-debug.apk"
echo " debug: $LOCAL_OUTPUT_DIR/wzp-debug.apk ($(du -h "$LOCAL_OUTPUT_DIR/wzp-debug.apk" | cut -f1))"
fi
# Release APK
RELEASE_REMOTE=$(ssh_cmd "find $BASE_DIR/data/source/android -name 'app-release*.apk' -path '*/outputs/apk/*' 2>/dev/null | head -1" || true)
if [ -n "$RELEASE_REMOTE" ]; then
scp $SSH_OPTS "$REMOTE_HOST:$RELEASE_REMOTE" "$LOCAL_OUTPUT_DIR/wzp-release.apk"
echo " release: $LOCAL_OUTPUT_DIR/wzp-release.apk ($(du -h "$LOCAL_OUTPUT_DIR/wzp-release.apk" | cut -f1))"
fi
# Also grab the .so
scp $SSH_OPTS "$REMOTE_HOST:$BASE_DIR/data/source/android/app/src/main/jniLibs/arm64-v8a/libwzp_android.so" \
"$LOCAL_OUTPUT_DIR/libwzp_android.so" 2>/dev/null \
&& echo " .so: $LOCAL_OUTPUT_DIR/libwzp_android.so" || true
}
# ---------------------------------------------------------------------------
# Summary banner
# ---------------------------------------------------------------------------
show_summary() {
log "All done!"
echo ""
echo " ┌──────────────────────────────────────────────────────────────┐"
[ -f "$LOCAL_OUTPUT_DIR/wzp-debug.apk" ] && \
echo " │ Debug APK: $LOCAL_OUTPUT_DIR/wzp-debug.apk"
[ -f "$LOCAL_OUTPUT_DIR/wzp-release.apk" ] && \
echo " │ Release APK: $LOCAL_OUTPUT_DIR/wzp-release.apk"
echo " │"
if [ -n "${UPLOAD_RESULT:-}" ]; then
echo " │ Rustypaste:"
echo "$UPLOAD_RESULT" | while read -r line; do
echo "$line"
done
echo " │"
fi
echo " │ Install: adb install -r $LOCAL_OUTPUT_DIR/wzp-debug.apk"
echo " └──────────────────────────────────────────────────────────────┘"
}
# ---------------------------------------------------------------------------
# Parse arguments
# ---------------------------------------------------------------------------
ACTION=""
BUILD_RELEASE=0
for arg in "$@"; do
case "$arg" in
--release) BUILD_RELEASE=1 ;;
--prepare|--pull|--build|--upload|--transfer|--all)
if [ -n "$ACTION" ]; then
err "Multiple actions specified: $ACTION and $arg"
exit 1
fi
ACTION="$arg"
;;
*)
echo "Usage: $0 [--prepare|--pull|--build|--upload|--transfer|--all] [--release]"
echo ""
echo "Actions:"
echo " (no action) Full pipeline: pull → prepare → build → upload → transfer"
echo " --prepare Build Docker image + sync keystores to remote"
echo " --pull Clone/update source from Gitea + inject keystores"
echo " --build Build debug APK inside Docker container"
echo " --upload Upload APKs to rustypaste"
echo " --transfer SCP APKs + .so back to local machine"
echo " --all pull → build → upload → transfer (Docker image ready)"
echo ""
echo "Flags:"
echo " --release Also build release APK (default: debug only)"
echo ""
echo "Examples:"
echo " $0 # full pipeline, debug only"
echo " $0 --release # full pipeline, debug + release"
echo " $0 --build # debug APK only"
echo " $0 --build --release # debug + release APKs"
echo " $0 --all # iterate: pull+build+upload+transfer (debug)"
echo " $0 --all --release # iterate with release too"
echo ""
echo "Environment:"
echo " WZP_BRANCH=$BRANCH"
exit 1
;;
esac
done
# ---------------------------------------------------------------------------
# Dispatch
# ---------------------------------------------------------------------------
case "${ACTION:-}" in
--prepare)
do_prepare
;;
--pull)
do_pull
;;
--build)
do_build "$BUILD_RELEASE"
;;
--upload)
do_upload
;;
--transfer)
do_transfer
;;
--all)
do_pull
do_build "$BUILD_RELEASE"
do_upload
do_transfer
show_summary
;;
"")
do_pull
do_prepare
do_build "$BUILD_RELEASE"
do_upload
do_transfer
show_summary
;;
esac

166
scripts/build-linux-docker.sh Executable file
View File

@@ -0,0 +1,166 @@
#!/usr/bin/env bash
set -euo pipefail
# Build WarzonePhone Linux x86_64 binaries via Docker on SepehrHomeserverdk.
# Reuses same Docker image as Android build (has Rust + cmake + build tools).
# Fire and forget — notifies via ntfy.sh/wzp with rustypaste URL.
#
# Usage:
# ./scripts/build-linux-docker.sh Build + upload + notify
# ./scripts/build-linux-docker.sh --pull Git pull before building
# ./scripts/build-linux-docker.sh --clean Clean Rust target cache
# ./scripts/build-linux-docker.sh --install Download binaries locally after build
REMOTE_HOST="SepehrHomeserverdk"
BASE_DIR="/mnt/storage/manBuilder"
NTFY_TOPIC="https://ntfy.sh/wzp"
LOCAL_OUTPUT="target/linux-x86_64"
SSH_OPTS="-o ConnectTimeout=15 -o ServerAliveInterval=15 -o ServerAliveCountMax=4 -o LogLevel=ERROR"
DO_PULL=1
DO_CLEAN=0
DO_INSTALL=0
for arg in "$@"; do
case "$arg" in
--pull) DO_PULL=1 ;;
--no-pull) DO_PULL=0 ;;
--clean) DO_CLEAN=1 ;;
--install) DO_INSTALL=1 ;;
esac
done
log() { echo -e "\033[1;36m>>> $*\033[0m"; }
err() { echo -e "\033[1;31mERROR: $*\033[0m" >&2; }
ssh_cmd() { ssh $SSH_OPTS "$REMOTE_HOST" "$@"; }
# Upload build script to remote
log "Uploading build script..."
ssh_cmd "cat > /tmp/wzp-linux-build.sh" <<'REMOTE_SCRIPT'
#!/usr/bin/env bash
set -euo pipefail
BASE_DIR="/mnt/storage/manBuilder"
NTFY_TOPIC="https://ntfy.sh/wzp"
DO_PULL="${1:-0}"
DO_CLEAN="${2:-0}"
notify() { curl -s -d "$1" "$NTFY_TOPIC" > /dev/null 2>&1 || true; }
trap 'notify "WZP Linux build FAILED! Check /tmp/wzp-linux-build.log"' ERR
if [ "$DO_PULL" = "1" ]; then
echo ">>> Pulling latest..."
cd "$BASE_DIR/data/source"
git reset --hard HEAD 2>/dev/null || true
git clean -fd 2>/dev/null || true
git gc --prune=now 2>/dev/null || true
git fetch origin feat/android-voip-client 2>&1 | tail -3
git reset --hard origin/feat/android-voip-client 2>/dev/null || true
fi
if [ "$DO_CLEAN" = "1" ]; then
echo ">>> Cleaning Linux target cache..."
rm -rf "$BASE_DIR/data/cache-linux/target"
fi
# Ensure cache dirs exist (separate from Android cache)
mkdir -p "$BASE_DIR/data/cache-linux/target" \
"$BASE_DIR/data/cache-linux/cargo-registry" \
"$BASE_DIR/data/cache-linux/cargo-git"
# Fix perms
find "$BASE_DIR/data/source" "$BASE_DIR/data/cache-linux" \
! -user 1000 -o ! -group 1000 2>/dev/null | \
xargs -r chown 1000:1000 2>/dev/null || true
GIT_HASH=$(cd "$BASE_DIR/data/source" && git rev-parse --short HEAD 2>/dev/null || echo "unknown")
notify "WZP Linux x86_64 build started [$GIT_HASH]..."
echo ">>> Building in Docker..."
docker run --rm --user 1000:1000 \
-v "$BASE_DIR/data/source:/build/source" \
-v "$BASE_DIR/data/cache-linux/cargo-registry:/home/builder/.cargo/registry" \
-v "$BASE_DIR/data/cache-linux/cargo-git:/home/builder/.cargo/git" \
-v "$BASE_DIR/data/cache-linux/target:/build/source/target" \
wzp-android-builder bash -c '
set -euo pipefail
cd /build/source
echo ">>> Building relay + client + web + bench..."
cargo build --release --bin wzp-relay --bin wzp-client --bin wzp-web --bin wzp-bench 2>&1 | tail -5
echo ">>> Building audio client..."
cargo build --release --bin wzp-client --features audio 2>&1 | tail -3
cp target/release/wzp-client target/release/wzp-client-audio
cargo build --release --bin wzp-client 2>&1 | tail -3
echo ">>> Binaries:"
ls -lh target/release/wzp-relay target/release/wzp-client target/release/wzp-client-audio target/release/wzp-web target/release/wzp-bench
echo ">>> Packaging..."
tar czf /tmp/wzp-linux-x86_64.tar.gz \
-C target/release wzp-relay wzp-client wzp-client-audio wzp-web wzp-bench
echo "BINARIES_BUILT"
'
# Upload to rustypaste
echo ">>> Uploading to rustypaste..."
source "$BASE_DIR/.env"
TARBALL="$BASE_DIR/data/cache-linux/target/release/../../../wzp-linux-x86_64.tar.gz"
# Docker wrote to /tmp inside container, copy from target mount
docker run --rm \
-v "$BASE_DIR/data/cache-linux/target:/build/target" \
wzp-android-builder bash -c \
"cp /build/target/release/wzp-relay /build/target/release/wzp-client /build/target/release/wzp-client-audio /build/target/release/wzp-web /build/target/release/wzp-bench /tmp/ && tar czf /tmp/wzp-linux-x86_64.tar.gz -C /tmp wzp-relay wzp-client wzp-client-audio wzp-web wzp-bench && cat /tmp/wzp-linux-x86_64.tar.gz" \
> /tmp/wzp-linux-x86_64.tar.gz
URL=$(curl -s -F "file=@/tmp/wzp-linux-x86_64.tar.gz" -H "Authorization: $rusty_auth_token" "$rusty_address")
if [ -n "$URL" ]; then
echo "UPLOAD_URL=$URL"
notify "WZP Linux x86_64 [$GIT_HASH] ready! $URL"
echo ">>> Done! Binaries at: $URL"
else
notify "WZP Linux build FAILED - upload error"
echo "ERROR: upload failed"
exit 1
fi
REMOTE_SCRIPT
ssh_cmd "chmod +x /tmp/wzp-linux-build.sh"
# Run in tmux
log "Starting Linux build in tmux..."
ssh_cmd "tmux kill-session -t wzp-linux 2>/dev/null; true"
ssh_cmd "tmux new-session -d -s wzp-linux '/tmp/wzp-linux-build.sh $DO_PULL $DO_CLEAN 2>&1 | tee /tmp/wzp-linux-build.log'"
log "Build running! Notification on ntfy.sh/wzp when done."
echo ""
echo " Monitor: ssh $REMOTE_HOST 'tail -f /tmp/wzp-linux-build.log'"
echo " Status: ssh $REMOTE_HOST 'tail -5 /tmp/wzp-linux-build.log'"
echo ""
# Optionally wait and download
if [ "$DO_INSTALL" = "1" ]; then
log "Waiting for build..."
while true; do
sleep 15
if ssh_cmd "grep -q 'UPLOAD_URL\|ERROR' /tmp/wzp-linux-build.log 2>/dev/null"; then
break
fi
done
URL=$(ssh_cmd "grep UPLOAD_URL /tmp/wzp-linux-build.log | tail -1 | cut -d= -f2")
if [ -n "$URL" ]; then
log "Downloading binaries..."
mkdir -p "$LOCAL_OUTPUT"
curl -s -o "$LOCAL_OUTPUT/wzp-linux-x86_64.tar.gz" "$URL"
tar xzf "$LOCAL_OUTPUT/wzp-linux-x86_64.tar.gz" -C "$LOCAL_OUTPUT/"
rm "$LOCAL_OUTPUT/wzp-linux-x86_64.tar.gz"
ls -lh "$LOCAL_OUTPUT"/wzp-*
log "Done! Binaries in $LOCAL_OUTPUT/"
else
err "Build failed"
fi
fi

122
scripts/build-linux-notify.sh Executable file
View File

@@ -0,0 +1,122 @@
#!/usr/bin/env bash
set -euo pipefail
# Build WarzonePhone Linux x86_64 binaries via Hetzner Cloud VPS.
# Fire and forget — notifies via ntfy.sh/wzp with rustypaste URL.
#
# Usage:
# ./scripts/build-linux-notify.sh Full: create VM → build → upload → notify → destroy
# ./scripts/build-linux-notify.sh --keep Keep VM after build
# ./scripts/build-linux-notify.sh --pull Git pull (for existing VM)
SSH_KEY_NAME="wz"
SSH_KEY_PATH="/Users/manwe/CascadeProjects/wzp"
SERVER_TYPE="cx33"
IMAGE="debian-12"
SERVER_NAME="wzp-linux-builder"
NTFY_TOPIC="https://ntfy.sh/wzp"
LOCAL_OUTPUT="target/linux-x86_64"
PROJECT_DIR="$(cd "$(dirname "$0")/.." && pwd)"
SSH_OPTS="-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o ConnectTimeout=15 -o ServerAliveInterval=15 -o LogLevel=ERROR"
KEEP_VM=0
DO_PULL=0
for arg in "$@"; do
case "$arg" in
--keep) KEEP_VM=1 ;;
--pull) DO_PULL=1 ;;
esac
done
log() { echo -e "\033[1;36m>>> $*\033[0m"; }
err() { echo -e "\033[1;31mERROR: $*\033[0m" >&2; }
get_vm_ip() {
hcloud server list -o columns=name,ipv4 -o noheader 2>/dev/null | grep "$SERVER_NAME" | awk '{print $2}' | tr -d ' '
}
ssh_cmd() {
local ip=$(get_vm_ip)
[ -n "$ip" ] || { err "No VM found"; exit 1; }
ssh $SSH_OPTS -i "$SSH_KEY_PATH" "root@$ip" "$@"
}
notify() { curl -s -d "$1" "$NTFY_TOPIC" > /dev/null 2>&1 || true; }
# --- Create VM if needed ---
existing=$(hcloud server list -o columns=name -o noheader 2>/dev/null | grep "$SERVER_NAME" | tr -d ' ' || true)
if [ -z "$existing" ]; then
log "Creating Hetzner VM ($SERVER_TYPE, $IMAGE)..."
hcloud server create --name "$SERVER_NAME" --type "$SERVER_TYPE" --image "$IMAGE" --ssh-key "$SSH_KEY_NAME" --location fsn1 --quiet
log "Waiting for SSH..."
ip=$(get_vm_ip)
for i in $(seq 1 30); do
ssh $SSH_OPTS -i "$SSH_KEY_PATH" "root@$ip" "echo ok" &>/dev/null && break
sleep 2
done
log "Installing deps..."
ssh_cmd "apt-get update -qq && apt-get install -y -qq build-essential cmake pkg-config libasound2-dev libssl-dev curl git > /dev/null 2>&1"
ssh_cmd "curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y --default-toolchain stable > /dev/null 2>&1"
fi
# --- Upload source ---
log "Uploading source..."
ip=$(get_vm_ip)
rsync -az --delete \
--exclude='target' --exclude='.git' --exclude='.claude' \
--exclude='node_modules' --exclude='dist' --exclude='android/app/build' \
-e "ssh $SSH_OPTS -i $SSH_KEY_PATH" \
"$PROJECT_DIR/" "root@$ip:/root/wzp-build/"
# --- Build ---
log "Building all binaries..."
notify "WZP Linux build started..."
ssh_cmd "source ~/.cargo/env && cd /root/wzp-build && \
cargo build --release --bin wzp-relay --bin wzp-client --bin wzp-web --bin wzp-bench 2>&1 | tail -5 && \
echo '--- audio client ---' && \
cargo build --release --bin wzp-client --features audio 2>&1 | tail -3 && \
cp target/release/wzp-client target/release/wzp-client-audio && \
cargo build --release --bin wzp-client 2>&1 | tail -3 && \
echo 'BUILD_DONE' && \
ls -lh target/release/wzp-relay target/release/wzp-client target/release/wzp-client-audio target/release/wzp-web target/release/wzp-bench"
# --- Package + upload to rustypaste ---
log "Packaging and uploading..."
UPLOAD_URL=$(ssh_cmd "cd /root/wzp-build && \
tar czf /tmp/wzp-linux-x86_64.tar.gz \
-C target/release wzp-relay wzp-client wzp-client-audio wzp-web wzp-bench \
-C /root/wzp-build/crates/wzp-web/static index.html audio-processor.js 2>/dev/null && \
curl -s -F 'file=@/tmp/wzp-linux-x86_64.tar.gz' \
-H 'Authorization: DAxAAGghkn1WKv1+RpPKkg==' \
https://paste.dk.manko.yoga")
if [ -n "$UPLOAD_URL" ]; then
notify "WZP Linux binaries ready! $UPLOAD_URL"
log "Uploaded: $UPLOAD_URL"
else
notify "WZP Linux build FAILED"
err "Upload failed"
fi
# --- Transfer locally ---
log "Downloading binaries..."
mkdir -p "$LOCAL_OUTPUT"
for bin in wzp-relay wzp-client wzp-client-audio wzp-web wzp-bench; do
scp $SSH_OPTS -i "$SSH_KEY_PATH" "root@$ip:/root/wzp-build/target/release/$bin" "$LOCAL_OUTPUT/$bin" 2>/dev/null
done
ls -lh "$LOCAL_OUTPUT"/wzp-*
# --- Cleanup ---
if [ "$KEEP_VM" = "1" ]; then
log "VM kept alive. Destroy: hcloud server delete $SERVER_NAME"
else
log "Destroying VM..."
hcloud server delete "$SERVER_NAME"
fi
log "Done!"
echo " Deploy: scp $LOCAL_OUTPUT/wzp-relay user@server:~/wzp/"

253
scripts/build-tauri-android.sh Executable file
View File

@@ -0,0 +1,253 @@
#!/usr/bin/env bash
set -euo pipefail
# =============================================================================
# WZ Phone — Tauri 2.x Mobile Android APK build
#
# Builds the desktop/ Tauri app as an Android APK via cargo-tauri inside the
# wzp-android-builder Docker image on SepehrHomeserverdk. Uploads the APK to
# rustypaste, fires ntfy.sh/wzp notifications at start + finish, and SCPs the
# APK back locally.
#
# Same pattern as build-and-notify.sh but for the Tauri mobile pipeline:
# - Source: desktop/src-tauri/ (not android/)
# - Build: cargo tauri android build (not gradlew assembleDebug)
# - Output: desktop/src-tauri/gen/android/.../*.apk
#
# Usage:
# ./scripts/build-tauri-android.sh # full pipeline (debug)
# ./scripts/build-tauri-android.sh --release # release APK
# ./scripts/build-tauri-android.sh --no-pull # skip git fetch
# ./scripts/build-tauri-android.sh --rust # force-clean rust target
# ./scripts/build-tauri-android.sh --init # also run `cargo tauri android init`
#
# Environment:
# WZP_BRANCH Branch to build (default: feat/desktop-audio-rewrite)
# =============================================================================
REMOTE_HOST="SepehrHomeserverdk"
BASE_DIR="/mnt/storage/manBuilder"
NTFY_TOPIC="https://ntfy.sh/wzp"
LOCAL_OUTPUT="target/tauri-android-apk"
BRANCH="${WZP_BRANCH:-feat/desktop-audio-rewrite}"
SSH_OPTS="-o ConnectTimeout=15 -o ServerAliveInterval=15 -o ServerAliveCountMax=4 -o LogLevel=ERROR"
REBUILD_RUST=0
DO_PULL=1
DO_INIT=0
BUILD_RELEASE=0
for arg in "$@"; do
case "$arg" in
--rust) REBUILD_RUST=1 ;;
--pull) DO_PULL=1 ;;
--no-pull) DO_PULL=0 ;;
--init) DO_INIT=1 ;;
--release) BUILD_RELEASE=1 ;;
-h|--help)
sed -n '3,30p' "$0"
exit 0
;;
esac
done
log() { echo -e "\033[1;36m>>> $*\033[0m"; }
ssh_cmd() { ssh -A $SSH_OPTS "$REMOTE_HOST" "$@"; }
notify_local() { curl -s -d "$1" "$NTFY_TOPIC" > /dev/null 2>&1 || true; }
mkdir -p "$LOCAL_OUTPUT"
log "Uploading remote build script..."
ssh_cmd "cat > /tmp/wzp-tauri-build.sh" <<'REMOTE_SCRIPT'
#!/usr/bin/env bash
set -euo pipefail
BASE_DIR="/mnt/storage/manBuilder"
NTFY_TOPIC="https://ntfy.sh/wzp"
BRANCH="${1:-feat/desktop-audio-rewrite}"
DO_PULL="${2:-1}"
REBUILD_RUST="${3:-0}"
DO_INIT="${4:-0}"
BUILD_RELEASE="${5:-0}"
LOG_FILE=/tmp/wzp-tauri-build.log
GIT_HASH="unknown" # populated after fetch
ENV_FILE="$BASE_DIR/.env"
notify() { curl -s -d "$1" "$NTFY_TOPIC" > /dev/null 2>&1 || true; }
# Upload a file to rustypaste; print URL on stdout (or empty on failure).
upload_to_rustypaste() {
local file="$1"
[ ! -f "$ENV_FILE" ] && { echo ""; return; }
# shellcheck disable=SC1090
source "$ENV_FILE"
if [ -n "${rusty_address:-}" ] && [ -n "${rusty_auth_token:-}" ]; then
curl -s -F "file=@$file" -H "Authorization: $rusty_auth_token" "$rusty_address" || echo ""
else
echo ""
fi
}
# On failure: upload the build log to rustypaste, then notify with hash + url.
on_error() {
local line="$1"
local log_url
log_url=$(upload_to_rustypaste "$LOG_FILE" || echo "")
if [ -n "$log_url" ]; then
notify "WZP Tauri Android build FAILED [$GIT_HASH] (line $line)
log: $log_url"
else
notify "WZP Tauri Android build FAILED [$GIT_HASH] (line $line) — log upload failed, see $LOG_FILE on remote"
fi
}
trap 'on_error $LINENO' ERR
exec > >(tee "$LOG_FILE") 2>&1
if [ "$DO_PULL" = "1" ]; then
echo ">>> git fetch + reset $BRANCH"
cd "$BASE_DIR/data/source"
git reset --hard HEAD 2>/dev/null || true
# NOTE: deliberately do NOT run `git clean -fd` here. It would wipe the
# tauri-generated `desktop/src-tauri/gen/android/` scaffold (gradlew,
# settings.gradle, etc.) which is expensive to recreate and breaks
# subsequent builds with "gradlew not found".
git gc --prune=now 2>/dev/null || true
git fetch origin "$BRANCH" 2>&1 | tail -3
git checkout "$BRANCH" 2>/dev/null || git checkout -b "$BRANCH" "origin/$BRANCH"
git reset --hard "origin/$BRANCH"
git submodule update --init || true
fi
GIT_HASH=$(cd "$BASE_DIR/data/source" && git rev-parse --short HEAD 2>/dev/null || echo unknown)
GIT_MSG=$(cd "$BASE_DIR/data/source" && git log -1 --pretty=%s 2>/dev/null | head -c 60 || echo "?")
notify "WZP Tauri Android build STARTED [$GIT_HASH] — $GIT_MSG"
# Fix perms so uid 1000 can write
find "$BASE_DIR/data/source" "$BASE_DIR/data/cache" \
! -user 1000 -o ! -group 1000 2>/dev/null | \
xargs -r chown 1000:1000 2>/dev/null || true
# Optionally clean rust target for android triples
if [ "$REBUILD_RUST" = "1" ]; then
echo ">>> Cleaning Rust android target dirs..."
rm -rf "$BASE_DIR/data/cache/target/aarch64-linux-android" \
"$BASE_DIR/data/cache/target/armv7-linux-androideabi" \
"$BASE_DIR/data/cache/target/i686-linux-android" \
"$BASE_DIR/data/cache/target/x86_64-linux-android"
fi
# Profile flag
PROFILE_FLAG="--debug"
[ "$BUILD_RELEASE" = "1" ] && PROFILE_FLAG=""
# Persist ~/.android (where the auto-generated debug.keystore lives) so every
# build is signed with the SAME key. Without this, every fresh container gets
# a new debug keystore and `adb install -r` fails with INSTALL_FAILED_UPDATE_
# INCOMPATIBLE because the signature changed.
mkdir -p "$BASE_DIR/data/cache/android-home"
chown 1000:1000 "$BASE_DIR/data/cache/android-home" 2>/dev/null || true
docker run --rm \
--user 1000:1000 \
-e DO_INIT="$DO_INIT" \
-e PROFILE_FLAG="$PROFILE_FLAG" \
-v "$BASE_DIR/data/source:/build/source" \
-v "$BASE_DIR/data/cache/cargo-registry:/home/builder/.cargo/registry" \
-v "$BASE_DIR/data/cache/cargo-git:/home/builder/.cargo/git" \
-v "$BASE_DIR/data/cache/target:/build/source/target" \
-v "$BASE_DIR/data/cache/gradle:/home/builder/.gradle" \
-v "$BASE_DIR/data/cache/android-home:/home/builder/.android" \
wzp-android-builder \
bash -c '
set -euo pipefail
cd /build/source/desktop
echo ">>> npm install"
npm install --silent 2>&1 | tail -5 || npm install 2>&1 | tail -20
cd src-tauri
# Run init if forced, OR if the gradle wrapper is missing. Just checking
# for `gen/android` is not enough — Tauri creates a few subdirectories
# during build (app/, buildSrc/, .gradle/) that survive a partial wipe and
# would make a naive `[ ! -d gen/android ]` check return false even though
# the build wrapper itself is gone.
if [ "${DO_INIT}" = "1" ] || [ ! -x gen/android/gradlew ]; then
echo ">>> cargo tauri android init"
cargo tauri android init 2>&1 | tail -20
fi
# ─── wzp-native standalone cdylib (built with cargo-ndk, not cargo-tauri) ──
# Produces libwzp_native.so which wzp-desktop dlopens at runtime via
# libloading. Split exists because cargo-tauri`s linker wiring pulls
# bionic private symbols into any cdylib with cc::Build C++, causing
# __init_tcb+4 SIGSEGV. cargo-ndk uses the same linker path as the
# legacy wzp-android crate which works.
echo ">>> cargo ndk build -p wzp-native --release"
JNI_ABI_DIR=gen/android/app/src/main/jniLibs/arm64-v8a
mkdir -p "$JNI_ABI_DIR"
(
cd /build/source
cargo ndk -t arm64-v8a -o desktop/src-tauri/gen/android/app/src/main/jniLibs \
build --release -p wzp-native 2>&1 | tail -10
)
if [ -f "$JNI_ABI_DIR/libwzp_native.so" ]; then
ls -lh "$JNI_ABI_DIR/libwzp_native.so"
else
echo ">>> WARNING: libwzp_native.so not produced"
fi
echo ">>> cargo tauri android build ${PROFILE_FLAG} --target aarch64 --apk"
cargo tauri android build ${PROFILE_FLAG} --target aarch64 --apk
echo ""
echo ">>> Build artifacts:"
find gen/android -name "*.apk" -exec ls -lh {} \; 2>/dev/null
'
# Locate the produced APK
APK=$(find "$BASE_DIR/data/source/desktop/src-tauri/gen/android" -name "*.apk" -type f 2>/dev/null | head -1)
if [ -z "$APK" ] || [ ! -f "$APK" ]; then
LOG_URL=$(upload_to_rustypaste "$LOG_FILE" || echo "")
if [ -n "$LOG_URL" ]; then
notify "WZP Tauri Android build [$GIT_HASH]: no APK produced
log: $LOG_URL"
else
notify "WZP Tauri Android build [$GIT_HASH]: no APK produced — log upload failed"
fi
exit 1
fi
APK_SIZE=$(du -h "$APK" | cut -f1)
RUSTY_URL=$(upload_to_rustypaste "$APK" || echo "")
if [ -n "$RUSTY_URL" ]; then
notify "WZP Tauri Android build OK [$GIT_HASH] ($APK_SIZE)
$RUSTY_URL"
else
notify "WZP Tauri Android build OK [$GIT_HASH] ($APK_SIZE) — rustypaste upload skipped"
fi
# Print path so the local script can grab it
echo "APK_REMOTE_PATH=$APK"
REMOTE_SCRIPT
ssh_cmd "chmod +x /tmp/wzp-tauri-build.sh"
notify_local "WZP Tauri Android build dispatched (branch=$BRANCH, release=$BUILD_RELEASE)"
log "Triggering remote build (branch=$BRANCH)..."
# Run; capture full output, last line is APK_REMOTE_PATH=...
REMOTE_OUTPUT=$(ssh_cmd "/tmp/wzp-tauri-build.sh '$BRANCH' '$DO_PULL' '$REBUILD_RUST' '$DO_INIT' '$BUILD_RELEASE'" || true)
echo "$REMOTE_OUTPUT" | tail -60
APK_REMOTE=$(echo "$REMOTE_OUTPUT" | grep '^APK_REMOTE_PATH=' | tail -1 | cut -d= -f2-)
if [ -n "$APK_REMOTE" ]; then
log "Downloading APK to $LOCAL_OUTPUT/wzp-tauri.apk..."
scp $SSH_OPTS "$REMOTE_HOST:$APK_REMOTE" "$LOCAL_OUTPUT/wzp-tauri.apk"
echo " $LOCAL_OUTPUT/wzp-tauri.apk ($(du -h "$LOCAL_OUTPUT/wzp-tauri.apk" | cut -f1))"
else
log "No APK produced — see ntfy / remote log /tmp/wzp-tauri-build.log"
exit 1
fi

280
scripts/federation-test.sh Executable file
View File

@@ -0,0 +1,280 @@
#!/usr/bin/env bash
set -euo pipefail
# Federation Test Harness
# Tests presence, audio delivery, and reconnection across 3 relays.
#
# Usage:
# ./scripts/federation-test.sh <relay1> <relay2> <relay3>
# ./scripts/federation-test.sh 172.16.81.175:4434 172.16.81.175:4435 172.16.81.175:4436
#
# Requires: wzp-client binary in PATH or target/release/
RELAY1="${1:-127.0.0.1:4433}"
RELAY2="${2:-127.0.0.1:4434}"
RELAY3="${3:-127.0.0.1:4435}"
ROOM="general"
CLIENT="${WZP_CLIENT:-target/release/wzp-client}"
AUDIO="/tmp/test-audio-60s.raw"
RESULTS="/tmp/federation-test-results"
DURATION=15 # seconds per test phase
# Fixed seeds for reproducible identities
SEED_A="aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
SEED_B="bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"
SEED_C="cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc"
SEED_D="dddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd"
log() { echo -e "\033[1;36m>>> $*\033[0m"; }
err() { echo -e "\033[1;31mERROR: $*\033[0m" >&2; }
pass() { echo -e "\033[1;32m PASS: $*\033[0m"; }
fail() { echo -e "\033[1;31m FAIL: $*\033[0m"; }
analyze() {
local path="$1" label="$2"
if [ ! -f "$path" ] || [ ! -s "$path" ]; then
fail "$label: NO FILE or empty"
return 1
fi
python3 -c "
import struct, math
with open('$path', 'rb') as f: data = f.read()
if len(data) < 4:
print(' $label: EMPTY')
exit(1)
samples = struct.unpack(f'<{len(data)//2}h', data)
n = len(samples)
rms = math.sqrt(sum(s*s for s in samples) / n) if n > 0 else 0
dur = n / 48000
nonzero = sum(1 for s in samples if s != 0)
pct = 100 * nonzero / n if n > 0 else 0
if rms > 50 and pct > 5:
print(f' \033[32mPASS\033[0m: $label — {dur:.1f}s, RMS {rms:.0f}, {pct:.0f}% nonzero')
else:
print(f' \033[31mFAIL\033[0m: $label — {dur:.1f}s, RMS {rms:.0f}, {pct:.0f}% nonzero')
exit(1)
" 2>/dev/null
}
cleanup() {
log "Cleaning up..."
kill ${PIDS[@]} 2>/dev/null || true
wait 2>/dev/null || true
}
trap cleanup EXIT
mkdir -p "$RESULTS"
PIDS=()
# Generate test audio if missing
if [ ! -f "$AUDIO" ]; then
log "Generating test audio..."
python3 -c "
import struct, math, random
RATE = 48000; samples = []
t = 0
while t < 60 * RATE:
burst = random.randint(int(RATE*0.2), int(RATE*0.8))
freq = random.choice([220,330,440,550,660,880])
amp = random.uniform(8000,16000)
for i in range(min(burst, 60*RATE-t)):
s = amp * math.sin(2*math.pi*freq*(t+i)/RATE)
samples.append(int(max(-32767,min(32767,s))))
t += burst
sil = random.randint(int(RATE*0.1), int(RATE*0.5))
samples.extend([0]*min(sil, 60*RATE-t)); t += sil
with open('$AUDIO', 'wb') as f:
f.write(struct.pack(f'<{len(samples)}h', *samples))
print(f'Generated {len(samples)/RATE:.1f}s')
"
fi
echo ""
echo "╔══════════════════════════════════════════════════════════╗"
echo "║ WarzonePhone Federation Test Suite ║"
echo "╠══════════════════════════════════════════════════════════╣"
echo "║ Relay 1: $RELAY1"
echo "║ Relay 2: $RELAY2"
echo "║ Relay 3: $RELAY3"
echo "║ Room: $ROOM"
echo "║ Duration: ${DURATION}s per phase"
echo "╚══════════════════════════════════════════════════════════╝"
echo ""
# ═══════════════════════════════════════════════════════════════
# TEST 1: Basic 2-relay audio
# ═══════════════════════════════════════════════════════════════
log "TEST 1: Basic audio — A sends on Relay1, B records on Relay2"
RUST_LOG=error $CLIENT --room $ROOM --seed $SEED_B --record "$RESULTS/t1_b.raw" "$RELAY2" &
PIDS+=($!); sleep 2
RUST_LOG=error $CLIENT --room $ROOM --seed $SEED_A --send-tone $DURATION "$RELAY1" &
PIDS+=($!); sleep $((DURATION + 3))
kill -INT ${PIDS[-2]} 2>/dev/null; sleep 3; kill -INT ${PIDS[-1]} 2>/dev/null; wait ${PIDS[-1]} ${PIDS[-2]} 2>/dev/null || true
PIDS=("${PIDS[@]:0:${#PIDS[@]}-2}")
analyze "$RESULTS/t1_b.raw" "Relay1→Relay2 audio"
echo ""
# ═══════════════════════════════════════════════════════════════
# TEST 2: Reverse direction
# ═══════════════════════════════════════════════════════════════
log "TEST 2: Reverse — B sends on Relay2, A records on Relay1"
RUST_LOG=error $CLIENT --room $ROOM --seed $SEED_A --record "$RESULTS/t2_a.raw" "$RELAY1" &
PIDS+=($!); sleep 2
RUST_LOG=error $CLIENT --room $ROOM --seed $SEED_B --send-tone $DURATION "$RELAY2" &
PIDS+=($!); sleep $((DURATION + 3))
kill -INT ${PIDS[-2]} 2>/dev/null; sleep 3; kill -INT ${PIDS[-1]} 2>/dev/null; wait ${PIDS[-1]} ${PIDS[-2]} 2>/dev/null || true
PIDS=("${PIDS[@]:0:${#PIDS[@]}-2}")
analyze "$RESULTS/t2_a.raw" "Relay2→Relay1 audio"
echo ""
# ═══════════════════════════════════════════════════════════════
# TEST 3: 3-relay chain
# ═══════════════════════════════════════════════════════════════
log "TEST 3: 3-relay chain — A sends on Relay1, C records on Relay3"
RUST_LOG=error $CLIENT --room $ROOM --seed $SEED_C --record "$RESULTS/t3_c.raw" "$RELAY3" &
PIDS+=($!); sleep 2
RUST_LOG=error $CLIENT --room $ROOM --seed $SEED_A --send-tone $DURATION "$RELAY1" &
PIDS+=($!); sleep $((DURATION + 3))
kill -INT ${PIDS[-2]} 2>/dev/null; sleep 3; kill -INT ${PIDS[-1]} 2>/dev/null; wait ${PIDS[-1]} ${PIDS[-2]} 2>/dev/null || true
PIDS=("${PIDS[@]:0:${#PIDS[@]}-2}")
analyze "$RESULTS/t3_c.raw" "Relay1→Relay3 (via Relay2) audio"
echo ""
# ═══════════════════════════════════════════════════════════════
# TEST 4: File playback (simulated talk show)
# ═══════════════════════════════════════════════════════════════
log "TEST 4: File playback — A plays audio file on Relay1, B records on Relay2"
RUST_LOG=error $CLIENT --room $ROOM --seed $SEED_B --record "$RESULTS/t4_b.raw" "$RELAY2" &
PIDS+=($!); sleep 2
RUST_LOG=error $CLIENT --room $ROOM --seed $SEED_A --send-file "$AUDIO" "$RELAY1" &
PIDS+=($!); sleep 20 # file is 60s but we only wait 20
kill -INT ${PIDS[-2]} 2>/dev/null; sleep 3; kill -INT ${PIDS[-1]} 2>/dev/null; wait ${PIDS[-1]} ${PIDS[-2]} 2>/dev/null || true
PIDS=("${PIDS[@]:0:${#PIDS[@]}-2}")
analyze "$RESULTS/t4_b.raw" "File playback Relay1→Relay2"
echo ""
# ═══════════════════════════════════════════════════════════════
# TEST 5: Reconnection — B disconnects and rejoins
# ═══════════════════════════════════════════════════════════════
log "TEST 5: Reconnection — A sends, B joins/leaves/rejoins on Relay2"
# A sends continuously
RUST_LOG=error $CLIENT --room $ROOM --seed $SEED_A --send-tone 30 "$RELAY1" &
A_PID=$!; PIDS+=($A_PID)
sleep 2
# B joins and records for 5s
RUST_LOG=error $CLIENT --room $ROOM --seed $SEED_B --record "$RESULTS/t5_b_first.raw" "$RELAY2" &
B_PID=$!; PIDS+=($B_PID)
sleep 5
kill -INT $B_PID 2>/dev/null; wait $B_PID 2>/dev/null || true
log " B disconnected, waiting 3s..."
sleep 3
# B rejoins and records for 5s
RUST_LOG=error $CLIENT --room $ROOM --seed $SEED_B --record "$RESULTS/t5_b_rejoin.raw" "$RELAY2" &
B_PID=$!; PIDS+=($B_PID)
sleep 8
kill -INT $B_PID 2>/dev/null; wait $B_PID 2>/dev/null || true
kill -INT $A_PID 2>/dev/null; wait $A_PID 2>/dev/null || true
analyze "$RESULTS/t5_b_first.raw" "B first join (before disconnect)"
analyze "$RESULTS/t5_b_rejoin.raw" "B rejoin (after disconnect)"
echo ""
# ═══════════════════════════════════════════════════════════════
# TEST 6: Multi-participant — 3 users on 3 relays
# ═══════════════════════════════════════════════════════════════
log "TEST 6: Multi-participant — A sends on R1, B records on R2, C records on R3"
RUST_LOG=error $CLIENT --room $ROOM --seed $SEED_B --record "$RESULTS/t6_b.raw" "$RELAY2" &
PIDS+=($!); sleep 1
RUST_LOG=error $CLIENT --room $ROOM --seed $SEED_C --record "$RESULTS/t6_c.raw" "$RELAY3" &
PIDS+=($!); sleep 1
RUST_LOG=error $CLIENT --room $ROOM --seed $SEED_A --send-tone $DURATION "$RELAY1" &
PIDS+=($!); sleep $((DURATION + 3))
# Kill all 3
for i in 1 2 3; do
kill -INT ${PIDS[-$i]} 2>/dev/null || true
done
wait 2>/dev/null || true
PIDS=()
analyze "$RESULTS/t6_b.raw" "B on Relay2 hears A on Relay1"
analyze "$RESULTS/t6_c.raw" "C on Relay3 hears A on Relay1"
echo ""
# ═══════════════════════════════════════════════════════════════
# TEST 7: Simultaneous senders
# ═══════════════════════════════════════════════════════════════
log "TEST 7: Simultaneous — A sends 440Hz on R1, B sends 880Hz on R2, C records on R3"
RUST_LOG=error $CLIENT --room $ROOM --seed $SEED_C --record "$RESULTS/t7_c.raw" "$RELAY3" &
PIDS+=($!); sleep 2
RUST_LOG=error $CLIENT --room $ROOM --seed $SEED_A --send-tone $DURATION "$RELAY1" &
PIDS+=($!);
RUST_LOG=error $CLIENT --room $ROOM --seed $SEED_B --send-tone $DURATION "$RELAY2" &
PIDS+=($!); sleep $((DURATION + 3))
for i in 1 2 3; do kill ${PIDS[-$i]} 2>/dev/null || true; done
wait 2>/dev/null || true
PIDS=()
analyze "$RESULTS/t7_c.raw" "C hears both A(R1) + B(R2)"
echo ""
# ═══════════════════════════════════════════════════════════════
# SUMMARY
# ═══════════════════════════════════════════════════════════════
echo ""
echo "╔══════════════════════════════════════════════════════════╗"
echo "║ TEST SUMMARY ║"
echo "╠══════════════════════════════════════════════════════════╣"
PASS=0; FAIL=0
for f in "$RESULTS"/t*.raw; do
label=$(basename "$f" .raw)
if [ -s "$f" ]; then
rms=$(python3 -c "
import struct, math
with open('$f','rb') as f: d=f.read()
s=struct.unpack(f'<{len(d)//2}h',d)
print(f'{math.sqrt(sum(x*x for x in s)/len(s)):.0f}')
" 2>/dev/null || echo "0")
if [ "$rms" -gt 50 ] 2>/dev/null; then
echo "║ ✓ $label (RMS: $rms)"
PASS=$((PASS + 1))
else
echo "║ ✗ $label (RMS: $rms)"
FAIL=$((FAIL + 1))
fi
else
echo "║ ✗ $label (NO FILE)"
FAIL=$((FAIL + 1))
fi
done
echo "╠══════════════════════════════════════════════════════════╣"
echo "║ PASSED: $PASS FAILED: $FAIL"
echo "╚══════════════════════════════════════════════════════════╝"
echo ""
echo "Recordings saved to: $RESULTS/"
echo "Play with: ffplay -f s16le -ar 48000 -ac 1 $RESULTS/<file>.raw"

72
scripts/mint-tmux.sh Executable file
View File

@@ -0,0 +1,72 @@
#!/usr/bin/env bash
# =============================================================================
# mint-tmux.sh — run a command inside a persistent tmux session on the
# Linux Mint build box so the user can attach and watch/interact at any time.
#
# Usage:
# mint-tmux.sh run <window-name> <command...> # start a new tmux window
# mint-tmux.sh send <window-name> <text...> # send keys to a window
# mint-tmux.sh kill <window-name> # close a window
# mint-tmux.sh list # list windows
# mint-tmux.sh tail <window-name> # dump last 200 lines
#
# Session name is always "wzp". Attach manually with:
# ssh -t root@172.16.81.192 tmux attach -t wzp
#
# If the wzp session doesn't exist yet, it's created automatically.
# =============================================================================
set -euo pipefail
HOST="root@172.16.81.192"
SESSION="wzp"
SSH_OPTS="-o ConnectTimeout=10 -o LogLevel=ERROR"
ensure_session() {
ssh $SSH_OPTS "$HOST" "
tmux has-session -t $SESSION 2>/dev/null || tmux new-session -d -s $SESSION -n home 'bash -l'
"
}
cmd="${1:-list}"
shift || true
case "$cmd" in
run)
WIN="${1:?window name required}"; shift
ensure_session
# Use a heredoc so multi-arg commands don't need escaping
CMD="$*"
ssh $SSH_OPTS "$HOST" bash -s <<REMOTE
if tmux list-windows -t $SESSION -F '#W' 2>/dev/null | grep -qx '$WIN'; then
tmux kill-window -t $SESSION:$WIN 2>/dev/null || true
fi
tmux new-window -t $SESSION -n '$WIN' "bash -l -c '$CMD; echo; echo --- window $WIN exited with code \\\$?; exec bash -l'"
REMOTE
echo "Started '$WIN' in tmux session $SESSION on $HOST"
echo "Attach: ssh -t $HOST tmux attach -t $SESSION"
;;
send)
WIN="${1:?window name required}"; shift
TEXT="$*"
ssh $SSH_OPTS "$HOST" "tmux send-keys -t $SESSION:$WIN '$TEXT' C-m"
;;
kill)
WIN="${1:?window name required}"
ssh $SSH_OPTS "$HOST" "tmux kill-window -t $SESSION:$WIN 2>/dev/null || true"
;;
list)
ensure_session
ssh $SSH_OPTS "$HOST" "tmux list-windows -t $SESSION"
;;
tail)
WIN="${1:?window name required}"
ssh $SSH_OPTS "$HOST" "tmux capture-pane -p -t $SESSION:$WIN -S -200 || echo 'no such window'"
;;
attach)
exec ssh -t $SSH_OPTS "$HOST" tmux attach -t $SESSION
;;
*)
sed -n '3,20p' "$0"
exit 1
;;
esac

167
scripts/prep-linux-mint.sh Executable file
View File

@@ -0,0 +1,167 @@
#!/usr/bin/env bash
# =============================================================================
# Prepare a Linux Mint / Debian / Ubuntu x86_64 host as a full WarzonePhone
# Android build environment. Installs everything the docker wzp-android-builder
# image has, but directly on the host — so we can iterate locally without
# docker layer caching, see real linker output, run gdbserver, etc.
#
# Target host: root@172.16.81.192 (Linux Mint on the LAN)
#
# Usage (from the macOS workstation):
# scp scripts/prep-linux-mint.sh root@172.16.81.192:/tmp/
# ssh root@172.16.81.192 'nohup bash /tmp/prep-linux-mint.sh > /var/log/wzp-prep.log 2>&1 &'
#
# The script is idempotent: safe to re-run if a step fails. Each stage tests
# for its target before doing work. Progress + completion is pinged to
# ntfy.sh/wzp so we can track it from the phone.
#
# On success the host has:
# - JDK 17
# - Android SDK (cmdline-tools + platforms 34/36, build-tools 34/35, NDK 26.1)
# - Node.js 20 LTS + npm
# - Rust stable + aarch64/armv7/i686/x86_64 android targets
# - cargo-ndk + cargo tauri-cli 2.x
# - /opt/wzp/warzonePhone (cloned workspace checkout on feat/desktop-audio-rewrite)
#
# Everything lives under /opt/android-sdk and /opt/wzp so nothing leaks into $HOME.
# =============================================================================
set -euo pipefail
NTFY_TOPIC="https://ntfy.sh/wzp"
NDK_VERSION="26.1.10909125"
ANDROID_API=34
ANDROID_API_TAURI=36
BUILD_TOOLS_TAURI="35.0.0"
ANDROID_HOME=/opt/android-sdk
WZP_DIR=/opt/wzp
GIT_REPO="ssh://git@git.manko.yoga:222/manawenuz/wz-phone.git"
GIT_BRANCH="feat/desktop-audio-rewrite"
export DEBIAN_FRONTEND=noninteractive
export ANDROID_HOME ANDROID_NDK_HOME="$ANDROID_HOME/ndk/$NDK_VERSION"
export NDK_HOME="$ANDROID_NDK_HOME"
export PATH="$ANDROID_HOME/cmdline-tools/latest/bin:$ANDROID_HOME/platform-tools:/root/.cargo/bin:$PATH"
notify() { curl -s -d "$1" "$NTFY_TOPIC" > /dev/null 2>&1 || true; }
log() { echo -e "\n\033[1;36m[prep-linux-mint]\033[0m $*"; }
die() { notify "wzp prep-linux-mint FAILED: $1"; echo "FATAL: $1" >&2; exit 1; }
trap 'die "line $LINENO"' ERR
notify "wzp prep-linux-mint STARTED on $(hostname) ($(whoami))"
# ─── 1. Base packages ────────────────────────────────────────────────────────
log "Installing base packages..."
apt-get update -qq
apt-get install -y --no-install-recommends \
build-essential \
ca-certificates \
cmake \
curl \
file \
git \
libasound2-dev \
libc6-dev \
libssl-dev \
openjdk-17-jdk-headless \
pkg-config \
unzip \
wget \
xz-utils \
zip
# ─── 2. Android SDK + NDK ────────────────────────────────────────────────────
if [ ! -x "$ANDROID_HOME/cmdline-tools/latest/bin/sdkmanager" ]; then
log "Installing Android cmdline-tools..."
mkdir -p "$ANDROID_HOME/cmdline-tools"
cd /tmp
wget -q https://dl.google.com/android/repository/commandlinetools-linux-11076708_latest.zip -O cmdtools.zip
unzip -qo cmdtools.zip -d "$ANDROID_HOME/cmdline-tools"
mv "$ANDROID_HOME/cmdline-tools/cmdline-tools" "$ANDROID_HOME/cmdline-tools/latest"
rm cmdtools.zip
else
log "cmdline-tools already installed"
fi
if [ ! -d "$ANDROID_HOME/ndk/$NDK_VERSION" ] || \
[ ! -d "$ANDROID_HOME/platforms/android-$ANDROID_API" ] || \
[ ! -d "$ANDROID_HOME/platforms/android-$ANDROID_API_TAURI" ]; then
log "Installing Android platforms + NDK $NDK_VERSION..."
yes | "$ANDROID_HOME/cmdline-tools/latest/bin/sdkmanager" --licenses > /dev/null 2>&1 || true
"$ANDROID_HOME/cmdline-tools/latest/bin/sdkmanager" --install \
"platforms;android-$ANDROID_API" \
"build-tools;$ANDROID_API.0.0" \
"platforms;android-$ANDROID_API_TAURI" \
"build-tools;$BUILD_TOOLS_TAURI" \
"ndk;$NDK_VERSION" \
"platform-tools" 2>&1 | grep -v '^\[' || true
else
log "Android SDK components already installed"
fi
# ─── 3. Node.js 20 LTS ───────────────────────────────────────────────────────
if ! command -v node >/dev/null 2>&1 || ! node --version | grep -q "^v20"; then
log "Installing Node.js 20 LTS..."
curl -fsSL https://deb.nodesource.com/setup_20.x | bash -
apt-get install -y --no-install-recommends nodejs
else
log "Node.js already at $(node --version)"
fi
# ─── 4. Rust + Android targets ───────────────────────────────────────────────
if ! command -v rustup >/dev/null 2>&1; then
log "Installing rustup..."
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y --default-toolchain stable
fi
. /root/.cargo/env
log "Ensuring Rust android targets + cargo-ndk + cargo-tauri..."
rustup target add \
aarch64-linux-android \
armv7-linux-androideabi \
i686-linux-android \
x86_64-linux-android
command -v cargo-ndk >/dev/null 2>&1 || cargo install cargo-ndk
command -v cargo-tauri >/dev/null 2>&1 || cargo install tauri-cli --version "^2.0" --locked
# ─── 5. Clone the workspace ──────────────────────────────────────────────────
mkdir -p "$WZP_DIR"
cd "$WZP_DIR"
if [ -d warzonePhone/.git ]; then
log "Pulling latest on $GIT_BRANCH..."
cd warzonePhone
git fetch origin || true
git checkout "$GIT_BRANCH" 2>/dev/null || git checkout -b "$GIT_BRANCH" "origin/$GIT_BRANCH"
git reset --hard "origin/$GIT_BRANCH" || true
else
log "Cloning warzonePhone from $GIT_REPO..."
# The public repo URL needs ssh keys; if unavailable, skip and let the user sort it later
if git clone --branch "$GIT_BRANCH" "$GIT_REPO" warzonePhone 2>/dev/null; then
log " cloned ok"
else
log " clone failed (no SSH keys for $GIT_REPO — skipping, user will rsync)"
fi
fi
# ─── 6. Persistent env for the user ──────────────────────────────────────────
cat > /etc/profile.d/wzp-android.sh <<ENVEOF
export ANDROID_HOME=$ANDROID_HOME
export ANDROID_NDK_HOME=$ANDROID_HOME/ndk/$NDK_VERSION
export NDK_HOME=\$ANDROID_NDK_HOME
export PATH=\$ANDROID_HOME/cmdline-tools/latest/bin:\$ANDROID_HOME/platform-tools:/root/.cargo/bin:\$PATH
ENVEOF
chmod 644 /etc/profile.d/wzp-android.sh
# ─── 7. Sanity summary ───────────────────────────────────────────────────────
log "Sanity checks:"
echo " java: $(java -version 2>&1 | head -1)"
echo " node: $(node --version)"
echo " npm: $(npm --version)"
echo " rustc: $(rustc --version)"
echo " cargo-ndk: $(cargo ndk --version 2>&1 | head -1)"
echo " cargo-tauri:$(cargo tauri --version 2>&1 | head -1)"
echo " NDK dir: $ANDROID_NDK_HOME"
echo " WZP dir: $WZP_DIR/warzonePhone"
notify "wzp prep-linux-mint DONE on $(hostname) — ready at /opt/wzp/warzonePhone"
log "All done."

10
skills-lock.json Normal file
View File

@@ -0,0 +1,10 @@
{
"version": 1,
"skills": {
"caveman": {
"source": "JuliusBrussee/caveman",
"sourceType": "github",
"computedHash": "aa7939fc4d1fe31484090290da77f2d21e026aa4b34b329d00e6630feb985d75"
}
}
}