73 Commits

Author SHA1 Message Date
Siavash Sameni
00deb97a5d fix(reflect): drop LAN/private reflex addrs from NAT classification
Real-world report: a user with one LAN relay + one internet relay
got "Multiple IPs — treating as symmetric" because the LAN relay
saw the client's LAN IP (172.16.81.172) while the internet relay
saw the WAN IP (150.228.49.65). Two observations of "different
public IPs" from the classifier's perspective, but semantically
they describe two different network paths and shouldn't be
compared.

The LAN relay's reflection is always true, just not useful for
public NAT classification: there's no NAT between the client and
the LAN relay, so that path's reflex addr is always the LAN
interface IP regardless of what the public-facing NAT beyond it
looks like.

Fix: new `is_private_or_loopback` helper filters the probe set
before classification. Drops:
 - 127.0.0.0/8 loopback
 - 10/8, 172.16/12, 192.168/16 RFC1918 private
 - 169.254/16 link-local
 - 100.64/10 CGNAT shared-transition (same reasoning: a relay
   that sees the client with a CGNAT addr is on the same carrier
   network and can't describe public NAT state)
 - IPv6 loopback, unspecified, fe80::/10 link-local
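
The drop list above can be sketched with the standard library alone. This is a minimal illustration, not the helper as committed: the signature (taking an `IpAddr` by value) is an assumption, and only the ranges listed above are covered.

```rust
use std::net::IpAddr;

/// Returns true when a reflexive address cannot describe public NAT state.
/// Hypothetical signature; the real helper may take a SocketAddr instead.
fn is_private_or_loopback(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => {
            let [a, b, _, _] = v4.octets();
            v4.is_loopback()              // 127.0.0.0/8
                || v4.is_private()        // 10/8, 172.16/12, 192.168/16
                || v4.is_link_local()     // 169.254/16
                || (a == 100 && (b & 0xC0) == 64) // 100.64/10 CGNAT
        }
        IpAddr::V6(v6) => {
            v6.is_loopback()
                || v6.is_unspecified()
                || (v6.segments()[0] & 0xffc0) == 0xfe80 // fe80::/10
        }
    }
}
```

The CGNAT check is written by hand because the matching std method is not stable at the time of writing.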

Failed probes are still filtered out of classification (as they
already were) but are now dimmed in the UI list instead of
highlighted amber. Same rationale: a momentarily offline probe
target isn't a warning-worthy state, it's just a fact about the
probe run.

UI palette rebalance: only Cone gets green, everything else
neutral text-dim. Wording changed from warning-tone
"⚠ must use relay" to informational "ℹ P2P falls back to relay,
calls still work" — symmetric NAT isn't broken state, it just
means media takes the relay path.

Tests added (4 new in wzp_client::reflect):
- classify_drops_private_ip_probes — LAN + public → Unknown
- classify_drops_loopback_probes — loopback + 2 public → Cone
- classify_drops_cgnat_probes — CGNAT + 2 public same-IP-
  diff-port → SymmetricPort
- classify_two_lan_probes_is_unknown_not_cone — all LAN → Unknown

Existing multi_reflect integration test updated: two loopback
relays now correctly classify as Unknown (because loopback reflex
addrs are filtered) with the plumbing-works invariant preserved.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 18:29:09 +04:00
Siavash Sameni
8cdf8d486a feat(p2p): Phase 4 cross-relay direct calling over federation
Teaches the relay pair to route direct-call signaling across an
existing federation link. Alice on Relay A can now place a direct
call to Bob on Relay B if A and B are federation peers — the
wire protocol, call registry, and signal dispatch all learn to
track and route the cross-relay flow.

Phase 3.5's dual-path QUIC race then carries the media directly
peer-to-peer using the advertised reflex addrs, with zero
changes needed on the client side.

## Wire protocol (wzp-proto)

New `SignalMessage::FederatedSignalForward { inner, origin_relay_fp }`
envelope variant, appended at end of enum — JSON serde is
name-tagged so pre-Phase-4 relays just log "unknown variant" and
drop it. 2 new roundtrip tests (any-inner nesting + single
DirectCallOffer case).

## Call registry (wzp-relay)

`DirectCall.peer_relay_fp: Option<String>` — federation TLS fp
of the peer relay that forwarded the offer/answer for this call.
`None` on local calls, `Some` on cross-relay. Used by the answer
path to route the reply back through the same federation link
instead of trying (and failing) to deliver via local signal_hub.
New `set_peer_relay_fp` setter + 1 new unit test.

## FederationManager (wzp-relay)

Three new methods:
- `local_tls_fp()` — exposes the relay's own federation TLS fp
  so main.rs can build `origin_relay_fp` fields.
- `broadcast_signal(msg) -> usize` — fan out any signal message
  (in practice `FederatedSignalForward`) to every active peer
  link, returning the reach count. Used when Relay A doesn't
  know which peer has the target fingerprint.
- `send_signal_to_peer(fp, msg)` — targeted send for the reply
  path where the registry already knows which peer relay to
  hit.

Plus a new `cross_relay_signal_tx: Mutex<Option<Sender<...>>>`
field that `set_cross_relay_tx()` wires at startup so the
federation `handle_signal` can push unwrapped inner messages
into the main signal dispatcher.

## Federation handle_signal (wzp-relay)

New match arm for `FederatedSignalForward`:
- Loop prevention: drops forwards whose `origin_relay_fp` equals
  this relay's own fp (prevents A→B→A echo loops without needing
  TTL yet).
- Otherwise pulls the inner message out and pushes it through
  `cross_relay_signal_tx` so the main loop's dispatcher task
  handles it as if it had arrived locally.
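
A rough sketch of that match arm, using only std types. `Inner` and the struct below are simplified stand-ins for the real `SignalMessage` types, and `handle_forward` is a hypothetical name; the std `mpsc` channel substitutes for whatever async channel the relay actually uses.

```rust
use std::sync::mpsc::Sender;

/// Simplified stand-in for the wrapped signal message.
#[derive(Debug, Clone, PartialEq)]
enum Inner {
    DirectCallOffer { call_id: String },
    Hangup,
}

/// Simplified stand-in for `SignalMessage::FederatedSignalForward`.
#[derive(Debug, Clone, PartialEq)]
struct FederatedSignalForward {
    inner: Inner,
    origin_relay_fp: String,
}

/// Returns true if the inner message was handed to the dispatcher,
/// false if it was dropped (self-sourced echo or closed channel).
fn handle_forward(
    own_fp: &str,
    fwd: FederatedSignalForward,
    cross_relay_signal_tx: &Sender<(Inner, String)>,
) -> bool {
    // Loop prevention: a forward that originated here has come back around.
    if fwd.origin_relay_fp == own_fp {
        return false;
    }
    // Unwrap and push into the main dispatcher, keeping the origin fp.
    cross_relay_signal_tx
        .send((fwd.inner, fwd.origin_relay_fp))
        .is_ok()
}
```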

## Main signal loop (wzp-relay)

### DirectCallOffer when target not local
Before falling through to Hangup, try the federation path:
- Wrap the offer in `FederatedSignalForward` with
  `origin_relay_fp = this relay's tls_fp`
- `fm.broadcast_signal(forward)` — returns peer count
- If any peers reached, stash the call in local registry with
  `caller_reflexive_addr` set, `peer_relay_fp` still None
  (broadcast — the answer-side will identify itself when it
  replies)
- Send `CallRinging` to caller immediately for UX feedback
- Only if no federation or no peers → legacy Hangup path

### DirectCallAnswer when peer is remote
- Registry lookup now reads both `peer_fingerprint` and
  `peer_relay_fp` in one acquisition
- If `peer_relay_fp.is_some()`:
  * Reject → forward a `Hangup` over federation via
    `send_signal_to_peer` instead of local signal_hub
  * Accept → wrap the raw answer in `FederatedSignalForward`,
    route to the specific origin peer, then emit the LOCAL
    CallSetup to our callee with `peer_direct_addr =
    caller_reflexive_addr` (caller is remote; this side only
    has the callee)
- If `peer_relay_fp.is_none()` → existing Phase 3 same-relay
  path with both CallSetups (caller + callee)
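
The branching above reduces to a small decision table. The sketch below is illustrative only: `AnswerRoute` and `route_answer` are hypothetical names, and the real handler performs the sends inline rather than returning a value.

```rust
/// Where a DirectCallAnswer should go, per the rules above.
#[derive(Debug, PartialEq)]
enum AnswerRoute {
    /// Cross-relay reject: Hangup goes to the peer relay over federation.
    FederatedHangup { peer_relay_fp: String },
    /// Cross-relay accept: wrapped answer goes to the origin peer relay,
    /// and only the local callee gets a CallSetup from this relay.
    FederatedAccept { peer_relay_fp: String },
    /// Same-relay call: existing path with both CallSetups.
    Local,
}

fn route_answer(peer_relay_fp: Option<&str>, accepted: bool) -> AnswerRoute {
    match (peer_relay_fp, accepted) {
        (Some(fp), false) => AnswerRoute::FederatedHangup { peer_relay_fp: fp.to_string() },
        (Some(fp), true) => AnswerRoute::FederatedAccept { peer_relay_fp: fp.to_string() },
        (None, _) => AnswerRoute::Local,
    }
}
```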

### Cross-relay signal dispatcher task
New long-running task reading `(inner, origin_relay_fp)` from
`cross_relay_rx`. In Phase 4 MVP handles:
- `DirectCallOffer` — if target is local, create the call in
  the registry with `peer_relay_fp = origin_relay_fp`, stash
  caller addr, deliver offer to local callee. If target isn't
  local, drop (no multi-hop in Phase 4 MVP).
- `DirectCallAnswer` — look up local caller by call_id, stash
  callee addr, forward raw answer to local caller via
  signal_hub, emit local CallSetup with `peer_direct_addr =
  callee_reflexive_addr` (callee is remote; this side only
  has the caller).
- `CallRinging` — best-effort forward to local caller for UX.
- `Hangup` — logged for now; Phase 4.1 will target by call_id.

## Integration tests

`crates/wzp-relay/tests/cross_relay_direct_call.rs` — 3 tests
that reproduce the main.rs cross-relay dispatcher logic inline
and assert the invariants without spinning up real binaries:

1. `cross_relay_offer_forwards_and_stashes_peer_relay_fp` —
   Relay A gets Alice's offer, broadcasts. Relay B's dispatcher
   creates the call with `peer_relay_fp = relay_a_tls_fp`.
2. `cross_relay_answer_crosswires_peer_direct_addrs` — full
   round trip; both CallSetups (one on each relay) carry the
   OTHER party's reflex addr.
3. `cross_relay_loop_prevention_drops_self_sourced_forward` —
   explicit loop-prevention check.

Full workspace test goes from 413 → 419 passing. Clippy clean
on touched files.

## Non-goals (deferred to Phase 4.1+)

- Relay-mediated media fallback across federation — if P2P
  direct fails (symmetric NAT on either side), the call errors
  out with "no media path". Making the existing federation
  media pipeline carry ephemeral call-<id> rooms is the Phase
  4.1 lift.
- Multi-hop federation (A → B → C). Phase 4 MVP supports a
  direct federation link between A and B only.
- Fingerprint → peer-relay routing gossip.

PRD: .taskmaster/docs/prd_phase4_cross_relay_p2p.txt
Tasks: 70-78 all completed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 17:31:43 +04:00
Siavash Sameni
59ce52f8e8 feat(p2p): Phase 3.5 dual-path QUIC race + GUI call-flow debug logs
Two features in one commit because they ship and test together:
Phase 3.5 closes the hole-punching loop and the call-flow debug
logs give the user live visibility into every step of a call so
real-hardware testing of the new P2P path is debuggable.

## Phase 3.5 — dual-path QUIC connect race

Completes the hole-punching work Phase 3 scaffolded. On receiving
a CallSetup with peer_direct_addr, the client now actually races a
direct QUIC handshake against the relay dial and uses whichever
completes first. Symmetric role assignment avoids the two-conns-
per-call problem:

- Both peers compare `own_reflex_addr` vs `peer_reflex_addr`
  lexicographically.
- Smaller addr → **Acceptor** (A-role): builds a server-capable
  dual endpoint, awaits an incoming QUIC session. Does NOT dial.
- Larger addr → **Dialer** (D-role): builds a client-only
  endpoint, dials the peer's addr with `call-<id>` SNI. Does NOT
  listen.
- Both sides always dial the relay in parallel as fallback.
- `tokio::select!` with `biased` preference for direct; both futures
  go through `tokio::pin!` so each branch can still await the other
  (losing) future as the fallback.
- Direct timeout 2s, relay fallback timeout 5s (so 7s worst case
  from CallSetup to "no media path" error).
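
The role assignment can be sketched as a pure function. This assumes the reflex addrs are compared as strings, which is one reading of "lexicographically"; the real `determine_role` signature and types may differ.

```rust
use std::cmp::Ordering;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Role {
    Acceptor,
    Dialer,
}

/// None means "skip the race and use the relay path": a missing addr on
/// either side, or equal addrs, gives no way to break the tie.
fn determine_role(own: Option<&str>, peer: Option<&str>) -> Option<Role> {
    let (own, peer) = (own?, peer?);
    match own.cmp(peer) {
        Ordering::Less => Some(Role::Acceptor),
        Ordering::Greater => Some(Role::Dialer),
        Ordering::Equal => None,
    }
}
```

String order is not numeric order, but that does not matter here: both peers compare the same two strings, so they always land on opposite roles.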

New crate module `wzp_client::dual_path::{race, WinningPath}`
(moved here from desktop/src-tauri so it's testable from a
workspace test). `determine_role` in `wzp_client::reflect` is
pure-function and unit-tested.

### CallEngine integration
- New `pre_connected_transport: Option<Arc<QuinnTransport>>` arg
  on both android + desktop `CallEngine::start` branches. Skips
  the internal wzp_transport::connect step when Some. Backward-
  compat: None keeps Phase 0 relay-only behavior.
- `connect` Tauri command reads own_reflex_addr from SignalState,
  computes role, runs the race, passes the winning transport
  into CallEngine. If ANY input is missing (no peer addr, no own
  addr, equal addrs), falls back to classic relay path —
  identical to pre-Phase-3.5 behavior.

### Tests (9 new, all passing)
- 6 unit tests for `determine_role` truth table in
  `wzp-client/src/reflect.rs` (smaller=Acceptor, larger=Dialer,
  port-only diff, equal, missing-side, symmetry)
- 3 integration tests in `crates/wzp-client/tests/dual_path.rs`:
    * `dual_path_direct_wins_on_loopback` — two-endpoint test
      rig, Dialer wins direct path vs loopback mock relay
    * `dual_path_relay_wins_when_direct_is_dead` — dead peer
      port, 2s direct timeout, relay fallback wins
    * `dual_path_errors_cleanly_when_both_paths_dead` — <10s
      error, no hang

## GUI call-flow debug logs

Runtime-toggled structured events at every step of a call so the
user can see where a call progressed or stalled on real hardware.
Modeled on the existing DRED_VERBOSE_LOGS pattern.

### Rust side
- `static CALL_DEBUG_LOGS: AtomicBool` + `emit_call_debug(&app,
  step, details)` helper. Always logs via `tracing::info!`
  (logcat always has a copy); GUI Tauri `call-debug-log` event
  only fires when the flag is on.
- Tauri commands `set_call_debug_logs` / `get_call_debug_logs`.
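
A dependency-free sketch of the toggle pattern. The two callbacks are stand-ins: `log` substitutes for `tracing::info!` and `emit_gui` for the Tauri `call-debug-log` event, so the signature differs from the real `emit_call_debug(&app, step, details)`.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

static CALL_DEBUG_LOGS: AtomicBool = AtomicBool::new(false);

fn set_call_debug_logs(enabled: bool) {
    CALL_DEBUG_LOGS.store(enabled, Ordering::Relaxed);
}

fn get_call_debug_logs() -> bool {
    CALL_DEBUG_LOGS.load(Ordering::Relaxed)
}

/// Always logs; only emits to the GUI when the runtime flag is on.
fn emit_call_debug(
    step: &str,
    details: &str,
    log: &mut dyn FnMut(&str),
    emit_gui: &mut dyn FnMut(&str, &str),
) {
    log(&format!("call-debug step={step} details={details}"));
    if get_call_debug_logs() {
        emit_gui(step, details);
    }
}
```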

### Instrumented steps (24 emit_call_debug sites)
- `register_signal`: start, identity loaded, endpoint created,
  connect failed/ok, RegisterPresence sent, ack received/failed,
  recv loop spawning
- Recv loop: CallRinging, DirectCallOffer (w/ caller_reflexive_addr),
  DirectCallAnswer (w/ callee_reflexive_addr), CallSetup (w/
  peer_direct_addr), Hangup
- `place_call`: start, reflect query start/ok/none, offer sent,
  send failed
- `answer_call`: start, reflect query start/ok/none or privacy
  skip, answer sent, send failed
- `connect`: start, dual_path_race_start (w/ role), won (w/
  path), failed, skipped (w/ reasons), call_engine_starting/
  started/failed

### JS side
- New `callDebugLogs: boolean` field on Settings type.
- Boot-time hydrate of the Rust flag from localStorage so the
  choice survives restarts (like `dredDebugLogs`).
- Settings panel: new "Call flow debug logs" checkbox alongside
  the DRED toggle.
- New "Call Debug Log" section that ONLY shows when the flag is
  on. Rolling in-memory buffer of the last 200 events, rendered
  as monospace `HH:MM:SS.mmm step {details}` lines with auto-
  scroll and a Clear button.
- `listen("call-debug-log", ...)` subscribed at app startup,
  appends to the buffer, re-renders on every event.

Full workspace test goes from 404 → 413 passing. Clippy clean
on touched crates.

PRD: .taskmaster/docs/prd_phase35_dual_path_race.txt
Tasks: 61-69 all completed

Next: APK + desktop build carrying everything — Phase 2 NAT
detect, Phase 3 advertising, Phase 3.5 dual-path + call debug
logs, plus the earlier Android first-join diagnostics — so the
user can validate the P2P path on real hardware with live
per-step visibility into where any failures happen.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 14:06:44 +04:00
Siavash Sameni
39277bf3a0 feat(hole-punching): advertise peer reflexive addrs in DirectCall flow — Phase 3
Completes the signal-plane plumbing for P2P direct calling: both
peers now learn their own server-reflexive address (Phase 1
Reflect), include it in DirectCallOffer / DirectCallAnswer, and
the relay cross-wires them into each side's CallSetup so the
client knows the OTHER party's direct addr. Dual-path QUIC race
is scaffolded but deferred to Phase 3.5 — this commit ships the
full advertising layer so real-hardware testing can confirm the
addrs flow end-to-end before adding the concurrent-connect logic.

Wire protocol (wzp-proto/src/packet.rs):
- DirectCallOffer gains optional `caller_reflexive_addr`
- DirectCallAnswer gains optional `callee_reflexive_addr`
- CallSetup gains optional `peer_direct_addr`
- All #[serde(default, skip_serializing_if = "Option::is_none")] so
  pre-Phase-3 peers and relays stay backward compatible by
  construction — the new fields are elided from the JSON on the
  wire when None, and older clients parse the JSON ignoring any
  fields they don't know.
- 2 new roundtrip tests (Some + None cases, old-JSON parse-back).

Call registry (wzp-relay/src/call_registry.rs):
- DirectCall gains caller_reflexive_addr + callee_reflexive_addr.
- set_caller_reflexive_addr / set_callee_reflexive_addr setters.
- 2 new unit tests: stores and returns addrs, clearing works.

Relay cross-wiring (wzp-relay/src/main.rs):
- On DirectCallOffer: stash the caller's addr in the registry.
- On DirectCallAnswer: stash the callee's addr (only set by
  AcceptTrusted answers — privacy-mode leaves it None).
- Send two different CallSetup messages: one to the caller with
  peer_direct_addr=callee_addr, and one to the callee with
  peer_direct_addr=caller_addr. The cross-wiring means each side
  gets the OTHER party's direct addr, not its own.
- Logs `p2p_viable=true` when both sides advertised.
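
The cross-wiring step, sketched with trimmed-down stand-in structs (the real `DirectCall` and `CallSetup` carry more fields, and `cross_wire` is a hypothetical name). The sketch passes each side's addr through independently; whether the relay instead clears both when only one side advertised is not stated here, so treat that case as open.

```rust
#[derive(Debug, Default, Clone)]
struct DirectCall {
    caller_reflexive_addr: Option<String>,
    callee_reflexive_addr: Option<String>,
}

#[derive(Debug, PartialEq)]
struct CallSetup {
    peer_direct_addr: Option<String>,
}

/// Returns (setup for caller, setup for callee, p2p_viable).
fn cross_wire(call: &DirectCall) -> (CallSetup, CallSetup, bool) {
    // Each side receives the OTHER party's addr, never its own.
    let for_caller = CallSetup { peer_direct_addr: call.callee_reflexive_addr.clone() };
    let for_callee = CallSetup { peer_direct_addr: call.caller_reflexive_addr.clone() };
    let p2p_viable =
        call.caller_reflexive_addr.is_some() && call.callee_reflexive_addr.is_some();
    (for_caller, for_callee, p2p_viable)
}
```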

Client advertising (desktop/src-tauri/src/lib.rs):
- New `try_reflect_own_addr` helper that reuses the Phase 1
  oneshot pattern WITHOUT holding state.signal.lock() across the
  await (critical: the recv loop reacquires the same mutex to
  fire the oneshot, so holding it would deadlock).
- `place_call` queries reflect first and includes the returned
  addr in DirectCallOffer. Falls back to None on any failure —
  call still proceeds via the relay path.
- `answer_call` queries reflect ONLY on AcceptTrusted so
  AcceptGeneric keeps the callee's IP private by design. Reject
  and AcceptGeneric both pass None.
- recv loop's CallSetup handler destructures and forwards
  peer_direct_addr to the JS layer in the signal-event payload.

Client scaffolding for dual-path (desktop/src-tauri/src/lib.rs +
desktop/src/main.ts):
- `connect` Tauri command gets a new optional `peer_direct_addr`
  argument. Currently LOGS the addr but still uses the relay
  path for the media connection — Phase 3.5 will swap in a
  tokio::select! race between direct dial + relay dial. Scaffolding
  lands here so the JS wire is stable, real-hardware testing can
  confirm advertising works end-to-end, and Phase 3.5 is a pure
  Rust change with no JS touches.
- JS setup handler forwards `data.peer_direct_addr` to invoke.

Back-compat with the CLI client (crates/wzp-client/src/cli.rs):
- CLI test harness updated for the new fields — always passes
  None for both reflex addrs (no hole-punching). Also destructures
  peer_direct_addr: _ in its CallSetup handler.

Tests (8 new, all passing):
- wzp-proto: hole_punching_optional_fields_roundtrip,
  hole_punching_backward_compat_old_json_parses
- wzp-relay call_registry: call_registry_stores_reflexive_addrs,
  call_registry_clearing_reflex_addr_works
- wzp-relay integration: crates/wzp-relay/tests/hole_punching.rs
    * both_peers_advertise_reflex_addrs_cross_wire_in_setup
    * privacy_mode_answer_omits_callee_addr_from_setup
    * pre_phase3_caller_leaves_both_setups_relay_only
    * neither_peer_advertises_both_setups_are_relay_only

Full workspace test goes from 396 → 404 passing.

PRD: .taskmaster/docs/prd_hole_punching.txt
Tasks: 53-60 all completed (58 = scaffolding-only; 3.5 follow-up)

Next up: **Phase 3.5 — dual-path QUIC connect race**. With the
advertising layer live, this becomes a focused change: on
CallSetup-with-peer_direct_addr, start a server-capable dual
endpoint, and tokio::select! across (direct dial, relay dial,
inbound accept). Whichever QUIC handshake completes first wins,
the losers drop, 2s direct timeout falls back to relay.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 13:37:04 +04:00
Siavash Sameni
8d903f16c6 feat(reflect): multi-relay NAT type detection — Phase 2
Builds on Phase 1's SignalMessage::Reflect to probe N relays in
parallel through transient QUIC connections and classify the
client's NAT type for the future P2P hole-punching path. No wire
protocol changes — Phase 1's Reflect/ReflectResponse pair is
reused unchanged.

New client-side module (crates/wzp-client/src/reflect.rs):
- probe_reflect_addr(relay, timeout_ms): opens a throwaway
  quinn::Endpoint (fresh ephemeral source port per probe,
  essential for NAT-type detection — sharing one endpoint would
  make a symmetric NAT look like a cone NAT), connects to _signal,
  sends RegisterPresence with zero identity, consumes the Ack,
  sends Reflect, awaits ReflectResponse, cleanly closes.
- detect_nat_type(relays, timeout_ms): parallel probes via
  tokio::task::JoinSet (bounded by slowest probe not sum) and
  returns a NatDetection with per-probe results + aggregate
  classification.
- classify_nat(probes): pure-function classifier split out for
  network-free unit tests. Rules:
    * 0-1 successful probes              → Unknown
    * 2+ successes, same ip same port    → Cone (P2P viable)
    * 2+ successes, same ip diff ports   → SymmetricPort (relay)
    * 2+ successes, different ips        → Multiple (treat as
                                             symmetric)
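
The rules table maps directly onto a small pure function. A minimal sketch, assuming each probe is represented as `Result<SocketAddr, String>`; the real probe and result types (`NatDetection` and friends) are richer.

```rust
use std::net::SocketAddr;

#[derive(Debug, PartialEq)]
enum NatType {
    Unknown,
    Cone,
    SymmetricPort,
    Multiple,
}

/// Each probe is Ok(observed reflexive addr) or Err(error string).
fn classify_nat(probes: &[Result<SocketAddr, String>]) -> NatType {
    let ok: Vec<SocketAddr> = probes
        .iter()
        .filter_map(|p| p.as_ref().ok().copied())
        .collect();
    if ok.len() < 2 {
        return NatType::Unknown; // 0-1 successes: not enough data
    }
    let first = ok[0];
    if ok.iter().any(|a| a.ip() != first.ip()) {
        NatType::Multiple // different IPs: treat as symmetric
    } else if ok.iter().any(|a| a.port() != first.port()) {
        NatType::SymmetricPort // same IP, port varies per destination
    } else {
        NatType::Cone // same IP and port from every relay
    }
}
```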

Tauri command (desktop/src-tauri/src/lib.rs):
- detect_nat_type({ relays: [{ name, address }] }) -> NatDetection
  as JSON. Takes the relay list from JS because localStorage
  owns the config. Parse-up-front so a malformed entry fails
  clean instead of as a probe error. 1500ms per-probe timeout.
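
Parse-up-front might look roughly like this. `RelayEntry` and `parse_relays` are hypothetical names, and the sketch assumes addresses are literal `ip:port` strings (no DNS resolution).

```rust
use std::net::SocketAddr;

/// Mirrors the `{ name, address }` shape passed in from JS.
struct RelayEntry {
    name: String,
    address: String,
}

/// Validates every entry before any probe starts, so a malformed
/// address is reported as a config error rather than a probe failure.
fn parse_relays(relays: &[RelayEntry]) -> Result<Vec<(String, SocketAddr)>, String> {
    relays
        .iter()
        .map(|r| {
            r.address
                .parse::<SocketAddr>()
                .map(|addr| (r.name.clone(), addr))
                .map_err(|e| format!("relay '{}': invalid address '{}': {e}", r.name, r.address))
        })
        .collect()
}
```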

UI (desktop/index.html + src/main.ts):
- New "NAT type" row + "Detect NAT" button in the Network
  settings section. Renders per-probe status (name, address,
  observed addr, latency, or error) plus the colored verdict:
    * green  Cone — shows consensus addr
    * amber  SymmetricPort / Multiple — must relay
    * gray   Unknown — not enough data

Tests:
- 7 unit tests in wzp-client/src/reflect.rs covering every
  classifier branch (empty, 1 success, 2 identical, 2 diff ports,
  2 diff ips, success+failure mix, pure-failure).
- 3 integration tests in crates/wzp-relay/tests/multi_reflect.rs:
    * probe_reflect_addr_happy_path — single mock relay end-to-end
    * detect_nat_type_two_loopback_relays_is_cone — two concurrent
      relays, asserts both see 127.0.0.1 and classifier returns
      Cone or SymmetricPort (accepted because the test harness
      uses fresh ephemeral ports per probe which look like
      SymmetricPort on single-host loopback)
    * detect_nat_type_dead_relay_is_unknown — alive + dead port
      mix, asserts the dead probe surfaces an error string and
      the aggregator returns Unknown (only 1 success)

Full workspace test goes from 386 → 396 passing.

PRD: .taskmaster/docs/prd_multi_relay_reflect.txt
Tasks: 47-52 all completed

Next up: hole-punching (Phase 3) — use the reflected address in
DirectCallOffer/Answer and CallSetup so peers attempt a direct
QUIC handshake to each other, with relay fallback on timeout.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 12:47:12 +04:00
Siavash Sameni
921856eba9 feat(reflect): QUIC-native NAT reflection ("STUN for QUIC") — Phase 1
Lets a client ask its registered relay "what IP:port do you see for
me?" over the existing TLS-authenticated signal channel, returning
the client's server-reflexive address as a SocketAddr. Replaces the
need for a classic STUN deployment and becomes the bootstrap step
for future P2P hole-punching: once both peers know their own reflex
addrs, they can advertise them in DirectCallOffer and attempt a
direct QUIC handshake to each other.

Wire protocol (wzp-proto):
- SignalMessage::Reflect — unit variant, client -> relay
- SignalMessage::ReflectResponse { observed_addr: String } — relay -> client
- JSON-serde, appended at end of enum: zero ordinal concerns,
  backward compat with pre-Phase-1 relays by construction (older
  relays log "unexpected message" and drop; newer clients time out
  cleanly within 1s).

Relay handler (wzp-relay/src/main.rs, signal loop):
- New match arm next to Ping reuses the already-bound `addr` from
  connection.remote_address() and replies with observed_addr as a
  string. debug!-level log on success, warn!-level on send failure.

Client side (desktop/src-tauri/src/lib.rs):
- SignalState gains pending_reflect: Option<oneshot::Sender<SocketAddr>>.
- get_reflected_address Tauri command installs the oneshot before
  sending Reflect and awaits it with a 1s timeout; cleans up on
  every exit path (send failure, timeout, parse error).
- recv loop's new ReflectResponse arm fires the pending sender or
  emits a debug log for unsolicited responses — never crashes the
  loop on malformed input.
- Integrated into invoke_handler! alongside the other signal
  commands.
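
The install-then-await pattern can be approximated with std primitives. In this sketch a std `mpsc` channel and `recv_timeout` stand in for the tokio oneshot and timeout, `send_reflect` is a hypothetical callback for the actual signal send, and the function names mirror the description rather than the real code.

```rust
use std::net::SocketAddr;
use std::sync::mpsc::{self, Sender};
use std::sync::Mutex;
use std::time::Duration;

#[derive(Default)]
struct SignalState {
    pending_reflect: Option<Sender<SocketAddr>>,
}

fn get_reflected_address(
    state: &Mutex<SignalState>,
    send_reflect: impl FnOnce() -> Result<(), String>,
    timeout: Duration,
) -> Result<SocketAddr, String> {
    let (tx, rx) = mpsc::channel();
    // Install the sender BEFORE sending; the guard drops at the end of
    // this statement, so the lock is not held while waiting.
    state.lock().unwrap().pending_reflect = Some(tx);

    let result = send_reflect().and_then(|()| {
        rx.recv_timeout(timeout)
            .map_err(|_| "no reflect response".to_string())
    });
    if result.is_err() {
        // Clean up on every failure path (send failure, timeout, bad reply).
        state.lock().unwrap().pending_reflect = None;
    }
    result
}

/// Called from the recv loop when a ReflectResponse arrives.
fn on_reflect_response(state: &Mutex<SignalState>, observed_addr: &str) {
    let pending = state.lock().unwrap().pending_reflect.take();
    if let (Some(tx), Ok(addr)) = (pending, observed_addr.parse::<SocketAddr>()) {
        let _ = tx.send(addr);
    }
    // Unsolicited or malformed responses are ignored, never a panic.
}
```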

UI (desktop/index.html + src/main.ts):
- New "Network" section in settings panel with a "Detect" button
  that displays the reflected address or a categorized warning
  ("register first" / "relay does not support reflection" / error).

Tests (crates/wzp-relay/tests/reflect.rs — 3 new, all passing):
- reflect_happy_path: client on loopback gets back 127.0.0.1:<its own port>
- reflect_two_clients_distinct_ports: two concurrent clients see
  their own distinct ports, proving per-connection remote_address
- reflect_old_relay_times_out: mock relay that ignores Reflect —
  client times out between 1000-1200ms and does not hang

Also fixed pre-existing test bit-rot unrelated to this PR so the
full workspace `cargo test` goes green:
- handshake_integration tests in wzp-client, wzp-relay and
  featherchat_compat in wzp-crypto all missed the `alias` field
  addition to CallOffer and the 3-arg form of perform_handshake
  plus 4-tuple return of accept_handshake. Updated to the current
  API surface.

Results:
  cargo test --workspace --exclude wzp-android: 386 passed
  cargo check --workspace: clean
  cargo clippy: no new warnings in touched files

Verification excludes wzp-android because it's dead code on this
branch (Tauri mobile uses wzp-native instead) and can't link -llog
on macOS host — unchanged status quo.

PRD: .taskmaster/docs/prd_reflect_over_quic.txt
Tasks: 39-46 all completed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 12:29:07 +04:00
Siavash Sameni
daf7bcd9ba chore(warnings): sweep the workspace — zero warnings on lib + bin targets
Addressed every rustc warning surfaced by `cargo check --workspace
--release --lib --bins` on opus-DRED-v2. Split across five
categories:

## Real bugs surfaced by the audit (fix, don't silence)

- **crates/wzp-relay/src/federation.rs** — the per-peer RTT monitor
  task computed `rtt_ms` every 5 s and threw it on the floor. The
  `wzp_federation_peer_rtt_ms` gauge has been registered in
  metrics.rs the whole time but was never receiving samples, leaving
  the Grafana panel blank. Wired it up: the task now calls
  `fm_rtt.metrics.federation_peer_rtt_ms.with_label_values(&[&label_rtt]).set(rtt_ms)`
  on every sample. Fixes three warnings (`rtt_ms`, `fm_rtt`,
  `label_rtt` were all captured for this task and all dead).

## Dead code removal

- **crates/wzp-relay/src/federation.rs** — removed `local_delivery_seq:
  AtomicU16` field and its initializer. It was described in comments
  as "per-room seq counter for federation media delivered to local
  clients" but was declared, initialized to 0, and never read or
  written anywhere else. Genuine half-wired feature; deletable with
  zero behavior change.
- **crates/wzp-relay/src/room.rs** — removed `let recv_start =
  Instant::now()` at the top of a recv loop that was never read.
  Separate variable `last_recv_instant` already measures the actual
  gap that's used for the `max_recv_gap_ms` stat.
- **crates/wzp-client/src/cli.rs** — removed `let my_fp = fp.clone()`
  from the signal loop setup. Cloned but never used in any match arm.

## Stub-intent warnings (underscore + explanatory comment)

- **crates/wzp-relay/src/handshake.rs** — `choose_profile` hardcodes
  `QualityProfile::GOOD` and ignores its `supported` parameter.
  Comment already documented "Cap at GOOD (24k) for now — studio
  tiers not yet tested for federation reliability". Renamed to
  `_supported`, expanded the comment to explicitly note the future
  plan (pick highest supported ≤ relay ceiling).
- **crates/wzp-relay/src/federation.rs** — `forward_to_peers` takes
  `room_name: &str` but only uses `room_hash`. The caller
  (handle_datagram) passes the name for caller-site symmetry with
  other helpers; kept the param shape and underscored the binding
  with a comment noting it's reserved for future per-name logging.

## Cosmetic fixes

- **crates/wzp-relay/src/event_log.rs** — dropped `use std::sync::Arc`
  (unused).
- **crates/wzp-relay/src/signal_hub.rs** — trimmed `use tracing::{info,
  warn}` to `use tracing::info`. Also removed unnecessary `mut` on
  `hub` binding in the `register_unregister` test.
- **crates/wzp-relay/src/room.rs** — trimmed `use tracing::{debug,
  error, info, trace, warn}` to `{error, info, warn}`. Also removed
  unnecessary `mut` on `mgr` binding in the `room_join_leave` test.
- **crates/wzp-relay/src/main.rs** — removed unnecessary `mut` on the
  `config` destructured binding from `parse_args()`; and dropped
  `ref caller_alias` from the `DirectCallOffer` match pattern since
  the relay just forwards the full `msg` (caller_alias is preserved
  end-to-end, we don't need to read it on the relay).
- **crates/wzp-crypto/tests/featherchat_compat.rs** — dropped
  `CallSignalType` from a `use wzp_client::featherchat::{...}`
  (unused in the test body). Note: this test file has pre-existing
  compile errors from SignalMessage schema drift unrelated to this
  sweep; that's tracked separately.

## Crate-level annotation

- **crates/wzp-android/src/lib.rs** — added
  `#![allow(dead_code, unused_imports, unused_variables, unused_mut)]`
  with a doc block explaining the crate is dead code since the Tauri
  mobile rewrite. The legacy Kotlin+JNI Android app that consumed
  this crate was replaced by desktop/src-tauri (live Android recv
  path) + crates/wzp-native (Oboe bridge). Rather than piecemeal
  cleanup of a crate that shouldn't be maintained, the whole-crate
  allow keeps CI clean until someone removes the crate entirely. Kills
  all 6 wzp-android warnings (4 unused imports/vars, 1 unused `mut`
  on a JNI env param, 1 dead `command_rx` field) in one line.

## Not touched

- **deps/featherchat/warzone/crates/warzone-protocol/src/x3dh.rs** —
  3 unused-variable warnings in `alice_spk_secret`, `alice_bundle`,
  `bob_bundle_bytes`. This is a vendored third-party submodule;
  upstream's problem, not ours. Would need to be reported to
  featherchat upstream if we care.

## Verification

- `cargo check --workspace --release --lib --bins` → 0 warnings, 0 errors
- `cargo check --workspace --release --all-targets` → only the 3
  featherchat submodule warnings remain, plus the pre-existing 3
  broken integration tests (SignalMessage schema drift from Phase 2,
  tracked separately and explicitly out of scope).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 08:28:26 +04:00
Siavash Sameni
df1a45a5f5 fix(cli): port live mode to ring API (read_frame/write_frame removed)
AudioCapture and AudioPlayback no longer expose the old read_frame()
and write_frame() methods — they were replaced with ring() returning
&Arc<AudioRing> when the lock-free SPSC ring was introduced. The CLI
live-mode loop still referenced the removed methods, which broke every
workspace build that touched wzp-client bin (including the remote
Linux x86_64 docker build).

- Send loop: allocate a 960-sample scratch buffer, fill it in a loop
  via capture.ring().read() until a full 20 ms frame is available,
  sleep 2 ms between empty reads to avoid hot-spinning.
- Recv loop: write decoded PCM into playback.ring() instead of
  calling write_frame(). Short writes on full ring drop the tail,
  which is the correct real-time behavior for CLI live mode.
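
Both loops can be sketched against a minimal ring interface. Everything here is illustrative: `SampleRing` is a hypothetical trait standing in for `AudioRing` (whose real method signatures are not shown in this message), and `MockRing` is a mutex-backed stand-in, not the lock-free SPSC ring.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;
use std::thread::sleep;
use std::time::Duration;

const FRAME_SAMPLES: usize = 960; // one 20 ms frame, per the text above

/// Hypothetical non-blocking ring view: both calls return samples moved.
trait SampleRing {
    fn read(&self, out: &mut [i16]) -> usize;
    fn write(&self, samples: &[i16]) -> usize;
}

/// Send side: keep reading until a full frame has accumulated.
fn read_full_frame(ring: &impl SampleRing) -> [i16; FRAME_SAMPLES] {
    let mut frame = [0i16; FRAME_SAMPLES];
    let mut filled = 0;
    while filled < FRAME_SAMPLES {
        let n = ring.read(&mut frame[filled..]);
        if n == 0 {
            sleep(Duration::from_millis(2)); // avoid hot-spinning
        }
        filled += n;
    }
    frame
}

/// Recv side: write decoded PCM; returns how many tail samples were dropped.
fn play_decoded(ring: &impl SampleRing, pcm: &[i16]) -> usize {
    pcm.len() - ring.write(pcm)
}

/// Mutex-backed mock for experimenting with the two loops.
struct MockRing {
    buf: Mutex<VecDeque<i16>>,
    capacity: usize,
}

impl SampleRing for MockRing {
    fn read(&self, out: &mut [i16]) -> usize {
        let mut buf = self.buf.lock().unwrap();
        let n = out.len().min(buf.len());
        for slot in &mut out[..n] {
            *slot = buf.pop_front().unwrap();
        }
        n
    }
    fn write(&self, samples: &[i16]) -> usize {
        let mut buf = self.buf.lock().unwrap();
        let n = samples.len().min(self.capacity.saturating_sub(buf.len()));
        buf.extend(&samples[..n]);
        n
    }
}
```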

No behavioral change on the wire or in the call pipeline — this is
purely a compile fix for cli.rs bitrot that accumulated since the
ring API landed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 08:08:14 +04:00
Siavash Sameni
7515417202 feat(telemetry): Phase 4 — LossRecoveryUpdate protocol + relay metrics + DebugReporter
Phase 4 lays the telemetry foundation for distinguishing DRED recoveries
from classical PLC in production: a new SignalMessage variant, two new
per-session Prometheus counters on the relay side, and a highlighted
loss-recovery section in the Android DebugReporter.

The periodic emitter (client → relay) and Grafana panel are deferred to
Phase 4b — this commit ships the protocol surface, the relay sink, and
the immediate user-visible debug output. Once 4b lands the full path
(emitter → relay → Prometheus → Grafana), the metrics here will
automatically start receiving data.

Scope decision — why not extend QualityReport instead:
The existing wire-format QualityReport is a fixed 4-byte media packet
trailer. Adding counter fields to it would shift the binary layout and
break backward compatibility (old receivers would parse the last 4
bytes of the extended trailer as QR, corrupting audio). Using a
new SignalMessage variant on the reliable QUIC signal stream sidesteps
the wire-format problem entirely — serde JSON enums tolerate unknown
variants gracefully on old receivers, and the signal channel is the
right layer for periodic telemetry aggregates.

Changes:

  wzp-proto/src/packet.rs:
    - New SignalMessage::LossRecoveryUpdate variant carrying:
        * dred_reconstructions: u64 (monotonic since call start)
        * classical_plc_invocations: u64 (monotonic)
        * frames_decoded: u64 (for rate calculation)
    - All three fields tagged #[serde(default)] for forward compat.

  wzp-client/src/featherchat.rs:
    - Added a match arm so signal_to_call_type() handles the new
      variant (treat as Offer for featherChat bridging purposes).

  wzp-relay/src/metrics.rs:
    - Two new IntCounterVec metrics on the relay, labeled by session_id:
        * wzp_relay_session_dred_reconstructions_total
        * wzp_relay_session_classical_plc_total
    - New method update_session_loss_recovery(session_id, dred, plc)
      applies monotonic deltas: if the incoming totals exceed the
      current counter, the difference is inc_by'd. If the incoming
      totals are LOWER (client restart or counter reset), the
      Prometheus counter holds steady until the client catches up.
      This matches the existing update_session_buffer delta pattern.
    - remove_session_metrics() now cleans up the two new labels.
    - New test session_loss_recovery_monotonic_delta exercises:
        * initial population (10 DRED, 2 PLC)
        * forward advance (25, 5 → delta +15, +3)
        * lower values ignored (client reset → counters unchanged)
        * client catches up (30, 8 → advances to new max)
    - Existing session_metrics_cleanup test extended to cover the
      new counters.
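
The monotonic-delta rule described for update_session_loss_recovery can be sketched with the standard library alone. This is an illustration, not the relay code: `LossRecoverySink` and `LossTotals` are hypothetical names, and the returned deltas stand in for the values that would be passed to `inc_by` on the two Prometheus counters.

```rust
use std::collections::HashMap;

/// Totals already applied for one session (hypothetical stand-in for the
/// current value of the two per-session counters).
#[derive(Default, Debug, Clone, Copy, PartialEq)]
struct LossTotals {
    dred: u64,
    plc: u64,
}

/// Hypothetical sink keyed by session_id.
#[derive(Default)]
struct LossRecoverySink {
    sessions: HashMap<String, LossTotals>,
}

impl LossRecoverySink {
    /// Takes the client's running totals and returns the (dred, plc) deltas
    /// to add to the counters. A total lower than what was already applied
    /// yields a delta of 0, so the counter holds steady until the client
    /// catches up.
    fn update(&mut self, session_id: &str, dred: u64, plc: u64) -> (u64, u64) {
        let seen = self.sessions.entry(session_id.to_string()).or_default();
        let d_dred = dred.saturating_sub(seen.dred);
        let d_plc = plc.saturating_sub(seen.plc);
        seen.dred += d_dred;
        seen.plc += d_plc;
        (d_dred, d_plc)
    }

    /// Mirrors the cleanup step: forget the session's totals.
    fn remove(&mut self, session_id: &str) {
        self.sessions.remove(session_id);
    }
}
```

Feeding it the sequence from the test description (10/2, then 25/5, then lower values, then 30/8) produces deltas of +10/+2, +15/+3, 0/0, and +5/+3.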

  android/app/src/main/java/com/wzp/debug/DebugReporter.kt:
    - Phase 4 users — and incident responders — need to quickly see
      whether DRED is actually firing during a call. The stats JSON
      already carries the counters (after Phase 3c), but they were
      buried in the trailing JSON dump. Added a dedicated
      "=== Loss Recovery ===" section to the meta preamble that
      extracts dred_reconstructions, classical_plc_invocations,
      frames_decoded, and fec_recovered from the JSON and displays
      them plainly, plus computed percentages when frames_decoded > 0.
    - New extractLongField helper: tiny hand-rolled JSON integer
      extractor. We don't want to pull in a full JSON parser for this
      single use case and CallStats has a flat, well-known schema.
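
The real extractLongField helper is Kotlin; the sketch below restates the same idea in Rust to match the rest of the workspace. It assumes a flat JSON object with integer values, as described above. The function names and the percentage helper are illustrative, not copied from DebugReporter.kt.

```rust
/// Finds `"key": <integer>` in a flat JSON object and returns the integer.
/// Not a general JSON parser: it assumes the key appears once and that the
/// value is a plain (optionally negative) integer.
fn extract_long_field(json: &str, key: &str) -> Option<i64> {
    let needle = format!("\"{}\"", key);
    let after_key = json.find(&needle)? + needle.len();
    let rest = json[after_key..].trim_start().strip_prefix(':')?.trim_start();
    // The value ends at the first character that is not part of the integer.
    let end = rest
        .char_indices()
        .find(|&(i, c)| !(c.is_ascii_digit() || (i == 0 && c == '-')))
        .map(|(i, _)| i)
        .unwrap_or(rest.len());
    rest[..end].parse().ok()
}

/// Percentage of decoded frames, only defined when frames_decoded > 0.
fn percent_of(count: i64, frames_decoded: i64) -> Option<f64> {
    if frames_decoded > 0 {
        Some(100.0 * count as f64 / frames_decoded as f64)
    } else {
        None
    }
}
```

A missing key returns None, so the report section can skip that line instead of printing a bogus zero.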

Verification:
- cargo check --workspace: zero errors
- cargo test -p wzp-proto --lib: 63 passing
- cargo test -p wzp-codec --lib: 68 passing
- cargo test -p wzp-client --lib: 35 passing (+1 ignored probe)
- cargo test -p wzp-relay --lib: 68 passing (+1 new Phase 4 test)
- cargo check -p wzp-android --lib: zero errors
- Android APK build verified earlier today (unridden-alfonso.apk
  via the remote Docker builder) — Phase 0–3c confirmed to compile
  end-to-end on the NDK target.

Phase 4b remaining (not blocking this commit):
- Periodic LossRecoveryUpdate emitter in wzp-client/src/call.rs and
  wzp-android/src/engine.rs (every ~5 s)
- Relay-side handler in main.rs that matches the new variant and
  calls metrics.update_session_loss_recovery
- Grafana "Loss recovery breakdown" panel in docs/grafana-dashboard.json

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 20:03:39 +04:00
Siavash Sameni
27bc264738 feat(codec): Phase 3b — CallDecoder DRED reconstruction on packet loss
Phase 3b of the DRED integration — wires the Phase 3a FFI primitives
into the desktop receive path. When the jitter buffer reports a missing
Opus frame, CallDecoder now attempts to reconstruct the audio from the
most recently parsed DRED side-channel state before falling through to
classical PLC.

Architectural refinement vs the PRD's literal wording: the PRD said
"jitter buffer takes a Box<dyn DredReconstructor>". After checking deps,
wzp-transport depends only on wzp-proto (not wzp-codec). Putting DRED
state in the jitter buffer would require a new cross-crate dep and
couple the codec-agnostic buffer to libopus. Instead, this commit keeps
the DRED state ring and reconstruction dispatch inside CallDecoder (one
layer up from the jitter buffer), intercepting the existing
PlayoutResult::Missing signal. Same lookahead/backfill semantics,
cleaner layering, zero change to wzp-transport.

Changes:

  CallDecoder field type: Box<dyn AudioDecoder> → AdaptiveDecoder.
  Required because Phase 3b calls the inherent reconstruct_from_dred
  method, which cannot live on the AudioDecoder trait without dragging
  libopus DredState through wzp-proto. In practice AdaptiveDecoder was
  the only AudioDecoder implementor anyway — the trait abstraction was
  buying nothing. Method call sites unchanged because AdaptiveDecoder
  also implements AudioDecoder.

  New CallDecoder fields:
    - dred_decoder: DredDecoderHandle
    - dred_parse_scratch: DredState  (scratch for parse_into)
    - last_good_dred: DredState      (cached most-recent valid state)
    - last_good_dred_seq: Option<u16>
    - dred_reconstructions: u64      (Phase 4 telemetry)
    - classical_plc_invocations: u64 (Phase 4 telemetry)

  CallDecoder::ingest — on Opus non-repair packets, parse DRED into the
  scratch state. On success (samples_available > 0), std::mem::swap the
  scratch into last_good_dred and record the seq. This is O(1) per
  packet, zero allocation after construction (the two DredState buffers
  are allocated once in new() and reused forever).
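
A minimal sketch of the scratch/last-good swap, using a stand-in state type because the real DredState wraps libopus. `DredStateSketch`, `DredCache`, and the closure-based `parse` argument are assumptions made for illustration only.

```rust
/// Stand-in for the libopus-backed DRED state.
#[derive(Default)]
struct DredStateSketch {
    samples_available: i32,
    payload: Vec<u8>,
}

struct DredCache {
    scratch: DredStateSketch,
    last_good: DredStateSketch,
    last_good_seq: Option<u16>,
}

impl DredCache {
    /// Both states are allocated once here and reused afterwards.
    fn new() -> Self {
        let make = || DredStateSketch {
            samples_available: 0,
            payload: Vec::with_capacity(1024),
        };
        DredCache { scratch: make(), last_good: make(), last_good_seq: None }
    }

    /// `parse` plays the role of parse_into: it fills the scratch state.
    /// Only a successful parse (samples_available > 0) replaces the cache.
    fn ingest<F: FnOnce(&mut DredStateSketch)>(&mut self, seq: u16, parse: F) {
        parse(&mut self.scratch);
        if self.scratch.samples_available > 0 {
            // O(1): exchanges the two structs, no copy of the buffers.
            std::mem::swap(&mut self.scratch, &mut self.last_good);
            self.last_good_seq = Some(seq);
        }
    }
}
```

After the swap, the previous last-good state becomes the next scratch buffer, which is why no allocation is needed per packet.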

  CallDecoder::decode_next — on PlayoutResult::Missing(seq) for Opus
  profiles: if last_good_dred_seq > seq and the seq delta × frame_samples
  fits within samples_available, call audio_dec.reconstruct_from_dred
  and bump dred_reconstructions. Otherwise fall through to classical
  PLC and bump classical_plc_invocations. The Codec2 path always falls
  through to classical PLC since DRED is libopus-only and
  AdaptiveDecoder::reconstruct_from_dred rejects Codec2 tiers
  explicitly.
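
The eligibility check in decode_next can be sketched as a pure function. The function name, the wrap-around handling for u16 sequence numbers, and the inclusive `<=` comparison are assumptions; the commit only states that the cached seq must be ahead of the missing one and that delta × frame_samples must fit within samples_available.

```rust
/// Returns the sample offset to request from the cached DRED state, or None
/// when the caller should fall through to classical PLC.
fn dred_offset_for_missing(
    last_good_dred_seq: Option<u16>,
    missing_seq: u16,
    frame_samples: u32,
    samples_available: u32,
) -> Option<u32> {
    let anchor = last_good_dred_seq?;
    // Wrapping distance from the missing frame to the DRED-carrying packet.
    let delta = anchor.wrapping_sub(missing_seq);
    // 0 means "same packet"; more than half the range means the cached
    // state is older than the missing frame (assumed convention).
    if delta == 0 || delta > u16::MAX / 2 {
        return None;
    }
    let offset = u32::from(delta).checked_mul(frame_samples)?;
    if offset <= samples_available { Some(offset) } else { None }
}
```

With 960-sample frames and a 4560-sample window, a state cached up to four packets after the loss can still reconstruct it; at five packets the function returns None and classical PLC takes over.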

  OpusDecoder and AdaptiveDecoder: new inherent reconstruct_from_dred
  method that delegates to the underlying DecoderHandle. Needed to
  bridge CallDecoder's wzp-client code to the Phase 3a FFI wrappers
  without touching the AudioDecoder trait.

CRITICAL FINDING — raised DRED loss floor from 5% to 15%:

Phase 3b testing discovered that libopus 1.5's DRED emission window
scales aggressively with OPUS_SET_PACKET_LOSS_PERC. Empirical data
(see probe_dred_samples_available_by_loss_floor, an #[ignore]'d
diagnostic test in call.rs):

  loss_pct   samples_available   effective_ms
    5%        720                  15 ms  (useless!)
   10%        2640                 55 ms
   15%        4560                 95 ms
   20%        6480                135 ms
   25%+       8400 (capped)       175 ms  (~87% of 200 ms configured)

The Phase 1 default of 5% produced only a 15 ms reconstruction window
— too small to even cover a single 20 ms Opus frame. DRED was
effectively disabled even though it was emitting bytes. Raised the
floor to 15% (95 ms window) as the minimum that actually provides
single-frame loss recovery. This updates Phase 1's DRED_LOSS_FLOOR_PCT
constant in opus_enc.rs and the accompanying module docstring.
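
The effective_ms column follows directly from the sample counts at 48 kHz. A small sketch of that arithmetic, with hypothetical helper names:

```rust
const SAMPLE_RATE_HZ: u32 = 48_000;
const FRAME_SAMPLES: u32 = 960; // one 20 ms Opus frame at 48 kHz

/// Converts a DRED samples_available value into milliseconds of audio.
fn window_ms(samples_available: u32) -> u32 {
    samples_available * 1000 / SAMPLE_RATE_HZ
}

/// True when the window can cover at least one whole 20 ms frame.
fn covers_one_frame(samples_available: u32) -> bool {
    samples_available >= FRAME_SAMPLES
}
```

For the 5% row, 720 samples is 15 ms and fails covers_one_frame; the 15% row's 4560 samples is 95 ms and passes.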

Trade-off: a higher assumed loss rate was expected to increase encoder
bitrate overhead on clean networks. Measured via the existing phase1
bitrate probe:

  Before (5% floor):  3649 bytes/sec at Opus 24k + 300 Hz sine
  After  (15% floor): 3568 bytes/sec at Opus 24k + 300 Hz sine

The delta is within noise — 15% isn't meaningfully more expensive than
5% on this signal, which suggests the DRED emission size is signal-
dependent rather than loss-dependent for small values. Net result: we
get a 6x larger reconstruction window for essentially free.

Tests (+3 DRED recovery, +1 #[ignore]'d probe):
- opus_single_packet_loss_is_recovered_via_dred — full encode → ingest
  → decode_next loop with one packet dropped mid-stream. Asserts
  dred_reconstructions ≥ 1 and observes the exact counter deltas.
- opus_lossless_ingest_never_triggers_dred_or_plc — baseline behavior,
  lossless stream never takes the Missing branch.
- codec2_loss_falls_through_to_classical_plc — Codec2 never
  reconstructs via DRED even if state were populated (which it won't
  be — Codec2 packets don't carry DRED bytes).
- probe_dred_samples_available_by_loss_floor — #[ignore]'d diagnostic
  that sweeps loss_pct values and prints the resulting DRED window
  sizes. Kept for future tuning work.

New CallDecoder introspection accessors (public but undocumented in
the PRD): last_good_dred_seq() and last_good_dred_samples_available()
for test diagnostics and future telemetry surfaces in Phase 4.

Verification:
- cargo check --workspace: zero errors
- cargo test -p wzp-codec --lib: 68 passing (Phase 3a baseline held)
- cargo test -p wzp-client --lib: 35 passing (+3 Phase 3b tests,
  +1 ignored diagnostic, no regressions)

Next up: Phase 3c mirrors this on the Android engine.rs receive path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 20:03:24 +04:00
Siavash Sameni
6db5c25b54 feat(codec): Phase 2 — remove RaptorQ from Opus tiers, Codec2 unchanged
Phase 2 of the DRED integration (docs/PRD-dred-integration.md). With
Phase 1 having enabled DRED on every Opus profile, the app-level RaptorQ
layer is now redundant overhead on those tiers: +20% bitrate, +40–100 ms
receive-side latency (block wait), +CPU for stats we never used. This
phase removes RaptorQ from the Opus encode and decode paths on both the
desktop (wzp-client/call.rs) and Android (wzp-android/engine.rs) sides.
Codec2 tiers keep RaptorQ with their current ratios unchanged — DRED is
libopus-only and Codec2 has no neural equivalent.

Encoder changes (the real bandwidth / CPU win):
- CallEncoder::encode_frame and engine.rs encode loop now gate the
  RaptorQ path on !codec.is_opus():
    - Opus source packets emit fec_block=0, fec_symbol=0,
      fec_ratio_encoded=0 in the MediaHeader
    - fec_enc.add_source_symbol is skipped on Opus
    - generate_repair + repair packet emission is skipped on Opus
    - block_id and frame_in_block counters stay frozen at 0 for Opus
- Codec2 path is byte-for-byte identical to pre-Phase-2 behavior.
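
A rough sketch of the encoder-side gating, assuming 8-bit header fields and a simple two-way codec split. `CodecKind`, `FecFields`, `FecCounters`, and `next_fec_fields` are illustrative names, not the actual MediaHeader or CallEncoder types.

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum CodecKind {
    Opus,
    Codec2,
}

impl CodecKind {
    fn is_opus(self) -> bool {
        self == CodecKind::Opus
    }
}

/// The three FEC-related header fields (field widths are an assumption).
#[derive(Debug, PartialEq, Default, Clone, Copy)]
struct FecFields {
    fec_block: u8,
    fec_symbol: u8,
    fec_ratio_encoded: u8,
}

struct FecCounters {
    block_id: u8,
    frame_in_block: u8,
}

/// Opus packets get all-zero FEC fields and leave the counters frozen;
/// Codec2 packets advance through the block as before.
fn next_fec_fields(
    codec: CodecKind,
    counters: &mut FecCounters,
    frames_per_block: u8,
    ratio_encoded: u8,
) -> FecFields {
    if codec.is_opus() {
        return FecFields::default();
    }
    let fields = FecFields {
        fec_block: counters.block_id,
        fec_symbol: counters.frame_in_block,
        fec_ratio_encoded: ratio_encoded,
    };
    counters.frame_in_block += 1;
    if counters.frame_in_block == frames_per_block {
        counters.frame_in_block = 0;
        counters.block_id = counters.block_id.wrapping_add(1);
    }
    fields
}
```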

Decoder changes (mostly cleanup, since both live decoder paths were
already reading audio directly from source packets and only using the
RaptorQ decoder output for stats):
- CallDecoder::ingest skips fec_dec.add_symbol on Opus packets. Source
  packets still flow to the jitter buffer; Opus repair packets from old
  senders are dropped cleanly (repair packets never hit the jitter
  buffer either).
- engine.rs recv loop skips fec_dec.add_symbol, fec_dec.try_decode, and
  fec_dec.expire_before on Opus packets. The `fec_recovered` stat
  counter becomes Codec2-only (a separate DRED reconstruction counter
  lands in Phase 4).

Wire-format backward compat verified at pre-flight:
- Old receiver + new sender: engine.rs pipeline.rs path gates on
  non-zero fec_block/fec_symbol which now never fire for Opus, so the
  RaptorQ decoder simply isn't fed. Audio flows normally. Desktop
  CallDecoder's old path accumulated packets into the stale-eviction
  HashMap, which cleans up after 2s — harmless.
- New receiver + old sender: new receiver skips RaptorQ on Opus so
  old-sender repair packets are ignored entirely (no crash, no double-
  decode). Loses the (previously vestigial) RaptorQ recovery benefit,
  which was never actually active in the audio path. Source packets
  still decode normally.
- No wire format version bump required. MediaHeader is unchanged; we
  just zero the FEC fields on Opus packets.

Test changes:
- Removed `encoder_generates_repair_on_full_block` — asserted the old
  (pre-Phase-2) RaptorQ-on-Opus behavior and is now incorrect. Replaced
  with three tests:
    - `opus_source_packets_have_zero_fec_header_fields` — verifies
      Phase 2 invariants on Opus packets
    - `opus_encoder_never_emits_repair_packets` — runs 20 frames of
      non-silent sine wave through a GOOD-profile encoder, asserts
      exactly 20 output packets, zero repair
    - `codec2_encoder_generates_repair_on_full_block` — same shape as
      the old test but on CATASTROPHIC profile (Codec2 1200, 8
      frames/block, ratio 1.0) to verify Codec2 path still emits
      repairs as before

Verification:
- cargo check --workspace: zero errors
- cargo test -p wzp-codec --lib: 61 passing (Phase 1 baseline held)
- cargo test -p wzp-client --lib: 32 passing (+3 new Phase 2 tests,
  -1 old test removed)
- cargo check -p wzp-android --lib: zero errors (host link of
  wzp-android tests fails on -llog per pre-existing Android-only
  build.rs, unrelated to this work; integration build via
  build-and-notify.sh will validate Android end-to-end)
- Pre-existing broken integration test in
  crates/wzp-client/tests/handshake_integration.rs (SignalMessage
  schema drift) is NOT caused by this commit — baseline had the same
  3 compile errors before Phase 2. Flagged as a separate cleanup task.

Expected observable effects on a real call:
- Opus 24k outgoing bitrate drops from ~28.8 kbps (ratio 0.2 RaptorQ)
  to ~25 kbps (base 24 kbps + DRED ~1–10 kbps signal-dependent)
- Opus receive-side latency drops ~40 ms on clean network (no more
  block wait — jitter buffer emits as soon as a source packet arrives)
- Codec2 calls show no latency or bitrate change

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 20:02:42 +04:00
Siavash Sameni
07873ea598 fix(linux-aec): fall back to 0.3 crate + apt lib (2.x bundled is broken)
Switch the webrtc-audio-processing dep from the 2.x git source (bundled
mode) back to crates.io 0.3, and link against Debian's apt package
libwebrtc-audio-processing-dev (0.3-1+b1 on Bookworm). The 2.x path
fails because both the crates.io tarball and the upstream git main
branch of webrtc-audio-processing-sys 2.0.3 have a build.rs bug where
`meson setup --reconfigure` is passed unconditionally, panicking on
first-run empty build dirs with "Directory does not contain a valid
build tree". The 0.x line sidesteps bundled mode entirely by linking
the apt-provided library.

Trade-off: we get AEC2 (the older generation) instead of AEC3, but
it's the same algorithm family and is what PulseAudio's
module-echo-cancel and PipeWire's filter-chain use on current
Debian-family distros. Fine for shipping — we can revisit AEC3 once
the 2.x bundled build is fixed upstream.

API changes:
- 0.3's Processor::process_capture_frame and process_render_frame
  take &mut self, so wrap the module-level processor in a Mutex.
  Capture and playback threads each lock briefly (sub-ms per 10 ms
  frame); contention is minimal.
- Import NUM_SAMPLES_PER_FRAME from the crate directly instead of
  hardcoding 480, so the code tracks whatever sample rate the
  upstream C++ lib exposes (currently 48 kHz hardcoded -> 480).
- Helper fns drain_frames_through_apm / tee_render_samples / etc.
  take &Mutex<Processor> instead of &Processor.
- Use explicit EchoCancellationSuppressionLevel and
  NoiseSuppressionLevel imports rather than fully-qualified paths.
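
The Mutex wrapping described under "API changes" can be illustrated with a stand-in processor. `ProcessorSketch` is hypothetical and only counts frames; the point is that both methods take &mut self, so the capture and render threads each take the lock for the duration of one frame.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// 10 ms at 48 kHz, mirroring the constant the commit imports.
const NUM_SAMPLES_PER_FRAME: usize = 480;

/// Stand-in for the 0.3 Processor: both methods need &mut self.
struct ProcessorSketch {
    capture_frames: usize,
    render_frames: usize,
}

impl ProcessorSketch {
    fn process_capture_frame(&mut self, _frame: &mut [f32]) {
        self.capture_frames += 1;
    }
    fn process_render_frame(&mut self, _frame: &mut [f32]) {
        self.render_frames += 1;
    }
}

/// Runs a capture thread and a render thread against one shared processor.
fn run_threads(apm: Arc<Mutex<ProcessorSketch>>, frames: usize) {
    let cap = Arc::clone(&apm);
    let capture = thread::spawn(move || {
        let mut frame = [0.0f32; NUM_SAMPLES_PER_FRAME];
        for _ in 0..frames {
            // Lock is held only while this one frame is processed.
            cap.lock().unwrap().process_capture_frame(&mut frame);
        }
    });
    let ren = Arc::clone(&apm);
    let render = thread::spawn(move || {
        let mut frame = [0.0f32; NUM_SAMPLES_PER_FRAME];
        for _ in 0..frames {
            ren.lock().unwrap().process_render_frame(&mut frame);
        }
    });
    capture.join().unwrap();
    render.join().unwrap();
}
```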

Dockerfile:
- Drop meson / ninja-build / python3 (only needed for bundled build).
- Add libwebrtc-audio-processing-dev for the system link path.
- Keep clang (may be needed by the bindgen step in some versions).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 16:06:56 +04:00
Siavash Sameni
4ba77c8c0e feat(linux): WebRTC AEC3 capture/playback backend with render-side tee
Adds gold-standard Linux echo cancellation: in-app WebRTC AEC3 (Audio
Processing Module) via the webrtc-audio-processing crate, using the
same algorithm as Chrome WebRTC, Zoom, Teams, and Jitsi. Runs entirely
in-process, so it works identically on ALSA / PulseAudio / PipeWire
systems — no dependency on user-configured echo-cancel modules.

Architecture:
- New crates/wzp-client/src/audio_linux_aec.rs module (~470 lines).
  Contains LinuxAecCapture and LinuxAecPlayback, both using CPAL
  under the hood but routing samples through a shared
  Arc<webrtc_audio_processing::Processor>. The playback path tees
  each 20 ms frame into APM.process_render_frame as the echo
  reference BEFORE handing the samples to CPAL's output callback.
  The capture path runs APM.process_capture_frame on each mic frame
  in place before pushing to the audio ring buffer. This is the
  "tee the playback ring" approach that Zoom/Teams/Jitsi use.
- New `linux-aec` feature in wzp-client pulling in the
  webrtc-audio-processing crate at v2.x with the `bundled`
  sub-feature. Bundled means the vendored PulseAudio WebRTC C++
  sources are statically compiled via meson+ninja at cargo build
  time — no runtime .so dependency, avoids Debian Bookworm's stale
  libwebrtc-audio-processing-dev 0.3 package (which predates AEC3).
  Dep is target-gated to Linux, so enabling the feature on non-Linux
  is a no-op.
- lib.rs re-exports LinuxAecCapture/LinuxAecPlayback as
  AudioCapture/AudioPlayback when `linux-aec` is on, otherwise
  falls back to the CPAL audio_io path. Shared public API
  (start/ring/stop/Drop) means downstream code is unchanged.
- New `linux-aec` feature in wzp-desktop forwards to
  wzp-client/linux-aec so `cargo tauri build -- --features
  wzp-desktop/linux-aec` builds the AEC variant.

APM configuration:
- EchoCancellation: High suppression, delay-agnostic mode on,
  extended filter on, stream_delay_ms=60 initial hint
- NoiseSuppression: High
- HighPassFilter: on
- AGC: off (can fight Opus encoder's own gain staging + adaptive
  quality controller; add later if users report low mic level)

Frame size handling:
- Pipeline uses 20 ms frames (960 samples @ 48 kHz mono)
- APM requires strict 10 ms (480 samples) per call
- Each 20 ms frame is split into two 480-sample halves, APM called
  twice, halves stitched back
- Same pattern for render and capture sides
- Carry-buffer logic handles the case where CPAL delivers samples in
  arbitrary chunk sizes that don't divide 960
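
The carry-buffer idea can be sketched as follows. `FrameChunker` is a hypothetical name, and the `process` closure stands in for the APM call; the sketch only shows how arbitrary chunk sizes are regrouped into strict 480-sample frames.

```rust
/// APM frame size: 10 ms at 48 kHz mono.
const APM_FRAME: usize = 480;

/// Accumulates samples of any chunk size and hands out whole APM frames.
struct FrameChunker {
    carry: Vec<i16>,
}

impl FrameChunker {
    fn new() -> Self {
        FrameChunker { carry: Vec::new() }
    }

    /// Number of samples waiting for the next whole frame.
    fn pending(&self) -> usize {
        self.carry.len()
    }

    /// Appends `input`, runs `process` in place on every complete
    /// 480-sample frame, and returns the processed samples. Any remainder
    /// stays in the carry buffer for the next call.
    fn push<F: FnMut(&mut [i16])>(&mut self, input: &[i16], mut process: F) -> Vec<i16> {
        self.carry.extend_from_slice(input);
        let whole = self.carry.len() / APM_FRAME * APM_FRAME;
        let mut out: Vec<i16> = self.carry.drain(..whole).collect();
        for frame in out.chunks_exact_mut(APM_FRAME) {
            process(frame);
        }
        out
    }
}
```

A 960-sample pipeline frame pushed in one call results in exactly two `process` invocations, which matches the "split into two halves, stitch back" description.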

Build infrastructure:
- scripts/Dockerfile.linux-desktop-builder adds meson, ninja-build,
  python3, clang for the webrtc-audio-processing bundled build
- scripts/build-linux-desktop-docker.sh takes a new --aec flag that
  enables the linux-aec feature and renames the output artifacts
  with an `-aec` suffix so non-AEC and AEC variants can coexist on disk

Task #30.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 15:53:23 +04:00
Siavash Sameni
4e9244eb00 fix(windows): add Win32_Security feature + 2024 edition unsafe wrappers
- CreateEventW is gated behind Win32_Security in the windows crate
  because its signature takes SECURITY_ATTRIBUTES; add to features.
- Remove unused HANDLE import.
- Wrap GetId() and PWSTR::to_string() in explicit unsafe { ... }
  blocks for Rust 2024 edition's unsafe_op_in_unsafe_fn lint.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 14:36:50 +04:00
Siavash Sameni
03a80a3196 feat(windows): WASAPI capture backend with OS-level AEC
Adds a direct WASAPI microphone capture path for the Windows desktop
build that opens the default communications endpoint via
IMMDeviceEnumerator -> IAudioClient2 -> SetClientProperties with
AudioCategory_Communications, turning on Windows's communications
audio processing chain (AEC, noise suppression, automatic gain
control). The communications AEC operates at the OS level and uses
the system render mix as the reference signal, so echo from our
existing CPAL playback stream is cancelled automatically with no
per-process reference plumbing.

Architecture:
- New crates/wzp-client/src/audio_wasapi.rs module (~280 lines).
  Event-driven capture loop on a dedicated thread; pushes PCM into
  the same lock-free AudioRing used by the CPAL path. Same public
  API as audio_io::AudioCapture so downstream code is unchanged.
- New `windows-aec` feature in wzp-client that pulls in the
  `windows` crate (Microsoft's official Rust COM bindings) gated to
  target_os = "windows" only. Enabling the feature on non-Windows
  targets is a no-op since both the module and the dep are
  cfg(target_os = "windows").
- lib.rs re-exports WasapiAudioCapture as AudioCapture when the
  feature is on, otherwise falls back to the CPAL AudioCapture.
  AudioPlayback is always the CPAL one — no reason to swap it.
- desktop/src-tauri/Cargo.toml Windows target enables the new
  feature: `features = ["audio", "windows-aec"]`.

Implementation notes:
- Uses eCommunications role (not eConsole) for GetDefaultAudioEndpoint
  — the user-configured "communications" device that Teams/Zoom
  pick up, and the one Windows's AEC is tuned for.
- Requests 48 kHz mono i16 with AUDCLNT_STREAMFLAGS_AUTOCONVERTPCM +
  SRC_DEFAULT_QUALITY so Windows handles any format conversion in
  the audio engine instead of rejecting our format.
- Event-driven with SetEventHandle / WaitForSingleObject — no
  polling, minimal CPU cost between packets.
- 200 ms wait timeout so the capture thread polls `running` often
  enough for Drop to stop cleanly even if the audio engine stalls
  (e.g. device unplug).
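
The wait-with-timeout shutdown pattern can be sketched portably with a channel standing in for the WASAPI event handle. Everything here is an assumption for illustration: `spawn_capture_loop` is a hypothetical function, and `recv_timeout` plays the role of WaitForSingleObject with a bounded wait.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::mpsc::{self, RecvTimeoutError};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Capture thread: blocks until a packet arrives or `wait` elapses, then
/// re-checks `running`, so a stop request is seen within one timeout even
/// if no more packets ever arrive.
fn spawn_capture_loop(
    running: Arc<AtomicBool>,
    events: mpsc::Receiver<Vec<i16>>,
    captured_samples: Arc<AtomicUsize>,
    wait: Duration,
) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        while running.load(Ordering::Acquire) {
            match events.recv_timeout(wait) {
                // A packet arrived: account for it (the real loop would
                // push PCM into the audio ring here).
                Ok(packet) => {
                    captured_samples.fetch_add(packet.len(), Ordering::Relaxed);
                }
                // No packet within the timeout: loop and re-check `running`.
                Err(RecvTimeoutError::Timeout) => continue,
                // Producer went away: nothing more to capture.
                Err(RecvTimeoutError::Disconnected) => break,
            }
        }
    })
}
```

Stopping is then a matter of storing `false` into `running` and joining the thread; the join returns after at most one `wait` interval.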

Task #24.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 14:35:36 +04:00
Siavash Sameni
da09fdb6e9 windows(desktop): gate coreaudio / VoiceProcessingIO to macOS-only targets
First step of the Windows x86_64 desktop build: stop pulling
coreaudio-rs into the Windows dependency graph so the project can at
least run `cargo check --target x86_64-pc-windows-msvc`. Software AEC
is already disabled in engine.rs so there's nothing else to stub — the
macOS-specific VPIO path is skipped via #[cfg(target_os = "macos")] on
both sides and Windows falls through to the plain CPAL
AudioCapture/AudioPlayback branch that already existed.

crates/wzp-client/Cargo.toml
  - coreaudio-rs optional dep moved under [target.'cfg(target_os = "macos")']
  - `vpio` feature now uses `dep:coreaudio-rs` syntax and the gated dep
  - Enabling `vpio` on Windows/Linux is a no-op at resolution time

crates/wzp-client/src/lib.rs
  - `pub mod audio_vpio` is now #[cfg(all(feature = "vpio", target_os = "macos"))]
  - Previously `vpio` alone was enough to try to compile the Core Audio
    bindings, which would fail on non-Apple targets the moment the
    feature flag was flipped on

desktop/src-tauri/Cargo.toml
  - [target.'cfg(not(target_os = "android"))'] removed — was leaking
    vpio into Windows/Linux via the catch-all.
  - macOS: wzp-client with features = ["audio", "vpio"]
  - Windows: wzp-client with features = ["audio"]
  - Linux: wzp-client with features = ["audio"]
  - Android: wzp-client with default-features = false (unchanged)
  - Dropped the unused direct coreaudio-rs = "0.11" dep on macOS —
    wzp-desktop's own sources never call Core Audio directly.
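
The combined feature-plus-target gate can be sketched on the Rust side like this. The module and function names are hypothetical; only the cfg predicate is taken from the lib.rs change described above.

```rust
// Compiled only when the `vpio` feature is on AND the target is macOS.
#[cfg(all(feature = "vpio", target_os = "macos"))]
mod capture_backend {
    pub const NAME: &str = "vpio";
}

// Every other combination falls back to the plain CPAL path, including
// `vpio` enabled on Windows or Linux.
#[cfg(not(all(feature = "vpio", target_os = "macos")))]
mod capture_backend {
    pub const NAME: &str = "cpal";
}

/// Reports which backend this build selected.
fn capture_backend_name() -> &'static str {
    capture_backend::NAME
}
```

Because exactly one of the two modules exists in any build, downstream code can refer to `capture_backend` without its own cfg attributes.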

Verified via `cargo tree --target x86_64-pc-windows-msvc -p wzp-desktop`
that the Windows target now resolves wzp-client with cpal but without
coreaudio-rs. macOS target still resolves with coreaudio (direct via
vpio feature and transitively via cpal). macOS `cargo check` still
builds cleanly.

Cross-compile from macOS hit a cargo-xwin + llvm-lib setup issue in
ring's build.rs, so the actual `cargo check --target
x86_64-pc-windows-msvc` did not complete locally. Build verification
belongs on the user's Windows x86_64 host where MSVC is present
natively.

See tasks #23 (this one), #24 (Voice Capture DSP / WASAPI Communications
for OS-level AEC on Windows), and #25 (aarch64-pc-windows-msvc support).
2026-04-10 11:12:08 +04:00
Siavash Sameni
2288c1ae07 feat: direct calling UI for desktop Tauri app + merge android branch
Tauri backend:
- register_signal: persistent _signal connection, presence registration
- place_call: send DirectCallOffer by fingerprint
- answer_call: accept/reject incoming calls
- get_signal_status: poll signal state

Frontend:
- Mode toggle: "Room" vs "Direct Call"
- Register button → registers on relay signal channel
- Incoming call panel with Accept/Reject
- Fingerprint input + Call button
- Auto-connect to media room on CallSetup event

Also merges feat/android-voip-client into desktop branch:
- Federation fixes, time-based dedup, FEC stale blocks
- Direct calling protocol types
- ACL + SAS verification

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 06:42:47 +04:00
Siavash Sameni
3351cb6473 feat: direct 1:1 calling via relay signaling (Phase 1)
New feature: call someone directly by fingerprint through the relay.

- Client connects with SNI "_signal" for persistent signaling
- RegisterPresence/RegisterPresenceAck for relay registration
- DirectCallOffer routed to target by fingerprint
- DirectCallAnswer with AcceptGeneric/AcceptTrusted/Reject modes
- Relay creates private room (call-{id}), sends CallSetup to both
- Both clients connect to private room for media (existing SFU path)
- Hangup forwarding + cleanup on disconnect
- Desktop CLI: --signal + --call <fingerprint> for testing
- CallRegistry tracks call state (Pending/Ringing/Active/Ended)
- SignalHub manages persistent signaling connections

Tested: Alice calls Bob by fingerprint, relay routes offer, Bob
auto-accepts, both join private room, media flows bidirectionally.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 05:35:16 +04:00
Siavash Sameni
f935bd69cd fix: rewrite seq/fec for federation-delivered packets
- Time-based dedup (2s TTL) replaces fixed-window dedup — consecutive
  senders with same seq numbers no longer collide
- Raw byte forwarding for federation local delivery (no re-serialization)
- Jitter buffer resets on large backward seq jumps (>100)
- recv_media skips malformed datagrams instead of returning connection-closed
- SIGTERM handler for clean QUIC shutdown on wzp-client
- JSONL event log infrastructure (--event-log flag) for protocol analysis
- FEC disabled on GOOD profile for federation debugging (fec_ratio=0.0)
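
A minimal sketch of time-based dedup with a 2 s TTL, as mentioned in the first bullet. The key type (a u64 derived from the packet), the `TtlDedup` name, and the sweep-on-every-call strategy are assumptions; the relay's actual key and eviction policy are not shown in this commit message.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Remembers when each key was first seen and forgets it after `ttl`.
struct TtlDedup {
    ttl: Duration,
    first_seen: HashMap<u64, Instant>,
}

impl TtlDedup {
    fn new(ttl: Duration) -> Self {
        TtlDedup { ttl: ttl, first_seen: HashMap::new() }
    }

    /// Returns true if `key` was already seen within the TTL. `now` is
    /// passed in so the logic can be tested without sleeping.
    fn is_duplicate(&mut self, key: u64, now: Instant) -> bool {
        let ttl = self.ttl;
        // Drop entries older than the TTL (simple O(n) sweep).
        self.first_seen.retain(|_, seen_at| now.duration_since(*seen_at) < ttl);
        if self.first_seen.contains_key(&key) {
            true
        } else {
            self.first_seen.insert(key, now);
            false
        }
    }
}
```

Because entries expire by age rather than by position in a fixed window, a later sender that reuses the same key after the TTL is treated as new traffic.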

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 21:55:06 +04:00
Siavash Sameni
5c24adf1c1 feat: remote version query — wzp-client --version-check <relay>
Connects to a relay over QUIC with SNI "version", reads build hash
from a unidirectional stream, prints "<relay> <git-hash>" and exits.

Usage: wzp-client --version-check 172.16.81.175:4434
Output: 172.16.81.175:4434 8dbda3e

Relay side: detects "version" SNI, opens uni stream, writes
BUILD_GIT_HASH, waits 100ms for client to read, closes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 12:47:37 +04:00
Siavash Sameni
a3ebf5616f fix: unified raw room names + merged presence on join
1. CLI client now sends raw room names (no hash), matching Android
   JNI and Desktop Tauri. All three clients are now consistent.

2. When a client joins a global room, the relay merges federated
   remote participants into the initial RoomUpdate. Previously,
   clients that joined after the GlobalRoomActive signal only saw
   local participants. Now they see everyone immediately.

3. Added get_remote_participants() to FederationManager for querying
   cached remote participants from all peer links.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 11:09:15 +04:00
Siavash Sameni
b00db5dfdc feat: federation rewrite — global rooms router model
Major rewrite of relay federation replacing virtual participants with
a clean router model:

1. Global rooms: [[global_rooms]] in TOML config declares rooms that
   are bridged across federation. Each relay is a router + local SFU.

2. Room events: RoomManager emits LocalJoin/LocalLeave via broadcast
   channel when rooms transition between empty and non-empty.

3. GlobalRoomActive/Inactive signals: relays announce when they have
   local participants in global rooms. Peers track active state and
   forward media accordingly. Announcements propagate for multi-hop.

4. Media forwarding: separated from SFU loop. Local participant sends
   via mpsc channel → egress task → forward_to_peers() → room-hash
   tagged datagrams to active peer links. Inbound datagrams delivered
   to local participants + forwarded to other active peers (multi-hop).

5. Loop prevention: don't forward back to source relay.

6. Room name hashing: is_global_room() checks both plain name and
   hash (clients hash room names for SNI privacy).
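
Points 4 and 5 can be illustrated with a small framing sketch. The 8-byte prefix length comes from the earlier federation commit; the function names, the peer identifiers as strings, and the exact layout (hash first, payload after) are assumptions.

```rust
const ROOM_HASH_LEN: usize = 8;

/// Prefixes a media payload with the room hash for a federation link.
fn tag_datagram(room_hash: [u8; ROOM_HASH_LEN], payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(ROOM_HASH_LEN + payload.len());
    out.extend_from_slice(&room_hash);
    out.extend_from_slice(payload);
    out
}

/// Splits an inbound datagram into (room hash, payload); None if too short.
fn split_datagram(datagram: &[u8]) -> Option<([u8; ROOM_HASH_LEN], &[u8])> {
    if datagram.len() < ROOM_HASH_LEN {
        return None;
    }
    let (head, payload) = datagram.split_at(ROOM_HASH_LEN);
    let mut room_hash = [0u8; ROOM_HASH_LEN];
    room_hash.copy_from_slice(head);
    Some((room_hash, payload))
}

/// Loop prevention: forward to every active peer except the source relay.
fn forward_targets<'a>(active_peers: &[&'a str], source: &str) -> Vec<&'a str> {
    active_peers.iter().copied().filter(|p| *p != source).collect()
}
```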

Removed: ParticipantSender::Federation, federated_participants, virtual
participant join/leave, periodic room polling. Rooms now only contain
local participants.

Signaling tested: 3-relay chain (A→B←C) correctly propagates
GlobalRoomActive through B to both A and C. Media forwarding plumbing
in place but needs final debugging.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 07:54:38 +04:00
Siavash Sameni
bc8bb3d790 feat: [[trusted]] config + FederationHello for one-sided federation
- Added [[trusted]] config: relay B can accept inbound federation
  from relay A by fingerprint alone, without knowing A's address.
  A connects to B with [[peers]], B trusts A with [[trusted]].

- FederationHello signal: outbound connections send their TLS
  fingerprint as first signal. The accepting relay verifies it
  against [[peers]] (by IP) or [[trusted]] (by fingerprint).

- Tested 3-relay chain: A→B←C. Both A and C connect to B, B trusts
  both. B correctly accepts both inbound connections. Room
  announcements flow A→B and C→B.

- Remaining: B needs to announce rooms back to A and C on the same
  connection so media can flow A→B→C. Currently A has no virtual
  participant for B, so media doesn't reach B's SFU for forwarding.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 06:49:20 +04:00
Siavash Sameni
6be36e43c2 feat: relay federation infrastructure — room bridging, loop prevention, peer connections
Phase 1 of relay federation:

1. Signal messages: FederationRoomJoin/Leave/ParticipantUpdate added
   to SignalMessage enum for relay-to-relay room coordination.

2. Room changes: ParticipantOrigin (Local/Federated) tracking, loop
   prevention (federated media only forwards to local participants),
   ParticipantSender::Federation with 8-byte room-hash prefixed
   datagrams, merged participant lists (local + remote), new methods:
   join_federated(), update_federated_participants(), local_senders(),
   active_rooms(), local_participants().

3. FederationManager: connects to configured peers via QUIC with SNI
   "_federation", reconnects with exponential backoff (5s-300s),
   exchanges FederationRoomJoin signals, runs recv loops for both
   signals and media datagrams, creates virtual participants in rooms.

4. Accept-side: _federation SNI handling in main.rs, unknown peer
   gets helpful "add to relay.toml" log message, recognized peers
   handed off to FederationManager.
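
The reconnect schedule in point 3 can be sketched as a pure function. The 5 s and 300 s bounds come from the text above; the doubling factor and the function name are assumptions, since the commit does not state the growth rate.

```rust
use std::time::Duration;

const BACKOFF_MIN_SECS: u64 = 5;
const BACKOFF_MAX_SECS: u64 = 300;

/// Delay before reconnect attempt `attempt` (0-based): starts at 5 s,
/// doubles each time (assumed), and is capped at 300 s.
fn reconnect_delay(attempt: u32) -> Duration {
    // checked_shl returns None for shifts of 64 or more; treat as "huge".
    let factor = 1u64.checked_shl(attempt).unwrap_or(u64::MAX);
    let secs = BACKOFF_MIN_SECS.saturating_mul(factor).min(BACKOFF_MAX_SECS);
    Duration::from_secs(secs)
}
```

Under these assumptions the cap is reached on the seventh attempt (5, 10, 20, 40, 80, 160, then 300 s).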

TODO: TLS fingerprint verification — currently outbound connections
use client_config() which doesn't present a cert, so inbound
verification fails. Need mutual TLS or URL-based peer matching.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 22:30:18 +04:00
Siavash Sameni
7bddc6b5a6 fix: advertise studio profiles in desktop handshake supported_profiles
Same fix as Android — the CallOffer now includes STUDIO_64K/48K/32K
so the relay can negotiate studio quality levels.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 21:06:48 +04:00
Siavash Sameni
c8bcc5c974 fix: advertise studio profiles in handshake supported_profiles
The CallOffer only advertised GOOD/DEGRADED/CATASTROPHIC. When a
client uses a studio profile, the relay's choose_profile couldn't
pick it. Now advertises all 6 profiles (studio 64k/48k/32k + good +
degraded + catastrophic) in both Android engine and shared handshake.

Also: the relay MUST be rebuilt with the new CodecId variants,
otherwise it will fail to deserialize CallOffer messages containing
studio QualityProfiles in supported_profiles.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 19:39:31 +04:00
Siavash Sameni
a8c2011445 feat: add Opus 32k/48k/64k studio quality tiers
Adds three new codec IDs (Opus32k=6, Opus48k=7, Opus64k=8) and
corresponding STUDIO_32K, STUDIO_48K, STUDIO_64K quality profiles.
All use 20ms frames with minimal FEC (10%) for maximum quality on
good networks.

Updated across: wire protocol (codec_id.rs), encoder/decoder
(opus_enc/dec.rs), adaptive codec switch (call.rs), CLI
(--profile studio-64k), desktop engine + UI slider (8 quality
levels from Studio 64k green to Codec2 1.2k red).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 18:31:05 +04:00
Siavash Sameni
96ccb4f333 fix: auto-switch decoder codec to match incoming packets
The CallDecoder now inspects each incoming packet's codec_id and
automatically switches the audio decoder if it differs from the
current profile. This enables cross-codec interop where one client
sends Opus and the other sends Codec2 — previously the receiver
would try to decode with the wrong codec, producing garbled audio.
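
A rough sketch of the per-packet check, assuming a much smaller codec
enum than the real one. `DecoderState` and `prepare_for` are hypothetical
names that stand in for CallDecoder and track only which codec is active.

```rust
/// Hypothetical codec identifiers; the real CodecId enum has more variants.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum CodecId {
    Opus,
    Codec2,
}

/// Hypothetical stand-in for CallDecoder: tracks only the active codec.
struct DecoderState {
    active: CodecId,
    switches: u32,
}

impl DecoderState {
    fn new(initial: CodecId) -> Self {
        Self { active: initial, switches: 0 }
    }

    /// Called with each incoming packet's codec id before decoding.
    /// Returns true if the decoder had to switch codecs.
    fn prepare_for(&mut self, packet_codec: CodecId) -> bool {
        if packet_codec == self.active {
            return false;
        }
        // A real decoder would rebuild its codec state here.
        self.active = packet_codec;
        self.switches += 1;
        true
    }
}
```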

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 15:35:31 +04:00
Siavash Sameni
95a905e1b5 feat: add --profile/--codec flag to CLI for forcing codec selection
Enables debugging Codec2 by allowing forced codec selection from CLI.
Supports: good, degraded, catastrophic, codec2-3200, codec2-1200.
Frame size, timing, and jitter buffer are all adjusted dynamically
based on the selected profile.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 15:35:31 +04:00
Siavash Sameni
9f7962a6cd fix: vec allocation for desktop AudioRing (match Android fix)
Same fix as Android: Box::new([0i16; 16384]) allocates 32KB on the
stack before moving to heap. Use vec![].into_boxed_slice() for
direct heap allocation.
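
For illustration, the heap-first allocation looks roughly like this.
`alloc_ring` and `RING_CAPACITY` are hypothetical names; the element type
and size come from the expression quoted above.

```rust
/// Ring capacity in samples, as quoted in the commit message.
const RING_CAPACITY: usize = 16384;

/// Hypothetical helper: allocate the zeroed sample buffer on the heap.
/// `vec![0i16; N]` allocates heap memory directly, so no 32 KiB array
/// (16384 samples * 2 bytes) is built on the stack first.
fn alloc_ring() -> Box<[i16]> {
    vec![0i16; RING_CAPACITY].into_boxed_slice()
}
```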

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 05:26:59 +04:00
Siavash Sameni
2263e898e5 fix: port AudioRing reader-detects-lap fix to desktop client
Same fix as Android (4af7c5f): writer never touches read_pos,
reader self-corrects when lapped. Power-of-2 capacity (16384),
bitmask indexing, overflow/underrun counters.
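
A simplified sketch of the scheme, not the actual AudioRing: the struct
layout, method names and the use of `AtomicI16` slots (chosen to avoid
unsafe code) are assumptions. It shows the three ideas named above:
positions that only ever grow, `pos & MASK` indexing, and a reader that
jumps forward when it detects it has been lapped.

```rust
use std::sync::atomic::{AtomicI16, AtomicUsize, Ordering};

const CAPACITY: usize = 16384; // power of two, as in the commit message
const MASK: usize = CAPACITY - 1;

struct AudioRing {
    buf: Vec<AtomicI16>,
    write_pos: AtomicUsize, // stored only by the writer
    read_pos: AtomicUsize,  // stored only by the reader
    overflows: AtomicUsize,
    underruns: AtomicUsize,
}

impl AudioRing {
    fn new() -> Self {
        Self {
            buf: (0..CAPACITY).map(|_| AtomicI16::new(0)).collect(),
            write_pos: AtomicUsize::new(0),
            read_pos: AtomicUsize::new(0),
            overflows: AtomicUsize::new(0),
            underruns: AtomicUsize::new(0),
        }
    }

    /// Writer side: always writes and never touches read_pos.
    fn write(&self, samples: &[i16]) {
        let mut w = self.write_pos.load(Ordering::Relaxed);
        for &s in samples {
            self.buf[w & MASK].store(s, Ordering::Relaxed);
            w = w.wrapping_add(1);
        }
        self.write_pos.store(w, Ordering::Release);
    }

    /// Reader side: copies up to out.len() samples, returns the count.
    fn read(&self, out: &mut [i16]) -> usize {
        let w = self.write_pos.load(Ordering::Acquire);
        let mut r = self.read_pos.load(Ordering::Relaxed);
        let mut available = w.wrapping_sub(r);
        if available > CAPACITY {
            // Lapped by the writer: skip to the oldest sample still intact.
            r = w.wrapping_sub(CAPACITY);
            available = CAPACITY;
            self.overflows.fetch_add(1, Ordering::Relaxed);
        }
        let n = out.len().min(available);
        if n < out.len() {
            self.underruns.fetch_add(1, Ordering::Relaxed);
        }
        for slot in out[..n].iter_mut() {
            *slot = self.buf[r & MASK].load(Ordering::Relaxed);
            r = r.wrapping_add(1);
        }
        self.read_pos.store(r, Ordering::Release);
        n
    }
}
```

The sketch does not guard against the writer overwriting slots while a
read is in progress; a production ring would need to re-check write_pos
after copying or accept the occasional torn frame.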

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 13:42:33 +04:00
Siavash Sameni
9ab57ba037 merge: fj/feat/android-voip-client — congestion fix, AEC toggle, debug logging
Merged 10 commits from Android branch:
- Send task crash fix on QUIC congestion (continue instead of break)
- AEC toggle + NoiseSuppressor on Android
- Debug reporter for crash diagnostics
- Mic mute crackling fix
- Participant dedup in UI
- Proper QUIC connection close on hangup
- Null alias display fix
- Tracing → Android logcat
- Incident reports for send-task crash and playout ring desync

Conflict resolved in room.rs: kept Android's improved debug logging
(recv gap tracking, lock contention, forward latency, send errors)
inside our media_task async block for parallel signal handling.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 13:13:43 +04:00
Siavash Sameni
80d5bd7628 fix: survive QUIC congestion — drop packets instead of killing send task
send_datagram() returns Err(Blocked) when the QUIC congestion window
is full. This is transient — the window reopens once ACKs arrive.
Previously, all send paths treated this as fatal (break/return),
which killed the send task and cascaded via tokio::select! to kill
the entire call.

Now: log warning, drop the packet, continue. Brief audio glitch
(20-100ms) instead of complete call death. FEC on the receiver
side recovers most dropped packets.
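
The control-flow change can be sketched as follows. `SendError` and
`send_all` are hypothetical; the real send tasks call the QUIC
connection's send_datagram() inside an async loop, which is replaced here
by a plain closure so the example stays self-contained.

```rust
/// Hypothetical error type standing in for the transport's send error.
#[derive(Debug)]
enum SendError {
    Blocked,        // congestion window full, transient
    ConnectionLost, // fatal
}

/// Sends every packet through `send`, dropping packets on transient
/// congestion instead of aborting. Returns (sent, dropped).
fn send_all<F>(packets: &[Vec<u8>], mut send: F) -> (usize, usize)
where
    F: FnMut(&[u8]) -> Result<(), SendError>,
{
    let mut sent = 0;
    let mut dropped = 0;
    for packet in packets {
        match send(packet) {
            Ok(()) => sent += 1,
            // Previously this arm was a `break`, which ended the call.
            Err(SendError::Blocked) => {
                dropped += 1;
                continue;
            }
            // Only a dead connection ends the loop.
            Err(SendError::ConnectionLost) => break,
        }
    }
    (sent, dropped)
}
```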

Fixed in:
- CLI run_live send task (continue + error counter)
- CLI run_file_mode send paths (2 locations)
- Desktop engine send task

Also hardened recv tasks: they now survive transient errors (anything
other than closed/reset) instead of exiting.

Matches the fix applied to Android client (engine.rs).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 11:48:20 +04:00
Siavash Sameni
e468454464 feat: Tauri desktop GUI app with call engine
- New desktop/ directory with Tauri v2 + Vite + TypeScript
- Rust backend: CallEngine wrapping wzp-client audio + transport
- Web frontend: connect screen, in-call screen with participants,
  mic/speaker mute, keyboard shortcuts (m/s/q)
- Dark theme UI, settings persistence via localStorage
- Platform-aware --os-aec: warns on Windows/Linux (not yet implemented)
- Workspace updated to include desktop/src-tauri

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 11:25:54 +04:00
Siavash Sameni
d1c96cd71f feat: macOS VoiceProcessingIO for hardware AEC + delay-compensated NLMS
- Add --os-aec flag: uses Apple VoiceProcessingIO audio unit for
  hardware echo cancellation (same engine as FaceTime)
- New vpio feature + audio_vpio.rs: combined capture+playback via VPIO
- Improved software AEC: delay-compensated leaky NLMS with Geigel DTD
  (60ms tail, 40ms delay, configurable via --aec-delay)
- Add --aec-delay flag for tuning software AEC delay compensation
- Add dev-fast Cargo profile (opt-level 2 with incremental compilation)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 11:10:10 +04:00
Siavash Sameni
1b00b5e2a4 feat: improved AEC, keyboard shortcuts, dedup participants, dev-fast profile
AEC improvements:
- Reduce echo tail from 100ms to 30ms (3.3x faster, suited for laptops)
- Add double-talk detection: freeze adaptation when near-end speaks
- Add residual echo suppression
- Disable AEC by default in --android mode (macOS has built-in AEC)

CLI features:
- Keyboard shortcuts: m=mic mute, s=speaker mute, q=quit (raw terminal mode)
- Dedup participants in RoomUpdate display (same fingerprint+alias shown once)
- Add dev-fast profile (opt-level 2 with incremental compilation)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 10:15:23 +04:00
Siavash Sameni
cfb48df1ef feat: direct playout mode, AEC far-end, audio processing switches
- Add --android/--direct-playout: bypass jitter buffer, decode on recv
  (matches Android engine architecture)
- Wire AEC far-end reference from decoded playout to encoder
- Add --no-aec, --no-agc, --no-fec, --no-silence, --no-denoise switches
- Fix BufferSize::Fixed(960) → Default for macOS CoreAudio compat
- Optimize wzp-codec, wzp-fec, audiopus, nnnoiseless in debug profile
- Add capture callback size diagnostic logging

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 09:48:34 +04:00
Siavash Sameni
ba29d8354f fix: send alias via CallOffer handshake (match Android approach)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 09:10:07 +04:00
Siavash Sameni
0908507a7a Merge remote-tracking branch 'origin/feat/android-voip-client' into feat/desktop-audio-rewrite
2026-04-06 09:04:55 +04:00
Siavash Sameni
860c90394d feat: rewrite desktop audio I/O with lock-free ring buffers
- Replace Mutex-based CPAL callbacks with atomic SPSC ring buffers
- Proper async send/recv loops (no block_on), 20ms playout tick
- Add signal task for RoomUpdate presence display
- Add --alias, --raw-room flags and key persistence (~/.wzp/identity)
- Add SetAlias signal variant and relay-side handling
- Graceful Ctrl+C shutdown with force-quit on second press

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 09:04:51 +04:00
Claude
7eb136fcb3 fix: settings save button (back=discard), fix missing alias in featherchat tests
- Settings now uses draft state — changes only persist on explicit Save
- Back button discards unsaved changes
- Added applyServers() for batch server updates
- Added missing alias field to CallOffer in featherchat tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 04:30:23 +00:00
Claude
0835c36d0f feat: settings page with persistence, client alias in handshake, fix null fingerprints
- Add SettingsScreen with identity (alias, key backup/restore), audio defaults,
  server management, network prefs, and default room
- SettingsRepository persists all settings via SharedPreferences
- Auto-generate random display names on first launch (e.g. "Swift Wolf")
- Thread alias through CallOffer → relay handshake → RoomUpdate broadcast
- Derive caller fingerprint from identity key in relay handshake (fixes null
  fingerprints when --auth-url is not set)
- Persist identity seed for stable fingerprints across reconnects
- Add alias field to SignalMessage::CallOffer (serde default for backward compat)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 03:56:33 +00:00
Claude
8bf073aa80 fix: handle RoomUpdate variant in wzp-client signal type mapping
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 01:54:36 +00:00
Claude
e7b1c3372a feat: Android VoIP client — Phase 2 (JNI bridge, Compose UI, AEC pipeline wiring)
- JNI bridge with 8 extern functions (init, startCall, stopCall, setMute,
  setSpeaker, getStats, forceProfile, destroy) with panic catching
- Kotlin engine layer: WzpEngine JNI wrapper, WzpCallback interface,
  CallStats data class with JSON deserialization
- Jetpack Compose UI: InCallScreen with quality indicator (green/yellow/red),
  mute/speaker/hangup buttons, stats overlay, duration timer
- CallActivity with RECORD_AUDIO permission handling, Material3 theme
- CallService foreground service with WakeLock, WiFi lock, notification
- AudioRouteManager for speaker/earpiece/Bluetooth SCO switching
- AEC wired into CallEncoder pipeline: AEC → AGC → denoise → silence → encode
- AEC far-end reference fed from decode path to encode path in pipeline
- Engine exposes set_aec_enabled/set_agc_enabled via AtomicBool flags

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 18:16:38 +00:00
Claude
26e9c55f1f feat: Android VoIP client — Phase 1 (audio quality, network adaptation, crate skeleton)
- New wzp-android crate with Oboe C++ backend, lock-free SPSC ring buffers,
  engine orchestrator, codec pipeline, and Android Gradle project structure
- AEC (NLMS adaptive filter), AGC (two-stage with fast attack/slow release),
  windowed-sinc FIR resampler replacing linear interpolation (wzp-codec)
- Opus encoder tuning: complexity 7 default, set_expected_loss support
- Mobile jitter buffer: asymmetric EMA (fast up/slow down), handoff spike
  detection with 2s cooldown, configurable safety margin
- Network-aware quality control: cellular-specific thresholds, faster
  downgrade on cellular, proactive tier drop on WiFi→cellular handoff,
  FEC ratio boost during network transitions
- Handoff detection in PathMonitor via RTT jitter spike analysis
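
The asymmetric EMA used by the mobile jitter buffer can be illustrated
with a small sketch. `JitterEstimator` and its field names are
hypothetical, and the smoothing factors in the usage note are example
values, not the ones used in wzp.

```rust
/// Hypothetical jitter estimator with separate smoothing factors for
/// rising and falling samples (fast up, slow down).
struct JitterEstimator {
    estimate_ms: f64,
    alpha_up: f64,   // weight applied when the new sample is above the estimate
    alpha_down: f64, // weight applied when the new sample is at or below it
}

impl JitterEstimator {
    /// Folds one jitter sample into the estimate and returns the new value.
    fn update(&mut self, sample_ms: f64) -> f64 {
        let alpha = if sample_ms > self.estimate_ms {
            self.alpha_up
        } else {
            self.alpha_down
        };
        self.estimate_ms += alpha * (sample_ms - self.estimate_ms);
        self.estimate_ms
    }
}
```

With alpha_up = 0.5 and alpha_down = 0.05, an estimate of 20 ms jumps to
60 ms after a single 100 ms sample, then decays only to 58 ms after the
next 20 ms sample, so the buffer grows quickly on spikes and shrinks
cautiously.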

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 18:07:55 +00:00
Siavash Sameni
e595fe6591 feat: P3-T6 per-session forwarding — relay links for hop-by-hop media
RelayLink: QUIC connection to peer relay (SNI "_relay") for forwarding
specific sessions. Methods: connect, forward, add/remove_session, is_idle.

RelayLinkManager: manages connections to multiple peers.
- get_or_connect: lazy connection establishment
- forward_to: send media packet to specific peer
- register/unregister_session: track which sessions use which links
- Auto-closes idle links on session unregister

Protocol: added SignalMessage::SessionForward { session_id,
target_fingerprint, source_relay } and SessionForwardAck { session_id,
room_name } for relay-link session setup signaling.

Building block for P3-T7 (call setup over mesh) which wires
route resolution + relay links + handshake into a complete flow.

62 relay tests + 42 proto tests passing (7 new relay_link tests).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 18:45:36 +04:00
Siavash Sameni
326aa491cc feat: P3-T5 route resolution — find relay path to any fingerprint
RouteResolver queries PresenceRegistry to determine how to reach a target:
- Route::Local — connected to this relay
- Route::DirectPeer(addr) — on a directly connected peer relay
- Route::Chain(addrs) — multi-hop (structure ready, single-hop for now)
- Route::NotFound — not in any known relay
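
A sketch of the single-hop resolution described above. The Route variants
are the ones listed in this message; the `Presence` enum, the map-based
lookup and the `resolve` function are assumptions standing in for
RouteResolver and PresenceRegistry.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;

/// Route variants as listed in the commit message.
#[derive(Debug, PartialEq)]
enum Route {
    Local,
    DirectPeer(SocketAddr),
    Chain(Vec<SocketAddr>), // multi-hop; not produced by this sketch
    NotFound,
}

/// Hypothetical view of where a fingerprint was last seen.
enum Presence {
    Local,
    Remote(SocketAddr),
}

/// Hypothetical single-hop resolver over a fingerprint -> presence map.
fn resolve(presence: &HashMap<String, Presence>, fingerprint: &str) -> Route {
    match presence.get(fingerprint) {
        Some(Presence::Local) => Route::Local,
        Some(Presence::Remote(addr)) => Route::DirectPeer(*addr),
        None => Route::NotFound,
    }
}
```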

Protocol: added SignalMessage::RouteQuery { fingerprint, ttl } and
RouteResponse { fingerprint, found, relay_chain } for peer-to-peer
route queries over probe connections.

HTTP API: GET /route/:fingerprint returns JSON with route type + chain.

Relay handles incoming RouteQuery on probe connections: looks up locally,
replies with RouteResponse. TTL decremented for future multi-hop forwarding.

55 relay tests + 42 proto tests passing (7 new route tests).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 18:38:24 +04:00
Siavash Sameni
464e95a4bd feat: P3-T4 relay presence registry — gossip fingerprints across relay mesh
PresenceRegistry tracks who is connected where:
- register_local/unregister_local for directly connected users
- update_peer for fingerprints reported by peer relays
- lookup returns Local or Remote(addr)
- expire_stale removes entries older than timeout
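
The registry operations above can be sketched as follows. Method names
follow this message; the field layout, the `Option` return from lookup
and the explicit `now` parameters (added so expiry is deterministic) are
assumptions.

```rust
use std::collections::{HashMap, HashSet};
use std::net::SocketAddr;
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq)]
enum Location {
    Local,
    Remote(SocketAddr),
}

/// Hypothetical registry layout: local fingerprints plus a table of
/// fingerprints reported by peers, each with a last-seen timestamp.
#[derive(Default)]
struct PresenceRegistry {
    local: HashSet<String>,
    remote: HashMap<String, (SocketAddr, Instant)>,
}

impl PresenceRegistry {
    fn register_local(&mut self, fp: &str) {
        self.local.insert(fp.to_string());
    }

    fn unregister_local(&mut self, fp: &str) {
        self.local.remove(fp);
    }

    /// Records fingerprints reported by the peer relay at `relay`.
    fn update_peer(&mut self, relay: SocketAddr, fps: &[&str], now: Instant) {
        for fp in fps {
            self.remote.insert(fp.to_string(), (relay, now));
        }
    }

    /// Local entries win over remote ones.
    fn lookup(&self, fp: &str) -> Option<Location> {
        if self.local.contains(fp) {
            return Some(Location::Local);
        }
        self.remote.get(fp).map(|(addr, _)| Location::Remote(*addr))
    }

    /// Drops remote entries not refreshed within `timeout`.
    fn expire_stale(&mut self, timeout: Duration, now: Instant) {
        self.remote
            .retain(|_, (_, seen)| now.duration_since(*seen) <= timeout);
    }
}
```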

Gossip via probe connections:
- New SignalMessage::PresenceUpdate { fingerprints, relay_addr }
- Probes send local fingerprints every 10s alongside Ping/Pong
- Receiving relay updates its remote presence table

HTTP API on metrics port:
- GET /presence — all known fingerprints + locations
- GET /presence/:fingerprint — single lookup
- GET /peers — peer relays + their connected users

Wired into relay main:
- Registry created at startup
- register_local after auth+handshake
- unregister_local on disconnect
- Passed to probe mesh and metrics server

Also marks FC-10 as DONE in integration tracker.

48 relay tests + 42 proto tests passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 17:36:55 +04:00
Siavash Sameni
6310864b0b fix: client sends Hangup before disconnect, relay handles timeouts gracefully
Client: sends SignalMessage::Hangup(Normal) before closing in all modes
(send-tone, file mode, silence mode) so the relay knows the session ended.

Relay: downgrades "timed out" / "reset" / "closed" recv errors from
ERROR to INFO since these are normal disconnect scenarios.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 15:15:47 +04:00
Siavash Sameni
4d2c9838c5 fix: eliminate all compiler warnings across client, relay, web
- Remove unused imports in featherchat.rs (tracing, QualityProfile)
- Remove unused comfort_noise field from CallEncoder (cn_level is used instead)
- Prefix unused _metrics_file in CliArgs
- Prefix unused _addr in Participant
- Remove unused RoomSlot struct and rooms field from web AppState
- Remove unused HashMap import from web main

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 15:13:48 +04:00