Commit Graph

248 Commits

Author SHA1 Message Date
Siavash Sameni
6f81487778 T1.6: Protocol version negotiation in handshake 2026-05-11 15:53:04 +04:00
Siavash Sameni
30d26fc7f6 T1.5.1: Remove unwrap() from encode_compact 2026-05-11 12:57:35 +04:00
Siavash Sameni
c93d302656 T1.5: Migrate emit/parse sites to v2 wire format 2026-05-11 12:37:32 +04:00
Siavash Sameni
9680b6ff34 T1.4.1: Add rustdoc on MiniHeaderV2 and MiniFrameContextV2 public items 2026-05-11 11:38:04 +04:00
Siavash Sameni
6385b93391 T1.2.1: Add rustdoc on MediaType variants and methods 2026-05-11 11:33:58 +04:00
Siavash Sameni
6eb94f079d T1.1.1: Address review — add rustdoc on impl MediaHeaderV2 constants and methods 2026-05-11 11:32:00 +04:00
Siavash Sameni
7c9ede9227 T1.1.1: Add rustdoc on MediaHeaderV2 fields 2026-05-11 11:22:21 +04:00
Siavash Sameni
e8866c6632 T1.4: Add v2 MiniHeader with seq_delta 2026-05-11 11:18:15 +04:00
Siavash Sameni
8c6e88ea68 T1.3: Widen CodecId wire representation to u8 2026-05-11 11:11:42 +04:00
Siavash Sameni
ffb92237be T1.2: Add MediaType enum 2026-05-11 11:09:43 +04:00
Siavash Sameni
6af0539a72 T1.1: Add v2 MediaHeader type 2026-05-11 11:00:51 +04:00
Siavash Sameni
defd8eab07 fix(signal): send PresenceList directly to new client after ack
Some checks failed
Mirror to GitHub / mirror (push) Failing after 24s
Build Release Binaries / build-amd64 (push) Failing after 3m50s
The broadcast alone wasn't reaching the first client because its
recv loop hadn't started yet when the second client registered.
Now the relay sends PresenceList directly to the new client (right
after RegisterPresenceAck) AND broadcasts to all others.

This guarantees every client gets the full user list:
- New client: via direct send (queued before recv loop starts)
- Existing clients: via broadcast

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 18:20:37 +04:00
Siavash Sameni
1120c7b579 feat(signal): PresenceList broadcast for lobby user discovery
Some checks failed
Build Release Binaries / build-amd64 (push) Failing after 7m21s
Mirror to GitHub / mirror (push) Failing after 27s
New signal infrastructure for the lobby-first UI:

- PresenceUser struct: { fingerprint, alias }
- SignalMessage::PresenceList: relay broadcasts full user list
  to all signal clients on every register/deregister
- SignalHub::presence_list(): builds the list from connected clients
- SignalHub::broadcast(): sends to ALL signal clients
- Relay calls broadcast on register + unregister
- Desktop emits "presence_list" signal-event to JS frontend

This gives clients real-time visibility of who's online via the
signal channel, without needing to join a voice room first.

603 tests pass, 0 regressions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 18:12:47 +04:00
Siavash Sameni
bb23976076 feat(quality): upgrade negotiation + asymmetric quality signals (#28, #29, #30)
Some checks failed
Mirror to GitHub / mirror (push) Failing after 31s
Build Release Binaries / build-amd64 (push) Failing after 3m33s
New SignalMessage variants for P2P quality coordination:

UpgradeProposal/UpgradeResponse/UpgradeConfirm (#28):
- Consensual quality upgrade flow — proposer sends desired profile,
  peer accepts/rejects based on own conditions, confirm commits both
- All carry call_id for relay routing

QualityCapability (#30):
- Peer reports its max sustainable profile — enables asymmetric
  encoding where each side uses its own best quality instead of
  forcing everyone to the weakest link

Relay forwards all 4 signals to the call peer (same pattern as
MediaPathReport, CandidateUpdate, HardNatProbe).

Desktop signal recv loop handles all 4 with debug logging.
Encoder switching TODOs noted for wiring into CallEngine.

4 new serde roundtrip tests. 603 total, 0 regressions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 17:25:34 +04:00
Siavash Sameni
18e5e75f33 feat(analyzer): encrypted payload decoding in replay mode (#17)
Some checks failed
Mirror to GitHub / mirror (push) Failing after 20s
Build Release Binaries / build-amd64 (push) Failing after 3m33s
When --key <64-char-hex> is provided with --replay, the analyzer
decrypts each packet's ChaCha20-Poly1305 payload using the session
key and logs plaintext frame sizes. Prints first 5 + every 100th
decrypt result, and a summary at the end.

This completes all 5 protocol analyzer tasks (#13-17):
- #13: Observer mode (live passive listener) — was done
- #14: TUI with Ratatui (per-participant panels) — was done
- #15: Capture and replay (.wzp format) — was done
- #16: HTML report (Chart.js loss/jitter graphs) — was done
- #17: Encrypted decode (--key for replay) — done now

Usage:
  wzp-analyzer --replay session.wzp --key <64-hex-chars> --html report.html

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 17:07:43 +04:00
Siavash Sameni
f06f9073ae feat(nat): birthday attack module + HardNatBirthdayStart signal (#86, #87)
Some checks failed
Mirror to GitHub / mirror (push) Failing after 25s
Build Release Binaries / build-amd64 (push) Failing after 3m43s
Birthday attack for random symmetric NATs:
- birthday.rs: open_acceptor_ports() opens N sockets, STUN-probes
  each to learn external ports. generate_dialer_targets() builds
  hit list (known ports first, then random fill). spray_dialer()
  sprays QUIC connects with rate limiting, first success wins.
- Default: 32 acceptor ports, 128 dialer probes, 20ms interval

Signal coordination:
- HardNatBirthdayStart { acceptor_ports, external_ip } sent by
  Acceptor when peer's HardNatProbe shows random/sequential NAT
- Relay forwards it like other call signals
- Desktop recv loop handles and logs it

Hybrid waterfall integration:
- On receiving HardNatProbe with non-cone allocation, Acceptor
  auto-opens birthday ports and sends BirthdayStart
- Sockets kept alive 10s for NAT mapping persistence
- Dialer spray integration into race() pending (needs transport
  hot-swap for background upgrade)

6 new tests, 599 total, 0 regressions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 16:44:36 +04:00
Siavash Sameni
1de280fe04 fix(nat): working NAT tickle + smart filter debug + timeout diags
Some checks failed
Mirror to GitHub / mirror (push) Failing after 27s
Build Release Binaries / build-amd64 (push) Failing after 3m39s
Fixes from real-world 5G↔Starlink testing:

NAT tickle fix:
- tokio::net::UdpSocket::bind() doesn't set SO_REUSEADDR, so binding
  to the same port as quinn silently failed. Now uses socket2::Socket
  with explicit SO_REUSEADDR + SO_REUSEPORT (via libc on unix).
- Tickle now logs success/failure for debugging.

Diagnostic fixes:
- connect:dual_path_race_start shows both dial_order_raw and
  dial_order_smart so we can see what filtering removed
- Grace-period timeout (relay wins first, direct still running)
  now fills "timeout:grace" diags for unrecorded candidates
- Previously candidate_diags was empty when relay won the race

Dependencies:
- Added socket2 = "0.5" to wzp-client

593 tests pass, 0 regressions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 15:58:13 +04:00
Siavash Sameni
bc6d327ebb feat(nat): smart candidate filtering + acceptor NAT tickle + 4s timeout
Some checks failed
Mirror to GitHub / mirror (push) Failing after 24s
Build Release Binaries / build-amd64 (push) Failing after 3m33s
Major P2P improvements for cross-network calls:

Smart candidate filtering (smart_dial_order):
- Strip LAN candidates when peer's public IP differs from ours
  (172.16.x.x is unreachable from a different network)
- Strip all IPv6 candidates (Phase 7 disabled, wastes dial slots)
- Only keep mapped + reflexive for cross-network calls
- LAN candidates preserved when both peers share the same public IP

Acceptor NAT tickle:
- A-role sends a 1-byte UDP packet to each peer candidate BEFORE
  accepting. This opens the NAT pinhole for return traffic from
  the Dialer's IP — critical for address-restricted NATs that only
  allow inbound from IPs they've seen outbound traffic to.
- Uses SO_REUSEADDR on the same port as the quinn endpoint.

Direct timeout increased from 2s to 4s:
- Cross-network QUIC handshakes through CGNAT can take 2-3s
- 2s was too aggressive for 5G/LTE networks

Diagnostic fix:
- Record "timeout:4s" for candidates still in-flight when the
  timeout fires (previously these had no diagnostic entry)

5 new tests for smart_dial_order edge cases.
593 tests pass, 0 regressions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 15:42:02 +04:00
Siavash Sameni
c0dd6c06ff feat(debug): per-candidate dial diagnostics in dual-path race
Some checks failed
Mirror to GitHub / mirror (push) Failing after 28s
Build Release Binaries / build-amd64 (push) Failing after 3m24s
Added CandidateDiag struct to RaceResult with per-candidate:
- address attempted
- result (ok / skipped:ipv6 / error:reason)
- elapsed time in ms

Surfaced in call-debug events:
- connect:dual_path_race_start now includes dial_order + peer_mapped
- connect:dual_path_race_done now includes candidate_diags array

Upgraded dual_path tracing from debug to info for IPv6 skips and
dial failures so they appear in logcat/console.

Helps diagnose why P2P fails on specific networks (5G CGNAT,
address-restricted NATs, etc).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 12:16:34 +04:00
Siavash Sameni
ec1bdf3cd5 feat(nat): hard NAT port allocation detection + prediction + HardNatProbe signal (#29)
Some checks failed
Mirror to GitHub / mirror (push) Failing after 31s
Build Release Binaries / build-amd64 (push) Failing after 3m30s
Phase A of hard NAT traversal (PRD-hard-nat.md):

- PortAllocation enum: PortPreserving / Sequential{delta} / Random / Unknown
- detect_port_allocation(): sequential STUN probes from single socket,
  analyzes port sequence for allocation pattern
- classify_port_allocation(): pure function with jitter tolerance,
  wraparound handling, 60% threshold for noisy sequences
- predict_ports(): generates target port range from last_port + delta
- HardNatProbe signal message: carries port_sequence, allocation
  pattern, external_ip for peer coordination
- Relay forwards HardNatProbe to call peer
- Netcheck gains port_allocation field + format_report display

588 tests pass (17 new), 0 regressions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 11:29:35 +04:00
Siavash Sameni
8fcf1be341 feat(nat): Tailscale-inspired STUN/ICE + port mapping + mid-call re-gathering (#28)
Some checks failed
Mirror to GitHub / mirror (push) Failing after 23s
Build Release Binaries / build-amd64 (push) Failing after 6m8s
Phase 8: 5 new modules bringing NAT traversal close to Tailscale's approach.

- stun.rs: RFC 5389 STUN client — public server reflexive discovery,
  XOR-MAPPED-ADDRESS parsing, parallel probe with retry, STUN fallback
  in desktop try_reflect_own_addr()
- portmap.rs: NAT-PMP (RFC 6886) + PCP (RFC 6887) + UPnP IGD port
  mapping — gateway discovery, acquire/release/refresh lifecycle,
  new PeerCandidates.mapped candidate type in dial order
- ice_agent.rs: candidate lifecycle — gather(), re_gather(),
  apply_peer_update() with monotonic generation counter,
  CandidateUpdate signal message forwarded by relay
- netcheck.rs: comprehensive diagnostic — NAT type, IPv4/v6,
  port mapping availability, relay latencies, CLI --netcheck
- relay_map.rs: RTT-sorted relay map, preferred() selection,
  populate_from_ack() for RegisterPresenceAck.available_relays

Relay: CallRegistry stores + cross-wires caller/callee_mapped_addr
into CallSetup.peer_mapped_addr. Region config + available_relays
populated from federation peers in RegisterPresenceAck.

Desktop: place_call/answer_call call acquire_port_mapping() and
fill caller/callee_mapped_addr. STUN+relay combined NAT detection.

571 tests pass (66 new), 0 regressions, 0 warnings.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 10:17:17 +04:00
Siavash Sameni
9377a9009c feat(quality): bandwidth probing for upward adaptive quality (#10)
Some checks failed
Mirror to GitHub / mirror (push) Failing after 25s
Build Release Binaries / build-amd64 (push) Failing after 3m36s
After 30s stable at a tier, the AdaptiveQualityController actively
probes the next tier up by switching the encoder and observing for 5s.
If loss/RTT stay within the target tier's thresholds, the upgrade
commits. If >1 bad report, the probe aborts with a 60s cooldown.

Probing is disabled on cellular (studio tiers aren't classified there)
and skipped when already at Studio64k (highest tier).

This complements the passive upgrade path (10 consecutive good reports)
by actively discovering that a path can sustain higher quality, rather
than waiting for the classification to drift upward.

New: ProbeState struct, check_probe() method, 4 constants, 5 tests.
377 tests passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 16:47:21 +04:00
Siavash Sameni
425c67a08a feat(analyzer): replay, HTML report, encrypted decode stub (#15, #16, #17)
Some checks failed
Mirror to GitHub / mirror (push) Failing after 26s
Build Release Binaries / build-amd64 (push) Failing after 3m31s
#15 - Replay mode: --replay <file.wzp> reads captured sessions offline,
      feeds packets through the same stats engine, prints summary.
      CaptureReader mirrors CaptureWriter's binary format.

#16 - HTML report: --html <report.html> generates self-contained HTML
      with Chart.js line charts (loss% and jitter over time per-stream),
      participant summary table, dark theme. Works with live sessions
      (after exit) or replay mode.

#17 - Encrypted decode: --key <hex> flag accepted and stored. Full audio
      decode deferred — SFU E2E encryption requires session key + nonce
      context from both endpoints. Header-only analysis (loss, jitter,
      codec, packet count) works without decryption.

Usage:
  wzp-analyzer --replay session.wzp --html report.html
  wzp-analyzer relay:4433 --room test --capture out.wzp --html report.html

372 tests passing, 0 regressions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 16:31:28 +04:00
Siavash Sameni
88ca3e099a feat: wzp-analyzer binary — protocol analyzer with TUI (#13, #14, #15)
Some checks failed
Mirror to GitHub / mirror (push) Failing after 28s
Build Release Binaries / build-amd64 (push) Failing after 3m20s
New binary: wzp-analyzer joins a room as a passive observer and displays
real-time per-participant quality metrics.

Features:
- Passive observation: connects to relay, receives all media, never sends
- Participant detection: identifies senders by sequence number streams
- Per-participant stats: packets, loss%, jitter, codec, codec switches
- TUI mode (ratatui): color-coded table (green/yellow/red by loss),
  10 FPS refresh, session header, quit with q/Ctrl+C
- No-TUI mode: prints stats to stderr every 2s (for headless/CI use)
- Capture mode: binary .wzp format with microsecond timestamps for
  offline replay (magic WZP\x01, JSON header, per-packet records)
- Session summary on exit

Usage:
  wzp-analyzer 193.180.213.68:4433 --room general
  wzp-analyzer 193.180.213.68:4433 --room general --no-tui --duration 60
  wzp-analyzer 193.180.213.68:4433 --room general --capture session.wzp

372 tests passing, 0 regressions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 16:26:46 +04:00
Siavash Sameni
1e82811cc1 feat(p2p): adaptive quality on direct calls (#23)
Some checks failed
Mirror to GitHub / mirror (push) Failing after 27s
Build Release Binaries / build-amd64 (push) Failing after 3m37s
P2P calls now adapt codec quality based on observed network conditions,
matching what relay calls already had.

Three-layer implementation:
- QualityReport::from_path_stats(): construct reports from local quinn
  stats (loss%, RTT, jitter) without needing relay-generated reports
- CallEncoder.pending_quality_report: one-shot attachment to next
  source packet (consumed on encode, not repeated)
- Engine send tasks: generate quality report every 50 frames (~1s)
  from quinn_path_stats() and attach via set_pending_quality_report()
- Engine recv tasks: self-observe from own QUIC path stats every 50
  packets, feed to AdaptiveQualityController for P2P adaptation
  (works even if peer isn't sending quality reports yet)

Both relay and P2P calls now have adaptive quality. On relay calls,
both peer-sent reports AND local observations feed the controller.
Hysteresis (3 consecutive bad reports to downgrade) prevents thrashing.

372 tests passing (+4 new: from_path_stats encoding, clamping, zero
values, encoder quality report attachment).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 16:14:06 +04:00
Siavash Sameni
81b5522942 refactor: clap CLI parser, safety docs, dead code docs, cross-refs
Some checks failed
Mirror to GitHub / mirror (push) Failing after 26s
Build Release Binaries / build-amd64 (push) Failing after 4m1s
Audit items 6, 8, 9, 10:

#6 - Relay CLI: replaced 154-line manual parse_args() with clap derive
     (13 flags/options preserved, auto --help, --version from build hash)
#8 - wzp-native: added # Safety docs to all 3 unsafe extern "C" fns
#9 - wzp-crypto: documented x25519_static_secret/public as reserved for
     future static-key federation auth (not dead code, intentionally unused)
#10 - Cross-references between quality.rs ↔ dred_tuner.rs module docs

368 tests passing, 0 regressions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 15:40:49 +04:00
Siavash Sameni
d539a6dfb9 test(federation): 29 tests for federation.rs (was 0), engine dedup PRD
Some checks failed
Mirror to GitHub / mirror (push) Failing after 27s
Build Release Binaries / build-amd64 (push) Failing after 3m45s
Federation test coverage (crates/wzp-relay/tests/federation.rs):
- room_hash: determinism, uniqueness, length, case sensitivity (5)
- is_global_room: static config, call-* implicit, exact match (3)
- resolve_global_room: static + call-* resolution (2)
- global_room_hash: canonical names, fallthrough, independence (4)
- forward_to_peers: zero peers, live QUIC datagram delivery (2)
- broadcast_signal: zero peers, live QUIC signal delivery (2)
- send_signal_to_peer: unknown fingerprint error (1)
- peer lookup: fingerprint normalization, IP, trust priority (5)
- accessors: local_tls_fp, cross_relay_tx, remote_participants (3)
- integration: full media egress over live QUIC link (1)
- edge case: exact room match (1)

Total relay tests: 120 (was 91). Full suite: 368 passing.

Also added PRD-engine-dedup.md for the engine.rs helper extraction
completed in the previous commit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 15:35:04 +04:00
Siavash Sameni
ba12aae439 refactor: extract shared engine helpers, federation clone-before-send, constants
Some checks failed
Mirror to GitHub / mirror (push) Failing after 30s
Build Release Binaries / build-amd64 (push) Failing after 3m48s
Engine deduplication (PRD-engine-dedup.md):
- build_call_config(): shared CallConfig construction (was 23 lines × 2)
- codec_to_profile(): shared CodecId → QualityProfile mapping (was 19 lines × 2)
- run_signal_task(): shared signal handler (was 48 lines × 2)
- Net -39 lines from engine.rs, 6 duplicated blocks → single-line calls

Quick wins from REFACTOR-codebase-audit.md:
- 6 magic number constants extracted (CAPTURE_POLL_MS, RECV_TIMEOUT_MS, etc.)
- DRED_POLL_INTERVAL moved from 2 local defs to 1 module-level const
- federation.rs: forward_to_peers, broadcast_signal, send_signal_to_peer
  now clone peer list and release lock before sending (was holding Mutex
  across async I/O — last lock-during-send pattern eliminated)
- main.rs: close_transport() helper replaces 12 silent .ok() calls with
  debug-level logging

314 tests passing, 0 regressions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 15:22:44 +04:00
Siavash Sameni
a52b011fb5 feat(relay): replace global Mutex<RoomManager> with DashMap sharding
Some checks failed
Mirror to GitHub / mirror (push) Failing after 24s
Build Release Binaries / build-amd64 (push) Failing after 3m41s
Eliminates the single-lock bottleneck for media forwarding. Before:
all participants across all rooms competed for one Mutex. Now rooms
are stored in DashMap (64 internal shards with per-shard RwLocks).

Changes:
- RoomManager.rooms: HashMap → DashMap<String, Room>
- Per-room quality tracking (qualities, current_tier moved into Room)
- Arc<Mutex<RoomManager>> → Arc<RoomManager> everywhere
- 20 .lock().await sites removed across room.rs, main.rs, federation.rs, ws.rs
- federation forward_to_peers: clone peer list, release lock, then send
- ACL uses std::sync::Mutex (rarely accessed, non-async)

Concurrency improvement:
- Before: 100 rooms × 10 people = 1000 tasks → 1 Mutex
- After: distributed across 64 DashMap shards, ~15 tasks per shard avg
- Rooms are fully independent — room A never blocks room B

314 tests passing, 0 regressions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 12:17:57 +04:00
Siavash Sameni
9ae9441de4 fix(audio): check capture ring available before read (fixes Opus6k choppy)
Some checks failed
Mirror to GitHub / mirror (push) Failing after 32s
Build Release Binaries / build-amd64 (push) Failing after 3m58s
Partial reads from the capture ring consumed samples that were then
discarded when the send loop retried from buf[0]. For 20ms codecs this
was invisible (single Oboe burst fills 960 samples in one read), but
40ms codecs (Opus6k, 1920 samples) needed 2 bursts — the first partial
read consumed 960 real samples and threw them away.

Result: Opus6k produced ~11 frames/s instead of 25 (~44% of expected).

Fix: expose wzp_native_audio_capture_available() and check it before
reading, matching the desktop capture_ring.available() pattern. Partial
reads no longer occur because we only read when enough samples exist.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 11:46:15 +04:00
Siavash Sameni
d424515542 feat: 5-tier quality classification, QualityDirective handling, debug tap stats
Some checks failed
Mirror to GitHub / mirror (push) Failing after 31s
Build Release Binaries / build-amd64 (push) Failing after 3m49s
- Extend Tier enum from 3 to 6 levels: Studio64k/48k/32k + Good +
  Degraded + Catastrophic with asymmetric hysteresis (down:3, up:5,
  studio:10)
- Handle QualityDirective signals in both desktop and Android engines
  — relay-coordinated codec switching now works end-to-end
- Add periodic TAP STATS to debug tap: packets in/out, fan-out avg,
  seq gaps, codecs seen (every 5s)
- Mark task #2 done (ParticipantInfo in federation signals already
  implemented)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 10:23:48 +04:00
Siavash Sameni
ea5fc17c34 fix(relay): debug tap signal logging, dual_path test regression, PRD updates
Some checks failed
Build Release Binaries / build-amd64 (push) Failing after 3m39s
Mirror to GitHub / mirror (push) Failing after 28s
- Add log_signal() and log_event() to DebugTap for RoomUpdate,
  QualityDirective, join/leave lifecycle events (task #11)
- Fix dual_path.rs Phase 7 regression: add missing ipv6_endpoint arg
  to 3 race() call sites
- Update PRDs to reflect actual implementation status: mark adaptive
  quality, coordinated codec, P2P, network awareness, protocol analyzer
- Update PROGRESS.md with QualityDirective gap and dual_path regression

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 09:54:52 +04:00
Siavash Sameni
d249b32ee5 test+docs: add tests for QualityDirective, ParticipantQuality; update docs
- QualityDirective signal roundtrip tests (with/without reason)
- ParticipantQuality unit tests (initial tier, degradation, weakest-link)
- Updated PROGRESS.md with desktop adaptive quality, relay coordinated
  switching, Oboe state polling entries
- Updated ARCHITECTURE.md SFU fan-out rules with QualityDirective
- Updated PRD-coordinated-codec.md with implementation status
- 312 tests passing across all modified crates

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 19:56:46 +04:00
Siavash Sameni
22045bc5e6 feat: adaptive quality in desktop, relay quality directive, Oboe state polling
- Wire AdaptiveQualityController into desktop engine send/recv tasks
  (mirrors Android pattern: AtomicU8 pending_profile, auto-mode check)
- Wire same into Android engine send task (was only in recv before)
- QualityDirective SignalMessage variant for relay-initiated codec switch
- ParticipantQuality tracking in relay RoomManager (per-participant
  AdaptiveQualityController, weakest-link tier computation)
- Relay broadcasts QualityDirective to all participants when room-wide
  tier degrades (coordinated codec switching)
- Oboe stream state polling: poll getState() for up to 2s after
  requestStart() to ensure both streams reach Started before proceeding
  (fixes intermittent silent calls on cold start, Nothing Phone A059)

Tasks: #7, #25, #26, #31, #35

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 19:54:04 +04:00
Siavash Sameni
766c9df442 feat(dred): continuous DRED tuning, PMTUD, extended Opus6k window
- DredTuner: maps live network metrics (loss/RTT/jitter) to continuous
  DRED duration every ~500ms instead of discrete tier-locked values.
  Includes jitter-spike detection for pre-emptive Starlink-style boost.
- Opus6k DRED extended from 500ms to 1040ms (max libopus 1.5 supports)
- PMTUD: quinn MtuDiscoveryConfig with upper_bound=1452, 300s interval
- TrunkedForwarder respects discovered MTU (was hard-coded 1200)
- QuinnPathSnapshot exposes quinn internal stats + discovered MTU
- AudioEncoder trait: set_expected_loss() + set_dred_duration() methods
- PathMonitor: sliding-window jitter variance for spike detection
- Integrated into both Android and desktop send tasks in engine.rs
- 14 new tests (10 tuner unit + 4 encoder integration)
- Updated ARCHITECTURE.md, PROGRESS.md, PRD-dred-integration, PRD-mtu

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 19:38:37 +04:00
Siavash Sameni
a37c8b30fe fix(native): add missing bt_active field to stall detector config
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 17:25:11 +04:00
Siavash Sameni
137fe5f084 fix(bluetooth): BT SCO mode skips 48kHz + VoiceCommunication on capture
Root cause: Oboe capture at 48kHz with InputPreset::VoiceCommunication
cannot open against a BT SCO device (only supports 8/16kHz). The stream
silently falls back to builtin mic, delivering zeros.

Fix: add bt_active flag to WzpOboeConfig. When set, capture skips
setSampleRate and setInputPreset, letting the system route to BT SCO
at its native rate. Oboe's SampleRateConversionQuality::Best resamples
to 48kHz for our ring buffers. Playout uses Usage::Media in BT mode.

New API: wzp_native_audio_start_bt() for BT mode, called from
set_bluetooth_sco(on=true). Normal audio_start() restores the
standard config when switching back to earpiece/speaker.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 17:23:19 +04:00
Siavash Sameni
5dfb5b3581 fix(bluetooth): use Shared mode for Oboe + delay restart for BT route
Two fixes for BT audio silence:

1. Switch Oboe streams from Exclusive to Shared sharing mode. Exclusive
   mode bypasses Oboe's internal resampler, so opening a 48kHz stream
   against a BT SCO device (8/16kHz only) fails at the AudioPolicy
   level. Shared mode lets Oboe's resampler bridge the gap.

2. Add 500ms post-SCO delay before Oboe restart. The audio policy needs
   time to apply the bt-sco route after setCommunicationDevice returns.
   Without the delay, Oboe opens against the old device (handset).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 17:14:06 +04:00
Siavash Sameni
fd0ccf8e99 fix(bluetooth): enable Oboe sample rate conversion for BT SCO (8/16kHz)
BT SCO devices only support 8kHz or 16kHz but our Oboe streams request
48kHz. Without resampling, AudioPolicyManager rejects the input stream
("getInputProfile could not find profile for... sampling rate 48000").

Fix: add setSampleRateConversionQuality(Best) to both capture and
playout stream builders. Oboe resamples internally so our ring buffers
stay at 48kHz regardless of the hardware sample rate.

Also removes the broken setBluetoothScoOn/isBluetoothScoOn calls from
stop_bluetooth_sco — just call stopBluetoothSco() unconditionally.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 17:08:48 +04:00
Siavash Sameni
a798634b3d fix(signal): add call_id to Hangup — prevents stale hangup killing new calls
Root cause: Hangup had no call_id field. The relay forwarded hangups to
ALL active calls for a user. When user A hung up call 1 and user B
immediately placed call 2, the relay's processing of A's hangup would
also kill call 2 (race window ~1-2s).

Fix: add optional call_id to Hangup (backwards-compatible via serde
skip_serializing_if). When present, the relay only ends the named call.
Old clients send call_id=None and get the legacy broadcast behavior.

Also: clear pending_path_report in Hangup recv handler and
internal_deregister to prevent stale oneshot channels from blocking
subsequent call setups.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 16:39:21 +04:00
Siavash Sameni
4c1ad841e1 feat(android): Bluetooth audio routing + network change detection + per-arch APK builds
Bluetooth: wire existing AudioRouteManager SCO support through both app
variants. Replace binary speaker toggle with 3-way route cycling
(Earpiece → Speaker → Bluetooth). Tauri side adds JNI bridge functions
(start/stop/query SCO, device availability) and Oboe stream restart.

Network awareness: integrate Android ConnectivityManager to detect
WiFi/cellular transitions and feed them to AdaptiveQualityController
via lock-free AtomicU8 signaling. Enables proactive quality downgrade
and FEC boost on network handoffs.

Build: add --arch flag to build-tauri-android.sh supporting arm64,
armv7, or all (separate per-arch APKs for smaller tester binaries).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 16:07:41 +04:00
Siavash Sameni
29cd23fe39 fix(p2p): connection cleanup — 4 fixes for stale/dead connections
PRD 4: Disable IPv6 direct dial/accept temporarily. IPv6 QUIC
handshakes succeed but connections die immediately on datagram
send ("connection lost"). IPv4 candidates work reliably. IPv6
candidates still gathered but filtered at dial time.

PRD 1: Close losing transport after Phase 6 negotiation. The
non-selected transport now gets an explicit QUIC close frame
instead of silently dropping after 30s idle timeout. Prevents
phantom connections from polluting future accept() calls.

PRD 2: Harden accept loop with max 3 stale retries. Stale
connections are explicitly closed (conn.close) and counted.
After 3 stale connections, the accept loop aborts instead of
spinning until the race timeout.

PRD 3: Resource cleanup — close old IPv6 endpoint before
creating a new one in place_call/answer_call. Add Drop impl
to CallEngine so tasks are signalled to stop on ungraceful
shutdown.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 15:11:50 +04:00
Siavash Sameni
4d66d3769d fix(relay): set peer_relay_fp on originating relay when answer arrives
The originating relay (where the caller is) never set peer_relay_fp
because the call was created locally. When the callee's answer
arrived via federation, the cross-relay dispatcher handled it but
didn't mark the call as cross-relay. This meant the caller's
MediaPathReport was delivered via local hub.send_to() to a peer
fingerprint that isn't connected locally — silently dropped.

Fix: in the cross-relay answer dispatcher, call
reg.set_peer_relay_fp(call_id, Some(origin_relay_fp)) so the
originating relay knows to forward MediaPathReport via federation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 14:49:34 +04:00
Siavash Sameni
002df15c5e fix(cli): add .. rest pattern for RegisterPresenceAck error arm
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 14:32:57 +04:00
Siavash Sameni
1eb82d77b8 feat(relay+client): relay reports build version in Ack
Add relay_build field to RegisterPresenceAck so the client logs
which relay version it connected to. Shows in the debug log as
register_signal:ack_received {"relay_build":"f843a93"}.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 14:27:58 +04:00
Siavash Sameni
f843a934fe fix(relay): forward MediaPathReport across federation
MediaPathReport was only delivered via local signal_hub, so calls
between peers on different relays always hit peer_report_timeout
and fell back to relay — even when direct P2P worked perfectly.

Fix: check peer_relay_fp in call_registry (same pattern as
DirectCallAnswer). If the peer is on a remote relay, wrap in
FederatedSignalForward and send via federation link. Also fix
the cross-relay dispatcher to deliver to BOTH caller and callee
(not just caller), since the report can come from either side.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 14:14:30 +04:00
Siavash Sameni
1904b19d05 fix(direct): validate A-role accepted connection, skip stale ones
The Acceptor's accept() on the shared signal endpoint can dequeue
a stale QUIC connection from a previous call that the Dialer has
already dropped. This results in "connection lost" errors when
media datagrams are sent — 100% drops on both sides.

Fix: after accepting a connection, check close_reason(). If the
connection is already closed, log a warning and re-accept. Also
verify max_datagram_size() is available before returning.

Additionally: emit transport details (remote addr, max_datagram,
close_reason) in the call_engine_starting debug event so stale
connection issues are visible in the user-facing debug log.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 13:50:21 +04:00
Siavash Sameni
40955bd11c debug(media): add connection diagnostics for direct P2P drops
When direct P2P calls show 100% datagram drops, we need to know
WHY send_media() fails. This commit adds:

- Remote address + stable_id logging on A-role accept and D-role
  dial success (dual_path.rs) — tells us which candidate won
- Remote address + max_datagram_size on engine transport init —
  verifies datagrams are negotiated
- last_send_err in send heartbeat — captures the actual error
  from send_datagram() failures
- QuinnTransport::remote_address() helper

Also fixes UI badge: was looking for wrong event name
("dual_path_race_won" → "path_negotiated").

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 13:29:58 +04:00
Siavash Sameni
0b62d3e22f fix(cli): add missing build_version fields to Offer/Answer
CLI binary was missing the new caller_build_version and
callee_build_version fields, causing E0063 compile errors on
Linux relay/client builds.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 13:09:26 +04:00
Siavash Sameni
bd6733b2e5 feat(signal): advertise build version in Offer/Answer
Add caller_build_version / callee_build_version (git short hash)
to DirectCallOffer and DirectCallAnswer so peers can identify each
other's build in debug logs. Also log own build at register time.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 12:43:55 +04:00