125 Commits

Author SHA1 Message Date
Siavash Sameni
a08a37b5eb fix(video): stabilize relay streams and remote rendering
Some checks failed
Mirror to GitHub / mirror (push) Failing after 31s
Build Release Binaries / build-amd64 (push) Failing after 3m2s
2026-05-26 07:18:22 +04:00
Siavash Sameni
ca164ada5c fix(relay): forward legacy h264 room video stream
Some checks failed
Mirror to GitHub / mirror (push) Failing after 38s
Build Release Binaries / build-amd64 (push) Has been cancelled
2026-05-25 20:46:41 +04:00
Siavash Sameni
2d58bae9ba chore(relay): log video forwarding decisions in debug tap
Some checks failed
Mirror to GitHub / mirror (push) Failing after 27s
Build Release Binaries / build-amd64 (push) Failing after 3m41s
2026-05-25 20:42:24 +04:00
Siavash Sameni
d57ebe3d2c fix(video): force h264 and trace frame pipeline
Some checks failed
Build Release Binaries / build-amd64 (push) Failing after 3m32s
Mirror to GitHub / mirror (push) Failing after 28s
2026-05-25 20:03:11 +04:00
Siavash Sameni
06253fdeeb feat(video+desktop): camera capture, video UI, E2E AEAD wiring, test fixes
Blockers 4 & 5: browser getUserMedia → JPEG IPC → Rust I420 pipeline;
remote video strip renders decoded frames via canvas; EncryptingTransport
wraps QuinnTransport so WZP AEAD is applied to all media (C2 fix).

Test fixes: HandshakeResult.session destructuring across relay/client/crypto
integration tests; video_codecs field added to all CallOffer/CallAnswer
structs; wzp-video pipeline_roundtrip integration tests added.

PRD docs: five Kimi-ready specs for E2E encryption, Android NDK 0.9 migration,
quality upgrade flow, wire-format hardening, and clippy debt.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-25 15:30:26 +04:00
Siavash Sameni
52a6f5e048 fix(audit): address C2, C3, M4, M5 from 2026-05-25 audit
C2: Add EncryptingTransport wrapper — all media I/O now goes through
ChaChaSession encrypt/decrypt before hitting the QUIC datagram path.
cli.rs run_live/run_silence/run_file_mode accept Arc<dyn MediaTransport>
and receive a wrapped transport after the handshake.

C3: Wire VideoScorer::observe() into both plain and trunked forwarding
loops in room.rs. Packets from participants with Abusive verdict are
dropped before forwarding. last_bwe_kbps tracked from quality reports.

M4: Widen FEC repair symbol index from u8 to u16 throughout
(FecEncoder::generate_repair, FecDecoder::add_symbol, all call sites in
call.rs, bench.rs, pipeline.rs, wzp-android). Eliminates theoretical
wrapping when num_source + repair_count > 255.

M5: Track last_encrypt_timestamp in ChaChaSession. debug_assert in
encrypt() that timestamp is non-decreasing across calls (including post-
rekey). complete_rekey() explicitly preserves last_encrypt_timestamp to
prevent accidental timestamp reset regressions.

583 tests passing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-25 06:20:05 +04:00
Siavash Sameni
12b0d9738f fix(wzp-crypto): derive AEAD nonces from MediaHeader.seq, not recv_seq
The previous scheme built ChaCha20-Poly1305 nonces from an internal
recv_seq counter that incremented once per decrypt() call. Under
in-order delivery recv_seq stayed in sync with the sender's send_seq,
but any out-of-order or lost packet caused them to diverge permanently —
every subsequent packet then used the wrong nonce and AEAD decryption
failed for the rest of the session.

Fix: parse the MediaHeader at the top of both encrypt() and decrypt()
and use header.seq as the nonce input. Both sides now derive the nonce
from the same wire field, surviving reordering by construction.

send_seq / recv_seq are kept as pure packet counters for the rekey
interval trigger; they no longer affect nonce derivation.

All tests updated to pass valid v2 MediaHeader bytes instead of raw
byte literals (the new code requires a parseable header for nonce
derivation). New test decrypt_survives_out_of_order_delivery encrypts
5 packets and delivers them out of order (indices 0,2,1,4,3); this
test would have failed under the old counter-based scheme.

Fixes audit finding C1 from AUDIT-2026-05-25.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-25 06:00:01 +04:00
Siavash Sameni
9334aa5ccd T6.1: AV1 encoder/decoder with HW probe + SVT-AV1 SW fallback
- New: av1_obu.rs — OBU framer, depacketizer, keyframe detection, LEB128 helpers
- New: dav1d.rs — SW AV1 decoder wrapper (shiguredo_dav1d)
- New: svt_av1.rs — SW AV1 encoder wrapper (shiguredo_svt_av1)
- Add CodecId::Av1Main = 12 with match-arm fixes in downstream crates
- Add VideoToolboxAv1Decoder for macOS M3+ HW decode
- Add MediaCodecAv1Encoder/Decoder for Android (video/av01)
- Add extract_sequence_header_obu() helper for AV1 decoder CSD
- Add 10-frame encode-decode roundtrip test (svt_av1 + dav1d)
- Fix clippy unused import in dav1d.rs
- 15 tests; all workspace tests pass; cargo fmt clean
2026-05-12 18:44:44 +04:00
Siavash Sameni
f16d650721 T6.2: Tier F video scorer — keyframe periodicity, I/P ratio, BWE responsiveness + 10 tests 2026-05-12 17:42:39 +04:00
Siavash Sameni
517d0ebfe0 T5.7.1: Unify Verdict enum into wzp_relay::verdict, drop RepeatAbusive variant 2026-05-12 16:49:04 +04:00
Siavash Sameni
ffded2a913 clippy: fix wzp-relay lint issues (empty doc, unused var, TokenExhausted, Default, dead field) 2026-05-12 15:40:55 +04:00
Siavash Sameni
fdfaed5390 fmt: cargo fmt --all 2026-05-12 15:40:02 +04:00
Siavash Sameni
dbbab0decf T5.8: Tier G response policy — Verdict enum + ResponsePolicy + typed Hangup::PolicyViolation + 9 tests 2026-05-12 15:13:20 +04:00
Siavash Sameni
5fda5ecc52 T5.7: Tier F audio scorer — IAT CoV + silence fraction + bitrate + Q-flag + bimodality + 11 tests 2026-05-12 15:09:28 +04:00
Siavash Sameni
2bbb664df4 T5.6: Per-receiver layer selection at SFU — ReceiverState + hysteresis + forwarding filter 2026-05-12 15:05:32 +04:00
Siavash Sameni
b197651557 T5.4: H.265 encoder/decoder wrappers — VideoToolbox + MediaCodec, CodecId::H265Main 2026-05-12 14:50:20 +04:00
Siavash Sameni
001d94f9ae T4.7 rework: make should_forward_pli take now: Instant + 6 unit tests
- Refactor should_forward_pli(room, stream_id) -> should_forward_pli(room, stream_id, now: Instant)
  so the 200 ms dedup window is deterministically testable.
- Update the one caller in run_participant_signals to pass Instant::now().
- Add 6 PLI unit tests covering:
  * first PLI forwards
  * duplicate within 200 ms suppressed
  * after 200 ms forwards again
  * different streams independent
  * different rooms independent
  * no stream owner returns None

Addresses reviewer CR on T4.7 (line drawn at T4.6 — stateful relay features must
have state-transition tests).

wzp-relay tests: 93 -> 99 pass.
2026-05-12 11:39:35 +04:00
Siavash Sameni
36b0421d68 T4.7: PLI suppression at SFU — 200 ms dedup window per (room, stream_id) 2026-05-12 11:25:25 +04:00
Siavash Sameni
828fbea2ea T4.6: SFU keyframe cache — per-(room,sender,stream) I-frame replay on join 2026-05-12 10:54:04 +04:00
Siavash Sameni
490d2d31c6 T4.1: wzp-video crate scaffold + H.264 NAL framer + depacketizer 2026-05-12 07:22:54 +04:00
Siavash Sameni
f1b86e0fed T3.5: Tier E per-session token bucket 2026-05-12 06:45:56 +04:00
Siavash Sameni
017c371611 T3.4: Tier D per-codec payload size sanity 2026-05-12 06:24:40 +04:00
Siavash Sameni
e73f8a7150 T3.3: SignalMessage version field 2026-05-12 06:11:59 +04:00
Siavash Sameni
f3398adb95 T3.1: RoomManager concurrency — Arc<RwLock<Room>> per room 2026-05-11 21:12:04 +04:00
Siavash Sameni
54c1a35186 T2.3-T2.6: BWE guard, relay conformance Tier A/B/C, Prometheus metrics 2026-05-11 20:50:22 +04:00
Siavash Sameni
6f81487778 T1.6: Protocol version negotiation in handshake 2026-05-11 15:53:04 +04:00
Siavash Sameni
c93d302656 T1.5: Migrate emit/parse sites to v2 wire format 2026-05-11 12:37:32 +04:00
Siavash Sameni
defd8eab07 fix(signal): send PresenceList directly to new client after ack
Some checks failed
Mirror to GitHub / mirror (push) Failing after 24s
Build Release Binaries / build-amd64 (push) Failing after 3m50s
The broadcast alone wasn't reaching the first client because its
recv loop hadn't started yet when the second client registered.
Now the relay sends PresenceList directly to the new client (right
after RegisterPresenceAck) AND broadcasts to all others.

This guarantees every client gets the full user list:
- New client: via direct send (queued before recv loop starts)
- Existing clients: via broadcast

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 18:20:37 +04:00
Siavash Sameni
1120c7b579 feat(signal): PresenceList broadcast for lobby user discovery
Some checks failed
Build Release Binaries / build-amd64 (push) Failing after 7m21s
Mirror to GitHub / mirror (push) Failing after 27s
New signal infrastructure for the lobby-first UI:

- PresenceUser struct: { fingerprint, alias }
- SignalMessage::PresenceList: relay broadcasts full user list
  to all signal clients on every register/deregister
- SignalHub::presence_list(): builds the list from connected clients
- SignalHub::broadcast(): sends to ALL signal clients
- Relay calls broadcast on register + unregister
- Desktop emits "presence_list" signal-event to JS frontend

This gives clients real-time visibility of who's online via the
signal channel, without needing to join a voice room first.

603 tests pass, 0 regressions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 18:12:47 +04:00
Siavash Sameni
bb23976076 feat(quality): upgrade negotiation + asymmetric quality signals (#28, #29, #30)
Some checks failed
Mirror to GitHub / mirror (push) Failing after 31s
Build Release Binaries / build-amd64 (push) Failing after 3m33s
New SignalMessage variants for P2P quality coordination:

UpgradeProposal/UpgradeResponse/UpgradeConfirm (#28):
- Consensual quality upgrade flow — proposer sends desired profile,
  peer accepts/rejects based on own conditions, confirm commits both
- All carry call_id for relay routing

QualityCapability (#30):
- Peer reports its max sustainable profile — enables asymmetric
  encoding where each side uses its own best quality instead of
  forcing everyone to the weakest link

Relay forwards all 4 signals to the call peer (same pattern as
MediaPathReport, CandidateUpdate, HardNatProbe).

Desktop signal recv loop handles all 4 with debug logging.
Encoder switching TODOs noted for wiring into CallEngine.

4 new serde roundtrip tests. 603 total, 0 regressions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 17:25:34 +04:00
Siavash Sameni
f06f9073ae feat(nat): birthday attack module + HardNatBirthdayStart signal (#86, #87)
Some checks failed
Mirror to GitHub / mirror (push) Failing after 25s
Build Release Binaries / build-amd64 (push) Failing after 3m43s
Birthday attack for random symmetric NATs:
- birthday.rs: open_acceptor_ports() opens N sockets, STUN-probes
  each to learn external ports. generate_dialer_targets() builds
  hit list (known ports first, then random fill). spray_dialer()
  sprays QUIC connects with rate limiting, first success wins.
- Default: 32 acceptor ports, 128 dialer probes, 20ms interval

Signal coordination:
- HardNatBirthdayStart { acceptor_ports, external_ip } sent by
  Acceptor when peer's HardNatProbe shows random/sequential NAT
- Relay forwards it like other call signals
- Desktop recv loop handles and logs it

Hybrid waterfall integration:
- On receiving HardNatProbe with non-cone allocation, Acceptor
  auto-opens birthday ports and sends BirthdayStart
- Sockets kept alive 10s for NAT mapping persistence
- Dialer spray integration into race() pending (needs transport
  hot-swap for background upgrade)

6 new tests, 599 total, 0 regressions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 16:44:36 +04:00
Siavash Sameni
ec1bdf3cd5 feat(nat): hard NAT port allocation detection + prediction + HardNatProbe signal (#29)
Some checks failed
Mirror to GitHub / mirror (push) Failing after 31s
Build Release Binaries / build-amd64 (push) Failing after 3m30s
Phase A of hard NAT traversal (PRD-hard-nat.md):

- PortAllocation enum: PortPreserving / Sequential{delta} / Random / Unknown
- detect_port_allocation(): sequential STUN probes from single socket,
  analyzes port sequence for allocation pattern
- classify_port_allocation(): pure function with jitter tolerance,
  wraparound handling, 60% threshold for noisy sequences
- predict_ports(): generates target port range from last_port + delta
- HardNatProbe signal message: carries port_sequence, allocation
  pattern, external_ip for peer coordination
- Relay forwards HardNatProbe to call peer
- Netcheck gains port_allocation field + format_report display

588 tests pass (17 new), 0 regressions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 11:29:35 +04:00
Siavash Sameni
8fcf1be341 feat(nat): Tailscale-inspired STUN/ICE + port mapping + mid-call re-gathering (#28)
Some checks failed
Mirror to GitHub / mirror (push) Failing after 23s
Build Release Binaries / build-amd64 (push) Failing after 6m8s
Phase 8: 5 new modules bringing NAT traversal close to Tailscale's approach.

- stun.rs: RFC 5389 STUN client — public server reflexive discovery,
  XOR-MAPPED-ADDRESS parsing, parallel probe with retry, STUN fallback
  in desktop try_reflect_own_addr()
- portmap.rs: NAT-PMP (RFC 6886) + PCP (RFC 6887) + UPnP IGD port
  mapping — gateway discovery, acquire/release/refresh lifecycle,
  new PeerCandidates.mapped candidate type in dial order
- ice_agent.rs: candidate lifecycle — gather(), re_gather(),
  apply_peer_update() with monotonic generation counter,
  CandidateUpdate signal message forwarded by relay
- netcheck.rs: comprehensive diagnostic — NAT type, IPv4/v6,
  port mapping availability, relay latencies, CLI --netcheck
- relay_map.rs: RTT-sorted relay map, preferred() selection,
  populate_from_ack() for RegisterPresenceAck.available_relays

Relay: CallRegistry stores + cross-wires caller/callee_mapped_addr
into CallSetup.peer_mapped_addr. Region config + available_relays
populated from federation peers in RegisterPresenceAck.

Desktop: place_call/answer_call call acquire_port_mapping() and
fill caller/callee_mapped_addr. STUN+relay combined NAT detection.

571 tests pass (66 new), 0 regressions, 0 warnings.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 10:17:17 +04:00
Siavash Sameni
81b5522942 refactor: clap CLI parser, safety docs, dead code docs, cross-refs
Some checks failed
Mirror to GitHub / mirror (push) Failing after 26s
Build Release Binaries / build-amd64 (push) Failing after 4m1s
Audit items 6, 8, 9, 10:

#6 - Relay CLI: replaced 154-line manual parse_args() with clap derive
     (13 flags/options preserved, auto --help, --version from build hash)
#8 - wzp-native: added # Safety docs to all 3 unsafe extern "C" fns
#9 - wzp-crypto: documented x25519_static_secret/public as reserved for
     future static-key federation auth (not dead code, intentionally unused)
#10 - Cross-references between quality.rs ↔ dred_tuner.rs module docs

368 tests passing, 0 regressions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 15:40:49 +04:00
Siavash Sameni
d539a6dfb9 test(federation): 29 tests for federation.rs (was 0), engine dedup PRD
Some checks failed
Mirror to GitHub / mirror (push) Failing after 27s
Build Release Binaries / build-amd64 (push) Failing after 3m45s
Federation test coverage (crates/wzp-relay/tests/federation.rs):
- room_hash: determinism, uniqueness, length, case sensitivity (5)
- is_global_room: static config, call-* implicit, exact match (3)
- resolve_global_room: static + call-* resolution (2)
- global_room_hash: canonical names, fallthrough, independence (4)
- forward_to_peers: zero peers, live QUIC datagram delivery (2)
- broadcast_signal: zero peers, live QUIC signal delivery (2)
- send_signal_to_peer: unknown fingerprint error (1)
- peer lookup: fingerprint normalization, IP, trust priority (5)
- accessors: local_tls_fp, cross_relay_tx, remote_participants (3)
- integration: full media egress over live QUIC link (1)
- edge case: exact room match (1)

Total relay tests: 120 (was 91). Full suite: 368 passing.

Also added PRD-engine-dedup.md for the engine.rs helper extraction
completed in the previous commit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 15:35:04 +04:00
Siavash Sameni
ba12aae439 refactor: extract shared engine helpers, federation clone-before-send, constants
Some checks failed
Mirror to GitHub / mirror (push) Failing after 30s
Build Release Binaries / build-amd64 (push) Failing after 3m48s
Engine deduplication (PRD-engine-dedup.md):
- build_call_config(): shared CallConfig construction (was 23 lines × 2)
- codec_to_profile(): shared CodecId → QualityProfile mapping (was 19 lines × 2)
- run_signal_task(): shared signal handler (was 48 lines × 2)
- Net -39 lines from engine.rs, 6 duplicated blocks → single-line calls

Quick wins from REFACTOR-codebase-audit.md:
- 6 magic number constants extracted (CAPTURE_POLL_MS, RECV_TIMEOUT_MS, etc.)
- DRED_POLL_INTERVAL moved from 2 local defs to 1 module-level const
- federation.rs: forward_to_peers, broadcast_signal, send_signal_to_peer
  now clone peer list and release lock before sending (was holding Mutex
  across async I/O — last lock-during-send pattern eliminated)
- main.rs: close_transport() helper replaces 12 silent .ok() calls with
  debug-level logging

314 tests passing, 0 regressions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 15:22:44 +04:00
Siavash Sameni
a52b011fb5 feat(relay): replace global Mutex<RoomManager> with DashMap sharding
Some checks failed
Mirror to GitHub / mirror (push) Failing after 24s
Build Release Binaries / build-amd64 (push) Failing after 3m41s
Eliminates the single-lock bottleneck for media forwarding. Before:
all participants across all rooms competed for one Mutex. Now rooms
are stored in DashMap (64 internal shards with per-shard RwLocks).

Changes:
- RoomManager.rooms: HashMap → DashMap<String, Room>
- Per-room quality tracking (qualities, current_tier moved into Room)
- Arc<Mutex<RoomManager>> → Arc<RoomManager> everywhere
- 20 .lock().await sites removed across room.rs, main.rs, federation.rs, ws.rs
- federation forward_to_peers: clone peer list, release lock, then send
- ACL uses std::sync::Mutex (rarely accessed, non-async)

Concurrency improvement:
- Before: 100 rooms × 10 people = 1000 tasks → 1 Mutex
- After: distributed across 64 DashMap shards, ~15 tasks per shard avg
- Rooms are fully independent — room A never blocks room B

314 tests passing, 0 regressions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 12:17:57 +04:00
Siavash Sameni
d424515542 feat: 5-tier quality classification, QualityDirective handling, debug tap stats
Some checks failed
Mirror to GitHub / mirror (push) Failing after 31s
Build Release Binaries / build-amd64 (push) Failing after 3m49s
- Extend Tier enum from 3 to 6 levels: Studio64k/48k/32k + Good +
  Degraded + Catastrophic with asymmetric hysteresis (down:3, up:5,
  studio:10)
- Handle QualityDirective signals in both desktop and Android engines
  — relay-coordinated codec switching now works end-to-end
- Add periodic TAP STATS to debug tap: packets in/out, fan-out avg,
  seq gaps, codecs seen (every 5s)
- Mark task #2 done (ParticipantInfo in federation signals already
  implemented)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 10:23:48 +04:00
Siavash Sameni
ea5fc17c34 fix(relay): debug tap signal logging, dual_path test regression, PRD updates
Some checks failed
Build Release Binaries / build-amd64 (push) Failing after 3m39s
Mirror to GitHub / mirror (push) Failing after 28s
- Add log_signal() and log_event() to DebugTap for RoomUpdate,
  QualityDirective, join/leave lifecycle events (task #11)
- Fix dual_path.rs Phase 7 regression: add missing ipv6_endpoint arg
  to 3 race() call sites
- Update PRDs to reflect actual implementation status: mark adaptive
  quality, coordinated codec, P2P, network awareness, protocol analyzer
- Update PROGRESS.md with QualityDirective gap and dual_path regression

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 09:54:52 +04:00
Siavash Sameni
d249b32ee5 test+docs: add tests for QualityDirective, ParticipantQuality; update docs
- QualityDirective signal roundtrip tests (with/without reason)
- ParticipantQuality unit tests (initial tier, degradation, weakest-link)
- Updated PROGRESS.md with desktop adaptive quality, relay coordinated
  switching, Oboe state polling entries
- Updated ARCHITECTURE.md SFU fan-out rules with QualityDirective
- Updated PRD-coordinated-codec.md with implementation status
- 312 tests passing across all modified crates

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 19:56:46 +04:00
Siavash Sameni
22045bc5e6 feat: adaptive quality in desktop, relay quality directive, Oboe state polling
- Wire AdaptiveQualityController into desktop engine send/recv tasks
  (mirrors Android pattern: AtomicU8 pending_profile, auto-mode check)
- Wire same into Android engine send task (was only in recv before)
- QualityDirective SignalMessage variant for relay-initiated codec switch
- ParticipantQuality tracking in relay RoomManager (per-participant
  AdaptiveQualityController, weakest-link tier computation)
- Relay broadcasts QualityDirective to all participants when room-wide
  tier degrades (coordinated codec switching)
- Oboe stream state polling: poll getState() for up to 2s after
  requestStart() to ensure both streams reach Started before proceeding
  (fixes intermittent silent calls on cold start, Nothing Phone A059)

Tasks: #7, #25, #26, #31, #35

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 19:54:04 +04:00
Siavash Sameni
766c9df442 feat(dred): continuous DRED tuning, PMTUD, extended Opus6k window
- DredTuner: maps live network metrics (loss/RTT/jitter) to continuous
  DRED duration every ~500ms instead of discrete tier-locked values.
  Includes jitter-spike detection for pre-emptive Starlink-style boost.
- Opus6k DRED extended from 500ms to 1040ms (max libopus 1.5 supports)
- PMTUD: quinn MtuDiscoveryConfig with upper_bound=1452, 300s interval
- TrunkedForwarder respects discovered MTU (was hard-coded 1200)
- QuinnPathSnapshot exposes quinn internal stats + discovered MTU
- AudioEncoder trait: set_expected_loss() + set_dred_duration() methods
- PathMonitor: sliding-window jitter variance for spike detection
- Integrated into both Android and desktop send tasks in engine.rs
- 14 new tests (10 tuner unit + 4 encoder integration)
- Updated ARCHITECTURE.md, PROGRESS.md, PRD-dred-integration, PRD-mtu

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 19:38:37 +04:00
Siavash Sameni
a798634b3d fix(signal): add call_id to Hangup — prevents stale hangup killing new calls
Root cause: Hangup had no call_id field. The relay forwarded hangups to
ALL active calls for a user. When user A hung up call 1 and user B
immediately placed call 2, the relay's processing of A's hangup would
also kill call 2 (race window ~1-2s).

Fix: add optional call_id to Hangup (backwards-compatible via serde
skip_serializing_if). When present, the relay only ends the named call.
Old clients send call_id=None and get the legacy broadcast behavior.

Also: clear pending_path_report in Hangup recv handler and
internal_deregister to prevent stale oneshot channels from blocking
subsequent call setups.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 16:39:21 +04:00
Siavash Sameni
4d66d3769d fix(relay): set peer_relay_fp on originating relay when answer arrives
The originating relay (where the caller is) never set peer_relay_fp
because the call was created locally. When the callee's answer
arrived via federation, the cross-relay dispatcher handled it but
didn't mark the call as cross-relay. This meant the caller's
MediaPathReport was delivered via local hub.send_to() to a peer
fingerprint that isn't connected locally — silently dropped.

Fix: in the cross-relay answer dispatcher, call
reg.set_peer_relay_fp(call_id, Some(origin_relay_fp)) so the
originating relay knows to forward MediaPathReport via federation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 14:49:34 +04:00
Siavash Sameni
1eb82d77b8 feat(relay+client): relay reports build version in Ack
Add relay_build field to RegisterPresenceAck so the client logs
which relay version it connected to. Shows in the debug log as
register_signal:ack_received {"relay_build":"f843a93"}.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 14:27:58 +04:00
Siavash Sameni
f843a934fe fix(relay): forward MediaPathReport across federation
MediaPathReport was only delivered via local signal_hub, so calls
between peers on different relays always hit peer_report_timeout
and fell back to relay — even when direct P2P worked perfectly.

Fix: check peer_relay_fp in call_registry (same pattern as
DirectCallAnswer). If the peer is on a remote relay, wrap in
FederatedSignalForward and send via federation link. Also fix
the cross-relay dispatcher to deliver to BOTH caller and callee
(not just caller), since the report can come from either side.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 14:14:30 +04:00
Siavash Sameni
f5542ef822 feat(p2p): Phase 6 — ICE-style path negotiation
Before Phase 6, each side's dual-path race ran independently and
committed to whichever transport completed first. When one side
picked Direct and the other picked Relay, they sent media to
different places — TX > 0 RX: 0 on both, completely silent call.

Phase 6 adds a negotiation step: after the local race completes,
each side sends a MediaPathReport { call_id, direct_ok, winner }
to the peer through the relay. Both wait for the other's report
before committing a transport to the CallEngine. The decision
rule is simple: if BOTH report direct_ok = true, use direct; if
EITHER reports false, BOTH use relay.

## Wire protocol

New `SignalMessage::MediaPathReport { call_id, direct_ok,
race_winner }`. The relay forwards it to the call peer via the
same signal_hub routing used for DirectCallOffer/Answer. The
cross-relay dispatcher also forwards it.

## dual_path::race restructured

Returns `RaceResult` instead of `(Arc<QuinnTransport>, WinningPath)`:
- `direct_transport: Option<Arc<QuinnTransport>>`
- `relay_transport: Option<Arc<QuinnTransport>>`
- `local_winner: WinningPath`

Both paths are run as spawned tasks. After the first completes,
a 1s grace period lets the loser also finish. The connect
command gets BOTH transports (when available) and picks the
right one based on the negotiation outcome. The unused transport
is dropped.

## connect command flow (revised)

1. Run race() → RaceResult with both transports
2. Send MediaPathReport to relay with our direct_ok
3. Install oneshot; wait for peer's report (3s timeout)
4. Decision: both direct_ok → use direct; else → use relay
5. Start CallEngine with the agreed transport

If the peer never responds (old build, timeout), falls back to
relay — backward compatible.

## Relay forwarding

MediaPathReport is forwarded like DirectCallOffer/Answer: via
signal_hub.send_to(peer_fp) for same-relay calls, and via
cross-relay dispatcher for federated calls.

## Debug log events

- `connect:dual_path_race_done` — local race result
- `connect:path_report_sent` — our report to the peer
- `connect:peer_report_received` — peer's report
- `connect:peer_report_timeout` — peer didn't respond (3s)
- `connect:path_negotiated` — final agreed path with reasons

Full workspace test: 423 passing (no regressions).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 10:03:42 +04:00
Siavash Sameni
026940d492 fix(federation): diagnostic logging for cross-relay media routing
Added warn-level log in handle_datagram when a federation
datagram arrives but no matching local room is found. Prints:
- room_hash (8-byte tag from the datagram)
- active_rooms (all rooms the relay currently has)
- seq + peer label

This diagnoses the cross-relay recv_fr=0 issue: if media IS
arriving from the peer relay but the room hash doesn't match any
active room, the log tells us exactly what hash is expected vs
what rooms exist locally. If no datagram log fires at all, the
issue is upstream (peer relay not forwarding, federation link
down, etc.).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 09:27:34 +04:00
Siavash Sameni
6cd61fc63b feat(federation): Phase 4.1 — call-* rooms are implicitly global
All rooms with names starting with 'call-' are now treated as
global rooms by the federation pipeline. This enables relay-
mediated media fallback for cross-relay direct calls: when Alice
on Relay A and Bob on Relay B both join the same call-<id> room,
the federation media forwarding pipeline (GlobalRoomActive
announcements + datagram forwarding + presence replication)
kicks in automatically without any runtime registration step.

Previously, cross-relay direct calls that couldn't go P2P
(symmetric NAT on either side) failed with "no media path"
because the call-<id> room wasn't in the configured global_rooms
set and media datagrams weren't forwarded across the federation
link.

The relay's existing ACL for call-* rooms (only the two
authorized fingerprints from the call registry can join)
prevents random clients from creating or eavesdropping on
call rooms.

## Changes

### `is_global_room` (federation.rs)
Added `room.starts_with("call-")` check before the static
global_rooms set lookup. Returns true immediately for any
call-prefixed room.

### `resolve_global_room` (federation.rs)
Return type changed from `Option<&str>` to `Option<String>`
(owned) because call-* room names aren't stored on `self` —
they come from the caller and resolve to themselves as the
canonical name. The 13 callers continue to work via String/&str
auto-deref; 4 HashMap lookups needed explicit `.as_str()` or
`&` borrows.

Full workspace test: 423 passing (no regressions).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 08:55:01 +04:00
Siavash Sameni
fa038df057 feat(p2p): Phase 5.5 — ICE LAN host candidates (IPv4 + IPv6)
Same-LAN P2P was failing because MikroTik masquerade (like most
consumer NATs) doesn't support NAT hairpinning — the advertised
WAN reflex addr is unreachable from a peer on the same LAN as
the advertiser. Phase 5 got us Cone NAT classification and fixed
the measurement artifact, but same-LAN direct dials still had
nowhere to land.

Phase 5.5 adds ICE-style host candidates: each client enumerates
its LAN-local network interface addresses, includes them in the
DirectCallOffer/Answer alongside the reflex addr, and the
dual-path race fans out to ALL peer candidates in parallel.
Same-LAN peers find each other via their RFC1918 IPv4 + ULA /
global-unicast IPv6 addresses without touching the NAT at all.

Dual-stack IPv6 is in scope from the start — on modern ISPs
(including Starlink) the v6 path often works even when v4
hairpinning doesn't, because there's no NAT on the v6 side.

## Changes

### `wzp_client::reflect::local_host_candidates(port)` (new)

Enumerates network interfaces via `if-addrs` and returns
SocketAddrs paired with the caller's port. Filters:

- IPv4: RFC1918 (10/8, 172.16/12, 192.168/16) + CGNAT (100.64/10)
- IPv6: global unicast (2000::/3) + ULA (fc00::/7)
- Skipped: loopback, link-local (169.254, fe80::), public v4
  (already covered by reflex-addr), unspecified

Safe from any thread, one `getifaddrs(3)` syscall.

### Wire protocol (wzp-proto/packet.rs)

Three new `#[serde(default, skip_serializing_if = "Vec::is_empty")]`
fields, backward-compat with pre-5.5 clients/relays by
construction:

- `DirectCallOffer.caller_local_addrs: Vec<String>`
- `DirectCallAnswer.callee_local_addrs: Vec<String>`
- `CallSetup.peer_local_addrs: Vec<String>`

### Call registry (wzp-relay/call_registry.rs)

`DirectCall` gains `caller_local_addrs` + `callee_local_addrs`
Vec<String> fields. New `set_caller_local_addrs` /
`set_callee_local_addrs` setters. Follow the same pattern as
the reflex addr fields.

### Relay cross-wiring (wzp-relay/main.rs)

Both the local-call and cross-relay-federation paths now track
the local_addrs through the registry and inject them into the
CallSetup's peer_local_addrs. Cross-wiring is identical to the
existing peer_direct_addr logic — each party's CallSetup
carries the OTHER party's LAN candidates.

### Client side (desktop/src-tauri/lib.rs)

- `place_call`: gathers local host candidates via
  `local_host_candidates(signal_endpoint.local_addr().port())`
  and includes them in `DirectCallOffer.caller_local_addrs`.
  The port match is critical — it's the Phase 5 shared signal
  socket, so incoming dials to these addrs land on the same
  endpoint that's already listening.
- `answer_call`: same, AcceptTrusted only (privacy mode keeps
  LAN addrs hidden too, for consistency with the reflex addr).
- `connect` Tauri command: new `peer_local_addrs: Vec<String>`
  arg. Builds a `PeerCandidates` bundle and passes it to the
  dual-path race.
- Recv loop's CallSetup handler: destructures + forwards the
  new field to JS via the signal-event payload.

### `dual_path::race` (wzp-client/dual_path.rs)

Signature change: takes `PeerCandidates` (reflex + local Vec)
instead of a single SocketAddr. The D-role branch now fans out
N parallel dials via `tokio::task::JoinSet` — one per candidate
— and the first successful dial wins (losers are aborted
immediately via `set.abort_all()`). Only when ALL candidates
have failed do we return Err; individual candidate failures are
just traced at debug level and the race waits for the others.

LAN host candidates are tried BEFORE the reflex addr in
`PeerCandidates::dial_order()` — they're faster when they work,
and the reflex addr is the fallback for the not-on-same-LAN
case.

### JS side (desktop/main.ts)

`connect` invoke now passes `peerLocalAddrs: data.peer_local_addrs ?? []`
alongside the existing `peerDirectAddr`.

### Tests

All existing test callsites updated for the new Vec<String>
fields (defaults to Vec::new() in tests — they don't exercise
the multi-candidate path). `dual_path.rs` integration tests
wrap the single `dead_peer` / `acceptor_listen_addr` in a
`PeerCandidates { reflexive: Some(_), local: Vec::new() }`.

Full workspace test: 423 passing (same as before 5.5).

## Expected behavior on the reporter's setup

Two phones behind MikroTik, both on the same LAN:

  place_call:host_candidates {"local_addrs": ["192.168.88.21:XXX", "2001:...:YY:XXX"]}
  recv:DirectCallAnswer {"callee_local_addrs": ["192.168.88.22:ZZZ", "2001:...:WW:ZZZ"]}
  recv:CallSetup {"peer_direct_addr":"150.228.49.65:NN",
                  "peer_local_addrs":["192.168.88.22:ZZZ","2001:...:WW:ZZZ"]}
  connect:dual_path_race_start {"peer_reflex":"...","peer_local":[...]}
  dual_path: direct dial succeeded on candidate 0   ← LAN v4 wins
  connect:dual_path_race_won {"path":"Direct"}

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 07:34:49 +04:00