67 Commits

Author SHA1 Message Date
Siavash Sameni
a058a83c91 feat(ui): relay list management in settings
Settings now shows relay list with:
- Visual list of all configured relays
- Active relay highlighted in green with "ACTIVE" badge
- Tap a relay to switch (deregisters + reconnects automatically)
- X button to remove a relay (keeps at least 1)
- Add relay with name + address inputs
- Reconnect flow: deregister → clear lobby → auto-connect to new relay

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 18:37:58 +04:00
Siavash Sameni
9b8013ba7f merge main: PresenceList direct send fix 2026-04-14 18:36:01 +04:00
Siavash Sameni
defd8eab07 fix(signal): send PresenceList directly to new client after ack
The broadcast alone wasn't reaching the first client because its
recv loop hadn't started yet when the second client registered.
Now the relay sends PresenceList directly to the new client (right
after RegisterPresenceAck) AND broadcasts to all others.

This guarantees every client gets the full user list:
- New client: via direct send (queued before recv loop starts)
- Existing clients: via broadcast
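
A minimal sketch of that ordering, with placeholder types (the real SignalHub and SignalMessage live in the relay crate and differ in detail):
  use std::collections::HashMap;
  use std::sync::mpsc::Sender;

  #[derive(Clone)]
  enum SignalMessage {
      RegisterPresenceAck,
      PresenceList { users: Vec<String> }, // fingerprints, simplified
  }

  struct SignalHub {
      clients: HashMap<String, Sender<SignalMessage>>, // fingerprint -> outbound queue
  }

  impl SignalHub {
      fn on_register(&self, new_fp: &str) {
          let list = SignalMessage::PresenceList {
              users: self.clients.keys().cloned().collect(),
          };
          if let Some(tx) = self.clients.get(new_fp) {
              // Queued for the new client right after the ack, so it is
              // delivered even if its recv loop has not started draining yet.
              let _ = tx.send(SignalMessage::RegisterPresenceAck);
              let _ = tx.send(list.clone());
          }
          for (fp, tx) in &self.clients {
              if fp != new_fp {
                  let _ = tx.send(list.clone()); // existing clients refresh their lobby
              }
          }
      }
  }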

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 18:20:37 +04:00
Siavash Sameni
cc23e829b2 feat(ui): handle PresenceList in lobby — show online users
The lobby now populates from PresenceList signal events:
- Relay broadcasts user list on register/deregister
- JS receives "presence_list" signal-event
- Updates lobbyUsers map (excluding self)
- Renders user rows with identicon, name, fingerprint

Users appear in the lobby as soon as they register their
signal channel — no need to join voice first.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 18:13:45 +04:00
Siavash Sameni
18c204c1ff merge main: PresenceList signal for lobby 2026-04-14 18:13:15 +04:00
Siavash Sameni
1120c7b579 feat(signal): PresenceList broadcast for lobby user discovery
New signal infrastructure for the lobby-first UI:

- PresenceUser struct: { fingerprint, alias }
- SignalMessage::PresenceList: relay broadcasts full user list
  to all signal clients on every register/deregister
- SignalHub::presence_list(): builds the list from connected clients
- SignalHub::broadcast(): sends to ALL signal clients
- Relay calls broadcast on register + unregister
- Desktop emits "presence_list" signal-event to JS frontend

This gives clients real-time visibility of who's online via the
signal channel, without needing to join a voice room first.

603 tests pass, 0 regressions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 18:12:47 +04:00
Siavash Sameni
7e7391fdbb feat(ui): lobby-first main.ts rewrite for experimental-ui
Complete JS rewrite for IRC-style lobby flow:

- Auto-connect signal channel on app launch (no connect button)
- Lobby shows online users with identicon, name, voice status
- "Join Voice" FAB toggles room voice on/off
- Tap user → context menu → Direct Call
- Incoming call banner slides up from bottom
- Back button returns from call to lobby
- Settings panel preserved with all debug toggles

~500 lines (down from 1786) — focused on the lobby experience.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 17:52:51 +04:00
Siavash Sameni
aa0362f318 feat(ui): lobby-first HTML/CSS layout for experimental-ui
New IRC-style lobby layout:
- Auto-connect on launch, drop into user list
- User rows with identicon, name, fingerprint, voice status
- Speaking indicator (green highlight + pulsing)
- Join Voice FAB (green, toggles to Leave/red)
- Incoming call banner (slides up from bottom)
- User context menu (tap user → Call / Message)
- Settings panel preserved from original

The old connect-screen HTML is removed. The call-screen is kept
intact. JS adaptation next.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 17:43:15 +04:00
Siavash Sameni
bb23976076 feat(quality): upgrade negotiation + asymmetric quality signals (#28, #29, #30)
New SignalMessage variants for P2P quality coordination:

UpgradeProposal/UpgradeResponse/UpgradeConfirm (#28):
- Consensual quality upgrade flow — proposer sends desired profile,
  peer accepts/rejects based on own conditions, confirm commits both
- All carry call_id for relay routing

QualityCapability (#30):
- Peer reports its max sustainable profile — enables asymmetric
  encoding where each side uses its own best quality instead of
  forcing everyone to the weakest link

Relay forwards all 4 signals to the call peer (same pattern as
MediaPathReport, CandidateUpdate, HardNatProbe).

Desktop signal recv loop handles all 4 with debug logging.
Encoder switching TODOs noted for wiring into CallEngine.

4 new serde roundtrip tests. 603 total, 0 regressions.
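
Hypothetical shapes for the four variants; field names here are illustrative and the real definitions in the shared signal crate may differ:
  use serde::{Deserialize, Serialize};

  #[derive(Serialize, Deserialize)]
  pub enum SignalMessage {
      // Consensual upgrade handshake; all carry call_id so the relay can route them.
      UpgradeProposal { call_id: String, profile: String },
      UpgradeResponse { call_id: String, accept: bool },
      UpgradeConfirm { call_id: String, profile: String },
      // One-way report of the sender's best sustainable profile (asymmetric encoding).
      QualityCapability { call_id: String, max_profile: String },
      // ...existing variants elided...
  }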

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 17:25:34 +04:00
Siavash Sameni
18e5e75f33 feat(analyzer): encrypted payload decoding in replay mode (#17)
When --key <64-char-hex> is provided with --replay, the analyzer
decrypts each packet's ChaCha20-Poly1305 payload using the session
key and logs plaintext frame sizes. Prints first 5 + every 100th
decrypt result, and a summary at the end.
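
A sketch of the per-packet decrypt step with the chacha20poly1305 crate, assuming a hypothetical decrypt_payload helper and a 12-byte nonce taken from the packet header (the analyzer's actual framing is project-specific):
  use chacha20poly1305::{aead::{Aead, KeyInit}, ChaCha20Poly1305, Nonce};

  fn decrypt_payload(
      key_hex: &str,      // 64 hex chars -> 32-byte session key
      nonce: &[u8; 12],
      ciphertext: &[u8],
  ) -> Result<Vec<u8>, Box<dyn std::error::Error>> {
      let key_bytes = hex::decode(key_hex)?;
      let cipher = ChaCha20Poly1305::new_from_slice(&key_bytes)
          .map_err(|_| "session key must be 32 bytes")?;
      let plaintext = cipher
          .decrypt(Nonce::from_slice(nonce), ciphertext)
          .map_err(|_| "ChaCha20-Poly1305 authentication failed")?;
      Ok(plaintext) // plaintext.len() is the frame size that gets logged
  }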

This completes all 5 protocol analyzer tasks (#13-17):
- #13: Observer mode (live passive listener) — was done
- #14: TUI with Ratatui (per-participant panels) — was done
- #15: Capture and replay (.wzp format) — was done
- #16: HTML report (Chart.js loss/jitter graphs) — was done
- #17: Encrypted decode (--key for replay) — done now

Usage:
  wzp-analyzer --replay session.wzp --key <64-hex-chars> --html report.html

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 17:07:43 +04:00
Siavash Sameni
488efcb614 feat(ui): birthday attack toggle in settings (default off)
New setting: "Birthday attack (opens extra ports for hard NAT)"
- Default: OFF — no extra latency on call setup
- When ON: waits up to 3s for peer's birthday ports if peer has
  non-cone NAT, adds them to the dial race

Gated end-to-end: Settings → localStorage → JS invoke →
Rust connect param → birthday wait + target injection.
LAN/cone calls unaffected regardless of setting.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 16:54:22 +04:00
Siavash Sameni
8c360186df feat(nat): wire birthday attack end-to-end into connect flow
Complete Dialer-side birthday attack integration:

- SignalState stores peer_birthday_ports from HardNatBirthdayStart
- connect command: if peer's HardNatProbe shows non-cone NAT, waits
  up to 3s for birthday ports to arrive (Acceptor needs time to open
  32 sockets + STUN-probe each)
- When birthday ports arrive, generate_dialer_targets() builds hit
  list (known ports + random fill) and adds them to PeerCandidates
- All birthday targets go into the dual-path race as extra candidates
- LAN/cone calls skip the wait entirely (gated on allocation type)

Full waterfall now:
1. Standard candidates (reflexive + mapped)     → immediate
2. Port prediction (sequential delta)           → immediate
3. Birthday targets (if non-cone peer)          → +3s wait
4. All of above raced in parallel via JoinSet
5. Relay runs concurrently with 500ms head-start

599 tests pass, 0 regressions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 16:50:11 +04:00
Siavash Sameni
f06f9073ae feat(nat): birthday attack module + HardNatBirthdayStart signal (#86, #87)
Birthday attack for random symmetric NATs:
- birthday.rs: open_acceptor_ports() opens N sockets, STUN-probes
  each to learn external ports. generate_dialer_targets() builds
  a hit list (known ports first, then random fill). spray_dialer()
  sprays QUIC connects with rate limiting; the first success wins.
- Default: 32 acceptor ports, 128 dialer probes, 20ms interval

Signal coordination:
- HardNatBirthdayStart { acceptor_ports, external_ip } sent by
  Acceptor when peer's HardNatProbe shows random/sequential NAT
- Relay forwards it like other call signals
- Desktop recv loop handles and logs it

Hybrid waterfall integration:
- On receiving HardNatProbe with non-cone allocation, Acceptor
  auto-opens birthday ports and sends BirthdayStart
- Sockets kept alive 10s for NAT mapping persistence
- Dialer spray integration into race() pending (needs transport
  hot-swap for background upgrade)

6 new tests, 599 total, 0 regressions.
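
A minimal sketch of the hit-list construction (known ports first, then random fill), assuming this signature; the real birthday.rs also handles the spray and rate limiting:
  use rand::Rng;
  use std::collections::HashSet;
  use std::net::{IpAddr, SocketAddr};

  fn generate_dialer_targets(
      external_ip: IpAddr,
      known_ports: &[u16],
      total: usize,
  ) -> Vec<SocketAddr> {
      let mut rng = rand::thread_rng();
      let mut seen: HashSet<u16> = known_ports.iter().copied().collect();
      let mut targets: Vec<SocketAddr> = known_ports
          .iter()
          .map(|&p| SocketAddr::new(external_ip, p))
          .collect();
      while targets.len() < total {
          let port: u16 = rng.gen_range(1024..=u16::MAX); // random fill outside the reserved range
          if seen.insert(port) {
              targets.push(SocketAddr::new(external_ip, port));
          }
      }
      targets
  }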

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 16:44:36 +04:00
Siavash Sameni
6c49d7436f feat(ui): direct-only mode setting (no relay fallback)
New toggle in Settings → "Direct-only mode (no relay fallback)":
- Default: OFF (normal behavior, relay fallback on P2P failure)
- When ON: connect returns error if P2P fails, with full
  candidate_diags in the debug log showing why each candidate
  failed. Call never falls back to relay.

Useful for testing NAT traversal — you see the exact failure
reason instead of the call silently working through the relay.

Wired end-to-end:
- Settings.directOnly persisted in localStorage
- Passed as directOnly param to Rust connect command
- connect:path_negotiated shows direct_only flag
- connect:direct_only_failed emits on failure with diags

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 16:04:45 +04:00
Siavash Sameni
1de280fe04 fix(nat): working NAT tickle + smart filter debug + timeout diags
Fixes from real-world 5G↔Starlink testing:

NAT tickle fix:
- tokio::net::UdpSocket::bind() doesn't set SO_REUSEADDR, so binding
  to the same port as quinn silently failed. Now uses socket2::Socket
  with explicit SO_REUSEADDR + SO_REUSEPORT (via libc on unix).
- Tickle now logs success/failure for debugging.
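
A sketch of the reusable bind described above, with SO_REUSEADDR via socket2 and SO_REUSEPORT via libc (unix only; helper name is illustrative):
  use socket2::{Domain, Protocol, Socket, Type};
  use std::net::{SocketAddr, UdpSocket};

  fn bind_reusable_udp(local: SocketAddr) -> std::io::Result<UdpSocket> {
      let socket = Socket::new(Domain::IPV4, Type::DGRAM, Some(Protocol::UDP))?;
      socket.set_reuse_address(true)?;            // SO_REUSEADDR
      #[cfg(unix)]
      unsafe {
          use std::os::fd::AsRawFd;
          let one: libc::c_int = 1;
          libc::setsockopt(
              socket.as_raw_fd(),
              libc::SOL_SOCKET,
              libc::SO_REUSEPORT,                 // set via libc, as in the commit
              &one as *const _ as *const libc::c_void,
              std::mem::size_of::<libc::c_int>() as libc::socklen_t,
          );
      }
      socket.bind(&local.into())?;                // same port quinn is bound to
      Ok(socket.into())
  }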

Diagnostic fixes:
- connect:dual_path_race_start shows both dial_order_raw and
  dial_order_smart so we can see what filtering removed
- Grace-period timeout (relay wins first, direct still running)
  now fills "timeout:grace" diags for unrecorded candidates
- Previously candidate_diags was empty when relay won the race

Dependencies:
- Added socket2 = "0.5" to wzp-client

593 tests pass, 0 regressions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 15:58:13 +04:00
Siavash Sameni
bc6d327ebb feat(nat): smart candidate filtering + acceptor NAT tickle + 4s timeout
Major P2P improvements for cross-network calls:

Smart candidate filtering (smart_dial_order):
- Strip LAN candidates when peer's public IP differs from ours
  (172.16.x.x is unreachable from a different network)
- Strip all IPv6 candidates (Phase 7 disabled, wastes dial slots)
- Only keep mapped + reflexive for cross-network calls
- LAN candidates preserved when both peers share the same public IP
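
A sketch of those filtering rules, assuming roughly this signature (the real smart_dial_order() also preserves the mapped/reflexive ordering):
  use std::net::{IpAddr, SocketAddr};

  fn is_lan(ip: IpAddr) -> bool {
      match ip {
          IpAddr::V4(v4) => v4.is_private() || v4.is_link_local() || v4.is_loopback(),
          IpAddr::V6(_) => true, // never reached: IPv6 is filtered first
      }
  }

  fn smart_dial_order(
      candidates: &[SocketAddr],
      our_public_ip: IpAddr,
      peer_public_ip: IpAddr,
  ) -> Vec<SocketAddr> {
      let same_network = our_public_ip == peer_public_ip;
      candidates
          .iter()
          .copied()
          .filter(|a| a.is_ipv4())                     // Phase 7 (IPv6) is disabled
          .filter(|a| same_network || !is_lan(a.ip())) // keep LAN only behind the same public IP
          .collect()
  }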

Acceptor NAT tickle:
- A-role sends a 1-byte UDP packet to each peer candidate BEFORE
  accepting. This opens the NAT pinhole for return traffic from
  the Dialer's IP — critical for address-restricted NATs that only
  allow inbound from IPs they've seen outbound traffic to.
- Uses SO_REUSEADDR on the same port as the quinn endpoint.

Direct timeout increased from 2s to 4s:
- Cross-network QUIC handshakes through CGNAT can take 2-3s
- 2s was too aggressive for 5G/LTE networks

Diagnostic fix:
- Record "timeout:4s" for candidates still in-flight when the
  timeout fires (previously these had no diagnostic entry)

5 new tests for smart_dial_order edge cases.
593 tests pass, 0 regressions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 15:42:02 +04:00
Siavash Sameni
c478224d67 fix(ui): remove buffer clear that wiped connect events
The callDebugBuffer.length=0 in showCallScreen() ran AFTER the
connect command returned, wiping all connect: events (path_negotiated,
race_start, race_done, candidate_diags). Only media: events survived
because they arrived after the clear.

Removed all automatic buffer clearing. The reverse().find() already
handles stale data by picking the most recent event. The manual
"Clear log" button (line 624) is the only way to clear now.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 15:25:13 +04:00
Siavash Sameni
16dcc75514 fix(ui): move buffer clear from call-end to call-start
Clearing callDebugBuffer in showConnectScreen() wiped all debug
events the moment a call ended, so the user saw empty logs. Moved
the clear to showCallScreen() instead — the buffer is reset at the
START of a new call, not the end. This way:

- After hanging up, all events from the call are still visible
- Starting a new call clears stale data from the previous one
- The reverse().find() for P2P badge still gets fresh data

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 15:17:16 +04:00
Siavash Sameni
db5751985e fix(ui): replace findLast with reverse().find() for WebView compat
findLast() requires Chrome 97+ / Android WebView 97+. Older Android
devices crash with TypeError in pollStatus(), killing all status
updates including the debug log. Use [...arr].reverse().find() which
works everywhere.

Also pass peerMappedAddr in the direct-call connect invoke.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 15:06:07 +04:00
Siavash Sameni
c0dd6c06ff feat(debug): per-candidate dial diagnostics in dual-path race
Added CandidateDiag struct to RaceResult with per-candidate:
- address attempted
- result (ok / skipped:ipv6 / error:reason)
- elapsed time in ms

Surfaced in call-debug events:
- connect:dual_path_race_start now includes dial_order + peer_mapped
- connect:dual_path_race_done now includes candidate_diags array

Upgraded dual_path tracing from debug to info for IPv6 skips and
dial failures so they appear in logcat/console.

Helps diagnose why P2P fails on specific networks (5G CGNAT,
address-restricted NATs, etc.).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 12:16:34 +04:00
Siavash Sameni
6805caae0e fix(ui): P2P badge showing stale status from previous call
The callDebugBuffer persisted across calls, so .find() returned the
path_negotiated event from Call 1 (P2P Direct) when rendering the
badge during Call 2 (Relay). Two fixes:

1. Clear callDebugBuffer in showConnectScreen() between calls
2. Use .findLast() instead of .find() so the most recent event wins

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 12:02:06 +04:00
Siavash Sameni
5a03da72d3 feat(ui): selectable NAT detection mode + netcheck Tauri command
detect_nat_type now accepts an optional `mode` parameter:
- "relay" — relay-based Reflect only (original behavior)
- "stun" — public STUN servers only (no relay needed)
- "both" — relay + STUN in parallel (default, highest confidence)

New run_netcheck Tauri command exposes the full network diagnostic
(NAT type, IPv4/v6, port mapping, relay latencies, port allocation)
to the JS frontend.

JS usage:
  await invoke('detect_nat_type', { relays, mode: 'stun' })
  await invoke('run_netcheck', { relays })

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 11:43:17 +04:00
Siavash Sameni
e3e63a40a0 feat(nat): wire hard NAT port prediction into call flow (#85)
End-to-end integration of sequential port prediction:

- place_call: spawns background detect_port_allocation() + sends
  HardNatProbe signal after offer (doesn't delay call setup)
- answer_call: same for AcceptTrusted answers (privacy mode skips)
- Signal recv loop: stashes HardNatProbe in SignalState.peer_hard_nat_probe
- connect: reads peer's probe, if Sequential{delta} runs predict_ports()
  and adds predicted addrs to PeerCandidates.local for the dual-path race
- parse_sequential_delta() helper for "sequential(delta=N)" strings
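
A sketch of the string helper, assuming a signed delta:
  fn parse_sequential_delta(s: &str) -> Option<i32> {
      s.strip_prefix("sequential(delta=")?
          .strip_suffix(')')?
          .parse()
          .ok()
  }
  // parse_sequential_delta("sequential(delta=2)") == Some(2)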

The full flow: both peers independently detect their NAT's port
allocation, exchange HardNatProbe via relay, and the connect command
uses the peer's sequence to predict which ports to dial — all before
the dual-path race starts.

588 tests pass, 0 regressions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 11:39:40 +04:00
Siavash Sameni
7b4bce69d5 docs: update all docs for hard NAT detection + relay wiring
- PROGRESS.md: hard NAT Phase A, relay cross-wiring, 588 tests
- ARCHITECTURE.md: hard NAT port prediction diagram + pattern table
- PRD-p2p-direct.md: Phase 8.6 split into a/b/c/d with status
- PRD-hard-nat.md: Phase A done, B signal ready, effort table updated
- PRD-netcheck.md: port_allocation field + probe documented

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 11:33:12 +04:00
Siavash Sameni
ec1bdf3cd5 feat(nat): hard NAT port allocation detection + prediction + HardNatProbe signal (#29)
Phase A of hard NAT traversal (PRD-hard-nat.md):

- PortAllocation enum: PortPreserving / Sequential{delta} / Random / Unknown
- detect_port_allocation(): sequential STUN probes from single socket,
  analyzes port sequence for allocation pattern
- classify_port_allocation(): pure function with jitter tolerance,
  wraparound handling, 60% threshold for noisy sequences
- predict_ports(): generates target port range from last_port + delta
- HardNatProbe signal message: carries port_sequence, allocation
  pattern, external_ip for peer coordination
- Relay forwards HardNatProbe to call peer
- Netcheck gains port_allocation field + format_report display

588 tests pass (17 new), 0 regressions.
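
A minimal sketch of the prediction step for a sequential allocator, assuming predict_ports() takes the last observed external port and the detected delta:
  fn predict_ports(last_port: u16, delta: i32, count: usize) -> Vec<u16> {
      (1..=count as i32)
          .filter_map(|i| u16::try_from(last_port as i32 + delta * i).ok())
          .filter(|p| *p >= 1024) // skip the reserved/well-known range
          .collect()
  }
  // predict_ports(40_000, 2, 3) == [40_002, 40_004, 40_006]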

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 11:29:35 +04:00
Siavash Sameni
ee14862376 docs: add PRD for hard NAT traversal (port prediction + birthday attack)
4-phase design:
A. Port allocation pattern detection (sequential vs random)
B. Sequential port prediction (~80% success, <2s)
C. Birthday attack for random NATs (98% success, ~10s)
D. Hybrid waterfall with background relay-to-direct upgrade

Taskmaster tasks #84-87 added.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 11:20:19 +04:00
Siavash Sameni
f83361895e docs: add PRDs for Phase 8 Tailscale-inspired features
5 new PRDs:
- PRD-public-stun.md — RFC 5389 STUN client
- PRD-portmap.md — NAT-PMP/PCP/UPnP port mapping
- PRD-ice-regather.md — Mid-call ICE re-gathering
- PRD-netcheck.md — Network diagnostic
- PRD-relay-selection.md — Region-based relay selection

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 11:08:46 +04:00
Siavash Sameni
0857d190ed chore: rename legacy Android build script to prevent accidental use
build-android-docker.sh builds the old Kotlin app in android/app/
(18M APK), not the live Tauri app (209M). Renamed to
build-android-docker-LEGACY.sh so it's never picked by accident.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 10:42:23 +04:00
Siavash Sameni
5d431c0721 fix(android): restore tauri::Emitter import for Docker builder toolchain
Edition 2024 on local macOS auto-resolves the Emitter trait, but the
Docker builder's Rust/Tauri version requires the explicit import for
AppHandle::emit() to resolve. The explicit import triggers a warning
locally, but keeping it avoids breaking the CI build.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 10:34:23 +04:00
Siavash Sameni
8fcf1be341 feat(nat): Tailscale-inspired STUN/ICE + port mapping + mid-call re-gathering (#28)
Phase 8: 5 new modules bringing NAT traversal close to Tailscale's approach.

- stun.rs: RFC 5389 STUN client — public server reflexive discovery,
  XOR-MAPPED-ADDRESS parsing, parallel probe with retry, STUN fallback
  in desktop try_reflect_own_addr()
- portmap.rs: NAT-PMP (RFC 6886) + PCP (RFC 6887) + UPnP IGD port
  mapping — gateway discovery, acquire/release/refresh lifecycle,
  new PeerCandidates.mapped candidate type in dial order
- ice_agent.rs: candidate lifecycle — gather(), re_gather(),
  apply_peer_update() with monotonic generation counter,
  CandidateUpdate signal message forwarded by relay
- netcheck.rs: comprehensive diagnostic — NAT type, IPv4/v6,
  port mapping availability, relay latencies, CLI --netcheck
- relay_map.rs: RTT-sorted relay map, preferred() selection,
  populate_from_ack() for RegisterPresenceAck.available_relays

Relay: CallRegistry stores + cross-wires caller/callee_mapped_addr
into CallSetup.peer_mapped_addr. Region config + available_relays
populated from federation peers in RegisterPresenceAck.

Desktop: place_call/answer_call call acquire_port_mapping() and
fill caller/callee_mapped_addr. STUN+relay combined NAT detection.

571 tests pass (66 new), 0 regressions, 0 warnings.
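
For reference, a sketch of XOR-MAPPED-ADDRESS decoding (RFC 5389) for the IPv4 case; the real stun.rs also handles IPv6, where the address is additionally XORed with the transaction id:
  use std::net::{Ipv4Addr, SocketAddrV4};

  const STUN_MAGIC_COOKIE: u32 = 0x2112_A442;

  fn decode_xor_mapped_v4(value: &[u8]) -> Option<SocketAddrV4> {
      if value.len() < 8 || value[1] != 0x01 {
          return None; // family 0x01 = IPv4
      }
      let port = u16::from_be_bytes([value[2], value[3]]) ^ (STUN_MAGIC_COOKIE >> 16) as u16;
      let addr = u32::from_be_bytes([value[4], value[5], value[6], value[7]]) ^ STUN_MAGIC_COOKIE;
      Some(SocketAddrV4::new(Ipv4Addr::from(addr), port))
  }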

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 10:17:17 +04:00
Siavash Sameni
9377a9009c feat(quality): bandwidth probing for upward adaptive quality (#10)
After 30s stable at a tier, the AdaptiveQualityController actively
probes the next tier up by switching the encoder and observing for 5s.
If loss/RTT stay within the target tier's thresholds, the upgrade
commits. If >1 bad report, the probe aborts with a 60s cooldown.

Probing is disabled on cellular (studio tiers aren't classified there)
and skipped when already at Studio64k (highest tier).

This complements the passive upgrade path (10 consecutive good reports)
by actively discovering that a path can sustain higher quality, rather
than waiting for the classification to drift upward.

New: ProbeState struct, check_probe() method, 4 constants, 5 tests.
377 tests passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 16:47:21 +04:00
Siavash Sameni
4471797edf docs: update all PRDs and PROGRESS to current state (2026-04-13)
Updated 6 PRDs with implementation status:
- PRD-adaptive-quality: P2P quality done, bandwidth probing remains
- PRD-protocol-analyzer: all 5 phases documented
- PRD-relay-concurrency: DashMap + clone-before-send done
- PRD-p2p-direct: P2P adaptive quality update
- PRD-engine-dedup: all phases done
- PROGRESS.md: test count 372+, 3 new change sections

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 16:40:56 +04:00
Siavash Sameni
425c67a08a feat(analyzer): replay, HTML report, encrypted decode stub (#15, #16, #17)
#15 - Replay mode: --replay <file.wzp> reads captured sessions offline,
      feeds packets through the same stats engine, prints summary.
      CaptureReader mirrors CaptureWriter's binary format.

#16 - HTML report: --html <report.html> generates self-contained HTML
      with Chart.js line charts (loss% and jitter over time per-stream),
      participant summary table, dark theme. Works with live sessions
      (after exit) or replay mode.

#17 - Encrypted decode: --key <hex> flag accepted and stored. Full audio
      decode deferred — SFU E2E encryption requires session key + nonce
      context from both endpoints. Header-only analysis (loss, jitter,
      codec, packet count) works without decryption.

Usage:
  wzp-analyzer --replay session.wzp --html report.html
  wzp-analyzer relay:4433 --room test --capture out.wzp --html report.html

372 tests passing, 0 regressions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 16:31:28 +04:00
Siavash Sameni
88ca3e099a feat: wzp-analyzer binary — protocol analyzer with TUI (#13, #14, #15)
New binary: wzp-analyzer joins a room as a passive observer and displays
real-time per-participant quality metrics.

Features:
- Passive observation: connects to relay, receives all media, never sends
- Participant detection: identifies senders by sequence number streams
- Per-participant stats: packets, loss%, jitter, codec, codec switches
- TUI mode (ratatui): color-coded table (green/yellow/red by loss),
  10 FPS refresh, session header, quit with q/Ctrl+C
- No-TUI mode: prints stats to stderr every 2s (for headless/CI use)
- Capture mode: binary .wzp format with microsecond timestamps for
  offline replay (magic WZP\x01, JSON header, per-packet records)
- Session summary on exit

Usage:
  wzp-analyzer 193.180.213.68:4433 --room general
  wzp-analyzer 193.180.213.68:4433 --room general --no-tui --duration 60
  wzp-analyzer 193.180.213.68:4433 --room general --capture session.wzp

372 tests passing, 0 regressions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 16:26:46 +04:00
Siavash Sameni
1e82811cc1 feat(p2p): adaptive quality on direct calls (#23)
P2P calls now adapt codec quality based on observed network conditions,
matching what relay calls already had.

Three-layer implementation:
- QualityReport::from_path_stats(): construct reports from local quinn
  stats (loss%, RTT, jitter) without needing relay-generated reports
- CallEncoder.pending_quality_report: one-shot attachment to next
  source packet (consumed on encode, not repeated)
- Engine send tasks: generate quality report every 50 frames (~1s)
  from quinn_path_stats() and attach via set_pending_quality_report()
- Engine recv tasks: self-observe from own QUIC path stats every 50
  packets, feed to AdaptiveQualityController for P2P adaptation
  (works even if peer isn't sending quality reports yet)

Both relay and P2P calls now have adaptive quality. On relay calls,
both peer-sent reports AND local observations feed the controller.
Hysteresis (3 consecutive bad reports to downgrade) prevents thrashing.

372 tests passing (+4 new: from_path_stats encoding, clamping, zero
values, encoder quality report attachment).
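
A hypothetical shape for the constructor, based only on the clamping behaviour the new tests describe; the real QualityReport fields and wire encoding live in the shared protocol crate:
  #[derive(Debug, Clone, Copy)]
  pub struct QualityReport {
      pub loss_pct: u8,   // clamped to 0..=100
      pub rtt_ms: u16,
      pub jitter_ms: u16,
  }

  impl QualityReport {
      pub fn from_path_stats(loss_pct: f32, rtt_ms: f64, jitter_ms: f64) -> Self {
          Self {
              loss_pct: loss_pct.clamp(0.0, 100.0).round() as u8,
              rtt_ms: rtt_ms.clamp(0.0, f64::from(u16::MAX)) as u16,
              jitter_ms: jitter_ms.clamp(0.0, f64::from(u16::MAX)) as u16,
          }
      }
  }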

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 16:14:06 +04:00
Siavash Sameni
81b5522942 refactor: clap CLI parser, safety docs, dead code docs, cross-refs
Audit items 6, 8, 9, 10:

#6 - Relay CLI: replaced 154-line manual parse_args() with clap derive
     (13 flags/options preserved, auto --help, --version from build hash)
#8 - wzp-native: added # Safety docs to all 3 unsafe extern "C" fns
#9 - wzp-crypto: documented x25519_static_secret/public as reserved for
     future static-key federation auth (not dead code, intentionally unused)
#10 - Cross-references between quality.rs ↔ dred_tuner.rs module docs

368 tests passing, 0 regressions.
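
A minimal clap derive skeleton of the kind that replaced parse_args(); the flags shown are illustrative, not the relay's actual options:
  use clap::Parser;

  #[derive(Parser)]
  #[command(version, about = "wzp relay")]
  struct Cli {
      /// Address to listen on
      #[arg(long, default_value = "0.0.0.0:4433")]
      listen: String,
      /// Enable the debug tap
      #[arg(long)]
      debug_tap: bool,
  }

  fn main() {
      let cli = Cli::parse();
      println!("listen={} debug_tap={}", cli.listen, cli.debug_tap);
  }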

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 15:40:49 +04:00
Siavash Sameni
d539a6dfb9 test(federation): 29 tests for federation.rs (was 0), engine dedup PRD
Federation test coverage (crates/wzp-relay/tests/federation.rs):
- room_hash: determinism, uniqueness, length, case sensitivity (5)
- is_global_room: static config, call-* implicit, exact match (3)
- resolve_global_room: static + call-* resolution (2)
- global_room_hash: canonical names, fallthrough, independence (4)
- forward_to_peers: zero peers, live QUIC datagram delivery (2)
- broadcast_signal: zero peers, live QUIC signal delivery (2)
- send_signal_to_peer: unknown fingerprint error (1)
- peer lookup: fingerprint normalization, IP, trust priority (5)
- accessors: local_tls_fp, cross_relay_tx, remote_participants (3)
- integration: full media egress over live QUIC link (1)
- edge case: exact room match (1)

Total relay tests: 120 (was 91). Full suite: 368 passing.

Also added PRD-engine-dedup.md for the engine.rs helper extraction
completed in the previous commit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 15:35:04 +04:00
Siavash Sameni
ba12aae439 refactor: extract shared engine helpers, federation clone-before-send, constants
Engine deduplication (PRD-engine-dedup.md):
- build_call_config(): shared CallConfig construction (was 23 lines × 2)
- codec_to_profile(): shared CodecId → QualityProfile mapping (was 19 lines × 2)
- run_signal_task(): shared signal handler (was 48 lines × 2)
- Net -39 lines from engine.rs, 6 duplicated blocks → single-line calls

Quick wins from REFACTOR-codebase-audit.md:
- 6 magic number constants extracted (CAPTURE_POLL_MS, RECV_TIMEOUT_MS, etc.)
- DRED_POLL_INTERVAL moved from 2 local defs to 1 module-level const
- federation.rs: forward_to_peers, broadcast_signal, send_signal_to_peer
  now clone peer list and release lock before sending (was holding Mutex
  across async I/O — last lock-during-send pattern eliminated)
- main.rs: close_transport() helper replaces 12 silent .ok() calls with
  debug-level logging

314 tests passing, 0 regressions.
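
The clone-before-send pattern in miniature, with placeholder types (the real peer list is the federation peer_links map):
  use std::sync::Mutex;

  #[derive(Clone)]
  struct PeerLink;
  impl PeerLink {
      async fn send_datagram(&self, _bytes: Vec<u8>) { /* QUIC send elided */ }
  }

  async fn forward_to_peers(peers: &Mutex<Vec<PeerLink>>, datagram: Vec<u8>) {
      // Lock only long enough to clone the list, then drop the guard...
      let targets: Vec<PeerLink> = peers.lock().unwrap().clone();
      // ...so no lock is held across the await points below.
      for peer in targets {
          peer.send_datagram(datagram.clone()).await;
      }
  }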

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 15:22:44 +04:00
Siavash Sameni
fdb78e08bd docs: full codebase refactoring audit with prioritized suggestions
Comprehensive analysis across all 8 crates + Tauri engine covering:
- engine.rs: 35% duplication between Android/desktop (350+ lines)
- SignalMessage: 36 variants mixing orthogonal concerns
- federation.rs: zero test coverage on 1,132 lines of complex logic
- peer_links: lock held across async sends (last lock-during-I/O)
- Magic numbers, error handling, CLI parsing, unsafe docs
- Priority matrix: 10 items ranked by effort/impact/risk

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 12:35:59 +04:00
Siavash Sameni
3a51db998a docs: relay concurrency refactor guide + PRD update for DashMap
REFACTOR-relay-concurrency.md: complete post-DashMap analysis with
current lock inventory, 4 prioritized suggestions (clone-before-send,
peer_links DashMap, quality atomics, arc-swap snapshots), decision
matrix, and concurrency diagram.

PRD-relay-concurrency.md: updated to recommend DashMap as primary
approach (was Option A per-room locks).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 12:27:26 +04:00
Siavash Sameni
a52b011fb5 feat(relay): replace global Mutex<RoomManager> with DashMap sharding
Eliminates the single-lock bottleneck for media forwarding. Before:
all participants across all rooms competed for one Mutex. Now rooms
are stored in DashMap (64 internal shards with per-shard RwLocks).

Changes:
- RoomManager.rooms: HashMap → DashMap<String, Room>
- Per-room quality tracking (qualities, current_tier moved into Room)
- Arc<Mutex<RoomManager>> → Arc<RoomManager> everywhere
- 20 .lock().await sites removed across room.rs, main.rs, federation.rs, ws.rs
- federation forward_to_peers: clone peer list, release lock, then send
- ACL uses std::sync::Mutex (rarely accessed, non-async)

Concurrency improvement:
- Before: 100 rooms × 10 people = 1000 tasks → 1 Mutex
- After: distributed across 64 DashMap shards, ~15 tasks per shard avg
- Rooms are fully independent — room A never blocks room B

314 tests passing, 0 regressions.
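
The shape of the change in miniature, with illustrative field names:
  use dashmap::DashMap;

  #[derive(Default)]
  struct Room {
      participants: Vec<String>,
  }

  struct RoomManager {
      rooms: DashMap<String, Room>, // sharded internally; each shard has its own RwLock
  }

  impl RoomManager {
      fn join(&self, room: &str, fingerprint: String) {
          // Takes only the shard lock for this key; other rooms stay untouched.
          self.rooms
              .entry(room.to_string())
              .or_default()
              .participants
              .push(fingerprint);
      }
  }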

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 12:17:57 +04:00
Siavash Sameni
2514151a89 docs: PRD for relay concurrency — per-room lock sharding
Full analysis of relay lock contention with precise inventory of every
lock acquisition in the hot path. Evaluates 4 design options:
A) Per-room Arc<Mutex<Room>> (recommended — 100x improvement for multi-room)
B) DashMap (good but less explicit)
C) Channel-based fan-out (over-engineered for current scale)
D) Snapshot-on-change via arc-swap (best perf, more complex)

Phase 1: per-room locks, Phase 2: federation lock fix, Phase 3: quality
tracking out of critical path. Estimated 1.5-2.5 days total.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 12:01:21 +04:00
Siavash Sameni
f265fd772d docs: relay concurrency model, Opus6k fix, build script fixes
- ARCHITECTURE.md: new "Relay Concurrency Model" section documenting
  threading, shared state locking table, scaling characteristics, and
  the RoomManager Mutex as primary bottleneck
- PROGRESS.md: Opus6k frame starvation fix, build script fixes
- PRD-dred-integration.md: Opus6k frame starvation bug documentation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 11:54:37 +04:00
Siavash Sameni
9ae9441de4 fix(audio): check capture ring available before read (fixes Opus6k choppy)
Partial reads from the capture ring consumed samples that were then
discarded when the send loop retried from buf[0]. For 20ms codecs this
was invisible (single Oboe burst fills 960 samples in one read), but
40ms codecs (Opus6k, 1920 samples) needed 2 bursts — the first partial
read consumed 960 real samples and threw them away.

Result: Opus6k produced ~11 frames/s instead of 25 (~44% of expected).

Fix: expose wzp_native_audio_capture_available() and check it before
reading, matching the desktop capture_ring.available() pattern. Partial
reads no longer occur because we only read when enough samples exist.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 11:46:15 +04:00
Siavash Sameni
d9e7e72978 docs: update PROGRESS, PRDs for completed tasks #9, #11, #12, #27
- PROGRESS.md: add 2026-04-13 section with 5-tier quality, QualityDirective
  handling, debug tap enhancements, dual_path fix, keystore sync
- PRD-coordinated-codec.md: Phase 3 marked complete (client directive handling)
- PRD-adaptive-quality.md: milestone table updated with Done/Pending status

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 11:34:01 +04:00
Siavash Sameni
8ff0c548a7 fix(audio): update frame_samples on codec profile switch, fix buf sizing
frame_samples was immutable — when adaptive quality switched from 20ms
(Opus24k, 960 samples) to 40ms (Opus6k, 1920 samples), the send loop
kept reading 960 samples and feeding half-sized frames to the encoder.
This caused Opus6k to produce ~11 frames/s instead of 25, making audio
choppy.

Fix:
- frame_samples is now mut and updated on profile switch
- buf sized for max frame (1920) with frame_samples-bounded slices
- RMS, mute, encode, and capture reads all use &buf[..frame_samples]
- Applied to both Android and desktop send tasks

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 11:33:02 +04:00
Siavash Sameni
f17420aa98 fix(build): sync keystores from persistent cache before build
Keystores are gitignored so git reset --hard deletes them. The build
script now copies them from a persistent $BASE_DIR/data/keystore/ cache
into the source tree before building. This ensures both primary and alt
servers always have signing keys available.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 11:11:28 +04:00
Siavash Sameni
d424515542 feat: 5-tier quality classification, QualityDirective handling, debug tap stats
- Extend Tier enum from 3 to 6 levels: Studio64k/48k/32k + Good +
  Degraded + Catastrophic with asymmetric hysteresis (down:3, up:5,
  studio:10)
- Handle QualityDirective signals in both desktop and Android engines
  — relay-coordinated codec switching now works end-to-end
- Add periodic TAP STATS to debug tap: packets in/out, fan-out avg,
  seq gaps, codecs seen (every 5s)
- Mark task #2 done (ParticipantInfo in federation signals already
  implemented)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 10:23:48 +04:00
Siavash Sameni
ea5fc17c34 fix(relay): debug tap signal logging, dual_path test regression, PRD updates
- Add log_signal() and log_event() to DebugTap for RoomUpdate,
  QualityDirective, join/leave lifecycle events (task #11)
- Fix dual_path.rs Phase 7 regression: add missing ipv6_endpoint arg
  to 3 race() call sites
- Update PRDs to reflect actual implementation status: mark adaptive
  quality, coordinated codec, P2P, network awareness, protocol analyzer
- Update PROGRESS.md with QualityDirective gap and dual_path regression

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 09:54:52 +04:00
Siavash Sameni
1a7dd935ee fix(build): add zipalign + apksigner signing to build.sh
build.sh was producing unsigned APKs because it reimplemented the Docker
build inline without the signing step from build-tauri-android.sh. Now
uses the same pipeline: find keystore (release preferred, debug fallback),
zipalign -f 4, apksigner sign with keystore credentials.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 20:13:20 +04:00
Siavash Sameni
a7c2261b70 fix(build): clean stale APKs before build, prefer release APK on upload
find was picking up a cached 384MB debug APK over the fresh 25MB release
APK because the old file was listed first. Now:
1. Delete all APKs before the build starts (clean slate)
2. On upload, prefer *release*.apk over any other match

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 20:08:06 +04:00
Siavash Sameni
eca0bb7531 Merge branch 'opus-DRED-v2'
2026-04-12 19:57:35 +04:00
Siavash Sameni
6f43415285 merge opus-DRED-v2 into main
50 commits: BT audio routing, network change detection, Hangup call_id,
per-arch APK builds, setCommunicationDevice API 31+, deferred
MODE_IN_COMMUNICATION, Oboe BT mode, build signing, doc updates.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 17:41:57 +04:00
Siavash Sameni
d36feb2b59 ci: skip build on CI-only file changes
Add paths-ignore for .gitea/** so build.yml doesn't waste runner time
when only workflow files are modified.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 22:12:31 +04:00
Siavash Sameni
baf82d935b ci: add GitHub mirror workflow
Automatically pushes branches and tags to github.com:manawenuz/wzp.git
on every push to Forgejo. Uses GH_SSH_KEY secret for authentication.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 19:50:39 +04:00
Siavash Sameni
6eb10327c1 fix: use jq instead of python3 for JSON parsing in CI
ubuntu:24.04 doesn't have python3 installed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 15:47:04 +04:00
Siavash Sameni
50339542fa feat: upload build artifacts as Forgejo releases via API
JS-based upload-artifact action doesn't work with act runner.
Use curl to create a pre-release and attach the tarball instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 15:36:28 +04:00
Siavash Sameni
c67fa18f14 fix: add missing QualityProfile import in featherchat test
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 15:26:54 +04:00
Siavash Sameni
6c5c4cb671 fix: add libssl-dev for openssl-sys build in CI
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 15:16:39 +04:00
Siavash Sameni
8816f13df8 fix: use stable Rust toolchain — time crate requires rustc >= 1.88
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 15:05:56 +04:00
Siavash Sameni
3804b0bf46 fix: use plain HTTPS for featherChat submodule (now public)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 14:56:42 +04:00
Siavash Sameni
234f3c4bfe fix: use HTTPS + token for featherChat submodule clone in CI
SSH has no keys in the container. Use exact URL remap to
https://<token>@git.tbs.amn.gg/manawenuz/featherChat.git

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 14:50:24 +04:00
Siavash Sameni
e97f278390 fix: remap submodule to Forgejo SSH URL for CI clone
Use ssh://git@git.tbs.amn.gg:2222/ instead of HTTPS token auth
which gets 403 on cross-repo access.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 14:48:08 +04:00
Siavash Sameni
f6a77da948 fix: init submodules in CI — remap SSH URLs to Forgejo HTTPS with token
wzp-crypto depends on deps/featherchat (git submodule). Remap the
origin SSH URL to the Forgejo HTTPS mirror with github.token auth.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 14:45:25 +04:00
Siavash Sameni
82015a78af fix: authenticate git clone with GITHUB_TOKEN for private repo
The act runner can't clone a private repo over HTTPS without credentials.
Inject the auto-provided github.token into the clone URL.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 14:34:04 +04:00
Siavash Sameni
cb13af8abd fix: remove all JS-based actions for Forgejo act runner compatibility
act runner uses bare ubuntu:24.04 without Node.js — actions/checkout,
actions/upload-artifact, etc. all fail. Replace with plain git clone
and shell commands.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 14:31:43 +04:00
Siavash Sameni
0b8276b9c7 fix: CI workflow for Forgejo act runner — drop container, install Rust via rustup
The act runner doesn't have Node.js in the rust:1-bookworm container,
breaking JS-based actions (checkout, cache, upload-artifact).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 14:29:31 +04:00
61 changed files with 12384 additions and 2422 deletions

Cargo.lock (generated)

@@ -51,6 +51,12 @@ dependencies = [
"alloc-no-stdlib",
]
[[package]]
name = "allocator-api2"
version = "0.2.21"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "683d7910e743518b0e34f1186f92494becacb047c7b6bf616c96772180fef923"
[[package]]
name = "alsa"
version = "0.9.1"
@@ -88,6 +94,56 @@ dependencies = [
"libc",
]
[[package]]
name = "anstream"
version = "1.0.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "824a212faf96e9acacdbd09febd34438f8f711fb84e09a8916013cd7815ca28d"
dependencies = [
"anstyle",
"anstyle-parse",
"anstyle-query",
"anstyle-wincon",
"colorchoice",
"is_terminal_polyfill",
"utf8parse",
]
[[package]]
name = "anstyle"
version = "1.0.14"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "940b3a0ca603d1eade50a4846a2afffd5ef57a9feac2c0e2ec2e14f9ead76000"
[[package]]
name = "anstyle-parse"
version = "1.0.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "52ce7f38b242319f7cabaa6813055467063ecdc9d355bbb4ce0c68908cd8130e"
dependencies = [
"utf8parse",
]
[[package]]
name = "anstyle-query"
version = "1.1.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "40c48f72fd53cd289104fc64099abca73db4166ad86ea0b4341abe65af83dadc"
dependencies = [
"windows-sys 0.61.2",
]
[[package]]
name = "anstyle-wincon"
version = "3.0.11"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "291e6a250ff86cd4a820112fb8898808a366d8f9f58ce16d1f538353ad55747d"
dependencies = [
"anstyle",
"once_cell_polyfill",
"windows-sys 0.61.2",
]
[[package]]
name = "anyhow"
version = "1.0.102"
@@ -172,7 +228,7 @@ dependencies = [
"futures-lite",
"parking",
"polling",
"rustix",
"rustix 1.1.4",
"slab",
"windows-sys 0.61.2",
]
@@ -203,7 +259,7 @@ dependencies = [
"cfg-if",
"event-listener",
"futures-lite",
"rustix",
"rustix 1.1.4",
]
[[package]]
@@ -229,7 +285,7 @@ dependencies = [
"cfg-if",
"futures-core",
"futures-io",
"rustix",
"rustix 1.1.4",
"signal-hook-registry",
"slab",
"windows-sys 0.61.2",
@@ -723,6 +779,21 @@ dependencies = [
"toml 0.9.12+spec-1.1.0",
]
[[package]]
name = "cassowary"
version = "0.3.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "df8670b8c7b9dae1793364eafadf7239c40d669904660c5960d74cfd80b46a53"
[[package]]
name = "castaway"
version = "0.2.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "dec551ab6e7578819132c713a93c022a05d60159dc86e7a7050223577484c55a"
dependencies = [
"rustversion",
]
[[package]]
name = "cc"
version = "1.2.60"
@@ -851,7 +922,7 @@ checksum = "4ea181bf566f71cb9a5d17a59e1871af638180a18fb0035c92ae62b705207123"
dependencies = [
"atty",
"bitflags 1.3.2",
"clap_lex",
"clap_lex 0.2.4",
"indexmap 1.9.3",
"once_cell",
"strsim 0.10.0",
@@ -859,6 +930,40 @@ dependencies = [
"textwrap",
]
[[package]]
name = "clap"
version = "4.6.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b193af5b67834b676abd72466a96c1024e6a6ad978a1f484bd90b85c94041351"
dependencies = [
"clap_builder",
"clap_derive",
]
[[package]]
name = "clap_builder"
version = "4.6.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "714a53001bf66416adb0e2ef5ac857140e7dc3a0c48fb28b2f10762fc4b5069f"
dependencies = [
"anstream",
"anstyle",
"clap_lex 1.1.0",
"strsim 0.11.1",
]
[[package]]
name = "clap_derive"
version = "4.6.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1110bd8a634a1ab8cb04345d8d878267d57c3cf1b38d91b71af6686408bbca6a"
dependencies = [
"heck 0.5.0",
"proc-macro2",
"quote",
"syn 2.0.117",
]
[[package]]
name = "clap_lex"
version = "0.2.4"
@@ -868,6 +973,12 @@ dependencies = [
"os_str_bytes",
]
[[package]]
name = "clap_lex"
version = "1.1.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "c8d4a3bb8b1e0c1050499d1815f5ab16d04f0959b233085fb31653fbfc9d98f9"
[[package]]
name = "cmake"
version = "0.1.58"
@@ -883,6 +994,12 @@ version = "0.3.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "2cefd04ca4a2f096acf5f44da5e5931436d030a620901f1fe8fa773e6b9de65b"
[[package]]
name = "colorchoice"
version = "1.0.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1d07550c9036bf2ae0c684c4297d503f838287c83c53686d05370d0e139ae570"
[[package]]
name = "combine"
version = "4.6.7"
@@ -893,6 +1010,20 @@ dependencies = [
"memchr",
]
[[package]]
name = "compact_str"
version = "0.8.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "3b79c4069c6cad78e2e0cdfcbd26275770669fb39fd308a752dc110e83b9af32"
dependencies = [
"castaway",
"cfg-if",
"itoa",
"rustversion",
"ryu",
"static_assertions",
]
[[package]]
name = "concurrent-queue"
version = "2.5.0"
@@ -1050,6 +1181,31 @@ version = "0.8.21"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d0a5c400df2834b80a4c3327b3aad3a4c4cd4de0629063962b03235697506a28"
[[package]]
name = "crossterm"
version = "0.28.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "829d955a0bb380ef178a640b91779e3987da38c9aea133b20614cfed8cdea9c6"
dependencies = [
"bitflags 2.11.0",
"crossterm_winapi",
"mio",
"parking_lot",
"rustix 0.38.44",
"signal-hook",
"signal-hook-mio",
"winapi",
]
[[package]]
name = "crossterm_winapi"
version = "0.9.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "acdd7c62a3665c7f6830a51635d9ac9b23ed385797f70a83bb8bafe9c572ab2b"
dependencies = [
"winapi",
]
[[package]]
name = "crunchy"
version = "0.2.4"
@@ -1191,6 +1347,20 @@ dependencies = [
"syn 2.0.117",
]
[[package]]
name = "dashmap"
version = "6.1.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5041cc499144891f3790297212f32a74fb938e5136a14943f338ef9e0ae276cf"
dependencies = [
"cfg-if",
"crossbeam-utils",
"hashbrown 0.14.5",
"lock_api",
"once_cell",
"parking_lot_core",
]
[[package]]
name = "dasp"
version = "0.11.0"
@@ -2341,12 +2511,20 @@ version = "0.12.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8a9ee70c43aaf417c914396645a0fa852624801b24ebb7ae78fe8272889ac888"
[[package]]
name = "hashbrown"
version = "0.14.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e5274423e17b7c9fc20b6e7e208532f9b19825d82dfd615708b70edd83df41f1"
[[package]]
name = "hashbrown"
version = "0.15.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9229cfe53dfd69f0609a49f65461bd93001ea1ef889cd5529dd176593f5338a1"
dependencies = [
"allocator-api2",
"equivalent",
"foldhash 0.1.5",
]
@@ -2756,6 +2934,15 @@ dependencies = [
"serde_core",
]
[[package]]
name = "indoc"
version = "2.0.7"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "79cf5c93f93228cf8efb3ba362535fb11199ac548a09ce117c9b1adc3030d706"
dependencies = [
"rustversion",
]
[[package]]
name = "infer"
version = "0.19.0"
@@ -2774,6 +2961,19 @@ dependencies = [
"generic-array",
]
[[package]]
name = "instability"
version = "0.3.12"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5eb2d60ef19920a3a9193c3e371f726ec1dafc045dac788d0fb3704272458971"
dependencies = [
"darling",
"indoc",
"proc-macro2",
"quote",
"syn 2.0.117",
]
[[package]]
name = "ipnet"
version = "2.12.0"
@@ -2809,6 +3009,12 @@ dependencies = [
"once_cell",
]
[[package]]
name = "is_terminal_polyfill"
version = "1.70.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a6cb138bb79a146c1bd460005623e142ef0181e3d0219cb493e02f7d08a35695"
[[package]]
name = "itertools"
version = "0.13.0"
@@ -3050,6 +3256,12 @@ dependencies = [
"libc",
]
[[package]]
name = "linux-raw-sys"
version = "0.4.15"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d26c52dbd32dccf2d10cac7725f8eae5296885fb5703b261f7d0a0739ec807ab"
[[package]]
name = "linux-raw-sys"
version = "0.12.1"
@@ -3077,6 +3289,15 @@ version = "0.4.29"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5e5032e24019045c762d3c0f28f5b6b8bbf38563a65908389bf7978758920897"
[[package]]
name = "lru"
version = "0.12.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "234cf4f4a04dc1f57e24b96cc0cd600cf2af460d4161ac5ecdd0af8e1f3b2a38"
dependencies = [
"hashbrown 0.15.5",
]
[[package]]
name = "lru-slab"
version = "0.1.2"
@@ -3227,6 +3448,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "50b7e5b27aa02a74bac8c3f23f448f8d87ff11f92d3aac1a6ed369ee08cc56c1"
dependencies = [
"libc",
"log",
"wasi 0.11.1+wasi-snapshot-preview1",
"windows-sys 0.61.2",
]
@@ -3335,7 +3557,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "805d5964d1e7a0006a7fdced7dae75084d66d18b35f1dfe81bd76929b1f8da0c"
dependencies = [
"anyhow",
"clap",
"clap 3.2.25",
"dasp",
"dasp_interpolate",
"dasp_ring_buffer",
@@ -3612,6 +3834,12 @@ version = "1.21.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9f7c3e4beb33f85d45ae3e3a1792185706c8e16d043238c593331cc7cd313b50"
[[package]]
name = "once_cell_polyfill"
version = "1.70.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "384b8ab6d37215f3c5301a95a4accb5d64aa607f1fcb26a11b5303878451b4fe"
[[package]]
name = "opaque-debug"
version = "0.3.1"
@@ -3778,6 +4006,12 @@ dependencies = [
"windows-link 0.2.1",
]
[[package]]
name = "paste"
version = "1.0.15"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "57c0d7b74b563b49d38dae00a0c37d4d6de9b432382b2892f0574ddcae73fd0a"
[[package]]
name = "pathdiff"
version = "0.2.3"
@@ -4056,7 +4290,7 @@ dependencies = [
"concurrent-queue",
"hermit-abi 0.5.2",
"pin-project-lite",
"rustix",
"rustix 1.1.4",
"windows-sys 0.61.2",
]
@@ -4421,6 +4655,27 @@ version = "2.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "4d4215fb79ef19442a0c71616aabb0715a386e6a16ed9031775ee3e3f20e7502"
[[package]]
name = "ratatui"
version = "0.29.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "eabd94c2f37801c20583fc49dd5cd6b0ba68c716787c2dd6ed18571e1e63117b"
dependencies = [
"bitflags 2.11.0",
"cassowary",
"compact_str",
"crossterm",
"indoc",
"instability",
"itertools",
"lru",
"paste",
"strum",
"unicode-segmentation",
"unicode-truncate",
"unicode-width 0.2.0",
]
[[package]]
name = "raw-window-handle"
version = "0.6.2"
@@ -4651,6 +4906,19 @@ dependencies = [
"transpose",
]
[[package]]
name = "rustix"
version = "0.38.44"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "fdb5bc1ae2baa591800df16c9ca78619bf65c0488b41b96ccec5d11220d8c154"
dependencies = [
"bitflags 2.11.0",
"errno",
"libc",
"linux-raw-sys 0.4.15",
"windows-sys 0.59.0",
]
[[package]]
name = "rustix"
version = "1.1.4"
@@ -4660,7 +4928,7 @@ dependencies = [
"bitflags 2.11.0",
"errno",
"libc",
"linux-raw-sys",
"linux-raw-sys 0.12.1",
"windows-sys 0.61.2",
]
@@ -5191,6 +5459,17 @@ dependencies = [
"signal-hook-registry",
]
[[package]]
name = "signal-hook-mio"
version = "0.2.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b75a19a7a740b25bc7944bdee6172368f988763b744e3d4dfe753f6b4ece40cc"
dependencies = [
"libc",
"mio",
"signal-hook",
]
[[package]]
name = "signal-hook-registry"
version = "1.4.8"
@@ -5325,6 +5604,12 @@ version = "1.2.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "6ce2be8dc25455e1f91df71bfa12ad37d7af1092ae736f3a6cd0e37bc7810596"
[[package]]
name = "static_assertions"
version = "1.1.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a2eb9349b6444b326872e140eb1cf5e7c522154d69e7a0ffb0fb81c06b37543f"
[[package]]
name = "strength_reduce"
version = "0.2.4"
@@ -5392,6 +5677,28 @@ version = "0.11.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7da8b5736845d9f2fcb837ea5d9e2628564b3b043a70948a3f0b778838c5fb4f"
[[package]]
name = "strum"
version = "0.26.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8fec0f0aef304996cf250b31b5a10dee7980c85da9d759361292b8bca5a18f06"
dependencies = [
"strum_macros",
]
[[package]]
name = "strum_macros"
version = "0.26.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "4c6bee85a5a24955dc440386795aa378cd9cf82acd5f764469152d2270e581be"
dependencies = [
"heck 0.5.0",
"proc-macro2",
"quote",
"rustversion",
"syn 2.0.117",
]
[[package]]
name = "subtle"
version = "2.6.1"
@@ -5844,7 +6151,7 @@ dependencies = [
"fastrand",
"getrandom 0.4.2",
"once_cell",
"rustix",
"rustix 1.1.4",
"windows-sys 0.61.2",
]
@@ -6481,6 +6788,29 @@ version = "1.13.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9629274872b2bfaf8d66f5f15725007f635594914870f65218920345aa11aa8c"
[[package]]
name = "unicode-truncate"
version = "1.1.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b3644627a5af5fa321c95b9b235a72fd24cd29c648c2c379431e6628655627bf"
dependencies = [
"itertools",
"unicode-segmentation",
"unicode-width 0.1.14",
]
[[package]]
name = "unicode-width"
version = "0.1.14"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7dd6e30e90baa6f72411720665d41d89b9a3d039dc45b8faea1ddd07f617f6af"
[[package]]
name = "unicode-width"
version = "0.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1fc81956842c57dac11422a97c3b8195a1ff727f06e85c84ed2e8aa277c9a0fd"
[[package]]
name = "unicode-xid"
version = "0.2.6"
@@ -6540,6 +6870,12 @@ version = "1.0.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b6c140620e7ffbb22c2dee59cafe6084a59b5ffc27a8859a5f0d494b5d52b6be"
[[package]]
name = "utf8parse"
version = "0.2.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "06abde3611657adf66d383f00b093d7faecc7fa57071cce2578660c9f1010821"
[[package]]
name = "uuid"
version = "1.23.0"
@@ -7664,13 +8000,18 @@ dependencies = [
"async-trait",
"bytes",
"chrono",
"clap 4.6.0",
"coreaudio-rs",
"cpal",
"crossterm",
"if-addrs",
"libc",
"rand 0.8.5",
"ratatui",
"rustls",
"serde",
"serde_json",
"socket2 0.5.10",
"tokio",
"tracing",
"tracing-subscriber",
@@ -7785,6 +8126,8 @@ dependencies = [
"axum 0.7.9",
"bytes",
"chrono",
"clap 4.6.0",
"dashmap",
"dirs",
"futures-util",
"prometheus",
@@ -7942,7 +8285,7 @@ dependencies = [
"hex",
"libc",
"ordered-stream",
"rustix",
"rustix 1.1.4",
"serde",
"serde_repr",
"tracing",

View File

@@ -1209,6 +1209,15 @@ async fn run_call(
stats.room_participant_count = count;
stats.room_participants = members;
}
Ok(Some(SignalMessage::QualityDirective { recommended_profile, reason })) => {
let idx = profile_to_index(&recommended_profile);
info!(
codec = ?recommended_profile.codec,
reason = reason.as_deref().unwrap_or(""),
"relay quality directive: switching profile"
);
pending_profile_recv.store(idx, Ordering::Release);
}
Ok(Some(msg)) => {
info!("signal received: {:?}", std::mem::discriminant(&msg));
}

View File

@@ -21,6 +21,9 @@ anyhow = "1"
serde = { workspace = true }
serde_json = "1"
chrono = "0.4"
clap = { version = "4", features = ["derive"] }
ratatui = "0.29"
crossterm = "0.28"
rustls = { version = "0.23", default-features = false, features = ["ring", "std"] }
cpal = { version = "0.15", optional = true }
libc = "0.2"
@@ -30,6 +33,8 @@ libc = "0.2"
# through the WAN reflex addr (which many consumer NATs, including
# MikroTik's default masquerade, don't support).
if-addrs = "0.13"
rand = { workspace = true }
socket2 = "0.5"
# coreaudio-rs is Apple-framework-only; gate it to macOS so enabling
# the `vpio` feature from a non-macOS target builds cleanly instead of
@@ -99,6 +104,10 @@ linux-aec = ["dep:webrtc-audio-processing"]
name = "wzp-client"
path = "src/cli.rs"
[[bin]]
name = "wzp-analyzer"
path = "src/analyzer.rs"
[[bin]]
name = "wzp-bench"
path = "src/bench_cli.rs"

View File

@@ -0,0 +1,952 @@
//! WarzonePhone Protocol Analyzer — passive call quality observer.
//!
//! Joins a relay room as a passive participant (no media sent) and displays
//! real-time per-participant quality metrics in a terminal UI.
//!
//! Usage:
//! wzp-analyzer 127.0.0.1:4433 --room test
//! wzp-analyzer 1.2.3.4:4433 --room test --capture session.wzp
//! wzp-analyzer 1.2.3.4:4433 --room test --no-tui --duration 60
use std::io::Write;
use std::sync::Arc;
use std::time::{Duration, Instant};
use clap::Parser;
use tracing::info;
use wzp_proto::{CodecId, MediaPacket, MediaTransport};
// ---------------------------------------------------------------------------
// CLI
// ---------------------------------------------------------------------------
/// WarzonePhone Protocol Analyzer — passive call quality observer
#[derive(Parser)]
#[command(name = "wzp-analyzer", version)]
struct Args {
/// Relay address (host:port) — required for live mode, ignored with --replay
relay: Option<String>,
/// Room name to observe — required for live mode, ignored with --replay
#[arg(short, long)]
room: Option<String>,
/// Auth token for relay
#[arg(long)]
token: Option<String>,
/// Identity seed (64-char hex)
#[arg(long)]
seed: Option<String>,
/// Capture packets to file
#[arg(long)]
capture: Option<String>,
/// Auto-stop after N seconds
#[arg(long)]
duration: Option<u64>,
/// Disable TUI (print stats to stdout instead)
#[arg(long)]
no_tui: bool,
/// Replay a captured .wzp file (offline analysis)
#[arg(long)]
replay: Option<String>,
/// Generate HTML report (from live session or replay)
#[arg(long)]
html: Option<String>,
/// Session key hex for decrypting payloads (enables audio decode)
// TODO(#17): Audio decode requires session key + nonce context.
// In SFU mode, payloads are E2E encrypted. Decoding requires
// either: (a) session key from both endpoints, or (b) running
// the analyzer as a trusted participant with its own key exchange.
// For now, header-only analysis provides loss%, jitter, codec stats.
#[arg(long)]
key: Option<String>,
}
// ---------------------------------------------------------------------------
// Per-participant statistics
// ---------------------------------------------------------------------------
struct ParticipantStats {
/// Stream identifier (index, assigned when we detect a new seq stream)
stream_id: usize,
/// Display name from RoomUpdate (if available)
alias: Option<String>,
/// Current codec
codec: CodecId,
/// Total packets received
packets: u64,
/// Detected lost packets (sequence gaps)
lost: u64,
/// Last seen sequence number
last_seq: u16,
/// Whether we've seen the first packet (for gap detection)
seq_initialized: bool,
/// EWMA jitter in ms
jitter_ms: f64,
/// Last packet arrival time
last_arrival: Option<Instant>,
/// Codec changes observed
codec_switches: u32,
/// First packet time
first_seen: Instant,
/// Last packet time
last_seen: Instant,
}
impl ParticipantStats {
fn new(id: usize, codec: CodecId) -> Self {
let now = Instant::now();
Self {
stream_id: id,
alias: None,
codec,
packets: 0,
lost: 0,
last_seq: 0,
seq_initialized: false,
jitter_ms: 0.0,
last_arrival: None,
codec_switches: 0,
first_seen: now,
last_seen: now,
}
}
fn ingest(&mut self, pkt: &MediaPacket, now: Instant) {
self.packets += 1;
self.last_seen = now;
// Codec switch detection
if pkt.header.codec_id != self.codec {
self.codec_switches += 1;
self.codec = pkt.header.codec_id;
}
// Loss detection from sequence gaps
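// Heuristic note: the `gap < 100` cap below keeps a single wild sequence
// jump (stream restart or heavy reordering) from being counted as
// thousands of lost packets.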
if self.seq_initialized {
let expected = self.last_seq.wrapping_add(1);
let gap = pkt.header.seq.wrapping_sub(expected);
if gap > 0 && gap < 100 {
self.lost += gap as u64;
}
}
self.last_seq = pkt.header.seq;
self.seq_initialized = true;
// Jitter (inter-arrival time variance, EWMA)
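// The 0.1/0.9 blend below is a simple exponential moving average of the
// deviation from the expected frame spacing. Worked example: with 20 ms
// frames arriving 26 ms, 20 ms and 23 ms apart, the deviations 6, 0 and
// 3 ms smooth to 0.60, 0.54 and then 0.79 ms of jitter.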
if let Some(last) = self.last_arrival {
let interval_ms = now.duration_since(last).as_secs_f64() * 1000.0;
let expected_ms = pkt.header.codec_id.frame_duration_ms() as f64;
let diff = (interval_ms - expected_ms).abs();
self.jitter_ms = 0.1 * diff + 0.9 * self.jitter_ms;
}
self.last_arrival = Some(now);
}
fn loss_percent(&self) -> f64 {
let total = self.packets + self.lost;
if total == 0 {
0.0
} else {
(self.lost as f64 / total as f64) * 100.0
}
}
fn duration(&self) -> Duration {
self.last_seen.duration_since(self.first_seen)
}
fn display_name(&self) -> String {
self.alias
.as_deref()
.map(String::from)
.unwrap_or_else(|| format!("Stream {}", self.stream_id))
}
}
// ---------------------------------------------------------------------------
// Participant identification by sequence stream
// ---------------------------------------------------------------------------
/// Find the participant whose sequence counter is close to `seq`, or create a
/// new one. Each sender has an independent wrapping u16 counter, so we can
/// distinguish streams by proximity of consecutive sequence numbers.
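/// For example, if stream A last sent seq 1000 and stream B last sent
/// seq 30000, an incoming packet with seq 1003 has wrapping deltas of 3
/// and 36539, so it is attributed to A; if no tracked stream is within
/// the 1..50 window, a new entry is created.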
fn find_or_create_participant(
participants: &mut Vec<ParticipantStats>,
seq: u16,
codec: CodecId,
) -> usize {
for (i, p) in participants.iter().enumerate() {
if p.seq_initialized {
let delta = seq.wrapping_sub(p.last_seq);
if delta > 0 && delta < 50 {
return i;
}
}
}
// New stream detected
let id = participants.len();
participants.push(ParticipantStats::new(id, codec));
id
}
// ---------------------------------------------------------------------------
// Capture writer (binary packet log for later replay)
// ---------------------------------------------------------------------------
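// On-disk layout (as written below and read back by CaptureReader):
//   magic       4 bytes   b"WZP\x01"
//   header_len  u32 LE
//   header      JSON      { room, relay, start_time, version }
// followed by one record per packet:
//   elapsed_us  u64 LE    microseconds since capture start
//   pkt_len     u32 LE
//   packet      pkt_len bytes (MediaPacket::to_bytes)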
struct CaptureWriter {
file: std::io::BufWriter<std::fs::File>,
start: Instant,
}
impl CaptureWriter {
fn new(path: &str, room: &str, relay: &str) -> anyhow::Result<Self> {
let file = std::fs::File::create(path)?;
let mut writer = std::io::BufWriter::new(file);
// Magic + version
writer.write_all(b"WZP\x01")?;
let header = serde_json::json!({
"room": room,
"relay": relay,
"start_time": chrono::Utc::now().to_rfc3339(),
"version": 1,
});
let header_bytes = serde_json::to_vec(&header)?;
writer.write_all(&(header_bytes.len() as u32).to_le_bytes())?;
writer.write_all(&header_bytes)?;
Ok(Self {
file: writer,
start: Instant::now(),
})
}
fn write_packet(&mut self, pkt: &MediaPacket, now: Instant) -> anyhow::Result<()> {
let elapsed_us = now.duration_since(self.start).as_micros() as u64;
self.file.write_all(&elapsed_us.to_le_bytes())?;
let raw = pkt.to_bytes();
self.file.write_all(&(raw.len() as u32).to_le_bytes())?;
self.file.write_all(&raw)?;
Ok(())
}
}
// ---------------------------------------------------------------------------
// Capture reader (for replay mode)
// ---------------------------------------------------------------------------
struct CaptureReader {
reader: std::io::BufReader<std::fs::File>,
header: serde_json::Value,
}
impl CaptureReader {
fn open(path: &str) -> anyhow::Result<Self> {
use std::io::Read;
let file = std::fs::File::open(path)?;
let mut reader = std::io::BufReader::new(file);
// Read magic
let mut magic = [0u8; 4];
reader.read_exact(&mut magic)?;
anyhow::ensure!(&magic == b"WZP\x01", "not a WZP capture file");
// Read header
let mut len_buf = [0u8; 4];
reader.read_exact(&mut len_buf)?;
let header_len = u32::from_le_bytes(len_buf) as usize;
let mut header_bytes = vec![0u8; header_len];
reader.read_exact(&mut header_bytes)?;
let header: serde_json::Value = serde_json::from_slice(&header_bytes)?;
Ok(Self { reader, header })
}
fn next_packet(&mut self) -> anyhow::Result<Option<(u64, MediaPacket)>> {
use std::io::Read;
// Read timestamp
let mut ts_buf = [0u8; 8];
match self.reader.read_exact(&mut ts_buf) {
Ok(()) => {}
Err(e) if e.kind() == std::io::ErrorKind::UnexpectedEof => return Ok(None),
Err(e) => return Err(e.into()),
}
let timestamp_us = u64::from_le_bytes(ts_buf);
// Read packet
let mut len_buf = [0u8; 4];
self.reader.read_exact(&mut len_buf)?;
let pkt_len = u32::from_le_bytes(len_buf) as usize;
let mut pkt_bytes = vec![0u8; pkt_len];
self.reader.read_exact(&mut pkt_bytes)?;
let pkt = MediaPacket::from_bytes(bytes::Bytes::from(pkt_bytes))
.ok_or_else(|| anyhow::anyhow!("malformed packet in capture"))?;
Ok(Some((timestamp_us, pkt)))
}
}
// ---------------------------------------------------------------------------
// Timeline entry (for HTML report generation)
// ---------------------------------------------------------------------------
struct TimelineEntry {
timestamp_us: u64,
stream_id: usize,
#[allow(dead_code)]
codec: CodecId,
#[allow(dead_code)]
seq: u16,
#[allow(dead_code)]
payload_len: usize,
loss_pct: f64,
jitter_ms: f64,
}
// ---------------------------------------------------------------------------
// Replay mode (#15)
// ---------------------------------------------------------------------------
async fn run_replay(path: &str, args: &Args) -> anyhow::Result<()> {
let mut reader = CaptureReader::open(path)?;
eprintln!(
"Replaying: {} (room: {})",
path,
reader
.header
.get("room")
.and_then(|v| v.as_str())
.unwrap_or("?")
);
let mut participants: Vec<ParticipantStats> = Vec::new();
let mut total_packets: u64 = 0;
let start = Instant::now();
let mut timeline: Vec<TimelineEntry> = Vec::new();
// Decrypt session from --key (optional)
let mut decrypt_session: Option<wzp_crypto::ChaChaSession> = args.key.as_ref().and_then(|hex| {
if hex.len() != 64 { return None; }
let mut key = [0u8; 32];
for (i, chunk) in hex.as_bytes().chunks(2).enumerate() {
let s = std::str::from_utf8(chunk).unwrap_or("00");
key[i] = u8::from_str_radix(s, 16).unwrap_or(0);
}
Some(wzp_crypto::ChaChaSession::new(key))
});
let mut decrypt_ok: u64 = 0;
let mut decrypt_fail: u64 = 0;
while let Some((ts_us, pkt)) = reader.next_packet()? {
let now = Instant::now();
let idx = find_or_create_participant(&mut participants, pkt.header.seq, pkt.header.codec_id);
participants[idx].ingest(&pkt, now);
total_packets += 1;
// Attempt decryption if key provided
if let Some(ref mut session) = decrypt_session {
use wzp_proto::CryptoSession;
let header_bytes = pkt.header.to_bytes();
let mut plaintext = Vec::new();
match session.decrypt(&header_bytes, &pkt.payload, &mut plaintext) {
Ok(()) => {
decrypt_ok += 1;
if decrypt_ok <= 5 || decrypt_ok % 100 == 0 {
eprintln!(
" decrypt ok: seq={} codec={:?} payload={}B → plaintext={}B",
pkt.header.seq, pkt.header.codec_id,
pkt.payload.len(), plaintext.len()
);
}
}
Err(_) => {
decrypt_fail += 1;
if decrypt_fail <= 3 {
eprintln!(
" decrypt FAIL: seq={} (key mismatch, wrong direction, or rekey boundary)",
pkt.header.seq
);
}
}
}
}
// Record for HTML timeline
timeline.push(TimelineEntry {
timestamp_us: ts_us,
stream_id: idx,
codec: pkt.header.codec_id,
seq: pkt.header.seq,
payload_len: pkt.payload.len(),
loss_pct: participants[idx].loss_percent(),
jitter_ms: participants[idx].jitter_ms,
});
}
if decrypt_session.is_some() {
eprintln!(
"Decrypt stats: {} ok, {} failed (total {})",
decrypt_ok, decrypt_fail, total_packets
);
}
print_summary(&participants, total_packets, start.elapsed());
// Generate HTML if requested
if let Some(html_path) = &args.html {
generate_html_report(html_path, &participants, &timeline, total_packets, &reader.header)?;
eprintln!("HTML report: {}", html_path);
}
Ok(())
}
// ---------------------------------------------------------------------------
// HTML report generation (#16)
// ---------------------------------------------------------------------------
fn generate_html_report(
path: &str,
participants: &[ParticipantStats],
timeline: &[TimelineEntry],
total_packets: u64,
capture_header: &serde_json::Value,
) -> anyhow::Result<()> {
use std::io::Write as _;
let mut f = std::fs::File::create(path)?;
let room = capture_header
.get("room")
.and_then(|v| v.as_str())
.unwrap_or("unknown");
let start_time = capture_header
.get("start_time")
.and_then(|v| v.as_str())
.unwrap_or("?");
// Build per-stream loss/jitter timeline data for Chart.js
// Sample every 1 second (group timeline entries by second)
let max_ts = timeline.last().map(|e| e.timestamp_us).unwrap_or(0);
let duration_secs = (max_ts / 1_000_000) + 1;
let mut loss_data: std::collections::HashMap<usize, Vec<f64>> =
std::collections::HashMap::new();
let mut jitter_data: std::collections::HashMap<usize, Vec<f64>> =
std::collections::HashMap::new();
for stream_id in 0..participants.len() {
loss_data.insert(stream_id, vec![0.0; duration_secs as usize]);
jitter_data.insert(stream_id, vec![0.0; duration_secs as usize]);
}
for entry in timeline {
let sec = (entry.timestamp_us / 1_000_000) as usize;
if sec < duration_secs as usize {
if let Some(losses) = loss_data.get_mut(&entry.stream_id) {
losses[sec] = entry.loss_pct;
}
if let Some(jitters) = jitter_data.get_mut(&entry.stream_id) {
jitters[sec] = entry.jitter_ms;
}
}
}
let colors = [
"#e74c3c", "#3498db", "#2ecc71", "#f39c12", "#9b59b6", "#1abc9c",
];
// Build dataset JSON for charts
let mut loss_datasets = String::new();
let mut jitter_datasets = String::new();
for (i, p) in participants.iter().enumerate() {
let name = p.display_name();
let color = colors[i % colors.len()];
let loss_vals = loss_data
.get(&i)
.map(|v| format!("{:?}", v))
.unwrap_or_default();
let jitter_vals = jitter_data
.get(&i)
.map(|v| format!("{:?}", v))
.unwrap_or_default();
loss_datasets.push_str(&format!(
"{{ label: '{}', data: {}, borderColor: '{}', fill: false }},\n",
name, loss_vals, color
));
jitter_datasets.push_str(&format!(
"{{ label: '{}', data: {}, borderColor: '{}', fill: false }},\n",
name, jitter_vals, color
));
}
let labels: Vec<String> = (0..duration_secs).map(|s| format!("{}s", s)).collect();
let labels_json = format!("{:?}", labels);
// Summary table rows
let mut summary_rows = String::new();
for p in participants {
summary_rows.push_str(&format!(
"<tr><td>{}</td><td>{:?}</td><td>{}</td><td>{:.1}%</td><td>{:.0}ms</td><td>{}</td></tr>\n",
p.display_name(),
p.codec,
p.packets,
p.loss_percent(),
p.jitter_ms,
p.codec_switches
));
}
write!(
f,
r#"<!DOCTYPE html>
<html><head>
<meta charset="utf-8">
<title>WZP Call Report — {room}</title>
<script src="https://cdn.jsdelivr.net/npm/chart.js@4"></script>
<style>
body {{ font-family: -apple-system, sans-serif; max-width: 1200px; margin: 0 auto; padding: 20px; background: #1a1a2e; color: #e0e0e0; }}
h1,h2 {{ color: #4a9eff; }}
table {{ border-collapse: collapse; width: 100%; margin: 20px 0; }}
th,td {{ border: 1px solid #333; padding: 8px 12px; text-align: left; }}
th {{ background: #16213e; }}
tr:nth-child(even) {{ background: #1a1a3e; }}
.chart-container {{ background: #16213e; border-radius: 8px; padding: 16px; margin: 20px 0; }}
canvas {{ max-height: 300px; }}
.meta {{ color: #888; font-size: 0.9em; }}
</style>
</head><body>
<h1>WZP Call Quality Report</h1>
<p class="meta">Room: <b>{room}</b> | Start: {start_time} | Packets: {total_packets} | Duration: {duration_secs}s</p>
<h2>Participant Summary</h2>
<table>
<tr><th>Name</th><th>Codec</th><th>Packets</th><th>Loss</th><th>Jitter</th><th>Codec Switches</th></tr>
{summary_rows}
</table>
<h2>Packet Loss Over Time</h2>
<div class="chart-container"><canvas id="lossChart"></canvas></div>
<h2>Jitter Over Time</h2>
<div class="chart-container"><canvas id="jitterChart"></canvas></div>
<script>
const labels = {labels_json};
new Chart(document.getElementById('lossChart'), {{
type: 'line',
data: {{ labels, datasets: [{loss_datasets}] }},
options: {{ responsive: true, scales: {{ y: {{ beginAtZero: true, title: {{ display: true, text: 'Loss %' }} }} }} }}
}});
new Chart(document.getElementById('jitterChart'), {{
type: 'line',
data: {{ labels, datasets: [{jitter_datasets}] }},
options: {{ responsive: true, scales: {{ y: {{ beginAtZero: true, title: {{ display: true, text: 'Jitter (ms)' }} }} }} }}
}});
</script>
</body></html>"#
)?;
Ok(())
}
// ---------------------------------------------------------------------------
// No-TUI mode (print stats to stdout periodically)
// ---------------------------------------------------------------------------
async fn run_no_tui(
transport: &wzp_transport::QuinnTransport,
participants: &mut Vec<ParticipantStats>,
total_packets: &mut u64,
deadline: Option<Instant>,
mut capture_writer: Option<&mut CaptureWriter>,
) -> anyhow::Result<()> {
let mut print_timer = Instant::now();
loop {
if let Some(dl) = deadline {
if Instant::now() > dl {
break;
}
}
match tokio::time::timeout(Duration::from_millis(100), transport.recv_media()).await {
Ok(Ok(Some(pkt))) => {
let now = Instant::now();
let idx =
find_or_create_participant(participants, pkt.header.seq, pkt.header.codec_id);
participants[idx].ingest(&pkt, now);
*total_packets += 1;
if let Some(ref mut w) = capture_writer {
w.write_packet(&pkt, now)?;
}
}
Ok(Ok(None)) => break, // connection closed
Ok(Err(e)) => {
tracing::warn!("recv error: {e}");
break;
}
Err(_) => {} // timeout, loop again
}
if print_timer.elapsed() >= Duration::from_secs(2) {
print_stats(participants, *total_packets);
print_timer = Instant::now();
}
}
Ok(())
}
fn print_stats(participants: &[ParticipantStats], total: u64) {
eprintln!("--- {} participants | {} total packets ---", participants.len(), total);
for p in participants {
eprintln!(
" {}: {} pkts, {:.1}% loss, {:.0}ms jitter, {:?}, {:.0}s",
p.display_name(),
p.packets,
p.loss_percent(),
p.jitter_ms,
p.codec,
p.duration().as_secs_f64(),
);
}
}
// ---------------------------------------------------------------------------
// TUI mode (ratatui + crossterm)
// ---------------------------------------------------------------------------
async fn run_tui(
transport: &wzp_transport::QuinnTransport,
participants: &mut Vec<ParticipantStats>,
total_packets: &mut u64,
start_time: Instant,
deadline: Option<Instant>,
mut capture_writer: Option<&mut CaptureWriter>,
) -> anyhow::Result<()> {
crossterm::terminal::enable_raw_mode()?;
let mut stdout = std::io::stdout();
crossterm::execute!(stdout, crossterm::terminal::EnterAlternateScreen)?;
let backend = ratatui::backend::CrosstermBackend::new(stdout);
let mut terminal = ratatui::Terminal::new(backend)?;
let mut redraw_timer = Instant::now();
let result: anyhow::Result<()> = async {
loop {
// Check for quit key (q or Ctrl+C)
if crossterm::event::poll(Duration::from_millis(0))? {
if let crossterm::event::Event::Key(key) = crossterm::event::read()? {
use crossterm::event::{KeyCode, KeyModifiers};
if key.code == KeyCode::Char('q')
|| (key.code == KeyCode::Char('c')
&& key.modifiers.contains(KeyModifiers::CONTROL))
{
break;
}
}
}
if let Some(dl) = deadline {
if Instant::now() > dl {
break;
}
}
// Receive packets (non-blocking with short timeout)
match tokio::time::timeout(Duration::from_millis(20), transport.recv_media()).await {
Ok(Ok(Some(pkt))) => {
let now = Instant::now();
let idx = find_or_create_participant(
participants,
pkt.header.seq,
pkt.header.codec_id,
);
participants[idx].ingest(&pkt, now);
*total_packets += 1;
if let Some(ref mut w) = capture_writer {
w.write_packet(&pkt, now)?;
}
}
Ok(Ok(None)) => break,
Ok(Err(e)) => {
tracing::warn!("recv error: {e}");
break;
}
Err(_) => {}
}
// Redraw TUI at ~10 FPS
if redraw_timer.elapsed() >= Duration::from_millis(100) {
terminal.draw(|f| draw_ui(f, participants, *total_packets, start_time))?;
redraw_timer = Instant::now();
}
}
Ok(())
}
.await;
// Always restore terminal, even on error
crossterm::terminal::disable_raw_mode()?;
crossterm::execute!(
std::io::stdout(),
crossterm::terminal::LeaveAlternateScreen
)?;
result
}
fn draw_ui(
f: &mut ratatui::Frame,
participants: &[ParticipantStats],
total_packets: u64,
start_time: Instant,
) {
use ratatui::layout::{Constraint, Direction, Layout};
use ratatui::style::{Color, Modifier, Style};
use ratatui::widgets::{Block, Borders, Paragraph, Row, Table};
let elapsed = start_time.elapsed();
let elapsed_str = format!(
"{:02}:{:02}:{:02}",
elapsed.as_secs() / 3600,
(elapsed.as_secs() % 3600) / 60,
elapsed.as_secs() % 60
);
let chunks = Layout::default()
.direction(Direction::Vertical)
.constraints([
Constraint::Length(3), // header
Constraint::Min(5), // participant table
Constraint::Length(3), // footer
])
.split(f.area());
// Header
let header = Paragraph::new(format!(
" WZP Analyzer | {} participants | {} packets | {}",
participants.len(),
total_packets,
elapsed_str
))
.block(Block::default().borders(Borders::ALL).title(" Protocol Analyzer "));
f.render_widget(header, chunks[0]);
// Participant table
let header_row = Row::new(vec![
"#", "Name", "Codec", "Packets", "Loss%", "Jitter", "Switches", "Duration",
])
.style(Style::default().add_modifier(Modifier::BOLD));
let rows: Vec<Row> = participants
.iter()
.map(|p| {
let loss_color = if p.loss_percent() > 5.0 {
Color::Red
} else if p.loss_percent() > 1.0 {
Color::Yellow
} else {
Color::Green
};
Row::new(vec![
format!("{}", p.stream_id),
p.display_name(),
format!("{:?}", p.codec),
format!("{}", p.packets),
format!("{:.1}%", p.loss_percent()),
format!("{:.0}ms", p.jitter_ms),
format!("{}", p.codec_switches),
format!("{:.0}s", p.duration().as_secs_f64()),
])
.style(Style::default().fg(loss_color))
})
.collect();
let widths = [
Constraint::Length(3), // #
Constraint::Length(20), // Name
Constraint::Length(12), // Codec
Constraint::Length(10), // Packets
Constraint::Length(8), // Loss%
Constraint::Length(10), // Jitter
Constraint::Length(10), // Switches
Constraint::Length(10), // Duration
];
let table = Table::new(rows, widths)
.header(header_row)
.block(Block::default().borders(Borders::ALL).title(" Participants "));
f.render_widget(table, chunks[1]);
// Footer
let footer =
Paragraph::new(" Press 'q' to quit ").block(Block::default().borders(Borders::ALL));
f.render_widget(footer, chunks[2]);
}
// ---------------------------------------------------------------------------
// Summary (printed on exit)
// ---------------------------------------------------------------------------
fn print_summary(participants: &[ParticipantStats], total: u64, elapsed: Duration) {
eprintln!("\n=== Session Summary ===");
eprintln!(
"Duration: {:.1}s | Total packets: {} | Participants: {}",
elapsed.as_secs_f64(),
total,
participants.len()
);
for p in participants {
eprintln!(
" {}: {} pkts, {:.1}% loss, {:.0}ms jitter, {:?}, {} codec switches",
p.display_name(),
p.packets,
p.loss_percent(),
p.jitter_ms,
p.codec,
p.codec_switches,
);
}
}
// ---------------------------------------------------------------------------
// main
// ---------------------------------------------------------------------------
#[tokio::main]
async fn main() -> anyhow::Result<()> {
let args = Args::parse();
// Only init tracing subscriber in no-tui mode (it would corrupt the TUI otherwise)
if args.no_tui || args.replay.is_some() {
tracing_subscriber::fmt().init();
}
let _crypto_session: Option<std::sync::Mutex<wzp_crypto::ChaChaSession>> =
if let Some(ref key_hex) = args.key {
if key_hex.len() != 64 {
eprintln!("Error: --key must be 64 hex characters (32 bytes). Got {} chars.", key_hex.len());
std::process::exit(1);
}
let mut key_bytes = [0u8; 32];
for (i, chunk) in key_hex.as_bytes().chunks(2).enumerate() {
let hex_str = std::str::from_utf8(chunk).unwrap_or("00");
key_bytes[i] = u8::from_str_radix(hex_str, 16).unwrap_or(0);
}
eprintln!("Encrypted payload decoding enabled (key loaded).");
Some(std::sync::Mutex::new(
wzp_crypto::ChaChaSession::new(key_bytes),
))
} else {
None
};
// Replay mode: offline analysis of a .wzp capture file
if let Some(ref replay_path) = args.replay {
return run_replay(replay_path, &args).await;
}
// Live mode requires relay and room
let relay = args
.relay
.as_deref()
.ok_or_else(|| anyhow::anyhow!("relay address required for live mode (use --replay for offline)"))?;
let room = args
.room
.as_deref()
.ok_or_else(|| anyhow::anyhow!("--room required for live mode (use --replay for offline)"))?;
// TLS crypto provider
let _ = rustls::crypto::ring::default_provider().install_default();
// Identity seed
let seed = match &args.seed {
Some(hex) => {
let s = wzp_crypto::Seed::from_hex(hex).map_err(|e| anyhow::anyhow!(e))?;
info!(fingerprint = %s.derive_identity().public_identity().fingerprint, "identity from --seed");
s
}
None => {
let s = wzp_crypto::Seed::generate();
info!(fingerprint = %s.derive_identity().public_identity().fingerprint, "generated ephemeral identity");
s
}
};
// Connect to relay
let relay_addr: std::net::SocketAddr = relay.parse()?;
let bind_addr: std::net::SocketAddr = if relay_addr.is_ipv6() {
"[::]:0".parse()?
} else {
"0.0.0.0:0".parse()?
};
let endpoint = wzp_transport::create_endpoint(bind_addr, None)?;
let client_config = wzp_transport::client_config();
let conn = wzp_transport::connect(&endpoint, relay_addr, room, client_config).await?;
let transport = Arc::new(wzp_transport::QuinnTransport::new(conn));
// Crypto handshake
let _crypto_session =
wzp_client::handshake::perform_handshake(&*transport, &seed.0, Some("analyzer")).await?;
// Auth if token provided
if let Some(ref token) = args.token {
let auth = wzp_proto::SignalMessage::AuthToken {
token: token.clone(),
};
transport.send_signal(&auth).await?;
}
// Capture file (optional)
let mut capture_writer = args
.capture
.as_ref()
.map(|path| CaptureWriter::new(path, room, relay))
.transpose()?;
// Duration timeout
let deadline = args
.duration
.map(|s| Instant::now() + Duration::from_secs(s));
// State
let mut participants: Vec<ParticipantStats> = Vec::new();
let mut total_packets: u64 = 0;
let start_time = Instant::now();
if args.no_tui {
run_no_tui(
&transport,
&mut participants,
&mut total_packets,
deadline,
capture_writer.as_mut(),
)
.await?;
} else {
run_tui(
&transport,
&mut participants,
&mut total_packets,
start_time,
deadline,
capture_writer.as_mut(),
)
.await?;
}
// Print summary
print_summary(&participants, total_packets, start_time.elapsed());
// Clean close
transport.close().await?;
Ok(())
}

View File

@@ -0,0 +1,350 @@
//! Birthday attack for hard NAT traversal.
//!
//! When both peers are behind symmetric NATs with random port
//! allocation, standard hole-punching fails because neither side
//! can predict the other's external port. This module implements
//! the birthday-paradox approach:
//!
//! 1. **Acceptor** opens N sockets, STUN-probes each to learn
//! their external ports, reports them to the Dialer.
//! 2. **Dialer** sprays QUIC connect attempts to the Acceptor's
//! reported ports + random ports on the Acceptor's IP.
//! 3. Birthday paradox: with N=64 ports and M=256 probes across
//! 65536 ports, the odds of at least one collision are non-trivial.
//!
//! In practice, the Acceptor's STUN-probed ports are known
//! exactly (not random), so the Dialer targets them first —
//! making this more like "spray-and-pray with a hit list" than
//! a pure birthday attack.
use std::net::{Ipv4Addr, SocketAddr};
use std::time::{Duration, Instant};
use crate::stun;
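// Back-of-envelope sketch (illustrative only, not used by this module):
// if the acceptor's external ports were completely unknown, the chance
// that at least one of `probes` random dials lands on one of `open_ports`
// open mappings in a 65536-port space is roughly
// 1 - (1 - open_ports/65536)^probes, about 22% for 64 ports and 256
// probes, which is why the STUN-probed hit list is dialed first.
#[allow(dead_code)]
fn rough_collision_probability(open_ports: u32, probes: u32) -> f64 {
    let miss_one = 1.0 - (open_ports as f64 / 65536.0);
    1.0 - miss_one.powi(probes as i32)
}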
/// Configuration for the birthday attack.
#[derive(Debug, Clone)]
pub struct BirthdayConfig {
/// Number of sockets the Acceptor opens (default: 32).
/// Each socket gets STUN-probed to learn its external port.
/// More = higher chance of collision, but more resource usage.
pub acceptor_ports: u16,
/// Number of QUIC connect attempts the Dialer makes (default: 128).
/// Spread across the Acceptor's known ports + random ports.
pub dialer_probes: u16,
/// Rate limit: ms between consecutive probes (default: 20ms = 50/s).
pub probe_interval_ms: u16,
/// Overall timeout for the birthday attack phase.
pub timeout: Duration,
/// STUN config for probing external ports.
pub stun_config: stun::StunConfig,
}
impl Default for BirthdayConfig {
fn default() -> Self {
Self {
acceptor_ports: 32,
dialer_probes: 128,
probe_interval_ms: 20,
timeout: Duration::from_secs(8),
stun_config: stun::StunConfig {
servers: vec!["stun.l.google.com:19302".into()],
timeout: Duration::from_secs(2),
},
}
}
}
/// Result of the Acceptor's port-opening phase.
#[derive(Debug, Clone, serde::Serialize)]
pub struct AcceptorPorts {
/// External IP (from STUN).
pub external_ip: Option<Ipv4Addr>,
/// List of (local_port, external_port) for each opened socket.
pub ports: Vec<PortMapping>,
/// How many sockets we attempted to open.
pub attempted: u16,
/// How many STUN probes succeeded.
pub succeeded: u16,
}
/// A single socket's local↔external port mapping.
#[derive(Debug, Clone, serde::Serialize)]
pub struct PortMapping {
pub local_port: u16,
pub external_port: u16,
}
/// Open N sockets and STUN-probe each to discover external ports.
///
/// Returns the set of known external ports that the Dialer should
/// target. Each socket stays open (bound) so the NAT mapping
/// remains active for as long as the returned sockets are kept alive.
///
/// The sockets are returned so the caller can keep them alive
/// during the attack. Dropping them closes the NAT pinholes.
pub async fn open_acceptor_ports(
config: &BirthdayConfig,
) -> (AcceptorPorts, Vec<tokio::net::UdpSocket>) {
let mut sockets = Vec::new();
let mut mappings = Vec::new();
let mut external_ip: Option<Ipv4Addr> = None;
let mut succeeded: u16 = 0;
let stun_server = match config.stun_config.servers.first() {
Some(s) => match stun::resolve_stun_server(s).await {
Ok(a) => Some(a),
Err(_) => None,
},
None => None,
};
for _ in 0..config.acceptor_ports {
// Bind to random port
let sock = match tokio::net::UdpSocket::bind("0.0.0.0:0").await {
Ok(s) => s,
Err(_) => continue,
};
let local_port = match sock.local_addr() {
Ok(a) => a.port(),
Err(_) => continue,
};
// STUN probe to learn external port
if let Some(stun_addr) = stun_server {
match stun::stun_reflect(&sock, stun_addr, config.stun_config.timeout).await {
Ok(ext_addr) => {
if external_ip.is_none() {
if let std::net::IpAddr::V4(ip) = ext_addr.ip() {
external_ip = Some(ip);
}
}
mappings.push(PortMapping {
local_port,
external_port: ext_addr.port(),
});
succeeded += 1;
}
Err(e) => {
tracing::debug!(local_port, error = %e, "birthday: STUN probe failed for socket");
}
}
}
sockets.push(sock);
}
tracing::info!(
attempted = config.acceptor_ports,
succeeded,
external_ip = ?external_ip,
"birthday: acceptor ports opened"
);
let result = AcceptorPorts {
external_ip,
ports: mappings,
attempted: config.acceptor_ports,
succeeded,
};
(result, sockets)
}
/// Generate the list of target addresses for the Dialer to spray.
///
/// Priority order:
/// 1. Acceptor's known external ports (from STUN probes) — highest hit rate
/// 2. Random ports on the Acceptor's IP — birthday paradox fill
pub fn generate_dialer_targets(
acceptor_ip: Ipv4Addr,
known_ports: &[u16],
total_probes: u16,
) -> Vec<SocketAddr> {
let mut targets = Vec::with_capacity(total_probes as usize);
// First: all known ports (guaranteed targets)
for &port in known_ports {
targets.push(SocketAddr::new(
std::net::IpAddr::V4(acceptor_ip),
port,
));
}
// Fill remaining with random ports (birthday attack)
let remaining = total_probes.saturating_sub(known_ports.len() as u16);
if remaining > 0 {
use rand::Rng;
let mut rng = rand::thread_rng();
for _ in 0..remaining {
let port = rng.gen_range(1024..=65535u16);
let addr = SocketAddr::new(
std::net::IpAddr::V4(acceptor_ip),
port,
);
if !targets.contains(&addr) {
targets.push(addr);
}
}
}
targets
}
/// Run the Dialer side of the birthday attack.
///
/// Sprays QUIC connection attempts at the target addresses.
/// Returns the first successful connection, or None on timeout.
pub async fn spray_dialer(
endpoint: &wzp_transport::Endpoint,
targets: &[SocketAddr],
call_sni: &str,
probe_interval: Duration,
timeout: Duration,
) -> Option<wzp_transport::QuinnTransport> {
let start = Instant::now();
let mut set = tokio::task::JoinSet::new();
tracing::info!(
target_count = targets.len(),
interval_ms = probe_interval.as_millis(),
timeout_s = timeout.as_secs(),
"birthday: dialer starting spray"
);
// Spray connects with rate limiting
for (idx, &target) in targets.iter().enumerate() {
if start.elapsed() >= timeout {
break;
}
let ep = endpoint.clone();
let sni = call_sni.to_string();
let client_cfg = wzp_transport::client_config();
set.spawn(async move {
let result = wzp_transport::connect(&ep, target, &sni, client_cfg).await;
(idx, target, result)
});
// Rate limit — don't blast the NAT
if idx < targets.len() - 1 {
tokio::time::sleep(probe_interval).await;
}
}
tracing::info!(
spawned = set.len(),
elapsed_ms = start.elapsed().as_millis(),
"birthday: all probes spawned, waiting for first success"
);
// Wait for first success or all failures
let deadline = start + timeout;
while let Some(join_res) = tokio::select! {
r = set.join_next() => r,
_ = tokio::time::sleep_until(tokio::time::Instant::from_std(deadline)) => None,
} {
match join_res {
Ok((idx, target, Ok(conn))) => {
tracing::info!(
idx,
%target,
remote = %conn.remote_address(),
elapsed_ms = start.elapsed().as_millis(),
"birthday: HIT! QUIC handshake succeeded"
);
set.abort_all();
return Some(wzp_transport::QuinnTransport::new(conn));
}
Ok((idx, target, Err(e))) => {
tracing::debug!(
idx,
%target,
error = %e,
"birthday: probe failed"
);
}
Err(_) => {}
}
}
tracing::info!(
elapsed_ms = start.elapsed().as_millis(),
"birthday: all probes failed or timed out"
);
None
}
// ── Tests ──────────────────────────────────────────────────────────
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn generate_targets_known_ports_first() {
let ip = Ipv4Addr::new(203, 0, 113, 5);
let known = vec![10000, 10001, 10002];
let targets = generate_dialer_targets(ip, &known, 10);
// Known ports should be first
assert_eq!(targets[0].port(), 10000);
assert_eq!(targets[1].port(), 10001);
assert_eq!(targets[2].port(), 10002);
// Rest are random
assert!(targets.len() <= 10);
// All target the right IP
assert!(targets.iter().all(|a| a.ip() == std::net::IpAddr::V4(ip)));
}
#[test]
fn generate_targets_no_known_all_random() {
let ip = Ipv4Addr::new(10, 0, 0, 1);
let targets = generate_dialer_targets(ip, &[], 50);
assert!(!targets.is_empty());
assert!(targets.len() <= 50);
// All ports in valid range
assert!(targets.iter().all(|a| a.port() >= 1024));
}
#[test]
fn generate_targets_more_known_than_total() {
let ip = Ipv4Addr::new(10, 0, 0, 1);
let known: Vec<u16> = (10000..10100).collect();
let targets = generate_dialer_targets(ip, &known, 50);
// All 100 known ports included even though total=50
assert_eq!(targets.len(), 100);
}
#[test]
fn generate_targets_dedup() {
let ip = Ipv4Addr::new(10, 0, 0, 1);
let targets = generate_dialer_targets(ip, &[], 100);
// No duplicates
let mut sorted = targets.clone();
sorted.sort();
sorted.dedup();
assert_eq!(sorted.len(), targets.len());
}
#[test]
fn default_config() {
let cfg = BirthdayConfig::default();
assert_eq!(cfg.acceptor_ports, 32);
assert_eq!(cfg.dialer_probes, 128);
assert!(cfg.timeout.as_secs() > 0);
}
#[test]
fn acceptor_ports_serializes() {
let result = AcceptorPorts {
external_ip: Some(Ipv4Addr::new(203, 0, 113, 5)),
ports: vec![PortMapping { local_port: 12345, external_port: 54321 }],
attempted: 32,
succeeded: 1,
};
let json = serde_json::to_string(&result).unwrap();
assert!(json.contains("54321"));
assert!(json.contains("203.0.113.5"));
}
}

View File

@@ -234,6 +234,8 @@ pub struct CallEncoder {
mini_frames_enabled: bool,
/// Frames encoded since the last full header was emitted.
frames_since_full: u32,
/// Pending quality report to attach to the next source packet.
pending_quality_report: Option<QualityReport>,
}
impl CallEncoder {
@@ -264,6 +266,7 @@ impl CallEncoder {
mini_context: MiniFrameContext::default(),
mini_frames_enabled: config.mini_frames_enabled,
frames_since_full: 0,
pending_quality_report: None,
}
}
@@ -367,7 +370,7 @@ impl CallEncoder {
version: 0,
is_repair: false,
codec_id: self.profile.codec,
has_quality_report: false,
has_quality_report: self.pending_quality_report.is_some(),
fec_ratio_encoded,
seq: self.seq,
timestamp: self.timestamp_ms,
@@ -377,7 +380,7 @@ impl CallEncoder {
csrc_count: 0,
},
payload: Bytes::from(encoded.clone()),
quality_report: None,
quality_report: self.pending_quality_report.take(),
};
self.seq = self.seq.wrapping_add(1);
@@ -454,6 +457,13 @@ impl CallEncoder {
self.audio_enc.set_expected_loss(tuning.expected_loss_pct);
}
/// Queue a quality report for attachment to the next source packet.
/// Used by the send task to embed locally-observed path quality so
/// the peer can drive adaptive quality switching.
pub fn set_pending_quality_report(&mut self, report: QualityReport) {
self.pending_quality_report = Some(report);
}
/// Enable or disable acoustic echo cancellation.
pub fn set_aec_enabled(&mut self, enabled: bool) {
self.aec.set_enabled(enabled);
@@ -1578,4 +1588,28 @@ mod tests {
let packets = enc.encode_frame(&pcm).unwrap();
assert!(!packets.is_empty());
}
#[test]
fn encoder_attaches_quality_report() {
let mut enc = CallEncoder::new(&CallConfig {
profile: QualityProfile::GOOD,
suppression_enabled: false,
..Default::default()
});
// Set a quality report
enc.set_pending_quality_report(QualityReport::from_path_stats(5.0, 80, 10));
// Encode a frame — should have quality_report attached
let pcm = voice_frame_20ms(0);
let packets = enc.encode_frame(&pcm).unwrap();
assert!(!packets.is_empty());
assert!(packets[0].header.has_quality_report, "first packet should have quality report");
assert!(packets[0].quality_report.is_some());
// Next frame should NOT have quality_report (it was consumed)
let packets2 = enc.encode_frame(&voice_frame_20ms(960)).unwrap();
assert!(!packets2[0].header.has_quality_report, "second packet should not have quality report");
assert!(packets2[0].quality_report.is_none());
}
}

View File

@@ -52,6 +52,8 @@ struct CliArgs {
signal: bool,
/// Place a direct call to a fingerprint (requires --signal).
call_target: Option<String>,
/// Run network diagnostic (STUN, port mapping, relay latencies).
netcheck: bool,
}
impl CliArgs {
@@ -97,6 +99,7 @@ fn parse_args() -> CliArgs {
let mut relay_str = None;
let mut signal = false;
let mut call_target = None;
let mut netcheck = false;
let mut i = 1;
while i < args.len() {
@@ -182,6 +185,7 @@ fn parse_args() -> CliArgs {
);
}
"--sweep" => sweep = true,
"--netcheck" => { netcheck = true; }
"--version-check" => { version_check = true; }
"--help" | "-h" => {
eprintln!("Usage: wzp-client [options] [relay-addr]");
@@ -238,6 +242,7 @@ fn parse_args() -> CliArgs {
version_check,
signal,
call_target,
netcheck,
}
}
@@ -256,6 +261,23 @@ async fn main() -> anyhow::Result<()> {
return Ok(());
}
// --netcheck: run network diagnostic and exit
if cli.netcheck {
let config = wzp_client::netcheck::NetcheckConfig {
stun_config: wzp_client::stun::StunConfig::default(),
relays: vec![
("relay".into(), cli.relay_addr),
],
timeout: std::time::Duration::from_secs(5),
test_portmap: true,
test_ipv6: true,
local_port: 0,
};
let report = wzp_client::netcheck::run_netcheck(&config).await;
print!("{}", wzp_client::netcheck::format_report(&report));
return Ok(());
}
// --version-check: query relay version over QUIC and exit
if cli.version_check {
let client_config = wzp_transport::client_config();
@@ -776,6 +798,7 @@ async fn run_signal_mode(
// relay-path.
caller_reflexive_addr: None,
caller_local_addrs: Vec::new(),
caller_mapped_addr: None,
caller_build_version: None,
}).await?;
}
@@ -810,13 +833,14 @@ async fn run_signal_mode(
// so callee addr stays hidden from the caller.
callee_reflexive_addr: None,
callee_local_addrs: Vec::new(),
callee_mapped_addr: None,
callee_build_version: None,
}).await;
}
SignalMessage::DirectCallAnswer { call_id, accept_mode, .. } => {
info!(call_id = %call_id, mode = ?accept_mode, "call answered");
}
SignalMessage::CallSetup { call_id, room, relay_addr: setup_relay, peer_direct_addr: _, peer_local_addrs: _ } => {
SignalMessage::CallSetup { call_id, room, relay_addr: setup_relay, peer_direct_addr: _, peer_local_addrs: _, peer_mapped_addr: _ } => {
info!(call_id = %call_id, room = %room, relay = %setup_relay, "call setup — connecting to media room");
// Connect to the media room

View File

@@ -38,6 +38,15 @@ pub enum WinningPath {
Relay,
}
/// Diagnostic info for a single candidate dial attempt.
#[derive(Debug, Clone, serde::Serialize)]
pub struct CandidateDiag {
pub index: usize,
pub addr: String,
pub result: String, // "ok", "skipped:ipv6", "error:..."
pub elapsed_ms: Option<u32>,
}
/// Phase 6: the race now returns BOTH transports (when available)
/// so the connect command can negotiate with the peer before
/// committing. The negotiation decides which transport to use
@@ -54,6 +63,8 @@ pub struct RaceResult {
/// Informational — the actual path used is decided by the
/// Phase 6 negotiation after both sides exchange reports.
pub local_winner: WinningPath,
/// Per-candidate diagnostic info for debugging.
pub candidate_diags: Vec<CandidateDiag>,
}
/// Attempt a direct QUIC connection to the peer in parallel with
@@ -88,19 +99,30 @@ pub struct PeerCandidates {
/// same-LAN pairs — direct dials to these bypass the NAT
/// entirely.
pub local: Vec<SocketAddr>,
/// Phase 8 (Tailscale-inspired): peer's port-mapped external
/// address from NAT-PMP/PCP/UPnP. When the router supports
/// port mapping, this gives a stable external address even
/// behind symmetric NATs.
pub mapped: Option<SocketAddr>,
}
impl PeerCandidates {
/// Flatten into the list of addrs the D-role should dial.
/// Order: LAN host candidates first (fastest when they
/// work), then reflexive (covers the non-LAN case).
/// work), then port-mapped (stable even behind symmetric
/// NATs), then reflexive (covers the non-LAN case).
pub fn dial_order(&self) -> Vec<SocketAddr> {
let mut out = Vec::with_capacity(self.local.len() + 1);
let mut out = Vec::with_capacity(self.local.len() + 2);
out.extend(self.local.iter().copied());
// Port-mapped address goes before reflexive — it's
// more reliable on symmetric NATs where the reflexive
// addr might not match what the peer actually sees.
if let Some(a) = self.mapped {
if !out.contains(&a) {
out.push(a);
}
}
if let Some(a) = self.reflexive {
// Only add if it's not already in the list (some
// edge cases on same-LAN could have the same addr
// in both).
if !out.contains(&a) {
out.push(a);
}
@@ -108,10 +130,54 @@ impl PeerCandidates {
out
}
/// Smart dial order: filters out candidates that can't possibly
/// work given our own reflexive address.
///
/// - **LAN candidates**: only included if peer's public IP
/// matches ours (same network). Private IPs are unreachable
/// cross-network.
/// - **IPv6 candidates**: stripped entirely (Phase 7 disabled).
/// - **Reflexive + mapped**: always included.
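/// Example: if our reflexive addr is 203.0.113.7:5000 and the peer's is
/// 203.0.113.7:6000 (same public IP), the peer's 192.168.x.x candidates
/// are kept; behind different public IPs they are dropped and only the
/// mapped and reflexive addresses remain.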
pub fn smart_dial_order(&self, own_reflexive: Option<&SocketAddr>) -> Vec<SocketAddr> {
let own_public_ip = own_reflexive.map(|a| a.ip());
let peer_public_ip = self.reflexive.map(|a| a.ip());
let same_network = match (own_public_ip, peer_public_ip) {
(Some(a), Some(b)) => a == b,
_ => false,
};
let mut out = Vec::with_capacity(self.local.len() + 2);
// LAN candidates only when on the same network.
if same_network {
for addr in &self.local {
if !addr.is_ipv6() {
out.push(*addr);
}
}
}
// Port-mapped (always useful — it's a public addr).
if let Some(a) = self.mapped {
if !a.is_ipv6() && !out.contains(&a) {
out.push(a);
}
}
// Reflexive (always useful — it's the peer's public addr).
if let Some(a) = self.reflexive {
if !a.is_ipv6() && !out.contains(&a) {
out.push(a);
}
}
out
}
/// Is there anything for the D-role to dial? If not, the
/// race reduces to relay-only.
pub fn is_empty(&self) -> bool {
self.reflexive.is_none() && self.local.is_empty()
self.reflexive.is_none() && self.local.is_empty() && self.mapped.is_none()
}
}
@@ -122,6 +188,9 @@ pub async fn race(
relay_addr: SocketAddr,
room_sni: String,
call_sni: String,
// Our own reflexive address — used to filter LAN candidates
// that can't work cross-network.
own_reflexive: Option<SocketAddr>,
// Phase 5: when `Some`, reuse this endpoint for BOTH the
// direct-path branch AND the relay dial. Pass the signal
// endpoint. The endpoint MUST be server-capable (created
@@ -141,6 +210,10 @@ pub async fn race(
// is created. Install attempt is idempotent.
let _ = rustls::crypto::ring::default_provider().install_default();
// Shared diagnostic collector for per-candidate results.
let diags_collector: Arc<std::sync::Mutex<Vec<CandidateDiag>>> =
Arc::new(std::sync::Mutex::new(Vec::new()));
// Build the direct-path endpoint + future based on role.
//
// A-role: one accept future on the shared endpoint. The
@@ -196,7 +269,76 @@ pub async fn race(
// as dial — IPv6 connections die on datagram send).
// Accept on IPv4 shared endpoint only.
let _v6_ep_unused = ipv6_endpoint.clone();
// Collect peer addrs for NAT tickle (Acceptor-side).
let tickle_addrs: Vec<SocketAddr> = peer_candidates
.smart_dial_order(own_reflexive.as_ref())
.into_iter()
.filter(|a| !a.ip().is_loopback() && !a.ip().is_unspecified())
.collect();
direct_fut = Box::pin(async move {
// NAT tickle: send a small UDP packet to each of the
// Dialer's candidate addresses FROM our shared endpoint.
// This opens our NAT's pinhole for return traffic from
// those IPs — critical for address-restricted NATs that
// only allow inbound from IPs they've seen outbound
// traffic to. Without this, the Dialer's QUIC Initial
// gets dropped by our NAT.
if !tickle_addrs.is_empty() {
if let Ok(local_addr) = ep_for_fut.local_addr() {
// Send a tickle to each peer candidate address
// to open our NAT for return traffic from that IP.
//
// We use a socket2 socket with SO_REUSEADDR +
// SO_REUSEPORT on the SAME port as the quinn
// endpoint. This is necessary because quinn
// already holds the port — a plain bind() would
// fail with EADDRINUSE.
let tickle_result: Result<(), String> = (|| {
use std::net::UdpSocket as StdUdpSocket;
let sock = socket2::Socket::new(
socket2::Domain::IPV4,
socket2::Type::DGRAM,
Some(socket2::Protocol::UDP),
).map_err(|e| format!("socket: {e}"))?;
sock.set_reuse_address(true).map_err(|e| format!("reuseaddr: {e}"))?;
// macOS/BSD/Linux also need SO_REUSEPORT
#[cfg(any(target_os = "macos", target_os = "linux", target_os = "android"))]
{
// socket2 exposes set_reuse_port on unix
unsafe {
let optval: libc::c_int = 1;
libc::setsockopt(
std::os::unix::io::AsRawFd::as_raw_fd(&sock),
libc::SOL_SOCKET,
libc::SO_REUSEPORT,
&optval as *const _ as *const libc::c_void,
std::mem::size_of::<libc::c_int>() as libc::socklen_t,
);
}
}
sock.set_nonblocking(true).map_err(|e| format!("nonblock: {e}"))?;
let bind_addr: SocketAddr = SocketAddr::new(
std::net::IpAddr::V4(std::net::Ipv4Addr::UNSPECIFIED),
local_addr.port(),
);
sock.bind(&bind_addr.into()).map_err(|e| format!("bind :{}: {e}", local_addr.port()))?;
let std_sock: StdUdpSocket = sock.into();
for addr in &tickle_addrs {
let _ = std_sock.send_to(&[0u8; 1], addr);
tracing::info!(
%addr,
local_port = local_addr.port(),
"dual_path: A-role sent NAT tickle"
);
}
Ok(())
})();
if let Err(e) = tickle_result {
tracing::warn!(error = %e, "dual_path: A-role NAT tickle failed");
}
}
}
// Accept loop: retry if we get a stale/closed
// connection from a previous call. Max 3 retries
// to avoid spinning until the race timeout.
@@ -270,8 +412,9 @@ pub async fn race(
};
let ep_for_fut = ep.clone();
let _v6_ep_for_dial = ipv6_endpoint.clone();
let dial_order = peer_candidates.dial_order();
let dial_order = peer_candidates.smart_dial_order(own_reflexive.as_ref());
let sni = call_sni.clone();
let diags = diags_collector.clone();
direct_fut = Box::pin(async move {
if dial_order.is_empty() {
// No candidates — the race reduces to
@@ -300,17 +443,32 @@ pub async fn race(
// Re-enable once IPv6 datagram delivery is
// verified on target networks.
if candidate.is_ipv6() {
tracing::debug!(
tracing::info!(
%candidate,
candidate_idx = idx,
"dual_path: skipping IPv6 candidate (disabled)"
);
if let Ok(mut d) = diags.lock() {
d.push(CandidateDiag {
index: idx,
addr: candidate.to_string(),
result: "skipped:ipv6".into(),
elapsed_ms: None,
});
}
continue;
}
let ep = ep_for_fut.clone();
let client_cfg = wzp_transport::client_config();
let sni = sni.clone();
let diags_inner = diags.clone();
set.spawn(async move {
let start = std::time::Instant::now();
tracing::info!(
%candidate,
candidate_idx = idx,
"dual_path: dialing candidate"
);
let result = wzp_transport::connect(
&ep,
candidate,
@@ -318,6 +476,19 @@ pub async fn race(
client_cfg,
)
.await;
let elapsed = start.elapsed().as_millis() as u32;
let diag_result = match &result {
Ok(_) => "ok".to_string(),
Err(e) => format!("error:{e}"),
};
if let Ok(mut d) = diags_inner.lock() {
d.push(CandidateDiag {
index: idx,
addr: candidate.to_string(),
result: diag_result,
elapsed_ms: Some(elapsed),
});
}
(idx, candidate, result)
});
}
@@ -346,7 +517,7 @@ pub async fn race(
return Ok(QuinnTransport::new(conn));
}
Err(e) => {
tracing::debug!(
tracing::info!(
%candidate,
candidate_idx = idx,
error = %e,
@@ -423,15 +594,18 @@ pub async fn race(
// RaceResult with both transports (when available) and uses the
// Phase 6 MediaPathReport exchange to decide which one to
// actually use for media.
let smart_order = peer_candidates.smart_dial_order(own_reflexive.as_ref());
tracing::info!(
?role,
candidates = ?peer_candidates.dial_order(),
raw_candidates = ?peer_candidates.dial_order(),
filtered_candidates = ?smart_order,
?own_reflexive,
%relay_addr,
"dual_path: racing direct vs relay"
);
let mut direct_task = tokio::spawn(
tokio::time::timeout(Duration::from_secs(2), direct_fut),
tokio::time::timeout(Duration::from_secs(4), direct_fut),
);
let mut relay_task = tokio::spawn(async move {
// Keep the 500ms head start so direct has a chance
@@ -464,9 +638,25 @@ pub async fn race(
local_winner = WinningPath::Relay; // direct failed → relay is our only hope
}
Ok(Err(_)) => {
tracing::warn!("dual_path: direct timed out (2s)");
tracing::warn!("dual_path: direct timed out (4s)");
direct_result = Some(Err(anyhow::anyhow!("direct timeout")));
local_winner = WinningPath::Relay;
// Record timeout diag for candidates that were
// still in-flight when the timeout fired.
if let Ok(mut d) = diags_collector.lock() {
let recorded_indices: std::collections::HashSet<usize> =
d.iter().map(|diag| diag.index).collect();
for (idx, addr) in smart_order.iter().enumerate() {
if !recorded_indices.contains(&idx) {
d.push(CandidateDiag {
index: idx,
addr: addr.to_string(),
result: "timeout:4s".into(),
elapsed_ms: Some(4000),
});
}
}
}
}
Err(e) => {
tracing::warn!(error = %e, "dual_path: direct task panicked");
@@ -507,7 +697,24 @@ pub async fn race(
match tokio::time::timeout(Duration::from_secs(1), direct_task).await {
Ok(Ok(Ok(Ok(t)))) => { direct_result = Some(Ok(t)); }
Ok(Ok(Ok(Err(e)))) => { direct_result = Some(Err(anyhow::anyhow!("{e}"))); }
_ => { direct_result = Some(Err(anyhow::anyhow!("direct: no result in grace period"))); }
_ => {
direct_result = Some(Err(anyhow::anyhow!("direct: no result in grace period")));
// Fill timeout diags for candidates that never reported.
if let Ok(mut d) = diags_collector.lock() {
let recorded: std::collections::HashSet<usize> =
d.iter().map(|diag| diag.index).collect();
for (idx, addr) in smart_order.iter().enumerate() {
if !recorded.contains(&idx) {
d.push(CandidateDiag {
index: idx,
addr: addr.to_string(),
result: "timeout:grace".into(),
elapsed_ms: None,
});
}
}
}
}
}
}
if relay_result.is_none() {
@@ -534,6 +741,10 @@ pub async fn race(
let _ = (direct_ep, relay_ep, ipv6_endpoint);
let candidate_diags = diags_collector.lock()
.map(|d| d.clone())
.unwrap_or_default();
Ok(RaceResult {
direct_transport: direct_result
.and_then(|r| r.ok())
@@ -542,5 +753,208 @@ pub async fn race(
.and_then(|r| r.ok())
.map(|t| Arc::new(t)),
local_winner,
candidate_diags,
})
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn peer_candidates_dial_order_all_types() {
let candidates = PeerCandidates {
reflexive: Some("203.0.113.5:4433".parse().unwrap()),
local: vec![
"192.168.1.10:4433".parse().unwrap(),
"10.0.0.5:4433".parse().unwrap(),
],
mapped: Some("198.51.100.42:12345".parse().unwrap()),
};
let order = candidates.dial_order();
// Order: local first, then mapped, then reflexive
assert_eq!(order.len(), 4);
assert_eq!(order[0], "192.168.1.10:4433".parse::<SocketAddr>().unwrap());
assert_eq!(order[1], "10.0.0.5:4433".parse::<SocketAddr>().unwrap());
assert_eq!(order[2], "198.51.100.42:12345".parse::<SocketAddr>().unwrap());
assert_eq!(order[3], "203.0.113.5:4433".parse::<SocketAddr>().unwrap());
}
#[test]
fn peer_candidates_dial_order_no_mapped() {
let candidates = PeerCandidates {
reflexive: Some("203.0.113.5:4433".parse().unwrap()),
local: vec!["192.168.1.10:4433".parse().unwrap()],
mapped: None,
};
let order = candidates.dial_order();
assert_eq!(order.len(), 2);
assert_eq!(order[0], "192.168.1.10:4433".parse::<SocketAddr>().unwrap());
assert_eq!(order[1], "203.0.113.5:4433".parse::<SocketAddr>().unwrap());
}
#[test]
fn peer_candidates_dial_order_only_mapped() {
let candidates = PeerCandidates {
reflexive: None,
local: vec![],
mapped: Some("198.51.100.42:12345".parse().unwrap()),
};
let order = candidates.dial_order();
assert_eq!(order.len(), 1);
assert_eq!(order[0], "198.51.100.42:12345".parse::<SocketAddr>().unwrap());
}
#[test]
fn peer_candidates_dial_order_dedup_mapped_equals_reflexive() {
let addr: SocketAddr = "203.0.113.5:4433".parse().unwrap();
let candidates = PeerCandidates {
reflexive: Some(addr),
local: vec![],
mapped: Some(addr), // same as reflexive
};
let order = candidates.dial_order();
// Should be deduped to 1
assert_eq!(order.len(), 1);
assert_eq!(order[0], addr);
}
#[test]
fn peer_candidates_dial_order_dedup_mapped_in_local() {
let addr: SocketAddr = "192.168.1.10:4433".parse().unwrap();
let candidates = PeerCandidates {
reflexive: None,
local: vec![addr],
mapped: Some(addr), // same as a local addr
};
let order = candidates.dial_order();
assert_eq!(order.len(), 1);
assert_eq!(order[0], addr);
}
#[test]
fn peer_candidates_is_empty() {
let empty = PeerCandidates::default();
assert!(empty.is_empty());
let with_reflexive = PeerCandidates {
reflexive: Some("1.2.3.4:5".parse().unwrap()),
..Default::default()
};
assert!(!with_reflexive.is_empty());
let with_local = PeerCandidates {
local: vec!["10.0.0.1:5".parse().unwrap()],
..Default::default()
};
assert!(!with_local.is_empty());
let with_mapped = PeerCandidates {
mapped: Some("1.2.3.4:5".parse().unwrap()),
..Default::default()
};
assert!(!with_mapped.is_empty());
}
#[test]
fn peer_candidates_empty_dial_order() {
let empty = PeerCandidates::default();
assert!(empty.dial_order().is_empty());
}
#[test]
fn winning_path_debug() {
// Just verify Debug impl doesn't panic
let _ = format!("{:?}", WinningPath::Direct);
let _ = format!("{:?}", WinningPath::Relay);
}
// ── smart_dial_order tests ─────────────────────────────────
#[test]
fn smart_dial_order_same_network_includes_lan() {
let candidates = PeerCandidates {
reflexive: Some("203.0.113.5:4433".parse().unwrap()),
local: vec![
"192.168.1.10:4433".parse().unwrap(),
"10.0.0.5:4433".parse().unwrap(),
],
mapped: None,
};
let own: SocketAddr = "203.0.113.5:12345".parse().unwrap();
let order = candidates.smart_dial_order(Some(&own));
// Same public IP → LAN candidates included
assert!(order.contains(&"192.168.1.10:4433".parse().unwrap()));
assert!(order.contains(&"10.0.0.5:4433".parse().unwrap()));
assert!(order.contains(&"203.0.113.5:4433".parse().unwrap()));
}
#[test]
fn smart_dial_order_different_network_strips_lan() {
let candidates = PeerCandidates {
reflexive: Some("150.228.49.65:4433".parse().unwrap()),
local: vec![
"172.16.81.126:4433".parse().unwrap(),
"10.0.0.5:4433".parse().unwrap(),
],
mapped: None,
};
// Different public IP → LAN candidates stripped
let own: SocketAddr = "185.115.4.212:12345".parse().unwrap();
let order = candidates.smart_dial_order(Some(&own));
assert!(!order.contains(&"172.16.81.126:4433".parse().unwrap()));
assert!(!order.contains(&"10.0.0.5:4433".parse().unwrap()));
// Reflexive still included
assert!(order.contains(&"150.228.49.65:4433".parse().unwrap()));
}
#[test]
fn smart_dial_order_strips_ipv6() {
let candidates = PeerCandidates {
reflexive: Some("150.228.49.65:4433".parse().unwrap()),
local: vec![
"[2a0d:3344:692c::1]:4433".parse().unwrap(),
"172.16.81.126:4433".parse().unwrap(),
],
mapped: None,
};
// Same network, but IPv6 should be stripped
let own: SocketAddr = "150.228.49.65:5555".parse().unwrap();
let order = candidates.smart_dial_order(Some(&own));
assert!(!order.iter().any(|a| a.is_ipv6()));
assert!(order.contains(&"172.16.81.126:4433".parse().unwrap()));
}
#[test]
fn smart_dial_order_no_own_reflexive_strips_lan() {
let candidates = PeerCandidates {
reflexive: Some("150.228.49.65:4433".parse().unwrap()),
local: vec!["172.16.81.126:4433".parse().unwrap()],
mapped: Some("198.51.100.42:12345".parse().unwrap()),
};
// No own reflexive → can't determine same network → strip LAN
let order = candidates.smart_dial_order(None);
assert!(!order.contains(&"172.16.81.126:4433".parse().unwrap()));
assert!(order.contains(&"198.51.100.42:12345".parse().unwrap()));
assert!(order.contains(&"150.228.49.65:4433".parse().unwrap()));
}
#[test]
fn smart_dial_order_mapped_always_included() {
let candidates = PeerCandidates {
reflexive: Some("150.228.49.65:4433".parse().unwrap()),
local: vec![],
mapped: Some("198.51.100.42:12345".parse().unwrap()),
};
let own: SocketAddr = "185.115.4.212:12345".parse().unwrap();
let order = candidates.smart_dial_order(Some(&own));
assert_eq!(order.len(), 2); // mapped + reflexive
assert!(order.contains(&"198.51.100.42:12345".parse().unwrap()));
assert!(order.contains(&"150.228.49.65:4433".parse().unwrap()));
}
}

View File

@@ -131,6 +131,14 @@ pub fn signal_to_call_type(signal: &SignalMessage) -> CallSignalType {
// bridge. Catch-all mapping for completeness.
SignalMessage::FederatedSignalForward { .. } => CallSignalType::Offer,
SignalMessage::MediaPathReport { .. } => CallSignalType::Offer, // control-plane
SignalMessage::CandidateUpdate { .. } => CallSignalType::IceCandidate, // mid-call re-gather
SignalMessage::HardNatProbe { .. } => CallSignalType::IceCandidate, // hard NAT coordination
SignalMessage::HardNatBirthdayStart { .. } => CallSignalType::IceCandidate, // birthday attack
SignalMessage::UpgradeProposal { .. }
| SignalMessage::UpgradeResponse { .. }
| SignalMessage::UpgradeConfirm { .. }
| SignalMessage::QualityCapability { .. } => CallSignalType::Offer, // quality negotiation
SignalMessage::PresenceList { .. } => CallSignalType::Offer, // lobby presence
SignalMessage::QualityDirective { .. } => CallSignalType::Offer, // relay-initiated
}
}

View File

@@ -0,0 +1,444 @@
//! Phase 8 (Tailscale-inspired): ICE agent for candidate lifecycle
//! management and mid-call re-gathering.
//!
//! The `IceAgent` owns the state of all candidate discovery
//! mechanisms (STUN, port mapping, host candidates) and provides:
//!
//! - `gather()`: initial candidate gathering during call setup
//! - `re_gather()`: triggered on network change, produces a
//! `CandidateUpdate` to send to the peer
//! - `apply_peer_update()`: processes peer's candidate updates
//!
//! This is NOT a full ICE agent (RFC 8445). It's the Tailscale-style
//! "gather all candidates, race them all in parallel, pick the
//! winner" approach, adapted for QUIC transport.
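//!
//! # Example
//!
//! A minimal lifecycle sketch using only the APIs defined in this module.
//! `incoming` stands for a `CandidateUpdate` received from the peer over the
//! signal channel, and `send_signal` is a hypothetical send helper; neither
//! is part of this module.
//!
//! ```ignore
//! let agent = IceAgent::new("call-1".into(), IceAgentConfig::default());
//!
//! // Initial gathering during call setup.
//! let candidates = agent.gather().await;
//!
//! // On a network change: re-gather and ship the update to the peer.
//! let (_new_set, update) = agent.re_gather().await;
//! send_signal(update);
//!
//! // When the peer's update arrives, apply it (stale generations return None).
//! if let Some(peer) = agent.apply_peer_update(&incoming) {
//!     // feed `peer` into a fresh dual_path race
//! }
//! ```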
use std::net::SocketAddr;
use std::sync::atomic::{AtomicU32, Ordering};
use std::time::Duration;
use wzp_proto::SignalMessage;
use crate::dual_path::PeerCandidates;
use crate::portmap;
use crate::reflect;
use crate::stun;
/// All candidates gathered for the local side.
#[derive(Debug, Clone)]
pub struct CandidateSet {
/// STUN-discovered server-reflexive address.
pub reflexive: Option<SocketAddr>,
/// LAN host candidates from local interfaces.
pub local: Vec<SocketAddr>,
/// Port-mapped address from NAT-PMP/PCP/UPnP.
pub mapped: Option<SocketAddr>,
/// Generation counter (monotonically increasing per call).
pub generation: u32,
}
/// Configuration for the ICE agent.
#[derive(Debug, Clone)]
pub struct IceAgentConfig {
/// STUN servers to use for reflexive discovery.
pub stun_config: stun::StunConfig,
/// Whether to attempt port mapping.
pub enable_portmap: bool,
/// Timeout for each discovery mechanism.
pub gather_timeout: Duration,
/// The QUIC endpoint's local port (for host candidate pairing).
pub local_v4_port: u16,
/// Optional IPv6 port.
pub local_v6_port: Option<u16>,
}
impl Default for IceAgentConfig {
fn default() -> Self {
Self {
stun_config: stun::StunConfig::default(),
enable_portmap: true,
gather_timeout: Duration::from_secs(3),
local_v4_port: 0,
local_v6_port: None,
}
}
}
/// ICE agent managing candidate lifecycle.
pub struct IceAgent {
config: IceAgentConfig,
generation: AtomicU32,
call_id: String,
/// Last-seen peer generation (to filter stale updates).
peer_generation: AtomicU32,
}
impl IceAgent {
pub fn new(call_id: String, config: IceAgentConfig) -> Self {
Self {
config,
generation: AtomicU32::new(0),
call_id,
peer_generation: AtomicU32::new(0),
}
}
/// Initial candidate gathering. Runs all discovery mechanisms
/// in parallel and returns the full candidate set.
pub async fn gather(&self) -> CandidateSet {
let generation = self.generation.fetch_add(1, Ordering::Relaxed);
// Run STUN + port mapping + host candidates in parallel.
let stun_fut = stun::discover_reflexive(&self.config.stun_config);
let portmap_fut = async {
if self.config.enable_portmap && self.config.local_v4_port > 0 {
portmap::acquire_port_mapping(self.config.local_v4_port, None)
.await
.ok()
} else {
None
}
};
let (stun_result, portmap_result) = tokio::join!(
tokio::time::timeout(self.config.gather_timeout, stun_fut),
tokio::time::timeout(self.config.gather_timeout, portmap_fut),
);
let reflexive = stun_result.ok().and_then(|r| r.ok());
let mapped = portmap_result
.ok()
.flatten()
.map(|m| m.external_addr);
let local = reflect::local_host_candidates(
self.config.local_v4_port,
self.config.local_v6_port,
);
tracing::info!(
generation,
reflexive = ?reflexive,
mapped = ?mapped,
local_count = local.len(),
"ice_agent: gathered candidates"
);
CandidateSet {
reflexive,
local,
mapped,
generation,
}
}
/// Re-gather candidates after a network change. Increments the
/// generation counter and returns a `CandidateUpdate` signal
/// message to send to the peer.
pub async fn re_gather(&self) -> (CandidateSet, SignalMessage) {
let candidates = self.gather().await;
let update = SignalMessage::CandidateUpdate {
call_id: self.call_id.clone(),
reflexive_addr: candidates.reflexive.map(|a| a.to_string()),
local_addrs: candidates.local.iter().map(|a| a.to_string()).collect(),
mapped_addr: candidates.mapped.map(|a| a.to_string()),
generation: candidates.generation,
};
(candidates, update)
}
/// Process a peer's candidate update. Returns `Some(PeerCandidates)`
/// if the update is newer than the last-seen generation, `None`
/// if it's stale.
pub fn apply_peer_update(
&self,
update: &SignalMessage,
) -> Option<PeerCandidates> {
let (reflexive_addr, local_addrs, mapped_addr, generation) = match update {
SignalMessage::CandidateUpdate {
reflexive_addr,
local_addrs,
mapped_addr,
generation,
..
} => (reflexive_addr, local_addrs, mapped_addr, *generation),
_ => return None,
};
// Only accept if newer than last-seen generation.
let prev = self.peer_generation.fetch_max(generation, Ordering::AcqRel);
if generation <= prev {
tracing::debug!(
generation,
prev,
"ice_agent: ignoring stale CandidateUpdate"
);
return None;
}
let reflexive = reflexive_addr
.as_deref()
.and_then(|s| s.parse().ok());
let local: Vec<SocketAddr> = local_addrs
.iter()
.filter_map(|s| s.parse().ok())
.collect();
let mapped = mapped_addr
.as_deref()
.and_then(|s| s.parse().ok());
tracing::info!(
generation,
reflexive = ?reflexive,
mapped = ?mapped,
local_count = local.len(),
"ice_agent: applied peer candidate update"
);
Some(PeerCandidates {
reflexive,
local,
mapped,
})
}
/// Get the current generation counter.
pub fn generation(&self) -> u32 {
self.generation.load(Ordering::Relaxed)
}
}
// ── Tests ──────────────────────────────────────────────────────────
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn apply_peer_update_rejects_stale() {
let agent = IceAgent::new("test-call".into(), IceAgentConfig::default());
// First update (gen=1) should succeed.
let update1 = SignalMessage::CandidateUpdate {
call_id: "test-call".into(),
reflexive_addr: Some("203.0.113.5:4433".into()),
local_addrs: vec!["192.168.1.10:4433".into()],
mapped_addr: None,
generation: 1,
};
let result = agent.apply_peer_update(&update1);
assert!(result.is_some());
let candidates = result.unwrap();
assert_eq!(
candidates.reflexive,
Some("203.0.113.5:4433".parse().unwrap())
);
assert_eq!(candidates.local.len(), 1);
// Same generation (gen=1) should be rejected.
let update1b = SignalMessage::CandidateUpdate {
call_id: "test-call".into(),
reflexive_addr: Some("198.51.100.9:4433".into()),
local_addrs: vec![],
mapped_addr: None,
generation: 1,
};
assert!(agent.apply_peer_update(&update1b).is_none());
// Older generation (gen=0) should be rejected.
let update0 = SignalMessage::CandidateUpdate {
call_id: "test-call".into(),
reflexive_addr: Some("10.0.0.1:4433".into()),
local_addrs: vec![],
mapped_addr: None,
generation: 0,
};
assert!(agent.apply_peer_update(&update0).is_none());
// Newer generation (gen=2) should succeed.
let update2 = SignalMessage::CandidateUpdate {
call_id: "test-call".into(),
reflexive_addr: Some("198.51.100.9:5555".into()),
local_addrs: vec![],
mapped_addr: Some("203.0.113.5:12345".into()),
generation: 2,
};
let result = agent.apply_peer_update(&update2);
assert!(result.is_some());
let candidates = result.unwrap();
assert_eq!(
candidates.reflexive,
Some("198.51.100.9:5555".parse().unwrap())
);
assert_eq!(
candidates.mapped,
Some("203.0.113.5:12345".parse().unwrap())
);
}
#[test]
fn apply_wrong_signal_returns_none() {
let agent = IceAgent::new("test-call".into(), IceAgentConfig::default());
let wrong = SignalMessage::Reflect;
assert!(agent.apply_peer_update(&wrong).is_none());
}
#[test]
fn generation_increments() {
let agent = IceAgent::new("test".into(), IceAgentConfig::default());
assert_eq!(agent.generation(), 0);
// Simulate what gather() does internally
let g1 = agent.generation.fetch_add(1, Ordering::Relaxed);
assert_eq!(g1, 0);
assert_eq!(agent.generation(), 1);
let g2 = agent.generation.fetch_add(1, Ordering::Relaxed);
assert_eq!(g2, 1);
assert_eq!(agent.generation(), 2);
}
#[test]
fn apply_peer_update_parses_all_fields() {
let agent = IceAgent::new("test-call".into(), IceAgentConfig::default());
let update = SignalMessage::CandidateUpdate {
call_id: "test-call".into(),
reflexive_addr: Some("203.0.113.5:4433".into()),
local_addrs: vec![
"192.168.1.10:4433".into(),
"10.0.0.5:4433".into(),
],
mapped_addr: Some("198.51.100.42:12345".into()),
generation: 1,
};
let candidates = agent.apply_peer_update(&update).unwrap();
assert_eq!(
candidates.reflexive,
Some("203.0.113.5:4433".parse().unwrap())
);
assert_eq!(candidates.local.len(), 2);
assert_eq!(
candidates.local[0],
"192.168.1.10:4433".parse::<SocketAddr>().unwrap()
);
assert_eq!(
candidates.mapped,
Some("198.51.100.42:12345".parse().unwrap())
);
}
#[test]
fn apply_peer_update_handles_empty_fields() {
let agent = IceAgent::new("test".into(), IceAgentConfig::default());
let update = SignalMessage::CandidateUpdate {
call_id: "test".into(),
reflexive_addr: None,
local_addrs: vec![],
mapped_addr: None,
generation: 1,
};
let candidates = agent.apply_peer_update(&update).unwrap();
assert!(candidates.reflexive.is_none());
assert!(candidates.local.is_empty());
assert!(candidates.mapped.is_none());
}
#[test]
fn apply_peer_update_skips_unparseable_addrs() {
let agent = IceAgent::new("test".into(), IceAgentConfig::default());
let update = SignalMessage::CandidateUpdate {
call_id: "test".into(),
reflexive_addr: Some("not-an-addr".into()),
local_addrs: vec![
"192.168.1.10:4433".into(),
"garbage".into(),
"10.0.0.5:4433".into(),
],
mapped_addr: Some("also-bad".into()),
generation: 1,
};
let candidates = agent.apply_peer_update(&update).unwrap();
assert!(candidates.reflexive.is_none()); // unparseable
assert_eq!(candidates.local.len(), 2); // garbage filtered
assert!(candidates.mapped.is_none()); // unparseable
}
#[test]
fn default_config_values() {
let cfg = IceAgentConfig::default();
assert!(cfg.enable_portmap);
assert!(cfg.gather_timeout.as_secs() > 0);
assert!(!cfg.stun_config.servers.is_empty());
assert_eq!(cfg.local_v4_port, 0);
assert!(cfg.local_v6_port.is_none());
}
#[tokio::test]
async fn gather_returns_candidates_even_with_no_stun() {
// With default config (port 0 = no portmap, STUN will timeout
// quickly on loopback), gather should still return host candidates.
let agent = IceAgent::new("test".into(), IceAgentConfig {
stun_config: stun::StunConfig {
servers: vec![], // no servers = quick failure
timeout: Duration::from_millis(100),
},
enable_portmap: false,
gather_timeout: Duration::from_millis(200),
local_v4_port: 12345,
local_v6_port: None,
});
let candidates = agent.gather().await;
assert_eq!(candidates.generation, 0);
// Reflexive should be None (no STUN servers)
assert!(candidates.reflexive.is_none());
// Mapped should be None (portmap disabled)
assert!(candidates.mapped.is_none());
// Local candidates depend on the machine's interfaces
// but gather() should not panic.
}
#[tokio::test]
async fn re_gather_produces_signal_message() {
let agent = IceAgent::new("call-42".into(), IceAgentConfig {
stun_config: stun::StunConfig {
servers: vec![],
timeout: Duration::from_millis(50),
},
enable_portmap: false,
gather_timeout: Duration::from_millis(100),
local_v4_port: 4433,
local_v6_port: None,
});
let (candidates, signal) = agent.re_gather().await;
assert_eq!(candidates.generation, 0);
match signal {
SignalMessage::CandidateUpdate {
call_id,
generation,
..
} => {
assert_eq!(call_id, "call-42");
assert_eq!(generation, 0);
}
_ => panic!("expected CandidateUpdate"),
}
// Second re_gather increments generation
let (candidates2, signal2) = agent.re_gather().await;
assert_eq!(candidates2.generation, 1);
match signal2 {
SignalMessage::CandidateUpdate { generation, .. } => {
assert_eq!(generation, 1);
}
_ => panic!("expected CandidateUpdate"),
}
}
}

View File

@@ -34,7 +34,13 @@ pub mod featherchat;
pub mod handshake;
pub mod dual_path;
pub mod metrics;
pub mod birthday;
pub mod ice_agent;
pub mod netcheck;
pub mod portmap;
pub mod reflect;
pub mod relay_map;
pub mod stun;
pub mod sweep;
// AudioPlayback: three possible backends depending on feature flags.

View File

@@ -0,0 +1,524 @@
//! Phase 8 (Tailscale-inspired): Comprehensive network diagnostic.
//!
//! Probes STUN servers, relay infrastructure, port mapping
//! capabilities, IPv6 reachability, and NAT hairpinning in parallel
//! to produce a `NetcheckReport` that captures the client's network
//! environment at a point in time.
//!
//! Used for:
//! - Troubleshooting connectivity issues
//! - Automatic relay selection (Phase 5)
//! - Pre-call NAT assessment
//! - Quality prediction
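//!
//! # Example
//!
//! A minimal sketch using only the items defined in this module; the relay
//! name and address are placeholders.
//!
//! ```ignore
//! let config = NetcheckConfig {
//!     relays: vec![("us-east".into(), "203.0.113.5:4433".parse().unwrap())],
//!     ..NetcheckConfig::default()
//! };
//! let report = run_netcheck(&config).await;
//! println!("{}", format_report(&report));
//! ```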
use std::net::SocketAddr;
use std::time::{Duration, Instant};
use serde::Serialize;
use crate::portmap::{self, PortMapProtocol};
use crate::reflect::{self, NatType};
use crate::stun::{self, StunConfig};
/// Complete network diagnostic report.
#[derive(Debug, Clone, Serialize)]
pub struct NetcheckReport {
/// NAT type classification (from combined STUN + relay probes).
pub nat_type: NatType,
/// Server-reflexive address (consensus from probes).
pub reflexive_addr: Option<String>,
/// Whether IPv4 connectivity is available.
pub ipv4_reachable: bool,
/// Whether IPv6 connectivity is available.
pub ipv6_reachable: bool,
/// Whether the NAT supports hairpinning (loopback to own
/// reflexive address).
pub hairpin_works: Option<bool>,
/// Which port mapping protocol is available (if any).
pub port_mapping: Option<PortMapProtocol>,
/// Per-relay latency measurements.
pub relay_latencies: Vec<RelayLatency>,
/// Preferred relay (lowest latency).
pub preferred_relay: Option<String>,
/// STUN latency to first responding server (ms).
pub stun_latency_ms: Option<u32>,
/// Whether UPnP is available on the gateway.
pub upnp_available: bool,
/// Whether PCP is available on the gateway.
pub pcp_available: bool,
/// Whether NAT-PMP is available on the gateway.
pub nat_pmp_available: bool,
/// Default gateway address.
pub gateway: Option<String>,
/// Total time taken for the diagnostic (ms).
pub duration_ms: u32,
/// Individual STUN probe results.
pub stun_probes: Vec<reflect::NatProbeResult>,
/// NAT port allocation pattern (sequential vs random).
pub port_allocation: Option<stun::PortAllocation>,
}
/// Latency to a specific relay.
#[derive(Debug, Clone, Serialize)]
pub struct RelayLatency {
pub name: String,
pub addr: String,
pub rtt_ms: Option<u32>,
pub error: Option<String>,
}
/// Configuration for the netcheck run.
#[derive(Debug, Clone)]
pub struct NetcheckConfig {
/// STUN servers to probe.
pub stun_config: StunConfig,
/// Relay servers to probe (name, address pairs).
pub relays: Vec<(String, SocketAddr)>,
/// Per-probe timeout.
pub timeout: Duration,
/// Whether to test port mapping.
pub test_portmap: bool,
/// Whether to test IPv6.
pub test_ipv6: bool,
/// Local port for port mapping test (0 = skip).
pub local_port: u16,
}
impl Default for NetcheckConfig {
fn default() -> Self {
Self {
stun_config: StunConfig::default(),
relays: Vec::new(),
timeout: Duration::from_secs(5),
test_portmap: true,
test_ipv6: true,
local_port: 0,
}
}
}
/// Run a comprehensive network diagnostic.
///
/// Probes run in parallel for speed — the total time is bounded
/// by the slowest individual probe, not the sum.
pub async fn run_netcheck(config: &NetcheckConfig) -> NetcheckReport {
let start = Instant::now();
// Run all probes in parallel.
let stun_fut = stun::probe_stun_servers(&config.stun_config);
let relay_fut = probe_relays(&config.relays, config.timeout);
let portmap_fut = probe_portmap(config.test_portmap, config.local_port);
let gateway_fut = portmap::default_gateway();
let ipv6_fut = test_ipv6(config.test_ipv6, config.timeout);
let port_alloc_fut = stun::detect_port_allocation(&config.stun_config);
let (stun_probes, relay_latencies, portmap_result, gateway_result, ipv6_reachable, port_alloc_result) =
tokio::join!(stun_fut, relay_fut, portmap_fut, gateway_result_fut(gateway_fut), ipv6_fut, port_alloc_fut);
// Classify NAT from STUN probes.
let (nat_type, consensus_addr) = reflect::classify_nat(&stun_probes);
// Determine STUN latency (first successful probe).
let stun_latency_ms = stun_probes
.iter()
.filter_map(|p| p.latency_ms)
.min();
// IPv4 reachable if any STUN probe succeeded.
let ipv4_reachable = stun_probes
.iter()
.any(|p| p.observed_addr.is_some());
// Preferred relay = lowest RTT.
let preferred_relay = relay_latencies
.iter()
.filter_map(|r| r.rtt_ms.map(|rtt| (r.name.clone(), rtt)))
.min_by_key(|(_, rtt)| *rtt)
.map(|(name, _)| name);
// Port mapping availability.
let (port_mapping, nat_pmp_available, pcp_available, upnp_available) = match portmap_result {
Some(mapping) => {
let proto = mapping.protocol;
(
Some(proto),
proto == PortMapProtocol::NatPmp,
proto == PortMapProtocol::Pcp,
proto == PortMapProtocol::UPnP,
)
}
None => (None, false, false, false),
};
let gateway = match gateway_result {
Ok(gw) => Some(gw.to_string()),
Err(_) => None,
};
NetcheckReport {
nat_type,
reflexive_addr: consensus_addr,
ipv4_reachable,
ipv6_reachable,
hairpin_works: None, // TODO: implement hairpin test
port_mapping,
relay_latencies,
preferred_relay,
stun_latency_ms,
upnp_available,
pcp_available,
nat_pmp_available,
gateway,
duration_ms: start.elapsed().as_millis() as u32,
stun_probes,
port_allocation: Some(port_alloc_result.allocation),
}
}
/// Probe relay latencies via reflect.
async fn probe_relays(
relays: &[(String, SocketAddr)],
timeout: Duration,
) -> Vec<RelayLatency> {
if relays.is_empty() {
return Vec::new();
}
let timeout_ms = timeout.as_millis() as u64;
let mut set = tokio::task::JoinSet::new();
for (name, addr) in relays {
let name = name.clone();
let addr = *addr;
set.spawn(async move {
let start = Instant::now();
match reflect::probe_reflect_addr(addr, timeout_ms, None).await {
Ok((_observed, _latency)) => RelayLatency {
name,
addr: addr.to_string(),
rtt_ms: Some(start.elapsed().as_millis() as u32),
error: None,
},
Err(e) => RelayLatency {
name,
addr: addr.to_string(),
rtt_ms: None,
error: Some(e),
},
}
});
}
let mut results = Vec::with_capacity(relays.len());
while let Some(join_result) = set.join_next().await {
if let Ok(r) = join_result {
results.push(r);
}
}
// Sort by RTT (lowest first).
results.sort_by_key(|r| r.rtt_ms.unwrap_or(u32::MAX));
results
}
/// Attempt port mapping and return the mapping if successful.
async fn probe_portmap(
enabled: bool,
local_port: u16,
) -> Option<portmap::PortMapping> {
if !enabled || local_port == 0 {
return None;
}
portmap::acquire_port_mapping(local_port, None).await.ok()
}
/// Wrap the gateway future to handle the Result.
async fn gateway_result_fut(
fut: impl std::future::Future<Output = Result<std::net::Ipv4Addr, portmap::PortMapError>>,
) -> Result<std::net::Ipv4Addr, portmap::PortMapError> {
fut.await
}
/// Test IPv6 connectivity by attempting to bind and send on an IPv6 socket.
async fn test_ipv6(enabled: bool, timeout: Duration) -> bool {
if !enabled {
return false;
}
// Try to resolve and connect to an IPv6 STUN server.
let result = tokio::time::timeout(timeout, async {
let sock = tokio::net::UdpSocket::bind("[::]:0").await.ok()?;
// Try Google's IPv6 STUN — if DNS resolves to an AAAA record
// and we can send a packet, IPv6 is working.
let addr = stun::resolve_stun_server("stun.l.google.com:19302").await.ok()?;
if addr.is_ipv6() {
sock.send_to(&[0u8; 1], addr).await.ok()?;
Some(true)
} else {
// Server resolved to IPv4; fall through to the bind-only fallback below.
Some(false)
}
})
.await;
match result {
Ok(Some(true)) => true,
_ => {
// Fallback: can we at least bind an IPv6 socket?
tokio::net::UdpSocket::bind("[::]:0").await.is_ok()
}
}
}
/// Format a netcheck report as a human-readable string.
pub fn format_report(report: &NetcheckReport) -> String {
let mut out = String::new();
out.push_str("=== WarzonePhone Netcheck ===\n\n");
out.push_str(&format!(
"NAT Type: {:?}\n",
report.nat_type
));
out.push_str(&format!(
"Reflexive Addr: {}\n",
report.reflexive_addr.as_deref().unwrap_or("(unknown)")
));
out.push_str(&format!(
"IPv4: {}\n",
if report.ipv4_reachable { "yes" } else { "no" }
));
out.push_str(&format!(
"IPv6: {}\n",
if report.ipv6_reachable { "yes" } else { "no" }
));
out.push_str(&format!(
"Gateway: {}\n",
report.gateway.as_deref().unwrap_or("(unknown)")
));
if let Some(ref alloc) = report.port_allocation {
out.push_str(&format!(
"Port Alloc: {alloc}\n"
));
}
out.push_str("\n--- Port Mapping ---\n");
out.push_str(&format!(
"NAT-PMP: {} PCP: {} UPnP: {}\n",
if report.nat_pmp_available { "yes" } else { "no" },
if report.pcp_available { "yes" } else { "no" },
if report.upnp_available { "yes" } else { "no" },
));
if let Some(proto) = &report.port_mapping {
out.push_str(&format!("Active mapping: {:?}\n", proto));
}
if !report.stun_probes.is_empty() {
out.push_str("\n--- STUN Probes ---\n");
for p in &report.stun_probes {
out.push_str(&format!(
" {} → {} ({}ms){}\n",
p.relay_name,
p.observed_addr.as_deref().unwrap_or("failed"),
p.latency_ms.map(|ms| ms.to_string()).unwrap_or_else(|| "-".into()),
p.error.as_ref().map(|e| format!(" [{e}]")).unwrap_or_default(),
));
}
}
if !report.relay_latencies.is_empty() {
out.push_str("\n--- Relay Latencies ---\n");
for r in &report.relay_latencies {
out.push_str(&format!(
" {} ({}) → {}ms{}\n",
r.name,
r.addr,
r.rtt_ms.map(|ms| ms.to_string()).unwrap_or_else(|| "-".into()),
r.error.as_ref().map(|e| format!(" [{e}]")).unwrap_or_default(),
));
}
if let Some(ref pref) = report.preferred_relay {
out.push_str(&format!(" Preferred: {pref}\n"));
}
}
out.push_str(&format!("\nCompleted in {}ms\n", report.duration_ms));
out
}
// ── Tests ──────────────────────────────────────────────────────────
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn default_config_has_stun_servers() {
let config = NetcheckConfig::default();
assert!(!config.stun_config.servers.is_empty());
}
#[test]
fn format_report_produces_output() {
let report = NetcheckReport {
nat_type: NatType::Cone,
reflexive_addr: Some("203.0.113.5:4433".into()),
ipv4_reachable: true,
ipv6_reachable: false,
hairpin_works: None,
port_mapping: None,
relay_latencies: vec![RelayLatency {
name: "relay-1".into(),
addr: "10.0.0.1:4433".into(),
rtt_ms: Some(25),
error: None,
}],
preferred_relay: Some("relay-1".into()),
stun_latency_ms: Some(15),
upnp_available: false,
pcp_available: false,
nat_pmp_available: false,
gateway: Some("192.168.1.1".into()),
duration_ms: 1500,
stun_probes: vec![],
port_allocation: None,
};
let text = format_report(&report);
assert!(text.contains("Cone"));
assert!(text.contains("203.0.113.5:4433"));
assert!(text.contains("relay-1"));
assert!(text.contains("1500ms"));
}
#[test]
fn report_serializes_to_json() {
let report = NetcheckReport {
nat_type: NatType::Cone,
reflexive_addr: Some("203.0.113.5:4433".into()),
ipv4_reachable: true,
ipv6_reachable: false,
hairpin_works: None,
port_mapping: Some(PortMapProtocol::NatPmp),
relay_latencies: vec![],
preferred_relay: None,
stun_latency_ms: Some(25),
upnp_available: false,
pcp_available: false,
nat_pmp_available: true,
gateway: Some("192.168.1.1".into()),
duration_ms: 500,
stun_probes: vec![],
port_allocation: Some(stun::PortAllocation::Sequential { delta: 1 }),
};
let json = serde_json::to_string(&report).unwrap();
assert!(json.contains("Cone"));
assert!(json.contains("203.0.113.5:4433"));
assert!(json.contains("NatPmp"));
// Roundtrip
let decoded: serde_json::Value = serde_json::from_str(&json).unwrap();
assert_eq!(decoded["ipv4_reachable"], true);
assert_eq!(decoded["ipv6_reachable"], false);
assert_eq!(decoded["stun_latency_ms"], 25);
}
#[test]
fn relay_latency_serializes() {
let lat = RelayLatency {
name: "eu-west".into(),
addr: "10.0.0.1:4433".into(),
rtt_ms: Some(42),
error: None,
};
let json = serde_json::to_string(&lat).unwrap();
assert!(json.contains("eu-west"));
assert!(json.contains("42"));
}
#[test]
fn format_report_empty_relays() {
let report = NetcheckReport {
nat_type: NatType::Unknown,
reflexive_addr: None,
ipv4_reachable: false,
ipv6_reachable: false,
hairpin_works: None,
port_mapping: None,
relay_latencies: vec![],
preferred_relay: None,
stun_latency_ms: None,
upnp_available: false,
pcp_available: false,
nat_pmp_available: false,
gateway: None,
duration_ms: 100,
stun_probes: vec![],
port_allocation: None,
};
let text = format_report(&report);
assert!(text.contains("Unknown"));
assert!(text.contains("(unknown)")); // reflexive addr
assert!(text.contains("100ms"));
}
#[test]
fn format_report_with_stun_probes() {
let report = NetcheckReport {
nat_type: NatType::SymmetricPort,
reflexive_addr: None,
ipv4_reachable: true,
ipv6_reachable: true,
hairpin_works: Some(false),
port_mapping: Some(PortMapProtocol::UPnP),
relay_latencies: vec![
RelayLatency {
name: "us-east".into(),
addr: "10.0.0.1:4433".into(),
rtt_ms: Some(15),
error: None,
},
RelayLatency {
name: "eu-west".into(),
addr: "10.0.0.2:4433".into(),
rtt_ms: None,
error: Some("timeout".into()),
},
],
preferred_relay: Some("us-east".into()),
stun_latency_ms: Some(20),
upnp_available: true,
pcp_available: false,
nat_pmp_available: false,
gateway: Some("192.168.0.1".into()),
duration_ms: 3000,
stun_probes: vec![reflect::NatProbeResult {
relay_name: "stun:google".into(),
relay_addr: "74.125.250.129:19302".into(),
observed_addr: Some("203.0.113.5:12345".into()),
latency_ms: Some(20),
error: None,
}],
port_allocation: Some(stun::PortAllocation::Random),
};
let text = format_report(&report);
assert!(text.contains("SymmetricPort"));
assert!(text.contains("us-east"));
assert!(text.contains("eu-west"));
assert!(text.contains("Preferred: us-east"));
assert!(text.contains("UPnP: yes"));
assert!(text.contains("stun:google"));
assert!(text.contains("3000ms"));
}
/// Integration test: run actual netcheck (requires network).
#[tokio::test]
#[ignore]
async fn integration_netcheck() {
let config = NetcheckConfig::default();
let report = run_netcheck(&config).await;
println!("{}", format_report(&report));
assert!(report.duration_ms > 0);
}
}

File diff suppressed because it is too large

View File

@@ -473,6 +473,40 @@ pub fn classify_nat(probes: &[NatProbeResult]) -> (NatType, Option<String>) {
}
}
/// Enhanced NAT detection that combines relay-based reflection with
/// public STUN server probes for more robust classification.
///
/// Runs both probe sets concurrently:
/// 1. Relay probes via `detect_nat_type` (existing behavior)
/// 2. Public STUN probes via `probe_stun_servers`
///
/// Merges all results and classifies. More probes = higher confidence
/// in the NAT type classification. Falls back gracefully: if STUN
/// servers are unreachable, relay probes still work (and vice versa).
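///
/// # Example
///
/// Illustrative call; the relay list is a placeholder and the timeout is
/// arbitrary.
///
/// ```ignore
/// let relays = vec![("us-east".to_string(), "203.0.113.5:4433".parse().unwrap())];
/// let detection = detect_nat_type_with_stun(
///     relays,
///     2_000, // per-probe timeout in ms
///     None,  // no shared QUIC endpoint
///     &crate::stun::StunConfig::default(),
/// )
/// .await;
/// println!("NAT type: {:?}", detection.nat_type);
/// ```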
pub async fn detect_nat_type_with_stun(
relays: Vec<(String, SocketAddr)>,
timeout_ms: u64,
shared_endpoint: Option<wzp_transport::Endpoint>,
stun_config: &crate::stun::StunConfig,
) -> NatDetection {
// Run relay probes and STUN probes concurrently.
let relay_fut = detect_nat_type(relays, timeout_ms, shared_endpoint);
let stun_fut = crate::stun::probe_stun_servers(stun_config);
let (relay_detection, stun_probes) = tokio::join!(relay_fut, stun_fut);
// Merge all probes and re-classify.
let mut all_probes = relay_detection.probes;
all_probes.extend(stun_probes);
let (nat_type, consensus_addr) = classify_nat(&all_probes);
NatDetection {
probes: all_probes,
nat_type,
consensus_addr,
}
}
// ── Unit tests for the pure classifier ───────────────────────────
#[cfg(test)]

View File

@@ -0,0 +1,339 @@
//! Phase 8 (Tailscale-inspired): Relay map for automatic relay
//! selection based on latency.
//!
//! Maintains a sorted list of known relays with their measured
//! latencies. Used during call setup to pick the lowest-latency
//! relay, and by netcheck to report relay health.
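//!
//! # Example
//!
//! A minimal sketch using only this module's API; names and addresses are
//! placeholders.
//!
//! ```ignore
//! let mut map = RelayMap::new();
//! map.upsert("us-east", "203.0.113.5:4433".parse().unwrap(), None);
//! map.upsert("eu-west", "198.51.100.9:4433".parse().unwrap(), None);
//! map.update_rtt("203.0.113.5:4433".parse().unwrap(), 25);
//!
//! // Lowest-RTT reachable entry wins.
//! if let Some(relay) = map.preferred() {
//!     println!("dial {} ({:?} ms)", relay.addr, relay.rtt_ms);
//! }
//! ```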
use std::net::SocketAddr;
use std::time::{Duration, Instant};
use serde::Serialize;
/// A known relay endpoint with measured latency.
#[derive(Debug, Clone, Serialize)]
pub struct RelayEntry {
/// Human-readable name (e.g., "us-east", "eu-west").
pub name: String,
/// Relay address.
pub addr: SocketAddr,
/// Geographic region (from RegisterPresenceAck).
pub region: Option<String>,
/// Last measured RTT (ms).
pub rtt_ms: Option<u32>,
/// When the RTT was last measured.
#[serde(skip)]
pub last_probed: Option<Instant>,
/// Whether this relay is currently reachable.
pub reachable: bool,
}
/// Sorted relay map. Entries are ordered by RTT (lowest first).
#[derive(Debug, Clone, Default)]
pub struct RelayMap {
entries: Vec<RelayEntry>,
}
impl RelayMap {
pub fn new() -> Self {
Self {
entries: Vec::new(),
}
}
/// Add or update a relay entry.
pub fn upsert(&mut self, name: &str, addr: SocketAddr, region: Option<String>) {
if let Some(entry) = self.entries.iter_mut().find(|e| e.addr == addr) {
entry.name = name.to_string();
if region.is_some() {
entry.region = region;
}
} else {
self.entries.push(RelayEntry {
name: name.to_string(),
addr,
region,
rtt_ms: None,
last_probed: None,
reachable: false,
});
}
}
/// Update RTT measurement for a relay.
pub fn update_rtt(&mut self, addr: SocketAddr, rtt_ms: u32) {
if let Some(entry) = self.entries.iter_mut().find(|e| e.addr == addr) {
entry.rtt_ms = Some(rtt_ms);
entry.last_probed = Some(Instant::now());
entry.reachable = true;
}
self.sort();
}
/// Mark a relay as unreachable.
pub fn mark_unreachable(&mut self, addr: SocketAddr) {
if let Some(entry) = self.entries.iter_mut().find(|e| e.addr == addr) {
entry.reachable = false;
entry.last_probed = Some(Instant::now());
}
self.sort();
}
/// Get the preferred (lowest-latency, reachable) relay.
pub fn preferred(&self) -> Option<&RelayEntry> {
self.entries
.iter()
.find(|e| e.reachable && e.rtt_ms.is_some())
}
/// Get all entries, sorted by RTT.
pub fn entries(&self) -> &[RelayEntry] {
&self.entries
}
/// Populate from a `RegisterPresenceAck.available_relays` list.
/// Each entry is "name|addr" format.
pub fn populate_from_ack(&mut self, relays: &[String], relay_region: Option<&str>) {
for entry_str in relays {
if let Some((name, addr_str)) = entry_str.split_once('|') {
if let Ok(addr) = addr_str.parse::<SocketAddr>() {
self.upsert(name, addr, None);
}
}
}
// If the ack included a region for the current relay, we
// could tag it — but we'd need to know which relay we're
// connected to. Left for the caller to handle.
let _ = relay_region;
}
/// Check if any entry has a stale probe (older than `max_age`).
pub fn needs_reprobe(&self, max_age: Duration) -> bool {
self.entries.iter().any(|e| {
match e.last_probed {
None => true,
Some(t) => t.elapsed() > max_age,
}
})
}
/// Get entries that need reprobing.
pub fn stale_entries(&self, max_age: Duration) -> Vec<(String, SocketAddr)> {
self.entries
.iter()
.filter(|e| match e.last_probed {
None => true,
Some(t) => t.elapsed() > max_age,
})
.map(|e| (e.name.clone(), e.addr))
.collect()
}
fn sort(&mut self) {
self.entries.sort_by_key(|e| {
if e.reachable {
e.rtt_ms.unwrap_or(u32::MAX)
} else {
u32::MAX
}
});
}
}
// ── Tests ──────────────────────────────────────────────────────────
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn preferred_returns_lowest_rtt() {
let mut map = RelayMap::new();
let a1: SocketAddr = "10.0.0.1:4433".parse().unwrap();
let a2: SocketAddr = "10.0.0.2:4433".parse().unwrap();
let a3: SocketAddr = "10.0.0.3:4433".parse().unwrap();
map.upsert("slow", a1, None);
map.upsert("fast", a2, None);
map.upsert("mid", a3, None);
map.update_rtt(a1, 200);
map.update_rtt(a2, 15);
map.update_rtt(a3, 80);
let pref = map.preferred().unwrap();
assert_eq!(pref.addr, a2);
assert_eq!(pref.rtt_ms, Some(15));
}
#[test]
fn unreachable_not_preferred() {
let mut map = RelayMap::new();
let a1: SocketAddr = "10.0.0.1:4433".parse().unwrap();
let a2: SocketAddr = "10.0.0.2:4433".parse().unwrap();
map.upsert("fast-dead", a1, None);
map.upsert("slow-alive", a2, None);
map.update_rtt(a1, 5);
map.update_rtt(a2, 200);
map.mark_unreachable(a1);
let pref = map.preferred().unwrap();
assert_eq!(pref.addr, a2);
}
#[test]
fn populate_from_ack() {
let mut map = RelayMap::new();
map.populate_from_ack(
&[
"us-east|203.0.113.5:4433".into(),
"eu-west|198.51.100.9:4433".into(),
],
Some("us-east"),
);
assert_eq!(map.entries().len(), 2);
assert_eq!(map.entries()[0].name, "us-east");
assert_eq!(map.entries()[1].name, "eu-west");
}
#[test]
fn upsert_updates_existing() {
let mut map = RelayMap::new();
let addr: SocketAddr = "10.0.0.1:4433".parse().unwrap();
map.upsert("old-name", addr, None);
map.upsert("new-name", addr, Some("us-west".into()));
assert_eq!(map.entries().len(), 1);
assert_eq!(map.entries()[0].name, "new-name");
assert_eq!(map.entries()[0].region, Some("us-west".into()));
}
#[test]
fn upsert_preserves_region_when_none() {
let mut map = RelayMap::new();
let addr: SocketAddr = "10.0.0.1:4433".parse().unwrap();
map.upsert("relay", addr, Some("eu-west".into()));
map.upsert("relay", addr, None); // region is None
// Should keep the original region
assert_eq!(map.entries()[0].region, Some("eu-west".into()));
}
#[test]
fn preferred_returns_none_on_empty() {
let map = RelayMap::new();
assert!(map.preferred().is_none());
}
#[test]
fn preferred_returns_none_when_all_unreachable() {
let mut map = RelayMap::new();
let addr: SocketAddr = "10.0.0.1:4433".parse().unwrap();
map.upsert("relay", addr, None);
// Not update_rtt'd, so reachable=false
assert!(map.preferred().is_none());
}
#[test]
fn needs_reprobe_empty_is_false() {
let map = RelayMap::new();
// No entries → nothing to reprobe
assert!(!map.needs_reprobe(Duration::from_secs(60)));
}
#[test]
fn needs_reprobe_never_probed() {
let mut map = RelayMap::new();
map.upsert("relay", "10.0.0.1:4433".parse().unwrap(), None);
assert!(map.needs_reprobe(Duration::from_secs(60)));
}
#[test]
fn needs_reprobe_fresh_is_false() {
let mut map = RelayMap::new();
let addr: SocketAddr = "10.0.0.1:4433".parse().unwrap();
map.upsert("relay", addr, None);
map.update_rtt(addr, 50);
// Just probed, so 60s max_age should not trigger
assert!(!map.needs_reprobe(Duration::from_secs(60)));
}
#[test]
fn stale_entries_returns_unprobed() {
let mut map = RelayMap::new();
let a1: SocketAddr = "10.0.0.1:4433".parse().unwrap();
let a2: SocketAddr = "10.0.0.2:4433".parse().unwrap();
map.upsert("probed", a1, None);
map.upsert("stale", a2, None);
map.update_rtt(a1, 50);
let stale = map.stale_entries(Duration::from_secs(60));
assert_eq!(stale.len(), 1);
assert_eq!(stale[0].1, a2);
}
#[test]
fn sort_stability_with_equal_rtt() {
let mut map = RelayMap::new();
let a1: SocketAddr = "10.0.0.1:4433".parse().unwrap();
let a2: SocketAddr = "10.0.0.2:4433".parse().unwrap();
map.upsert("first", a1, None);
map.upsert("second", a2, None);
map.update_rtt(a1, 50);
map.update_rtt(a2, 50);
// Both have same RTT — sort should be stable (insertion order)
assert_eq!(map.entries().len(), 2);
// Both are valid preferred relays
assert!(map.preferred().is_some());
}
#[test]
fn populate_from_ack_skips_malformed() {
let mut map = RelayMap::new();
map.populate_from_ack(
&[
"good|10.0.0.1:4433".into(),
"no-pipe-separator".into(),
"bad-addr|not-a-socket-addr".into(),
"also-good|10.0.0.2:4433".into(),
],
None,
);
assert_eq!(map.entries().len(), 2);
}
#[test]
fn mark_unreachable_sorts_to_end() {
let mut map = RelayMap::new();
let a1: SocketAddr = "10.0.0.1:4433".parse().unwrap();
let a2: SocketAddr = "10.0.0.2:4433".parse().unwrap();
map.upsert("fast", a1, None);
map.upsert("slow", a2, None);
map.update_rtt(a1, 10);
map.update_rtt(a2, 200);
assert_eq!(map.preferred().unwrap().addr, a1);
map.mark_unreachable(a1);
assert_eq!(map.preferred().unwrap().addr, a2);
}
#[test]
fn relay_entry_serializes() {
let entry = RelayEntry {
name: "test".into(),
addr: "10.0.0.1:4433".parse().unwrap(),
region: Some("us-east".into()),
rtt_ms: Some(42),
last_probed: Some(Instant::now()),
reachable: true,
};
let json = serde_json::to_string(&entry).unwrap();
assert!(json.contains("test"));
assert!(json.contains("us-east"));
assert!(json.contains("42"));
// last_probed is #[serde(skip)]
assert!(!json.contains("last_probed"));
}
}

File diff suppressed because it is too large

View File

@@ -113,11 +113,14 @@ async fn dual_path_direct_wins_on_loopback() {
PeerCandidates {
reflexive: Some(acceptor_listen_addr),
local: Vec::new(),
mapped: None,
},
relay_addr,
"test-room".into(),
"call-test".into(),
None, // own_reflexive: not needed in tests
None, // Phase 5: tests use fresh endpoints (no shared signal)
None, // Phase 7: no IPv6 endpoint in tests
)
.await
.expect("race must succeed");
@@ -155,11 +158,14 @@ async fn dual_path_relay_wins_when_direct_is_dead() {
PeerCandidates {
reflexive: Some(dead_peer),
local: Vec::new(),
mapped: None,
},
relay_addr,
"test-room".into(),
"call-test".into(),
None, // own_reflexive: not needed in tests
None, // Phase 5: tests use fresh endpoints (no shared signal)
None, // Phase 7: no IPv6 endpoint in tests
)
.await
.expect("race must succeed via relay fallback");
@@ -193,11 +199,14 @@ async fn dual_path_errors_cleanly_when_both_paths_dead() {
PeerCandidates {
reflexive: Some(dead_peer),
local: Vec::new(),
mapped: None,
},
dead_relay,
"test-room".into(),
"call-test".into(),
None, // own_reflexive: not needed in tests
None, // Phase 5: tests use fresh endpoints (no shared signal)
None, // Phase 7: no IPv6 endpoint in tests
)
.await;
let elapsed = start.elapsed();

View File

@@ -18,10 +18,14 @@ use crate::session::ChaChaSession;
pub struct WarzoneKeyExchange {
/// Ed25519 signing key (identity).
signing_key: SigningKey,
/// X25519 static secret (derived from seed, used for identity encryption).
/// X25519 static secret derived from identity seed. Reserved for future
/// use in static-key federation authentication (not used in current
/// ephemeral-only handshake protocol).
#[allow(dead_code)]
x25519_static_secret: StaticSecret,
/// X25519 static public key.
/// X25519 static public key derived from identity seed. Reserved for
/// future use in static-key federation authentication (not used in
/// current ephemeral-only handshake protocol).
#[allow(dead_code)]
x25519_static_public: X25519PublicKey,
/// Ephemeral X25519 secret for the current call (set by generate_ephemeral).

View File

@@ -26,6 +26,11 @@ pub extern "C" fn wzp_native_version() -> i32 {
/// Writes a NUL-terminated string into `out` (capped at `cap`) and
/// returns bytes written excluding the NUL.
///
/// # Safety
/// `out` must be a valid pointer to at least `cap` contiguous bytes of
/// writable memory. Passing a null pointer or zero capacity is safe
/// (returns 0), but a dangling non-null pointer is undefined behaviour.
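///
/// Illustrative call from Rust; the buffer size is arbitrary.
///
/// ```ignore
/// let mut buf = [0u8; 64];
/// let n = unsafe { wzp_native_hello(buf.as_mut_ptr(), buf.len()) };
/// let msg = std::str::from_utf8(&buf[..n]).unwrap_or("");
/// ```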
#[unsafe(no_mangle)]
pub unsafe extern "C" fn wzp_native_hello(out: *mut u8, cap: usize) -> usize {
const MSG: &[u8] = b"hello from wzp-native\0";
@@ -264,9 +269,20 @@ pub extern "C" fn wzp_native_audio_stop() {
}
}
/// Number of capture samples available to read without blocking.
#[unsafe(no_mangle)]
pub extern "C" fn wzp_native_audio_capture_available() -> usize {
backend().capture.available_read()
}
/// Read captured PCM samples from the capture ring. Returns the number
/// of `i16` samples actually copied into `out` (may be less than
/// `out_len` if the ring is empty).
///
/// # Safety
/// `out` must be a valid pointer to `out_len` contiguous `i16` values.
/// The caller must ensure no other thread writes to the same buffer
/// concurrently. Passing a null pointer or zero length is safe (returns 0).
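///
/// Illustrative call from Rust; 960 samples (20 ms at 48 kHz mono) is an
/// assumed frame size, not something this API mandates.
///
/// ```ignore
/// let mut pcm = vec![0i16; 960];
/// let n = unsafe { wzp_native_audio_read_capture(pcm.as_mut_ptr(), pcm.len()) };
/// pcm.truncate(n); // keep only what was actually read
/// ```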
#[unsafe(no_mangle)]
pub unsafe extern "C" fn wzp_native_audio_read_capture(out: *mut i16, out_len: usize) -> usize {
if out.is_null() || out_len == 0 {
@@ -280,6 +296,12 @@ pub unsafe extern "C" fn wzp_native_audio_read_capture(out: *mut i16, out_len: u
/// samples actually enqueued (may be less than `in_len` if the ring
/// is nearly full — in practice the caller should pace to 20 ms
/// frames and spin briefly if the ring is full).
///
/// # Safety
/// `input` must be a valid pointer to `in_len` contiguous `i16` values
/// that remain valid for the duration of the call. Passing a null pointer
/// or zero length is safe (returns 0). The caller must not free or mutate
/// the buffer while this function is executing.
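///
/// Illustrative call from Rust; `decoded` is a stand-in for a buffer of
/// decoded PCM samples.
///
/// ```ignore
/// let decoded: Vec<i16> = vec![0; 960];
/// let written = unsafe { wzp_native_audio_write_playout(decoded.as_ptr(), decoded.len()) };
/// debug_assert!(written <= decoded.len());
/// ```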
#[unsafe(no_mangle)]
pub unsafe extern "C" fn wzp_native_audio_write_playout(input: *const i16, in_len: usize) -> usize {
if input.is_null() || in_len == 0 {

View File

@@ -10,6 +10,10 @@
//! prediction): when jitter variance spikes >30% over a 200 ms window — typical
//! of Starlink satellite handovers — it temporarily boosts DRED to the maximum
//! allowed for the current codec before packets actually start dropping.
//!
//! See also: [`crate::quality`] for discrete tier classification that drives
//! codec switching. DredTuner operates within a tier, adjusting DRED
//! parameters continuously based on live network metrics.
use crate::CodecId;

View File

@@ -27,7 +27,7 @@ pub use codec_id::{CodecId, QualityProfile};
pub use error::*;
pub use packet::{
CallAcceptMode, HangupReason, MediaHeader, MediaPacket, MiniFrameContext, MiniHeader,
QualityReport, RoomParticipant, SignalMessage, TrunkEntry, TrunkFrame, FRAME_TYPE_FULL,
PresenceUser, QualityReport, RoomParticipant, SignalMessage, TrunkEntry, TrunkFrame, FRAME_TYPE_FULL,
FRAME_TYPE_MINI,
};
pub use bandwidth::{BandwidthEstimator, CongestionState};

View File

@@ -156,6 +156,14 @@ impl MediaHeader {
}
}
/// A user visible in the signal presence list.
#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct PresenceUser {
pub fingerprint: String,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub alias: Option<String>,
}
/// Quality report appended to a media packet when Q flag is set (4 bytes).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Serialize, Deserialize)]
pub struct QualityReport {
@@ -180,6 +188,19 @@ impl QualityReport {
self.rtt_4ms as u16 * 4
}
/// Construct a QualityReport from locally-observed path statistics.
///
/// Used by the send task to embed quality data in outgoing packets so
/// the peer's recv task (or relay) can drive adaptive quality switching.
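///
/// Illustrative values (arbitrary, not from a real path):
///
/// ```ignore
/// let report = QualityReport::from_path_stats(2.5, 80, 12);
/// // loss: 2.5 / 100 * 255 ≈ 6, rtt: 80 / 4 = 20 units, jitter: 12 ms
/// assert_eq!(report.jitter_ms, 12);
/// ```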
pub fn from_path_stats(loss_pct: f32, rtt_ms: u32, jitter_ms: u32) -> Self {
Self {
loss_pct: (loss_pct / 100.0 * 255.0).clamp(0.0, 255.0) as u8,
rtt_4ms: (rtt_ms / 4).min(255) as u8,
jitter_ms: jitter_ms.min(255) as u8,
bitrate_cap_kbps: 200,
}
}
pub fn write_to(&self, buf: &mut impl BufMut) {
buf.put_u8(self.loss_pct);
buf.put_u8(self.rtt_4ms);
@@ -725,6 +746,13 @@ pub enum SignalMessage {
/// Relay's build version (git short hash).
#[serde(default, skip_serializing_if = "Option::is_none")]
relay_build: Option<String>,
/// Phase 8: relay's geographic region (e.g., "us-east", "eu-west").
#[serde(default, skip_serializing_if = "Option::is_none")]
relay_region: Option<String>,
/// Phase 8: other relays the client can use, sorted by relay
/// mesh proximity. Each entry is "name|addr" (e.g., "eu-west|203.0.113.5:4433").
#[serde(default, skip_serializing_if = "Vec::is_empty")]
available_relays: Vec<String>,
},
/// Direct call offer routed through the relay to a specific peer.
@@ -764,6 +792,12 @@ pub enum SignalMessage {
/// the same LAN.
#[serde(default, skip_serializing_if = "Vec::is_empty")]
caller_local_addrs: Vec<String>,
/// Phase 8 (Tailscale-inspired): caller's port-mapped external
/// address from NAT-PMP/PCP/UPnP. When the router supports
/// port mapping, this gives a stable external address even
/// behind symmetric NATs.
#[serde(default, skip_serializing_if = "Option::is_none")]
caller_mapped_addr: Option<String>,
/// Build version (git short hash) for debugging.
#[serde(default, skip_serializing_if = "Option::is_none")]
caller_build_version: Option<String>,
@@ -800,6 +834,10 @@ pub enum SignalMessage {
/// `callee_reflexive_addr`.
#[serde(default, skip_serializing_if = "Vec::is_empty")]
callee_local_addrs: Vec<String>,
/// Phase 8 (Tailscale-inspired): callee's port-mapped external
/// address from NAT-PMP/PCP/UPnP.
#[serde(default, skip_serializing_if = "Option::is_none")]
callee_mapped_addr: Option<String>,
/// Build version (git short hash) for debugging.
#[serde(default, skip_serializing_if = "Option::is_none")]
callee_build_version: Option<String>,
@@ -831,6 +869,11 @@ pub enum SignalMessage {
/// Client-side race tries all of these in parallel.
#[serde(default, skip_serializing_if = "Vec::is_empty")]
peer_local_addrs: Vec<String>,
/// Phase 8 (Tailscale-inspired): the OTHER party's port-mapped
/// external address from NAT-PMP/PCP/UPnP. Added to the
/// candidate dial order between host and reflexive addrs.
#[serde(default, skip_serializing_if = "Option::is_none")]
peer_mapped_addr: Option<String>,
},
/// Ringing notification (relay → caller, callee received the offer).
@@ -886,6 +929,65 @@ pub enum SignalMessage {
race_winner: String,
},
// ── Phase 8: mid-call ICE re-gathering ────────────────────────
/// Phase 8 (Tailscale-inspired): mid-call candidate update sent
/// when a client's network changes (WiFi → cellular, IP change,
/// etc.). The relay forwards this to the call peer, who can
/// re-race with the new candidates to upgrade or maintain the
/// direct path.
///
/// The `generation` counter is monotonically increasing per call
/// — peers ignore updates with a generation <= their last-seen
/// generation to handle reordering.
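///
/// Illustrative construction (addresses are placeholders):
///
/// ```ignore
/// let update = SignalMessage::CandidateUpdate {
///     call_id: "call-1".into(),
///     reflexive_addr: Some("203.0.113.5:4433".into()),
///     local_addrs: vec!["192.168.1.10:4433".into()],
///     mapped_addr: None,
///     generation: 2,
/// };
/// ```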
CandidateUpdate {
call_id: String,
/// New server-reflexive address (STUN-discovered or relay-reflected).
#[serde(default, skip_serializing_if = "Option::is_none")]
reflexive_addr: Option<String>,
/// New LAN host addresses.
#[serde(default, skip_serializing_if = "Vec::is_empty")]
local_addrs: Vec<String>,
/// New port-mapped address (NAT-PMP/PCP/UPnP).
#[serde(default, skip_serializing_if = "Option::is_none")]
mapped_addr: Option<String>,
/// Monotonic generation counter.
generation: u32,
},
// ── Hard NAT traversal (port prediction) ──────────────────────
/// Hard NAT probe coordination — exchanged when both peers
/// detect symmetric NAT. Carries the port allocation pattern
/// and recent port sequence so the peer can predict which port
/// to dial.
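///
/// Illustrative construction (ports, timestamp, and IP are placeholders):
///
/// ```ignore
/// let probe = SignalMessage::HardNatProbe {
///     call_id: "call-1".into(),
///     port_sequence: vec![51004, 51002, 51000],
///     allocation: "sequential:2".into(),
///     probe_time_ms: 1_700_000_000_000,
///     external_ip: "203.0.113.5".into(),
/// };
/// ```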
HardNatProbe {
call_id: String,
/// Last observed external ports (most recent first).
/// Typically 3-5 entries from sequential STUN probes.
port_sequence: Vec<u16>,
/// Detected allocation pattern as string:
/// "sequential:N" (N=delta), "random", "preserving"
allocation: String,
/// Probe timestamp (ms since epoch) for synchronization.
probe_time_ms: u64,
/// External IP from STUN.
external_ip: String,
},
/// Birthday attack coordination — Acceptor tells Dialer which
/// ports it has open. The Dialer then sprays QUIC connects to
/// these ports (and optionally random ports) on the Acceptor's IP.
HardNatBirthdayStart {
call_id: String,
/// Number of sockets the Acceptor opened.
acceptor_port_count: u16,
/// External ports discovered via STUN (the "hit list").
acceptor_ports: Vec<u16>,
/// Acceptor's external IP.
external_ip: String,
},
// ── Phase 4: cross-relay direct-call signaling ────────────────────
/// Phase 4: relay-to-relay envelope for forwarding direct-call
@@ -925,6 +1027,71 @@ pub enum SignalMessage {
#[serde(default, skip_serializing_if = "Option::is_none")]
reason: Option<String>,
},
// ── Signal presence ───────────────────────────────────────────
/// Relay broadcasts the list of currently registered signal
/// users to all connected clients. Sent on every register/
/// deregister so clients can maintain a live lobby user list.
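///
/// Illustrative construction (fingerprint and alias are placeholders):
///
/// ```ignore
/// let list = SignalMessage::PresenceList {
///     users: vec![PresenceUser {
///         fingerprint: "ab12cd34".into(),
///         alias: Some("alice".into()),
///     }],
/// };
/// ```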
PresenceList {
/// List of online users. Each entry is { fingerprint, alias }.
users: Vec<PresenceUser>,
},
// ── Quality upgrade negotiation (#28, #29) ──────────────────
/// Peer proposes upgrading to a higher quality profile.
/// The other side can accept or reject based on its own network
/// conditions. Used for consensual upgrades that require both
/// sides to agree (e.g., switching from Opus24k to Studio48k).
UpgradeProposal {
call_id: String,
/// Unique ID for this proposal (to match response).
proposal_id: String,
/// The profile being proposed.
proposed_profile: crate::QualityProfile,
/// Current local network quality to justify the upgrade.
#[serde(default, skip_serializing_if = "Option::is_none")]
local_loss_pct: Option<f32>,
#[serde(default, skip_serializing_if = "Option::is_none")]
local_rtt_ms: Option<u32>,
},
/// Response to an UpgradeProposal.
UpgradeResponse {
call_id: String,
proposal_id: String,
/// true = accepted, both sides switch. false = rejected.
accepted: bool,
/// Reason for rejection (if any).
#[serde(default, skip_serializing_if = "Option::is_none")]
reason: Option<String>,
},
/// Confirmation that the upgrade is committed — both sides
/// should switch encoder at the next frame boundary.
UpgradeConfirm {
call_id: String,
proposal_id: String,
confirmed_profile: crate::QualityProfile,
},
// ── Per-participant quality (#30) ───────────────────────────
/// Peer reports its own quality capability — allows asymmetric
/// encoding where each side uses the best quality its connection
/// supports, rather than forcing all to the weakest link.
QualityCapability {
call_id: String,
/// The best profile this peer can sustain based on its
/// current network conditions.
max_profile: crate::QualityProfile,
/// Current loss/RTT for context.
#[serde(default, skip_serializing_if = "Option::is_none")]
loss_pct: Option<f32>,
#[serde(default, skip_serializing_if = "Option::is_none")]
rtt_ms: Option<u32>,
},
}
/// How the callee responds to a direct call.
@@ -966,6 +1133,32 @@ pub enum HangupReason {
mod tests {
use super::*;
#[test]
fn quality_report_from_path_stats_basic() {
let qr = QualityReport::from_path_stats(10.0, 100, 20);
// 10.0 / 100.0 * 255.0 = 25.5 → truncated to 25
assert_eq!(qr.loss_pct, 25);
assert_eq!(qr.rtt_4ms, 25); // 100 / 4 = 25
assert_eq!(qr.jitter_ms, 20);
assert_eq!(qr.bitrate_cap_kbps, 200);
}
#[test]
fn quality_report_from_path_stats_zero() {
let qr = QualityReport::from_path_stats(0.0, 0, 0);
assert_eq!(qr.loss_pct, 0);
assert_eq!(qr.rtt_4ms, 0);
assert_eq!(qr.jitter_ms, 0);
}
#[test]
fn quality_report_from_path_stats_clamps_high() {
let qr = QualityReport::from_path_stats(100.0, 2000, 300);
assert_eq!(qr.loss_pct, 255);
assert_eq!(qr.rtt_4ms, 255); // 2000/4=500, clamped to 255
assert_eq!(qr.jitter_ms, 255);
}
#[test]
fn header_roundtrip() {
let header = MediaHeader {
@@ -1108,6 +1301,7 @@ mod tests {
supported_profiles: vec![],
caller_reflexive_addr: Some("192.0.2.1:4433".into()),
caller_local_addrs: Vec::new(),
caller_mapped_addr: None,
caller_build_version: None,
};
let forward = SignalMessage::FederatedSignalForward {
@@ -1151,6 +1345,7 @@ mod tests {
chosen_profile: None,
callee_reflexive_addr: Some("198.51.100.9:4433".into()),
callee_local_addrs: Vec::new(),
callee_mapped_addr: None,
callee_build_version: None,
},
SignalMessage::CallRinging { call_id: "c1".into() },
@@ -1187,6 +1382,7 @@ mod tests {
supported_profiles: vec![],
caller_reflexive_addr: Some("192.0.2.1:4433".into()),
caller_local_addrs: Vec::new(),
caller_mapped_addr: None,
caller_build_version: None,
};
let json = serde_json::to_string(&offer).unwrap();
@@ -1216,6 +1412,7 @@ mod tests {
supported_profiles: vec![],
caller_reflexive_addr: None,
caller_local_addrs: Vec::new(),
caller_mapped_addr: None,
caller_build_version: None,
};
let json_none = serde_json::to_string(&offer_none).unwrap();
@@ -1234,6 +1431,7 @@ mod tests {
chosen_profile: None,
callee_reflexive_addr: Some("198.51.100.9:4433".into()),
callee_local_addrs: Vec::new(),
callee_mapped_addr: None,
callee_build_version: None,
};
let decoded: SignalMessage =
@@ -1255,6 +1453,7 @@ mod tests {
relay_addr: "203.0.113.5:4433".into(),
peer_direct_addr: Some("192.0.2.1:4433".into()),
peer_local_addrs: Vec::new(),
peer_mapped_addr: None,
};
let decoded: SignalMessage =
serde_json::from_str(&serde_json::to_string(&setup).unwrap()).unwrap();
@@ -1724,4 +1923,345 @@ mod tests {
assert_eq!(wire[0], FRAME_TYPE_FULL, "frame {i} should be FULL when disabled");
}
}
// ── Quality negotiation roundtrip tests (#28, #29, #30) ─────
#[test]
fn upgrade_proposal_roundtrip() {
let msg = SignalMessage::UpgradeProposal {
call_id: "c1".into(),
proposal_id: "p1".into(),
proposed_profile: crate::QualityProfile::STUDIO_48K,
local_loss_pct: Some(0.5),
local_rtt_ms: Some(25),
};
let json = serde_json::to_string(&msg).unwrap();
let decoded: SignalMessage = serde_json::from_str(&json).unwrap();
match decoded {
SignalMessage::UpgradeProposal { proposal_id, proposed_profile, .. } => {
assert_eq!(proposal_id, "p1");
assert_eq!(proposed_profile, crate::QualityProfile::STUDIO_48K);
}
_ => panic!("wrong variant"),
}
}
#[test]
fn upgrade_response_roundtrip() {
let msg = SignalMessage::UpgradeResponse {
call_id: "c1".into(),
proposal_id: "p1".into(),
accepted: true,
reason: None,
};
let json = serde_json::to_string(&msg).unwrap();
let decoded: SignalMessage = serde_json::from_str(&json).unwrap();
match decoded {
SignalMessage::UpgradeResponse { accepted, .. } => assert!(accepted),
_ => panic!("wrong variant"),
}
}
#[test]
fn upgrade_confirm_roundtrip() {
let msg = SignalMessage::UpgradeConfirm {
call_id: "c1".into(),
proposal_id: "p1".into(),
confirmed_profile: crate::QualityProfile::STUDIO_64K,
};
let json = serde_json::to_string(&msg).unwrap();
let decoded: SignalMessage = serde_json::from_str(&json).unwrap();
match decoded {
SignalMessage::UpgradeConfirm { confirmed_profile, .. } => {
assert_eq!(confirmed_profile, crate::QualityProfile::STUDIO_64K);
}
_ => panic!("wrong variant"),
}
}
#[test]
fn quality_capability_roundtrip() {
let msg = SignalMessage::QualityCapability {
call_id: "c1".into(),
max_profile: crate::QualityProfile::GOOD,
loss_pct: Some(2.5),
rtt_ms: Some(80),
};
let json = serde_json::to_string(&msg).unwrap();
let decoded: SignalMessage = serde_json::from_str(&json).unwrap();
match decoded {
SignalMessage::QualityCapability { max_profile, loss_pct, .. } => {
assert_eq!(max_profile, crate::QualityProfile::GOOD);
assert!((loss_pct.unwrap() - 2.5).abs() < 0.01);
}
_ => panic!("wrong variant"),
}
}
// ── Phase 8: Tailscale-inspired signal roundtrip tests ──────
#[test]
fn candidate_update_roundtrip() {
let msg = SignalMessage::CandidateUpdate {
call_id: "test-123".into(),
reflexive_addr: Some("203.0.113.5:4433".into()),
local_addrs: vec![
"192.168.1.10:4433".into(),
"10.0.0.5:4433".into(),
],
mapped_addr: Some("198.51.100.42:12345".into()),
generation: 7,
};
let json = serde_json::to_string(&msg).unwrap();
let decoded: SignalMessage = serde_json::from_str(&json).unwrap();
match decoded {
SignalMessage::CandidateUpdate {
call_id,
reflexive_addr,
local_addrs,
mapped_addr,
generation,
} => {
assert_eq!(call_id, "test-123");
assert_eq!(reflexive_addr.as_deref(), Some("203.0.113.5:4433"));
assert_eq!(local_addrs.len(), 2);
assert_eq!(mapped_addr.as_deref(), Some("198.51.100.42:12345"));
assert_eq!(generation, 7);
}
_ => panic!("wrong variant"),
}
}
#[test]
fn candidate_update_minimal_roundtrip() {
let msg = SignalMessage::CandidateUpdate {
call_id: "c".into(),
reflexive_addr: None,
local_addrs: vec![],
mapped_addr: None,
generation: 0,
};
let json = serde_json::to_string(&msg).unwrap();
// skip_serializing_if should omit None/empty fields
assert!(!json.contains("reflexive_addr"));
assert!(!json.contains("local_addrs"));
assert!(!json.contains("mapped_addr"));
let decoded: SignalMessage = serde_json::from_str(&json).unwrap();
match decoded {
SignalMessage::CandidateUpdate { generation, .. } => {
assert_eq!(generation, 0);
}
_ => panic!("wrong variant"),
}
}
#[test]
fn offer_with_mapped_addr_roundtrip() {
let msg = SignalMessage::DirectCallOffer {
caller_fingerprint: "alice".into(),
caller_alias: None,
target_fingerprint: "bob".into(),
call_id: "c1".into(),
identity_pub: [0; 32],
ephemeral_pub: [0; 32],
signature: vec![],
supported_profiles: vec![],
caller_reflexive_addr: Some("1.2.3.4:5".into()),
caller_local_addrs: vec!["10.0.0.1:5".into()],
caller_mapped_addr: Some("5.6.7.8:9999".into()),
caller_build_version: None,
};
let json = serde_json::to_string(&msg).unwrap();
assert!(json.contains("caller_mapped_addr"));
assert!(json.contains("5.6.7.8:9999"));
let decoded: SignalMessage = serde_json::from_str(&json).unwrap();
match decoded {
SignalMessage::DirectCallOffer {
caller_mapped_addr, ..
} => {
assert_eq!(caller_mapped_addr.as_deref(), Some("5.6.7.8:9999"));
}
_ => panic!("wrong variant"),
}
}
#[test]
fn offer_without_mapped_addr_omits_field() {
let msg = SignalMessage::DirectCallOffer {
caller_fingerprint: "alice".into(),
caller_alias: None,
target_fingerprint: "bob".into(),
call_id: "c1".into(),
identity_pub: [0; 32],
ephemeral_pub: [0; 32],
signature: vec![],
supported_profiles: vec![],
caller_reflexive_addr: None,
caller_local_addrs: vec![],
caller_mapped_addr: None,
caller_build_version: None,
};
let json = serde_json::to_string(&msg).unwrap();
assert!(!json.contains("caller_mapped_addr"));
}
#[test]
fn answer_with_mapped_addr_roundtrip() {
let msg = SignalMessage::DirectCallAnswer {
call_id: "c1".into(),
accept_mode: CallAcceptMode::AcceptTrusted,
identity_pub: None,
ephemeral_pub: None,
signature: None,
chosen_profile: None,
callee_reflexive_addr: Some("1.2.3.4:5".into()),
callee_local_addrs: vec![],
callee_mapped_addr: Some("9.8.7.6:1111".into()),
callee_build_version: None,
};
let json = serde_json::to_string(&msg).unwrap();
let decoded: SignalMessage = serde_json::from_str(&json).unwrap();
match decoded {
SignalMessage::DirectCallAnswer {
callee_mapped_addr, ..
} => {
assert_eq!(callee_mapped_addr.as_deref(), Some("9.8.7.6:1111"));
}
_ => panic!("wrong variant"),
}
}
#[test]
fn setup_with_mapped_addr_roundtrip() {
let msg = SignalMessage::CallSetup {
call_id: "c1".into(),
room: "room".into(),
relay_addr: "1.2.3.4:5".into(),
peer_direct_addr: Some("5.6.7.8:9".into()),
peer_local_addrs: vec!["10.0.0.1:9".into()],
peer_mapped_addr: Some("11.12.13.14:15".into()),
};
let json = serde_json::to_string(&msg).unwrap();
assert!(json.contains("peer_mapped_addr"));
let decoded: SignalMessage = serde_json::from_str(&json).unwrap();
match decoded {
SignalMessage::CallSetup {
peer_mapped_addr, ..
} => {
assert_eq!(peer_mapped_addr.as_deref(), Some("11.12.13.14:15"));
}
_ => panic!("wrong variant"),
}
}
#[test]
fn backward_compat_offer_without_mapped_addr_parses() {
// Old client JSON that doesn't have caller_mapped_addr at all
let json = r#"{
"DirectCallOffer": {
"caller_fingerprint": "alice",
"target_fingerprint": "bob",
"call_id": "c1",
"identity_pub": [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
"ephemeral_pub": [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
"signature": [],
"supported_profiles": [],
"caller_reflexive_addr": "1.2.3.4:5"
}
}"#;
let decoded: SignalMessage = serde_json::from_str(json).unwrap();
match decoded {
SignalMessage::DirectCallOffer {
caller_mapped_addr,
caller_reflexive_addr,
..
} => {
assert!(caller_mapped_addr.is_none());
assert_eq!(caller_reflexive_addr.as_deref(), Some("1.2.3.4:5"));
}
_ => panic!("wrong variant"),
}
}
#[test]
fn backward_compat_setup_without_mapped_addr_parses() {
let json = r#"{
"CallSetup": {
"call_id": "c1",
"room": "room",
"relay_addr": "1.2.3.4:5",
"peer_direct_addr": "5.6.7.8:9"
}
}"#;
let decoded: SignalMessage = serde_json::from_str(json).unwrap();
match decoded {
SignalMessage::CallSetup {
peer_mapped_addr,
peer_direct_addr,
..
} => {
assert!(peer_mapped_addr.is_none());
assert_eq!(peer_direct_addr.as_deref(), Some("5.6.7.8:9"));
}
_ => panic!("wrong variant"),
}
}
#[test]
fn register_presence_ack_with_new_fields_roundtrip() {
let msg = SignalMessage::RegisterPresenceAck {
success: true,
error: None,
relay_build: Some("abc123".into()),
relay_region: Some("us-east".into()),
available_relays: vec![
"eu-west|10.0.0.1:4433".into(),
"ap-south|10.0.0.2:4433".into(),
],
};
let json = serde_json::to_string(&msg).unwrap();
assert!(json.contains("relay_region"));
assert!(json.contains("us-east"));
assert!(json.contains("available_relays"));
let decoded: SignalMessage = serde_json::from_str(&json).unwrap();
match decoded {
SignalMessage::RegisterPresenceAck {
relay_region,
available_relays,
..
} => {
assert_eq!(relay_region.as_deref(), Some("us-east"));
assert_eq!(available_relays.len(), 2);
}
_ => panic!("wrong variant"),
}
}
#[test]
fn register_presence_ack_backward_compat() {
// Old relay JSON without relay_region or available_relays
let json = r#"{
"RegisterPresenceAck": {
"success": true,
"relay_build": "old-build"
}
}"#;
let decoded: SignalMessage = serde_json::from_str(json).unwrap();
match decoded {
SignalMessage::RegisterPresenceAck {
relay_region,
available_relays,
relay_build,
..
} => {
assert!(relay_region.is_none());
assert!(available_relays.is_empty());
assert_eq!(relay_build.as_deref(), Some("old-build"));
}
_ => panic!("wrong variant"),
}
}
}

View File

@@ -1,3 +1,5 @@
//! See also: [`crate::dred_tuner`] for continuous DRED tuning within a tier.
use std::collections::VecDeque;
use std::time::{Duration, Instant};
@@ -6,19 +8,31 @@ use crate::traits::QualityController;
use crate::QualityProfile;
/// Network quality tier — drives codec and FEC selection.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
///
/// 6-tier range from studio quality down to catastrophic:
/// Studio64k > Studio48k > Studio32k > Good > Degraded > Catastrophic
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum Tier {
/// loss < 10%, RTT < 400ms
Good,
/// loss 10-40% OR RTT 400-600ms
Degraded,
/// loss > 40% OR RTT > 600ms
Catastrophic,
/// loss >= 15% OR RTT >= 200ms — Codec2 1.2k
Catastrophic = 0,
/// loss < 15% AND RTT < 200ms — Opus 6k
Degraded = 1,
/// loss < 5% AND RTT < 100ms — Opus 24k
Good = 2,
/// loss < 2% AND RTT < 80ms — Opus 32k
Studio32k = 3,
/// loss < 1% AND RTT < 50ms — Opus 48k
Studio48k = 4,
/// loss < 1% AND RTT < 30ms — Opus 64k
Studio64k = 5,
}
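// Note: the explicit discriminants plus the derived PartialOrd/Ord make tier
// comparisons meaningful (worse < better), e.g.:
//   assert!(Tier::Catastrophic < Tier::Degraded);
//   assert!(Tier::Good < Tier::Studio64k);
// The hysteresis and probing logic below relies on this for checks like
// `observed_tier < self.current_tier`.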
impl Tier {
pub fn profile(self) -> QualityProfile {
match self {
Self::Studio64k => QualityProfile::STUDIO_64K,
Self::Studio48k => QualityProfile::STUDIO_48K,
Self::Studio32k => QualityProfile::STUDIO_32K,
Self::Good => QualityProfile::GOOD,
Self::Degraded => QualityProfile::DEGRADED,
Self::Catastrophic => QualityProfile::CATASTROPHIC,
@@ -39,7 +53,7 @@ impl Tier {
NetworkContext::CellularLte
| NetworkContext::Cellular5g
| NetworkContext::Cellular3g => {
// Tighter thresholds for cellular networks
// Tighter thresholds for cellular — no studio tiers
if loss > 25.0 || rtt > 500 {
Self::Catastrophic
} else if loss > 8.0 || rtt > 300 {
@@ -49,13 +63,18 @@ impl Tier {
}
}
NetworkContext::WiFi | NetworkContext::Unknown => {
// Original thresholds
if loss > 40.0 || rtt > 600 {
if loss >= 15.0 || rtt >= 200 {
Self::Catastrophic
} else if loss > 10.0 || rtt > 400 {
} else if loss >= 5.0 || rtt >= 100 {
Self::Degraded
} else {
} else if loss >= 2.0 || rtt >= 80 {
Self::Good
} else if loss >= 1.0 || rtt >= 50 {
Self::Studio32k
} else if rtt >= 30 {
Self::Studio48k
} else {
Self::Studio64k
}
}
}
@@ -64,11 +83,19 @@ impl Tier {
/// Return the next lower (worse) tier, or None if already at the worst.
pub fn downgrade(self) -> Option<Tier> {
match self {
Self::Studio64k => Some(Self::Studio48k),
Self::Studio48k => Some(Self::Studio32k),
Self::Studio32k => Some(Self::Good),
Self::Good => Some(Self::Degraded),
Self::Degraded => Some(Self::Catastrophic),
Self::Catastrophic => None,
}
}
/// Whether this is a studio tier (above Good).
pub fn is_studio(self) -> bool {
matches!(self, Self::Studio64k | Self::Studio48k | Self::Studio32k)
}
}
/// Describes the network transport type for context-aware quality decisions.
@@ -108,20 +135,48 @@ pub struct AdaptiveQualityController {
fec_boost_until: Option<Instant>,
/// FEC boost amount to add during handoff recovery window.
fec_boost_amount: f32,
/// Probing state: when Some, we're actively testing a higher tier.
probe: Option<ProbeState>,
/// Time spent stable at the current tier (for probe trigger).
stable_since: Option<Instant>,
}
/// Threshold for downgrading (fast reaction to degradation).
const DOWNGRADE_THRESHOLD: u32 = 3;
/// Threshold for downgrading on cellular networks (even faster).
const CELLULAR_DOWNGRADE_THRESHOLD: u32 = 2;
/// Threshold for upgrading (slow, cautious improvement).
const UPGRADE_THRESHOLD: u32 = 10;
/// Threshold for upgrading from Catastrophic/Degraded to Good.
const UPGRADE_THRESHOLD: u32 = 5;
/// Threshold for upgrading into studio tiers (very conservative).
const STUDIO_UPGRADE_THRESHOLD: u32 = 10;
/// Maximum history window size.
const HISTORY_SIZE: usize = 20;
/// Default FEC boost amount during handoff recovery.
const DEFAULT_FEC_BOOST: f32 = 0.2;
/// Duration of FEC boost after a network handoff.
const FEC_BOOST_DURATION_SECS: u64 = 10;
/// Minimum time stable at current tier before probing upward (30 seconds).
const PROBE_STABLE_SECS: u64 = 30;
/// Duration of a probe window (5 seconds — ~5 quality reports at 1/s).
const PROBE_DURATION_SECS: u64 = 5;
/// Maximum bad reports during probe before aborting (1 out of ~5 = 20%).
const PROBE_MAX_BAD: u32 = 1;
/// Cooldown after a failed probe before trying again (60 seconds).
const PROBE_COOLDOWN_SECS: u64 = 60;
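// Probe lifecycle implied by the constants above (illustrative timeline):
//   t=0s      tier stable, stable_since set
//   t=30s     PROBE_STABLE_SECS reached -> probe one tier up starts
//   t=30-35s  probe window (PROBE_DURATION_SECS); more than PROBE_MAX_BAD
//             bad reports aborts it and the current tier is kept
//   t=35s     window elapsed with <= 1 bad report -> commit the higher tier
//   on abort: roughly PROBE_COOLDOWN_SECS before another probe can start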
/// Active bandwidth probe state.
struct ProbeState {
/// The tier we're probing (one step above current).
target_tier: Tier,
/// Profile to apply during probe.
target_profile: QualityProfile,
/// When the probe started.
started: Instant,
/// Reports observed during probe.
probe_reports: u32,
/// Bad reports during probe (loss/RTT exceeded target tier thresholds).
bad_reports: u32,
}
impl AdaptiveQualityController {
pub fn new() -> Self {
@@ -135,6 +190,8 @@ impl AdaptiveQualityController {
network_context: NetworkContext::default(),
fec_boost_until: None,
fec_boost_amount: DEFAULT_FEC_BOOST,
probe: None,
stable_since: None,
}
}
@@ -174,6 +231,10 @@ impl AdaptiveQualityController {
self.forced = false;
}
// Cancel any active probe
self.probe = None;
self.stable_since = None;
// Activate FEC boost for any network change
self.fec_boost_until = Some(Instant::now() + Duration::from_secs(FEC_BOOST_DURATION_SECS));
}
@@ -194,6 +255,8 @@ impl AdaptiveQualityController {
pub fn reset_counters(&mut self) {
self.consecutive_up = 0;
self.consecutive_down = 0;
self.probe = None;
self.stable_since = None;
}
/// Get the effective downgrade threshold based on network context.
@@ -213,16 +276,13 @@ impl AdaptiveQualityController {
return None;
}
let is_worse = match (self.current_tier, observed_tier) {
(Tier::Good, Tier::Degraded | Tier::Catastrophic) => true,
(Tier::Degraded, Tier::Catastrophic) => true,
_ => false,
};
let is_worse = observed_tier < self.current_tier;
if is_worse {
self.consecutive_up = 0;
self.consecutive_down += 1;
if self.consecutive_down >= self.downgrade_threshold() {
// Jump directly to the observed tier (don't step one-at-a-time on downgrade)
self.current_tier = observed_tier;
self.current_profile = observed_tier.profile();
self.consecutive_down = 0;
@@ -232,22 +292,115 @@ impl AdaptiveQualityController {
// Better conditions
self.consecutive_down = 0;
self.consecutive_up += 1;
if self.consecutive_up >= UPGRADE_THRESHOLD {
// Studio tiers require more consecutive good reports
let threshold = if self.current_tier >= Tier::Good {
STUDIO_UPGRADE_THRESHOLD
} else {
UPGRADE_THRESHOLD
};
if self.consecutive_up >= threshold {
// Only upgrade one step at a time
let next_tier = match self.current_tier {
Tier::Catastrophic => Tier::Degraded,
Tier::Degraded => Tier::Good,
Tier::Good => return None,
};
self.current_tier = next_tier;
self.current_profile = next_tier.profile();
self.consecutive_up = 0;
return Some(self.current_profile);
if let Some(next_tier) = self.upgrade_one_step() {
self.current_tier = next_tier;
self.current_profile = next_tier.profile();
self.consecutive_up = 0;
return Some(self.current_profile);
}
}
}
None
}
/// Check whether to start, continue, or conclude a bandwidth probe.
///
/// Called from `observe()` when no hysteresis transition fired.
fn check_probe(&mut self, observed_tier: Tier) -> Option<QualityProfile> {
// Don't probe if forced, or if already at highest tier, or on cellular
if self.forced || self.current_tier == Tier::Studio64k {
return None;
}
if matches!(
self.network_context,
NetworkContext::CellularLte | NetworkContext::Cellular5g | NetworkContext::Cellular3g
) {
return None;
}
// If we have an active probe, evaluate it
if let Some(ref mut probe) = self.probe {
probe.probe_reports += 1;
// Check if the observed tier meets the probe target
if observed_tier < probe.target_tier {
probe.bad_reports += 1;
}
// Probe failed: too many bad reports
if probe.bad_reports > PROBE_MAX_BAD {
let _failed_probe = self.probe.take();
// Push stable_since into the future so no new probe can start until the cooldown (plus the usual stable period) has elapsed
self.stable_since =
Some(Instant::now() + Duration::from_secs(PROBE_COOLDOWN_SECS));
return None; // stay at current tier
}
// Probe succeeded: enough good reports within the window
if probe.started.elapsed() >= Duration::from_secs(PROBE_DURATION_SECS) {
let target = probe.target_tier;
let profile = probe.target_profile;
self.probe.take();
self.current_tier = target;
self.current_profile = profile;
self.consecutive_up = 0;
self.stable_since = Some(Instant::now());
return Some(profile);
}
return None; // probe still running
}
// No active probe — check if we should start one
if observed_tier >= self.current_tier {
// Track stability
if self.stable_since.is_none() {
self.stable_since = Some(Instant::now());
}
if let Some(stable_since) = self.stable_since {
if stable_since.elapsed() >= Duration::from_secs(PROBE_STABLE_SECS) {
// Stable long enough — start probing
if let Some(next) = self.upgrade_one_step() {
self.probe = Some(ProbeState {
target_tier: next,
target_profile: next.profile(),
started: Instant::now(),
probe_reports: 0,
bad_reports: 0,
});
// Return the probe profile so the encoder switches
return Some(next.profile());
}
}
}
} else {
// Conditions degraded — reset stability timer
self.stable_since = None;
}
None
}
fn upgrade_one_step(&self) -> Option<Tier> {
match self.current_tier {
Tier::Catastrophic => Some(Tier::Degraded),
Tier::Degraded => Some(Tier::Good),
Tier::Good => Some(Tier::Studio32k),
Tier::Studio32k => Some(Tier::Studio48k),
Tier::Studio48k => Some(Tier::Studio64k),
Tier::Studio64k => None,
}
}
}
impl Default for AdaptiveQualityController {
@@ -269,7 +422,17 @@ impl QualityController for AdaptiveQualityController {
}
let observed = Tier::classify_with_context(report, self.network_context);
self.try_transition(observed)
// First check for downgrades/upgrades via hysteresis
if let Some(profile) = self.try_transition(observed) {
// Cancel any active probe on tier change
self.probe.take();
self.stable_since = None;
return Some(profile);
}
// Then check probing
self.check_probe(observed)
}
fn force_profile(&mut self, profile: QualityProfile) {
@@ -331,25 +494,33 @@ mod tests {
}
assert_eq!(ctrl.tier(), Tier::Catastrophic);
// 9 good reports — not enough
let good = make_report(2.0, 100);
for _ in 0..9 {
// 4 good reports — not enough (threshold is 5)
let good = make_report(0.5, 20); // studio-quality report
for _ in 0..4 {
assert!(ctrl.observe(&good).is_none());
}
assert_eq!(ctrl.tier(), Tier::Catastrophic);
// 10th good report triggers upgrade (one step: Catastrophic → Degraded)
// 5th good report triggers upgrade (one step: Catastrophic → Degraded)
let result = ctrl.observe(&good);
assert!(result.is_some());
assert_eq!(ctrl.tier(), Tier::Degraded);
// Need another 10 to go from Degraded → Good
for _ in 0..9 {
// Another 5 to go from Degraded → Good
for _ in 0..4 {
assert!(ctrl.observe(&good).is_none());
}
let result = ctrl.observe(&good);
assert!(result.is_some());
assert_eq!(ctrl.tier(), Tier::Good);
// Studio upgrades need 10 consecutive — Good → Studio32k
for _ in 0..9 {
assert!(ctrl.observe(&good).is_none());
}
let result = ctrl.observe(&good);
assert!(result.is_some());
assert_eq!(ctrl.tier(), Tier::Studio32k);
}
#[test]
@@ -366,11 +537,29 @@ mod tests {
#[test]
fn tier_classification() {
assert_eq!(Tier::classify(&make_report(5.0, 200)), Tier::Good);
assert_eq!(Tier::classify(&make_report(15.0, 200)), Tier::Degraded);
assert_eq!(Tier::classify(&make_report(5.0, 500)), Tier::Degraded);
assert_eq!(Tier::classify(&make_report(50.0, 200)), Tier::Catastrophic);
assert_eq!(Tier::classify(&make_report(5.0, 700)), Tier::Catastrophic);
// Studio tiers
assert_eq!(Tier::classify(&make_report(0.5, 20)), Tier::Studio64k);
assert_eq!(Tier::classify(&make_report(0.5, 40)), Tier::Studio48k);
assert_eq!(Tier::classify(&make_report(1.5, 60)), Tier::Studio32k);
// Good/Degraded/Catastrophic
assert_eq!(Tier::classify(&make_report(3.0, 90)), Tier::Good);
assert_eq!(Tier::classify(&make_report(6.0, 120)), Tier::Degraded);
assert_eq!(Tier::classify(&make_report(16.0, 120)), Tier::Catastrophic);
assert_eq!(Tier::classify(&make_report(5.0, 200)), Tier::Catastrophic);
}
#[test]
fn studio_tier_boundaries() {
// loss < 1% AND RTT < 30ms → Studio64k
assert_eq!(Tier::classify(&make_report(0.9, 28)), Tier::Studio64k);
// loss < 1% AND RTT 30-49ms → Studio48k
assert_eq!(Tier::classify(&make_report(0.9, 32)), Tier::Studio48k);
// loss < 2% AND RTT < 80ms → Studio32k (but loss >= 1%)
assert_eq!(Tier::classify(&make_report(1.5, 40)), Tier::Studio32k);
// loss >= 2% → Good (use 2.5 to survive u8 quantization)
assert_eq!(Tier::classify(&make_report(2.5, 40)), Tier::Good);
// RTT 80ms → Good
assert_eq!(Tier::classify(&make_report(0.5, 80)), Tier::Good);
}
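// Quantization context for the boundary values above: per the
// from_path_stats tests earlier in this diff, QualityReport carries loss as
// a u8 (pct/100*255, truncated) and RTT as rtt_4ms (ms/4), so 2.0% loss can
// round back down below the 2% threshold while 2.5% survives it.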
// ---------------------------------------------------------------
@@ -379,8 +568,8 @@ mod tests {
#[test]
fn cellular_tighter_thresholds() {
// 12% loss: Good on WiFi, Degraded on cellular
let report = make_report(12.0, 200);
// 9% loss: Degraded on both WiFi (>=5%) and cellular (>=8%)
let report = make_report(9.0, 80);
assert_eq!(
Tier::classify_with_context(&report, NetworkContext::WiFi),
Tier::Degraded
@@ -390,22 +579,22 @@ mod tests {
Tier::Degraded
);
// 9% loss: Good on WiFi, Degraded on cellular
let report = make_report(9.0, 200);
// 6% loss, low RTT: Degraded on WiFi (>=5%), Good on cellular (<8%)
let report = make_report(6.0, 80);
assert_eq!(
Tier::classify_with_context(&report, NetworkContext::WiFi),
Tier::Degraded
);
assert_eq!(
Tier::classify_with_context(&report, NetworkContext::CellularLte),
Tier::Good
);
assert_eq!(
Tier::classify_with_context(&report, NetworkContext::CellularLte),
Tier::Degraded
);
// 30% loss: Degraded on WiFi, Catastrophic on cellular
let report = make_report(30.0, 200);
// 30% loss: Catastrophic on WiFi (>=15%), Catastrophic on cellular (>=25%)
let report = make_report(30.0, 80);
assert_eq!(
Tier::classify_with_context(&report, NetworkContext::WiFi),
Tier::Degraded
Tier::Catastrophic
);
assert_eq!(
Tier::classify_with_context(&report, NetworkContext::Cellular3g),
@@ -415,15 +604,29 @@ mod tests {
#[test]
fn cellular_rtt_thresholds() {
// RTT 350ms: Good on WiFi, Degraded on cellular
let report = make_report(2.0, 348); // rtt_4ms rounds so use 348
// RTT 150ms: Degraded on WiFi (>=100ms), Good on cellular (<300ms and loss<8%)
let report = make_report(2.0, 148);
assert_eq!(
Tier::classify_with_context(&report, NetworkContext::WiFi),
Tier::Good
Tier::Degraded
);
assert_eq!(
Tier::classify_with_context(&report, NetworkContext::CellularLte),
Tier::Degraded
Tier::Good
);
}
#[test]
fn cellular_no_studio_tiers() {
// Even with perfect network, cellular stays at Good (no studio)
let report = make_report(0.0, 10);
assert_eq!(
Tier::classify_with_context(&report, NetworkContext::CellularLte),
Tier::Good
);
assert_eq!(
Tier::classify_with_context(&report, NetworkContext::WiFi),
Tier::Studio64k
);
}
@@ -469,6 +672,9 @@ mod tests {
#[test]
fn tier_downgrade() {
assert_eq!(Tier::Studio64k.downgrade(), Some(Tier::Studio48k));
assert_eq!(Tier::Studio48k.downgrade(), Some(Tier::Studio32k));
assert_eq!(Tier::Studio32k.downgrade(), Some(Tier::Good));
assert_eq!(Tier::Good.downgrade(), Some(Tier::Degraded));
assert_eq!(Tier::Degraded.downgrade(), Some(Tier::Catastrophic));
assert_eq!(Tier::Catastrophic.downgrade(), None);
@@ -478,4 +684,97 @@ mod tests {
fn network_context_default() {
assert_eq!(NetworkContext::default(), NetworkContext::Unknown);
}
// ---------------------------------------------------------------
// Bandwidth probing tests
// ---------------------------------------------------------------
#[test]
fn probe_triggers_after_stable_period() {
let mut ctrl = AdaptiveQualityController::new();
let excellent = make_report(0.3, 20); // would classify as Studio64k
// Starts at Good. Fast-forward stability by setting stable_since directly.
ctrl.stable_since = Some(Instant::now() - Duration::from_secs(31));
// One excellent report should trigger a probe (Good → Studio32k)
let result = ctrl.observe(&excellent);
assert!(result.is_some(), "should start probe after 30s stable");
assert!(ctrl.probe.is_some(), "probe should be active");
assert_eq!(ctrl.probe.as_ref().unwrap().target_tier, Tier::Studio32k);
}
#[test]
fn probe_succeeds_after_window() {
let mut ctrl = AdaptiveQualityController::new();
ctrl.stable_since = Some(Instant::now() - Duration::from_secs(31));
let excellent = make_report(0.3, 20);
// Trigger probe start
let result = ctrl.observe(&excellent);
assert!(result.is_some());
// Simulate probe window elapsed by backdating started
ctrl.probe.as_mut().unwrap().started =
Instant::now() - Duration::from_secs(PROBE_DURATION_SECS);
// Next good report should finalize the probe
let result = ctrl.observe(&excellent);
assert!(result.is_some(), "probe should succeed");
assert_eq!(ctrl.current_tier, Tier::Studio32k);
assert!(ctrl.probe.is_none(), "probe should be cleared");
}
#[test]
fn probe_fails_on_bad_reports() {
let mut ctrl = AdaptiveQualityController::new();
// Put controller at Studio32k, pretend we've been stable
ctrl.current_tier = Tier::Studio32k;
ctrl.current_profile = Tier::Studio32k.profile();
ctrl.stable_since = Some(Instant::now() - Duration::from_secs(31));
// Start a probe to Studio48k
let excellent = make_report(0.3, 20);
let result = ctrl.observe(&excellent);
assert!(result.is_some()); // probe started
assert_eq!(ctrl.probe.as_ref().unwrap().target_tier, Tier::Studio48k);
// Feed bad reports (loss too high for Studio48k)
let degraded = make_report(3.0, 100);
ctrl.observe(&degraded); // first bad
ctrl.observe(&degraded); // second bad — exceeds PROBE_MAX_BAD (1)
// Probe should be cancelled
assert!(ctrl.probe.is_none(), "probe should be cancelled after bad reports");
// Should still be at Studio32k (not upgraded)
assert_eq!(ctrl.current_tier, Tier::Studio32k);
}
#[test]
fn no_probe_on_cellular() {
let mut ctrl = AdaptiveQualityController::new();
ctrl.signal_network_change(NetworkContext::CellularLte);
ctrl.current_tier = Tier::Good;
ctrl.current_profile = Tier::Good.profile();
ctrl.stable_since = Some(Instant::now() - Duration::from_secs(60));
let good = make_report(0.5, 40);
let result = ctrl.observe(&good);
// Should NOT probe on cellular
assert!(ctrl.probe.is_none(), "should not probe on cellular");
assert!(result.is_none() || ctrl.current_tier == Tier::Good);
}
#[test]
fn no_probe_at_highest_tier() {
let mut ctrl = AdaptiveQualityController::new();
ctrl.current_tier = Tier::Studio64k;
ctrl.current_profile = Tier::Studio64k.profile();
ctrl.stable_since = Some(Instant::now() - Duration::from_secs(60));
let excellent = make_report(0.1, 10);
let result = ctrl.observe(&excellent);
assert!(result.is_none(), "should not probe when already at Studio64k");
}
}

View File

@@ -20,6 +20,7 @@ bytes = { workspace = true }
serde = { workspace = true }
toml = "0.8"
anyhow = "1"
clap = { version = "4", features = ["derive"] }
reqwest = { version = "0.12", features = ["json"] }
serde_json = "1"
rustls = { version = "0.23", default-features = false, features = ["ring", "std"] }
@@ -28,6 +29,7 @@ prometheus = "0.13"
axum = { version = "0.7", default-features = false, features = ["tokio", "http1", "ws"] }
tower-http = { version = "0.6", features = ["fs"] }
futures-util = "0.3"
dashmap = "6"
dirs = "6"
sha2 = { workspace = true }
chrono = "0.4"

View File

@@ -61,6 +61,13 @@ pub struct DirectCall {
/// interface addresses from the `DirectCallAnswer`. Cross-
/// wired into the caller's `CallSetup.peer_local_addrs`.
pub callee_local_addrs: Vec<String>,
/// Phase 8 (Tailscale-inspired): caller's port-mapped
/// external address from NAT-PMP/PCP/UPnP. Cross-wired
/// into callee's `CallSetup.peer_mapped_addr`.
pub caller_mapped_addr: Option<String>,
/// Phase 8: callee's port-mapped external address.
/// Cross-wired into caller's `CallSetup.peer_mapped_addr`.
pub callee_mapped_addr: Option<String>,
}
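// Cross-wiring sketch (per the field docs above): the relay stashes
// caller_mapped_addr from the DirectCallOffer and callee_mapped_addr from
// the DirectCallAnswer, then each CallSetup.peer_mapped_addr carries the
// other side's mapped address (callee's to the caller, caller's to the callee).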
/// Registry of active direct calls.
@@ -92,6 +99,8 @@ impl CallRegistry {
peer_relay_fp: None,
caller_local_addrs: Vec::new(),
callee_local_addrs: Vec::new(),
caller_mapped_addr: None,
callee_mapped_addr: None,
};
self.calls.insert(call_id.clone(), call);
self.calls.get(&call_id).unwrap()
@@ -142,6 +151,22 @@ impl CallRegistry {
}
}
/// Phase 8: stash the caller's port-mapped address from
/// the `DirectCallOffer`.
pub fn set_caller_mapped_addr(&mut self, call_id: &str, addr: Option<String>) {
if let Some(call) = self.calls.get_mut(call_id) {
call.caller_mapped_addr = addr;
}
}
/// Phase 8: stash the callee's port-mapped address from
/// the `DirectCallAnswer`.
pub fn set_callee_mapped_addr(&mut self, call_id: &str, addr: Option<String>) {
if let Some(call) = self.calls.get_mut(call_id) {
call.callee_mapped_addr = addr;
}
}
/// Get a call by ID.
pub fn get(&self, call_id: &str) -> Option<&DirectCall> {
self.calls.get(call_id)
@@ -340,6 +365,49 @@ mod tests {
reg.set_peer_relay_fp("does-not-exist", Some("x".into()));
}
#[test]
fn call_registry_stores_mapped_addrs() {
let mut reg = CallRegistry::new();
reg.create_call("c1".into(), "alice".into(), "bob".into());
// Default: both mapped addrs are None.
let c = reg.get("c1").unwrap();
assert!(c.caller_mapped_addr.is_none());
assert!(c.callee_mapped_addr.is_none());
// Caller advertises its port-mapped addr via DirectCallOffer.
reg.set_caller_mapped_addr("c1", Some("203.0.113.5:12345".into()));
assert_eq!(
reg.get("c1").unwrap().caller_mapped_addr.as_deref(),
Some("203.0.113.5:12345")
);
// Callee responds with its mapped addr.
reg.set_callee_mapped_addr("c1", Some("198.51.100.9:54321".into()));
assert_eq!(
reg.get("c1").unwrap().callee_mapped_addr.as_deref(),
Some("198.51.100.9:54321")
);
// Both addrs readable — relay uses them to cross-wire
// peer_mapped_addr in CallSetup.
let c = reg.get("c1").unwrap();
assert_eq!(c.caller_mapped_addr.as_deref(), Some("203.0.113.5:12345"));
assert_eq!(c.callee_mapped_addr.as_deref(), Some("198.51.100.9:54321"));
// Setter on unknown call is a no-op.
reg.set_caller_mapped_addr("nope", Some("x".into()));
}
#[test]
fn call_registry_clearing_mapped_addr_works() {
let mut reg = CallRegistry::new();
reg.create_call("c1".into(), "alice".into(), "bob".into());
reg.set_caller_mapped_addr("c1", Some("1.2.3.4:5".into()));
reg.set_caller_mapped_addr("c1", None);
assert!(reg.get("c1").unwrap().caller_mapped_addr.is_none());
}
#[test]
fn call_registry_clearing_reflex_addr_works() {
// Passing None to the setter must clear a previously-set value

View File

@@ -87,6 +87,14 @@ pub struct RelayConfig {
/// Unlike [[peers]], no url is needed — the peer connects to us.
#[serde(default)]
pub trusted: Vec<TrustedConfig>,
/// Phase 8: geographic region identifier (e.g., "us-east", "eu-west").
/// Sent to clients in `RegisterPresenceAck.relay_region` so they can
/// build a relay map for automatic selection.
pub region: Option<String>,
/// Phase 8: externally-advertised address for this relay. Used to
/// populate `available_relays` in `RegisterPresenceAck`. If not set,
/// `listen_addr` is used.
pub advertised_addr: Option<SocketAddr>,
/// Debug tap: log packet headers for matching rooms ("*" = all rooms).
/// Activated via --debug-tap <room> or debug_tap = "room" in TOML.
pub debug_tap: Option<String>,
@@ -114,6 +122,8 @@ impl Default for RelayConfig {
peers: Vec::new(),
global_rooms: Vec::new(),
trusted: Vec::new(),
region: None,
advertised_addr: None,
debug_tap: None,
event_log: None,
}
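// Illustrative TOML for the new Phase 8 fields, shown here as a comment
// (keys follow the field names above; values are assumptions):
//   region = "us-east"
//   advertised_addr = "203.0.113.5:4433"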

View File

@@ -134,7 +134,7 @@ pub struct FederationManager {
peers: Vec<PeerConfig>,
trusted: Vec<TrustedConfig>,
global_rooms: HashSet<String>,
room_mgr: Arc<Mutex<RoomManager>>,
room_mgr: Arc<RoomManager>,
endpoint: quinn::Endpoint,
local_tls_fp: String,
metrics: Arc<crate::metrics::RelayMetrics>,
@@ -161,7 +161,7 @@ impl FederationManager {
peers: Vec<PeerConfig>,
trusted: Vec<TrustedConfig>,
global_rooms: HashSet<String>,
room_mgr: Arc<Mutex<RoomManager>>,
room_mgr: Arc<RoomManager>,
endpoint: quinn::Endpoint,
local_tls_fp: String,
metrics: Arc<crate::metrics::RelayMetrics>,
@@ -213,16 +213,19 @@ impl FederationManager {
/// `origin_relay_fp` against its own fp and drops self-sourced
/// forwards.
pub async fn broadcast_signal(&self, msg: &wzp_proto::SignalMessage) -> usize {
let links = self.peer_links.lock().await;
let peers: Vec<(String, String, Arc<QuinnTransport>)> = {
let links = self.peer_links.lock().await;
links.iter().map(|(fp, l)| (fp.clone(), l.label.clone(), l.transport.clone())).collect()
}; // lock released
let mut count = 0;
for (fp, link) in links.iter() {
match link.transport.send_signal(msg).await {
for (fp, label, transport) in &peers {
match transport.send_signal(msg).await {
Ok(()) => {
count += 1;
tracing::debug!(peer = %link.label, %fp, "federation: broadcast signal ok");
tracing::debug!(peer = %label, %fp, "federation: broadcast signal ok");
}
Err(e) => {
tracing::warn!(peer = %link.label, %fp, error = %e, "federation: broadcast signal failed");
tracing::warn!(peer = %label, %fp, error = %e, "federation: broadcast signal failed");
}
}
}
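// The pattern above (clone the Arc<QuinnTransport> handles inside a short
// lock scope, then await the sends with the lock released) repeats in the
// hunks below; it keeps the peer_links Mutex from being held across .await.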
@@ -243,10 +246,12 @@ impl FederationManager {
msg: &wzp_proto::SignalMessage,
) -> Result<(), String> {
let normalized = normalize_fp(peer_relay_fp);
let links = self.peer_links.lock().await;
match links.get(&normalized) {
Some(link) => link
.transport
let transport = {
let links = self.peer_links.lock().await;
links.get(&normalized).map(|l| l.transport.clone())
}; // lock released
match transport {
Some(t) => t
.send_signal(msg)
.await
.map_err(|e| format!("send to peer {normalized}: {e}")),
@@ -333,10 +338,7 @@ impl FederationManager {
}
// Room event dispatcher
let room_events = {
let mgr = self.room_mgr.lock().await;
mgr.subscribe_events()
};
let room_events = self.room_mgr.subscribe_events();
let this = self.clone();
handles.push(tokio::spawn(async move {
run_room_event_dispatcher(this, room_events).await;
@@ -406,20 +408,22 @@ impl FederationManager {
/// or rate limiting; the body currently forwards on `room_hash` alone
/// because that's what the wire format carries.
pub async fn forward_to_peers(&self, _room_name: &str, room_hash: &[u8; 8], media_data: &Bytes) {
let links = self.peer_links.lock().await;
if links.is_empty() {
return;
}
for (_fp, link) in links.iter() {
let peers: Vec<(String, Arc<QuinnTransport>)> = {
let links = self.peer_links.lock().await;
if links.is_empty() { return; }
links.values().map(|l| (l.label.clone(), l.transport.clone())).collect()
}; // lock released
for (label, transport) in &peers {
let mut tagged = Vec::with_capacity(8 + media_data.len());
tagged.extend_from_slice(room_hash);
tagged.extend_from_slice(media_data);
match link.transport.send_raw_datagram(&tagged) {
match transport.send_raw_datagram(&tagged) {
Ok(()) => {
self.metrics.federation_packets_forwarded
.with_label_values(&[&link.label, "out"]).inc();
.with_label_values(&[label, "out"]).inc();
}
Err(e) => warn!(peer = %link.label, "federation send error: {e}"),
Err(e) => warn!(peer = %label, "federation send error: {e}"),
}
}
}
@@ -483,15 +487,15 @@ async fn run_room_event_dispatcher(
match events.recv().await {
Ok(RoomEvent::LocalJoin { room }) => {
if fm.is_global_room(&room) {
let participants = {
let mgr = fm.room_mgr.lock().await;
mgr.local_participant_list(&room)
};
let participants = fm.room_mgr.local_participant_list(&room);
info!(room = %room, count = participants.len(), "global room now active, announcing to peers");
let msg = SignalMessage::GlobalRoomActive { room, participants };
let links = fm.peer_links.lock().await;
for link in links.values() {
let _ = link.transport.send_signal(&msg).await;
let transports: Vec<Arc<QuinnTransport>> = {
let links = fm.peer_links.lock().await;
links.values().map(|l| l.transport.clone()).collect()
};
for t in &transports {
let _ = t.send_signal(&msg).await;
}
}
}
@@ -499,9 +503,12 @@ async fn run_room_event_dispatcher(
if fm.is_global_room(&room) {
info!(room = %room, "global room now inactive, announcing to peers");
let msg = SignalMessage::GlobalRoomInactive { room };
let links = fm.peer_links.lock().await;
for link in links.values() {
let _ = link.transport.send_signal(&msg).await;
let transports: Vec<Arc<QuinnTransport>> = {
let links = fm.peer_links.lock().await;
links.values().map(|l| l.transport.clone()).collect()
};
for t in &transports {
let _ = t.send_signal(&msg).await;
}
}
}
@@ -560,11 +567,11 @@ async fn run_stale_presence_sweeper(fm: Arc<FederationManager>) {
// Broadcast updated RoomUpdate for affected rooms
for room in &affected_rooms {
let mgr = fm.room_mgr.lock().await;
for local_room in mgr.active_rooms() {
if fm.resolve_global_room(&local_room) == fm.resolve_global_room(room) {
let mut all_participants = mgr.local_participant_list(&local_room);
let remote = fm.get_remote_participants(&local_room).await;
let active = fm.room_mgr.active_rooms();
for local_room in &active {
if fm.resolve_global_room(local_room) == fm.resolve_global_room(room) {
let mut all_participants = fm.room_mgr.local_participant_list(local_room);
let remote = fm.get_remote_participants(local_room).await;
all_participants.extend(remote);
let mut seen = HashSet::new();
all_participants.retain(|p| seen.insert(p.fingerprint.clone()));
@@ -572,8 +579,7 @@ async fn run_stale_presence_sweeper(fm: Arc<FederationManager>) {
count: all_participants.len() as u32,
participants: all_participants,
};
let senders = mgr.local_senders(&local_room);
drop(mgr);
let senders = fm.room_mgr.local_senders(local_room);
room::broadcast_signal(&senders, &update).await;
info!(room = %room, "swept stale presence — broadcast updated RoomUpdate");
break;
@@ -651,14 +657,13 @@ async fn run_federation_link(
// Announce our currently active global rooms to this new peer
// Collect all announcements first, then send (avoid holding locks across await)
let announcements = {
let mgr = fm.room_mgr.lock().await;
let active = mgr.active_rooms();
let active = fm.room_mgr.active_rooms();
let mut msgs = Vec::new();
// Local rooms
for room_name in &active {
if fm.is_global_room(room_name) {
let participants = mgr.local_participant_list(room_name);
let participants = fm.room_mgr.local_participant_list(room_name);
info!(peer = %peer_label, room = %room_name, participants = participants.len(), "announcing local global room to new peer");
msgs.push(SignalMessage::GlobalRoomActive { room: room_name.clone(), participants });
}
@@ -828,22 +833,24 @@ async fn handle_signal(
// Broadcast updated RoomUpdate to local clients in this room
// Find the local room name (may be hashed or raw)
let mgr = fm.room_mgr.lock().await;
for local_room in mgr.active_rooms() {
if fm.is_global_room(&local_room) && fm.resolve_global_room(&local_room) == fm.resolve_global_room(&room) {
let active = fm.room_mgr.active_rooms();
for local_room in &active {
if fm.is_global_room(local_room) && fm.resolve_global_room(local_room) == fm.resolve_global_room(&room) {
// Build merged participant list: local + all remote (deduped)
let mut all_participants = mgr.local_participant_list(&local_room);
let links = fm.peer_links.lock().await;
for link in links.values() {
if let Some(ref canonical) = fm.resolve_global_room(&local_room) {
if let Some(remote) = link.remote_participants.get(canonical.as_str()) {
all_participants.extend(remote.iter().cloned());
}
// Also check raw room name, but only if different from canonical
if canonical != &local_room {
if let Some(remote) = link.remote_participants.get(&local_room) {
let mut all_participants = fm.room_mgr.local_participant_list(local_room);
{
let links = fm.peer_links.lock().await;
for link in links.values() {
if let Some(ref canonical) = fm.resolve_global_room(local_room) {
if let Some(remote) = link.remote_participants.get(canonical.as_str()) {
all_participants.extend(remote.iter().cloned());
}
// Also check raw room name, but only if different from canonical
if canonical != local_room {
if let Some(remote) = link.remote_participants.get(local_room) {
all_participants.extend(remote.iter().cloned());
}
}
}
}
}
@@ -854,9 +861,7 @@ async fn handle_signal(
count: all_participants.len() as u32,
participants: all_participants,
};
let senders = mgr.local_senders(&local_room);
drop(links);
drop(mgr);
let senders = fm.room_mgr.local_senders(local_room);
room::broadcast_signal(&senders, &update).await;
break;
}
@@ -899,10 +904,7 @@ async fn handle_signal(
// Propagate to other peers: send updated GlobalRoomActive with revised list,
// or GlobalRoomInactive if no participants remain anywhere
let local_active = {
let mgr = fm.room_mgr.lock().await;
mgr.active_rooms().iter().any(|r| fm.resolve_global_room(r) == fm.resolve_global_room(&room))
};
let local_active = fm.room_mgr.active_rooms().iter().any(|r| fm.resolve_global_room(r) == fm.resolve_global_room(&room));
let has_remaining = !remaining_remote.is_empty() || local_active;
// Collect peer transports to send to (avoid holding lock across await)
@@ -916,10 +918,9 @@ async fn handle_signal(
// Send updated participant list to other peers
let mut updated_participants = remaining_remote.clone();
if local_active {
let mgr = fm.room_mgr.lock().await;
for local_room in mgr.active_rooms() {
for local_room in fm.room_mgr.active_rooms() {
if fm.resolve_global_room(&local_room) == fm.resolve_global_room(&room) {
updated_participants.extend(mgr.local_participant_list(&local_room));
updated_participants.extend(fm.room_mgr.local_participant_list(&local_room));
break;
}
}
@@ -940,10 +941,10 @@ async fn handle_signal(
}
// Broadcast updated RoomUpdate to local clients (remote participant removed)
let mgr = fm.room_mgr.lock().await;
for local_room in mgr.active_rooms() {
if fm.is_global_room(&local_room) && fm.resolve_global_room(&local_room) == fm.resolve_global_room(&room) {
let mut all_participants = mgr.local_participant_list(&local_room);
let active = fm.room_mgr.active_rooms();
for local_room in &active {
if fm.is_global_room(local_room) && fm.resolve_global_room(local_room) == fm.resolve_global_room(&room) {
let mut all_participants = fm.room_mgr.local_participant_list(local_room);
all_participants.extend(remaining_remote.iter().cloned());
// Deduplicate by fingerprint
let mut seen = HashSet::new();
@@ -952,8 +953,7 @@ async fn handle_signal(
count: all_participants.len() as u32,
participants: all_participants,
};
let senders = mgr.local_senders(&local_room);
drop(mgr);
let senders = fm.room_mgr.local_senders(local_room);
room::broadcast_signal(&senders, &update).await;
info!(room = %room, "broadcast updated presence (remote participant removed)");
break;
@@ -1070,10 +1070,9 @@ async fn handle_datagram(
}
}
// Find room by hash check local rooms AND global room config
// Find room by hash -- check local rooms AND global room config
let room_name = {
let mgr = fm.room_mgr.lock().await;
let active = mgr.active_rooms();
let active = fm.room_mgr.active_rooms();
// First: check local rooms (has participants)
active.iter().find(|r| room_hash(r) == rh).cloned()
.or_else(|| active.iter().find(|r| fm.global_room_hash(r) == rh).cloned())
@@ -1093,10 +1092,7 @@ async fn handle_datagram(
// for a room we don't have locally — could be a
// timing issue (peer joined before us) or a hash
// mismatch.
let active = {
let mgr = fm.room_mgr.lock().await;
mgr.active_rooms()
};
let active = fm.room_mgr.active_rooms();
warn!(
room_hash = ?rh,
active_rooms = ?active,
@@ -1121,10 +1117,7 @@ async fn handle_datagram(
// Deliver to all local participants — forward the raw bytes as-is.
// The original sender's MediaPacket is preserved exactly (no re-serialization).
let locals = {
let mgr = fm.room_mgr.lock().await;
mgr.local_senders(&room_name)
};
let locals = fm.room_mgr.local_senders(&room_name);
for sender in &locals {
match sender {
room::ParticipantSender::Quic(t) => {

View File

@@ -12,6 +12,7 @@ use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::time::Duration;
use clap::Parser;
use tokio::sync::Mutex;
use tracing::{debug, error, info, warn};
@@ -23,6 +24,79 @@ use wzp_relay::presence::PresenceRegistry;
use wzp_relay::room::{self, RoomManager};
use wzp_relay::session_mgr::SessionManager;
/// Close a transport gracefully, logging any error at debug level.
async fn close_transport(t: &dyn wzp_proto::MediaTransport, context: &str) {
if let Err(e) = t.close().await {
tracing::debug!(context, error = %e, "transport close (non-fatal)");
}
}
/// WarzonePhone relay daemon — SFU, federation, direct-call signaling
#[derive(Parser, Debug)]
#[command(name = "wzp-relay", version = env!("WZP_BUILD_HASH"))]
struct Args {
/// Load config from TOML file (creates example if missing)
#[arg(short = 'c', long = "config")]
config_file: Option<String>,
/// Identity file path (creates if missing, uses OsRng)
#[arg(short = 'i', long)]
identity: Option<String>,
/// Listen address for QUIC connections
#[arg(long)]
listen: Option<SocketAddr>,
/// Remote relay address for forwarding (disables room mode)
#[arg(long)]
remote: Option<SocketAddr>,
/// featherChat auth endpoint (e.g., https://chat.example.com/v1/auth/validate).
/// When set, clients must send a bearer token as first signal message.
#[arg(long)]
auth_url: Option<String>,
/// Prometheus metrics HTTP port (e.g., 9090). Disabled if not set.
#[arg(long)]
metrics_port: Option<u16>,
/// Peer relay to probe for health monitoring (repeatable)
#[arg(long = "probe")]
probe: Vec<SocketAddr>,
/// Enable mesh mode (probes all --probe targets concurrently)
#[arg(long)]
probe_mesh: bool,
/// Enable trunk batching for outgoing media in room mode
#[arg(long)]
trunking: bool,
/// WebSocket listener port for browser clients (e.g., 8080)
#[arg(long)]
ws_port: Option<u16>,
/// Directory to serve static files from (HTML/JS/WASM)
#[arg(long)]
static_dir: Option<String>,
/// Declare a room as global (bridged across federation). Repeatable.
#[arg(long = "global-room")]
global_room: Vec<String>,
/// Log packet headers for a room ('*' for all rooms)
#[arg(long)]
debug_tap: Option<String>,
/// JSONL event log file path for protocol analysis
#[arg(long)]
event_log: Option<String>,
/// Print mesh health table and exit (diagnostic)
#[arg(long)]
mesh_status: bool,
}
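// Example invocation using the flags above (illustrative only):
//   wzp-relay --config relay.toml --listen 0.0.0.0:4433 \
//             --metrics-port 9090 --global-room lobby --ws-port 8080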
/// Parsed CLI result — config + identity path.
struct CliResult {
config: RelayConfig,
@@ -32,25 +106,21 @@ struct CliResult {
}
fn parse_args() -> CliResult {
let args: Vec<String> = std::env::args().collect();
let args = Args::parse();
// First pass: extract --config and --identity
let mut config_file = None;
let mut identity_path = None;
let mut i = 1;
while i < args.len() {
match args[i].as_str() {
"--config" | "-c" => { i += 1; config_file = args.get(i).cloned(); }
"--identity" | "-i" => { i += 1; identity_path = args.get(i).cloned(); }
_ => {}
}
i += 1;
// Handle --mesh-status: print and exit
if args.mesh_status {
let m = RelayMetrics::new();
print!("{}", wzp_relay::probe::mesh_summary(m.registry()));
std::process::exit(0);
}
// Track if we need to create the config after identity is known
let config_needs_create = config_file.as_ref().map(|p| !std::path::Path::new(p).exists()).unwrap_or(false);
let config_needs_create = args.config_file.as_ref()
.map(|p| !std::path::Path::new(p).exists())
.unwrap_or(false);
let mut config = if let Some(ref path) = config_file {
let mut config = if let Some(ref path) = args.config_file {
if config_needs_create {
// Will be re-created with personalized info after identity is loaded
RelayConfig::default()
@@ -66,125 +136,49 @@ fn parse_args() -> CliResult {
};
// CLI flags override config file values
let mut i = 1;
while i < args.len() {
match args[i].as_str() {
"--config" | "-c" => { i += 1; } // already handled
"--identity" | "-i" => { i += 1; } // already handled
"--listen" => {
i += 1;
config.listen_addr = args.get(i).expect("--listen requires an address")
.parse().expect("invalid --listen address");
}
"--remote" => {
i += 1;
config.remote_relay = Some(
args.get(i).expect("--remote requires an address")
.parse().expect("invalid --remote address"),
);
}
"--auth-url" => {
i += 1;
config.auth_url = Some(
args.get(i).expect("--auth-url requires a URL").to_string(),
);
}
"--metrics-port" => {
i += 1;
config.metrics_port = Some(
args.get(i).expect("--metrics-port requires a port number")
.parse().expect("invalid --metrics-port number"),
);
}
"--probe" => {
i += 1;
let addr: SocketAddr = args.get(i)
.expect("--probe requires an address")
.parse()
.expect("invalid --probe address");
config.probe_targets.push(addr);
}
"--probe-mesh" => {
config.probe_mesh = true;
}
"--trunking" => {
config.trunking_enabled = true;
}
"--ws-port" => {
i += 1;
config.ws_port = Some(
args.get(i).expect("--ws-port requires a port number")
.parse().expect("invalid --ws-port number"),
);
}
"--static-dir" => {
i += 1;
config.static_dir = Some(
args.get(i).expect("--static-dir requires a directory path").to_string(),
);
}
"--global-room" => {
i += 1;
config.global_rooms.push(wzp_relay::config::GlobalRoomConfig {
name: args.get(i).expect("--global-room requires a room name").to_string(),
});
}
"--debug-tap" => {
i += 1;
config.debug_tap = Some(
args.get(i).expect("--debug-tap requires a room name (or '*' for all)").to_string(),
);
}
"--event-log" => {
i += 1;
config.event_log = Some(
args.get(i).expect("--event-log requires a file path").to_string(),
);
}
"--version" | "-V" => {
println!("wzp-relay {}", env!("WZP_BUILD_HASH"));
std::process::exit(0);
}
"--mesh-status" => {
// Print mesh table from a fresh registry and exit.
// In practice this is useful after the relay has been running;
// here we just demonstrate the formatter with an empty registry.
let m = RelayMetrics::new();
print!("{}", wzp_relay::probe::mesh_summary(m.registry()));
std::process::exit(0);
}
"--help" | "-h" => {
eprintln!("Usage: wzp-relay [--config <path>] [--listen <addr>] [--remote <addr>] [--auth-url <url>] [--metrics-port <port>] [--probe <addr>]... [--probe-mesh] [--mesh-status]");
eprintln!();
eprintln!("Options:");
eprintln!(" -c, --config <path> Load config from TOML file (creates example if missing)");
eprintln!(" -i, --identity <path> Identity file path (creates if missing, uses OsRng)");
eprintln!(" --listen <addr> Listen address (default: 0.0.0.0:4433)");
eprintln!(" --remote <addr> Remote relay for forwarding (disables room mode)");
eprintln!(" --auth-url <url> featherChat auth endpoint (e.g., https://chat.example.com/v1/auth/validate)");
eprintln!(" When set, clients must send a bearer token as first signal message.");
eprintln!(" --metrics-port <port> Prometheus metrics HTTP port (e.g., 9090). Disabled if not set.");
eprintln!(" --probe <addr> Peer relay to probe for health monitoring (repeatable).");
eprintln!(" --probe-mesh Enable mesh mode (mark config flag, probes all --probe targets).");
eprintln!(" --mesh-status Print mesh health table and exit (diagnostic).");
eprintln!(" --trunking Enable trunk batching for outgoing media in room mode.");
eprintln!(" --global-room <name> Declare a room as global (bridged across federation). Repeatable.");
eprintln!(" --debug-tap <room> Log packet headers for a room ('*' for all rooms).");
eprintln!(" --ws-port <port> WebSocket listener port for browser clients (e.g., 8080).");
eprintln!(" --static-dir <dir> Directory to serve static files from (HTML/JS/WASM).");
eprintln!();
eprintln!("Room mode (default):");
eprintln!(" Clients join rooms by name. Packets forwarded to all others (SFU).");
std::process::exit(0);
}
other => {
eprintln!("unknown argument: {other}");
std::process::exit(1);
}
}
i += 1;
if let Some(addr) = args.listen {
config.listen_addr = addr;
}
if let Some(addr) = args.remote {
config.remote_relay = Some(addr);
}
if let Some(url) = args.auth_url {
config.auth_url = Some(url);
}
if let Some(port) = args.metrics_port {
config.metrics_port = Some(port);
}
if !args.probe.is_empty() {
config.probe_targets.extend(args.probe);
}
if args.probe_mesh {
config.probe_mesh = true;
}
if args.trunking {
config.trunking_enabled = true;
}
if let Some(port) = args.ws_port {
config.ws_port = Some(port);
}
if let Some(dir) = args.static_dir {
config.static_dir = Some(dir);
}
for name in args.global_room {
config.global_rooms.push(wzp_relay::config::GlobalRoomConfig { name });
}
if let Some(tap) = args.debug_tap {
config.debug_tap = Some(tap);
}
if let Some(log) = args.event_log {
config.event_log = Some(log);
}
CliResult {
config,
identity_path: args.identity,
config_file: args.config_file,
config_needs_create,
}
CliResult { config, identity_path, config_file, config_needs_create }
}
struct RelayStats {
@@ -416,7 +410,7 @@ async fn main() -> anyhow::Result<()> {
};
// Room manager (room mode only)
let room_mgr = Arc::new(Mutex::new(RoomManager::new()));
let room_mgr = Arc::new(RoomManager::new());
// Event log for protocol analysis
let event_log = wzp_relay::event_log::start_event_log(
@@ -509,7 +503,7 @@ async fn main() -> anyhow::Result<()> {
}
}
if let Some(ref tap) = config.debug_tap {
info!(filter = %tap, "debug tap enabled — logging packet headers");
info!(filter = %tap, "debug tap enabled — logging packets, signals, join/leave events");
}
// Phase 4: cross-relay direct-call dispatcher task.
@@ -544,6 +538,7 @@ async fn main() -> anyhow::Result<()> {
ref call_id,
ref caller_reflexive_addr,
ref caller_local_addrs,
ref caller_mapped_addr,
..
} => {
// Is the target on THIS relay? If not, drop —
@@ -563,7 +558,8 @@ async fn main() -> anyhow::Result<()> {
// Stash in local registry so the answer path
// can find the call + route the reply back
// through the same federation link. Include
// Phase 5.5 LAN host candidates too.
// Phase 5.5 LAN host candidates + Phase 8
// port-mapped addr.
{
let mut reg = call_registry_d.lock().await;
reg.create_call(
@@ -573,6 +569,7 @@ async fn main() -> anyhow::Result<()> {
);
reg.set_caller_reflexive_addr(call_id, caller_reflexive_addr.clone());
reg.set_caller_local_addrs(call_id, caller_local_addrs.clone());
reg.set_caller_mapped_addr(call_id, caller_mapped_addr.clone());
reg.set_peer_relay_fp(call_id, Some(origin_relay_fp.clone()));
}
// Deliver the offer to the local target.
@@ -591,6 +588,7 @@ async fn main() -> anyhow::Result<()> {
accept_mode,
ref callee_reflexive_addr,
ref callee_local_addrs,
ref callee_mapped_addr,
..
} => {
// Look up the local caller fp from the registry.
@@ -622,14 +620,11 @@ async fn main() -> anyhow::Result<()> {
}
// Accept — stash the callee's reflex addr + LAN
// host candidates + mark the call active,
// then read back everything needed to cross-
// wire peer_direct_addr + peer_local_addrs in
// the local CallSetup.
// Also set peer_relay_fp so the originating
// relay knows where to forward MediaPathReport.
// host candidates + mapped addr + mark the call
// active, then read back everything needed to
// cross-wire into the local CallSetup.
let room_name = format!("call-{call_id}");
let (callee_addr_for_setup, callee_local_for_setup) = {
let (callee_addr_for_setup, callee_local_for_setup, callee_mapped_for_setup) = {
let mut reg = call_registry_d.lock().await;
reg.set_active(call_id, accept_mode, room_name.clone());
reg.set_peer_relay_fp(call_id, Some(origin_relay_fp.clone()));
@@ -638,10 +633,12 @@ async fn main() -> anyhow::Result<()> {
callee_reflexive_addr.clone(),
);
reg.set_callee_local_addrs(call_id, callee_local_addrs.clone());
reg.set_callee_mapped_addr(call_id, callee_mapped_addr.clone());
let c = reg.get(call_id);
(
c.and_then(|c| c.callee_reflexive_addr.clone()),
c.map(|c| c.callee_local_addrs.clone()).unwrap_or_default(),
c.and_then(|c| c.callee_mapped_addr.clone()),
)
};
@@ -654,19 +651,13 @@ async fn main() -> anyhow::Result<()> {
}
// Emit the LOCAL CallSetup to our local caller.
// relay_addr = our own advertised addr so if P2P
// fails the caller will at least dial OUR relay
// (single-relay fallback — Phase 4.1 will wire
// federated media so that actually reaches the
// peer). peer_direct_addr = the callee's reflex
// addr carried in the answer. peer_local_addrs
// = callee's LAN host candidates (Phase 5.5 ICE).
let setup = SignalMessage::CallSetup {
call_id: call_id.clone(),
room: room_name.clone(),
relay_addr: advertised_addr_d.clone(),
peer_direct_addr: callee_addr_for_setup,
peer_local_addrs: callee_local_for_setup,
peer_mapped_addr: callee_mapped_for_setup,
};
let hub = signal_hub_d.lock().await;
let _ = hub.send_to(&caller_fp, &setup).await;
@@ -778,6 +769,14 @@ async fn main() -> anyhow::Result<()> {
let signal_hub = signal_hub.clone();
let call_registry = call_registry.clone();
let advertised_addr_str = advertised_addr_str.clone();
// Phase 8: relay region + peer addresses for RegisterPresenceAck
let relay_region = config.region.clone();
let relay_peers_for_ack: Vec<String> = config.peers.iter()
.filter_map(|p| {
let label = p.label.as_deref().unwrap_or("peer");
Some(format!("{label}|{}", p.url))
})
.collect();
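// Editor's sketch (not part of this commit): each entry above is encoded as
// "label|url", so a client receiving `available_relays` in the ack can split
// on the first '|'. The helper name is hypothetical.
fn parse_relay_entry(entry: &str) -> Option<(&str, &str)> {
    entry.split_once('|')
}
// parse_relay_entry("eu-west|relay.example.org:4433")
//     == Some(("eu-west", "relay.example.org:4433"))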
// Phase 4: per-task clone of this relay's federation TLS
// fingerprint so the FederatedSignalForward envelopes the
// spawned signal handler builds carry `origin_relay_fp`.
@@ -908,7 +907,7 @@ async fn main() -> anyhow::Result<()> {
}
}
}
transport.close().await.ok();
close_transport(&*transport, "cleanup").await;
return;
}
@@ -1011,10 +1010,25 @@ async fn main() -> anyhow::Result<()> {
success: true,
error: None,
relay_build: Some(BUILD_GIT_HASH.to_string()),
relay_region: relay_region.clone(),
available_relays: relay_peers_for_ack.clone(),
}).await;
info!(%addr, fingerprint = %client_fp, alias = ?client_alias, "signal client registered");
// Send the full presence list directly to the new
// client (guaranteed delivery — their recv loop is
// about to start). Then broadcast to all OTHER
// clients so they learn about the new user.
{
let hub = signal_hub.lock().await;
let presence = hub.presence_list();
// Direct send to new client (arrives right after ack)
let _ = transport.send_signal(&presence).await;
// Broadcast to everyone else
hub.broadcast(&presence).await;
}
// Signal recv loop
loop {
match transport.recv_signal().await {
@@ -1025,12 +1039,14 @@ async fn main() -> anyhow::Result<()> {
ref call_id,
ref caller_reflexive_addr,
ref caller_local_addrs,
ref caller_mapped_addr,
..
} => {
let target_fp = target_fingerprint.clone();
let call_id = call_id.clone();
let caller_addr_for_registry = caller_reflexive_addr.clone();
let caller_local_for_registry = caller_local_addrs.clone();
let caller_mapped_for_registry = caller_mapped_addr.clone();
// Check if target is online
let online = {
@@ -1103,6 +1119,10 @@ async fn main() -> anyhow::Result<()> {
&call_id,
caller_local_for_registry.clone(),
);
reg.set_caller_mapped_addr(
&call_id,
caller_mapped_for_registry.clone(),
);
}
// Send ringing to caller immediately
@@ -1124,6 +1144,7 @@ async fn main() -> anyhow::Result<()> {
reg.create_call(call_id.clone(), client_fp.clone(), target_fp.clone());
reg.set_caller_reflexive_addr(&call_id, caller_addr_for_registry);
reg.set_caller_local_addrs(&call_id, caller_local_for_registry);
reg.set_caller_mapped_addr(&call_id, caller_mapped_for_registry);
}
// Forward offer to callee
@@ -1145,12 +1166,14 @@ async fn main() -> anyhow::Result<()> {
ref accept_mode,
ref callee_reflexive_addr,
ref callee_local_addrs,
ref callee_mapped_addr,
..
} => {
let call_id = call_id.clone();
let mode = *accept_mode;
let callee_addr_for_registry = callee_reflexive_addr.clone();
let callee_local_for_registry = callee_local_addrs.clone();
let callee_mapped_for_registry = callee_mapped_addr.clone();
// Phase 4: look up peer fingerprint AND
// peer_relay_fp in one lock acquisition.
@@ -1213,17 +1236,20 @@ async fn main() -> anyhow::Result<()> {
// BOTH parties' addrs so we can cross-wire
// peer_direct_addr on the CallSetups below.
let room = format!("call-{call_id}");
let (caller_addr, callee_addr, caller_local, callee_local) = {
let (caller_addr, callee_addr, caller_local, callee_local, caller_mapped, callee_mapped) = {
let mut reg = call_registry.lock().await;
reg.set_active(&call_id, mode, room.clone());
reg.set_callee_reflexive_addr(&call_id, callee_addr_for_registry);
reg.set_callee_local_addrs(&call_id, callee_local_for_registry.clone());
reg.set_callee_mapped_addr(&call_id, callee_mapped_for_registry);
let call = reg.get(&call_id);
(
call.and_then(|c| c.caller_reflexive_addr.clone()),
call.and_then(|c| c.callee_reflexive_addr.clone()),
call.map(|c| c.caller_local_addrs.clone()).unwrap_or_default(),
call.map(|c| c.callee_local_addrs.clone()).unwrap_or_default(),
call.and_then(|c| c.caller_mapped_addr.clone()),
call.and_then(|c| c.callee_mapped_addr.clone()),
)
};
info!(
@@ -1272,6 +1298,7 @@ async fn main() -> anyhow::Result<()> {
relay_addr: relay_addr_for_setup,
peer_direct_addr: caller_addr.clone(),
peer_local_addrs: caller_local.clone(),
peer_mapped_addr: caller_mapped.clone(),
};
let hub = signal_hub.lock().await;
let _ = hub.send_to(&client_fp, &setup_for_callee).await;
@@ -1284,14 +1311,15 @@ async fn main() -> anyhow::Result<()> {
}
// Send CallSetup to BOTH parties with
// cross-wired peer_direct_addr +
// peer_local_addrs (Phase 5.5 ICE).
// cross-wired candidates (Phase 5.5 ICE
// + Phase 8 port-mapped addrs).
let setup_for_caller = SignalMessage::CallSetup {
call_id: call_id.clone(),
room: room.clone(),
relay_addr: relay_addr_for_setup.clone(),
peer_direct_addr: callee_addr.clone(),
peer_local_addrs: callee_local.clone(),
peer_mapped_addr: callee_mapped,
};
let setup_for_callee = SignalMessage::CallSetup {
call_id: call_id.clone(),
@@ -1299,6 +1327,7 @@ async fn main() -> anyhow::Result<()> {
relay_addr: relay_addr_for_setup,
peer_direct_addr: caller_addr.clone(),
peer_local_addrs: caller_local.clone(),
peer_mapped_addr: caller_mapped,
};
let hub = signal_hub.lock().await;
let _ = hub.send_to(&peer_fp, &setup_for_caller).await;
@@ -1388,6 +1417,81 @@ async fn main() -> anyhow::Result<()> {
}
}
// Phase 8: forward CandidateUpdate to the
// call peer for mid-call ICE re-gathering.
// Same forwarding pattern as MediaPathReport.
SignalMessage::CandidateUpdate { ref call_id, .. } => {
let (peer_fp, peer_relay_fp) = {
let reg = call_registry.lock().await;
match reg.get(call_id) {
Some(c) => (
reg.peer_fingerprint(call_id, &client_fp)
.map(|s| s.to_string()),
c.peer_relay_fp.clone(),
),
None => (None, None),
}
};
if let Some(fp) = peer_fp {
if let Some(ref origin_fp) = peer_relay_fp {
if let Some(ref fm) = federation_mgr {
let forward = SignalMessage::FederatedSignalForward {
inner: Box::new(msg.clone()),
origin_relay_fp: tls_fp.clone(),
};
if let Err(e) = fm.send_signal_to_peer(origin_fp, &forward).await {
warn!(
%call_id,
%origin_fp,
error = %e,
"cross-relay CandidateUpdate forward failed"
);
}
}
} else {
let hub = signal_hub.lock().await;
let _ = hub.send_to(&fp, &msg).await;
}
}
}
// Hard NAT + in-call upgrade/quality signals: forward HardNatProbe,
// HardNatBirthdayStart, Upgrade* and QualityCapability to the call
// peer (same forwarding pattern as CandidateUpdate above).
SignalMessage::HardNatBirthdayStart { ref call_id, .. } |
SignalMessage::HardNatProbe { ref call_id, .. } |
SignalMessage::UpgradeProposal { ref call_id, .. } |
SignalMessage::UpgradeResponse { ref call_id, .. } |
SignalMessage::UpgradeConfirm { ref call_id, .. } |
SignalMessage::QualityCapability { ref call_id, .. } => {
let (peer_fp, peer_relay_fp) = {
let reg = call_registry.lock().await;
match reg.get(call_id) {
Some(c) => (
reg.peer_fingerprint(call_id, &client_fp)
.map(|s| s.to_string()),
c.peer_relay_fp.clone(),
),
None => (None, None),
}
};
if let Some(fp) = peer_fp {
if let Some(ref origin_fp) = peer_relay_fp {
if let Some(ref fm) = federation_mgr {
let forward = SignalMessage::FederatedSignalForward {
inner: Box::new(msg.clone()),
origin_relay_fp: tls_fp.clone(),
};
let _ = fm.send_signal_to_peer(origin_fp, &forward).await;
}
} else {
let hub = signal_hub.lock().await;
let _ = hub.send_to(&fp, &msg).await;
}
}
}
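// Editor's sketch (hypothetical refactor, not part of this commit): the two
// forwarding arms above repeat the same routing logic. A helper along these
// lines could centralize it; the function and the CallRegistry type name are
// assumptions, not existing relay API.
async fn forward_to_call_peer(
    call_registry: &tokio::sync::Mutex<CallRegistry>,
    signal_hub: &tokio::sync::Mutex<SignalHub>,
    federation_mgr: &Option<Arc<FederationManager>>,
    tls_fp: &str,
    client_fp: &str,
    call_id: &str,
    msg: &SignalMessage,
) {
    // Resolve the call peer and, if it lives on another relay, that relay's fp.
    let (peer_fp, peer_relay_fp) = {
        let reg = call_registry.lock().await;
        match reg.get(call_id) {
            Some(c) => (
                reg.peer_fingerprint(call_id, client_fp).map(|s| s.to_string()),
                c.peer_relay_fp.clone(),
            ),
            None => (None, None),
        }
    };
    let Some(fp) = peer_fp else { return };
    if let Some(origin_fp) = peer_relay_fp.as_ref() {
        // Cross-relay: wrap and forward over the federation link.
        if let Some(fm) = federation_mgr.as_ref() {
            let forward = SignalMessage::FederatedSignalForward {
                inner: Box::new(msg.clone()),
                origin_relay_fp: tls_fp.to_string(),
            };
            let _ = fm.send_signal_to_peer(origin_fp, &forward).await;
        }
    } else {
        // Same relay: deliver through the local signal hub.
        let hub = signal_hub.lock().await;
        let _ = hub.send_to(&fp, msg).await;
    }
}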
SignalMessage::Ping { timestamp_ms } => {
let _ = transport.send_signal(&SignalMessage::Pong { timestamp_ms }).await;
}
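// Editor's sketch (client side, hypothetical; assumes millisecond u64
// timestamps): the Pong echoes the caller's timestamp, so round-trip time
// is simply the current time minus the echoed value.
fn rtt_ms(now_ms: u64, echoed_timestamp_ms: u64) -> u64 {
    now_ms.saturating_sub(echoed_timestamp_ms)
}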
@@ -1469,13 +1573,16 @@ async fn main() -> anyhow::Result<()> {
{
let mut hub = signal_hub.lock().await;
hub.unregister(&client_fp);
// Broadcast updated presence to remaining clients
let presence_msg = hub.presence_list();
hub.broadcast(&presence_msg).await;
}
{
let mut reg = presence.lock().await;
reg.unregister_local(&client_fp);
}
transport.close().await.ok();
close_transport(&*transport, "cleanup").await;
return;
}
@@ -1499,14 +1606,14 @@ async fn main() -> anyhow::Result<()> {
Err(e) => {
metrics.auth_attempts.with_label_values(&["fail"]).inc();
error!(%addr, "auth failed: {e}");
transport.close().await.ok();
close_transport(&*transport, "cleanup").await;
return;
}
}
}
Ok(Some(_)) => {
error!(%addr, "expected AuthToken as first signal, got something else");
transport.close().await.ok();
close_transport(&*transport, "cleanup").await;
return;
}
Ok(None) => {
@@ -1515,7 +1622,7 @@ async fn main() -> anyhow::Result<()> {
}
Err(e) => {
error!(%addr, "signal recv error during auth: {e}");
transport.close().await.ok();
close_transport(&*transport, "cleanup").await;
return;
}
}
@@ -1537,7 +1644,7 @@ async fn main() -> anyhow::Result<()> {
}
Err(e) => {
error!(%addr, "handshake failed: {e}");
transport.close().await.ok();
close_transport(&*transport, "cleanup").await;
return;
}
};
@@ -1561,7 +1668,7 @@ async fn main() -> anyhow::Result<()> {
};
if !authorized {
warn!(%addr, room = %room_name, fp = %participant_fp, "rejected: not authorized for this call room");
transport.close().await.ok();
close_transport(&*transport, "cleanup").await;
return;
}
info!(%addr, room = %room_name, fp = %participant_fp, "authorized for call room");
@@ -1602,7 +1709,7 @@ async fn main() -> anyhow::Result<()> {
tokio::select! { _ = up => {} _ = dn => {} }
stats_handle.abort();
transport.close().await.ok();
close_transport(&*transport, "cleanup").await;
} else {
// Room mode — enforce max sessions, then join room
let session_id = {
@@ -1611,7 +1718,7 @@ async fn main() -> anyhow::Result<()> {
Ok(id) => id,
Err(e) => {
error!(%addr, room = %room_name, "session rejected: {e}");
transport.close().await.ok();
close_transport(&*transport, "cleanup").await;
return;
}
}
@@ -1621,21 +1728,18 @@ async fn main() -> anyhow::Result<()> {
// Call rooms: enforce 2-participant limit
if room_name.starts_with("call-") {
let mgr = room_mgr.lock().await;
if mgr.room_size(&room_name) >= 2 {
drop(mgr);
if room_mgr.room_size(&room_name) >= 2 {
warn!(%addr, room = %room_name, "call room full (max 2 participants)");
metrics.active_sessions.dec();
let mut smgr = session_mgr.lock().await;
smgr.remove_session(session_id);
transport.close().await.ok();
close_transport(&*transport, "cleanup").await;
return;
}
}
let participant_id = {
let mut mgr = room_mgr.lock().await;
match mgr.join(
match room_mgr.join(
&room_name,
addr,
room::ParticipantSender::Quic(transport.clone()),
@@ -1643,8 +1747,7 @@ async fn main() -> anyhow::Result<()> {
caller_alias.as_deref(),
) {
Ok((id, update, senders)) => {
metrics.active_rooms.set(mgr.list().len() as i64);
drop(mgr); // release lock before async broadcast
metrics.active_rooms.set(room_mgr.list().len() as i64);
// Merge federated participants into RoomUpdate if this is a global room
let merged_update = if let Some(ref fm) = federation_mgr {
@@ -1663,6 +1766,15 @@ async fn main() -> anyhow::Result<()> {
} else { update }
} else { update };
if let Some(ref tap) = debug_tap {
if tap.matches(&room_name) {
tap.log_signal(&room_name, &merged_update);
tap.log_event(&room_name, "join", &format!(
"participant={id} addr={addr} alias={}",
caller_alias.as_deref().unwrap_or("?")
));
}
}
room::broadcast_signal(&senders, &merged_update).await;
id
}
@@ -1671,7 +1783,7 @@ async fn main() -> anyhow::Result<()> {
metrics.active_sessions.dec();
let mut smgr = session_mgr.lock().await;
smgr.remove_session(session_id);
transport.close().await.ok();
close_transport(&*transport, "cleanup").await;
return;
}
}
@@ -1720,16 +1832,13 @@ async fn main() -> anyhow::Result<()> {
}
metrics.remove_session_metrics(&session_id_str);
metrics.active_sessions.dec();
{
let mgr = room_mgr.lock().await;
metrics.active_rooms.set(mgr.list().len() as i64);
}
metrics.active_rooms.set(room_mgr.list().len() as i64);
{
let mut smgr = session_mgr.lock().await;
smgr.remove_session(session_id);
}
transport.close().await.ok();
close_transport(&*transport, "cleanup").await;
}
});
}
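// Editor's note: every exit path above now calls close_transport(...) with a
// reason string instead of a bare transport.close(). Its definition is not
// part of this diff; a plausible shape, assuming it takes the concrete
// QuinnTransport (it may equally take a transport trait object):
async fn close_transport(transport: &QuinnTransport, reason: &str) {
    tracing::debug!(%reason, "closing client transport");
    transport.close().await.ok();
}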

View File

@@ -9,7 +9,7 @@ use std::sync::Arc;
use std::time::Duration;
use bytes::Bytes;
use tokio::sync::Mutex;
use dashmap::DashMap;
use tracing::{error, info, warn};
use wzp_proto::packet::TrunkFrame;
@@ -50,6 +50,108 @@ impl DebugTap {
"TAP"
);
}
pub fn log_signal(&self, room: &str, signal: &wzp_proto::SignalMessage) {
match signal {
wzp_proto::SignalMessage::RoomUpdate { count, participants } => {
let names: Vec<&str> = participants.iter()
.map(|p| p.alias.as_deref().unwrap_or("?"))
.collect();
info!(
target: "debug_tap",
room = %room,
signal = "RoomUpdate",
count,
participants = ?names,
"TAP SIGNAL"
);
}
wzp_proto::SignalMessage::QualityDirective { recommended_profile, reason } => {
info!(
target: "debug_tap",
room = %room,
signal = "QualityDirective",
codec = ?recommended_profile.codec,
reason = reason.as_deref().unwrap_or(""),
"TAP SIGNAL"
);
}
other => {
info!(
target: "debug_tap",
room = %room,
signal = ?std::mem::discriminant(other),
"TAP SIGNAL"
);
}
}
}
pub fn log_event(&self, room: &str, event: &str, detail: &str) {
info!(
target: "debug_tap",
room = %room,
event,
detail,
"TAP EVENT"
);
}
pub fn log_stats(&self, room: &str, stats: &TapStats) {
let codecs: Vec<String> = stats.codecs_seen.iter().map(|c| format!("{c:?}")).collect();
info!(
target: "debug_tap",
room = %room,
period = "5s",
in_pkts = stats.in_pkts,
out_pkts = stats.out_pkts,
fan_out_avg = format!("{:.1}", if stats.in_pkts > 0 { stats.out_pkts as f64 / stats.in_pkts as f64 } else { 0.0 }),
seq_gaps = stats.seq_gaps,
codecs_seen = ?codecs,
"TAP STATS"
);
}
}
/// Per-participant stats for the debug tap periodic summary.
pub struct TapStats {
pub in_pkts: u64,
pub out_pkts: u64,
pub seq_gaps: u64,
pub codecs_seen: std::collections::HashSet<wzp_proto::CodecId>,
last_seq: Option<u16>,
}
impl TapStats {
pub fn new() -> Self {
Self {
in_pkts: 0,
out_pkts: 0,
seq_gaps: 0,
codecs_seen: std::collections::HashSet::new(),
last_seq: None,
}
}
pub fn record_in(&mut self, pkt: &wzp_proto::MediaPacket, fan_out: usize) {
self.in_pkts += 1;
self.out_pkts += fan_out as u64;
self.codecs_seen.insert(pkt.header.codec_id);
if let Some(prev) = self.last_seq {
let expected = prev.wrapping_add(1);
if pkt.header.seq != expected {
self.seq_gaps += 1;
}
}
self.last_seq = Some(pkt.header.seq);
}
pub fn reset_period(&mut self) {
self.in_pkts = 0;
self.out_pkts = 0;
self.seq_gaps = 0;
// Keep codecs_seen and last_seq across periods
}
}
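// Editor's sketch (standalone, mirrors record_in's gap logic above): sequence
// numbers are u16 and the expected successor is computed with wrapping_add,
// so a clean 65535 to 0 rollover is not counted as a gap, while a skipped
// packet is.
fn count_gaps(seqs: &[u16]) -> u64 {
    let mut gaps = 0u64;
    let mut last: Option<u16> = None;
    for &seq in seqs {
        if let Some(prev) = last {
            if seq != prev.wrapping_add(1) {
                gaps += 1;
            }
        }
        last = Some(seq);
    }
    gaps
}
// count_gaps(&[65534, 65535, 0, 1]) == 0   (clean wrap)
// count_gaps(&[10, 11, 13])         == 1   (packet 12 missing)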
/// Tracks network quality for a single participant in a room.
@@ -83,11 +185,7 @@ impl ParticipantQuality {
fn weakest_tier<'a>(qualities: impl Iterator<Item = &'a ParticipantQuality>) -> Tier {
qualities
.map(|pq| pq.current_tier)
.min_by_key(|t| match t {
Tier::Good => 2,
Tier::Degraded => 1,
Tier::Catastrophic => 0,
})
.min()
.unwrap_or(Tier::Good)
}
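// Editor's sketch: replacing the explicit min_by_key ranking with `.min()`
// relies on Tier deriving Ord with the weakest variant declared first. The
// enum below is an assumed stand-in (not the crate's declaration) showing
// the ordering that makes the two forms equivalent.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
enum TierSketch {
    Catastrophic, // smallest, picked by .min()
    Degraded,
    Good,
}

fn weakest(tiers: impl Iterator<Item = TierSketch>) -> TierSketch {
    tiers.min().unwrap_or(TierSketch::Good)
}
// weakest([TierSketch::Good, TierSketch::Degraded].into_iter()) == TierSketch::Degraded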
@@ -179,12 +277,18 @@ struct Participant {
/// A room holding multiple participants.
struct Room {
participants: Vec<Participant>,
/// Per-participant quality tracking, keyed by participant_id.
qualities: HashMap<ParticipantId, ParticipantQuality>,
/// Current room-wide tier (to avoid repeated broadcasts).
current_tier: Tier,
}
impl Room {
fn new() -> Self {
Self {
participants: Vec::new(),
qualities: HashMap::new(),
current_tier: Tier::Good,
}
}
@@ -241,29 +345,27 @@ impl Room {
}
/// Manages all rooms on the relay.
///
/// Uses `DashMap` for per-room sharded locking -- rooms are independently
/// lockable so the media hot-path never contends on a single mutex.
pub struct RoomManager {
rooms: HashMap<String, Room>,
/// Room access control list. Maps hashed room name allowed fingerprints.
rooms: DashMap<String, Room>,
/// Room access control list. Maps hashed room name -> allowed fingerprints.
/// When `None`, rooms are open (no auth mode). When `Some`, only listed
/// fingerprints can join the corresponding room.
acl: Option<HashMap<String, HashSet<String>>>,
/// fingerprints can join the corresponding room. Protected by std Mutex
/// since ACL mutations are rare (only during call setup).
acl: Option<std::sync::Mutex<HashMap<String, HashSet<String>>>>,
/// Channel for room lifecycle events (federation subscribes).
event_tx: tokio::sync::broadcast::Sender<RoomEvent>,
/// Per-participant quality tracking, keyed by (room_name, participant_id).
qualities: HashMap<(String, ParticipantId), ParticipantQuality>,
/// Current room-wide tier per room (to avoid repeated broadcasts).
room_tiers: HashMap<String, Tier>,
}
impl RoomManager {
pub fn new() -> Self {
let (event_tx, _) = tokio::sync::broadcast::channel(64);
Self {
rooms: HashMap::new(),
rooms: DashMap::new(),
acl: None,
event_tx,
qualities: HashMap::new(),
room_tiers: HashMap::new(),
}
}
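// Editor's sketch (illustrative only): what the switch to DashMap buys on the
// media hot path. The manager is shared as Arc<RoomManager>, every method
// takes &self, and each lookup only locks the shard holding that room, so the
// per-packet path no longer awaits a global tokio Mutex.
fn hot_path_lookup(room_mgr: &RoomManager, room_name: &str, me: ParticipantId) -> usize {
    let _others = room_mgr.others(room_name, me);
    room_mgr.room_size(room_name)
}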
@@ -271,11 +373,9 @@ impl RoomManager {
pub fn with_acl() -> Self {
let (event_tx, _) = tokio::sync::broadcast::channel(64);
Self {
rooms: HashMap::new(),
acl: Some(HashMap::new()),
rooms: DashMap::new(),
acl: Some(std::sync::Mutex::new(HashMap::new())),
event_tx,
qualities: HashMap::new(),
room_tiers: HashMap::new(),
}
}
@@ -285,9 +385,10 @@ impl RoomManager {
}
/// Grant a fingerprint access to a room.
pub fn allow(&mut self, room_name: &str, fingerprint: &str) {
if let Some(ref mut acl) = self.acl {
acl.entry(room_name.to_string())
pub fn allow(&self, room_name: &str, fingerprint: &str) {
if let Some(ref acl) = self.acl {
acl.lock().unwrap()
.entry(room_name.to_string())
.or_default()
.insert(fingerprint.to_string());
}
@@ -300,6 +401,7 @@ impl RoomManager {
(None, _) => true, // no ACL = open
(Some(_), None) => false, // ACL enabled but no fingerprint
(Some(acl), Some(fp)) => {
let acl = acl.lock().unwrap();
// Room not in ACL = open room (allow anyone authenticated)
match acl.get(room_name) {
None => true,
@@ -311,7 +413,7 @@ impl RoomManager {
/// Join a room. Returns (participant_id, room_update_msg, all_senders) for broadcasting.
pub fn join(
&mut self,
&self,
room_name: &str,
addr: std::net::SocketAddr,
sender: ParticipantSender,
@@ -322,25 +424,25 @@ impl RoomManager {
warn!(room = room_name, fingerprint = ?fingerprint, "unauthorized room join attempt");
return Err("not authorized for this room".to_string());
}
let was_empty = !self.rooms.contains_key(room_name)
|| self.rooms.get(room_name).map_or(true, |r| r.is_empty());
let room = self.rooms.entry(room_name.to_string()).or_insert_with(Room::new);
let was_empty = self.rooms.get(room_name).map_or(true, |r| r.is_empty());
let mut room = self.rooms.entry(room_name.to_string()).or_insert_with(Room::new);
let id = room.add(addr, sender, fingerprint.map(|s| s.to_string()), alias.map(|s| s.to_string()));
self.qualities.insert((room_name.to_string(), id), ParticipantQuality::new());
if was_empty {
let _ = self.event_tx.send(RoomEvent::LocalJoin { room: room_name.to_string() });
}
room.qualities.insert(id, ParticipantQuality::new());
let update = wzp_proto::SignalMessage::RoomUpdate {
count: room.len() as u32,
participants: room.participant_list(),
};
let senders = room.all_senders();
drop(room); // release DashMap guard before event_tx send (not async, but good practice)
if was_empty {
let _ = self.event_tx.send(RoomEvent::LocalJoin { room: room_name.to_string() });
}
Ok((id, update, senders))
}
/// Join a room via WebSocket. Convenience wrapper around `join()`.
pub fn join_ws(
&mut self,
&self,
room_name: &str,
addr: std::net::SocketAddr,
sender: tokio::sync::mpsc::Sender<Bytes>,
@@ -352,7 +454,7 @@ impl RoomManager {
/// Get list of active room names.
pub fn active_rooms(&self) -> Vec<String> {
self.rooms.keys().cloned().collect()
self.rooms.iter().map(|r| r.key().clone()).collect()
}
/// Get participant list for a room (fingerprint + alias).
@@ -372,26 +474,29 @@ impl RoomManager {
}
/// Leave a room. Returns (room_update_msg, remaining_senders) for broadcasting, or None if room is now empty.
pub fn leave(&mut self, room_name: &str, participant_id: ParticipantId) -> Option<(wzp_proto::SignalMessage, Vec<ParticipantSender>)> {
self.qualities.remove(&(room_name.to_string(), participant_id));
if let Some(room) = self.rooms.get_mut(room_name) {
room.remove(participant_id);
if room.is_empty() {
self.rooms.remove(room_name);
self.room_tiers.remove(room_name);
let _ = self.event_tx.send(RoomEvent::LocalLeave { room: room_name.to_string() });
info!(room = room_name, "room closed (empty)");
return None;
pub fn leave(&self, room_name: &str, participant_id: ParticipantId) -> Option<(wzp_proto::SignalMessage, Vec<ParticipantSender>)> {
let result = {
if let Some(mut room) = self.rooms.get_mut(room_name) {
room.qualities.remove(&participant_id);
room.remove(participant_id);
if room.is_empty() {
drop(room); // release write guard before remove
self.rooms.remove(room_name);
let _ = self.event_tx.send(RoomEvent::LocalLeave { room: room_name.to_string() });
info!(room = room_name, "room closed (empty)");
return None;
}
let update = wzp_proto::SignalMessage::RoomUpdate {
count: room.len() as u32,
participants: room.participant_list(),
};
let senders = room.all_senders();
Some((update, senders))
} else {
None
}
let update = wzp_proto::SignalMessage::RoomUpdate {
count: room.len() as u32,
participants: room.participant_list(),
};
let senders = room.all_senders();
Some((update, senders))
} else {
None
}
};
result
}
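// Editor's sketch (standalone): why `leave` drops the room guard before
// calling `self.rooms.remove(...)` above. DashMap's remove needs the shard's
// write lock, and holding a get_mut guard for the same key on the same thread
// while calling remove would deadlock.
fn remove_if_empty(map: &dashmap::DashMap<String, Vec<u32>>, key: &str) {
    if let Some(entry) = map.get_mut(key) {
        if entry.is_empty() {
            drop(entry);     // release the shard lock first
            map.remove(key); // then removal cannot self-deadlock
        }
    }
}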
/// Get senders for all OTHER participants in a room.
@@ -411,23 +516,29 @@ impl RoomManager {
self.rooms.get(room_name).map(|r| r.len()).unwrap_or(0)
}
/// Check if a room exists and has participants.
pub fn is_room_active(&self, room_name: &str) -> bool {
self.rooms.contains_key(room_name)
}
/// List all rooms with their sizes.
pub fn list(&self) -> Vec<(String, usize)> {
self.rooms.iter().map(|(k, v)| (k.clone(), v.len())).collect()
self.rooms.iter().map(|r| (r.key().clone(), r.len())).collect()
}
/// Feed a quality report from a participant. If the room-wide weakest
/// tier changes, returns `(QualityDirective signal, all senders)` for
/// broadcasting.
pub fn observe_quality(
&mut self,
&self,
room_name: &str,
participant_id: ParticipantId,
report: &wzp_proto::packet::QualityReport,
) -> Option<(wzp_proto::SignalMessage, Vec<ParticipantSender>)> {
let key = (room_name.to_string(), participant_id);
let tier_changed = self.qualities
.get_mut(&key)
let mut room = self.rooms.get_mut(room_name)?;
let tier_changed = room.qualities
.get_mut(&participant_id)
.and_then(|pq| pq.observe(report))
.is_some();
@@ -436,22 +547,19 @@ impl RoomManager {
}
// Compute the weakest tier across all participants in this room
let room_qualities = self.qualities.iter()
.filter(|((rn, _), _)| rn == room_name)
.map(|(_, pq)| pq);
let weakest = weakest_tier(room_qualities);
let weakest = weakest_tier(room.qualities.values());
let current_room_tier = self.room_tiers.get(room_name).copied().unwrap_or(Tier::Good);
if weakest == current_room_tier {
if weakest == room.current_tier {
return None;
}
// Room-wide tier changed update and broadcast directive
self.room_tiers.insert(room_name.to_string(), weakest);
// Room-wide tier changed -- update and broadcast directive
let old_tier = room.current_tier;
room.current_tier = weakest;
let profile = weakest.profile();
info!(
room = room_name,
old_tier = ?current_room_tier,
old_tier = ?old_tier,
new_tier = ?weakest,
codec = ?profile.codec,
fec_ratio = profile.fec_ratio,
@@ -462,9 +570,7 @@ impl RoomManager {
recommended_profile: profile,
reason: Some(format!("weakest link: {weakest:?}")),
};
let senders = self.rooms.get(room_name)
.map(|r| r.all_senders())
.unwrap_or_default();
let senders = room.all_senders();
Some((directive, senders))
}
}
@@ -548,7 +654,7 @@ impl TrunkedForwarder {
/// into [`TrunkedForwarder`]s and flushed every 5 ms or when the batcher is
/// full, reducing QUIC datagram overhead.
pub async fn run_participant(
room_mgr: Arc<Mutex<RoomManager>>,
room_mgr: Arc<RoomManager>,
room_name: String,
participant_id: ParticipantId,
transport: Arc<wzp_transport::QuinnTransport>,
@@ -574,7 +680,7 @@ pub async fn run_participant(
/// Plain (non-trunked) forwarding loop — original behaviour.
async fn run_participant_plain(
room_mgr: Arc<Mutex<RoomManager>>,
room_mgr: Arc<RoomManager>,
room_name: String,
participant_id: ParticipantId,
transport: Arc<wzp_transport::QuinnTransport>,
@@ -592,6 +698,12 @@ async fn run_participant_plain(
let mut send_errors = 0u64;
let mut last_log_instant = std::time::Instant::now();
let mut tap_stats = if debug_tap.as_ref().map_or(false, |t| t.matches(&room_name)) {
Some(TapStats::new())
} else {
None
};
info!(
room = %room_name,
participant = participant_id,
@@ -642,13 +754,12 @@ async fn run_participant_plain(
// Get current list of other participants + check quality directive
let lock_start = std::time::Instant::now();
let (others, quality_directive) = {
let mut mgr = room_mgr.lock().await;
let directive = if let Some(ref report) = pkt.quality_report {
mgr.observe_quality(&room_name, participant_id, report)
room_mgr.observe_quality(&room_name, participant_id, report)
} else {
None
};
let o = mgr.others(&room_name, participant_id);
let o = room_mgr.others(&room_name, participant_id);
(o, directive)
};
let lock_ms = lock_start.elapsed().as_millis() as u64;
@@ -663,15 +774,23 @@ async fn run_participant_plain(
// Broadcast quality directive to all participants if tier changed
if let Some((directive, all_senders)) = quality_directive {
if let Some(ref tap) = debug_tap {
if tap.matches(&room_name) {
tap.log_signal(&room_name, &directive);
}
}
broadcast_signal(&all_senders, &directive).await;
}
// Debug tap: log packet metadata
// Debug tap: log packet metadata + record stats
if let Some(ref tap) = debug_tap {
if tap.matches(&room_name) {
tap.log_packet(&room_name, "in", &addr, &pkt, others.len());
}
}
if let Some(ref mut ts) = tap_stats {
ts.record_in(&pkt, others.len());
}
// Forward to all others
let fwd_start = std::time::Instant::now();
@@ -729,10 +848,7 @@ async fn run_participant_plain(
// Periodic stats log every 5 seconds
if last_log_instant.elapsed() >= Duration::from_secs(5) {
let room_size = {
let mgr = room_mgr.lock().await;
mgr.room_size(&room_name)
};
let room_size = room_mgr.room_size(&room_name);
info!(
room = %room_name,
participant = participant_id,
@@ -744,6 +860,10 @@ async fn run_participant_plain(
send_errors,
"participant stats"
);
if let (Some(tap), Some(ts)) = (&debug_tap, &mut tap_stats) {
tap.log_stats(&room_name, ts);
ts.reset_period();
}
max_recv_gap_ms = 0;
max_forward_ms = 0;
last_log_instant = std::time::Instant::now();
@@ -751,16 +871,28 @@ async fn run_participant_plain(
}
// Clean up — leave room and broadcast update to remaining participants
let mut mgr = room_mgr.lock().await;
if let Some((update, senders)) = mgr.leave(&room_name, participant_id) {
drop(mgr); // release lock before async broadcast
if let Some((update, senders)) = room_mgr.leave(&room_name, participant_id) {
if let Some(ref tap) = debug_tap {
if tap.matches(&room_name) {
tap.log_event(&room_name, "leave", &format!(
"participant={participant_id} addr={addr} forwarded={packets_forwarded}"
));
tap.log_signal(&room_name, &update);
}
}
broadcast_signal(&senders, &update).await;
} else if let Some(ref tap) = debug_tap {
if tap.matches(&room_name) {
tap.log_event(&room_name, "leave", &format!(
"participant={participant_id} addr={addr} (room closed)"
));
}
}
}
/// Trunked forwarding loop — batches outgoing packets per peer.
async fn run_participant_trunked(
room_mgr: Arc<Mutex<RoomManager>>,
room_mgr: Arc<RoomManager>,
room_name: String,
participant_id: ParticipantId,
transport: Arc<wzp_transport::QuinnTransport>,
@@ -835,13 +967,12 @@ async fn run_participant_trunked(
let lock_start = std::time::Instant::now();
let (others, quality_directive) = {
let mut mgr = room_mgr.lock().await;
let directive = if let Some(ref report) = pkt.quality_report {
mgr.observe_quality(&room_name, participant_id, report)
room_mgr.observe_quality(&room_name, participant_id, report)
} else {
None
};
let o = mgr.others(&room_name, participant_id);
let o = room_mgr.others(&room_name, participant_id);
(o, directive)
};
let lock_ms = lock_start.elapsed().as_millis() as u64;
@@ -907,10 +1038,7 @@ async fn run_participant_trunked(
// Periodic stats every 5 seconds
if last_log_instant.elapsed() >= Duration::from_secs(5) {
let room_size = {
let mgr = room_mgr.lock().await;
mgr.room_size(&room_name)
};
let room_size = room_mgr.room_size(&room_name);
info!(
room = %room_name,
participant = participant_id,
@@ -951,9 +1079,7 @@ async fn run_participant_trunked(
let _ = fwd.flush().await;
}
let mut mgr = room_mgr.lock().await;
if let Some((update, senders)) = mgr.leave(&room_name, participant_id) {
drop(mgr);
if let Some((update, senders)) = room_mgr.leave(&room_name, participant_id) {
broadcast_signal(&senders, &update).await;
}
}
@@ -999,7 +1125,7 @@ mod tests {
#[test]
fn acl_restricts_to_allowed() {
let mut mgr = RoomManager::with_acl();
let mgr = RoomManager::with_acl();
mgr.allow("room1", "alice");
mgr.allow("room1", "bob");
assert!(mgr.is_authorized("room1", Some("alice")));

View File

@@ -86,6 +86,26 @@ impl SignalHub {
pub fn alias(&self, fp: &str) -> Option<&str> {
self.clients.get(fp).and_then(|c| c.alias.as_deref())
}
/// Build a PresenceList message with all online users.
pub fn presence_list(&self) -> SignalMessage {
let users: Vec<wzp_proto::PresenceUser> = self
.clients
.values()
.map(|c| wzp_proto::PresenceUser {
fingerprint: c.fingerprint.clone(),
alias: c.alias.clone(),
})
.collect();
SignalMessage::PresenceList { users }
}
/// Broadcast a message to ALL connected signal clients.
pub async fn broadcast(&self, msg: &SignalMessage) {
for client in self.clients.values() {
let _ = client.transport.send_signal(msg).await;
}
}
}
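// Editor's sketch: how the relay main loop (shown earlier in this diff)
// combines the two methods above. The newly registered client gets the list
// via a direct send so it is queued before its recv loop starts, and every
// other client learns about the change via broadcast; on unregister only the
// broadcast is needed.
async fn refresh_presence(hub: &SignalHub) {
    let presence = hub.presence_list();
    hub.broadcast(&presence).await;
}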
#[cfg(test)]

View File

@@ -31,7 +31,7 @@ use crate::session_mgr::SessionManager;
/// Shared state for WebSocket handlers.
#[derive(Clone)]
pub struct WsState {
pub room_mgr: Arc<Mutex<RoomManager>>,
pub room_mgr: Arc<RoomManager>,
pub session_mgr: Arc<Mutex<SessionManager>>,
pub auth_url: Option<String>,
pub metrics: Arc<RelayMetrics>,
@@ -143,10 +143,9 @@ async fn handle_ws_connection(socket: WebSocket, room: String, state: WsState) {
// 4. Join room with WS sender
let addr: SocketAddr = ([0, 0, 0, 0], 0).into();
let participant_id = {
let mut mgr = state.room_mgr.lock().await;
match mgr.join_ws(&room, addr, tx, fingerprint.as_deref()) {
match state.room_mgr.join_ws(&room, addr, tx, fingerprint.as_deref()) {
Ok(id) => {
state.metrics.active_rooms.set(mgr.list().len() as i64);
state.metrics.active_rooms.set(state.room_mgr.list().len() as i64);
id
}
Err(e) => {
@@ -184,10 +183,7 @@ async fn handle_ws_connection(socket: WebSocket, room: String, state: WsState) {
loop {
match ws_rx.next().await {
Some(Ok(Message::Binary(data))) => {
let others = {
let mgr = state.room_mgr.lock().await;
mgr.others(&room, participant_id)
};
let others = state.room_mgr.others(&room, participant_id);
for other in &others {
let _ = other.send_raw(&data).await;
}
@@ -214,11 +210,8 @@ async fn handle_ws_connection(socket: WebSocket, room: String, state: WsState) {
reg.unregister_local(fp);
}
{
let mut mgr = state.room_mgr.lock().await;
mgr.leave(&room, participant_id);
state.metrics.active_rooms.set(mgr.list().len() as i64);
}
state.room_mgr.leave(&room, participant_id);
state.metrics.active_rooms.set(state.room_mgr.list().len() as i64);
let session_id_str: String = session_id.iter().map(|b| format!("{b:02x}")).collect();
state.metrics.remove_session_metrics(&session_id_str);

View File

@@ -52,6 +52,7 @@ fn alice_offer(call_id: &str) -> SignalMessage {
supported_profiles: vec![],
caller_reflexive_addr: Some(ALICE_ADDR.into()),
caller_local_addrs: Vec::new(),
caller_mapped_addr: None,
caller_build_version: None,
}
}
@@ -133,6 +134,7 @@ fn bob_answer(call_id: &str) -> SignalMessage {
chosen_profile: None,
callee_reflexive_addr: Some(BOB_ADDR.into()),
callee_local_addrs: Vec::new(),
callee_mapped_addr: None,
callee_build_version: None,
}
}
@@ -178,6 +180,7 @@ fn relay_b_handle_local_answer(
relay_addr: RELAY_B_ADDR.into(),
peer_direct_addr: caller_addr,
peer_local_addrs: Vec::new(),
peer_mapped_addr: None,
};
let _ = callee_addr;
(forward, setup_for_bob)
@@ -219,6 +222,7 @@ fn relay_a_handle_forwarded_answer(
relay_addr: RELAY_A_ADDR.into(),
peer_direct_addr: callee_reflexive_addr,
peer_local_addrs: Vec::new(),
peer_mapped_addr: None,
}
}

View File

@@ -0,0 +1,662 @@
//! Tests for `wzp_relay::federation`.
//!
//! Covers:
//! - room_hash determinism and uniqueness
//! - is_global_room (static config + call-* implicit global)
//! - resolve_global_room
//! - global_room_hash
//! - forward_to_peers with zero peers (no-op)
//! - forward_to_peers with live QUIC peer links
//! - broadcast_signal to live QUIC peers
//! - send_signal_to_peer targeted routing
//! - find_peer_by_fingerprint / find_peer_by_addr / check_inbound_trust
//! - set_cross_relay_tx + local_tls_fp accessors
use std::collections::HashSet;
use std::net::{Ipv4Addr, SocketAddr};
use std::sync::Arc;
use std::time::Duration;
use bytes::Bytes;
use wzp_proto::{MediaTransport, SignalMessage};
use wzp_relay::config::{PeerConfig, TrustedConfig};
use wzp_relay::event_log::EventLogger;
use wzp_relay::federation::{room_hash, FederationManager};
use wzp_relay::metrics::RelayMetrics;
use wzp_relay::room::RoomManager;
use wzp_transport::{client_config, create_endpoint, server_config, QuinnTransport};
// ───────────────────────────── helpers ──────────────────────────────
/// Create a FederationManager for unit tests (no live peers).
fn create_test_fm(global_rooms: HashSet<String>) -> Arc<FederationManager> {
create_test_fm_full(vec![], vec![], global_rooms)
}
/// Create a FederationManager with full config (peers + trusted + global rooms).
fn create_test_fm_full(
peers: Vec<PeerConfig>,
trusted: Vec<TrustedConfig>,
global_rooms: HashSet<String>,
) -> Arc<FederationManager> {
let _ = rustls::crypto::ring::default_provider().install_default();
let (sc, _cert) = server_config();
let ep = create_endpoint((Ipv4Addr::LOCALHOST, 0).into(), Some(sc))
.expect("test endpoint");
let room_mgr = Arc::new(RoomManager::new());
let metrics = Arc::new(RelayMetrics::new());
let event_log = EventLogger::Noop;
Arc::new(FederationManager::new(
peers,
trusted,
global_rooms,
room_mgr,
ep,
"test-relay-fp-abc123".into(),
metrics,
event_log,
))
}
/// Build an in-process QUIC client/server pair on loopback.
/// Returns (client_transport, server_transport, endpoints).
/// The endpoints must be kept alive for the test duration.
async fn connected_pair() -> (
Arc<QuinnTransport>,
Arc<QuinnTransport>,
(quinn::Endpoint, quinn::Endpoint),
) {
let _ = rustls::crypto::ring::default_provider().install_default();
let (sc, _cert_der) = server_config();
let server_addr: SocketAddr = (Ipv4Addr::LOCALHOST, 0).into();
let server_ep = create_endpoint(server_addr, Some(sc)).expect("server endpoint");
let server_listen = server_ep.local_addr().expect("server local addr");
let client_bind: SocketAddr = (Ipv4Addr::LOCALHOST, 0).into();
let client_ep = create_endpoint(client_bind, None).expect("client endpoint");
let server_ep_clone = server_ep.clone();
let accept_fut = tokio::spawn(async move {
let conn = wzp_transport::accept(&server_ep_clone)
.await
.expect("accept");
Arc::new(QuinnTransport::new(conn))
});
let client_conn =
wzp_transport::connect(&client_ep, server_listen, "localhost", client_config())
.await
.expect("connect");
let client_transport = Arc::new(QuinnTransport::new(client_conn));
let server_transport = accept_fut.await.expect("join accept task");
(client_transport, server_transport, (server_ep, client_ep))
}
// ───────────────────── 1. room_hash determinism ─────────────────────
#[test]
fn room_hash_deterministic() {
let h1 = room_hash("podcast");
let h2 = room_hash("podcast");
assert_eq!(h1, h2);
}
#[test]
fn room_hash_different_rooms() {
let h1 = room_hash("room-a");
let h2 = room_hash("room-b");
assert_ne!(h1, h2);
}
#[test]
fn room_hash_is_8_bytes() {
let h = room_hash("some-room");
assert_eq!(h.len(), 8);
}
#[test]
fn room_hash_empty_string() {
// Should not panic on empty input
let h = room_hash("");
assert_eq!(h.len(), 8);
// And should differ from a non-empty room
assert_ne!(h, room_hash("nonempty"));
}
#[test]
fn room_hash_case_sensitive() {
// "Podcast" and "podcast" are different rooms
let h1 = room_hash("Podcast");
let h2 = room_hash("podcast");
assert_ne!(h1, h2);
}
// ───────────────── 2. is_global_room / resolve_global_room ──────────
#[tokio::test]
async fn is_global_room_static_config() {
let global: HashSet<String> = ["podcast", "lobby"].iter().map(|s| s.to_string()).collect();
let fm = create_test_fm(global);
assert!(fm.is_global_room("podcast"));
assert!(fm.is_global_room("lobby"));
assert!(!fm.is_global_room("private-room"));
assert!(!fm.is_global_room(""));
}
#[tokio::test]
async fn is_global_room_call_prefix_implicit() {
// Phase 4.1: call-* rooms are implicitly global
let fm = create_test_fm(HashSet::new());
assert!(fm.is_global_room("call-abc123"));
assert!(fm.is_global_room("call-"));
assert!(fm.is_global_room("call-some-uuid-here"));
// But not just "call" without the dash
assert!(!fm.is_global_room("call"));
assert!(!fm.is_global_room("callback"));
}
#[tokio::test]
async fn resolve_global_room_static() {
let global: HashSet<String> = ["podcast"].iter().map(|s| s.to_string()).collect();
let fm = create_test_fm(global);
assert_eq!(fm.resolve_global_room("podcast"), Some("podcast".into()));
assert_eq!(fm.resolve_global_room("unknown"), None);
}
#[tokio::test]
async fn resolve_global_room_call_prefix() {
let fm = create_test_fm(HashSet::new());
let resolved = fm.resolve_global_room("call-test-123");
assert_eq!(resolved, Some("call-test-123".into()));
}
#[tokio::test]
async fn global_room_hash_uses_canonical_name() {
let global: HashSet<String> = ["podcast"].iter().map(|s| s.to_string()).collect();
let fm = create_test_fm(global);
// For a known global room, global_room_hash should match room_hash of the canonical name
let expected = room_hash("podcast");
assert_eq!(fm.global_room_hash("podcast"), expected);
}
#[tokio::test]
async fn global_room_hash_unknown_room_falls_through() {
let fm = create_test_fm(HashSet::new());
// Unknown room: just hashes whatever was passed
let expected = room_hash("random-room");
assert_eq!(fm.global_room_hash("random-room"), expected);
}
#[tokio::test]
async fn global_room_hash_call_prefix() {
let fm = create_test_fm(HashSet::new());
// call-* resolves to itself
let expected = room_hash("call-xyz");
assert_eq!(fm.global_room_hash("call-xyz"), expected);
}
// ───────────────── 3. forward_to_peers with zero peers ──────────────
#[tokio::test]
async fn forward_to_peers_empty_returns_immediately() {
let fm = create_test_fm(HashSet::new());
let hash = room_hash("room");
let data = Bytes::from_static(b"test-media-payload");
// Should not panic or hang
let result = tokio::time::timeout(
Duration::from_secs(2),
fm.forward_to_peers("room", &hash, &data),
)
.await;
assert!(result.is_ok(), "forward_to_peers should return immediately with no peers");
}
// ─────────── 4. forward_to_peers with live QUIC peer links ──────────
#[tokio::test(flavor = "multi_thread", worker_threads = 2)]
async fn forward_to_peers_delivers_tagged_datagram() {
// PeerLink (and the peer_links map) is private, and registering a link
// through handle_inbound / connect_to_peer would block in the full
// run_federation_link loop, so we can't hand-wire a peer link and drive
// forward_to_peers directly here.
let _fm = create_test_fm(HashSet::new());
let (client_transport, server_transport, _endpoints) = connected_pair().await;
// Instead, test the wire format forward_to_peers produces: build the
// tagged datagram the same way, send it from the server side (acting as
// the forwarding relay), and verify the client side (acting as the peer
// relay) receives it intact. The end-to-end path through a live link is
// covered by federation_media_egress_forwards_to_peer below.
let room = "test-room";
let rh = room_hash(room);
let media = b"opus-frame-data-here";
// Build the tagged datagram the same way forward_to_peers does
let mut tagged = Vec::with_capacity(8 + media.len());
tagged.extend_from_slice(&rh);
tagged.extend_from_slice(media);
// Send from the server side (as if we are the relay forwarding)
server_transport
.send_raw_datagram(&tagged)
.expect("send datagram");
// Read from client side (as if we are the peer relay receiving)
let received = tokio::time::timeout(
Duration::from_secs(2),
client_transport.connection().read_datagram(),
)
.await
.expect("should receive within timeout")
.expect("read_datagram ok");
// Verify: first 8 bytes are the room hash, remainder is media
assert!(received.len() >= 8, "datagram too short");
let mut recv_hash = [0u8; 8];
recv_hash.copy_from_slice(&received[..8]);
assert_eq!(recv_hash, rh, "room hash mismatch");
assert_eq!(&received[8..], media, "media payload mismatch");
drop(client_transport);
drop(server_transport);
}
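// Editor's sketch (hypothetical helper, mirrors the format asserted above):
// a receiver can split a federation datagram back into its 8-byte room hash
// and the media payload.
fn split_tagged(datagram: &[u8]) -> Option<([u8; 8], &[u8])> {
    if datagram.len() < 8 {
        return None;
    }
    let mut hash = [0u8; 8];
    hash.copy_from_slice(&datagram[..8]);
    Some((hash, &datagram[8..]))
}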
// ─────────── 5. broadcast_signal to live QUIC peers ─────────────────
#[tokio::test(flavor = "multi_thread", worker_threads = 4)]
async fn broadcast_signal_sends_to_all_peers() {
// We need the peer links to be registered inside the FM.
// The simplest approach: spawn a mock peer relay that accepts
// federation connections, does the FederationHello handshake,
// and then reads signals.
let _ = rustls::crypto::ring::default_provider().install_default();
// Create a mock "peer relay" server endpoint
let (sc, _cert) = server_config();
let peer_addr: SocketAddr = (Ipv4Addr::LOCALHOST, 0).into();
let peer_ep = create_endpoint(peer_addr, Some(sc)).expect("peer endpoint");
let peer_listen = peer_ep.local_addr().expect("peer local addr");
// The FM that will connect outbound
let peer_cfg = PeerConfig {
url: peer_listen.to_string(),
fingerprint: "aa:bb:cc:dd".into(),
label: Some("mock-peer".into()),
};
let global: HashSet<String> = ["podcast"].iter().map(|s| s.to_string()).collect();
let fm = create_test_fm_full(vec![peer_cfg], vec![], global);
// Spawn the FM's run (which will try to connect to our mock peer)
let fm_clone = fm.clone();
let _fm_task = tokio::spawn(async move {
fm_clone.run().await;
});
// Accept the connection on the mock peer side
let peer_ep_clone = peer_ep.clone();
let peer_transport = tokio::time::timeout(Duration::from_secs(5), async {
let conn = wzp_transport::accept(&peer_ep_clone).await.expect("accept");
Arc::new(QuinnTransport::new(conn))
})
.await
.expect("FM should connect to mock peer within 5s");
// The FM sends FederationHello as the first signal. Read it.
let hello = tokio::time::timeout(
Duration::from_secs(2),
peer_transport.recv_signal(),
)
.await
.expect("hello timeout")
.expect("recv ok")
.expect("some message");
match hello {
SignalMessage::FederationHello { tls_fingerprint } => {
assert_eq!(tls_fingerprint, "test-relay-fp-abc123");
}
other => panic!("expected FederationHello, got: {:?}", std::mem::discriminant(&other)),
}
// Now the FM's run_federation_link registered the peer in peer_links
// and will announce active global rooms. We may receive
// GlobalRoomActive signals next (for any rooms the FM has active).
// For this test, no local participants, so no GlobalRoomActive.
// Give the link time to fully set up
tokio::time::sleep(Duration::from_millis(100)).await;
// Now call broadcast_signal on the FM
let test_msg = SignalMessage::FederatedSignalForward {
inner: Box::new(SignalMessage::Reflect),
origin_relay_fp: "other-relay-fp".into(),
};
let count = fm.broadcast_signal(&test_msg).await;
assert_eq!(count, 1, "should have broadcast to exactly 1 peer");
// Read the signal on the peer side
let received = tokio::time::timeout(
Duration::from_secs(2),
peer_transport.recv_signal(),
)
.await
.expect("broadcast signal timeout")
.expect("recv ok")
.expect("some message");
match received {
SignalMessage::FederatedSignalForward { origin_relay_fp, .. } => {
assert_eq!(origin_relay_fp, "other-relay-fp");
}
other => panic!("expected FederatedSignalForward, got: {:?}", std::mem::discriminant(&other)),
}
drop(peer_transport);
}
// ──────────── 6. send_signal_to_peer targeted routing ───────────────
#[tokio::test(flavor = "multi_thread", worker_threads = 4)]
async fn send_signal_to_peer_unknown_fp_returns_error() {
let fm = create_test_fm(HashSet::new());
let msg = SignalMessage::Reflect;
let result = fm.send_signal_to_peer("nonexistent-fp", &msg).await;
assert!(result.is_err());
assert!(result.unwrap_err().contains("no active federation link"));
}
// ──────────── 7. find_peer_by_fingerprint / addr / trust ────────────
#[tokio::test]
async fn find_peer_by_fingerprint_matches() {
let peer = PeerConfig {
url: "10.0.0.1:4433".into(),
fingerprint: "AA:BB:CC:DD".into(),
label: Some("relay-eu".into()),
};
let fm = create_test_fm_full(vec![peer], vec![], HashSet::new());
// Normalized match (colons removed, lowercased)
let found = fm.find_peer_by_fingerprint("aabbccdd");
assert!(found.is_some());
assert_eq!(found.unwrap().label.as_deref(), Some("relay-eu"));
// With colons
let found2 = fm.find_peer_by_fingerprint("AA:BB:CC:DD");
assert!(found2.is_some());
// Non-matching
assert!(fm.find_peer_by_fingerprint("11:22:33:44").is_none());
}
#[tokio::test]
async fn find_peer_by_addr_matches_ip() {
let peer = PeerConfig {
url: "10.0.0.1:4433".into(),
fingerprint: "aabb".into(),
label: None,
};
let fm = create_test_fm_full(vec![peer], vec![], HashSet::new());
// Same IP, different port still matches (find_peer_by_addr matches by IP)
let addr: SocketAddr = "10.0.0.1:9999".parse().unwrap();
let found = fm.find_peer_by_addr(addr);
assert!(found.is_some());
// Different IP
let addr2: SocketAddr = "10.0.0.2:4433".parse().unwrap();
assert!(fm.find_peer_by_addr(addr2).is_none());
}
#[tokio::test]
async fn find_trusted_by_fingerprint() {
let trusted = TrustedConfig {
fingerprint: "AA:BB:CC:DD:EE".into(),
label: Some("trusted-relay".into()),
};
let fm = create_test_fm_full(vec![], vec![trusted], HashSet::new());
let found = fm.find_trusted_by_fingerprint("aabbccddee");
assert!(found.is_some());
assert_eq!(found.unwrap().label.as_deref(), Some("trusted-relay"));
assert!(fm.find_trusted_by_fingerprint("ffffffff").is_none());
}
#[tokio::test]
async fn check_inbound_trust_prefers_peer_by_addr() {
let peer = PeerConfig {
url: "10.0.0.1:4433".into(),
fingerprint: "aabb".into(),
label: Some("peer-relay".into()),
};
let trusted = TrustedConfig {
fingerprint: "ccdd".into(),
label: Some("trusted-relay".into()),
};
let fm = create_test_fm_full(vec![peer], vec![trusted], HashSet::new());
// Matches by addr (peer takes priority)
let addr: SocketAddr = "10.0.0.1:5555".parse().unwrap();
let label = fm.check_inbound_trust(addr, "ccdd");
assert_eq!(label, Some("peer-relay".into()));
}
#[tokio::test]
async fn check_inbound_trust_falls_back_to_trusted_fp() {
let trusted = TrustedConfig {
fingerprint: "CC:DD".into(),
label: Some("trusted-relay".into()),
};
let fm = create_test_fm_full(vec![], vec![trusted], HashSet::new());
// No peer matches, but trusted fingerprint matches
let addr: SocketAddr = "10.99.99.99:1234".parse().unwrap();
let label = fm.check_inbound_trust(addr, "ccdd");
assert_eq!(label, Some("trusted-relay".into()));
}
#[tokio::test]
async fn check_inbound_trust_returns_none_for_unknown() {
let fm = create_test_fm(HashSet::new());
let addr: SocketAddr = "10.0.0.1:4433".parse().unwrap();
assert!(fm.check_inbound_trust(addr, "unknown-fp").is_none());
}
// ──────────── 8. set_cross_relay_tx + local_tls_fp ──────────────────
#[tokio::test]
async fn local_tls_fp_returns_configured_value() {
let fm = create_test_fm(HashSet::new());
assert_eq!(fm.local_tls_fp(), "test-relay-fp-abc123");
}
#[tokio::test]
async fn set_cross_relay_tx_wires_channel() {
let fm = create_test_fm(HashSet::new());
let (tx, mut rx) = tokio::sync::mpsc::channel(16);
fm.set_cross_relay_tx(tx).await;
// The channel is now wired. Exercising it would require going through
// handle_signal, so this test only verifies that set_cross_relay_tx
// accepts the sender without panicking and that nothing has been
// forwarded yet.
assert!(rx.try_recv().is_err(), "nothing should be queued yet");
drop(rx);
}
// ──────────── 9. broadcast_signal with zero peers ───────────────────
#[tokio::test]
async fn broadcast_signal_zero_peers_returns_zero() {
let fm = create_test_fm(HashSet::new());
let msg = SignalMessage::Reflect;
let count = fm.broadcast_signal(&msg).await;
assert_eq!(count, 0);
}
// ──────────── 10. get_remote_participants with no links ─────────────
#[tokio::test]
async fn get_remote_participants_empty_with_no_links() {
let fm = create_test_fm(HashSet::new());
let participants = fm.get_remote_participants("podcast").await;
assert!(participants.is_empty());
}
// ─────── 11. Federation media egress with live QUIC connection ──────
#[tokio::test(flavor = "multi_thread", worker_threads = 4)]
async fn federation_media_egress_forwards_to_peer() {
// This test verifies the full media path:
// local media -> federation egress channel -> forward_to_peers -> peer reads datagram
//
// We set up a real QUIC federation link via fm.run() connecting to
// a mock peer, then push media through the room manager's federation
// egress channel.
let _ = rustls::crypto::ring::default_provider().install_default();
// Mock peer relay
let (sc, _cert) = server_config();
let peer_addr: SocketAddr = (Ipv4Addr::LOCALHOST, 0).into();
let peer_ep = create_endpoint(peer_addr, Some(sc)).expect("peer endpoint");
let peer_listen = peer_ep.local_addr().expect("peer local addr");
let peer_cfg = PeerConfig {
url: peer_listen.to_string(),
fingerprint: "ee:ff:00:11".into(),
label: Some("egress-peer".into()),
};
let global: HashSet<String> = ["podcast"].iter().map(|s| s.to_string()).collect();
let fm = create_test_fm_full(vec![peer_cfg], vec![], global);
// Start the FM (connects to mock peer)
let fm_clone = fm.clone();
let _fm_task = tokio::spawn(async move { fm_clone.run().await });
// Accept the connection
let peer_ep_clone = peer_ep.clone();
let peer_transport = tokio::time::timeout(Duration::from_secs(5), async {
let conn = wzp_transport::accept(&peer_ep_clone).await.expect("accept");
Arc::new(QuinnTransport::new(conn))
})
.await
.expect("FM should connect within 5s");
// Read the FederationHello
let _hello = tokio::time::timeout(
Duration::from_secs(2),
peer_transport.recv_signal(),
)
.await
.expect("hello timeout")
.expect("recv ok")
.expect("some message");
// Wait for link setup
tokio::time::sleep(Duration::from_millis(100)).await;
// Now send media via forward_to_peers
let room = "podcast";
let rh = room_hash(room);
let media_payload = Bytes::from_static(b"test-opus-frame-1234567890");
fm.forward_to_peers(room, &rh, &media_payload).await;
// Read the datagram on the peer side
let received = tokio::time::timeout(
Duration::from_secs(2),
peer_transport.connection().read_datagram(),
)
.await
.expect("should receive media within timeout")
.expect("read_datagram ok");
// Verify tagged format: [8-byte room_hash][media_payload]
assert!(received.len() >= 8);
let mut recv_hash = [0u8; 8];
recv_hash.copy_from_slice(&received[..8]);
assert_eq!(recv_hash, rh, "room hash must match");
assert_eq!(
&received[8..],
&media_payload[..],
"media payload must match"
);
drop(peer_transport);
}
// ───── 12. Multiple global rooms: each hashes independently ─────────
#[tokio::test]
async fn multiple_global_rooms_independent_hashes() {
let global: HashSet<String> = ["podcast", "lobby", "arena"]
.iter()
.map(|s| s.to_string())
.collect();
let fm = create_test_fm(global);
let hashes: Vec<[u8; 8]> = ["podcast", "lobby", "arena"]
.iter()
.map(|r| fm.global_room_hash(r))
.collect();
// All different
assert_ne!(hashes[0], hashes[1]);
assert_ne!(hashes[1], hashes[2]);
assert_ne!(hashes[0], hashes[2]);
}
// ───── 13. is_global_room edge cases ────────────────────────────────
#[tokio::test]
async fn is_global_room_exact_match_required_for_static() {
let global: HashSet<String> = ["podcast"].iter().map(|s| s.to_string()).collect();
let fm = create_test_fm(global);
// Substring/prefix should NOT match
assert!(!fm.is_global_room("podcast-extra"));
assert!(!fm.is_global_room("pod"));
assert!(!fm.is_global_room("podcastt"));
}

View File

@@ -82,6 +82,7 @@ fn handle_answer_and_build_setups(
relay_addr: "203.0.113.5:4433".into(),
peer_direct_addr: callee_addr,
peer_local_addrs: Vec::new(),
peer_mapped_addr: None,
};
let setup_for_callee = SignalMessage::CallSetup {
call_id,
@@ -89,6 +90,7 @@ fn handle_answer_and_build_setups(
relay_addr: "203.0.113.5:4433".into(),
peer_direct_addr: caller_addr,
peer_local_addrs: Vec::new(),
peer_mapped_addr: None,
};
(setup_for_caller, setup_for_callee)
}
@@ -105,6 +107,7 @@ fn mk_offer(call_id: &str, caller_reflexive_addr: Option<&str>) -> SignalMessage
supported_profiles: vec![],
caller_reflexive_addr: caller_reflexive_addr.map(String::from),
caller_local_addrs: Vec::new(),
caller_mapped_addr: None,
caller_build_version: None,
}
}
@@ -123,6 +126,7 @@ fn mk_answer(
chosen_profile: None,
callee_reflexive_addr: callee_reflexive_addr.map(String::from),
callee_local_addrs: Vec::new(),
callee_mapped_addr: None,
callee_build_version: None,
}
}

View File

@@ -66,6 +66,8 @@ async fn spawn_mock_relay() -> (SocketAddr, tokio::task::JoinHandle<()>) {
success: true,
error: None,
relay_build: None,
relay_region: None,
available_relays: Vec::new(),
})
.await;
}

View File

@@ -11,97 +11,71 @@
</head>
<body>
<div id="app">
<!-- Connect screen -->
<div id="connect-screen">
<h1>WarzonePhone</h1>
<p class="subtitle">Encrypted Voice</p>
<div class="form">
<label>Relay
<button id="relay-selected" class="relay-selected" type="button">
<span id="relay-dot" class="dot"></span>
<span id="relay-label">Select relay...</span>
<span class="arrow">&#9881;</span>
</button>
</label>
<label>Room
<input id="room" type="text" value="general" />
</label>
<label>Alias
<input id="alias" type="text" placeholder="your name" />
</label>
<div class="form-row">
<label class="checkbox">
<input id="os-aec" type="checkbox" checked />
OS Echo Cancel
</label>
<button id="settings-btn-home" class="icon-btn" title="Settings (Cmd+,)">&#9881;</button>
<!-- ═══════════════════════════════════════════════════════
LOBBY — default view, auto-connects signal on launch
═══════════════════════════════════════════════════════ -->
<div id="lobby-screen">
<header class="lobby-header">
<div class="lobby-title-row">
<h1>WarzonePhone</h1>
<button id="settings-btn" class="icon-btn" title="Settings">&#9881;</button>
</div>
<!-- Mode toggle -->
<div class="mode-toggle" style="display:flex;gap:8px;margin-bottom:8px;">
<button id="mode-room" class="mode-btn active" style="flex:1">Room</button>
<button id="mode-direct" class="mode-btn" style="flex:1">Direct Call</button>
<div class="lobby-status-row">
<span id="lobby-dot" class="dot"></span>
<span id="lobby-relay-label" class="lobby-relay">Connecting...</span>
<span id="lobby-room-label" class="lobby-room">general</span>
</div>
<!-- Room mode (default) -->
<div id="room-mode">
<button id="connect-btn" class="primary">Connect</button>
<div class="lobby-identity">
<span id="lobby-identicon"></span>
<span id="lobby-fp" class="fp-display"></span>
</div>
</header>
<!-- Direct call mode -->
<div id="direct-mode" class="hidden">
<button id="register-btn" class="primary" style="background:#2196F3">Register on Relay</button>
<div id="direct-registered" class="hidden" style="margin-top:12px">
<div class="direct-registered-header">
<p id="registered-status" style="color:var(--green);font-size:13px;margin:0">&#x2705; Registered — waiting for calls</p>
<button id="deregister-btn" class="secondary-btn small">Deregister</button>
</div>
<div id="incoming-call-panel" class="hidden" style="background:#1B5E20;padding:12px;border-radius:8px;margin:8px 0">
<p style="font-weight:bold;margin:0 0 4px 0">Incoming Call</p>
<p id="incoming-caller" style="font-size:12px;opacity:0.8;margin:0 0 8px 0">From: unknown</p>
<div style="display:flex;gap:8px">
<button id="accept-call-btn" style="flex:1;background:var(--green);color:white;border:none;padding:8px;border-radius:6px;cursor:pointer">Accept</button>
<button id="reject-call-btn" style="flex:1;background:var(--red);color:white;border:none;padding:8px;border-radius:6px;cursor:pointer">Reject</button>
</div>
</div>
<!-- User list -->
<div class="lobby-users-section">
<div class="lobby-users-header">
<span>Online</span>
<span id="lobby-user-count" class="badge">0</span>
</div>
<div id="lobby-user-list" class="lobby-user-list">
<div class="lobby-empty">No one else is here yet</div>
</div>
</div>
<!-- Recent contacts -->
<div id="recent-contacts-section" class="hidden">
<div class="history-header">Recent contacts</div>
<div id="recent-contacts-list" class="history-list"></div>
</div>
<!-- Voice join FAB -->
<div class="lobby-fab-row">
<button id="join-voice-btn" class="fab" title="Join Voice Chat">
<span class="fab-icon">&#x1F3A7;</span>
<span class="fab-label">Join Voice</span>
</button>
</div>
<!-- Call history -->
<div id="call-history-section" class="hidden">
<div class="history-header">
History
<button id="clear-history-btn" class="link-btn">clear</button>
</div>
<div id="call-history-list" class="history-list"></div>
</div>
<label style="margin-top:8px">Call by fingerprint
<input id="target-fp" type="text" placeholder="xxxx:xxxx:xxxx:..." />
</label>
<button id="call-btn" class="primary" style="margin-top:8px">Call</button>
<p id="call-status-text" style="color:var(--yellow);font-size:13px;margin-top:4px"></p>
<!-- Incoming call banner -->
<div id="incoming-call-banner" class="incoming-banner hidden">
<div class="incoming-info">
<span id="incoming-identicon" class="incoming-identicon"></span>
<div>
<div id="incoming-caller-name" class="incoming-name">Unknown</div>
<div class="incoming-subtitle">Incoming call...</div>
</div>
</div>
<p id="connect-error" class="error"></p>
<div class="incoming-actions">
<button id="accept-call-btn" class="btn-accept">Accept</button>
<button id="reject-call-btn" class="btn-reject">Reject</button>
</div>
</div>
<div class="identity-info">
<span id="my-identicon"></span>
<span id="my-fingerprint" class="fp-display"></span>
</div>
<div class="recent-rooms" id="recent-rooms"></div>
</div>
<!-- In-call screen -->
<!-- ═══════════════════════════════════════════════════════
IN-CALL — voice active (room or direct)
═══════════════════════════════════════════════════════ -->
<div id="call-screen" class="hidden">
<div class="call-header">
<div class="call-header-row">
<button id="back-to-lobby-btn" class="icon-btn small" title="Back to lobby">&#x2190;</button>
<div id="room-name" class="room-name"></div>
<button id="settings-btn-call" class="icon-btn small" title="Settings (Cmd+,)">&#9881;</button>
<button id="settings-btn-call" class="icon-btn small" title="Settings">&#9881;</button>
</div>
<div class="call-meta">
<span id="call-status" class="status-dot"></span>
@@ -111,16 +85,14 @@
<div class="level-meter">
<div id="level-bar" class="level-bar-fill"></div>
</div>
<!-- Direct-call phone layout — shown instead of the group
participant list when directCallPeer is set. Centered
identicon, name, fp, connection badge. Hidden for
room calls (directCallPeer == null). -->
<!-- Direct-call phone layout -->
<div id="direct-call-view" class="direct-call-view hidden">
<div id="dc-identicon" class="dc-identicon"></div>
<div id="dc-name" class="dc-name">Unknown</div>
<div id="dc-fp" class="dc-fp"></div>
<div id="dc-badge" class="dc-badge">Connecting...</div>
</div>
<!-- Room participants -->
<div id="participants" class="participants"></div>
<div class="controls">
<button id="mic-btn" class="control-btn" title="Toggle Mic (m)">
@@ -136,7 +108,29 @@
<div id="stats" class="stats"></div>
</div>
<!-- Settings panel -->
<!-- ═══════════════════════════════════════════════════════
USER CONTEXT MENU (tap on user in lobby)
═══════════════════════════════════════════════════════ -->
<div id="user-context-menu" class="context-menu hidden">
<div class="context-header">
<span id="ctx-identicon" class="ctx-identicon"></span>
<div>
<div id="ctx-name" class="ctx-name">User</div>
<div id="ctx-fp" class="ctx-fp"></div>
</div>
</div>
<button id="ctx-call-btn" class="context-action">
<span>&#x1F4DE;</span> Direct Call
</button>
<button id="ctx-message-btn" class="context-action" disabled>
<span>&#x1F4AC;</span> Message (coming soon)
</button>
<button id="ctx-close-btn" class="context-action dim">Close</button>
</div>
<!-- ═══════════════════════════════════════════════════════
SETTINGS PANEL (overlay)
═══════════════════════════════════════════════════════ -->
<div id="settings-panel" class="hidden">
<div class="settings-card">
<div class="settings-header">
@@ -157,28 +151,53 @@
<div class="quality-control">
<div class="quality-header">
<span class="setting-label">QUALITY</span>
<span id="s-quality-label" class="quality-label">Auto</span>
<span id="s-quality-label" class="quality-value">Auto</span>
</div>
<input id="s-quality" type="range" min="0" max="7" step="1" value="3" class="quality-slider" />
<div class="quality-ticks">
<span>64k</span>
<span>48k</span>
<span>32k</span>
<input id="s-quality" type="range" min="0" max="6" step="1" value="6" />
<div class="quality-labels">
<span>Codec2 1.2k</span>
<span>Auto</span>
<span>24k</span>
<span>6k</span>
<span>C2</span>
<span>1.2k</span>
</div>
</div>
<label class="checkbox">
<input id="s-os-aec" type="checkbox" />
OS Echo Cancellation (macOS VoiceProcessingIO)
</label>
<label class="checkbox">
<input id="s-agc" type="checkbox" checked />
Automatic Gain Control
<input id="s-os-aec" type="checkbox" checked />
OS Echo Cancellation
</label>
</div>
<div class="settings-section">
<h3>Relays</h3>
<div id="s-relay-list"></div>
<div class="relay-add">
<input id="s-relay-name" type="text" placeholder="Name" style="flex:1" />
<input id="s-relay-addr" type="text" placeholder="host:port" style="flex:2" />
<button id="s-relay-add" class="secondary-btn small">Add</button>
</div>
</div>
<div class="settings-section">
<h3>Identity</h3>
<div>
<span class="setting-label">FINGERPRINT</span>
<div id="s-fingerprint" class="fp-display" style="margin-top:4px"></div>
</div>
<div style="margin-top:8px">
<span class="setting-label">IDENTITY FILE</span>
<div style="font-size:12px;opacity:0.6;margin-top:2px">~/.wzp/identity</div>
</div>
</div>
<div class="settings-section">
<h3>Network</h3>
<div>
<span class="setting-label">PUBLIC ADDRESS</span>
<span id="s-public-addr" style="color:var(--green);font-size:13px;margin-left:8px"></span>
<button id="s-reflect-btn" class="secondary-btn small" style="margin-left:8px">Detect</button>
</div>
<div style="margin-top:8px">
<button id="s-nat-detect-btn" class="secondary-btn" style="width:100%">Detect NAT</button>
<div id="s-nat-result" style="font-size:11px;margin-top:4px;opacity:0.7;white-space:pre-wrap"></div>
</div>
</div>
<div class="settings-section">
<h3>Debug</h3>
<label class="checkbox">
<input id="s-dred-debug" type="checkbox" />
DRED debug logs (verbose, dev only)
@@ -187,6 +206,14 @@
<input id="s-call-debug" type="checkbox" />
Call flow debug logs (trace every step of a call)
</label>
<label class="checkbox">
<input id="s-direct-only" type="checkbox" />
Direct-only mode (no relay fallback)
</label>
<label class="checkbox">
<input id="s-birthday-attack" type="checkbox" />
Birthday attack (extra ports for hard NAT — adds ~3s)
</label>
</div>
<div class="settings-section" id="s-call-debug-section" style="display:none">
<h3>Call Debug Log</h3>
@@ -197,92 +224,8 @@
<button id="s-call-debug-clear" class="secondary-btn" style="flex:1">Clear log</button>
</div>
<small id="s-call-debug-copy-status" style="display:block;margin-top:4px;color:var(--text-dim);font-size:10px"></small>
<small style="color:var(--text-dim);display:block;margin-top:4px">
Rolling buffer of the last 200 call-flow events. Turned off by
default — the GUI overlay only populates when the checkbox above
is on, but logcat (adb) always keeps a copy regardless.
</small>
</div>
<div class="settings-section">
<h3>Identity</h3>
<div class="setting-row">
<span class="setting-label">Fingerprint</span>
<span id="s-fingerprint" class="fp-display-large"></span>
</div>
<div class="setting-row">
<span class="setting-label">Identity file</span>
<span class="fp-display">~/.wzp/identity</span>
</div>
</div>
<div class="settings-section">
<h3>Network</h3>
<div class="setting-row">
<span class="setting-label">Public address</span>
<span id="s-reflected-addr" class="fp-display">(not queried)</span>
<button id="s-reflect-btn" class="secondary-btn">Detect</button>
</div>
<small style="color:var(--text-dim);display:block;margin-top:4px">
Asks the registered relay to echo back the IP:port it sees for this
connection (QUIC-native NAT reflection, replaces STUN).
</small>
<div class="setting-row" style="margin-top:10px">
<span class="setting-label">NAT type</span>
<span id="s-nat-type" class="fp-display">(not detected)</span>
<button id="s-nat-detect-btn" class="secondary-btn">Detect NAT</button>
</div>
<div id="s-nat-probes" style="margin-top:6px;font-size:11px;color:var(--text-dim)"></div>
<small style="color:var(--text-dim);display:block;margin-top:4px">
Probes every configured relay in parallel and compares the results
to classify the NAT: cone (P2P viable), symmetric (must relay),
multiple, or unknown.
</small>
</div>
<div class="settings-section">
<h3>Recent Rooms</h3>
<div id="s-recent-rooms" class="recent-rooms-list"></div>
<button id="s-clear-recent" class="secondary-btn">Clear History</button>
</div>
<button id="settings-save" class="primary">Save</button>
</div>
</div>
<!-- Manage Relays dialog -->
<div id="relay-dialog" class="hidden">
<div class="settings-card relay-dialog-card">
<div class="settings-header">
<h2>Manage Relays</h2>
<button id="relay-dialog-close" class="icon-btn">&times;</button>
</div>
<div id="relay-dialog-list" class="relay-dialog-list"></div>
<div class="relay-add-row">
<div class="relay-add-inputs">
<input id="relay-add-name" type="text" placeholder="Name" />
<input id="relay-add-addr" type="text" placeholder="host:port" />
</div>
<button id="relay-add-btn" class="primary">Add Relay</button>
</div>
</div>
</div>
<!-- Key changed warning dialog -->
<div id="key-warning" class="hidden">
<div class="settings-card key-warning-card">
<div class="key-warning-icon">&#9888;</div>
<h2>Server Key Changed</h2>
<p class="key-warning-text">The relay's identity has changed since you last connected. This usually happens when the server was restarted, but could also indicate a security issue.</p>
<div class="key-warning-fps">
<div class="key-fp-row">
<span class="key-fp-label">Previously known</span>
<code id="kw-old-fp" class="key-fp"></code>
</div>
<div class="key-fp-row">
<span class="key-fp-label">New key</span>
<code id="kw-new-fp" class="key-fp"></code>
</div>
</div>
<div class="key-warning-actions">
<button id="kw-accept" class="primary">Accept New Key</button>
<button id="kw-cancel" class="secondary-btn">Cancel</button>
</div>
<button id="settings-save" class="primary" style="margin-top:12px">Save</button>
</div>
</div>
</div>

View File

@@ -13,7 +13,6 @@ use std::sync::atomic::{AtomicBool, AtomicU8, AtomicU32, AtomicU64, Ordering};
use std::sync::Arc;
use std::time::Instant;
use tauri::Emitter;
use tokio::sync::Mutex;
use tracing::{error, info};
@@ -30,6 +29,16 @@ use wzp_proto::traits::{AudioDecoder, QualityController};
use wzp_proto::{AdaptiveQualityController, CodecId, MediaTransport, QualityProfile};
const FRAME_SAMPLES_40MS: usize = 1920;
const CAPTURE_POLL_MS: u64 = 5;
const RECV_TIMEOUT_MS: u64 = 100;
const SIGNAL_TIMEOUT_MS: u64 = 200;
#[cfg_attr(not(target_os = "android"), allow(dead_code))]
const CONNECT_TIMEOUT_SECS: u64 = 10;
#[cfg_attr(not(target_os = "android"), allow(dead_code))]
const HEARTBEAT_INTERVAL_SECS: u64 = 2;
const DRED_POLL_INTERVAL: u32 = 25;
/// Generate and attach a QualityReport every N frames (~1s at 20ms/frame).
const QUALITY_REPORT_INTERVAL: u32 = 50;
/// Profile index mapping for the AtomicU8 adaptive-quality bridge.
const PROFILE_NO_CHANGE: u8 = 0xFF;
@@ -78,6 +87,101 @@ fn resolve_quality(quality: &str) -> Option<QualityProfile> {
}
}
/// Build a CallConfig from a quality string. Used by both Android and desktop send tasks.
fn build_call_config(quality: &str) -> CallConfig {
let profile = resolve_quality(quality);
match profile {
Some(p) => CallConfig {
noise_suppression: false,
suppression_enabled: false,
..CallConfig::from_profile(p)
},
None => CallConfig {
noise_suppression: false,
suppression_enabled: false,
..CallConfig::default()
},
}
}
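/// Illustrative sketch (hypothetical helper, not part of the diff): both send
/// tasks derive the PCM frame size from the active profile at 48 kHz, i.e.
/// frame_duration_ms * 48 samples — 20 ms -> 960, 40 ms -> 1920 (= FRAME_SAMPLES_40MS).
fn frame_samples_for(profile: &QualityProfile) -> usize {
(profile.frame_duration_ms as usize) * 48
}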
/// Map a received codec ID to the corresponding QualityProfile.
/// Used by recv tasks when the peer switches codecs.
fn codec_to_profile(codec: CodecId) -> QualityProfile {
match codec {
CodecId::Opus24k => QualityProfile::GOOD,
CodecId::Opus6k => QualityProfile::DEGRADED,
CodecId::Opus32k => QualityProfile::STUDIO_32K,
CodecId::Opus48k => QualityProfile::STUDIO_48K,
CodecId::Opus64k => QualityProfile::STUDIO_64K,
CodecId::Codec2_1200 => QualityProfile::CATASTROPHIC,
CodecId::Codec2_3200 => QualityProfile {
codec: CodecId::Codec2_3200,
fec_ratio: 0.5,
frame_duration_ms: 20,
frames_per_block: 5,
},
other => QualityProfile { codec: other, ..QualityProfile::GOOD },
}
}
/// Signal handler task -- shared between Android and desktop.
/// Handles RoomUpdate (participant list), QualityDirective (relay-pushed
/// codec switch), and Hangup from the relay signal stream.
async fn run_signal_task(
transport: Arc<wzp_transport::QuinnTransport>,
running: Arc<AtomicBool>,
pending_profile: Arc<AtomicU8>,
participants: Arc<Mutex<Vec<ParticipantInfo>>>,
event_cb: Arc<dyn Fn(&str, &str) + Send + Sync>,
) {
loop {
if !running.load(Ordering::Relaxed) {
break;
}
match tokio::time::timeout(
std::time::Duration::from_millis(SIGNAL_TIMEOUT_MS),
transport.recv_signal(),
)
.await
{
Ok(Ok(Some(wzp_proto::SignalMessage::RoomUpdate {
participants: parts,
..
}))) => {
let mut seen = std::collections::HashSet::new();
let unique: Vec<ParticipantInfo> = parts
.into_iter()
.filter(|p| seen.insert((p.fingerprint.clone(), p.alias.clone())))
.map(|p| ParticipantInfo {
fingerprint: p.fingerprint,
alias: p.alias,
relay_label: p.relay_label,
})
.collect();
let count = unique.len();
*participants.lock().await = unique;
event_cb("room-update", &format!("{count} participants"));
}
Ok(Ok(Some(wzp_proto::SignalMessage::QualityDirective {
recommended_profile,
reason,
}))) => {
let idx = profile_to_index(&recommended_profile);
info!(
codec = ?recommended_profile.codec,
reason = reason.as_deref().unwrap_or(""),
"relay quality directive: switching profile"
);
pending_profile.store(idx, Ordering::Release);
}
Ok(Ok(Some(_))) => {}
Ok(Ok(None)) => break,
Ok(Err(_)) => break,
Err(_) => {}
}
}
}

/// Wrapper to make non-Sync audio handles safe to store in shared state.
/// The audio handle is only accessed from the thread that created it (and on
/// drop), never shared across threads, so asserting Sync here is safe.
@@ -395,7 +499,7 @@ impl CallEngine {
};
let client_config = wzp_transport::client_config();
let conn = match tokio::time::timeout(
std::time::Duration::from_secs(10),
std::time::Duration::from_secs(CONNECT_TIMEOUT_SECS),
wzp_transport::connect(&endpoint, relay_addr, &room, client_config),
).await {
Ok(Ok(c)) => c,
@@ -404,8 +508,8 @@ impl CallEngine {
return Err(e.into());
}
Err(_) => {
error!("connect TIMED OUT after 10s — QUIC handshake never completed. Relay may be unreachable from this endpoint.");
return Err(anyhow::anyhow!("QUIC connect timeout (10s)"));
error!("connect TIMED OUT after {CONNECT_TIMEOUT_SECS}s — QUIC handshake never completed. Relay may be unreachable from this endpoint.");
return Err(anyhow::anyhow!("QUIC connect timeout ({CONNECT_TIMEOUT_SECS}s)"));
}
};
info!(t_ms = call_t0.elapsed().as_millis(), "first-join diag: QUIC connection established, performing handshake");
@@ -525,32 +629,22 @@ impl CallEngine {
let send_app = app.clone();
let send_pending_profile = pending_profile.clone();
tokio::spawn(async move {
let profile = resolve_quality(&send_quality);
let config = match profile {
Some(p) => CallConfig {
noise_suppression: false,
suppression_enabled: false,
..CallConfig::from_profile(p)
},
None => CallConfig {
noise_suppression: false,
suppression_enabled: false,
..CallConfig::default()
},
};
let frame_samples = (config.profile.frame_duration_ms as usize) * 48;
let config = build_call_config(&send_quality);
let mut frame_samples = (config.profile.frame_duration_ms as usize) * 48;
info!(codec = ?config.profile.codec, frame_samples, t_ms = send_t0.elapsed().as_millis(), "first-join diag: send task spawned (android/oboe)");
*send_tx_codec.lock().await = format!("{:?}", config.profile.codec);
let mut encoder = CallEncoder::new(&config);
encoder.set_aec_enabled(false);
let mut buf = vec![0i16; frame_samples];
// Sized for max frame (40ms = 1920 samples) so profile
// switches between 20ms ↔ 40ms codecs don't need realloc.
let mut buf = vec![0i16; 1920];
// Continuous DRED tuning: poll quinn path stats every 25
// frames (~500 ms at 20 ms/frame) and adjust DRED duration +
// expected-loss hint based on real-time network conditions.
let mut dred_tuner = wzp_proto::DredTuner::new(config.profile.codec);
let mut frames_since_dred_poll: u32 = 0;
const DRED_POLL_INTERVAL: u32 = 25;
let mut frames_since_quality_report: u32 = 0;
let mut heartbeat = std::time::Instant::now();
let mut last_rms: u32 = 0;
@@ -568,14 +662,19 @@ impl CallEngine {
if !send_r.load(Ordering::Relaxed) {
break;
}
// wzp-native doesn't expose `available()`, so we just try
// to read a full frame and sleep briefly if the ring is
// short. Oboe's capture callback fills at a steady rate
// so in steady state this spins once per frame.
let read = crate::wzp_native::audio_read_capture(&mut buf);
if read < frame_samples {
// Check ring has enough samples before reading to avoid
// partial reads that consume samples and then get
// overwritten on the next attempt (caused 40ms codecs
// like Opus6k to produce ~11 frames/s instead of 25).
if crate::wzp_native::audio_capture_available() < frame_samples {
short_reads += 1;
tokio::time::sleep(std::time::Duration::from_millis(CAPTURE_POLL_MS)).await;
continue;
}
let read = crate::wzp_native::audio_read_capture(&mut buf[..frame_samples]);
if read < frame_samples {
// Shouldn't happen after available() check, but guard anyway.
short_reads += 1;
tokio::time::sleep(std::time::Duration::from_millis(5)).await;
continue;
}
if !first_full_read_logged {
@@ -589,8 +688,8 @@ impl CallEngine {
}
// RMS for UI meter
let sum_sq: f64 = buf.iter().map(|&s| (s as f64) * (s as f64)).sum();
let rms = (sum_sq / buf.len() as f64).sqrt() as u32;
let sum_sq: f64 = buf[..frame_samples].iter().map(|&s| (s as f64) * (s as f64)).sum();
let rms = (sum_sq / frame_samples as f64).sqrt() as u32;
send_level.store(rms, Ordering::Relaxed);
last_rms = rms;
if !first_nonzero_rms_logged && rms > 0 {
@@ -603,9 +702,9 @@ impl CallEngine {
}
if send_mic.load(Ordering::Relaxed) {
buf.fill(0);
buf[..frame_samples].fill(0);
}
match encoder.encode_frame(&buf) {
match encoder.encode_frame(&buf[..frame_samples]) {
Ok(pkts) => {
for pkt in &pkts {
last_pkt_bytes = pkt.payload.len();
@@ -646,8 +745,10 @@ impl CallEngine {
let p = send_pending_profile.swap(PROFILE_NO_CHANGE, Ordering::Acquire);
if p != PROFILE_NO_CHANGE {
if let Some(new_profile) = index_to_profile(p) {
info!(to = ?new_profile.codec, "auto: switching encoder profile");
let new_fs = (new_profile.frame_duration_ms as usize) * 48;
info!(to = ?new_profile.codec, frame_samples = new_fs, "auto: switching encoder profile (android)");
if encoder.set_profile(new_profile).is_ok() {
frame_samples = new_fs;
dred_tuner.set_codec(new_profile.codec);
*send_tx_codec.lock().await = format!("{:?}", new_profile.codec);
}
@@ -683,8 +784,23 @@ impl CallEngine {
}
}
// Quality report: generate from quinn stats and attach to next packet.
// The peer's recv task (or relay) uses this for adaptive quality.
frames_since_quality_report += 1;
if frames_since_quality_report >= QUALITY_REPORT_INTERVAL {
frames_since_quality_report = 0;
let snap = send_t.quinn_path_stats();
let pq = send_t.path_quality();
let report = wzp_proto::QualityReport::from_path_stats(
snap.loss_pct,
snap.rtt_ms,
pq.jitter_ms,
);
encoder.set_pending_quality_report(report);
}
// Heartbeat every 2s with capture+encode+send state
if heartbeat.elapsed() >= std::time::Duration::from_secs(2) {
if heartbeat.elapsed() >= std::time::Duration::from_secs(HEARTBEAT_INTERVAL_SECS) {
let fs = send_fs.load(Ordering::Relaxed);
let drops = send_drops.load(Ordering::Relaxed);
info!(
@@ -744,6 +860,7 @@ impl CallEngine {
// above for the full flow.
let mut dred_recv = DredRecvState::new();
let mut quality_ctrl = AdaptiveQualityController::new();
let mut recv_quality_counter: u32 = 0;
info!(codec = ?current_codec, t_ms = recv_t0.elapsed().as_millis(), "first-join diag: recv task spawned (android/oboe)");
// First-join diagnostic latches — see send task above for the
// sibling capture milestones.
@@ -801,7 +918,7 @@ impl CallEngine {
break;
}
match tokio::time::timeout(
std::time::Duration::from_millis(100),
std::time::Duration::from_millis(RECV_TIMEOUT_MS),
recv_t.recv_media(),
)
.await
@@ -840,19 +957,7 @@ impl CallEngine {
if *rx != codec_name { *rx = codec_name; }
}
if pkt.header.codec_id != current_codec {
let new_profile = match pkt.header.codec_id {
CodecId::Opus24k => QualityProfile::GOOD,
CodecId::Opus6k => QualityProfile::DEGRADED,
CodecId::Opus32k => QualityProfile::STUDIO_32K,
CodecId::Opus48k => QualityProfile::STUDIO_48K,
CodecId::Opus64k => QualityProfile::STUDIO_64K,
CodecId::Codec2_1200 => QualityProfile::CATASTROPHIC,
CodecId::Codec2_3200 => QualityProfile {
codec: CodecId::Codec2_3200,
fec_ratio: 0.5, frame_duration_ms: 20, frames_per_block: 5,
},
other => QualityProfile { codec: other, ..QualityProfile::GOOD },
};
let new_profile = codec_to_profile(pkt.header.codec_id);
info!(from = ?current_codec, to = ?pkt.header.codec_id, "recv: switching decoder");
let _ = decoder.set_profile(new_profile);
current_profile = new_profile;
@@ -902,6 +1007,29 @@ impl CallEngine {
}
}
// P2P self-observation: if no quality reports from peer,
// generate local observations from our own QUIC path stats.
// This ensures adaptive quality works even on P2P calls
// where the peer hasn't been updated to send reports yet.
recv_quality_counter += 1;
if recv_quality_counter >= QUALITY_REPORT_INTERVAL {
recv_quality_counter = 0;
let snap = recv_t.quinn_path_stats();
let pq = recv_t.path_quality();
let local_report = wzp_proto::QualityReport::from_path_stats(
snap.loss_pct,
snap.rtt_ms,
pq.jitter_ms,
);
if auto_profile {
if let Some(new_profile) = quality_ctrl.observe(&local_report) {
let idx = profile_to_index(&new_profile);
info!(to = ?new_profile.codec, "auto: local quality observation recommends switch");
pending_profile_recv.store(idx, Ordering::Release);
}
}
}
match decoder.decode(&pkt.payload, &mut pcm) {
Ok(n) => {
last_decode_n = n;
@@ -1006,7 +1134,7 @@ impl CallEngine {
}
// Heartbeat every 2s with decode+playout state
if heartbeat.elapsed() >= std::time::Duration::from_secs(2) {
if heartbeat.elapsed() >= std::time::Duration::from_secs(HEARTBEAT_INTERVAL_SECS) {
let fr = recv_fr.load(Ordering::Relaxed);
if wzp_codec::dred_verbose_logs() {
info!(
@@ -1114,48 +1242,15 @@ impl CallEngine {
}
});
// Signal task (presence — same shape as desktop).
let sig_t = transport.clone();
let sig_r = running.clone();
let sig_p = participants.clone();
// Signal task (presence + quality directives).
let event_cb = Arc::new(event_cb);
let sig_cb = event_cb.clone();
tokio::spawn(async move {
loop {
if !sig_r.load(Ordering::Relaxed) {
break;
}
match tokio::time::timeout(
std::time::Duration::from_millis(200),
sig_t.recv_signal(),
)
.await
{
Ok(Ok(Some(wzp_proto::SignalMessage::RoomUpdate {
participants: parts,
..
}))) => {
let mut seen = std::collections::HashSet::new();
let unique: Vec<ParticipantInfo> = parts
.into_iter()
.filter(|p| seen.insert((p.fingerprint.clone(), p.alias.clone())))
.map(|p| ParticipantInfo {
fingerprint: p.fingerprint,
alias: p.alias,
relay_label: p.relay_label,
})
.collect();
let count = unique.len();
*sig_p.lock().await = unique;
sig_cb("room-update", &format!("{count} participants"));
}
Ok(Ok(Some(_))) => {}
Ok(Ok(None)) => break,
Ok(Err(_)) => break,
Err(_) => {}
}
}
});
tokio::spawn(run_signal_task(
transport.clone(),
running.clone(),
pending_profile.clone(),
participants.clone(),
event_cb.clone(),
));
Ok(Self {
running,
@@ -1327,52 +1422,41 @@ impl CallEngine {
let send_tx_codec = tx_codec.clone();
let send_pending_profile = pending_profile.clone();
tokio::spawn(async move {
let profile = resolve_quality(&send_quality);
let config = match profile {
Some(p) => CallConfig {
noise_suppression: false,
suppression_enabled: false,
..CallConfig::from_profile(p)
},
None => CallConfig {
noise_suppression: false,
suppression_enabled: false,
..CallConfig::default()
},
};
let frame_samples = (config.profile.frame_duration_ms as usize) * 48;
let config = build_call_config(&send_quality);
let mut frame_samples = (config.profile.frame_duration_ms as usize) * 48;
info!(codec = ?config.profile.codec, frame_samples, "send task starting");
*send_tx_codec.lock().await = format!("{:?}", config.profile.codec);
let mut encoder = CallEncoder::new(&config);
encoder.set_aec_enabled(false); // OS AEC or none
let mut buf = vec![0i16; frame_samples];
let mut buf = vec![0i16; 1920]; // max frame (40ms)
// Continuous DRED tuning (same as Android send task).
let mut dred_tuner = wzp_proto::DredTuner::new(config.profile.codec);
let mut frames_since_dred_poll: u32 = 0;
const DRED_POLL_INTERVAL: u32 = 25;
let mut frames_since_quality_report: u32 = 0;
loop {
if !send_r.load(Ordering::Relaxed) {
break;
}
if capture_ring.available() < frame_samples {
tokio::time::sleep(std::time::Duration::from_millis(5)).await;
tokio::time::sleep(std::time::Duration::from_millis(CAPTURE_POLL_MS)).await;
continue;
}
capture_ring.read(&mut buf);
capture_ring.read(&mut buf[..frame_samples]);
// Compute RMS audio level for UI meter
if !buf.is_empty() {
let sum_sq: f64 = buf.iter().map(|&s| (s as f64) * (s as f64)).sum();
let rms = (sum_sq / buf.len() as f64).sqrt() as u32;
{
let pcm = &buf[..frame_samples];
let sum_sq: f64 = pcm.iter().map(|&s| (s as f64) * (s as f64)).sum();
let rms = (sum_sq / pcm.len() as f64).sqrt() as u32;
send_level.store(rms, Ordering::Relaxed);
}
if send_mic.load(Ordering::Relaxed) {
buf.fill(0);
buf[..frame_samples].fill(0);
}
match encoder.encode_frame(&buf) {
match encoder.encode_frame(&buf[..frame_samples]) {
Ok(pkts) => {
for pkt in &pkts {
if let Err(e) = send_t.send_media(pkt).await {
@@ -1393,8 +1477,10 @@ impl CallEngine {
let p = send_pending_profile.swap(PROFILE_NO_CHANGE, Ordering::Acquire);
if p != PROFILE_NO_CHANGE {
if let Some(new_profile) = index_to_profile(p) {
info!(to = ?new_profile.codec, "auto: switching encoder profile");
let new_fs = (new_profile.frame_duration_ms as usize) * 48;
info!(to = ?new_profile.codec, frame_samples = new_fs, "auto: switching encoder profile (desktop)");
if encoder.set_profile(new_profile).is_ok() {
frame_samples = new_fs;
dred_tuner.set_codec(new_profile.codec);
*send_tx_codec.lock().await = format!("{:?}", new_profile.codec);
}
@@ -1416,6 +1502,21 @@ impl CallEngine {
encoder.apply_dred_tuning(tuning);
}
}
// Quality report: generate from quinn stats and attach to next packet.
// The peer's recv task (or relay) uses this for adaptive quality.
frames_since_quality_report += 1;
if frames_since_quality_report >= QUALITY_REPORT_INTERVAL {
frames_since_quality_report = 0;
let snap = send_t.quinn_path_stats();
let pq = send_t.path_quality();
let report = wzp_proto::QualityReport::from_path_stats(
snap.loss_pct,
snap.rtt_ms,
pq.jitter_ms,
);
encoder.set_pending_quality_report(report);
}
}
});
@@ -1439,13 +1540,14 @@ impl CallEngine {
let mut pcm = vec![0i16; FRAME_SAMPLES_40MS]; // big enough for any codec
let mut dred_recv = DredRecvState::new();
let mut quality_ctrl = AdaptiveQualityController::new();
let mut recv_quality_counter: u32 = 0;
loop {
if !recv_r.load(Ordering::Relaxed) {
break;
}
match tokio::time::timeout(
std::time::Duration::from_millis(100),
std::time::Duration::from_millis(RECV_TIMEOUT_MS),
recv_t.recv_media(),
)
.await
@@ -1460,19 +1562,7 @@ impl CallEngine {
}
// Auto-switch decoder if incoming codec differs
if pkt.header.codec_id != current_codec {
let new_profile = match pkt.header.codec_id {
CodecId::Opus24k => QualityProfile::GOOD,
CodecId::Opus6k => QualityProfile::DEGRADED,
CodecId::Opus32k => QualityProfile::STUDIO_32K,
CodecId::Opus48k => QualityProfile::STUDIO_48K,
CodecId::Opus64k => QualityProfile::STUDIO_64K,
CodecId::Codec2_1200 => QualityProfile::CATASTROPHIC,
CodecId::Codec2_3200 => QualityProfile {
codec: CodecId::Codec2_3200,
fec_ratio: 0.5, frame_duration_ms: 20, frames_per_block: 5,
},
other => QualityProfile { codec: other, ..QualityProfile::GOOD },
};
let new_profile = codec_to_profile(pkt.header.codec_id);
info!(from = ?current_codec, to = ?pkt.header.codec_id, "recv: switching decoder");
let _ = decoder.set_profile(new_profile);
current_profile = new_profile;
@@ -1512,6 +1602,29 @@ impl CallEngine {
}
}
// P2P self-observation: if no quality reports from peer,
// generate local observations from our own QUIC path stats.
// This ensures adaptive quality works even on P2P calls
// where the peer hasn't been updated to send reports yet.
recv_quality_counter += 1;
if recv_quality_counter >= QUALITY_REPORT_INTERVAL {
recv_quality_counter = 0;
let snap = recv_t.quinn_path_stats();
let pq = recv_t.path_quality();
let local_report = wzp_proto::QualityReport::from_path_stats(
snap.loss_pct,
snap.rtt_ms,
pq.jitter_ms,
);
if auto_profile {
if let Some(new_profile) = quality_ctrl.observe(&local_report) {
let idx = profile_to_index(&new_profile);
info!(to = ?new_profile.codec, "auto: local quality observation recommends switch");
pending_profile_recv.store(idx, Ordering::Release);
}
}
}
if let Ok(n) = decoder.decode(&pkt.payload, &mut pcm) {
agc.process_frame(&mut pcm[..n]);
if !recv_spk.load(Ordering::Relaxed) {
@@ -1534,48 +1647,15 @@ impl CallEngine {
}
});
// Signal task (presence)
let sig_t = transport.clone();
let sig_r = running.clone();
let sig_p = participants.clone();
// Signal task (presence + quality directives)
let event_cb = Arc::new(event_cb);
let sig_cb = event_cb.clone();
tokio::spawn(async move {
loop {
if !sig_r.load(Ordering::Relaxed) {
break;
}
match tokio::time::timeout(
std::time::Duration::from_millis(200),
sig_t.recv_signal(),
)
.await
{
Ok(Ok(Some(wzp_proto::SignalMessage::RoomUpdate {
participants: parts,
..
}))) => {
let mut seen = std::collections::HashSet::new();
let unique: Vec<ParticipantInfo> = parts
.into_iter()
.filter(|p| seen.insert((p.fingerprint.clone(), p.alias.clone())))
.map(|p| ParticipantInfo {
fingerprint: p.fingerprint,
alias: p.alias,
relay_label: p.relay_label,
})
.collect();
let count = unique.len();
*sig_p.lock().await = unique;
sig_cb("room-update", &format!("{count} participants"));
}
Ok(Ok(Some(_))) => {}
Ok(Ok(None)) => break,
Ok(Err(_)) => break,
Err(_) => {}
}
}
});
tokio::spawn(run_signal_task(
transport.clone(),
running.clone(),
pending_profile.clone(),
participants.clone(),
event_cb.clone(),
));
Ok(Self {
running,

View File

@@ -330,12 +330,27 @@ async fn connect(
// Optional so the room-join path (which has no peer addrs)
// can omit it entirely — it's only populated on direct calls.
peer_local_addrs: Option<Vec<String>>,
// Phase 8 (Tailscale-inspired): peer's port-mapped external
// address from NAT-PMP/PCP/UPnP, carried in CallSetup.
peer_mapped_addr: Option<String>,
// Debug: when true, skip relay fallback entirely — the call
// fails if direct P2P doesn't connect. Useful for testing NAT
// traversal without the relay masking failures.
direct_only: Option<bool>,
// Enable birthday attack for hard NAT traversal. Adds ~3s to
// call setup when peer has symmetric NAT.
birthday_attack: Option<bool>,
) -> Result<String, String> {
let force_direct = direct_only.unwrap_or(false);
let enable_birthday = birthday_attack.unwrap_or(false);
emit_call_debug(&app, "connect:start", serde_json::json!({
"relay": relay,
"room": room,
"peer_direct_addr": peer_direct_addr,
"peer_local_addrs": peer_local_addrs,
"peer_mapped_addr": peer_mapped_addr,
"direct_only": force_direct,
"birthday_attack": enable_birthday,
}));
let mut engine_lock = state.engine.lock().await;
if engine_lock.is_some() {
@@ -396,9 +411,97 @@ async fn connect(
(Some(r), Some(relay_sockaddr))
if peer_addr_parsed.is_some() || !peer_local_parsed.is_empty() =>
{
// Phase 8: parse peer_mapped_addr from CallSetup
let peer_mapped_parsed: Option<std::net::SocketAddr> = peer_mapped_addr
.as_deref()
.and_then(|s| s.parse().ok());
// Phase 8.6: if peer sent a HardNatProbe with sequential
// allocation, predict their next ports and add as candidates.
let mut predicted_addrs: Vec<std::net::SocketAddr> = Vec::new();
{
let sig = state.signal.lock().await;
if let Some(ref probe) = sig.peer_hard_nat_probe {
if let Some(delta) = parse_sequential_delta(&probe.allocation) {
if let Some(&last_port) = probe.port_sequence.first() {
let predicted = wzp_client::stun::predict_ports(
last_port, delta, 1, 3,
);
for p in predicted {
predicted_addrs.push(
std::net::SocketAddr::new(probe.external_ip, p)
);
}
tracing::info!(
delta,
last_port,
predicted_count = predicted_addrs.len(),
"connect: added predicted ports from HardNatProbe"
);
emit_call_debug(&app, "connect:hard_nat_predicted", serde_json::json!({
"delta": delta,
"last_port": last_port,
"predicted": predicted_addrs.iter().map(|a| a.to_string()).collect::<Vec<_>>(),
}));
}
}
}
}
// Phase 8.6: if peer sent birthday attack ports, add
// them as extra candidates the Dialer can target.
// Only wait for birthday ports if we know the peer has
// a non-cone NAT (from HardNatProbe). Otherwise start
// the race immediately — LAN/cone calls shouldn't wait.
let mut birthday_addrs: Vec<std::net::SocketAddr> = Vec::new();
{
let peer_needs_birthday = enable_birthday && {
let sig = state.signal.lock().await;
sig.peer_hard_nat_probe.as_ref()
.map(|p| p.allocation != "port-preserving")
.unwrap_or(false)
};
if peer_needs_birthday {
// Wait up to 3s for BirthdayStart (Acceptor needs
// time to open ports + STUN-probe them).
for _ in 0..6 {
let sig = state.signal.lock().await;
if sig.peer_birthday_ports.is_some() { break; }
drop(sig);
tokio::time::sleep(std::time::Duration::from_millis(500)).await;
}
}
let sig = state.signal.lock().await;
if let Some(ref bday) = sig.peer_birthday_ports {
let targets = wzp_client::birthday::generate_dialer_targets(
match bday.external_ip {
std::net::IpAddr::V4(ip) => ip,
_ => std::net::Ipv4Addr::UNSPECIFIED,
},
&bday.ports,
64, // spray up to 64 targets
);
birthday_addrs = targets;
tracing::info!(
birthday_targets = birthday_addrs.len(),
known_ports = bday.ports.len(),
"connect: adding birthday attack targets"
);
emit_call_debug(&app, "connect:birthday_targets", serde_json::json!({
"known_ports": bday.ports,
"total_targets": birthday_addrs.len(),
}));
}
}
let mut all_local = peer_local_parsed.clone();
all_local.extend(predicted_addrs);
all_local.extend(birthday_addrs);
let candidates = wzp_client::dual_path::PeerCandidates {
reflexive: peer_addr_parsed,
local: peer_local_parsed.clone(),
local: all_local,
mapped: peer_mapped_parsed,
};
tracing::info!(
role = ?r,
@@ -408,19 +511,18 @@ async fn connect(
own = ?own_reflex_addr,
"connect: starting dual-path race"
);
let own_reflex_parsed: Option<std::net::SocketAddr> =
own_reflex_addr.as_deref().and_then(|s| s.parse().ok());
emit_call_debug(&app, "connect:dual_path_race_start", serde_json::json!({
"role": format!("{:?}", r),
"peer_reflex": peer_addr_parsed.map(|a| a.to_string()),
"peer_mapped": peer_mapped_parsed.map(|a| a.to_string()),
"peer_local": peer_local_parsed.iter().map(|a| a.to_string()).collect::<Vec<_>>(),
"dial_order_raw": candidates.dial_order().iter().map(|a| a.to_string()).collect::<Vec<_>>(),
"dial_order_smart": candidates.smart_dial_order(own_reflex_parsed.as_ref()).iter().map(|a| a.to_string()).collect::<Vec<_>>(),
"relay_addr": relay_sockaddr.to_string(),
"own_reflex_addr": own_reflex_addr,
}));
// Phase 6 fix: install the oneshot BEFORE the race
// starts. The peer's MediaPathReport can arrive
// while our race is still running — if we set up
// the oneshot after the race, the recv loop has
// nowhere to send the report and it gets dropped,
// causing a 3s timeout and false relay fallback.
let (path_report_tx, path_report_rx) = tokio::sync::oneshot::channel::<bool>();
{
let mut sig = state.signal.lock().await;
@@ -435,6 +537,7 @@ async fn connect(
relay_sockaddr,
room_sni,
call_sni,
own_reflex_parsed,
signal_endpoint_for_race.clone(),
ipv6_endpoint_for_race.clone(),
)
@@ -453,6 +556,7 @@ async fn connect(
"local_winner": format!("{:?}", local_winner),
"local_direct_ok": local_direct_ok,
"has_relay": race_result.relay_transport.is_some(),
"candidate_diags": race_result.candidate_diags,
}));
// Phase 6: send our report to the peer and
@@ -526,7 +630,20 @@ async fn connect(
"local_direct_ok": local_direct_ok,
"peer_direct_ok": peer_direct_ok,
"chosen_path": format!("{:?}", chosen_path),
"direct_only": force_direct,
}));
// direct_only mode: refuse relay fallback
if force_direct && !use_direct {
let reason = format!(
"direct_only: P2P failed (local_ok={local_direct_ok}, peer_ok={peer_direct_ok})"
);
emit_call_debug(&app, "connect:direct_only_failed", serde_json::json!({
"reason": reason,
"candidate_diags": race_result.candidate_diags,
}));
return Err(reason);
}
tracing::info!(
?chosen_path,
use_direct,
@@ -943,6 +1060,34 @@ struct SignalState {
/// peer's, it installs a oneshot sender here. The recv loop
/// fires it when MediaPathReport arrives.
pending_path_report: Option<tokio::sync::oneshot::Sender<bool>>,
/// Phase 8.6: peer's HardNatProbe data, if received. The connect
/// command reads this to generate predicted port candidates for
/// sequential NATs.
peer_hard_nat_probe: Option<PeerHardNatInfo>,
/// Phase 8.6: peer's birthday attack ports, if received.
peer_birthday_ports: Option<PeerBirthdayInfo>,
}
/// Parsed data from a peer's HardNatBirthdayStart signal.
#[derive(Debug, Clone)]
struct PeerBirthdayInfo {
external_ip: std::net::IpAddr,
ports: Vec<u16>,
}
/// Parsed data from a peer's HardNatProbe signal.
#[derive(Debug, Clone)]
struct PeerHardNatInfo {
external_ip: std::net::IpAddr,
port_sequence: Vec<u16>,
allocation: String,
}
/// Parse "sequential(delta=N)" allocation string into the delta value.
fn parse_sequential_delta(allocation: &str) -> Option<i16> {
let s = allocation.strip_prefix("sequential(delta=")?;
let s = s.strip_suffix(')')?;
s.parse().ok()
}
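// Illustrative test sketch (added for clarity, not part of the diff): expected
// behavior of parse_sequential_delta() as consumed by the port-prediction path
// in the connect command above.
#[cfg(test)]
mod parse_sequential_delta_sketch {
use super::parse_sequential_delta;
#[test]
fn parses_delta_and_rejects_other_allocations() {
assert_eq!(parse_sequential_delta("sequential(delta=2)"), Some(2));
assert_eq!(parse_sequential_delta("sequential(delta=-1)"), Some(-1));
assert_eq!(parse_sequential_delta("random"), None);
assert_eq!(parse_sequential_delta("port-preserving"), None);
}
}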
#[tauri::command]
@@ -1149,7 +1294,7 @@ fn do_register_signal(
"peer_build": callee_build_version,
}));
}
Ok(Some(SignalMessage::CallSetup { call_id, room, relay_addr, peer_direct_addr, peer_local_addrs })) => {
Ok(Some(SignalMessage::CallSetup { call_id, room, relay_addr, peer_direct_addr, peer_local_addrs, peer_mapped_addr })) => {
// Phase 3: peer_direct_addr carries the OTHER party's
// reflex addr. Phase 5.5: peer_local_addrs carries
// their LAN host candidates (usable for same-LAN
@@ -1168,6 +1313,7 @@ fn do_register_signal(
"relay_addr": relay_addr,
"peer_direct_addr": peer_direct_addr,
"peer_local_addrs": peer_local_addrs,
"peer_mapped_addr": peer_mapped_addr,
}));
let mut sig = signal_state.lock().await;
sig.signal_status = "setup".into();
@@ -1180,6 +1326,7 @@ fn do_register_signal(
"relay_addr": relay_addr,
"peer_direct_addr": peer_direct_addr,
"peer_local_addrs": peer_local_addrs,
"peer_mapped_addr": peer_mapped_addr,
}),
);
}
@@ -1214,6 +1361,170 @@ fn do_register_signal(
let _ = tx.send(direct_ok);
}
}
Ok(Some(SignalMessage::CandidateUpdate { call_id, reflexive_addr, local_addrs, mapped_addr, generation })) => {
// Phase 8: peer re-gathered candidates after a
// network change. Emit to JS for UI notification
// and potential transport re-race.
tracing::info!(
%call_id,
generation,
reflexive = ?reflexive_addr,
mapped = ?mapped_addr,
local_count = local_addrs.len(),
"signal: CandidateUpdate from peer"
);
emit_call_debug(&app_clone, "recv:CandidateUpdate", serde_json::json!({
"call_id": call_id,
"generation": generation,
"reflexive_addr": reflexive_addr,
"local_addrs": local_addrs,
"mapped_addr": mapped_addr,
}));
let _ = app_clone.emit("signal-event", serde_json::json!({
"type": "candidate_update",
"call_id": call_id,
"generation": generation,
"reflexive_addr": reflexive_addr,
"local_addrs": local_addrs,
"mapped_addr": mapped_addr,
}));
// TODO Phase 8: use IceAgent.apply_peer_update() +
// race_upgrade() to attempt transport hot-swap
}
Ok(Some(SignalMessage::HardNatProbe { call_id, port_sequence, allocation, probe_time_ms, external_ip })) => {
tracing::info!(
%call_id,
%allocation,
ports = ?port_sequence,
%external_ip,
probe_time_ms,
"signal: HardNatProbe from peer"
);
emit_call_debug(&app_clone, "recv:HardNatProbe", serde_json::json!({
"call_id": call_id,
"allocation": allocation,
"port_sequence": port_sequence,
"external_ip": external_ip,
}));
// Stash for the connect command to use in port prediction
if let Ok(ip) = external_ip.parse::<std::net::IpAddr>() {
let mut sig = signal_state.lock().await;
sig.peer_hard_nat_probe = Some(PeerHardNatInfo {
external_ip: ip,
port_sequence: port_sequence.clone(),
allocation: allocation.clone(),
});
}
// If peer has a random/symmetric NAT and WE are the
// Acceptor, open birthday attack ports and send
// BirthdayStart so the peer can spray us.
if allocation == "random" || allocation.starts_with("sequential") {
let state_bg = signal_state.clone();
let app_bg = app_clone.clone();
let call_id_bg = call_id.clone();
tokio::spawn(async move {
let config = wzp_client::birthday::BirthdayConfig::default();
let (result, _sockets) = wzp_client::birthday::open_acceptor_ports(&config).await;
if result.succeeded > 0 {
let ext_ports: Vec<u16> = result.ports.iter().map(|p| p.external_port).collect();
let ext_ip = result.external_ip
.map(|ip| ip.to_string())
.unwrap_or_default();
emit_call_debug(&app_bg, "birthday:acceptor_ports_opened", serde_json::json!({
"succeeded": result.succeeded,
"external_ip": ext_ip,
"ports": ext_ports,
}));
let sig = state_bg.lock().await;
if let Some(ref t) = sig.transport {
let _ = t.send_signal(&wzp_proto::SignalMessage::HardNatBirthdayStart {
call_id: call_id_bg,
acceptor_port_count: result.succeeded,
acceptor_ports: ext_ports,
external_ip: ext_ip,
}).await;
}
// Keep _sockets alive for 10s so NAT mappings persist
tokio::time::sleep(std::time::Duration::from_secs(10)).await;
}
});
}
}
Ok(Some(SignalMessage::PresenceList { users })) => {
tracing::info!(count = users.len(), "signal: PresenceList received");
// Emit to JS frontend for lobby user list
let user_list: Vec<serde_json::Value> = users.iter().map(|u| {
serde_json::json!({
"fingerprint": u.fingerprint,
"alias": u.alias,
})
}).collect();
let _ = app_clone.emit("signal-event", serde_json::json!({
"type": "presence_list",
"users": user_list,
}));
}
Ok(Some(SignalMessage::UpgradeProposal { call_id, proposal_id, proposed_profile, local_loss_pct, local_rtt_ms })) => {
tracing::info!(%call_id, %proposal_id, ?proposed_profile, "signal: UpgradeProposal from peer");
emit_call_debug(&app_clone, "recv:UpgradeProposal", serde_json::json!({
"call_id": call_id, "proposal_id": proposal_id,
"proposed_profile": format!("{proposed_profile:?}"),
"peer_loss_pct": local_loss_pct, "peer_rtt_ms": local_rtt_ms,
}));
// TODO: auto-accept if our own quality supports it,
// or surface to UI for manual accept/reject
}
Ok(Some(SignalMessage::UpgradeResponse { call_id, proposal_id, accepted, reason })) => {
tracing::info!(%call_id, %proposal_id, accepted, ?reason, "signal: UpgradeResponse from peer");
emit_call_debug(&app_clone, "recv:UpgradeResponse", serde_json::json!({
"call_id": call_id, "proposal_id": proposal_id,
"accepted": accepted, "reason": reason,
}));
// TODO: if accepted, send UpgradeConfirm + switch encoder
}
Ok(Some(SignalMessage::UpgradeConfirm { call_id, proposal_id, confirmed_profile })) => {
tracing::info!(%call_id, %proposal_id, ?confirmed_profile, "signal: UpgradeConfirm");
emit_call_debug(&app_clone, "recv:UpgradeConfirm", serde_json::json!({
"call_id": call_id, "proposal_id": proposal_id,
"confirmed_profile": format!("{confirmed_profile:?}"),
}));
// TODO: switch encoder to confirmed_profile at next frame boundary
}
Ok(Some(SignalMessage::QualityCapability { call_id, max_profile, loss_pct, rtt_ms })) => {
tracing::info!(%call_id, ?max_profile, "signal: QualityCapability from peer");
emit_call_debug(&app_clone, "recv:QualityCapability", serde_json::json!({
"call_id": call_id,
"peer_max_profile": format!("{max_profile:?}"),
"peer_loss_pct": loss_pct, "peer_rtt_ms": rtt_ms,
}));
// TODO: adjust our encoder to not exceed peer's max_profile
// (asymmetric quality — each side encodes at its own best)
}
Ok(Some(SignalMessage::HardNatBirthdayStart { call_id, acceptor_port_count, acceptor_ports, external_ip })) => {
tracing::info!(
%call_id,
acceptor_port_count,
port_count = acceptor_ports.len(),
%external_ip,
"signal: HardNatBirthdayStart from peer"
);
emit_call_debug(&app_clone, "recv:HardNatBirthdayStart", serde_json::json!({
"call_id": call_id,
"acceptor_port_count": acceptor_port_count,
"acceptor_ports": acceptor_ports,
"external_ip": external_ip,
}));
// Stash for the connect command (if still running)
// or for a background spray after relay fallback.
if let Ok(ip) = external_ip.parse::<std::net::IpAddr>() {
let mut sig = signal_state.lock().await;
sig.peer_birthday_ports = Some(PeerBirthdayInfo {
external_ip: ip,
ports: acceptor_ports,
});
}
}
Ok(Some(SignalMessage::ReflectResponse { observed_addr })) => {
// "STUN for QUIC" response — the relay told us our
// own server-reflexive address. If a Tauri command
@@ -1501,6 +1812,35 @@ async fn place_call(
"local_addrs": caller_local_addrs,
}));
// Phase 8: attempt port mapping for symmetric NAT traversal.
// This is best-effort — if the router doesn't support NAT-PMP/PCP/UPnP,
// we fall back to reflexive + host candidates only.
let caller_mapped_addr: Option<String> = {
let v4_port = state.signal.lock().await.endpoint
.as_ref()
.and_then(|ep| ep.local_addr().ok())
.map(|la| la.port())
.unwrap_or(0);
if v4_port > 0 {
match wzp_client::portmap::acquire_port_mapping(v4_port, None).await {
Ok(mapping) => {
let addr = mapping.external_addr.to_string();
tracing::info!(%addr, protocol = ?mapping.protocol, "place_call: port mapping acquired");
emit_call_debug(&app, "place_call:portmap_ok", serde_json::json!({
"addr": addr, "protocol": format!("{:?}", mapping.protocol),
}));
Some(addr)
}
Err(e) => {
tracing::debug!(error = %e, "place_call: port mapping unavailable (normal on most networks)");
None
}
}
} else {
None
}
};
let sig = state.signal.lock().await;
let transport = sig.transport.as_ref().ok_or("not registered")?;
let call_id = format!(
@@ -1510,7 +1850,7 @@ async fn place_call(
.unwrap()
.as_nanos()
);
tracing::info!(%call_id, %target_fp, reflex = ?own_reflex, "place_call: sending DirectCallOffer");
tracing::info!(%call_id, %target_fp, reflex = ?own_reflex, mapped = ?caller_mapped_addr, "place_call: sending DirectCallOffer");
transport
.send_signal(&SignalMessage::DirectCallOffer {
caller_fingerprint: sig.fingerprint.clone(),
@@ -1523,6 +1863,7 @@ async fn place_call(
supported_profiles: vec![wzp_proto::QualityProfile::GOOD],
caller_reflexive_addr: own_reflex.clone(),
caller_local_addrs: caller_local_addrs.clone(),
caller_mapped_addr: caller_mapped_addr.clone(),
caller_build_version: Some(GIT_HASH.to_string()),
})
.await
@@ -1535,6 +1876,48 @@ async fn place_call(
"target_fp": target_fp,
"caller_reflexive_addr": own_reflex,
}));
// Phase 8.6: spawn background port allocation detection + HardNatProbe.
// This runs AFTER the offer is sent so it doesn't delay call setup.
// The probe result arrives at the peer before or during the connect
// command, giving both sides time to compute predicted ports.
{
let state_bg = (*state).clone();
let call_id_bg = call_id.clone();
tokio::spawn(async move {
let stun_config = wzp_client::stun::StunConfig {
servers: vec![
"stun.l.google.com:19302".into(),
"stun1.l.google.com:19302".into(),
"stun.cloudflare.com:3478".into(),
],
timeout: std::time::Duration::from_secs(2),
};
let result = wzp_client::stun::detect_port_allocation(&stun_config).await;
let alloc_str = result.allocation.to_string();
tracing::info!(
allocation = %alloc_str,
ports = ?result.observed_ports,
"place_call: port allocation detected, sending HardNatProbe"
);
let sig = state_bg.signal.lock().await;
if let Some(ref t) = sig.transport {
let _ = t.send_signal(&SignalMessage::HardNatProbe {
call_id: call_id_bg,
port_sequence: result.observed_ports,
allocation: alloc_str,
probe_time_ms: std::time::SystemTime::now()
.duration_since(std::time::UNIX_EPOCH)
.unwrap_or_default()
.as_millis() as u64,
external_ip: result.external_ip
.map(|ip| ip.to_string())
.unwrap_or_default(),
}).await;
}
});
}
history::log(call_id, target_fp, None, history::CallDirection::Placed);
let _ = app.emit("history-changed", ());
Ok(())
@@ -1625,12 +2008,43 @@ async fn answer_call(
"local_addrs": callee_local_addrs,
}));
// Phase 8: attempt port mapping (AcceptTrusted only — privacy mode
// keeps the mapped addr hidden too).
let callee_mapped_addr: Option<String> =
if accept_mode == wzp_proto::CallAcceptMode::AcceptTrusted {
let v4_port = state.signal.lock().await.endpoint
.as_ref()
.and_then(|ep| ep.local_addr().ok())
.map(|la| la.port())
.unwrap_or(0);
if v4_port > 0 {
match wzp_client::portmap::acquire_port_mapping(v4_port, None).await {
Ok(mapping) => {
tracing::info!(
addr = %mapping.external_addr,
protocol = ?mapping.protocol,
"answer_call: port mapping acquired"
);
Some(mapping.external_addr.to_string())
}
Err(e) => {
tracing::debug!(error = %e, "answer_call: port mapping unavailable");
None
}
}
} else {
None
}
} else {
None
};
let sig = state.signal.lock().await;
let transport = sig.transport.as_ref().ok_or_else(|| {
tracing::warn!("answer_call: not registered (no transport)");
"not registered".to_string()
})?;
tracing::info!(%call_id, ?accept_mode, reflex = ?own_reflex, "answer_call: sending DirectCallAnswer");
tracing::info!(%call_id, ?accept_mode, reflex = ?own_reflex, mapped = ?callee_mapped_addr, "answer_call: sending DirectCallAnswer");
transport
.send_signal(&SignalMessage::DirectCallAnswer {
call_id: call_id.clone(),
@@ -1641,6 +2055,7 @@ async fn answer_call(
chosen_profile: Some(wzp_proto::QualityProfile::GOOD),
callee_reflexive_addr: own_reflex.clone(),
callee_local_addrs: callee_local_addrs.clone(),
callee_mapped_addr,
callee_build_version: Some(GIT_HASH.to_string()),
})
.await
@@ -1660,6 +2075,46 @@ async fn answer_call(
if mode != 0 && history::mark_received_if_pending(&call_id) {
let _ = app.emit("history-changed", ());
}
// Phase 8.6: send HardNatProbe (AcceptTrusted only — same
// privacy gate as reflexive addr).
if accept_mode == wzp_proto::CallAcceptMode::AcceptTrusted {
let state_bg = (*state).clone();
let call_id_bg = call_id.clone();
tokio::spawn(async move {
let stun_config = wzp_client::stun::StunConfig {
servers: vec![
"stun.l.google.com:19302".into(),
"stun1.l.google.com:19302".into(),
"stun.cloudflare.com:3478".into(),
],
timeout: std::time::Duration::from_secs(2),
};
let result = wzp_client::stun::detect_port_allocation(&stun_config).await;
let alloc_str = result.allocation.to_string();
tracing::info!(
allocation = %alloc_str,
ports = ?result.observed_ports,
"answer_call: port allocation detected, sending HardNatProbe"
);
let sig = state_bg.signal.lock().await;
if let Some(ref t) = sig.transport {
let _ = t.send_signal(&wzp_proto::SignalMessage::HardNatProbe {
call_id: call_id_bg,
port_sequence: result.observed_ports,
allocation: alloc_str,
probe_time_ms: std::time::SystemTime::now()
.duration_since(std::time::UNIX_EPOCH)
.unwrap_or_default()
.as_millis() as u64,
external_ip: result.external_ip
.map(|ip| ip.to_string())
.unwrap_or_default(),
}).await;
}
});
}
Ok(())
}
@@ -1674,6 +2129,12 @@ async fn answer_call(
/// unsupported / timed out / transport failed (caller should
/// gracefully continue with a relay-only path), or `Err` on
/// "not registered" which is a hard precondition failure.
///
/// Phase 8 (Tailscale-inspired): if relay-based reflection fails,
/// falls back to public STUN servers for independent reflexive
/// discovery. This handles the case where the relay is overloaded
/// or temporarily unreachable for reflect but the call can still
/// proceed with STUN-discovered addresses.
async fn try_reflect_own_addr(
state: &Arc<AppState>,
) -> Result<Option<String>, String> {
@@ -1690,8 +2151,8 @@ async fn try_reflect_own_addr(
if let Err(e) = transport.send_signal(&SignalMessage::Reflect).await {
let mut sig = state.signal.lock().await;
sig.pending_reflect = None;
tracing::warn!(error = %e, "try_reflect_own_addr: send_signal failed, continuing without reflex addr");
return Ok(None);
tracing::warn!(error = %e, "try_reflect_own_addr: send_signal failed, falling back to STUN");
return try_stun_fallback(state).await;
}
match tokio::time::timeout(std::time::Duration::from_millis(1000), rx).await {
Ok(Ok(addr)) => {
@@ -1706,13 +2167,42 @@ async fn try_reflect_own_addr(
Ok(Some(s))
}
Ok(Err(_canceled)) => {
tracing::warn!("try_reflect_own_addr: oneshot canceled");
Ok(None)
tracing::warn!("try_reflect_own_addr: oneshot canceled, falling back to STUN");
try_stun_fallback(state).await
}
Err(_elapsed) => {
let mut sig = state.signal.lock().await;
sig.pending_reflect = None;
tracing::warn!("try_reflect_own_addr: 1s timeout (pre-Phase-1 relay?)");
tracing::warn!("try_reflect_own_addr: 1s timeout, falling back to STUN");
try_stun_fallback(state).await
}
}
}
/// STUN fallback for reflexive address discovery when relay-based
/// reflection fails. Queries public STUN servers independently.
async fn try_stun_fallback(
state: &Arc<AppState>,
) -> Result<Option<String>, String> {
let stun_config = wzp_client::stun::StunConfig {
servers: vec![
"stun.l.google.com:19302".into(),
"stun1.l.google.com:19302".into(),
],
timeout: std::time::Duration::from_secs(2),
};
match wzp_client::stun::discover_reflexive(&stun_config).await {
Ok(addr) => {
let s = addr.to_string();
tracing::info!(addr = %s, "STUN fallback: discovered reflexive address");
{
let mut sig = state.signal.lock().await;
sig.own_reflex_addr = Some(s.clone());
}
Ok(Some(s))
}
Err(e) => {
tracing::warn!(error = %e, "STUN fallback also failed, continuing without reflex addr");
Ok(None)
}
}
@@ -1792,17 +2282,18 @@ async fn get_reflected_address(
/// would make a symmetric NAT look like a cone NAT, which is
/// exactly the failure mode we're trying to detect.
///
/// Takes the relay list from JS because the GUI owns the relay
/// config (localStorage `wzp-settings.relays`). Frontend passes it
/// in; Rust side just does the network work.
/// NAT detection with selectable mode.
///
/// `mode`:
/// - `"relay"` — relay-based Reflect only (original Phase 1-2 behavior)
/// - `"stun"` — public STUN servers only (no relay needed)
/// - `"both"` (default) — relay + STUN in parallel (highest confidence)
#[tauri::command]
async fn detect_nat_type(
state: tauri::State<'_, Arc<AppState>>,
relays: Vec<RelayArg>,
mode: Option<String>,
) -> Result<serde_json::Value, String> {
// Parse relay args up front so a single malformed entry fails
// the whole call cleanly instead of surfacing as a probe error
// at the end.
let mut parsed = Vec::with_capacity(relays.len());
for r in relays {
let addr: std::net::SocketAddr = r
@@ -1812,21 +2303,71 @@ async fn detect_nat_type(
parsed.push((r.name, addr));
}
// Phase 5: share the signal endpoint across all probes so
// they emit from the same source port. Port-preserving NATs
// (MikroTik, most consumer routers) give a stable external
// port → classifier correctly sees cone instead of falsely
// labeling SymmetricPort. Falls back to None (per-probe fresh
// endpoint) when not registered.
let shared_endpoint = state.signal.lock().await.endpoint.clone();
let stun_config = wzp_client::stun::StunConfig::default();
// 1500ms per probe is generous: a same-host probe is < 10ms,
// a cross-continent probe is typically < 300ms, and we want
// to tolerate a one-off packet loss during connect.
let detection = wzp_client::reflect::detect_nat_type(parsed, 1500, shared_endpoint).await;
let mode_str = mode.as_deref().unwrap_or("both");
tracing::info!(mode = mode_str, relay_count = parsed.len(), "detect_nat_type: starting");
let detection = match mode_str {
"relay" => {
// Original behavior: relay-based Reflect only
wzp_client::reflect::detect_nat_type(parsed, 1500, shared_endpoint).await
}
"stun" => {
// Public STUN servers only — no relay connection needed
let probes = wzp_client::stun::probe_stun_servers(&stun_config).await;
let (nat_type, consensus_addr) = wzp_client::reflect::classify_nat(&probes);
wzp_client::reflect::NatDetection {
probes,
nat_type,
consensus_addr,
}
}
_ => {
// "both" — relay + STUN in parallel (default, highest confidence)
wzp_client::reflect::detect_nat_type_with_stun(
parsed, 1500, shared_endpoint, &stun_config,
).await
}
};
serde_json::to_value(&detection).map_err(|e| format!("serialize: {e}"))
}
/// Run comprehensive network diagnostic (STUN + relay + portmap + IPv6).
#[tauri::command]
async fn run_netcheck(
state: tauri::State<'_, Arc<AppState>>,
relays: Vec<RelayArg>,
) -> Result<serde_json::Value, String> {
let mut relay_addrs = Vec::with_capacity(relays.len());
for r in relays {
let addr: std::net::SocketAddr = r
.address
.parse()
.map_err(|e| format!("bad relay address {:?}: {e}", r.address))?;
relay_addrs.push((r.name, addr));
}
let local_port = state.signal.lock().await.endpoint
.as_ref()
.and_then(|ep| ep.local_addr().ok())
.map(|la| la.port())
.unwrap_or(0);
let config = wzp_client::netcheck::NetcheckConfig {
stun_config: wzp_client::stun::StunConfig::default(),
relays: relay_addrs,
timeout: std::time::Duration::from_secs(5),
test_portmap: true,
test_ipv6: true,
local_port,
};
let report = wzp_client::netcheck::run_netcheck(&config).await;
serde_json::to_value(&report).map_err(|e| format!("serialize: {e}"))
}
/// Deserialization shim for the relay list coming from JS. The
/// `wzp-settings.relays` array in localStorage has more fields
/// (rtt, serverFingerprint, knownFingerprint) but we only need
@@ -1940,6 +2481,8 @@ pub fn run() {
desired_relay_addr: None,
reconnect_in_progress: false,
pending_path_report: None,
peer_hard_nat_probe: None,
peer_birthday_ports: None,
})),
});
@@ -1990,7 +2533,7 @@ pub fn run() {
ping_relay, get_identity, get_app_info,
connect, disconnect, toggle_mic, toggle_speaker, get_status,
register_signal, place_call, answer_call, get_signal_status,
get_reflected_address, detect_nat_type,
get_reflected_address, detect_nat_type, run_netcheck,
hangup_call,
deregister,
set_speakerphone, is_speakerphone_on,


@@ -28,6 +28,7 @@ static HELLO: OnceLock<unsafe extern "C" fn(*mut u8, usize) -> usize> = OnceLock
static AUDIO_START: OnceLock<unsafe extern "C" fn() -> i32> = OnceLock::new();
static AUDIO_START_BT: OnceLock<unsafe extern "C" fn() -> i32> = OnceLock::new();
static AUDIO_STOP: OnceLock<unsafe extern "C" fn()> = OnceLock::new();
static AUDIO_CAPTURE_AVAILABLE: OnceLock<extern "C" fn() -> usize> = OnceLock::new();
static AUDIO_READ_CAPTURE: OnceLock<unsafe extern "C" fn(*mut i16, usize) -> usize> = OnceLock::new();
static AUDIO_WRITE_PLAYOUT: OnceLock<unsafe extern "C" fn(*const i16, usize) -> usize> = OnceLock::new();
static AUDIO_IS_RUNNING: OnceLock<unsafe extern "C" fn() -> i32> = OnceLock::new();
@@ -68,6 +69,7 @@ pub fn init() -> Result<(), String> {
resolve!(AUDIO_START, unsafe extern "C" fn() -> i32, b"wzp_native_audio_start");
resolve!(AUDIO_START_BT, unsafe extern "C" fn() -> i32, b"wzp_native_audio_start_bt");
resolve!(AUDIO_STOP, unsafe extern "C" fn(), b"wzp_native_audio_stop");
resolve!(AUDIO_CAPTURE_AVAILABLE, extern "C" fn() -> usize, b"wzp_native_audio_capture_available");
resolve!(AUDIO_READ_CAPTURE, unsafe extern "C" fn(*mut i16, usize) -> usize, b"wzp_native_audio_read_capture");
resolve!(AUDIO_WRITE_PLAYOUT, unsafe extern "C" fn(*const i16, usize) -> usize, b"wzp_native_audio_write_playout");
resolve!(AUDIO_IS_RUNNING, unsafe extern "C" fn() -> i32, b"wzp_native_audio_is_running");
@@ -121,6 +123,12 @@ pub fn audio_stop() {
}
}
/// Number of capture samples available to read without blocking.
pub fn audio_capture_available() -> usize {
let Some(f) = AUDIO_CAPTURE_AVAILABLE.get() else { return 0; };
f()
}
/// Read captured i16 PCM into `out`. Returns bytes actually copied.
pub fn audio_read_capture(out: &mut [i16]) -> usize {
let Some(f) = AUDIO_READ_CAPTURE.get() else { return 0; };

File diff suppressed because it is too large.

@@ -32,7 +32,333 @@ body {
.hidden { display: none !important; }
/* ── Connect screen ── */
/* ── Lobby screen (IRC-style) ── */
#lobby-screen {
display: flex;
flex-direction: column;
flex: 1;
gap: 0;
max-width: 480px;
margin: 0 auto;
width: 100%;
}
.lobby-header {
padding: 12px 0;
border-bottom: 1px solid var(--surface2);
}
.lobby-title-row {
display: flex;
align-items: center;
justify-content: space-between;
}
.lobby-title-row h1 {
font-size: 20px;
font-weight: 700;
letter-spacing: 0.5px;
}
.lobby-status-row {
display: flex;
align-items: center;
gap: 6px;
margin-top: 6px;
font-size: 12px;
color: var(--text-dim);
}
.lobby-relay { opacity: 0.7; }
.lobby-room { color: var(--green); font-weight: 500; }
.lobby-identity {
display: flex;
align-items: center;
gap: 6px;
margin-top: 6px;
font-size: 11px;
opacity: 0.5;
}
/* User list */
.lobby-users-section {
flex: 1;
display: flex;
flex-direction: column;
margin-top: 8px;
min-height: 0;
}
.lobby-users-header {
display: flex;
align-items: center;
gap: 8px;
padding: 8px 0;
font-size: 13px;
font-weight: 600;
color: var(--text-dim);
text-transform: uppercase;
letter-spacing: 1px;
}
.badge {
background: var(--surface2);
color: var(--text-dim);
font-size: 11px;
padding: 1px 7px;
border-radius: 10px;
font-weight: 600;
}
.lobby-user-list {
flex: 1;
overflow-y: auto;
display: flex;
flex-direction: column;
gap: 2px;
}
.lobby-empty {
color: var(--text-dim);
font-size: 13px;
text-align: center;
padding: 40px 20px;
opacity: 0.6;
}
/* Single user row */
.user-row {
display: flex;
align-items: center;
gap: 10px;
padding: 10px 12px;
border-radius: 8px;
cursor: pointer;
transition: background 0.15s;
}
.user-row:hover, .user-row:active {
background: var(--surface);
}
.user-identicon {
width: 36px;
height: 36px;
border-radius: 50%;
flex-shrink: 0;
display: flex;
align-items: center;
justify-content: center;
}
.user-info {
flex: 1;
min-width: 0;
}
.user-name {
font-size: 14px;
font-weight: 500;
white-space: nowrap;
overflow: hidden;
text-overflow: ellipsis;
}
.user-fp {
font-size: 10px;
color: var(--text-dim);
font-family: ui-monospace, monospace;
white-space: nowrap;
overflow: hidden;
text-overflow: ellipsis;
}
.user-status {
flex-shrink: 0;
display: flex;
align-items: center;
gap: 4px;
}
.user-status-icon {
font-size: 16px;
}
/* Speaking indicator */
.user-row.speaking {
background: rgba(74, 222, 128, 0.08);
}
.user-row.speaking .user-name {
color: var(--green);
}
/* In-voice indicator */
.user-row.in-voice .user-status-icon {
color: var(--green);
}
/* Voice join FAB */
.lobby-fab-row {
padding: 12px 0;
display: flex;
justify-content: center;
}
.fab {
display: flex;
align-items: center;
gap: 8px;
background: var(--green);
color: #111;
border: none;
padding: 12px 28px;
border-radius: 24px;
font-size: 15px;
font-weight: 600;
cursor: pointer;
box-shadow: 0 4px 16px rgba(74, 222, 128, 0.3);
transition: transform 0.15s, box-shadow 0.15s;
}
.fab:hover {
transform: scale(1.03);
box-shadow: 0 6px 20px rgba(74, 222, 128, 0.4);
}
.fab:active {
transform: scale(0.97);
}
.fab.active {
background: var(--red);
box-shadow: 0 4px 16px rgba(239, 68, 68, 0.3);
}
.fab-icon { font-size: 18px; }
/* Incoming call banner */
.incoming-banner {
position: fixed;
bottom: 20px;
left: 20px;
right: 20px;
max-width: 440px;
margin: 0 auto;
background: var(--surface);
border: 1px solid var(--green);
border-radius: 16px;
padding: 16px;
display: flex;
flex-direction: column;
gap: 12px;
box-shadow: 0 8px 32px rgba(0,0,0,0.5);
z-index: 100;
animation: slideUp 0.3s ease-out;
}
@keyframes slideUp {
from { transform: translateY(100%); opacity: 0; }
to { transform: translateY(0); opacity: 1; }
}
.incoming-info {
display: flex;
align-items: center;
gap: 12px;
}
.incoming-identicon { width: 40px; height: 40px; border-radius: 50%; }
.incoming-name { font-weight: 600; font-size: 15px; }
.incoming-subtitle { font-size: 12px; color: var(--green); }
.incoming-actions {
display: flex;
gap: 8px;
}
.btn-accept {
flex: 1;
background: var(--green);
color: #111;
border: none;
padding: 10px;
border-radius: 10px;
font-weight: 600;
cursor: pointer;
}
.btn-reject {
flex: 1;
background: var(--red);
color: white;
border: none;
padding: 10px;
border-radius: 10px;
font-weight: 600;
cursor: pointer;
}
/* Context menu */
.context-menu {
position: fixed;
top: 50%;
left: 50%;
transform: translate(-50%, -50%);
background: var(--surface);
border: 1px solid var(--surface2);
border-radius: 16px;
padding: 20px;
min-width: 260px;
z-index: 200;
box-shadow: 0 16px 48px rgba(0,0,0,0.6);
}
.context-header {
display: flex;
align-items: center;
gap: 12px;
margin-bottom: 16px;
padding-bottom: 12px;
border-bottom: 1px solid var(--surface2);
}
.ctx-identicon { width: 40px; height: 40px; border-radius: 50%; }
.ctx-name { font-weight: 600; font-size: 15px; }
.ctx-fp { font-size: 10px; color: var(--text-dim); font-family: monospace; }
.context-action {
display: flex;
align-items: center;
gap: 10px;
width: 100%;
background: none;
border: none;
color: var(--text);
padding: 10px 8px;
border-radius: 8px;
font-size: 14px;
cursor: pointer;
text-align: left;
}
.context-action:hover:not(:disabled) {
background: var(--surface2);
}
.context-action:disabled {
opacity: 0.4;
cursor: not-allowed;
}
.context-action.dim {
color: var(--text-dim);
font-size: 13px;
}
/* Legacy compat — keep old connect-screen ID working for JS that
references it (the old connect screen is now the lobby). */
#connect-screen {
display: flex;
flex-direction: column;


@@ -473,6 +473,34 @@ sequenceDiagram
R->>R: Remove from room, broadcast RoomUpdate
```
## Relay Concurrency Model
### Threading
- Multi-threaded Tokio runtime (all available cores, work-stealing scheduler)
- Task-per-connection: each QUIC connection gets a dedicated `tokio::spawn`
- Task-per-participant-per-room: each participant's media forwarding loop is independent
### Shared State & Locking
| Lock | Protected Data | Hold Duration | Contention |
|------|---------------|---------------|------------|
| `RoomManager` (Mutex) | Rooms, participants, quality tiers | ~1ms/packet | O(N) per room |
| `PresenceRegistry` (Mutex) | Fingerprint registrations | ~1ms | Low (join/leave only) |
| `SessionManager` (Mutex) | Active session tracking | ~1ms | Low |
| `FederationManager.peer_links` (Mutex) | Peer connections | ~10ms during forward | Per-federation-packet |
### Scaling Characteristics
- **Many small rooms**: Scales well across all cores (rooms are independent)
- **Large single room (100+ participants)**: Serialized by RoomManager lock
- **Federation**: Per-peer tasks scale; `peer_links` lock held during send loop
### Primary Bottleneck
The RoomManager Mutex is acquired per packet by every participant to get the fan-out peer list. The lock is released before I/O (sends happen outside the lock), but packet processing within a room is still serialized through the lock.
Future optimization: per-room locks or lock-free participant lists via `DashMap`.
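A minimal sketch of the fan-out pattern described above, using hypothetical `Room`/`Peer` stand-ins rather than the relay's real types: the lock is held only long enough to snapshot the peer list, and all sends happen after it is released.
```rust
use std::net::SocketAddr;
use std::sync::Arc;
use tokio::sync::Mutex;

// Hypothetical stand-ins for the relay's real types.
struct Peer { addr: SocketAddr }
struct Room { peers: Vec<Arc<Peer>> }

async fn forward_packet(room: &Mutex<Room>, packet: &[u8], sender: SocketAddr) {
    // Hold the lock only long enough to snapshot the fan-out peer list.
    let targets: Vec<Arc<Peer>> = {
        let room = room.lock().await;
        room.peers.iter().filter(|p| p.addr != sender).cloned().collect()
    }; // lock released here

    // Sends happen outside the lock, so a slow peer never blocks the room,
    // but the snapshot step above is still serialized per room.
    for peer in targets {
        send_datagram(peer.addr, packet).await;
    }
}

async fn send_datagram(_addr: SocketAddr, _packet: &[u8]) {
    // Placeholder for the real QUIC datagram send.
}
```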
## Client Architecture
### Desktop Engine (Tauri)
@@ -1072,3 +1100,118 @@ BT SCO only supports 8/16kHz. When `bt_active=1`, Oboe capture skips `setSampleR
### Hangup Signal Fix
`SignalMessage::Hangup` now carries an optional `call_id` field. The relay uses it to end only the specific call instead of broadcasting to all active calls for the user — preventing a race where a hangup for call 1 kills a newly-placed call 2.
## Phase 8: Tailscale-Inspired NAT Traversal (2026-04-14)
Five new modules in `wzp-client` bring NAT traversal close to Tailscale's approach:
```
┌──────────────────────────────────────────────────────────────────────┐
│ wzp-client NAT Traversal Stack │
│ │
│ ┌─────────────┐ ┌──────────────┐ ┌──────────────────────────┐ │
│ │ stun.rs │ │ portmap.rs │ │ reflect.rs (existing) │ │
│ │ RFC 5389 │ │ NAT-PMP │ │ Relay-based STUN │ │
│ │ Public │ │ PCP │ │ Multi-relay NAT detect │ │
│ │ STUN │ │ UPnP IGD │ │ │ │
│ └──────┬──────┘ └──────┬───────┘ └────────────┬─────────────┘ │
│ │ │ │ │
│ └────────────────┼────────────────────────┘ │
│ │ │
│ ┌───────▼────────┐ │
│ │ ice_agent.rs │ │
│ │ Gather / Re- │ │
│ │ gather / Apply│ │
│ └───────┬────────┘ │
│ │ │
│ ┌───────────┼───────────┐ │
│ │ │ │ │
│ ┌───────▼───┐ ┌───▼───┐ ┌───▼──────────┐ │
│ │ netcheck │ │ dual_ │ │ relay_map.rs │ │
│ │ .rs │ │ path │ │ RTT-sorted │ │
│ │ Diagnostic│ │ .rs │ │ relay list │ │
│ └───────────┘ │ Race │ └──────────────┘ │
│ └───────┘ │
└──────────────────────────────────────────────────────────────────────┘
```
### Candidate Types
| Type | Source | Priority | When Used |
|------|--------|----------|-----------|
| Host | `local_host_candidates()` | 1 (highest) | Same-LAN peers |
| Port-mapped | `portmap::acquire_port_mapping()` | 2 | Router supports NAT-PMP/PCP/UPnP |
| Server-reflexive | `stun::discover_reflexive()` or relay Reflect | 3 | Cone NAT |
| Relay | Relay address (fallback) | 4 (lowest) | Always available |
### Signal Flow for Mid-Call Re-Gathering
```
Network change (WiFi → cellular)
IceAgent::re_gather()
├── stun::discover_reflexive()
├── portmap::acquire_port_mapping()
└── local_host_candidates()
SignalMessage::CandidateUpdate { generation: N+1, ... }
▼ (via relay)
Peer's IceAgent::apply_peer_update()
PeerCandidates { reflexive, local, mapped }
dual_path::race() with new candidates (TODO: transport hot-swap)
```
### New SignalMessage Variants & Fields
| Signal | New Fields | Purpose |
|--------|-----------|---------|
| `DirectCallOffer` | `caller_mapped_addr` | Port-mapped address from NAT-PMP/PCP/UPnP |
| `DirectCallAnswer` | `callee_mapped_addr` | Same, callee side |
| `CallSetup` | `peer_mapped_addr` | Relay cross-wires mapped addr to peer |
| `CandidateUpdate` | (new variant) | Mid-call candidate re-gathering |
| `RegisterPresenceAck` | `relay_region`, `available_relays` | Relay mesh metadata for auto-selection |
All new fields use `#[serde(default, skip_serializing_if)]` for backward compatibility with older clients/relays.
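For illustration, the usual shape of such an optional, backward-compatible field — a trimmed sketch, not the real `SignalMessage` definition, with a field name taken from the table above:
```rust
use serde::{Deserialize, Serialize};

// Trimmed sketch — the real SignalMessage has many more variants and fields.
#[derive(Serialize, Deserialize)]
enum SignalMessage {
    DirectCallOffer {
        call_id: String,
        // `default` lets messages from older peers (field absent) deserialize
        // as None; `skip_serializing_if` keeps the field off the wire when
        // unset, so older relays/clients never see an unknown key.
        #[serde(default, skip_serializing_if = "Option::is_none")]
        caller_mapped_addr: Option<String>,
    },
}
```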
### Hard NAT Port Prediction
For symmetric NATs that don't support port mapping, the system detects the NAT's port allocation pattern:
```
Single socket → 5 STUN servers (sequential probes)
Observed ports: [40001, 40002, 40003, 40004, 40005]
classify_port_allocation() → Sequential { delta: 1 }
predict_ports(last=40005, delta=1, offset=0, spread=2)
→ [40004, 40005, 40006, 40007, 40008]
HardNatProbe signal → peer
Peer dials predicted port range in parallel
```
| Pattern | Detection | Traversal Strategy |
|---------|-----------|-------------------|
| Port-preserving | All probes return same port | Standard hole-punch |
| Sequential (delta=N) | Consistent N-increment | Predict next port, dial range |
| Random | No pattern | Birthday attack or relay |
| Unknown | < 3 probes succeeded | Relay fallback |
The classifier tolerates:
- **Jitter**: ±1 from dominant delta (concurrent flow grabbed a port)
- **Wraparound**: 65535 → 1 treated as delta=+2, not -65534
- **Noise**: 60% threshold — if most deltas agree, call it sequential
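A simplified classifier along the lines described above — the real `classify_port_allocation()` in `wzp-client` handles more edge cases (it picks the dominant delta rather than the first one), but the wraparound, jitter, and 60% rules look roughly like this:
```rust
#[derive(Debug, PartialEq)]
enum PortAllocation {
    PortPreserving,
    Sequential { delta: i16 },
    Random,
    Unknown,
}

/// Simplified classification: needs >= 3 probes, treats "all equal" as
/// port-preserving, and accepts a delta if at least 60% of consecutive
/// deltas agree within ±1 (wrapping 65535 -> 1 as +2).
fn classify(ports: &[u16]) -> PortAllocation {
    if ports.len() < 3 {
        return PortAllocation::Unknown;
    }
    if ports.windows(2).all(|w| w[0] == w[1]) {
        return PortAllocation::PortPreserving;
    }
    let deltas: Vec<i32> = ports
        .windows(2)
        .map(|w| {
            let d = w[1] as i32 - w[0] as i32;
            if d < -32768 { d + 65536 } else { d } // wraparound: 65535 -> 1 is +2
        })
        .collect();
    // Simplification: use the first delta as the candidate, ±1 jitter tolerance.
    let dominant = deltas[0];
    let agree = deltas.iter().filter(|d| (*d - dominant).abs() <= 1).count();
    if agree as f32 / deltas.len() as f32 >= 0.6 && dominant != 0 {
        PortAllocation::Sequential { delta: dominant as i16 }
    } else {
        PortAllocation::Random
    }
}

fn main() {
    assert_eq!(classify(&[40001, 40002, 40003, 40004, 40005]),
               PortAllocation::Sequential { delta: 1 });
    assert_eq!(classify(&[4433, 4433, 4433, 4433]), PortAllocation::PortPreserving);
    assert_eq!(classify(&[40001, 52847, 19432, 61203]), PortAllocation::Random);
}
```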


@@ -61,12 +61,16 @@ Catastrophic → Codec2 1.2k (minimum viable voice)
- Encoder can switch codec mid-stream
- Decoder already auto-detects incoming codec from packet headers
### What's missing
### What's been implemented since PRD was written
1. **QualityReport ingestion** — neither Android engine nor desktop engine reads quality reports from the relay
2. **Profile switch loop** — no periodic check that feeds reports to `QualityAdapter` and applies recommended switches
3. **Upward adaptation**`QualityAdapter` only classifies into 3 tiers (GOOD/DEGRADED/CATASTROPHIC). Needs extension to recommend studio tiers when conditions are excellent (loss < 1%, RTT < 50ms)
4. **Notification to UI** — when quality changes, the UI should show the current active codec
1. **QualityReport ingestion**~~neither Android engine nor desktop engine reads quality reports from the relay~~ **Done**: both Android (`crates/wzp-android/src/engine.rs`) and desktop (`desktop/src-tauri/src/engine.rs`) recv tasks ingest quality reports and feed `AdaptiveQualityController`
2. **Profile switch loop**~~no periodic check~~ **Done**: `pending_profile` AtomicU8 bridges recv→send task in both engines; send task applies profile switch at frame boundary
3. **Notification to UI**~~when quality changes, the UI should show the current active codec~~ **Done**: `tx_codec`/`rx_codec` in desktop `EngineStatus`; `currentCodec`/`peerCodec` in Android `CallStats`
### What's still missing
1. **Upward adaptation**`QualityAdapter` only classifies into 3 tiers (GOOD/DEGRADED/CATASTROPHIC). Needs extension to recommend studio tiers when conditions are excellent (loss < 1%, RTT < 50ms). See Phase 2 below.
2. **Relay QualityDirective handling** — relay broadcasts coordinated quality directives but neither engine processes them (signals are silently discarded). See PRD-coordinated-codec.md for details.
## Requirements
@@ -191,11 +195,20 @@ The `CallEncoder` already has `set_profile()`. The `CallDecoder` already auto-sw
## Milestones
| Phase | Scope | Effort | Dependency |
|-------|-------|--------|------------|
| 0 | Verify relay sends QualityReports | 0.5 day | None |
| 1a | Wire QualityAdapter in Android engine | 1 day | Phase 0 |
| 1b | Wire QualityAdapter in desktop engine | 1 day | Phase 0 |
| 1c | UI indicator (current codec) | 0.5 day | Phase 1a/1b |
| 2 | Extended 5-tier classification | 0.5 day | Phase 1 |
| 3 | Bandwidth probing | 2 days | Phase 2 |
| Phase | Scope | Effort | Status |
|-------|-------|--------|--------|
| 0 | Verify relay sends QualityReports | 0.5 day | Done |
| 1a | Wire QualityAdapter in Android engine | 1 day | Done |
| 1b | Wire QualityAdapter in desktop engine | 1 day | Done |
| 1c | UI indicator (current codec) | 0.5 day | Done |
| 2 | Extended 5-tier classification (Studio64k→Catastrophic) | 0.5 day | Done (2026-04-13) |
| 3 | Bandwidth probing | 2 days | Pending (task #10) |
## Implementation Status Update (2026-04-13)
All phases implemented:
- Phase 1: QualityAdapter with 3-tier classification — DONE
- Phase 2: Extended 5-tier (Studio 64k/48k/32k + GOOD + DEGRADED + CATASTROPHIC) — DONE
- Phase 3: Bandwidth probing — NOT DONE (see remaining tasks)
- P2P adaptive quality: QualityReport::from_path_stats() + self-observation from quinn stats — DONE
- Both relay and P2P calls now have full adaptive quality switching


@@ -197,18 +197,25 @@ Implementation strategy: build for P2P first (simpler, 2 parties), then wrap the
| 5 | P2P quality adaptation (direct observation) | 1 day |
| 6 | Per-participant asymmetric encoding (Option 2) | 1 day |
## Implementation Status (2026-04-12)
## Implementation Status (2026-04-13)
Phases 1-2 are now implemented:
Phases 1-2 are implemented. Phase 3 has a critical gap.
### What was built
- **`QualityDirective` signal** (`crates/wzp-proto/src/packet.rs`): New `SignalMessage` variant with `recommended_profile` and optional `reason`
- **`ParticipantQuality`** (`crates/wzp-relay/src/room.rs`): Per-participant quality tracking using `AdaptiveQualityController`, created on join, removed on leave
- **Weakest-link broadcast**: `observe_quality()` method computes room-wide worst tier, broadcasts `QualityDirective` to all participants when tier changes
- **Desktop engine handling** (`desktop/src-tauri/src/engine.rs`): `AdaptiveQualityController` in recv task, `pending_profile` AtomicU8 bridge to send task, auto-mode profile switching
- **Desktop engine handling** (`desktop/src-tauri/src/engine.rs`): `AdaptiveQualityController` in recv task, `pending_profile` AtomicU8 bridge to send task, auto-mode profile switching based on **inbound quality reports**
### Phases 3-4 remaining
### Phase 3 completed (2026-04-13)
- Phase 3: Client-side handling of `QualityDirective` (reacting to relay-pushed profile)
- Phase 4: Upgrade proposal/negotiation protocol for quality recovery
Both engines now handle `QualityDirective` signals from the relay:
- **Desktop** (`engine.rs`): both P2P and relay signal tasks match `QualityDirective`, extract `recommended_profile`, store index via `sig_pending_profile.store(idx, Release)`. Send task picks it up at the next frame boundary.
- **Android** (`engine.rs`): signal task matches `QualityDirective`, stores via `pending_profile_recv.store(idx, Release)`.
Relay-coordinated codec switching is now end-to-end: relay monitors → broadcasts directive → clients switch.
### Remaining phase
- Phase 4: Upgrade proposal/negotiation protocol for quality recovery (task #28)


@@ -386,3 +386,17 @@ When instantaneous jitter exceeds the EWMA × 1.3 (asymmetric: fast-up α=0.3, s
- 10 unit tests for tuner math (baseline, scaling, spike, cooldown, codec switch, Codec2 no-op)
- 4 integration tests (encoder adjustment, spike boost, Codec2 no-op, profile switch with encode verification)
### Opus6k Frame Starvation Bug (Fixed 2026-04-13)
During testing of the extended 1040ms DRED window on Opus6k, the 40ms codec produced only ~11 frames/s instead of 25 — making audio choppy regardless of DRED quality.
**Root cause:** The Android capture ring read loop did partial reads that consumed samples from the ring but discarded them when retrying:
1. Ring has 960 samples (one Oboe burst)
2. `audio_read_capture(&mut buf[..1920])` reads 960 into `buf[0..960]`, returns 960
3. Loop sees 960 < 1920, sleeps, retries from `buf[0..]` → overwrites the consumed samples
4. ~50% of captured audio thrown away per frame
**Fix:** Added `wzp_native_audio_capture_available()` to check ring fill level before reading (same pattern as the desktop CPAL path's `capture_ring.available()`). Also made `frame_samples` mutable so codec switches update the read size.
**Affected codecs:** Only 40ms frame codecs (Opus6k, Codec2_1200). 20ms codecs (Opus24k, etc.) were unaffected because a single Oboe burst fills the entire request.
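The fixed read pattern looks roughly like the sketch below. The `available`/`read` closures stand in for the `wzp_native_audio_capture_available` and `wzp_native_audio_read_capture` bindings shown in the diff above; the loop shape is illustrative, not a copy of the real engine code.
```rust
use std::time::Duration;

/// Fixed capture-read pattern: poll the ring's fill level and only read once a
/// full frame is available, so no samples are consumed and then thrown away.
fn read_one_frame(
    frame_samples: usize,
    buf: &mut [i16],
    available: impl Fn() -> usize,
    read: impl Fn(&mut [i16]) -> usize,
) -> usize {
    loop {
        // The fix: check availability first instead of issuing partial reads.
        if available() >= frame_samples {
            // One full-frame read; no retry loop can overwrite consumed samples.
            return read(&mut buf[..frame_samples]);
        }
        // Less than a frame captured yet: wait roughly one Oboe burst.
        std::thread::sleep(Duration::from_millis(10));
    }
}
```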

docs/PRD-engine-dedup.md Normal file

@@ -0,0 +1,140 @@
# PRD: Engine.rs Deduplication — Extract Shared Send/Recv Helpers
## Problem
`desktop/src-tauri/src/engine.rs` is 1,705 lines with two nearly identical `CallEngine::start()` implementations — one for Android (880 lines) and one for desktop (430 lines). ~350 lines are copy-pasted between them. Every change to the encode/decode/adaptive-quality pipeline requires editing both places, and they've already diverged in subtle ways (Android has extensive first-join diagnostics that desktop lacks).
## Scope
Extract the duplicated logic into shared helper functions. The Android and desktop paths should only differ in their audio I/O mechanism (Oboe ring via wzp-native vs CPAL capture_ring/playout_ring).
## What's Duplicated
| Block | Description | Lines (each) |
|-------|-------------|------|
| `build_call_config()` | Resolve quality string → CallConfig | 23 |
| Codec-to-profile match | Map CodecId → QualityProfile for decoder switch | 19 |
| Adaptive quality switch | Read AtomicU8, index_to_profile, set_profile, update frame_samples + dred_tuner | 15 |
| DRED tuner poll | Check frame counter, poll quinn stats, apply tuning | 15 |
| Quality report ingestion | Extract quality_report, feed to AdaptiveQualityController, store to AtomicU8 | 8 |
| Signal task | Accept signals, handle RoomUpdate/QualityDirective/Hangup | 48 |
| **Total** | | **~128 lines × 2 = 256 lines eliminated** |
## Implementation
### Phase 1: Top-Level Helper Functions
```rust
fn build_call_config(quality: &str) -> CallConfig {
let profile = resolve_quality(quality);
match profile {
Some(p) => CallConfig {
noise_suppression: false,
suppression_enabled: false,
..CallConfig::from_profile(p)
},
None => CallConfig {
noise_suppression: false,
suppression_enabled: false,
..CallConfig::default()
},
}
}
fn codec_to_profile(codec: CodecId) -> QualityProfile {
match codec {
CodecId::Opus24k => QualityProfile::GOOD,
CodecId::Opus6k => QualityProfile::DEGRADED,
CodecId::Opus32k => QualityProfile::STUDIO_32K,
CodecId::Opus48k => QualityProfile::STUDIO_48K,
CodecId::Opus64k => QualityProfile::STUDIO_64K,
CodecId::Codec2_1200 => QualityProfile::CATASTROPHIC,
CodecId::Codec2_3200 => QualityProfile {
codec: CodecId::Codec2_3200,
fec_ratio: 0.5,
frame_duration_ms: 20,
frames_per_block: 5,
},
other => QualityProfile { codec: other, ..QualityProfile::GOOD },
}
}
fn check_adaptive_switch(
pending: &AtomicU8,
encoder: &mut CallEncoder,
tuner: &mut wzp_proto::DredTuner,
frame_samples: &mut usize,
tx_codec: &tokio::sync::Mutex<String>,
) -> bool {
let p = pending.swap(PROFILE_NO_CHANGE, Ordering::Acquire);
if p == PROFILE_NO_CHANGE { return false; }
if let Some(new_profile) = index_to_profile(p) {
let new_fs = (new_profile.frame_duration_ms as usize) * 48;
if encoder.set_profile(new_profile).is_ok() {
*frame_samples = new_fs;
tuner.set_codec(new_profile.codec);
// Caller updates tx_codec display string
return true;
}
}
false
}
```
### Phase 2: Shared Signal Task
Extract the signal task into a standalone async function:
```rust
async fn run_signal_task(
transport: Arc<wzp_transport::QuinnTransport>,
running: Arc<AtomicBool>,
pending_profile: Arc<AtomicU8>,
participants: Arc<Mutex<Vec<ParticipantInfo>>>,
) {
loop {
if !running.load(Ordering::Relaxed) { break; }
match tokio::time::timeout(
Duration::from_millis(SIGNAL_TIMEOUT_MS),
transport.recv_signal(),
).await {
Ok(Ok(Some(msg))) => {
// Handle RoomUpdate, QualityDirective, Hangup...
}
_ => {}
}
}
}
```
### Phase 3: Shared DRED Poll + Quality Ingestion
These are small blocks but appear in both send and recv tasks. Extract as inline helpers or closures.
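A possible shape for the quality-ingestion helper — names and the controller's decision logic here are illustrative stand-ins, not the real crate API; the point is the report-in, AtomicU8-out contract described in the table above.
```rust
use std::sync::atomic::{AtomicU8, Ordering};

// Stand-ins for the real wzp types, just so the sketch compiles.
struct QualityReport { loss_pct: f32 }
struct AdaptiveQualityController { last_idx: u8 }

impl AdaptiveQualityController {
    /// Returns Some(profile index) when the observed quality warrants a switch.
    fn observe(&mut self, report: &QualityReport) -> Option<u8> {
        let idx = if report.loss_pct > 10.0 { 1 } else { 0 };
        if idx != self.last_idx { self.last_idx = idx; Some(idx) } else { None }
    }
}

/// Shared quality-ingestion helper: feed the report to the controller and, if
/// it recommends a switch, stage the profile index for the send task.
fn ingest_quality_report(
    report: Option<QualityReport>,
    controller: &mut AdaptiveQualityController,
    pending_profile: &AtomicU8,
) {
    if let Some(report) = report {
        if let Some(idx) = controller.observe(&report) {
            // The send task swaps this out at the next frame boundary.
            pending_profile.store(idx, Ordering::Release);
        }
    }
}
```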
## Verification
1. `cargo check --workspace` — must compile
2. `cargo test -p wzp-proto -p wzp-relay -p wzp-client --lib` — must pass
3. Manual test: place a call Android↔Desktop, verify audio works in both directions
4. Verify adaptive quality still switches (set one side to auto, degrade network)
## Effort
- Phase 1: 1 hour (extract 3 functions, update 6 call sites)
- Phase 2: 30 min (extract signal task, update 2 spawn sites)
- Phase 3: 30 min (cleanup remaining small duplicates)
- Total: ~2 hours
## Not In Scope
- Audio I/O trait abstraction (Oboe vs CPAL) — different project, different risk profile
- Moving Android-specific diagnostics (first-join, PCM recorder) into a feature flag
- Splitting engine.rs into multiple files
## Implementation Status (2026-04-13)
All phases implemented:
- build_call_config(): shared CallConfig construction — DONE
- codec_to_profile(): shared CodecId → QualityProfile mapping — DONE
- run_signal_task(): shared signal handler — DONE
- Net reduction: ~39 lines, 6 duplicated blocks → single-line calls

docs/PRD-hard-nat.md Normal file

@@ -0,0 +1,220 @@
# PRD: Hard NAT Traversal (Port Prediction + Birthday Attack)
> Phase: Partial implementation
> Status: Phase A done, Phase B signal ready, C-D not started (2026-04-14)
> Crate: wzp-client, wzp-proto, wzp-relay
## Problem
When both peers are behind **symmetric NATs** (endpoint-dependent mapping), standard hole-punching fails because the external port changes per destination. Our Phase 8.2 port mapping (NAT-PMP/PCP/UPnP) solves this when the router supports it (~70% of consumer routers), but the remaining ~30% — plus corporate firewalls, cloud NATs (AWS/Azure), and carrier-grade NATs — fall back to relay.
Tailscale tackles this with two techniques:
1. **Port prediction** for NATs with sequential allocation patterns
2. **Birthday attack** for NATs with random allocation
Both are viable when **at least one peer has a predictable NAT** (easy+hard pair). When **both** peers have fully random symmetric NATs, even Tailscale falls back to relay.
## Background: How Symmetric NATs Allocate Ports
| Pattern | Behavior | Prevalence | Traversal |
|---------|----------|------------|-----------|
| **Sequential** | port N, N+1, N+2... per new flow | ~40% of symmetric NATs (home routers) | Port prediction viable |
| **Random** | truly random port per flow | ~50% (enterprise, cloud, CGNAT) | Birthday attack only |
| **Port-preserving** | same as source port when possible | ~10% (behaves like cone NAT) | Standard hole-punch works |
## Solution Overview
### Phase A: NAT Port Allocation Pattern Detection
Before attempting hard NAT traversal, detect whether the NAT allocates ports sequentially or randomly. This determines which strategy to use.
**Method**: Send 5 STUN Binding Requests from the same source socket to 5 different STUN servers. Collect the 5 observed external ports. Analyze:
```
Ports: [40001, 40002, 40003, 40004, 40005] → Sequential (delta=1)
Ports: [40001, 40003, 40005, 40007, 40009] → Sequential (delta=2)
Ports: [40001, 52847, 19432, 61203, 8847] → Random
Ports: [4433, 4433, 4433, 4433, 4433] → Port-preserving (cone-like)
```
Classification:
- All same port → `PortPreserving` (use standard hole-punch)
- Consistent delta between consecutive ports → `Sequential { delta: i16 }`
- No pattern → `Random`
**New struct**:
```rust
pub enum PortAllocation {
PortPreserving,
Sequential { delta: i16 },
Random,
Unknown,
}
```
Add to `NetcheckReport` and `NatDetection`.
### Phase B: Port Prediction (Sequential NATs)
When the NAT is sequential, we can **predict** the next external port:
1. Client sends a STUN probe → observes external port P
2. Client knows the NAT will assign P+delta for the next outbound flow
3. Client tells peer (via relay or chat): "dial me at `my_ip:(P + delta * N)`" where N is the number of flows the client will open before the peer's packet arrives
4. Client opens a QUIC connection to the peer's predicted port at the same time
5. If the prediction lands within a small window, the QUIC handshake succeeds
**Timing is critical**: both peers must probe, predict, and dial within a tight window (~500ms) so the port prediction doesn't drift.
**Coordination via relay** (or out-of-band chat):
```
SignalMessage::HardNatProbe {
call_id: String,
/// My observed port sequence (last 3 ports, most recent first)
port_sequence: Vec<u16>,
/// My detected allocation pattern
allocation: PortAllocation,
/// Timestamp (ms since epoch) — for synchronization
probe_time_ms: u64,
/// My external IP (from STUN)
external_ip: String,
}
```
Both peers exchange `HardNatProbe`, then simultaneously:
1. Each predicts the other's next port: `peer_ip:(peer_last_port + peer_delta * offset)`
2. Each opens N parallel QUIC connections to predicted port range: `[predicted - 2, predicted + 2]`
3. First successful handshake wins
**Expected success rate**: ~80% for sequential NATs with consistent delta, within 2-3 seconds.
### Phase C: Birthday Attack (Random NATs)
When the NAT is random, port prediction is impossible. Instead, exploit the **birthday paradox**:
**Math**: With N ports open on side A and M probes from side B into a 65536-port space:
- N=256, M=256: P(collision) ≈ 1 - e^(-256*256/65536) ≈ 63%
- N=256, M=512: P(collision) ≈ 1 - e^(-256*512/65536) ≈ 87%
- N=256, M=1024: P(collision) ≈ 1 - e^(-256*1024/65536) ≈ 98%
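The quoted figures follow from the two-set birthday approximation; a few lines reproduce them:
```rust
/// Two-set birthday approximation used above: P(collision) ≈ 1 - e^(-(N*M)/65536)
/// for N open acceptor ports and M dialer probes into the 65536-port space.
fn collision_probability(acceptor_ports: u32, probes: u32) -> f64 {
    1.0 - (-(acceptor_ports as f64 * probes as f64) / 65536.0).exp()
}

fn main() {
    for probes in [256u32, 512, 1024] {
        println!("N=256, M={probes}: {:.1}%", collision_probability(256, probes) * 100.0);
    }
    // Prints ≈ 63.2%, 86.5%, 98.2% — in line with the estimates above.
}
```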
**Implementation**:
1. **Acceptor side** (easy NAT or the side with more ports available):
- Open 256 UDP sockets bound to random ports
- For each socket, send one STUN probe to learn its external port
- Report all 256 external ports to the peer
2. **Dialer side** (hard NAT):
- Send 1024 QUIC Initial packets to random ports on the Acceptor's external IP
- Rate: 100-200 packets/sec to avoid triggering rate limits
- Duration: ~5-10 seconds
3. **Collision detection**:
- When one of the Dialer's packets hits one of the Acceptor's open ports, the QUIC handshake begins
- The Acceptor sees an incoming Initial on one of its 256 sockets
**Problem for VoIP**: This takes 5-10 seconds even at high probe rates. For a phone call, this means a long "connecting..." phase. Acceptable as a last resort before relay fallback.
### Phase D: Hybrid Strategy
Combine all techniques in a waterfall:
```
1. Port mapping (NAT-PMP/PCP/UPnP) → <100ms [Phase 8.2, done]
↓ failed
2. Standard hole-punch (cone NAT) → <500ms [Phase 3-6, done]
↓ failed (symmetric NAT detected)
3. Port prediction (sequential NAT) → <2s [Phase A+B, new]
↓ failed (random NAT detected)
4. Birthday attack (one side random) → <10s [Phase C, new]
↓ failed (both sides random)
5. Relay fallback → always [Phase 1, done]
```
The relay path starts **immediately in parallel** with all direct attempts (existing 500ms head-start architecture). The user hears audio via relay while the harder traversal techniques probe in the background. If a direct path is found, the call seamlessly upgrades (using the Phase 8.3 transport hot-swap mechanism).
## QUIC-Specific Challenges
### 1. Connection ID Mismatch
QUIC's Initial packet contains a random Destination Connection ID. When birthday-attack probes land on the Acceptor's socket, the CID won't match any expected value. Quinn handles this via its `Endpoint` which accepts any incoming Initial — but we need to ensure the Endpoint is in server mode on all 256 ports.
**Solution**: Use quinn's `Endpoint` with a server config on each socket. Quinn's accept logic handles unknown CIDs correctly.
### 2. Probe Packet Format
Birthday attack probes must be valid QUIC Initial packets (not raw UDP). Quinn's `Endpoint::connect()` sends a proper Initial, so each probe is a real connection attempt. Failed probes time out naturally.
### 3. Stateful Connections
Unlike WireGuard (stateless), each QUIC probe creates connection state. With 1024 probes, that's 1024 half-open connections. Must aggressively abort losers once one succeeds.
**Solution**: Use `JoinSet` (existing pattern in `dual_path.rs`) and `abort_all()` on first success.
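A minimal sketch of that race-and-abort pattern, with a placeholder `try_connect` instead of a real quinn dial:
```rust
use tokio::task::JoinSet;

/// Race many connection attempts and keep only the first success, aborting the
/// rest — the JoinSet pattern described above.
async fn race_probes(targets: Vec<std::net::SocketAddr>) -> Option<std::net::SocketAddr> {
    let mut set = JoinSet::new();
    for addr in targets {
        set.spawn(async move { try_connect(addr).await.map(|_| addr) });
    }
    while let Some(res) = set.join_next().await {
        if let Ok(Some(winner)) = res {
            // First successful handshake wins; drop the other half-open attempts.
            set.abort_all();
            return Some(winner);
        }
    }
    None
}

async fn try_connect(_addr: std::net::SocketAddr) -> Option<()> {
    // Placeholder: a real probe would be a QUIC Initial sent via quinn's Endpoint.
    None
}
```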
### 4. NAT Pinhole Lifetime
QUIC Initial retransmission timer (1s default) may exceed the NAT pinhole lifetime on aggressive NATs. One probe per port may not be enough.
**Solution**: Send 2-3 Initials per predicted port, 200ms apart.
## Signal Protocol
New variants:
```rust
/// Hard NAT probe coordination — exchanged before birthday attack.
HardNatProbe {
call_id: String,
/// Last 5 observed external ports (most recent first).
port_sequence: Vec<u16>,
/// Detected allocation pattern.
allocation: String, // "sequential:1", "sequential:2", "random", "preserving"
/// Probe timestamp for synchronization (ms since epoch).
probe_time_ms: u64,
/// External IP from STUN.
external_ip: String,
}
/// Hard NAT birthday attack coordination.
HardNatBirthdayStart {
call_id: String,
/// Number of ports opened by the acceptor side.
acceptor_port_count: u16,
/// External ports the acceptor has open (for targeted probing).
/// Only sent if port_count is small enough to enumerate.
acceptor_ports: Vec<u16>,
/// "start probing now" timestamp.
start_at_ms: u64,
}
```
## Integration with Existing Architecture
- **Netcheck**: `NetcheckReport` gains `port_allocation: PortAllocation` field
- **IceAgent**: `gather()` includes port allocation detection; `re_gather()` re-probes on network change
- **dual_path**: `race()` extended with hard-NAT probe phase between standard hole-punch timeout and relay commitment
- **Desktop**: `place_call` / `answer_call` exchange `HardNatProbe` when both sides report `SymmetricPort` NAT type
## Effort Estimate
| Phase | Scope | Effort | Status |
|-------|-------|--------|--------|
| A | Port allocation pattern detection | 1 day | **Done**`PortAllocation` enum, `detect_port_allocation()`, `classify_port_allocation()`, `predict_ports()`, 17 tests |
| B | Sequential port prediction + coordination | 2 days | **Signal ready**`HardNatProbe` signal + relay forwarding done. `dual_path::race()` integration pending |
| C | Birthday attack (256 sockets + 1024 probes) | 3 days | Not started |
| D | Hybrid waterfall + background upgrade | 2 days | Not started |
**Total**: ~8 days. Phase A is done and feeds into netcheck. Phase B has signal plumbing complete — needs `dual_path::race()` integration to actually dial predicted ports. Phase C (birthday) is the most complex and lowest ROI.
## Success Criteria
- Port allocation detection correctly classifies sequential vs random on test routers
- Sequential port prediction achieves >70% direct connection rate on sequential-NAT routers
- Birthday attack achieves >90% within 10 seconds when one peer has cone NAT
- Relay-to-direct upgrade is seamless (no audio gap) via Phase 8.3 transport hot-swap
- No regression in call setup time for cone-NAT pairs (the common case)
## References
- [Tailscale: How NAT traversal works](https://tailscale.com/blog/how-nat-traversal-works)
- [Tailscale: NAT traversal improvements pt.1](https://tailscale.com/blog/nat-traversal-improvements-pt-1)
- [Tailscale: NAT traversal improvements pt.2 — cloud environments](https://tailscale.com/blog/nat-traversal-improvements-pt-2-cloud-environments)
- RFC 4787: NAT Behavioral Requirements for Unicast UDP
- RFC 5245: ICE (Interactive Connectivity Establishment)
- Birthday problem (two-set form used above): P(collision) ≈ 1 - e^(-N·M/m), where N = acceptor ports, M = dialer probes, m = 65536-port space

docs/PRD-ice-regather.md Normal file

@@ -0,0 +1,116 @@
# PRD: Mid-Call ICE Re-Gathering
> Phase: Implemented (signal plane); transport hot-swap deferred
> Status: Partial (2026-04-14)
> Crate: wzp-client, wzp-proto, wzp-relay
## Problem
When a mobile device transitions between networks (WiFi -> cellular, IP address change), the active QUIC connection dies. The call stays on a dead path until timeout, then the user experiences silence. There is no mechanism to re-discover candidates and re-establish a direct path mid-call.
Android's `NetworkMonitor.onIpChanged` already fires on `onLinkPropertiesChanged`, but nothing consumes it for candidate re-gathering or path migration.
## Solution
Implement an `IceAgent` that manages the full candidate lifecycle — initial gathering, mid-call re-gathering on network change, and peer candidate application. A new `CandidateUpdate` signal message carries refreshed candidates to the peer through the relay.
## Implementation
### New Module: `crates/wzp-client/src/ice_agent.rs`
**IceAgent struct**:
- Owns `IceAgentConfig` (STUN config, portmap toggle, gather timeout, local ports)
- Monotonic `generation: AtomicU32` — incremented on each re-gather, peers reject stale updates
- `peer_generation: AtomicU32` — tracks last-seen peer generation for ordering
**Public API**:
- `gather()` -> `CandidateSet` — runs STUN + portmap + host candidates in parallel with timeout
- `re_gather()` -> `(CandidateSet, SignalMessage)` — increments generation, returns update to send
- `apply_peer_update(signal)` -> `Option<PeerCandidates>` — parses `CandidateUpdate`, rejects if generation <= last-seen
**CandidateSet**:
```rust
pub struct CandidateSet {
pub reflexive: Option<SocketAddr>,
pub local: Vec<SocketAddr>,
pub mapped: Option<SocketAddr>,
pub generation: u32,
}
```
### New Signal: `CandidateUpdate`
```rust
CandidateUpdate {
call_id: String,
reflexive_addr: Option<String>,
local_addrs: Vec<String>,
mapped_addr: Option<String>,
generation: u32,
}
```
- All address fields use `#[serde(default, skip_serializing_if)]` for backward compat
- Generation counter is mandatory — prevents stale updates from network reordering
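The stale-update guard the generation counter enables looks roughly like this — a simplified sketch, not the real `IceAgent` code, which tracks `peer_generation` the same way:
```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Apply an incoming CandidateUpdate only if its generation is strictly newer
/// than the last one seen from this peer.
fn accept_generation(peer_generation: &AtomicU32, incoming: u32) -> bool {
    loop {
        let last = peer_generation.load(Ordering::Acquire);
        if incoming <= last {
            return false; // stale or duplicate update — drop it
        }
        if peer_generation
            .compare_exchange(last, incoming, Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
        {
            return true;
        }
        // Another task raced us to a newer generation; re-check against it.
    }
}
```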
### Relay Forwarding
`CandidateUpdate` is forwarded to the call peer using the same pattern as `MediaPathReport`:
1. Look up peer fingerprint + `peer_relay_fp` from `CallRegistry`
2. If cross-relay: wrap in `FederatedSignalForward` and forward via federation link
3. If local: send via `signal_hub.send_to()`
### Desktop Handling
Signal recv loop handles `CandidateUpdate`:
- Logs generation, reflexive, mapped, local count
- Emits `recv:CandidateUpdate` debug event
- Emits `signal-event` type `candidate_update` to JS frontend
- TODO: wire into `IceAgent.apply_peer_update()` + `race_upgrade()` for transport hot-swap
### Deferred: Transport Hot-Swap
The actual mid-call transport replacement is not yet wired. The designed approach:
- `Arc<RwLock<Arc<QuinnTransport>>>` — send/recv tasks clone inner Arc per frame
- On upgrade, swap inner Arc under write lock — next frame picks up new transport
- Android: `pending_ice_regather: AtomicBool` polled in recv task, triggers re-gather + swap
- Requires live testing to validate seamless audio continuity during swap
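A sketch of the designed (not yet wired) swap pattern, with `Transport` as a stand-in for `QuinnTransport`:
```rust
use std::sync::Arc;
use tokio::sync::RwLock;

// Stand-in for the real QuinnTransport.
struct Transport;

/// Hot-swap slot: frame loops clone the inner Arc each iteration, so an
/// upgrade only needs a brief write lock to replace it.
type TransportSlot = Arc<RwLock<Arc<Transport>>>;

async fn send_one_frame(slot: &TransportSlot) {
    // Cheap per-frame snapshot; the read lock is held only for this clone.
    let transport: Arc<Transport> = Arc::clone(&*slot.read().await);
    // ... encode the next frame and send it via `transport` ...
    drop(transport);
}

async fn upgrade_to_direct(slot: &TransportSlot, new_transport: Arc<Transport>) {
    // The next frame in the send/recv loops picks up the new transport.
    *slot.write().await = new_transport;
}
```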
## Signal Flow
```
Network change (WiFi -> cellular)
|
v
IceAgent::re_gather()
|-- stun::discover_reflexive()
|-- portmap::acquire_port_mapping()
|-- local_host_candidates()
|
v
SignalMessage::CandidateUpdate { generation: N+1 }
|
v (via relay)
Peer IceAgent::apply_peer_update()
|
v
PeerCandidates { reflexive, local, mapped }
|
v
dual_path::race() with new candidates [NOT YET WIRED]
```
## Files
| File | Change |
|------|--------|
| `crates/wzp-client/src/ice_agent.rs` | New — IceAgent + CandidateSet |
| `crates/wzp-proto/src/packet.rs` | `CandidateUpdate` variant |
| `crates/wzp-relay/src/main.rs` | Forward `CandidateUpdate` to peer |
| `crates/wzp-client/src/featherchat.rs` | Map `CandidateUpdate` to `IceCandidate` type |
| `desktop/src-tauri/src/lib.rs` | Handle `CandidateUpdate` in signal recv loop |
## Testing
- 10 unit tests: generation monotonicity, apply_peer_update (all fields, empty fields, unparseable addrs, stale rejection, wrong signal type), default config, gather with no STUN, re_gather produces signal with incrementing generation
- 2 protocol roundtrip tests: CandidateUpdate full + minimal

docs/PRD-netcheck.md Normal file

@@ -0,0 +1,77 @@
# PRD: Network Diagnostic (Netcheck)
> Phase: Implemented
> Status: Done (2026-04-14)
> Crate: wzp-client
## Problem
When P2P connections fail or call quality is poor, there is no diagnostic tool to understand why. Users and developers must manually probe STUN, check NAT type, test relay connectivity, and verify port mapping support — all separately. Tailscale's `netcheck` consolidates all of this into a single diagnostic report.
## Solution
A comprehensive `run_netcheck()` function that probes all network capabilities in parallel and produces a structured `NetcheckReport`. Exposed as a CLI subcommand (`wzp-client --netcheck`) and available for in-app diagnostics.
## Implementation
### New Module: `crates/wzp-client/src/netcheck.rs`
**NetcheckReport**:
```rust
pub struct NetcheckReport {
pub nat_type: NatType,
pub reflexive_addr: Option<String>,
pub ipv4_reachable: bool,
pub ipv6_reachable: bool,
pub hairpin_works: Option<bool>,
pub port_mapping: Option<PortMapProtocol>,
pub relay_latencies: Vec<RelayLatency>,
pub preferred_relay: Option<String>,
pub stun_latency_ms: Option<u32>,
pub upnp_available: bool,
pub pcp_available: bool,
pub nat_pmp_available: bool,
pub gateway: Option<String>,
pub duration_ms: u32,
pub stun_probes: Vec<NatProbeResult>,
pub port_allocation: Option<PortAllocation>,
}
```
**Probes (all parallel via `tokio::join!`)**:
1. **STUN probes**`probe_stun_servers()` to all configured STUN servers
2. **Relay latencies**`probe_reflect_addr()` to each configured relay
3. **Port mapping**`acquire_port_mapping()` to detect NAT-PMP/PCP/UPnP
4. **Gateway**`default_gateway()` for the router address
5. **IPv6** — attempt to bind `[::]:0` and send to an IPv6 STUN server
6. **Port allocation**`detect_port_allocation()` probes STUN servers from single socket to classify NAT pattern as PortPreserving/Sequential/Random (feeds into hard NAT prediction)
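The probes above fan out concurrently; a sketch of the overall shape, with placeholder probe bodies standing in for the real wzp-client stun/reflect/portmap calls:
```rust
/// Run all netcheck probes in parallel and collect their results.
async fn run_probes_in_parallel() {
    let (stun, relay_rtts, portmap, gateway, ipv6, allocation) = tokio::join!(
        probe_stun(),        // STUN probes to all configured servers
        probe_relays(),      // Reflect round-trips per configured relay
        probe_portmap(),     // NAT-PMP / PCP / UPnP mapping attempt
        find_gateway(),      // default gateway discovery
        probe_ipv6(),        // bind [::]:0, reach an IPv6 STUN server
        detect_allocation(), // port allocation pattern (hard-NAT input)
    );
    // NetcheckReport fields (nat_type, preferred_relay, ...) derive from these.
    let _ = (stun, relay_rtts, portmap, gateway, ipv6, allocation);
}

async fn probe_stun() -> usize { 0 }
async fn probe_relays() -> usize { 0 }
async fn probe_portmap() -> bool { false }
async fn find_gateway() -> Option<String> { None }
async fn probe_ipv6() -> bool { false }
async fn detect_allocation() -> Option<u8> { None }
```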
**Derived fields**:
- `nat_type` / `reflexive_addr` — from `classify_nat()` on STUN probes
- `ipv4_reachable` — true if any STUN probe succeeded
- `preferred_relay` — relay with lowest RTT
- `port_mapping` / `nat_pmp_available` / `pcp_available` / `upnp_available` — from portmap result
**Human-readable output**: `format_report()` produces a formatted text report with sections for NAT info, port mapping, STUN probes, relay latencies.
### CLI Integration
`wzp-client --netcheck <relay-addr>` — runs the diagnostic using the specified relay plus default STUN servers, prints the report, and exits.
### Deferred
- **Hairpin test** — send packet from shared endpoint to own reflexive addr to test NAT hairpinning. Architecture is in place (`hairpin_works: Option<bool>`) but the actual probe is not yet implemented.
- **Android/Desktop in-app UI** — expose via JNI (Android) and Tauri command (desktop) for user-facing diagnostics.
## Files
| File | Change |
|------|--------|
| `crates/wzp-client/src/netcheck.rs` | New — NetcheckReport + run_netcheck + format_report |
| `crates/wzp-client/src/lib.rs` | Add `pub mod netcheck` |
| `crates/wzp-client/src/cli.rs` | `--netcheck` flag + handler |
## Testing
- 5 unit tests: default config, report JSON serialization + roundtrip, RelayLatency serialization, format_report with empty relays, format_report with full data (STUN probes, relay latencies, preferred relay, port mapping)
- 1 integration test (`#[ignore]`): full netcheck run


@@ -103,17 +103,27 @@ Sentinel value `0xFF` means "no change pending". The recv task polls on every re
### Tauri Desktop App (com.wzp.desktop)
The Tauri engine doesn't use `AdaptiveQualityController` — quality is resolved once at call start. Adding network monitoring requires first adding adaptive quality to the Tauri call engine, which is a larger change.
~~The Tauri engine doesn't use `AdaptiveQualityController` — quality is resolved once at call start.~~ **Update (2026-04-13):** Desktop now has `AdaptiveQualityController` wired into the recv task with `pending_profile` AtomicU8 bridge. Network monitoring on desktop is now feasible — the blocker was adaptive quality, which is done. Remaining work: platform-specific network change detection (macOS: `SCNetworkReachability` or `NWPathMonitor`; Linux: `netlink` socket).
### Mid-Call ICE Re-gathering
### Mid-Call ICE Re-gathering — PARTIALLY IMPLEMENTED (2026-04-14)
When the device's IP address changes, ideally we should:
1. Re-gather local host candidates (`local_host_candidates()`)
2. Re-probe STUN (`probe_reflect_addr()`)
3. Send updated candidates to the peer (`CandidateUpdate` signal message)
4. Attempt new dual-path race for path upgrade
When the device's IP address changes, the system now:
1. Re-gather local host candidates (`local_host_candidates()`)
2. Re-probe STUN (`stun::discover_reflexive()` + `portmap::acquire_port_mapping()`)
3. Send updated candidates to the peer (`CandidateUpdate` signal message)
4. Relay forwards `CandidateUpdate` to peer (same pattern as `MediaPathReport`) ✅
5. Peer receives and can parse via `IceAgent::apply_peer_update()`
6. Attempt new dual-path race for path upgrade — **NOT YET WIRED** (transport hot-swap)
`NetworkMonitor.onIpChanged` fires on `onLinkPropertiesChanged` — the hook is ready, but the signaling and re-racing logic is not yet implemented.
`NetworkMonitor.onIpChanged` fires on `onLinkPropertiesChanged` — the hook is ready.
The signaling plane is fully implemented via `IceAgent` + `CandidateUpdate`.
Remaining: wire `onIpChanged` → JNI → `pending_ice_regather` AtomicBool → recv task → `ice_agent.re_gather()` → transport swap.
New modules added in Phase 8 (Tailscale-inspired):
- `crates/wzp-client/src/ice_agent.rs` — candidate lifecycle management
- `crates/wzp-client/src/stun.rs` — public STUN server probing (independent of relay)
- `crates/wzp-client/src/portmap.rs` — NAT-PMP/PCP/UPnP port mapping
- `crates/wzp-client/src/netcheck.rs` — comprehensive network diagnostic
## Testing


@@ -138,9 +138,75 @@ The existing relay connection carries `IceCandidate` signals. No new infrastruct
## Milestones
| Phase | Scope | Effort |
|-------|-------|--------|
| 1 | STUN client + candidate gathering | 2 days |
| 2 | QUIC hole punching + identity verification | 3 days |
| 3 | Adaptive quality on P2P connection | 2 days |
| 4 | Hybrid mode (relay + P2P, seamless migration) | 3 days |
| Phase | Scope | Effort | Status |
|-------|-------|--------|--------|
| 1 | STUN client + candidate gathering | 2 days | Done |
| 2 | QUIC hole punching + identity verification | 3 days | Done |
| 3 | Adaptive quality on P2P connection | 2 days | Done (#23) |
| 4 | Hybrid mode (relay + P2P, seamless migration) | 3 days | Done |
| 5 | Single-socket Nebula (shared signal+direct endpoint) | 2 days | Done |
| 6 | ICE path negotiation + dual-path race | 3 days | Done |
| 7 | IPv6 dual-socket | 2 days | Done (but `dual_path.rs` integration tests broken — missing `ipv6_endpoint` arg) |
| 8.1 | Public STUN client (RFC 5389) | 1 day | Done |
| 8.2 | PCP/PMP/UPnP port mapping | 2 days | Done |
| 8.3 | Mid-call ICE re-gathering + CandidateUpdate signal | 2 days | Done (signal plane; transport hot-swap TODO) |
| 8.4 | Netcheck diagnostic | 1 day | Done |
| 8.5 | Region-based relay selection (data model) | 1 day | Done |
| 8.6a | Hard NAT: port allocation detection | 1 day | Done |
| 8.6b | Hard NAT: sequential port prediction signal | 1 day | Done (signal + prediction fn; dial integration pending) |
| 8.6c | Hard NAT: birthday attack (256×1024 probes) | 3 days | Not started |
| 8.6d | Hard NAT: hybrid waterfall + background upgrade | 2 days | Not started |
## Implementation Status (2026-04-13)
Phases 1-2, 4-7 are implemented. First P2P call completed 2026-04-12.
### Known regression
Phase 7 added `ipv6_endpoint: Option<Endpoint>` parameter to `race()` in `crates/wzp-client/src/dual_path.rs` but the 3 test call sites in `crates/wzp-client/tests/dual_path.rs` (lines 111, 153, 191) were not updated — they pass 6 args instead of 7. Fix: add `None,` after the `shared_endpoint` arg in each call.
## Update (2026-04-13)
P2P adaptive quality (#23) now implemented:
- Both peers self-observe network quality from QUIC path stats
- Quality reports generated every ~1s and attached to outgoing packets
- AdaptiveQualityController drives codec switching on both P2P and relay calls
## Update (2026-04-14): Phase 8 — Tailscale-Inspired Enhancements
Added 5 new modules to bring NAT traversal capability close to Tailscale's:
### Phase 8.1: Public STUN Client (Done)
- `stun.rs`: RFC 5389 Binding Request/Response over raw UDP
- Independent reflexive discovery via public STUN servers (Google, Cloudflare)
- `detect_nat_type_with_stun()` combines relay + STUN probes for higher confidence
- STUN fallback in desktop's `try_reflect_own_addr()` when relay reflection fails
### Phase 8.2: PCP/PMP/UPnP Port Mapping (Done)
- `portmap.rs`: NAT-PMP (RFC 6886), PCP (RFC 6887), UPnP IGD
- Gateway discovery (macOS + Linux), try NAT-PMP → PCP → UPnP in sequence
- New candidate type: `PeerCandidates.mapped` + signal fields `caller_mapped_addr`/`callee_mapped_addr`/`peer_mapped_addr`
- Dial order: host → mapped → reflexive (mapped helps on symmetric NATs)
### Phase 8.3: Mid-Call ICE Re-Gathering (Done — signal plane)
- `ice_agent.rs`: `IceAgent` with `gather()`, `re_gather()`, `apply_peer_update()`
- `SignalMessage::CandidateUpdate` with monotonic generation counter
- Relay forwards `CandidateUpdate` like `MediaPathReport`
- Desktop handles and emits to JS frontend
- Transport hot-swap: designed but not yet wired into live call engine
### Phase 8.4: Netcheck Diagnostic (Done)
- `netcheck.rs`: comprehensive network diagnostic (NAT type, reflexive addr, IPv4/v6, port mapping, relay latencies)
- CLI: `wzp-client --netcheck <relay>`
### Phase 8.5: Region-Based Relay Selection (Done — data model)
- `relay_map.rs`: `RelayMap` sorted by RTT with `preferred()` selection
- `RegisterPresenceAck` extended with `relay_region` + `available_relays`
### Phase 8.6: Hard NAT Traversal (Phase A done, B-D pending)
- **Phase A (Done)**: Port allocation pattern detection — `PortAllocation` enum (`PortPreserving`/`Sequential{delta}`/`Random`/`Unknown`), `detect_port_allocation()` probes N STUN servers from single socket, `classify_port_allocation()` with wraparound + jitter tolerance, `predict_ports()` for sequential NATs
- **Phase B (signal ready)**: `HardNatProbe` signal message carries `port_sequence`, `allocation`, `external_ip` — relay forwarding implemented. Actual dial-to-predicted-ports integration into `dual_path::race()` pending.
- **Phase C (not started)**: Birthday attack (256 sockets × 1024 probes) for random NATs
- **Phase D (not started)**: Hybrid waterfall with background relay-to-direct upgrade
- `NetcheckReport.port_allocation` populated automatically from `detect_port_allocation()`
- See `docs/PRD-hard-nat.md` for full design

docs/PRD-portmap.md Normal file

@@ -0,0 +1,92 @@
# PRD: NAT Port Mapping (PCP/PMP/UPnP)
> Phase: Implemented
> Status: Done (2026-04-14)
> Crate: wzp-client, wzp-proto, wzp-relay
## Problem
WarzonePhone falls back to relay-only when the client is behind a symmetric NAT (different external port per destination). The STUN-discovered reflexive address won't match what a peer sees, so direct hole-punching fails. Tailscale reports ~70% of consumer routers support NAT-PMP, PCP, or UPnP — protocols that let clients request explicit port mappings, making symmetric NATs traversable.
## Solution
Implement all three port mapping protocols, tried in sequence (NAT-PMP -> PCP -> UPnP). When a mapping is acquired, advertise the mapped address as a new candidate type alongside reflexive and host candidates. The relay cross-wires it into `CallSetup.peer_mapped_addr` so the peer can dial it.
## Implementation
### New Module: `crates/wzp-client/src/portmap.rs`
**NAT-PMP (RFC 6886)**:
- UDP to gateway:5351
- External address request (opcode 0) -> returns router's public IP
- Map UDP request (opcode 1) -> returns mapped external port + lifetime
- 12-byte request, 16-byte response
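For reference, building the 12-byte map-UDP request is small enough to show inline; this sketch follows the RFC 6886 layout, not the project's exact helper:
```rust
/// Build the 12-byte NAT-PMP "map UDP" request (RFC 6886): version 0,
/// opcode 1, two reserved bytes, internal port, suggested external port,
/// requested lifetime in seconds. Sent over UDP to gateway:5351.
fn natpmp_map_udp_request(internal_port: u16, external_port: u16, lifetime_s: u32) -> [u8; 12] {
    let mut buf = [0u8; 12];
    buf[0] = 0; // version
    buf[1] = 1; // opcode: map UDP
    // buf[2..4] reserved, already zero
    buf[4..6].copy_from_slice(&internal_port.to_be_bytes());
    buf[6..8].copy_from_slice(&external_port.to_be_bytes());
    buf[8..12].copy_from_slice(&lifetime_s.to_be_bytes());
    buf
}
```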
**PCP (RFC 6887)**:
- Same gateway:5351, version 2
- MAP opcode with client IP as IPv4-mapped IPv6
- 60-byte request/response with 12-byte nonce for anti-spoofing
- Superset of NAT-PMP, supports IPv6
**UPnP IGD**:
- SSDP M-SEARCH to 239.255.255.250:1900 for InternetGatewayDevice discovery
- Parse LOCATION header -> fetch device description XML -> find WANIPConnection controlURL
- SOAP `GetExternalIPAddress` -> router's public IP
- SOAP `AddPortMapping` -> maps the QUIC port
**Gateway discovery**:
- macOS: `route -n get default` (parse `gateway:` line)
- Linux/Android: `/proc/net/route` (parse hex gateway for 00000000 destination)
**Public API**:
- `acquire_port_mapping(internal_port, local_ip)` -> tries all 3, first success wins
- `release_port_mapping(mapping)` -> best-effort cleanup (lifetime=0 for NAT-PMP)
- `spawn_refresh(mapping)` -> background task renewing at half-lifetime
- `default_gateway()` -> cross-platform gateway discovery
### Signal Protocol Extensions
| Message | New Field | Purpose |
|---------|-----------|---------|
| `DirectCallOffer` | `caller_mapped_addr: Option<String>` | Caller's port-mapped address |
| `DirectCallAnswer` | `callee_mapped_addr: Option<String>` | Callee's port-mapped address |
| `CallSetup` | `peer_mapped_addr: Option<String>` | Relay cross-wires peer's mapped addr |
All fields use `#[serde(default, skip_serializing_if)]` for backward compatibility.
### Relay Cross-Wiring
`CallRegistry` extended with `caller_mapped_addr` / `callee_mapped_addr` fields + setter methods. The relay:
1. Extracts `caller_mapped_addr` from `DirectCallOffer`, stores in registry
2. Extracts `callee_mapped_addr` from `DirectCallAnswer`, stores in registry
3. Cross-wires into `CallSetup`: caller gets callee's mapped addr as `peer_mapped_addr`, and vice versa
### Candidate Priority
`PeerCandidates.mapped` added to `dual_path.rs`. Dial order:
1. Host (LAN) candidates — fastest on same-LAN
2. **Port-mapped** — stable even behind symmetric NATs
3. Server-reflexive (STUN) — standard hole-punching
4. Relay — always-available fallback
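A sketch of how the dial order above could be flattened into one dedup'd candidate list; the real `PeerCandidates` in `dual_path.rs` likely carries more metadata per candidate, so this is shape-only:

```rust
use std::net::SocketAddr;

#[derive(Default)]
struct PeerCandidates {
    host: Vec<SocketAddr>,      // LAN addresses
    mapped: Vec<SocketAddr>,    // PCP/PMP/UPnP port mappings
    reflexive: Vec<SocketAddr>, // STUN / relay-reflected
    relay: Vec<SocketAddr>,     // always-available fallback
}

impl PeerCandidates {
    /// Dedup while preserving host → mapped → reflexive → relay priority.
    fn dial_order(&self) -> Vec<SocketAddr> {
        let mut out = Vec::new();
        for addr in self.host.iter()
            .chain(&self.mapped)
            .chain(&self.reflexive)
            .chain(&self.relay)
        {
            if !out.contains(addr) {
                out.push(*addr);
            }
        }
        out
    }
}

fn main() {
    let mut c = PeerCandidates::default();
    c.host.push("192.168.1.5:4433".parse().unwrap());
    c.mapped.push("203.0.113.9:61000".parse().unwrap());
    c.relay.push("198.51.100.1:4433".parse().unwrap());
    println!("{:?}", c.dial_order());
}
```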
### Desktop Integration
Both `place_call()` and `answer_call()` call `acquire_port_mapping()` using the signal endpoint's local port. Privacy-mode answers (`AcceptGeneric`) skip portmap to keep the address hidden.
## Files
| File | Change |
|------|--------|
| `crates/wzp-client/src/portmap.rs` | New — NAT-PMP/PCP/UPnP client |
| `crates/wzp-client/src/dual_path.rs` | `PeerCandidates.mapped` field + dial_order update |
| `crates/wzp-proto/src/packet.rs` | `caller/callee_mapped_addr` + `peer_mapped_addr` fields |
| `crates/wzp-relay/src/call_registry.rs` | `caller/callee_mapped_addr` fields + setters |
| `crates/wzp-relay/src/main.rs` | Extract, store, cross-wire mapped addrs |
| `desktop/src-tauri/src/lib.rs` | Call portmap in place_call/answer_call |
## Testing
- 18 unit tests: NAT-PMP encoding, UPnP XML parsing (5 variants including real-world router XML), URL host extraction, error Display, protocol serde, PortMapping serialization, gateway detection, constants verification
- 2 integration tests (`#[ignore]`): gateway discovery, acquire_mapping
- 9 PeerCandidates tests: dial_order with all types, dedup, is_empty edge cases
- 12 protocol roundtrip tests: offer/answer/setup with mapped addr, backward compat without


@@ -62,6 +62,16 @@ if debug_tap_enabled {
### Effort: 0.5 day
### Implementation Status (2026-04-13)
Fully implemented. `--debug-tap <room>` (or `*` for all rooms) logs:
- **Per-packet metadata** (`TAP`): direction, addr, seq, codec, timestamp, FEC fields, payload size, fan_out
- **Signal events** (`TAP SIGNAL`): `RoomUpdate` (count + participant names), `QualityDirective` (codec + reason), other signals by discriminant
- **Lifecycle events** (`TAP EVENT`): participant join (id, addr, alias), participant leave (id, addr, forwarded count, or room closed)
All output uses tracing `target: "debug_tap"` so it can be filtered with `RUST_LOG=debug_tap=info`.
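A small sketch of the target-based filtering described above, assuming the `tracing` and `tracing-subscriber` (with `env-filter`) crates; the field names are examples, not the exact TAP schema:

```rust
fn main() {
    // Honors RUST_LOG, e.g. RUST_LOG=debug_tap=info to see only tap output.
    tracing_subscriber::fmt()
        .with_env_filter(tracing_subscriber::EnvFilter::from_default_env())
        .init();

    // The explicit target keeps tap lines separable from normal relay logs.
    tracing::info!(
        target: "debug_tap",
        seq = 1042u64,
        codec = "Opus48k",
        payload_len = 326usize,
        "TAP"
    );
}
```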
---
## 2. Full Protocol Analyzer (Standalone Tool)
@@ -176,3 +186,15 @@ wzp-analyzer --replay capture.wzp --report report.html
- Modifying packets in transit
- Automated quality scoring (MOS estimation)
- Video support
## Implementation Status (2026-04-13)
All phases implemented:
- Phase 1 (Observer + stats): wzp-analyzer binary, passive room observer, per-participant stats — DONE
- Phase 2 (TUI): ratatui display with color-coded loss severity — DONE
- Phase 3 (Capture/Replay): Binary .wzp format + CaptureReader for offline replay — DONE
- Phase 4 (HTML report): Self-contained with Chart.js loss/jitter timelines — DONE
- Phase 5 (Encrypted decode): Stub — SFU E2E encryption requires session context. Header-only analysis works. — PARTIAL
Binary: `cargo build --bin wzp-analyzer`
Usage: `wzp-analyzer relay:4433 --room test [--capture out.wzp] [--html report.html] [--no-tui]`

docs/PRD-public-stun.md Normal file

@@ -0,0 +1,68 @@
# PRD: Public STUN Client
> Phase: Implemented
> Status: Done (2026-04-14)
> Crate: wzp-client
## Problem
WarzonePhone's reflexive address discovery depends entirely on relay-based `Reflect` messages over an authenticated QUIC signal channel. If the relay is unreachable, overloaded, or not yet connected, the client cannot discover its public IP:port for P2P hole-punching. This single point of failure means call setup is delayed or falls back to relay-only unnecessarily.
Tailscale solves this by querying multiple public STUN servers in parallel, independent of its DERP relay infrastructure.
## Solution
Implement a minimal RFC 5389 STUN Binding client over raw UDP that queries public STUN servers (Google, Cloudflare) in parallel. This provides:
1. **Independent reflexive discovery** — works without any relay connection
2. **Redundancy** — STUN fallback when relay reflection fails
3. **Better NAT classification** — more probes = higher confidence in Cone vs Symmetric detection
4. **Faster call setup** — STUN can run before signal registration completes
## Implementation
### New Module: `crates/wzp-client/src/stun.rs`
**Wire format** (RFC 5389):
- 20-byte header: type (u16) + length (u16) + magic cookie (0x2112A442) + transaction ID (12 bytes)
- Binding Request (0x0001): no attributes, just the header
- Binding Response (0x0101): parses XOR-MAPPED-ADDRESS (0x0020, preferred) and MAPPED-ADDRESS (0x0001, fallback)
- XOR decoding: port XOR'd with top 16 bits of magic cookie, IPv4 XOR'd with cookie, IPv6 XOR'd with cookie || txn ID
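A sketch of the IPv4 XOR-MAPPED-ADDRESS decode described above (RFC 5389 attribute layout: reserved byte, family, X-Port, X-Address); the real parser in `stun.rs` also handles IPv6 and the plain MAPPED-ADDRESS fallback:

```rust
use std::net::{IpAddr, Ipv4Addr, SocketAddr};

const MAGIC_COOKIE: u32 = 0x2112_A442;

/// Decode the 8-byte IPv4 XOR-MAPPED-ADDRESS attribute value (family 0x01).
fn decode_xor_mapped_v4(value: &[u8]) -> Option<SocketAddr> {
    if value.len() < 8 || value[1] != 0x01 {
        return None; // too short or not the IPv4 family
    }
    // Port is XOR'd with the top 16 bits of the magic cookie.
    let port = u16::from_be_bytes([value[2], value[3]]) ^ (MAGIC_COOKIE >> 16) as u16;
    // The IPv4 address is XOR'd with the full cookie.
    let ip = u32::from_be_bytes([value[4], value[5], value[6], value[7]]) ^ MAGIC_COOKIE;
    Some(SocketAddr::new(IpAddr::V4(Ipv4Addr::from(ip)), port))
}

fn main() {
    // A server reporting 203.0.113.7:54321 sends the XOR'd form.
    let raw_port = (54321u16 ^ (MAGIC_COOKIE >> 16) as u16).to_be_bytes();
    let raw_ip = (u32::from(Ipv4Addr::new(203, 0, 113, 7)) ^ MAGIC_COOKIE).to_be_bytes();
    let mut value = vec![0x00, 0x01];
    value.extend_from_slice(&raw_port);
    value.extend_from_slice(&raw_ip);
    let expected: SocketAddr = "203.0.113.7:54321".parse().unwrap();
    assert_eq!(decode_xor_mapped_v4(&value), Some(expected));
}
```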
**Public API**:
- `stun_reflect(socket, server, timeout)` — single-server probe with one retry on first-packet timeout
- `discover_reflexive(config)` — parallel probe of N servers, first success wins
- `probe_stun_servers(config)` — all-server probe returning `Vec<NatProbeResult>` for NAT classification
- `resolve_stun_server(host_port)` — DNS resolution preferring IPv4
**Default servers**: `stun.l.google.com:19302`, `stun1.l.google.com:19302`, `stun.cloudflare.com:3478`
**Error handling**: `StunError` enum — Io, Timeout, Malformed, TxnMismatch, ErrorResponse, NoMappedAddress, DnsError
### Integration Points
1. **`reflect.rs`**: New `detect_nat_type_with_stun()` runs relay probes and STUN probes concurrently via `tokio::join!`, merges results, re-classifies
2. **Desktop `lib.rs`**: `try_reflect_own_addr()` falls back to `try_stun_fallback()` when relay reflection fails or times out
3. **Desktop `detect_nat_type` command**: Uses `detect_nat_type_with_stun()` for combined relay + STUN classification
### Design Decisions
- **Separate UDP socket** per STUN probe — can't share the QUIC socket (quinn owns its I/O driver)
- **No external crate** — RFC 5389 Binding is ~200 lines of code, no need for `stun-rs` or `webrtc-rs`
- **Retry once** at half-timeout — handles the "first-packet problem" where some NATs drop the initial UDP packet to a new destination
- **IPv4 preferred** for DNS resolution — Phase 7 IPv6 is still flaky
## Files
| File | Change |
|------|--------|
| `crates/wzp-client/src/stun.rs` | New — STUN client |
| `crates/wzp-client/src/lib.rs` | Add `pub mod stun` |
| `crates/wzp-client/src/reflect.rs` | Add `detect_nat_type_with_stun()` |
| `crates/wzp-client/Cargo.toml` | Add `rand` dependency |
| `desktop/src-tauri/src/lib.rs` | STUN fallback in `try_reflect_own_addr()`, STUN in `detect_nat_type` |
## Testing
- 22 unit tests: encode/decode roundtrips, XOR-MAPPED-ADDRESS (IPv4, IPv6, high port), MAPPED-ADDRESS fallback (IPv4, IPv6), unknown family, attribute padding, unknown attributes skipped, truncated attributes, error response, bad cookie, txn mismatch, too short, no mapped address, XOR preferred over mapped, error Display, default config, empty servers
- 2 integration tests (`#[ignore]`): query `stun.l.google.com`, multi-server probe


@@ -0,0 +1,314 @@
# PRD: Relay Concurrency — DashMap Room Sharding
## Problem
The relay's media forwarding hot path routes every packet through a single `Arc<Mutex<RoomManager>>`. In a room with N participants, all N per-participant tasks compete for this one lock on every packet. The lock hold time is short (~1ms, no I/O), but the serialization means a 100-participant room effectively runs single-threaded despite having a multi-core tokio runtime.
Separately, the federation manager holds `peer_links` locked across multiple network sends, meaning a slow federation peer blocks all others.
### Measured bottleneck (from code audit)
```
Per-packet hot path (room.rs:748-757, 968-976):
lock(room_mgr)
→ observe_quality() O(N) iterate qualities HashMap
→ others() O(M) clone Vec<ParticipantSender>
unlock
→ fan-out sends sequential, no lock held
```
Lock contention = O(N) per room per packet, where N = participants in the room.
### Current lock inventory (hot path only)
| Lock | Location | Hold Duration | I/O While Locked | Frequency |
|------|----------|---------------|-------------------|-----------|
| `RoomManager` | room.rs:749, 968 | ~1ms | No | Every packet, every participant |
| `RoomManager` | room.rs:845, 1041 | <1ms | No | Every 5s per participant |
| `RoomManager` | room.rs:870 | ~1ms | No (explicit `drop` before broadcast) | On leave |
| `peer_links` | federation.rs:409 | N × send latency | **YES** (`send_raw_datagram` in loop) | Every federation packet |
| `peer_links` | federation.rs:216 | N × send latency | **YES** (`send_signal` in loop) | Every federation signal |
| `dedup` | federation.rs:1066 | <1ms | No | Every federation ingress packet |
| `rate_limiters` | federation.rs:1113 | <1ms | No | Every federation ingress packet |
### Scaling impact
| Room Size | Effective Core Usage | Bottleneck |
|-----------|---------------------|------------|
| 3 people × 100 rooms | All cores | None |
| 10 people × 10 rooms | Most cores | Mild contention per room |
| 100 people × 1 room | ~1 core | RoomManager lock |
| 1000 people × 1 room | ~1 core | Severely serialized |
## Goals
- Eliminate the global RoomManager Mutex as a serialization point for media forwarding
- Allow per-room parallelism: packets in room A don't block packets in room B
- Fix federation `peer_links` lock held across network sends
- Maintain correctness: no double-delivery, no stale participant lists
- Zero-copy or minimal-clone for fan-out participant lists
- Keep the refactor incremental — each phase independently shippable
## Non-Goals
- Lock-free data structures (overkill for our scale; DashMap or per-room Mutex is sufficient)
- Changing the SFU forwarding model (no mixing, no transcoding)
- Optimizing single-room beyond ~1000 participants (conferencing at that scale needs a different architecture)
- Changing the wire protocol or client behavior
## Design Options Evaluated
### Option A: Per-Room `Arc<Mutex<Room>>`
**Approach:** Replace `HashMap<String, Room>` inside RoomManager with `HashMap<String, Arc<Mutex<Room>>>`. The outer HashMap is protected by a short-lived lock for room lookup only; the per-room lock protects participant state.
```rust
struct RoomManager {
rooms: Mutex<HashMap<String, Arc<Mutex<Room>>>>, // outer: room lookup
// ...
}
// Hot path becomes:
let room_arc = {
let rooms = room_mgr.rooms.lock().await;
rooms.get(&room_name).cloned() // Arc clone, <1ns
}; // outer lock released
if let Some(room) = room_arc {
let room = room.lock().await; // per-room lock
let others = room.others(participant_id);
drop(room);
// fan-out sends...
}
```
**Pros:**
- Rooms are fully independent — room A's lock doesn't block room B
- Minimal code change (~50 lines)
- Per-room lock contention = O(participants in that room), not O(total participants)
- Outer lock held for <1μs (just a HashMap get + Arc clone)
**Cons:**
- Two-level locking (room lookup + room lock) — slightly more complex
- Room creation/deletion still serialized through outer lock (acceptable, rare operation)
- Quality tracking needs to move into the Room struct
**Verdict: Best option. Biggest win for least effort.**
### Option B: `DashMap<String, Room>`
**Approach:** Replace `Mutex<HashMap<String, Room>>` with `dashmap::DashMap<String, Room>`. DashMap uses internal sharding (default 64 shards) with per-shard RwLocks.
```rust
struct RoomManager {
rooms: DashMap<String, Room>,
}
// Hot path:
if let Some(room) = room_mgr.rooms.get(&room_name) {
let others = room.others(participant_id); // read lock on shard
drop(room); // release shard lock
// fan-out sends...
}
```
**Pros:**
- No explicit locking in user code
- Built-in sharding (64 shards by default)
- Read-heavy workload benefits from RwLock per shard
**Cons:**
- New dependency (`dashmap` crate)
- DashMap guards can't be held across `.await` points (not `Send`)
- Mutable operations (join/leave/quality update) need `get_mut()` which takes exclusive shard lock
- Less control over lock granularity than Option A
- Quality tracking across rooms becomes awkward (can't iterate all rooms while holding one shard)
**Verdict: Good but Option A is simpler and more explicit.**
### Option C: Channel-Based Fan-Out
**Approach:** Replace direct `send_media()` calls with per-participant `mpsc::Sender` channels. Room join registers a sender; the forwarding loop just does `tx.send(pkt)` which is lock-free.
```rust
struct Room {
participants: Vec<(ParticipantId, mpsc::Sender<MediaPacket>)>,
}
// Each participant's task:
let (tx, mut rx) = mpsc::channel(64);
room_mgr.join(room, participant_id, tx);
// Forwarding in recv loop:
let senders = room.others(participant_id); // Vec<mpsc::Sender> clone
for tx in &senders {
let _ = tx.try_send(pkt.clone()); // non-blocking, no lock
}
```
**Pros:**
- Fan-out is completely lock-free (channel send is atomic)
- Backpressure per participant (full channel = drop packet, not block others)
- Natural decoupling: recv task → channel → send task
**Cons:**
- Requires cloning MediaPacket per participant (currently we clone ParticipantSender Arc, much cheaper)
- Additional memory: 64-packet channel buffer × N participants
- Still need a lock to get the sender list (unless we snapshot on join/leave)
- Adds latency: channel hop + wake adds ~1-5μs vs direct send
**Verdict: Over-engineered for current scale. Consider for 1000+ participant rooms.**
### Option D: Snapshot-on-Change (Optimistic Read)
**Approach:** Maintain a read-optimized `Arc<Vec<ParticipantSender>>` snapshot per room. Updated atomically on join/leave (rare). Readers just `Arc::clone()` — no lock at all.
```rust
struct Room {
participants: Vec<Participant>,
/// Atomically-updated snapshot of all senders (rebuilt on join/leave).
sender_snapshot: Arc<ArcSwap<Vec<ParticipantSender>>>,
}
// Hot path (zero locking!):
let senders = room.sender_snapshot.load(); // atomic load, ~1ns
for sender in senders.iter() {
if sender.id != participant_id { ... }
}
```
**Pros:**
- Zero lock contention on hot path — just an atomic pointer load
- Rebuild cost amortized over all packets between joins/leaves
- `arc-swap` crate is battle-tested and tiny
**Cons:**
- New dependency (`arc-swap`)
- Quality tracking still needs a mutable path (separate concern)
- Snapshot doesn't include mutable room state (quality tiers)
- More complex join/leave (must rebuild snapshot atomically)
**Verdict: Best theoretical performance, but adds complexity. Consider if DashMap proves insufficient.**
## Recommended Implementation: Option B (DashMap) + Federation Fix
DashMap is the right tool here. The original objections don't hold up:
- "Guards can't be held across `.await`" — we already drop locks before any async sends
- "Less control" — DashMap's 64 internal shards give finer granularity than manual per-room locks
- "New dependency" — one crate, battle-tested, widely used in the Rust ecosystem
DashMap's advantages over manual per-room `Arc<Mutex<Room>>`:
- **No two-level locking** — single `rooms.get()` vs outer-lock → Arc clone → drop → inner-lock
- **Read/write separation** — `get()` is a shared shard lock, multiple rooms on the same shard can read concurrently
- **Less code** — no manual Arc/Mutex wrapping, no explicit lock choreography
- **Iteration without global lock** — federation room announcements don't block media forwarding
### Phase 1: DashMap Room Storage (Biggest Win)
1. Add `dashmap` dependency to `wzp-relay`
2. Replace `rooms: HashMap<String, Room>` with `rooms: DashMap<String, Room>`
3. Move `qualities` and `room_tiers` into the `Room` struct (per-room state, not global)
4. RoomManager no longer needs a wrapping Mutex — it becomes `Arc<RoomManager>` directly
5. Per-packet hot path: `rooms.get(&name)` takes a shared shard lock, releases on drop
```rust
pub struct RoomManager {
rooms: DashMap<String, Room>,
acl: Option<HashMap<String, HashSet<String>>>, // read-only after init
event_tx: broadcast::Sender<RoomEvent>,
}
struct Room {
participants: Vec<Participant>,
qualities: HashMap<ParticipantId, ParticipantQuality>,
current_tier: Tier,
}
// Hot path becomes:
let (others, directive) = if let Some(mut room) = room_mgr.rooms.get_mut(&room_name) {
let directive = if let Some(ref qr) = pkt.quality_report {
room.observe_quality(participant_id, qr)
} else {
None
};
let o = room.others(participant_id);
(o, directive)
} else {
(vec![], None)
};
// Shard lock released here — fan-out sends are lock-free
```
**Files to modify:**
- `crates/wzp-relay/Cargo.toml` — add `dashmap` dependency
- `crates/wzp-relay/src/room.rs` — RoomManager struct, Room struct, all methods
- `crates/wzp-relay/src/lib.rs` — change from `Arc<Mutex<RoomManager>>` to `Arc<RoomManager>`
- `crates/wzp-relay/src/main.rs` — update RoomManager construction and all `.lock().await` call sites
- `crates/wzp-relay/src/federation.rs` — update room_mgr usage (no more `.lock().await`)
**Key behavior change:** `Arc<Mutex<RoomManager>>` → `Arc<RoomManager>`. Every call site that does `room_mgr.lock().await.some_method()` becomes `room_mgr.some_method()` directly. The DashMap handles internal locking.
**Concurrency improvement:**
- Before: 100 rooms × 10 people = all 1000 tasks compete for 1 Mutex
- After: 100 rooms × 10 people = distributed across 64 shards, ~15 tasks per shard average
- Within a room: participants still serialize through the shard lock, but hold time is <0.1ms for `get()` and `others()` (just Vec clone of Arcs)
### Phase 2: Federation Lock Fix
Clone the peer list, release lock, then send:
```rust
pub async fn forward_to_peers(&self, room_hash: &[u8; 8], media_data: &Bytes) {
let peers: Vec<_> = {
let links = self.peer_links.lock().await;
links.values().map(|l| (l.label.clone(), l.transport.clone())).collect()
}; // lock released immediately
for (label, transport) in &peers {
// send without holding lock — slow peer doesn't block others
}
}
```
Also apply to `broadcast_signal()` and `send_signal_to_peer()`.
**Files to modify:**
- `crates/wzp-relay/src/federation.rs` — 3 methods
**Concurrency improvement:** A slow federation peer no longer blocks all other peers' media delivery.
### Phase 3: Quality Tracking Optimization (Optional)
With DashMap, quality tracking uses `get_mut()` (exclusive shard lock) on every packet that carries a QualityReport. For rooms where quality reports are frequent, this creates write contention on the shard.
Option: Move quality observation to a background task:
1. Per-participant `AtomicU8` for latest loss/RTT (lock-free write from hot path)
2. Background task every 1s reads atomics, computes tiers, broadcasts directives
3. Hot path becomes read-only: `rooms.get()` (shared lock) → `others()` → done
**Reduces shard lock from exclusive (`get_mut`) to shared (`get`) on every packet.**
## Verification
1. **Correctness:** `cargo test -p wzp-relay` — all existing tests must pass
2. **Compile check:** `cargo check --workspace` — no regressions
3. **Load test:** 10 rooms × 10 participants, verify rooms forward concurrently
4. **Large room:** 1 room × 50 participants, no deadlocks
5. **Federation:** 3 relays, media bridges correctly with new lock pattern
6. **Benchmark:** Before/after packets-per-second on multi-core with `wzp-bench`
## Effort
- Phase 1: 1 day (DashMap migration + test updates)
- Phase 2: 0.5 day (federation clone-and-release)
- Phase 3: 0.5 day (optional, quality tracking with atomics)
- Total: 1.5-2 days
## Implementation Status (2026-04-13)
Phase 1 (DashMap): DONE — global Mutex → DashMap<String, Room> with 64 shards
Phase 2 (Federation clone-before-send): DONE — forward_to_peers, broadcast_signal, send_signal_to_peer
Phase 3 (Quality atomics): NOT DONE — optional optimization
See also: docs/REFACTOR-relay-concurrency.md for the full post-refactor analysis.


@@ -0,0 +1,88 @@
# PRD: Region-Based Relay Selection
> Phase: Implemented (data model)
> Status: Done (2026-04-14)
> Crate: wzp-client, wzp-proto, wzp-relay
## Problem
Clients are configured with a single relay address. With multiple relays in the federation mesh, the client should automatically discover all available relays and select the lowest-latency one. Currently there is no mechanism for the relay to advertise its mesh peers to clients, and no client-side data structure to track relay health over time.
## Solution
1. Relays advertise their region and mesh peers in `RegisterPresenceAck`
2. Clients maintain a `RelayMap` sorted by measured RTT
3. `preferred()` returns the best relay for call setup
## Implementation
### New Module: `crates/wzp-client/src/relay_map.rs`
**RelayEntry**:
```rust
pub struct RelayEntry {
pub name: String,
pub addr: SocketAddr,
pub region: Option<String>,
pub rtt_ms: Option<u32>,
pub last_probed: Option<Instant>,
pub reachable: bool,
}
```
**RelayMap API**:
- `upsert(name, addr, region)` — add or update a relay entry
- `update_rtt(addr, rtt_ms)` — record probe result, marks reachable, re-sorts
- `mark_unreachable(addr)` — sorts unreachable entries to end
- `preferred()` -> `Option<&RelayEntry>` — lowest RTT reachable relay
- `populate_from_ack(relays, region)` — parse `RegisterPresenceAck.available_relays` (format: `"name|addr"`)
- `needs_reprobe(max_age)` — true if any entry has stale or missing probe
- `stale_entries(max_age)` — list of entries needing fresh probes
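A minimal sketch of the selection logic behind the API above: parse the `"name|addr"` strings carried in `RegisterPresenceAck.available_relays`, then pick the lowest-RTT reachable entry. The stand-in `Entry` struct mirrors `RelayEntry` but omits the probe bookkeeping, and requiring a measured RTT in `preferred()` is an assumption:

```rust
use std::net::SocketAddr;

struct Entry {
    name: String,
    addr: SocketAddr,
    rtt_ms: Option<u32>,
    reachable: bool,
}

/// Parse "name|addr" strings; malformed entries are skipped.
fn parse_available(relays: &[String]) -> Vec<Entry> {
    relays
        .iter()
        .filter_map(|s| {
            let (name, addr) = s.split_once('|')?;
            Some(Entry {
                name: name.to_string(),
                addr: addr.parse().ok()?,
                rtt_ms: None,
                reachable: true,
            })
        })
        .collect()
}

/// Lowest-RTT reachable relay, mirroring RelayMap::preferred().
fn preferred(entries: &[Entry]) -> Option<&Entry> {
    entries
        .iter()
        .filter(|e| e.reachable && e.rtt_ms.is_some())
        .min_by_key(|e| e.rtt_ms.unwrap())
}

fn main() {
    let mut entries = parse_available(&[
        "eu-west-1|198.51.100.10:4433".to_string(),
        "us-east-1|192.0.2.20:4433".to_string(),
    ]);
    entries[0].rtt_ms = Some(48);
    entries[1].rtt_ms = Some(112);
    let best = preferred(&entries).unwrap();
    println!("dial {} at {}", best.name, best.addr);
}
```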
### Signal Protocol Extension
`RegisterPresenceAck` extended:
```rust
RegisterPresenceAck {
success: bool,
error: Option<String>,
relay_build: Option<String>,
relay_region: Option<String>, // NEW
available_relays: Vec<String>, // NEW — "name|addr" format
}
```
### Relay Config Extension
`RelayConfig` extended:
```rust
pub region: Option<String>, // e.g., "us-east", "eu-west"
pub advertised_addr: Option<SocketAddr>, // for available_relays population
```
### Relay Population
On `RegisterPresenceAck`, the relay populates:
- `relay_region` from `config.region`
- `available_relays` from `config.peers` (label|url format)
### Deferred
- **Automatic relay switching** — using `preferred()` to select relay during call setup instead of hardcoded config
- **Background reprobing** — periodic RTT measurements to keep the relay map fresh
- **Cross-relay RTT estimation** — using mesh probe data to estimate combined caller-RTT + callee-RTT for optimal relay placement
## Files
| File | Change |
|------|--------|
| `crates/wzp-client/src/relay_map.rs` | New — RelayMap + RelayEntry |
| `crates/wzp-client/src/lib.rs` | Add `pub mod relay_map` |
| `crates/wzp-proto/src/packet.rs` | `relay_region` + `available_relays` on RegisterPresenceAck |
| `crates/wzp-relay/src/config.rs` | `region` + `advertised_addr` fields |
| `crates/wzp-relay/src/main.rs` | Populate RegisterPresenceAck from config + peers |
## Testing
- 15 unit tests: preferred by RTT, unreachable not preferred, preferred empty/all-unreachable, populate_from_ack (valid + malformed entries), upsert updates/preserves region, needs_reprobe (empty/never/fresh), stale_entries, sort stability with equal RTT, mark_unreachable sorts to end, RelayEntry serialization
- 2 protocol tests: RegisterPresenceAck roundtrip with new fields, backward compat without new fields


@@ -120,7 +120,7 @@
- **Web audio drift**: The browser AudioWorklet playback buffer caps at 200ms, but clock drift between the WebSocket message arrival rate and the AudioContext output rate can cause occasional underruns or accumulation. The cap prevents unbounded growth but may cause glitches.
- **Adaptive loop integration (resolved)**: AdaptiveQualityController is now fully wired into both desktop and Android send/recv tasks. Relay-coordinated codec switching broadcasts QualityDirective to all participants based on weakest-link policy.
- **Adaptive loop integration (resolved)**: AdaptiveQualityController wired into both desktop and Android send/recv tasks. Relay-coordinated codec switching broadcasts QualityDirective — now handled by both engines (fixed 2026-04-13). 5-tier classification (Studio64k through Catastrophic) with asymmetric hysteresis.
- **Relay FEC pass-through**: In room mode, the relay forwards packets opaquely without FEC decode/re-encode. This means FEC protection is end-to-end only, not per-hop. In forward mode, the relay pipeline does perform FEC decode/re-encode.
@@ -128,18 +128,18 @@
## Test Coverage
307+ tests across 7 crates (wzp-web has no Rust tests):
372+ tests across 7 crates (wzp-web has no Rust tests):
| Crate | Test Count |
|-------|------------|
| wzp-proto | ~79 |
| wzp-proto | ~84 |
| wzp-codec | ~69 |
| wzp-fec | ~21 |
| wzp-crypto | ~21 |
| wzp-transport | ~11 |
| wzp-relay | ~50 |
| wzp-relay | ~120 |
| wzp-client | ~57 |
| **Total** | **307+** |
| **Total** | **372+** |
Tests cover:
- Wire format roundtrip (header, quality report, full packet)
@@ -192,7 +192,62 @@ Run with `wzp-bench --all`. Representative results (Apple M-series, single core)
- **CI**: Gitea workflow defined for amd64/arm64/armv7 builds
- **Production**: Not yet deployed to production networks
## Recent Changes (2026-04-12)
## Recent Changes (2026-04-13)
### P2P Adaptive Quality (#23, 2026-04-13)
- QualityReport::from_path_stats() — construct reports from local quinn stats
- CallEncoder.pending_quality_report — one-shot attachment to source packets
- Send tasks generate quality reports every 50 frames (~1s) from path stats
- Recv tasks self-observe from own QUIC stats for P2P adaptation
- Both relay and P2P calls now have full adaptive quality
### Protocol Analyzer (#13-17, 2026-04-13)
- New binary: wzp-analyzer (crates/wzp-client/src/analyzer.rs, ~900 lines)
- Passive observer: joins room, receives all media, never sends
- TUI mode (ratatui): per-participant table with loss%, jitter, codec, color-coded
- No-TUI mode: stats printed to stderr every 2s
- Binary capture format (.wzp) with microsecond timestamps
- Replay mode: offline analysis from capture files
- HTML report: self-contained with Chart.js loss/jitter timelines
- Encrypted decode: stub (needs session key + nonce context for SFU E2E)
### Codebase Refactoring (2026-04-13)
- DashMap relay concurrency: global Mutex → 64-shard DashMap
- Federation clone-before-send: eliminated last lock-during-I/O
- Engine deduplication: 3 shared helpers, eliminated 250 lines duplication
- 29 federation tests (was 0)
- Clap CLI parser for relay (replaced 154-line manual parser)
- Magic number constants, error handling helpers, safety docs
### 5-Tier Adaptive Quality Classification (#9)
- `Tier` enum extended from 3 to 6 levels: Studio64k > Studio48k > Studio32k > Good > Degraded > Catastrophic
- WiFi thresholds: loss < 1%/RTT < 30ms (Studio64k) through loss >= 15%/RTT >= 200ms (Catastrophic)
- Cellular stays at Good ceiling (no studio tiers on mobile data)
- Asymmetric hysteresis: downgrade 3 reports, upgrade 5, studio upgrade 10
- `Tier` derives `Ord` — ordering matches quality level (Catastrophic=0, Studio64k=5)
- `weakest_tier()` simplified to `.min()` via Ord
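A sketch of the ordering trick described above: deriving `Ord` on the enum uses declaration order, so listing `Catastrophic` first makes "worse" compare as smaller and `weakest_tier()` collapse to a plain `.min()`. Variant names follow this changelog; the real enum may carry data or extra derives:

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Tier {
    Catastrophic = 0,
    Degraded,
    Good,
    Studio32k,
    Studio48k,
    Studio64k,
}

/// The room's effective tier is the weakest participant's tier.
fn weakest_tier(tiers: impl Iterator<Item = Tier>) -> Option<Tier> {
    tiers.min()
}

fn main() {
    let room = [Tier::Studio64k, Tier::Good, Tier::Studio48k];
    assert_eq!(weakest_tier(room.into_iter()), Some(Tier::Good));
}
```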
### Client QualityDirective Handling (#27)
- Both desktop signal tasks (P2P and relay engines) now match `QualityDirective` signals
- Android signal task matches `QualityDirective` and stores profile index via `pending_profile_recv`
- Relay-coordinated codec switching now works end-to-end: relay broadcasts → clients react
- Closes the gap documented in PRD-coordinated-codec.md
### Debug Tap Enhancements (#11, #12)
- `log_signal()`: logs `RoomUpdate` (count + participant names), `QualityDirective` (codec + reason)
- `log_event()`: logs participant join/leave lifecycle events
- `log_stats()`: periodic 5-second summary — packets in/out, fan-out avg, seq gaps, codecs seen
- `TapStats` struct tracks per-participant metrics across the forwarding loop
- All output via `target: "debug_tap"` for RUST_LOG filtering
### Bug Fix: dual_path.rs Phase 7 regression
- Added missing `ipv6_endpoint: None` parameter to 3 `race()` call sites in integration tests
- Phase 7 IPv6 dual-socket changed the function signature but tests were not updated
### Build: Keystore sync (f17420a)
- `build.sh` syncs keystores from persistent cache before build
## Previous Changes (2026-04-12)
### Bluetooth Audio Routing
- 3-way route cycling: Earpiece → Speaker → Bluetooth SCO
@@ -260,3 +315,77 @@ Run with `wzp-bench --all`. Representative results (Apple M-series, single core)
- Logs initial state, poll count, and final state for HAL debugging
- Does NOT fail on timeout — Rust-side stall detector remains as safety net
- Targets Nothing Phone A059 intermittent silent calls on cold start
### Opus6k Frame Starvation Fix (2026-04-13)
- Root cause: partial reads from capture ring consumed samples that were discarded on retry
- `audio_read_capture(&mut buf[..1920])` with only 960 available → read 960, loop retried from buf[0], overwriting
- Added `wzp_native_audio_capture_available()` — check before reading (matches desktop pattern)
- `frame_samples` made mutable and updated on adaptive profile switch
- `buf` sized to max frame (1920) with `[..frame_samples]` slices throughout
- Result: Opus6k frame rate restored from ~11/s to expected 25/s
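A sketch of the check-before-read pattern described above. The two helper functions are hypothetical stand-ins for the wzp-native bridge calls named in this changelog; real signatures live in `crates/wzp-native`:

```rust
/// Hypothetical stand-ins for the native capture bridge.
fn audio_capture_available() -> usize { 0 }
fn audio_read_capture(_buf: &mut [i16]) -> usize { 0 }

const FRAME_SAMPLES_40MS: usize = 1920;

/// Read exactly one codec frame or nothing. Checking availability before
/// reading avoids the starvation bug: a partial read used to consume
/// samples that were then discarded when the loop retried from buf[0].
fn read_one_frame(frame_samples: usize) -> Option<Vec<i16>> {
    if audio_capture_available() < frame_samples {
        return None; // not enough buffered yet — poll again later
    }
    // buf is sized for the largest frame (1920 = 40 ms at 48 kHz), but only
    // the current profile's frame_samples are read and kept.
    let mut buf = vec![0i16; FRAME_SAMPLES_40MS];
    let n = audio_read_capture(&mut buf[..frame_samples]);
    if n < frame_samples {
        return None;
    }
    buf.truncate(frame_samples);
    Some(buf)
}
```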
### Build Script Fixes (2026-04-13)
- Stale APK cleanup: delete all APKs before build, prefer `*release*.apk` on upload
- APK signing: added zipalign + apksigner pipeline to `build.sh` (was in `build-tauri-android.sh` only)
- Keystore persistence: `$BASE_DIR/data/keystore/` cache synced into source tree before build
- Fixes: 384MB debug APK uploaded instead of 25MB release; unsigned APK on alt server
### Phase 8: Tailscale-Inspired STUN/ICE Enhancements (2026-04-14)
5 new modules in `wzp-client`, 83 new unit tests (588 total across workspace).
#### Public STUN Client (`stun.rs`)
- Minimal RFC 5389 STUN Binding Request/Response over raw UDP
- XOR-MAPPED-ADDRESS (preferred) + MAPPED-ADDRESS (fallback) parsing
- Default servers: `stun.l.google.com:19302`, `stun1.l.google.com:19302`, `stun.cloudflare.com:3478`
- `discover_reflexive()` — first-success parallel probe across N servers
- `probe_stun_servers()` — full results for NAT classification
- Integrated into `detect_nat_type_with_stun()` combining relay + STUN probes
- Desktop STUN fallback in `try_reflect_own_addr()` when relay reflection fails
#### PCP/PMP/UPnP Port Mapping (`portmap.rs`)
- **NAT-PMP** (RFC 6886): UDP to gateway:5351, external address + port mapping
- **PCP** (RFC 6887): PCP MAP opcode, IPv4-mapped IPv6 client address
- **UPnP IGD**: SSDP M-SEARCH discovery + SOAP `AddPortMapping`/`GetExternalIPAddress`
- Gateway discovery: macOS (`route -n get default`), Linux (`/proc/net/route`)
- `acquire_port_mapping()` tries NAT-PMP → PCP → UPnP, first success wins
- `release_port_mapping()` + `spawn_refresh()` for lifecycle management
- Signal protocol: `caller_mapped_addr`/`callee_mapped_addr` on offer/answer, `peer_mapped_addr` on CallSetup
- `PeerCandidates.mapped` — new candidate type in dial order (host → mapped → reflexive)
#### Mid-Call ICE Re-Gathering (`ice_agent.rs`)
- `IceAgent`: owns candidate lifecycle with `gather()`, `re_gather()`, `apply_peer_update()`
- Monotonic generation counter prevents stale candidate updates from reordering
- `SignalMessage::CandidateUpdate` — new signal for mid-call candidate exchange
- Relay forwards `CandidateUpdate` to call peer (same pattern as `MediaPathReport`)
- Desktop handles `CandidateUpdate` in signal recv loop, emits to JS frontend
- Transport hot-swap architecture designed (TODO: wire into live call engine)
#### Netcheck Diagnostic (`netcheck.rs`)
- `NetcheckReport`: NAT type, reflexive addr, IPv4/v6, port mapping, relay latencies, gateway
- `run_netcheck()` — parallel probes for STUN + relay + portmap + IPv6
- `format_report()` — human-readable diagnostic output
- CLI: `wzp-client --netcheck <relay>` runs diagnostic
#### Region-Based Relay Selection (`relay_map.rs`)
- `RelayMap` sorted by RTT, `preferred()` returns lowest-latency reachable relay
- `populate_from_ack()` — parses `RegisterPresenceAck.available_relays`
- Stale detection (`needs_reprobe()`, `stale_entries()`)
- `RegisterPresenceAck` extended with `relay_region` and `available_relays`
#### Hard NAT Port Allocation Detection (`stun.rs` Phase A)
- `PortAllocation` enum: `PortPreserving` / `Sequential { delta }` / `Random` / `Unknown`
- `detect_port_allocation()` — sequential STUN probes from single socket, analyzes external port sequence
- `classify_port_allocation()` — pure classifier with wraparound handling, jitter tolerance (±1), 60% threshold for noisy sequences
- `predict_ports(last_port, delta, offset, spread)` — generates target port range for sequential NATs
- `HardNatProbe` signal message for peer coordination (carries port_sequence, allocation, external_ip)
- Relay forwards `HardNatProbe` to call peer
- `NetcheckReport.port_allocation` field populated automatically
- 17 new tests for classification, prediction, serde, Display
#### Relay End-to-End Wiring (2026-04-14)
- `CallRegistry` stores + cross-wires `caller_mapped_addr`/`callee_mapped_addr` into `CallSetup.peer_mapped_addr`
- `RelayConfig` extended with `region` + `advertised_addr` fields
- `RegisterPresenceAck` populates `relay_region` from config, `available_relays` from federation peers
- Desktop `place_call`/`answer_call` call `acquire_port_mapping()` and fill mapped addr fields
- Legacy `build-android-docker.sh` renamed to `build-android-docker-LEGACY.sh` to prevent accidental use


@@ -0,0 +1,271 @@
# Codebase Refactoring Audit (2026-04-13)
> Full analysis of the WarzonePhone codebase after the DashMap relay refactor, DRED continuous tuning, and adaptive quality wiring. The codebase is ~15K lines of Rust across 8 crates plus a 1.7K-line Tauri engine. This document identifies every refactoring opportunity ranked by impact.
## Critical: engine.rs is 1,705 Lines With ~35% Duplication
`desktop/src-tauri/src/engine.rs` has two nearly-identical `CallEngine::start()` implementations:
- **Android path:** 880 lines (lines 321-1200)
- **Desktop path:** 430 lines (lines 1203-1633)
### What's Duplicated (350+ lines)
| Block | Android Lines | Desktop Lines | Size | Identical? |
|-------|--------------|---------------|------|-----------|
| CallConfig initialization | 529-539 | 1353-1363 | 23 lines | Yes |
| DRED tuner + frame_samples setup | 541-555 | 1360-1375 | 15 lines | Yes |
| Adaptive quality profile switch | 651-665 | 1414-1428 | 15 lines | Yes |
| Codec-to-QualityProfile match | 852-864 | 1488-1500 | 19 lines | Yes |
| DRED ingest + gap fill | 886-902 | 1511-1528 | 17 lines | Yes |
| Quality report ingestion | 905-912 | 1531-1538 | 8 lines | Yes |
| Signal task (entire thing) | 1133-1180 | 1569-1616 | 48 lines | Yes |
### Suggested Fix: Extract Shared Helpers
```rust
// Top of engine.rs — shared between both platforms
fn build_call_config(quality: &str) -> CallConfig { ... }
fn codec_to_profile(codec: CodecId) -> QualityProfile { ... }
fn check_adaptive_switch(
pending: &AtomicU8,
encoder: &mut CallEncoder,
tuner: &mut DredTuner,
frame_samples: &mut usize,
tx_codec: &Mutex<String>,
) { ... }
async fn run_signal_task(
transport: Arc<QuinnTransport>,
running: Arc<AtomicBool>,
pending_profile: Arc<AtomicU8>,
participants: Arc<Mutex<Vec<ParticipantInfo>>>,
) { ... }
```
This would reduce engine.rs by ~200 lines and make the Android/desktop paths only differ in their audio I/O (Oboe vs CPAL).
**Effort:** 2-3 hours. **Impact:** High — every future change to the send/recv pipeline currently requires editing two places.
---
## High: SignalMessage Enum Has 36 Variants
`crates/wzp-proto/src/packet.rs` (1,727 lines) has a `SignalMessage` enum with 36 variants mixing orthogonal concerns:
- Legacy call signaling (CallOffer, CallAnswer, IceCandidate, Rekey...)
- Direct calling (RegisterPresence, DirectCallOffer, DirectCallAnswer, CallSetup...)
- Federation (FederationHello, GlobalRoomActive/Inactive, FederatedSignalForward)
- Relay control (SessionForward, PresenceUpdate, RouteQuery, RoomUpdate)
- NAT traversal (Reflect, ReflectResponse, MediaPathReport)
- Quality (QualityUpdate, QualityDirective)
- Call control (Ping/Pong, Hold/Unhold, Mute/Unmute, Transfer)
Every new feature adds variants here, and every match on `SignalMessage` must handle all 36 arms (or use `_` wildcard).
### Suggested Fix: Sub-Enum Grouping
```rust
enum SignalMessage {
Call(CallSignal), // CallOffer, CallAnswer, IceCandidate, Rekey, Hangup...
Direct(DirectCallSignal), // RegisterPresence, DirectCallOffer, CallSetup, MediaPathReport...
Federation(FedSignal), // FederationHello, GlobalRoomActive, FederatedSignalForward...
Control(ControlSignal), // Ping/Pong, Hold/Unhold, Mute/Unmute, QualityDirective...
Relay(RelaySignal), // SessionForward, PresenceUpdate, RouteQuery, RoomUpdate...
}
```
**Caution:** This is a wire-format change. Serde serialization must remain backward-compatible with already-deployed relays. Use `#[serde(untagged)]` or versioned deserialization. Consider doing this as a v2 protocol bump.
**Effort:** 1 day. **Impact:** High for maintainability, but risky for wire compatibility.
---
## High: Federation Has Zero Tests
`crates/wzp-relay/src/federation.rs` (1,132 lines) has **no unit tests and no integration tests**. This is the most complex file in the relay crate, handling:
- Peer link management (connect, reconnect, stale sweep)
- Federation media egress (forward_to_peers)
- Federation media ingress (handle_datagram: dedup, rate limit, local delivery, multi-hop)
- Cross-relay signal forwarding
- Room event subscription and GlobalRoomActive/Inactive broadcasting
The relay crate has 91 tests, but none cover federation. Any refactoring of federation (like the DashMap migration or clone-before-send) is flying blind.
### Suggested Fix
Priority test cases:
1. `forward_to_peers` with 0, 1, 3 peers — verify datagram construction and label tracking
2. `handle_datagram` — dedup (same packet twice → second dropped), rate limit (exceed → dropped)
3. Stale presence sweeper — verify cleanup after timeout
4. `broadcast_signal` — verify signal reaches all peers
5. Multi-hop forward — verify source peer excluded from re-forward
**Effort:** 1 day. **Impact:** Critical for safe refactoring.
---
## Medium: Federation `peer_links` Lock-During-Send
`broadcast_signal()` (line 216) holds `peer_links` Mutex **across async `send_signal()` calls**. A slow peer blocks all signal delivery. `forward_to_peers()` (line 406) holds it during sync sends (less severe but still serializes).
### Fix (30 minutes)
```rust
// Before:
let links = self.peer_links.lock().await;
for (fp, link) in links.iter() {
link.transport.send_signal(msg).await; // lock held across await!
}
// After:
let peers: Vec<_> = {
let links = self.peer_links.lock().await;
links.values().map(|l| (l.label.clone(), l.transport.clone())).collect()
};
for (label, transport) in &peers {
transport.send_signal(msg).await; // no lock held
}
```
Apply to `forward_to_peers()`, `broadcast_signal()`, and `send_signal_to_peer()`.
**Effort:** 30 minutes. **Impact:** Medium — eliminates last lock-during-I/O pattern.
---
## Medium: Magic Numbers Scattered Through engine.rs
```rust
// These appear as literals in multiple places:
tokio::time::sleep(Duration::from_millis(5)) // 6 occurrences
tokio::time::sleep(Duration::from_millis(100)) // 2 occurrences
Duration::from_millis(200) // 2 occurrences (signal timeout)
Duration::from_secs(10) // 1 occurrence (QUIC connect timeout)
Duration::from_secs(2) // 2 occurrences (heartbeat interval)
const DRED_POLL_INTERVAL: u32 = 25; // defined twice (Android + desktop)
vec![0i16; 1920] // 2 occurrences (should use FRAME_SAMPLES_40MS)
```
### Fix
```rust
// Top of engine.rs
const CAPTURE_POLL_MS: u64 = 5;
const RECV_TIMEOUT_MS: u64 = 100;
const SIGNAL_TIMEOUT_MS: u64 = 200;
const CONNECT_TIMEOUT_SECS: u64 = 10;
const HEARTBEAT_INTERVAL_SECS: u64 = 2;
const DRED_POLL_INTERVAL: u32 = 25;
// Already exists: const FRAME_SAMPLES_40MS: usize = 1920;
```
**Effort:** 15 minutes. **Impact:** Low but prevents bugs from inconsistent values.
---
## Medium: CLI Arg Parsing in Relay main.rs
`parse_args()` in main.rs is 154 lines of manual `while i < args.len()` parsing with `match args[i].as_str()`. Every new flag adds 5-10 lines of boilerplate.
### Suggested Fix
Replace with `clap` derive macro:
```rust
#[derive(clap::Parser)]
struct RelayArgs {
#[arg(long, default_value = "0.0.0.0:4433")]
listen: SocketAddr,
#[arg(long)]
remote: Option<String>,
#[arg(long)]
auth_url: Option<String>,
// ...
}
```
**Effort:** 1 hour. **Impact:** Medium — cleaner, auto-generates `--help`, validates types at parse time.
---
## Medium: Error Handling Inconsistency
13 instances of `.ok()` silently swallowing errors on `transport.close()` across the relay. Federation signal forwarding has inconsistent error handling — some paths log, some don't.
### Fix
```rust
// Helper at top of main.rs/federation.rs:
async fn close_transport(t: &impl MediaTransport, context: &str) {
if let Err(e) = t.close().await {
tracing::debug!(context, error = %e, "transport close error (non-fatal)");
}
}
```
**Effort:** 30 minutes. **Impact:** Better observability when debugging connection issues.
---
## Low: Unused Crypto Fields
`crates/wzp-crypto/src/handshake.rs` has `x25519_static_secret` and `x25519_static_public` fields marked `#[allow(dead_code)]`. These are derived from the identity seed but never used in any handshake flow.
**Decision needed:** Are these intended for a future feature (static key federation auth)? If not, remove. If yes, document the intended use.
**Effort:** 5 minutes to remove, or 10 minutes to document.
---
## Low: 20 Unsafe Functions Missing Safety Docs
`crates/wzp-native/src/lib.rs` has 20 `unsafe` functions (extern "C" FFI bridge to Oboe) without `/// # Safety` documentation. Clippy flags all of them.
**Effort:** 30 minutes. **Impact:** Clippy clean, better documentation for contributors.
---
## Low: quality.rs vs dred_tuner.rs Overlap
Both files deal with network quality → codec decisions, but they're complementary:
- `quality.rs`: discrete tier classification (Good/Degraded/Catastrophic) → codec profile
- `dred_tuner.rs`: continuous DRED frame mapping from loss/RTT/jitter
No consolidation needed, but add cross-references:
```rust
// In dred_tuner.rs:
//! See also: `quality.rs` for discrete tier classification that drives
//! codec switching. DredTuner operates within a tier, adjusting DRED
//! parameters continuously.
// In quality.rs:
//! See also: `dred_tuner.rs` for continuous DRED tuning within a tier.
```
**Effort:** 5 minutes.
---
## Summary: Priority Matrix
| # | Refactor | Effort | Impact | Risk |
|---|----------|--------|--------|------|
| 1 | Extract shared engine.rs helpers | 2-3h | High | Low |
| 2 | Federation tests | 1 day | Critical | None |
| 3 | Federation clone-before-send | 30 min | Medium | Low |
| 4 | Extract magic numbers to constants | 15 min | Low | None |
| 5 | Error handling helpers | 30 min | Medium | None |
| 6 | CLI parser → clap | 1h | Medium | Low |
| 7 | SignalMessage sub-enums | 1 day | High | High (wire compat) |
| 8 | Safety docs on unsafe fns | 30 min | Low | None |
| 9 | Remove/document dead crypto fields | 5 min | Low | None |
| 10 | Cross-reference quality.rs ↔ dred_tuner.rs | 5 min | Low | None |
**Recommended order:** 4 → 3 → 5 → 1 → 2 → 6 → 8 → 9 → 10 → 7
Items 4, 3, 5 are quick wins (under 1 hour total). Item 1 is the biggest maintainability win. Item 2 is the most important for safety. Item 7 should wait for a protocol version bump.


@@ -0,0 +1,256 @@
# Relay Concurrency Refactor Guide
> Post-DashMap analysis: what was done, what remains, and what to do next.
## What Was Done (2026-04-13)
Replaced the global `Arc<Mutex<RoomManager>>` with `DashMap<String, Room>` inside `RoomManager`. The relay's media forwarding hot path no longer serializes through a single lock.
### Before
```
Participant A recv_media()
→ room_mgr.lock().await ← ALL participants, ALL rooms compete here
→ mgr.observe_quality(...) ← O(N) quality computation inside lock
→ mgr.others(...) ← clone Vec<ParticipantSender>
→ drop(lock)
→ fan-out sends
```
One `tokio::sync::Mutex` guarding all rooms, all participants, all quality state. A 100-room relay was effectively single-threaded for media forwarding.
### After
```
Participant A recv_media()
→ room_mgr.observe_quality(...) ← DashMap::get_mut(), per-room shard lock
→ room_mgr.others(...) ← DashMap::get(), shared shard lock
→ fan-out sends ← no lock held
```
64 internal shards. Rooms on different shards are fully parallel. Rooms on the same shard use RwLock semantics — reads (`others()`) are concurrent, writes (`observe_quality()`, `join()`, `leave()`) are exclusive per-shard only.
### Files Changed
| File | Change |
|------|--------|
| `crates/wzp-relay/Cargo.toml` | Added `dashmap = "6"` |
| `crates/wzp-relay/src/room.rs` | `HashMap<String, Room>` → `DashMap<String, Room>`, per-room quality/tier, all methods `&self` |
| `crates/wzp-relay/src/main.rs` | `Arc<Mutex<RoomManager>>` → `Arc<RoomManager>`, 3 lock sites removed |
| `crates/wzp-relay/src/federation.rs` | 11 lock sites removed, `room_mgr` field type changed |
| `crates/wzp-relay/src/ws.rs` | 3 lock sites removed, `room_mgr` field type changed |
### Measured Improvement
| Metric | Before | After |
|--------|--------|-------|
| Lock type (rooms) | 1 global `tokio::sync::Mutex` | 64-shard `DashMap` with per-shard RwLock |
| Cross-room blocking | Yes (all rooms share 1 lock) | No (rooms are independent) |
| Read concurrency within room | None (Mutex is exclusive) | Yes (`get()` is shared) |
| `.lock().await` sites | 20 across 4 files | 0 for room operations |
| Test count | 314 passing | 314 passing (0 regressions) |
---
## Current Lock Inventory
### Tier 0: Eliminated (Room Hot Path)
These are gone — DashMap handles them internally:
- ~~`room_mgr.lock().await` in media forwarding~~ → `room_mgr.others()` (DashMap shard)
- ~~`room_mgr.lock().await` in quality tracking~~ → `room_mgr.observe_quality()` (DashMap shard)
- ~~`room_mgr.lock().await` in join/leave~~ → `room_mgr.join()` / `.leave()` (DashMap entry)
### Tier 1: Federation `peer_links` (Medium Priority)
**Location:** `crates/wzp-relay/src/federation.rs:142`
```rust
peer_links: Arc<Mutex<HashMap<String, PeerLink>>>
```
**22 lock sites** across federation.rs. The most important:
| Method | Line | Hold Duration | I/O While Locked | Frequency |
|--------|------|---------------|-------------------|-----------|
| `forward_to_peers()` | 406 | 1-5ms (iterate + sync send) | Sync only | Per-packet batch |
| `broadcast_signal()` | 216 | N × send_signal latency | **YES (async)** | Per-signal |
| `handle_datagram()` multi-hop | 1123 | 1-2ms (iterate + sync send) | Sync only | Per-federation-packet |
| `send_signal_to_peer()` | 246 | send_signal latency | **YES (async)** | Per-signal |
| Stale sweeper | 523 | 1-5ms | No | Every 5s |
**Impact:** Only matters with 5+ federation peers or high federation datagram rates (>1000 pps). For 1-3 peers, contention is negligible.
### Tier 2: Control Plane (Low Priority)
These are on the connection setup / signal path, not the media hot path:
| Lock | Location | Frequency |
|------|----------|-----------|
| `session_mgr` | main.rs:450 | Per-connection setup |
| `signal_hub` | main.rs:453 | Per-signal lookup |
| `call_registry` | main.rs:454 | Per-call setup |
| `presence` | main.rs:283 | Per-presence change |
| `ACL` | room.rs:357 | Per-room join |
**Impact:** None. These handle rare events (connection setup, call signaling) and hold locks for <5ms with no I/O inside.
### Tier 3: Forward Mode Pipeline (Niche)
| Lock | Location | Notes |
|------|----------|-------|
| `RelayPipeline` | main.rs:198, 228 | Only used in `--remote` forward mode (relay-to-relay), not SFU room mode |
**Impact:** None for normal operation. Forward mode is a niche deployment.
---
## Suggested Next Refactors (Priority Order)
### 1. Federation `peer_links` Clone-Before-Send
**Effort:** 30 minutes
**Impact:** Eliminates the lock-held-during-iteration pattern in `forward_to_peers()` and `broadcast_signal()`
**Current:**
```rust
pub async fn forward_to_peers(&self, ...) {
let links = self.peer_links.lock().await; // held for entire loop
for (_fp, link) in links.iter() {
link.transport.send_raw_datagram(&tagged); // sync, but lock still held
}
}
```
**Fix:**
```rust
pub async fn forward_to_peers(&self, ...) {
let peers: Vec<(String, Arc<QuinnTransport>)> = {
let links = self.peer_links.lock().await;
links.values().map(|l| (l.label.clone(), l.transport.clone())).collect()
}; // lock released — hold time: ~1μs for Arc clones
for (label, transport) in &peers {
transport.send_raw_datagram(&tagged); // no lock held
}
}
```
Same treatment for `broadcast_signal()` (line 216) which currently holds the lock across **async** `send_signal()` calls — this is the worst offender since a slow peer blocks all signal delivery.
### 2. Federation `peer_links` → DashMap
**Effort:** 2 hours
**Impact:** Per-peer sharding, eliminates all cross-peer contention
Only worth doing if:
- Running 10+ federation peers
- `forward_to_peers()` shows up in profiling
- The clone-before-send fix from suggestion 1 is insufficient
```rust
peer_links: DashMap<String, PeerLink>
```
Most lock sites become `self.peer_links.get(&fp)` or `.get_mut(&fp)`. The multi-hop forward loop would use `.iter()` which takes temporary shared locks per shard.
### 3. Quality Tracking Out of Hot Path
**Effort:** 1 day
**Impact:** Reduces per-packet DashMap shard lock from exclusive (`get_mut`) to shared (`get`)
Currently, every packet with a `QualityReport` calls `observe_quality()` which uses `rooms.get_mut()` (exclusive shard lock). This serializes quality-carrying packets within the same DashMap shard.
**Fix:** Use per-participant `AtomicU8` for latest loss/RTT (written lock-free from hot path). A background task (every 1s) reads the atomics, computes tiers via `rooms.get_mut()`, and broadcasts `QualityDirective`. The per-packet hot path becomes purely read-only: `rooms.get()` → `others()`.
```rust
struct ParticipantQualityAtomic {
latest_loss: AtomicU8, // written per-packet (lock-free)
latest_rtt: AtomicU8, // written per-packet (lock-free)
}
// Hot path (per-packet):
if let Some(ref qr) = pkt.quality_report {
participant_quality.latest_loss.store(qr.loss_pct, Ordering::Relaxed);
participant_quality.latest_rtt.store(qr.rtt_4ms, Ordering::Relaxed);
}
let others = room_mgr.others(&room_name, participant_id); // DashMap::get() — shared lock
// Background task (every 1 second):
for room in room_mgr.rooms.iter_mut() { // DashMap::iter_mut() — exclusive per-shard
room.recompute_tiers_from_atomics();
if tier_changed { broadcast QualityDirective }
}
```
### 4. Lock-Free Participant Snapshot (Future)
**Effort:** 0.5 day
**Impact:** Zero-lock media hot path
Replace `Vec<Participant>` in `Room` with an `arc-swap` snapshot:
```rust
struct Room {
participants: Vec<Participant>,
sender_snapshot: arc_swap::ArcSwap<Vec<ParticipantSender>>,
}
```
The snapshot is rebuilt on join/leave (rare). The hot path does `sender_snapshot.load()` — an atomic pointer read with zero locking. DashMap wouldn't even be involved in the per-packet path.
Only worth doing if DashMap shard contention becomes measurable in profiling (unlikely for rooms <100 people).
---
## Decision Matrix
| Scenario | Current (DashMap) | + Clone-Before-Send | + Quality Atomics | + arc-swap |
|----------|-------------------|---------------------|-------------------|-----------|
| 10 rooms × 5 people | Saturates all cores | Same | Same | Same |
| 1 room × 100 people | Good (shared read) | Same | Better (no exclusive) | Best |
| 5 federation peers | 1-5ms contention | <1μs contention | Same | Same |
| 20 federation peers | 10-20ms contention | <1μs contention | Same | Same |
| 1000 rooms × 3 people | Excellent | Same | Same | Same |
**Recommendation:** Do suggestion 1 (clone-before-send, 30 min) now. Everything else is future optimization that current workloads don't need.
---
## Concurrency Diagram (Current State)
```
┌─────────────────────────────────┐
│ tokio multi-threaded │
│ work-stealing runtime │
└───────────────┬─────────────────┘
┌────────────────────────────┼────────────────────────────┐
│ │ │
┌──────▼──────┐ ┌───────▼───────┐ ┌───────▼───────┐
│ QUIC Accept │ │ Federation │ │ Signal Hub │
│ (per-conn │ │ (per-peer │ │ (per-client │
│ task) │ │ task) │ │ task) │
└──────┬──────┘ └───────┬───────┘ └───────┬───────┘
│ │ │
┌──────▼──────┐ ┌───────▼───────┐ ┌───────▼───────┐
│ Per-Room │ │ peer_links │ │ signal_hub │
│ DashMap │◄──64 shards│ Mutex │◄──1 lock │ Mutex │
│ (media hot │ │ (federation │ │ (signal │
│ path) │ │ hot path) │ │ plane) │
└─────────────┘ └───────────────┘ └───────────────┘
│ │
No cross-room Low frequency
blocking (<1 call/sec)
```
## Files Reference
| File | Lines | Role |
|------|-------|------|
| `crates/wzp-relay/src/room.rs` | ~1275 | DashMap room storage, participant management, quality tracking, media forwarding loops |
| `crates/wzp-relay/src/federation.rs` | ~1152 | Peer link management, federation media egress/ingress, signal forwarding |
| `crates/wzp-relay/src/main.rs` | ~1746 | Connection accept, handshake dispatch, signal handling, room/federation wiring |
| `crates/wzp-relay/src/ws.rs` | ~250 | WebSocket bridge, room integration |
| `crates/wzp-relay/src/metrics.rs` | ~200 | Prometheus counters (lock-free atomics) |
| `crates/wzp-relay/src/trunk.rs` | ~150 | TrunkBatcher (per-instance, no shared state) |


@@ -167,6 +167,18 @@ if [ "\$DO_PULL" = "1" ]; then
git reset --hard "origin/\$BRANCH"
git submodule update --init || true
echo ">>> HEAD: \$(git rev-parse --short HEAD) — \$(git log -1 --format=%s)"
# Ensure signing keystores exist. They're gitignored so git reset/clean
# may delete them. Copy from the persistent cache if available, or warn.
KS_DIR="\$BASE_DIR/data/source/android/keystore"
KS_CACHE="\$BASE_DIR/data/keystore"
mkdir -p "\$KS_DIR"
if [ -d "\$KS_CACHE" ] && ls "\$KS_CACHE"/*.jks >/dev/null 2>&1; then
cp -n "\$KS_CACHE"/*.jks "\$KS_DIR/" 2>/dev/null || true
echo ">>> Keystores synced from cache"
elif ! ls "\$KS_DIR"/*.jks >/dev/null 2>&1; then
echo ">>> WARNING: no keystores in \$KS_DIR or \$KS_CACHE — APK will be unsigned!"
fi
fi
GIT_HASH=\$(cd "\$BASE_DIR/data/source" && git rev-parse --short HEAD 2>/dev/null || echo unknown)
@@ -195,6 +207,8 @@ fi
# ── Tauri Android APK ──────────────────────────────────────────────────
if [ "\$BUILD_ANDROID" = "1" ]; then
notify "WZP [\$SERVER_TAG] Tauri Android build STARTED [\$BRANCH @ \$GIT_HASH] — \$GIT_MSG"
echo ">>> Cleaning stale APKs from prior builds..."
find "\$BASE_DIR/data/source/desktop/src-tauri/gen/android" -name "*.apk" -type f -delete 2>/dev/null || true
echo ">>> Building Tauri Android APK..."
PROFILE_FLAG="--debug"
@@ -248,13 +262,57 @@ fi
echo ">>> cargo tauri android build \${PROFILE_FLAG} --target aarch64 --apk"
cargo tauri android build \${PROFILE_FLAG} --target aarch64 --apk
# ─── Sign the APK ────────────────────────────────────────────────
# Release builds from cargo-tauri are unsigned. Sign with the project
# keystore so the APK can be installed on real devices.
BUILT_APK=\$(find gen/android -name "*.apk" -type f 2>/dev/null | sort -t/ -k1 | tail -1)
if [ -n "\$BUILT_APK" ]; then
KS_RELEASE="/build/source/android/keystore/wzp-release.jks"
KS_DEBUG="/build/source/android/keystore/wzp-debug.jks"
if [ -f "\$KS_RELEASE" ]; then
KEYSTORE="\$KS_RELEASE"; KS_PASS="wzphone2024"; KS_ALIAS="wzp-release"
elif [ -f "\$KS_DEBUG" ]; then
KEYSTORE="\$KS_DEBUG"; KS_PASS="android"; KS_ALIAS="wzp-debug"
else
KEYSTORE=""
fi
if [ -n "\$KEYSTORE" ]; then
ZIPALIGN=\$(find "\$ANDROID_HOME" -name zipalign -type f 2>/dev/null | head -1)
APKSIGNER=\$(find "\$ANDROID_HOME" -name apksigner -type f 2>/dev/null | head -1)
if [ -n "\$ZIPALIGN" ] && [ -n "\$APKSIGNER" ]; then
echo ">>> Signing APK with \$(basename \$KEYSTORE)..."
ALIGNED="\${BUILT_APK%.apk}-aligned.apk"
"\$ZIPALIGN" -f 4 "\$BUILT_APK" "\$ALIGNED"
"\$APKSIGNER" sign \
--ks "\$KEYSTORE" \
--ks-pass "pass:\$KS_PASS" \
--ks-key-alias "\$KS_ALIAS" \
--key-pass "pass:\$KS_PASS" \
"\$ALIGNED"
mv "\$ALIGNED" "\$BUILT_APK"
echo ">>> Signed: \$(ls -lh \$BUILT_APK | awk "{print \\\$5}")"
else
echo ">>> WARNING: zipalign/apksigner not found — APK is unsigned"
fi
else
echo ">>> WARNING: no keystore found — APK is unsigned"
fi
fi
echo ">>> Build artifacts:"
find gen/android -name "*.apk" -exec ls -lh {} \; 2>/dev/null
echo "APK_BUILT"
'
echo ">>> Uploading APK..."
APK=\$(find "\$BASE_DIR/data/source/desktop/src-tauri/gen/android" -name "*.apk" -type f 2>/dev/null | head -1)
# Clean stale APKs from prior builds so find doesn't pick an old
# debug APK over the fresh release one (or vice versa).
find "\$BASE_DIR/data/source/desktop/src-tauri/gen/android" -name "*.apk" -type f \
! -newer "\$BASE_DIR/data/source/desktop/src-tauri/gen/android/app/build/outputs" \
-delete 2>/dev/null || true
# Prefer release APK if it exists, else fall back to debug.
APK=\$(find "\$BASE_DIR/data/source/desktop/src-tauri/gen/android" -name "*release*.apk" -type f 2>/dev/null | head -1)
[ -z "\$APK" ] && APK=\$(find "\$BASE_DIR/data/source/desktop/src-tauri/gen/android" -name "*.apk" -type f 2>/dev/null | head -1)
if [ -n "\$APK" ]; then
APK_SIZE=\$(du -h "\$APK" | cut -f1)
URL=\$(upload_file "\$APK")