237 Commits

Author SHA1 Message Date
Siavash Sameni
80d5bd7628 fix: survive QUIC congestion — drop packets instead of killing send task
Some checks failed
Build Release Binaries / build-amd64 (push) Failing after 3m14s
send_datagram() returns Err(Blocked) when the QUIC congestion window
is full. This is transient — the window reopens once ACKs arrive.
Previously, all send paths treated this as fatal (break/return),
which killed the send task and cascaded via tokio::select! to kill
the entire call.

Now: log warning, drop the packet, continue. Brief audio glitch
(20-100ms) instead of complete call death. FEC on the receiver
side recovers most dropped packets.

Fixed in:
- CLI run_live send task (continue + error counter)
- CLI run_file_mode send paths (2 locations)
- Desktop engine send task

Also hardened recv tasks: transient errors (non-closed/reset)
are survived instead of causing exit.

Matches the fix applied to Android client (engine.rs).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 11:48:20 +04:00
Claude
20922455bd fix: send task crash on QUIC congestion + AEC toggle + debug reporter
Root cause: send_media() returns Err(Blocked) when QUIC congestion
window is full. The send task treated ANY send error as fatal (break),
killing the entire call. Now send errors drop the packet and continue.

Also hardened recv task to survive transient errors and added health
logging (recv gap tracking, periodic stats) to both send and recv.

Relay: added comprehensive debug logging — recv gaps, lock contention,
forward latency, send errors — all per-participant with 5s stats.

Other changes:
- AEC toggle in Settings (persisted, applied on next call)
- Debug report: records call audio (WAV), RMS histogram (CSV), logcat,
  stats. Emailed as zip via Android share intent after call ends.
- Replaced LinearProgressIndicator with Box (compose version compat)
- FileProvider for sharing debug zip attachments

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 07:38:56 +00:00
Siavash Sameni
e468454464 feat: Tauri desktop GUI app with call engine
Some checks failed
Build Release Binaries / build-amd64 (push) Failing after 3m27s
- New desktop/ directory with Tauri v2 + Vite + TypeScript
- Rust backend: CallEngine wrapping wzp-client audio + transport
- Web frontend: connect screen, in-call screen with participants,
  mic/speaker mute, keyboard shortcuts (m/s/q)
- Dark theme UI, settings persistence via localStorage
- Platform-aware --os-aec: warns on Windows/Linux (not yet implemented)
- Workspace updated to include desktop/src-tauri

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 11:25:54 +04:00
Siavash Sameni
d1c96cd71f feat: macOS VoiceProcessingIO for hardware AEC + delay-compensated NLMS
Some checks failed
Build Release Binaries / build-amd64 (push) Failing after 3m33s
- Add --os-aec flag: uses Apple VoiceProcessingIO audio unit for
  hardware echo cancellation (same engine as FaceTime)
- New vpio feature + audio_vpio.rs: combined capture+playback via VPIO
- Improved software AEC: delay-compensated leaky NLMS with Geigel DTD
  (60ms tail, 40ms delay, configurable via --aec-delay)
- Add --aec-delay flag for tuning software AEC delay compensation
- Add dev-fast Cargo profile (opt-level 2 with incremental compilation)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 11:10:10 +04:00
Siavash Sameni
1b00b5e2a4 feat: improved AEC, keyboard shortcuts, dedup participants, dev-fast profile
Some checks failed
Build Release Binaries / build-amd64 (push) Failing after 3m40s
AEC improvements:
- Reduce echo tail from 100ms to 30ms (3.3x faster, suited for laptops)
- Add double-talk detection: freeze adaptation when near-end speaks
- Add residual echo suppression
- Disable AEC by default in --android mode (macOS has built-in AEC)

CLI features:
- Keyboard shortcuts: m=mic mute, s=speaker mute, q=quit (raw terminal mode)
- Dedup participants in RoomUpdate display (same fingerprint+alias shown once)
- Add dev-fast profile (opt-level 2 with incremental compilation)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 10:15:23 +04:00
Claude
e6564bab57 fix: mic mute crackling + add AEC/NoiseSuppressor + dedup room participants
Mic mute: the send loop now zeros the capture buffer when muted instead
of relying on write_audio() to skip writes. Previously stale ring data
and AGC amplification of near-silence caused crackling artifacts.

AEC: attach Android's hardware AcousticEchoCanceler to the AudioRecord
session. Also attach NoiseSuppressor when available. Both are released
on capture stop.

Room UI: deduplicate participants by fingerprint so ghost entries from
stale relay state don't show duplicate names.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 06:06:35 +00:00
Siavash Sameni
cfb48df1ef feat: direct playout mode, AEC far-end, audio processing switches
Some checks failed
Build Release Binaries / build-amd64 (push) Failing after 3m28s
- Add --android/--direct-playout: bypass jitter buffer, decode on recv
  (matches Android engine architecture)
- Wire AEC far-end reference from decoded playout to encoder
- Add --no-aec, --no-agc, --no-fec, --no-silence, --no-denoise switches
- Fix BufferSize::Fixed(960) → Default for macOS CoreAudio compat
- Optimize wzp-codec, wzp-fec, audiopus, nnnoiseless in debug profile
- Add capture callback size diagnostic logging

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 09:48:34 +04:00
Claude
aebf9156c0 fix: dedup participants in UI, wait for QUIC close ack before exiting
UI: deduplicate room participants by fingerprint so ghost entries from
stale relay state don't show duplicates.

Engine: after select! ends, call close_now() + connection.closed() with
500ms timeout to wait for the relay to acknowledge the CONNECTION_CLOSE.
Previously the close frame was queued but the runtime died before quinn
could retransmit if the first packet was lost.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 05:40:06 +00:00
Claude
9bbaec6b35 fix: use shutdown_timeout so QUIC CONNECTION_CLOSE actually gets sent
shutdown_background() killed the tokio runtime before quinn could send the
CONNECTION_CLOSE frame on the wire, so the relay never knew the client left.
Now use shutdown_timeout(500ms) to give quinn time to flush the close frame,
matching the desktop client pattern (which uses 2s timeout).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 05:20:20 +00:00
Siavash Sameni
ba29d8354f fix: send alias via CallOffer handshake (match Android approach)
Some checks failed
Build Release Binaries / build-amd64 (push) Failing after 3m44s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 09:10:07 +04:00
Siavash Sameni
0908507a7a Merge remote-tracking branch 'origin/feat/android-voip-client' into feat/desktop-audio-rewrite 2026-04-06 09:04:55 +04:00
Siavash Sameni
860c90394d feat: rewrite desktop audio I/O with lock-free ring buffers
- Replace Mutex-based CPAL callbacks with atomic SPSC ring buffers
- Proper async send/recv loops (no block_on), 20ms playout tick
- Add signal task for RoomUpdate presence display
- Add --alias, --raw-room flags and key persistence (~/.wzp/identity)
- Add SetAlias signal variant and relay-side handling
- Graceful Ctrl+C shutdown with force-quit on second press

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 09:04:51 +04:00
Claude
a9c4260b4e fix: close QUIC connection on hangup so relay removes participant immediately
stop_call() now calls close_now() on the stored transport handle before
killing the tokio runtime. This sends a QUIC CONNECTION_CLOSE frame so
the relay's recv loop breaks immediately, triggering leave() + RoomUpdate
broadcast. Previously the runtime was killed first, so transport.close()
never ran and the relay kept stale participants until idle timeout.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 04:58:24 +00:00
Claude
7eb136fcb3 fix: settings save button (back=discard), fix missing alias in featherchat tests
- Settings now uses draft state — changes only persist on explicit Save
- Back button discards unsaved changes
- Added applyServers() for batch server updates
- Added missing alias field to CallOffer in featherchat tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 04:30:23 +00:00
Claude
550a124972 fix: add missing alias arg to perform_handshake call in wzp-web
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 04:15:24 +00:00
Claude
0835c36d0f feat: settings page with persistence, client alias in handshake, fix null fingerprints
Some checks failed
Build Release Binaries / build-amd64 (push) Failing after 3m34s
- Add SettingsScreen with identity (alias, key backup/restore), audio defaults,
  server management, network prefs, and default room
- SettingsRepository persists all settings via SharedPreferences
- Auto-generate random display names on first launch (e.g. "Swift Wolf")
- Thread alias through CallOffer → relay handshake → RoomUpdate broadcast
- Derive caller fingerprint from identity key in relay handshake (fixes null
  fingerprints when --auth-url is not set)
- Persist identity seed for stable fingerprints across reconnects
- Add alias field to SignalMessage::CallOffer (serde default for backward compat)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 03:56:33 +00:00
Claude
8bf073aa80 fix: handle RoomUpdate variant in wzp-client signal type mapping
Some checks failed
Build Release Binaries / build-amd64 (push) Failing after 3m37s
Build Release Binaries / release (push) Has been skipped
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 01:54:36 +00:00
Claude
2d4b8eebd5 feat: RoomUpdate protocol — broadcast participant list on join/leave
- Add RoomUpdate signal message to wzp-proto with participant count + list
- Add RoomParticipant struct (fingerprint + optional alias)
- Store fingerprint/alias in relay Participant struct
- Broadcast RoomUpdate to all room members on join and leave
- Add signal recv task in Android engine to handle RoomUpdate
- Surface room_participant_count + room_participants in CallStats JSON
- Show "X in room" with participant names in Android in-call UI

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 18:12:24 +00:00
Claude
a23d9f5e41 feat: foreground service, dB gain sliders, speaker routing, live network stats
- Wire CallService foreground service for background calls (microphone type)
- Add Voice Volume + Mic Gain sliders (-20 to +20 dB) applied in Kotlin
- Connect AudioRouteManager for real speaker toggle via AudioManager
- Feed quinn QUIC RTT into PathMonitor, display Loss/RTT/Jitter from live data
- Nuclear teardown between calls — recreate engine + audio pipeline each call
- Fix re-entrant teardown loop from CallService notification callback
- Park audio threads as daemons to avoid libcrypto TLS destructor crash on exit
- Remove duplicate wakelocks from Activity (service owns them now)
- Strip AEC + denoise from capture path, keep AGC only (incremental approach)
- Fix .so copy target: libwzp_android.so not libwzp.so

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 17:45:00 +00:00
Claude
b3e56ecbd8 feat: add AGC to capture + playout paths, add server UI, DNS resolve
- Wire AutoGainControl on both capture (mic → encode) and playout
  (decode → speaker) paths to normalize volume levels
- Add server list with add/remove custom server dialog
- Add IPv4/IPv6 preference toggle for DNS resolution
- Resolve DNS hostnames to IP in Kotlin before passing to Rust engine
- Revert to IP addresses for default servers (DNS still broken on QUIC)

AGC confirmed working — voice levels noticeably improved in testing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 14:02:33 +00:00
Claude
bf91cf25bd feat: add real audio pipeline with Opus + RaptorQ FEC
- AudioPipeline: Kotlin AudioRecord/AudioTrack on JVM threads, PCM
  shuttled to Rust via lock-free ring buffers + JNI
- FEC: RaptorQ fountain codes on encode (5 frames/block, 20% repair
  ratio for GOOD profile), decoder feeds repair symbols for recovery
- Real audio level meter from mic RMS (replaces fake animation)
- Room name editable in UI (default: "android")
- Relay changed to pangolin.manko.yoga:4433
- Stats overlay shows FEC recovered count
- CallState now synced from polled stats (fixes "Connecting" stuck bug)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 12:33:59 +00:00
Claude
af85a49e86 fix: eliminate all native thread creation — run everything single-threaded
pthread_create crashes on Android due to static bionic __init_tcb stubs
in the Rust std prebuilt rlibs. This is unfixable without rebuilding std.

Solution: run the entire call (QUIC connect, handshake, media send/recv)
on a single tokio current_thread runtime. The JNI startCall() now blocks,
so Kotlin dispatches it to Dispatchers.IO (JVM thread, not pthread).

Audio pipeline temporarily simplified to silence frames — will restore
once threading is solved (either via Java Thread or rebuilding std).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 09:52:28 +00:00
Claude
bae03365da fix: restore getauxval_fix.c + current_thread tokio — both needed
The getauxval override (dlsym wrapper) fixes SIGSEGV in
init_have_lse_atomics at library load time. The current_thread
tokio runtime avoids SEGV_ACCERR in pthread_create/__init_tcb.
Both fixes are required together.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 09:37:57 +00:00
Claude
9d9ce4706d fix: use current_thread tokio runtime — avoid pthread_create SEGV on Android
Multi-thread tokio runtime crashes with SEGV_ACCERR in __init_tcb
during pthread_create on Android (static bionic stubs from CRT).
Switch to current_thread runtime which runs network I/O on the
calling thread without spawning additional OS threads.

Also: clean up build.rs — use only libc++_shared.so (dynamic),
remove getauxval_fix.c hack, remove static c++/c++abi linking.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 09:27:46 +00:00
Claude
9098e28a1f fix: SIGSEGV in getauxval — override broken CRT stub with dlsym wrapper
compiler-rt's init_have_lse_atomics calls getauxval(AT_HWCAP) at
library load time. The static getauxval from the CRT reads from
__libc_auxv which is NULL in shared libraries → SIGSEGV at 0x0.

Fix: compile getauxval_fix.c that provides a getauxval() which uses
dlsym(RTLD_DEFAULT) to find the real bionic getauxval at runtime.
Also switch to libc++_shared.so (bundled in APK) to avoid pulling
in static libc stubs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 08:39:57 +00:00
Claude
a8dd0c2f57 fix: also link libc++abi for RTTI — resolve missing __class_type_info vtable
- Compile all 62 Oboe source files (was headers-only, missing symbols)
- Link libc++_static + libc++abi with NDK sysroot search path
- Bump linker target from android21 to android26 (fixes pthread_atfork)
- Link liblog + libOpenSLES for Oboe runtime deps

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 05:48:49 +00:00
Claude
778f4dd428 fix: link libc++ statically — crash on launch due to missing libc++_shared.so
- Set cpp_link_stdlib(None) to suppress cc crate's automatic linking
- Explicitly link both c++_static and c++abi with NDK sysroot search path
- Fixes RTTI vtable symbol (_ZTVN10__cxxabiv117__class_type_infoE) error
- Verified: only liblog.so remains as dynamic dependency

Closes #001

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 05:07:25 +00:00
Siavash Sameni
622fdee51f fix: also link libc++abi for RTTI — resolve missing __class_type_info vtable
Previous fix linked c++_static but not c++abi. Android NDK splits the
static C++ runtime into two archives: libc++_static.a (STL) and
libc++abi.a (RTTI/exceptions). Without c++abi, dlopen fails on
_ZTVN10__cxxabiv117__class_type_infoE.

Now using cpp_link_stdlib(None) to suppress cc crate auto-linking, then
explicitly linking both c++_static and c++abi via cargo:rustc-link-lib.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 09:00:14 +04:00
Siavash Sameni
e751af7e38 fix: link libc++ statically — crash on launch due to missing libc++_shared.so
The app crashed immediately when loading libwzp_android.so because the
cc crate's default dynamic linking produced a runtime dependency on
libc++_shared.so, which was never packaged into the APK.

Adding .cpp_link_stdlib(Some("c++_static")) to build.rs bakes the C++
runtime into libwzp_android.so directly, eliminating the missing .so.

See issues/001-libc++-shared-crash.md for full diagnosis and logcat trace.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 08:52:55 +04:00
Claude
8d5f6fe044 feat: wire QUIC transport, JNI bridge, connect UI + add docs
- Replace raw FFI with proper `jni` crate for string marshalling
- Wire QUIC transport in engine: connect to relay, crypto handshake
  (CallOffer/CallAnswer, X25519+Ed25519), send/recv MediaPackets
- Feed received packets into jitter buffer (was previously ignored)
- Add connect screen UI with CALL button (idle state) and in-call
  controls (mute, speaker, hang up, live stats)
- Hardcode relay 172.16.81.125:4433, room "android"
- Add comprehensive docs in docs/android/:
  architecture.md (8 mermaid diagrams), build-guide.md,
  debugging.md, maintenance.md, roadmap.md

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 04:43:49 +00:00
Claude
780309fede fix: crash on launch — don't auto-start call, handle null JNI strings, remove stdout tracing
- CallActivity no longer auto-starts a call on launch
- CallViewModel lazily inits engine only when startCall() is called
- nativeGetStats nullable return handled safely in Kotlin
- Removed tracing_subscriber::fmt() which panics on Android (no stdout)
- All JNI calls wrapped in try/catch on Kotlin side

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 02:04:23 +00:00
Claude
73ebcdd869 build: Android APK builds working — debug (8.9MB) and release (2.0MB)
- Fix C++ std::std:: double namespace in oboe_bridge.cpp
- Auto-fetch Oboe headers from GitHub in build.rs
- Configure cargo cross-compilation (.cargo/config.toml) with NDK linkers
- Fix Gradle settings (dependencyResolutionManagement), signing configs,
  Compose LinearProgressIndicator API, and Android manifest theme
- Add Gradle wrapper, .gitignore for build artifacts
- arm64-v8a only (raptorq crate incompatible with armv7 32-bit)
- Release APK: 2.0MB signed with wzp-release key
- Debug APK: 8.9MB signed with wzp-debug key

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 19:37:08 +00:00
Claude
e7b1c3372a feat: Android VoIP client — Phase 2 (JNI bridge, Compose UI, AEC pipeline wiring)
- JNI bridge with 8 extern functions (init, startCall, stopCall, setMute,
  setSpeaker, getStats, forceProfile, destroy) with panic catching
- Kotlin engine layer: WzpEngine JNI wrapper, WzpCallback interface,
  CallStats data class with JSON deserialization
- Jetpack Compose UI: InCallScreen with quality indicator (green/yellow/red),
  mute/speaker/hangup buttons, stats overlay, duration timer
- CallActivity with RECORD_AUDIO permission handling, Material3 theme
- CallService foreground service with WakeLock, WiFi lock, notification
- AudioRouteManager for speaker/earpiece/Bluetooth SCO switching
- AEC wired into CallEncoder pipeline: AEC → AGC → denoise → silence → encode
- AEC farend reference fed from decode path to encode path in pipeline
- Engine exposes set_aec_enabled/set_agc_enabled via AtomicBool flags

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 18:16:38 +00:00
Claude
26e9c55f1f feat: Android VoIP client — Phase 1 (audio quality, network adaptation, crate skeleton)
- New wzp-android crate with Oboe C++ backend, lock-free SPSC ring buffers,
  engine orchestrator, codec pipeline, and Android Gradle project structure
- AEC (NLMS adaptive filter), AGC (two-stage with fast attack/slow release),
  windowed-sinc FIR resampler replacing linear interpolation (wzp-codec)
- Opus encoder tuning: complexity 7 default, set_expected_loss support
- Mobile jitter buffer: asymmetric EMA (fast up/slow down), handoff spike
  detection with 2s cooldown, configurable safety margin
- Network-aware quality control: cellular-specific thresholds, faster
  downgrade on cellular, proactive tier drop on WiFi→cellular handoff,
  FEC ratio boost during network transitions
- Handoff detection in PathMonitor via RTT jitter spike analysis

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 18:07:55 +00:00
Siavash Sameni
aa09275015 feat: WebSocket support in relay — browsers connect directly, no bridge
Implements WS_RELAY_SPEC.md: relay handles both QUIC and WebSocket clients
in shared rooms, eliminating the wzp-web bridge server.

Room abstraction (room.rs):
- New ParticipantSender enum: Quic(transport) | WebSocket(mpsc::Sender)
- send_raw() sends PCM bytes to either transport type
- join_ws() convenience method for WS clients
- Forwarding loops handle mixed QUIC+WS rooms:
  QUIC→QUIC: send_media (trunked if enabled)
  QUIC→WS: send_raw payload bytes
  WS→QUIC: send_raw wraps in MediaPacket
  WS→WS: send_raw binary

WebSocket handler (ws.rs):
- GET /ws/{room} → WebSocket upgrade via axum
- Auth: first msg {"type":"auth","token":"..."} → validates against FC
- mpsc channel bridges room fan-out to WS binary frames
- Session + presence lifecycle matches QUIC path
- Optional static file serving via --static-dir (tower-http ServeDir)

Config: --ws-port 8080, --static-dir ./static
Proto: MediaHeader::default_pcm() for WS→QUIC wrapping

63 relay + 54 proto tests passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 14:38:33 +04:00
Siavash Sameni
4fb15fe7a3 feat: P3-T3 bandwidth estimation — GCC-style congestion control
BandwidthEstimator tracks available bandwidth using dual signals:

Delay-based: EMA of RTT vs baseline minimum. If RTT > 1.5x baseline
→ Overuse (congestion). If RTT < 1.1x baseline → Underuse (headroom).
Baseline slowly drifts up to handle route changes.

Loss-based: sliding window of 10 loss samples. Average > 5% → congested.

Rate adaptation (AIMD):
- Overuse OR loss congested: decrease 15% (multiplicative)
- Underuse AND no loss: increase 5% (additive)
- Normal: hold steady
- Clamped to [min_bw, max_bw]

recommended_profile() maps bandwidth to quality tier:
- >= 25 kbps → GOOD (Opus 24k + 20% FEC)
- >= 8 kbps → DEGRADED (Opus 6k + 50% FEC)
- < 8 kbps → CATASTROPHIC (Codec2 1200 + 100% FEC)

from_quality_report() integrates with existing QualityReport packets.

54 proto tests passing (12 new bandwidth tests).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 18:51:08 +04:00
Siavash Sameni
e595fe6591 feat: P3-T6 per-session forwarding — relay links for hop-by-hop media
RelayLink: QUIC connection to peer relay (SNI "_relay") for forwarding
specific sessions. Methods: connect, forward, add/remove_session, is_idle.

RelayLinkManager: manages connections to multiple peers.
- get_or_connect: lazy connection establishment
- forward_to: send media packet to specific peer
- register/unregister_session: track which sessions use which links
- Auto-closes idle links on session unregister

Protocol: added SignalMessage::SessionForward { session_id,
target_fingerprint, source_relay } and SessionForwardAck { session_id,
room_name } for relay-link session setup signaling.

Building block for P3-T7 (call setup over mesh) which wires
route resolution + relay links + handshake into a complete flow.

62 relay tests + 42 proto tests passing (7 new relay_link tests).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 18:45:36 +04:00
Siavash Sameni
326aa491cc feat: P3-T5 route resolution — find relay path to any fingerprint
RouteResolver queries PresenceRegistry to determine how to reach a target:
- Route::Local — connected to this relay
- Route::DirectPeer(addr) — on a directly connected peer relay
- Route::Chain(addrs) — multi-hop (structure ready, single-hop for now)
- Route::NotFound — not in any known relay

Protocol: added SignalMessage::RouteQuery { fingerprint, ttl } and
RouteResponse { fingerprint, found, relay_chain } for peer-to-peer
route queries over probe connections.

HTTP API: GET /route/:fingerprint returns JSON with route type + chain.

Relay handles incoming RouteQuery on probe connections: looks up locally,
replies with RouteResponse. TTL decremented for future multi-hop forwarding.

55 relay tests + 42 proto tests passing (7 new route tests).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 18:38:24 +04:00
Siavash Sameni
464e95a4bd feat: P3-T4 relay presence registry — gossip fingerprints across relay mesh
PresenceRegistry tracks who is connected where:
- register_local/unregister_local for directly connected users
- update_peer for fingerprints reported by peer relays
- lookup returns Local or Remote(addr)
- expire_stale removes entries older than timeout

Gossip via probe connections:
- New SignalMessage::PresenceUpdate { fingerprints, relay_addr }
- Probes send local fingerprints every 10s alongside Ping/Pong
- Receiving relay updates its remote presence table

HTTP API on metrics port:
- GET /presence — all known fingerprints + locations
- GET /presence/:fingerprint — single lookup
- GET /peers — peer relays + their connected users

Wired into relay main:
- Registry created at startup
- register_local after auth+handshake
- unregister_local on disconnect
- Passed to probe mesh and metrics server

Also marks FC-10 as DONE in integration tracker.

48 relay tests + 42 proto tests passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 17:36:55 +04:00
Siavash Sameni
9e7fea7633 test: P2-T1-S5 long-session regression — 60s call with drift/loss assertions
3 tests in crates/wzp-client/tests/long_session.rs:

1. long_session_no_drift — 3000 frames (60s) through full encoder/decoder
   pipeline, asserts >95% decoded, 0 overruns, 0 underruns

2. long_session_with_simulated_loss — drops every 20th packet + reorders,
   asserts >90% decoded, confirms PLC fills gaps (2999/3000)

3. long_session_stats_consistency — verifies stats.total_decoded matches
   actual decoded count over 60s (no accounting drift)

Completes P2-T1-S5. Phase 2 is now fully done.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 20:59:27 +04:00
Siavash Sameni
6f4e8eb9f6 fix: URL-based room routing — /manwe serves index.html with room pre-filled
ServeDir now falls back to index.html for unknown paths (SPA routing).
https://host:port/manwe loads the page with room input pre-filled as "manwe".
JS getRoom() already reads the path, now the page actually loads.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 15:51:47 +04:00
Siavash Sameni
634cd40fdc fix: web bridge low-latency config — disable silence suppression, reduce jitter buffer
PTT mode was causing delayed/lost audio because:
1. Silence suppression ate the start of speech after PTT release
2. Jitter buffer target depth was too high for interactive use

Web bridge now uses:
- suppression_enabled: false (PTT handles silence at browser level)
- jitter_target: 3 (60ms vs ~1s default)
- jitter_max: 20 (400ms cap)
- jitter_min: 1 (start playing after 20ms)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 15:31:23 +04:00
Siavash Sameni
6310864b0b fix: client sends Hangup before disconnect, relay handles timeouts gracefully
Client: sends SignalMessage::Hangup(Normal) before closing in all modes
(send-tone, file mode, silence mode) so the relay knows the session ended.

Relay: downgrades "timed out" / "reset" / "closed" recv errors from
ERROR to INFO since these are normal disconnect scenarios.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 15:15:47 +04:00
Siavash Sameni
4d2c9838c5 fix: eliminate all compiler warnings across client, relay, web
- Remove unused imports in featherchat.rs (tracing, QualityProfile)
- Remove unused comfort_noise field from CallEncoder (cn_level is used instead)
- Prefix unused _metrics_file in CliArgs
- Prefix unused _addr in Participant
- Remove unused RoomSlot struct and rooms field from web AppState
- Remove unused HashMap import from web main

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 15:13:48 +04:00
Siavash Sameni
ab8a7f7a96 fix: client exits after --send-tone completes (was hanging on recv task)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 15:04:44 +04:00
Siavash Sameni
6d5ee55393 fix: install rustls crypto provider in wzp-client (same as relay/web)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 14:45:26 +04:00
Siavash Sameni
0dc381e948 feat: protocol improvements — live trunking, mini-frames, noise suppression, adaptive jitter
T6 wiring: Trunking in relay hot path
- TrunkedForwarder wraps transport with TrunkBatcher
- run_participant uses 5ms flush timer when trunking enabled
- send_trunk/recv_trunk on QuinnTransport
- --trunking flag on relay config
- 2 new tests: forwarder batches, auto-flush on full

T7 wiring: Mini-frames in encoder/decoder
- MediaPacket::encode_compact/decode_compact with MiniFrameContext
- CallEncoder sends mini-headers for consecutive frames (full every 50th)
- CallDecoder auto-detects full vs mini on receive
- mini_frames_enabled in CallConfig (default true)
- 3 new tests: encode/decode sequence, periodic full, disabled mode

Noise suppression (nnnoiseless/RNNoise)
- NoiseSupressor in wzp-codec: pure Rust ML-based noise removal
- Processes 960-sample frames as two 480-sample halves
- Integrated in CallEncoder before silence detection
- noise_suppression in CallConfig (default true)
- 4 new tests: creation, processing, SNR improvement, passthrough

T1-S4: Adaptive playout delay
- AdaptivePlayoutDelay: EMA-based jitter tracking (NetEq-inspired)
- Computes target_delay from observed inter-arrival jitter
- JitterBuffer::new_adaptive() uses adaptive delay
- adaptive_jitter in CallConfig (default true)
- 5 new tests: stable, jitter increase, recovery, clamping, estimate

272 tests passing across all crates.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 14:24:53 +04:00
Siavash Sameni
34cd1017c1 feat: IAX2-inspired protocol improvements — trunking, mini-frames, silence suppression, call control (P2-T6/T7/T8/T9)
WZP-P2-T6: Trunking
- TrunkFrame/TrunkEntry: pack N session packets into one datagram
- Wire format: [count:u16][session_id:2][len:u16][payload]...
- TrunkBatcher: batches by count (10) or bytes (1200), flushes on limit
- 5 tests: encode/decode roundtrip, empty frame, batcher fill/flush, byte limit

WZP-P2-T7: Mini-frames
- MiniHeader: 4-byte delta header (timestamp_delta + payload_len)
- FRAME_TYPE_FULL (0x00) / FRAME_TYPE_MINI (0x01) discriminator
- MiniFrameContext: expands mini-headers to full by tracking baseline
- Saves 8 bytes per packet (5 vs 13 bytes with type prefix)
- 5 tests: encode/decode, wire size, context expand, no baseline, size comparison

WZP-P2-T8: Silence suppression
- SilenceDetector: RMS-based detection with hangover (5 frames = 100ms)
- ComfortNoise: low-level random noise generator
- CodecId::ComfortNoise variant for CN packets
- CallEncoder: suppresses silent frames, sends 1-byte CN every 200ms
- CallDecoder: generates comfort noise on CN packets
- ~50% bandwidth savings in typical conversations
- 6 tests: silence/speech detection, hangover, CN generation, RMS math, suppression

WZP-P2-T9: Call control signals
- SignalMessage: Hold, Unhold, Mute, Unmute, Transfer, TransferAck
- CallSignalType mapping in featherchat.rs for all new variants
- 4 serde roundtrip tests + signal type mapping tests

255 tests passing across all crates.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 14:13:05 +04:00
Siavash Sameni
a64b79d953 feat: probe mesh mode + Grafana dashboard (T5-S6/S7) — completes T5
WZP-P2-T5-S6: Probe mesh mode
- ProbeMesh coordinator: wraps multiple ProbeRunners, spawns all concurrently
- mesh_summary(): scans registry, formats human-readable health table
- /mesh HTTP endpoint on metrics port alongside /metrics
- --probe-mesh flag, --mesh-status for CLI diagnostics
- Replaces individual probe spawn loop with ProbeMesh::run_all()
- 4 tests: mesh creation, empty/populated summary, zero targets

WZP-P2-T5-S7: Grafana dashboard
- docs/grafana-dashboard.json — importable directly into Grafana
- Row 1: Relay Health (sessions, rooms, packets/s, bytes/s, auth, handshake)
- Row 2: Call Quality (buffer depth, loss%, RTT, underruns per session)
- Row 3: Inter-Relay Mesh (RTT heatmap, loss, jitter, probe up/down)
- Row 4: Web Bridge (connections, frames bridged, auth failures, latency)
- Datasource variable ${DS_PROMETHEUS}, auto-refresh 10s
- Color thresholds: loss 2%/5%, RTT 100ms/300ms, probe up=green/down=red

T5 Telemetry & Observability is now COMPLETE (all 7 subtasks).
235 tests passing across all crates.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 13:18:50 +04:00
Siavash Sameni
216ebf4a25 feat: per-session metrics + inter-relay health probe (T5-S2/S5)
WZP-P2-T5-S2: Per-session Prometheus metrics
- 5 new per-session gauges/counters: buffer_depth, loss_pct, rtt_ms,
  underruns, overruns — all labeled by session_id
- update_session_quality() reads QualityReport from packet headers
- update_session_buffer() tracks jitter buffer state per session
- remove_session_metrics() cleans up labels on disconnect
- Delta-aware counter increments avoid double-counting
- 2 tests: session_quality_update, session_metrics_cleanup

WZP-P2-T5-S5: Inter-relay health probe
- New probe.rs: ProbeConfig, ProbeMetrics, SlidingWindow, ProbeRunner
- --probe <addr> flag (repeatable) spawns background probe per target
- Sends Ping/s over QUIC, receives Pong, computes RTT/loss/jitter
- SlidingWindow(60): tracks last 60 pings, loss = missed pongs,
  jitter = std deviation of RTT
- Prometheus gauges: wzp_probe_rtt_ms, loss_pct, jitter_ms, up
  with target label
- Probe connections use SNI "_probe" — relay responds with Pong loop,
  skipping auth/handshake
- Auto-reconnect with 5s backoff on disconnect
- 6 tests: metrics_register, rtt/loss/jitter calculation,
  window eviction, empty edge cases

231 tests passing across all crates.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 13:09:52 +04:00