Commit Graph

89 Commits

Author SHA1 Message Date
Claude
2d4b8eebd5 feat: RoomUpdate protocol — broadcast participant list on join/leave
- Add RoomUpdate signal message to wzp-proto with participant count + list
- Add RoomParticipant struct (fingerprint + optional alias)
- Store fingerprint/alias in relay Participant struct
- Broadcast RoomUpdate to all room members on join and leave
- Add signal recv task in Android engine to handle RoomUpdate
- Surface room_participant_count + room_participants in CallStats JSON
- Show "X in room" with participant names in Android in-call UI
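
The RoomUpdate message in the list above can be pictured roughly as follows. This is a minimal sketch, not the wzp-proto definition: the field names, the `room_update` helper, and the label format are assumptions made for illustration; only the "fingerprint + optional alias" and "participant count + list" shapes come from the commit text.

```rust
/// One entry in the participant list (fingerprint plus optional alias).
#[derive(Debug, Clone, PartialEq)]
pub struct RoomParticipant {
    pub fingerprint: String,
    pub alias: Option<String>,
}

/// Hypothetical shape of the RoomUpdate signal: a count plus the list.
#[derive(Debug, Clone, PartialEq)]
pub struct RoomUpdate {
    pub count: u32,
    pub participants: Vec<RoomParticipant>,
}

/// Build the update a relay might broadcast after a join or leave.
pub fn room_update(members: &[RoomParticipant]) -> RoomUpdate {
    RoomUpdate {
        count: members.len() as u32,
        participants: members.to_vec(),
    }
}

/// Render an "X in room" label, preferring the alias over the fingerprint.
pub fn in_room_label(update: &RoomUpdate) -> String {
    let names: Vec<&str> = update
        .participants
        .iter()
        .map(|p| p.alias.as_deref().unwrap_or(p.fingerprint.as_str()))
        .collect();
    format!("{} in room: {}", update.count, names.join(", "))
}
```

Falling back to the fingerprint keeps the label usable for participants that never set an alias.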

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 18:12:24 +00:00
Claude
a23d9f5e41 feat: foreground service, dB gain sliders, speaker routing, live network stats
- Wire CallService foreground service for background calls (microphone type)
- Add Voice Volume + Mic Gain sliders (-20 to +20 dB) applied in Kotlin
- Connect AudioRouteManager for real speaker toggle via AudioManager
- Feed quinn QUIC RTT into PathMonitor, display Loss/RTT/Jitter from live data
- Nuclear teardown between calls — recreate engine + audio pipeline each call
- Fix re-entrant teardown loop from CallService notification callback
- Park audio threads as daemons to avoid libcrypto TLS destructor crash on exit
- Remove duplicate wakelocks from Activity (service owns them now)
- Strip AEC + denoise from capture path, keep AGC only (incremental approach)
- Fix .so copy target: libwzp_android.so not libwzp.so
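
For the Voice Volume and Mic Gain sliders in the list above, the dB-to-amplitude step can be sketched as below. The commit applies the gain in Kotlin; this Rust version only illustrates the arithmetic, and the function names are hypothetical.

```rust
/// Convert a slider value in decibels to a linear amplitude factor.
pub fn db_to_linear(db: f32) -> f32 {
    10f32.powf(db / 20.0)
}

/// Apply a gain, limited to the -20..=+20 dB slider range, to 16-bit PCM in place.
/// Results are saturated to the i16 range instead of wrapping.
pub fn apply_gain_db(samples: &mut [i16], db: f32) {
    let gain = db_to_linear(db.clamp(-20.0, 20.0));
    for s in samples.iter_mut() {
        let scaled = (*s as f32 * gain).round();
        *s = scaled.clamp(i16::MIN as f32, i16::MAX as f32) as i16;
    }
}
```

At +20 dB the factor is 10, so a sample of 30000 saturates at 32767 rather than overflowing.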

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 17:45:00 +00:00
Claude
b3e56ecbd8 feat: add AGC to capture + playout paths, add server UI, DNS resolve
- Wire AutoGainControl on both capture (mic → encode) and playout
  (decode → speaker) paths to normalize volume levels
- Add server list with add/remove custom server dialog
- Add IPv4/IPv6 preference toggle for DNS resolution
- Resolve DNS hostnames to IP in Kotlin before passing to Rust engine
- Revert to IP addresses for default servers (DNS still broken on QUIC)

AGC confirmed working — voice levels noticeably improved in testing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 14:02:33 +00:00
Claude
2fa07286c3 feat: wakelock for background calls, server selector UI
- Partial wake lock + WiFi high-perf lock during calls — audio
  continues when screen is off / phone is locked
- Server selector: toggle between LAN (172.16.81.175) and Pangolin
  (pangolin.manko.yoga) before connecting
- Room name editable in idle screen

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 12:54:02 +00:00
Claude
bf91cf25bd feat: add real audio pipeline with Opus + RaptorQ FEC
- AudioPipeline: Kotlin AudioRecord/AudioTrack on JVM threads, PCM
  shuttled to Rust via lock-free ring buffers + JNI
- FEC: RaptorQ fountain codes on encode (5 frames/block, 20% repair
  ratio for GOOD profile), decoder feeds repair symbols for recovery
- Real audio level meter from mic RMS (replaces fake animation)
- Room name editable in UI (default: "android")
- Relay changed to pangolin.manko.yoga:4433
- Stats overlay shows FEC recovered count
- CallState now synced from polled stats (fixes "Connecting" stuck bug)
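
The FEC figures above (5 frames per block, 20% repair ratio for the GOOD profile) imply one repair symbol per block if the count is rounded up. The rounding rule and the helper names below are assumptions for illustration, not taken from the RaptorQ integration itself.

```rust
/// Number of repair symbols for a block, rounding the percentage up.
/// Rounding up is an assumption; the commit states only the ratio.
pub fn repair_symbols(source_frames: u32, repair_percent: u32) -> u32 {
    (source_frames * repair_percent + 99) / 100
}

/// Total symbols sent per block: source frames plus repair symbols.
pub fn symbols_per_block(source_frames: u32, repair_percent: u32) -> u32 {
    source_frames + repair_symbols(source_frames, repair_percent)
}
```

With these assumptions, `repair_symbols(5, 20)` is 1, so each GOOD block carries 6 symbols.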

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 12:33:59 +00:00
Claude
81c756c076 chore: switch relay to 172.16.81.175:4433 for testing
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 12:01:51 +00:00
Claude
af85a49e86 fix: eliminate all native thread creation — run everything single-threaded
pthread_create crashes on Android due to static bionic __init_tcb stubs
in the Rust std prebuilt rlibs. This is unfixable without rebuilding std.

Solution: run the entire call (QUIC connect, handshake, media send/recv)
on a single tokio current_thread runtime. The JNI startCall() now blocks,
so Kotlin dispatches it to Dispatchers.IO (JVM thread, not pthread).

Audio pipeline temporarily simplified to silence frames — will restore
once threading is solved (either via Java Thread or rebuilding std).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 09:52:28 +00:00
Claude
bae03365da fix: restore getauxval_fix.c + current_thread tokio — both needed
The getauxval override (dlsym wrapper) fixes SIGSEGV in
init_have_lse_atomics at library load time. The current_thread
tokio runtime avoids SEGV_ACCERR in pthread_create/__init_tcb.
Both fixes are required together.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 09:37:57 +00:00
Claude
9d9ce4706d fix: use current_thread tokio runtime — avoid pthread_create SEGV on Android
Multi-thread tokio runtime crashes with SEGV_ACCERR in __init_tcb
during pthread_create on Android (static bionic stubs from CRT).
Switch to current_thread runtime which runs network I/O on the
calling thread without spawning additional OS threads.

Also: clean up build.rs — use only libc++_shared.so (dynamic),
remove getauxval_fix.c hack, remove static c++/c++abi linking.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 09:27:46 +00:00
Claude
9098e28a1f fix: SIGSEGV in getauxval — override broken CRT stub with dlsym wrapper
compiler-rt's init_have_lse_atomics calls getauxval(AT_HWCAP) at
library load time. The static getauxval from the CRT reads from
__libc_auxv which is NULL in shared libraries → SIGSEGV at 0x0.

Fix: compile getauxval_fix.c that provides a getauxval() which uses
dlsym(RTLD_DEFAULT) to find the real bionic getauxval at runtime.
Also switch to libc++_shared.so (bundled in APK) to avoid pulling
in static libc stubs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 08:39:57 +00:00
Claude
f6d51fce61 fix: target API 26 in ELF — pthread_atfork blocked by bionic at API 21
The .note.android.ident ELF section had API level 0x15 (21), causing
Android's bionic linker to block pthread_atfork (used by rand crate).
Fix: pass -P 26 to cargo-ndk and set linker to android26-clang.
Verified: ELF now shows 0x1a (26).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 06:05:44 +00:00
Claude
a8dd0c2f57 fix: also link libc++abi for RTTI — resolve missing __class_type_info vtable
- Compile all 62 Oboe source files (was headers-only, missing symbols)
- Link libc++_static + libc++abi with NDK sysroot search path
- Bump linker target from android21 to android26 (fixes pthread_atfork)
- Link liblog + libOpenSLES for Oboe runtime deps

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 05:48:49 +00:00
Claude
64566e9acb fix: logcat-server.py SyntaxError — global declaration after use
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 05:12:28 +00:00
Claude
10eb19cd24 feat: add logcat HTTP server for remote crash debugging
Simple Python script that captures adb logcat and serves it over HTTP.
Run on laptop, read from anywhere via curl/browser.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 05:11:19 +00:00
Claude
778f4dd428 fix: link libc++ statically — crash on launch due to missing libc++_shared.so
- Set cpp_link_stdlib(None) to suppress cc crate's automatic linking
- Explicitly link both c++_static and c++abi with NDK sysroot search path
- Fixes RTTI vtable symbol (_ZTVN10__cxxabiv117__class_type_infoE) error
- Verified: only liblog.so remains as dynamic dependency

Closes #001

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 05:07:25 +00:00
Siavash Sameni
622fdee51f fix: also link libc++abi for RTTI — resolve missing __class_type_info vtable
Previous fix linked c++_static but not c++abi. Android NDK splits the
static C++ runtime into two archives: libc++_static.a (STL) and
libc++abi.a (RTTI/exceptions). Without c++abi, dlopen fails on
_ZTVN10__cxxabiv117__class_type_infoE.

Now using cpp_link_stdlib(None) to suppress cc crate auto-linking, then
explicitly linking both c++_static and c++abi via cargo:rustc-link-lib.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 09:00:14 +04:00
Claude
b204213a01 build: rebuild APK with static libc++ linking (fixes #001)
libc++_shared.so is no longer a runtime dependency — verified
via llvm-readelf. Only system libs (libdl, liblog, libm, libc) remain.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 04:56:35 +00:00
Siavash Sameni
e751af7e38 fix: link libc++ statically — crash on launch due to missing libc++_shared.so
The app crashed immediately when loading libwzp_android.so because the
cc crate's default dynamic linking produced a runtime dependency on
libc++_shared.so, which was never packaged into the APK.

Adding .cpp_link_stdlib(Some("c++_static")) to build.rs bakes the C++
runtime into libwzp_android.so directly, eliminating the missing .so.

See issues/001-libc++-shared-crash.md for full diagnosis and logcat trace.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 08:52:55 +04:00
Claude
8d5f6fe044 feat: wire QUIC transport, JNI bridge, connect UI + add docs
- Replace raw FFI with proper `jni` crate for string marshalling
- Wire QUIC transport in engine: connect to relay, crypto handshake
  (CallOffer/CallAnswer, X25519+Ed25519), send/recv MediaPackets
- Feed received packets into jitter buffer (was previously ignored)
- Add connect screen UI with CALL button (idle state) and in-call
  controls (mute, speaker, hang up, live stats)
- Hardcode relay 172.16.81.125:4433, room "android"
- Add comprehensive docs in docs/android/:
  architecture.md (8 mermaid diagrams), build-guide.md,
  debugging.md, maintenance.md, roadmap.md

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 04:43:49 +00:00
Claude
780309fede fix: crash on launch — don't auto-start call, handle null JNI strings, remove stdout tracing
- CallActivity no longer auto-starts a call on launch
- CallViewModel lazily inits engine only when startCall() is called
- nativeGetStats nullable return handled safely in Kotlin
- Removed tracing_subscriber::fmt() which panics on Android (no stdout)
- All JNI calls wrapped in try/catch on Kotlin side

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 02:04:23 +00:00
Claude
73ebcdd869 build: Android APK builds working — debug (8.9MB) and release (2.0MB)
- Fix C++ std::std:: double namespace in oboe_bridge.cpp
- Auto-fetch Oboe headers from GitHub in build.rs
- Configure cargo cross-compilation (.cargo/config.toml) with NDK linkers
- Fix Gradle settings (dependencyResolutionManagement), signing configs,
  Compose LinearProgressIndicator API, and Android manifest theme
- Add Gradle wrapper, .gitignore for build artifacts
- arm64-v8a only (raptorq crate incompatible with armv7 32-bit)
- Release APK: 2.0MB signed with wzp-release key
- Debug APK: 8.9MB signed with wzp-debug key

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 19:37:08 +00:00
Claude
e7b1c3372a feat: Android VoIP client — Phase 2 (JNI bridge, Compose UI, AEC pipeline wiring)
- JNI bridge with 8 extern functions (init, startCall, stopCall, setMute,
  setSpeaker, getStats, forceProfile, destroy) with panic catching
- Kotlin engine layer: WzpEngine JNI wrapper, WzpCallback interface,
  CallStats data class with JSON deserialization
- Jetpack Compose UI: InCallScreen with quality indicator (green/yellow/red),
  mute/speaker/hangup buttons, stats overlay, duration timer
- CallActivity with RECORD_AUDIO permission handling, Material3 theme
- CallService foreground service with WakeLock, WiFi lock, notification
- AudioRouteManager for speaker/earpiece/Bluetooth SCO switching
- AEC wired into CallEncoder pipeline: AEC → AGC → denoise → silence → encode
- AEC farend reference fed from decode path to encode path in pipeline
- Engine exposes set_aec_enabled/set_agc_enabled via AtomicBool flags

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 18:16:38 +00:00
Claude
26e9c55f1f feat: Android VoIP client — Phase 1 (audio quality, network adaptation, crate skeleton)
- New wzp-android crate with Oboe C++ backend, lock-free SPSC ring buffers,
  engine orchestrator, codec pipeline, and Android Gradle project structure
- AEC (NLMS adaptive filter), AGC (two-stage with fast attack/slow release),
  windowed-sinc FIR resampler replacing linear interpolation (wzp-codec)
- Opus encoder tuning: complexity 7 default, set_expected_loss support
- Mobile jitter buffer: asymmetric EMA (fast up/slow down), handoff spike
  detection with 2s cooldown, configurable safety margin
- Network-aware quality control: cellular-specific thresholds, faster
  downgrade on cellular, proactive tier drop on WiFi→cellular handoff,
  FEC ratio boost during network transitions
- Handoff detection in PathMonitor via RTT jitter spike analysis
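
The "asymmetric EMA (fast up/slow down)" used by the mobile jitter buffer can be sketched as an exponential moving average with two smoothing factors. The type name and the example factors are hypothetical; the commit does not state the actual constants.

```rust
/// EMA that follows increases quickly and decays slowly.
pub struct AsymmetricEma {
    value: f64,
    alpha_up: f64,
    alpha_down: f64,
}

impl AsymmetricEma {
    pub fn new(initial: f64, alpha_up: f64, alpha_down: f64) -> Self {
        Self { value: initial, alpha_up, alpha_down }
    }

    /// Fold in one sample and return the updated estimate.
    pub fn update(&mut self, sample: f64) -> f64 {
        // Larger factor when the sample is above the estimate (fast up),
        // smaller factor when it is below (slow down).
        let alpha = if sample > self.value { self.alpha_up } else { self.alpha_down };
        self.value += alpha * (sample - self.value);
        self.value
    }
}
```

Starting at 20 ms with factors 0.5 and 0.1, a 40 ms sample moves the estimate to 30 ms, while a following 20 ms sample only pulls it back to 29 ms.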

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 18:07:55 +00:00
Siavash Sameni
aa09275015 feat: WebSocket support in relay — browsers connect directly, no bridge
Implements WS_RELAY_SPEC.md: relay handles both QUIC and WebSocket clients
in shared rooms, eliminating the wzp-web bridge server.

Room abstraction (room.rs):
- New ParticipantSender enum: Quic(transport) | WebSocket(mpsc::Sender)
- send_raw() sends PCM bytes to either transport type
- join_ws() convenience method for WS clients
- Forwarding loops handle mixed QUIC+WS rooms:
  QUIC→QUIC: send_media (trunked if enabled)
  QUIC→WS: send_raw payload bytes
  WS→QUIC: send_raw wraps in MediaPacket
  WS→WS: send_raw binary

WebSocket handler (ws.rs):
- GET /ws/{room} → WebSocket upgrade via axum
- Auth: first msg {"type":"auth","token":"..."} → validated against featherChat (FC)
- mpsc channel bridges room fan-out to WS binary frames
- Session + presence lifecycle matches QUIC path
- Optional static file serving via --static-dir (tower-http ServeDir)

Config: --ws-port 8080, --static-dir ./static
Proto: MediaHeader::default_pcm() for WS→QUIC wrapping

63 relay + 54 proto tests passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 14:38:33 +04:00
Siavash Sameni
59bf3f6587 docs: WS relay spec — add WebSocket listener to eliminate wzp-web bridge
Detailed implementation plan for adding WS support directly to wzp-relay:
- Abstract Participant over transport type (Quic + WebSocket enum)
- New --ws-port flag for browser connections
- Cross-transport fan-out (QUIC↔WS in same rooms)
- Auth, room management, session cleanup unchanged
- Eliminates wzp-web container entirely

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 14:27:52 +04:00
Siavash Sameni
4fb15fe7a3 feat: P3-T3 bandwidth estimation — GCC-style congestion control
BandwidthEstimator tracks available bandwidth using dual signals:

Delay-based: EMA of RTT vs baseline minimum. If RTT > 1.5x baseline
→ Overuse (congestion). If RTT < 1.1x baseline → Underuse (headroom).
Baseline slowly drifts up to handle route changes.

Loss-based: sliding window of 10 loss samples. Average > 5% → congested.

Rate adaptation (AIMD):
- Overuse OR loss congested: decrease 15% (multiplicative)
- Underuse AND no loss: increase 5% (additive)
- Normal: hold steady
- Clamped to [min_bw, max_bw]

recommended_profile() maps bandwidth to quality tier:
- >= 25 kbps → GOOD (Opus 24k + 20% FEC)
- >= 8 kbps → DEGRADED (Opus 6k + 50% FEC)
- < 8 kbps → CATASTROPHIC (Codec2 1200 + 100% FEC)

from_quality_report() integrates with existing QualityReport packets.

54 proto tests passing (12 new bandwidth tests).
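
The decision rules described in this commit can be sketched as pure functions. This is an illustration under stated assumptions, not the BandwidthEstimator code: the function and enum names are hypothetical, loss samples are taken as fractions (0.05 = 5%), and the 5% increase is applied to the current rate because the message gives a percentage even though it labels the step "additive".

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DelaySignal { Overuse, Normal, Underuse }

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Profile { Good, Degraded, Catastrophic }

/// Delay-based signal: smoothed RTT compared with the baseline minimum.
pub fn classify_delay(rtt_ema_ms: f64, baseline_ms: f64) -> DelaySignal {
    if rtt_ema_ms > 1.5 * baseline_ms {
        DelaySignal::Overuse
    } else if rtt_ema_ms < 1.1 * baseline_ms {
        DelaySignal::Underuse
    } else {
        DelaySignal::Normal
    }
}

/// Loss-based signal: average of the window above 5% means congested.
pub fn loss_congested(window: &[f64]) -> bool {
    if window.is_empty() {
        return false;
    }
    window.iter().sum::<f64>() / window.len() as f64 > 0.05
}

/// One rate-adaptation step, clamped to [min_bw, max_bw].
pub fn adapt_rate(bw_kbps: f64, delay: DelaySignal, congested: bool, min_bw: f64, max_bw: f64) -> f64 {
    let next = if delay == DelaySignal::Overuse || congested {
        bw_kbps * 0.85 // decrease 15%
    } else if delay == DelaySignal::Underuse {
        bw_kbps * 1.05 // increase 5%
    } else {
        bw_kbps // hold steady
    };
    next.clamp(min_bw, max_bw)
}

/// Map estimated bandwidth to a quality tier using the thresholds above.
pub fn recommended_profile(bw_kbps: f64) -> Profile {
    if bw_kbps >= 25.0 {
        Profile::Good
    } else if bw_kbps >= 8.0 {
        Profile::Degraded
    } else {
        Profile::Catastrophic
    }
}
```

Keeping the classification separate from the rate step makes each threshold testable on its own.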

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 18:51:08 +04:00
Siavash Sameni
e595fe6591 feat: P3-T6 per-session forwarding — relay links for hop-by-hop media
RelayLink: QUIC connection to peer relay (SNI "_relay") for forwarding
specific sessions. Methods: connect, forward, add/remove_session, is_idle.

RelayLinkManager: manages connections to multiple peers.
- get_or_connect: lazy connection establishment
- forward_to: send media packet to specific peer
- register/unregister_session: track which sessions use which links
- Auto-closes idle links on session unregister

Protocol: added SignalMessage::SessionForward { session_id,
target_fingerprint, source_relay } and SessionForwardAck { session_id,
room_name } for relay-link session setup signaling.

Building block for P3-T7 (call setup over mesh) which wires
route resolution + relay links + handshake into a complete flow.

62 relay tests + 42 proto tests passing (7 new relay_link tests).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 18:45:36 +04:00
Siavash Sameni
326aa491cc feat: P3-T5 route resolution — find relay path to any fingerprint
RouteResolver queries PresenceRegistry to determine how to reach a target:
- Route::Local — connected to this relay
- Route::DirectPeer(addr) — on a directly connected peer relay
- Route::Chain(addrs) — multi-hop (structure ready, single-hop for now)
- Route::NotFound — not in any known relay
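
The four route outcomes listed above map naturally onto an enum. The sketch below assumes the presence data is available as a local set and a fingerprint-to-relay map; `resolve_route`, `relay_chain`, and those inputs are hypothetical stand-ins for the RouteResolver and PresenceRegistry.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, Clone, PartialEq)]
pub enum Route {
    Local,
    DirectPeer(String),
    Chain(Vec<String>),
    NotFound,
}

/// Single-hop resolution: local users first, then users known on a peer relay.
pub fn resolve_route(
    fingerprint: &str,
    local: &HashSet<String>,
    remote: &HashMap<String, String>,
) -> Route {
    if local.contains(fingerprint) {
        return Route::Local;
    }
    match remote.get(fingerprint) {
        Some(addr) => Route::DirectPeer(addr.clone()),
        None => Route::NotFound,
    }
}

/// Flatten a route into the list of relay addresses to traverse.
pub fn relay_chain(route: &Route) -> Vec<String> {
    match route {
        Route::Local | Route::NotFound => Vec::new(),
        Route::DirectPeer(addr) => vec![addr.clone()],
        Route::Chain(addrs) => addrs.clone(),
    }
}
```

`Route::Chain` is never produced by this single-hop resolver; it is kept only to mirror the structure the commit says is ready for multi-hop use.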

Protocol: added SignalMessage::RouteQuery { fingerprint, ttl } and
RouteResponse { fingerprint, found, relay_chain } for peer-to-peer
route queries over probe connections.

HTTP API: GET /route/:fingerprint returns JSON with route type + chain.

Relay handles incoming RouteQuery on probe connections: looks up locally,
replies with RouteResponse. TTL decremented for future multi-hop forwarding.

55 relay tests + 42 proto tests passing (7 new route tests).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 18:38:24 +04:00
Siavash Sameni
464e95a4bd feat: P3-T4 relay presence registry — gossip fingerprints across relay mesh
PresenceRegistry tracks who is connected where:
- register_local/unregister_local for directly connected users
- update_peer for fingerprints reported by peer relays
- lookup returns Local or Remote(addr)
- expire_stale removes entries older than timeout
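
A registry with the operations listed above might look like the following sketch. Several details are assumptions rather than facts from the commit: timestamps are passed in as plain seconds to keep the example deterministic, only remote entries expire, and a peer report never overwrites a locally connected user.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub enum Location {
    Local,
    Remote(String),
}

#[derive(Default)]
pub struct PresenceRegistry {
    // fingerprint -> (where the user is, time last seen in seconds)
    entries: HashMap<String, (Location, u64)>,
}

impl PresenceRegistry {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn register_local(&mut self, fingerprint: &str, now: u64) {
        self.entries.insert(fingerprint.to_string(), (Location::Local, now));
    }

    pub fn unregister_local(&mut self, fingerprint: &str) {
        if matches!(self.entries.get(fingerprint), Some((Location::Local, _))) {
            self.entries.remove(fingerprint);
        }
    }

    /// Record fingerprints reported by a peer relay, keeping local entries intact.
    pub fn update_peer(&mut self, fingerprints: &[&str], relay_addr: &str, now: u64) {
        for fp in fingerprints {
            if matches!(self.entries.get(*fp), Some((Location::Local, _))) {
                continue;
            }
            self.entries
                .insert(fp.to_string(), (Location::Remote(relay_addr.to_string()), now));
        }
    }

    pub fn lookup(&self, fingerprint: &str) -> Option<Location> {
        self.entries.get(fingerprint).map(|(loc, _)| loc.clone())
    }

    /// Drop remote entries not refreshed within `timeout` seconds.
    pub fn expire_stale(&mut self, now: u64, timeout: u64) {
        self.entries
            .retain(|_, (loc, seen)| *loc == Location::Local || now.saturating_sub(*seen) <= timeout);
    }
}
```

Because gossip arrives periodically, a remote entry that stops being refreshed simply ages out on the next `expire_stale` call.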

Gossip via probe connections:
- New SignalMessage::PresenceUpdate { fingerprints, relay_addr }
- Probes send local fingerprints every 10s alongside Ping/Pong
- Receiving relay updates its remote presence table

HTTP API on metrics port:
- GET /presence — all known fingerprints + locations
- GET /presence/:fingerprint — single lookup
- GET /peers — peer relays + their connected users

Wired into relay main:
- Registry created at startup
- register_local after auth+handshake
- unregister_local on disconnect
- Passed to probe mesh and metrics server

Also marks FC-10 as DONE in integration tracker.

48 relay tests + 42 proto tests passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 17:36:55 +04:00
Siavash Sameni
fd95167705 chore: update featherChat submodule to v0.0.38 (feature/wzp-call-infrastructure)
featherChat now implements:
- FC-2: Call state management (calls.rs, CallState, sled persistence)
- FC-3: WS call signal routing (Offer→Ringing, Answer→Active, Hangup→Ended)
- FC-5: Group-to-room mapping (hash_room_name — same convention as WZP)
- FC-6: Presence API (online/devices per fingerprint, batch query)
- FC-7: Missed call notifications (sled storage, retrieval endpoint)

Only FC-10 (web bridge shared auth) remains on FC side.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 17:21:55 +04:00
Siavash Sameni
9e7fea7633 test: P2-T1-S5 long-session regression — 60s call with drift/loss assertions
3 tests in crates/wzp-client/tests/long_session.rs:

1. long_session_no_drift — 3000 frames (60s) through full encoder/decoder
   pipeline, asserts >95% decoded, 0 overruns, 0 underruns

2. long_session_with_simulated_loss — drops every 20th packet + reorders,
   asserts >90% decoded, confirms PLC fills gaps (2999/3000)

3. long_session_stats_consistency — verifies stats.total_decoded matches
   actual decoded count over 60s (no accounting drift)

Completes P2-T1-S5. Phase 2 is now fully done.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 20:59:27 +04:00
Siavash Sameni
993cf9ab7f docs: full system architecture with Mermaid diagrams + project README
ARCHITECTURE.md covers the entire system with 13 Mermaid diagrams:
- System overview (send/recv pipeline, relay SFU)
- Crate dependency graph (8 crates + featherChat)
- Wire formats (MediaHeader, MiniHeader, TrunkFrame, QualityReport, SignalMessage)
- Quality profiles with adaptive switching thresholds
- Cryptographic handshake sequence (X25519 + Ed25519)
- Identity model (BIP39 seed → HKDF → Ed25519/X25519 → Fingerprint)
- Relay modes (Room SFU, Forward, Probe)
- Web bridge architecture (Browser ↔ WS ↔ QUIC)
- FEC protection pipeline (RaptorQ + interleaving)
- Telemetry stack (Prometheus → Grafana)
- Session state machine
- Audio processing detail (denoise → VAD → encode → FEC → encrypt)
- Adaptive jitter buffer flow
- Deployment topology (multi-region)
- featherChat integration sequence

README.md: quick start, feature list, documentation index, build instructions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 16:41:39 +04:00
Siavash Sameni
6f4e8eb9f6 fix: URL-based room routing — /manwe serves index.html with room pre-filled
ServeDir now falls back to index.html for unknown paths (SPA routing).
https://host:port/manwe loads the page with room input pre-filled as "manwe".
JS getRoom() already reads the path, now the page actually loads.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 15:51:47 +04:00
Siavash Sameni
634cd40fdc fix: web bridge low-latency config — disable silence suppression, reduce jitter buffer
PTT mode was causing delayed/lost audio because:
1. Silence suppression ate the start of speech after PTT release
2. Jitter buffer target depth was too high for interactive use

Web bridge now uses:
- suppression_enabled: false (PTT handles silence at browser level)
- jitter_target: 3 (60ms vs ~1s default)
- jitter_max: 20 (400ms cap)
- jitter_min: 1 (start playing after 20ms)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 15:31:23 +04:00
Siavash Sameni
6310864b0b fix: client sends Hangup before disconnect, relay handles timeouts gracefully
Client: sends SignalMessage::Hangup(Normal) before closing in all modes
(send-tone, file mode, silence mode) so the relay knows the session ended.

Relay: downgrades "timed out" / "reset" / "closed" recv errors from
ERROR to INFO since these are normal disconnect scenarios.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 15:15:47 +04:00
Siavash Sameni
4d2c9838c5 fix: eliminate all compiler warnings across client, relay, web
- Remove unused imports in featherchat.rs (tracing, QualityProfile)
- Remove unused comfort_noise field from CallEncoder (cn_level is used instead)
- Prefix unused _metrics_file in CliArgs
- Prefix unused _addr in Participant
- Remove unused RoomSlot struct and rooms field from web AppState
- Remove unused HashMap import from web main

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 15:13:48 +04:00
Siavash Sameni
ab8a7f7a96 fix: client exits after --send-tone completes (was hanging on recv task)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 15:04:44 +04:00
Siavash Sameni
59268f0391 fix: add libssl-dev to Linux build deps (openssl-sys needs it)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 15:00:20 +04:00
Siavash Sameni
a833694568 refactor: build-linux.sh — persistent VM with --prepare/--build/--transfer steps
Replaces the single-shot ephemeral VM approach:
- --prepare: create VM, install deps (Rust, cmake, etc), upload source
- --build: build on VM with full output (iterate on errors)
- --transfer: download binaries to target/linux-x86_64/
- --destroy: delete VM when done
- --upload: re-upload source to existing VM
- --all: prepare + build + transfer (VM persists)

VM reuse: --prepare detects existing wzp-builder VM and just re-uploads.
All steps get VM IP from hcloud server list (last created).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 14:48:51 +04:00
Siavash Sameni
6d5ee55393 fix: install rustls crypto provider in wzp-client (same as relay/web)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 14:45:26 +04:00
Siavash Sameni
0dc381e948 feat: protocol improvements — live trunking, mini-frames, noise suppression, adaptive jitter
T6 wiring: Trunking in relay hot path
- TrunkedForwarder wraps transport with TrunkBatcher
- run_participant uses 5ms flush timer when trunking enabled
- send_trunk/recv_trunk on QuinnTransport
- --trunking flag on relay config
- 2 new tests: forwarder batches, auto-flush on full

T7 wiring: Mini-frames in encoder/decoder
- MediaPacket::encode_compact/decode_compact with MiniFrameContext
- CallEncoder sends mini-headers for consecutive frames (full every 50th)
- CallDecoder auto-detects full vs mini on receive
- mini_frames_enabled in CallConfig (default true)
- 3 new tests: encode/decode sequence, periodic full, disabled mode

Noise suppression (nnnoiseless/RNNoise)
- NoiseSupressor in wzp-codec: pure Rust ML-based noise removal
- Processes 960-sample frames as two 480-sample halves
- Integrated in CallEncoder before silence detection
- noise_suppression in CallConfig (default true)
- 4 new tests: creation, processing, SNR improvement, passthrough

T1-S4: Adaptive playout delay
- AdaptivePlayoutDelay: EMA-based jitter tracking (NetEq-inspired)
- Computes target_delay from observed inter-arrival jitter
- JitterBuffer::new_adaptive() uses adaptive delay
- adaptive_jitter in CallConfig (default true)
- 5 new tests: stable, jitter increase, recovery, clamping, estimate

272 tests passing across all crates.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 14:24:53 +04:00
Siavash Sameni
34cd1017c1 feat: IAX2-inspired protocol improvements — trunking, mini-frames, silence suppression, call control (P2-T6/T7/T8/T9)
WZP-P2-T6: Trunking
- TrunkFrame/TrunkEntry: pack N session packets into one datagram
- Wire format: [count:u16][session_id:2][len:u16][payload]...
- TrunkBatcher: batches by count (10) or bytes (1200), flushes on limit
- 5 tests: encode/decode roundtrip, empty frame, batcher fill/flush, byte limit
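
The trunk wire format above, [count:u16][session_id:2][len:u16][payload]..., can be encoded and decoded as sketched here. Byte order is not stated in the commit, so big-endian is assumed, and the free functions are hypothetical names rather than the TrunkFrame API.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct TrunkEntry {
    pub session_id: [u8; 2],
    pub payload: Vec<u8>,
}

/// Pack entries into one datagram (big-endian integers assumed).
pub fn encode_trunk(entries: &[TrunkEntry]) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&(entries.len() as u16).to_be_bytes());
    for e in entries {
        out.extend_from_slice(&e.session_id);
        out.extend_from_slice(&(e.payload.len() as u16).to_be_bytes());
        out.extend_from_slice(&e.payload);
    }
    out
}

/// Parse a datagram; returns None if the buffer is truncated.
pub fn decode_trunk(buf: &[u8]) -> Option<Vec<TrunkEntry>> {
    let count = u16::from_be_bytes([*buf.first()?, *buf.get(1)?]) as usize;
    let mut pos = 2;
    let mut entries = Vec::with_capacity(count);
    for _ in 0..count {
        // Per-entry header: 2 bytes of session id, then a 2-byte length.
        let header = buf.get(pos..pos + 4)?;
        let session_id = [header[0], header[1]];
        let len = u16::from_be_bytes([header[2], header[3]]) as usize;
        pos += 4;
        let payload = buf.get(pos..pos + len)?.to_vec();
        pos += len;
        entries.push(TrunkEntry { session_id, payload });
    }
    Some(entries)
}
```

Each entry costs 4 bytes of framing, so two entries with payloads of 3 and 0 bytes encode to 2 + 7 + 4 = 13 bytes.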

WZP-P2-T7: Mini-frames
- MiniHeader: 4-byte delta header (timestamp_delta + payload_len)
- FRAME_TYPE_FULL (0x00) / FRAME_TYPE_MINI (0x01) discriminator
- MiniFrameContext: expands mini-headers to full by tracking baseline
- Saves 8 bytes per packet (5 vs 13 bytes with type prefix)
- 5 tests: encode/decode, wire size, context expand, no baseline, size comparison
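
A mini-frame header of this kind could be laid out as in the sketch below. The commit gives only the total size (4 bytes plus the type prefix) and the two field names; the split into two u16 fields, the big-endian order, and the choice to apply each delta to the previously expanded timestamp are assumptions.

```rust
pub const FRAME_TYPE_FULL: u8 = 0x00;
pub const FRAME_TYPE_MINI: u8 = 0x01;

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct MiniHeader {
    pub timestamp_delta: u16,
    pub payload_len: u16,
}

impl MiniHeader {
    /// Type prefix plus 4 header bytes: 5 bytes on the wire.
    pub fn encode(&self) -> [u8; 5] {
        let d = self.timestamp_delta.to_be_bytes();
        let l = self.payload_len.to_be_bytes();
        [FRAME_TYPE_MINI, d[0], d[1], l[0], l[1]]
    }

    pub fn decode(buf: &[u8]) -> Option<MiniHeader> {
        if buf.len() < 5 || buf[0] != FRAME_TYPE_MINI {
            return None;
        }
        Some(MiniHeader {
            timestamp_delta: u16::from_be_bytes([buf[1], buf[2]]),
            payload_len: u16::from_be_bytes([buf[3], buf[4]]),
        })
    }
}

/// Tracks the last known timestamp so mini-headers can be expanded.
#[derive(Default)]
pub struct MiniFrameContext {
    last_timestamp: Option<u32>,
}

impl MiniFrameContext {
    /// Record the timestamp carried by a full header.
    pub fn on_full(&mut self, timestamp: u32) {
        self.last_timestamp = Some(timestamp);
    }

    /// Expand a mini-header; returns None when no baseline has been seen yet.
    pub fn expand(&mut self, mini: MiniHeader) -> Option<u32> {
        let base = self.last_timestamp?;
        let ts = base.wrapping_add(mini.timestamp_delta as u32);
        self.last_timestamp = Some(ts);
        Some(ts)
    }
}
```

Refusing to expand without a baseline matches the "no baseline" test case named in the list above: a receiver that joins mid-stream has to wait for the next full header.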

WZP-P2-T8: Silence suppression
- SilenceDetector: RMS-based detection with hangover (5 frames = 100ms)
- ComfortNoise: low-level random noise generator
- CodecId::ComfortNoise variant for CN packets
- CallEncoder: suppresses silent frames, sends 1-byte CN every 200ms
- CallDecoder: generates comfort noise on CN packets
- ~50% bandwidth savings in typical conversations
- 6 tests: silence/speech detection, hangover, CN generation, RMS math, suppression
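
RMS-based detection with a hangover can be sketched as a small state machine. The threshold value and the method names are hypothetical; the 5-frame hangover (100 ms at 20 ms per frame) is the figure from the list above.

```rust
/// Root-mean-square amplitude of a 16-bit PCM frame.
pub fn rms(frame: &[i16]) -> f64 {
    if frame.is_empty() {
        return 0.0;
    }
    let sum_sq: f64 = frame.iter().map(|&s| (s as f64) * (s as f64)).sum();
    (sum_sq / frame.len() as f64).sqrt()
}

pub struct SilenceDetector {
    threshold: f64,
    hangover_frames: u32,
    quiet_run: u32,
}

impl SilenceDetector {
    pub fn new(threshold: f64, hangover_frames: u32) -> Self {
        Self { threshold, hangover_frames, quiet_run: 0 }
    }

    /// Returns true when the frame may be suppressed.
    /// Quiet frames inside the hangover window are still reported as speech
    /// so that word endings are not clipped.
    pub fn is_silent(&mut self, frame: &[i16]) -> bool {
        if rms(frame) >= self.threshold {
            self.quiet_run = 0;
            return false;
        }
        self.quiet_run = self.quiet_run.saturating_add(1);
        self.quiet_run > self.hangover_frames
    }
}
```

With a hangover of 5, the first five quiet frames after speech are still sent and suppression starts on the sixth.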

WZP-P2-T9: Call control signals
- SignalMessage: Hold, Unhold, Mute, Unmute, Transfer, TransferAck
- CallSignalType mapping in featherchat.rs for all new variants
- 4 serde roundtrip tests + signal type mapping tests

255 tests passing across all crates.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 14:13:05 +04:00
Siavash Sameni
a64b79d953 feat: probe mesh mode + Grafana dashboard (T5-S6/S7) — completes T5
WZP-P2-T5-S6: Probe mesh mode
- ProbeMesh coordinator: wraps multiple ProbeRunners, spawns all concurrently
- mesh_summary(): scans registry, formats human-readable health table
- /mesh HTTP endpoint on metrics port alongside /metrics
- --probe-mesh flag, --mesh-status for CLI diagnostics
- Replaces individual probe spawn loop with ProbeMesh::run_all()
- 4 tests: mesh creation, empty/populated summary, zero targets

WZP-P2-T5-S7: Grafana dashboard
- docs/grafana-dashboard.json — importable directly into Grafana
- Row 1: Relay Health (sessions, rooms, packets/s, bytes/s, auth, handshake)
- Row 2: Call Quality (buffer depth, loss%, RTT, underruns per session)
- Row 3: Inter-Relay Mesh (RTT heatmap, loss, jitter, probe up/down)
- Row 4: Web Bridge (connections, frames bridged, auth failures, latency)
- Datasource variable ${DS_PROMETHEUS}, auto-refresh 10s
- Color thresholds: loss 2%/5%, RTT 100ms/300ms, probe up=green/down=red

T5 Telemetry & Observability is now COMPLETE (all 7 subtasks).
235 tests passing across all crates.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 13:18:50 +04:00
Siavash Sameni
216ebf4a25 feat: per-session metrics + inter-relay health probe (T5-S2/S5)
WZP-P2-T5-S2: Per-session Prometheus metrics
- 5 new per-session gauges/counters: buffer_depth, loss_pct, rtt_ms,
  underruns, overruns — all labeled by session_id
- update_session_quality() reads QualityReport from packet headers
- update_session_buffer() tracks jitter buffer state per session
- remove_session_metrics() cleans up labels on disconnect
- Delta-aware counter increments avoid double-counting
- 2 tests: session_quality_update, session_metrics_cleanup

WZP-P2-T5-S5: Inter-relay health probe
- New probe.rs: ProbeConfig, ProbeMetrics, SlidingWindow, ProbeRunner
- --probe <addr> flag (repeatable) spawns background probe per target
- Sends one Ping per second over QUIC, receives Pong, computes RTT/loss/jitter
- SlidingWindow(60): tracks last 60 pings, loss = missed pongs,
  jitter = std deviation of RTT
- Prometheus gauges: wzp_probe_rtt_ms, loss_pct, jitter_ms, up
  with target label
- Probe connections use SNI "_probe" — relay responds with Pong loop,
  skipping auth/handshake
- Auto-reconnect with 5s backoff on disconnect
- 6 tests: metrics_register, rtt/loss/jitter calculation,
  window eviction, empty edge cases
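
The SlidingWindow statistics described above (loss as missed pongs, jitter as the standard deviation of RTT) can be sketched as follows. Representing a missed pong as `None` and using the population standard deviation are assumptions; the commit does not say which variant probe.rs uses.

```rust
use std::collections::VecDeque;

/// Fixed-size window of probe results; None marks a missed pong.
/// A capacity greater than zero is assumed.
pub struct SlidingWindow {
    capacity: usize,
    samples: VecDeque<Option<f64>>,
}

impl SlidingWindow {
    pub fn new(capacity: usize) -> Self {
        Self { capacity, samples: VecDeque::with_capacity(capacity) }
    }

    /// Record one probe result, evicting the oldest when the window is full.
    pub fn push(&mut self, rtt_ms: Option<f64>) {
        if self.samples.len() == self.capacity {
            self.samples.pop_front();
        }
        self.samples.push_back(rtt_ms);
    }

    /// Percentage of probes in the window that received no pong.
    pub fn loss_pct(&self) -> f64 {
        if self.samples.is_empty() {
            return 0.0;
        }
        let missed = self.samples.iter().filter(|s| s.is_none()).count();
        100.0 * missed as f64 / self.samples.len() as f64
    }

    fn rtts(&self) -> Vec<f64> {
        self.samples.iter().flatten().copied().collect()
    }

    pub fn mean_rtt_ms(&self) -> Option<f64> {
        let r = self.rtts();
        if r.is_empty() {
            None
        } else {
            Some(r.iter().sum::<f64>() / r.len() as f64)
        }
    }

    /// Population standard deviation of the received RTTs (0 when none).
    pub fn jitter_ms(&self) -> f64 {
        let r = self.rtts();
        match self.mean_rtt_ms() {
            None => 0.0,
            Some(mean) => {
                let var = r.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / r.len() as f64;
                var.sqrt()
            }
        }
    }
}
```

A window of 60 at one probe per second would cover roughly the last minute of link history.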

231 tests passing across all crates.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 13:09:52 +04:00
Siavash Sameni
39f6908478 feat: Prometheus metrics on relay + web bridge, client JSONL export (T5-S1/S3/S4)
WZP-P2-T5-S1: Relay Prometheus /metrics
- RelayMetrics: active_sessions, active_rooms, packets/bytes_forwarded,
  auth_attempts (ok/fail), handshake_duration histogram
- --metrics-port flag spawns HTTP server
- Wired into auth, handshake, session, and packet forwarding paths
- 2 tests

WZP-P2-T5-S3: Web bridge Prometheus /metrics
- WebMetrics: active_connections, frames_bridged (up/down),
  auth_failures, handshake_latency histogram
- Added /metrics route to existing axum app
- Wired into WS connect/disconnect, auth, handshake, send/recv loops
- 2 tests

WZP-P2-T5-S4: Client --metrics-file JSONL
- ClientMetricsSnapshot with all telemetry fields
- MetricsWriter: writes one JSON line per second to file
- snapshot_from_stats() converts JitterStats to snapshot
- --metrics-file <path> flag
- 3 tests

223 tests passing across all crates.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 12:44:57 +04:00
Siavash Sameni
3f813cd510 docs: telemetry & observability design — Prometheus, probes, Grafana
WZP-P2-T5 task breakdown with 7 subtasks:
- S1/S3: Prometheus /metrics on relay and web bridge
- S2: Per-session jitter/loss/RTT metrics
- S4: Client --metrics-file JSONL export
- S5/S6: Inter-relay health probes + mesh mode
- S7: Pre-built Grafana dashboard

Key design: multiplexed test lines between relays (~50 bytes/s)
provide continuous RTT/loss/jitter without meaningful BW cost.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 10:29:17 +04:00
Siavash Sameni
59a00d371b feat: jitter buffer instrumentation — drift test, telemetry, parameter sweep
WZP-P2-T1-S1: Automated drift measurement
- New drift_test.rs: DriftTestConfig, DriftResult, run_drift_test()
- CLI --drift-test <secs>: sends tone, measures actual vs expected duration
- Interpretation tiers: EXCELLENT (<50ms) / GOOD / FAIR / POOR
- 2 unit tests: drift math verification, config defaults
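
The drift math can be illustrated with a short sketch. It assumes that the expected duration is derived from the number of frames sent and that drift is judged by absolute value; the helper names are hypothetical. Only the EXCELLENT bound (<50ms) is stated above, so the other tiers are deliberately left out.

```rust
use std::time::Duration;

/// Audio duration implied by the number of frames sent.
/// `frame_ms` is supplied by the caller; 20 ms would correspond to
/// 960-sample frames at 48 kHz, which is an assumption here.
pub fn expected_duration(frames_sent: u64, frame_ms: u64) -> Duration {
    Duration::from_millis(frames_sent * frame_ms)
}

/// Signed drift in milliseconds: positive means the run took longer
/// than the audio it carried, negative means it finished early.
pub fn drift_ms(actual: Duration, expected: Duration) -> i64 {
    actual.as_millis() as i64 - expected.as_millis() as i64
}

/// EXCELLENT tier check. Using the absolute value is an assumption;
/// the commit message only gives the 50 ms bound.
pub fn is_excellent(drift_ms: i64) -> bool {
    drift_ms.abs() < 50
}
```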

WZP-P2-T1-S2: Jitter buffer telemetry
- JitterStats gains: total_decoded, underruns, overruns, max_depth_seen
- JitterBuffer: record_underrun(), record_decode(), reset_stats()
- CallDecoder: stats() getter, reset_stats()
- JitterTelemetry: periodic tracing::info! logger with configurable interval
- 4 unit tests: ingestion tracking, underrun tracking, reset, interval

WZP-P2-T1-S3: Parameter sweep
- New sweep.rs: SweepConfig, SweepResult, run_local_sweep()
- Tests 20 jitter buffer configs (5 target depths × 4 max depths) locally
- CLI --sweep: runs sweep, prints ASCII comparison table
- No network needed — pure encoder→decoder pipeline test
- 3 unit tests: config defaults, local sweep runs, table formatting
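
The 20-config grid is a plain cartesian product, sketched below. The struct and function names are hypothetical and the actual depth values used by sweep.rs are not stated in this message, so the caller supplies them.

```rust
/// One jitter buffer configuration to evaluate (field names are illustrative).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct JitterConfig {
    pub target_depth: usize,
    pub max_depth: usize,
}

/// Build every (target, max) combination, target-major order.
/// No filtering is applied (e.g. max >= target); whether the real sweep
/// filters combinations is not stated in the commit message.
pub fn sweep_grid(targets: &[usize], maxes: &[usize]) -> Vec<JitterConfig> {
    let mut out = Vec::with_capacity(targets.len() * maxes.len());
    for &target_depth in targets {
        for &max_depth in maxes {
            out.push(JitterConfig { target_depth, max_depth });
        }
    }
    out
}
```

Five target depths and four max depths yield the 20 configurations mentioned above.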

216 tests passing across all crates.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 10:26:40 +04:00
Siavash Sameni
524d1145bb feat: complete WZP Phase 2 (T2/T3/T4) — adaptive quality, AudioWorklet, sessions
WZP-P2-T2: Adaptive quality switching
- QualityAdapter with sliding window of QualityReports
- Hysteresis: 3 consecutive reports before switching profiles
- Thresholds: loss>15%/rtt>200ms→CATASTROPHIC, loss>5%/rtt>100ms→DEGRADED
- CallConfig::from_profile() constructor
- 5 unit tests: good/degraded/catastrophic conditions, hysteresis, recovery
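
The hysteresis rule above can be sketched as a small state machine. This is an illustration under stated assumptions, not the QualityAdapter from the commit: the loss and RTT thresholds are combined with OR (the message does not say whether it is OR or AND), a simple streak counter stands in for the sliding window, and the method names are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Profile {
    Good,
    Degraded,
    Catastrophic,
}

#[derive(Debug, Clone, Copy)]
pub struct QualityReport {
    pub loss_pct: f32,
    pub rtt_ms: u32,
}

/// Map one report to the profile it calls for.
/// Assumption: either metric crossing its threshold is enough (OR).
pub fn classify(r: &QualityReport) -> Profile {
    if r.loss_pct > 15.0 || r.rtt_ms > 200 {
        Profile::Catastrophic
    } else if r.loss_pct > 5.0 || r.rtt_ms > 100 {
        Profile::Degraded
    } else {
        Profile::Good
    }
}

pub struct QualityAdapter {
    current: Profile,
    candidate: Profile,
    streak: u32,
}

impl QualityAdapter {
    /// Consecutive agreeing reports required before switching.
    pub const HYSTERESIS: u32 = 3;

    pub fn new() -> Self {
        Self { current: Profile::Good, candidate: Profile::Good, streak: 0 }
    }

    pub fn current(&self) -> Profile {
        self.current
    }

    /// Feed one report; returns Some(new_profile) only when a switch happens.
    pub fn ingest(&mut self, r: &QualityReport) -> Option<Profile> {
        let wanted = classify(r);
        if wanted == self.current {
            // Report agrees with the active profile: any pending streak is void.
            self.candidate = wanted;
            self.streak = 0;
            return None;
        }
        if wanted == self.candidate {
            self.streak += 1;
        } else {
            // A different target starts a fresh streak.
            self.candidate = wanted;
            self.streak = 1;
        }
        if self.streak >= Self::HYSTERESIS {
            self.current = wanted;
            self.streak = 0;
            Some(wanted)
        } else {
            None
        }
    }
}
```

A single good report in the middle of a bad run resets the streak, which is what keeps the profile from flapping on brief spikes.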

WZP-P2-T3: AudioWorklet migration (web bridge)
- audio-processor.js: WZPCaptureProcessor + WZPPlaybackProcessor
- Capture: buffers 128-sample AudioWorklet blocks → 960-sample frames
- Playback: ring buffer, Int16→Float32 conversion in worklet
- ScriptProcessorNode fallback if AudioWorklet unavailable
- Existing UI preserved (connect, room, PTT)
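
The capture-side buffering (128-sample blocks accumulated into 960-sample frames) is implemented in JavaScript in audio-processor.js; the sketch below shows the same accumulation logic in Rust for illustration only. All names are hypothetical, and converting Float32 to Int16 on the capture side is an assumption (the message only mentions Int16→Float32 on playback).

```rust
/// Samples per codec frame, as stated in the commit message.
pub const FRAME_SAMPLES: usize = 960;

/// Convert a float sample in [-1.0, 1.0] to 16-bit PCM, clamping out-of-range input.
pub fn f32_to_i16(s: f32) -> i16 {
    (s.clamp(-1.0, 1.0) * 32767.0) as i16
}

/// Convert 16-bit PCM back to a float sample in [-1.0, 1.0).
pub fn i16_to_f32(s: i16) -> f32 {
    s as f32 / 32768.0
}

/// Collects small audio blocks and emits full frames.
pub struct FrameAccumulator {
    buf: Vec<f32>,
}

impl FrameAccumulator {
    pub fn new() -> Self {
        Self { buf: Vec::with_capacity(FRAME_SAMPLES * 2) }
    }

    /// Push one block (128 samples in the AudioWorklet case) and return
    /// every complete frame now available. 960 is not a multiple of 128,
    /// so leftover samples stay buffered for the next call.
    pub fn push(&mut self, block: &[f32]) -> Vec<Vec<i16>> {
        self.buf.extend_from_slice(block);
        let mut frames = Vec::new();
        while self.buf.len() >= FRAME_SAMPLES {
            let frame: Vec<i16> = self.buf.drain(..FRAME_SAMPLES).map(f32_to_i16).collect();
            frames.push(frame);
        }
        frames
    }

    /// Number of samples waiting for the next frame.
    pub fn pending(&self) -> usize {
        self.buf.len()
    }
}
```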

WZP-P2-T4: Concurrent session management (relay)
- SessionManager tracks active sessions with HashMap
- Enforces max_sessions limit from RelayConfig
- create_session/remove_session lifecycle
- Wired into relay main: session created after auth+handshake,
  cleaned up after run_participant returns
- 7 unit tests: create/remove, max enforced, room tracking, info, expiry
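
A minimal sketch of the session lifecycle described above, assuming hypothetical signatures: only the names SessionManager, create_session, remove_session and max_sessions come from this message, while the ID scheme, the error type and the stored fields are illustrative. Expiry is not covered.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq, Eq)]
pub enum SessionError {
    /// The configured max_sessions limit has been reached.
    LimitReached { max: usize },
}

/// Illustrative per-session record.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SessionInfo {
    pub room: String,
    pub fingerprint: String,
}

pub struct SessionManager {
    max_sessions: usize,
    next_id: u64,
    sessions: HashMap<u64, SessionInfo>,
}

impl SessionManager {
    pub fn new(max_sessions: usize) -> Self {
        Self { max_sessions, next_id: 0, sessions: HashMap::new() }
    }

    /// Register a session (after auth + handshake) or refuse when full.
    pub fn create_session(&mut self, room: &str, fingerprint: &str) -> Result<u64, SessionError> {
        if self.sessions.len() >= self.max_sessions {
            return Err(SessionError::LimitReached { max: self.max_sessions });
        }
        let id = self.next_id;
        self.next_id += 1;
        self.sessions.insert(
            id,
            SessionInfo { room: room.to_string(), fingerprint: fingerprint.to_string() },
        );
        Ok(id)
    }

    /// Drop a session once its participant loop has returned.
    pub fn remove_session(&mut self, id: u64) -> Option<SessionInfo> {
        self.sessions.remove(&id)
    }

    pub fn active_sessions(&self) -> usize {
        self.sessions.len()
    }
}
```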

207 tests passing across all crates.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 10:20:51 +04:00
Siavash Sameni
bf56d84ef0 test: 17 new tests for S-4/5/6/7/9 integration tasks
S-4 Room hashing + ACL (8 tests in featherchat_compat.rs):
- hash_room_name: deterministic, 32 hex chars, different inputs differ
- hash_room_name_matches_fc_convention: manual SHA-256 verification
- room_acl: open mode, enforced mode, allow-listed, deny-unlisted

S-5 Handshake integration (4 tests in handshake_integration.rs):
- handshake_succeeds: real QUIC, encrypt/decrypt cross-verified
- handshake_verifies_identity: different seeds, session still works
- auth_then_handshake: AuthToken + CallOffer/Answer in sequence
- handshake_rejects_bad_signature: tampered sig → error

S-6/7/9 Web+Proto+TLS (5 tests in featherchat_compat.rs):
- auth_response_with_eth_address: FC's extra field handled
- wzp_proto_has_auth_token_variant: serialize/deserialize roundtrip
- all_fc_call_signal_types_representable: all 7 types verified
- hash_room_name_used_as_sni_is_valid: unicode/special chars → valid hex
- wzp_proto_cargo_toml_is_standalone: no workspace inheritance

196 total tests passing across all crates.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 10:09:34 +04:00
Siavash Sameni
59069bfba2 feat: complete all WZP-S integration tasks (S-4/5/6/7/9)
WZP-S-4: Room access control
- hash_room_name() in wzp-crypto: SHA-256("featherchat-group:"+name)[:16]
- CLI --room flag hashes before SNI, web bridge does the same
- RoomManager gains ACL: with_acl(), allow(), is_authorized()
- join() returns Result, rejects unauthorized fingerprints
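
The ACL behaviour (open by default, enforced once an ACL is attached) might look roughly like the sketch below. The method names with_acl(), allow(), is_authorized() and join() come from this message; their signatures, the error type and the internal layout are assumptions made for illustration.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, PartialEq, Eq)]
pub enum JoinError {
    Unauthorized,
}

pub struct RoomManager {
    /// None = open mode (anyone may join); Some = per-room allow-lists.
    acl: Option<HashMap<String, HashSet<String>>>,
    /// Current members per room, keyed by (hashed) room name.
    members: HashMap<String, HashSet<String>>,
}

impl RoomManager {
    /// Open mode: no ACL, every fingerprint is accepted.
    pub fn new() -> Self {
        Self { acl: None, members: HashMap::new() }
    }

    /// Enforced mode: only allow-listed fingerprints may join.
    pub fn with_acl() -> Self {
        Self { acl: Some(HashMap::new()), members: HashMap::new() }
    }

    /// Allow-list a fingerprint for a room (no-op in open mode).
    pub fn allow(&mut self, room: &str, fingerprint: &str) {
        if let Some(acl) = self.acl.as_mut() {
            acl.entry(room.to_string()).or_default().insert(fingerprint.to_string());
        }
    }

    pub fn is_authorized(&self, room: &str, fingerprint: &str) -> bool {
        match &self.acl {
            None => true,
            Some(acl) => acl.get(room).map_or(false, |set| set.contains(fingerprint)),
        }
    }

    /// Join a room; on success returns the participant count after joining.
    pub fn join(&mut self, room: &str, fingerprint: &str) -> Result<usize, JoinError> {
        if !self.is_authorized(room, fingerprint) {
            return Err(JoinError::Unauthorized);
        }
        let set = self.members.entry(room.to_string()).or_default();
        set.insert(fingerprint.to_string());
        Ok(set.len())
    }
}
```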

WZP-S-5: Crypto handshake wired into all live paths
- CLI: perform_handshake() after connect, before any mode
- Relay: accept_handshake() after auth, before room join
- Web bridge: perform_handshake() after auth, before audio
- Relay generates ephemeral identity at startup

WZP-S-6: Web bridge featherChat auth
- --auth-url flag: browsers send {"type":"auth","token":"..."} as first WS msg
- Validates against featherChat, passes token to relay
- --cert/--key flags for production TLS (replaces the self-signed certificate)

WZP-S-7: wzp-proto standalone
- Cargo.toml uses explicit versions (no workspace inheritance)
- FC can use as git dependency

WZP-S-9: All 6 hardcoded assumptions resolved
- Auth, hashed rooms, mandatory handshake, real TLS certs,
  profile negotiation, token validation

CLI also gains --room and --token flags.
179 tests passing across all crates.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 09:59:05 +04:00