wz-phone/notes
Siavash Sameni 34cd1017c1 feat: IAX2-inspired protocol improvements — trunking, mini-frames, silence suppression, call control (P2-T6/T7/T8/T9)
WZP-P2-T6: Trunking
- TrunkFrame/TrunkEntry: pack N session packets into one datagram
- Wire format: [count:u16][session_id:2][len:u16][payload]...
- TrunkBatcher: batches by count (10) or bytes (1200), flushes on limit
- 5 tests: encode/decode roundtrip, empty frame, batcher fill/flush, byte limit
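
A minimal sketch of the trunk wire format listed above, to make the layout concrete. The type and function names here are illustrative rather than the actual WZP definitions, and big-endian integers are an assumption, since the notes do not state a byte order.

```rust
/// One session's packet inside a trunk frame (hypothetical layout, not WZP's real type).
#[derive(Debug, Clone, PartialEq)]
pub struct TrunkEntry {
    pub session_id: [u8; 2],
    pub payload: Vec<u8>,
}

/// Encode entries as [count:u16][session_id:2][len:u16][payload]...
/// Returns None if the entry count or any payload does not fit in a u16.
/// Assumption: integers are big-endian.
pub fn encode_trunk(entries: &[TrunkEntry]) -> Option<Vec<u8>> {
    if entries.len() > u16::MAX as usize {
        return None;
    }
    let mut out = Vec::new();
    out.extend_from_slice(&(entries.len() as u16).to_be_bytes());
    for e in entries {
        if e.payload.len() > u16::MAX as usize {
            return None;
        }
        out.extend_from_slice(&e.session_id);
        out.extend_from_slice(&(e.payload.len() as u16).to_be_bytes());
        out.extend_from_slice(&e.payload);
    }
    Some(out)
}

/// Decode a trunk frame. Returns None on truncated input or trailing bytes.
pub fn decode_trunk(buf: &[u8]) -> Option<Vec<TrunkEntry>> {
    let head = buf.get(0..2)?;
    let count = u16::from_be_bytes([head[0], head[1]]) as usize;
    let mut pos = 2;
    let mut entries = Vec::with_capacity(count);
    for _ in 0..count {
        // Per-entry header: 2 bytes of session id, then a 2-byte payload length.
        let h = buf.get(pos..pos + 4)?;
        let len = u16::from_be_bytes([h[2], h[3]]) as usize;
        pos += 4;
        let payload = buf.get(pos..pos + len)?.to_vec();
        pos += len;
        entries.push(TrunkEntry { session_id: [h[0], h[1]], payload });
    }
    if pos == buf.len() { Some(entries) } else { None }
}
```

Each entry costs 4 bytes of trunk overhead, so the saving comes from sharing one datagram's transport overhead across all N sessions.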

WZP-P2-T7: Mini-frames
- MiniHeader: 4-byte delta header (timestamp_delta + payload_len)
- FRAME_TYPE_FULL (0x00) / FRAME_TYPE_MINI (0x01) discriminator
- MiniFrameContext: expands mini-headers to full by tracking baseline
- Saves 8 bytes per packet (5 vs 13 bytes with type prefix)
- 5 tests: encode/decode, wire size, context expand, no baseline, size comparison
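
A rough sketch of how a mini-header and its expansion context could fit together. The split of the 4 bytes into two u16 fields, the big-endian encoding, and the choice to make each delta relative to the previous frame are all assumptions; the real MiniHeader and MiniFrameContext may differ.

```rust
pub const FRAME_TYPE_FULL: u8 = 0x00;
pub const FRAME_TYPE_MINI: u8 = 0x01;

/// 4-byte delta header. Assumption: two big-endian u16 fields.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct MiniHeader {
    pub timestamp_delta: u16,
    pub payload_len: u16,
}

impl MiniHeader {
    /// Encode with the 1-byte type prefix: 5 bytes on the wire.
    pub fn encode(&self) -> [u8; 5] {
        let d = self.timestamp_delta.to_be_bytes();
        let l = self.payload_len.to_be_bytes();
        [FRAME_TYPE_MINI, d[0], d[1], l[0], l[1]]
    }

    /// Decode a prefixed mini-header; None if too short or not a mini-frame.
    pub fn decode(buf: &[u8]) -> Option<Self> {
        if buf.len() < 5 || buf[0] != FRAME_TYPE_MINI {
            return None;
        }
        Some(MiniHeader {
            timestamp_delta: u16::from_be_bytes([buf[1], buf[2]]),
            payload_len: u16::from_be_bytes([buf[3], buf[4]]),
        })
    }
}

/// Tracks the last known absolute timestamp so mini-headers can be expanded.
#[derive(Default)]
pub struct MiniFrameContext {
    baseline_ts: Option<u32>,
}

impl MiniFrameContext {
    /// Record the timestamp carried by a full header.
    pub fn on_full(&mut self, timestamp: u32) {
        self.baseline_ts = Some(timestamp);
    }

    /// Expand a mini-header to an absolute timestamp.
    /// Returns None when no full frame has been seen yet.
    pub fn expand(&mut self, mini: MiniHeader) -> Option<u32> {
        let ts = self.baseline_ts?.wrapping_add(u32::from(mini.timestamp_delta));
        self.baseline_ts = Some(ts);
        Some(ts)
    }
}
```

The "no baseline" case matters in practice: a receiver that joins mid-stream or loses the last full frame has to drop mini-frames until the next full header arrives.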

WZP-P2-T8: Silence suppression
- SilenceDetector: RMS-based detection with hangover (5 frames = 100ms)
- ComfortNoise: low-level random noise generator
- CodecId::ComfortNoise variant for CN packets
- CallEncoder: suppresses silent frames, sends 1-byte CN every 200ms
- CallDecoder: generates comfort noise on CN packets
- ~50% bandwidth savings in typical conversations
- 6 tests: silence/speech detection, hangover, CN generation, RMS math, suppression
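
To illustrate the RMS-plus-hangover idea, here is a small sketch. The struct fields, the threshold value, and the frame size are hypothetical; only the 5-frame hangover comes from the notes above.

```rust
/// Root-mean-square level of a frame of 16-bit PCM samples.
pub fn rms(samples: &[i16]) -> f32 {
    if samples.is_empty() {
        return 0.0;
    }
    let sum_sq: f64 = samples
        .iter()
        .map(|&s| {
            let v = f64::from(s);
            v * v
        })
        .sum();
    (sum_sq / samples.len() as f64).sqrt() as f32
}

/// RMS-based silence detector with hangover (illustrative, not WZP's real type).
pub struct SilenceDetector {
    threshold_rms: f32,
    hangover_frames: u32,
    frames_below: u32,
}

impl SilenceDetector {
    pub fn new(threshold_rms: f32, hangover_frames: u32) -> Self {
        SilenceDetector { threshold_rms, hangover_frames, frames_below: 0 }
    }

    /// Returns true when the frame should be suppressed.
    /// Quiet frames are still sent until `hangover_frames` consecutive quiet
    /// frames have passed, so word endings are not clipped.
    pub fn is_silent(&mut self, frame: &[i16]) -> bool {
        if rms(frame) >= self.threshold_rms {
            self.frames_below = 0;
            return false;
        }
        self.frames_below = self.frames_below.saturating_add(1);
        self.frames_below > self.hangover_frames
    }
}
```

With 20 ms frames and a hangover of 5, the first 100 ms after speech stops is still transmitted; suppression starts on the sixth quiet frame.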

WZP-P2-T9: Call control signals
- SignalMessage: Hold, Unhold, Mute, Unmute, Transfer, TransferAck
- CallSignalType mapping in featherchat.rs for all new variants
- 4 serde roundtrip tests + signal type mapping tests

255 tests passing across all crates.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 14:13:05 +04:00


1. Add trunking (biggest win): Multiplex multiple sessions into a single QUIC datagram batch. A TrunkFrame could pack N mini-packets (session_id:2 + payload) into one datagram, sharing the QUIC overhead. This is your multiplexing idea
from the telemetry discussion — the probe test lines are already a step toward this.
2. Mini-frame format: For consecutive packets from the same session, use a 4-byte mini-header (just timestamp delta + payload length) instead of the full 12-byte MediaHeader. IAX2 does this and it cuts header overhead by 67%.
3. Comfort noise / silence suppression: IAX2 supports CN frames — don't send packets during silence, saving ~50% bandwidth in typical conversations. WZP always sends frames even during silence.
4. Jitter buffer from IAX2's design: IAX2 uses adaptive playout delay based on observed jitter, not fixed target depth. This is exactly what T1-S4 should implement — your sweep tool (S3) can guide the parameters.
5. Call control completeness: IAX2 has HOLD, TRANSFER, QUELCH (mute from server), UNQUELCH. WZP's SignalMessage only has Offer/Answer/Hangup/Ringing. Adding these would help featherChat integration.
Priority Recommendations
┌──────────────────────┬────────────────────────┬───────────────────────────────────────────────────┐
│ From IAX2            │ Maps to                │ Impact                                            │
├──────────────────────┼────────────────────────┼───────────────────────────────────────────────────┤
│ Trunking             │ New P2 task            │ High: reduces per-packet overhead at relay scale  │
├──────────────────────┼────────────────────────┼───────────────────────────────────────────────────┤
│ Mini-frames          │ Wire format change     │ Medium: 67% header reduction                      │
├──────────────────────┼────────────────────────┼───────────────────────────────────────────────────┤
│ Silence suppression  │ Codec layer            │ Medium: 50% BW savings in typical calls           │
├──────────────────────┼────────────────────────┼───────────────────────────────────────────────────┤
│ Adaptive playout     │ P2-T1-S4               │ Medium: fixes the jitter/drift issue              │
├──────────────────────┼────────────────────────┼───────────────────────────────────────────────────┤
│ Call control signals │ SignalMessage variants │ Low: UX feature for featherChat                   │
└──────────────────────┴────────────────────────┴───────────────────────────────────────────────────┘
The trunking gap is the most significant architectural difference. Everything else WZP is either on par or ahead (crypto, FEC, identity).
/btw answer my question: imagine we are connected to mesh one, our ID is 1 (some ETH wallet address), and I'm trying to call 2 (some other ETH address), which is not connected to our relay. I'm connected to relay 1, relay 1 is connected to relay 3, and relay 3 to relay 2. How do we achieve a connection?
Great question — this is relay-to-relay routing, and we don't have it yet. Here's what exists vs what's needed:
What We Have
- Client → Relay (direct QUIC)
- Relay → Relay probing (health metrics only, no media forwarding)
- Room-based forwarding within a single relay (SFU)
- Forward mode (relay → one remote relay, hardcoded pair)
What's Missing: Multi-Hop Call Routing
You (wallet 0x1)        Peer (wallet 0x2)
      │                         │
      ▼                         ▼
   Relay 1 ──── Relay 3 ──── Relay 2
This requires:
1. Relay Discovery / Routing Table
Each relay needs to know which other relays exist and which users are connected where. Options:
- Gossip protocol: relays share their connected fingerprints with neighbors
- Central registry: featherChat server knows "user 0x2 is on Relay 2"
- DHT: fingerprint → relay mapping distributed across the mesh (like Kademlia)
2. Signaling Path Resolution
When you call 0x2:
1. Your client sends CallOffer through featherChat WS (already works)
2. featherChat knows 0x2's relay (from FC-5 group-to-room mapping or a presence API)
3. CallAnswer comes back with the relay chain: [relay1, relay3, relay2]
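
As a sketch of how a relay chain like [relay1, relay3, relay2] could be computed once relay links are known, here is a breadth-first search over the relay graph. The function name and the use of string relay IDs are placeholders; how link information actually reaches the resolver (gossip, registry, or DHT) is left open.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Find the shortest relay chain between two relays, treating links as undirected.
/// Returns the chain including both endpoints, or None if no path exists.
pub fn resolve_chain(links: &[(&str, &str)], from: &str, to: &str) -> Option<Vec<String>> {
    // Build an adjacency list from the link pairs.
    let mut adj: HashMap<&str, Vec<&str>> = HashMap::new();
    for &(a, b) in links {
        adj.entry(a).or_default().push(b);
        adj.entry(b).or_default().push(a);
    }

    let mut prev: HashMap<&str, &str> = HashMap::new();
    let mut seen: HashSet<&str> = HashSet::new();
    let mut queue: VecDeque<&str> = VecDeque::new();
    seen.insert(from);
    queue.push_back(from);

    while let Some(node) = queue.pop_front() {
        if node == to {
            // Walk the predecessor links back to the start, then reverse.
            let mut chain = vec![node.to_string()];
            let mut cur = node;
            while let Some(&p) = prev.get(cur) {
                chain.push(p.to_string());
                cur = p;
            }
            chain.reverse();
            return Some(chain);
        }
        if let Some(neighbors) = adj.get(node) {
            for &next in neighbors {
                if seen.insert(next) {
                    prev.insert(next, node);
                    queue.push_back(next);
                }
            }
        }
    }
    None
}
```

BFS minimizes hop count only. A real resolver would probably weight links by the latency and loss figures the relay probes already collect.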
3. Media Path Setup
Two approaches:
A. Relay chaining (simpler):
You → Relay1 → Relay3 → Relay2 → Peer
Each relay forwards packets to the next hop. We already have forward mode (--remote) for a single pair — extend it to be per-session with a routing table. Latency adds up but it works.
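
A per-session routing table for relay chaining could be as small as the sketch below. `ForwardingTable` and `NextHop` are made-up names, and keying on a u16 session ID is an assumption based on the 2-byte session_id in the trunk format.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;

/// Where a relay should send packets for a given session (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum NextHop {
    /// The peer is connected to this relay directly.
    LocalClient,
    /// Forward to another relay in the chain.
    Relay(SocketAddr),
}

/// Per-session forwarding state held by each relay on the path.
#[derive(Default)]
pub struct ForwardingTable {
    routes: HashMap<u16, NextHop>,
}

impl ForwardingTable {
    /// Install or replace the route for a session during call setup.
    pub fn install(&mut self, session_id: u16, hop: NextHop) {
        self.routes.insert(session_id, hop);
    }

    /// Drop the route on hangup.
    pub fn remove(&mut self, session_id: u16) {
        self.routes.remove(&session_id);
    }

    /// Look up the next hop; None means the packet should be dropped.
    pub fn next_hop(&self, session_id: u16) -> Option<NextHop> {
        self.routes.get(&session_id).copied()
    }
}
```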
B. QUIC relay-assisted NAT traversal (better):
You → Relay1 → Relay3 → Relay2 → Peer (signaling)
You ←──────── direct QUIC ──────────→ Peer (media, if NAT allows)
Use relays for signaling + ICE candidate exchange, then attempt a direct connection. Fall back to relay chain if direct fails.
4. Suggested Task Breakdown
┌───────────┬──────────────────────────┬─────────────────────────────────────────────────────────────────────────┐
│ ID        │ Task                     │ Description                                                             │
├───────────┼──────────────────────────┼─────────────────────────────────────────────────────────────────────────┤
│ WZP-P3-T4 │ Relay peer registry      │ Each relay announces connected fingerprints to neighbors                │
├───────────┼──────────────────────────┼─────────────────────────────────────────────────────────────────────────┤
│ WZP-P3-T5 │ Route resolution         │ Given a target fingerprint, find the relay chain                        │
├───────────┼──────────────────────────┼─────────────────────────────────────────────────────────────────────────┤
│ WZP-P3-T6 │ Per-session forwarding   │ Relay forwards media for a specific session to next-hop relay           │
├───────────┼──────────────────────────┼─────────────────────────────────────────────────────────────────────────┤
│ WZP-P3-T7 │ Call setup over mesh     │ Client → FC signaling → resolve route → setup relay chain → media flows │
├───────────┼──────────────────────────┼─────────────────────────────────────────────────────────────────────────┤
│ WZP-P3-T8 │ Direct path optimization │ After relay chain works, attempt ICE/STUN direct connection             │
└───────────┴──────────────────────────┴─────────────────────────────────────────────────────────────────────────┘
5. The Practical Shortcut
Since featherChat already has presence (FC-6) and group membership (FC-5), the easiest first pass:
1. featherChat server knows which relay each user registered with (add relay_addr to the auth/connect flow)
2. CallOffer payload already carries relay_addr — the callee's client reads it and connects directly to Relay 1
3. Both clients connect to the same relay (caller's), joining the same hashed room
This avoids multi-hop entirely — the callee just connects to the caller's relay. Only fails if the callee can't reach that relay (firewall, geo), which is when you need the full mesh routing.
Want me to add these as P3 tasks and start with the shortcut approach?
Silence suppression: VAD/DTX in Opus vs. ours (yes, comfort noise) vs. IAX (native)
2. Algorithms: Signal pioneered adaptive jitter buffers, dynamic bitrate switching, and bandwidth estimation tuned across millions of calls. WZP's QualityAdapter is a first pass; their algorithms use ML-based predictors.
4. Echo cancellation / noise suppression: Signal uses RNNoise (ML-based), WhatsApp has proprietary DSP. WZP doesn't have any — it relies on the platform's built-in AEC.
3. Group calls: WebRTC SFUs (used by Signal, WhatsApp) handle 32+ participants with simulcast (multiple quality layers). WZP's room mode is a basic SFU without simulcast or quality layering.
Yes, partially. Here's what you can realistically use:
RNNoise (noise suppression): This is the easiest win. RNNoise is BSD-licensed C code, and there's a Rust crate (nnnoiseless, a pure Rust port). You could add it as a pre-processing step in CallEncoder before Opus encoding. No legal issues, no dependency on Signal. Note that RNNoise only suppresses noise; it does not cancel echo, so WZP would still rely on the platform's built-in AEC for that.
Jitter buffer algorithms: Signal's adaptive jitter buffer code is in their open-source ringrtc repo (AGPLv3). You can study the algorithm (NetEq-style adaptive playout) but can't copy code directly into WZP (MIT) because of the copyleft license. However, the concepts (delay prediction, talkspurt detection) are well-documented in academic papers and WebRTC's NetEq source (BSD-licensed). You could:
- Port ideas from WebRTC's NetEq (BSD) — this is what Signal built on anyway
- Use your T5 telemetry + sweep tool to train a simple predictor specific to WZP
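
As a starting point for adaptive playout, here is a sketch that sizes the playout delay from a running jitter estimate. It uses the simple interarrival-jitter filter from RFC 3550 (gain 1/16) rather than NetEq's more elaborate estimator, and the struct name, multiplier, and bounds are placeholder values you would tune with the sweep tool.

```rust
/// Playout delay driven by observed jitter (illustrative, not NetEq itself).
pub struct AdaptivePlayout {
    jitter_ms: f64,
    last_transit_ms: Option<f64>,
    min_delay_ms: f64,
    max_delay_ms: f64,
    multiplier: f64,
}

impl AdaptivePlayout {
    pub fn new(min_delay_ms: f64, max_delay_ms: f64, multiplier: f64) -> Self {
        AdaptivePlayout {
            jitter_ms: 0.0,
            last_transit_ms: None,
            min_delay_ms,
            max_delay_ms,
            multiplier,
        }
    }

    /// Feed one packet's send and receive timestamps (any consistent clocks;
    /// only the change in transit time matters, so clock offset cancels out).
    pub fn on_packet(&mut self, send_ts_ms: f64, recv_ts_ms: f64) {
        let transit = recv_ts_ms - send_ts_ms;
        if let Some(prev) = self.last_transit_ms {
            let d = (transit - prev).abs();
            // RFC 3550 style smoothing: J += (|D| - J) / 16
            self.jitter_ms += (d - self.jitter_ms) / 16.0;
        }
        self.last_transit_ms = Some(transit);
    }

    /// Target buffer depth: a multiple of the jitter estimate, bounded.
    pub fn target_delay_ms(&self) -> f64 {
        (self.multiplier * self.jitter_ms)
            .max(self.min_delay_ms)
            .min(self.max_delay_ms)
    }
}
```

On a steady link the target sits at the minimum; when transit times start swinging, the target grows within a few dozen packets and shrinks again once the link calms down.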
Bandwidth estimation — WebRTC's GCC (Google Congestion Control) is BSD-licensed and well-documented. You could implement a simplified version using your existing QualityReport data. Signal's version is derivative of this.
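
A simplified version could start with just the loss-based half of GCC. The sketch below follows the thresholds described in the GCC IETF draft (hold between 2% and 10% loss, back off above 10%, probe upward below 2%). It assumes the loss fraction can be taken from a QualityReport, whose actual fields are not shown here, and the function name is hypothetical.

```rust
/// One step of a loss-based rate controller in the style of GCC.
/// `loss_fraction` is the share of packets lost since the last report (0.0..=1.0).
pub fn loss_based_update(current_bps: f64, loss_fraction: f64, min_bps: f64, max_bps: f64) -> f64 {
    let next = if loss_fraction > 0.10 {
        // Heavy loss: back off proportionally to the loss.
        current_bps * (1.0 - 0.5 * loss_fraction)
    } else if loss_fraction < 0.02 {
        // Clean link: probe upward by 5%.
        current_bps * 1.05
    } else {
        // Moderate loss: hold the current rate.
        current_bps
    };
    next.max(min_bps).min(max_bps)
}
```

The delay-based half of GCC (detecting queue build-up from inter-arrival times) is what reacts before loss occurs, so this alone will be slower to respond than the full algorithm.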
Simulcast / group quality layers — The concept is straightforward (encode at 2-3 quality levels, relay picks per-receiver). You already have QualityProfile with GOOD/DEGRADED/CATASTROPHIC — extend the encoder to produce multiple
streams and the relay to select per-participant. No need to copy Signal code for this.
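
The relay-side selection could look roughly like this. `Layer` mirrors the GOOD/DEGRADED/CATASTROPHIC names but is not the real QualityProfile type, and the bitrates are invented for illustration.

```rust
/// Simulcast layers, ordered from lowest to highest quality.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Layer {
    Catastrophic,
    Degraded,
    Good,
}

impl Layer {
    /// Assumed bitrate per layer; real values depend on the codec settings.
    pub fn bitrate_bps(self) -> u32 {
        match self {
            Layer::Good => 32_000,
            Layer::Degraded => 16_000,
            Layer::Catastrophic => 6_000,
        }
    }
}

/// Pick the best layer the receiver's estimated bandwidth can carry.
/// Falls back to the lowest available layer if none fits.
pub fn select_layer(available: &[Layer], receiver_bps: u32) -> Option<Layer> {
    available
        .iter()
        .copied()
        .filter(|l| l.bitrate_bps() <= receiver_bps)
        .max()
        .or_else(|| available.iter().copied().min())
}
```

The relay would run this per receiver whenever that receiver's bandwidth estimate changes, forwarding only the chosen layer's packets.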
What you CAN'T directly use:
- Signal's proprietary ML models (not published)
- WhatsApp's DSP (closed source)
- Copyleft (GPL/AGPL) code in an MIT project without relicensing
Practical priority:
1. nnnoiseless crate for noise suppression (drop-in, huge quality win)
2. WebRTC NetEq-style adaptive playout for T1-S4 (BSD, replaces your fixed jitter buffer)
3. GCC-style bandwidth estimation for P3-T3
4. Simulcast for group calls (design from scratch, concept is simple)
The biggest gap is noise suppression — adding nnnoiseless would be maybe 2-3 hours of work and immediately noticeable.