fix(relay): debug tap signal logging, dual_path test regression, PRD updates
Some checks failed
Build Release Binaries / build-amd64 (push) Failing after 3m39s
Mirror to GitHub / mirror (push) Failing after 28s

- Add log_signal() and log_event() to DebugTap for RoomUpdate,
  QualityDirective, join/leave lifecycle events (task #11)
- Fix dual_path.rs Phase 7 regression: add missing ipv6_endpoint arg
  to 3 race() call sites
- Update PRDs to reflect actual implementation status: mark adaptive
  quality, coordinated codec, P2P, network awareness, protocol analyzer
- Update PROGRESS.md with QualityDirective gap and dual_path regression

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
Siavash Sameni
2026-04-13 09:54:52 +04:00
parent 1a7dd935ee
commit ea5fc17c34
9 changed files with 131 additions and 19 deletions

View File

@@ -118,6 +118,7 @@ async fn dual_path_direct_wins_on_loopback() {
"test-room".into(),
"call-test".into(),
None, // Phase 5: tests use fresh endpoints (no shared signal)
None, // Phase 7: no IPv6 endpoint in tests
)
.await
.expect("race must succeed");
@@ -160,6 +161,7 @@ async fn dual_path_relay_wins_when_direct_is_dead() {
"test-room".into(),
"call-test".into(),
None, // Phase 5: tests use fresh endpoints (no shared signal)
None, // Phase 7: no IPv6 endpoint in tests
)
.await
.expect("race must succeed via relay fallback");
@@ -198,6 +200,7 @@ async fn dual_path_errors_cleanly_when_both_paths_dead() {
"test-room".into(),
"call-test".into(),
None, // Phase 5: tests use fresh endpoints (no shared signal)
None, // Phase 7: no IPv6 endpoint in tests
)
.await;
let elapsed = start.elapsed();

View File

@@ -509,7 +509,7 @@ async fn main() -> anyhow::Result<()> {
}
}
if let Some(ref tap) = config.debug_tap {
info!(filter = %tap, "debug tap enabled — logging packet headers");
info!(filter = %tap, "debug tap enabled — logging packets, signals, join/leave events");
}
// Phase 4: cross-relay direct-call dispatcher task.
@@ -1663,6 +1663,15 @@ async fn main() -> anyhow::Result<()> {
} else { update }
} else { update };
if let Some(ref tap) = debug_tap {
if tap.matches(&room_name) {
tap.log_signal(&room_name, &merged_update);
tap.log_event(&room_name, "join", &format!(
"participant={id} addr={addr} alias={}",
caller_alias.as_deref().unwrap_or("?")
));
}
}
room::broadcast_signal(&senders, &merged_update).await;
id
}

View File

@@ -50,6 +50,52 @@ impl DebugTap {
"TAP"
);
}
pub fn log_signal(&self, room: &str, signal: &wzp_proto::SignalMessage) {
match signal {
wzp_proto::SignalMessage::RoomUpdate { count, participants } => {
let names: Vec<&str> = participants.iter()
.map(|p| p.alias.as_deref().unwrap_or("?"))
.collect();
info!(
target: "debug_tap",
room = %room,
signal = "RoomUpdate",
count,
participants = ?names,
"TAP SIGNAL"
);
}
wzp_proto::SignalMessage::QualityDirective { recommended_profile, reason } => {
info!(
target: "debug_tap",
room = %room,
signal = "QualityDirective",
codec = ?recommended_profile.codec,
reason = reason.as_deref().unwrap_or(""),
"TAP SIGNAL"
);
}
other => {
info!(
target: "debug_tap",
room = %room,
signal = ?std::mem::discriminant(other),
"TAP SIGNAL"
);
}
}
}
pub fn log_event(&self, room: &str, event: &str, detail: &str) {
info!(
target: "debug_tap",
room = %room,
event,
detail,
"TAP EVENT"
);
}
}
/// Tracks network quality for a single participant in a room.
@@ -663,6 +709,11 @@ async fn run_participant_plain(
// Broadcast quality directive to all participants if tier changed
if let Some((directive, all_senders)) = quality_directive {
if let Some(ref tap) = debug_tap {
if tap.matches(&room_name) {
tap.log_signal(&room_name, &directive);
}
}
broadcast_signal(&all_senders, &directive).await;
}
@@ -754,7 +805,21 @@ async fn run_participant_plain(
let mut mgr = room_mgr.lock().await;
if let Some((update, senders)) = mgr.leave(&room_name, participant_id) {
drop(mgr); // release lock before async broadcast
if let Some(ref tap) = debug_tap {
if tap.matches(&room_name) {
tap.log_event(&room_name, "leave", &format!(
"participant={participant_id} addr={addr} forwarded={packets_forwarded}"
));
tap.log_signal(&room_name, &update);
}
}
broadcast_signal(&senders, &update).await;
} else if let Some(ref tap) = debug_tap {
if tap.matches(&room_name) {
tap.log_event(&room_name, "leave", &format!(
"participant={participant_id} addr={addr} (room closed)"
));
}
}
}

View File

@@ -61,12 +61,16 @@ Catastrophic → Codec2 1.2k (minimum viable voice)
- Encoder can switch codec mid-stream
- Decoder already auto-detects incoming codec from packet headers
### What's missing
### What's been implemented since PRD was written
1. **QualityReport ingestion** — neither Android engine nor desktop engine reads quality reports from the relay
2. **Profile switch loop** — no periodic check that feeds reports to `QualityAdapter` and applies recommended switches
3. **Upward adaptation**`QualityAdapter` only classifies into 3 tiers (GOOD/DEGRADED/CATASTROPHIC). Needs extension to recommend studio tiers when conditions are excellent (loss < 1%, RTT < 50ms)
4. **Notification to UI** — when quality changes, the UI should show the current active codec
1. **QualityReport ingestion**~~neither Android engine nor desktop engine reads quality reports from the relay~~ **Done**: both Android (`crates/wzp-android/src/engine.rs`) and desktop (`desktop/src-tauri/src/engine.rs`) recv tasks ingest quality reports and feed `AdaptiveQualityController`
2. **Profile switch loop**~~no periodic check~~ **Done**: `pending_profile` AtomicU8 bridges recv→send task in both engines; send task applies profile switch at frame boundary
3. **Notification to UI**~~when quality changes, the UI should show the current active codec~~ **Done**: `tx_codec`/`rx_codec` in desktop `EngineStatus`; `currentCodec`/`peerCodec` in Android `CallStats`
### What's still missing
1. **Upward adaptation**`QualityAdapter` only classifies into 3 tiers (GOOD/DEGRADED/CATASTROPHIC). Needs extension to recommend studio tiers when conditions are excellent (loss < 1%, RTT < 50ms). See Phase 2 below.
2. **Relay QualityDirective handling** — relay broadcasts coordinated quality directives but neither engine processes them (signals are silently discarded). See PRD-coordinated-codec.md for details.
## Requirements

View File

@@ -197,18 +197,26 @@ Implementation strategy: build for P2P first (simpler, 2 parties), then wrap the
| 5 | P2P quality adaptation (direct observation) | 1 day |
| 6 | Per-participant asymmetric encoding (Option 2) | 1 day |
## Implementation Status (2026-04-12)
## Implementation Status (2026-04-13)
Phases 1-2 are now implemented:
Phases 1-2 are implemented. Phase 3 has a critical gap.
### What was built
- **`QualityDirective` signal** (`crates/wzp-proto/src/packet.rs`): New `SignalMessage` variant with `recommended_profile` and optional `reason`
- **`ParticipantQuality`** (`crates/wzp-relay/src/room.rs`): Per-participant quality tracking using `AdaptiveQualityController`, created on join, removed on leave
- **Weakest-link broadcast**: `observe_quality()` method computes room-wide worst tier, broadcasts `QualityDirective` to all participants when tier changes
- **Desktop engine handling** (`desktop/src-tauri/src/engine.rs`): `AdaptiveQualityController` in recv task, `pending_profile` AtomicU8 bridge to send task, auto-mode profile switching
- **Desktop engine handling** (`desktop/src-tauri/src/engine.rs`): `AdaptiveQualityController` in recv task, `pending_profile` AtomicU8 bridge to send task, auto-mode profile switching based on **inbound quality reports**
### Phases 3-4 remaining
### Gap: QualityDirective signals silently discarded
- Phase 3: Client-side handling of `QualityDirective` (reacting to relay-pushed profile)
Both engines receive `QualityDirective` from the relay but **do not process it**:
- **Desktop** (`engine.rs` ~line 1152): signal recv loop matches `RoomUpdate` only; `QualityDirective` falls through the catch-all `Ok(Ok(Some(_))) => {}` arm
- **Android** (`engine.rs` ~line 1198): same pattern — `QualityDirective` falls through to a generic log `info!("signal received: {:?}", ...)` with no action
The relay broadcasts directives correctly, but clients ignore them. Desktop adaptive quality currently works **only** via local `AdaptiveQualityController` observing inbound quality reports — not via relay-coordinated directives.
### Phases remaining
- Phase 3: **Client-side handling of `QualityDirective`** — add match arms in both engines' signal recv loops to apply `recommended_profile` via `pending_profile` AtomicU8. ~0.5 day since the plumbing already exists.
- Phase 4: Upgrade proposal/negotiation protocol for quality recovery

View File

@@ -103,7 +103,7 @@ Sentinel value `0xFF` means "no change pending". The recv task polls on every re
### Tauri Desktop App (com.wzp.desktop)
The Tauri engine doesn't use `AdaptiveQualityController` — quality is resolved once at call start. Adding network monitoring requires first adding adaptive quality to the Tauri call engine, which is a larger change.
~~The Tauri engine doesn't use `AdaptiveQualityController` — quality is resolved once at call start.~~ **Update (2026-04-13):** Desktop now has `AdaptiveQualityController` wired into the recv task with `pending_profile` AtomicU8 bridge. Network monitoring on desktop is now feasible — the blocker was adaptive quality, which is done. Remaining work: platform-specific network change detection (macOS: `SCNetworkReachability` or `NWPathMonitor`; Linux: `netlink` socket).
### Mid-Call ICE Re-gathering

View File

@@ -138,9 +138,20 @@ The existing relay connection carries `IceCandidate` signals. No new infrastruct
## Milestones
| Phase | Scope | Effort |
|-------|-------|--------|
| 1 | STUN client + candidate gathering | 2 days |
| 2 | QUIC hole punching + identity verification | 3 days |
| 3 | Adaptive quality on P2P connection | 2 days |
| 4 | Hybrid mode (relay + P2P, seamless migration) | 3 days |
| Phase | Scope | Effort | Status |
|-------|-------|--------|--------|
| 1 | STUN client + candidate gathering | 2 days | Done |
| 2 | QUIC hole punching + identity verification | 3 days | Done |
| 3 | Adaptive quality on P2P connection | 2 days | Pending (needs 5-tier classification, task #9) |
| 4 | Hybrid mode (relay + P2P, seamless migration) | 3 days | Done |
| 5 | Single-socket Nebula (shared signal+direct endpoint) | 2 days | Done |
| 6 | ICE path negotiation + dual-path race | 3 days | Done |
| 7 | IPv6 dual-socket | 2 days | Done (but `dual_path.rs` integration tests broken — missing `ipv6_endpoint` arg) |
## Implementation Status (2026-04-13)
Phases 1-2, 4-7 are implemented. First P2P call completed 2026-04-12.
### Known regression
Phase 7 added `ipv6_endpoint: Option<Endpoint>` parameter to `race()` in `crates/wzp-client/src/dual_path.rs` but the 3 test call sites in `crates/wzp-client/tests/dual_path.rs` (lines 111, 153, 191) were not updated — they pass 6 args instead of 7. Fix: add `None,` after the `shared_endpoint` arg in each call.

View File

@@ -62,6 +62,16 @@ if debug_tap_enabled {
### Effort: 0.5 day
### Implementation Status (2026-04-13)
Fully implemented. `--debug-tap <room>` (or `*` for all rooms) logs:
- **Per-packet metadata** (`TAP`): direction, addr, seq, codec, timestamp, FEC fields, payload size, fan_out
- **Signal events** (`TAP SIGNAL`): `RoomUpdate` (count + participant names), `QualityDirective` (codec + reason), other signals by discriminant
- **Lifecycle events** (`TAP EVENT`): participant join (id, addr, alias), participant leave (id, addr, forwarded count, or room closed)
All output uses tracing `target: "debug_tap"` so it can be filtered with `RUST_LOG=debug_tap=info`.
---
## 2. Full Protocol Analyzer (Standalone Tool)

View File

@@ -120,7 +120,9 @@
- **Web audio drift**: The browser AudioWorklet playback buffer caps at 200ms, but clock drift between the WebSocket message arrival rate and the AudioContext output rate can cause occasional underruns or accumulation. The cap prevents unbounded growth but may cause glitches.
- **Adaptive loop integration (resolved)**: AdaptiveQualityController is now fully wired into both desktop and Android send/recv tasks. Relay-coordinated codec switching broadcasts QualityDirective to all participants based on weakest-link policy.
- **Adaptive loop integration (partial)**: AdaptiveQualityController is wired into both desktop and Android send/recv tasks for **inbound quality report observation**. Relay broadcasts QualityDirective to all participants based on weakest-link policy, but **neither engine processes QualityDirective signals** — they fall through catch-all match arms silently. Local adaptive quality works; relay-coordinated quality does not.
- **dual_path.rs test regression (Phase 7)**: Phase 7 (IPv6 dual-socket) added `ipv6_endpoint: Option<Endpoint>` parameter to `race()` in `crates/wzp-client/src/dual_path.rs`, but the integration tests in `crates/wzp-client/tests/dual_path.rs` were not updated — 3 call sites pass 6 args instead of 7. `cargo test --workspace` fails to compile.
- **Relay FEC pass-through**: In room mode, the relay forwards packets opaquely without FEC decode/re-encode. This means FEC protection is end-to-end only, not per-hop. In forward mode, the relay pipeline does perform FEC decode/re-encode.