Files
wz-phone/docs/REFACTOR-codebase-audit.md
Siavash Sameni fdb78e08bd
Some checks failed
Mirror to GitHub / mirror (push) Failing after 32s
Build Release Binaries / build-amd64 (push) Failing after 3m33s
docs: full codebase refactoring audit with prioritized suggestions
Comprehensive analysis across all 8 crates + Tauri engine covering:
- engine.rs: 35% duplication between Android/desktop (350+ lines)
- SignalMessage: 36 variants mixing orthogonal concerns
- federation.rs: zero test coverage on 1,132 lines of complex logic
- peer_links: lock held across async sends (last lock-during-I/O)
- Magic numbers, error handling, CLI parsing, unsafe docs
- Priority matrix: 10 items ranked by effort/impact/risk

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 12:35:59 +04:00

10 KiB
Raw Blame History

Codebase Refactoring Audit (2026-04-13)

Full analysis of the WarzonePhone codebase after the DashMap relay refactor, DRED continuous tuning, and adaptive quality wiring. The codebase is ~15K lines of Rust across 8 crates plus a 1.7K-line Tauri engine. This document identifies every refactoring opportunity ranked by impact.

Critical: engine.rs is 1,705 Lines With ~35% Duplication

desktop/src-tauri/src/engine.rs has two nearly-identical CallEngine::start() implementations:

  • Android path: 880 lines (lines 3211200)
  • Desktop path: 430 lines (lines 12031633)

What's Duplicated (350+ lines)

Block Android Lines Desktop Lines Size Identical?
CallConfig initialization 529539 13531363 23 lines Yes
DRED tuner + frame_samples setup 541555 13601375 15 lines Yes
Adaptive quality profile switch 651665 14141428 15 lines Yes
Codec-to-QualityProfile match 852864 14881500 19 lines Yes
DRED ingest + gap fill 886902 15111528 17 lines Yes
Quality report ingestion 905912 15311538 8 lines Yes
Signal task (entire thing) 11331180 15691616 48 lines Yes

Suggested Fix: Extract Shared Helpers

// Top of engine.rs — shared between both platforms

fn build_call_config(quality: &str) -> CallConfig { ... }

fn codec_to_profile(codec: CodecId) -> QualityProfile { ... }

fn check_adaptive_switch(
    pending: &AtomicU8,
    encoder: &mut CallEncoder,
    tuner: &mut DredTuner,
    frame_samples: &mut usize,
    tx_codec: &Mutex<String>,
) { ... }

async fn run_signal_task(
    transport: Arc<QuinnTransport>,
    running: Arc<AtomicBool>,
    pending_profile: Arc<AtomicU8>,
    participants: Arc<Mutex<Vec<ParticipantInfo>>>,
) { ... }

This would reduce engine.rs by ~200 lines and make the Android/desktop paths only differ in their audio I/O (Oboe vs CPAL).

Effort: 2-3 hours. Impact: High — every future change to the send/recv pipeline currently requires editing two places.


High: SignalMessage Enum Has 36 Variants

crates/wzp-proto/src/packet.rs (1,727 lines) has a SignalMessage enum with 36 variants mixing orthogonal concerns:

  • Legacy call signaling (CallOffer, CallAnswer, IceCandidate, Rekey...)
  • Direct calling (RegisterPresence, DirectCallOffer, DirectCallAnswer, CallSetup...)
  • Federation (FederationHello, GlobalRoomActive/Inactive, FederatedSignalForward)
  • Relay control (SessionForward, PresenceUpdate, RouteQuery, RoomUpdate)
  • NAT traversal (Reflect, ReflectResponse, MediaPathReport)
  • Quality (QualityUpdate, QualityDirective)
  • Call control (Ping/Pong, Hold/Unhold, Mute/Unmute, Transfer)

Every new feature adds variants here, and every match on SignalMessage must handle all 36 arms (or use _ wildcard).

Suggested Fix: Sub-Enum Grouping

enum SignalMessage {
    Call(CallSignal),           // CallOffer, CallAnswer, IceCandidate, Rekey, Hangup...
    Direct(DirectCallSignal),   // RegisterPresence, DirectCallOffer, CallSetup, MediaPathReport...
    Federation(FedSignal),      // FederationHello, GlobalRoomActive, FederatedSignalForward...
    Control(ControlSignal),     // Ping/Pong, Hold/Unhold, Mute/Unmute, QualityDirective...
    Relay(RelaySignal),         // SessionForward, PresenceUpdate, RouteQuery, RoomUpdate...
}

Caution: This is a wire-format change. Serde serialization must remain backward-compatible with already-deployed relays. Use #[serde(untagged)] or versioned deserialization. Consider doing this as a v2 protocol bump.

Effort: 1 day. Impact: High for maintainability, but risky for wire compatibility.


High: Federation Has Zero Tests

crates/wzp-relay/src/federation.rs (1,132 lines) has no unit tests and no integration tests. This is the most complex file in the relay crate, handling:

  • Peer link management (connect, reconnect, stale sweep)
  • Federation media egress (forward_to_peers)
  • Federation media ingress (handle_datagram: dedup, rate limit, local delivery, multi-hop)
  • Cross-relay signal forwarding
  • Room event subscription and GlobalRoomActive/Inactive broadcasting

The relay crate has 91 tests, but none cover federation. Any refactoring of federation (like the DashMap migration or clone-before-send) is flying blind.

Suggested Fix

Priority test cases:

  1. forward_to_peers with 0, 1, 3 peers — verify datagram construction and label tracking
  2. handle_datagram — dedup (same packet twice → second dropped), rate limit (exceed → dropped)
  3. Stale presence sweeper — verify cleanup after timeout
  4. broadcast_signal — verify signal reaches all peers
  5. Multi-hop forward — verify source peer excluded from re-forward

Effort: 1 day. Impact: Critical for safe refactoring.


broadcast_signal() (line 216) holds peer_links Mutex across async send_signal() calls. A slow peer blocks all signal delivery. forward_to_peers() (line 406) holds it during sync sends (less severe but still serializes).

Fix (30 minutes)

// Before:
let links = self.peer_links.lock().await;
for (fp, link) in links.iter() {
    link.transport.send_signal(msg).await;  // lock held across await!
}

// After:
let peers: Vec<_> = {
    let links = self.peer_links.lock().await;
    links.values().map(|l| (l.label.clone(), l.transport.clone())).collect()
};
for (label, transport) in &peers {
    transport.send_signal(msg).await;  // no lock held
}

Apply to forward_to_peers(), broadcast_signal(), and send_signal_to_peer().

Effort: 30 minutes. Impact: Medium — eliminates last lock-during-I/O pattern.


Medium: Magic Numbers Scattered Through engine.rs

// These appear as literals in multiple places:
tokio::time::sleep(Duration::from_millis(5))    // 6 occurrences
tokio::time::sleep(Duration::from_millis(100))   // 2 occurrences
Duration::from_millis(200)                        // 2 occurrences (signal timeout)
Duration::from_secs(10)                           // 1 occurrence (QUIC connect timeout)
Duration::from_secs(2)                            // 2 occurrences (heartbeat interval)
const DRED_POLL_INTERVAL: u32 = 25;              // defined twice (Android + desktop)
vec![0i16; 1920]                                  // 2 occurrences (should use FRAME_SAMPLES_40MS)

Fix

// Top of engine.rs
const CAPTURE_POLL_MS: u64 = 5;
const RECV_TIMEOUT_MS: u64 = 100;
const SIGNAL_TIMEOUT_MS: u64 = 200;
const CONNECT_TIMEOUT_SECS: u64 = 10;
const HEARTBEAT_INTERVAL_SECS: u64 = 2;
const DRED_POLL_INTERVAL: u32 = 25;
// Already exists: const FRAME_SAMPLES_40MS: usize = 1920;

Effort: 15 minutes. Impact: Low but prevents bugs from inconsistent values.


Medium: CLI Arg Parsing in Relay main.rs

parse_args() in main.rs is 154 lines of manual while i < args.len() parsing with match args[i].as_str(). Every new flag adds 5-10 lines of boilerplate.

Suggested Fix

Replace with clap derive macro:

#[derive(clap::Parser)]
struct RelayArgs {
    #[arg(long, default_value = "0.0.0.0:4433")]
    listen: SocketAddr,
    #[arg(long)]
    remote: Option<String>,
    #[arg(long)]
    auth_url: Option<String>,
    // ...
}

Effort: 1 hour. Impact: Medium — cleaner, auto-generates --help, validates types at parse time.


Medium: Error Handling Inconsistency

13 instances of .ok() silently swallowing errors on transport.close() across the relay. Federation signal forwarding has inconsistent error handling — some paths log, some don't.

Fix

// Helper at top of main.rs/federation.rs:
async fn close_transport(t: &impl MediaTransport, context: &str) {
    if let Err(e) = t.close().await {
        tracing::debug!(context, error = %e, "transport close error (non-fatal)");
    }
}

Effort: 30 minutes. Impact: Better observability when debugging connection issues.


Low: Unused Crypto Fields

crates/wzp-crypto/src/handshake.rs has x25519_static_secret and x25519_static_public fields marked #[allow(dead_code)]. These are derived from the identity seed but never used in any handshake flow.

Decision needed: Are these intended for a future feature (static key federation auth)? If not, remove. If yes, document the intended use.

Effort: 5 minutes to remove, or 10 minutes to document.


Low: 20 Unsafe Functions Missing Safety Docs

crates/wzp-native/src/lib.rs has 20 unsafe functions (extern "C" FFI bridge to Oboe) without /// # Safety documentation. Clippy flags all of them.

Effort: 30 minutes. Impact: Clippy clean, better documentation for contributors.


Low: quality.rs vs dred_tuner.rs Overlap

Both files deal with network quality → codec decisions, but they're complementary:

  • quality.rs: discrete tier classification (Good/Degraded/Catastrophic) → codec profile
  • dred_tuner.rs: continuous DRED frame mapping from loss/RTT/jitter

No consolidation needed, but add cross-references:

// In dred_tuner.rs:
//! See also: `quality.rs` for discrete tier classification that drives
//! codec switching. DredTuner operates within a tier, adjusting DRED
//! parameters continuously.

// In quality.rs:
//! See also: `dred_tuner.rs` for continuous DRED tuning within a tier.

Effort: 5 minutes.


Summary: Priority Matrix

# Refactor Effort Impact Risk
1 Extract shared engine.rs helpers 2-3h High Low
2 Federation tests 1 day Critical None
3 Federation clone-before-send 30 min Medium Low
4 Extract magic numbers to constants 15 min Low None
5 Error handling helpers 30 min Medium None
6 CLI parser → clap 1h Medium Low
7 SignalMessage sub-enums 1 day High High (wire compat)
8 Safety docs on unsafe fns 30 min Low None
9 Remove/document dead crypto fields 5 min Low None
10 Cross-reference quality.rs ↔ dred_tuner.rs 5 min Low None

Recommended order: 4 → 3 → 5 → 1 → 2 → 6 → 8 → 9 → 10 → 7

Items 4, 3, 5 are quick wins (under 1 hour total). Item 1 is the biggest maintainability win. Item 2 is the most important for safety. Item 7 should wait for a protocol version bump.