manawenuz/wz-phone

Siavash Sameni 1329abbeba

docs(prd): rewrite E2E PRD — prior approach broke multi-client voice

Document why wrapping QuinnTransport with EncryptingTransport using the
pairwise client↔relay key cannot work for an SFU (recipient has a different
key than sender). Propose two valid paths: MLS group keys (true E2E) or
hop-by-hop relay re-encryption (relay-trusted). Recommend hop-by-hop first.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-05-25 17:44:57 +04:00


PRD: E2E Media Encryption (rewrite)

Status: proposed (supersedes prior version)
Resolves: Real end-to-end media encryption between call participants.
Replaces: The prior version of this PRD described wrapping QuinnTransport in EncryptingTransport using the pairwise client↔relay session. That approach was implemented (commit 52a6f5e) and broke voice between any two clients because the relay does not decrypt+re-encrypt (see "Why the prior fix failed" below). The wrapping was reverted in commit e8cab25.

Why the prior fix failed

wzp_client::handshake::perform_handshake performs ECDH between the client and the relay. Each client in a room ends up with a different pairwise session key (key_A for client A, key_B for client B, etc.).

The relay is an SFU — it forwards MediaPacket bytes between participants in a room without inspecting their payloads. The relay does not run a decrypt-then-encrypt step keyed per-recipient.

Wrapping QuinnTransport in EncryptingTransport therefore produced:

Client A:  plaintext --[encrypt key_A]--> ciphertext --> Relay
Relay:     forwards ciphertext (bytes) --> Client B
Client B:  ciphertext --[decrypt key_B]--> garbage --> silent audio

Result: every recipient saw decryption failures, audio went silent.
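The mismatch can be reproduced in a few lines. The sketch below is illustrative only: toy_xor is a hypothetical stand-in for the real session cipher (it is not real cryptography), and forward_without_reencrypt is an invented name modelling the three steps in the diagram. A real AEAD would report an authentication failure where this toy cipher returns garbage bytes.

```rust
/// Toy XOR "cipher" standing in for the real per-client session.
/// NOT real cryptography; it only models "the output depends on the key".
fn toy_xor(key: &[u8; 32], data: &[u8]) -> Vec<u8> {
    data.iter()
        .enumerate()
        .map(|(i, b)| b ^ key[i % key.len()])
        .collect()
}

/// Models the broken path: the sender encrypts with its own relay key,
/// the relay forwards the bytes untouched, and the recipient decrypts
/// with a different relay key.
fn forward_without_reencrypt(
    key_sender: &[u8; 32],
    key_recipient: &[u8; 32],
    plaintext: &[u8],
) -> Vec<u8> {
    let ciphertext = toy_xor(key_sender, plaintext); // Client A
    let forwarded = ciphertext; // Relay (SFU): bytes pass through unchanged
    toy_xor(key_recipient, &forwarded) // Client B
}
```

With two different keys the recipient's output differs from the sender's plaintext; the round trip only works when both ends share the same key, which is exactly what the pairwise client↔relay handshake does not provide.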

This is not a bug in EncryptingTransport — the wrapper does exactly what it claims. The bug was thinking the pairwise client-relay session was usable for participant-to-participant media. It isn't.

Goals

A future implementation must satisfy:

Two clients in a room can exchange media that the other client can decrypt.
The relay cannot decrypt any media payload (true E2E), OR alternatively, the relay can decrypt+re-encrypt per recipient (hop-by-hop, sometimes called SFU-trusted).
Joining or leaving the room mid-call rotates keys, so departed members can't decrypt subsequent traffic and new members can't decrypt traffic sent before they joined (forward secrecy on membership change).
Compatible with the existing MediaPacket wire format (header in plaintext, payload encrypted).
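The wire-format goal (header in plaintext, payload encrypted) can be sketched as follows. PacketSketch and seal_payload are hypothetical names; the real MediaPacket fields are not shown in this PRD. Passing the header to the sealing function is an assumption, made so that a real AEAD could authenticate the header as associated data without encrypting it.

```rust
/// Hypothetical packet layout; the real MediaPacket fields may differ.
#[derive(Debug, Clone, PartialEq)]
struct PacketSketch {
    header: Vec<u8>,  // stays in plaintext so the relay can route on it
    payload: Vec<u8>, // the only part that gets encrypted
}

/// Applies `seal` to the payload only and copies the header unchanged.
/// `seal` receives (header, payload) so an AEAD implementation could
/// bind the header as associated data.
fn seal_payload<F>(packet: &PacketSketch, seal: F) -> PacketSketch
where
    F: Fn(&[u8], &[u8]) -> Vec<u8>,
{
    PacketSketch {
        header: packet.header.clone(),
        payload: seal(&packet.header, &packet.payload),
    }
}
```

Either approach below can plug its own cipher into the seal closure, which keeps the relay's routing logic independent of who holds the payload key.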

Two valid approaches

Approach A — MLS group keys (true E2E)

Use the MLS protocol (e.g. via the openmls crate) to derive a shared group key that all room members possess and the relay does not.

Relay acts as a delivery service for MLS Handshake messages (Welcome, Commit, Proposal) but never sees the group secret.
Every media packet is AEAD-sealed with the current group epoch key.
Group rekey is triggered by:
- Member join/leave (forward secrecy on membership)
- Periodic (every N seconds or N packets) for post-compromise security
Each room maintains its own MLS group; the relay just stores opaque mls_blob payloads in SignalMessage::MlsHandshake.
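The rekey triggers listed above could be expressed as a small policy check. This is a sketch under stated assumptions: RekeyPolicy, EpochState and RekeyReason are hypothetical names, the thresholds stand in for the unspecified N, and the actual rekey would be an MLS Commit issued through the MLS library rather than anything shown here.

```rust
use std::time::Duration;

/// Why a new group epoch is needed (hypothetical enum).
#[derive(Debug, PartialEq)]
enum RekeyReason {
    MembershipChange, // member joined or left
    Periodic,         // age or packet budget exhausted
}

/// Limits for a single epoch key; the values are deployment choices.
struct RekeyPolicy {
    max_age: Duration,
    max_packets: u64,
}

/// What the client tracks about the current epoch.
struct EpochState {
    age: Duration,
    packets_sent: u64,
    membership_changed: bool,
}

impl RekeyPolicy {
    /// Returns the reason to rekey, or None if the current epoch is still fine.
    /// Membership changes take priority over the periodic limits.
    fn should_rekey(&self, state: &EpochState) -> Option<RekeyReason> {
        if state.membership_changed {
            return Some(RekeyReason::MembershipChange);
        }
        if state.age >= self.max_age || state.packets_sent >= self.max_packets {
            return Some(RekeyReason::Periodic);
        }
        None
    }
}
```

Checking membership first means a join or leave never waits for the periodic timer.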

Pros: Real E2E. Relay compromise does not leak media.
Cons: Significant complexity (MLS state machine per room, persistent ratchet trees, key schedule). Adds openmls dependency (~30 KLOC). Federation across relays is harder.

Approach B — Hop-by-hop re-encryption at the relay

The relay already derives a CryptoSession per connected client but currently discards it (see _crypto_session in crates/wzp-relay/src/main.rs:1817). Approach B keeps that session and uses it on forward:

Relay.recv_media(from A):    decrypt with key_A → plaintext
Relay.send_media(to B, C, D): for each recipient X, encrypt with key_X

This is comparable to Matrix without Megolm: every hop is encrypted, but the relay briefly sees plaintext in between.
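A minimal sketch of that forwarding step, assuming a trait shaped roughly like the relay's CryptoSession. The method signatures, the PeerId alias, the String error type and the ToySession XOR cipher are all assumptions made for illustration; they are not taken from the wzp-relay code, and ToySession is not real cryptography.

```rust
use std::collections::HashMap;

/// Hypothetical shape of the session trait; the real signatures may differ.
trait CryptoSession {
    fn encrypt(&mut self, plaintext: &[u8]) -> Vec<u8>;
    fn decrypt(&mut self, ciphertext: &[u8]) -> Result<Vec<u8>, String>;
}

/// Toy single-byte XOR session for the demo. NOT real cryptography.
struct ToySession {
    key: u8,
}

impl CryptoSession for ToySession {
    fn encrypt(&mut self, plaintext: &[u8]) -> Vec<u8> {
        plaintext.iter().map(|b| b ^ self.key).collect()
    }
    fn decrypt(&mut self, ciphertext: &[u8]) -> Result<Vec<u8>, String> {
        Ok(ciphertext.iter().map(|b| b ^ self.key).collect())
    }
}

type PeerId = u32;

/// Decrypts once with the sender's session, then re-encrypts the plaintext
/// separately for every other member of the room.
fn forward_media(
    sessions: &mut HashMap<PeerId, Box<dyn CryptoSession>>,
    sender: PeerId,
    ciphertext: &[u8],
) -> Result<Vec<(PeerId, Vec<u8>)>, String> {
    let plaintext = sessions
        .get_mut(&sender)
        .ok_or_else(|| format!("unknown sender {}", sender))?
        .decrypt(ciphertext)?;

    let mut out = Vec::new();
    for (peer, session) in sessions.iter_mut() {
        if *peer == sender {
            continue; // never echo media back to its sender
        }
        out.push((*peer, session.encrypt(&plaintext)));
    }
    out.sort_by_key(|(peer, _)| *peer); // deterministic order for the demo
    Ok(out)
}
```

The payload is decrypted once per packet and encrypted once per recipient, so the crypto cost on the relay grows linearly with room size.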

Pros: Reuses existing per-client ChaChaSession. Implementation is ~100 lines in the relay's room forwarding loop. Federation works the same way (each relay-relay hop has its own session).
Cons: Relay sees plaintext. A compromised relay can record and decrypt all media. This is not E2E, but it is strictly stronger than the current state (plaintext-over-QUIC-TLS exposes media to anyone with a TLS-terminating proxy on the relay).

Recommendation

Ship Approach B first. It's a small, well-scoped change that adds application-layer encryption on every client-relay hop without requiring an MLS rewrite. It does not remove the relay's access to plaintext in RAM, so layer Approach A on top when the threat model demands relay-untrusted operation.

Out of scope for this PRD

Federation gossip key exchange (separate PRD)
SAS (Short Authentication String) verification UX (separate PRD)
Rekey on session compromise (handled by the chosen approach's group/pairwise rekey)

Acceptance criteria (Approach B, first iteration)

Relay's room forwarding loop (crates/wzp-relay/src/room.rs:354 and :1353) calls sender_session.decrypt() then recipient_session.encrypt() per recipient before send_media.
Each RoomMember holds its Box<dyn CryptoSession> (currently discarded as _crypto_session in main.rs:1817).
Client-side: re-add the EncryptingTransport wrapping in desktop/src-tauri/src/engine.rs (the two sites reverted in e8cab25).
Integration test: two-client mock room exchanges media; verify each recipient gets the sender's plaintext back after the relay double-hop.
Existing 825 tests still pass.

Verification

cargo test -p wzp-relay --test multi_client_relay_path should pass with two simulated clients sending audio in both directions and decrypting each other's frames.

Files to touch

crates/wzp-relay/src/main.rs — keep crypto_session per-client (drop the _ prefix)
crates/wzp-relay/src/room.rs — add decrypt/re-encrypt to forward path
crates/wzp-relay/src/session_mgr.rs — store sessions keyed by peer
desktop/src-tauri/src/engine.rs — restore EncryptingTransport wrapping (~2 sites)
crates/wzp-relay/tests/multi_client_relay_path.rs — new integration test

Risk / rollback

If multi-client tests fail in CI, the change is contained to the relay forwarding loop and one engine.rs edit — straightforward revert.
