Document why wrapping QuinnTransport with EncryptingTransport using the pairwise client↔relay key cannot work for an SFU (recipient has a different key than sender). Propose two valid paths: MLS group keys (true E2E) or hop-by-hop relay re-encryption (relay-trusted). Recommend hop-by-hop first. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
PRD: E2E Media Encryption (rewrite)
Status: proposed (supersedes prior version)
Resolves: Real end-to-end media encryption between call participants.
Replaces: The prior version of this PRD described wrapping QuinnTransport in EncryptingTransport using the pairwise client↔relay session. That approach was implemented (commit 52a6f5e) and broke voice between any two clients because the relay does not decrypt+re-encrypt; see "Why the prior fix failed" below. The wrapping was reverted in commit e8cab25.
Why the prior fix failed
wzp_client::handshake::perform_handshake performs ECDH between the client and the relay. Each client in a room ends up with a different pairwise session key (key_A for client A, key_B for client B, etc.).
The relay is an SFU — it forwards MediaPacket bytes between participants in a room without inspecting their payloads. The relay does not run a decrypt-then-encrypt step keyed per-recipient.
Wrapping QuinnTransport in EncryptingTransport therefore produced:
Client A: plaintext --[encrypt key_A]--> ciphertext --> Relay
Relay: forwards ciphertext (bytes) --> Client B
Client B: ciphertext --[decrypt key_B]--> garbage --> silent audio
Result: every recipient saw decryption failures, audio went silent.
This is not a bug in EncryptingTransport — the wrapper does exactly what it claims. The bug was thinking the pairwise client-relay session was usable for participant-to-participant media. It isn't.
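The failure mode can be reproduced in a few lines. The sketch below is illustrative only: `ToySession`, `keystream`, `keyed_tag`, and `relay_forward` are hypothetical names, and the XOR-plus-checksum scheme is a stand-in for a real AEAD, not the project's ChaChaSession. It shows only that bytes sealed under key_A and forwarded untouched cannot be opened under key_B.

```rust
/// Toy stand-in for a pairwise session. NOT real cryptography: it only
/// mimics the "wrong key => authentication failure" behaviour of an AEAD.
#[derive(Clone, Copy)]
struct ToySession {
    key: u64,
}

#[derive(Debug, PartialEq)]
struct DecryptError;

/// Deterministic keystream derived from the key (xorshift64, illustrative only).
fn keystream(key: u64) -> impl Iterator<Item = u8> {
    let mut state = key | 1; // xorshift needs a non-zero state
    std::iter::repeat_with(move || {
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        state as u8
    })
}

/// Keyed FNV-1a-style checksum standing in for an AEAD tag.
fn keyed_tag(key: u64, data: &[u8]) -> [u8; 8] {
    let mut h = key ^ 0xcbf2_9ce4_8422_2325;
    for &b in data {
        h = (h ^ u64::from(b)).wrapping_mul(0x0000_0100_0000_01b3);
    }
    h.to_be_bytes()
}

impl ToySession {
    /// XOR the plaintext with the keystream, then append the 8-byte tag.
    fn encrypt(&self, plaintext: &[u8]) -> Vec<u8> {
        let mut out: Vec<u8> = plaintext
            .iter()
            .zip(keystream(self.key))
            .map(|(p, k)| p ^ k)
            .collect();
        let tag = keyed_tag(self.key, &out);
        out.extend_from_slice(&tag);
        out
    }

    /// Verify the tag under this session's key before removing the keystream.
    fn decrypt(&self, sealed: &[u8]) -> Result<Vec<u8>, DecryptError> {
        if sealed.len() < 8 {
            return Err(DecryptError);
        }
        let (body, tag) = sealed.split_at(sealed.len() - 8);
        let expected = keyed_tag(self.key, body);
        if tag != &expected[..] {
            return Err(DecryptError);
        }
        Ok(body
            .iter()
            .zip(keystream(self.key))
            .map(|(c, k)| c ^ k)
            .collect())
    }
}

/// SFU behaviour as described above: forward the bytes without touching them.
fn relay_forward(packet: &[u8]) -> Vec<u8> {
    packet.to_vec()
}
```

With two sessions holding different keys, `client_b.decrypt(&relay_forward(&client_a.encrypt(frame)))` returns `Err(DecryptError)`, while `client_a.decrypt` on the same bytes succeeds.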
Goals
A future implementation must satisfy:
- Two clients in a room can exchange media that the other client can decrypt.
- The relay cannot decrypt any media payload (true E2E), OR alternatively, the relay can decrypt+re-encrypt per recipient (hop-by-hop, sometimes called SFU-trusted).
- Joining and leaving the room mid-call rotates keys so departed members can't decrypt subsequent traffic (forward secrecy on membership change).
- Compatible with the existing MediaPacket wire format (header in plaintext, payload encrypted).
Two valid approaches
Approach A — MLS group keys (true E2E)
Use the MLS protocol (e.g. via the openmls crate) to derive a shared group key that all room members possess and the relay does not.
- Relay acts as a delivery service for MLS Handshake messages (Welcome, Commit, Proposal) but never sees the group secret.
- Every media packet is AEAD-sealed with the current group epoch key.
- Group rekey is triggered by:
  - Member join/leave (forward secrecy on membership)
  - Periodic (every N seconds or N packets) for post-compromise security
- Each room maintains its own MLS group; the relay just stores opaque mls_blob payloads in SignalMessage::MlsHandshake.
Pros: real E2E. Relay compromise does not leak media.
Cons: Significant complexity (MLS state machine per room, persistent ratchet trees, key schedule). Adds openmls dependency (~30 KLOC). Federation across relays is harder.
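The rekey triggers listed for Approach A can be expressed as a small policy check. This is a minimal sketch under stated assumptions: `RekeyPolicy`, `RekeyReason`, and the threshold fields are hypothetical names, not part of openmls or the existing codebase, and the actual values of N are left to the implementation. It decides only when a new epoch is due; producing the new epoch (the MLS Commit) is out of its scope.

```rust
use std::time::Duration;

/// Why a new group epoch is due (hypothetical enum for illustration).
#[derive(Debug, PartialEq)]
enum RekeyReason {
    MembershipChange,
    PacketBudget,
    TimeBudget,
}

/// Periodic-rekey thresholds; the concrete values are assumptions.
struct RekeyPolicy {
    max_packets: u64,
    max_age: Duration,
}

impl RekeyPolicy {
    /// Returns the first trigger that applies, or `None` if the current
    /// epoch key may still be used. Membership changes take priority.
    fn check(
        &self,
        membership_changed: bool,
        packets_in_epoch: u64,
        epoch_age: Duration,
    ) -> Option<RekeyReason> {
        if membership_changed {
            Some(RekeyReason::MembershipChange)
        } else if packets_in_epoch >= self.max_packets {
            Some(RekeyReason::PacketBudget)
        } else if epoch_age >= self.max_age {
            Some(RekeyReason::TimeBudget)
        } else {
            None
        }
    }
}
```

A sender would call `check` before sealing each packet and start a rekey whenever it returns `Some(_)`.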
Approach B — Hop-by-hop re-encryption at the relay
The relay holds a CryptoSession per connected client (which it already does — see _crypto_session discarded in crates/wzp-relay/src/main.rs:1817). On forward:
Relay.recv_media(from A): decrypt with key_A → plaintext
Relay.send_media(to B, C, D): for each recipient X, encrypt with key_X
This is the same model as Matrix Megolm-without-Megolm — encrypted hop-by-hop but the relay sees plaintext briefly in between.
Pros: Reuses existing per-client ChaChaSession. Implementation is ~100 lines in the relay's room forwarding loop. Federation works the same way (each relay-relay hop has its own session).
Cons: Relay sees plaintext. A compromised relay can record and decrypt all media. This is not E2E — but it is strictly stronger than the current state (plaintext-over-QUIC-TLS exposes media to anyone with a TLS-terminating proxy on the relay).
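The forward path for Approach B can be sketched as follows. Everything here is an assumption made for illustration: the `CryptoSession` trait signatures, the `RoomMember` fields, and `forward_media` are hypothetical and may not match the real types in wzp-relay, and `XorSession` is a toy cipher standing in for the real per-client session. The sketch covers only the payload; the plaintext MediaPacket header is omitted.

```rust
#[derive(Debug, PartialEq)]
struct CryptoError;

/// Hypothetical shape of the per-client session trait; the real
/// `CryptoSession` signatures may differ.
trait CryptoSession {
    fn encrypt(&mut self, plaintext: &[u8]) -> Vec<u8>;
    fn decrypt(&mut self, sealed: &[u8]) -> Result<Vec<u8>, CryptoError>;
}

/// Toy session (single-byte XOR plus a one-byte keyed check). NOT real crypto.
struct XorSession {
    key: u8,
}

impl XorSession {
    /// Keyed checksum over the ciphertext body, standing in for an AEAD tag.
    fn check_byte(&self, body: &[u8]) -> u8 {
        body.iter().fold(self.key, |acc, b| acc.wrapping_add(*b))
    }
}

impl CryptoSession for XorSession {
    fn encrypt(&mut self, plaintext: &[u8]) -> Vec<u8> {
        let mut out: Vec<u8> = plaintext.iter().map(|b| b ^ self.key).collect();
        let check = self.check_byte(&out);
        out.push(check);
        out
    }

    fn decrypt(&mut self, sealed: &[u8]) -> Result<Vec<u8>, CryptoError> {
        let (check, body) = sealed.split_last().ok_or(CryptoError)?;
        if *check != self.check_byte(body) {
            return Err(CryptoError);
        }
        Ok(body.iter().map(|b| b ^ self.key).collect())
    }
}

/// Relay-side view of a participant: an id plus that client's pairwise session.
struct RoomMember {
    id: u32,
    session: Box<dyn CryptoSession>,
}

/// Decrypt once with the sender's session, then re-encrypt separately for
/// every other member. Returns `(recipient_id, sealed_payload)` pairs.
fn forward_media(
    members: &mut [RoomMember],
    sender_id: u32,
    sealed: &[u8],
) -> Result<Vec<(u32, Vec<u8>)>, CryptoError> {
    let sender = members
        .iter_mut()
        .find(|m| m.id == sender_id)
        .ok_or(CryptoError)?;
    // The relay holds plaintext from here until the loop below finishes.
    let plaintext = sender.session.decrypt(sealed)?;

    Ok(members
        .iter_mut()
        .filter(|m| m.id != sender_id)
        .map(|m| (m.id, m.session.encrypt(&plaintext)))
        .collect())
}
```

Decrypting once and encrypting per recipient keeps the cost at one decrypt plus one encrypt per recipient for each packet, and makes the window in which the relay holds plaintext explicit.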
Recommendation
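Ship Approach B first. It's a small, well-scoped change that encrypts media payloads on every client↔relay hop without requiring an MLS rewrite. It does not close the relay-operator-can-see-plaintext-in-RAM gap, because the relay still decrypts each packet before re-encrypting it. Layer Approach A on top when the threat model demands relay-untrusted operation.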
Out of scope for this PRD
- Federation gossip key exchange (separate PRD)
- SAS (Short Authentication String) verification UX (separate PRD)
- Rekey on session compromise (handled by the chosen approach's group/pairwise rekey)
Acceptance criteria (Approach B, first iteration)
- Relay's room forwarding loop (crates/wzp-relay/src/room.rs:354 and :1353) calls sender_session.decrypt() then recipient_session.encrypt() per recipient before send_media.
- Each RoomMember holds its Box<dyn CryptoSession> (currently discarded as _crypto_session in main.rs:1817).
- Client-side: re-add the EncryptingTransport wrapping in desktop/src-tauri/src/engine.rs (the two sites reverted in e8cab25).
- Integration test: two-client mock room exchanges media; verify each recipient gets the sender's plaintext back after the relay double-hop.
- Existing 825 tests still pass.
Verification
cargo test -p wzp-relay --test multi_client_relay_path should pass with two simulated clients sending audio in both directions and decrypting each other's frames.
Files to touch
- crates/wzp-relay/src/main.rs: keep crypto_session per-client (drop the _ prefix)
- crates/wzp-relay/src/room.rs: add decrypt/re-encrypt to forward path
- crates/wzp-relay/src/session_mgr.rs: store sessions keyed by peer
- desktop/src-tauri/src/engine.rs: restore EncryptingTransport wrapping (~2 sites)
- crates/wzp-relay/tests/multi_client_relay_path.rs: new integration test
Risk / rollback
If multi-client tests fail in CI, the change is contained to the relay forwarding loop and one engine.rs edit — straightforward revert.