wz-phone/docs/PRD/PRD-e2e-media-encryption.md
Siavash Sameni 1329abbeba
docs(prd): rewrite E2E PRD — prior approach broke multi-client voice
Document why wrapping QuinnTransport with EncryptingTransport using the
pairwise client↔relay key cannot work for an SFU (recipient has a different
key than sender). Propose two valid paths: MLS group keys (true E2E) or
hop-by-hop relay re-encryption (relay-trusted). Recommend hop-by-hop first.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-25 17:44:57 +04:00


# PRD: E2E Media Encryption (rewrite)
> **Status:** proposed (supersedes prior version)
> **Resolves:** Real end-to-end media encryption between call participants.
> **Replaces:** The prior version of this PRD described wrapping `QuinnTransport` in `EncryptingTransport` using the pairwise client↔relay session. That approach was implemented (commit `52a6f5e`) and **broke voice between any two clients** because the relay does not decrypt+re-encrypt — see "Why the prior fix failed" below. The wrapping was reverted in commit `e8cab25`.
---
## Why the prior fix failed
`wzp_client::handshake::perform_handshake` performs ECDH **between the client and the relay**. Each client in a room ends up with a **different** pairwise session key (key_A for client A, key_B for client B, etc.).
The relay is an SFU — it forwards `MediaPacket` bytes between participants in a room without inspecting their payloads. The relay does not run a decrypt-then-encrypt step keyed per-recipient.
Wrapping `QuinnTransport` in `EncryptingTransport` therefore produced:
```
Client A: plaintext --[encrypt key_A]--> ciphertext --> Relay
Relay: forwards ciphertext (bytes) --> Client B
Client B: ciphertext --[decrypt key_B]--> garbage --> silent audio
```
Result: every recipient saw decryption failures, audio went silent.
This is **not a bug in `EncryptingTransport`**: the wrapper does exactly what it claims. The design error was assuming that the pairwise client↔relay session key could protect participant-to-participant media. It cannot, because no two participants share a key.
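The failure can be reproduced in miniature. The sketch below is an illustration only: `ToySession`, `OpenError`, and `sfu_forward` are hypothetical names, and the XOR-plus-hash "cipher" is a deliberately insecure stand-in for the real session type, used so the example needs nothing beyond the standard library.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Returned when the tag does not verify (wrong key or corrupted packet).
#[derive(Debug, PartialEq, Eq)]
pub struct OpenError;

/// Toy stand-in for a pairwise client-relay session. NOT secure:
/// it only mimics the "wrong key fails to open" behaviour of an AEAD.
pub struct ToySession {
    key: [u8; 32],
}

impl ToySession {
    pub fn new(key: [u8; 32]) -> Self {
        Self { key }
    }

    /// Keyed 8-byte tag over the ciphertext (non-cryptographic hash).
    fn tag(&self, ciphertext: &[u8]) -> [u8; 8] {
        let mut h = DefaultHasher::new();
        h.write(&self.key);
        h.write(ciphertext);
        h.finish().to_be_bytes()
    }

    /// XOR with the repeating key; applying it twice restores the input.
    fn xor_keystream(&self, data: &[u8]) -> Vec<u8> {
        data.iter()
            .enumerate()
            .map(|(i, b)| b ^ self.key[i % self.key.len()])
            .collect()
    }

    pub fn seal(&self, plaintext: &[u8]) -> Vec<u8> {
        let mut out = self.xor_keystream(plaintext);
        let tag = self.tag(&out);
        out.extend_from_slice(&tag);
        out
    }

    pub fn open(&self, sealed: &[u8]) -> Result<Vec<u8>, OpenError> {
        if sealed.len() < 8 {
            return Err(OpenError);
        }
        let (ciphertext, tag) = sealed.split_at(sealed.len() - 8);
        if self.tag(ciphertext).as_slice() != tag {
            return Err(OpenError);
        }
        Ok(self.xor_keystream(ciphertext))
    }
}

/// An SFU forwards the bytes untouched: no decrypt, no re-encrypt.
pub fn sfu_forward(packet: &[u8]) -> Vec<u8> {
    packet.to_vec()
}
```

Sealing a frame with client A's session, passing it through `sfu_forward`, and opening it with client B's session returns `Err(OpenError)`, while A's own session still opens it. That is the situation every recipient was in after commit `52a6f5e`.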
## Goals
A future implementation must satisfy:
- Two clients in a room can exchange media that the **other client** can decrypt.
- Either the **relay cannot decrypt** any media payload (true E2E), or the relay decrypts and re-encrypts per recipient (hop-by-hop, sometimes called SFU-trusted).
- Joining and leaving the room mid-call rotates keys so departed members can't decrypt subsequent traffic (forward secrecy on membership change).
- Compatible with the existing `MediaPacket` wire format (header in plaintext, payload encrypted).
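The last goal (plaintext header, encrypted payload) can be sketched as a helper that leaves the header bytes alone and transforms only the payload. This is a minimal sketch under assumptions: `HEADER_LEN` and `seal_payload` are hypothetical, and the real `MediaPacket` header length and layout are not specified in this PRD.

```rust
/// Hypothetical header length; the real `MediaPacket` layout may differ.
pub const HEADER_LEN: usize = 4;

/// Copies the header through unchanged and replaces the payload with
/// whatever `encrypt` returns. Returns `None` if the packet is too
/// short to contain a full header.
pub fn seal_payload(
    packet: &[u8],
    encrypt: impl Fn(&[u8]) -> Vec<u8>,
) -> Option<Vec<u8>> {
    if packet.len() < HEADER_LEN {
        return None;
    }
    let (header, payload) = packet.split_at(HEADER_LEN);
    let mut out = header.to_vec();
    out.extend_from_slice(&encrypt(payload));
    Some(out)
}
```

With an AEAD cipher, the header would normally also be passed as associated data so that it is authenticated even though it is not encrypted. Whether the existing session API supports that is not stated here.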
## Two valid approaches
### Approach A — MLS group keys (true E2E)
Use the [MLS protocol](https://datatracker.ietf.org/doc/rfc9420/) (e.g. via the `openmls` crate) to derive a shared **group key** that all room members possess and the relay does not.
- Relay acts as a **delivery service** for MLS Handshake messages (`Welcome`, `Commit`, `Proposal`) but never sees the group secret.
- Every media packet is AEAD-sealed with the current group epoch key.
- Group rekey is triggered by:
  - Member join/leave (forward secrecy on membership)
  - Periodic (every N seconds or N packets) for post-compromise security
- Each room maintains its own MLS group; the relay just stores opaque `mls_blob` payloads in `SignalMessage::MlsHandshake`.
**Pros:** real E2E. Relay compromise does not leak media.
**Cons:** Significant complexity (MLS state machine per room, persistent ratchet trees, key schedule). Adds `openmls` dependency (~30 KLOC). Federation across relays is harder.
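The rekey triggers listed for Approach A can be modelled independently of MLS itself. The sketch below only tracks *when* an epoch should advance; `EpochTracker` and `RekeyReason` are hypothetical names, the packet budget stands in for "every N packets", and in a real MLS group the epoch changes only when a `Commit` is processed.

```rust
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum RekeyReason {
    MemberJoined,
    MemberLeft,
    PacketBudget,
}

/// Tracks the current epoch number and how many packets were sealed in it.
pub struct EpochTracker {
    epoch: u64,
    packets_in_epoch: u64,
    max_packets_per_epoch: u64,
}

impl EpochTracker {
    pub fn new(max_packets_per_epoch: u64) -> Self {
        Self { epoch: 0, packets_in_epoch: 0, max_packets_per_epoch }
    }

    pub fn epoch(&self) -> u64 {
        self.epoch
    }

    /// Any membership change forces a new epoch immediately.
    pub fn on_membership_change(&mut self, joined: bool) -> RekeyReason {
        self.advance();
        if joined { RekeyReason::MemberJoined } else { RekeyReason::MemberLeft }
    }

    /// Counts a sent packet; rolls the epoch once the budget is used up.
    pub fn on_packet_sent(&mut self) -> Option<RekeyReason> {
        self.packets_in_epoch += 1;
        if self.packets_in_epoch >= self.max_packets_per_epoch {
            self.advance();
            Some(RekeyReason::PacketBudget)
        } else {
            None
        }
    }

    fn advance(&mut self) {
        self.epoch += 1;
        self.packets_in_epoch = 0;
    }
}
```

A time-based trigger ("every N seconds") would follow the same shape, with a timestamp in place of the packet counter.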
### Approach B — Hop-by-hop re-encryption at the relay
The relay keeps a `CryptoSession` per connected client. It already derives one during the handshake but currently discards it (see `_crypto_session` in `crates/wzp-relay/src/main.rs:1817`). On forward:
```
Relay.recv_media(from A): decrypt with key_A → plaintext
Relay.send_media(to B, C, D): for each recipient X, encrypt with key_X
```
This is hop-by-hop encryption: every client↔relay leg is encrypted under its own pairwise key, but the relay holds the plaintext briefly between decrypting and re-encrypting.
**Pros:** Reuses existing per-client `ChaChaSession`. Implementation is ~100 lines in the relay's room forwarding loop. Federation works the same way (each relay-relay hop has its own session).
**Cons:** Relay sees plaintext. A compromised relay can record and decrypt all media. This is **not E2E** — but it is strictly stronger than the current state (plaintext-over-QUIC-TLS exposes media to anyone with a TLS-terminating proxy on the relay).
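The decrypt-then-re-encrypt step for Approach B might look roughly like the following. This is a sketch under assumptions, not the relay's actual code: the `CryptoSession` method signatures, `RoomMember` fields, `ForwardError`, and `forward_media` are all hypothetical, and `XorSession` is an insecure toy that exists only to make the example self-contained.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum ForwardError {
    UnknownSender,
    Decrypt,
}

/// Assumed shape of the per-client session; the real trait may differ.
pub trait CryptoSession {
    fn encrypt(&mut self, plaintext: &[u8]) -> Vec<u8>;
    fn decrypt(&mut self, ciphertext: &[u8]) -> Result<Vec<u8>, ForwardError>;
}

/// Toy session, NOT secure: XORs with one byte and appends the key
/// as a fake "tag" so a key mismatch is detectable.
pub struct XorSession {
    pub key: u8,
}

impl CryptoSession for XorSession {
    fn encrypt(&mut self, plaintext: &[u8]) -> Vec<u8> {
        let mut out: Vec<u8> = plaintext.iter().map(|b| b ^ self.key).collect();
        out.push(self.key);
        out
    }

    fn decrypt(&mut self, ciphertext: &[u8]) -> Result<Vec<u8>, ForwardError> {
        match ciphertext.split_last() {
            Some((tag, body)) if *tag == self.key => {
                Ok(body.iter().map(|b| b ^ self.key).collect())
            }
            _ => Err(ForwardError::Decrypt),
        }
    }
}

pub struct RoomMember {
    pub id: u32,
    pub session: Box<dyn CryptoSession>,
}

/// Decrypts once with the sender's session, then re-encrypts the
/// plaintext separately for every other member of the room.
pub fn forward_media(
    members: &mut [RoomMember],
    sender_id: u32,
    packet: &[u8],
) -> Result<Vec<(u32, Vec<u8>)>, ForwardError> {
    let sender = members
        .iter_mut()
        .find(|m| m.id == sender_id)
        .ok_or(ForwardError::UnknownSender)?;
    let plaintext = sender.session.decrypt(packet)?;

    Ok(members
        .iter_mut()
        .filter(|m| m.id != sender_id)
        .map(|m| (m.id, m.session.encrypt(&plaintext)))
        .collect())
}
```

The `plaintext` variable is exactly the window in which a compromised relay could read media, which is why this approach is relay-trusted rather than E2E.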
## Recommendation
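**Ship Approach B first.** It is a small, well-scoped change that adds an application-layer encryption hop on top of QUIC/TLS, so media is no longer exposed to a TLS-terminating proxy on the relay, and it does not require an MLS rewrite. It does not remove the relay's own access to plaintext in RAM. Layer Approach A on top when the threat model demands relay-untrusted operation.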
## Out of scope for this PRD
- Federation gossip key exchange (separate PRD)
- SAS (Short Authentication String) verification UX (separate PRD)
- Rekey on session compromise (handled by the chosen approach's group/pairwise rekey)
## Acceptance criteria (Approach B, first iteration)
1. Relay's room forwarding loop (`crates/wzp-relay/src/room.rs:354` and `:1353`) calls `sender_session.decrypt()` then `recipient_session.encrypt()` per recipient before `send_media`.
2. Each `RoomMember` holds its `Box<dyn CryptoSession>` (currently discarded as `_crypto_session` in `main.rs:1817`).
3. Client-side: re-add the `EncryptingTransport` wrapping in `desktop/src-tauri/src/engine.rs` (the two sites reverted in `e8cab25`).
4. Integration test: two-client mock room exchanges media; verify each recipient gets the sender's plaintext back after the relay double-hop.
5. Existing 825 tests still pass.
## Verification
`cargo test -p wzp-relay --test multi_client_relay_path` should pass with two simulated clients sending audio in both directions and decrypting each other's frames.
## Files to touch
- `crates/wzp-relay/src/main.rs` — keep `crypto_session` per-client (drop the `_` prefix)
- `crates/wzp-relay/src/room.rs` — add decrypt/re-encrypt to forward path
- `crates/wzp-relay/src/session_mgr.rs` — store sessions keyed by peer
- `desktop/src-tauri/src/engine.rs` — restore `EncryptingTransport` wrapping (~2 sites)
- `crates/wzp-relay/tests/multi_client_relay_path.rs` — new integration test
## Risk / rollback
If multi-client tests fail in CI, the change is contained to the relay forwarding loop and the `engine.rs` edit, so reverting it is straightforward.