featherChat/DESIGN.md
Siavash Sameni 1e2a83402d DESIGN.md: DNS-based key transparency, resolve remaining questions
- Key transparency via DNS TXT records with self-signatures
  (server can't MITM because it can't forge user's signature)
- Per-device ratchet sessions (Signal model), cross-device sync via seed
- LoRa deferred to later phases, not Phase 1
- Sealed sender before onion routing
- Phase 3 updated to include key transparency alongside federation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 20:55:15 +04:00


Warzone Messenger — Design Document

Problem Statement

Current chat.py has fundamental issues:

  • Identity is username-based — users pick names, no cryptographic identity. Device change = lost keys = broken encryption.
  • No forward secrecy — same ECDH key pair forever. Compromise one key, read all past messages.
  • No offline delivery — if you're not connected, messages are lost.
  • Single server — no federation, no redundancy. Server goes down = no chat.
  • Python — too slow for real deployment, hard to distribute as a single binary.

Design Goals

  1. Identity = key pair — your identity IS your private key seed. No usernames, no accounts.
  2. Signal-grade encryption — Double Ratchet for 1:1, Sender Keys for groups.
  3. Federation via DNS — servers discover each other using TXT records, like Matrix but simpler.
  4. Warzone-grade delivery — assumes intermittent connectivity, supports mule-based physical delivery.
  5. Single binary — Rust, compiles to one static binary per platform.
  6. ntfy for push — leverage existing notification infrastructure, no custom push servers.

1. Identity Model

Seed-Based Identity

seed (32 bytes) → Ed25519 signing keypair + X25519 encryption keypair
  • Seed: 32 random bytes, displayed as BIP39 mnemonic (24 words) for human backup
  • Signing key (Ed25519): signs messages, proves identity
  • Encryption key (X25519): used in key exchange for E2E encryption
  • Fingerprint: first 16 hex characters (8 bytes) of SHA-256(public_signing_key), displayed in colon-separated groups, e.g. a3f8:c912:44be:7d01
  • Display name: optional, self-assigned, NOT part of identity. Can change anytime.
  • Address: fingerprint@server.example.com — the full address includes the home server
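
The fingerprint display above can be sketched in Rust. `format_fingerprint` is an illustrative name, and the SHA-256 step is omitted (the input slice stands in for the digest):

```rust
// Sketch: colon-grouped fingerprint display from a digest.
// `digest` stands in for SHA-256(public_signing_key); hashing is omitted here.
fn format_fingerprint(digest: &[u8]) -> String {
    // First 8 bytes = 16 hex characters of the digest.
    let hex: Vec<String> = digest.iter().take(8).map(|b| format!("{b:02x}")).collect();
    // Group into 2-byte (4-hex-char) chunks separated by colons.
    hex.chunks(2)
        .map(|pair| pair.concat())
        .collect::<Vec<_>>()
        .join(":")
}

fn main() {
    let digest = [0xa3, 0xf8, 0xc9, 0x12, 0x44, 0xbe, 0x7d, 0x01, 0xff, 0xee];
    println!("{}", format_fingerprint(&digest)); // a3f8:c912:44be:7d01
}
```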

Key Storage

  • CLI: ~/.warzone/identity.seed, encrypted with a passphrase (Argon2 + ChaCha20)
  • Browser: IndexedDB (non-extractable CryptoKey), with a seed backup prompt on first run
  • Mobile (PWA): same as browser; seed shown as a QR code for device transfer

Device Transfer

User scans QR code containing the seed (or types 24 words). New device derives the same keypair. Identity is portable — not tied to any server or device.

Trust Model

Trust on first use (TOFU) by default. Users can verify fingerprints out-of-band (QR code scan, read aloud). Verified contacts are pinned — if their key changes, you get a hard warning (not just a dismissible notice).

Challenge: Username Squatting

Since identity is a fingerprint, not a name, there's no squatting. Display names are untrusted labels. The UI should prominently show fingerprints for new contacts and warn on display name collisions.

Challenge: Key Loss

Seed IS identity. Lose seed = lose identity forever. Mitigations:

  • BIP39 mnemonic backup (write on paper)
  • Optional seed escrow to a trusted contact (Shamir's Secret Sharing, 2-of-3)
  • Server never has seed — cannot help recover

2. Encryption Protocol

1:1 Messages — Signal Double Ratchet

Initial key exchange:
  Alice's X25519 identity key + ephemeral key
  Bob's X25519 identity key + signed pre-key + one-time pre-key
  → X3DH → shared secret
  → initialize Double Ratchet

Every message:
  Symmetric ratchet step → new message key
  Every N messages or on reply: DH ratchet step → new chain key
  → forward secrecy: compromising the current key does not expose past messages
  → future secrecy: after a compromise, security recovers at the next DH ratchet step
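
The symmetric ratchet step can be sketched as a pure state transition. This is a toy: a real implementation derives keys with HMAC-SHA256/HKDF, while here std's non-cryptographic SipHash stands in as a placeholder one-way function:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Placeholder one-way step. NOT cryptographic: a real chain uses HMAC/HKDF.
fn kdf(input: u64, label: u8) -> u64 {
    let mut h = DefaultHasher::new();
    (input, label).hash(&mut h);
    h.finish()
}

// One symmetric ratchet step: the chain key advances, a fresh message key is
// derived, and the old chain key is discarded (forward secrecy).
fn ratchet_step(chain_key: u64) -> (u64, u64) {
    let next_chain_key = kdf(chain_key, 0x01);
    let message_key = kdf(chain_key, 0x02);
    (next_chain_key, message_key)
}

fn main() {
    let mut ck = 42u64; // toy-sized stand-in for the X3DH shared secret
    for n in 0..3 {
        let (next, mk) = ratchet_step(ck);
        println!("msg {n}: message_key={mk:016x}");
        ck = next; // old chain key is gone; past message keys are unrecoverable
    }
}
```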

Pre-key bundles: each user uploads signed pre-keys to their home server. Other users fetch these to initiate sessions even when the recipient is offline.

Group Messages — Sender Keys (Signal protocol for groups)

Each member generates a Sender Key (random symmetric key + chain)
Sender Key distributed to all group members via 1:1 encrypted channels
Messages encrypted with sender's Sender Key
Sender Key ratchets forward on each message

Member join: new Sender Keys distributed to everyone
Member leave: all members rotate their Sender Keys

Challenge: Group Forward Secrecy

Sender Keys don't provide per-message forward secrecy like Double Ratchet. Trade-off: performance (one encrypt per message vs one per member). Acceptable for groups < 100. For larger groups, consider MLS (Message Layer Security, RFC 9420).

Challenge: Multi-Device

Each device has its own X25519 keypair derived from the seed. Sender encrypts to all of recipient's known devices. Device list is signed by the identity key and published to the home server.

Message Format

{
  "v": 1,
  "from": "a3f8c912...",           // sender fingerprint
  "to": "b7d1e845...",             // recipient fingerprint (or group ID)
  "ts": 1711443600,
  "type": "msg",                   // msg | file | key_exchange | receipt
  "session": "...",                 // Double Ratchet session ID
  "ratchet": { "dh": "...", "n": 42, "pn": 41 },
  "ciphertext": "base64...",       // encrypted payload
  "sig": "base64..."               // Ed25519 signature over everything above
}
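
As a sketch, the JSON above might map onto a Rust struct like this (illustrative type names; serde derives and base64 handling omitted):

```rust
// Sketch of the wire format as plain Rust types. Field names match the JSON
// keys; `type` needs the raw identifier `r#type` in Rust.
#[derive(Debug, Clone, PartialEq)]
struct RatchetHeader {
    dh: Vec<u8>, // current DH ratchet public key
    n: u32,      // message number in the current chain
    pn: u32,     // number of messages in the previous chain
}

#[derive(Debug, Clone, PartialEq)]
struct WireMessage {
    v: u32,
    from: String,         // sender fingerprint (hex)
    to: String,           // recipient fingerprint or group ID
    ts: u64,              // unix timestamp
    r#type: String,       // "msg" | "file" | "key_exchange" | "receipt"
    session: String,      // Double Ratchet session ID
    ratchet: RatchetHeader,
    ciphertext: Vec<u8>,  // encrypted payload
    sig: Vec<u8>,         // Ed25519 signature over everything above
}

fn main() {
    let msg = WireMessage {
        v: 1,
        from: "a3f8c912".into(),
        to: "b7d1e845".into(),
        ts: 1711443600,
        r#type: "msg".into(),
        session: String::new(),
        ratchet: RatchetHeader { dh: vec![], n: 42, pn: 41 },
        ciphertext: vec![],
        sig: vec![],
    };
    println!("{} -> {} ({})", msg.from, msg.to, msg.r#type);
}
```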

3. Federation via DNS

Server Discovery

Each server operates under a domain. Federation is discovered via DNS TXT records:

_warzone._tcp.example.com  TXT  "v=wz1; endpoint=https://wz.example.com; pubkey=base64..."

Fields:

  • v=wz1 — protocol version
  • endpoint — HTTPS URL of the server API
  • pubkey — server's Ed25519 public key (for server-to-server auth)
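
Parsing the `k=v; k=v` TXT payload is straightforward; a minimal sketch (`parse_txt_record` is an illustrative name):

```rust
use std::collections::HashMap;

// Sketch: split a "k=v; k=v" TXT record payload into fields.
// Unparseable fragments are silently skipped.
fn parse_txt_record(txt: &str) -> HashMap<String, String> {
    txt.split(';')
        .filter_map(|field| {
            let (k, v) = field.trim().split_once('=')?;
            Some((k.trim().to_string(), v.trim().to_string()))
        })
        .collect()
}

fn main() {
    let rec = "v=wz1; endpoint=https://wz.example.com; pubkey=AAAA";
    let fields = parse_txt_record(rec);
    println!("endpoint = {}", fields["endpoint"]);
}
```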

Server-to-Server Protocol

Server A wants to deliver message to user@example.com:
1. DNS lookup: _warzone._tcp.example.com → TXT record → endpoint URL
2. TLS connection to endpoint
3. Mutual authentication: both servers verify each other's pubkey
4. Deliver encrypted message blob (server cannot read it)
5. Recipient's server queues it for delivery

Home Server Responsibilities

  • Store and forward messages for its users
  • Host pre-key bundles for key exchange
  • Serve user's device list and public keys
  • Relay messages to federated servers
  • Queue messages for offline users

Challenge: DNS Availability in Warzone

DNS may be unreliable or censored. Mitigations:

  • Hard-coded peer list: users can manually add server endpoints
  • DNS-over-HTTPS (DoH): bypass local DNS censorship
  • mDNS/local discovery: for LAN-only operation when internet is down
  • Gossip protocol: servers share their known peer list with each other

Challenge: Server Impersonation

Server pubkey in DNS TXT record prevents impersonation. But DNS itself could be hijacked. Mitigations:

  • DNSSEC validation
  • TOFU for server keys (pin on first contact)
  • Certificate transparency-style log for server key changes

Key Transparency via DNS

Use DNS as a decentralized public key registry — prevents the server from performing MITM attacks on key exchange.

Each user publishes their public key as a DNS TXT record, signed by their own identity key:

_wz._id.<hashed-fingerprint>.example.com  TXT  "v=wz1; fp=a3f8c912...; pubkey=base64...; sig=base64..."
  • fp — full fingerprint
  • pubkey — user's Ed25519 public identity key
  • sig — self-signature over (fp + pubkey), proving the DNS record was authored by the key holder

Verification flow:

Bob wants Alice's key:
1. Ask server → server says Alice's key is X
2. DNS lookup → _wz._id.<hash(alice-fp)>.example.com → key is X, self-signed
3. Match? → trusted
4. Mismatch? → HARD WARNING: server may be performing MITM
5. No DNS record? → fall back to TOFU (trust on first use)
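
The flow reduces to a three-way comparison. A sketch, assuming the DNS record's self-signature has already been verified (signature checking omitted; names are illustrative):

```rust
// Sketch of the cross-check outcome between the server's claimed key and the
// (self-signature-verified) key found in DNS.
#[derive(Debug, PartialEq)]
enum KeyVerdict {
    Trusted,       // server and DNS agree
    MitmSuspected, // mismatch: hard warning
    Tofu,          // no DNS record: trust on first use
}

fn cross_check(server_key: &[u8], dns_key: Option<&[u8]>) -> KeyVerdict {
    match dns_key {
        Some(k) if k == server_key => KeyVerdict::Trusted,
        Some(_) => KeyVerdict::MitmSuspected,
        None => KeyVerdict::Tofu,
    }
}

fn main() {
    assert_eq!(cross_check(b"X", Some(&b"X"[..])), KeyVerdict::Trusted);
    assert_eq!(cross_check(b"X", Some(&b"Y"[..])), KeyVerdict::MitmSuspected);
    assert_eq!(cross_check(b"X", None), KeyVerdict::Tofu);
    println!("ok");
}
```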

Why DNS works here:

  • Decentralized: no single party controls all DNS (especially across domains)
  • The self-signature in the TXT record means even the DNS admin can't forge it without Alice's private key
  • DNSSEC adds transport integrity (record wasn't tampered in transit)
  • Records are globally cached and replicated — hard to silently change

Privacy concern: public DNS means anyone can enumerate users by scanning TXT records. Mitigation: subdomain is SHA-256(fingerprint)[:16] — you must already know the fingerprint to look up the record. This makes enumeration impractical.

Scalability: one TXT record per user. Fine for thousands of users per domain. Large orgs can shard across subdomains.

When users don't control DNS: in an org deployment, the admin controls the DNS zone. The admin could collude with the server to MITM. But the self-signature still protects — the admin would need the user's private key to forge a valid record. The only attack is deleting the record (forcing TOFU fallback), not replacing it.

Integration with federation: the same DNS zone handles both server discovery (_warzone._tcp) and user key transparency (_wz._id). One DNS zone, two purposes.


4. Warzone Delivery — Mule Protocol

Problem

In conflict zones, internet connectivity is intermittent, unreliable, or surveilled. Servers may be offline for hours or days. Traditional store-and-forward fails when both servers are rarely online simultaneously.

Mule Role

A mule is a device (phone, laptop, USB drive) that physically carries messages between disconnected networks.

Network A (offline)          Mule                    Network B (online)
     |                        |                           |
     |<-- connect to A -------|                           |
     |-- queued messages ---->|                           |
     |<-- delivery receipts --|                           |
     |                        |                           |
     |                        |--- travel physically ---->|
     |                        |                           |
     |                        |-- connect to B ---------->|
     |                        |-- deliver messages ------>|
     |                        |<-- queued for A ---------|
     |                        |<-- receipts for A --------|
     |                        |                           |
     |                        |<-- travel back -----------|
     |                        |                           |
     |<-- connect to A -------|                           |
     |<-- deliver from B -----|                           |
     |-- receipts for B ----->|                           |

Mule Protocol

  1. Authentication: mule presents its identity (keypair). Server checks if mule is authorized (allowlist or signed authorization token from an admin).

  2. Pickup: mule sends PICKUP request. Server gives all queued outbound messages (encrypted blobs — mule cannot read them). Server marks messages as "in transit by mule X".

  3. Delivery: mule connects to destination server, sends DELIVER with the blobs. Destination server validates signatures and queues for recipients.

  4. Receipts: destination server gives mule delivery receipts (signed). Mule carries these back.

  5. Receipt enforcement: on next pickup, mule MUST present receipts for previous delivery. If no receipts → server refuses new pickup (prevents mule from dropping messages silently). Exception: mule can present a signed "delivery failed" report explaining why.

  6. Deduplication: messages have unique IDs. Servers deduplicate on receive. Multiple mules can carry the same messages — first delivery wins, duplicates are silently dropped.

Queue Management

Message states on origin server:
  QUEUED       → waiting for delivery (direct or mule)
  IN_TRANSIT   → picked up by mule X at time T
  DELIVERED    → receipt received
  EXPIRED      → TTL exceeded, dropped

TTL: configurable per-message (default 7 days)
Retry: if IN_TRANSIT for > 24h with no receipt, re-queue
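
The state machine above, with TTL expiry and the 24h re-queue rule, might look like this (illustrative types; times are unix seconds):

```rust
// Sketch: per-message delivery state with TTL expiry and mule-timeout re-queue.
const TTL_SECS: u64 = 7 * 24 * 3600;         // default TTL: 7 days
const TRANSIT_TIMEOUT_SECS: u64 = 24 * 3600; // re-queue after 24h with no receipt

#[derive(Debug, Clone, PartialEq)]
enum State {
    Queued,
    InTransit { mule: String, picked_up_at: u64 },
    Delivered,
    Expired,
}

fn tick(state: State, queued_at: u64, now: u64) -> State {
    // Terminal states never change.
    if matches!(state, State::Delivered | State::Expired) {
        return state;
    }
    // TTL exceeded: drop regardless of transit status.
    if now.saturating_sub(queued_at) > TTL_SECS {
        return State::Expired;
    }
    // In transit too long with no receipt: re-queue for another attempt.
    if let State::InTransit { picked_up_at, .. } = state {
        if now.saturating_sub(picked_up_at) > TRANSIT_TIMEOUT_SECS {
            return State::Queued;
        }
    }
    state
}

fn main() {
    let s = State::InTransit { mule: "mule-1".into(), picked_up_at: 0 };
    println!("{:?}", tick(s, 0, 25 * 3600)); // Queued (re-queued after 25h)
}
```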

Challenge: Mule Compromise

Mule has encrypted blobs. Even if captured:

  • Messages are E2E encrypted — mule sees only ciphertext
  • Metadata (sender/recipient fingerprints) is visible to the mule. Mitigation: wrap in an outer encryption layer keyed to the destination server's public key, so the mule only sees "blob for server X"
  • Mule authorization can be revoked by server admin

Challenge: Message Ordering

Mule delivery is inherently out-of-order. Messages carry sequence numbers per conversation. Clients reorder on display. Ratchet protocol handles out-of-order decryption natively (message keys are cached for skipped messages).
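
Dedup-then-reorder can be sketched with `(id, seq)` pairs standing in for full messages (illustrative helper name):

```rust
use std::collections::HashSet;

// Sketch: servers deduplicate by unique message ID (first delivery wins);
// clients reorder by per-conversation sequence number for display.
fn dedup_and_order(incoming: &[(&str, u64)]) -> Vec<(String, u64)> {
    let mut seen = HashSet::new();
    let mut out: Vec<(String, u64)> = incoming
        .iter()
        .filter(|(id, _)| seen.insert(id.to_string())) // drop duplicates silently
        .map(|(id, seq)| (id.to_string(), *seq))
        .collect();
    out.sort_by_key(|(_, seq)| *seq); // display order
    out
}

fn main() {
    // Two mules carried overlapping, out-of-order batches:
    let batch = [("m3", 3), ("m1", 1), ("m3", 3), ("m2", 2)];
    for (id, seq) in dedup_and_order(&batch) {
        println!("{seq}: {id}");
    }
}
```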

Challenge: Mule Bandwidth

Mule may carry gigabytes of messages on a USB drive, or megabytes on a phone over Bluetooth. Protocol must support:

  • Priority levels (urgent messages first)
  • Compression (zstd on the blob bundle)
  • Partial sync (resume interrupted transfer)
  • Size limits per mule (server respects mule's capacity declaration)
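
Priority-aware pickup under a capacity declaration might be sketched as follows (illustrative tuple encoding; lower priority value means more urgent):

```rust
// Sketch: select queued messages for a mule by priority, respecting the
// mule's declared capacity. Each entry is (priority, size_bytes, id).
fn select_for_pickup(
    mut queue: Vec<(u8, u64, &str)>,
    capacity_bytes: u64,
) -> Vec<&str> {
    queue.sort_by_key(|&(prio, _, _)| prio); // urgent messages first
    let mut used = 0;
    let mut picked = Vec::new();
    for (_, size, id) in queue {
        if used + size <= capacity_bytes {
            used += size;
            picked.push(id);
        } // oversized entries are skipped, smaller later ones may still fit
    }
    picked
}

fn main() {
    let queue = vec![
        (2, 500, "bulk-file"),
        (0, 100, "urgent-text"),
        (1, 300, "normal-text"),
    ];
    println!("{:?}", select_for_pickup(queue, 450));
}
```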

5. Notification via ntfy

Why ntfy

  • Self-hostable, simple HTTP API
  • Works on Android (no Google Play Services needed), iOS, desktop
  • Supports E2E encryption (ntfy's own, separate from ours)
  • Can be deployed alongside our server

Integration

User registers ntfy topic: fingerprint-derived, e.g. wz_a3f8c912
Server pushes notification on new message:
  POST https://ntfy.example.com/wz_a3f8c912
  Body: "New message from <display_name>"
  (NO message content — that's E2E encrypted)

User subscribes to their topic in ntfy app. Gets push notification, opens warzone client to read the actual message.
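
A sketch of the topic derivation and notification payload (illustrative helper names; the actual HTTP POST is omitted):

```rust
// Sketch: ntfy topic is derived from the first 8 hex chars of the fingerprint,
// matching the wz_a3f8c912 example above. Panics on fingerprints < 8 chars.
fn ntfy_topic(fingerprint_hex: &str) -> String {
    format!("wz_{}", &fingerprint_hex[..8])
}

// Builds the (url, body) pair for the push; the body carries NO message
// content, since that stays E2E encrypted.
fn notification(ntfy_base: &str, fingerprint_hex: &str, display_name: &str) -> (String, String) {
    let topic = ntfy_topic(fingerprint_hex);
    (
        format!("{ntfy_base}/{topic}"),
        format!("New message from {display_name}"),
    )
}

fn main() {
    let (url, body) = notification("https://ntfy.example.com", "a3f8c91244be7d01", "alice");
    println!("POST {url}\n{body}");
}
```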

Challenge: ntfy Metadata

ntfy server sees that a notification was sent to a topic (i.e., someone messaged this user). Mitigation: self-host ntfy on the same server. Or accept the metadata leak as a trade-off for push notification functionality.


6. Rust Rewrite

Why Rust

  • Single static binary (no runtime dependencies)
  • Memory safety without GC
  • Excellent async I/O (tokio)
  • Cross-compile to Linux ARM (warzone routers, phones), Windows, macOS
  • WebAssembly target for browser client

Crate Selection

  • Async runtime: tokio
  • HTTP server: axum
  • Crypto: ring or libsignal-protocol
  • Signal protocol: libsignal-protocol-rust (Signal's official Rust impl)
  • Ed25519: ed25519-dalek
  • X25519: x25519-dalek
  • Argon2: argon2
  • DNS: trust-dns-resolver
  • TLS: rustls
  • Database: sled (embedded) or SQLite via rusqlite
  • Serialization: serde + bincode (wire) + serde_json (API)
  • BIP39: bip39
  • Compression: zstd
  • CLI: clap
  • TUI: ratatui

Binary Targets

warzone-server    # server binary
warzone           # CLI client + TUI
warzone-mule      # mule binary (subset of client)
warzone.wasm      # browser client (via wasm-pack)

Architecture

┌─────────────────────────────────────────┐
│                  Server                  │
├──────────┬──────────┬───────────────────┤
│ HTTP API │ WS relay │ Federation (S2S)  │
├──────────┴──────────┴───────────────────┤
│           Message Router                 │
├──────────┬──────────┬───────────────────┤
│ Queue DB │ Key Store│ User Registry     │
│ (sled)   │ (sled)   │ (sled)            │
└──────────┴──────────┴───────────────────┘

┌─────────────────────────────────────────┐
│                 Client                   │
├──────────┬──────────┬───────────────────┤
│   TUI    │ Web(WASM)│    CLI            │
├──────────┴──────────┴───────────────────┤
│          Protocol Layer                  │
├──────────┬──────────┬───────────────────┤
│ Signal   │ Identity │ Storage           │
│ Protocol │ Manager  │ (sled/IndexedDB)  │
└──────────┴──────────┴───────────────────┘

7. Roadmap

Phase 0 — Current (Python prototype)

  • Basic chat server + web UI
  • WebSocket SSH tunnel
  • Nginx reverse proxy + ArvanCloud deployment
  • ECDH + AES-GCM DMs (basic, no forward secrecy)
  • Group chat with passwords
  • PWA support
  • File upload

Phase 1 — Identity & Crypto Foundation (Rust)

  • Rust project scaffold (cargo workspace: server, client, protocol, mule)
  • Seed-based identity (Ed25519 + X25519 from 32-byte seed)
  • BIP39 mnemonic generation and recovery
  • Seed encryption at rest (Argon2 + ChaCha20-Poly1305)
  • Pre-key bundle generation and storage
  • X3DH key exchange implementation
  • Double Ratchet for 1:1 messaging
  • Message signing (Ed25519)
  • Basic server: accept connections, store-and-forward

Phase 2 — Core Messaging

  • 1:1 E2E encrypted messaging (full Signal protocol)
  • Offline message queuing with TTL
  • Multi-device support (device list signed by identity key)
  • Sender Keys for group encryption
  • Group management (create, invite, leave, kick)
  • File transfer (chunked, encrypted)
  • Delivery receipts (sent, delivered, read)
  • Message ordering and deduplication
  • TUI client (ratatui)
  • Web client (WASM)

Phase 3 — Federation & Key Transparency

  • DNS TXT record format specification (server discovery + user key transparency)
  • User self-signed key publication to DNS (_wz._id.<hash>.domain)
  • Key verification: server response vs DNS record cross-check
  • Server-to-server mutual TLS authentication
  • Federated message delivery
  • Server key pinning (TOFU)
  • Federated pre-key bundle fetching
  • Gossip-based peer discovery fallback
  • Hard-coded peer list for DNS-free operation

Phase 4 — Warzone Delivery

  • Mule protocol specification
  • Mule authentication and authorization
  • Message pickup with capacity declaration
  • Delivery receipt enforcement
  • Outer encryption layer (hide metadata from mule)
  • Bundle compression (zstd)
  • Partial sync / resume
  • Priority levels
  • Mule CLI binary

Phase 5 — Transport Fallbacks

  • Bluetooth mule transfer (phone-to-phone, phone-to-server)
  • LoRa transport layer (low bandwidth, long range, last-resort)
  • mDNS / LAN discovery for local mesh
  • Wi-Fi Direct for nearby device sync

Phase 6 — Metadata Protection (Optional Layer)

  • Onion routing between federated servers (opt-in, requires good connectivity)
  • Padding and traffic shaping to resist traffic analysis
  • Sealed sender (server doesn't know who sent a message, only who receives)

Phase 7 — Polish & Operations

  • ntfy integration for push notifications
  • DoH for DNS resolution in censored networks
  • Admin CLI (manage users, mules, federation)
  • Monitoring and health checks
  • Rate limiting and abuse prevention
  • Audit logging
  • Server-at-rest encryption (optional, manual key on boot)
  • Cross-compilation CI (Linux x86/ARM, macOS, Windows, WASM)
  • Documentation and protocol specification

Resolved Decisions

  • MLS vs Sender Keys: Sender Keys (groups ≤ 50). Simpler, sufficient for target group sizes. MLS revisited if needed later.
  • Metadata protection: optional onion layer. Opt-in when connectivity allows; not a blocker for core functionality. Sealed sender as a lighter alternative first.
  • Deniability: deniability by default (Signal model). Safety-first for users in hostile environments. Non-repudiation can be added as opt-in per-conversation later.
  • Server-at-rest encryption: optional, not in core. Nice to have. Implement as a flag: --encrypt-db with passphrase on boot. E2E already protects message content.
  • Incentives / tokenization: not in scope. This is an organizational/military tool; participants cooperate by mandate, not incentive.
  • Transport fallbacks: Bluetooth + LoRa. Mules use Bluetooth for device-to-device; LoRa for extreme last-resort (low bandwidth but km range). LoRa is not Phase 1.
  • Key transparency: DNS TXT records. Each user self-signs their pubkey in a DNS TXT record; the server can't MITM because it can't forge the self-signature. Integrated with federation DNS in Phase 3.
  • Multi-device ratchet: per-device sessions. Each device maintains its own Double Ratchet session with each contact (Signal's approach). Cross-device history sync via an encrypted device-to-device channel using the shared seed.

Open Questions

  1. LoRa investment: LoRa has ~250 byte payload limit. Emergency-only (receipts + short text) or a real feature? Not Phase 1 either way — but the compact binary format should be designed early so the message layer doesn't assume JSON everywhere.

  2. Legal: E2E encryption with mule delivery designed for warzone use has significant legal implications in many jurisdictions. Needs legal review before deployment.

  3. Sealed sender vs onion routing: Sealed sender (Signal's approach — server knows recipient but not sender) is lighter than full onion routing. Plan: sealed sender first as the default metadata protection, full onion routing as Phase 6 upgrade for when connectivity allows it.


8. Transport Layer Architecture

The protocol is transport-agnostic. The message format is the same regardless of how it travels. Transports are pluggable:

┌─────────────────────────────────────────────┐
│              Application Layer               │
│   (Signal Protocol, Message Routing, Queue)  │
├─────────────────────────────────────────────┤
│              Transport Abstraction            │
│   trait Transport {                           │
│     async fn send(&self, endpoint, blob);     │
│     async fn recv(&self) -> blob;             │
│   }                                           │
├──────┬──────┬──────┬──────┬────────┬────────┤
│ HTTPS│  WS  │ BT   │ LoRa │Wi-Fi   │ USB    │
│      │      │      │      │Direct  │ (file) │
└──────┴──────┴──────┴──────┴────────┴────────┘
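
A synchronous stand-in for the Transport trait in the diagram, with an in-memory loopback implementation useful for tests (a real implementation would use async fns on tokio, as sketched above):

```rust
use std::collections::VecDeque;
use std::io;

// Synchronous sketch of the pluggable transport abstraction. Every transport
// carries the same opaque encrypted blobs; only the medium differs.
trait Transport {
    fn send(&mut self, endpoint: &str, blob: Vec<u8>) -> io::Result<()>;
    fn recv(&mut self) -> Option<Vec<u8>>;
}

// In-memory loopback: send() queues a blob, recv() pops it.
struct Loopback {
    queue: VecDeque<Vec<u8>>,
}

impl Transport for Loopback {
    fn send(&mut self, _endpoint: &str, blob: Vec<u8>) -> io::Result<()> {
        self.queue.push_back(blob);
        Ok(())
    }
    fn recv(&mut self) -> Option<Vec<u8>> {
        self.queue.pop_front()
    }
}

fn main() {
    let mut t = Loopback { queue: VecDeque::new() };
    t.send("wz.example.com", b"ciphertext".to_vec()).unwrap();
    println!("{:?}", t.recv()); // the same blob comes back out
}
```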

HTTPS (Primary)

  • Standard server-to-server and client-to-server
  • TLS 1.3, certificate pinning
  • HTTP/2 for multiplexing
  • SSE or WebSocket for real-time push

Bluetooth (Mule + Nearby)

  • BLE for discovery, Bluetooth Classic for data transfer
  • Range: ~10-100m
  • Bandwidth: ~2 Mbps practical
  • Use case: mule syncs with server/client in proximity
  • Protocol: RFCOMM socket, same message blobs as HTTPS

LoRa (Last Resort)

  • Range: 2-15 km (line of sight), 1-5 km urban
  • Bandwidth: 0.3-50 kbps
  • Payload: ~250 bytes per packet
  • Use case: delivery receipts, short text, presence beacons
  • NOT for files or media — text-only, heavily compressed
  • Message format: compact binary (not JSON)
LoRa packet (250 bytes max):
  [1] version
  [1] type (text=0x01, receipt=0x02, beacon=0x03)
  [8] sender fingerprint (truncated)
  [8] recipient fingerprint (truncated)
  [4] timestamp (unix, 32-bit)
  [12] nonce
  [~216] ciphertext (AES-GCM, ~200 chars of text)
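
Packing the layout above can be sketched with plain byte concatenation (`pack_lora` is an illustrative name; encryption happens before packing):

```rust
// Sketch: pack the 250-byte LoRa layout into a byte buffer.
// Field widths follow the listing; ciphertext is assumed already encrypted.
const LORA_MAX: usize = 250;

fn pack_lora(
    msg_type: u8,       // text=0x01, receipt=0x02, beacon=0x03
    sender: [u8; 8],    // truncated sender fingerprint
    recipient: [u8; 8], // truncated recipient fingerprint
    timestamp: u32,     // unix time, 32-bit
    nonce: [u8; 12],
    ciphertext: &[u8],  // AES-GCM output, tag included
) -> Option<Vec<u8>> {
    let mut pkt = Vec::with_capacity(LORA_MAX);
    pkt.push(1); // version
    pkt.push(msg_type);
    pkt.extend_from_slice(&sender);
    pkt.extend_from_slice(&recipient);
    pkt.extend_from_slice(&timestamp.to_be_bytes());
    pkt.extend_from_slice(&nonce);
    pkt.extend_from_slice(ciphertext);
    (pkt.len() <= LORA_MAX).then_some(pkt) // reject oversized payloads
}

fn main() {
    let pkt = pack_lora(0x01, [0xa3; 8], [0xb7; 8], 1711443600, [0; 12], b"hi").unwrap();
    println!("packet is {} bytes (34-byte header + 2-byte ciphertext)", pkt.len());
}
```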

Wi-Fi Direct (Nearby Mesh)

  • Range: ~200m
  • Bandwidth: ~250 Mbps
  • Use case: local group sync when no internet, ad-hoc mesh
  • Devices form a local group, sync message queues peer-to-peer

USB / File (Sneakernet)

  • Export message queue to encrypted file
  • Copy to USB drive
  • Import on destination machine
  • Same as mule protocol but manual file transfer
  • warzone export --since 24h --to /mnt/usb/messages.wz
  • warzone import /mnt/usb/messages.wz