refactor: federation uses persistent WS instead of HTTP polling

- Server-to-server communication via WebSocket at /v1/federation/ws
- Auth sent as the first WS frame (shared secret); presence and forwards then share the same connection
- Auto-reconnect every 3s on disconnect; full presence list pushed immediately on each (re)connect
- Replaces HTTP REST polling (no more 5s intervals, lower latency)
- Removed dead HMAC helpers (auth is now direct secret comparison over WS)
- Simplified ARCHITECTURE.md mermaid diagrams for Gitea rendering
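The federation frames and the first-frame auth rule above can be sketched in std-only Rust as follows. This is a minimal sketch, not the server's code. The names `FedFrame`, `to_json`, `accept_first_frame` and `ct_eq` are hypothetical. It assumes the shared secret is sent as-is and compared directly, as this commit message states. One ARCHITECTURE.md example in the diff still shows `"secret":"HMAC(shared_secret)"`, so the exact secret encoding is an assumption here.

```rust
/// Hypothetical in-memory form of the three frames documented for /v1/federation/ws.
#[derive(Debug, PartialEq)]
pub enum FedFrame {
    Auth { secret: String },
    Presence { fingerprints: Vec<String> },
    Forward { to: String, message_b64: String },
}

/// Encode a string as a JSON string literal (quotes, backslashes, control chars escaped).
fn json_str(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

impl FedFrame {
    /// Serialize to the JSON shapes shown in ARCHITECTURE.md.
    pub fn to_json(&self) -> String {
        match self {
            FedFrame::Auth { secret } => {
                format!(r#"{{"type":"auth","secret":{}}}"#, json_str(secret))
            }
            FedFrame::Presence { fingerprints } => {
                let list: Vec<String> = fingerprints.iter().map(|f| json_str(f)).collect();
                format!(r#"{{"type":"presence","fingerprints":[{}]}}"#, list.join(","))
            }
            FedFrame::Forward { to, message_b64 } => format!(
                r#"{{"type":"forward","to":{},"message":{}}}"#,
                json_str(to),
                json_str(message_b64)
            ),
        }
    }
}

/// Compare without an early exit on the first differing byte,
/// so timing does not reveal how long a matching prefix was (length still leaks).
fn ct_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

/// The first frame on a new federation socket must be `auth` with the shared secret.
/// Any other frame (or a wrong secret) means the connection should be closed.
pub fn accept_first_frame(first: &FedFrame, shared_secret: &str) -> bool {
    match first {
        FedFrame::Auth { secret } => ct_eq(secret.as_bytes(), shared_secret.as_bytes()),
        _ => false,
    }
}
```

Rejecting anything other than an `auth` first frame keeps the rule simple: an unauthenticated peer never gets to inject presence or forwards.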

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Siavash Sameni
2026-03-28 16:56:13 +04:00
parent 3e0889e5dc
commit f8eaf30bb4
7 changed files with 364 additions and 306 deletions

@@ -9,51 +9,14 @@
```mermaid
graph TB
subgraph Clients
CLI["CLI Client<br/>(warzone)"]
TUI["TUI Client<br/>(ratatui)"]
WEB["Web Client<br/>(WASM)"]
end
subgraph Protocol["warzone-protocol (shared library)"]
ID["Identity<br/>Ed25519 + X25519"]
X3DH["X3DH<br/>Key Agreement"]
DR["Double Ratchet<br/>Forward Secrecy"]
SK["Sender Keys<br/>Group Encryption"]
WIRE["WireMessage<br/>8 variants"]
end
subgraph ServerA["warzone-server (Alpha)"]
API_A["REST API<br/>(axum)"]
WS_A["WebSocket<br/>Relay"]
AUTH_A["Auth<br/>Middleware"]
CALLS_A["Call State<br/>Manager"]
FED_A["Federation<br/>Module"]
DB_A["sled DB<br/>7 trees"]
end
subgraph ServerB["warzone-server (Bravo)"]
API_B["REST API"]
WS_B["WebSocket Relay"]
FED_B["Federation Module"]
DB_B["sled DB"]
end
subgraph WZP["WarzonePhone"]
RELAY["WZP Relay<br/>(QUIC SFU)"]
BRIDGE["Web Bridge<br/>(audio)"]
end
CLI --> Protocol
TUI --> Protocol
WEB --> Protocol
Protocol -->|"HTTP / WS"| ServerA
Protocol -->|"HTTP / WS"| ServerB
FED_A <-->|"HTTP REST<br/>HMAC-SHA256"| FED_B
ServerA -->|"Call Signaling<br/>Token Validation"| WZP
ServerB -->|"Call Signaling"| WZP
CLI[CLI Client] --> PROTO[warzone-protocol]
TUI[TUI Client] --> PROTO
WEB[Web Client WASM] --> PROTO
PROTO -->|HTTP / WS| SRVA[Server Alpha]
PROTO -->|HTTP / WS| SRVB[Server Bravo]
SRVA <-->|Federation WS| SRVB
SRVA -->|Call Signaling| WZP[WarzonePhone Relay]
SRVB -->|Call Signaling| WZP
```
---
@@ -244,7 +207,7 @@ Offer | Answer | IceCandidate | Hangup | Reject | Ringing | Busy
| CLI/TUI | WS binary | 64 hex chars (recipient fp) + raw bincode |
| CLI/TUI | HTTP POST | JSON envelope with bincode as byte array |
| Web | WS JSON | `{"to": "fingerprint", "message": [bytes]}` |
| Server↔Server | HTTP POST | JSON with base64 message + HMAC auth header |
| Server↔Server | WS JSON | JSON frames over persistent federation WS |
---
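The Server↔Server row above says forwards carry the bincode `WireMessage` as base64 inside a JSON frame. Here is a minimal std-only sketch of building such a frame. Both `base64_encode` and `forward_frame` are hypothetical helpers, not the server's API; a real build would more likely use a base64 crate. The sketch assumes fingerprints are plain hex, so they need no JSON escaping, and rejects anything else.

```rust
const B64: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Standard (RFC 4648) base64 with `=` padding.
pub fn base64_encode(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        // Pack up to 3 bytes into a 24-bit group, zero-filling missing bytes.
        let b = [chunk[0], *chunk.get(1).unwrap_or(&0), *chunk.get(2).unwrap_or(&0)];
        let n = (u32::from(b[0]) << 16) | (u32::from(b[1]) << 8) | u32::from(b[2]);
        out.push(B64[(n >> 18) as usize & 63] as char);
        out.push(B64[(n >> 12) as usize & 63] as char);
        // Emit '=' for sextets that only cover padding bytes.
        out.push(if chunk.len() > 1 { B64[(n >> 6) as usize & 63] as char } else { '=' });
        out.push(if chunk.len() > 2 { B64[n as usize & 63] as char } else { '=' });
    }
    out
}

/// Build {"type":"forward","to":"<fp>","message":"<base64>"} for the federation WS.
/// Returns None if the fingerprint is empty or not ASCII hex (hypothetical validation).
pub fn forward_frame(to_fp: &str, wire_message: &[u8]) -> Option<String> {
    if to_fp.is_empty() || !to_fp.bytes().all(|c| c.is_ascii_hexdigit()) {
        return None;
    }
    Some(format!(
        r#"{{"type":"forward","to":"{}","message":"{}"}}"#,
        to_fp,
        base64_encode(wire_message)
    ))
}
```

For example, `forward_frame("aabb", b"hi")` yields `{"type":"forward","to":"aabb","message":"aGk="}`.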
@@ -339,19 +302,13 @@ sequenceDiagram
```mermaid
graph LR
subgraph ServerAlpha["Server Alpha"]
CA["Client A<br/>Client B"]
FHA["Federation Handle"]
subgraph Alpha[Server Alpha]
CA[Client A + B]
end
subgraph ServerBravo["Server Bravo"]
CC["Client C<br/>Client D"]
FHB["Federation Handle"]
subgraph Bravo[Server Bravo]
CC[Client C + D]
end
FHA <-->|"Presence sync<br/>(every 5s)"| FHB
FHA -->|"Forward message<br/>(HTTP POST)"| FHB
FHB -->|"Forward message<br/>(HTTP POST)"| FHA
Alpha <-->|Persistent WS\nPresence + Forward| Bravo
```
### Configuration
@@ -365,8 +322,7 @@ Each server has a `federation.json`:
"peer": {
"id": "bravo",
"url": "http://10.0.0.2:7700"
},
"presence_interval_secs": 5
}
}
```
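After this change, `peer.url` in `federation.json` is still an HTTP base URL, but the connection is a WebSocket to `/v1/federation/ws`. Below is one plausible way to derive the socket URL from the configured value. `federation_ws_url` is a hypothetical helper, and the `http`→`ws` / `https`→`wss` mapping is an assumption rather than something the diff shows.

```rust
/// Hypothetical: turn the configured peer base URL into the federation WS endpoint.
/// Returns None for schemes other than http/https.
pub fn federation_ws_url(peer_url: &str) -> Option<String> {
    // Drop trailing slashes so we never produce "//v1/...".
    let base = peer_url.trim_end_matches('/');
    let ws_base = if let Some(rest) = base.strip_prefix("https://") {
        format!("wss://{rest}")
    } else if let Some(rest) = base.strip_prefix("http://") {
        format!("ws://{rest}")
    } else {
        return None;
    };
    Some(format!("{ws_base}/v1/federation/ws"))
}
```

With the example config, `federation_ws_url("http://10.0.0.2:7700")` gives `ws://10.0.0.2:7700/v1/federation/ws`.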
@@ -374,41 +330,40 @@ Start with: `warzone-server --federation federation.json`
### Presence Sync
Every 5 seconds, each server POSTs its connected fingerprint list to the peer:
On startup each server opens a persistent WebSocket to its peer and authenticates with the shared secret. Presence updates and message forwards flow over this single connection:
```
POST /v1/federation/presence
X-Federation-Token: SHA-256(secret || body)
{ "server_id": "alpha", "fingerprints": ["aabb...", "ccdd..."], "timestamp": ... }
WS /v1/federation/ws
Auth: {"type":"auth","secret":"HMAC(shared_secret)"}
Presence: {"type":"presence","fingerprints":["aabb...","ccdd..."]}
Forward: {"type":"forward","to":"<fp>","message":"<base64>"}
```
The receiving server replaces its remote presence set entirely. If 3 intervals pass without a sync, the remote set is cleared (peer assumed down).
The receiving server replaces its remote presence set on each presence frame. If the WebSocket drops, the server auto-reconnects every 3 seconds and re-sends its full presence list.
### Message Forwarding
```mermaid
sequenceDiagram
participant A as Client A (Alpha)
participant SA as Server Alpha
participant SB as Server Bravo
participant C as Client C (Bravo)
A->>SA: Send message to C
SA->>SA: push_to_client(C) — not local
SA->>SA: remote_presence.contains(C) — yes
SA->>SB: POST /v1/federation/forward<br/>X-Federation-Token: HMAC
SB->>SB: Verify HMAC
SB->>C: push_to_client(C) via WS
SB->>SA: { "delivered": true }
Note over SA,SB: Persistent WS connection
SA->>SB: {"type":"auth","secret":"..."}
SA->>SB: {"type":"presence","fingerprints":["A","B"]}
SB->>SA: {"type":"presence","fingerprints":["C","D"]}
Note over SA: Client A sends message to C
SA->>SB: {"type":"forward","to":"C","message":"base64..."}
Note over SB: Deliver to Client C via local WS
```
### Degradation
| Scenario | Behavior |
|----------|----------|
| Peer unreachable | Message queued locally, retried on next connection |
| Presence stale (>15s) | Remote fingerprints cleared, treated as offline |
| Peer restarts | Presence repopulates within 5 seconds |
| WS disconnected | Auto-reconnect every 3s, messages queue locally |
| Peer restarts | Presence repopulates on WS reconnect |
| HMAC mismatch | Request rejected with 401 |
---
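Two behaviours in the presence and degradation text are easy to get subtly wrong. First, each presence frame replaces the remote set rather than merging into it. Second, after a drop the sender retries every 3 seconds and re-sends its full presence list once connected.

Here is a minimal std-only sketch of both. `RemotePresence`, `connect_with_retry` and the blocking `connect` callback are hypothetical stand-ins for the real async WebSocket client. The delay is a parameter so tests need not wait 3 seconds.

```rust
use std::collections::HashSet;
use std::thread;
use std::time::Duration;

/// Reconnect delay documented for the federation WS.
pub const RECONNECT_DELAY: Duration = Duration::from_secs(3);

/// The peer's fingerprints as last reported in a presence frame.
#[derive(Default)]
pub struct RemotePresence {
    fingerprints: HashSet<String>,
}

impl RemotePresence {
    /// Replace (not merge): fingerprints absent from the new frame are dropped.
    pub fn replace(&mut self, fingerprints: Vec<String>) {
        self.fingerprints = fingerprints.into_iter().collect();
    }

    pub fn contains(&self, fp: &str) -> bool {
        self.fingerprints.contains(fp)
    }
}

/// Call `connect` until it succeeds, sleeping `delay` between failed attempts.
/// On success, `on_connect` runs once (e.g. send auth, then the full presence list).
/// Returns the connection and the number of attempts it took.
pub fn connect_with_retry<C, E>(
    mut connect: impl FnMut() -> Result<C, E>,
    mut on_connect: impl FnMut(&mut C),
    delay: Duration,
) -> (C, u32) {
    let mut attempts = 0;
    loop {
        attempts += 1;
        match connect() {
            Ok(mut conn) => {
                on_connect(&mut conn);
                return (conn, attempts);
            }
            Err(_) => thread::sleep(delay),
        }
    }
}
```

In a server this would run in a background task, called with `RECONNECT_DELAY` and invoked again whenever the socket closes.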
@@ -632,15 +587,14 @@ sequenceDiagram
participant SB as Server Bravo
participant C as Client C (Bravo)
Note over SA,SB: Presence sync (every 5s)
SA->>SB: POST /federation/presence [A, B]
SB->>SA: POST /federation/presence [C, D]
Note over SA,SB: Persistent WS between servers
SA->>SB: presence ["A","B"]
SB->>SA: presence ["C","D"]
A->>SA: Message for C
SA->>SA: Not local, C in remote presence
SA->>SB: POST /federation/forward (HMAC auth)
SA->>SB: forward to C via federation WS
SB->>C: Push via local WS
SB->>SA: { "delivered": true }
```
---
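The routing step in the sequence above works in three stages. A recipient with a local WS is pushed directly. One that appears in the peer's last presence frame gets a `forward` frame. Otherwise the message waits locally.

A minimal sketch of that decision follows. `Router`, `Route` and the in-memory queue are hypothetical (the real server persists state in sled), and actual delivery is left out. Queueing unknown recipients is also an assumption; the degradation table only documents queueing while the federation WS is down.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

#[derive(Debug, PartialEq)]
pub enum Route {
    /// Recipient has a WS open on this server: push directly.
    Local,
    /// Recipient is in the peer's last presence frame and the WS is up: send a forward frame.
    Federation,
    /// Not reachable right now: hold the message for later delivery.
    Queued,
}

#[derive(Default)]
pub struct Router {
    pub local: HashSet<String>,  // fingerprints connected to this server
    pub remote: HashSet<String>, // last presence set received from the peer
    pub federation_up: bool,     // is the federation WS currently connected?
    pub queue: HashMap<String, VecDeque<Vec<u8>>>, // hypothetical per-recipient queue
}

impl Router {
    /// Decide where `msg` for `to` goes; only the Queued case keeps the message here.
    pub fn route_message(&mut self, to: &str, msg: Vec<u8>) -> Route {
        if self.local.contains(to) {
            Route::Local
        } else if self.federation_up && self.remote.contains(to) {
            Route::Federation
        } else {
            self.queue.entry(to.to_string()).or_default().push_back(msg);
            Route::Queued
        }
    }

    /// Number of messages currently held for `to`.
    pub fn queued_for(&self, to: &str) -> usize {
        self.queue.get(to).map_or(0, |q| q.len())
    }
}
```

Checking local clients first means a fingerprint connected to both servers is always delivered locally without a federation hop.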