nick-doc/08 - Operations/Scanner Operations.md
Siavash Sameni dceaf82934 audit: 2026-05-30 full-codebase audit — report, issues, docs, runbooks
Full-codebase-audit 2026-05-30 outputs:
- Audit report: 09 - Audits/Full Codebase Audit - 2026-05-30.md
- 81 issue files ISSUE-055..135 (decisions + 1 skipped no-brainer).
- Scanner docs from scratch (was zero): architecture, data model, API ref, payment
  flow, operations runbook + repo README.
- Doc-sync updates across API reference, data models, flows, design system.
- Secret Rotation Runbook (08 - Operations) for the exposed credentials.
- Reusable workflow guide (07 - Development) + .claude/workflows/full-codebase-audit.js.

Issues remain status:open intentionally — the code fixes are uncommitted-then-committed
working-tree changes per repo and aren't "resolved" until merged/deployed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-30 18:48:04 +04:00


---
title: Scanner Operations
tags: [operations, scanner, deployment]
created: 2026-05-30
---
# Scanner Operations
Runbook for deploying, configuring, monitoring, and troubleshooting the AMN Pay Scanner microservice.

---
## 1. Configuration reference
All configuration is supplied through environment variables. See `.env.example` in the scanner repo.

| Variable | Default | Required | Description |
|---|---|---|---|
| `PORT` | `8080` | no | HTTP listen port |
| `DB_PATH` | `./scanner.db` | no | SQLite database path |
| `CHAINS_JSON_PATH` | `./supported-chains.json` | no | Supported chains config |
| `TOKENS_JSON_PATH` | `./tokens.json` | no | Token registry |
| `SCANNER_API_KEY` | _(none)_ | **yes (prod)** | Bearer token for all non-health endpoints. Generate with `openssl rand -hex 32` |
| `POLL_INTERVAL_SEC` | `15` | no | Chain poll interval in seconds |
| `INTENT_TTL_HOURS` | `24` | no | Pending/confirming intents older than this are expired (0 = disabled) |
| `WEBHOOK_RETRY_HOURS` | `6` | no | Interval in hours between automatic re-delivery passes for `webhook_failed` intents (0 = disabled) |
| `TRONGRID_API_KEY` | _(none)_ | recommended | TronGrid API key; without it rate limits are very low |
| `TONCENTER_API_KEY` | _(none)_ | recommended | TonCenter API key |
| `RPC_BSC` | _(chain config)_ | no | Override BSC RPC URL (chain 56) |
| `RPC_ARB` | _(chain config)_ | no | Override Arbitrum RPC URL (chain 42161) |
| `RPC_ETH` | _(chain config)_ | no | Override Ethereum RPC URL (chain 1) |
| `RPC_POLYGON` | _(chain config)_ | no | Override Polygon RPC URL (chain 137) |
| `RPC_BASE` | _(chain config)_ | no | Override Base RPC URL (chain 8453) |
> [!warning]
> If `SCANNER_API_KEY` is not set, the scanner logs a warning and accepts all requests. Never run this way in production.
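
A deploy script can refuse to start the container when the key is missing. The sketch below is a hypothetical pre-flight check, not part of the scanner repo: the function name `check_scanner_api_key` and the 64-hex-character format test (the shape that `openssl rand -hex 32` produces) are assumptions.

```bash
# Hypothetical pre-flight check; not part of the scanner repo.
# Returns 0 if SCANNER_API_KEY looks usable, 1 if unset, 2 if malformed.
check_scanner_api_key() {
  local key="${SCANNER_API_KEY:-}"
  if [ -z "$key" ]; then
    echo "SCANNER_API_KEY is not set: the scanner would accept all requests" >&2
    return 1
  fi
  # Assumption: keys come from `openssl rand -hex 32`, i.e. 64 lowercase hex chars.
  if ! printf '%s' "$key" | grep -Eq '^[0-9a-f]{64}$'; then
    echo "SCANNER_API_KEY does not look like 'openssl rand -hex 32' output" >&2
    return 2
  fi
  return 0
}

# Usage: check_scanner_api_key || exit 1
```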
---
## 2. Docker deployment
The scanner ships as a single Docker image. The Dockerfile uses a two-stage build (Go 1.25 builder → Alpine 3.21 runtime).
### Quick start (dev)
```bash
cd scanner/
cp .env.example .env
# edit .env — set SCANNER_API_KEY, RPC overrides, etc.
docker build -t amn-scanner:dev .
docker run -d \
  --name amn-scanner \
  -p 8080:8080 \
  -v "$(pwd)/data:/data" \
  --env-file .env \
  amn-scanner:dev
```
### Production (via arcane-cli / Watchtower)
The scanner is deployed manually with `arcane-cli`. It is not managed through GitOps, and Watchtower does NOT update it automatically. After pushing a new image, redeploy with:
```bash
arcane-cli project redeploy --json <project-id>
```
The SQLite database is stored on a named Docker volume mounted at `/data`. Do not recreate the volume between deploys: it holds the chain checkpoints and intent state.

---
## 3. Health check
```bash
curl http://localhost:8080/health
# {"status":"ok","time":"2026-05-30T12:00:00Z"}
```
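After a redeploy it can help to wait until the endpoint reports healthy before moving on. A minimal sketch, assuming the response body contains `"status":"ok"` exactly as in the sample above; `wait_for_health` and its retry defaults are hypothetical and not part of the scanner repo.

```bash
# Hypothetical helper: poll the health endpoint until it reports "ok".
# Arguments: URL, max attempts (default 10), delay in seconds (default 3).
wait_for_health() {
  local url="$1" attempts="${2:-10}" delay="${3:-3}" i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl fail on HTTP errors; --max-time bounds each probe.
    if curl -fsS --max-time 5 "$url" 2>/dev/null | grep -q '"status":"ok"'; then
      return 0
    fi
    # Sleep only between attempts, not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Usage: wait_for_health http://localhost:8080/health 20 3 || echo "scanner unhealthy" >&2
```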
Docker `HEALTHCHECK` is already configured in the Dockerfile (30 s interval, 5 s timeout, 3 retries).

---
## 4. Monitoring
### Scanner status endpoint
```bash
curl -H "Authorization: Bearer $SCANNER_API_KEY" \
http://localhost:8080/scanner/status | jq .
```
Check:
- `lag` — should be near 0 for healthy chains (blocks behind for EVM, seconds for TON)
- `pendingIntents` — number of unresolved intents per chain
- `lastScannedBlock` — should advance each poll
### Logs
The scanner uses Go's `log/slog` structured logger with level prefixes. Key log patterns:
| Pattern | Meaning |
|---|---|
| `[scanner] worker started` | Worker goroutine began for this chain |
| `[evm] intent confirming` | EVM tx seen, waiting for confirmations |
| `[evm] intent confirmed` | EVM: N confirmations reached |
| `[tron] MATCH` / `[ton] MATCH` | Transfer matched; intent moves to `confirmed` |
| `[webhook] delivered` | Webhook POST succeeded |
| `[webhook] non-2xx response` | Backend returned error (will retry) |
| `[webhook] all retries exhausted` | Intent moved to `webhook_failed` |
| `[scanner] reconciling confirmed intents` | Startup crash recovery in progress |
| `[evm] scanner lag` | Chain lag > 100 blocks (investigate RPC) |
---
## 5. Adding / modifying chains
Edit `supported-chains.json`. Fields:
| Field | Notes |
|---|---|
| `chainId` | Numeric EIP-155 chain ID (arbitrary int for Tron/TON) |
| `chainType` | `"evm"` (default) / `"tron"` / `"ton"` |
| `rpcUrl` | Primary RPC endpoint |
| `publicRpcUrl` | Fallback RPC (EVM only) |
| `proxyAddress` | ERC20FeeProxy address (EVM); USDT contract (Tron); USDT Jetton master (TON) |
| `confirmationThreshold` | Blocks required (EVM); ignored for Tron/TON |
| `verified` | `true` to activate the worker; `false` to disable without deleting |
> [!important]
> Changing `proxyAddress` for an EVM chain only affects new scans. Existing pending intents will still be matched against the old address until they expire or are confirmed.

After editing, restart the scanner container to pick up the new config.

---
## 6. Adding tokens to the registry
Edit `tokens.json`. Each entry:
```json
{ "chainId": 56, "address": "0x...", "symbol": "USDC", "decimals": 18, "name": "USD Coin" }
```
The token registry is used only to populate `tokenSymbol` and `decimals` in the `checkoutBlock` response. Omitting a token does not break scanning; it just leaves those fields empty.

---
## 7. Manual webhook retry
Force immediate re-delivery of all `webhook_failed` intents:
```bash
curl -X POST -H "Authorization: Bearer $SCANNER_API_KEY" \
http://localhost:8080/admin/webhooks/retry
# {"queued": N}
```
---
## 8. Database inspection
The SQLite database (`/data/scanner.db`) can be inspected with the `sqlite3` CLI inside the container:
```bash
docker exec -it amn-scanner sqlite3 /data/scanner.db
-- The statements below are SQL, entered at the sqlite3 prompt.
-- Check stuck intents
SELECT intent_id, chain_id, status, created_at, webhook_delivered_at
FROM intents
WHERE status NOT IN ('confirmed', 'expired')
ORDER BY created_at DESC;
-- Check chain checkpoints
SELECT chain_id, last_scanned_block, updated_at FROM checkpoints;
-- Count by status
SELECT status, count(*) FROM intents GROUP BY status;
```
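For scripted checks, the same queries can be run non-interactively. A minimal sketch, assuming the container is named `amn-scanner` as in the quick start; `scanner_sql` and the `SCANNER_CONTAINER` variable are hypothetical names. The `-readonly` flag asks `sqlite3` to open the database read-only, so an inspection query cannot modify live state.

```bash
# Hypothetical wrapper: run one SQL statement against the scanner DB, read-only.
# SCANNER_CONTAINER overrides the container name (assumed default: amn-scanner).
scanner_sql() {
  docker exec "${SCANNER_CONTAINER:-amn-scanner}" \
    sqlite3 -readonly /data/scanner.db "$1"
}

# Usage: scanner_sql "SELECT status, count(*) FROM intents GROUP BY status;"
```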
---
## 9. Troubleshooting
### Intent stuck in `pending`
1. Check `/scanner/status`: is the chain worker running, and is `lastScannedBlock` advancing? A `lag` that stays above 0 for a long time usually indicates an RPC issue.
2. Check that `chainId` and `tokenAddress` match exactly what is in `supported-chains.json` and `tokens.json`.
3. For EVM: verify the `proxyAddress` matches the contract the buyer is calling.
4. For Tron: confirm the destination address is stored in EVM-hex (0x) format in the DB.
5. Check scanner logs for `REJECT` messages around the expected tx time.
### Webhook never received by backend
1. Check `webhook_delivered_at` in the DB — if not null, the scanner delivered successfully and the backend side is the issue.
2. If null and status is `webhook_failed`: check backend logs for the incoming POST; verify `X-AMN-Signature` validation code.
3. If status is `confirmed` but `webhook_delivered_at` is null: startup reconciliation may re-deliver on next restart.
4. Use `POST /admin/webhooks/retry` to trigger immediate retry.
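
The checks above reduce to a small decision table on `status` and `webhook_delivered_at`. A sketch for triage scripts: `webhook_triage` is a hypothetical helper, an empty string stands in for SQL `NULL`, and the final branch assumes that no webhook is attempted before an intent is confirmed.

```bash
# Hypothetical helper: map (status, webhook_delivered_at) to the next action.
# Pass an empty string for a NULL webhook_delivered_at.
webhook_triage() {
  local status="$1" delivered_at="$2"
  if [ -n "$delivered_at" ]; then
    echo "delivered: investigate the backend"
  elif [ "$status" = "webhook_failed" ]; then
    echo "failed: check backend logs and signature validation, then retry"
  elif [ "$status" = "confirmed" ]; then
    echo "undelivered: startup reconciliation may re-deliver on next restart"
  else
    # Assumption: other statuses have not triggered a webhook yet.
    echo "no webhook expected yet: intent is $status"
  fi
}

# Usage: webhook_triage webhook_failed ""
```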
### High lag on EVM chain
1. Check RPC endpoint availability and rate limits.
2. Consider setting an `RPC_*` env override to a premium RPC (Alchemy, Infura, QuickNode).
3. The scanner falls back to `publicRpcUrl` if the primary fails, but public nodes have lower rate limits.
### Intent confirmed but amount looks wrong
The scanner accepts any amount **>=** `intent.Amount`. Overpayments are not flagged. Underpayments result in the intent staying pending until TTL expiry.
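
The matching rule can be illustrated with a small comparison helper. This is a sketch of the `>=` rule only, not the scanner's Go code: `amount_satisfies` is a hypothetical name, and amounts are assumed to be non-negative integers in token base units, handled as decimal strings because they can exceed 64 bits.

```bash
# Hypothetical helper: succeeds (exit 0) when received >= expected.
# Both arguments are assumed to be non-negative decimal integer strings.
amount_satisfies() {
  local received expected larger
  # Strip leading zeros so that string length reflects magnitude.
  received=$(printf '%s' "$1" | sed 's/^0*//')
  expected=$(printf '%s' "$2" | sed 's/^0*//')
  # A longer digit string is the larger number.
  if [ "${#received}" -gt "${#expected}" ]; then return 0; fi
  if [ "${#received}" -lt "${#expected}" ]; then return 1; fi
  # Same length: byte order of the digits equals numeric order.
  larger=$(printf '%s\n%s\n' "$received" "$expected" | LC_ALL=C sort | tail -n 1)
  [ "$larger" = "$received" ]
}

# Overpayment and exact payment match; underpayment does not.
# Usage: amount_satisfies 1500000 1000000 && echo "would match"
```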

---
## 10. CI/CD notes
- Woodpecker CI pipeline is in `.woodpecker/`.
- Telegram notify steps were removed (no TG secrets configured).
- Deploy step was removed — the scanner is deployed manually via `arcane-cli`.
- The CI pipeline builds and pushes the Docker image to the Gitea registry.
- Image tag format: `dev-<VERSION>` (from the `VERSION` file).
> [!tip]
> After CI completes, verify the image is in the registry before redeploying. Silent CI failures can leave a stale image tagged. Check the registry tag timestamp, not just the CI green light.
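
The expected tag can be derived locally and compared with what the registry shows. A minimal sketch, assuming the `VERSION` file holds a single version string; `image_tag` is a hypothetical helper and not part of the pipeline.

```bash
# Hypothetical helper: print the expected image tag, dev-<VERSION>.
# Argument: path to the VERSION file (default: ./VERSION).
image_tag() {
  local version_file="${1:-VERSION}" version
  [ -r "$version_file" ] || return 1
  # Drop the trailing newline and any stray whitespace.
  version=$(tr -d '[:space:]' < "$version_file")
  [ -n "$version" ] || return 1
  printf 'dev-%s\n' "$version"
}

# Usage: echo "expecting tag $(image_tag)"
```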