nick-doc/08 - Operations/Scanner Operations.md
Siavash Sameni dceaf82934 audit: 2026-05-30 full-codebase audit — report, issues, docs, runbooks
Full-codebase-audit 2026-05-30 outputs:
- Audit report: 09 - Audits/Full Codebase Audit - 2026-05-30.md
- 81 issue files ISSUE-055..135 (decisions + 1 skipped no-brainer).
- Scanner docs from scratch (was zero): architecture, data model, API ref, payment
  flow, operations runbook + repo README.
- Doc-sync updates across API reference, data models, flows, design system.
- Secret Rotation Runbook (08 - Operations) for the exposed credentials.
- Reusable workflow guide (07 - Development) + .claude/workflows/full-codebase-audit.js.

Issues remain status:open intentionally — the code fixes are uncommitted-then-committed
working-tree changes per repo and aren't "resolved" until merged/deployed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-30 18:48:04 +04:00


title: Scanner Operations
tags: operations, scanner, deployment
created: 2026-05-30

Scanner Operations

Runbook for deploying, configuring, monitoring, and troubleshooting the AMN Pay Scanner microservice.


1. Configuration reference

All configuration is done through environment variables. See .env.example in the scanner repo.

| Variable | Default | Required | Description |
|---|---|---|---|
| PORT | 8080 | no | HTTP listen port |
| DB_PATH | ./scanner.db | no | SQLite database path |
| CHAINS_JSON_PATH | ./supported-chains.json | no | Supported chains config |
| TOKENS_JSON_PATH | ./tokens.json | no | Token registry |
| SCANNER_API_KEY | (none) | yes (prod) | Bearer token for all non-health endpoints. Generate with openssl rand -hex 32 |
| POLL_INTERVAL_SEC | 15 | no | Chain poll interval in seconds |
| INTENT_TTL_HOURS | 24 | no | Pending/confirming intents older than this are expired (0 = disabled) |
| WEBHOOK_RETRY_HOURS | 6 | no | Interval between automatic webhook_failed re-delivery passes (0 = disabled) |
| TRONGRID_API_KEY | (none) | recommended | TronGrid API key; without it rate limits are very low |
| TONCENTER_API_KEY | (none) | recommended | TonCenter API key |
| RPC_BSC | (chain config) | no | Override BSC RPC URL (chain 56) |
| RPC_ARB | (chain config) | no | Override Arbitrum RPC URL (chain 42161) |
| RPC_ETH | (chain config) | no | Override Ethereum RPC URL (chain 1) |
| RPC_POLYGON | (chain config) | no | Override Polygon RPC URL (chain 137) |
| RPC_BASE | (chain config) | no | Override Base RPC URL (chain 8453) |
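
To make the defaults concrete, here is a minimal Go sketch of how the integer and path variables above could be read. It is not the scanner's actual loader: the `Config` type, `intOr`, and `loadConfig` are hypothetical names, and rejecting negative values is an assumption.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Config holds a subset of the variables from the table above.
// The type and its field names are hypothetical.
type Config struct {
	Port              int
	DBPath            string
	APIKey            string
	PollIntervalSec   int
	IntentTTLHours    int // 0 disables intent expiry
	WebhookRetryHours int // 0 disables automatic re-delivery
}

// intOr reads key as a non-negative integer, returning def when it is unset.
func intOr(getenv func(string) string, key string, def int) (int, error) {
	raw := getenv(key)
	if raw == "" {
		return def, nil
	}
	n, err := strconv.Atoi(raw)
	if err != nil || n < 0 {
		return 0, fmt.Errorf("%s: want a non-negative integer, got %q", key, raw)
	}
	return n, nil
}

// loadConfig applies the documented defaults to whatever getenv returns.
func loadConfig(getenv func(string) string) (Config, error) {
	cfg := Config{DBPath: getenv("DB_PATH"), APIKey: getenv("SCANNER_API_KEY")}
	if cfg.DBPath == "" {
		cfg.DBPath = "./scanner.db"
	}
	fields := []struct {
		key string
		def int
		dst *int
	}{
		{"PORT", 8080, &cfg.Port},
		{"POLL_INTERVAL_SEC", 15, &cfg.PollIntervalSec},
		{"INTENT_TTL_HOURS", 24, &cfg.IntentTTLHours},
		{"WEBHOOK_RETRY_HOURS", 6, &cfg.WebhookRetryHours},
	}
	for _, f := range fields {
		n, err := intOr(getenv, f.key, f.def)
		if err != nil {
			return Config{}, err
		}
		*f.dst = n
	}
	return cfg, nil
}

func main() {
	cfg, err := loadConfig(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, "config error:", err)
		os.Exit(1)
	}
	// Print only non-secret values; the API key itself is left out.
	fmt.Printf("port=%d db=%s poll=%ds ttl=%dh retry=%dh key_set=%t\n",
		cfg.Port, cfg.DBPath, cfg.PollIntervalSec, cfg.IntentTTLHours,
		cfg.WebhookRetryHours, cfg.APIKey != "")
}
```

Passing `getenv` as a parameter keeps the loader testable without touching the process environment.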

Warning

If SCANNER_API_KEY is not set, the scanner logs a warning and accepts all requests. Never run this way in production.
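
The behaviour described in this warning can be sketched as a small bearer-token check. This is an illustration only: `authorized` and `requireKey` are hypothetical helpers, the constant-time comparison is a general good practice rather than something the source confirms, and the startup guard is a suggestion, not existing scanner behaviour.

```go
package main

import (
	"crypto/subtle"
	"errors"
	"fmt"
	"strings"
)

// authorized reports whether a request carrying authHeader may proceed.
// An empty apiKey reproduces the open mode described in the warning.
func authorized(apiKey, authHeader string) bool {
	if apiKey == "" {
		return true
	}
	const prefix = "Bearer "
	if !strings.HasPrefix(authHeader, prefix) {
		return false
	}
	token := strings.TrimPrefix(authHeader, prefix)
	// Constant-time comparison avoids leaking key bytes through timing.
	return subtle.ConstantTimeCompare([]byte(token), []byte(apiKey)) == 1
}

// requireKey is a hypothetical startup guard that refuses open mode in production.
func requireKey(production bool, apiKey string) error {
	if production && apiKey == "" {
		return errors.New("SCANNER_API_KEY must be set in production")
	}
	return nil
}

func main() {
	fmt.Println(authorized("example-key", "Bearer example-key"))
	fmt.Println(authorized("example-key", "Bearer wrong"))
	fmt.Println(authorized("", ""))
	fmt.Println(requireKey(true, ""))
}
```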


2. Docker deployment

The scanner ships as a single Docker image. The Dockerfile uses a two-stage build (Go 1.25 builder → Alpine 3.21 runtime).

Quick start (dev)

cd scanner/
cp .env.example .env
# edit .env — set SCANNER_API_KEY, RPC overrides, etc.

docker build -t amn-scanner:dev .
docker run -d \
  --name amn-scanner \
  -p 8080:8080 \
  -v "$(pwd)/data:/data" \
  --env-file .env \
  amn-scanner:dev

Production (via arcane-cli / Watchtower)

The scanner is deployed manually via arcane-cli (not gitops). Watchtower does NOT manage it automatically. After pushing a new image, redeploy with:

arcane-cli project redeploy --json <project-id>

The SQLite database is stored on a named Docker volume mounted at /data. Do not recreate the volume between deploys, because it holds the checkpoint and intent state.


3. Health check

curl http://localhost:8080/health
# {"status":"ok","time":"2026-05-30T12:00:00Z"}

Docker HEALTHCHECK is already configured in the Dockerfile (30 s interval, 5 s timeout, 3 retries).


4. Monitoring

Scanner status endpoint

curl -H "Authorization: Bearer $SCANNER_API_KEY" \
     http://localhost:8080/scanner/status | jq .

Check:

  • lag — should be near 0 for healthy chains (blocks behind for EVM, seconds for TON)
  • pendingIntents — number of unresolved intents per chain
  • lastScannedBlock — should advance each poll

Logs

The scanner uses Go's log/slog structured logger with level prefixes. Key log patterns:

| Pattern | Meaning |
|---|---|
| [scanner] worker started | Worker goroutine began for this chain |
| [evm] intent confirming | EVM tx seen, waiting for confirmations |
| [evm] intent confirmed | EVM: N confirmations reached |
| [tron] MATCH / [ton] MATCH | Transfer matched, going to confirmed |
| [webhook] delivered | Webhook POST succeeded |
| [webhook] non-2xx response | Backend returned error (will retry) |
| [webhook] all retries exhausted | Intent moved to webhook_failed |
| [scanner] reconciling confirmed intents | Startup crash recovery in progress |
| [evm] scanner lag | Chain lag > 100 blocks (investigate RPC) |

5. Adding / modifying chains

Edit supported-chains.json. Fields:

| Field | Notes |
|---|---|
| chainId | Numeric EIP-155 chain ID (arbitrary int for Tron/TON) |
| chainType | "evm" (default) / "tron" / "ton" |
| rpcUrl | Primary RPC endpoint |
| publicRpcUrl | Fallback RPC (EVM only) |
| proxyAddress | ERC20FeeProxy address (EVM); USDT contract (Tron); USDT Jetton master (TON) |
| confirmationThreshold | Blocks required (EVM); ignored for Tron/TON |
| verified | true to activate the worker; false to disable without deleting |

Important

Changing proxyAddress for an EVM chain only affects new scans. Existing pending intents will still be matched against the old address until they expire or are confirmed.

After editing, restart the scanner container to pick up the new config.


6. Adding tokens to the registry

Edit tokens.json. Each entry:

{ "chainId": 56, "address": "0x...", "symbol": "USDC", "decimals": 18, "name": "USD Coin" }

The token registry is used only to populate tokenSymbol and decimals in the checkoutBlock response. Omitting a token does not break scanning; it only leaves those fields empty.
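
As an illustration of that lookup, here is a minimal sketch that indexes registry entries by chain and address and returns empty values for unknown tokens. The `Token` type follows the entry format shown above; `buildIndex`, `describe`, the case-insensitive address match, and the sample address are assumptions made for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Token mirrors one tokens.json entry.
type Token struct {
	ChainID  int64  `json:"chainId"`
	Address  string `json:"address"`
	Symbol   string `json:"symbol"`
	Decimals int    `json:"decimals"`
	Name     string `json:"name"`
}

type tokenKey struct {
	chainID int64
	address string
}

// buildIndex keys tokens by chain ID and lowercased hex address.
func buildIndex(tokens []Token) map[tokenKey]Token {
	idx := make(map[tokenKey]Token, len(tokens))
	for _, t := range tokens {
		idx[tokenKey{t.ChainID, strings.ToLower(t.Address)}] = t
	}
	return idx
}

// describe returns symbol and decimals, or empty values for unknown tokens.
func describe(idx map[tokenKey]Token, chainID int64, address string) (string, int) {
	t, ok := idx[tokenKey{chainID, strings.ToLower(address)}]
	if !ok {
		return "", 0
	}
	return t.Symbol, t.Decimals
}

func main() {
	// Placeholder address, not a real contract.
	raw := `[{"chainId":56,"address":"0xAbC0000000000000000000000000000000000001","symbol":"USDC","decimals":18,"name":"USD Coin"}]`
	var tokens []Token
	if err := json.Unmarshal([]byte(raw), &tokens); err != nil {
		fmt.Println("invalid JSON:", err)
		return
	}
	idx := buildIndex(tokens)
	fmt.Println(describe(idx, 56, "0xabc0000000000000000000000000000000000001"))
	fmt.Println(describe(idx, 1, "0xabc0000000000000000000000000000000000001"))
}
```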


7. Manual webhook retry

Force immediate re-delivery of all webhook_failed intents:

curl -X POST -H "Authorization: Bearer $SCANNER_API_KEY" \
     http://localhost:8080/admin/webhooks/retry
# {"queued": N}
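
For automation from a Go service, the same call can be built with net/http. This is a sketch only: `newRetryRequest` and `parseQueued` are hypothetical helpers, and the response handling assumes the `{"queued": N}` body shown above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"strings"
)

// newRetryRequest builds the POST request for the manual retry endpoint.
func newRetryRequest(baseURL, apiKey string) (*http.Request, error) {
	url := strings.TrimRight(baseURL, "/") + "/admin/webhooks/retry"
	req, err := http.NewRequest(http.MethodPost, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+apiKey)
	return req, nil
}

// parseQueued extracts the number of intents queued for re-delivery.
func parseQueued(body []byte) (int, error) {
	var resp struct {
		Queued int `json:"queued"`
	}
	if err := json.Unmarshal(body, &resp); err != nil {
		return 0, err
	}
	return resp.Queued, nil
}

func main() {
	req, err := newRetryRequest("http://localhost:8080", "example-key")
	if err != nil {
		fmt.Println("bad request:", err)
		return
	}
	// Send with http.DefaultClient.Do(req) against a running scanner.
	fmt.Println(req.Method, req.URL)
	n, _ := parseQueued([]byte(`{"queued": 3}`))
	fmt.Println("queued:", n)
}
```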

8. Database inspection

The SQLite database (/data/scanner.db) can be inspected with the sqlite3 CLI inside the container:

docker exec -it amn-scanner sqlite3 /data/scanner.db

-- Check stuck intents
SELECT intent_id, chain_id, status, created_at, webhook_delivered_at
FROM intents
WHERE status NOT IN ('confirmed', 'expired')
ORDER BY created_at DESC;

-- Check chain checkpoints
SELECT chain_id, last_scanned_block, updated_at FROM checkpoints;

-- Count by status
SELECT status, count(*) FROM intents GROUP BY status;

9. Troubleshooting

Intent stuck in pending

  1. Check /scanner/status: is the chain worker running, and is lastScannedBlock advancing? Lag that stays above 0 for a long time usually points to an RPC issue.
  2. Check that chainId and tokenAddress match exactly what is in supported-chains.json and tokens.json.
  3. For EVM: verify the proxyAddress matches the contract the buyer is calling.
  4. For Tron: confirm the destination address is stored in EVM-hex (0x) format in the DB.
  5. Check scanner logs for REJECT messages around the expected tx time.

Webhook never received by backend

  1. Check webhook_delivered_at in the DB — if not null, the scanner delivered successfully and the backend side is the issue.
  2. If null and status is webhook_failed: check backend logs for the incoming POST; verify X-AMN-Signature validation code.
  3. If status is confirmed but webhook_delivered_at is null: startup reconciliation may re-deliver on next restart.
  4. Use POST /admin/webhooks/retry to trigger immediate retry.
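
The decision steps above can be condensed into a small triage helper, for example inside an on-call script. This is a sketch: `triage` is a hypothetical function, and `delivered` stands for "webhook_delivered_at is not null" in the intents table.

```go
package main

import "fmt"

// triage maps an intent's status and delivery state to the next action
// from the checklist. delivered means webhook_delivered_at is not null.
func triage(status string, delivered bool) string {
	switch {
	case delivered:
		return "scanner delivered the webhook; investigate the backend"
	case status == "webhook_failed":
		return "check backend logs and X-AMN-Signature validation, then POST /admin/webhooks/retry"
	case status == "confirmed":
		return "delivery still pending; startup reconciliation may re-deliver on the next restart"
	default:
		return "no step in this checklist applies; check the intent status first"
	}
}

func main() {
	fmt.Println(triage("confirmed", true))
	fmt.Println(triage("webhook_failed", false))
	fmt.Println(triage("confirmed", false))
	fmt.Println(triage("pending", false))
}
```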

High lag on EVM chain

  1. Check RPC endpoint availability and rate limits.
  2. Consider setting an RPC_* env override that points to a premium RPC provider (Alchemy, Infura, QuickNode).
  3. The scanner falls back to publicRpcUrl if the primary fails, but public nodes have lower rate limits.

Intent confirmed but amount looks wrong

The scanner accepts any amount >= intent.Amount. Overpayments are not flagged. Underpayments result in the intent staying pending until TTL expiry.
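
Since overpayments are not flagged by the scanner, a backend that cares about them has to compare amounts itself. The sketch below shows the matching rule with arbitrary-precision integers and a surplus calculation. It assumes amounts are decimal strings in token base units; `parseUnits`, `matches`, and `overpaidBy` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"math/big"
)

// parseUnits parses a non-negative amount given in token base units.
func parseUnits(s string) (*big.Int, error) {
	n, ok := new(big.Int).SetString(s, 10)
	if !ok || n.Sign() < 0 {
		return nil, fmt.Errorf("invalid amount %q", s)
	}
	return n, nil
}

// matches applies the documented rule: received >= expected.
func matches(expected, received *big.Int) bool {
	return received.Cmp(expected) >= 0
}

// overpaidBy returns the surplus, or zero when there is none.
func overpaidBy(expected, received *big.Int) *big.Int {
	diff := new(big.Int).Sub(received, expected)
	if diff.Sign() < 0 {
		return new(big.Int)
	}
	return diff
}

func main() {
	expected, _ := parseUnits("1000000000000000000") // 1 token at 18 decimals
	received, _ := parseUnits("1000000000000000500")
	fmt.Println("match:", matches(expected, received))
	fmt.Println("surplus:", overpaidBy(expected, received))
}
```

Using `math/big` rather than `uint64` or floats avoids overflow and rounding for 18-decimal tokens.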


10. CI/CD notes

  • Woodpecker CI pipeline is in .woodpecker/.
  • Telegram notify steps were removed (no TG secrets configured).
  • Deploy step was removed — the scanner is deployed manually via arcane-cli.
  • The CI pipeline builds and pushes the Docker image to the Gitea registry.
  • Image tag format: dev-<VERSION> (from the VERSION file).
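
The tag format from the last bullet can be derived from the VERSION file contents as follows. This is a minimal sketch: `imageTag` is a hypothetical helper, the sample version string is made up, and rejecting whitespace inside the version is an assumption about what a valid VERSION file looks like.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// imageTag turns the contents of the VERSION file into a dev-<VERSION> tag.
func imageTag(versionFile string) (string, error) {
	v := strings.TrimSpace(versionFile)
	if v == "" {
		return "", errors.New("VERSION is empty")
	}
	if strings.ContainsAny(v, " \t\r\n") {
		return "", errors.New("VERSION must be a single token")
	}
	return "dev-" + v, nil
}

func main() {
	// "1.4.2" is a made-up sample; read the real value from the VERSION file.
	tag, err := imageTag("1.4.2\n")
	if err != nil {
		fmt.Println("cannot build tag:", err)
		return
	}
	fmt.Println(tag)
}
```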

Tip

After CI completes, verify the image is in the registry before redeploying. Silent CI failures can leave a stale image tagged. Check the registry tag timestamp, not just the CI green light.