ADR-0001 — GCP-native architecture topology
- Status: draft
- Date: 2026-05-18
- Owner: tech-architect
- Supersedes: none
- Related canon: D-008 (stack), D-009 (residency)
Context
D-008 ratified GCP as the backend platform. D-009 requires all data to reside in europe-central2 (Warsaw) under a strict GDPR posture. The product profile (D-007) is hardcore and fully social, with real-time co-op and chat and restrictive anti-cheat. This ADR locks the component shape, region pinning, and three scaling bands so that backend-engineer (phase 8b) and ops planning have a deterministic target.
Decision — component topology
    Mobile (iOS Swift / Android Kotlin)
        |
        |  HTTPS + WebSocket (mTLS optional)
        v
    Cloud Load Balancer (global anycast, region pinning policy = EU)
        |
        +----> Cloud Armor (DDoS + WAF + bot rules)
        |
        v
    Cloud Run service: walkrpg-api (NestJS 11)                       europe-central2
        |  \
        |   \--> Cloud Run service: walkrpg-realtime (WS gateway)    europe-central2
        |        (separate service so REST scales independently of WS connections)
        v
    Cloud SQL for Postgres 16 (db-custom, HA optional)               europe-central2 (Warsaw)
        |
        v
    Memorystore Redis (streak cache, presence, rate limit)           europe-central2

    Cloud KMS (session JWT signing key)                              europe-central2
    Secret Manager (Firebase Admin creds, DB password)               europe-central2
    Cloud Storage (asset bucket, GDPR export bucket)                 europe-central2 (CMEK with KMS key)
    Cloud Logging + Cloud Monitoring                                 log sink configured to EU bucket
    Cloud Scheduler (cron triggers)                                  europe-central2
    Cloud Tasks (deferred reconciliation, GDPR delete)               europe-central2

    Firebase (federation broker, see ADR-0005)                       pass-through, no PII at rest in US
    Firebase App Check (Play Integrity / DeviceCheck)                pass-through verification

All compute and storage that touches user PII or game state lives in europe-central2 (Warsaw). The single exception is the Firebase Auth federation broker (US-based, pass-through, holds only opaque sub claims), covered in ADR-0005.
Service breakdown
| Service | Purpose | Why separate |
|---|---|---|
| `walkrpg-api` | REST + Swagger + auth callbacks + step ingest + tree state + GDPR export | Stateless, autoscales on CPU. Should not be tied to long-lived WS connections. |
| `walkrpg-realtime` | WebSocket gateway: guild chat, co-op session presence, regional event broadcasts | Long-lived TCP. Different scaling rules (per-connection, not per-request). Targets Cloud Run min-instances=1 for low ping. |
| `walkrpg-jobs` (lightweight) | Cron + Cloud Tasks consumer: reconciliation passes, streak decay sweep, GDPR delete pipeline, anonymization | Isolated so ad-hoc long jobs don’t block API workers. |
Phase 8b ships walkrpg-api only. walkrpg-realtime and walkrpg-jobs land in phase 11 (closed beta vertical slice).
Region pinning policy
- Cloud SQL: `europe-central2` (Warsaw). Multi-zone HA is disabled for closed beta and enabled at 1k DAU.
- Cloud Run services: `europe-central2`, min-instances tuned per band.
- Redis (Memorystore): `europe-central2`, single-region. Replication enabled at 10k DAU band.
- KMS, Secret Manager, Storage, Logging: all pinned to `europe-central2` / `eu` bucket location.
- Load Balancer: global anycast IP, but the routing policy restricts forwarding to EU backends, so traffic that enters at a non-EU edge node is still served by the Warsaw service. The resulting 100-300 ms RTT for non-EU hardcore players is accepted (acknowledged in D-009 R3 synthesis tension #2).
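
As a rough illustration of how the pinning policy above could be checked against an exported resource inventory, the TypeScript sketch below flags anything outside the permitted locations. The `Resource` shape, the `kind` labels, the exemption list, and the sample inventory are all hypothetical and not part of any GCP API. It assumes `europe-central2` is the region ID for Warsaw, with zone suffixes `-a`, `-b`, `-c`.

```typescript
// Hypothetical inventory record; field names are illustrative only.
type Resource = { name: string; kind: string; location: string };

// Locations this ADR permits: the Warsaw region, its zones, and the EU multi-region.
const ALLOWED_LOCATION = /^(europe-central2(-[a-c])?|eu)$/i;

// Kinds the ADR explicitly exempts: the global load balancer and the
// Firebase Auth federation broker (see ADR-0005).
const EXEMPT_KINDS = new Set<string>(["load-balancer", "firebase-auth"]);

// Return every resource that is neither exempt nor in a permitted location.
function pinningViolations(resources: Resource[]): Resource[] {
  return resources.filter(
    (r) => !EXEMPT_KINDS.has(r.kind) && !ALLOWED_LOCATION.test(r.location)
  );
}

// Made-up sample inventory for demonstration.
const inventory: Resource[] = [
  { name: "walkrpg-api", kind: "cloud-run", location: "europe-central2" },
  { name: "walkrpg-db", kind: "cloud-sql", location: "europe-central2-a" },
  { name: "logs", kind: "log-bucket", location: "eu" },
  { name: "edge", kind: "load-balancer", location: "global" },
  { name: "stray-bucket", kind: "storage", location: "us-central1" },
];

console.log(pinningViolations(inventory).map((r) => r.name)); // ["stray-bucket"]
```

A check of this kind could run in CI against whatever inventory export the ops tooling produces; the exemption list is deliberately explicit so that any new global resource has to be justified.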
Three-band cost estimate
Estimates are monthly USD, rounded, list price (no committed-use discount). All include Cloud SQL backups and 7-day log retention. Excludes mobile app store fees.
Amendment (2026-05-18) — Band Zero added per ADR-0006. Band Zero is the test-phase only band (~20 self-testers on CEO laptop + Cloudflare Tunnel). It is NOT a production target — none of the GCP components described elsewhere in this ADR are provisioned at Band Zero. Bands A / B / C below remain the production-target scaling horizon and unfreeze when the production migration is greenlit.
Band Zero — test phase (~20 self-testers, local + Cloudflare Tunnel)
| Component | Spec | Cost |
|---|---|---|
| Compute | Local Docker Compose on CEO laptop (NestJS + Postgres) | $0 |
| Database | Local Postgres 16 container, no managed backup | $0 |
| External reach | Cloudflare Tunnel free tier | $0 |
| Auth | Mock JWT signed by NestJS-local key (no Firebase) | $0 |
| Attestation | Disabled (no App Check) | $0 |
| Logging / monitoring | stdout + local file rotation | $0 |
| Band Zero total | | $0/month + CEO time + electricity |
Test scope: register, ingest steps, complete Quest 001, allocate 4 points, unlock keystone, read profile + tree state. Multiplayer / chat / co-op / reconciliation / GDPR pipeline / Layer 3 anti-cheat — all out of scope at Band Zero. See ADR-0006 for the full delta and migration plan to Band A.
Band A — launch (100 DAU)
| Component | Spec | Cost |
|---|---|---|
| Cloud SQL Postgres | db-custom-1-3840 (1 vCPU / 3.75 GiB), 20 GB SSD, daily backup | $55 |
| Cloud Run (walkrpg-api) | min-instances=0, ~5M requests/mo, 1 vCPU / 512 MiB | $15 |
| Memorystore Redis | BASIC 1 GB | $40 |
| Cloud KMS | 1 key, ~50k operations | $1 |
| Secret Manager | 3 secrets, low-frequency access | $1 |
| Cloud Storage | 5 GB assets + GDPR exports | $1 |
| Cloud Logging | 5 GB retention | $3 |
| Cloud Armor | per-policy + per-request | $7 |
| Cloud Load Balancer | global, low traffic | $20 |
| Band A total | | ~$143/month |
Band B — validation (1k DAU)
| Component | Spec | Cost |
|---|---|---|
| Cloud SQL Postgres | db-custom-2-7680 HA enabled, 50 GB SSD | $230 |
| Cloud Run (walkrpg-api) | min-instances=1, ~50M req/mo | $80 |
| Cloud Run (walkrpg-realtime) | min-instances=1, ~1k concurrent WS | $40 |
| Memorystore Redis | STANDARD_HA 2 GB | $120 |
| Cloud KMS | ~500k operations | $3 |
| Secret Manager | unchanged | $1 |
| Cloud Storage | 50 GB | $3 |
| Cloud Logging + Monitoring | 30 GB retention | $15 |
| Cloud Armor | scaled rules | $20 |
| Cloud Load Balancer | scaled | $50 |
| Firebase App Check | included free up to 10M/mo | $0 |
| Band B total | | ~$562/month |
Band C — early growth (10k DAU)
| Component | Spec | Cost |
|---|---|---|
| Cloud SQL Postgres | db-custom-4-15360 HA + read replica, 200 GB SSD | $850 |
| Cloud Run (walkrpg-api) | min-instances=3, ~500M req/mo | $400 |
| Cloud Run (walkrpg-realtime) | min-instances=3, ~10k concurrent WS | $300 |
| Cloud Run (walkrpg-jobs) | min-instances=1 | $30 |
| Memorystore Redis | STANDARD_HA 5 GB | $260 |
| Cloud KMS | ~5M operations | $25 |
| Secret Manager | unchanged | $1 |
| Cloud Storage | 500 GB | $20 |
| Cloud Logging + Monitoring | 100 GB retention | $50 |
| Cloud Armor | hardened rules | $70 |
| Cloud Load Balancer | scaled | $120 |
| Cloud CDN (asset bucket) | 1 TB egress | $80 |
| Anti-cheat ops (Play Integrity / DeviceCheck verification quota) | per D-007 $10k/year band | $830 |
| Band C total | | ~$3,036/month |
Band C aligns with the D-007 anti-cheat ops note ($10k/year, roughly $830/month) and gives headroom for early closed beta scale. Crossing 10k DAU is the trigger to evaluate regional Cloud Spanner as a replacement for the db-custom-4-15360 tier; Spanner stays out of scope until post-launch.
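
The band totals can be re-derived from the line items in the three tables. The TypeScript sketch below copies the monthly USD figures from Bands A, B, and C and sums them; the data layout and the `bandTotal` helper are illustrative, and the rounding of the anti-cheat budget to the nearest $10 is an assumption about how the $830 figure was obtained.

```typescript
// [component, monthly USD] pairs copied from the Band A / B / C tables.
type LineItem = [string, number];

const bands: Record<string, LineItem[]> = {
  A: [
    ["Cloud SQL Postgres", 55], ["Cloud Run (walkrpg-api)", 15],
    ["Memorystore Redis", 40], ["Cloud KMS", 1], ["Secret Manager", 1],
    ["Cloud Storage", 1], ["Cloud Logging", 3], ["Cloud Armor", 7],
    ["Cloud Load Balancer", 20],
  ],
  B: [
    ["Cloud SQL Postgres", 230], ["Cloud Run (walkrpg-api)", 80],
    ["Cloud Run (walkrpg-realtime)", 40], ["Memorystore Redis", 120],
    ["Cloud KMS", 3], ["Secret Manager", 1], ["Cloud Storage", 3],
    ["Cloud Logging + Monitoring", 15], ["Cloud Armor", 20],
    ["Cloud Load Balancer", 50], ["Firebase App Check", 0],
  ],
  C: [
    ["Cloud SQL Postgres", 850], ["Cloud Run (walkrpg-api)", 400],
    ["Cloud Run (walkrpg-realtime)", 300], ["Cloud Run (walkrpg-jobs)", 30],
    ["Memorystore Redis", 260], ["Cloud KMS", 25], ["Secret Manager", 1],
    ["Cloud Storage", 20], ["Cloud Logging + Monitoring", 50],
    ["Cloud Armor", 70], ["Cloud Load Balancer", 120], ["Cloud CDN", 80],
    ["Anti-cheat ops", 830],
  ],
};

// Sum the monthly cost column of one band.
function bandTotal(items: LineItem[]): number {
  return items.reduce((sum, item) => sum + item[1], 0);
}

// $10k/year spread over 12 months, rounded to the nearest $10 (assumed rounding).
const antiCheatMonthly = Math.round(10000 / 12 / 10) * 10; // 830

for (const name of Object.keys(bands)) {
  console.log(`Band ${name}: ~$${bandTotal(bands[name])}/month`);
}
// Band A: ~$143/month
// Band B: ~$562/month
// Band C: ~$3036/month
```

Keeping the line items in one structure like this makes it easy to re-check the totals whenever a row in the tables is revised.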
Scaling triggers + autoscaling defaults
- Cloud Run autoscaling: target CPU 60%, max-instances 50 (Band A) / 200 (B) / 1000 (C).
- Cloud SQL: monitor `database_cpu_utilization` >70% sustained → upgrade tier.
- Redis: monitor `memory_usage` >75% → upgrade tier.
- Postgres connection pool: PgBouncer sidecar deployed at Band B, target pool size 100.
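
The defaults and triggers above can be captured as a small configuration plus a threshold check. The TypeScript sketch below is one possible encoding, not a monitoring integration: the type and function names are hypothetical, metric values are assumed to be ratios between 0 and 1, and "sustained" is interpreted (as an assumption) to mean that each of the last few samples exceeds the limit.

```typescript
type ScalingBand = "A" | "B" | "C";

// Defaults stated in this ADR.
const CLOUD_RUN_CPU_TARGET = 0.6;
const CLOUD_RUN_MAX_INSTANCES: Record<ScalingBand, number> = { A: 50, B: 200, C: 1000 };
const SQL_CPU_LIMIT = 0.7;
const REDIS_MEMORY_LIMIT = 0.75;

// Cloud Run autoscaling settings for a given band.
function cloudRunDefaults(band: ScalingBand): { cpuTarget: number; maxInstances: number } {
  return { cpuTarget: CLOUD_RUN_CPU_TARGET, maxInstances: CLOUD_RUN_MAX_INSTANCES[band] };
}

// Assumption: "sustained" = every one of the last `window` samples is over the limit.
function sustainedAbove(samples: number[], limit: number, window: number): boolean {
  if (window <= 0 || samples.length < window) return false;
  return samples.slice(-window).every((s) => s > limit);
}

interface TierMetrics {
  sqlCpu: number[];    // recent database_cpu_utilization samples, oldest first
  redisMemory: number; // latest memory_usage ratio
}

// List the tier upgrades the ADR's triggers would call for.
function upgradeActions(m: TierMetrics, window: number = 3): string[] {
  const actions: string[] = [];
  if (sustainedAbove(m.sqlCpu, SQL_CPU_LIMIT, window)) actions.push("upgrade Cloud SQL tier");
  if (m.redisMemory > REDIS_MEMORY_LIMIT) actions.push("upgrade Redis tier");
  return actions;
}

console.log(cloudRunDefaults("B")); // { cpuTarget: 0.6, maxInstances: 200 }
console.log(upgradeActions({ sqlCpu: [0.65, 0.72, 0.74, 0.81], redisMemory: 0.5 }));
// ["upgrade Cloud SQL tier"]
```

The window length of three samples is arbitrary here; the ADR does not define how long "sustained" is, so that value would need to be agreed with ops planning.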
Backup + DR
- Cloud SQL daily snapshots, 7d retention at Band A / 30d at Band B+.
- Point-in-time recovery (PITR) enabled at all bands.
- KMS key import/export disabled. DR implication: KMS keys are versioned but not exportable, so a regional outage of europe-central2 means downtime. This is acceptable for the closed-beta SLA. Multi-region KMS is to be evaluated at Band C+.
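
The per-band backup settings above can be expressed as a small lookup. The TypeScript sketch below is a minimal illustration with hypothetical names (`BackupPolicy`, `backupPolicy`, `snapshotStillRetained`); it only encodes the retention windows and PITR flag stated in this section and does not call any Cloud SQL API.

```typescript
type BackupBand = "A" | "B" | "C";

interface BackupPolicy {
  snapshotRetentionDays: number; // daily Cloud SQL snapshots kept this long
  pitrEnabled: boolean;          // point-in-time recovery
}

// 7 days at Band A, 30 days at Band B and above; PITR on at all bands.
function backupPolicy(band: BackupBand): BackupPolicy {
  return {
    snapshotRetentionDays: band === "A" ? 7 : 30,
    pitrEnabled: true,
  };
}

// True if a daily snapshot of the given age should still exist under the policy.
function snapshotStillRetained(band: BackupBand, ageDays: number): boolean {
  return ageDays >= 0 && ageDays <= backupPolicy(band).snapshotRetentionDays;
}

console.log(backupPolicy("A"));              // { snapshotRetentionDays: 7, pitrEnabled: true }
console.log(snapshotStillRetained("A", 10)); // false
console.log(snapshotStillRetained("B", 10)); // true
```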
Open follow-ups
- WebSocket service: Cloud Run vs GKE Autopilot benchmark needed at Band B before live co-op rolls out. Tracked in phase 11 brief.
- CDN: enabled only at Band C. Closed beta serves assets directly from `walkrpg-api`.
Consequences
- Cost discipline at launch holds the bill under $200/month.
- All compute pinned to Warsaw means non-EU players ship traffic through transit ISPs to EU. Latency tension acknowledged.
- Two-Cloud-Run-service split (api + realtime) is a Path B-friendly shape: scaling REST workers does not churn long-lived WS connections.