Phase 14 — VPS provisioning runbook

Paired with ADR-0009 — VPS migration and ADR-0010 — branch strategy + envs. Sequential CLI commands; copy-paste in order. CEO runs sections A through H. Orchestrator pairs on section I (CI/CD) and J (mobile reconfig). Sections K (day-2 ops), L (branch migration), and M (hotfix workflow) are reference material added by ADR-0010.

Time estimate: ~90 minutes CEO time end-to-end if domain is already registered. Add ~30 minutes if registering domain first.

Required before starting:

Hetzner Cloud account (or signup → KYC pass)
Cloudflare account
Domain registered or ready to register at Cloudflare Registrar
Local ~/.ssh/id_ed25519 keypair (generate with ssh-keygen -t ed25519 -C "your@email" if missing)
GitLab access to gitlab.com:positive-walkers/walkrpg

Placeholders used throughout:

<VPS_IP> — Hetzner-assigned IPv4 (from §A step 4)
<root> — CEO’s domain root (e.g., morris.example)
<deploy-pubkey> — output of cat ~/.ssh/id_ed25519.pub on CEO laptop

A — Hetzner provisioning (~10 min)

A1. Sign in to Hetzner Cloud

Open https://console.hetzner.cloud/. If no account: sign up and complete KYC (Hetzner requires ID verification for new accounts; this can take up to 24h on weekends, so plan accordingly).

A2. Create project (or reuse existing)

Hetzner organizes resources by project. Create a walkrpg-prod project if none exists. Open it.

A3. Add SSH key to project

Console → Security → SSH Keys → “Add SSH Key”. Paste contents of cat ~/.ssh/id_ed25519.pub from CEO laptop. Name it ceo-laptop-ed25519.

A4. Create server

Console → Servers → “Add Server”:

Field	Value
Location	Nuremberg (NBG1) or Falkenstein (FSN1); pick whichever has lower latency from CEO location (both DE/EU, GDPR-compliant)
Image	Ubuntu 24.04
Type	CX22 (Shared vCPU x2 Intel x86, 4GB RAM, 40GB NVMe, 20TB egress), €5.18/mo + VAT
Networking	IPv4 + IPv6 (default; keep both)
SSH keys	Check the `ceo-laptop-ed25519` key from A3
Volumes	None
Firewalls	None (we use ufw on the host instead — Hetzner-side firewall optional, skip for now)
Backups	OFF (we use `pg_dump` + 7-day local rotation per ADR-0009 §9; Hetzner backup feature is +20% surcharge, defer to ops follow-up)
Placement groups	None
Labels	`phase=14`, `env=prod`
Cloud config	Leave blank
Name	`walkrpg-api-1`

Click “Create & Buy now”.

A5. Note the IPv4

Once provisioning completes (~30s), copy the assigned IPv4 from the server detail page. Record as <VPS_IP>.

A6. First connectivity test

From CEO laptop:

ssh root@<VPS_IP>

Should connect without password prompt. Type exit to disconnect.

If it prompts for password or fails: check SSH key was attached at creation (Hetzner does not let you add SSH keys to an already-created server without console-level recovery). Easiest fix: destroy + recreate.

B — SSH hardening + base user (~10 min)

B1. SSH back in as root

ssh root@<VPS_IP>

B2. Create `deploy` user

adduser --disabled-password --gecos "" deploy
usermod -aG sudo deploy

B3. Allow `deploy` passwordless sudo (provisioning only — tighten later if desired)

echo "deploy ALL=(ALL) NOPASSWD:ALL" > /etc/sudoers.d/deploy
chmod 440 /etc/sudoers.d/deploy

B4. Copy authorized_keys to `deploy`

mkdir -p /home/deploy/.ssh
cp /root/.ssh/authorized_keys /home/deploy/.ssh/authorized_keys
chown -R deploy:deploy /home/deploy/.ssh
chmod 700 /home/deploy/.ssh
chmod 600 /home/deploy/.ssh/authorized_keys

B5. Test `deploy` login (from CEO laptop, separate terminal)

ssh deploy@<VPS_IP>

Should connect. Do NOT close the root session yet — keep it open in case the next step breaks SSH.

B6. Harden sshd config (still as root)

sed -i 's/^#*PasswordAuthentication.*/PasswordAuthentication no/' /etc/ssh/sshd_config
sed -i 's/^#*PermitRootLogin.*/PermitRootLogin no/' /etc/ssh/sshd_config
sed -i 's/^#*PubkeyAuthentication.*/PubkeyAuthentication yes/' /etc/ssh/sshd_config
sed -i 's/^#*ChallengeResponseAuthentication.*/ChallengeResponseAuthentication no/' /etc/ssh/sshd_config
sed -i 's/^#*X11Forwarding.*/X11Forwarding no/' /etc/ssh/sshd_config

Append client-alive directives:

cat >> /etc/ssh/sshd_config <<EOF

# WalkRPG Phase 14 hardening
ClientAliveInterval 300
ClientAliveCountMax 2
EOF
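
Before the restart in B7, it may be worth confirming that the sed edits actually landed. The sketch below is one way to do that; first_directive and expect_directive are hypothetical helpers, not part of OpenSSH. They read a single file only. sshd keeps the first value it sees for each keyword, and Ubuntu's stock sshd_config includes /etc/ssh/sshd_config.d/*.conf near the top, so a drop-in file there can still take precedence over what this check reports.

```shell
# Hypothetical helper: print the first active (uncommented) value of an
# sshd_config keyword in the given file. Keywords are case-insensitive.
first_directive() {
  awk -v k="$2" 'tolower($1) == tolower(k) { print $2; exit }' "$1"
}

# Hypothetical helper: succeed only if the keyword's first value matches.
# $1 = config file, $2 = keyword, $3 = expected value.
expect_directive() {
  ed_actual=$(first_directive "$1" "$2")
  if [ "$ed_actual" = "$3" ]; then
    echo "ok   $2 $3"
  else
    echo "FAIL $2: expected '$3', found '${ed_actual:-<unset>}'"
    return 1
  fi
}
```

As root on the VPS, something like expect_directive /etc/ssh/sshd_config PasswordAuthentication no (and the same for PermitRootLogin) would print ok or FAIL for each setting.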

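B7. Validate the config and restart sshd

On Ubuntu 24.04 the systemd unit is named ssh; the sshd alias may not exist, in which case systemctl restart sshd fails with "Unit sshd.service not found". Running sshd -t first checks the config for syntax errors, so a typo cannot take the daemon down.

sshd -t && systemctl restart ssh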

B8. Verify `deploy` still works (from CEO laptop, NEW terminal — keep the root session open)

ssh deploy@<VPS_IP>

If this works: root login is now disabled, password auth is disabled, key-only auth works. Close the root session.

If this fails: do NOT close the root session. Diagnose and fix from there.

B9. Install ufw + fail2ban (as `deploy` via sudo)

sudo apt update
sudo apt install -y ufw fail2ban

B10. Configure ufw

sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow 22/tcp
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw --force enable
sudo ufw status verbose

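Expected output: Status: active, with allow rules for 22/tcp, 80/tcp, and 443/tcp (each normally listed a second time with a (v6) suffix, since IPv6 is enabled).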
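
The expected output above can also be checked mechanically. This is a minimal sketch, assuming the usual ufw status layout where each rule line starts with the port and protocol followed by ALLOW; missing_ufw_ports is a hypothetical helper name.

```shell
# Hypothetical helper: read `ufw status` output on stdin and print any of
# the given ports that have no tcp ALLOW rule. Empty output means all of
# the requested rules are present.
missing_ufw_ports() {
  ufw_status=$(cat)
  ufw_missing=""
  for ufw_port in "$@"; do
    if ! printf '%s\n' "$ufw_status" | grep -Eq "^${ufw_port}/tcp[[:space:]]+ALLOW"; then
      ufw_missing="$ufw_missing $ufw_port"
    fi
  done
  printf '%s\n' "${ufw_missing# }"
}
```

On the VPS this would be run as sudo ufw status | missing_ufw_ports 22 80 443; any port it prints still needs an allow rule.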

B11. Confirm fail2ban is running

sudo systemctl status fail2ban

Should show active (running). Default sshd jail is enabled out of the box on Ubuntu 24.04.

C — Docker + Docker Compose install (~5 min)

C1. Install Docker via the official convenience script (as `deploy`)

curl -fsSL https://get.docker.com -o /tmp/get-docker.sh
sudo sh /tmp/get-docker.sh
rm /tmp/get-docker.sh

C2. Add `deploy` to the `docker` group

sudo usermod -aG docker deploy

C3. Log out and back in so the group change takes effect

exit
ssh deploy@<VPS_IP>

C4. Verify Docker + Docker Compose

docker --version
docker compose version

Both should print versions. Docker Compose v2 ships as a Docker CLI plugin, which the convenience script installs alongside the engine; docker compose (no hyphen) is the canonical invocation.

C5. Smoke test

docker run --rm hello-world

Should print “Hello from Docker!”. If not: investigate before proceeding.

D — Cloudflare DNS + Registrar setup (CEO does on Cloudflare side, ~10 min)

This section is web UI, not CLI. Follow each step in the Cloudflare dashboard.

The VPS hosts the WalkRPG backend (prod + staging per ADR-0010), the personal portfolio site (per ADR-0009 §3.1 portfolio coexistence amendment), AND the WalkRPG wiki (per ADR-0009 §3.5 wiki coexistence amendment). Six hostnames share the single Hetzner CX22 + nginx + Let’s Encrypt SAN cert:

Hostname	Routes to	Purpose
`morrisassert.dev`	`web:3000`	Next.js portfolio (apex)
`www.morrisassert.dev`	`web:3000`	Next.js portfolio (www alias)
`api.walkrpg.morrisassert.dev`	`api:3000`	NestJS backend, prod env (main branch)
`api-staging.walkrpg.morrisassert.dev`	`api-staging:3000`	NestJS backend, staging env (dev branch), ADR-0010 §4
`walkrpg.morrisassert.dev`	nginx 503	Reserved for future public WalkRPG frontend
`wiki.morrisassert.dev`	static (volume)	Astro Starlight wiki, gated by Cloudflare Access (§D2)

D1. Confirm domain at Cloudflare Registrar

If the domain is not yet registered or not yet at Cloudflare Registrar:

Cloudflare Dashboard → Registrar → Register a domain (or transfer-in an existing one).
Choose a .com or similar; Cloudflare Registrar is at-cost (~$10-12/year for .com).

For the canonical configuration the registrar’s domain is morrisassert.dev (.dev is a Google TLD, ~$13/year, mandatory-HTTPS via the Chrome HSTS preload list — strictly upgrades security posture).

If the domain is already in your Cloudflare account, skip ahead.

D2. Open the zone

Cloudflare Dashboard → choose the domain (morrisassert.dev).

D3. Add the six A records

DNS tab → Add record. Repeat for each row below:

Type	Name	IPv4 address	Proxy status	TTL
A	`@` (or `morrisassert.dev`)	`<VPS_IP>`	Proxied (orange cloud)	Auto
A	`www`	`<VPS_IP>`	Proxied (orange cloud)	Auto
A	`api.walkrpg`	`<VPS_IP>`	Proxied (orange cloud)	Auto
A	`api-staging.walkrpg`	`<VPS_IP>`	Proxied (orange cloud)	Auto
A	`walkrpg`	`<VPS_IP>`	Proxied (orange cloud)	Auto
A	`wiki`	`<VPS_IP>`	Proxied (orange cloud)	Auto

Save each record. All six point at the same VPS IPv4; nginx differentiates by Host header / server_name. The Cloudflare orange cloud is mandatory for the wiki record because Cloudflare Access (configured in the Cloudflare Access section below, steps D2.1 to D2.7) operates at the edge: bypassing the proxy bypasses the auth gate.

D4. Set SSL/TLS mode to Full (strict)

SSL/TLS tab → Overview → Encryption mode → Select Full (strict).

This is critical. Not Flexible, not Full — Full (strict). ADR-0009 §5 explains why.

D5. Always Use HTTPS

SSL/TLS tab → Edge Certificates → Always Use HTTPS → ON.

D6. HSTS (optional, recommended)

SSL/TLS tab → Edge Certificates → HTTP Strict Transport Security (HSTS) → Enable. Max age: 6 months. Include subdomains: OFF (only the six hostnames from D3 are set up for HTTPS; forcing HSTS onto every other subdomain of the zone is not safe). Preload: OFF.

D7. Universal SSL certificate

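SSL/TLS tab → Edge Certificates → Universal SSL → ensure it’s enabled (default). This handles the browser-facing cert; Cloudflare provisions it automatically, usually within ~15 minutes. Note that a Universal SSL certificate covers only the apex and first-level subdomains (morrisassert.dev and *.morrisassert.dev). The two-level hostnames api.walkrpg.morrisassert.dev and api-staging.walkrpg.morrisassert.dev are not covered while proxied; they need an Advanced Certificate Manager certificate (a paid add-on) for the edge to present a valid cert.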

D8. Verify DNS propagation

From CEO laptop (or any external machine):

dig morrisassert.dev +short
dig www.morrisassert.dev +short
dig api.walkrpg.morrisassert.dev +short
dig api-staging.walkrpg.morrisassert.dev +short
dig walkrpg.morrisassert.dev +short
dig wiki.morrisassert.dev +short

All six should return Cloudflare IPs (104.x.x.x or 172.x.x.x range), not your <VPS_IP>. That confirms the proxy is active for every hostname. Allow 1-2 minutes after creating the A records.
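
The six lookups can be wrapped in a loop that flags any record still resolving straight to the origin. This is a sketch only: classify_answer and check_dns are hypothetical helpers, and "proxied" here just means "an address other than the origin", which is presumed to be a Cloudflare edge IP.

```shell
# The six hostnames from the D3 table.
HOSTS="morrisassert.dev www.morrisassert.dev api.walkrpg.morrisassert.dev api-staging.walkrpg.morrisassert.dev walkrpg.morrisassert.dev wiki.morrisassert.dev"

# Hypothetical helper: classify one resolved address against the origin IP.
# $1 = address returned by dig (may be empty), $2 = origin IPv4 (<VPS_IP>).
classify_answer() {
  if [ -z "$1" ]; then
    echo "unresolved"
  elif [ "$1" = "$2" ]; then
    echo "dns-only"   # grey cloud: the origin IP is exposed
  else
    echo "proxied"    # some other address, presumably a Cloudflare edge IP
  fi
}

# Hypothetical wrapper: resolve every hostname and print its classification.
# $1 = origin IPv4 (<VPS_IP>).
check_dns() {
  for dns_host in $HOSTS; do
    dns_ip=$(dig +short "$dns_host" A | head -n 1)
    printf '%-45s %s\n' "$dns_host" "$(classify_answer "$dns_ip" "$1")"
  done
}
```

Running check_dns <VPS_IP> from the laptop should list all six hostnames as proxied; a dns-only line points at a record whose orange cloud is off.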

D2 — Cloudflare Access setup for wiki + Swagger (~10 min, web UI)

The wiki hostname (wiki.morrisassert.dev) is gated by Cloudflare Access (Zero Trust free tier, ≤50 users). The Swagger UI on the API hostname (path /api/docs/*) is gated by the same mechanism when re-enabled. Both Applications run on the free tier and require no additional Cloudflare paid plan.

CF Access intercepts every request at edge, redirects unauthenticated users to its identity-broker flow, then forwards authenticated requests to the origin with a Cf-Access-Jwt-Assertion header. nginx trusts this header implicitly today; defense-in-depth JWT validation against the per-application JWKS endpoint is a follow-up (see ADR-0009 §3.5 wiki coexistence amendment for the follow-up tracking).
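
While debugging, it can be useful to look at the claims inside a Cf-Access-Jwt-Assertion value. The sketch below decodes the payload segment of a JWT using standard base64url handling; jwt_payload is a hypothetical helper. It performs no signature verification at all, so it is an inspection aid and not the JWKS validation described as a follow-up.

```shell
# Hypothetical helper: decode the payload (second dot-separated segment) of
# a JWT passed as $1. This does NOT verify the signature.
jwt_payload() {
  # base64url uses "-" and "_" where standard base64 uses "+" and "/".
  jwt_seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # base64url drops padding; restore it so base64 -d accepts the input.
  case $(( ${#jwt_seg} % 4 )) in
    2) jwt_seg="$jwt_seg==" ;;
    3) jwt_seg="$jwt_seg=" ;;
  esac
  printf '%s' "$jwt_seg" | base64 -d
}
```

Passing a captured header value to jwt_payload prints the JSON claims, which should include the audience and the authenticated email.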

D2.1 Enable Zero Trust on the Cloudflare account

Cloudflare Dashboard → sidebar → Zero Trust.

If the team has never been onboarded: Cloudflare prompts you to pick a team subdomain (e.g., walkrpg.cloudflareaccess.com) and a plan. Select the Free plan: it covers up to 50 users and all the features Phase 14 needs, and it costs $0 (Cloudflare may still ask for a card during plan selection).

Pick a team name; the team subdomain becomes <team>.cloudflareaccess.com (this is the identity broker hostname).

D2.2 Create the wiki Access Application

Zero Trust → Access → Applications → Add an application → Self-hosted.

Field	Value
Application name	`walkrpg-wiki`
Session duration	`24 hours`
Application domain	Subdomain `wiki`, Domain `morrisassert.dev` (the form splits these into two selectors)
Application path	(leave blank — gates the whole host)
App launcher visibility	OFF (no need for the CF App Launcher UI for a single-tenant tool)

Click Next.

D2.3 Choose identity providers

Identity providers screen → at minimum select One-time PIN (CF emails a 6-digit code to the address on the allowlist). Optional fast-follows:

Google (free, requires OAuth client setup at console.cloud.google.com — defer to follow-up)
GitHub (free, requires OAuth app at github.com/settings/developers — defer to follow-up)

One-time PIN alone is fine for the closed cohort.

Click Next.

D2.4 Create the access policy

Policies screen → Add a policy.

Field	Value
Policy name	`morris-only`
Action	Allow
Session duration	(inherit application — 24 hours)
Configure rules → Include → Selector	Emails
Configure rules → Include → Value	`<CEO_EMAIL>` (the email address used for cohort access)

Save policy.

Future tester onboarding: edit this policy’s Emails selector to include additional addresses, comma-separated. No code changes required.

Click Next, then Add application.

D2.5 Create the Swagger Access Application (path-gated)

Repeat D2.2-D2.4 with these differences:

Field	Value
Application name	`walkrpg-swagger`
Application domain	Subdomain `api.walkrpg`, Domain `morrisassert.dev`
Application path	`/api/docs/*` (path-gated — ONLY this prefix requires auth; the rest of the API stays open per the mock-auth JWT model from ADR-0006)
Identity providers	Same — One-time PIN minimum
Policy	Same `morris-only` shape, same Email allowlist

This Application only activates when SWAGGER_ENABLED=true is set on the API (currently false in prod per ADR-0009 §13.1). Configuring the Access policy now means the gate is live the moment Swagger is re-enabled; no scramble at the time of re-enable.

D2.6 Verify the wiki gate

From CEO laptop (in a fresh browser session, NOT logged in to CF):

https://wiki.morrisassert.dev/

Expected: CF Access splash page asks for the email. Enter the CEO email → CF emails the 6-digit PIN → enter the PIN → wiki loads.

If the CF Access splash does NOT appear and the wiki loads directly: the Application is either misconfigured or the orange cloud is off for the wiki A record. Re-check D3 + D2.2.

If the wiki returns 502 or 404 after auth: the wiki-builder container has not populated the wiki-static volume yet. Check docker compose ps on the VPS — walkrpg-wiki-builder should be running with the latest image SHA.

D2.7 Adding a new tester later

CF dashboard → Zero Trust → Access → Applications → click walkrpg-wiki → Policies tab → click morris-only → Configure rules → Include → Emails → add the new address → Save.

No origin restart, no DNS change, no cert reissuance. Takes effect within ~30s.

E — Repo deploy keys + secrets (~10 min)

E1. Generate deploy key on VPS (as `deploy`)

ssh-keygen -t ed25519 -C "walkrpg-vps-deploy" -f ~/.ssh/walkrpg_deploy -N ""
cat ~/.ssh/walkrpg_deploy.pub

Copy the public key output.

E2. Add deploy key to GitLab project

In browser: GitLab → positive-walkers/walkrpg → Settings → Repository → Deploy Keys → Add new key.

Field	Value
Title	`walkrpg-vps deploy key`
Key	(paste public key from E1)
Grant write permissions	OFF (read-only sufficient — CI pushes images, VPS only pulls source)

Add key.

E3. Configure git on VPS to use the deploy key

cat >> ~/.ssh/config <<EOF

Host gitlab.com
  HostName gitlab.com
  User git
  IdentityFile ~/.ssh/walkrpg_deploy
  IdentitiesOnly yes
EOF
chmod 600 ~/.ssh/config

E4. Accept GitLab’s host key

ssh -T git@gitlab.com

Type yes when prompted. Expected response: Welcome to GitLab, @deploy-key-name! (or similar). Connection then closes; that’s normal — GitLab does not allow shell sessions.

E5. Clone the repo

cd /home/deploy
git clone git@gitlab.com:positive-walkers/walkrpg.git
cd walkrpg
git status

Should show On branch master (or On branch main once the §L branch migration has run), working tree clean.

E6. Create `.env` file

The .env lives at /home/deploy/walkrpg/backend/.env. It is NOT committed to git.

Generate strong secrets first:

echo "JWT_SECRET=$(openssl rand -hex 32)"
echo "POSTGRES_PASSWORD=$(openssl rand -base64 24 | tr -d '/+=')"

Copy both values. Now create the .env:

mkdir -p /home/deploy/walkrpg/backend
cat > /home/deploy/walkrpg/backend/.env <<EOF
NODE_ENV=production
PORT=3000

# Postgres — service name 'db' resolves on docker network
POSTGRES_USER=walkrpg
POSTGRES_PASSWORD=<paste-from-above>
POSTGRES_DB=walkrpg
DATABASE_URL=postgresql://walkrpg:<paste-from-above>@db:5432/walkrpg?schema=public

# JWT (ADR-0006 mock-auth posture)
JWT_SECRET=<paste-from-above>
JWT_ISSUER=walkrpg-api-prod
JWT_AUDIENCE=walkrpg-mobile
AUTH_MODE=mock

# CORS — adjust if the mobile build needs additional origins
CORS_ALLOWED_ORIGINS=https://api.walkrpg.<root>,https://walkrpg.<root>

# Swagger gating (ADR-0009 §13.1) — OFF in prod
SWAGGER_ENABLED=false
EOF

chmod 600 /home/deploy/walkrpg/backend/.env

Edit the file with nano or vi to paste the actual values into the three <paste-from-above> slots. The Postgres password goes in two of them (POSTGRES_PASSWORD and inside DATABASE_URL) and both must match exactly; the JWT secret goes in the third (JWT_SECRET).
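
A mismatch between the two password slots is easy to miss by eye. The following sketch compares them without printing either secret; env_value and check_env_file are hypothetical helpers, and the parsing assumes the plain KEY=value layout and the DATABASE_URL shape used in the template above.

```shell
# Hypothetical helper: print the value of KEY ($2) from a KEY=value file ($1).
env_value() {
  sed -n "s/^$2=//p" "$1" | head -n 1
}

# Hypothetical helper: succeed only if the password embedded in DATABASE_URL
# equals POSTGRES_PASSWORD and no <paste-from-above> slot is left unfilled.
check_env_file() {
  env_pw=$(env_value "$1" POSTGRES_PASSWORD)
  env_url=$(env_value "$1" DATABASE_URL)
  # Strip "scheme://user:" from the front and "@host..." from the back.
  env_url_pw=${env_url#*://*:}
  env_url_pw=${env_url_pw%%@*}
  if grep -q '<paste-from-above>' "$1"; then
    echo "FAIL unfilled placeholder"
    return 1
  fi
  if [ -z "$env_pw" ] || [ "$env_pw" != "$env_url_pw" ]; then
    echo "FAIL password mismatch"
    return 1
  fi
  echo "ok"
}
```

Running check_env_file /home/deploy/walkrpg/backend/.env should print ok once the file has been edited.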

E7. Verify `.env` mode

ls -l /home/deploy/walkrpg/backend/.env

Expected: -rw------- 1 deploy deploy .... If group/world readable: chmod 600 /home/deploy/walkrpg/backend/.env.

F — Docker compose stack (~10 min)

Prerequisite: the implementation files referenced below must exist in the repo. They ship as a separate paired implementation session AFTER the CEO confirms the domain is registered and ready (see “Implementation handoff” in ADR-0009 §17). The runbook references the canonical filenames so the paired session has zero ambiguity:

File	Purpose
`backend/Dockerfile`	node:22-alpine base, pnpm install, copy `data/` workspace, CMD `tsx src/main.ts` (ADR-0009 §13.2 explains why tsx in prod)
`backend/docker-compose.prod.yml`	api + db + nginx + certbot + web + wiki-builder services per ADR-0009 §3.1
`backend/docker-compose.staging.yml`	api-staging + db-staging services per ADR-0010 §4 — added at ADR-0010 ratification
`backend/nginx/walkrpg.conf`	reverse proxy 443→api:3000, 443→api-staging:3000 (ADR-0010), LE cert paths, HSTS, HTTP→HTTPS redirect
`backend/scripts/backup-postgres.sh`	daily `pg_dump` script invoked by cron per ADR-0009 §9
`backend/.env.prod.example`	env template for the prod stack
`backend/.env.staging.example`	env template for the staging stack — added at ADR-0010 ratification

If those files do not yet exist when you reach this section, STOP and surface to the orchestrator. The orchestrator runs the paired implementation session, commits the files to main, and resumes the runbook from F1 below.

Bring-up order matters. docker-compose.prod.yml references the staging stack’s network as external: true. Bring staging UP FIRST (it owns the network’s lifecycle), THEN prod. Tear down in reverse order. The two .env files (.env.prod, .env.staging) must both be populated on the VPS at /home/deploy/walkrpg/backend/ with mode 600 before bring-up.

F1. Pull the latest source on the VPS

cd /home/deploy/walkrpg
git pull origin master   # use main instead once the §L branch migration has run

F2. First-time bootstrap — bring up the staging network owner, then prod db

The staging compose file owns the walkrpg-net-staging docker network (the prod compose references it as external: true). Bring up the staging db FIRST so the network exists when prod starts:

cd /home/deploy/walkrpg
docker compose -f backend/docker-compose.staging.yml up -d db-staging
docker compose -f backend/docker-compose.prod.yml up -d db

Wait for both DBs to be healthy:

docker compose -f backend/docker-compose.prod.yml ps
docker compose -f backend/docker-compose.staging.yml ps

Status should be running (healthy) for both db (prod) and db-staging. If starting, wait 10s and re-check.
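
The wait-and-recheck step can be scripted as a bounded retry loop. This is a sketch under assumptions: wait_until and is_healthy are hypothetical helpers, and the container name passed to is_healthy has to be read from docker compose ps, since the runbook does not fix it.

```shell
# Hypothetical helper: run a command until it succeeds, for at most N
# attempts, sleeping S seconds after each failed attempt.
# Usage: wait_until N S command [args...]
wait_until() {
  wu_attempts=$1
  wu_sleep=$2
  shift 2
  wu_i=1
  while [ "$wu_i" -le "$wu_attempts" ]; do
    if "$@"; then
      return 0
    fi
    wu_i=$((wu_i + 1))
    sleep "$wu_sleep"
  done
  return 1
}

# Hypothetical check: succeed when the named container reports "healthy".
# Requires the compose service to define a healthcheck.
is_healthy() {
  [ "$(docker inspect --format '{{.State.Health.Status}}' "$1" 2>/dev/null)" = "healthy" ]
}
```

For example, wait_until 12 5 is_healthy <db-container-name> polls for up to about a minute and returns non-zero if the database never becomes healthy.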

F3. Run Prisma migrations against both DBs

docker compose -f backend/docker-compose.prod.yml run --rm api prisma migrate deploy
docker compose -f backend/docker-compose.staging.yml run --rm api-staging prisma migrate deploy

Expected output for each: All migrations have been successfully applied. or No pending migrations to apply. Exit code 0.

If exit code non-zero: check docker compose logs db (or db-staging) for connection issues, verify the matching .env file’s DATABASE_URL matches POSTGRES_PASSWORD exactly, retry.

F4. Start the rest of both stacks — temporarily without TLS

Before the LE cert exists, nginx cannot start in TLS mode. The bootstrap profile in docker-compose.prod.yml (the nginx-bootstrap service, profile bootstrap) serves plaintext :80 for ACME validation and smoke tests across all six hostnames.

# Bring up the staging stack first (full — api-staging + db-staging),
# then the prod stack with the bootstrap nginx profile + api + web +
# wiki-builder.
docker compose -f backend/docker-compose.staging.yml up -d
docker compose -f backend/docker-compose.prod.yml --profile bootstrap up -d \
  nginx-bootstrap api web wiki-builder

nginx-bootstrap joins both walkrpg-net and walkrpg-net-staging so the ACME HTTP-01 challenge resolves for every hostname including api-staging.walkrpg.morrisassert.dev.

F5. Verify api is reachable over plain HTTP

From the VPS:

curl -i http://localhost:3000/

Expected: NestJS root response (likely 404 with JSON error envelope — that’s fine, it means the app is running).

From CEO laptop:

curl -i http://api.walkrpg.<root>/

Expected: response makes it through Cloudflare and back. If Cloudflare strips HTTP (per D5), you may see a redirect to HTTPS — that’s expected and means the next step (LE cert) is needed.

G — Let’s Encrypt cert (~5 min)

The cert is a multi-domain SAN cert covering all six hostnames on this VPS (portfolio apex + www + prod api subdomain + staging api subdomain + reserved walkrpg subdomain + wiki subdomain). LE issues one fullchain that all six nginx server blocks reference. Renewal is shared. Standard LE limit is 100 SAN entries per cert — six is far below.

The cert lands under the first -d argument’s directory. With morrisassert.dev listed first, the live path is /etc/letsencrypt/live/morrisassert.dev/fullchain.pem (and privkey.pem). The walkrpg.conf nginx config matches this path.

G1. Run certbot against the LE staging server (verify the flow)

docker compose -f backend/docker-compose.prod.yml run --rm certbot \
  certonly --webroot --webroot-path=/var/www/certbot \
  --staging \
  --email <ceo-email> \
  --agree-tos \
  --no-eff-email \
  -d morrisassert.dev \
  -d www.morrisassert.dev \
  -d api.walkrpg.morrisassert.dev \
  -d api-staging.walkrpg.morrisassert.dev \
  -d walkrpg.morrisassert.dev \
  -d wiki.morrisassert.dev

Expected: Successfully received certificate. (staging LE-server cert is untrusted but proves the SAN flow works for all six hostnames; note “staging” here means the LE staging acme directory, not the WalkRPG staging env).

If failure: most common cause is the HTTP-01 challenge not reaching the webroot for one of the hostnames. Check:

Cloudflare proxy is on (orange cloud) for every A record.
nginx is serving /.well-known/acme-challenge/ from the webroot volume for every server_name.
ufw allows port 80.

If only one hostname fails: certbot lists exactly which -d flag could not be challenged. Re-check that A record’s proxy + DNS propagation.

G2. Delete the staging cert before requesting production

docker compose -f backend/docker-compose.prod.yml run --rm certbot \
  delete --cert-name morrisassert.dev

G3. Run certbot against the production LE server

docker compose -f backend/docker-compose.prod.yml run --rm certbot \
  certonly --webroot --webroot-path=/var/www/certbot \
  --email <ceo-email> \
  --agree-tos \
  --no-eff-email \
  -d morrisassert.dev \
  -d www.morrisassert.dev \
  -d api.walkrpg.morrisassert.dev \
  -d api-staging.walkrpg.morrisassert.dev \
  -d walkrpg.morrisassert.dev \
  -d wiki.morrisassert.dev

Expected: Successfully received certificate. Cert lands at /etc/letsencrypt/live/morrisassert.dev/fullchain.pem + privkey.pem. The fullchain SAN list contains all six hostnames; openssl x509 -in /etc/letsencrypt/live/morrisassert.dev/fullchain.pem -noout -text | grep DNS confirms.
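
Rather than reading the grep DNS output by eye, the SAN list can be compared against the six expected hostnames. A minimal sketch follows; missing_sans is a hypothetical helper that only parses the text form printed by openssl x509 -noout -text.

```shell
# The six hostnames the SAN cert is expected to cover.
EXPECTED_SANS="morrisassert.dev www.morrisassert.dev api.walkrpg.morrisassert.dev api-staging.walkrpg.morrisassert.dev walkrpg.morrisassert.dev wiki.morrisassert.dev"

# Hypothetical helper: read `openssl x509 -noout -text` output on stdin and
# print every expected hostname that has no exact DNS: entry.
missing_sans() {
  san_names=$(grep -o 'DNS:[^,[:space:]]*' | sed 's/^DNS://')
  for san_host in $EXPECTED_SANS; do
    if ! printf '%s\n' "$san_names" | grep -Fxq "$san_host"; then
      echo "$san_host"
    fi
  done
}
```

Piping the openssl x509 ... -noout -text command from G3 into missing_sans should print nothing; any hostname it does print is absent from the cert.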

G4. Switch nginx to full TLS mode

Restore the nginx config to its TLS-enabled form (depends on the bootstrap pattern used in F4):

docker compose -f backend/docker-compose.prod.yml down nginx
docker compose -f backend/docker-compose.prod.yml up -d nginx

Or, if using the profile pattern:

docker compose -f backend/docker-compose.prod.yml up -d nginx

(without --profile bootstrap).

G5. Verify TLS

From CEO laptop:

curl -I https://api.walkrpg.<root>/

Expected: HTTP/2 404 (or 200 if a root route exists). TLS handshake succeeded. Check headers include strict-transport-security from nginx.

In a browser, open https://api.walkrpg.<root>/. Lock icon should be present. Click the lock → cert details → issued by Let’s Encrypt (or Cloudflare Inc ECC CA-3 — that’s the edge cert; both being valid is the goal). No “Not Secure” warning.

H — Smoke test (~5 min)

H1. End-to-end auth/callback test

From CEO laptop:

curl -i -X POST https://api.walkrpg.<root>/auth/callback \
  -H "Content-Type: application/json" \
  -H "X-Request-Id: $(uuidgen)" \
  -d '{"email":"<test-email>","displayName":"Phase 14 Smoke Tester"}'

Expected: HTTP/2 200 or HTTP/2 201 with response body containing session.accessToken, walker.id, isFirstLogin: true. Per ADR-0006 mock-auth shape.

H2. Use the returned token to fetch profile

TOKEN="<paste accessToken from H1 response>"

curl -i -X GET https://api.walkrpg.<root>/walker/profile \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "X-Request-Id: $(uuidgen)"

Expected: HTTP/2 200 with walker profile body.
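
The copy-paste between H1 and H2 can be avoided by extracting the token in the shell. The sketch below uses a sed pattern instead of a JSON parser, so it is deliberately crude: extract_access_token is a hypothetical helper, and it assumes the response contains a single "accessToken" string with no escaped quotes, which holds for a JWT.

```shell
# Hypothetical helper: pull the "accessToken" string value out of a JSON
# document on stdin. Not a real JSON parser; see the assumptions above.
extract_access_token() {
  # Join lines first so pretty-printed JSON is handled as one record.
  tr -d '\n' | sed -n 's/.*"accessToken"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}
```

With that in place, the H1 request can be run with curl -s in place of curl -i and piped into extract_access_token, for example TOKEN=$(curl -s -X POST ... | extract_access_token). If jq is available on the laptop, it is the more robust choice.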

H3. Inspect logs

On VPS:

docker compose -f /home/deploy/walkrpg/backend/docker-compose.prod.yml logs --tail 50 api

Should show structured JSON log lines including the X-Request-Id values from H1 and H2.

H4. Common failure modes

Symptom	Likely cause	Fix
`curl: (6) Could not resolve host`	DNS not propagated yet	Wait 1-2 minutes, retry. `dig api.walkrpg.<root>` should return Cloudflare IPs.
`HTTP/2 502` or `HTTP/2 504`	api container not running or unreachable from nginx	`docker compose ps`, `docker compose logs api`. Likely `tsx` boot failure — check `.env` syntax.
`HTTP/2 503` after a few minutes	api container restarting in a loop	Same diagnosis path. Check `DATABASE_URL` matches `POSTGRES_PASSWORD`.
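Browser shows `ERR_SSL_VERSION_OR_CIPHER_MISMATCH`	Cloudflare edge cert not issued yet, or it does not cover the hostname (Universal SSL covers only the apex and first-level subdomains)	Cloudflare dashboard → SSL/TLS → Edge Certificates: confirm an active cert lists the hostname. Also confirm SSL/TLS → Overview is Full (strict), not Flexible.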
`Origin SSL certificate is not trusted` (CF error 526)	LE cert not present on origin or cert mismatch	Re-run G3, verify `/etc/letsencrypt/live/morrisassert.dev/fullchain.pem` exists, `docker compose restart nginx`.
Authentication errors after successful 200 on /auth/callback	`JWT_SECRET` mismatch between mint and verify, or env not loaded	Verify api container sees env via `docker compose exec api printenv`.

If H1 and H2 both succeed against the prod hostname: Phase 14 prod is functionally live. Proceed to H5 for the staging smoke test, then section I (CI/CD).

H5. Repeat H1+H2 against staging

curl -i -X POST https://api-staging.walkrpg.morrisassert.dev/auth/callback \
  -H "Content-Type: application/json" \
  -H "X-Request-Id: $(uuidgen)" \
  -d '{"email":"<test-email>","displayName":"Staging Smoke"}'

Expected: same shape as H1 — 200/201 with session.accessToken. The staging endpoint is a fully independent stack — its JWT_SECRET differs from prod, so the token returned here cannot verify against prod and vice versa.

STAGING_TOKEN="<paste accessToken from above>"
curl -i https://api-staging.walkrpg.morrisassert.dev/walker/profile \
  -H "Authorization: Bearer ${STAGING_TOKEN}"

Expected: 200 with walker profile body.

If H5 fails but H1+H2 pass: the staging stack is independently broken — check docker compose -f backend/docker-compose.staging.yml ps + logs, verify .env.staging is populated, verify the walkrpg-net-staging network exists (docker network ls | grep staging), and that the prod nginx joined it (docker network inspect walkrpg-net-staging shows the walkrpg-nginx-1 container).

If H1, H2, and H5 all succeed: both envs are live. Proceed to section I (CI/CD).

I — GitLab CI/CD pipeline (~20 min, orchestrator-paired)

This section is paired with the orchestrator + backend-engineer. The runbook lists the moving parts; the orchestrator authors .gitlab-ci.yml in a separate paired session if not already present.

I1. Confirm `.gitlab-ci.yml` exists at repo root

If not present, surface to orchestrator. The orchestrator authors it per ADR-0009 §7 + ADR-0010 §9.

The pipeline shape is multi-env (per ADR-0010 §9):

lint + test run on every push (any branch) and every MR.
build:image runs on push to dev OR main only. Pushes one image tagged with :<short-sha> AND :<branch-name>.
deploy-staging runs on push to dev only. SSH to VPS, pull image, docker compose -f backend/docker-compose.staging.yml .... Serialized via resource_group: staging.
deploy-prod runs on push to main only. SSH to VPS, pull image, docker compose -f backend/docker-compose.prod.yml .... Serialized via resource_group: production.

The two deploy jobs CAN run in parallel (different resource groups, different envs), but each one serializes against itself so concurrent merges to the same branch don’t race on docker-compose state.
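
The double-tagging rule for build:image can be sketched as a small function. This is illustrative only: image_tags is a hypothetical helper, the image path is the one shown in I5, and the actual job in .gitlab-ci.yml may build its tags differently.

```shell
# Hypothetical helper: print the two image references for one build.
# $1 = image repository, $2 = short commit SHA, $3 = branch name.
image_tags() {
  # Docker tags may not contain "/", so normalize the branch name. This is
  # a no-op for dev and main, the only branches that build images.
  it_branch=$(printf '%s' "$3" | tr '/' '-')
  printf '%s:%s\n' "$1" "$2"
  printf '%s:%s\n' "$1" "$it_branch"
}
```

Inside a GitLab job the arguments would most likely come from predefined variables, for example image_tags "$CI_REGISTRY_IMAGE/backend" "$CI_COMMIT_SHORT_SHA" "$CI_COMMIT_REF_NAME". The SHA tag is immutable and suits rollbacks, while the branch tag always points at the latest build for that environment.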

I2. Add GitLab CI Variables

In browser: GitLab → positive-walkers/walkrpg → Settings → CI/CD → Variables → Add variable:

Key	Type	Value	Flags
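`SSH_DEPLOY_KEY`	File	(paste contents of the VPS `~/.ssh/walkrpg_deploy` PRIVATE key, from `cat /home/deploy/.ssh/walkrpg_deploy` on the VPS; the matching `walkrpg_deploy.pub` must also be appended to `/home/deploy/.ssh/authorized_keys`, otherwise the CI job cannot log in as `deploy`)	Protected: ON, Masked: OFF (GitLab can only mask single-line values, so a multi-line private key cannot be masked; never print the file in a job)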
`SSH_DEPLOY_HOST`	Variable	`<VPS_IP>` or `api.walkrpg.<root>` (IP is more reliable for SSH; CF proxy doesn’t proxy SSH) — recommend `<VPS_IP>`	Protected: ON
`SSH_KNOWN_HOSTS`	Variable	Output of `ssh-keyscan <VPS_IP>` from CEO laptop (paste all lines)	Protected: ON

The GitLab Container Registry credentials ($CI_REGISTRY_USER, $CI_JOB_TOKEN) are automatically injected by GitLab for same-project pushes — no manual variable needed.

I3. Push a no-op commit to `main` to trigger the prod pipeline

From CEO laptop (or any clone). NOTE: post-ADR-0010 the default branch is main, not master. The migration steps live in §L below.

git checkout main
git pull origin main
# Direct push to main is denied by branch protection, so go through an MR.
# For the very first pipeline kick, use a throwaway feature branch and put
# the empty commit on that branch (not on local main):
git checkout -b feature/ci-bootstrap
git commit --allow-empty -m "ci: trigger Phase 14 first deploy"
git push -u origin feature/ci-bootstrap
# Open MR feature/ci-bootstrap -> dev in GitLab UI, merge it. Then open
# MR dev -> main and merge that to trigger deploy-prod.

For ongoing day-2 deploys, follow the workflow in ADR-0010 §5.

I4. Watch the pipeline

GitLab → positive-walkers/walkrpg → CI/CD → Pipelines. The new pipeline should show lint → test → build → deploy-staging (for the dev push) or lint → test → build → deploy-prod (for the main push). Each stage runs sequentially within its branch.

Expected total runtime: 4-8 minutes first run (Docker layer cache cold).

I5. Verify deploy succeeded

After the deploy stage finishes green:

# From CEO laptop — repeat the H1 smoke test
curl -i -X POST https://api.walkrpg.<root>/auth/callback \
  -H "Content-Type: application/json" \
  -d '{"email":"<test-email>","displayName":"Post Deploy"}'

Expected: 200/201 with session.

On VPS, check the image SHA:

docker compose -f /home/deploy/walkrpg/backend/docker-compose.prod.yml images api

Should show registry.gitlab.com/positive-walkers/walkrpg/backend:<short-sha> matching the latest main commit.

I6. Pipeline troubleshooting

Stage fails	Look at	Common fix
`lint`	Job log	Local `pnpm lint` not run; fix locally and re-push
`test`	Job log	Test failures from unrelated changes; investigate
`build-image`	Job log	Dockerfile path wrong, dependencies fail to install, or registry auth misconfigured
`deploy`	Job log + `docker compose logs` on VPS	SSH key mismatch (re-check I2 SSH_DEPLOY_KEY paste), wrong known_hosts, or migration failure

J — Mobile reconfig (CEO + orchestrator pairing, post-J0)

Prerequisite (J0): Phase 14 backend is live + smoke tests pass.

J1. Update `local.properties`

In walkrpg-mobile/android/local.properties:

base.url=https://api.walkrpg.<root>/

Replace whatever local-tunnel hostname was there during Phase 13.

J2. Update network security config

Edit walkrpg-mobile/android/app/src/main/res/xml/network_security_config.xml. Drop the IP-specific debug-overrides block that allowed cleartext to the CEO-laptop tunnel.

Minimal config:

<?xml version="1.0" encoding="utf-8"?>
<network-security-config>
    <base-config cleartextTrafficPermitted="false" />
</network-security-config>

Or, if no overrides remain, delete the file entirely and remove the android:networkSecurityConfig attribute from AndroidManifest.xml.

J3. Rebuild APK

cd walkrpg-mobile/android
./gradlew clean assembleDebug

J4. Install on test device

adb install -r app/build/outputs/apk/debug/app-debug.apk

J5. Smoke test on device

Open app → mock-auth screen → enter email + displayName → submit. Expected: auth succeeds against the public endpoint, walker profile loads.

In the device logcat:

adb logcat | grep -i walkrpg

Should show outgoing requests to https://api.walkrpg.<root>/.... No cleartext warnings.

J6. Commit the mobile changes

In walkrpg-mobile repo:

git add android/local.properties android/app/src/main/res/xml/network_security_config.xml
git commit -m "chore(mobile): point base.url at Phase 14 VPS endpoint"
git push origin master

local.properties is typically gitignored — if so, commit the change to a local.properties.example template instead and document the swap in the mobile README.

K — Day-2 operational reference

K1. Postgres backup cron

Install the cron job (one-time on VPS as deploy):

sudo mkdir -p /var/backups/walkrpg
sudo chown deploy:deploy /var/backups/walkrpg
sudo chmod 700 /var/backups/walkrpg

crontab -e

Add lines:

0 3 * * * docker compose -f /home/deploy/walkrpg/backend/docker-compose.prod.yml exec -T db pg_dump -U walkrpg walkrpg | gzip > /var/backups/walkrpg/walkrpg-$(date +\%Y\%m\%d).sql.gz
0 4 * * * find /var/backups/walkrpg/ -name 'walkrpg-*.sql.gz' -mtime +7 -delete

Verify with:

crontab -l
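One caveat with the nightly line: if pg_dump fails, gzip still writes a small but valid archive, so a file existing is not proof of a good backup. The sketch below is one way to check a dump after the fact; the helper name `backup_is_valid` is hypothetical, and it only verifies that the archive is intact and non-empty, not that the SQL is complete.

```shell
# Succeeds when the file exists, passes gzip's integrity test,
# and contains at least one byte once decompressed.
backup_is_valid() {
  file=$1
  [ -s "$file" ] || return 1
  gzip -t "$file" 2>/dev/null || return 1
  # An empty dump still yields a valid gzip stream, so require some content.
  [ "$(gzip -dc "$file" | head -c 1 | wc -c)" -gt 0 ]
}

# Example use on the VPS:
#   f=/var/backups/walkrpg/walkrpg-$(date +%Y%m%d).sql.gz
#   backup_is_valid "$f" && echo "backup OK: $f" || echo "backup BROKEN: $f"
```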

K2. Manual backup (ad-hoc)

docker compose -f /home/deploy/walkrpg/backend/docker-compose.prod.yml exec -T db \
  pg_dump -U walkrpg walkrpg | gzip > /var/backups/walkrpg/walkrpg-manual-$(date +%Y%m%d-%H%M).sql.gz

K3. Restore from backup

gunzip -c /var/backups/walkrpg/walkrpg-YYYYMMDD.sql.gz | \
  docker compose -f /home/deploy/walkrpg/backend/docker-compose.prod.yml exec -T db \
  psql -U walkrpg walkrpg

Schedule a restore drill within 7 days of the Phase 14 launch; it is tracked as an ops follow-up.
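For a drill, the newest nightly file is usually the one to restore. A minimal sketch of picking it, assuming the K1 naming scheme (walkrpg-YYYYMMDD.sql.gz); the helper name `latest_backup` is illustrative. Manual backups from K2 are skipped on purpose, although they do match the K1 retention glob and are pruned after 7 days like the nightly ones.

```shell
# Prints the newest nightly backup in a directory; fails if there is none.
# Nightly names embed YYYYMMDD, so glob (lexical) order is also date order.
latest_backup() {
  dir=${1:-/var/backups/walkrpg}
  latest=
  for f in "$dir"/walkrpg-[0-9]*.sql.gz; do
    [ -e "$f" ] && latest=$f
  done
  [ -n "$latest" ] && printf '%s\n' "$latest"
}

# Example use on the VPS:
#   gunzip -c "$(latest_backup)" | \
#     docker compose -f /home/deploy/walkrpg/backend/docker-compose.prod.yml exec -T db \
#     psql -U walkrpg walkrpg
```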

K4. Cert renewal verification

certbot runs in the compose stack and polls every 12h. Force-check renewal:

docker compose -f /home/deploy/walkrpg/backend/docker-compose.prod.yml run --rm certbot renew

Cert expiry visible at:

docker compose -f /home/deploy/walkrpg/backend/docker-compose.prod.yml run --rm certbot certificates

LE issues 90-day certs; renewal happens at 30 days remaining.
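The remaining lifetime can be pulled out of the certificates listing for a quick numeric check. This sketch assumes the listing contains lines of the form "Expiry Date: <timestamp> (VALID: 59 days)"; that format and the helper name `cert_days_remaining` are assumptions, so confirm against the real output before relying on it.

```shell
# Reads `certbot certificates` output on stdin and prints the remaining days per cert.
# Assumes lines of the form "Expiry Date: <timestamp> (VALID: 59 days)".
cert_days_remaining() {
  sed -n 's/.*(VALID: \([0-9][0-9]*\) days\{0,1\}).*/\1/p'
}

# Example use on the VPS:
#   docker compose -f /home/deploy/walkrpg/backend/docker-compose.prod.yml \
#     run --rm certbot certificates | cert_days_remaining
```

A value well below 30 suggests the renewal loop is not working and is worth investigating with the force-check command above.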

K5. Log inspection

# All services, last 100 lines, follow mode
docker compose -f /home/deploy/walkrpg/backend/docker-compose.prod.yml logs --tail 100 -f

# api only
docker compose -f /home/deploy/walkrpg/backend/docker-compose.prod.yml logs --tail 100 -f api

# nginx only
docker compose -f /home/deploy/walkrpg/backend/docker-compose.prod.yml logs --tail 100 -f nginx

Filter by X-Request-Id:

docker compose -f /home/deploy/walkrpg/backend/docker-compose.prod.yml logs api | grep "<request-id>"

K6. Rollback to previous deploy

cd /home/deploy/walkrpg
git log --oneline -5 origin/main             # find target SHA (origin/master before the §L rename)
git checkout <prev-sha>
docker compose -f backend/docker-compose.prod.yml pull api
docker compose -f backend/docker-compose.prod.yml run --rm api pnpm prisma migrate deploy
docker compose -f backend/docker-compose.prod.yml up -d api

Note: a rollback that crosses a Prisma migration is risky if the migration is destructive (column drops). Inspect the diff before rolling back across migrations.
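Inspecting the diff can be partly automated by scanning the migrations added since the target SHA. A rough sketch: the pattern list is illustrative and not exhaustive, the helper name `is_destructive_migration` is hypothetical, and the backend/prisma/migrations path in the usage comment is an assumption about the repo layout.

```shell
# Succeeds if a migration SQL file contains statements that discard data.
# The pattern list is illustrative, not exhaustive.
is_destructive_migration() {
  grep -Eiq 'DROP[[:space:]]+(TABLE|COLUMN)|TRUNCATE' "$1"
}

# Example use, run BEFORE checking out <prev-sha> so the newer files still exist:
#   git diff --name-only --diff-filter=A <prev-sha> HEAD -- backend/prisma/migrations |
#     grep 'migration.sql$' | while read -r f; do
#       is_destructive_migration "$f" && echo "destructive: $f"
#     done
```

Any hit means the older code may expect columns or tables that no longer exist, so read that migration by hand before rolling back.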

Return to current after recovery:

git checkout main                            # master if the §L rename has not been run
docker compose -f backend/docker-compose.prod.yml pull api
docker compose -f backend/docker-compose.prod.yml up -d api

K7. Full stack stop / start

# Stop everything (api, db, nginx, certbot)
docker compose -f /home/deploy/walkrpg/backend/docker-compose.prod.yml stop

# Start everything
docker compose -f /home/deploy/walkrpg/backend/docker-compose.prod.yml start

# Down (stop + remove containers, keep volumes)
docker compose -f /home/deploy/walkrpg/backend/docker-compose.prod.yml down

# Down + remove volumes (DESTRUCTIVE — wipes db)
docker compose -f /home/deploy/walkrpg/backend/docker-compose.prod.yml down -v

K8. Update Docker images outside of CI

Manual pull (skips CI deploy stage — use only for hotfixes):

cd /home/deploy/walkrpg
git pull origin main                         # origin master if the §L rename has not been run
docker compose -f backend/docker-compose.prod.yml pull
docker compose -f backend/docker-compose.prod.yml run --rm api pnpm prisma migrate deploy
docker compose -f backend/docker-compose.prod.yml up -d

K9. VPS resource check

# CPU + RAM
htop

# Disk
df -h
du -sh /home/deploy/walkrpg/backend/pgdata/
du -sh /var/backups/walkrpg/

# Network
ss -tunap | grep LISTEN

Hetzner Cloud Console also shows CPU/RAM/disk/network graphs per VPS, free, no opt-in.

K10. Secret rotation

To rotate JWT_SECRET or POSTGRES_PASSWORD:

# Edit .env
nano /home/deploy/walkrpg/backend/.env
# (paste new values, save)

# Postgres password rotation requires DB-side update too
docker compose -f /home/deploy/walkrpg/backend/docker-compose.prod.yml exec db \
  psql -U walkrpg -c "ALTER USER walkrpg WITH PASSWORD '<new-password>';"

# Restart api to pick up new env
docker compose -f /home/deploy/walkrpg/backend/docker-compose.prod.yml up -d api

Note: rotating JWT_SECRET invalidates all live sessions immediately. All testers must re-auth. Schedule outside of active test windows.
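New values have to come from somewhere; one option is to generate them on the VPS. A minimal sketch, assuming 32 random bytes is acceptable for both secrets; the helper name `new_secret` is illustrative. Hex output contains no quotes or shell metacharacters, which keeps the ALTER USER statement and the .env line free of escaping problems.

```shell
# Prints a 64-character lowercase hex string built from 32 random bytes.
new_secret() {
  head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
  echo
}

# Example use:
#   new_secret    # paste the output into .env as the new JWT_SECRET
#   new_secret    # run again for a separate POSTGRES_PASSWORD value
```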

K11. Adding a new tester (currently: no special action)

In mock-auth mode, any caller can POST /auth/callback and self-register. No allowlist exists. Sharing the public hostname api.walkrpg.<root> is the only “invite” step.

When the mock-auth posture flips to Firebase (post-production-migration unfreeze), this changes — tester onboarding becomes a Firebase Auth user invite.

K12. Incident response checklist (skeleton)

If the API is down:

docker compose ps — are containers running?
docker compose logs --tail 200 api — boot failure or runtime crash?
docker compose logs --tail 200 nginx — TLS / upstream errors?
curl http://localhost:3000/ from VPS — does api respond on internal network?
curl https://api.walkrpg.<root>/ from CEO laptop — does Cloudflare reach origin?
Hetzner Cloud Console — is the VPS up? Network OK? Disk full?
Cloudflare dashboard → Analytics — is traffic hitting CF at all?
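The two curl probes in the list above can be wrapped so that each prints a clear PASS or FAIL. This is a sketch under stated assumptions: the helper names `run_check` and `triage_api_down` are hypothetical, any HTTP response counts as "answers", and both probes run from wherever the script is started, whereas the checklist runs the public one from the CEO laptop.

```shell
# Runs a command quietly and prints PASS or FAIL with a label; returns the command's status.
run_check() {
  label=$1
  shift
  if "$@" >/dev/null 2>&1; then
    printf 'PASS  %s\n' "$label"
  else
    printf 'FAIL  %s\n' "$label"
    return 1
  fi
}

# Runs both probes without stopping at the first failure.
# Argument: the domain root used for api.walkrpg.<root>.
triage_api_down() {
  root=$1
  failures=0
  run_check "api answers on localhost:3000" \
    curl -sS --max-time 5 -o /dev/null http://localhost:3000/ || failures=$((failures + 1))
  run_check "public endpoint answers via Cloudflare" \
    curl -sS --max-time 10 -o /dev/null "https://api.walkrpg.$root/" || failures=$((failures + 1))
  [ "$failures" -eq 0 ]
}
```

If the local probe passes and the public one fails, the problem is more likely in nginx, TLS, or Cloudflare than in the api container.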

If a security incident is suspected:

Pull VPS off-network at Hetzner Cloud Console (network detach) — preserves forensic state.
Snapshot the disk (Hetzner Cloud Console → Server → Snapshots) before any cleanup.
Write to ~/.claude/walkrpg/CRITICAL.md for CEO surfacing.
Investigate from a separate, clean VPS or local environment.

L — Branch migration from `master` to `main` + protection rules (one-time, ~10 min)

This section is the operational counterpart to ADR-0010 §10. Run once at ADR-0010 ratification time. Subsequent day-2 workflow follows the patterns in §M and ADR-0010 §5.

L1. Confirm clean working tree

From CEO laptop:

cd /path/to/walkrpg
git status

Must report nothing to commit, working tree clean on master. If not, commit or stash the outstanding work before proceeding.

L2. Rename `master` to `main` in GitLab

In browser: GitLab → positive-walkers/walkrpg → Settings → Repository → Default branch.

Field	Value
Default branch	`main`

If main does not yet exist, GitLab UI offers a one-click rename of the existing default branch. Click “Rename master to main”. The rename is non-destructive — all history is preserved verbatim under the new name.

Confirm in the Branches list that master is gone and main is the new default.

L3. Update local clones

git fetch origin
git branch -m master main          # rename local master to main
git branch -u origin/main main     # re-track to the new remote
git remote set-head origin -a      # update the symbolic-ref HEAD
git pull origin main

Verify:

git branch -vv

Should show * main <sha> [origin/main].

L4. Branch `dev` from `main`

git checkout main
git pull origin main
git checkout -b dev
git push -u origin dev

L5. Apply branch protection rules (GitLab UI)

GitLab → positive-walkers/walkrpg → Settings → Repository → Protected branches.

Add main protection:

Field	Value
Branch	`main`
Allowed to merge	Maintainers
Allowed to push and merge	No one
Allowed to force push	OFF
Allow deletion	OFF
Required approvals (merge requests)	1

Add dev protection:

Field	Value
Branch	`dev`
Allowed to merge	Maintainers
Allowed to push and merge	No one
Allowed to force push	OFF
Allow deletion	OFF
Required approvals (merge requests)	1

feature/* and hotfix/* need no protection. Force-push is fine on those during MR iteration.

L6. Enable auto-delete-source-branch (recommended)

GitLab → positive-walkers/walkrpg → Settings → Merge requests → “Enable ‘Delete source branch’ option by default” → ON.

This keeps the branch list clean — merged feature/hotfix branches disappear automatically.

L7. Enable squash-only (recommended)

GitLab → positive-walkers/walkrpg → Settings → Merge requests → Squash commits when merging → “Require”. This forces squash merges, keeping the protected-branch log clean.

L8. Verify the migration

From CEO laptop:

# Direct push to main must FAIL.
git checkout main
git commit --allow-empty -m "test: should fail"
git push origin main
# Expected: `! [remote rejected] main -> main (pre-receive hook declined)`, preceded by a GitLab message that pushing to protected branches is not allowed

If the push succeeds: branch protection is misconfigured. Re-check L5.

Reset the noise:

git reset --hard HEAD~1

L9. From this point forward

All new work flows through feature/* branches and MRs per ADR-0010 §5. The next deploy of WalkRPG goes through:

git checkout dev && git pull && git checkout -b feature/<name>
Work + push.
MR feature/<name> → dev → orchestrator review → CEO merge → auto-deploy staging.
MR dev → main → orchestrator review → CEO merge → auto-deploy prod.

M — Hotfix workflow (reference)

Use when a bad commit reaches prod and needs a fix faster than the normal feature → dev → main release cadence. The hotfix flow bypasses staging (which would slow the fix) but still requires an MR + orchestrator review pass.

M1. Branch from `main`

git checkout main
git pull origin main
git checkout -b hotfix/<short-name>

M2. Fix + commit + push

# (edit + test locally)
git add <files>
git commit -m "fix(<scope>): <short description>"
git push -u origin hotfix/<short-name>

M3. Open MR `hotfix/<name> → main`

GitLab UI → Merge requests → New merge request → source hotfix/<short-name>, target main.

Title: same as the commit subject. Body: brief paragraph + what was broken + what was fixed + how the fix was tested locally.

M4. Orchestrator review

From CEO laptop:

claude /review <MR-URL>

Orchestrator dispatches relevant leads (tech-architect for backend infra, narrative-designer for content, etc.), reviews the diff inline, posts a summary verdict.

Expected verdict: APPROVE (hotfixes are small + scoped). If NEEDS_CHANGES, iterate on the hotfix branch (force-push OK) and re-run the review.

M5. CEO merges to `main`

Squash merge. main advances; auto-deploys to prod.

M6. Verify the fix in prod

# Smoke test against prod
curl -i -X POST https://api.walkrpg.<root>/auth/callback \
  -H "Content-Type: application/json" \
  -d '{"email":"<tester-email>","displayName":"Hotfix Verify"}'

Or whatever endpoint exercises the fix. Confirm the fix lands as expected.

M7. Sync `dev` to track prod

The dev branch must carry the hotfix forward; otherwise the next release MR (dev → main) would re-introduce the bug. Two patterns:

Pattern A — cherry-pick onto a sync branch (cleaner if dev has diverged from main):

git checkout dev
git pull origin dev
git checkout -b feature/sync-hotfix-<short-name>
git cherry-pick <merged-squash-sha-on-main>
git push -u origin feature/sync-hotfix-<short-name>

Open MR feature/sync-hotfix-<short-name> → dev. Orchestrator review (typically a quick APPROVE because the diff is identical to the already-reviewed hotfix). Merge.

Pattern B — MR the hotfix branch itself to dev (works if dev is close to main):

# The hotfix branch still exists if you haven't deleted it.
# Open a SECOND MR: hotfix/<short-name> → dev.

Pattern A is more robust when dev has open features that haven’t shipped yet — the squash-merge SHA is the cleanest carrier of the fix.

M8. Clean up

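If auto-delete is on (L6), GitLab removes the remote hotfix branch when the MR merges, so only the local branch needs cleaning up. Otherwise delete both:

git push origin --delete hotfix/<short-name>
git branch -d hotfix/<short-name>     # local; use -D if git reports "not fully merged" (expected after a squash merge)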

Appendix — Sequenced overview

Section	Time	Run by	Blocker if it fails
A — Hetzner provisioning	~10 min	CEO	yes
B — SSH hardening	~10 min	CEO	yes
C — Docker install	~5 min	CEO	yes
D — Cloudflare DNS	~10 min	CEO	yes
E — Repo deploy keys + secrets	~10 min	CEO	yes
F — Docker compose stacks (prod + staging)	~15 min	CEO (or paused for orchestrator)	yes — requires compose / Dockerfile / nginx files to exist
G — Let’s Encrypt cert (SAN over 6 hostnames)	~5 min	CEO	yes
H — Smoke test (prod + staging)	~5 min	CEO	yes — must pass to consider Phase 14 live
I — GitLab CI/CD (multi-env)	~20 min	Orchestrator + CEO	no — Phase 14 is live even without CI; this enables push-to-deploy
J — Mobile reconfig	~15 min	Orchestrator + CEO	no — separate concern from backend liveness
K — Day-2 ops	reference	CEO + ops	no — reference material
L — Branch migration `master` → `main` + protection rules (one-time)	~10 min	CEO	once; gates ADR-0010 going live
M — Hotfix workflow (reference)	~15 min when invoked	CEO + orchestrator	per-incident

End of runbook.
