rfc-app/deploy/RUNBOOK.md
Ben Stull ee6e3491e7 Drop "prototype/carryover" framing now that v1 is shipped
SPEC, DEV docs, and code comments still talked about the codebase as
a rewrite-in-progress against an external prototype. With v1 shipped
the framing reads oddly — it implies code is provisional when it's
the production thing. Recast §18 as "the technical stack," strip
"carryover from the prototype" comments across backend (api.py,
chat.py, providers.py) and frontend (DiffView, PromptBar,
SelectionTooltip, modelStyles), and rework SPEC §1 / §18 to introduce
OHM up front rather than as a follow-on to a prototype reference.

Also:
- RUNBOOK: bump Python prereq to 3.11+ to match the production VM
  (was 3.13).
- Remove IMPLEMENTATION-PROMPT.md — the original implementation brief
  is no longer load-bearing.
- Add deploy/DEPLOY-NEW-SESSION-PROMPT.md as the durable
  deploy-handoff prompt for new sessions.
2026-05-25 10:32:46 -07:00


Runbook

Single-host deployment of the RFC app at rfc.wiggleverse.org, sharing infrastructure with git.wiggleverse.org (same Gitea instance, same nginx, same Let's Encrypt). The shape matches §4.2: one process, one SQLite file, no separate worker.

Bring-up order: host prep → Gitea side (bot, OAuth, meta repo) → app side (code, venv, build, .env) → web server side (nginx, certbot) → systemd → smoke test.

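Most steps are idempotent. The few that aren't (useradd, git clone, ln -s) fail harmlessly on a re-run because what they create already exists, so re-running this runbook to recover from a partial install is safe.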


0. Prerequisites

  • Ubuntu/Debian-style host with nginx and certbot already serving git.wiggleverse.org over HTTPS.
  • DNS: an A record for rfc.wiggleverse.org pointing at the same IP as git.wiggleverse.org.
  • Python 3.11+ available system-wide (the project has no requires-python pin; the current production VM runs 3.11 on Debian bookworm). On Debian, python3 -m venv also needs the python3-venv package. Node 20+ is needed only on the machine where npm run build runs (your laptop or the host, per 1.3.2). Production runs the build output, so Node isn't needed at runtime.
  • git, openssl, and rsync on the host.
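You can confirm these prerequisites on the host in one pass with a small preflight function. This is a sketch, not something the repo ships. The names version_ge and preflight are hypothetical, and version comparison relies on GNU sort -V (standard on Debian). The tool list also includes sqlite3, on the assumption that the backup cron in 2.2 runs on this host. Node is left out because it may exist only on your laptop.

```shell
# Hypothetical preflight helpers (not part of the rfc-app repo).

# version_ge A B: succeed if version A >= version B, using GNU sort -V ordering.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# preflight: report missing tools and a too-old python3; non-zero exit on any problem.
preflight() {
  local problems=0 tool py
  # sqlite3 is included because the 2.2 backup cron calls it (assumed to run here).
  for tool in git openssl rsync sqlite3 python3; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      problems=1
    fi
  done
  if command -v python3 >/dev/null 2>&1; then
    py=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')
    if ! version_ge "$py" 3.11; then
      echo "python3 is $py, need 3.11+"
      problems=1
    fi
  fi
  return "$problems"
}
```

Source it and run preflight. No output and a zero exit status means the host side is ready.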

1. First-time bring-up

1.1 Host prep

Create the system user and the install directory.

sudo useradd --system --shell /usr/sbin/nologin --home-dir /opt/rfc-app rfc-app
sudo mkdir -p /opt/rfc-app
sudo chown rfc-app:rfc-app /opt/rfc-app

Clone the repo. HTTPS, since we don't push from the server.

sudo -u rfc-app git clone https://git.wiggleverse.org/ben.stull/rfc-app.git /opt/rfc-app

1.2 Gitea side

1.2.1 Create the bot service account. In the Gitea web UI, signed in as a Gitea admin:

  • Site Administration → User Accounts → Create User Account
  • Username: rfc-bot (any name works, but the org invite in 1.2.2 and the GITEA_BOT_USER value in 1.3.3 assume rfc-bot)
  • Email: anything sensible (e.g. rfc-bot@wiggleverse.org)
  • Password: random and long; you need it only once, to sign in as the bot and generate the token below
  • Send email confirmation: off

Then sign in as the bot, open Settings → Applications → Generate New Token, name it rfc-app, grant scopes:

  • write:repository
  • write:user
  • write:admin (needed because the bot creates per-RFC repos on graduation and deletes branches per §12 hygiene)

Copy the token. It goes into .env as GITEA_BOT_TOKEN.

1.2.2 Create the org and add the bot. The meta repo lives inside an org. In Gitea: Create Organization → wiggleverse. Then Members → Invite → rfc-bot → Owner.

1.2.3 Register the OAuth2 application. Site Administration → Integrations → OAuth2 Applications → Create Application:

  • Name: RFC App
  • Redirect URI: https://rfc.wiggleverse.org/auth/callback

Copy the client ID and client secret. They go into .env.

1.3 App side

1.3.1 Python venv + deps.

sudo -u rfc-app python3 -m venv /opt/rfc-app/backend/.venv
sudo -u rfc-app /opt/rfc-app/backend/.venv/bin/pip install \
    -r /opt/rfc-app/backend/requirements.txt

1.3.2 Build the frontend. Build locally and copy dist/ across:

# On your laptop:
cd frontend && npm install && npm run build
rsync -a dist/ ben.stull@<host>:/tmp/rfc-app-dist/
# On the host:
sudo -u rfc-app mkdir -p /opt/rfc-app/frontend/dist
sudo cp -r /tmp/rfc-app-dist/. /opt/rfc-app/frontend/dist/
sudo chown -R rfc-app:rfc-app /opt/rfc-app/frontend/dist

Or build on the host directly if Node is installed there:

cd /opt/rfc-app/frontend && sudo -u rfc-app npm install
sudo -u rfc-app npm run build

1.3.3 Write .env.

sudo -u rfc-app cp -n /opt/rfc-app/backend/.env.example /opt/rfc-app/backend/.env   # -n: never overwrite an existing .env
sudoedit /opt/rfc-app/backend/.env    # set every value

Required values for production (see .env.example for the comments on each field):

GITEA_URL=https://git.wiggleverse.org
GITEA_BOT_USER=rfc-bot
GITEA_BOT_TOKEN=<from 1.2.1>
GITEA_ORG=wiggleverse
META_REPO=meta

OAUTH_CLIENT_ID=<from 1.2.3>
OAUTH_CLIENT_SECRET=<from 1.2.3>

APP_URL=https://rfc.wiggleverse.org
SECRET_KEY=<openssl rand -hex 32>
OWNER_GITEA_LOGIN=ben.stull
GITEA_WEBHOOK_SECRET=<openssl rand -hex 32>

DATABASE_PATH=/opt/rfc-app/backend/data/rfc-app.db
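Before moving on, check that no required key was skipped or left as a <placeholder>. The helper below is a minimal sketch. It assumes the key list above is complete for production, and it does not check the optional SMTP block. check_env and REQUIRED_KEYS are hypothetical names, not part of the app.

```shell
# Hypothetical helper (not shipped with the app): report required keys that are
# missing, empty, or still an unreplaced <placeholder> in an env file.
# Key list copied from the required block above.
REQUIRED_KEYS="GITEA_URL GITEA_BOT_USER GITEA_BOT_TOKEN GITEA_ORG META_REPO
OAUTH_CLIENT_ID OAUTH_CLIENT_SECRET APP_URL SECRET_KEY OWNER_GITEA_LOGIN
GITEA_WEBHOOK_SECRET DATABASE_PATH"

check_env() {
  local file=$1 key missing=0
  for key in $REQUIRED_KEYS; do
    # Require KEY= at line start followed by a value that isn't blank or "<...".
    if ! grep -Eq "^${key}=[^<[:space:]]" "$file"; then
      echo "missing or placeholder: $key"
      missing=1
    fi
  done
  return "$missing"
}
```

Run it as root against /opt/rfc-app/backend/.env. It prints one line per problem key and exits non-zero if there are any.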

For the §15.4 email loop, either leave SMTP_HOST unset (stdout fallback — fine for the very first deploy while a provider is being chosen) or fill in the SMTP block:

SMTP_HOST=smtp.postmarkapp.com
SMTP_PORT=587
SMTP_USER=<provider-supplied>
SMTP_PASSWORD=<provider-supplied>
SMTP_STARTTLS=1
EMAIL_FROM=notifications@wiggleverse.org
EMAIL_FROM_NAME=Wiggleverse

Configure SPF and DKIM records for wiggleverse.org with the chosen provider before sending real traffic. The single non-spoofing envelope identity per §15.9 is what every outbound email uses; spoofing the actor's address would land everything in spam.

If a bounce/complaint webhook from the provider is wired up later, set WEBHOOK_EMAIL_BOUNCE_SECRET to a long random string (openssl rand -hex 32 works) and configure the provider's webhook to send it in the X-Webhook-Secret header.

Lock the file down — it carries secrets:

sudo chmod 600 /opt/rfc-app/backend/.env
sudo chown rfc-app:rfc-app /opt/rfc-app/backend/.env
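To confirm the lockdown took effect, you can use a check like the one below. check_env_perms is a hypothetical name, and the function relies on GNU stat's -c format, as shipped on Debian.

```shell
# Hypothetical check: succeed only if FILE has mode 600 and is owned by OWNER
# (default rfc-app); otherwise print what was found and fail.
check_env_perms() {
  local file=$1 owner=${2:-rfc-app} got
  got=$(stat -c '%a %U' -- "$file") || return 1
  if [ "$got" != "600 $owner" ]; then
    echo "$file is '$got', want '600 $owner'"
    return 1
  fi
}
```

For example, check_env_perms /opt/rfc-app/backend/.env prints nothing when the mode and owner are right.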

1.3.4 Seed the meta repo. This creates wiggleverse/meta on Gitea, populates the hand-authored files, and registers the webhook against APP_URL/api/webhooks/gitea.

sudo -u rfc-app -H bash -c \
  'cd /opt/rfc-app/backend && .venv/bin/python ../scripts/seed_meta_repo.py'

Re-running is safe; every step is upsert-shaped.

1.4 Web server side

1.4.1 nginx vhost.

sudo cp /opt/rfc-app/deploy/nginx/rfc.wiggleverse.org.conf \
    /etc/nginx/sites-available/rfc.wiggleverse.org
sudo ln -s /etc/nginx/sites-available/rfc.wiggleverse.org \
    /etc/nginx/sites-enabled/
sudo nginx -t && sudo systemctl reload nginx

Make the nginx user able to read /opt/rfc-app/frontend/dist:

sudo usermod -a -G rfc-app www-data
sudo chmod -R g+rX /opt/rfc-app/frontend/dist
sudo systemctl reload nginx
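To confirm nothing under dist/ was missed, a hypothetical group_unreadable helper can list files that lack group read and directories that lack group read+execute. It checks mode bits only, not whether the group is actually rfc-app, so treat it as a sketch of the 403 check rather than a full access test.

```shell
# Hypothetical check: print every path under DIR that group members can't read.
# Files need g+r (octal 040); directories need g+r and g+x (octal 050).
# No output means the mode bits are right. Group ownership is not checked.
group_unreadable() {
  find "$1" \( -type f ! -perm -040 \) -o \( -type d ! -perm -050 \)
}
```

If group_unreadable /opt/rfc-app/frontend/dist prints nothing, the chmod above covered everything.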

1.4.2 Let's Encrypt cert.

sudo certbot --nginx -d rfc.wiggleverse.org

1.5 systemd

sudo cp /opt/rfc-app/deploy/systemd/rfc-app.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now rfc-app
sudo systemctl status rfc-app

Watch the logs:

sudo journalctl -u rfc-app -f

Expected startup line:

RFC app started — meta repo wiggleverse/meta
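For a scripted check instead of reading the journal by eye, a hypothetical saw_startup filter can look for that line. It matches the two halves of the line with .* between them, so it doesn't depend on the exact separator character or the locale.

```shell
# Hypothetical filter: succeed if log text on stdin contains the startup line
# for META (default wiggleverse/meta).
saw_startup() {
  local repo=${1:-wiggleverse/meta}
  grep -q "RFC app started.*meta repo ${repo}"
}
```

Usage: sudo journalctl -u rfc-app --since "10 min ago" --no-pager | saw_startup && echo started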

1.6 Smoke test

In a browser at https://rfc.wiggleverse.org:

  1. The landing page renders (§14.1 — title, pitch, three-item deck, sign-in affordance).
  2. Click Sign in with Gitea → OAuth round-trip → catalog lands.
  3. Click + Propose New RFC, fill in title/slug/pitch, submit. The pending-ideas disclosure shows the new PR.
  4. As the owner, click the PR row, then Merge proposal. The catalog refreshes with the super-draft.
  5. Open /admin and confirm the four-tab home base loads (Users / Graduation queue / Audit log / Permission events).
  6. Open /settings/notifications and confirm the five sub-sections render (per-category email, digest cadence, quiet hours, watches, mute list).

If anything misfires, the troubleshooting section below covers the common failure modes.


2. Day-2 operations

2.1 Logs

sudo journalctl -u rfc-app -f                 # follow
sudo journalctl -u rfc-app --since "1 hour ago"
sudo journalctl -u rfc-app -p err             # errors only

The app logs at INFO level by default. Notable log lines to watch:

  • RFC app started — meta repo ... — startup completed.
  • reconciler: starting sweep / reconciler: sweep complete — the five-minute §4.1 safety-net pass.
  • digest tick failed / hygiene tick failed — a scheduler tick crashed; the next tick will retry but the underlying error wants a look. Stack trace lands next to the warning.
  • email (stdout fallback): to=... appears when SMTP_HOST is unset; the email loop is logging envelopes instead of sending them.
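The notable lines above can be pulled out of a larger journal window with a single grep. This is a sketch: notable_lines is a hypothetical name, and the patterns are copied from the list above, so they will drift if the log wording changes.

```shell
# Hypothetical filter: keep only the notable lines listed above from log text on stdin.
notable_lines() {
  grep -E 'RFC app started|reconciler: (starting sweep|sweep complete)|(digest|hygiene) tick failed|email \(stdout fallback\)'
}
```

Usage: sudo journalctl -u rfc-app --since "1 hour ago" --no-pager | notable_lines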

2.2 Database backup

The SQLite file at DATABASE_PATH carries every app-canonical row (users, threads, messages, watches, notifications, the audit log). The §4 cache rebuilds from Gitea, so losing the cached_* tables is recoverable. Losing the app-canonical tables is not, which is why the backup is load-bearing.

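Daily snapshot, as a crontab entry for rfc-app (edit with sudo crontab -u rfc-app -e; the sqlite3 CLI must be installed, e.g. sudo apt install sqlite3):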

0 3 * * * sqlite3 /opt/rfc-app/backend/data/rfc-app.db ".backup /opt/rfc-app/backend/data/backup-$(date +\%F).db"

Retention is your call; 30 daily snapshots is the easy default.
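The cron line only ever adds snapshots. A hypothetical prune helper that keeps the newest N (default 30) could run right after it. It assumes snapshots use the backup-YYYY-MM-DD.db naming from the cron line, which sorts chronologically by name, and it never touches rfc-app.db itself.

```shell
# Hypothetical retention helper: keep the newest KEEP (default 30) snapshots named
# backup-YYYY-MM-DD.db in DIR and delete the rest. Other files are left alone.
prune_backups() {
  local dir=$1 keep=${2:-30}
  local files=( "$dir"/backup-[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9].db )
  [ -e "${files[0]}" ] || return 0      # glob matched nothing: no snapshots yet
  local excess=$(( ${#files[@]} - keep ))
  (( excess > 0 )) || return 0
  # Globs expand in sorted order and ISO dates sort chronologically,
  # so the first $excess entries are the oldest snapshots.
  rm -f -- "${files[@]:0:excess}"
}
```

It could run as a second rfc-app cron entry shortly after the 03:00 snapshot, calling prune_backups /opt/rfc-app/backend/data.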

Restore a snapshot:

sudo systemctl stop rfc-app
sudo -u rfc-app cp /opt/rfc-app/backend/data/backup-YYYY-MM-DD.db \
    /opt/rfc-app/backend/data/rfc-app.db
sudo systemctl start rfc-app

The reconciler will refill the cache from Gitea on first sweep.

2.3 Secret rotation

Rotating SECRET_KEY invalidates every active session cookie. To rotate:

NEW=$(openssl rand -hex 32)
echo "$NEW"                           # copy this value
sudoedit /opt/rfc-app/backend/.env    # paste it as SECRET_KEY=<value>
sudo systemctl restart rfc-app

Every signed-in user is bounced to the landing page and re-authenticates through OAuth. Existing email-unsubscribe URLs become invalid (per §15.4 they're signed against SECRET_KEY); a user can still unsubscribe through /settings/notifications.

GITEA_BOT_TOKEN rotates without service disruption. Generate a new token for the bot as in 1.2.1, write it to .env, and restart. Then revoke the old token in Gitea, because it stays valid until it's revoked there.

GITEA_WEBHOOK_SECRET rotates in two steps. First, update the value in .env and restart. Then update the secret in the meta repo's webhook config in Gitea to match. Deliveries that arrive between the two steps are refused; the reconciler's five-minute sweep covers the gap.
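Each rotation above comes down to rewriting one KEY=value line in .env. Instead of pasting by hand in sudoedit, a hypothetical set_env_var helper can do it in place. It writes a temp file next to .env, copies the original's owner and mode onto it (GNU chown/chmod --reference), and then moves it over the original, so the app never sees a half-written file.

```shell
# Hypothetical helper: set KEY=VALUE in an env file, replacing an existing KEY=
# line or appending one if absent. Note: awk -v interprets backslash escapes in
# VALUE, which is fine for hex secrets but not for arbitrary strings.
set_env_var() {
  local file=$1 key=$2 value=$3 tmp
  tmp=$(mktemp "${file}.XXXXXX") || return 1
  if awk -v k="$key" -v v="$value" '
       index($0, k "=") == 1 { print k "=" v; found = 1; next }
       { print }
       END { if (!found) print k "=" v }
     ' "$file" > "$tmp" \
     && chown --reference="$file" "$tmp" \
     && chmod --reference="$file" "$tmp" \
     && mv -- "$tmp" "$file"; then
    return 0
  fi
  rm -f -- "$tmp"
  return 1
}
```

For example: sudo bash -c '. ./set_env_var.sh && set_env_var /opt/rfc-app/backend/.env SECRET_KEY "$(openssl rand -hex 32)"', followed by the restart. The file name set_env_var.sh is illustrative. With this approach, the new secret never appears on the terminal.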

2.4 The §12 hygiene timer cadence

The hygiene scheduler runs every HYGIENE_TICK_SECONDS (default 3600). Each tick checks cached_branches for two boundaries:

  • 30 days idle (no commits, no PR) — the branch flips to state='closed'. The branch stays in Gitea, but new chat is disabled per §8.4.
  • 90 days closed (or 90 days post-merge for a merged-PR branch) — the bot deletes the branch from Gitea. The cached_branches.state flips to deleted, and the audit log records the action with actor_user_id=NULL and on_behalf_of=<bot login> per §15.9.

Pinned branches (cached_branches.pinned=1) skip both passes. Per-user branch_chat_seen cursors survive branch deletion — chat history is app-canonical, not cached, and persists indefinitely.
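The two boundaries and the pin exemption reduce to a small decision table. The function below only illustrates that logic; it is not the scheduler's code. Several details are assumptions: the state names open and merged, the has-PR flag, and treating each boundary as inclusive (>= 30, >= 90).

```shell
# Hypothetical model of the §12 boundaries; not the app's scheduler code.
# Args: PINNED (0|1), STATE, HAS_PR (0|1), DAYS.
# DAYS = idle days for an open branch, days since close/merge otherwise.
# Prints keep, close, or delete.
hygiene_action() {
  local pinned=$1 state=$2 has_pr=$3 days=$4
  if [ "$pinned" -eq 1 ]; then
    echo keep                                   # pinned branches skip both passes
    return
  fi
  case $state in
    open)
      if [ "$has_pr" -eq 0 ] && [ "$days" -ge 30 ]; then echo close; else echo keep; fi ;;
    closed|merged)
      if [ "$days" -ge 90 ]; then echo delete; else echo keep; fi ;;
    *)
      echo keep ;;                              # e.g. already deleted
  esac
}
```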

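If a branch needs to be kept alive past 30 days without commits, pin it from the admin surface. Alternatively, pin it directly in the database with sqlite3, substituting the slug and branch name for the placeholders: UPDATE cached_branches SET pinned = 1 WHERE rfc_slug = ? AND branch_name = ?. A pinned branch also skips the 90-day deletion.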

2.5 Updating after a push

sudo -u rfc-app git -C /opt/rfc-app pull
sudo -u rfc-app /opt/rfc-app/backend/.venv/bin/pip install \
    -r /opt/rfc-app/backend/requirements.txt
# Rebuild the frontend locally and rsync dist/ as in 1.3.2.
sudo systemctl restart rfc-app

The §5 schema migrations run on startup and are append-only. A restart is the entire deploy.
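The same steps can be wrapped so a deploy is one command that you can preview first. This sketch uses hypothetical names (run, update_rfc_app). It adds --ff-only to git pull so a diverged checkout stops instead of merging. With DRY_RUN=1, it prints each command instead of running it. The frontend rebuild stays manual.

```shell
# Hypothetical deploy wrapper; not part of the repo.
# run CMD...: execute CMD, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# update_rfc_app: pull, refresh Python deps, restart; stops at the first failure.
# The frontend dist/ is not handled here; rebuild and copy it as in 1.3.2.
update_rfc_app() {
  local root=/opt/rfc-app
  run sudo -u rfc-app git -C "$root" pull --ff-only || return
  run sudo -u rfc-app "$root/backend/.venv/bin/pip" install \
      -r "$root/backend/requirements.txt" || return
  run sudo systemctl restart rfc-app
}
```

Run DRY_RUN=1 update_rfc_app to preview the commands, then update_rfc_app to apply them.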


3. Rollback

If a deploy goes sideways, the rollback shape is:

sudo -u rfc-app git -C /opt/rfc-app log --oneline -10
sudo -u rfc-app git -C /opt/rfc-app checkout <prior-commit>
sudo -u rfc-app /opt/rfc-app/backend/.venv/bin/pip install \
    -r /opt/rfc-app/backend/requirements.txt
# Rebuild + rsync the frontend dist from the prior commit's state.
sudo systemctl restart rfc-app

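The schema migrations are append-only, so rolling code back without rolling the schema back is the safe default. If a migration added a column the new code requires, the old code doesn't reference it and SQLite reads the rows fine. This holds only if the old code names its columns explicitly: a positional INSERT ... VALUES against a table that has since grown will fail.

Checking out a commit leaves the repo on a detached HEAD, and git pull refuses to run there. Check the branch back out (git checkout <branch>) before the next update per 2.5.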

If the database itself got into a bad state (a botched manual UPDATE, say), restore from the most recent backup per §2.2.


4. Troubleshooting

  • systemctl status rfc-app shows RuntimeError: Required environment variable ... is not set. The .env is missing a value, or EnvironmentFile= in the systemd unit isn't finding it. Confirm /opt/rfc-app/backend/.env exists and is mode 0600 owned by rfc-app.
  • OAuth callback returns "Invalid state". The redirect URI in Gitea must match APP_URL/auth/callback exactly. Confirm it's https://rfc.wiggleverse.org/auth/callback.
  • The catalog stays empty after a merge. Check the webhook: journalctl -u rfc-app | grep webhook. Gitea's Settings → Webhooks → Recent Deliveries on the meta repo shows the delivery status; the reconciler will catch up within 5 minutes anyway.
  • 502 Bad Gateway on /api/* or /auth/*. uvicorn isn't running or isn't bound to 127.0.0.1:8000. Check systemctl status rfc-app, then the journal per 2.1.
  • 403 from nginx on static assets. The nginx user can't read /opt/rfc-app/frontend/dist. Apply the usermod and chmod from 1.4.1, then reload nginx.
  • OAuth works, but the user can't propose. The users row was created with role contributor; only the account whose login matches OWNER_GITEA_LOGIN is made owner on first sign-in. Confirm .env has the right value and that you signed in with that account.
  • Email isn't going out and there are no errors in the logs. Most likely SMTP_HOST is unset. In that case the stdout fallback is in play, and envelopes appear in the journal as email (stdout fallback): .... Set the SMTP block per §1.3.3 to enable real sends.
  • The §12 hygiene sweep isn't deleting an obviously stale branch. Confirm cached_branches.pinned = 0 for the row, and that last_commit_at (or the joined merged_at) actually predates the 90-day cutoff. The actions audit log carries every hygiene gesture with action_kind IN ('close_idle_branch', 'delete_stale_branch', 'delete_post_merge_branch').