rfc-app/deploy/RUNBOOK.md
Ben Stull ee6e3491e7 Drop "prototype/carryover" framing now that v1 is shipped
SPEC, DEV docs, and code comments still talked about the codebase as
a rewrite-in-progress against an external prototype. With v1 shipped
the framing reads oddly — it implies code is provisional when it's
the production thing. Recast §18 as "the technical stack," strip
"carryover from the prototype" comments across backend (api.py,
chat.py, providers.py) and frontend (DiffView, PromptBar,
SelectionTooltip, modelStyles), and rework SPEC §1 / §18 to introduce
OHM up front rather than as a follow-on to a prototype reference.

Also:
- RUNBOOK: bump Python prereq to 3.11+ to match the production VM
  (was 3.13).
- Remove IMPLEMENTATION-PROMPT.md — the original implementation brief
  is no longer load-bearing.
- Add deploy/DEPLOY-NEW-SESSION-PROMPT.md as the durable
  deploy-handoff prompt for new sessions.
2026-05-25 10:32:46 -07:00


# Runbook
Single-host deployment of the RFC app at `rfc.wiggleverse.org`, sharing
infrastructure with `git.wiggleverse.org` (same Gitea instance, same nginx,
same Let's Encrypt). The shape matches §4.2: one process, one SQLite file,
no separate worker.

Bring-up order: host prep → Gitea side (bot, OAuth, meta repo) → app side
(code, venv, build, .env) → web server side (nginx, certbot) → systemd →
smoke test.

Every step is idempotent or no-op-on-rerun, so re-running this runbook to
recover from a partial install is safe.

---
## 0. Prerequisites
- Ubuntu/Debian-style host with nginx and certbot already serving
`git.wiggleverse.org` over HTTPS.
- DNS: an `A` record for `rfc.wiggleverse.org` pointing at the same IP as
`git.wiggleverse.org`.
- Python 3.11+ available system-wide (the project has no `requires-python`
pin; the current production VM runs 3.11 on Debian bookworm). Node 20+
available (for `npm run build` once; the build output is what runs in
production — Node isn't needed at runtime).
- `git`, `openssl`, and `rsync` on the host.
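
The prerequisites above can be checked mechanically before starting. The following is a minimal sketch, assuming a POSIX shell; `version_ge` and `preflight` are hypothetical helper names, not part of the repo. Calling `preflight` prints one line per unmet prerequisite and nothing when everything is in place.

```sh
# Hypothetical helpers, not part of the rfc-app repo.

# version_ge A B: succeed when dotted version A is >= B.
# Compares up to three numeric components; missing ones count as 0.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        split(a, x, "."); split(b, y, ".")
        for (i = 1; i <= 3; i++) {
            if (x[i] + 0 > y[i] + 0) exit 0
            if (x[i] + 0 < y[i] + 0) exit 1
        }
        exit 0
    }'
}

# preflight: print one line per unmet prerequisite; silent when all are met.
# Note: on Debian, nginx lives in /usr/sbin, which may be missing from a
# non-root PATH, so run this as root or expect a false "missing" there.
preflight() {
    for cmd in git openssl rsync nginx certbot python3; do
        command -v "$cmd" >/dev/null 2>&1 || echo "missing command: $cmd"
    done
    if command -v python3 >/dev/null 2>&1; then
        py=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')
        version_ge "$py" 3.11 || echo "python3 is $py, need 3.11+"
    fi
    # Node is only needed on the machine that runs `npm run build`.
    if command -v node >/dev/null 2>&1; then
        node_v=$(node --version)    # e.g. v20.11.1
        version_ge "${node_v#v}" 20 || echo "node is $node_v, need 20+"
    fi
}
```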
---
## 1. First-time bring-up
### 1.1 Host prep
Create the system user and the install directory.
```sh
sudo useradd --system --shell /usr/sbin/nologin --home-dir /opt/rfc-app rfc-app
sudo mkdir -p /opt/rfc-app
sudo chown rfc-app:rfc-app /opt/rfc-app
```
Clone the repo. HTTPS, since we don't push from the server.
```sh
sudo -u rfc-app git clone https://git.wiggleverse.org/ben.stull/rfc-app.git /opt/rfc-app
```
### 1.2 Gitea side
**1.2.1 Create the bot service account.** In the Gitea web UI, signed in
as a Gitea admin:

- **Site Administration → User Accounts → Create User Account**
  - Username: `rfc-bot` (or whatever you want)
  - Email: anything sensible (e.g. `rfc-bot@wiggleverse.org`)
  - Password: random, you won't use it interactively
  - Send email confirmation: off

Then sign in as the bot, open **Settings → Applications → Generate New
Token**, name it `rfc-app`, grant scopes:

- `write:repository`
- `write:user`
- `write:admin` (needed because the bot creates per-RFC repos on
  graduation and deletes branches per §12 hygiene)

Copy the token. It goes into `.env` as `GITEA_BOT_TOKEN`.

**1.2.2 Create the org and add the bot.** The meta repo lives inside an
org. In Gitea: **Create Organization → wiggleverse**. Then **Members →
Invite → rfc-bot → Owner**.

**1.2.3 Register the OAuth2 application.** **Site Administration →
Integrations → OAuth2 Applications → Create Application**:

- Name: `RFC App`
- Redirect URI: `https://rfc.wiggleverse.org/auth/callback`

Copy the client ID and client secret. They go into `.env`.
### 1.3 App side
**1.3.1 Python venv + deps.**
```sh
sudo -u rfc-app python3 -m venv /opt/rfc-app/backend/.venv
sudo -u rfc-app /opt/rfc-app/backend/.venv/bin/pip install \
-r /opt/rfc-app/backend/requirements.txt
```
**1.3.2 Build the frontend.** Build locally and copy `dist/` across:
```sh
# On your laptop:
cd frontend && npm install && npm run build
rsync -a dist/ ben.stull@<host>:/tmp/rfc-app-dist/
# On the host:
sudo -u rfc-app mkdir -p /opt/rfc-app/frontend/dist
sudo cp -r /tmp/rfc-app-dist/. /opt/rfc-app/frontend/dist/
sudo chown -R rfc-app:rfc-app /opt/rfc-app/frontend/dist
```
Or build on the host directly if Node is installed there:
```sh
cd /opt/rfc-app/frontend && sudo -u rfc-app npm install
sudo -u rfc-app npm run build
```
**1.3.3 Write `.env`.**
```sh
sudo -u rfc-app cp /opt/rfc-app/backend/.env.example /opt/rfc-app/backend/.env
sudoedit /opt/rfc-app/backend/.env # set every value
```
Required values for production (see `.env.example` for the comments on
each field):
```ini
GITEA_URL=https://git.wiggleverse.org
GITEA_BOT_USER=rfc-bot
GITEA_BOT_TOKEN=<from 1.2.1>
GITEA_ORG=wiggleverse
META_REPO=meta
OAUTH_CLIENT_ID=<from 1.2.3>
OAUTH_CLIENT_SECRET=<from 1.2.3>
APP_URL=https://rfc.wiggleverse.org
SECRET_KEY=<openssl rand -hex 32>
OWNER_GITEA_LOGIN=ben.stull
GITEA_WEBHOOK_SECRET=<openssl rand -hex 32>
DATABASE_PATH=/opt/rfc-app/backend/data/rfc-app.db
```
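
Before the first start, it can help to confirm that none of the required keys were left out or empty, since the app refuses to start without them (see Troubleshooting). This is a minimal sketch; `missing_env_keys` is a hypothetical helper, and its key list simply mirrors the block above rather than anything read from `.env.example`.

```sh
# Hypothetical helper: list required keys that are absent or empty in an
# env file. The key list is copied from the runbook, not from the app.
missing_env_keys() {
    env_file=$1
    for key in GITEA_URL GITEA_BOT_USER GITEA_BOT_TOKEN GITEA_ORG META_REPO \
               OAUTH_CLIENT_ID OAUTH_CLIENT_SECRET APP_URL SECRET_KEY \
               OWNER_GITEA_LOGIN GITEA_WEBHOOK_SECRET DATABASE_PATH; do
        # A key counts as set only if a line "KEY=<non-empty>" exists.
        grep -Eq "^${key}=.+" "$env_file" || echo "$key"
    done
}
```

Run it as a user that can read the file, for example `missing_env_keys /opt/rfc-app/backend/.env` from a root shell. Empty output means every key has a value; the check does not validate the values themselves.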
For the §15.4 email loop, either leave `SMTP_HOST` unset (stdout
fallback — fine for the very first deploy while a provider is being
chosen) or fill in the SMTP block:
```ini
SMTP_HOST=smtp.postmarkapp.com
SMTP_PORT=587
SMTP_USER=<provider-supplied>
SMTP_PASSWORD=<provider-supplied>
SMTP_STARTTLS=1
EMAIL_FROM=notifications@wiggleverse.org
EMAIL_FROM_NAME=Wiggleverse
```
Configure SPF and DKIM records for `wiggleverse.org` with the chosen
provider before sending real traffic. The single non-spoofing envelope
identity per §15.9 is what every outbound email uses; spoofing the
actor's address would land everything in spam.

If a real bounce/complaint webhook lands later, set
`WEBHOOK_EMAIL_BOUNCE_SECRET` to a long random string and configure the
provider's webhook to inject it as `X-Webhook-Secret`.

Lock the file down, since it carries secrets:
```sh
sudo chmod 600 /opt/rfc-app/backend/.env
sudo chown rfc-app:rfc-app /opt/rfc-app/backend/.env
```
**1.3.4 Seed the meta repo.** This creates `wiggleverse/meta` on Gitea,
populates the hand-authored files, and registers the webhook against
`APP_URL/api/webhooks/gitea`.
```sh
sudo -u rfc-app -H bash -c \
'cd /opt/rfc-app/backend && .venv/bin/python ../scripts/seed_meta_repo.py'
```
Re-running is safe; every step is upsert-shaped.
### 1.4 Web server side
**1.4.1 nginx vhost.**
```sh
sudo cp /opt/rfc-app/deploy/nginx/rfc.wiggleverse.org.conf \
/etc/nginx/sites-available/rfc.wiggleverse.org
sudo ln -sf /etc/nginx/sites-available/rfc.wiggleverse.org \
/etc/nginx/sites-enabled/
sudo nginx -t && sudo systemctl reload nginx
```
Make the nginx user able to read `/opt/rfc-app/frontend/dist`:
```sh
sudo usermod -a -G rfc-app www-data
sudo chmod -R g+rX /opt/rfc-app/frontend/dist
sudo systemctl reload nginx
```
**1.4.2 Let's Encrypt cert.**
```sh
sudo certbot --nginx -d rfc.wiggleverse.org
```
### 1.5 systemd
```sh
sudo cp /opt/rfc-app/deploy/systemd/rfc-app.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now rfc-app
sudo systemctl status rfc-app
```
Watch the logs:
```sh
sudo journalctl -u rfc-app -f
```
Expected startup line:
```
RFC app started — meta repo wiggleverse/meta
```
### 1.6 Smoke test
In a browser at `https://rfc.wiggleverse.org`:
1. The landing page renders (§14.1 — title, pitch, three-item deck,
sign-in affordance).
2. Click **Sign in with Gitea** → OAuth round-trip → catalog lands.
3. Click **+ Propose New RFC**, fill in title/slug/pitch, submit.
The pending-ideas disclosure shows the new PR.
4. As the owner, click the PR row, then **Merge proposal**. The
catalog refreshes with the super-draft.
5. Open `/admin` and confirm the four-tab home base loads
(Users / Graduation queue / Audit log / Permission events).
6. Open `/settings/notifications` and confirm the five sub-sections
   render (per-category email, digest cadence, quiet hours, watches,
   mute list).

If anything misfires, the troubleshooting section below covers the
common failure modes.

---
## 2. Day-2 operations
### 2.1 Logs
```sh
sudo journalctl -u rfc-app -f # follow
sudo journalctl -u rfc-app --since "1 hour ago"
sudo journalctl -u rfc-app -p err # errors only
```
The app logs at INFO level by default. Notable log lines to watch:
- `RFC app started — meta repo ...` — startup completed.
- `reconciler: starting sweep` / `reconciler: sweep complete` — the
five-minute §4.1 safety-net pass.
- `digest tick failed` / `hygiene tick failed` — a scheduler tick
crashed; the next tick will retry but the underlying error wants a
look. Stack trace lands next to the warning.
- `email (stdout fallback): to=...` means `SMTP_HOST` is unset and the
email loop is logging envelopes instead of sending.
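
For a quick health read, the notable lines listed above can be counted over a slice of the journal. A minimal sketch follows; `triage_log` is a hypothetical helper, and the patterns are copied from the list above, so they will need adjusting if the app's log wording differs.

```sh
# Hypothetical helper: count the notable log lines from this section.
# Reads log text on stdin and prints a one-line summary.
triage_log() {
    awk '
        /RFC app started/               { started++ }
        /reconciler: sweep complete/    { sweeps++ }
        /(digest|hygiene) tick failed/  { tick_failures++ }
        /email \(stdout fallback\)/     { stdout_emails++ }
        END {
            printf "started=%d sweeps=%d tick_failures=%d stdout_emails=%d\n",
                   started, sweeps, tick_failures, stdout_emails
        }
    '
}
```

Typical use would be `sudo journalctl -u rfc-app --since "1 hour ago" | triage_log`. With the five-minute reconciler cadence, roughly twelve completed sweeps per hour would be expected; a nonzero `tick_failures` is the cue to read the surrounding stack trace.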
### 2.2 Database backup
The SQLite file at `DATABASE_PATH` carries every app-canonical row
(users, threads, messages, watches, notifications, the audit log). The
§4 cache rebuilds from Gitea, so a backup that lacks the `cached_*` tables
is still recoverable. The app-canonical tables can't be rebuilt from
anywhere else, so a backup is load-bearing.

Daily snapshot (cron, as `rfc-app`):
```sh
0 3 * * * sqlite3 /opt/rfc-app/backend/data/rfc-app.db ".backup /opt/rfc-app/backend/data/backup-$(date +\%F).db"
```
Retention is your call; 30 daily snapshots is the easy default.
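
The cron line above only creates snapshots, so something has to enforce the retention. A minimal sketch of a pruning step follows; `prune_backups` is a hypothetical helper, and it relies on the `backup-YYYY-MM-DD.db` naming from the cron line so that lexical order equals chronological order.

```sh
# Hypothetical helper: keep only the newest KEEP snapshots in DIR.
# Assumes files are named backup-YYYY-MM-DD.db, as in the cron line.
prune_backups() {
    dir=$1
    keep=$2
    # The glob expands in lexical order, which is oldest first here.
    set -- "$dir"/backup-*.db
    [ -e "$1" ] || return 0      # no snapshots yet: nothing to do
    while [ "$#" -gt "$keep" ]; do
        rm -- "$1"               # drop the oldest remaining snapshot
        shift
    done
}
```

For the 30-snapshot default this would be `prune_backups /opt/rfc-app/backend/data 30`, run as `rfc-app` after the nightly `.backup` has finished.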
Restore a snapshot:
```sh
sudo systemctl stop rfc-app
sudo -u rfc-app cp /opt/rfc-app/backend/data/backup-YYYY-MM-DD.db \
/opt/rfc-app/backend/data/rfc-app.db
sudo systemctl start rfc-app
```
The reconciler will refill the cache from Gitea on first sweep.
### 2.3 Secret rotation
Rotating `SECRET_KEY` invalidates every active session cookie. To rotate:
```sh
NEW=$(openssl rand -hex 32)
echo "$NEW" # copy this value
sudoedit /opt/rfc-app/backend/.env # set SECRET_KEY to the copied value
sudo systemctl restart rfc-app
```
Every signed-in user is bounced to the landing page and re-authenticates
through OAuth. Existing email-unsubscribe URLs become invalid (per
§15.4 they're signed against `SECRET_KEY`); a user can still unsubscribe
through `/settings/notifications`.

`GITEA_BOT_TOKEN` rotates without service disruption: write the new
value, then restart. Old tokens stay valid in Gitea until revoked there.

`GITEA_WEBHOOK_SECRET` rotates in two steps: update the value in `.env`
and restart, then update the secret in Gitea's webhook config to match.
Webhooks are refused during the brief window between the two steps; the
reconciler covers the gap.
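
All three rotations come down to replacing one `KEY=value` line in `.env` and restarting. As a non-interactive alternative to `sudoedit`, a minimal sketch is shown below; `set_env_value` is a hypothetical helper, not part of the repo. It writes back through the existing file, so the 0600 mode and `rfc-app` ownership are left as they are.

```sh
# Hypothetical helper: set KEY to VALUE in an env file, replacing the
# existing line or appending one if the key is absent.
# Assumption: VALUE has no backslashes (awk -v would interpret them);
# hex output from `openssl rand -hex 32` is safe.
set_env_value() {
    file=$1
    key=$2
    value=$3
    tmp=$(mktemp) || return 1
    awk -v k="$key" -v v="$value" '
        BEGIN { FS = "="; found = 0 }
        $1 == k { print k "=" v; found = 1; next }
        { print }
        END { if (!found) print k "=" v }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

For example, from a shell that can write the file: `set_env_value /opt/rfc-app/backend/.env SECRET_KEY "$(openssl rand -hex 32)"`, followed by `sudo systemctl restart rfc-app`.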
### 2.4 The §12 hygiene timer cadence
The hygiene scheduler runs every `HYGIENE_TICK_SECONDS` (default 3600).
Each tick checks `cached_branches` for two boundaries:
- **30 days idle** (no commits, no PR) — the branch flips to
`state='closed'`. The branch stays in Gitea, but new chat is
disabled per §8.4.
- **90 days closed** (or 90 days post-merge for a merged-PR branch) —
the bot deletes the branch from Gitea. The `cached_branches.state`
flips to `deleted`, and the audit log records the action with
`actor_user_id=NULL` and `on_behalf_of=<bot login>` per §15.9.

Pinned branches (`cached_branches.pinned=1`) skip both passes. Per-user
`branch_chat_seen` cursors survive branch deletion: chat history is
app-canonical, not cached, and persists indefinitely.

If a branch needs to be kept alive past 30 days without commits, pin
it from the admin surface (or directly: `UPDATE cached_branches SET
pinned = 1 WHERE rfc_slug = ? AND branch_name = ?`).
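
The two boundaries and the pin override can be read as a small decision rule. The sketch below models that rule only; it is not the app's code. `hygiene_action` is a hypothetical name, the `open` state label is an assumption (the text names only `closed` and `deleted`), and treating the boundaries as inclusive is also an assumption.

```sh
# Hypothetical model of the hygiene rule described above.
# Arguments: pinned (0 or 1), state (open or closed), and days since the
# relevant event: last activity for an open branch, closure or merge for
# a closed one. Boundaries are assumed inclusive (>=).
hygiene_action() {
    pinned=$1
    state=$2
    days=$3
    if [ "$pinned" -eq 1 ]; then
        echo skip                # pinned branches skip both passes
    elif [ "$state" = closed ] && [ "$days" -ge 90 ]; then
        echo delete              # 90 days closed: bot deletes the branch
    elif [ "$state" = open ] && [ "$days" -ge 30 ]; then
        echo close               # 30 days idle: branch flips to closed
    else
        echo keep
    fi
}
```

Read this way, an unpinned branch that simply goes idle is closed after 30 days and deleted 90 days after that, and `hygiene_action 1 closed 400` still yields `skip`.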
### 2.5 Updating after a push
```sh
sudo -u rfc-app git -C /opt/rfc-app pull
sudo -u rfc-app /opt/rfc-app/backend/.venv/bin/pip install \
-r /opt/rfc-app/backend/requirements.txt
# Rebuild the frontend locally and rsync dist/ as in 1.3.2.
sudo systemctl restart rfc-app
```
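
The update steps above can be wrapped so that a failure stops the sequence before the restart. This is a minimal sketch; `run` and `update_app` are hypothetical names, and the frontend rebuild from 1.3.2 is deliberately left out because it happens on another machine. Setting `DRY_RUN=1` prints each command instead of executing it.

```sh
# Hypothetical wrapper around the update steps; not part of the repo.
# With DRY_RUN=1 each command is printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Pull, refresh dependencies, restart. The && chain stops at the first
# failure, so a failed pip install never restarts the service.
# The frontend dist/ still has to be rebuilt and copied as in 1.3.2.
update_app() {
    run sudo -u rfc-app git -C /opt/rfc-app pull &&
    run sudo -u rfc-app /opt/rfc-app/backend/.venv/bin/pip install \
        -r /opt/rfc-app/backend/requirements.txt &&
    run sudo systemctl restart rfc-app
}
```

A dry run (`DRY_RUN=1; update_app`) is a cheap way to confirm the paths before running it for real.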
The §5 schema migrations run on startup and are append-only. A restart
is the entire deploy.

---
## 3. Rollback
If a deploy goes sideways, the rollback shape is:
```sh
sudo -u rfc-app git -C /opt/rfc-app log --oneline -10
sudo -u rfc-app git -C /opt/rfc-app checkout <prior-commit>
sudo -u rfc-app /opt/rfc-app/backend/.venv/bin/pip install \
-r /opt/rfc-app/backend/requirements.txt
# Rebuild + rsync the frontend dist from the prior commit's state.
sudo systemctl restart rfc-app
```
The schema migrations are append-only, so rolling code back without
rolling the schema back is the safe default. If a migration introduced
a column the new code requires, the old code ignores the extra
column — SQLite reads the rows fine.
If the database itself got into a bad state (a botched manual UPDATE,
say), restore from the most recent backup per 2.2.

---
## 4. Troubleshooting
- **`systemctl status rfc-app` shows `RuntimeError: Required environment
variable ... is not set`.** The `.env` is missing a value, or
`EnvironmentFile=` in the systemd unit isn't finding it. Confirm
`/opt/rfc-app/backend/.env` exists and is mode 0600 owned by
`rfc-app`.
- **OAuth callback returns "Invalid state".** The redirect URI in Gitea
must match `APP_URL/auth/callback` exactly. Confirm it's
`https://rfc.wiggleverse.org/auth/callback`.
- **The catalog stays empty after a merge.** Check the webhook:
`journalctl -u rfc-app | grep webhook`. Gitea's **Settings → Webhooks
→ Recent Deliveries** on the meta repo shows the delivery status; the
reconciler will catch up within 5 minutes anyway.
- **`502 Bad Gateway` on /api/\* or /auth/\*.** uvicorn isn't running
or isn't bound to `127.0.0.1:8000`. `systemctl status rfc-app`.
- **`403` from nginx on static assets.** The nginx user can't read
`/opt/rfc-app/frontend/dist`. Apply the chmod from 1.4.1.
- **OAuth works, but the user can't propose.** The `users` row was
created with role `contributor`; only `OWNER_GITEA_LOGIN`'s login
gets `owner` on first sign-in. Confirm `.env` has the right value
and you signed in with that account.
- **Email isn't going out and no error logs.** Most likely `SMTP_HOST`
is unset; the stdout fallback is in play and envelopes are in the
journal as `email (stdout fallback): ...`. Set the SMTP block per 1.3.3
to enable real sends.
- **The §12 hygiene sweep isn't deleting an obviously stale branch.**
Confirm `cached_branches.pinned = 0` for the row, and that
`last_commit_at` (or the joined `merged_at`) actually predates the
90-day cutoff. The `actions` audit log carries every hygiene gesture
with `action_kind IN ('close_idle_branch', 'delete_stale_branch',
'delete_post_merge_branch')`.
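
The HTTP-level symptoms in the list above can be probed from any machine with `curl`. A minimal sketch follows; `http_status` and `status_hint` are hypothetical helpers, and the hints only restate the entries above. The `000` case is an addition: it is what `curl` prints when no HTTP response arrives at all.

```sh
# Hypothetical helpers mapping HTTP symptoms to the first thing to check.

# Print only the HTTP status code for a URL.
http_status() {
    # -s: quiet, -o /dev/null: discard the body, -w: print the status code.
    curl -s -o /dev/null -w '%{http_code}' "$1"
}

# Translate a status code into a hint taken from the list above.
status_hint() {
    case $1 in
        2??|3??) echo "ok" ;;
        403)     echo "nginx cannot read the static files: reapply the chmod from 1.4.1" ;;
        502)     echo "uvicorn is down or not on 127.0.0.1:8000: systemctl status rfc-app" ;;
        000)     echo "no HTTP response: check DNS, nginx, and the certificate" ;;
        *)       echo "unexpected status $1: check journalctl -u rfc-app" ;;
    esac
}
```

For example, `status_hint "$(http_status https://rfc.wiggleverse.org/)"` gives a one-line starting point before opening the journal.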