rfc-app/PHILOSOPHY.md
Ben Stull 5dbcac8906 docs: name OHM as the corpus, "Human" as the first RFC
Reframe the Open Human Model in SPEC.md and PHILOSOPHY.md from "the
first RFC the framework will produce" to "the corpus of RFCs the
framework produces, of which the first defines *human*." Earlier
phrasing collapsed the project (OHM) and the first entry into one
name; this teases them apart.

Also surface the OpenXML APIs / UX downstream-consumer point: OHM is
English-first by design — the markdown bodies are canonical, and the
structured artifacts downstream systems need to actually let humans
and machines interact are derived from that English source, not
authored alongside it. This is part of why markdown round-trip
fidelity matters structurally (cf. the Phase 1 CM6 swap).

Updates the obvious example renames — slug `open-human-model` →
`human`, title "Open Human Model" → "Human", PR-list / breadcrumb /
notification examples — so the SPEC's worked-example consistently
shows OHM-as-corpus with Human as a member. Test fixtures and the
README seed-script invocation still carry the old slug; those are
left for a separate pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 10:06:30 -07:00

# Words first
*A standards process for shared meaning between humans and machines.*
Large language models work brilliantly with programming languages because
every word in Python or C has a definitive meaning enforced by tooling.
They struggle with natural language because no such dictionary exists for
words like *consent*, *trait*, or *agency* — words that do enormous work
in any system that interacts with humans. Trained on the open corpus,
LLMs inherit and amplify the ambiguity, producing text that looks crisp
and is, on inspection, fog.
The Wiggleverse RFC framework is a standardization process for
natural-language vocabulary, modeled on the way ISO C, POSIX, and the IETF RFCs
produced the standards that underwrite modern computing. Each RFC defines
one word: its meaning, its relationships to other defined words, and the
protocol by which humans and machines interact with it. The first RFC
defines *human*. Together, the corpus of RFCs the process produces is
the **Open Human Model** — a shared English-language vocabulary that
digital representations of humans, and the systems that interact with
them, can be built on without re-litigating what every word means. The
English is canonical; the OpenXML APIs and UX surfaces a downstream
system needs to actually let humans and machines interact are derived
from it, not authored alongside it.
This is public work. Humans and machines are both invited and both
required. The shared understanding the framework is reaching for — how
things work, in the physical and digital realms, between humans and
other humans, between machines and other machines, and between humans
and machines — cannot be produced by any one of those participants alone,
and we do not intend to try.
---
## The asymmetry
Large language models have been transformative for code in a way they have
not been for prose. The gap is not a matter of enthusiasm; it is structural.
When a model writes Python, every word in the language has a definitive
meaning. `for`, `range`, `await`, `class` — these are not approximations.
A compiler or interpreter will refuse to run text that uses them incorrectly. The model
was trained on millions of examples where these words behave the same way
every time. Beneath the model is a substrate of agreement: programmer,
machine, and language designer all converge on what the word does. Output
flows downhill from that agreement, and the productivity gains are real.
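The refusal described above can be observed directly. What follows is a minimal sketch using Python's built-in `compile()`; the helper name `is_accepted` and the two sample snippets are illustrative assumptions, not part of the framework.

```python
def is_accepted(source: str) -> bool:
    """Return True if Python's own compiler accepts the text as a program."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


# Keywords used the way the language defines them.
well_formed = "for i in range(3):\n    print(i)\n"

# `for` used as an ordinary variable name; the language assigns this no meaning.
misused = "for = 3\n"

print(is_accepted(well_formed))  # True
print(is_accepted(misused))      # False
```

There is no equivalent call for *inclusive* or *ethical*: nothing rejects a memo that uses either word in a way its readers will not share.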
When a model writes a strategy memo, none of this is true. *Inclusive*,
*scalable*, *user-centric*, *ethical* — these words do enormous work, and
the work they do is to mean different things to different readers. There
is no compiler. There is no refusal. The model averages across every
meaning the training data ever held and produces text that looks crisp
and is, on inspection, fog. Two readers extract two memos. Three months
later, when someone asks why the decisions diverged from the intent, the
document cannot answer, because the document never held a single intent
to begin with.
This is not a model failure. It is a *prerequisite* failure. We are
asking LLMs to compute in a language we never finished defining.
## Code's dictionary; natural language's absence of one
Programming languages have spent fifty years building the dictionary.
Every keyword has a specification. Every type has an interface. Every
API surface has a contract. The dictionary is enforced by tooling: type
checkers, linters, runtime errors. Drift between what you meant and what
you wrote is caught at the interface, loudly, by a machine whose job is
to refuse.
Natural language has no such dictionary, except locally, briefly, and by
accident — within a small team that has worked together long enough to
triangulate one. Outside that team, the dictionary evaporates. LLMs
trained on the open corpus inherit that evaporation. They are, at scale,
drift amplifiers. The more fluent the model, the more confidently the
drift compounds.
This is the entropy the framework is trying to address.
## What this actually is
What the framework produces is a stack of specifications.
The closer analogues are not programming languages — Python, JavaScript —
but the standards underneath them: ISO C, POSIX, the IETF RFCs that
define HTTP and TCP, the W3C recommendations that define HTML. Each is a
document, painstakingly argued and then formally adopted, that other
systems reference rather than reinvent. POSIX did not write the
operating systems that use it; it specified the surface those systems
could agree on. HTTP does not implement any particular web server; it
specifies the surface every web server has to honor.
The Wiggleverse RFC framework is the standardization process. The RFCs
it produces are the specifications. The corpus of those specifications
is the **Open Human Model** — the shared vocabulary that every digital
representation of a human, and every system that interacts with one,
can be built on without re-litigating what *consent*, *trait*, or
*agency* means each time. The first RFC defines *human* itself; the
rest define the constellation around it.
The analogy stretches in one important way, and the stretch is worth
naming. POSIX worked because it codified convention that already
existed in fragmented form across Unix vendors; the committee's job was
to harmonize working implementations. This framework has the harder
job. There are no working implementations of *consent* or *agency* to
harmonize — only twenty arguments per term and no convergence. We are
specifying the vocabulary in the first place. The reason this is now
tractable, and was not in any previous attempt, is the LLM as
participant: it provides the surface area and surfaces the cases humans
alone could never enumerate, while the humans provide the refusal and
the judgment. That dyad is what every previous attempt at a
natural-language standard at this resolution has lacked.
## The proposal
Before we build *with* LLMs in any domain that touches natural-language
concepts — identity, consent, value, harm, fairness, intent — we have to
build the dictionary. And the dictionary has to be built collaboratively,
with humans and machines together, because neither can do it alone.
Humans cannot. The history of dictionary-building by committee is a
history of dictionaries no one consults. We do not have the labor or the
patience to enumerate the surface area at the resolution machines need,
and we do not have the introspective access to our own usage to surface
the cases that actually matter.
Machines cannot either. A model trained on existing text has already
internalized the ambiguity it is meant to resolve. Asking it to dictate
definitions is asking the disease to write the cure.
But a human and a model in a careful argument can pin down what neither
could pin down alone. The model proposes; the human refuses or refines;
the argument is captured; the definition tightens. The transcript of the
argument becomes a thing both can index against later. This is what the
RFC framework is for.
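One way to picture the captured argument is as an ordered transcript that can be replayed to find the wording currently left standing. The sketch below is an assumption made for illustration: the `Turn` record, the three action names, and the placeholder wordings are hypothetical and are not taken from `SPEC.md`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Turn:
    speaker: str  # hypothetical labels: "model" or "human"
    action: str   # hypothetical actions: "propose", "refuse", "refine"
    text: str     # the proposed wording, or the reason for a refusal


def current_wording(transcript: List[Turn]) -> Optional[str]:
    """Replay the argument and return the wording left standing, if any."""
    standing: Optional[str] = None
    for turn in transcript:
        if turn.action in ("propose", "refine"):
            standing = turn.text   # a new or tightened wording replaces the old
        elif turn.action == "refuse":
            standing = None        # a refusal leaves nothing standing
        else:
            raise ValueError(f"unknown action: {turn.action!r}")
    return standing


# Placeholder wordings only; no real definition is implied.
transcript = [
    Turn("model", "propose", "first draft wording"),
    Turn("human", "refuse", "reason the first draft is too broad"),
    Turn("model", "propose", "second draft wording"),
    Turn("human", "refine", "second draft wording, tightened"),
]

print(current_wording(transcript))  # second draft wording, tightened
```

Keeping every turn, including the refusals, is what lets a later reader or model see why the standing wording won.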
## What it means to define a word
A definition, in the sense this framework cares about, has three parts.
The first is the **meaning** — a tight, unambiguous statement of what
the word picks out in the world. Not a dictionary gloss; a specification,
written so that a careful reader and a careful model would agree about
whether any given thing is or is not in the word's extension.
The second is the **relationships** — how the word connects to other
defined words. Definitions in isolation drift. A definition embedded in
an ontology — *is-a*, *part-of*, *implies*, *excludes* — is anchored.
The graph of definitions becomes the substrate that future definitions
stand on. This is the same move programming languages make when they
let one type be defined in terms of others.
The third is the **protocol** — how humans interact with the word, and
how machines interact with the word. A definition that humans use and a
definition that machines use are not separately valid; they have to be
the same definition, expressed in two registers. The RFC document is
the human register. The structured metadata around it is the machine
register. Both pass through the same review.
A word is "defined" when all three parts exist, have been argued over,
and have graduated to canonical status. Until then, it is a draft, and
anything built on top of it inherits the draftness.
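The three parts and the "defined" test can be sketched as a small data structure. This is an illustration under stated assumptions, not the framework's schema: the class and field names are hypothetical, the requirement of at least one relationship is an assumption, the "argued over" condition is not modeled, and the sample relationship is a placeholder rather than a claim from any RFC.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Relation(Enum):
    # The four relationship kinds named in the text.
    IS_A = "is-a"
    PART_OF = "part-of"
    IMPLIES = "implies"
    EXCLUDES = "excludes"


class Status(Enum):
    DRAFT = "draft"
    CANONICAL = "canonical"


@dataclass(frozen=True)
class Link:
    relation: Relation
    target: str  # another defined word


@dataclass
class Definition:
    word: str
    meaning: str = ""
    relationships: List[Link] = field(default_factory=list)
    human_register: str = ""  # the RFC document text
    machine_register: Dict[str, str] = field(default_factory=dict)  # structured metadata
    status: Status = Status.DRAFT

    def is_defined(self) -> bool:
        """True only when all three parts exist and the entry is canonical."""
        has_protocol = bool(self.human_register) and bool(self.machine_register)
        return (
            bool(self.meaning)
            and bool(self.relationships)  # assumption: at least one link is required
            and has_protocol
            and self.status is Status.CANONICAL
        )


draft = Definition(word="trait", meaning="placeholder meaning")
complete = Definition(
    word="trait",
    meaning="placeholder meaning",
    relationships=[Link(Relation.PART_OF, "human")],  # placeholder relationship
    human_register="placeholder RFC text",
    machine_register={"word": "trait"},
    status=Status.CANONICAL,
)

print(draft.is_defined())     # False
print(complete.is_defined())  # True
```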
## How the framework operationalizes this
The structural decisions in `SPEC.md` are not arbitrary. Each one is in
service of the philosophy above.
- The **meta-repository** holds the catalog of definitions. Every
defined word, and every word currently being defined, is one entry.
The catalog is itself a piece of public infrastructure:
version-controlled, branchable, auditable.
- **Super-drafts** are the moment a word enters the conversation.
Proposing a definition costs nothing in identifier space, because
most proposals will not survive the argument, and that is fine.
- **Graduation** is the moment a definition becomes load-bearing. It
gets a stable identifier (`RFC-NNNN`), its own repository, and a
permanent home. From that point forward, other RFCs can build on
it. Graduation is rare and ceremonial because what comes after it
is dependency.
- **AI participation in chat** is not a feature; it is the mechanism.
The model is one of the participants in the argument that produces
the definition. Its proposals are subject to refusal and refinement,
the same as any contributor's. The transcript of the argument is
preserved because the argument is the evidence the definition was
earned.
- **Branches, PRs, and tracked changes** exist because definitions
evolve, and the framework needs to make that evolution legible — to
humans reading later, and to machines computing against the current
state.
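The path from super-draft to graduated definition can be sketched as a small in-memory catalog. Everything here is an assumption made for illustration: the `Catalog` and `Entry` names, sequential numbering, and four-digit zero padding (read off the `RFC-NNNN` pattern) are hypothetical, and the real behavior is whatever `SPEC.md` specifies.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Entry:
    slug: str
    rfc_id: Optional[str] = None  # None while the entry is still a super-draft


class Catalog:
    """Hypothetical stand-in for the meta-repository's catalog of entries."""

    def __init__(self) -> None:
        self._entries: Dict[str, Entry] = {}
        self._next_number = 1  # assumption: identifiers are issued sequentially

    def propose(self, slug: str) -> Entry:
        """Open a super-draft; no identifier is consumed."""
        if slug in self._entries:
            raise ValueError(f"{slug!r} is already in the catalog")
        entry = Entry(slug)
        self._entries[slug] = entry
        return entry

    def graduate(self, slug: str) -> Entry:
        """Assign a stable identifier; from here on other entries may depend on it."""
        entry = self._entries[slug]
        if entry.rfc_id is not None:
            raise ValueError(f"{slug!r} has already graduated")
        entry.rfc_id = f"RFC-{self._next_number:04d}"
        self._next_number += 1
        return entry


catalog = Catalog()
catalog.propose("human")
trait = catalog.propose("trait")  # stays a super-draft in this example
human = catalog.graduate("human")

print(human.rfc_id)  # RFC-0001
print(trait.rfc_id)  # None
```

Because `propose` never touches the counter, abandoned proposals leave no gaps in the identifier space.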
The first RFC defines *human*. The constellation around it — *trait*,
*preference*, *consent*, *harm*, *agency* — follows. Together, the
corpus the process produces is the Open Human Model: the dictionary
that everything else built here will stand on, and the English source
from which the OpenXML APIs and UX surfaces of downstream systems can
be derived. This is not a small project.
## An invitation
This is public work, and it is meant to be.
The vocabulary the framework is producing is for anyone who will need
to interact across the human-machine boundary, or across the
machine-machine boundary where the machines are acting on behalf of humans.
That is, in the limit, everyone. The framework is therefore designed
as public infrastructure: every entry sits in a public meta-repository,
every argument lives in a public chat thread, every change to a
graduated definition is a public PR with a reviewable trail.
The invitation is to participate. Propose a word. Argue against a
draft. Refine a definition. Flag a relationship the current ontology
is missing. Humans and machines are both invited, and both are
required. The shared understanding the framework is reaching for —
how things work, in the physical and digital realms, between humans
and other humans, between machines and other machines, and between
humans and machines — cannot be produced by any one of those
participants alone, and we do not intend to try.
## A note on humility
This work will not finish. Languages do not finish; they accrete, drift,
get pruned. The RFC framework is not a one-time act of definition; it is
a sustained practice of definition, of arguing about words in public, of
being willing to refine canonical entries when use reveals what the
spec missed.
The claim is not that we can finalize meaning. The claim is that we
cannot build responsibly with LLMs in domains we have not even tried to
define. The work begins with one word, argued carefully, with a model
and a human together. Then another. Then the relationships between them.
Then the systems those definitions enable.
Build the dictionary first.