Ben Stull 5dbcac8906 docs: name OHM as the corpus, "Human" as the first RFC

Reframe the Open Human Model in SPEC.md and PHILOSOPHY.md from "the
first RFC the framework will produce" to "the corpus of RFCs the
framework produces, of which the first defines *human*." Earlier
phrasing collapsed the project (OHM) and the first entry into one
name; this teases them apart.

Also surface the OpenXML APIs / UX downstream-consumer point: OHM is
English-first by design — the markdown bodies are canonical, and the
structured artifacts downstream systems need to actually let humans
and machines interact are derived from that English source, not
authored alongside it. This is part of why markdown round-trip
fidelity matters structurally (cf. the Phase 1 CM6 swap).

Updates the obvious example renames — slug `open-human-model` →
`human`, title "Open Human Model" → "Human", PR-list / breadcrumb /
notification examples — so the SPEC's worked-example consistently
shows OHM-as-corpus with Human as a member. Test fixtures and the
README seed-script invocation still carry the old slug; those are
left for a separate pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-25 10:06:30 -07:00

Words first

A standards process for shared meaning between humans and machines.

Large language models work brilliantly with programming languages because every word in Python or C has a definitive meaning enforced by tooling. They struggle with natural language because no such dictionary exists for words like consent, trait, or agency — words that do enormous work in any system that interacts with humans. Trained on the open corpus, LLMs inherit and amplify the ambiguity, producing text that looks crisp and is, on inspection, fog.

The Wiggleverse RFC framework is a standardization process for natural-language vocabulary, modeled on the way ISO C, POSIX, and the IETF RFCs produced the standards that underwrite modern computing. Each RFC defines one word: its meaning, its relationships to other defined words, and the protocol by which humans and machines interact with it. The first RFC defines human. Together, the corpus of RFCs the process produces is the Open Human Model: a shared English-language vocabulary that digital representations of humans, and the systems that interact with them, can be built on without re-litigating what every word means. The English is canonical; the OpenXML APIs and UX surfaces a downstream system needs to actually let humans and machines interact are derived from it, not authored alongside it.

This is public work. Humans and machines are both invited and both required. The shared understanding the framework is reaching for — how things work, in the physical and digital realms, between humans and other humans, between machines and other machines, and between humans and machines — cannot be produced by any one of those participants alone, and we do not intend to try.

The asymmetry

Large language models have been transformative for code in a way they have not been for prose. The gap is wider than enthusiasm; it is structural.

When a model writes Python, every word in the language has a definitive meaning. for, range, await, class — these are not approximations. A compiler will refuse to run text that uses them incorrectly. The model was trained on millions of examples where these words behave the same way every time. Beneath the model is a substrate of agreement: programmer, machine, and language designer all converge on what the word does. Output flows downhill from that agreement, and the productivity gains are real.
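The refusal described above can be observed directly. The following is a minimal sketch using Python's built-in compile(); the helper name refuses is ours, introduced only for illustration.

```python
import keyword


def refuses(source: str) -> bool:
    """Return True if Python's compiler rejects the source text."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return True
    return False


# `for` is a reserved keyword, so it cannot be rebound as a variable name.
assert keyword.iskeyword("for")
assert refuses("for = 3")

# The same keyword, used the way the grammar requires, is accepted.
assert not refuses("for i in range(3):\n    pass")
```

Nothing comparable exists for a word like scalable: no function can be called on a strategy memo to report that the word was used incorrectly.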

When a model writes a strategy memo, none of this is true. Inclusive, scalable, user-centric, ethical — these words do enormous work, and the work they do is to mean different things to different readers. There is no compiler. There is no refusal. The model averages across every meaning the training data ever held and produces text that looks crisp and is, on inspection, fog. Two readers extract two memos. Three months later, when someone asks why the decisions diverged from the intent, the document cannot answer, because the document never held a single intent to begin with.

This is not a model failure. It is a prerequisite failure. We are asking LLMs to compute in a language we never finished defining.

Code's dictionary; natural language's absence of one

Programming languages have spent fifty years building the dictionary. Every keyword has a specification. Every type has an interface. Every API surface has a contract. The dictionary is enforced by tooling: type checkers, linters, runtime errors. Drift between what you meant and what you wrote is caught at the interface, loudly, by a machine whose job is to refuse.

Natural language has no such dictionary, except locally, briefly, and by accident — within a small team that has worked together long enough to triangulate one. Outside that team, the dictionary evaporates. LLMs trained on the open corpus inherit that evaporation. They are, at scale, drift amplifiers. The more fluent the model, the more confidently the drift compounds.

This is the entropy the framework is trying to address.

What this actually is

What the framework produces is a stack of specifications.

The closer analogues are not programming languages — Python, JavaScript — but the standards underneath them: ISO C, POSIX, the IETF RFCs that define HTTP and TCP, the W3C recommendations that define HTML. Each is a document, painstakingly argued and then formally adopted, that other systems reference rather than reinvent. POSIX did not write the operating systems that use it; it specified the surface those systems could agree on. HTTP does not implement any particular web server; it specifies the surface every web server has to honor.

The Wiggleverse RFC framework is the standardization process. The RFCs it produces are the specifications. The corpus of those specifications is the Open Human Model — the shared vocabulary that every digital representation of a human, and every system that interacts with one, can be built on without re-litigating what consent, trait, or agency means each time. The first RFC defines human itself; the rest define the constellation around it.

The analogy stretches in one important way, and the stretch is worth naming. POSIX worked because it codified convention that already existed in fragmented form across Unix vendors; the committee's job was to harmonize working implementations. This framework has the harder job. There are no working implementations of consent or agency to harmonize, only twenty arguments per term and no convergence. We are specifying the vocabulary in the first place. The reason this is now tractable, and was not in any previous attempt, is the LLM as participant: it provides the surface area and surfaces the cases humans alone could never enumerate, while the humans provide the refusal and the judgment. That dyad is what every previous attempt at a natural-language standard at this resolution has lacked.

The proposal

Before we build with LLMs in any domain that touches natural-language concepts — identity, consent, value, harm, fairness, intent — we have to build the dictionary. And the dictionary has to be built collaboratively, with humans and machines together, because neither can do it alone.

Humans cannot. The history of dictionary-building by committee is a history of dictionaries no one consults. We do not have the labor or the patience to enumerate the surface area at the resolution machines need, and we do not have the introspective access to our own usage to surface the cases that actually matter.

Machines cannot either. A model trained on existing text has already internalized the ambiguity it is meant to resolve. Asking it to dictate definitions is asking the disease to write the cure.

But a human and a model in a careful argument can pin down what neither could pin down alone. The model proposes; the human refuses or refines; the argument is captured; the definition tightens. The transcript of the argument becomes a thing both can index against later. This is what the RFC framework is for.

What it means to define a word

A definition, in the sense this framework cares about, has three parts.

The first is the meaning — a tight, unambiguous statement of what the word picks out in the world. Not a dictionary gloss; a specification, written so that a careful reader and a careful model would agree about whether any given thing is or is not in the word's extension.

The second is the relationships — how the word connects to other defined words. Definitions in isolation drift. A definition embedded in an ontology — is-a, part-of, implies, excludes — is anchored. The graph of definitions becomes the substrate that future definitions stand on. This is the same move programming languages make when they let one type be defined in terms of others.

The third is the protocol — how humans interact with the word, and how machines interact with the word. A definition that humans use and a definition that machines use are not separately valid; they have to be the same definition, expressed in two registers. The RFC document is the human register. The structured metadata around it is the machine register. Both pass through the same review.

A word is "defined" when all three parts exist, have been argued over, and have graduated to canonical status. Until then, it is a draft, and anything built on top of it inherits the draftness.
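One way to picture the three parts and the "draftness" rule is as a small data structure. This is a sketch under our own assumptions, not the schema in SPEC.md: the class and field names are hypothetical, the relationships in the example catalog are placeholders rather than claims from any RFC, and the "argued over" requirement is not modeled.

```python
from dataclasses import dataclass, field
from enum import Enum


class Relation(Enum):
    IS_A = "is-a"
    PART_OF = "part-of"
    IMPLIES = "implies"
    EXCLUDES = "excludes"


@dataclass
class Definition:
    word: str
    meaning: str = ""                                   # part one
    relationships: list = field(default_factory=list)   # part two: (Relation, other_word) pairs
    protocol: str = ""                                  # part three
    canonical: bool = False                             # set once the entry has graduated

    def is_defined(self) -> bool:
        """Defined means all three parts exist and the entry is canonical."""
        has_all_parts = bool(self.meaning and self.relationships and self.protocol)
        return has_all_parts and self.canonical


def inherits_draftness(word: str, catalog: dict) -> bool:
    """True if the word, or any word it reaches through its relationships, is still a draft."""
    seen, stack = set(), [word]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # the graph may contain cycles
        seen.add(current)
        entry = catalog.get(current)
        if entry is None or not entry.is_defined():
            return True
        stack.extend(other for _, other in entry.relationships)
    return False


# Placeholder catalog: the relationships are illustrative only.
catalog = {
    "human": Definition("human", "(meaning)", [(Relation.IMPLIES, "agency")], "(protocol)", True),
    "agency": Definition("agency", "(meaning)", [(Relation.PART_OF, "human")], "(protocol)", True),
    "consent": Definition("consent", "(meaning)", [(Relation.IMPLIES, "agency")], "(protocol)", False),
    "harm": Definition("harm", "(meaning)", [(Relation.EXCLUDES, "consent")], "(protocol)", True),
}

assert not inherits_draftness("human", catalog)
assert inherits_draftness("harm", catalog)  # canonical itself, but it stands on a draft
```

In this sketch every relationship counts as a dependency; whether a relation such as excludes should propagate draftness is a design question the sketch does not settle.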

How the framework operationalizes this

The structural decisions in SPEC.md are not arbitrary. Each one is in service of the philosophy above.

The meta-repository holds the catalog of definitions. Every defined word, and every word currently being defined, is one entry. The catalog is itself a piece of public infrastructure: version-controlled, branchable, auditable.
Super-drafts are the moment a word enters the conversation. Proposing a definition costs nothing in identifier space, because most proposals will not survive the argument, and that is fine.
Graduation is the moment a definition becomes load-bearing. It gets a stable identifier (RFC-NNNN), its own repository, and a permanent home. From that point forward, other RFCs can build on it. Graduation is rare and ceremonial because what comes after it is dependency.
AI participation in chat is not a feature; it is the mechanism. The model is one of the participants in the argument that produces the definition. Its proposals are subject to refusal and refinement, the same as any contributor's. The transcript of the argument is preserved because the argument is the evidence the definition was earned.
Branches, PRs, and tracked changes exist because definitions evolve, and the framework needs to make that evolution legible — to humans reading later, and to machines computing against the current state.
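The super-draft and graduation steps above can be sketched as a tiny lifecycle. This is an illustration under assumptions we are making here, not the implementation SPEC.md describes: the Catalog and Entry names are hypothetical, and sequential four-digit numbering is only our reading of the RFC-NNNN placeholder.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional


@dataclass
class Entry:
    slug: str
    rfc_id: Optional[str] = None  # None while the entry is still a super-draft


class Catalog:
    """Hypothetical stand-in for the meta-repository's catalog of entries."""

    def __init__(self) -> None:
        self._entries: dict = {}
        self._numbers = count(1)  # assumption: identifiers are issued sequentially

    def propose(self, slug: str) -> Entry:
        """A super-draft enters the catalog without consuming an identifier."""
        entry = Entry(slug)
        self._entries[slug] = entry
        return entry

    def graduate(self, slug: str) -> Entry:
        """Graduation assigns the stable identifier, exactly once."""
        entry = self._entries[slug]
        if entry.rfc_id is None:
            entry.rfc_id = f"RFC-{next(self._numbers):04d}"
        return entry


catalog = Catalog()
human = catalog.propose("human")
trait = catalog.propose("trait")
assert human.rfc_id is None and trait.rfc_id is None  # proposals cost nothing

catalog.graduate("human")
assert human.rfc_id == "RFC-0001"
assert trait.rfc_id is None  # still a super-draft
```

Keeping the identifier empty until graduation is what lets most proposals fail cheaply: only entries that other RFCs may depend on ever occupy a number.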

The first RFC defines human. The constellation around it — trait, preference, consent, harm, agency — follows. Together, the corpus the process produces is the Open Human Model: the dictionary that everything else built here will stand on, and the English source from which the OpenXML APIs and UX surfaces of downstream systems can be derived. This is not a small project.

An invitation

This is public work, and it is meant to be.

The vocabulary the framework is producing is for anyone who will need to interact across the human–machine boundary, or across the machine–machine boundary where the machines are acting on behalf of humans. That is, in the limit, everyone. The framework is therefore designed as public infrastructure: every entry sits in a public meta-repository, every argument lives in a public chat thread, every change to a graduated definition is a public PR with a reviewable trail.

The invitation is to participate. Propose a word. Argue against a draft. Refine a definition. Flag a relationship the current ontology is missing. Humans and machines are both invited, and both are required. The shared understanding the framework is reaching for — how things work, in the physical and digital realms, between humans and other humans, between machines and other machines, and between humans and machines — cannot be produced by any one of those participants alone, and we do not intend to try.

A note on humility

This work will not finish. Languages do not finish; they accrete, drift, get pruned. The RFC framework is not a one-time act of definition; it is a sustained practice of definition, of arguing about words in public, of being willing to refine canonical entries when use reveals what the spec missed.

The claim is not that we can finalize meaning. The claim is that we cannot build responsibly with LLMs in domains we have not even tried to define. The work begins with one word, argued carefully, with a model and a human together. Then another. Then the relationships between them. Then the systems those definitions enable.

Build the dictionary first.
