agentic-ai claude-code developer-tooling

CLAUDE.md as project constitution

· 6 min read

Our frontend AGENTS.md is 505 lines. Backend is 714. Anthropic recommends under 200.

I covered the ETH Zurich study in the harness engineering post: more instructions means worse adherence. Doesn’t matter that Opus 4.6 has a 1M token context window. More lines means more chances for the model to lose the thread, contradict itself, or just ignore a rule on line 600.

I ran an audit to see how bad it actually was. Four parallel research agents, each looking at our AGENTS.md files from a different angle. About 40% were behavioral rules the agent genuinely needs at session start. 12% was boilerplate we could delete outright. The remaining 49% was reference material that has no business being in context until the agent actually touches the relevant code. Half the file, carried into every session for nothing. The question was how it got that way and what to do about it.

[Chart] Needed at session start: 40% · Load on demand (reference material): 49% · Deletable boilerplate: 12%
Audit of our 714-line backend AGENTS.md. Less than half belongs in every session.

The temptation

Every time an agent makes a mistake the instinct is to add a rule. “Use await not .then().” “Don’t import from @internal packages.” “Name GraphQL mutations with the verb first.” Each one makes sense on its own. Before long the convention file is 700 lines and reads like a legal document.

Most of these rules are things the model could figure out by reading the existing code. If every mutation in the codebase already starts with a verb, the model follows the pattern. Adding a rule for something the codebase demonstrates is just noise in the context window.

That’s how our files grew. An agent made a mistake, someone added a rule. The rule worked so it stayed. Nobody went back to check whether the codebase had gotten consistent enough to make it redundant. Rules accumulated, never got pruned, and the file turned from a useful set of constraints into an encyclopedia that was actively hurting adherence on the rules that mattered.

The OpenAI Codex team used about 100 lines of AGENTS.md as the entry point for 1,500 merged PRs. One hundred lines for a project that shipped over a million lines of code. That number stuck with me because our files were seven times longer and our PRs were not seven times better.

It’s getting noisier in here

The danger isn’t that convention files are too long. It’s that noise drowns out signal.

When the agent has 714 lines of instructions it doesn’t weight them equally. A rule on line 50 and a rule on line 680 don’t get the same attention. The context window might be 1M tokens but the model’s focus is finite. Research on “context rot” shows LLMs degrade with as little as 100 tokens of irrelevant content in the window. Every rule the model could have inferred from the codebase is actively making it worse at following the rules it can’t.

I noticed this before I found the research to explain it. We had a rule about never skipping pre-commit hooks. Hard constraint, the kind of thing the agent genuinely can’t infer from the code. But it was buried in a 714-line file between “use kebab-case for file names” (which the entire codebase already demonstrates) and a paragraph about our GraphQL naming conventions (which the schema makes obvious). The agent would occasionally skip hooks. Not because the rule wasn’t there, but because it was lost in a sea of rules that didn’t need to be.

This compounds with the context budget problem from the composition post. Skills register descriptions into context. Convention files load at session start. MCP servers add tool descriptions. Each one is fine individually. Together they’re competing for the model’s attention with the thing I actually asked it to do.

The fix isn’t “write less.” It’s “write less that loads at session start, and load the rest when it’s relevant.”

The filter

After the audit I needed a way to decide what stays in the convention file and what moves out. The filter is simple: can the model figure this out from the code, and does it need this to avoid a mistake it can’t recover from?

If the codebase already demonstrates the pattern and the model can infer the convention from existing examples, don't document it. One-time setup instructions belong in a script. Reference material for a specific area of the codebase should load on demand, when the agent actually touches those files.

What survives:

Hard constraints the model can’t infer. “Don’t use any in TypeScript.” “All GraphQL mutations must go through the command bus.” “Never skip pre-commit hooks.” The agent needs to be told these because the code alone doesn’t make them obvious.

Architecture boundaries. Where the seams are, what talks to what. The model can read individual files but can’t always see the intent behind the separation. Why the frontend and backend share a repo but deploy independently. Why certain services talk through Kafka instead of direct calls.

Non-obvious conventions. Naming patterns that break from standard practice. Config values that look wrong but are correct. Decisions that look like bugs to someone who wasn’t in the room.
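
To make the survivors concrete, here's a sketch of what a trimmed session-start file might look like. Every rule in it is drawn from examples earlier in this post; the structure and exact wording are illustrative, not a copy of our actual file.

```markdown
# AGENTS.md: session-start constitution
<!-- Illustrative sketch; rules drawn from examples in this post, details hypothetical -->

## Hard constraints (can't be inferred from the code)
- Never skip pre-commit hooks.
- Don't use `any` in TypeScript. Fix the type instead.
- All GraphQL mutations go through the command bus.

## Architecture boundaries
- Frontend and backend share this repo but deploy independently.
- Services communicate through Kafka by design, not through direct calls.

## Universal workflow
- Branch strategy, PR format, and test requirements live here: they apply
  to every session, so they earn their context cost.
```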

Everything else moves into path-scoped rules. Anthropic’s .claude/rules/ system does exactly this. A rule file with path-scoped frontmatter only loads when the agent touches files that match the pattern. Backend rules load when the agent opens Go files. Frontend rules load when it opens TypeScript. Universal rules (branch strategy, PR format, test requirements) stay in the main file at session start.

A cross-workspace session that touches both sides still loads both rule sets. But a focused session on a single Go service only carries the Go rules. Fewer rules in context, better adherence on the ones that are there.
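
Here's a sketch of what one of those path-scoped rule files might look like. The file name, the frontmatter key, and the rules themselves are assumptions on my part; check the current Claude Code docs for the exact syntax.

```markdown
---
# .claude/rules/backend-go.md
# Frontmatter syntax is an assumption; verify against the Claude Code docs.
paths:
  - "backend/**/*.go"
---

# Backend Go rules (only in context when the agent touches matching files)
- Wrap errors with `fmt.Errorf("...: %w", err)`; never swallow them.
- Mutations go through the command bus, never direct handler calls.
```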

We went from 714 lines in one file to about 80 at session start plus domain-specific rules that load contextually. The agent follows the 80-line constitution better than it ever followed the 714-line encyclopedia.

[Chart] Before: one file, always loaded (714 lines) · After: session-start constitution (80 lines)
714 lines down to 80 at session start. The rest loads on demand via path-scoped rules.

What I’d tell someone starting out

If you’re setting up CLAUDE.md or AGENTS.md for the first time, start under 100 lines. Write the rules the model genuinely can’t infer from your code. Architecture boundaries, hard constraints, non-obvious conventions. Stop there.

If you already have a long convention file, run the audit. You don’t need four parallel agents for it. Read the file and ask yourself for each rule: would the model get this right from the codebase alone? If yes, delete it. If the rule only applies to part of the codebase, move it to a path-scoped rule file.
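
Applied to the rules mentioned earlier in this post, that triage might look like this (the verdicts are mine, to illustrate the two filter questions):

```markdown
"Use kebab-case for file names"  → delete: the entire codebase already demonstrates it
"GraphQL naming conventions"     → move: path-scoped rule that loads with the schema files
"Never skip pre-commit hooks"    → keep: not inferable from code, and the mistake is costly
```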

The instinct to add rules will keep coming back. Every time the agent makes a mistake the temptation is to add a line. Check whether the codebase is inconsistent first. A consistent codebase teaches better than a longer convention file.

The ETH Zurich study says write less. The OpenAI Codex team proved 100 lines is enough for 1,500 PRs. Our experience matches: 80 lines at session start with contextual loading beats 714 lines of everything all the time. A constitution is short because it has to be.


Part of a series on agentic development tooling. See also: Harness engineering: the model is just the horse · Five revisions of start-ticket · Ship-it · Composing over reimplementing