Rule drift

Addy Osmani has a clean line about agent harnesses. Every mistake becomes a rule. The harness ratchets toward the behaviour you want, one failure at a time, and the rulebook only grows. The framing is right and his post is worth reading. It is also half the move. The other half — the one that keeps the rulebook from collapsing under its own weight — is consolidation, and most write-ups about harness engineering skip it.

What the ratchet gives you

Take a code-review harness with a smells list — the patterns the reviewer agent should flag on any PR. Say it has nine entries. Bare except: is on it because one past review approved a bare except and the bug it hid surfaced in prod two weeks later. SELECT * is on it because a query that looked harmless against a twenty-row table table-scanned a million-row one in staging. Each entry came from a specific review that should not have shipped, and the post-mortem on that review is the reason the rule exists.

A rule earned through a real failure is hard to argue away, because the cost of the failure is the only argument the rule needs. The list does not grow by taste. It grows by evidence. The bar for adding a pattern is one review the harness produced that should not have shipped. Anything weaker is opinion, and opinion is the wrong currency for a rulebook the system reads on every run.

That is what Osmani is naming when he calls it a ratchet. The constraint only moves in one direction. Each new entry marks a mistake the harness should not repeat, because the rule that prevents it is in the rulebook now, and every future review is written with it in scope.

Where the ratchet drifts

Imagine the same harness has two smells lists. One lives in reviewer.md, under the Rules section that the reviewer step reads while drafting comments on a PR. The other lives in final-pass.md, as the sweep the audit step runs against the finished review before it posts. The lists are almost identical. They are edited as separate documents. By the time anyone notices, one has picked up TODO without a ticket reference and the other has not.

A rule that lives in two harness files drifts because each file is edited independently and no compiler checks prose for consistency. When a new smell shows up in a missed review, the next edit lands in whichever file is open. The other file goes one more cycle without the rule. The reviewer step now knows to flag a pattern the audit step will not catch. The harness is silently inconsistent with itself, and the silence is the part that matters — there is no error, no log line, no exception. Just two pieces of prose that have started disagreeing.

Markdown has no type checker. Nothing in the file system tells you that the smells list in one file is a superset of the smells list in another, or that the blocker-class list in one file has acquired an entry the other has not. The contract between the two files exists only in the operator's memory of having written them, and memory is the wrong layer for a contract to live in.
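Nothing stops you from writing the missing check yourself. The sketch below is one way to do it, not something the harness described here ships with. It assumes each file keeps its smells as a bullet list under a heading named Smells; that layout, the function names, and the sample file contents are all hypothetical. It compares items literally, so it catches a missing entry but not the same rule phrased differently.

```python
import re

# Assumed layout: rules are "-" or "*" bullets under a markdown heading.
BULLET = re.compile(r"^[-*]\s+(.*)$")


def extract_list(markdown: str, heading: str) -> set[str]:
    """Return the bullet items under `heading`, lower-cased for comparison."""
    items: set[str] = set()
    in_section = False
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            # Any heading either opens the section we want or closes it.
            in_section = stripped.lstrip("#").strip().lower() == heading.lower()
            continue
        match = BULLET.match(stripped)
        if in_section and match:
            items.add(match.group(1).strip().lower())
    return items


def drift(a: str, b: str, heading: str) -> tuple[set[str], set[str]]:
    """Return (items only in a, items only in b) for the named list."""
    left = extract_list(a, heading)
    right = extract_list(b, heading)
    return left - right, right - left


# Hypothetical contents standing in for reviewer.md and final-pass.md.
reviewer_md = """# Rules
## Smells
- bare except:
- SELECT *
- TODO without a ticket reference
"""

final_pass_md = """## Smells
- bare except:
- SELECT *
"""

only_reviewer, only_audit = drift(reviewer_md, final_pass_md, "Smells")
print("only in reviewer.md:", sorted(only_reviewer))
print("only in final-pass.md:", sorted(only_audit))
```

A check like this turns the silent disagreement into a visible one, but it still leaves two lists to maintain.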

The second move

The fix is a structural refactor of the harness, not an additive one. Extract every review rule — tone, smells to flag, blocker-class issues, what the reviewer skips, sign-off — into a new file called review-rules.md. The Rules section disappears from reviewer.md. The smells and blocker-class sweeps disappear from final-pass.md. Each is replaced with a single line pointing at review-rules.md. A grep for SELECT * across the harness now returns one file.

Rules that live in one canonical file cannot drift, because there is only one place to edit and every consumer reads the same version. Consolidation removes the surface area drift needs to occur on. The next time a review surfaces a new smell, there is exactly one place to add it. The reviewer step picks up the new line on its next run. So does the audit step. The lists cannot disagree, because there is only one list.
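How the pointer gets followed depends on the harness. An agent may simply be told to read review-rules.md. Where the harness assembles prompts itself, a minimal sketch could look like the following. The `@include` directive, the `resolve` function, and the file contents are assumptions made for illustration, not a convention taken from Osmani's post or from any particular tool.

```python
import re

# Hypothetical pointer syntax: a line consisting of "@include <file>".
INCLUDE = re.compile(r"^@include\s+(\S+)\s*$")


def resolve(name: str, files: dict[str, str],
            seen: frozenset[str] = frozenset()) -> str:
    """Return the text of `name` with every pointer line expanded in place."""
    if name in seen:
        raise ValueError(f"circular include: {name}")
    out: list[str] = []
    for line in files[name].splitlines():
        match = INCLUDE.match(line)
        if match:
            # Follow the pointer and inline the canonical text.
            out.append(resolve(match.group(1), files, seen | {name}))
        else:
            out.append(line)
    return "\n".join(out)


# In-memory stand-ins for the three harness files.
harness = {
    "review-rules.md": "- bare except:\n- SELECT *",
    "reviewer.md": "Draft comments on the PR.\n@include review-rules.md",
    "final-pass.md": "Audit the finished review.\n@include review-rules.md",
}

reviewer_prompt = resolve("reviewer.md", harness)
audit_prompt = resolve("final-pass.md", harness)
print(reviewer_prompt)
```

Both steps end up reading the same rule text, while the rule itself is written down in only one of the three files.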

The cost is one indirection on the reading side — reviewer.md and final-pass.md now follow a pointer instead of inlining the rule. The benefit is that the rule has one home, and the next edit cannot accidentally fork it into two slightly different rules. That is the move the ratchet framing leaves out. Accrete, then put the accretion somewhere it cannot fork.

What this means for the rulebook you already have

Osmani cites HumanLayer's discipline of keeping AGENTS.md to about sixty lines. The reasoning is that long rulebooks dilute the individual rules — the model treats line 41 with less weight than line 4, and an over-long list trains the system to skim past most of it. The advice is good. It is also the accretion half.

The length signal is real, but length is not the underlying problem — distribution is. A 60-line AGENTS.md that is also repeating four of its rules inside a hook script, three more inside a subagent system prompt, and one more inside a tool description is already drifting, because each copy was edited on a different day and each is one paragraph off from the others by now. The line count looks healthy and the rulebook is still incoherent.

The consolidation move starts with grep. For each rule in AGENTS.md, search the rest of the harness (hooks, subagent prompts, tool descriptions, audit checklists, any markdown file the pipeline reads) for the rule's key terms, then read the hits for the same idea phrased differently. If the rule also exists somewhere else, neither location is canonical, and the next edit will pick one of them by accident. Pick a file, leave the rule there, and replace every other copy with a pointer. The cost is about a minute of grep per rule. The benefit is that the next edit cannot fork the rulebook.
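The per-rule search can be scripted. The sketch below is an assumption-laden illustration: the file names other than AGENTS.md, the key phrases, and the `find_copies` helper are all hypothetical. It does a case-insensitive substring match, so it finds literal copies only. A rule that has been rephrased still needs a human to read the hits.

```python
def find_copies(phrases: list[str], files: dict[str, str],
                canonical: str) -> dict[str, list[str]]:
    """Map each key phrase to the non-canonical files that also mention it."""
    copies: dict[str, list[str]] = {}
    for phrase in phrases:
        needle = phrase.lower()
        hits = sorted(
            name for name, text in files.items()
            if name != canonical and needle in text.lower()
        )
        if hits:
            copies[phrase] = hits
    return copies


# Hypothetical harness contents, keyed by path.
harness_files = {
    "AGENTS.md": "Never approve a bare except:.\nFlag SELECT * in queries.\n",
    "hooks/pre-review.sh": "# reject diffs that add SELECT * to a query\n",
    "prompts/subagent.md": "Flag any bare except: you see.\n",
    "tools/search.md": "Searches the repository.\n",
}

duplicates = find_copies(["bare except:", "SELECT *"], harness_files, "AGENTS.md")
for phrase, names in duplicates.items():
    print(f"{phrase!r} also appears in: {', '.join(names)}")
```

To run it against a real checkout, the dictionary could be built by reading every file under the harness directory with `pathlib.Path.rglob`. Each reported file is a candidate for replacing the copy with a pointer.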

Osmani's ratchet is right; every mistake should become a rule. The post just stops one beat early. A harness that only accretes is a harness that drifts, and the drift is invisible until two files disagree about the same word. The full move is two beats — accrete, then canonicalise. The first beat is where the rules come from. The second is what keeps them meaning the same thing.