Pattern shopping

There is a genre of post that lays out a roadmap for mastering agentic design patterns. ReAct first, then Reflection, then Planning, then Tool Use, then Multi-agent. Each pattern gets a short paragraph on when to reach for it and a short paragraph on what it costs. The pieces are useful as references and I have nothing against them as references. The problem is that they read as curricula, and a pattern catalog read as a curriculum produces decisions made by vocabulary rather than by need.

The catalog reads forward

ReAct is a loop — think, act, observe, repeat. Reflection is generation, self-critique, revision. Planning is decomposing the task into ordered steps before any of them run. Tool use is calling out to a fixed catalog of external functions. Multi-agent is splitting the work across specialists under a coordinator. Each pattern is presented the same way: here is the name, here is the loop, here is when to use it, here is what it costs.

The catalog reads in that order — name, then move, then constraint — because that is the only order an enumeration can read in. The name is the index. You cannot look up "the move that fixes outputs you cannot validate inline" — you can only look up Reflection and discover it might be. The sequence is forced by the format.

Engineering runs backward

The pipeline I wrote this post in has three components. A planning phase that runs before any prose. An audit phase that reads the draft against a written critique list. A vocabulary file the draft gets swept against. The catalog has names for all three: the planning phase is Planning, the audit phase is Reflection, the vocabulary file is closer to Tool Use.

None of those pieces got chosen because they appeared in a catalog. The planning phase exists because drafts kept opening with abstraction instead of a concrete anchor. The audit phase exists because AI-tell words kept slipping through the drafting pass and I could not catch them in the same pass that wrote them. The vocabulary file exists because the same rules were drifting across two files — I named that one already, in rule drift. Each piece is the response to a specific failure, and the catalog names them only after the fact.

Engineering runs backward from the catalog because the design decision is which constraint binds, not which name applies. The name is a label on the answer, not the answer.

Pattern shopping

Three examples from teams over the last year. The criticism is aimed at the catalog, not the engineers.

Team A wrapped a Reflection loop around a model output that was supposed to be JSON in a specific schema. The loop generated the JSON, the critic checked it, and the reviser fixed it when the check failed. The loop ran in two to four seconds per call. A JSON-schema validator on the same output would have run in milliseconds and either passed or failed with a precise reason. The team could name the pattern but not the constraint Reflection was supposed to answer.
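
A minimal sketch of the kind of deterministic check that could stand in for Team A's loop. The schema (the fields `id`, `title`, `tags`) and the helper `validate_output` are hypothetical, and the hand-rolled type check is only a stand-in for a real JSON Schema validator so that the example needs nothing beyond the standard library.

```python
import json

# Hypothetical schema: required field name -> expected Python type.
REQUIRED_FIELDS = {"id": int, "title": str, "tags": list}


def validate_output(raw: str) -> list[str]:
    """Return precise failure reasons; an empty list means the output passed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg} at position {exc.pos}"]
    if not isinstance(data, dict):
        return ["top-level value must be an object"]
    errors = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in data:
            errors.append(f"missing field '{name}'")
        elif not isinstance(data[name], expected):
            errors.append(
                f"field '{name}' should be {expected.__name__}, "
                f"got {type(data[name]).__name__}"
            )
    return errors
```

One option is to call the model again only when the list is non-empty and to pass the reasons into the retry prompt, so the slow path runs only on outputs that actually failed.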

Team B reached for Planning on a task that already had a fixed sequence of steps. The plan was generated, the executor walked through it, the plan generator re-planned whenever something unexpected came back. The system worked. A deterministic pipeline of the same steps would have been faster, cheaper, and easier to debug. Planning was the right name for the move, but the move was answering a constraint that was not present.
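
For comparison, a deterministic pipeline can be as small as the sketch below. The step names and the `run_pipeline` helper are illustrative assumptions, not Team B's actual system; the point is only that a fixed sequence needs no plan generator, and a failure reports which step broke.

```python
from typing import Any, Callable

Step = Callable[[Any], Any]


class StepFailed(Exception):
    """Carries the name of the step that failed, so debugging starts there."""


def run_pipeline(steps: list[tuple[str, Step]], payload: Any) -> Any:
    """Run each named step in its fixed order, passing the result forward."""
    for name, step in steps:
        try:
            payload = step(payload)
        except Exception as exc:
            raise StepFailed(f"step '{name}' failed: {exc}") from exc
    return payload


# Hypothetical steps standing in for a fixed sequence.
STEPS = [
    ("parse", lambda text: text.split(",")),
    ("clean", lambda items: [item.strip() for item in items if item.strip()]),
    ("count", len),
]
```

Calling `run_pipeline(STEPS, "a, b, ,c")` walks the three steps in order; there is no re-planning branch to debug because the order never changes.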

Team C decomposed a single agent into five specialists because the prompt was getting long. The bug count went up — coordination is its own surface — and the latency went up because the coordinator now had to make a call before any specialist ran. The bottleneck had been prompt length, which responds to retrieval or summarisation, not to specialisation.

Pattern shopping happens because the reader picks the most recently learned move rather than the one their failure mode demands. The move and the failure detach. The system still gets built; it just gets built around a vocabulary, and the vocabulary's weight is wrong for the constraint at hand.

The ancestor test

The diagnostic is one sentence per pattern: name the pre-LLM engineering move it descends from.

ReAct is a debug loop with logging — produce an output, observe its effect, decide what to do next, repeat. Any engineer who has shipped a long-running process has written that loop, usually with print statements as the observation step. The loop body is now a language model; the loop itself is decades old. Reflection is code review with a linter — the model as both author and reviewer, the linter as the deterministic check the reviewer reaches for. Planning is task decomposition — the move every engineer makes the first time they write a one-paragraph spec before they write the code. Tool use is API integration with a fixed catalog of endpoints. Multi-agent is service decomposition; the trade-offs (coordination cost, ownership of state, routing logic) are the same ones distributed systems have always carried.
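
The first ancestor can be written down directly. This is a sketch of the debug loop with the "think" step left as a pluggable function; `react_loop`, `stub_decide`, and the step budget are hypothetical names chosen for illustration, and the stub stands in for a model call.

```python
from typing import Callable, Optional


def react_loop(
    decide: Callable[[str, list[str]], Optional[str]],
    act: Callable[[str], str],
    goal: str,
    max_steps: int = 5,
) -> list[str]:
    """Think, act, observe, repeat, until `decide` returns None or the budget runs out."""
    observations: list[str] = []
    for _ in range(max_steps):
        action = decide(goal, observations)  # "think": in an agent, the model call
        if action is None:
            break
        observation = act(action)  # "act"
        print(f"action={action!r} observation={observation!r}")  # the print-statement log
        observations.append(observation)  # "observe"
    return observations


# Stub policy in place of a model: probe twice, then stop.
def stub_decide(goal: str, observations: list[str]) -> Optional[str]:
    return None if len(observations) >= 2 else f"probe {len(observations) + 1}"
```

Swapping `stub_decide` for a function that calls a model changes the loop body and nothing else, which is the sense in which the loop itself predates the pattern.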

If the ancestor is unfamiliar, the pattern probably is too, and the catalog is doing the work of scaffolding rather than reference. Scaffolding is not a bad thing — most learning needs it — but it is a different thing, and the genre does not label it that way.

The catalog reads correctly when the reader brings the constraint to it, because then the catalog only has to supply the name. The expensive part — recognising which move applies — is already done.

The replacement

A design review template that fits on one screen. For each pattern in a proposed design, the author has to answer three questions. What failure mode does this pattern answer? What is the cheapest alternative we considered? What breaks if we remove this pattern? If the author cannot answer all three, the design goes back. The template took ten minutes to write and has caught more over-architecture than any technical-design book on my shelf.
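
The template is small enough to express as a data structure. The sketch below is one possible encoding, assuming a blank answer counts as no answer; `PatternReview` and `goes_back` are hypothetical names, while the three questions are the ones listed above.

```python
from dataclasses import dataclass

# Attribute name -> the review question it answers.
QUESTIONS = {
    "failure_mode": "What failure mode does this pattern answer?",
    "cheapest_alternative": "What is the cheapest alternative we considered?",
    "breaks_if_removed": "What breaks if we remove this pattern?",
}


@dataclass
class PatternReview:
    pattern: str
    failure_mode: str = ""
    cheapest_alternative: str = ""
    breaks_if_removed: str = ""

    def unanswered(self) -> list[str]:
        """Return the review questions this entry leaves blank."""
        return [q for attr, q in QUESTIONS.items() if not getattr(self, attr).strip()]


def goes_back(reviews: list[PatternReview]) -> dict[str, list[str]]:
    """Map each pattern that fails review to the questions it still owes."""
    return {r.pattern: r.unanswered() for r in reviews if r.unanswered()}
```

A design passes when `goes_back` returns an empty dict; anything else is the list of questions to send back with the design.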

Intake runs the same direction. The first document for any new system is a failure-mode list, not a pattern list. Each failure mode is one sentence — the model produces malformed JSON, the prompt grows beyond context limits, the user-facing latency exceeds two seconds. Patterns enter the doc second, attached to specific failure modes. A pattern with no failure attached gets cut.
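
The intake rule can be sketched the same way: failure modes are listed first, patterns may only attach to a listed failure mode, and anything unattached is cut. `IntakeDoc` and its method names are assumptions made for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class IntakeDoc:
    failure_modes: list[str]
    # pattern name -> the failure modes it is attached to
    patterns: dict[str, list[str]] = field(default_factory=dict)

    def propose(self, pattern: str) -> None:
        """Record a pattern with nothing attached yet."""
        self.patterns.setdefault(pattern, [])

    def attach(self, pattern: str, failure_mode: str) -> None:
        """Attach a pattern to a failure mode that is already on the list."""
        if failure_mode not in self.failure_modes:
            raise ValueError(f"unknown failure mode: {failure_mode!r}")
        self.patterns.setdefault(pattern, []).append(failure_mode)

    def cut_unattached(self) -> list[str]:
        """Remove and return every pattern with no failure mode attached."""
        cut = [name for name, modes in self.patterns.items() if not modes]
        for name in cut:
            del self.patterns[name]
        return cut
```

Raising on an unknown failure mode is a deliberate choice in this sketch: it forces the failure to be written down before a pattern can claim to answer it.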

Hiring runs the same logic. A standard question I now ask in technical interviews: walk me through a system you have shipped, and for each architectural choice — not just the obvious patterns, all of it — name the constraint that drove it. A candidate who pattern-shops will lead with the names they used; one who has thought backwards will lead with the constraints those names answered, and that difference is audible inside twenty minutes. It is a more reliable hiring signal than any algorithms round I have run.

A constraint-first design review surfaces over-architecture before it ships because each pattern has to defend its place rather than appear by default. The same logic carries through intake and hiring. The CTO seat acts on three levers — approval, intake, hiring — and pulling them in the same direction is what stops the team pattern-shopping.

When the shelves are empty

An engineer on my team joined six months ago. Three years into their career, all of it building on top of LLMs and managed APIs — no service decomposition shipped, no debug loop written in print statements, no integration tests against an unreliable third-party API. Smart, ships, fast. The ancestor test points at empty shelves for them.

The scaffolding is not the catalog and not the pre-LLM history. It is a curated wiki of the failures the team has had, each entry organised by the constraint it taught — JSON output versus Reflection, fixed pipelines versus Planning, prompt length versus specialisation, and the rest as they accrete. Each entry carries the constraint, the move the team almost made (or made and recovered from), and the cheapest right move once the constraint was named. New engineers read the wiki first, the pattern catalog second.
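
As a rough illustration of the entry shape, the three fields can be modelled as a record keyed by constraint. The `WikiEntry` type and `lookup` helper are hypothetical; the three sample entries restate the Team A, B, and C cases from earlier in this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WikiEntry:
    constraint: str     # the constraint the failure taught
    near_miss: str      # the move the team almost made, or made and recovered from
    cheapest_move: str  # the cheapest right move once the constraint was named


WIKI = [
    WikiEntry("output must match a JSON schema", "Reflection loop", "JSON-schema validator"),
    WikiEntry("steps are already a fixed sequence", "Planning", "deterministic pipeline"),
    WikiEntry("prompt is getting too long", "five specialist agents", "retrieval or summarisation"),
]


def lookup(keyword: str) -> list[WikiEntry]:
    """Find entries by constraint, not by pattern name."""
    keyword = keyword.lower()
    return [entry for entry in WIKI if keyword in entry.constraint.lower()]
```

The search runs over the `constraint` field only, so a reader has to arrive with a constraint in hand. A lookup that returns nothing marks a constraint the wiki has not yet named.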

A wiki of past failures works as scaffolding because it gives the engineer the constraint side of every pattern before they have earned it through experience, so the names in the catalog have somewhere to attach. The catalog comes out only after the wiki has primed the constraint half. The reading order matches engineering's order.

The wiki has a limit. It is reactive. It only covers failures the team has already named, which means the engineer reading it will still pattern-shop on any constraint the wiki does not anticipate. The wiki is a stopgap that fills in until the team grows an engineer who has the constraint side already, and stops being load-bearing the moment that engineer is in the room.

The catalog itself is fine. Presenting it as a curriculum is the move I am pushing back on. The design review template and the wiki of past failures between them give a team a way to read the catalog backward: pattern shopping gets harder when the design review will not approve a pattern without a failure attached, and newer engineers get the constraint side they have not lived through. The harder problem is the failure modes the wiki does not yet name. The wiki catches up by waiting for the loss, and the next pattern-shopping incident on a constraint nobody has spoken about will look indistinguishable from the rest until the loss has a name.