
Judgement-shaped problems


"Agent" has become a label for anything with an LLM in it. Some of the systems wearing the label do work that integrations could not reach. Most do not. The difference is not what is in the box — it is whether the problem itself is judgement-shaped, and whether the if/else has actually moved out of the code into the model's inference. If neither is true, the system is an integration in costume.

What integrations were built for

Take the simplest possible automation: an invoice arrives, a notification posts to Slack, the sheet updates. Most automation looks like that. There is a known input, a known output, and the work is wiring the two together with as little surprise as possible. Payment integrations, ETL pipelines, scheduled syncs, webhooks routing events between systems — the shape integrations were built for is "I have a known input and a known output, and I want them connected reliably."
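
In code, that shape is almost nothing. A minimal sketch, with print stand-ins for the real Slack and spreadsheet calls; every name in it is invented:

```python
# The integration shape: known input, known output, deterministic wiring.
# All helpers are illustrative stubs, not a real vendor API.

def parse_invoice(payload: dict) -> dict:
    # Known input schema: a missing field fails loudly, right here.
    return {"id": payload["id"], "vendor": payload["vendor"], "total": payload["total"]}

def post_to_slack(text: str) -> None:
    print(f"[slack] {text}")  # stand-in for a webhook POST

def append_sheet_row(row: list) -> None:
    print(f"[sheet] {row}")   # stand-in for a spreadsheet API call

def handle_invoice(event: dict) -> None:
    invoice = parse_invoice(event["payload"])
    post_to_slack(f"Invoice {invoice['id']} from {invoice['vendor']}: {invoice['total']}")
    append_sheet_row([invoice["id"], invoice["vendor"], invoice["total"]])

handle_invoice({"payload": {"id": "INV-17", "vendor": "Acme", "total": "120.00"}})
```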

Integrations encode the input space at design time. Every input class is a branch I write. Every branch is code I maintain. As long as my branches cover the inputs that arrive, the system runs cleanly.

The trouble starts when the real input space is open. Free-form text, mixed-mode requests, exceptions I did not enumerate when I shipped. Each new shape is a new branch, and the branches start interacting: every pair of branches is a potential conflict to test, so surface area grows quadratically with input variance, and the integration architecture — which scaled gracefully under enumerable inputs — becomes a maintenance liability under non-enumerable ones. Not because the architecture is wrong. Because the problem changed shape underneath it.

What "agent" means when the word means something

A useful definition of "agent" should make it clear which systems are doing something integrations could not. Mine is narrow on purpose: an agent is a system that has moved the input → output decision from my code to the model's inference. The if/else does not vanish. It gets relocated. Where the integration's branches live in source I write, the agent's branches live in the prompt I send and the reasoning the model does at runtime.

This is an architectural change, not a marketing one. Relocating the if/else changes who maintains the input space — me, ahead of time, or the model, in the moment. The cost shifts from code I must write to inference I must trust.

Take support-email triage. The integration version classifies by keywords or by a per-intent classifier. It works for the cases I anticipated, and fails on the ones I did not — the customer who is angry but polite, the bug report disguised as a feature request, the urgent thread buried in pleasantries. Each new failure mode is a new branch. The branches multiply. The system grows brittle.
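
Sketched, the integration version looks like this. Every clause is a case I anticipated; the keywords and categories are invented for illustration:

```python
def triage(email: str) -> str:
    text = email.lower()
    if "refund" in text or "charge" in text:
        return "billing"
    if "cancel" in text:
        return "retention"        # "please do not cancel" lands here too
    if "bug" in text or "crash" in text:
        return "engineering"
    # Angry-but-polite, the bug disguised as a feature request, urgency
    # buried in pleasantries: none of these match a branch.
    return "manual_review"
```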

The agent version reads the email and decides what to do with it. The decision logic is no longer in my code; it is in the model's interpretation of the prompt and the email. I have not removed the if/else. I have moved it to a place where new shapes do not require new code.
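
A sketch of the relocation. `complete` is a placeholder for whatever model client is in play, not a real API, and the prompt is illustrative:

```python
TRIAGE_PROMPT = """You are triaging a support email. Reply with exactly one
word: billing, retention, engineering, or manual_review. Weigh tone and
intent, not just keywords.

Email:
{email}"""

def complete(prompt: str) -> str:
    raise NotImplementedError("wire up your model client here")

def triage(email: str) -> str:
    # The decision lives in the model's reading of the prompt plus the
    # email, not in branches I wrote. New input shapes need no new code.
    label = complete(TRIAGE_PROMPT.format(email=email)).strip().lower()
    return label if label in {"billing", "retention", "engineering"} else "manual_review"
```

The last line is the part worth keeping even in a sketch: the decision has moved, but the code still owns the contract on what comes back.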

That gives me a diagnostic. Can I write the flowchart? If yes, the input is integration-shaped, and an agent is overkill. If every attempt hits "and 200 edge cases," the input is judgement-shaped, and the relocation is what lets the problem be answered at all.

The label is uncalibrated

Most systems calling themselves agents are not. They are integrations with a model call somewhere in the middle. The shape is familiar: read input, ask an LLM to extract structured fields, run a deterministic flow on what the LLM returned. The if/else still lives in my code. The model is doing fuzzy parsing, not judgement.
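
The shape is easy to recognise in code. A sketch, reusing the `complete` placeholder from the previous one; the prompt, field names, and queues are invented:

```python
import json

EXTRACT_PROMPT = """Extract JSON with keys "intent" (refund|cancel|bug|other)
and "urgency" (low|high) from this email:

{email}"""

def handle(email: str) -> str:
    # `complete` is the same placeholder model call as before. The model
    # does fuzzy parsing: messy text in, structured fields out.
    fields = json.loads(complete(EXTRACT_PROMPT.format(email=email)))
    # The if/else still lives here, in code I wrote. The flow is fixed;
    # the model only filled in the fields.
    if fields["intent"] == "refund":
        return "billing_queue"
    if fields["urgency"] == "high":
        return "oncall_queue"
    return "default_queue"
```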

I have a name for this pattern: model-assisted integration. It is not bad architecture. For many problems it is the right architecture — fuzzy parsing is a real capability when the input is structured-but-messy. The error is not in building one. The error is in calling it an agent and inheriting the runtime properties of one in marketing copy without inheriting them in code.

Most production "agents" are this shape because the label gets applied for marketing rather than for architecture, and there is no standard yet to push back. The presence of an LLM call is not the test. The test is where the decision lives. If the flow is fixed and the LLM is filling in fields, the system is a model-assisted integration. If the flow is decided by the model at every step, the system is doing what the word "agent" should mean. Most things in production are the first one wearing the second one's clothes.

The failure mode trade-off

Even when the input is genuinely judgement-shaped, an agent is not always the right answer. The relocation comes with a permanent cost: how the system fails.

Integrations fail loud. A schema does not match. A field is missing. A request times out. The error is visible at the boundary between systems, easy to log, easy to alert on. Whatever the bug is, I see it, and I can fix it.

Agents fail soft. The output is plausibly wrong — confident, well-formed, in the right shape — and it passes through the rest of the system as if it were correct. There may be no exception, no log line, no alert. The error becomes visible only downstream, when something acts on the wrong output. Sometimes I never see it.

This is not a bug in any particular agent. It is a runtime property. Agents fail soft because inference is non-deterministic by construction; the output is sampled from a distribution that includes plausibly-wrong answers. Better models reduce the distribution's tail. They do not eliminate it.
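
The contrast, compressed into two invented functions:

```python
def integration_route(payload: dict) -> str:
    # Fails loud: a missing field raises KeyError right here, at the
    # boundary, where it can be logged and alerted on.
    return payload["routing_code"]

def agent_route(email: str) -> str:
    # Fails soft: suppose the model read an engineering bug report and
    # returned this. Well-formed, confident, wrong, and indistinguishable
    # downstream from a correct answer. No exception, no log line.
    return "billing"
```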

That makes the choice between integration and agent partly a choice of which failure mode I can afford. A financial transaction routing system cannot tolerate plausibly-wrong outputs; integration is the only honest answer there, regardless of how judgement-shaped the input feels. A content-tagging system can tolerate the occasional miscategorisation; soft failure is a cost the system can absorb. The shape of the input matters; the cost of soft failure matters more.

Where agents earn their cost

Three shapes of problem reliably reward the relocation, because each satisfies all three conditions at once: the input space is open, soft failure is acceptable, and inference is cheaper than the alternative.

Under-determined intent extraction. Support-email triage, customer-feedback classification, free-form ticket routing. Input cannot be enumerated; output is one of a manageable number of buckets; soft failure is annoying but recoverable.

Cross-domain reasoning. Legal documents to action items. Procurement notices to compliance checks. Call transcripts to CRM updates. Input format varies; target schema varies; the mapping requires interpretation no fixed code path can carry.

Generative work. CMS content drafts, marketing copy, product descriptions. The output does not exist before the system runs; there is no input to map to — only an input to think about.

I have shipped systems in each of these shapes, and the diagnostic from earlier is what I run before I pick a stack. Can I write the flowchart? If yes, ship the flowchart. If every attempt hits "and then it depends," check whether the failure mode I would inherit is one I can afford. If both answers point to an agent, the relocation pays. The tedium of maintaining an integration is recoverable. Plausibly-wrong output that nobody catches is not.
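
Compressed, the diagnostic is itself a flowchart, which is rather the point: the architecture decision stays enumerable even when the input does not.

```python
def pick_architecture(can_write_flowchart: bool, can_afford_soft_failure: bool) -> str:
    if can_write_flowchart:
        return "integration"
    if can_afford_soft_failure:
        return "agent"
    return "integration, and narrow the problem until the flowchart exists"
```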

Most production systems wearing the agent label sit between those two — integrations in costume, where the if/else still lives in the code. The label will keep moving until something forces it not to. Until then, the question to ask is shape, not name.
