← Back to The book shelf
#010 v9

Disposable by default

/5 min read

Andrej Karpathy's line is that vibe coding raises the floor and agentic engineering raises the ceiling. Two recent pieces run with it — one from MindStudio walking through the framework, one from Fan Wu on Design Bootcamp mapping it onto product work — and both tell it as a story about people. Amateurs vibe their way to a working demo; professionals orchestrate agents and check the output; the arc from one to the other is a ladder you climb. The metaphor is clean, and I think it points at the wrong variable. What decides which mode you should be in isn't who you are. It's whether the code you're about to generate has to survive.

The ladder that isn't one

Fan Wu's piece stacks the work in three layers: strategic product thinking on top, agentic systems in the middle, vibe practice at the bottom — the why, the what, the how. It reads as a climb, bottom to top. Karpathy's floor-and-ceiling image does the same work: the floor is where beginners stand, the ceiling is what experts reach for. Both describe a person moving up and staying up.

When these two pieces first crossed my feed I read them the same way. What changed my mind was a smaller observation: the same engineer drops from the ceiling to the floor and back inside a single afternoon. The ladder framing breaks because the mode you pick changes far faster than your skill does. Skill only moves one direction, and slowly — you are not less experienced after lunch than before it. But the choice between vibing something out and verifying it line by line flips a dozen times a day, which means it tracks something that changes minute to minute. Skill isn't it.

What actually changes is the artifact

The MindStudio piece points to Peter Steinberger running dozens of agents in parallel and checking every output before it lands. The easy reading is that this is just what professionals do — verification as a mark of seniority. But look at what he's actually checking: outputs headed into a codebase that teammates, and his own later self, will build on top of. He isn't verifying because he's senior. He's verifying because the output crosses a line where someone downstream has to trust it without re-deriving it.

Verification is only worth its cost once an output has to be trusted by a second reader, because that is the only time being wrong is expensive. Below that line, a wrong output costs you the thirty seconds it takes to throw it away. Above it, a wrong output costs whoever inherits it — and they inherit it without the context that would let them catch the error cheaply. Call the line the survival boundary. Under it sit the disposable things: the throwaway script, the one-off data transform, the prototype you demo once and close. Over it, the output has a second reader — a teammate, production, the next agent that loads your code as context. Verification is the toll for crossing.

Why the label hides the real skill

Karpathy's own late-2025 note is that the models got reliable enough that he mostly stopped correcting them. In nearly the same breath comes the example everyone repeats: a frontier model that refactors a sprawling codebase cleanly and then, asked something about the physical world, recommends walking to a car wash. The capability is spiky. Reliability is high on average and unpredictable in any single spot, so the decision to verify can't be set once and worn like a title — it has to be made fresh against this output and whatever depends on it.

Treating the mode as identity fails in two directions, because the label answers "who am I" when the artifact is asking "will anyone trust this later." The engineer who has decided he's an agentic engineer now reviews a five-line script he'll run twice and delete, paying a toll at a boundary he never crossed. The engineer who has decided vibe coding is fine ships the demo that quietly becomes the backend. Both got the question wrong from opposite ends, and in each case the miss wasn't about skill — it was a misread of how long the thing they made was going to live.

The question to ask instead

Fan Wu assigns each model a role — ChatGPT as thinking partner, Claude as product partner, Gemini as the executor — and frames it as how he orchestrates. MindStudio leans the other way, folding database, auth, and payments behind the agent so the orchestration disappears, while a tool like Remy generates a full app from an annotated markdown spec. Between them, the floor and the ceiling stop being separate rooms. When the platform handles the plumbing and the model writes the app, "what kind of engineer am I" gets harder to even answer. The one question that survives the tooling is whether the thing you just generated has to survive too.

So ask it directly, per artifact, before you accept or check an output: who is the second reader? If you can't name one — no teammate, no production path, no future agent that will read this as context — the output is disposable, and verifying it is wasted motion. If you can name one, it crossed the boundary, and you owe it the check. The test is fast enough to run in your head on every generation, which is the whole point: the check has to cost less than what it's checking, or it isn't worth running. This only bites if you've watched a prototype get promoted to production without a rewrite; if you haven't yet, the tell is the demo nobody on the team is willing to delete.

The vibe-versus-agentic question gets sold as a fork in your career — pick a side, grow up, become the engineer who verifies. It's smaller and more constant than that: a call you make dozens of times a day about a single artifact, answered "disposable" most of the time and "this one has to survive" the rest. The case I haven't worked out is the artifact that changes its answer after you've decided — the script you correctly called disposable on Monday that someone quietly builds on by Friday, the boundary sliding under code you've long stopped checking. Nobody re-runs the decision, because nothing tells them it moved.

Share: