Why Bad Code Is the Most Expensive It Has Ever Been

Matt Pocock — "Software Fundamentals Matter More Than Ever" — Conference Talk, v4F1gFy-hqg

Matt Pocock opened his talk with a provocation: the fear that engineering skills are becoming obsolete is exactly backwards. Software fundamentals matter more now than they ever have, because the better your codebase, the more value AI can extract from it. And the worse it gets, the faster the compounding cost of entropy eats your velocity.

Pocock arrived at this conviction the hard way. He had been experimenting with the "specs to code" movement — the idea that you write a specification, feed it to an LLM, and ignore the generated code. If something breaks, you patch the spec and recompile. He tried it. The first run produced passable output. The second produced worse code. The third was worse still. By the fourth iteration, he had garbage. The reason, he realized, is software entropy: every change to a codebase made without thinking about the whole design makes the system harder to understand and modify. The "specs to code" loop was not producing code. It was accelerating decay.

Code is not cheap. In fact, bad code is the most expensive it has ever been.

The core thesis is simple: AI thrives in a healthy codebase and flounders in a sick one. The bounty AI offers — speed, parallelization, AFK agents — is only accessible if the underlying system is easy to change. This reframes every classic software principle not as nostalgia, but as a direct multiplier on AI effectiveness.

Failure Mode: The AI Builds the Wrong Thing

The first breakdown Pocock diagnosed is familiar: you have an idea in your head, and the AI builds something totally different. The root cause is that you and the LLM do not share a design concept, the ephemeral, invisible theory of what you are building. This is Frederick Brooks's insight from The Design of Design: when multiple people (or agents) collaborate on a system, there must be a shared, unwritten understanding floating between them.

Pocock's fix is brutally effective. He created a Claude skill called GRILL ME with a single instruction: "Interview me relentlessly about every aspect of this plan until we reach a shared understanding. Walk down each branch of the design tree, resolving dependencies between decisions one by one." The result? The AI fires 40, 60, sometimes 100 questions at the human before it is satisfied. The conversation then becomes a product requirements document, or for small changes, directly becomes issues that an AFK agent can pick up.

Pocock argues this is better than Claude Code's default plan mode, which is "extremely eager to create an asset." The AI wants to start working. The human needs to reach alignment first.

Failure Mode: The AI Speaks a Different Language

The second failure mode is miscommunication that shows up as verbosity. The AI uses too many words, talks past you, and produces implementations that drift from your intent. Pocock recognized this instantly: it is the classic gap between developers and domain experts. The fix, borrowed from Domain-Driven Design, is a ubiquitous language.

He built another skill that scans the codebase, extracts terminology, and outputs a markdown file of shared definitions: tables listing every term the codebase and the human agree on. He keeps this file open while grilling the AI and while planning. The effect is striking. Reading the AI's thinking traces, he noticed it plans more precisely and implements more concisely when grounded in a shared vocabulary. The codebase and the conversation converge.
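To make the glossary idea concrete, here is a minimal sketch of rendering shared terms as a markdown table. The `Term` shape, the `renderGlossary` function, and the sample entries are all assumptions for illustration; the talk does not describe the skill's actual schema or output format.

```typescript
// Hypothetical shape for one glossary entry; the talk does not specify a schema.
interface Term {
  name: string;
  definition: string;
  aliasesToAvoid: string[];
}

// Render the glossary as a markdown table, sorted by name so diffs stay stable.
function renderGlossary(terms: Term[]): string {
  const header = "| Term | Definition | Avoid |\n| --- | --- | --- |";
  const rows = [...terms]
    .sort((a, b) => a.name.localeCompare(b.name))
    .map((t) => `| ${t.name} | ${t.definition} | ${t.aliasesToAvoid.join(", ")} |`);
  return [header, ...rows].join("\n");
}

// Illustrative entries only; a real glossary would come from the codebase.
const glossary: Term[] = [
  { name: "Workspace", definition: "A billing boundary that owns projects.", aliasesToAvoid: ["org", "team"] },
  { name: "Project", definition: "A single repository under test.", aliasesToAvoid: ["app"] },
];

const glossaryMarkdown = renderGlossary(glossary);
```

Listing the aliases to avoid next to each term is one possible design choice: it gives both the human and the AI a single place to check which word to use.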

Failure Mode: It Works, Then It Doesn't

The third breakdown: after alignment, the AI builds the right thing, but the code doesn't work. The obvious answer is feedback loops: static types, browser access for frontends, automated tests. But Pocock noticed that even with these guardrails, the LLM "outruns its headlights." It writes huge blocks of code and only then thinks to type-check or test. The Pragmatic Programmer states the rule it is breaking: the rate of feedback is your speed limit. The AI ignores it by default.

The forcing function is TDD. Write the test first. Make it pass. Refactor. This forces the LLM into small, deliberate steps. But testing is hard, and it is especially hard when the codebase is poorly structured. Good codebases are easy to test. And the shape of a testable codebase, according to John Ousterhout's A Philosophy of Software Design, is one full of deep modules.
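The test-first loop can be sketched in a few lines. This is an illustrative example rather than anything shown in the talk: `slugify` and the `expectEqual` helper are hypothetical names, and the helper stands in for a real test framework so the sketch has no dependencies.

```typescript
// Minimal assertion helper so the sketch needs no test framework.
function expectEqual<T>(actual: T, expected: T, label: string): void {
  if (actual !== expected) {
    throw new Error(`${label}: expected ${String(expected)}, got ${String(actual)}`);
  }
}

// Step 1 (red): the test is written first and states the behavior we want.
function testSlugify(): void {
  expectEqual(slugify("Hello World"), "hello-world", "spaces become hyphens");
  expectEqual(slugify("  Deep   Modules "), "deep-modules", "whitespace is collapsed");
}

// Step 2 (green): the smallest implementation that makes the test pass.
function slugify(title: string): string {
  return title.trim().toLowerCase().split(/\s+/).join("-");
}

// Step 3: run the test after every small change, before writing anything else.
testSlugify();
```

The point of the ordering is the size of each step: one expectation, one small change, one run of the test.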

Property	Deep Modules	Shallow Modules
Functionality	Lots hidden behind a simple interface	Little functionality exposed through a complex interface
AI Navigation	AI understands boundaries quickly; tests at the interface	AI gets lost in a maze of tiny blobs; can't find dependencies
Human Load	Designer owns the interface; delegates implementation	Must review every implementation detail; cognitive overload

Pocock's visual was memorable. A shallow-module codebase is a field of tiny blobs the AI must walk through. It attempts to explore, gets lost, and makes mistakes. A deep-module codebase is the same code organized behind clear boundaries with simple interfaces on top. The AI tests at the boundary. You design the interface. The AI handles the interior.
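A small sketch of what a deep module might look like in practice, under stated assumptions: the pricing domain, the `SAVE10` coupon, the flat 20% tax rate, and every function name are invented for illustration. The public surface is one type and one function; the helpers below it are the interior that a caller, human or AI, never needs to read.

```typescript
// Public surface: in a real module, only LineItem and orderTotalCents would be exported.
interface LineItem {
  sku: string;
  unitPriceCents: number;
  quantity: number;
}

const TAX_PERCENT = 20; // assumed flat rate, for illustration only

function orderTotalCents(items: LineItem[], couponCode?: string): number {
  const subtotal = sumLines(items);
  const discounted = applyCoupon(subtotal, couponCode);
  return addTax(discounted);
}

// Interior: free to change as long as orderTotalCents keeps its behavior.
function sumLines(items: LineItem[]): number {
  return items.reduce((sum, item) => sum + item.unitPriceCents * item.quantity, 0);
}

function applyCoupon(subtotalCents: number, couponCode?: string): number {
  const percentOff = couponCode === "SAVE10" ? 10 : 0;
  return subtotalCents - Math.round((subtotalCents * percentOff) / 100);
}

function addTax(amountCents: number): number {
  return amountCents + Math.round((amountCents * TAX_PERCENT) / 100);
}

// A test at the boundary checks behavior without touching the helpers.
const total = orderTotalCents([{ sku: "book", unitPriceCents: 1000, quantity: 2 }], "SAVE10");
```

A test written against `orderTotalCents` keeps passing if the helpers are renamed, merged, or rewritten, which is what lets the interior be delegated.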

Design the Interface, Delegate the Implementation

This becomes a survival strategy for the human developer. With AI shipping more code than ever, your brain cannot keep up reviewing every line. Deep modules let you treat interiors as gray boxes. You verify behavior at the interface, not the implementation. Pocock warns this only works for non-critical modules, but for the majority of an application, it is transformative.

Crucially, this requires you to know your module map intimately. It must become part of your ubiquitous language and your planning skills. In his PRDs, Pocock is explicit about which modules change and how their interfaces evolve. This is Kent Beck's advice: invest in the design of the system every day. Specs-to-code fails because it divests from design. AI-assisted engineering works because it reinvests.

The Strategic Role of the Human

Pocock's closing frame is the most powerful. Think of AI as a brilliant tactical sergeant on the ground: executing changes, passing tests, merging code. You are the strategic general above. You need a design concept, a shared language, deep module boundaries, and a practiced hand at interface design. Those are not new skills. They are the same software fundamentals engineers have been building for decades. The difference is that in the AI age, they are no longer just good practice. They are rate-limiting factors on your entire team's velocity.

As code generation becomes commoditized, the chokepoint shifts from typing speed to what to build and how to structure it. The engineers who will thrive are not the ones who prompt best. They are the ones who can look at a codebase and know where the deep modules should go.

Why This Matters for Diffie

For Diffie — an AI-native browser testing tool — Pocock's framework maps directly onto the product. Browser testing is notorious for shallow-module sprawl: endless page objects, flaky selectors, brittle assertion layers. AI agents thrown at that mess will accelerate entropy, not solve it. The winning move is to treat Diffie's own codebase and the test suites it generates as a system of deep modules.

Your ideal customer profile (ICP), frontend engineers at fast-moving startups, is drowning in shallow test suites. They do not need more tests. They need a testable architecture with clear boundaries: the page interface, the action interface, the assertion interface. If Diffie can generate deep-module test structures instead of shallow, scattered scripts, it becomes a force multiplier in exactly the way Pocock describes.
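One way those three boundaries could look, sketched with an in-memory stand-in for the browser so the example runs on its own. Every name here (`PageInterface`, `ActionInterface`, `AssertionInterface`, `FakePage`, the `currentUser` field) is hypothetical and is not Diffie's API; a real page implementation would drive a browser instead of a map.

```typescript
// All names are hypothetical illustrations of the three boundaries.
interface PageInterface {
  read(field: string): string;
  write(field: string, value: string): void;
}

interface ActionInterface {
  logIn(email: string, password: string): void;
}

interface AssertionInterface {
  expectLoggedInAs(email: string): void;
}

// In-memory stand-in for a browser page, so the sketch has no dependencies.
class FakePage implements PageInterface {
  private fields = new Map<string, string>();

  read(field: string): string {
    const value = this.fields.get(field);
    return value === undefined ? "" : value;
  }

  write(field: string, value: string): void {
    this.fields.set(field, value);
  }
}

// Actions hide selectors and sequencing behind intent-level methods.
function createActions(page: PageInterface): ActionInterface {
  return {
    logIn(email: string, password: string): void {
      page.write("email", email);
      page.write("password", password);
      // Simplification: a real implementation would submit the form and wait.
      page.write("currentUser", password.length > 0 ? email : "");
    },
  };
}

// Assertions hide how the page state is inspected.
function createAssertions(page: PageInterface): AssertionInterface {
  return {
    expectLoggedInAs(email: string): void {
      const actual = page.read("currentUser");
      if (actual !== email) {
        throw new Error(`expected to be logged in as ${email}, got "${actual}"`);
      }
    },
  };
}

// The test itself reads as intent and never mentions a selector.
const page = new FakePage();
const actions = createActions(page);
const assertions = createAssertions(page);
actions.logIn("dev@example.com", "hunter2");
assertions.expectLoggedInAs("dev@example.com");
```

With this split, a selector change touches only the page implementation, and the tests written against actions and assertions stay untouched.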

For your go-to-market (GTM) motion, this is the narrative wedge: AI makes code cheap, but good architecture is scarce. Position Diffie as the tool that does not just generate tests, but enforces the deep-module discipline that makes AI-generated code trustworthy. Your outbound should target engineering leaders who have already felt the pain of AI-generated entropy, the ones who watched a "vibe coding" sprint produce a codebase they are now afraid to touch. They are the buyers who will pay for structure.

The final implication is personal. As a founder building in YC's pressure cooker, the temptation is to let AI outrun your headlights — ship fast, review later, fix in the next sprint. Pocock's warning is that entropy compounds faster when the agent is tireless. The founders who win will be the ones who slow down enough to design the interface before they delegate the implementation. That discipline scales. That is the moat.