Shipping in Days, Not Quarters: Inside Anthropic's Product Playbook

Cat Wu — Head of Product, Claude Code & Co-Work, Anthropic — Lenny's Podcast, PplmzlgE0kg

At Anthropic, product timelines have collapsed from six months to one month, then to one week, and sometimes to one day. Not because the team works harder. Because they removed every barrier to shipping. Cat Wu, who leads product for Claude Code and Co-work, described the system in detail. The result is one of the fastest-moving product organizations in technology — and a blueprint for how AI-native teams should operate.

"The timelines for a lot of our product features have gone down from six months to one month and sometimes to even one day."

The Two-Brain Problem: Vision and Pathfinding

Wu works in a tight mind-meld with Boris, the tech lead who created Claude Code. The split is deliberate. Boris owns the vision — "what the product needs to be in three months, six months, the AGI-pilled version." Wu owns the path: figuring out the steps from today to that vision, then making sure marketing, sales, finance, and infrastructure are bought in and unblocked.

The line is blurry by design. About 80% of decisions are obvious to both; the remaining 20% they split by who cares more. What matters is that no single person is the bottleneck. Engineering ideas flow directly into shipped features without sitting in a PM queue. On a team of engineers with strong product taste, many ship end-to-end — from a tweet of user feedback to a merged PR — with almost no product involvement.

Anthropic's bet is clear: hire engineers with product taste, reduce overhead, and let people operate across role boundaries. Wu estimates the Claude Code team is around 30-40 PMs total across research PMs, developer platform, enterprise, and growth — but the cultural emphasis is on shrinking the coordination tax, not adding headcount to manage it.

What Changes When Code Becomes Cheap

The fundamental shift Wu described is that code is no longer the expensive part. Before AI, technology shifts were slower. You could plan six to twelve months ahead because the cost of writing and coordinating code was high. Now code is cheap, models improve monthly, and the premium skill is deciding what to build, not how to build it.

This changes what a PM optimizes for. Instead of aligning multi-quarter roadmaps with partner teams, the job becomes: how do I get an idea into users' hands by Friday? Wu distilled the fastest PMs' playbook into three moves:

Move	What It Means in Practice
Set Clear Goals	LLMs are generalists, which creates ambiguity. A great PM narrows the field: "Our key user is professional developers. The problem is permission fatigue. The goal is zero prompts for enterprise devs to safely complete tasks." That rules out most approaches immediately.
Ship in Research Preview	Anthropic brands almost everything as research preview. Users know it is early, might be unsupported, and is an experiment. This slashes commitment overhead and lets the team ship in a week instead of a quarter.
Tight Cross-Functional Process	When an engineer marks a feature ready, they drop it in an evergreen launch room. Docs, marketing, and developer relations turn around announcements the next day. The PM builds the pipeline; the team runs it.

The Art of Being the Right Amount of "AGI-Pilled"

Wu argued the hardest product skill today is calibrating ambition against current capability. It is easy to build for the superintelligent model — a text box that does everything. The hard work is figuring out, for today's model, how to elicit maximum capability. How do you guide users onto the golden path? How do you patch the model's weaknesses while surfacing its strengths?

This is a rare and shifting skill. Anthropic's own product history proves it. The to-do list in Claude Code was originally a crutch — early models would start refactors, change five of twenty call sites, and stop. The team forced a human-like checklist. Then Opus 4 arrived and the model naturally completed its own lists. The crutch became optional. With newer models, they stripped system prompt reminders that were no longer needed. Every model launch triggers a cleanup cycle: which prompting interventions can we remove?

The same dynamic opened new features. Code review had been prototyped multiple times but accuracy was never high enough to trust. Only with recent models did the team feel confident running multiple code review agents across the entire codebase — catching real bugs engineers had to fix before merge. The pattern: build prototypes before the model is ready, then swap in the new model and measure the gap. If it closes, ship.
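The swap-and-measure pattern can be sketched as a small evaluation gate. This is an illustrative sketch, not Anthropic's actual harness: `Case`, `precision`, `should_ship`, the two stub reviewers, the sample cases, and the 0.9 threshold are all hypothetical. Each reviewer stands in for the same prototype wired to a different model.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Case:
    """One labeled example: a diff and whether it truly contains a bug."""
    diff: str
    has_bug: bool


# A reviewer is any callable that flags a diff as buggy (True) or clean (False).
# In a real setup it would wrap the prototype plus a specific model; here it is a stub.
Reviewer = Callable[[str], bool]


def precision(reviewer: Reviewer, cases: Sequence[Case]) -> float:
    """Fraction of flagged diffs that really contain a bug (0.0 if nothing is flagged)."""
    flagged = [c for c in cases if reviewer(c.diff)]
    if not flagged:
        return 0.0
    return sum(c.has_bug for c in flagged) / len(flagged)


def should_ship(reviewer: Reviewer, cases: Sequence[Case], threshold: float = 0.9) -> bool:
    """Hypothetical ship gate: the prototype ships once precision clears the bar."""
    return precision(reviewer, cases) >= threshold


# Made-up evaluation set, for illustration only.
CASES = [
    Case("off-by-one in loop bound", True),
    Case("rename local variable", False),
    Case("missing null check", True),
    Case("reformat imports", False),
]


def old_model_reviewer(diff: str) -> bool:
    # Stub for an earlier model: flags everything, so half of its flags are noise.
    return True


def new_model_reviewer(diff: str) -> bool:
    # Stub for a newer model: flags only the diffs that are actually buggy.
    return diff in {"off-by-one in loop bound", "missing null check"}


if __name__ == "__main__":
    for name, reviewer in [("old", old_model_reviewer), ("new", new_model_reviewer)]:
        print(name, precision(reviewer, CASES), should_ship(reviewer, CASES))
```

The point of the design is that the evaluation set and the gate stay fixed while only the reviewer changes, so a new model launch becomes a one-line swap followed by a re-run.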

The Role of Product Taste in an Agentic World

As code becomes cheaper, Wu was explicit about what becomes more valuable: product taste. Anthropic receives tens of thousands of GitHub issues asking for everything under the sun. Selecting which to build, and how, is a judgment call. This skill can come from any background — engineering, design, writing — but it is scarce, and Anthropic will hire anyone who demonstrates it strongly.

Engineering backgrounds are particularly useful right now because they calibrate effort. A PM who knows something takes an hour to build stops debating and just does it. A PM who knows a feature is infrastructurally heavy weighs the cost accurately. But Wu expects this advantage to erode as models improve. The constant is taste: the conviction to say "this is what our product should feel like."

This extends to Claude's character — what Wu calls "the vibe." People love Claude because it is lighthearted, positive, low-ego, and earnest. It apologizes when wrong. It breaks down overwhelming tasks. It gives genuine feedback rather than agreeing with everything. The character is not a nice-to-have; it is a core product decision about what it feels like to collaborate with this coworker.

Process as Subtractive, Not Additive

The Anthropic product culture is low on ceremony and high on autonomy. There is no worship of roadmaps or sign-off chains. Instead, the team removes barriers. The process between engineering, marketing, and docs is so tight that an engineer can ship and the announcement goes out the next day without friction. The expectation is that every person on the team feels empowered to take an idea from thought to world in under a week.

Even PRDs have been rethought. They are reserved for genuinely ambiguous features or heavy infrastructure projects. Most of the time, the team runs on shared principles — who the key users are, why they matter, what trade-offs are acceptable — and weekly metrics readouts where everyone sees the same numbers and trend lines. The goal is distributed decision-making. When the context is shared, individuals do not need to wait for PM approval.

"It's very hard to be the right amount of AGI-pilled. The hard thing is figuring out, for the current model, how do you elicit the maximum capability?"

Automate to 100%, Not 95%

Wu's closing advice for individual operators was precise: find repetitive tasks, automate them with Claude or Co-work, and iterate until the success rate is 100%. She is ruthless on this point. A 95% automation is not an automation — it is a liability. You will still check it manually, which means you saved no time. The last 5% to 10% takes elbow grease: teaching preferences, giving feedback, forcing the skill to update. But that is where the leverage lives.

She is equally ruthless about the opposite failure mode: over-customization. There is a camp that adds endless skills and MCPs and workflows, optimizing their setup instead of doing the work. "I think the simple setups actually work better." The goal is not the most elaborate toolchain. The goal is shipping.
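The 95% versus 100% argument above can be made concrete with a little arithmetic. The sketch below uses an assumed cost model that is not from the conversation: an untrusted automation is always checked by a human, and every failed run is redone by hand. The function name, parameters, and sample numbers are hypothetical.

```python
def minutes_saved_per_run(manual_minutes: float,
                          check_minutes: float,
                          success_rate: float,
                          trusted: bool) -> float:
    """Expected minutes saved per run compared with doing the task by hand.

    Assumed model: an untrusted automation is always reviewed by a human
    (costing check_minutes), and every failed run is redone manually.
    """
    if not 0.0 <= success_rate <= 1.0:
        raise ValueError("success_rate must be between 0 and 1")
    review_cost = 0.0 if trusted else check_minutes
    redo_cost = (1.0 - success_rate) * manual_minutes
    return manual_minutes - review_cost - redo_cost


if __name__ == "__main__":
    # Illustrative numbers: a 10-minute task whose output takes 8 minutes to verify.
    print(minutes_saved_per_run(10, 8, 0.95, trusted=False))
    print(minutes_saved_per_run(10, 0, 1.0, trusted=True))
```

With these made-up numbers, the 95% automation saves roughly a minute and a half per run because the review step eats most of the gain, while the fully trusted version saves the whole ten minutes. If verifying takes about as long as doing the task, the untrusted version saves nothing at all.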

Why This Matters for Diffie

For Diffie, the Anthropic playbook is a direct template for go-to-market (GTM) and product velocity. Anthropic ships in days because they define success narrowly, brand early outputs as experiments, and trust engineers to move end-to-end. Diffie is building AI browser testing for fast-moving frontend teams. The same playbook applies.

First, ICP clarity. Anthropic's internal discipline — "professional developers at enterprises" — lets them say no to everything else. Diffie must be equally sharp about the ideal customer profile: which frontend teams are moving fast enough that automated browser testing is a bottleneck, and which are not? The sharper the edge, the faster the iteration loop.

Second, research previews as a GTM strategy. Anthropic ships almost everything early and explicitly labels it. This builds trust with power users, generates signal, and avoids the trap of polishing features before validating demand. Diffie should consider a "research preview" tier for new testing modes — ship to design partners, gather feedback, iterate publicly. The customers who will champion you are the ones who helped build it.

Third, taste as a hiring and brand filter. Wu said Anthropic will hire anyone with demonstrated product taste. Diffie's brand should project the same signal: we are not a testing utility; we are the team with taste in how browser testing should feel. That is a positioning wedge against legacy players who sell checklist compliance and incumbents who sell feature matrices.

Finally, the automation bar. Wu's insistence on 100% success rates is directly relevant to Diffie's core promise. A flaky test is worse than no test — you lose trust, you waste time debugging the tool. Diffie's moat is not just generating tests. It is generating tests that are reliable enough to run unsupervised. Anthropic's "trust but verify" loop for agent output is the same loop Diffie must close for browser assertions.
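One way to picture the reliability bar is a repeat-run check that refuses to call a test trustworthy unless every run passes. This is a minimal sketch, not a description of Diffie's product: `classify_test`, `make_intermittent`, and the default of 20 runs are hypothetical, and the stand-in tests are plain functions rather than real browser sessions.

```python
import itertools
from typing import Callable


def classify_test(run_test: Callable[[], bool], runs: int = 20) -> str:
    """Run a test repeatedly and label it 'reliable', 'flaky', or 'failing'."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    passes = sum(1 for _ in range(runs) if run_test())
    if passes == runs:
        return "reliable"
    if passes == 0:
        return "failing"
    return "flaky"


def make_intermittent() -> Callable[[], bool]:
    """Deterministic stand-in for a flaky test: passes twice, then fails, repeating."""
    outcomes = itertools.cycle([True, True, False])
    return lambda: next(outcomes)


if __name__ == "__main__":
    print(classify_test(lambda: True))         # always passes
    print(classify_test(make_intermittent()))  # passes only some of the time
    print(classify_test(lambda: False))        # never passes
```

Repeated runs cannot prove a test will never flake; they only raise confidence. The useful property is the asymmetry: a single failure in the sample is enough to keep a test out of the unsupervised set.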

The deeper lesson: in an AI world, the teams that win are not the ones with the best models. They are the ones with the fastest feedback loops between customer pain, product decision, and shipped experiment. Anthropic's process is the template. Diffie's speed should match it.