For years, we repeated the same two words: “Architecture matters.” In the AI era, that sentence becomes less a belief and more a constraint. Not because LLMs are magical, but because they change the economics of building software in a way that exposes every weakness we used to tolerate.

This post is my attempt to pin down what I think is changing—and why. I’m also writing it as a timestamp: something I can come back to later and see which predictions held up, and which did not.

Code is getting cheap—at least in the sense that producing something that looks like code is suddenly fast. With modern LLMs, agentic workflows, and tool-connection standards like MCP, the throughput of a single developer can jump dramatically.

When bottlenecks move, your architecture either becomes a force multiplier—or an amplifier of chaos.

The story many teams are living right now is familiar: you describe a feature, you get a feature. You describe an application, you get an application. Some people call it “vibe coding”: you steer with language, the model fills in implementation details, and momentum feels effortless—until it doesn’t. The uncomfortable part is not that the code is obviously bad. It’s that it often looks extremely legitimate—style, structure, naming, and flow.

And yet the behavior can be subtly wrong, dangerously incomplete, or misaligned with real constraints that were never made explicit.

In other words: AI doesn’t mainly produce garbage. It produces confidence.

Before LLMs, serious code review was disproportionately a senior responsibility—partly because experience helps you see failure modes quickly, and partly because review time was limited. Now, if your team can generate more code per week, review can no longer remain an elite activity. It becomes basic literacy, because the volume of change grows while the cost of shipping a mistake stays the same (or increases).

This is also why testing quietly becomes the center of gravity. When a probabilistic engine is authoring a meaningful portion of your codebase, you need a deterministic counterpart: a harness that does not care whether the code “reads well,” only whether it behaves correctly. In AI-assisted development, tests stop being hygiene and start being the definition of reality. They are the anchor that turns high throughput into something you can trust.

And then there is the critical limitation that shapes everything: context.

LLMs do not reason over an infinite, stable world model of your system. They operate over a bounded, lossy slice of information. The larger and messier the context, the more likely they are to miss constraints, invent details, or “complete patterns” that feel right but aren’t real. This isn’t a moral failure of AI; it’s a mechanical fact of how the tool works.

If that’s true, then architecture stops being an abstract discipline about boxes and arrows. It becomes a practical strategy for keeping both humans and machines inside a tractable slice of reality.

Why architecture becomes more important, not less

In the AI era, the value of architecture rises because architecture is what makes speed safe.

Small, isolated components—libraries, modules, services—reduce the amount of information needed to make a correct change. They shrink the blast radius, narrow the review surface, and compress the “working set” that a developer (or an LLM) must hold in mind. The goal is not fragmentation for its own sake. The goal is to minimize the context required to do the next change correctly.

Clear interfaces become the next lever. When boundaries are explicit—OpenAPI for HTTP services, Protobuf/gRPC for RPC-style contracts, well-defined Kafka event schemas for async communication—you force the system to speak in machine-readable terms. That helps humans and keeps LLMs honest: both follow the contract instead of guessing intent from internals.
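As a concrete illustration, here is what an explicit event contract might look like as a small typed schema. The topic name and fields are invented for this sketch; the point is that the message's shape lives in one checkable place rather than in consumers' heads.

```python
# Illustrative event contract for a hypothetical "orders.placed.v1"
# Kafka topic. Field names and types are invented for this sketch.
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class OrderPlaced:
    """One event, one explicit shape: producers and consumers, human
    or LLM, code against this rather than against guessed internals."""
    order_id: str
    customer_id: str
    total_cents: int      # integer cents; never floats for money
    placed_at: datetime   # always UTC
```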

The next lever is standard solutions—not just shared patterns, but shared tools. LLMs often try to implement things from scratch, even when a well-known library would produce less code, more readable code, and fewer invented edge cases. I ran into this generating Nginx configs: the model first produced a Bash script that printed config files, which meant any future change required someone (or another model) to reverse engineer the generator. When I asked for a more standard approach, it suggested using Jinja templates and a small Python script built on the Jinja library; the result was smaller, more readable, and easier to evolve because there was less bespoke code to analyze. The same approach works for logging, metrics, error handling, retries, and so on. The less custom code you have, the less surface area for AI hallucination.
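For flavor, a minimal sketch of that second approach, assuming Jinja2. The template path and variables are placeholders; the structure is the whole point: one template, one tiny render script, no bespoke generator to reverse engineer.

```python
# Minimal config rendering with Jinja2. Template name and variables
# are placeholders; the shape, not the specifics, is the point.
from jinja2 import Environment, FileSystemLoader

env = Environment(loader=FileSystemLoader("templates"))
template = env.get_template("site.conf.j2")

rendered = template.render(
    server_name="example.com",
    upstream_port=8080,
)

with open("conf.d/example.conf", "w") as f:
    f.write(rendered)
```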

With crisp boundaries and real contracts, rewrites become controlled experiments rather than political events.

A team can take a component and replace it in days—or even hours—because the interface remains stable and the verification harness stays intact. You can test two implementations against the same E2E suite, benchmark them under load, and keep the better one. Before AI acceleration, “rewrite this service” often meant an undefined number of sprints and a long tail of integration pain. Now, the rewrite itself is less scary. The scary part is whether you can prove it’s equivalent (or better). That proof is architecture plus tests plus measurement.
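A sketch of what that looks like in practice, assuming pytest: one contract suite, parametrized over both implementations. The pricer classes are invented stand-ins for the old and new component behind the same interface.

```python
# Run one verification suite against two implementations of the same
# contract. The classes are invented stand-ins for old and new code.
import pytest

class LegacyPricer:
    def quote(self, qty: int) -> int:
        return qty * 100

class RewrittenPricer:
    def quote(self, qty: int) -> int:
        return qty * 100  # candidate replacement, same interface

@pytest.fixture(params=[LegacyPricer, RewrittenPricer])
def pricer(request):
    # Every test below runs once per implementation.
    return request.param()

def test_quote_scales_linearly(pricer):
    assert pricer.quote(2) == 2 * pricer.quote(1)

def test_quote_of_zero_is_zero(pricer):
    assert pricer.quote(0) == 0
```

Whichever implementation survives the suite and wins the benchmark stays; the interface never moved, so nothing downstream noticed.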

This is where I think the winners will diverge sharply. Teams that treat AI as a code firehose will generate more output and accumulate more inconsistency. Teams that treat AI as a component-replacement engine—constrained by contracts, tests, and benchmarks—will accumulate leverage.

Verification becomes the product

As AI accelerates implementation, the highest-value artifacts are no longer the code itself, but the mechanisms that make it safe to change. Verification becomes the real differentiator.

End-to-end tests, benchmarks, and observability stop being optional: tests catch the most common AI failure mode (locally plausible, globally incompatible code), benchmarks surface regressions introduced by high rewrite volume, and observability tightens the feedback loop from production.

This is where architecture directly amplifies verification quality.

Smaller components and well-bounded services allow tests and benchmarks to become far more precise. Functional tests can target a single responsibility instead of navigating an entire system. Benchmarks can measure exactly what changed, instead of averaging noise across unrelated paths. When a component has a narrow purpose, failures are easier to attribute, regressions are easier to detect, and results are harder to misinterpret.
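Precision here can be mundane. A benchmark aimed at one narrow component, sketched below with only the standard library (the function under test is a placeholder), measures exactly the path a rewrite touched instead of averaging noise across the whole system.

```python
# A narrow benchmark for a single component, standard library only.
# component_under_test is a placeholder for the real hot path.
import time

def component_under_test(n: int) -> int:
    return sum(range(n))

def bench(fn, arg, repeats: int = 5) -> float:
    """Best-of-N wall time; taking the minimum dampens scheduler noise."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(arg)
        best = min(best, time.perf_counter() - start)
    return best

print(f"best of 5: {bench(component_under_test, 1_000_000):.4f}s")
```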

In large, entangled systems, verification becomes blunt and expensive. In small, isolated components, it becomes sharp and cheap.

If you want a crisp mental model: in the AI era, features are cheap; regressions are expensive; verification is the moat—and good architecture determines how defensible that moat actually is.

Product evolution accelerates, roles do not collapse

There is a popular belief that in the AI era, software developers will turn into product managers—or that product managers will turn into developers—because building software is becoming trivial. I do not buy it.

Product development is not code assembly; it is market understanding, customer discovery, positioning, negotiation, and continuous feedback. Software engineering is not typing speed; it is system thinking, architecture, constraints, correctness, performance, and math. LLMs accelerate parts of both, but they do not erase the boundary between them.

What does change is the distance between the roles.

AI compresses the gap between idea and execution. Engineers can prototype faster. Product hypotheses can be tested earlier. Variants can be shipped, measured, and discarded with far less friction. A/B tests, feature flags, and experimental flows stop being expensive decisions and become routine tools.
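Even the mechanics can be small. As a sketch (the flag name and rollout scheme are invented, not taken from any particular library), deterministic bucketing is enough to route a fixed percentage of users into a variant and keep each user's assignment stable across requests:

```python
# Deterministic percentage rollout: the same user always lands in the
# same bucket, so an experiment's assignment is stable across requests.
# Flag name and rollout scheme are invented for this sketch.
import hashlib

def flag_enabled(flag: str, user_id: str, rollout_percent: int) -> bool:
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) % 100
    return bucket < rollout_percent

# Route 20% of traffic into variant B of a hypothetical checkout flow.
variant = "B" if flag_enabled("new-checkout", "user-42", 20) else "A"
print(variant)
```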

This only works if the system is designed for it.

Modular architecture makes product evolution safe. Small, well-bounded components allow teams to change behavior locally, test ideas in isolation, and roll back without collateral damage. Combined with AI-assisted implementation, this turns experimentation into a controlled process rather than a gamble.

The outcome is not role convergence, but tighter coupling. Teams where product and engineering operate closer—sharing context, iterating together, and grounding decisions in real system behavior—will move faster and learn faster. AI magnifies this advantage. Architecture decides whether that magnification produces insight or instability.

A new reason strongly typed languages become more appealing

There is another shift I expect to grow in importance: strongly typed languages become more attractive—not primarily for ideology, but for physics.

When code is generated quickly, you want fast, automated ways to reject incorrect code before it even reaches review. A strong compiler is exactly that: a scalable gatekeeper that turns many classes of mistakes into immediate feedback. Even in languages that have long had compilers, the difference is how central compilation becomes to the AI workflow. The model can generate code, attempt to compile, read errors, and iterate—effectively using the compiler as a tool-assisted verifier. That feedback loop can be automated and brutally fast.
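A hedged sketch of that loop, with the model call stubbed out (generate_code is hypothetical; wire in whatever client you use). The only real dependency is a compiler with readable diagnostics; Go stands in here, but any compiled language works.

```python
# Sketch of the generate -> compile -> read errors -> retry loop.
# generate_code is a hypothetical stand-in for your model client.
import pathlib
import subprocess

def generate_code(prompt: str) -> str:
    """Hypothetical model call; replace with your own LLM client."""
    raise NotImplementedError

def compile_errors(path: str) -> str | None:
    """Return compiler diagnostics, or None if the build succeeds."""
    result = subprocess.run(
        ["go", "build", path], capture_output=True, text=True
    )
    return None if result.returncode == 0 else result.stderr

def generate_until_it_compiles(prompt: str, rounds: int = 5) -> str:
    code = generate_code(prompt)
    for _ in range(rounds):
        pathlib.Path("main.go").write_text(code)
        errors = compile_errors("main.go")
        if errors is None:
            return code
        # The compiler output becomes context for the next attempt.
        code = generate_code(f"{prompt}\n\nFix these errors:\n{errors}")
    raise RuntimeError("did not converge on compiling code")
```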

More importantly, rich type systems help humans keep track of AI-generated abstractions.

One risk of AI-assisted development is abstraction inflation: layers of helpers, wrappers, factories, and generic “clean architecture” shapes that look sophisticated but obscure intent. In a language with expressive types, you can force abstractions to carry meaning. You can model invariants in the type system, constrain illegal states, and encode contracts in a way that is difficult to “hand-wave” with pretty-looking code.
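Even in Python under a type checker, you can make an illegal state effectively unrepresentable rather than merely discouraged. In this sketch (the email domain is invented for illustration), the intended way to obtain a VerifiedEmail is through the verification gate, so every downstream signature carries the proof:

```python
# Encoding an invariant in types: a VerifiedEmail is only meant to be
# created through verify(), so functions accepting one need no re-checks.
# The domain and rules are invented for this illustration.
from dataclasses import dataclass

@dataclass(frozen=True)
class UnverifiedEmail:
    raw: str

@dataclass(frozen=True)
class VerifiedEmail:
    address: str

def verify(candidate: UnverifiedEmail) -> VerifiedEmail | None:
    """The single gate between the two states."""
    cleaned = candidate.raw.strip().lower()
    return VerifiedEmail(cleaned) if "@" in cleaned else None

def send_welcome(to: VerifiedEmail) -> None:
    # The signature is the contract: a type checker stops anyone,
    # human or LLM, from passing raw input here.
    print(f"welcome mail -> {to.address}")
```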

When the codebase is growing faster, types become a form of documentation that does not drift as easily, because the compiler enforces them. They also become a way to localize reasoning: you can understand what a function can and cannot do from its signature and constraints, rather than reading a page of generated implementation and hoping it matches the story.

This doesn’t mean dynamic languages disappear. It means the comparative advantage of strong typing increases when the author is sometimes a machine and the pace of change is higher than your patience for ambiguity.

Manual code is still here (it just changes shape)

AI makes implementation faster most of the time, but it does not eliminate direct code writing. There are still moments when the fastest path is to write the thing yourself—either because the model is missing the point and the back-and-forth costs more than the code, or because you are building something genuinely new where there is no “standard solution” for the model to remix.

And there are domains where you will keep a tighter grip by default: new algorithms, performance-sensitive paths, and mission-critical behavior where you want an engineer to understand every line and every failure mode. Think of it like inline assembly in a C codebase: it should be rare, deliberate, and justified—but when you need it, nothing else is a substitute.

README for humans, README for AI

There is one more practical idea that becomes surprisingly effective in this world: treat each component as something that needs an onboarding packet not only for humans, but for AI-assisted work.

Many teams already maintain a README per service. In the AI era, it’s worth shaping that README so it can be pasted into an LLM context window and reliably constrain behavior. The most valuable content is not marketing text; it is operational truth: what the component does, what it must never do, the public contracts, the invariants, how to run tests, and what “good” looks like in performance and failure behavior.
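A skeleton of what that might look like; the service name, rules, and numbers are invented, and the section headings are the point:

```
# payments-service

## Purpose
Settles card payments. Nothing else.

## Must never
- Write to the orders database directly.
- Retry a capture without an idempotency key.

## Contracts
- HTTP: openapi/payments.yaml
- Events: orders.settled.v1 (schemas/)

## Invariants
- Money is integer cents.
- All timestamps are UTC.

## Verify
- `make test` runs the contract suite.
- p99 capture latency budget: 300 ms under nominal load.
```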

If you do that, you’re not just helping the next developer. You are actively shrinking the context the model needs, which reduces hallucination and improves correctness.

Summary

AI does not make architecture obsolete. It makes architecture measurable.

If LLMs speed up implementation, clarity, boundaries, tests, benchmarks, and review discipline become more valuable. The teams that succeed will not be the teams that generate the most code. They will be the teams that turn generation into a controlled, testable, reversible process.

In that sense, architecture becomes your token budget: the structure that determines whether the next change fits inside a small, reliable slice of reality—or whether you’re about to ship confidence instead of correctness.