Software Isn't Easy - The Explosion (And Ensloppification) of Code
April 8th, 2026
11 min read
Thoughts
AI
Coding
I use AI to write code every single day. I'm not exaggerating; it is genuinely a deeply embedded part of my workflow at this point. I use Claude Code, I use Copilot, and I always give the latest shiny AI-powered coding tools a try just to see how they feel. And you know what? The results are incredible! There are moments where I'm sitting there watching an agent write me a new feature and I feel like I'm living in a sci-fi movie. It's exciting in a way that is difficult to describe if you have not experienced it yourself.
So, first of all I want to be really clear upfront: this is not an anti-AI post. I am not here to tell you to stop using these tools. I think they are genuinely transformative, and I think engineers who refuse to adopt them are making a big mistake.
That said, I am still very worried. Worried about the future of software itself, and about the future of a society built upon software we can't trust or maintain.
The conversation that started this
I was texting with a close friend of mine recently. He's a VC investor, very smart guy, runs a portfolio of tech startups. We were chatting about AI coding tools and at some point he said something along the lines of "It's crazy how easy software dev is now." And I just... could not agree less. So naturally I pushed back.
What followed was a long back-and-forth where I tried to explain why that sentiment, while understandable, is actually kind of dangerous. His perspective made total sense from where he sits. He's seeing his portfolio companies ship faster than ever, he's seeing small teams build products that would've taken months, he's seeing the dollars and cents of it all and thinking "this is clearly the future." And he's not wrong about that!
But the idea that software development is now easy? That you can just hand the keys to an agent and tell it to cook? That one non-technical dreamer and a few well-chosen prompts can build a real, production-grade software product? I think that is one of the most dangerous assumptions floating around the tech industry right now.
What people get wrong
Something that I think gets lost in all the hype is that writing code and building software are not the same thing. They never have been. Writing code is maybe 10% of what makes software actually work. The other 90% is architecture, system design, security, maintainability, understanding failure modes, debugging, testing, monitoring, operational awareness, and (perhaps most importantly) the accumulated intuition of knowing what not to do.
AI has made the 10% essentially free. And that's amazing. But it's given a lot of people the illusion that the other 90% came along for the ride. I am of the belief that it did not (at least not yet).
I see this sentiment everywhere, but particularly in the VC and startup world. There's this narrative that software is "solved" now, that the bottleneck is no longer engineering but rather just having the right idea and being able to describe it clearly enough. Y Combinator reported that 25% of startups in their Winter 2025 batch had codebases that were 95% AI-generated. People talk about this like it's an achievement. I hear it and feel a pit in my stomach.
The compounding problem
One of the things that makes AI-generated code so insidious is that the errors don't always announce themselves. A human developer makes mistakes too, obviously. But a human learns. A human makes a mistake, gets burned by it, and then doesn't make it again (or at the very least makes it less often). A human is also a bottleneck. There's only so much code a person can write in a day, which means there are only so many mistakes they can introduce. The errors compound slowly, and usually the pain of the accumulated mess eventually motivates the human to go back and clean things up.
An agent has no such learning mechanism. It will make the same categories of mistakes over and over, because it has no memory of having made them before (not in any meaningful sense, anyway). And crucially, there is no bottleneck. An army of agents can produce an absolutely staggering amount of code in a very short time. Each individual error might be small and harmless on its own. But when they compound at machine speed with no human in the loop feeling the pain of that accumulation? You can end up with a codebase that is fundamentally untrustworthy and unmaintainable in a few short weeks or months.
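To make that compounding intuition concrete, here's a toy model. All of the numbers in it are my own illustrative assumptions, not measured rates from any study: suppose each merged change has a small, independent probability of carrying a subtle latent defect, and compare a human-speed stream of changes to an agent-speed one.

```python
# Toy model: probability that at least one latent defect has landed
# after n independent changes, each with defect probability p.
# Every number below is an illustrative assumption, not a measured rate.

def p_any_defect(n_changes: int, p_defect: float) -> float:
    """P(at least one defect) = 1 - (1 - p)^n."""
    return 1 - (1 - p_defect) ** n_changes

HUMAN_CHANGES_PER_WEEK = 25   # assumed: a handful of changes per day
AGENT_CHANGES_PER_WEEK = 500  # assumed: agents merging at machine speed
P_DEFECT = 0.02               # assumed: 2% of changes carry a subtle defect

for weeks in (1, 4, 12):
    human = p_any_defect(HUMAN_CHANGES_PER_WEEK * weeks, P_DEFECT)
    agent = p_any_defect(AGENT_CHANGES_PER_WEEK * weeks, P_DEFECT)
    print(f"{weeks:>2} weeks: human {human:.0%} vs agents {agent:.0%}")
```

The exact percentages don't matter; the shape does. At human speed the defect probability creeps up and someone feels the pain along the way. At agent speed it saturates almost immediately, with nobody in the loop to notice.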
Then one day you try to add a new feature and nothing works. Or worse, your users' data leaks. Or worse still, you don't even know that it leaked because the code that was supposed to catch that was also written by an agent that didn't understand the security implications.
The numbers are not encouraging
I don't love being a "well actually, the data says..." guy (okay you got me, I do), but the data here is pretty hard to ignore.
CodeRabbit analysed hundreds of real-world pull requests and the results were pretty grim across the board. AI-generated PRs averaged 1.7x more issues overall, with 1.4x more of them being critical. The bots produced 1.75x more logic errors, 1.57x more security findings, and were 2.74x more likely to introduce cross-site scripting vulnerabilities. A separate Cortex report corroborated the trend. Even as PRs per developer rose 20% year-over-year, incidents per PR climbed 23.5% and change failure rates jumped roughly 30%. More code, more problems. Meanwhile, Veracode tested over 100 LLMs across four languages and found that 45% of AI-generated code samples failed basic security tests.
There's a project at Georgia Tech called the "Vibe Security Radar" that tracks CVEs (basically, officially disclosed security vulnerabilities) directly attributable to AI-generated code. In January 2026 they found six. In February, fifteen. In March, thirty-five. And the researchers estimate the real number is 5 to 10 times higher, because most AI traces get stripped from commits before they're published. This trend is accelerating at an alarming rate.
So AI-generated code is buggier and less secure. But at least it's faster, right? That's the whole trade-off everyone assumes they're making: we sacrifice some quality for a massive speed boost, and we'll clean it up later. Except that might not even be true either. Enter the notorious METR study, which is probably my favourite study in this whole debate because of how beautifully counterintuitive it is. They ran a proper randomised controlled trial with experienced open-source developers doing real tasks on their own repositories, projects they'd been working on for an average of five years. The developers using AI tools (Cursor Pro with Claude 3.5/3.7 Sonnet, frontier stuff at the time) were 19% slower than those working without AI. To make matters worse, the developers believed they were 20% faster. A 39-point gap between perception and reality. The tools feel productive even when they're not.
The deal we thought we were getting ("worse code, but faster shipping") might actually be more like "worse code at roughly the same shipping speed (or worse), with a false sense of productivity papering over the whole thing". Has a lovely ring to it, don't you think? Now imagine that dynamic playing out in the hands of a non-technical AI-builder who has no way to evaluate code quality in the first place, and whose only metrics for success are lines of code and shipping speed. That is a recipe for disaster.
Amazon should give you pause
Amazon, one of the most sophisticated engineering organisations on the planet, home to some of the most talented software engineers alive, had to implement a 90-day "code safety reset" across 335 of their most critical systems in March 2026. This came after a series of outages, at least one of which was linked to their AI coding assistant Q, that caused around 1.6 million errors and 120,000 lost orders in one incident, and then roughly 6.3 million lost orders in another just three days later.
Their SVP of e-commerce services explicitly cited "novel GenAI usage for which best practices and safeguards are not yet fully established" as a contributing factor. They now require dual-reviewer approval and senior engineer sign-off for changes to critical systems.
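You don't need Amazon's scale to borrow the dual-reviewer idea. On GitHub, for example, a CODEOWNERS file combined with branch protection rules ("require review from Code Owners", plus a minimum number of approvals) routes changes to sensitive paths through designated reviewers. A minimal sketch, with hypothetical paths and team names:

```
# CODEOWNERS (paths and teams are hypothetical examples)
# Changes under these directories require approval from the listed teams.
/payments/   @acme/senior-backend
/auth/       @acme/security-reviewers
```

The mechanism itself is mundane; the point is that review requirements for critical systems can be enforced by tooling rather than left to discipline.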
This is Amazon. If even they can't just "let the agents cook", I think the rest of us should at least take that into account when deciding how we use agents in our own software engineering.
The real value was never in the writing
At the end of the day, the true value of a software engineer is and always has been in understanding code. The value of actually writing code has always been low in comparison.
Sitting down and actually typing out code, translating a known solution into syntax, that was never the hard part. The hard part was always knowing what to type, why to type it, how it fits into the larger system, and what could go wrong. Junior engineers are hired to become senior engineers, not because their code output is valuable on day one (it often isn't lol). The code they write is more like an apprenticeship exercise, a means to build understanding.
AI has compressed the typing-it-out part to near zero. That's great. But if you skip the understanding part and just let agents generate code that you don't review, don't comprehend, and can't maintain, you're going to end up with a massive pile of text that just happens to compile.
So what do I actually think we should do?
Use the tools, but stay in the code. Don't remove yourself from the loop. Review what the agents produce. Understand the architecture. Make the important design decisions yourself. Use AI for the boilerplate, the exploration, the rubber-ducking. But always be the quality gate.
Slow down. Just a little. I know this is an unpopular opinion in a world that worships shipping speed above all else. But the 5 extra minutes you spend understanding what the agent just generated will save you 5 days of debugging later. Or 5 months of security remediation. Or a lawsuit.
Never commit code you don't understand. I said this in a blog post over a year ago and I think it's even more true now. The moment you start committing code that you can't explain, that's the moment you've lost control. And you might not feel the consequences for weeks or months. But they're coming.
If you're a non-technical tech startup founder, hire an engineer. A real one. Not to write all the code by hand, but to be the person who understands the system, reviews the AI output, makes the architectural decisions, and keeps the whole thing from falling apart. The AI can do the heavy lifting of code generation. But you need someone who actually knows where the "load-bearing walls" are.
Why am I writing this?
Partly because I just needed to get these thoughts out of my head and into the world. I've been having this same conversation with different people for months now and I figured it was time to just write it down.
But also because I want this to exist as a sort of record. A timestamp. I genuinely don't know how this is all going to play out. Maybe the models get good enough that everything I've said here looks embarrassingly shortsighted and wrong in two years lol. Maybe agents figure out how to not write slop, how to truly understand systems at a deep level, how to maintain long-term architectural coherence. I would love that, actually.
Or maybe it won't even matter at all. Maybe the world will just run on slop software (slopware?) and the agents get good enough at managing slop that nobody ever needs to look under the hood again. I don't love that timeline, but I suppose I can't rule it out either.
Either way, right now in April 2026, with the tools we have today? Software isn't easy. It never was. And pretending otherwise is a bet I wouldn't want to make.
Stay vigilant out there!
— Nathan