Every software engineer gets the same early lessons. Don’t use magic numbers. Don’t repeat yourself. Keep coupling low and cohesion high. Watch out for shotgun surgery, god objects and long parameter lists. We extract, we abstract, we generalise and we centralise, all in the name of maintainability, extensibility and performance.
Then LLMs showed up and started writing most of the code.
If you let Claude Code, Cursor or Codex loose on a codebase and focus purely on outcomes, your repo will end up absolutely littered with code smells. Duplicated logic in three places. Random hardcoded constants. A 400-line function that nobody asked for. And honestly? It often still works. The feature ships. The tests pass. The user is happy.
So the question I keep getting asked by founders and CTOs is this. If the machine can just keep working around the mess, do code smells actually matter anymore?
My answer is yes. But probably not for the reasons you were taught at university.
The old reasons still hold up (mostly)
Let’s do a quick sanity check on the classic arguments for clean code before we move on.
Maintainability and extensibility still matter. Software is rarely a one-shot deliverable. It grows. New features get added. Bugs get fixed. Teams change. A codebase that’s tangled, duplicated and inconsistent is harder to change safely, whether a human or an LLM is doing the changing. The “LLMs can refactor anything” argument falls apart the moment you ask them to refactor something subtle across twelve files and they miss two of them.
Performance still matters too, and arguably more than it did five years ago. We’re heading into a world where GPU and memory supply is tight, power costs are high and every millisecond of compute has a real dollar figure attached to it. Sloppy code that allocates too much, queries too often or bloats bundle size isn’t just inelegant. It’s expensive. You pay for it on your hosting bill, and your users pay for it in battery life and data.
None of this is new. What’s new is the third reason.
The new reason: code smells cost you tokens
Here’s the part that nobody talks about enough. When you’re vibe coding with an LLM, your codebase is the context window. Every file the assistant has to load to understand what’s going on is tokens. Every duplicated implementation it has to read through is tokens. Every magic number it has to trace back to its meaning is tokens.
Tokens are money. Tokens are also attention. A model with a messy 200k-token context is a model that's more likely to miss something, hallucinate a function that doesn't exist or "fix" a bug in one place while leaving the same bug in three others.
Classic code smells have a direct cost in an LLM workflow:
- Duplication means the model reads the same logic multiple times and has to reason about which copy to touch.
- Magic numbers mean the model has to grep around to figure out what 0.0347 actually represents.
- High coupling means touching one file pulls ten others into context before the model feels safe making a change.
- Long functions and god objects mean bigger files, bigger context and a higher chance the model loses the thread halfway through.
- Inconsistent naming means the model second-guesses itself and wastes tokens clarifying.
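To make the magic-number and duplication points concrete, here's a minimal Ruby sketch. The 0.0347 rate, the fee semantics and the method names are all invented for illustration; the point is how much less tracing the second version asks of a reader, human or model:

```ruby
# Before: a magic number that has to be traced back to its meaning.
# An LLM (or a new teammate) has to grep the codebase to work out
# what 0.0347 represents before it can safely touch this line.
def price_with_fee_before(price)
  price + price * 0.0347
end

# After: one named constant, one implementation. The intent travels
# with the name, so no extra context is needed to understand it,
# and there is exactly one place to change when the rate changes.
PROCESSING_FEE_RATE = 0.0347

def price_with_fee(price)
  price + price * PROCESSING_FEE_RATE
end
```

Same behaviour, but the cleaned-up version loads into a context window carrying its own explanation, which is exactly the property that makes it cheaper to work on with an assistant.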
A clean, well-named, modular codebase is cheaper to work on with an LLM. Full stop. If you’re running a startup burning through API credits during “vibe sessions”, this is a real line item. We’ve seen teams cut their token spend noticeably just by doing one focused refactor pass before letting Claude loose on a feature.
But the LLM can just fix it, right?
Yes and no. An LLM can absolutely refactor a scruffy codebase. We do it all the time. But there are two problems with relying on that as your strategy.
First, the LLM doesn’t know what “clean” means for your project. Without guidance it’ll apply whatever patterns it saw most in its training data, which may or may not match your conventions. You still need a human in the loop to set direction, otherwise you end up with a codebase that’s been “cleaned” into something nobody on the team recognises.
Second, the mess has a nasty tendency to grow faster than the refactors. If every new feature adds three new smells and your cleanup sessions only remove two, you’re going backwards. The compound interest works against you.
The honest position is that LLMs make it easier to tolerate some smells temporarily, but they don’t remove the need for taste and judgment. If anything they raise the bar. The things humans should be doing are the things LLMs are worst at. Architectural decisions. Naming conventions. Knowing which duplication is fine and which will haunt you in six months.
What we actually do at Add Jam
We work on long-lived codebases. Some of the Ruby on Rails apps we maintain have been in production for the better part of a decade. Some of the React Native apps we build today will still be shipping updates in 2030. That perspective changes how we feel about code smells.
For quick throwaway scripts and prototypes, we genuinely don’t care. Let it be ugly. Ship it. Learn the thing. If the prototype dies, great, no harm done.
For anything that’s going to live past a month we hold the line. Not because we’re purists, but because we’ve seen what happens when you don’t. We’ve inherited plenty of “it worked fine, why would we refactor it” codebases and the answer is always the same. Because now every change takes three times as long, the LLM gets confused, the tests are flaky and nobody wants to touch the auth module.
We still care about magic numbers. We still care about coupling. We still care about duplication. We just now also care about how much context a feature takes to load into Claude, which is a surprisingly useful proxy for “how tangled is this thing, really”.
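One rough way to operationalise that proxy is to sum the approximate token count of every file a feature drags into context. A sketch, assuming the common rule of thumb of roughly four characters per token (the ratio and the helper names are ours, not any particular API's):

```ruby
# Roughly four characters per token is a common rule of thumb
# for English-heavy source code; treat the result as an estimate.
CHARS_PER_TOKEN = 4.0

def estimated_tokens(text)
  (text.length / CHARS_PER_TOKEN).ceil
end

# Sum the estimate across every file a feature touches.
# A tightly coupled feature that drags ten files into context
# scores far higher than a modular one, which is the
# "how tangled is this thing, really" signal.
def feature_context_cost(paths)
  paths.sum { |path| estimated_tokens(File.read(path)) }
end
```

Run it over the files a typical change to the feature touches and compare numbers across features; the outliers are usually where the refactoring effort pays for itself first.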
So should you care?
Yes. Maybe more than before. The reasons have shifted a bit. Maintainability and performance still apply, but now you can add “my AI assistant works better and costs less on a clean codebase” to the list. That’s not a small thing if you’re building a product that you plan to keep shipping against for years.
Don’t get religious about it. A pragmatic amount of duplication is fine. A hardcoded constant that’s only referenced once isn’t worth extracting. But the core instinct you were taught at university, that messy code has a real cost, is still correct. The cost just shows up in new places now.
If you’re staring at a codebase that your team (or your AI assistants) seem to be slowing down on, it might be time for a code review. We’re happy to take a look and tell you honestly what’s worth cleaning up and what’s fine to leave alone. Book a free chat if you’d like to talk it through.