The programming atrophy problem: AI makes you worse at coding
I’ve been using AI coding tools daily for over a year now and helping develop our current agentic setup for the last couple of months. I’ve built 25+ custom skills, a knowledge vault, and a large part of the whole harness. So when three independent studies came out showing that heavy AI tool usage makes developers measurably worse at coding, I started digging into it.
The productivity gains are well documented at this point. These studies looked at what we’re giving up in exchange.
The studies
Anthropic (January 2026)
52 engineers, randomized controlled trial. Split into teams with and without AI assistance, measured code comprehension over time.
Result: 17% lower comprehension in the AI-assisted group. Debugging skills showed the steepest decline. The engineers who used AI the most understood their own codebases the least.
METR (2025-2026)
The original study (16 experienced developers, early 2025) found developers were 19% slower with AI assistance. But they believed they were 24% faster. A 43-point gap between perceived and actual performance.
METR ran a follow-up with 57 developers in late 2025. The tools had improved, and they found evidence that the speed picture has gotten better. But the study hit a different problem: 30-50% of developers refused to submit tasks where they’d have to work without AI. The selection bias was severe enough that METR called their own data “unreliable” and is redesigning the experiment.
The miscalibration finding from the original study hasn’t been contradicted. The speed finding might not hold with current tools. The fact that developers now refuse to work without AI is its own kind of data.
Microsoft/CMU (CHI 2025)
319 knowledge workers, 936 real-world AI usage examples. One finding that stuck with me: higher confidence in AI correlated with less critical thinking applied to the output. Higher self-confidence in your own abilities correlated with more critical thinking, but at a higher cognitive cost.
For routine tasks, people reported spending less effort thinking critically when using AI. For high-stakes tasks, they actually spent more effort than they would without AI, mostly on verification. Critical thinking didn’t disappear; it shifted from “doing the work” to “checking the work.” The researchers put it well: routine AI use deprives workers of the practice opportunities that keep their judgment sharp, “leaving them atrophied and unprepared when the exceptions do arise.”
The feedback loop
What connects these studies is a cycle that gets worse the better your tooling is. You build a better harness. The agent handles more. Engineers write less code directly. Skills degrade. Engineers get worse at catching the agent’s mistakes. Quality drifts, but nobody notices because the people who would notice are the ones who degraded.
I notice this in my own work. After a long stretch of letting Claude handle most of the implementation, I’m measurably rustier when I sit down to write something from scratch. The instinct to reach for the agent is strong even when the task would be faster by hand.
The junior problem
The atrophy studies looked at experienced engineers who had skills to lose. Juniors face a different version of this. If you learn to code with AI tools from day one, you might never build the mental model that lets you debug effectively. Stack Overflow’s reporting on this found that entry-level tech hiring dropped 25% year-over-year in 2024. Stanford’s Digital Economy study showed employment for developers aged 22-25 down nearly 20% from its 2022 peak.
That’s a pipeline problem. If juniors can’t get hired, they don’t become mid-levels. If they do get hired but lean on AI through the learning phase, they ship code they can’t explain and can’t debug when it breaks. Either way, the industry is producing fewer people who understand systems from the ground up.
I don’t have a clean answer for this one. At dala.care we’re lucky to have a strong foundation built with good engineering before this AI explosion happened. But I think about what happens at newer or larger companies where a junior’s entire workflow is “describe the feature, review the output, ship it.” To me that doesn’t feel like engineering. It’s something, and it might even produce working software, but it’s building on a foundation you don’t understand.
What I changed
The instinct when you’re deep in harness engineering is to automate everything you can. More skills, more hooks, more agent-handled tasks. After reading these studies I started being more deliberate about what I hand to the agent and what I keep.
I split work by atrophy risk now. The agent gets formatting, boilerplate, migrations, test scaffolding. Stuff where the skills are mechanical and I’m not losing anything by delegating. Architecture decisions, debugging sessions where I don’t know what’s wrong yet, and security-sensitive code stay with me, though no such code makes it into our products until the team has thoroughly reviewed it. That split is less efficient, but it’s about which skills I can afford to let atrophy and which I can’t.
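To make that split concrete, here’s a minimal sketch of the triage rule. The categories and the route() helper are hypothetical and purely illustrative; the real decision happens in my head, not in code.

```python
# A minimal sketch of the triage rule. The categories and route() are
# hypothetical, for illustration only; this isn't tooling we actually run.

DELEGATE = {"formatting", "boilerplate", "migrations", "test scaffolding"}
KEEP = {"architecture", "debugging (cause unknown)", "security-sensitive code"}

def route(task: str) -> str:
    """Decide who handles a task based on the atrophy risk of the skill."""
    if task in DELEGATE:
        return "agent"   # mechanical skill: cheap to delegate, little lost
    if task in KEEP:
        return "human"   # judgment skill: the expensive kind to let atrophy
    return "human"       # when unsure, keep it; atrophy is the costly default

for task in ["boilerplate", "architecture", "refactoring a hot path"]:
    print(f"{task} -> {route(task)}")
```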
I reworked our Claude Code skills to be less autonomous. The early versions tried to handle entire workflows end-to-end. Now they’re designed to surface the right problems at the right time and let me make the decisions. The agent does the legwork, I do the judgment calls. That keeps me in the loop where it matters.
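Reduced to a sketch, the pattern looks like this (the Finding type and review_loop() are hypothetical names, not our actual skill code): the agent’s job ends at producing findings, and nothing gets applied without an explicit human yes.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    summary: str          # what the agent noticed
    proposed_action: str  # what it would do about it

def review_loop(findings: list[Finding]) -> list[Finding]:
    """Show each finding; return only the ones the human approves."""
    approved = []
    for f in findings:
        print(f"\n{f.summary}\n  proposed: {f.proposed_action}")
        if input("apply? [y/N] ").strip().lower() == "y":
            approved.append(f)
    return approved

# Hypothetical findings, the kind a code-review skill might surface.
findings = [
    Finding("N+1 query in the orders endpoint", "add eager loading"),
    Finding("deprecated auth call in a cron job", "migrate to the new client"),
]
for f in review_loop(findings):
    print("approved:", f.summary)
```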
I read diffs more carefully than I used to. The Microsoft/CMU study found that people stop scrutinizing AI output over time, and I’ve caught myself doing exactly that. Forcing myself to read agent-generated code as if I’d written it has become a habit worth protecting.
And I’ve stopped trusting how productive I feel. That 43-point miscalibration gap from METR means my gut sense of “I’m shipping fast” is unreliable. I try to check against actual output when I can.
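One cheap way to run that check, as a sketch: count commits per ISO week straight out of git log and compare the trend against how fast each week felt. The metric is deliberately crude, and the script is illustrative, not something I’d treat as a dashboard.

```python
import subprocess
from collections import Counter

def weekly_commits(repo: str = ".") -> Counter:
    """Commits per ISO week; a crude proxy for actual output, not quality."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "--date=format:%G-W%V", "--pretty=%ad"],
        capture_output=True, text=True, check=True,
    ).stdout
    return Counter(out.split())

if __name__ == "__main__":
    for week, count in sorted(weekly_commits().items()):
        print(week, count)
```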
What I’m still figuring out
Nobody has longitudinal data on what happens to a team that goes all-in on AI assistance for two years. The Anthropic study was 52 engineers over weeks. The real effects compound over longer timescales than anyone has measured yet.
I use these tools every day. I’ve built my workflow around them, and I believe there’s no going back, so I’m not arguing we should stop. But I think we need to be more honest about the trade-off. The gains are real, but so is the degradation. Pretending otherwise is how you end up with a team that ships fast and understands nothing, and that only leads to pain.
And despite all this, I don’t believe there’s reason to panic yet. The fact that this is getting noticed and talked about is encouraging, and I have faith that good engineers, solid principles, and sound processes will guide our path forward through the slop. We just gotta keep our eye on the ball.
Next up: how I built a knowledge vault that costs zero LLM tokens at retrieval, and why the data it captures matters more than the search tech it relies on.
Sources: Anthropic AI Coding Skill Formation Study, METR AI Developer Productivity (original), METR Study Redesign (2026), Microsoft/CMU Critical Thinking Study, Stack Overflow: AI vs Gen Z