lessons.md

$ cat lessons.md

5 things that burned me in agentic AI development (and how I do it now)

I use agentic coding tools every day on real production work, not on toy weekend repos. Claude Code in the terminal, Cursor when I want to stay in the editor, Codex for the occasional second opinion. They are genuinely good now. They are also very good at confidently doing the wrong thing, and over the past year I have stepped on most of the rakes you can step on.

This is not an anti-AI rant. I would not go back. But the hype skips the part where you actually learn to drive these things, and the only way I learned was by getting burned. So here are the five that hurt the most, and what I do differently now.

1. I let the agent run without a plan

The mistake: I would type one sentence, "add caching to the export endpoint", and hit go. The agent is eager. It starts editing immediately. Twenty minutes later it has touched eight files, invented an abstraction I never asked for, and the thing technically works but in a shape I would never have chosen. Now I either accept a mess or throw it all away. Usually I threw it away.

The fix: plan before edit, always. I use plan mode (or just force a planning step in the prompt) so the agent first writes out what it intends to do, which files it will touch, and why. I read that. I argue with it. Then it gets to write code. The current consensus across the 2026 tooling is the same: sketch the architecture, get the plan right, only then let it implement. Five minutes on a plan saves an hour of cleanup. The biggest single source of failure is starting before you have agreed on the approach.

2. I drowned the agent in context

The mistake: more context felt safer, so I gave it everything. The whole file, the whole module, "here, read the codebase". Then I ran one long session for hours. Quality fell off a cliff and I could not figure out why. The agent started contradicting decisions it had made an hour earlier, reintroducing bugs it had just fixed, losing the thread.

The fix: treat context as the scarce resource it is. Context management is honestly the single most important skill in this whole game right now. I keep tasks small and scoped. I start a fresh session for a fresh problem instead of letting one balloon for hours. I point the agent at the three files that matter, not the thirty that might. A focused agent with a clean window beats a stuffed one every time. If a session has gone long and weird, I do not fight it, I reset and re-state the goal.

3. I trusted generated code because it looked right

The mistake: the code reads cleanly, the variable names are sensible, it runs. So I skimmed it and shipped it. The trouble is that AI code fails in subtler ways than human code. It is plausible on the surface and wrong underneath: an off-by-one in a boundary case, a swallowed error, a security assumption that does not hold. The reviews that bit me were the ones where everything looked fine.

The fix: review AI code harder, not softer. The numbers floating around in 2026 are not subtle: AI-authored changes carry meaningfully more defects per change than human ones, and a scary share introduce at least one real vulnerability. So I read generated code more slowly than my own, not faster. I assume it is plausible and wrong until I have checked the edges myself. The best engineers I know right now are not the fastest at producing AI code, they are the best at not trusting it.

4. I gave it no guardrails and it touched things it should not

The mistake: I let an agent run with broad permission to execute commands and edit freely, because stopping to approve every step was annoying. Then it helpfully ran a command against the wrong environment, or rewrote a config file I considered off-limits, or did a "cleanup" I never sanctioned. Convenience, right up until it is not.

The fix: deterministic guardrails, not vibes. Model-level prompts ("please do not touch production") are not a safety system, they are a suggestion. I use hooks and permission rules so the dangerous things are blocked by code, not by the model remembering to behave. Protected paths, restricted commands, an approval gate on anything destructive. I scope what the agent can reach to what the task needs and nothing more. The point is that a guardrail you can rely on is one the model cannot talk itself out of.

5. I let it write code I did not understand

The mistake: this is the quiet one, and the worst. The agent produced something that worked, I did not fully follow how, and I moved on anyway. Repeat that for a few weeks and you have a codebase you cannot debug, because you never actually understood it. AI did not make the hard part disappear, it deleted the easy 80% and left the 20% that needs real focus, and that 20% is now the whole job. If you skip understanding, you are just renting working software until the first bug you cannot explain.

The fix: I have to be able to explain every line I ship. If I cannot, it does not go in, no matter how green the tests are. I make the agent walk me through the non-obvious parts. I write the tests, or at least I own them, because tests are where the requirements the agent silently dropped come back to bite you. The agent writes the plumbing, I make the calls that need judgment. That division of labor is the only one that has held up for me.

The short version

Plan before it edits. Keep its context lean. Review the output like you trust it least. Put real guardrails around it. And never ship code you cannot explain. None of this is anti-AI. It is just what using these tools seriously looks like once the novelty wears off. The agent is a fast junior who never gets tired and never gets bored, which is exactly why the senior judgment is still yours to provide. That part did not get automated. If anything, it got more valuable.

back to blog