Checking the Vibes
In the last blog, I briefly tested different agentic coding tools such as Claude Code, Amp, and Antigravity. After that, I decided to build something I would use on a daily basis, and to see how far these agents can carry a real project.
In my experience, you can get surprisingly far, but not all the way. And every time I see folks claiming software engineering will be dead in 6-12 months, I am skeptical. Based on the current state of AI, it feels like they are selling a dream that may not materialize.
Initial Euphoria: Engaging Autopilot
Compared to even two years ago, agentic coding tools feel magical. You write a prompt, hit enter, and watch a codebase materialize in seconds that would have taken hours or even weeks to write by hand.
It is easy to see this and spiral into existential doom:
- “The era of human written code is over.”
- “Tech debt is solved.”
- “Junior devs are finished.”
You also start building all those side projects you never had time for. Don’t know Python? Doesn’t matter, you can build that automation script that you have been dreaming about. The barrier to starting is gone.
However, removing that barrier doesn’t mean we won’t be building bad software. We will get to that.
Where the Magic Stops
The honeymoon ends when you try to turn that weekend prototype into something with features. That’s when you hit “The Capability Ceiling”: the point where the autopilot disengages, and suddenly you are hand-flying through turbulence.
Architectural Limits: The One-Shot Wall
For simple CRUD apps, the one-shot prompt works beautifully. The problems start when you want to add nuance or the codebase grows large.
The agent can’t find the right files to modify. It implements half the feature, then hallucinates the rest. It fundamentally misunderstands what you’re asking for because it lacks the architectural context of why you built things this way.
Currently, the solution to these problems is context management: provide appropriate context, write detailed specs, and guide the LLM toward the correct solution.
But here’s the catch: doing this effectively requires you to already know the codebase. You need the mental model of a software engineer to direct the agent that supposedly replaces software engineers. Without that prior knowledge, you are stuck.
Cognitive Gaps: Debugging and Intention
Let’s say you manage the context perfectly. The code generates. You run it.
Bugs happen. The obvious ones and subtle logic errors when real data triggers edge cases.
This is where agentic coding currently falls apart. The agent can’t debug like a human can. It gets stuck in fix loops, chasing ghosts (declaring typos exist where none are), or misdiagnosing the root cause entirely. Sometimes it “fixes” the symptom by creating three new bugs.
There’s a specific flavor of this I keep hitting with Python: error handling. Not the syntax of try/except blocks, but the intention behind them. Some errors should crash the program loudly (you want to know your database connection failed). Others should fail silently (a cached avatar missing shouldn’t break the UI).
That distinction—knowing which failure mode to choose—comes from operational experience. The agent has no intuition for production vs. personal use. It just wraps everything in generic handlers and calls it a day.
UI Slop
When you ask an agent to build a web UI, you get something that looks… fine. Acceptable. Generic.
But try to push it toward sophisticated interaction design—complex CSS animations, responsive breakpoints that actually work, accessibility considerations—and you spend more time correcting the agent than you would writing it yourself.
The models seem to lack a coherent mental model of CSS. They generate verbose, conflicting styles that technically work but are architecturally horrifying. Tools like Antigravity now have browser access, so they can at least see their mistakes. That’s progress. But we’re still in the “decent prototype, production nightmare” zone.
The Tutoring Paradox
So you implement the fixes. You create Agents.md and Skills.md files. You set up MCP servers. You use context compression services. You start fresh sessions at 50% context window usage to avoid the “forgetting spiral.”
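To make that setup step concrete, here is a sketch of what such a project-context file might contain. The file name comes from the tools above, but every convention listed here is a hypothetical example; the right contents depend entirely on your project:

```markdown
# AGENTS.md (hypothetical example)

## Architecture
- FastAPI backend lives in `app/`; SQLite access goes through `app/db.py`. Do not add an ORM.
- HTTP handlers return typed response models, never raw dicts.

## Conventions
- Infrastructure errors (database, external APIs) must crash loudly.
  Only cosmetic failures, like a missing cached image, may be swallowed.
- Run the test suite before declaring a task done.
```

The point of a file like this is to hand the agent the “why” that it cannot infer from the code alone.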
At some point, you have to stop and ask: What am I actually doing here?
You’re not automating coding. You’re tutoring a very fast, very confident, occasionally hallucinating junior developer who never sleeps. You’re the architect, the product manager, the code reviewer, and the debugger. The AI is just the typist with a massive pattern-matching engine.
When I hear “Software Engineering is dead,” I wonder how they dealt with bugs. Did they even test their app with real data?
This isn’t replacing engineers. It’s changing where the bottleneck sits. Now it is verification: knowing whether the generated code is correct, secure, and maintainable. That verification requires experience, the kind you only get by writing software.
The Pilot Analogy
The best way I’ve found to explain this dynamic is the airplane metaphor. Think of the AI as a sophisticated autopilot system, and you as the pilot:
| Flight Phase | Pilot Action | AI Coding Equivalent |
|---|---|---|
| Pre-flight checks | System inspection, weather review | Setting up Agents.md, Skills.md, MCP servers, choosing your model |
| Takeoff | Hands-on control, critical timing | Writing the detailed spec, answering the AI’s clarifying questions |
| Cruise | Autopilot engaged, monitoring instruments | The AI generates code while you watch for divergence from the plan |
| Turbulence/Anomalies | Manual override, quick decisions | AI tries to delete your entire repo or hallucinates a dependency—you intervene |
| Landing | Controlled descent, touchdown, taxi | Code review, testing, edge-case verification, deployment prep |
Pilots aren’t obsolete because autopilot exists. In fact, they’re trained specifically to fly without autopilot, so they can handle the moments when it fails. The autopilot reduces physical workload during cruise, but it increases cognitive load during anomalies. You have to figure out what the machine is doing wrong and fix it.
Final Approach
If you’re an experienced developer, these tools are genuinely useful. They’re autocomplete evolved into pair programming. You can move faster on well-understood problems because you can verify the solution quickly: you have tests, and you know what the right answer looks like.
But if you’re learning? Learn to fly without autopilot first.
Write the code manually. Break it. Fix it. Develop the intuition. Without that mental model, AI tools can only generate demo-level code for now.
Right now, the cockpit is getting fancier, but we still need pilots.