Feeling the Vibes
In 2023, LLMs capable of generating fluent English text became widely available. Within a year, they could generate code good enough for use in real projects. Then came tab completion, which let coders fill in code snippets without using a chat interface. In 2025, we saw the start of a new era: Agentic Coding.
In Agentic Coding, you prompt the model with a full description of what you want to achieve, and the model will generate the code for you from start to finish. Typically, an Agentic Coding tool relies on two things: harness engineering and context engineering. Harness engineering involves creating a robust and efficient environment for the AI model to operate in, while context engineering focuses on providing the model with the necessary information and examples to generate accurate and relevant code.
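At its core, the harness side can be pictured as a loop: the model proposes an action (edit a file, run a command), the harness executes it and feeds the result back into the context, and this repeats until the model declares it is done. Here is a toy sketch of that loop with a stubbed-in "model"; it is purely illustrative and does not reflect how any particular tool is implemented:

```python
# Toy sketch of an agentic-coding harness loop.
# The "model" here is a stub; a real harness would call an LLM API.

def stub_model(context):
    # A real model would decide the next action from the full context.
    # This stub writes one file, then declares the task done.
    if "wrote hello.py" in context:
        return {"action": "done", "summary": "Created hello.py"}
    return {"action": "write_file", "path": "hello.py",
            "content": "print('hello')\n"}

def run_agent(model, max_steps=10):
    files = {}                      # in-memory "workspace"
    context = "Task: create hello.py"
    for _ in range(max_steps):
        step = model(context)
        if step["action"] == "done":
            return files, step["summary"]
        if step["action"] == "write_file":
            files[step["path"]] = step["content"]
            # Feed the result of the tool call back into the context
            # (this is the "context engineering" half of the picture).
            context += f"\nwrote {step['path']}"
    return files, "step limit reached"

files, summary = run_agent(stub_model)
```

Real tools differ in which actions they expose (terminal, browser, file edits) and in how aggressively they summarize the growing context, but the shape of the loop is the same.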
Intrigued by this, I decided to try a few Agentic coding tools in Ubuntu 22.
My opinions are based on the following:
- How does the tool feel when used for the first time?
- Did I face any bugs or issues while using it?
- How well does the LLM generate code?
Here are the tools (and LLMs) that I tried, categorized into IDE-based and CLI-based:
IDEs (or extensions in VS Code) tried:
- Antigravity: Opus 4.5, Gemini 3 Pro, and Gemini 3 Flash.
- Windsurf: GPT 5.2 Medium Reasoning
- Kilo Code: Devstral 2 and MiniMax 2.1
CLIs tried:
- OpenCode: GLM 4.7
- Amp Code (free): Auto mode (Claude Haiku 4.5)
- Droid: GPT-5.1-Codex-Max Extra High
- Gemini CLI: Auto mode (Gemini-3-pro-preview, Gemini-3-flash-preview)
- Claude Code: Opus 4.5
- Goose Agent: GPT-5.2-2025-12-11
- Kiro CLI: Auto mode (Sonnet, Haiku, and Opus)
What did I use them for?
I used them to build a codebase from scratch and then to edit it or add new features.
My Experience
Antigravity
This is a recently launched product by Google. They offer generous limits for using Claude models such as Opus 4.5 and Google models such as Gemini 3 Pro.
The tool features a workflow where the models generate a TODO list and an Implementation file which the user can review and comment on. Additionally, the tool allows models to spawn a browser, interact with a page, and take a screenshot of the page in order to perform tasks.
While executing a task, the models also have access to a terminal. However, its output is displayed inline in the chat window, which feels clunky; it would be better shown in the terminal window/tab by default.
When I used Opus 4.5 to generate code, it did so in a single prompt after finalizing the implementation plan. But sometimes the TODOs were not tracked correctly; I am not sure if this is an issue with the model or with the tool itself.
Once the application was created, I asked Gemini 3 Pro to give me suitable options for the colour palette for the UI. It gave me three options. I replied by combining parts of them and asked it to implement my choice. The model ended up breaking the entire UI. It couldn’t fix the UI that it had broken.
I switched over to Gemini 3 Flash and asked it to rewrite the frontend, hoping that might fix the UI. It did, but left a few broken UI elements such as buttons. It had also taken the liberty of completely changing the application name and other names present in the UI. When asked why it made those changes, it said it took some creative liberty because its internal directive is to “Wow” the user with its frontend design.
Gemini models didn’t perform well when they were used via this tool. It felt like I would have to do more hand-holding to get the correct result. Regardless, it feels like this product is good if you can tolerate the usability friction.
OpenCode
OpenCode can be considered an open-source alternative to Claude Code. Currently, you can use GLM 4.7 for free in it. The tool opens in plan mode by default. Once you are satisfied with the plan created by the model, you can change it to build mode. When I used it, GLM 4.7 presented the plan and asked me various questions seeking clarification and preferences. This was a good experience.
The model implemented the first new feature correctly. However, as the codebase grew larger, it could no longer implement a new feature correctly, and it had trouble fixing the resulting issues. For example, at one point the model was convinced there was a typo when there wasn't, and I had to guide it to resolve the issue.
Nonetheless, this tool is highly customizable: you can edit the system prompts of sub-agents and select which model provider you want to use. Overall, using this tool felt really smooth. Highly recommended if you prefer an open-source CLI.
Amp Code
Amp Code is a unique CLI tool that has a free tier with ads! They offer $0.42 in credits per hour, with a $10 daily limit. In the free tier, it uses auto mode which selects different models based on the task at hand.
I used it to make a few tweaks to the codebase, which it handled without any issues using Claude Haiku 4.5. However, unlike other CLI tools, it doesn't compact the conversation into a summary to save tokens.
This is a good tool which has the minimum required features for coding. If you are someone who prefers minimalistic tools, then Amp Code is for you.
Claude Code
Claude Code is a CLI tool for performing a wide range of agentic tasks. It was originally built for coding but has since evolved into a general-purpose agentic tool. Going by the public posts about Claude Code, I think it is safe to say it is considered state of the art (SOTA) right now.
Claude Code has a learning curve: you need to learn how to customize sub-agents, use hooks, MCP, skills, etc. Folks have created additional plugins, such as Ralph Wiggum, which allows Claude Code to perform long-running tasks independently. Moreover, you can use non-Claude models in Claude Code via OpenRouter integration.
I tried using other models such as kat-coder-v2. It didn’t work as well as Opus 4.5. Ultimately, I switched to using Opus 4.5 in Claude Code.
If you want to work with an existing codebase, the first thing to do is run the /init command, which creates a CLAUDE.md file. This file acts as a constitution for the model, and you can include additional coding guidelines in it.
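For illustration, a CLAUDE.md might look something like the sketch below. The commands and guidelines here are hypothetical placeholders, not what /init generated for my project; you would fill in whatever matches your own codebase:

```markdown
# Project: my-app (hypothetical example)

## Commands
- Run tests: `pytest`
- Lint: `ruff check .`

## Coding guidelines
- Prefer small, focused functions.
- Do not add new dependencies without asking first.
- Follow the existing module layout under `src/`.
```

Keeping this file short and concrete matters: the model reads it on every session, so vague or bloated guidelines just burn tokens.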
Once this was done, I asked the model to fix certain bugs. As expected, it was able to one-shot them. But I noticed one issue: high token consumption. The model burns a lot of tokens even for small changes, so I recommend using it only if you can get a subscription.
The tool is highly customizable, and you can run multiple parallel sessions and have each session perform different tasks. In fact, it can also be used for non-coding purposes such as booking a flight!
Other than high token consumption, I didn’t face any issues. I can see why it is being praised a lot. But does this mean that it can one-shot any coding challenge? Technically yes, but depending on the task, it requires you to do a bit of heavy lifting.
Take this tweet for example: https://x.com/snewmanpv/status/2008002812471586828 and see how detailed the prompt is. The user provided all the details so that Opus 4.5 could come up with an implementation plan and generate the code. People have reported one-shotting certain tasks with vague prompts, but to ensure the model generates the code correctly, it is better to provide all the details.
Windsurf
This felt like a polished version of Copilot. They offer 25 free prompts, which I used to test GPT 5.2. The generated code feels very close to Opus 4.5's, if not better at times.
This tool has a plan mode (Cascade) and an ask mode. In Cascade mode, the model first creates a TODO list and then executes the items one by one.
GPT 5.2 did exactly what I asked. I didn't have to create a file summarizing the codebase; the model fetched all the relevant files and applied the fix correctly.
They offer unlimited tab completion and inline edits as part of their free plan. So, if you are a developer who doesn't use AI heavily and prefers tab completion, Windsurf is a good choice. But it doesn't have the advanced features seen in other Agentic Coding tools.
Droid by Factory AI
Droid offers access to GPT, Claude, and GLM models as part of its free plan. The CLI has features such as custom sub-agents, hooks, context compression, etc. However, if you want to use custom models from another provider, you need to manually edit the config files, which is not ideal; OpenCode, by contrast, doesn't require manual config editing to choose models and providers.
I used GPT-5.1-Codex-Max Extra High. Compared to Windsurf, the GPT model performed better here: it one-shotted all the tweaks and fixes I asked for. However, when I used a model from OpenRouter, it didn't do as well.
Compared to Claude Code, Droid is a viable alternative if you don’t want to spend time configuring things. Also, the sessions are synced to the cloud, so technically you can code from your mobile phone.
Gemini CLI
I had a rough experience using Gemini 3 in Antigravity and was expecting the same from Gemini CLI. To my pleasant surprise, using Gemini models via the CLI was the best experience; in this harness, it genuinely felt like a top model.
Now, the CLI doesn't have all the features Claude Code has, but the existing ones work really well. Similar to CLAUDE.md, I had to create a GEMINI.md first; once done, the model generated correct and consistent code. But compared to Antigravity and OpenCode, it lacks an explicit plan mode: I had to ask the model to plan first and only then implement the changes.
If you like Gemini models and want to use them exclusively, then use Gemini CLI. This is the best way to use it for coding.
Kilo Code
Kilo Code is a good implementation of Agentic Coding in an IDE. It has various modes such as “Ask”, “Code”, “Architect” etc., which are really helpful; however, it doesn't show the changes made to files as clearly as other IDEs do.
One of its unique features is checkpoints, which help in reverting files; however, the feature appears to be buggy.
Moreover, this tool is good if you want to try models other than GPT, Claude, and Gemini. It offers a bonus if you purchase credits, and you can still use it via an API key from other providers.
I used it to add a complex feature with MiniMax 2.1; however, the implementation wasn't complete. The components it wrote were correct, but they were never called, so the feature didn't work.
Overall, I feel this IDE is suitable if you intend to use cheaper, open-source models for coding, though you will have to do a lot of hand-holding to get the work done correctly. Instead, I would rather use it for code review or for understanding the architecture, since the tool generates good-looking Mermaid diagrams of the architecture.
Kiro Code
Kiro Code is by Amazon and has both IDE and CLI versions; I used the CLI. It reminds me of Claude Code, with many of the same capabilities: hooks, sub-agents, MCP integration, etc.
I used it in Auto mode, where it picks Claude models such as Haiku, Sonnet, and Opus. Overall, the experience was quite good. If you cannot afford Claude Code, Kiro Code is the next best option, at least for enterprise users.
Goose Agent
Goose Agent is a new tool, still under development, that aims to be a general-purpose agentic tool capable of performing any task. I used the GPT-5.2 model, provided via Tetrate, to make certain changes to the project. While using it, the model said it could start a browser and take a screenshot to see what the UI bug was; however, it couldn't actually do so.
This tool has a lot of potential to surpass Claude Code in capability, but time will tell. Keep an eye on it.
Final Thoughts
Agentic Coding will get better as models get smarter and managing context becomes easier. With these tools, a skilled coder can produce working code much faster than an amateur can. If you are a beginner learning to write code, I think it is better to learn how to design systems as well.
Since we are living in an AI bubble, we will see a lot of new tools or existing tools upgraded with new functionality. There is no telling which of these tools will become the standard in the future. So, it is best to have an open mind and try various tools and become familiar with them. If there is a tool that you enjoy using, then keep using it but at the same time, keep checking out new tools.
Additionally, we are seeing a growing number of new models being released every year. So, it is in our best interest to try these different models whenever possible. If your current chosen model fails at a given task, then switch to another model.
Note about Google
Google is dropping the ball when it comes to exposing their frontier models, especially for coding. We have three tools: Antigravity, Gemini CLI, and Jules Coding Agent. Why can’t Google consolidate them into one product with different versions like Amazon has done for Kiro Code?
If they did that and fine-tuned their model to work within that harness, Gemini could surpass Opus 4.5 easily. Google is generous with its API limits, which makes it all the sadder to see the models underperform when used via Google's own products.