Mr. RIGANTI is our new SaaS that enables running AI agents directly in Azure DevOps. Check it out! If you already use GitHub, the same principles apply too - just substitute Mr. RIGANTI with GitHub Copilot.
A couple of days ago, I wanted to add a new feature to Mr. RIGANTI, and I realized that my development workflow had turned upside down. My brain usually works much faster than my fingers, so I often build the high-level plan in my head and then just have to wait until all the code is typed. Recently, I caught myself rarely starting any task by touching the code. Writing or editing code has become the last-resort option. By default, I ask the AI agent to implement something, and a few minutes later, I review the code changes and verify they are correct.

When I start building a new feature, I usually create a work item to track the activity and assign it to Mr. RIGANTI. In the later stages, I open Visual Studio to see the changes in context, make additional edits using GitHub Copilot directly in Visual Studio or VS Code, and verify everything works. When I am happy with the changes, I commit them to the branch. It feels a bit unfair, because most of the changes were made by Copilot, yet I am listed as the author of that commit.
Most of the time, the AI-generated code is fine, and my comments are either small change requests (it would be more elegant to move this method here, or reuse the same functionality in another place), or I find out I asked for the wrong thing and the design needs to change. It happens to me a lot, and it happened even before I used AI - I designed something, implemented it, and realized it needed to be changed. Looking at the "wrong" code works better for me than poring over long specification documents or staring at diagrams. If the design is wrong, there is usually something off in the code - something looks suspicious, things are too intertwined, or it just doesn't look elegant.
So far, this works flawlessly for small and medium-sized features. When the AI agent produces 5-20 changed files, it is quite easy to review them and ensure they are correct. When a feature becomes too large, I break it into multiple steps - not because I am afraid the AI agent couldn't handle it in one go, but mostly because of the limitations of my own brain's context. When I get a PR with hundreds of files, it is impossible to do a proper code review. And I am still not ready for a world where we just accept AI-generated code without scrutinizing it first. How long will that hold? I do not know, but I've seen thousands of examples where looking only at the outcomes led to catastrophic results.
When I assign a task to Mr. RIGANTI, he usually creates a new Git branch and implements the change. Along with reviewing the code in the pull request, we also need to test the feature locally, which takes time. I need to check out the branch in Git, which sometimes means stashing my current changes because I am in the middle of another task. Then the application needs to compile and run, which also takes some time. When I just need to check something trivial (moving a button to the other side), setting up the test takes way longer than the actual check, and I have to do it on a machine where I have all the software and dependencies installed.
I can easily imagine that some smaller changes may be requested even by my non-developer colleagues, and it would be nice if they could check the results immediately. They may not even know what Git is, but asking Mr. RIGANTI is so easy, and if they "implement" something and can test it, we can get it into production faster.
This is the goal of the "Run from branch" feature. When Mr. RIGANTI finishes a change in a pull request, I want it to generate a link to a special page where we can run the app and test it out directly. In the background, it would spin up another Azure Pipeline (we already have an infrastructure for that), clone the repo, build the project from the correct branch, run it (inside the build agent), and expose the ports the app listens on using some kind of IP tunnels, so the user can interact with it from their computer. These tests will take just a few minutes, so I don't mind blocking the agent pool - there just needs to be some automatic cleanup mechanism.
We use Aspire in most of our projects, and we wanted to reach a state where the app can run inside Azure DevOps pipelines anyway - this would allow Mr. RIGANTI to launch the app and use Playwright MCP server to browse it, and so on. Aspire makes it easy to provision the database and other resources the app needs directly on the build agent, and it takes only a few prompts to seed some reasonable test data, making these tests meaningful.
As you can imagine, the feature is quite big, and the previous three paragraphs are not a complete specification. There are a lot of decisions to be made, but I am not able to think about them up-front. If I tried, I might have found a decent number of them, but I would surely miss some important ones. With AI, I can just start sketching this up.
The initial prompt for Mr. RIGANTI was this:
We already run an Azure Pipeline to perform code reviews and implement changes in a PR. Prepare the foundations for the new agent run action type called Run. Instead of running an AI agent, we will run a predefined command (let's assume we'll have a .aicoder/run-settings.json file in the project repo that will contain workingDirectory, executable and args properties to define what will be launched; also, add a type="aspire" property there because we will assume the app is using Aspire and we will need to discover the ports the application uses).
Implement the agent side: instead of running an AI agent, run the specified process and wait until it completes.
We'll also need some infrastructure to send messages back to the agent - currently, the agent can post events to the web app, but there is no way to send messages back to the agent. Implement a SignalR hub that the agent can connect to (only in the Run scenario) so we can send messages from the API. For now, implement only the Stop command, which will gracefully end the Aspire app (send a Ctrl-C signal to the process).
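To make the settings file from the prompt more concrete: the exact schema is ours and may still change, but a minimal sketch of the .aicoder/run-settings.json contract and the code that loads it could look roughly like this (the C# names and the sample values are illustrative, not the actual implementation):

```csharp
using System;
using System.IO;
using System.Text.Json;

// Example file (property names from the prompt above; the values are made up):
// {
//   "type": "aspire",
//   "workingDirectory": "src/MyApp.AppHost",
//   "executable": "dotnet",
//   "args": [ "run" ]
// }
public sealed record RunSettings(
    string Type,               // "aspire" is the only supported value for now
    string WorkingDirectory,   // relative to the repository root
    string Executable,         // the process to launch
    string[] Args);            // arguments passed to the process

public static class RunSettingsLoader
{
    private static readonly JsonSerializerOptions Options = new(JsonSerializerDefaults.Web);

    public static RunSettings Load(string repoRoot)
    {
        var path = Path.Combine(repoRoot, ".aicoder", "run-settings.json");
        return JsonSerializer.Deserialize<RunSettings>(File.ReadAllText(path), Options)
               ?? throw new InvalidOperationException("run-settings.json could not be parsed.");
    }
}
```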
As you can see, the prompt is quite vague. Some of its parts are just sketched, while others go deep into the details. I do this intentionally - when there is something I care about, I describe it exactly, but the rest is just a raw sketch. The model doesn't mind, and the result was mostly what I expected. We had an enum specifying the agent's action types; Mr. RIGANTI found it, added a third type there, then proceeded with the SignalR hub, using proper authentication configuration, and made a special case for this type of run in the agent's main "procedure". There were some shared components (logging, process crash handling, and output collection) that should remain shared by both cases, but some things made sense only in the original workflow, not in the new one. This wasn't clear from the task description, and I hadn't thought about it. It is exactly the kind of thing I discover only after I can see the code.
Because of the added code, the class grew a lot - the first workflow was already long enough, and the second one got quite complicated too. I didn't bother to verify the code at this point - my only immediate comment was to split this procedure into two classes: one to handle running the AI-agentic tools (which was already present), and the other to handle running the application in the "try from branch" mode.
The prompt:
Handling of git branches, git config, and authentication is needed only in AI coding actions.
It would also be wise to extract the code review/implement and run actions into separate classes; this file is getting too long.
Mr. RIGANTI implemented my comment as I wanted: each workflow got its own class, making it easier for me to read and understand. I identified a cluster of methods that apply only when the app uses Aspire. For now, it is the only supported case, but in the future, we'll want it to be extensible.
I marked the line with one method I wanted to move to a separate class:
Extract this as a separate class - parsing of the JSON file may get way more complicated than this in the future.
Mr. RIGANTI moved the method I marked, along with all related methods, to a separate class, exactly as I wanted. I discovered a similar situation in the remaining code, so I added another comment.
The methods that concern the SignalR hub client should also be extracted to a separate class (maybe in the Api directory).
I noticed that it's exactly how I would instruct junior developers. The code they wrote usually works, but the structure can be improved so related things sit close together and can potentially be reused elsewhere. Basically, it's just applying SOLID principles.
When everything related to SignalR communication was placed in the RunControlHubClient class, it was way easier to review. When the logic had been just a bunch of private methods inside a class that did other things, I could easily miss something.
Thanks to this reorganization, I found duplicate private helper methods.
We have these methods twice in the codebase.
I didn't even write what I wanted the model to do - it understood my intent.
At this point, the code was in good shape, and the PR could be merged. I was never a fan of trunk-based development, where you commit unfinished features to the main branch and use feature flags to enable them once they are ready. But in the case of Mr. RIGANTI, this approach is actually quite sensible. Smaller PRs are easy to validate, and since this change adds a workflow that is never entered, the risk of having it in production is minimal. I didn't merge the PR at that point, but looking back, I should have - it would have saved us from ending up with a large PR that had to be reviewed commit by commit (or, more precisely, cluster of commits by cluster of commits).
The prompt:
Now I need to implement the next step. Once we run the process, we'll need to determine what ports the application runs on, so we will be able to set up IP tunnels (we'll do this later). Add an IAgentRunPortDiscovery interface and implement a provider for Aspire (when the config file specifies type=aspire) that will detect what ports the Aspire resources use (resource name, endpoint name or index, port number and type - for now, let's do only Tcp).
I am not sure what way we should use to communicate with Aspire - search and identify possible options. If you need clarification on which option would be more suitable, do not proceed with the implementation and ask.
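To give you an idea of the shape of the request: only the interface name IAgentRunPortDiscovery and the listed fields come from my prompt; how it actually ended up in the codebase may differ. A sketch could look like this:

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public enum DiscoveredPortType { Tcp }   // only TCP for now, per the prompt

public sealed record DiscoveredPort(
    string ResourceName,      // the Aspire resource that owns the endpoint
    string EndpointName,      // endpoint name or index
    int Port,
    DiscoveredPortType Type);

public interface IAgentRunPortDiscovery
{
    // Returns the ports the launched application is listening on.
    Task<IReadOnlyList<DiscoveredPort>> DiscoverPortsAsync(CancellationToken cancellationToken);
}

// The Aspire provider (used when run-settings.json says type = "aspire") shells out
// to the Aspire CLI and parses the JSON it prints - more on that below.
```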
Mr. RIGANTI didn't ask me anything (he can - he does it by simply replying to the comment that triggered him). He found that he could run "aspire describe" to list all running app hosts and obtain a JSON document with all the endpoints. The code looked good - it was another bunch of classes and infrastructure, but because of the complexity, I couldn't tell whether it was correct or not. This was the point where I had to run it.
However, I noticed I forgot a small but useful feature:
Detect whether Aspire CLI is installed and install it automatically or try to auto-upgrade it.
More code was added, and again, since it runs some processes and works with the output, I had to run it locally to try it out.
I opened Visual Studio and tried it - it was just a bit complicated to set up, and I had to add more web tunnels because I needed Azure DevOps to be able to send webhooks to my application. I found a couple of issues that AI naturally couldn't see.
For example, when you run aspire describe --non-interactive --nologo --format Json, you would expect the output to be only the JSON. However, the first line was "Scanning for running apphosts...". I tried to play with the args, but I was unable to get rid of the status line, so I had to write my first line of code, which trims the output until the first { character, hoping that they would not add status messages containing this character.
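That trimming helper is roughly this (a simplified version, not the exact code from the repo):

```csharp
// Drop everything before the first '{' so that status lines such as
// "Scanning for running apphosts..." don't break the JSON parsing.
static string TrimToJson(string cliOutput)
{
    var start = cliOutput.IndexOf('{');
    return start >= 0 ? cliOutput[start..] : cliOutput;
}
```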
There were a couple of other tiny issues like that - I mostly had to reorder some operations, because you need to first run the process and then ask for the open ports, and there needs to be some retry logic because starting up the application can take some time. I implemented these edits directly in Visual Studio using the GitHub Copilot chat window.
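The retry part is nothing fancy either - conceptually something like this, continuing the hypothetical IAgentRunPortDiscovery sketch from above (illustrative names, not our code):

```csharp
// Keep asking for ports until the application has started or a timeout is reached.
static async Task<IReadOnlyList<DiscoveredPort>> WaitForPortsAsync(
    IAgentRunPortDiscovery discovery, TimeSpan timeout, CancellationToken ct)
{
    var deadline = DateTime.UtcNow + timeout;
    while (true)
    {
        var ports = await discovery.DiscoverPortsAsync(ct);
        if (ports.Count > 0)
            return ports;
        if (DateTime.UtcNow >= deadline)
            throw new TimeoutException("The application did not expose any ports in time.");
        await Task.Delay(TimeSpan.FromSeconds(2), ct);
    }
}
```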
I managed to get to the point where the agent runs the app using the settings in the JSON file in the repository, and queries Aspire for the ports the application listens on. We could also connect to the SignalR hub and stop the application (and agent) when the Stop command was received.
The next part of the feature was to establish IP tunnels so we could connect to the ports opened by the application on the build agent. The build agents run on our internal network and are not accessible from the Internet. Some users of Mr. RIGANTI may use either Microsoft-hosted or local on-premises deployments (usually virtual machines or containers). Therefore, I needed a solution for tunneling the ports to the local machine.
I used another AI (Perplexity) to discover what options we have. I won't go into the details here, but the most flexible option was to implement a simple IP tunneling solution using WebSockets. There is an existing implementation in JavaScript, but I didn't want to run other processes - I wanted something that could be hosted in the agent process. The fewer things we install and depend on, the better.
I wanted to make a proof of concept first, so I opened Visual Studio, created a new solution, and had Copilot implement the IP tunneling using WebSockets. The prompt was basically Perplexity's output, with two added sentences stating that it must be a class library because I need to embed the remote side of the tunnel into an existing agent application.
Copilot sketched the implementation, and it worked immediately. You run a web application that works as the middle tier - it connects remote and local ends of the tunnel based on session IDs. Then you run the same "client" console app on both the remote and local ends of the tunnel, specifying the list of ports you want to forward.
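To show the shape of the thing (this is not the WSTunnels code, just my sketch of the pairing concept with placeholder URLs and query parameters): both ends open a WebSocket to the middle-tier web app and present the same session ID, and the server relays frames between the two sockets.

```csharp
using System;
using System.Net.WebSockets;
using System.Threading;
using System.Threading.Tasks;

var sessionId = Guid.NewGuid().ToString("N");

// Remote end - runs on the build agent, next to the application being tested.
var remote = new ClientWebSocket();
await remote.ConnectAsync(
    new Uri($"wss://tunnel.example.com/tunnel?session={sessionId}&role=remote"),
    CancellationToken.None);

// Local end - runs on the developer's machine, with the list of ports to forward.
var local = new ClientWebSocket();
await local.ConnectAsync(
    new Uri($"wss://tunnel.example.com/tunnel?session={sessionId}&role=local&ports=5000,5001"),
    CancellationToken.None);

// The middle tier pairs the two connections by session id and shuttles data between
// the forwarded TCP ports on one side and the WebSocket frames on the other.
```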
I saw that this solution may be useful in other scenarios as well, and we have a company repo called "Shared Infrastructure", so I asked Copilot to embed the solution there (with renaming and cleaning up the projects to follow the repository conventions). Our shared infrastructure already contains over a hundred projects (mostly class libraries) that implement commonly used things such as persistence of domain objects using Marten and PostgreSQL, sending emails using various providers, abstraction over filesystems and various blob storages, useful helpers for caching, and plenty of other features. WSTunnels became yet another component, with a shared library implementing the internal protocol primitives, client- and server-side libraries, and test server and client apps.
Everything that belongs to our shared infrastructure should be tested, so my next request for Copilot was to create E2E (end-to-end) tests. When I use AI to generate tests, I am extremely careful - the agents are sometimes lazy and generate tests only for the trivial cases. It is important to ensure the proper thing is being tested, and that was the case here too - I had to explicitly ask for tests covering simultaneous sessions and multiple ports.
The Mr. RIGANTI project uses our shared infrastructure as a submodule, so after WSTunnels were implemented, I made another manual commit to the Mr. RIGANTI repo to bump the commit ID of the infrastructure submodule. Actually, it was more complicated than that - WSTunnels are implemented in a feature branch that hasn't been merged to main, and Mr. RIGANTI already used a different feature branch, so I had to cherry-pick several commits.
After that, I asked Mr. RIGANTI this:
There is a new version of the submodule libs/infra with WSTunnel feature. Use it in the agent to expose ports discovered by Aspire (it will be the remote end of the tunnel), and add the server part to the main web app. The user will set up the local end of the tunnel manually using a separate console app we'll create later.
Never having heard of our implementation of WSTunnel, Mr. RIGANTI correctly figured out which projects I was talking about and how to use them. The code changes looked great.
But wait, we will need some authentication. I went back to the infra repo and asked GitHub Copilot to integrate the ASP.NET Core authentication handler mechanism into WSTunnel - if I specify an authentication scheme, the tunnel will have to be authenticated before the connections can be established.
This time, Copilot did something other than what I asked for. Instead of just wiring up ASP.NET Core authentication handlers, it created its own interface called IWSTunnelsAuthenticationHandler. At first, I thought it was a mistake, but after a while, I realized it might be useful to pass the first "Hello" message to the handler, so we can get more information about the client than we could see from the HTTP request alone. So I just asked it to provide an implementation that delegates to the specified ASP.NET Core authentication scheme.
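In other words, the abstraction amounts to something like the following - only the name IWSTunnelsAuthenticationHandler comes from the actual code; the members and the TunnelHelloMessage record are my paraphrase:

```csharp
using System.Threading;
using System.Threading.Tasks;

// Illustrative: the first message a tunnel client sends after the WebSocket is opened.
public sealed record TunnelHelloMessage(string? ApiKey, string? AgentRunId);

public interface IWSTunnelsAuthenticationHandler
{
    // Decides whether the tunnel may be established. Because it receives the "Hello"
    // message (and not just the HTTP upgrade request), it can authenticate on more
    // than headers alone.
    Task<bool> AuthenticateAsync(TunnelHelloMessage hello, CancellationToken cancellationToken);
}
```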
After that, I asked Mr. RIGANTI to use the new version of the submodule and implement a WSTunnel authentication handler to check the project API key and agent run ID - we already have this implemented for the other agent run types.
Mr. RIGANTI implemented the handler, and it appeared to work fine. When I tried it locally, I found just one trivial flaw - the entire Mr. RIGANTI application is multi-tenant, and when we validated the API key, we didn't know the tenant ID yet. We identify the tenant by the API key and then need the tenant ID to query additional data about the agent run; however, these operations were done in the wrong order, so they ended up with a NullReferenceException.
The fix was just to reorder a few lines, which I did myself.
Another issue was that the SignalR hub was registered as a Singleton (which was fine), but the authentication handler was Scoped, and Singletons cannot depend on Scoped objects. The fix was easy, but I had to run the project to discover the issue.
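The fix is the standard pattern for this kind of DI mismatch - sketched here with made-up class names rather than our exact code, reusing the hypothetical handler interface from above: the singleton takes IServiceScopeFactory and creates a short-lived scope whenever it needs the scoped handler.

```csharp
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection;

public sealed class RunControlHubAuthenticator   // registered as a singleton
{
    private readonly IServiceScopeFactory _scopeFactory;

    public RunControlHubAuthenticator(IServiceScopeFactory scopeFactory)
        => _scopeFactory = scopeFactory;

    public async Task<bool> AuthenticateAsync(TunnelHelloMessage hello, CancellationToken ct)
    {
        // Resolve the scoped handler inside a scope instead of injecting it into the singleton.
        await using var scope = _scopeFactory.CreateAsyncScope();
        var handler = scope.ServiceProvider.GetRequiredService<IWSTunnelsAuthenticationHandler>();
        return await handler.AuthenticateAsync(hello, ct);
    }
}
```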
I struggled to get the tunnels working correctly for quite some time due to port conflicts. You run the Aspire project, which runs the apps on some ports, and you need to tunnel them to different ports, because both tunnel ends are on the same machine.
After doing so, I found another flaw in the WSTunnel implementation. To keep the WebSocket open, we send Heartbeat messages: a simple message that the client sends to the server every 30 seconds and the server echoes back. However, the AI generated the same echo mechanism on the client side as well, so each heartbeat bounced back and forth in an infinite loop. The fix was easy - I just told Copilot that I could see way too many Heartbeat messages when debugging the app.
Interestingly, the end-to-end tests didn't catch it, probably because they only tunneled one or two ports and didn't run long enough. In my real scenario, I had more ports in the tunnel, so the message was probably multiplied by the number of ports, and no real traffic could get through - the queue was full of heartbeat messages.
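For illustration, the intended behaviour is asymmetric - something along these lines (the message types and names are made up): the server echoes heartbeats, the client only notes that the echo arrived.

```csharp
using System;
using System.Threading.Tasks;

enum TunnelMessageKind { Heartbeat, Data }
sealed record TunnelMessage(TunnelMessageKind Kind, byte[]? Payload = null);

static class HeartbeatHandling
{
    public static async Task HandleMessageAsync(
        TunnelMessage message, bool isServer, Func<TunnelMessage, Task> sendAsync)
    {
        if (message.Kind == TunnelMessageKind.Heartbeat)
        {
            if (isServer)
                await sendAsync(message);   // the server replies so the client knows the link is alive
            // Client side: just note that the reply arrived - never echo it back,
            // otherwise every heartbeat spawns another round trip forever.
            return;
        }

        // ... forward data frames to the local TCP connection
    }
}
```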
After fixing the issue and updating the submodule, the tunnels started working.
However, I realized I had made a design flaw. Mr. RIGANTI has two web projects: the main web app and the Proxy that handles the communication with the LLMs. It is not a problem when the web app goes down for a couple of seconds - all clients use retry logic that should deal with this. However, the WSTunnels hosted there would stop working, and we'd need to handle reconnects. A better place to host the tunnels' server component was the Proxy project.
In the past, when I assigned such a task to a human, I always felt terrible and apologized - my bad decision resulted in work that had to be partially or even fully thrown away. Several times, I've caught myself apologizing to the AI model too, when asking it to change something it had already finished and that did exactly what I asked for. What a waste of tokens.
My next prompt was this:
I updated some files and tested the WSTunnel communication, but for the best reliability, I need to move the WSTunnel server part from the Web project to the Proxy. This means re-implementing the WSTunnel authentication handler to verify API key and session using the API, similar to what we do in ApiKeyValidationMiddleware. Look there: extract the calling API and cache the result in a separate mechanism so it can be reused for WSTunnel authentication as well. The agent-run ID and API key values should match those for the API communication.
Also, change the service URL from /ws/agent/tunnels to /wstunnel to be more consistent
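A rough idea of what that extracted mechanism could look like - the class name, endpoint, and cache policy below are my illustration, not the actual code; only ApiKeyValidationMiddleware is a real name from the prompt:

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Caching.Memory;

// Both the existing middleware and the new WSTunnel authentication handler can call
// this service instead of hitting the main web app's API on every request.
public sealed class CachedApiKeyValidator
{
    private readonly IMemoryCache _cache;
    private readonly HttpClient _api;   // BaseAddress points to the main web app

    public CachedApiKeyValidator(IMemoryCache cache, HttpClient api)
        => (_cache, _api) = (cache, api);

    public async Task<bool> IsValidAsync(string apiKey, string agentRunId, CancellationToken ct)
        => await _cache.GetOrCreateAsync((apiKey, agentRunId), async entry =>
        {
            entry.AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(5);
            var response = await _api.GetAsync(
                $"api/agent-runs/{agentRunId}/validate-key?apiKey={Uri.EscapeDataString(apiKey)}", ct);
            return response.IsSuccessStatusCode;
        });
}
```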
Mr. RIGANTI did a great job of doing exactly what I asked. Even better: I forgot to mention that the agent would need to obtain the new URL for the WSTunnel endpoint, since it cannot be inferred from the other API endpoints' URLs - and the code change covered this adjustment as well.
When I tried it, it didn't work - the authentication got broken. It was my fault - I said that the API keys and agent run IDs are the same, but it wasn't true. The proxy used tenant-wide API keys, while we use project-wide API keys for agent API communication.
I was wrong - the API key the WSTunnel sent is the project's AgentApiKey, but the existing middleware checks for a tenant-wide API key for securing the communication with the LLM. It would be nice to support both types of API keys. If the project-level key is used, we need to validate that the agent run belongs to the correct project. In the future, the LLM communication may also want to use project-wide keys instead of tenant-wide, so keep them there as well.
Again, the last code change made by Mr. RIGANTI was mostly correct - I just had to make a trivial single-line change coming from the multi-tenancy nature of our app. I ran the project, and it worked - the application was launched, the IP tunnels were set up, and I was able to connect to the other side by running a console app and playing with the app in a browser through the tunneled ports.
There will be more work on the error-handling side, but that will require further testing and data from pre-production environments to simulate real-world situations, which means deploying the project to our test environment, where we use it on real-world projects. Actually, we dog-food Mr. RIGANTI on Mr. RIGANTI himself first.
As you can see, I haven't gotten to do any UI yet. To be able to test the feature locally, I manually inserted a special instance of the agent run in my database, and set up my environment to run the agent with this hard-coded agent run ID to try out the new workflow. For now, I can do the same thing in our test environment. Even if this feature gets to production, it shouldn't break anything, and nobody will see it.
This approach doesn't look easy and requires a deep understanding of the project, but it helped me identify numerous design flaws in my high-level plan. Thanks to breaking it down into smaller items, I had the chance to validate all code changes in context. I believe there is huge value in that, and I honestly cannot imagine asking for the whole feature to be implemented at once and then reviewing 100+ files.
I tracked the time I spent on the feature, and it was only about 2.5 hours (excluding the UI, but including building the reusable WSTunnels component). Doing this the old-fashioned way would surely take 20+ hours.
The feature is not finished yet, but if you find it useful, feel free to try Mr. RIGANTI.
I plan to write another blog post on how we use agentic AI to build user interfaces - it is an interesting story.