Before we start explaining, let's share a couple of screenshots.

Behind the scenes: the story
I wanted to use Regolo.ai for a RAG system, but all the open-source projects I found rely on a specific SaaS rather than a free, service-agnostic solution.
Regolo is free until the end of December and, as an AI inference provider, offers various models that are GDPR-compliant, something I value for internal company projects. After the recent Anthropic incident, where hackers could interact with other users' data, this compliance is especially important.
Spoiler: I'm part of the Regolo team.
I also wanted to try GitHub Copilot and its coding agent to modify the codebase of a repository, since it is currently free for OSS projects.
Also, after years of NeoVim I recently switched to PyCharm/WebStorm, so I wanted to see how to build a plugin, since NeoVim plugins are very simple to write. TL;DR: it is not as simple as in NeoVim...
This way, touching a bit of everything, I could update my skills and learn how all these pieces work.
My requirements
I imposed a constraint on myself: I will use SQLite as the database because it is the most popular choice among developers for self-hosted solutions.
The first step is to use the sqlite-vector extension, which adds vector support to SQLite, followed by adopting uv (I used to prefer Poetry, but uv has replaced it).
The next step is to employ two different models from Regolo. Since the project uses the OpenAI-compatible API, switching the inference provider is just a configuration change. I selected the following models:
- Qwen3-Embedding-8B
- qwen3-coder-30b
My plan is to build a RAG system from scratch that stores files as vectors using the embedding model, then queries those vectors to retrieve the most relevant files for a coding question about the codebase (including its dependencies) using the coding model.
With these requirements in mind and a .env file to allow anyone to customize the settings, I began prompting.
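To make that setup concrete, here is a minimal sketch of the pipeline against an OpenAI-compatible endpoint. The base URL, the environment variable names, and the brute-force cosine search in Python are my assumptions for illustration only; the actual project stores and queries vectors through the sqlite-vector extension.

```python
# Minimal sketch: embed files with an OpenAI-compatible provider and retrieve
# the most relevant ones for a coding question. The base URL, env var names,
# and brute-force cosine search are illustrative assumptions; the real project
# uses the sqlite-vector extension for the nearest-neighbour lookup.
import json
import math
import os
import sqlite3

from openai import OpenAI  # pip install openai

# Expected .env keys (loaded e.g. with python-dotenv): REGOLO_API_KEY, REGOLO_BASE_URL
client = OpenAI(
    api_key=os.environ["REGOLO_API_KEY"],
    base_url=os.environ.get("REGOLO_BASE_URL", "https://api.regolo.ai/v1"),  # assumed endpoint
)

EMBED_MODEL = "Qwen3-Embedding-8B"
CODER_MODEL = "qwen3-coder-30b"

db = sqlite3.connect("picocode.db")
db.execute("CREATE TABLE IF NOT EXISTS chunks (path TEXT, content TEXT, embedding TEXT)")


def embed(text: str) -> list[float]:
    return client.embeddings.create(model=EMBED_MODEL, input=text).data[0].embedding


def index_file(path: str) -> None:
    content = open(path, encoding="utf-8").read()
    db.execute(
        "INSERT INTO chunks VALUES (?, ?, ?)",
        (path, content, json.dumps(embed(content))),
    )
    db.commit()


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def ask(question: str, top_k: int = 3) -> str:
    q = embed(question)
    rows = db.execute("SELECT path, content, embedding FROM chunks").fetchall()
    rows.sort(key=lambda r: cosine(q, json.loads(r[2])), reverse=True)
    context = "\n\n".join(f"# {p}\n{c}" for p, c, _ in rows[:top_k])
    answer = client.chat.completions.create(
        model=CODER_MODEL,
        messages=[
            {"role": "system", "content": "Answer using only the provided code context."},
            {"role": "user", "content": f"{context}\n\nQuestion: {question}"},
        ],
    )
    return answer.choices[0].message.content
```

The point is only that swapping providers comes down to changing the base URL and model names in the .env file; everything else stays standard OpenAI SDK calls.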
The development decisions
I started building a RAG system with a web UI from scratch. After a while, I realized it wasn't performant and that it was uploading the entire file contents to the database, creating a monstrosity of several hundred MB. I switched to PicoCode to index the codebase, and the situation improved gradually.
When the web UI was finally working, I moved on to a PyCharm (IntelliJ) plugin with a CI pipeline to build it.
During this process, I discovered several things that were new to me or that I hadn't considered:
- Why build a RAG system from scratch? It is very complex, and tools like LlamaIndex (the Regolo integration article was handy) already handle most of the heavy lifting. Using such a library reduces the amount of agent-generated code you have to understand each time (see the LlamaIndex sketch after this list).
- Coding models aren't updated automatically. The models used by Copilot and Qwen-Coder carry a snapshot of older library and API versions in their training data. For example, Copilot referenced outdated IntelliJ-plugin APIs and couldn't explain why my IDE refused to install the plugin, even though the IntelliJ log made the problem obvious: it was also using old versions of Gradle and Kotlin that produced a broken build.
- Agents generate a lot of documentation. They like to write extensive project notes and "what-to-do" files, which I don't want cluttering the repository, especially while prototyping and constantly changing the code. I kept the REST-API and IntelliJ-plugin documentation in a separate location for anyone who wants to contribute or extend the project to other IDEs.
- Backward compatibility vs. rapid prototyping. The agents were diligent about maintaining backward compatibility, which was unnecessary for my experiments and forced me to clean up the code frequently.
- Comment-writing habits. LLMs love to add comments everywhere. Even when I asked them to limit comments to function docstrings, they still inserted many inline notes.
- Mixed-language prompts. When I mixed Italian and English, the model understood the intent, but Copilot reproduced my typos and extraneous wording in PR descriptions.
- Multi-task, multi-language commits. Agents can perform several tasks in a single call, using different programming languages, and can create separate commits for each task.
- Repository access problems. Initially, Copilot (and its agent) couldn't read the repository, so I had to copy-paste files manually because it didn't generate a zip archive. Occasionally it missed files, leaving dead code behind, which required multiple follow-up requests.
- Tooling constraints. Copilot ran code linting and CodeQL checks on every change; adding a copilot-instructions.md file wasn't enough to suppress these checks. CodeQL can only be disabled during agent execution, not via the static instructions file. I preferred to run CodeQL after the project was finished, which made the agent faster.
- LlamaIndex limitations. While useful, LlamaIndex relies on Tree-sitter for code understanding. The Tree-sitter language packages are abandoned and don't support the latest Python version, forcing me to research the correct dependency versions myself; trusting the agent alone wasn't viable.
- FastAPI background tasks. FastAPI's BackgroundTasks provide a better alternative to raw threading and avoid blocking the web server (a minimal sketch follows after this list).
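Since LlamaIndex came up above, here is a rough sketch of what the library-based variant looks like when pointed at an OpenAI-compatible provider. It assumes the modular llama-index openai-like integration packages; the environment variables and the ./src path are placeholders, not the project's actual wiring.

```python
# Sketch of a LlamaIndex pipeline over a codebase, pointed at an
# OpenAI-compatible endpoint. Package names assume the modular distributions
# llama-index-llms-openai-like and llama-index-embeddings-openai-like.
import os

from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings.openai_like import OpenAILikeEmbedding
from llama_index.llms.openai_like import OpenAILike

Settings.llm = OpenAILike(
    model="qwen3-coder-30b",
    api_base=os.environ["REGOLO_BASE_URL"],  # placeholder for the provider's /v1 endpoint
    api_key=os.environ["REGOLO_API_KEY"],
    is_chat_model=True,
)
Settings.embed_model = OpenAILikeEmbedding(
    model_name="Qwen3-Embedding-8B",
    api_base=os.environ["REGOLO_BASE_URL"],
    api_key=os.environ["REGOLO_API_KEY"],
)

# Index the source files and ask a question about the codebase.
documents = SimpleDirectoryReader("./src", recursive=True).load_data()
index = VectorStoreIndex.from_documents(documents)
print(index.as_query_engine().query("Where is the SQLite connection created?"))
```

Compared with the from-scratch version, chunking, embedding, storage, and retrieval are all handled by the library, which is exactly the "heavy lifting" mentioned above.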
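For the BackgroundTasks point, a minimal sketch: the endpoint path and the reindex_project helper are invented for illustration, but the pattern is the one described, returning the HTTP response immediately and letting FastAPI run the slow work afterwards.

```python
# Sketch: run re-indexing after the response is sent instead of spawning
# raw threads. The endpoint and reindex_project() are illustrative only.
from fastapi import BackgroundTasks, FastAPI

app = FastAPI()


def reindex_project(project_id: str) -> None:
    # Placeholder for the slow work: re-embedding files, updating SQLite, etc.
    ...


@app.post("/projects/{project_id}/reindex")
def trigger_reindex(project_id: str, background_tasks: BackgroundTasks) -> dict:
    # FastAPI runs the task after returning the response, so the request
    # is not blocked while the codebase is re-embedded.
    background_tasks.add_task(reindex_project, project_id)
    return {"status": "scheduled", "project": project_id}
```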
During development, I wanted to add the following features:
- A file-watcher to automatically update the RAG index.
- A caching layer.
- A logging system that can gracefully shut down the web server for debugging.
- A rate limiter.
- A project-management system with separate databases per project.
- A task queue in the database to avoid SQLite locks (see the sketch after this list).
- A connection pool for SQLite (especially important when the sqlite-vector extension isn't loaded).
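As a rough illustration of the database task-queue idea from the list above (the schema, names, and single-worker design are my assumptions, not the project's implementation): producers only insert rows, and a worker claims one job at a time inside an immediate transaction, so writers do not pile up on SQLite's single write lock.

```python
# Sketch of a database-backed task queue that avoids SQLite write contention:
# producers only INSERT, and a single worker claims jobs inside BEGIN IMMEDIATE.
# Table and column names are illustrative.
import sqlite3
import time


def get_conn(path: str = "picocode.db") -> sqlite3.Connection:
    conn = sqlite3.connect(path, isolation_level=None)  # autocommit; explicit transactions below
    conn.execute("PRAGMA journal_mode=WAL")  # readers do not block the writer
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tasks ("
        " id INTEGER PRIMARY KEY,"
        " payload TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending')"
    )
    return conn


def enqueue(conn: sqlite3.Connection, payload: str) -> None:
    conn.execute("INSERT INTO tasks (payload) VALUES (?)", (payload,))


def claim_next(conn: sqlite3.Connection) -> tuple[int, str] | None:
    """Atomically mark the oldest pending task as running and return it."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock up front
    row = conn.execute(
        "SELECT id, payload FROM tasks WHERE status = 'pending' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        conn.execute("COMMIT")
        return None
    conn.execute("UPDATE tasks SET status = 'running' WHERE id = ?", (row[0],))
    conn.execute("COMMIT")
    return row


def worker_loop(conn: sqlite3.Connection) -> None:
    while True:
        task = claim_next(conn)
        if task is None:
            time.sleep(1)
            continue
        task_id, payload = task
        # ... do the actual work here (re-index a file, refresh the cache, etc.) ...
        conn.execute("UPDATE tasks SET status = 'done' WHERE id = ?", (task_id,))
```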
My approach was to handle the small bug fixes and version-checking tasks myself rather than asking the agent, keeping the changes minimal and focused.
Learnings
- Always verify the software versions when a crash occurs. Many factors can cause failures, and the agent may attempt to modify the codebase without truly understanding the underlying error.
- With Vibe Coding you need extensive code-review experience, because you must comprehend the code without running it and also be able to add features or suggest changes. This makes the overall workflow faster, although the free plan adds some latency.
- Task execution is not instantaneous; depending on the complexity, a single operation can take five minutes or more.
- The agent often generates numerous constant definitions and parameters, even when they are unnecessary. AI-driven code optimisation is still weak at removing such redundancies.
Costs
These are my costs (Regolo is free at the moment) for the last two months.
As you can see, most of the expense comes from calls to the TelegramTranscriber, which I also use to transcribe video meetings. Consequently, the actual cost for embedding and coding (using the coding model in PyCharm via the ProxyAI plugin for other tasks) is very low; for the PicoCode case it is probably under 1 euro.

Thank you for reading about my experience with agents, Vibe Coding, and related tools.
I use PicoCode daily on my projects, and the SQLite database can grow to around 300-400 MB.