Before we start explaining, let's share a couple of screenshots.

Behind the scenes: the story
I wanted to use Regolo.ai for a RAG system, but all the open-source projects I found rely on a specific SaaS rather than a free, service-agnostic solution.
Regolo is free until the end of December and, as an AI inference provider, offers various models that are GDPR-compliant, something I value for internal company projects. After the recent Anthropic incident, where hackers could interact with other users' data, this compliance is especially important.
Spoiler: I'm part of the Regolo team.
I also wanted to try GitHub Copilot and its coding agent to modify the codebase of a repository, since it is currently free for OSS projects.
Also, after years of NeoVim I recently switched to PyCharm/WebStorm, so I wanted to see how to build a plugin, since NeoVim plugins are very simple to write. TL;DR: it is not as simple as in NeoVim...
This way, touching a bit of everything, I could update my skills and learn how all these pieces work.
My requirements
I imposed a constraint on myself: I will use SQLite as the database because it is the most popular choice among developers for self-hosted solutions.
The first step is to use the sqlite-vector extension, which adds vector support to SQLite, followed by adopting uv (I used to prefer Poetry, but uv has replaced it).
The next step is to employ two different models from Regolo. Since the project uses the OpenAI-compatible API, switching the inference provider is just a configuration change. I selected the following models:
- Qwen3-Embedding-8B
- qwen3-coder-30b
My plan is to build a RAG system from scratch that stores files as vectors using the embedding model, then queries those vectors to retrieve the most relevant files for a coding question about the codebase (including its dependencies) using the coding model.
With these requirements in mind and a .env file to allow anyone to customize the settings, I began prompting.
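To make that setup concrete, here is a minimal sketch of the pipeline against an OpenAI-compatible endpoint. The base URL, the environment variable names, and the brute-force cosine search in Python are my assumptions for illustration only; the actual project stores and queries vectors through the sqlite-vector extension.

```python
# Minimal sketch: embed files with an OpenAI-compatible provider and retrieve
# the most relevant ones for a coding question. The base URL, env var names,
# and brute-force cosine search are illustrative assumptions; the real project
# uses the sqlite-vector extension for the nearest-neighbour lookup.
import json
import math
import os
import sqlite3

from openai import OpenAI  # pip install openai

# Expected .env keys (loaded e.g. with python-dotenv): REGOLO_API_KEY, REGOLO_BASE_URL
client = OpenAI(
    api_key=os.environ["REGOLO_API_KEY"],
    base_url=os.environ.get("REGOLO_BASE_URL", "https://api.regolo.ai/v1"),  # assumed endpoint
)

EMBED_MODEL = "Qwen3-Embedding-8B"
CODER_MODEL = "qwen3-coder-30b"

db = sqlite3.connect("picocode.db")
db.execute("CREATE TABLE IF NOT EXISTS chunks (path TEXT, content TEXT, embedding TEXT)")


def embed(text: str) -> list[float]:
    return client.embeddings.create(model=EMBED_MODEL, input=text).data[0].embedding


def index_file(path: str) -> None:
    content = open(path, encoding="utf-8").read()
    db.execute(
        "INSERT INTO chunks VALUES (?, ?, ?)",
        (path, content, json.dumps(embed(content))),
    )
    db.commit()


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def ask(question: str, top_k: int = 3) -> str:
    q = embed(question)
    rows = db.execute("SELECT path, content, embedding FROM chunks").fetchall()
    rows.sort(key=lambda r: cosine(q, json.loads(r[2])), reverse=True)
    context = "\n\n".join(f"# {p}\n{c}" for p, c, _ in rows[:top_k])
    answer = client.chat.completions.create(
        model=CODER_MODEL,
        messages=[
            {"role": "system", "content": "Answer using only the provided code context."},
            {"role": "user", "content": f"{context}\n\nQuestion: {question}"},
        ],
    )
    return answer.choices[0].message.content
```

The point is only that swapping providers comes down to changing the base URL and model names in the .env file; everything else stays standard OpenAI SDK calls.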
The development decisions
I started building a RAG system with a web UI from scratch. After a while, I realized it wasn't performant and that it was uploading the entire file contents to the database, creating a monstrosity of several hundred MB. I switched to PicoCode to index the codebase, and the situation improved gradually.
When the web UI was finally working, I moved on to a PyCharm (IntelliJ) plugin with a CI pipeline to build it.
During this process, I discovered several things that were new to me or that I hadn't considered:
- Why build a RAG system from scratch? It is very complex, and tools like LlamaIndex (the Regolo integration article was handy) already handle most of the heavy lifting. Using such a library reduces the amount of agent-generated code you have to understand each time (see the LlamaIndex sketch after this list).
- Coding models aren't updated automatically. The models used by Copilot and Qwen-Coder carry a snapshot of older library and API versions in their training data. For example, Copilot referenced outdated IntelliJ-plugin APIs and couldn't explain why my IDE refused to install the plugin, even though the IntelliJ log made the problem obvious: it was also using old versions of Gradle and Kotlin that produced a broken build.
- Agents generate a lot of documentation. They like to write extensive project notes and "what-to-do" files, which I don't want cluttering the repository, especially while prototyping and constantly changing the code. I kept the REST-API and IntelliJ-plugin documentation in a separate location for anyone who wants to contribute or extend the project to other IDEs.
- Backward compatibility vs. rapid prototyping. The agents were diligent about maintaining backward compatibility, which was unnecessary for my experiments and forced me to clean up the code frequently.
- Comment-writing habits. LLMs love to add comments everywhere. Even when I asked them to limit comments to function docstrings, they still inserted many inline notes.
- Mixed-language prompts. When I mixed Italian and English, the model understood the intent, but Copilot reproduced my typos and extraneous wording in PR descriptions.
- Multi-task, multi-language commits. Agents can perform several tasks in a single call, using different programming languages, and can create separate commits for each task.
- Repository access problems. Initially, Copilot (and its agent) couldn't read the repository, so I had to copy-paste files manually because it didn't generate a zip archive. Occasionally it missed files, leaving dead code behind, which required multiple follow-up requests.
- Tooling constraints. Copilot ran code linting and CodeQL checks on every change; adding a copilot-instructions.md file wasn't enough to suppress these checks. CodeQL can only be disabled during agent execution, not via the static instructions file. I preferred to run CodeQL after the project was finished, which made the agent faster.
- LlamaIndex limitations. While useful, LlamaIndex relies on Tree-sitter for code understanding. The Tree-sitter language packages are abandoned and don't support the latest Python version, forcing me to research the correct dependency versions myself; trusting the agent alone wasn't viable.
- FastAPI background tasks. FastAPI's BackgroundTasks provide a better alternative to raw threading and avoid blocking the web server (a minimal sketch follows after this list).
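Since LlamaIndex came up above, here is a rough sketch of what the library-based variant looks like when pointed at an OpenAI-compatible provider. It assumes the modular llama-index openai-like integration packages; the environment variables and the ./src path are placeholders, not the project's actual wiring.

```python
# Sketch of a LlamaIndex pipeline over a codebase, pointed at an
# OpenAI-compatible endpoint. Package names assume the modular distributions
# llama-index-llms-openai-like and llama-index-embeddings-openai-like.
import os

from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings.openai_like import OpenAILikeEmbedding
from llama_index.llms.openai_like import OpenAILike

Settings.llm = OpenAILike(
    model="qwen3-coder-30b",
    api_base=os.environ["REGOLO_BASE_URL"],  # placeholder for the provider's /v1 endpoint
    api_key=os.environ["REGOLO_API_KEY"],
    is_chat_model=True,
)
Settings.embed_model = OpenAILikeEmbedding(
    model_name="Qwen3-Embedding-8B",
    api_base=os.environ["REGOLO_BASE_URL"],
    api_key=os.environ["REGOLO_API_KEY"],
)

# Index the source files and ask a question about the codebase.
documents = SimpleDirectoryReader("./src", recursive=True).load_data()
index = VectorStoreIndex.from_documents(documents)
print(index.as_query_engine().query("Where is the SQLite connection created?"))
```

Compared with the from-scratch version, chunking, embedding, storage, and retrieval are all handled by the library, which is exactly the "heavy lifting" mentioned above.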
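For the BackgroundTasks point, a minimal sketch: the endpoint path and the reindex_project helper are invented for illustration, but the pattern is the one described, returning the HTTP response immediately and letting FastAPI run the slow work afterwards.

```python
# Sketch: run re-indexing after the response is sent instead of spawning
# raw threads. The endpoint and reindex_project() are illustrative only.
from fastapi import BackgroundTasks, FastAPI

app = FastAPI()


def reindex_project(project_id: str) -> None:
    # Placeholder for the slow work: re-embedding files, updating SQLite, etc.
    ...


@app.post("/projects/{project_id}/reindex")
def trigger_reindex(project_id: str, background_tasks: BackgroundTasks) -> dict:
    # FastAPI runs the task after returning the response, so the request
    # is not blocked while the codebase is re-embedded.
    background_tasks.add_task(reindex_project, project_id)
    return {"status": "scheduled", "project": project_id}
```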
During development, I wanted to add the following features:
- A file-watcher to automatically update the RAG index.
- A caching layer.
- A logging system that can gracefully shut down the web server for debugging.
- A rate limiter.
- A project-management system with separate databases per project.
- A task queue in the database to avoid SQLite locks (see the sketch after this list).
- A connection pool for SQLite (especially important when the sqlite-vector extension isn't loaded).
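As a rough illustration of the database task-queue idea from the list above (the schema, names, and single-worker design are my assumptions, not the project's implementation): producers only insert rows, and a worker claims one job at a time inside an immediate transaction, so writers do not pile up on SQLite's single write lock.

```python
# Sketch of a database-backed task queue that avoids SQLite write contention:
# producers only INSERT, and a single worker claims jobs inside BEGIN IMMEDIATE.
# Table and column names are illustrative.
import sqlite3
import time


def get_conn(path: str = "picocode.db") -> sqlite3.Connection:
    conn = sqlite3.connect(path, isolation_level=None)  # autocommit; explicit transactions below
    conn.execute("PRAGMA journal_mode=WAL")  # readers do not block the writer
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tasks ("
        " id INTEGER PRIMARY KEY,"
        " payload TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending')"
    )
    return conn


def enqueue(conn: sqlite3.Connection, payload: str) -> None:
    conn.execute("INSERT INTO tasks (payload) VALUES (?)", (payload,))


def claim_next(conn: sqlite3.Connection) -> tuple[int, str] | None:
    """Atomically mark the oldest pending task as running and return it."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock up front
    row = conn.execute(
        "SELECT id, payload FROM tasks WHERE status = 'pending' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        conn.execute("COMMIT")
        return None
    conn.execute("UPDATE tasks SET status = 'running' WHERE id = ?", (row[0],))
    conn.execute("COMMIT")
    return row


def worker_loop(conn: sqlite3.Connection) -> None:
    while True:
        task = claim_next(conn)
        if task is None:
            time.sleep(1)
            continue
        task_id, payload = task
        # ... do the actual work here (re-index a file, refresh the cache, etc.) ...
        conn.execute("UPDATE tasks SET status = 'done' WHERE id = ?", (task_id,))
```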
My approach was to handle the small bug fixes and version-checking tasks myself rather than asking the agent, keeping the changes minimal and focused.
Learnings
- Always verify the software versions when a crash occurs. Many factors can cause failures, and the agent may attempt to modify the codebase without truly understanding the underlying error.
- With Vibe Coding you need extensive code-review experience, because you must comprehend the code without running it and also be able to add features or suggest changes. This makes the overall workflow faster, although the free plan adds some latency.
- Task execution is not instantaneous; depending on the complexity, a single operation can take five minutes or more.
- The agent often generates numerous constant definitions and parameters, even when they are unnecessary. AI-driven code optimisation is still weak at removing such redundancies.
Costs
These are my costs (Regolo is free at the moment) for the last two months.
As you can see, most of the expense comes from calls to the TelegramTranscriber, which I also use to transcribe video meetings. Consequently, the actual cost for embedding and coding (using the coding model in PyCharm via the ProxyAI plugin for other tasks) is very low; for the PicoCode case it is probably under 1 euro.

Thank you for reading about my experience with agents, Vibe Coding, and related tools.
I use PicoCode daily on my projects, and the SQLite database can grow to around 300-400 MB.