In July of this year, I quit my job. That was a terrible decision, but I don’t want to dwell on it. In August, I signed up for the Claude Pro plan, figuring I’d need to learn how to code with LLMs to stay competitive in the job market. What follows is a brief synopsis of how I came to develop SourceMinder, A Context Aware Code Search for Solo Devs and Claude Code.
TLDR: After running into context window issues on my first two projects, I built a tool that makes Claude Code use fewer tokens: an indexer whose search results include context. Built with SQLite and tree-sitter, it currently supports C, Go, PHP, Python, and TypeScript. Get the code here: https://github.com/ebcode/SourceMinder
Project 1: SiteFerry
The first project I attempted with Claude Code was a suite of bash utilities to download a remote website (files and database), and get a local ddev instance running. Here are a few things I learned from working with Claude Code on that project:
- Claude can write a LOT of code, very quickly: 5 to 10 lines per second.
- You have to review EVERY line of code Claude generates.
- Without a solid design, the code will get away from you. Take time to refactor during development.
- Bats is an excellent testing framework for bash.
- Automated testing is essential, as Claude can use the failing test output to correct mistakes.
As I was developing this bash project (SiteFerry, I called it), I kept thinking to myself, "Will I be able to maintain this code without Claude?" I was still "kicking the tires" with Claude, and I wasn’t sure the $20/mo. was worth the price. And now, I had over 3500 lines of bash code to maintain. And because I hadn’t written the code myself, only reviewed it, I didn’t have the same relationship with it that I have with code I write myself. So, what to do?
Project 2: Functional Bash
That’s when I had my next idea, a functional programming library for bash. My thinking was along these lines: "I know that functional code can be much more expressive (and compact) than imperative code, and if I can refactor these 3500 lines of bash to use a functional style, I can reduce that line count to something more reasonable, maybe even under 2000 lines." Something I would feel more comfortable maintaining.
So, I searched around the web for functional bash libraries, and while there are a few excellent projects on GitHub, they didn’t scream "production-ready" to me. The scream sounded more like, "Personal itch!" So that’s what led me to my second project with Claude Code. But I’ll spare you the details. It was, again, in bash, and I ran right into the same issues I’d had with SiteFerry: too much code generated, and I still wasn’t doing the up-front design work that I knew (/hoped?) would keep Claude on track. But I did learn one more important thing during that project:
- At the end of each session, have Claude write a Summary.txt file that it can read at the start of the next session.
My prompt for that is currently: "Summarize the work done in this session to a timestamped work_summary_TIMESTAMP.md file, include a few forward-looking statements about potential next steps at the end."
Project 3: Canvas
Me: You know what I’m always complaining about on other people’s projects? Lack of documentation. What I need for my functional bash project is a good design document, with DIAGRAMS!
This was surely a faulty idea of mine, but it did pry me out of the clutches of bash and into the enthusiastic embrace of TypeScript. So I started looking around for diagramming tools. I wanted something that would let me draw Data Flow Diagrams (DFDs) as well as Petri Nets, since I’d had another strange idea that combining these would somehow be awesome. Inkscape wouldn’t work, because I’d have to draw everything myself. Dia was the closest thing I could find, but it didn’t win me over: it didn’t have tools specifically for DFDs or Petri Nets, and it doesn’t appear to be under active development.
In all honesty, I was probably just looking for an excuse to start another project, but I thought: "The diagramming tools don’t have any built-in sense of grammar. In a DFD, you can’t draw an arrow from a box to a box; that’s illegal. Likewise, in a Petri Net, only certain connections are allowed. What would be great would be a diagramming tool that understood what kind of diagram you were drawing, and would only allow you to draw arrows between shapes if they matched the grammar." Remember, I’m under-employed.
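To make the idea of a diagram "grammar" concrete, here is a minimal sketch in C of the check such a tool could run before accepting an arrow. It is not taken from Canvas, which is written in TypeScript. The shape kinds, the `arrow_allowed` function, and the DFD rule it encodes are assumptions for illustration. That rule is the common textbook one: every data flow must start or end at a process, which rules out box-to-box arrows between external entities. The Petri Net rule is the standard one: arcs run only from a place to a transition, or from a transition to a place.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical shape kinds for a grammar-aware diagram editor. */
typedef enum {
    DFD_ENTITY,    /* external entity, drawn as a box */
    DFD_PROCESS,   /* process that transforms data */
    DFD_STORE,     /* data store */
    PN_PLACE,      /* Petri net place (circle) */
    PN_TRANSITION  /* Petri net transition (bar) */
} ShapeKind;

typedef enum { GRAMMAR_DFD, GRAMMAR_PETRI } Grammar;

static bool is_dfd_shape(ShapeKind k) {
    return k == DFD_ENTITY || k == DFD_PROCESS || k == DFD_STORE;
}

/* Returns true if the editor should let the user draw an arrow
 * from `from` to `to` under grammar `g`. */
bool arrow_allowed(Grammar g, ShapeKind from, ShapeKind to) {
    assert(g == GRAMMAR_DFD || g == GRAMMAR_PETRI);
    if (g == GRAMMAR_DFD) {
        /* Shapes from another diagram type are never valid endpoints. */
        if (!is_dfd_shape(from) || !is_dfd_shape(to))
            return false;
        /* Assumed rule: every data flow starts or ends at a process, so
         * entity->entity, entity->store, and store->store are rejected. */
        return from == DFD_PROCESS || to == DFD_PROCESS;
    }
    /* Petri net arcs must alternate between places and transitions. */
    return (from == PN_PLACE && to == PN_TRANSITION) ||
           (from == PN_TRANSITION && to == PN_PLACE);
}
```

With these rules, `arrow_allowed(GRAMMAR_DFD, DFD_ENTITY, DFD_ENTITY)` returns false, while `arrow_allowed(GRAMMAR_PETRI, PN_PLACE, PN_TRANSITION)` returns true. Keeping every rule in one function means that supporting a new diagram type only touches this check, not the drawing code.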
So, for want of a better name, I start on "Canvas", the DFD/Petri Net diagramming tool, in TypeScript. At this point, I’m actually having a lot of fun with Claude Code, and feel like I’m really getting into the swing of things. I’m reading John Ousterhout’s "A Philosophy of Software Design", I’ve got a design document, I’m challenging Claude when it generates something questionable, and I’m making tremendous progress on the diagramming tool. The $20 per month seems worth it, two months in.
That’s when I ran into the issue that leads me to SourceMinder.
The Context Window
Every new session with Claude Code starts the same way:
- Read the work-summary file from the last session.
- Now read the design document.
- Show me the plan for the next feature implementation.
- Begin the implementation.
As the diagramming tool’s codebase grows, the beginning of each session with Claude uses more and more of the context window. It’s really in the third step, Claude’s "planning mode", where a lot of the tokens get used up. The LLM says, "Sure, let me familiarize myself with the current code to see how to implement this next feature." Then it starts listing directory contents and reading files... and grepping.
Claude: [thinking] Oh, okay, this next feature has to do with how the mouse pointer drags a new box, let me search for "mouse". Hm, no results. Okay, how about, "pointer"? Still no results. Ah, it’s called "cursor" in this code. Got it!
And, of course, when we start the next session, Claude will have forgotten all this and will go searching for "mouse" again. If you’ve worked with Claude Code, you know about the CLAUDE.md file. And one solution to this problem is to put a "memory" in the CLAUDE.md file saying, "In this project, we refer to the mouse pointer as a ‘cursor’. When searching, search for ‘cursor’".
But how many such terms will we have in this project? How many key terms are there in a project with 10,000 lines of code, and will I need a CLAUDE.md memory for each one, to prevent the fruitless searches? The context window for Claude is large, but not infinite. Every fruitless search uses more context. But also every line in CLAUDE.md uses more context. So there’s a tradeoff. Either let Claude go ahead with the fruitless searches each time, or curate a list of "known symbols" in CLAUDE.md to steer the LLM in the right direction.
SourceMinder
October 6th was the day that I hit on the idea of a "code index" for Claude to use. It was funny, in a way, because when I first proposed the idea, Claude was decidedly against it. And it raised some good counterarguments. Here is the first half of Claude’s reply:
Claude: An index sounds appealing initially, but I don’t think it would actually solve the problem effectively. Here’s why:
Problems with a full word index:
- Would be massive (10k LOC could have 50k+ word occurrences)
- Most words would be noise (keywords: if, const, function, etc.)
- Lacks context - knowing "user" appears on line 247 doesn’t tell me if it’s what I need
- Stale the moment you edit code
- Reading the index itself would consume significant tokens
The real issue you’re describing isn’t about finding words that exist - Grep does that instantly. It’s about me not knowing what to search for in the first place.
So, I addressed each of Claude’s concerns:
- We remove the "noise". All common, short English words will be out (a, in, the, of, ...), as well as code keywords (if, function, class, ...).
- We add "context" to the index. Each symbol in the index is a row in a table, with a ‘context’ column that tells us what the symbol is: function, filename, comment, string, ...
- For staleness, there is a file watcher (a daemon) that updates the index whenever a file is saved.
- You don’t need to read the entire index; you query it, and it returns a small subset of results.
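The first two points can be pictured as a table row plus a filter that runs before anything is stored. Below is a minimal C sketch under assumed names. `IndexRow`, `index_add`, the `Context` values, and the short noise list are all hypothetical and do not reflect SourceMinder's actual schema. SourceMinder keeps its index in an SQLite table, while this sketch uses a plain array so that it stays self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical context values; a real schema may use more, or strings. */
typedef enum { CTX_FUNCTION, CTX_VARIABLE, CTX_FILENAME, CTX_COMMENT, CTX_STRING } Context;

/* One index row: a symbol plus what kind of thing it is and where. */
typedef struct {
    char symbol[64];
    Context context;
    char file[128];
    int line;
} IndexRow;

/* Illustrative noise list: short English words and code keywords. */
static const char *const NOISE[] = {
    "a", "in", "the", "of", "to", "and",
    "if", "else", "for", "return", "function", "class", "const",
};

static bool is_noise(const char *word) {
    for (size_t i = 0; i < sizeof NOISE / sizeof NOISE[0]; i++) {
        if (strcmp(word, NOISE[i]) == 0)
            return true;
    }
    return false;
}

/* Appends a row for `word` unless it is noise or the table is full.
 * Returns the new number of rows. */
size_t index_add(IndexRow *rows, size_t count, size_t capacity,
                 const char *word, Context ctx, const char *file, int line) {
    assert(rows != NULL && word != NULL && file != NULL);
    if (count >= capacity || is_noise(word))
        return count;
    IndexRow *row = &rows[count];
    /* snprintf truncates safely if a symbol or path is too long. */
    snprintf(row->symbol, sizeof row->symbol, "%s", word);
    snprintf(row->file, sizeof row->file, "%s", file);
    row->context = ctx;
    row->line = line;
    return count + 1;
}
```

Calling `index_add` with "cursor" stores a row tagged with its context. Words like "the" or "return" are dropped before they can clutter any later query.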
Okay, now, I know what you’re thinking: with that last point, "query the index", I haven’t really solved the problem of Claude not knowing what to search for. And that’s true. I haven’t. But what I have done is solve the problem of false positives. When Claude (or any user, for that matter) searches a codebase for a term and gets results, only to read the file and find that the match was in a bit of commented-out code or a string, not part of the ‘code proper’, that’s wasted effort. With a code index that you can query while excluding comments and strings, you’ve already eliminated a bunch of fruitless searches. So yes, Claude still needs to search, but the search is now context aware, which is something that grep can’t do.
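Here is what "context aware" adds at query time, again as a hypothetical C sketch rather than SourceMinder's real query code. Each row carries a context flag, and the caller passes two things: a bit mask of contexts to exclude and a cap on the number of results. The names `Row`, `query_index`, and the `CTX_*` flags are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical contexts as bit flags, so several can be excluded at once. */
typedef enum {
    CTX_FUNCTION = 1u << 0,
    CTX_VARIABLE = 1u << 1,
    CTX_COMMENT  = 1u << 2,
    CTX_STRING   = 1u << 3
} Context;

typedef struct {
    const char *symbol;
    Context context;
    const char *file;
    int line;
} Row;

/* Stores up to `max_out` pointers to rows whose symbol equals `symbol`
 * and whose context is not in `exclude_mask`. Returns the number stored. */
size_t query_index(const Row *rows, size_t n, const char *symbol,
                   unsigned exclude_mask, const Row **out, size_t max_out) {
    assert(rows != NULL || n == 0);
    assert(symbol != NULL && (out != NULL || max_out == 0));
    size_t found = 0;
    for (size_t i = 0; i < n && found < max_out; i++) {
        if (strcmp(rows[i].symbol, symbol) != 0)
            continue;                      /* different symbol */
        if ((unsigned)rows[i].context & exclude_mask)
            continue;                      /* e.g. inside a comment or string */
        out[found++] = &rows[i];
    }
    return found;
}
```

With a mask of `CTX_COMMENT | CTX_STRING`, a search skips occurrences inside comments and string literals and returns only matches in the code proper. The `max_out` cap keeps the output small however large the index grows.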
So! Having assuaged Claude’s fears (too many results, too much noise, lack of context, and a stale index), we set to work implementing the idea.
It’s now been just over nine weeks of development, and it’s somewhere close to 9,000 lines of C11 code. There are just two dependencies: SQLite and tree-sitter. I had the first version working in TypeScript when I realized that it needed to be much, much faster. So, buyer beware: I’m not an expert C programmer (but I am an expert PHP programmer; here’s my resume). And if you, reading this, happen to be an expert with C, please go easy on me! I didn’t write every line, though I did write some, and I’ve read all of them. I would say that they are understandable to a C amateur like myself.
Anyway, I think that what I have works, and it works well. In the past nine weeks, I’ve added a bunch of features that make SourceMinder exceptionally well suited to preserving the context window when used with Claude Code. I’m finding the following prompt works well to get Claude to use qi (the SourceMinder ‘query-index’ tool):
In this session, I’d like you to use qi in lieu of your usual Search/Grep/Read tools. Start by running: qi --help
But it’s also useful as a grep replacement for solo devs. No more hunting down commented-out code!
Today, I’m releasing it under an Open Source license (GPLv3). My hope is that you will find it useful.
I’ll be happy to review pull requests and issues on GitHub, and answer any questions you might have (within reason).
Happy hacking!
Check it out here: https://github.com/ebcode/SourceMinder