Everyone grabs the "best" embedding model and moves on.
Makes sense.
Who has time to test alternatives when you're shipping?
But "best" according to who?
I've been building an agent memory system for Claude Code.
And I kept hitting the same question: which embedding model actually works best for code retrieval?
Turns out, it's complicated.
Cloud model speed depends on where you are. I'm in Sydney; a model that's fast in San Francisco might be sluggish here.
Network infrastructure matters more than people think.
Local models? Even messier.
Your hardware. Your OS. What else is running?
I've seen the same Ollama model perform wildly differently on the same machine depending on background processes.
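If you want to see that variance yourself, here's a rough sketch (not ClaudeMem's code) that times repeated embedding calls against a local Ollama server. It assumes Ollama is running on the default port and that you've already pulled an embedding model; `nomic-embed-text` is just a placeholder.

```python
# Rough sketch: measure embedding latency variance on a local Ollama server.
# Assumes Ollama is running on localhost:11434 and the model is already pulled.
import statistics
import time

import requests

MODEL = "nomic-embed-text"          # any embedding model you've pulled
TEXT = "def retrieve(query): ..."   # sample code snippet to embed

latencies = []
for _ in range(20):
    start = time.perf_counter()
    resp = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": MODEL, "prompt": TEXT},
        timeout=30,
    )
    resp.raise_for_status()
    latencies.append((time.perf_counter() - start) * 1000)

print(f"median: {statistics.median(latencies):.1f} ms")
print(f"stdev:  {statistics.stdev(latencies):.1f} ms")
```

Run it twice, once on an idle machine and once with a build going, and the spread tells the story.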
So I built benchmarking into ClaudeMem. Test any embedding model (remote APIs, local Ollama, whatever) against your actual codebase.
On your machine. Right now.
Real retrieval metrics: NDCG for quality, MRR for ranking, Hit@5 for "did it actually find the right code."
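For reference, these metrics are cheap to compute once you know where the correct chunk landed in the results. A minimal sketch (helper names are mine, not ClaudeMem's):

```python
import math

def hit_at_k(rank: int | None, k: int = 5) -> float:
    """1.0 if the relevant chunk appeared in the top k results, else 0.0."""
    return 1.0 if rank is not None and rank <= k else 0.0

def reciprocal_rank(rank: int | None) -> float:
    """MRR component: 1 / position of the first relevant result (0 if missed)."""
    return 1.0 / rank if rank is not None else 0.0

def ndcg_at_k(relevances: list[int], k: int = 10) -> float:
    """NDCG: discounted gain of the returned ranking vs. the ideal ordering."""
    def dcg(rels: list[int]) -> float:
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Example: the right chunk came back at position 3.
print(hit_at_k(3), reciprocal_rank(3), ndcg_at_k([0, 0, 1, 0, 0]))
```

Average those over a set of queries from your own repo and you have a per-model scorecard.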
The expensive model isn't always the best.
The "fastest" model isn't always fast for you.
The only way to know is to test it yourself.
If you're building AI agents and care about memory/retrieval, drop a comment.
I'll share what I'm learning.