Photo by Aerps.com on Unsplash
Last week, I spent three days trying to solve what seemed like a simple problem. The solution that finally worked? Using a “dumber” model for most of the work and saving the smart one for the hard part.
Let me tell you what happened.
The Problem: Are These the Same Product?
I was building a system to organize retail product information. Given UPC codes, I needed to figure out: are these different sizes of the same product, or completely different products?
- Coca-Cola 250ml and Coca-Cola 1L → Same product, different sizes
- Coca-Cola and Sprite → Different products entirely
Sounds simple, right? Just throw it at Claude and let the smart model figure it out.
1st Attempt: The “Smart Model for Everything” Disaster
My first approach was straightforward:
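Roughly, it was one giant prompt with everything delegated to the smart model. A reconstruction for illustration (the exact prompt and model id differed):

import anthropic

client = anthropic.Anthropic()

def compare_upcs(upc_a: str, upc_b: str) -> str:
    # One prompt that asks Claude to do everything at once:
    # identify the products, extract attributes, and compare them
    prompt = f"""
    UPC 1: {upc_a}
    UPC 2: {upc_b}

    Figure out what each product is, then tell me: are these
    size variants of the same product, or different products?
    """
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",  # illustrative model id
        max_tokens=300,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text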
But here’s what happened:
- Claude would hallucinate product names when UPC data was unclear
- Sometimes it would get confused and say Pepsi 500ml and Pepsi 1L were different products
- Processing 500 UPCs took forever
- Accuracy was around 75%
I was frustrated. Claude is supposed to be smart! Why was it making such basic mistakes?
Then I realized: I was asking Claude to do too much. In a single prompt, it had to:
- Figure out what product each UPC actually was
- Extract the brand
- Extract the size
- Normalize inconsistent naming
- Group related products together
- Compare products within each group
- Decide which were size variants and which were different products
That's not one task; that's seven different tasks crammed into one prompt.
The Experiment: Decomposition
I stepped back and asked: “What does Claude actually need to be good at here?”
Answer: Comparing products to decide if they’re the same. That’s it.
Everything else — getting product names, extracting brands, parsing sizes — that’s just data preparation. I don’t need genius-level intelligence for data preparation.
So I redesigned the pipeline:
- Step 1: UPCItemDB API → Get actual product labels
- Step 2: Ollama (Llama 3.2) → Extract brand and size from labels
- Step 3: Python → Normalize and group by brand
- Step 4: Claude → For each brand group, determine variants
The Architecture That Actually Worked
Let me show you the code:
class ProductAgent(dspy.Module):
    def __init__(self):
        super().__init__()
        self.lookup_tool = lookup_upc_tool  # External API
        self.extractor = dspy.ChainOfThought(ExtractAttributes)

    def forward(self, upc):
        # Step 1: Get real product data
        upc_data = self.lookup_tool(upc=upc)
        title = upc_data.get("title")

        # Step 2: Extract with Ollama (local, free, fast)
        preds = self.extractor(
            title=title,
            description=upc_data.get("description"),
        )

        # Step 3: Normalize with Python (zero cost, zero errors)
        brand = normalize_brand(clean_text(preds.brand))
        size = normalize_size(clean_text(preds.size))

        return {"brand": brand, "size": size, "label": title}
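A couple of names in that code are defined elsewhere. Here's a minimal sketch of the ExtractAttributes signature and the Ollama wiring (the field names and configuration call are assumptions, not the exact originals):

import dspy

# Sketch of the extraction signature (field names assumed)
class ExtractAttributes(dspy.Signature):
    """Extract brand and size from a retail product listing."""
    title = dspy.InputField()
    description = dspy.InputField()
    brand = dspy.OutputField(desc="brand name only, e.g. 'Humble Brands'")
    size = dspy.OutputField(desc="package size, e.g. '2.5oz'")

# Route DSPy's extraction calls to the local Llama 3.2 served by Ollama
dspy.configure(lm=dspy.LM("ollama_chat/llama3.2",
                          api_base="http://localhost:11434"))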
Then for the actual intelligence part:
# Group all products by brand
brand_groups = df.groupby('brand')

# For each brand, ask Claude the hard question
for brand, products in brand_groups:
    prompt = f"""
    These products are all from {brand}:
    {products[['label', 'size']].to_string()}

    Which ones are size variants of the same product type/flavor?
    Group them accordingly.
    """
    variants = claude_sonnet(prompt)
Why This Combo Worked So Well
1. UPCItemDB Did the Heavy Lifting
Instead of having Claude imagine what “858514006595” might be, I got the actual product name: “Humble Brands Natural Deodorant Moroccan Rose 2.5oz”
No hallucination. No guessing. Just facts.
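The lookup tool itself is just an HTTP call. A minimal version against UPCItemDB's public trial endpoint could look like this (the response handling is a sketch; check their docs for the exact schema and rate limits):

import requests

def lookup_upc_tool(upc: str) -> dict:
    # UPCItemDB's free trial endpoint (rate-limited)
    resp = requests.get(
        "https://api.upcitemdb.com/prod/trial/lookup",
        params={"upc": upc},
        timeout=10,
    )
    resp.raise_for_status()
    items = resp.json().get("items", [])
    return items[0] if items else {}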
2. Ollama Was Perfect for the Boring Stuff
Running locally on my laptop, Llama 3.2 extracted brands and sizes:
- “Humble Brands Natural Deodorant Moroccan Rose 2.5oz” → Brand: “Humble Brands”, Size: “2.5oz”
- Cost: ₹0 (running locally)
- Speed: 2 seconds per UPC
- Accuracy: 95% (good enough for this step)
When it made mistakes, they were small — like “Humble Brands” vs “Humble-Brands”. Easy to fix with simple normalization functions.
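Those fixes are plain string cleanup. A minimal normalize_brand sketch (illustrative; real data will need a few more cases):

def normalize_brand(brand: str) -> str:
    # Collapse hyphen and whitespace variants:
    # "Humble-Brands" and "humble  brands" both become "Humble Brands"
    return " ".join(brand.replace("-", " ").split()).title()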
3. Python Handled the Obvious
Normalizing sizes? That’s just string manipulation:
import re

def normalize_size(size_str):
    # "12 Fl Oz" → "12oz"
    # "1 L"      → "1l"
    # "500ml"    → "500ml"
    match = re.match(r"([\d.,]+)\s*(?:fl\.?\s*)?(ml|l|oz)", size_str, re.IGNORECASE)
    if match:
        number, unit = match.groups()
        return f"{number}{unit.lower()}"
    return size_str  # fall back to the raw string if no unit matched
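A few quick sanity checks on the version above:

print(normalize_size("12 Fl Oz"))  # 12oz
print(normalize_size("1 L"))       # 1l
print(normalize_size("500ml"))     # 500ml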
No AI needed. No cost. No possibility of hallucination.
4. Claude Only Did What It’s Actually Good At
By the time Claude saw the data, it looked like this:
Brand: Humble Brands
Products:
1. Humble Brands Natural Deodorant Moroccan Rose 2.5oz
2. Humble Brands Natural Deodorant Moroccan Rose 3.75oz
3. Humble Brands Natural Deodorant Moroccan Rose 7.1oz
4. Humble Brands Natural Deodorant Lavender 2.5oz

(Note: these products don't represent the brand's actual size variants; they're for illustration only.)
Question: Which are variants of the same product?
This is where Claude shines. It needs to understand:
- Moroccan Rose vs. Lavender = different variants
- Different sizes of Moroccan Rose = the same product
- Lavender = a different product entirely
This requires real reasoning. And Claude nailed it nearly every time.
The Results
The multi-agent approach wasn’t just cheaper — it was better.
Why This Pattern Works
1. Use the Right Tool for Each Job
- API calls for data retrieval (that’s what APIs are for)
- Local models for simple extraction (it’s just pattern matching)
- Python for deterministic logic (why use AI for string manipulation?)
- Smart models for actual reasoning (the hard stuff)
2. Errors Are Isolated
When something went wrong, I knew exactly where:
- Wrong product name? → API issue
- Wrong brand extraction? → Ollama prompt needs tweaking
- Wrong grouping? → Claude prompt needs improvement
With the monolithic approach, everything was tangled. One error, one inscrutable output.
3. Cost Follows Complexity
I only pay for Claude when the task actually needed intelligence.
Processing 500 UPCs meant:
- 500 free API calls (just HTTP requests)
- 500 free Ollama extractions (running locally)
- 50 Claude calls (only one per brand group, not per product)
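Back-of-the-envelope, with an assumed per-call price (illustrative, not a real rate):

# Illustrative cost math; the price and prompt-size factor are assumptions
PRICE_PER_CLAUDE_CALL = 0.03  # USD, assumed average

old_cost = 500 * PRICE_PER_CLAUDE_CALL        # Claude on every UPC
new_cost = 50 * PRICE_PER_CLAUDE_CALL * 0.8   # fewer calls, shorter prompts

print(f"savings: {1 - new_cost / old_cost:.0%}")  # savings: 92%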
That’s why costs dropped 92%. This taught me a general pattern that works everywhere.
Examples I’ve seen in other projects since:
Customer support tickets:
- GPT-4o-mini extracts category and urgency → Claude Sonnet drafts response
Legal document review:
- Llama finds relevant clauses → GPT-4 analyzes implications
Code review:
- Local model identifies changed functions → Claude Opus reviews logic
Research synthesis:
- Fast model gathers sources → Smart model synthesizes insights
The pattern is universal: use cheap intelligence to set up the problem, then use expensive intelligence where it actually matters.
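In code, the shape is always the same. A generic sketch (the callables are placeholders you swap per project):

from collections import defaultdict

def two_tier_pipeline(items, cheap_extract, group_key, smart_reason):
    # Tier 1: cheap intelligence (local model, regex, API) runs per item
    groups = defaultdict(list)
    for item in items:
        groups[group_key(cheap_extract(item))].append(item)
    # Tier 2: expensive intelligence runs once per group, not per item
    return {key: smart_reason(group) for key, group in groups.items()}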
The Mental Shift
The hardest part wasn’t the code — it was changing how I thought about the problem.
Old thinking: “I need a model smart enough to solve this entire problem.”
New thinking: “What’s the minimum intelligence needed for each step?”
It’s like cooking. You don’t need a Michelin-star chef to chop vegetables. You need them for the sauce. Use prep cooks for prep work.
What I’d Tell Someone Starting Today
If you’re building anything that processes data at scale:
- Break down your task. What are the actual steps?
- Ask: Which steps need real intelligence vs. simple pattern matching?
- Use the cheapest tool that works for each step
- Save your smart model for the part that actually requires reasoning
- Test each step independently. Makes debugging 10x easier.
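That last point is cheap to act on, because every non-Claude stage is a plain function. For example, a unit test for the normalize_brand sketch from earlier:

def test_normalize_brand():
    # Extraction output drifts; normalization should absorb it
    assert normalize_brand("Humble-Brands") == "Humble Brands"
    assert normalize_brand("humble  brands") == "Humble Brands"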
I tried to be smart by using the smartest model; I succeeded by being strategic about when to use what. Don’t fall into the trap I did: throwing your most expensive tool at every problem because it’s “the best.”
Sometimes the best solution is knowing when not to use the best model.