From 75% to 99.6%: The Math of LLM Ensembles
shibaprasadb.com

January 20, 2026

technical

The last project I worked on involved a lot of LLM API calls. One subtask seemed simple: count elements from a specific list. Straightforward, right? Not quite.

This needed production-level accuracy, but the simple single-call API approach wasn't cutting it. Across 50 test cases, I was only hitting about 75% accuracy (37 out of 50, or 74%). For production, that's a non-starter.
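To make the arithmetic concrete, here is a minimal sketch of an exact-match evaluation over a batch of test cases. The function name `exact_match_accuracy` is hypothetical, and the post doesn't show how its evaluation was actually scored. The example input is synthetic, shaped only to reproduce the 37-of-50 result.

```python
from typing import Sequence


def exact_match_accuracy(predicted: Sequence[int], expected: Sequence[int]) -> float:
    """Fraction of test cases where the predicted count equals the true count."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("need at least one test case")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


# Synthetic data: 37 exact answers and 13 undercounts out of 50 cases.
expected = [10] * 50
predicted = [10] * 37 + [8] * 13
print(exact_match_accuracy(predicted, expected))  # 0.74
```

Exact match is the strict metric that fits a counting task: an answer of 9 when the true count is 10 is simply wrong, not "mostly right".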

The Problem with Single API Calls

The LLM was doing the task correctly for some instances but missing elements in others. Sometimes it would catch all 10 items, other times only 7 or 8. The pattern was clear: when it failed, it undercounted. It never hallucinated extra elements or went above the true count. It just missed things.
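The one-sided error pattern is what makes an ensemble attractive. If every call returns a count less than or equal to the true count, then taking the maximum over N calls is correct as soon as any single call is correct. Assuming the calls are independent and each succeeds with probability p, the ensemble succeeds with probability 1 - (1 - p)^N. The sketch below is one possible reading of the title's numbers, not necessarily the author's method, since the visible text stops before the approach is described. `max_aggregate` and `ensemble_accuracy` are hypothetical names.

```python
from typing import Sequence


def max_aggregate(counts: Sequence[int]) -> int:
    """Combine N counts of the same input by taking the largest.

    Only safe under the assumption that the model never overcounts:
    each call returns a value <= the true count, so the max is correct
    whenever at least one call is correct.
    """
    if not counts:
        raise ValueError("need at least one count")
    return max(counts)


def ensemble_accuracy(p: float, n: int) -> float:
    """P(at least one of n independent calls is correct) = 1 - (1 - p)**n."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    if n < 1:
        raise ValueError("n must be >= 1")
    return 1.0 - (1.0 - p) ** n


print(max_aggregate([7, 10, 8]))     # 10
print(ensemble_accuracy(0.75, 4))    # 0.99609375, i.e. about 99.6%
for n in range(1, 6):
    print(n, round(ensemble_accuracy(0.75, n), 4))
```

With p = 0.75 and four calls, this gives 0.99609375, which matches the 99.6% in the title. Using the measured 0.74 instead gives roughly 99.5%. Independence is the strong assumption here: repeated calls to the same model on the same input often fail on the same items, so real-world gains can be smaller than this formula predicts.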

This …
