9 min read · 12 hours ago
TL;DR: AI models are usually too large to be sent to a user’s device, but for some tasks they can be made surprisingly small.
On the platform where I write this (medium.com) I’m forever typing snippets of code only to have the system guess the wrong language (and use the wrong syntax highlighting). The same is true on other platforms, and I’ve always been puzzled by this. I mean, how hard can it be?
A frontier LLM can detect the language easily, but what about smaller models? CodeBERTa-language-id is built for this exact task and is 99.9% accurate, but it’s also 330 MB. So from the application developer’s perspective, it’s destined to live forever on a server.
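To put 330 MB in perspective, here is a rough back-of-the-envelope sketch of how long a download of that size would take. The link speeds are assumptions picked for illustration, not measurements, and the calculation ignores latency, compression and caching.

```python
MODEL_SIZE_MB = 330  # size quoted above for CodeBERTa-language-id

# Assumed link speeds in megabits per second (illustrative, not measured).
LINK_SPEEDS_MBPS = {"slow mobile": 5, "home broadband": 50, "fast fibre": 500}


def download_seconds(size_mb: float, speed_mbps: float) -> float:
    """Ideal transfer time: convert megabytes to megabits, divide by link speed."""
    return size_mb * 8 / speed_mbps


for name, speed in LINK_SPEEDS_MBPS.items():
    print(f"{name:>15}: {download_seconds(MODEL_SIZE_MB, speed):6.1f} s")
```

Even under the most generous of these assumed speeds, that is several seconds of waiting before a single snippet can be classified.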
At the other end of the scale, Highlight.js runs in the browser and *can* detect programming languages, but detection is not its main focus, and accuracy is quite low.
Do we really have to choose between good-but-huge and small-but-bad? Or can we capture some of the intelligence found in LLMs in a very small model that can be shipped over the network?
In this article I’ll explore the general idea of shrinking intelligence down into a tiny model, using language detection as an example.
Let’s take a look at some results first…
Results
Easy mode
In ‘easy mode’ I’m using the CodeSearchNet dataset, which contains code in six programming languages. The…