Anything Will Work (In AI)

This post is about synthetic data and why it won’t scale like we think.

This field is the absolute frontier of science. It is about the process of learning itself, and consequently it is mired in epistemics and traps. The problem is that the search space of solutions is infinite and, importantly, there is also an infinite number of deceptively good ideas. We need to develop strong taste and high-level beliefs to tell us when our eyes (empirics) are wrong. I think the old statistical theorems are actually really good and explain deep learning very well, contrary to what many people will tell you.

# The "magic" of deep learning

![[inductive_bias_invert.png]]
*Inverted image from ["Deep Learning is Not So Mysterious or Different"](https://arxiv.org/abs/2503.02113v2), which you should read*

The paper "[Deep Learning is Not So Mysterious or Different](https://arxiv.org/abs/2503.02113v2)" explains why the classical statistical theorems still apply to deep learning, and how things like double descent or grokking aren’t "new phenomena". I believe very strongly in the predictive power of the theory of inductive biases, a topic covered in the paper. The data processing inequality is also a good classical result, and it can be understood through the lens of inductive biases to make sense of synthetic data generation.

An inductive bias is essentially a prior you place on the space of solutions to nudge your model toward one of them. For example, if you’re predicting demand and you know your demand is cyclical, then a well-placed sin function in your model will nudge it to a much better solution than a ton of ReLUs. It’s not guaranteed, but the general idea is that when you build a model, you are implicitly encoding your priors about the structure of the data. If your priors are bad, your model will be bad.

## Machine Epistemology

![[sin.excalidraw.png]]

However, unlike in the narrow domain of classical ML, LLMs are for pursuing *general* intelligence.
Our "in domain" is literally all knowledge ever made. This means that any inductive bias we place is essentially equivalent to an epistemic claim about intelligence more broadly, and about the nature of complexity and understanding in our universe. That’s a really big responsibility! But it’s a responsibility that has been met by all the *time-tested* advancements in LLM technology.

- Training a high-depth transformer is equivalent to saying that all the complexity in our universe is *emergent* and thus a product of composing smaller, lower-order relationships and laws into larger ones (i.e.
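
To make the earlier cyclical-demand example concrete, here is a minimal sketch of what "a well-placed sin function" buys you compared to "a ton of ReLUs". Everything in it is an assumption made for illustration: the synthetic demand curve, the known 12-step period, and the use of a fixed random ReLU layer with a least-squares readout as a cheap stand-in for a trained ReLU network. It is not taken from the paper, and it only shows the idea on one toy problem.

```python
import numpy as np

rng = np.random.default_rng(0)

PERIOD = 12.0  # assumption: we know the cycle length in advance (e.g. months)


def demand(t):
    # Hypothetical ground truth: a baseline plus one seasonal cycle.
    return 10.0 + 3.0 * np.sin(2 * np.pi * t / PERIOD)


# Train on four cycles, then test on the two cycles that follow (extrapolation).
t_train = np.arange(0.0, 48.0)
t_test = np.arange(48.0, 72.0)
y_train = demand(t_train) + rng.normal(0.0, 0.3, size=t_train.shape)
y_test = demand(t_test)


def periodic_features(t):
    # Inductive bias: the model can only express a cycle of the known period.
    angle = 2 * np.pi * t / PERIOD
    return np.column_stack([np.ones_like(t), np.sin(angle), np.cos(angle)])


# "A ton of ReLUs": a fixed random hidden layer with no notion of periodicity.
N_HIDDEN = 64
w = rng.normal(0.0, 1.0, size=N_HIDDEN)
b = rng.uniform(-1.0, 1.0, size=N_HIDDEN)


def relu_features(t):
    z = np.outer(t / 48.0, w) + b  # rescale time so the training range is [0, 1)
    return np.column_stack([np.ones_like(t), np.maximum(z, 0.0)])


def fit_and_score(features):
    # Least-squares readout on the training window, scored on the held-out window.
    coef, *_ = np.linalg.lstsq(features(t_train), y_train, rcond=None)
    pred = features(t_test) @ coef
    return float(np.mean((pred - y_test) ** 2))


mse_periodic = fit_and_score(periodic_features)
mse_relu = fit_and_score(relu_features)
print(f"test MSE with sin/cos features: {mse_periodic:.4f}")
print(f"test MSE with random ReLUs:     {mse_relu:.4f}")
```

Both models can fit the training window, but only the one whose prior matches the structure of the data should keep working past it. A ReLU model is piecewise linear, so it has no reason to continue the cycle once it leaves the range it was fit on. The flip side is the point of the passage: if the assumed period were wrong, the sin model would be confidently wrong too.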
