White-box vs black-box distillation
White-box distillation
(classic — but less common for frontier teacher LLMs)
We control the teacher LLM weights and can access intermediate states and logits. This enables the original “soft targets” style distillation — and optionally layer or attention matching.
This is common when the teacher is an open LLM and we distill one open model into another.
Black-box distillation
(modern production, the more common case today)
Here, our teacher is a proprietary LLM endpoint and we only receive text responses. Distillation then becomes a data generation pipeline: we create a set of prompts -> have the teacher generate outputs -> filter and validate -> train the student via supervised fine-tuning on the outputs generated by the teacher.
Conceptually, this is close to how instruction-following LLMs were trained on synthetic instruction-output data generated by stronger LLMs.
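Below is a minimal sketch of that pipeline, assuming a hypothetical `call_teacher` client for the proprietary endpoint and a hypothetical `passes_filters` validator; the steps later in this section fill in what each piece should actually do.

```python
import json

def build_sft_dataset(prompts, call_teacher, passes_filters, out_path="distill_sft.jsonl"):
    """Turn a prompt set into supervised fine-tuning data for the student."""
    with open(out_path, "w") as f:
        for prompt in prompts:
            # Hypothetical client: returns a list of n completions from the teacher.
            response = call_teacher(prompt, temperature=0.0, n=1)[0]
            if not passes_filters(prompt, response):  # filter and validate (Step 4)
                continue
            # Each surviving pair becomes one SFT example for the student.
            f.write(json.dumps({"prompt": prompt, "response": response}) + "\n")
    return out_path
```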
Why distillation will matter more going forward in the Agentic era
1) Cost and latency will become architecture constraints, not just procurement problems: for most Agentic systems, the unit economics of “frontier LLM on every request” break at scale. Distillation lets us reserve the teacher for what only the teacher can do and serve the long tail on a much cheaper student LLM.
2) It lets us package a “house style” of intelligence: Agents need a consistent tone, safety posture, output format and decision policies. Distillation is one of the most direct ways to replicate this preferred behavior in a smaller LLM.
3) It is the practical path to on-device and edge LLM experiences: distillation has a long track record of compressing Transformer LMs into smaller versions that keep most of the teacher model's capability, e.g. DistilBERT, TinyBERT, and MiniLM. Those models established a playbook that can now be reused for instruction-following and Agentic workloads.
4) It is a stronger alternative to “prompting your way out”: prompting is a great tool, no doubt, but it is brittle. Distillation shifts behavior from “prompt space” into “model weights”, which is more stable once you have clear contracts and eval gates.
How LLM distillation is performed
We need to treat distillation as a pipeline — not a single loss function.
Step 1: Define the scope and the “behavior contract”
We pick the behaviors we want the student to own, e.g.:
- Summarization with controlled tone and length
- Structured JSON extraction following a strict schema
- Q&A over internal policies, with refusal boundaries
- Customer-support answers in a consistent style, with citations
We prepare the output contract. If we can’t specify what “good” looks like, distillation will mostly teach the student to mimic the teacher’s style without any reliability.
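As an illustration, here is what a machine-checkable contract could look like for the structured JSON extraction case, written as a JSON Schema. The field names are hypothetical; the point is that “good” is specified precisely enough to validate automatically.

```python
# Hypothetical contract for a support-ticket extraction task (illustrative fields).
TICKET_EXTRACTION_CONTRACT = {
    "type": "object",
    "properties": {
        "customer_name": {"type": "string"},
        "issue_category": {"type": "string", "enum": ["billing", "bug", "feature_request", "other"]},
        "priority": {"type": "integer", "minimum": 1, "maximum": 4},
        "summary": {"type": "string", "maxLength": 280},
    },
    "required": ["customer_name", "issue_category", "priority", "summary"],
    "additionalProperties": False,  # no redundant prose or extra keys allowed
}
```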
Step 2: Build the prompt distribution
The student LLM learns whatever distribution we feed it, so the prompt set should reflect real scenarios:
- Real world user prompts
- Synthetic prompts to cover edge cases
- Adversarial prompts to test boundaries
- “Negative” prompts that trigger safe behavior, e.g. refusals
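A rough sketch of assembling that mix is below; the ratios are purely illustrative, and the real requirement is that the mix mirrors production traffic.

```python
import random

def build_prompt_set(real, synthetic_edge_cases, adversarial, refusal_triggers, n=10_000):
    """Sample a training prompt set from the four pools (illustrative weights)."""
    weighted_pools = [
        (real, 0.70),                  # real-world user prompts dominate
        (synthetic_edge_cases, 0.15),  # synthetic coverage of edge cases
        (adversarial, 0.10),           # boundary-testing prompts
        (refusal_triggers, 0.05),      # "negative" prompts that should trigger refusals
    ]
    prompt_set = []
    for pool, weight in weighted_pools:
        prompt_set.extend(random.choices(pool, k=int(n * weight)))
    random.shuffle(prompt_set)
    return prompt_set
```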
Step 3: Generate teacher LLM’s outputs
For each prompt, we generate:
- One best answer (greedy or low-temperature decoding)
- Optionally, multiple candidates for later filtering and preference selection
- Optional structured metadata: rubric scores, tool traces and self-checks
Since most teacher LLMs are expensive to query, this is where we spend most of the distillation budget.
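A sketch of the generation step, again assuming a hypothetical `call_teacher(prompt, temperature, n)` client that returns a list of completions:

```python
def generate_teacher_outputs(prompt, call_teacher, n_candidates=4):
    """Collect one low-temperature answer plus optional sampled candidates."""
    return {
        "prompt": prompt,
        # One "best" answer at temperature 0 becomes the default SFT target.
        "best": call_teacher(prompt, temperature=0.0, n=1)[0],
        # Extra higher-temperature candidates for later filtering / preference selection.
        "candidates": call_teacher(prompt, temperature=0.8, n=n_candidates),
    }
```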
Step 4: Filter and validate mechanically
Filtering is where the success or failure of LLM distillation is decided.
Key filters:
- Contract validity: correct schema, correct tags, parseable JSON, no redundant prose
- Safety compliance: policy-violating outputs are removed
- Self-consistency checks: verification (and regeneration) with a secondary judge
- Ground-truth checks: for tasks with known right answers, wherever possible
This is how we avoid a fairly common failure mode: a student LLM that learns “format tokens” more than the task itself.
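A minimal mechanical filter might look like this, reusing the JSON Schema contract from Step 1; `violates_policy` and `judge_approves` are hypothetical stand-ins for a safety classifier and a secondary judge model.

```python
import json

from jsonschema import ValidationError, validate  # pip install jsonschema

def keep_example(prompt, response, contract, violates_policy, judge_approves):
    """Return True only if the teacher output survives all mechanical filters."""
    try:
        parsed = json.loads(response)  # must be parseable JSON
        validate(parsed, contract)     # must satisfy the schema (contract validity)
    except (json.JSONDecodeError, ValidationError):
        return False
    if violates_policy(response):      # safety compliance filter
        return False
    return judge_approves(prompt, response)  # self-consistency / judge check
```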
Step 5: Choose a training objective that matches our access
In the white-box setting: KL divergence on softened logits is the classic option. We may also distill intermediate representations or attention, which has been effective in Transformer compression (e.g. MiniLM).
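For reference, the classic soft-target loss is only a few lines in PyTorch. A minimal sketch, assuming aligned vocabularies and token positions between teacher and student:

```python
import torch.nn.functional as F

def soft_target_kd_loss(student_logits, teacher_logits, T=2.0):
    """KL divergence between temperature-softened teacher and student distributions."""
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    # The T*T factor keeps gradient magnitudes comparable across temperatures,
    # as in Hinton et al.'s original formulation.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)
```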
In the black-box setting: train the student LLM with supervised fine-tuning on teacher outputs (prompt and teacher-response pairs).
Sequence-level distillation for generation: train the student LLM to imitate full teacher sequences, improving decoding behavior and reducing the need for expensive search.
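In the black-box and sequence-level cases, the objective reduces to next-token cross-entropy on the teacher's output tokens. A minimal sketch, assuming a Hugging Face-style causal LM whose forward pass returns `.logits`, with the loss masked to the response tokens only (a common SFT convention):

```python
import torch
import torch.nn.functional as F

def sft_loss(model, prompt_ids, response_ids):
    """Cross-entropy on the teacher response tokens, ignoring the prompt."""
    input_ids = torch.cat([prompt_ids, response_ids], dim=-1)
    logits = model(input_ids).logits              # [batch, seq_len, vocab]
    shift_logits = logits[:, :-1, :]              # predict token t+1 from tokens <= t
    shift_labels = input_ids[:, 1:].clone()
    shift_labels[:, : prompt_ids.size(-1) - 1] = -100   # mask prompt positions
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
        ignore_index=-100,
    )
```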
Step 6: Evaluate the system
We don’t evaluate only on a small, curated set.
At a minimum, we want:
- Task success metrics e.g. rubric score, exact match, human preference
- Contract pass rate e.g. parsability and schema validity
- Robustness to prompt variations and paraphrases
- Safety tests for refusal behavior
- Latency and cost evals on our target stack
If we do black-box distillation, we also need to measure when the student LLM should defer to the teacher LLM.
Distillation in production is a router system, not a binary replacement.
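Two of the cheapest but most useful pieces to wire up are the contract pass rate and a deferral rule. A minimal sketch, where `student_generate`, `student_confidence` and `is_valid` are hypothetical hooks into the serving stack:

```python
def contract_pass_rate(prompts, student_generate, is_valid):
    """Fraction of held-out prompts where the student output satisfies the contract."""
    passed = sum(1 for p in prompts if is_valid(student_generate(p)))
    return passed / len(prompts)

def route(prompt, student_generate, student_confidence, is_valid, threshold=0.8):
    """Serve the student where it is reliable; escalate the rest to the teacher."""
    response = student_generate(prompt)
    if is_valid(response) and student_confidence(prompt, response) >= threshold:
        return response, "student"
    return None, "defer_to_teacher"
```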
Reasoning distillation
(transferring the “how”, not just the “what”)
For LLMs, a valuable capability is reasoning under constraints: decomposition, intermediate checks and avoiding hallucinated steps. Reasoning distillation treats the teacher’s reasoning traces as an additional supervision signal.
There are two common ways to handle this:
1) Answer-only distillation (fast and stable)
We train the student LLM to produce only the final output in the contract format.
This is cheaper, and it avoids teaching the student LLM to generate long reasoning chains that are not required in production.
2) Rationale/explanation distillation (high signal but higher risk)
We train the student LLM on teacher-generated final answers plus step-by-step reasoning traces. This significantly improves the student’s performance on tasks requiring multi-step reasoning, but it is sensitive to trace quality and can teach “bad habits” if the teacher’s traces are inconsistent.
Distilling Step-by-Step is a widely cited reference showing that using rationales (step-by-step explanations) as an additional training signal lets smaller LLMs outperform what we might expect from answer-only training.
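A rough sketch of the two supervision styles; the multi-task framing with separate prompts for the answer and the rationale loosely follows Distilling Step-by-Step, and the exact prefixes and field names here are illustrative:

```python
def answer_only_example(prompt, teacher_answer):
    # Student learns only the final, contract-formatted output.
    return {"input": prompt, "target": teacher_answer}

def rationale_examples(prompt, teacher_answer, teacher_rationale):
    # Two training examples per prompt: the answer task and the rationale task.
    return [
        {"input": f"[answer] {prompt}", "target": teacher_answer},
        {"input": f"[rationale] {prompt}", "target": teacher_rationale},
    ]
```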
The key takeaway is that reasoning distillation is not a “turn it on” feature.
It is an altogether separate, more involved data pipeline that needs stronger quality controls.