Structural Assessment for Understanding and Guiding Dataset Distillation in Discrete Token Space (opens in new tab)
Dataset distillation (DD) has proven to reduce training cost while preserving accuracy. While promising, the factors that make one distilled dataset more effective than another remain poorly understood. In this work, we investigate this question through the lens of discrete visual tokenizers. Whereas many prior DD efforts emphasize matching global data distributions, we suggest that the effectiveness depends on which semantic concepts are captur...
Read the original article