Advent of Small ML: Day 2: Teaching a VLM to reason about charts with Unsupervised GRPO
A big use case for VLMs is parsing chart data for Q&A. CharXiv from @zwcolin is a great recent benchmark for this, but I had a question: can we do this in an unsupervised way?
If we don't need labeled Q/A pairs for every chart, we can leverage data much more cheaply.
The inspiration came from CycleGAN and the idea of using a numerical loss as a proxy for how "good" the text generated by the VLM actually is. (Big inspo here is @rosmine_b's SVG work - go check it out).
The Experiment: I set up a loop to treat the VLM like an autoencoder (sketched in code after the list):
1. Take a chart image.
2. Prompt the VLM to describe it.
3. Feed that description into an image generator (Flux Schnell).
4. Measure the cosine similarity between the regenerated image and the original (using DINO embeddings).
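Here's a minimal sketch of steps 3 and 4, assuming the Hugging Face checkpoints black-forest-labs/FLUX.1-schnell and facebook/dinov2-base; the exact checkpoints, prompt handling, and generation settings are my guesses, not pulled from the repo:

```python
import torch
import torch.nn.functional as F
from diffusers import FluxPipeline
from transformers import AutoImageProcessor, AutoModel

device = "cuda"

# Step 3: regenerate an image from the VLM's description with Flux Schnell
# (few-step distilled model, so no classifier-free guidance needed).
flux = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
).to(device)

# Step 4: embed both images with DINOv2 and compare.
dino_proc = AutoImageProcessor.from_pretrained("facebook/dinov2-base")
dino = AutoModel.from_pretrained("facebook/dinov2-base").to(device).eval()

@torch.no_grad()
def dino_embed(image):
    """CLS-token embedding of a PIL image under DINOv2."""
    inputs = dino_proc(images=image, return_tensors="pt").to(device)
    return dino(**inputs).last_hidden_state[:, 0]  # (1, hidden_dim)

@torch.no_grad()
def cycle_reward(chart_image, description):
    """Cosine similarity between the original chart and the image
    Flux regenerates from the VLM's description of it."""
    regen = flux(description, num_inference_steps=4, guidance_scale=0.0).images[0]
    return F.cosine_similarity(dino_embed(chart_image), dino_embed(regen)).item()
```

Comparing DINO embeddings rather than raw pixels means the reward cares about what the chart shows, not whether Flux reproduces the exact rendering.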
This similarity score becomes the reward signal for GRPO. The logic: to recreate the image accurately, the model has to capture the chart's most salient features in its description.
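A hypothetical wiring of that score into GRPO using TRL's GRPOTrainer; the dataset layout, the "image" column name, and the group size are my assumptions, not taken from the repo:

```python
from trl import GRPOConfig, GRPOTrainer

def cycle_consistency_reward(completions, image, **kwargs):
    # TRL passes extra dataset columns (here: the original chart under an
    # assumed "image" column) to reward functions as keyword arguments.
    return [cycle_reward(img, desc) for img, desc in zip(image, completions)]

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-VL-3B-Instruct",  # assumed exact checkpoint
    reward_funcs=cycle_consistency_reward,
    args=GRPOConfig(num_generations=8),   # completions sampled per prompt (GRPO group size)
    train_dataset=chart_dataset,          # hypothetical: describe-this-chart prompts + images, no labels
)
trainer.train()
```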
The methods: I used Qwen 2.5 3B as the VLM and DINOv2 for the embeddings (to capture semantic info, not just pixels).
Results for the Proxy Task: The model consistently improved its cosine similarity scores.
Results for Transfer Learning: Despite seeing zero labeled questions during training, this transferred to CharXiv reasoning questions, showing a ~7% improvement in pass@1 at the peak.
It's a small experiment with a small model, but I think the result is really cool: the model got better at reasoning without seeing a single reasoning label.
I'm really interested in exploring more of these CycleGAN-esque / "LLM as Autoencoder" domains to escape the need for labeled data.
Repo + Plots in the comments.
Results: for the evaluation set, the cosine similarity between the original chart and the image regenerated from the description sent to flux-schnell - it is definitely learning!