🎄 Advent of Small ML: Day 7 🎄 Topic: Entropy-Based Rewards (Forcing the model to "keep its options open")
threadreaderapp.com · 1d

🎄 Advent of Small ML: Day 2: Teaching a VLM to reason about charts with Unsupervised GRPO 🎄

A big use case for VLMs is parsing chart data for Q&A. CharXiv from @zwcolin is a great recent benchmark for this, but I had a question: can we do this in an unsupervised way?

If we don’t need labeled Q/A pairs for every chart, we can leverage data much more cheaply.

The inspiration came from CycleGAN and the idea of using a numerical loss as a proxy for how "good" the text generated by the VLM actually is. (Big inspo here is @rosmine_b’s SVG work - go check it out).

The Experiment: I set up a loop to treat the VLM like an autoencoder:

1. Take a chart image.

2. Prompt the VLM to describe it.

3. Feed that description into an image generator (Flux Schnell).

4. Measure the cos…
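
A rough sketch of how a loop like this could produce a GRPO-style training signal. Step 4 is cut off in the text above, so this is a guess at the reward: it assumes the score is a cosine similarity between an embedding of the original chart and an embedding of the regenerated image. `describe`, `generate` and `embed` are hypothetical stand-ins for the VLM, the image generator and an image encoder, not real library calls, and the group-standardized advantage is the usual GRPO recipe rather than anything confirmed by the thread.

```python
import math
from statistics import mean, pstdev


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cycle_rewards(image, describe, generate, embed, group_size=4):
    """Score a group of sampled descriptions by how well they reconstruct the image.

    describe, generate and embed are hypothetical callables:
      describe(image) -> text description sampled from the VLM
      generate(text)  -> image produced by the image generator
      embed(image)    -> list of floats from some image encoder
    """
    target = embed(image)
    rewards = []
    for _ in range(group_size):
        description = describe(image)           # step 2: VLM describes the chart
        reconstruction = generate(description)  # step 3: regenerate an image from text
        # Assumed step 4: compare original and reconstruction in embedding space.
        rewards.append(cosine_similarity(target, embed(reconstruction)))
    return rewards


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize rewards within one group of samples."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; the exact estimator is a design choice
    return [(r - mu) / (sigma + eps) for r in rewards]
```

With real models plugged in, something like `group_relative_advantages(cycle_rewards(chart, describe, generate, embed))` would give one advantage per sampled description. Descriptions that reconstruct the chart better than their group average get a positive value, and no labeled Q/A pair is needed at any point.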
