AION-Search
17 Dec, 2025

[Figure: Example captions from our pipeline]
We built a semantic search engine for galaxy images by having LLMs write the captions.
This is a method I’d call AI-in-the-loop: having a language model analyze all outputs from a pipeline. Unlike CNNs, these models bring literature knowledge to the table. In principle, this should offer an advantage for spotting scientifically interesting phenomena.
To explore this new capability, we had LLMs write captions for nearly 300k galaxies, then used those to train a CLIP-style model using the AION-1 foundation model as our image encoder. This enables search across all 140M galaxies in the Legacy Survey and HSC datasets on Multimodal Universe. You can try the search app here (currently a 19M subset).
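For readers unfamiliar with the setup, here is a minimal sketch of a symmetric CLIP-style contrastive loss that aligns image and caption embeddings. It is written in PyTorch with placeholder encoder outputs and dimensions; it is not the actual AION-Search training code, where the image embeddings would come from the AION-1 encoder and the text embeddings from a caption encoder.

```python
# Minimal sketch of a CLIP-style objective (PyTorch). Encoder outputs and
# dimensions are placeholders, not the actual AION-Search training setup.
import torch
import torch.nn.functional as F

def clip_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of (image, caption) pairs."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.T / temperature              # (B, B) similarity matrix
    labels = torch.arange(len(logits), device=logits.device)   # matched pairs sit on the diagonal
    return (F.cross_entropy(logits, labels) + F.cross_entropy(logits.T, labels)) / 2

# Toy usage: a batch of 8 image embeddings and 8 caption embeddings of dimension 512.
loss = clip_loss(torch.randn(8, 512), torch.randn(8, 512))
```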
Evaluating caption quality is tricky because it is so open-ended. To get a feel, I’d recommend reading the captions in the figure above and trying the search engine yourself. For a systematic benchmark, we had LLMs caption a subset of galaxies already classified by Galaxy Zoo volunteers. Since volunteers fill out a decision tree, we had an LLM judge fill out the same tree using only the caption, checking whether the relevant information was captured. GPT-4.1 captions matched volunteer answers 50.8% of the time on average, and GPT-4.1-mini reached 50.1%; we used the mini model to generate the training set ($53 per 100k captions). After training, we found that text queries with AION-Search outperform the original AION-1’s similarity search at finding spirals, mergers, and lenses (nDCG@10 of 0.94 vs. 0.64, 0.55 vs. 0.38, and 0.18 vs. 0.02, respectively).
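For reference, here is a minimal sketch of how nDCG@10 can be computed for a ranked result list with binary relevance labels; the evaluation code used in the paper may differ in detail.

```python
import numpy as np

def ndcg_at_k(relevance, k=10):
    """nDCG@k for a ranked list of binary relevance labels (1 = relevant, 0 = not)."""
    rel = np.asarray(relevance, dtype=float)
    gains = rel[:k]
    discounts = 1.0 / np.log2(np.arange(2, len(gains) + 2))   # rank positions 1..k
    dcg = float(np.sum(gains * discounts))
    ideal = np.sort(rel)[::-1][:k]                             # best possible ordering
    idcg = float(np.sum(ideal * discounts[:len(ideal)]))
    return dcg / idcg if idcg > 0 else 0.0

# e.g. 6 relevant images in the top 10, mostly ranked near the top:
print(ndcg_at_k([1, 1, 1, 0, 1, 0, 1, 0, 1, 0]))
```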
Another demonstration of AI-in-the-loop is our re-ranking method. We can directly improve the search results by having an LLM scan the top-1000 results and score each one ("Does this image fit the query? 1-10"), then re-order by score. This can double the number of gravitational lenses in the top-100, and works better with larger models and with more LLM samples per image. So if a search is important, you can spend more compute to get more discoveries. Due to cost, this feature is not currently available in the app.
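A rough sketch of what such a re-ranking loop could look like is below. `ask_llm` is a stand-in for whichever chat-completion client you use, and for simplicity it scores a text description of each result (the real pipeline passes the image itself to a vision-capable model); the prompt is paraphrased from the post.

```python
# Hypothetical re-ranking sketch; `ask_llm(prompt) -> str` is supplied by the caller.
import re
import statistics

def score_result(description, query, ask_llm, n_samples=3):
    """Ask the LLM n_samples times for a 1-10 fit score and average the answers."""
    prompt = (f"Query: {query}\nImage description: {description}\n"
              "Does this image fit the query? Answer with a single integer from 1 to 10.")
    scores = []
    for _ in range(n_samples):
        match = re.search(r"\d+", ask_llm(prompt))
        if match:
            scores.append(int(match.group()))
    return statistics.mean(scores) if scores else 0.0

def rerank(results, query, ask_llm, n_samples=3):
    """Re-order the top search results by their averaged LLM score."""
    scored = [(score_result(r, query, ask_llm, n_samples), r) for r in results]
    return [r for _, r in sorted(scored, key=lambda x: x[0], reverse=True)]
```

Sampling each image several times and averaging reduces the noise in any single 1-10 judgment, which is why spending more compute per image improves the ranking.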
Best practices
- Think like an LLM. Before searching, ask yourself: how would an LLM describe the image you’re looking for?
- For example, searching "spiral galaxy" brings up small blue-ish galaxies; these were described as spiral galaxies only as a best guess, since their spiral arms aren’t clearly visible. For classic grand-design spirals, search "spiral arms" instead, since that phrase only appears in descriptions when arms are actually resolved.
- Check out the examples at the top of the page to help build this intuition.
- Wording matters. Small changes can shift results significantly:
- "red galaxies" vs. "red galaxy". The first brings up images with multiple red galaxies, while the second brings up single red galaxies.
- "stream" vs. "galaxy with stream". the first returns faint features filling the image without a centered galaxy, while the second returns galaxies centered in the image with tidal features around them.
- Try many phrasings of your query: "galaxy with stream", "galaxy with tidal stream", "galaxy with tidal features like streams".
- Try image search. When you find a good image, click the "Search for similar" button.
- This combines text + image search to find more like it. Behind the scenes, it finds the images closest to the vector sum of the text embedding and the image embedding, and you can adjust how much each contributes to the search (see the sketch after this list).
- Here’s a "galaxy with shells from tidal debris" with an example image.
- Tip: add a "cluster" vector with weight -1 to filter out crowded images.
- Filter by brightness. Sources fainter than r ~ 19 mag weren’t in the training set, so use the brightness slider to exclude them.
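As mentioned in the image-search tip above, the combined query is just a weighted sum of embedding vectors. Here is a minimal sketch, assuming the text and image embeddings have already been computed and L2-normalized by the CLIP-style model; the function name and argument names are illustrative, not the app’s actual code.

```python
import numpy as np

def combined_query(text_emb=None, image_emb=None, negative_embs=(),
                   w_text=1.0, w_image=1.0, w_neg=1.0):
    """Weighted sum of precomputed text/image embeddings, minus any negative terms."""
    parts = []
    if text_emb is not None:
        parts.append(w_text * np.asarray(text_emb, dtype=float))
    if image_emb is not None:
        parts.append(w_image * np.asarray(image_emb, dtype=float))
    for neg in negative_embs:                      # e.g. the embedding of "cluster"
        parts.append(-w_neg * np.asarray(neg, dtype=float))
    q = np.sum(parts, axis=0)
    return q / np.linalg.norm(q)                   # re-normalize before nearest-neighbour search

# e.g. combined_query(text_emb=t, image_emb=v, negative_embs=[cluster_emb])
```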
Enjoy, and please share interesting things you find!
[Figure: Example searches mentioned in best practices]
BibTeX:
@misc{koblischke2025semantic,
  title={Semantic search for 100M+ galaxy images using AI-generated captions},
  author={Nolan Koblischke and Liam Parker and Francois Lanusse and Irina Espejo Morales and Jo Bovy and Shirley Ho},
  year={2025},
  eprint={2512.11982},
  archivePrefix={arXiv},
  primaryClass={astro-ph.IM},
  url={https://arxiv.org/abs/2512.11982},
}