The mirage of visual understanding in current frontier models
When a model achieves a “top rank on a standard chest X-ray question-answering benchmark without access to any images,” you know something is deeply wrong.