AI-generated accessibility, an update: frontier models still fail, but skills change the game
A few months ago I shared early results from the A11y LLM Eval project, a benchmark that measures how accessible LLM-generated UI code is. Those results showed that LLMs default to inaccessible code, that explicit accessibility instructions can dramatically change that, and that manual testing is still essential. The latest report is out, with new models, a redesigned test scope, and a brand-new mechanic: skills. Two things stand out: The newest frontier models (GPT‑5.5, Claude Opus 4.7, Gemini 3.1 Pro Preview, Cla...