SKATE, a Scalable Tournament Eval: Weaker LLMs differentiate between stronger ones using verifiable challenges
arxiv.org·12h
Predicted impact of LLM use on developer ecosystems
shape-of-code.com·18h
Autarkie - Instant grammar fuzzing using Rust macros (WHY2025)
cdn.media.ccc.de·4h
Measuring intelligence and reverse-engineering goals
lesswrong.com·14h