benchmarking LLMs, evals, MMLU, model assessment, capability testing