Idris, Agda, Proof Assistants, Type-Level Programming
Status Week 39
blogs.gnome.org · 21h
Evaluating Double Descent in Machine Learning: Insights from Tree-Based Models Applied to a Genomic Prediction Task
arxiv.org · 11h
ChessArena: A Chess Testbed for Evaluating Strategic Reasoning Capabilities of Large Language Models
arxiv.org · 1d
Learning to See Before Seeing: Demystifying LLM Visual Priors from Language Pre-training
arxiv.org · 11h
Robust Preference Optimization: Aligning Language Models with Noisy Preference Feedback
arxiv.org · 1d
Agentic Exploration of Physics Models
arxiv.org · 1d
RoleConflictBench: A Benchmark of Role Conflict Scenarios for Evaluating LLMs' Contextual Sensitivity
arxiv.org · 11h