MirageOS, dune, functional
Scaling behavior of large language models in emotional safety classification across sizes and tasks
arxiv.org·2d
On the Same Wavelength? Evaluating Pragmatic Reasoning in Language Models across Broad Concepts
arxiv.org·1d
From Implicit Exploration to Structured Reasoning: Leveraging Guideline and Refinement for LLMs
arxiv.org·1d
Evaluating NL2SQL via SQL2NL
arxiv.org·2d