AI Agent Benchmark Compendium
⚡Performance Mythology
Classic Demo Effects, Fire
📟Terminal Physics
Cracking the CRISPR code to find the 'passwords' that unlock its full potential
phys.org·2d
🧬Palindrome Codes
HiCoTraj: Zero-Shot Demographic Reasoning via Hierarchical Chain-of-Thought Prompting from Trajectory
arxiv.org·2d
🌍Cultural Algorithms
Scheming Ability in LLM-to-LLM Strategic Interactions
arxiv.org·1d
🔲Cellular Automata
Benchmarking Deep Learning Models for Laryngeal Cancer Staging Using the LaryngealCT Dataset
arxiv.org·3d
🎵Audio ML
Beyond Postconditions: Can Large Language Models infer Formal Contracts for Automatic Software Verification?
arxiv.org·2d
✅Formal Methods
Alif: Advancing Urdu Large Language Models via Multilingual Synthetic Data Distillation
arxiv.org·4d
🎙️Whisper
Automated Retrospective Analysis & Predictive Maintenance of Polymer Degradation Using Multi-Modal Deep Learning
🔍Vector Forensics
Protein as a Second Language for LLMs
arxiv.org·3d
🔢Denotational Semantics
Evaluating Reasoning Faithfulness in Medical Vision-Language Models using Multimodal Perturbations
arxiv.org·3d
📊Learned Metrics
Generative Latent Video Compression
arxiv.org·3d
🧠Learned Codecs
ACE-G: Improving Generalization of Scene Coordinate Regression Through Query Pre-Training
arxiv.org·3d
📐Projective Geometry
Skill-Targeted Adaptive Training
arxiv.org·3d
📊Learned Metrics
Reasoning Pattern Matters: Learning to Reason without Human Rationales
arxiv.org·2d
🧮Theorem Proving
Tech With Tim: Why 1M People Tried This AI Coding Tool (Full Vibe Coding Tutorial)
🌀Brotli Internals
Ensembling Large Language Models to Characterize Affective Dynamics in Student-AI Tutor Dialogues
arxiv.org·2h
🤖Grammar Induction
MTMD: A Multi-Task Multi-Domain Framework for Unified Ad Lightweight Ranking at Pinterest
arxiv.org·3d
🔍BitFunnel