🏆 Model Benchmarks - cortesalexander8 · Scour

Show HN: Pre-training, fine-tuning, and evals platform 🔓Open Source AI

oumi.ai·5d·Hacker News

Show HN: Benchmark multiple LLMs to compare quality, speed, and cost 🧠LLMs

loopthink.ai·4h·Hacker News

Are LLM-Based Retrievers Worth Their Cost? An Empirical Study of Efficiency, Robustness, and Reasoning Overhead 🧠LLMs

arxiv.org·1d

Sanity check experiments 💬Prompt Engineering

sebiwette.de·13h

Better Harness: A Recipe for Harness Hill-Climbing with Evals ✨Vibe Coding

blog.langchain.com·3h

Scraping and analyzing submissions to Terminal Bench 2.0 ✨Vibe Coding

primeradiant.com·2d

I benchmarked my own product, published everything, and 0.2.0 is basically the list of things I had to fix. 🔓Open Source AI

blog.routerly.ai·22h·r/SideProject

Thoughts on causal isolation of AI evaluation benchmarks ⚠️AI Safety

lesswrong.com·6d

reviseio/errata-bench: A proofreading benchmark for LLMs 🧠LLMs

github.com·1d·Hacker News

I benchmarked GPT-4o, Claude 3.5, and Gemini 1.5 for security 💬Prompt Engineering

aibench.trypromptguard.com·21h·DEV

HappyHorse-1.0 hits #1 on Artificial Analysis video leaderboard 🤖AI News

artificialanalysis.ai·3h·Hacker News

The Model That Passed Every Benchmark 🔓Open Source AI

medium.com·2d

Gemma 4 and what makes an open model succeed 🔓Open Source AI

interconnects.ai·5d

Our AI Hallucinated in Production: How We Fixed It With Evals — Yicheng Guo at AI Engineer Melbourne 2026 ⚠️AI Safety

webdirections.org·22h

Ansible CIS Benchmark: A Fling or a Serious Date? 💬Prompt Engineering

medium.com·5h

Gemma 4 E4B vs. Gemma Family: Enterprise Benchmark Across 8 Task Suites ✨Vibe Coding

aiexplorer-blog.vercel.app·1d·Hacker News

The Insert Benchmark vs MariaDB 10.2 to 13.0 on a 24-core server 📊Data Analysis

smalldatum.blogspot.com·20h

Inference Arena – new benchmark of local inference and training 🧠LLMs

kvark.github.io·3d·Hacker News

Show HN: ErrataBench - A Proofreading Benchmark for LLMs 🧠LLMs

revise.io·1d·Hacker News

These gaming phones got busted for cheating, and here’s what the brand says in defence 🟢OpenAI

androidauthority.com·9h
