📊 LLM Evaluation - gilesr · Scour

PELLI: Framework to effectively integrate LLMs for quality software generation

arxiv.org·1d

When LLMs get significantly worse: A statistical approach to detect model degradations

arxiv.org·1d

Generative LLMs as Automatic Proofreaders of Radiology Reports - Radiological Society of North America

rsna.org·2d

SWE-rebench Jan 2026: GLM-5, MiniMax M2.5, Qwen3-Coder-Next, Opus 4.6, Codex Performance

swe-rebench.com·3h·Discuss: r/LocalLLaMA

How To Utilize LMS Data: Use Cases For Enhancing L&D Insights

elearningindustry.com·1d

My Skill Makes Claude Code GREAT At TDD

aihero.dev·5h

Analysis of systems with dependent components through a variance-based index and regression importance signature

sciencedirect.com·1d

Quality Assurance in AI Assisted Software Development: Risks and Implications

dev.to·19h·Discuss: DEV

The AI hater’s guide to code with LLMs. This is an interesti...

kottke.org·52m

The case for industrial evals

lesswrong.com·19h

The Problem With LLMs

deobald.ca·2d·Discuss: Lobsters, Hacker News

Intelligence analysis platform for AI Agents (~OpenClaw)

blog.lukaszolejnik.com·6h

AI dev tool power rankings & comparison [Feb. 2026]

blog.logrocket.com·4h

feat: implement LLM decision engine (Task 10) by meleantonio · Pull Request #36

github.com·1d

Your AI sounds confident. But is it right?

truthlayer.netlify.app·1d·Discuss: Hacker News

Scaling LLM Post-Training at Netflix

netflixtechblog.com·13h

Quality and understandability after AI

federicopereiro.com·1d·Discuss: Hacker News

Design Decision: Technical Debt in BillaBear

iain.rocks·9h·Discuss: Hacker News, r/programming

LangChain Agent Testing Guide Tool (Free)

news.ycombinator.com·5h·Discuss: Hacker News

I used a local LLM to analyze my journal entries

ankursethi.com·5h·Discuss: Lobsters

