📊 LLM Evaluation - gilesr · Scour

PELLI: Framework to effectively integrate LLMs for quality software generation

arxiv.org·2d

Analysis of systems with dependent components through a variance-based index and regression importance signature

sciencedirect.com·2d

AI usage in popular open source projects

tirkarthi.github.io·11h·

Discuss: Hacker News, r/programming

Karpathy's Micro LLM in JavaScript

github.com·2d·

Discuss: Hacker News

The Evolving Role of the ML Engineer

towardsdatascience.com·1d

Reflections on making a video game whose core mechanic is talking to LLMs

alanmunirji.dev·18h·

Discuss: Hacker News

Agentic Engineering: What Actually Works After Hundreds of Sessions

muhammadhammadkhan.substack.com·21h·

Discuss: Substack

Building an ARC-2 Solver — From Socratic Panels to a Single Oracle

pub.towardsai.net

·1d

Intelligence analysis platform for AI Agents (~OpenClaw)

blog.lukaszolejnik.com·1d

Quality and understandability after AI

federicopereiro.com·2d·

Discuss: Hacker News

Towards Fair and Comprehensive Evaluation of Routers in Collaborative LLM Systems

arxiv.org·1d

The AI hater’s guide to code with LLMs. This is an interesti...

kottke.org·23h

The Problem With LLMs

deobald.ca·3d·

Discuss: Lobsters, Hacker News

Generative LLMs as Automatic Proofreaders of Radiology Reports - Radiological Society of North America

rsna.org·2d

LangChain Agent Testing Guide Tool (Free)

news.ycombinator.com·1d·

Discuss: Hacker News

I used a local LLM to analyze my journal entries

ankursethi.com·1d·

Discuss: Lobsters

✍longform travel writing

Painless Activation Steering (PAS): Automated, Lightweight Post‑Training for LLM Behavior

sashacui.substack.com·13h·

Discuss: Substack

🤖AI Agents Weekly: GPT-5.3-Codex-Spark, GLM-5, MiniMax M2.5, Recursive Language Models, Harness Engineering, Agentica, and More

nlp.elvissaravia.com·4h

LLM Performance in Astro, React, Tailwind and Cloudflare

10xbench.ai·3d·

Discuss: Hacker News

The case for industrial evals

lesswrong.com·1d
