📊 LLM Evaluation - ibrahimsharaf · Scour

How to run evals for the model router 🚀LLM Deployment

devblogs.microsoft.com·1d

Context pruning: cut LLM tokens without losing quality (9 minute read) 🎯LLM Finetuning

Less-relevant results

https://research.perplexity.ai/articles/query-aware-context-compression-for-better-snippets 🔍RAG

research.perplexity.ai·12h

Prompt Compression in Diffusion Large Language Models: Evaluating LLMLingua-2 on LLaDA 🧠LLMs

Agentic evals or LLM as a judge? Considering cost, time and quality 🎯LLM Finetuning

news.ycombinator.com·5d·Hacker News

3DAeroRelief: The first 3D Benchmark UAV Dataset for Post-Disaster Assessment 🛡️AI Safety

sapientinc/HRM-Text: HRM-Text is a 1B text generation model based on the HRM architecture, strengthened by task completion and latent space reasoning. 🚀LLM Deployment

github.com·1d·r/singularity

Researchers train AI model that hits near-full performance with just 12.5 percent of its experts 🧠LLMs

the-decoder.com·4d

NLA Verbalizations on AuditBench: Llama 70B 🧠LLMs

lesswrong.com·5d

tokenspeed — feel LLM tokens-per-second 🎯LLM Finetuning

mikeveerman.github.io·2h

Discover the Red Hat OpenShift AI model catalog 🚀LLM Deployment

Eval engineering: The missing piece of agentic AI governance 🤖Agentic AI

siliconangle.com·3d

Beyond the Runbook: How to Scale SRE Operations for Cloud-Native Infrastructure 🤖Agentic AI

cloudnativenow.com·2d

Used over a million tokens in three separate sessions to test Qwen 3.6 35b (new Multi-token Prediction version) 🚀LLM Deployment

huggingface.co·5d·r/LocalLLaMA

Fine-Grained Benchmark Generation for Comprehensive Evaluation of Foundation Models 🧠LLMs

AI researchers flag bias risks in LLM judging 🧠LLMs

kite.kagi.com·5d

AI researchers push reliability tests for agent systems 🤖AI Agents

kite.kagi.com·4d

May 20, 2026 (#4672) 🤖AI Agents

alvinashcraft.com·18h

Build custom code-based evaluators in Amazon Bedrock AgentCore 🤖AI Agents

aws.amazon.com·2d

LLM-as-a-Judge: How to Become a Preferred Content Source for AI Answers 🏢LLM Adoption
