✅ evals - zhuangda · Scour

LLM-as-a-Judge: How to Become a Preferred Content Source for AI Answers ✍️Prompt Engineering

SKG-Eval: Stateful Evaluation of Multi-Turn Dialogue via Incremental Semantic Knowledge Graphs ✍️Prompt Engineering

Command A+: Making sovereign agentic capabilities available to all ✍️Prompt Engineering

cohere.com·11h·Hacker News

Multimodal evaluators: MLLM-as-a-judge for image-to-text tasks in Strands Evals ✍️Prompt Engineering

aws.amazon.com·10h

How to run evals for the model router ✍️Prompt Engineering

devblogs.microsoft.com·1d

AI researchers flag bias risks in LLM judging ✍️Prompt Engineering

kite.kagi.com·5d

Sutro ✍️Prompt Engineering

OpenMOSS/MOSS-Audio: MOSS-Audio is an open-source foundation model for unified audio understanding, enabling speech, sound, music, captioning, QA, and reasoning in real-world scenarios. ✍️Prompt Engineering

tokenspeed — feel LLM tokens-per-second ✍️Prompt Engineering

mikeveerman.github.io·1h

Document-tuning instills durable animal compassion in LLMs (and generalizes to humans) ✍️Prompt Engineering

lesswrong.com·48m

https://research.perplexity.ai/articles/query-aware-context-compression-for-better-snippets ✍️Prompt Engineering

research.perplexity.ai·10h

Market Trend Analysis: The Impact of Recent Advances on the Large Language Model Evaluation As A Service Market ✍️Prompt Engineering

Beyond the Runbook: How to Scale SRE Operations for Cloud-Native Infrastructure ✍️Prompt Engineering

cloudnativenow.com·2d

3DAeroRelief: The first 3D Benchmark UAV Dataset for Post-Disaster Assessment ✍️Prompt Engineering

Why every AI tooling decision needs a measurement ✍️Prompt Engineering

noesisvision.substack.com·6d

Mastering Agentic Techniques: AI Agent Evaluation ✍️Prompt Engineering

developer.nvidia.com·1d

Training a 22MB prompt injection classifier ✍️Prompt Engineering

stackone.com·13h·Hacker News

May 20, 2026 (#4672) ✍️Prompt Engineering

alvinashcraft.com·17h

What Happens When AI Learns From Incorrect Labels: The Hidden Cost of Noisy Training Data 🤖AI

sitepoint.com·5d

Qwen 3.7 Preview ✍️Prompt Engineering

news.ycombinator.com·2d·Hacker News

Log in to enable infinite scrolling