cameronrwolfe.substack.com

Agent Evaluation: A Detailed Guide

Covers 7 stories including "MCP is an open protocol that standardizes how apps provide context to LLMs"

Covered by tldr.tech

Discussed on Substack

Covers 7 related stories

modelcontextprotocol.io

MCP is an open protocol that standardizes how apps provide context to LLMs

Discussed on Hacker News, r/programming, and DEV

trychroma.com

Context Rot: How Increasing Input Tokens Impacts LLM Performance

Discussed on Hacker News and DEV

[2310.06770] SWE-bench: Can Language Models Resolve Real-World GitHub Issues?

anthropic.com

Writing effective tools for LLM agents–using LLM agents

Discussed on Hacker News

LiteLLM

AgentBench: Evaluating LLMs as Agents

gorilla.cs.berkeley.edu

GLM 4.6 IS A FUKING AMAZING MODEL AND NOBODY CAN TELL ME OTHERWISE

Discussed on r/LocalLLaMA

Covered in 1 article

Qwen 3.7 🤖, Cursor Composer 2.5 👨‍💻, Anthropic acquires Stainless 🛠️