cameronrwolfe.substack.com · 4 weeks ago
Agent Evaluation: A Detailed Guide
Covers 7 stories, including: MCP is an open protocol that standardizes how apps provide context to LLMs
Covered by tldr.tech
Discussed on Substack
Covers 7 related stories
modelcontextprotocol.io · 44 weeks ago
MCP is an open protocol that standardizes how apps provide context to LLMs
Discussed on Hacker News, Hacker News, r/programming, and DEV
trychroma.com · 12 weeks ago
Context Rot: How Increasing Input Tokens Impacts LLM Performance
Discussed on Hacker News and DEV
arxiv.org · 24 weeks ago
[2310.06770] SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
anthropic.com · 40 weeks ago
Writing effective tools for LLM agents–using LLM agents
Discussed on Hacker News
litellm.ai · 28 weeks ago
LiteLLM
arxiv.org · 36 weeks ago
AgentBench: Evaluating LLMs as Agents
gorilla.cs.berkeley.edu · 37 weeks ago
GLM 4.6 IS A FUKING AMAZING MODEL AND NOBODY CAN TELL ME OTHERWISE
Discussed on r/LocalLLaMA
Covered in 1 article
tldr.tech · 4 weeks ago
Qwen 3.7 🤖, Cursor Composer 2.5 👨💻, Anthropic acquires Stainless 🛠️