tbench.ai · 56 weeks ago
Terminal-Bench: a benchmark for AI agents in terminal environments
Covered by 15 sources, including Blog on Tailscale and Kilo Blog. Discussed on Hacker News.

Covered in 16 articles

Blog on Tailscale · 4 weeks ago
A simpler way to experiment with AI coding agents using Aperture CLI

Kilo Blog · 1 week ago
KiloBench - Because Your Benchmark Score Doesn't Pay the Bill

goose-docs.ai · 4 days ago
Self-Improving Agents Still Need Humans
Discussed on Hacker News.

The New Stack · 4 weeks ago
Cursor bets on cheaper coding with Composer 2.5 and Kimi K2.5

venturebeat.com · 4 weeks ago
Google says Gemini 3.5 Flash can slash enterprise AI costs by more than $1 billion a year

venturebeat.com · 5 weeks ago
AI IQ is here: a new site scores frontier AI models on the human IQ scale. The results are already dividing tech.

Tom's Guide · 4 weeks ago
Google just launched Gemini 3.5 Flash

agents-last-exam.org · 1 week ago
AI Agent Benchmark for Real-World Professional Workflows
Discussed on Hacker News.

henrypan.com · 3 weeks ago
What 1,000+ Harness Experiments Taught Me About Self-Improving Agents
Discussed on Hacker News.

andrewjesson.com · 4 days ago
The engineering practices Claude Code and Codex use to improve AI agents
Discussed on Hacker News.

In other languages

Xataka · 4 weeks ago
There is a battle over which AI model codes best, and a good, attractive, and very cheap rival has entered it: Cursor