SWE Bench just got updated – new #1s
27 articles covering this post
swebench.com · 74 weeks ago · Hacker News
Covered in 27 articles
HybridDeepResearch: Enforcing Rigor Across SQL and Web Search for Enterprise Agents
snowflake.com · 1 week ago
The Ultimate Developer's Directory: 180+ AI Tools & Agents You Need to Try
dev.to · 5 days ago · DEV
Frontier AI in 2026, what actually changed and what did not
dev.to · 1 week ago · DEV
Best Vibe Coding Tools for SaaS in 2026
dev.to · 3 weeks ago · DEV
LLM Benchmark Rankings 2026: 15 Models Tested on 38 Real Coding Tasks
dev.to · 3 weeks ago · DEV
Model Sizing for Coding Agents: Bigger Is Not Always Better
dev.to · 3 weeks ago · DEV
When AI Builds Itself
anthropic.com · 1 week ago · DEV, Lobsters, Hacker News, r/ClaudeAI, r/artificial, r/singularity
PyCon US 2026 Typing Summit Recap
bernat.tech · 4 weeks ago · Lobsters, Hacker News
How to Choose the Right AI Model for Your Needs
analyticsvidhya.com · 1 week ago
Why Aren’t We Measuring How AI Affects Humans?
spectrum.ieee.org · 1 week ago · Hacker News
Best AI Agents for Software Development Ranked: A Benchmark-Driven Look at the Current Field
marktechpost.com · 4 weeks ago
Mini-SWE-agent scores up to 74% on SWE-bench in 100 lines of Python code
mini-swe-agent.com · 2 weeks ago · Hacker News
Adarsh Divakaran: Building AI Agents in Python
blog.adarshd.dev · 1 week ago
Minimal AI agent tutorial
minimal-agent.com · 6 days ago · Hacker News
Tabular ML is entering a new benchmark era
mindfulmodeler.substack.com · 3 weeks ago · Substack
The Hub of Heliopolis - Busting performance issues, AI edition
p403n1x87.github.io · 3 weeks ago · Hacker News
Building an AI Agent in Python
serpapi.com · 1 week ago
Why LLMs (still) lack taste
beyondtheprior.com · 4 days ago · Hacker News
The Coding Harness Behind GitHub Copilot in VS Code
code.visualstudio.com · 4 weeks ago · Hacker News, r/GithubCopilot, r/vscode
Long-horizon tasks: building agents that work over hours & days
redis.io · 3 weeks ago
Claude Fable 5 review: what the new Mythos model gets right (and very wrong)
lennysnewsletter.com · 3 days ago
AI-Accelerated Software Security Vulnerability Discovery: Is Hardware Next?
eetimes.com · 1 week ago
In other languages
When AI Builds Itself: Our Progress Toward Recursive Self-Improvement (in Korean)
news.hada.io · 1 week ago
Self-Improving AI: What Is Happening Inside Anthropic (in Russian)
habr.com · 1 week ago
1C Code Bench: A Benchmark for Evaluating the Ability of LLMs to Write 1C Code (in Russian)
habr.com · 2 weeks ago
ProgramBench, a New Coding Benchmark for LLMs: 9 Top Models, 200 Tasks, 248 Thousand Tests. Fully Solved (in Russian)
habr.com · 4 weeks ago
A Thousand Conflicts and One LLM: How We Automated the Move to New Chromium Versions (in Russian)
habr.com · 4 weeks ago