LessWrong
12 hours ago
Door's Locked, Try the Window
Covers 8 related stories
aisi.gov.uk · 10 weeks ago · Our evaluation of Claude Mythos Preview’s cyber capabilities (discussed on Hacker News and Lobsters)
Anthropic · 4 weeks ago · How we contain Claude across products (discussed on Hacker News, 3 threads, and r/ClaudeAI)
tbench.ai · 57 weeks ago · Terminal-Bench: a benchmark for AI agents in terminal environments (discussed on Hacker News, 2 threads)
metr.org · 5 weeks ago · Frontier Risk Report (February to March 2026) (discussed on Hacker News)
GitHub · 73 weeks ago · fastapi/fastapi
arXiv · 39 weeks ago · SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?
GitHub · 355 weeks ago · Datasette (discussed on Hacker News)
Anthropic · 15 weeks ago · Eval awareness in Claude Opus 4.6’s BrowseComp performance (discussed on Hacker News, 2 threads, r/ClaudeAI, and r/LocalLLaMA)