Large Language Models

Feeds to Scour
Scoured 33 posts in 18.6 ms

Would an LLM tell you if it’s gaming your eval? Often, no. But we can still catch the model thinking about it.

 💬Prompt Engineering
threadreaderapp.com

Fine-tuning vs RAG vs MeMo: Where should LLM Knowledge Live?

 💬Prompt Engineering
pub.towardsai.net

The sample efficiency black hole

 💬Prompt Engineering  Content type: News
dwarkesh.com · Hacker News

Show HN: Black-box API bug detection across 7 AI systems

 🔌APIs
Less-relevant results

Who Owns the Loop?

 💬Prompt Engineering  Content type: News, Blog

Agent Harness Engineering: A Survey

 🤖Multi-Agent Systems  Content type: Academic

Show HN: LLM memory without context bleed; 100% precision vs. <10% vector search

 🤖AI

kenm47/dbmachine: Turn a Postgres database into a self-describing, agent-operable application backend.

 💻CLI Tools  Content type: Code
github.com · Hacker News

Logits as a new monitor for evaluation awareness

 💬Prompt Engineering
lesswrong.com · Hacker News

If Claude Fable stops helping you, you’ll never know

 🤖AI

Agent Arena: Causal Evaluation of Agents in the Real World

 🐚Shell Scripting  Content type: Blog
arena.ai · Hacker News

AI Story Engine — High-Intensity Strategic Simulation Test Report

 💬Prompt Engineering

Agents are getting phone numbers. The reason is not obvious

 🤖Multi-Agent Systems  Content type: News

