🤖 Large Language Models - nate_dkz · Scour

Would an LLM tell you if it’s gaming your eval? Often, no. But we can still catch the model thinking about it.

💬Prompt Engineering

threadreaderapp.com

Fine-tuning vs RAG vs MeMo: Where should LLM Knowledge Live?

💬Prompt Engineering

pub.towardsai.net

The sample efficiency black hole

💬Prompt Engineering News

dwarkesh.com · Hacker News

Show HN: Black-box API bug detection across 7 AI systems

resources.kusho.ai · Hacker News

Less-relevant results

Who Owns the Loop?

💬Prompt Engineering News Blog

cardeo.substack.com · Substack

Agent Harness Engineering: A Survey

🤖Multi-Agent Systems Academic

picrew.github.io · Hacker News

Show HN: LLM memory without context bleed; 100% precision vs. <10% vector search

tenureai.dev · Hacker News

kenm47/dbmachine: Turn a Postgres database into a self-describing, agent-operable application backend.

💻CLI Tools Code

github.com · Hacker News

Logits as a new monitor for evaluation awareness

💬Prompt Engineering

lesswrong.com · Hacker News

If Claude Fable stops helping you, you’ll never know

simonwillison.net · Hacker News

Agent Arena: Causal Evaluation of Agents in the Real World

🐚Shell Scripting Blog

arena.ai · Hacker News

AI Story Engine — High-Intensity Strategic Simulation Test Report

💬Prompt Engineering

gist.github.com · Hacker News

Agents are getting phone numbers. The reason is not obvious

🤖Multi-Agent Systems News

newsletter.gtmengineering.ai · Hacker News

