Reinforcement Learning

gaelazzo/python_chess: Chess trainer

馃AI ResearchContent type: Code
github.comHacker News

Mbodi AI (YC P25) Is Hiring Founding Machine Learning Engineer (Robotics)

Entrepreneurship
ycombinator.com | Hacker News

OpenEnv is now owned by HF, Torch, Prime Intellect, Unsloth, Modal, Mercor, and more! Use it for training agents.

馃AIContent type: Blog

Best explanations of how LLMs work

AI Infrastructure | Content type: Blog

Show HN: The Deterministic Core Architecture for AI-Augmented Applications

馃AI Research
Less-relevant results

Tracing Eval-Awareness Emergence Through Training of OLMo 3

Prompt Engineering
lesswrong.com

The Effective Sample Size

馃AI Research
alex.smola.orgHacker News

Why Robotics Is a Pre-Paradigm Field

馃AI ResearchContent type: News

Bumblebees can spontaneously solve problems, study finds

馃AI Research
arstechnica.com

Issue 654

馃AI ResearchContent type: Blog

Introducing the Third Generation of Apple's Foundation Models

AI Infrastructure

Rohin Shah on AGI Safety

馃AI Research
lesswrong.com

Optimisation over non-stationary distributions creates weirder minds

馃AI Research
lesswrong.com

Training Deliberative Monitors for Black-Box Scheming Detection

馃Claude
lesswrong.com

(Mis)generalization of Helpful-Only Fine-tuning

馃Claude
lesswrong.com

Do We Want a Superintelligent People-Pleaser?

馃Claude
lesswrong.com

I got so mad at poke(rogue)like that I trained a RL agent to beat it for me

馃AI ResearchContent type: Blog
