🎯 Reinforcement Learning - ice · Scour

RubricEM: Meta-RL with Rubric-guided Policy Decomposition beyond Verifiable Rewards 👁️Multimodal AI

The hard core of alignment (is robustifying RL) ⏱️Personal Productivity

lesswrong.com·2d

CatIF-RL: Activity-Oriented Enzyme Sequence Design by Steered Inverse Protein Folding ⏱️Personal Productivity

biorxiv.org·15h

rl for red teaming: training models to attack and defend themselves ⏱️Personal Productivity

castform.com·2d·Hacker News

Luke Coffey On How Tehran Has Adapted Kremlin Negotiation Tactics 🤖AI, humanoid robots, robot commercialization, embodied intelligence, human-computer interaction, AI startups

SFT, RL, and On-Policy Distillation Through a Distributional Lens (19 minute read) 👁️Multimodal AI

nrehiew.github.io·6d·Hacker News

Your Daily digest for AkademikLink 🔭Tech Trends

5 Dakikada Teknoloji Gündemi <team@aposto.com> via kill-the-newsletter.com·1d

Eric Jang – Building AlphaGo from scratch 🤖DIY robots, life systems, fun AI tools

dwarkesh.com·1d·Hacker News

agreed. RL is not (at least by itself) the way to alignment ⏱️Personal Productivity

twitter.macworks.dev·3d

GRIP-VLM: RL for Efficient Vision-Language Models 👁️Multimodal AI

startuphub.ai·2d

yikart/AiToEarn: Let's use AI to Earn! 🛠️Indie Development

Show HN: Watch a neural net discover molecules by arguing with itself 👁️Multimodal AI

randman444.github.io·2d·Hacker News

Il pieno di energia! 📷Computer Vision

maestroandrea.bearblog.dev·5d

Locked Shields 2026: RL Joins Live-Fire Cyber Event 🔭Tech Trends

reversinglabs.com·2d

What rebuilding AlphaGo teaches us about self-play, RL, and future of LLMs [video] 👁️Multimodal AI

youtube.com·1d·Hacker News

DQN vs Rainbow: 4.8x Score Gain From 6 Extensions 👁️Multimodal AI

tildalice.io·5d

UAE Denies Netanyahu Visited During Iran War 🔭Tech Trends

beijingbulletin.com·2d

RL-Based Retargeting Method For Transferring Human Motion To Robots 👁️Multimodal AI

Top House Republican Says No New US Ukraine Supplemental Likely, Backs More Russia Sanctions ⏱️Personal Productivity

BalCapRL: A Balanced Framework for RL-Based MLLM Image Captioning 👁️Multimodal AI

machinelearning.apple.com·6d
