🎮 Reinforcement Learning - lmilekic · Scour

Test-Time Gradient Guidance of Flow Policies in Reinforcement Learning

📊LLM Evaluation Academic

Towards Shutdownable Agents: Generalizing Stochastic Choice in RL Agents and LLMs

lesswrong.com·

🥇Top AI Papers of the Week

🤖AI Agents News

nlp.elvissaravia.com·

OpenEnv is now owned by HF, Torch, Prime Intellect, Unsloth, Modal, Mercor, and more! Use it for training agents.

🤖AI Blog

huggingface.co··Hacker News, r/LocalLLaMA

Cohere open-sources a coding agent that runs on a single H100

venturebeat.com·

How to Train Your Goblin

goblins.mchen.workers.dev··Hacker News

Event-Driven Reinforcement Learning Enables Long-Horizon Control in Semiconductor Fabrication

🌀Procedural Generation Academic

I got so mad at poke(rogue)like that I trained a RL agent to beat it for me

📊LLM Evaluation

thiagolira.blot.im··Hacker News

Time-slip in AI sepsis models may inflate results, risking under- or overtreatment

📚CS Research

medicalxpress.com·

Geometry-Aware Reinforcement Learning for 2D Irregular Nesting

🦾Motion Planning Academic

A Functional Taxonomy of World Models

China women’s volleyball team finish Nations League leg on a high after opening defeat

👁️Computer Vision News

NVIDIA Enables the Next Era Of Physical AI Research With Agent Skills For Autonomous Vehicles, Robotics And Vision AI

🦿Embodied AI Blog

blogs.nvidia.com·

3SPO: State-Score-Supervised Policy Optimization for LLM Agents

🤖AI Agents Academic

Memoirs of a Learning Machine: Autobiographical Self-Training and the Self-Training Gap

📚CS Research

zenodo.org··Hacker News

Model predictive task sampling for efficient and robust adaptation

⚙️Prompt Engineering Academic

Dmsh: A Multi-Agent Reinforcement Learning Framework for All-Quad Mesh Generation

🤖AI Agents Academic

École secondaire Notre-Dame-du-Sault to hold graduation on June 24

When RL Fails after SFT: Rejuvenating Model Plasticity for Robust SFT-to-RL Handoff

🧠LLMs Academic

The Exploit Always Wins

🔬ML Research Blog

abhishek-shankar.com·
