🎮 Reinforcement Learning - saeedesmaili · Scour

Failure Modes of Deep Multi-Agent RL in Asynchronous Pricing: Reproducible Triggers, Trace Diagnostics, and a Partial Fix

🤖AI Agents Academic

Less-relevant results

Best explanations of how LLMs work

🧠LLMs Blog

vorushin.github.io · Hacker News

Towards End to End Motion Planning and Execution for Autonomous Underwater Vehicles Using Reinforcement Learning

🦾Robotics Academic

Event-Driven Reinforcement Learning Enables Long-Horizon Control in Semiconductor Fabrication

🤖AI Agents Academic

UniIntervene: Agentic Intervention for Efficient Real-World Reinforcement Learning

🤖AI Agents Academic

KJLdefeated/RL.cu: RLVR training for LLM in CUDA/C++

🔬Deep Learning Code

github.com · Hacker News

UNIQ: Conformal Calibration for Adaptive Conservatism in Offline Reinforcement Learning

🎯Fine-tuning Academic

Generalization Hacking: Models Can Game Reinforcement Learning by Preventing Behavioral Generalization

🤖AI Agents Academic

Self-Paced Curriculum Reinforcement Learning for Autonomous Superbike Racing in Simulation

🤖AI Agents Academic

Show HN: The Deterministic Core Architecture for AI-Augmented Applications

🪟Context Windows

brandonbellsystems.com · Hacker News

Cooperative Long Rope Skipping via Multi-Agent Reinforcement Learning

🦾Robotics Academic

SVoT: State-aware Visualization-of-Thought for Spatial Reasoning via Reinforcement Learning

🌱Digital Gardens Academic

GIFT: LLM-Guided State-Reward Interface for Financial Reinforcement Learning

🤖LLM Academic

HERO: Hindsight-Enhanced Reflection from Environment Observations for Agentic Self-Distillation

🤖AI Agents Academic

MoQ GGUFs and GSQ: Low-Bit GGUFs Are About to Get Much Better

🧠LLM Inference News Blog

kaitchup.substack.com · r/LocalLLaMA

Deep reinforcement learning for process design: Review and perspective

🔬Deep Learning Academic

APPO: Agentic Procedural Policy Optimization

🤖AI Agents Academic

Introducing the Third Generation of Apple’s Foundation Models

machinelearning.apple.com··Hacker News, r/apple

Path Planning Using Deep Deterministic Policy Gradient: A Reinforcement Learning Approach

🤖AI Agents Academic

A Unifying Lens on Reward Uncertainty in RLHF

🤖LLM Academic

