🎯 Reinforcement Learning - Scourface · Scour

660 AI Agents Ran 27,000 Experiments. Their Biggest Discovery Was a 2015 Textbook Result. 🧠Neuromorphic Hardware

towardsai.net·2d

TabQL: In-Context Q-Learning with Tabular Foundation Models 🔄Meta-Learning

Wikipedia 🔬Science

en.wikipedia.org·6d

Weekly Research Recap 🤖Machine Learning

quantseeker.com·1d

Randomized Advantage Transformation (RAT): Computing Natural Policy Gradients via Direct Backpropagation 🔄Meta-Learning

A Chinese study monitoring low-frequency time-code signals during the November 2025 geomagnetic storm found that signal strength dropped by over 2.3 dBμV/m and ... 📡Signal Processing

frontiersin.org·2d·r/space

Policy Optimization in Hybrid Discrete-Continuous Action Spaces via Mixed Gradients 🎯Predictive Coding

Learning to Hand Off: Provably Convergent Workflow Learning under Interface Constraints 🧠Neuromorphic Hardware

An Encoded Corrective Double Deep Q-Networks for Multi-Agent Control Systems 🧠Neuromorphic Hardware

Progressive Generalization Augmentation with Deeply Coupled RND-PPO and Domain-Prioritized Noise Injection for Robust Crop Management Reinforcement Learning 🔄Meta-Learning

blevesearch/vellum: A Go library implementing a FST (finite state transducer) 🧠Neuromorphic Hardware

DiffusionOPD: A Unified Perspective of On-Policy Distillation in Diffusion Models 🔄Meta-Learning

GAE Falls Short in Imperfect-Information Self-Play Reinforcement Learning 🔄Meta-Learning

Addressing Terminal Constraints in Data-Driven Demand Response Scheduling 🧠Neuromorphic Hardware

When Actions Disappear: Adversarial Action Removal in Self-Play Reinforcement Learning 🔄Meta-Learning

$f$-Trajectory Balance: A Loss Family for Tuning GFlowNets, Generative Models, and LLMs with Off- and On-Policy Data 🔄Meta-Learning

DEVIS-GRPO: Unleashing GRPO on Dynamic Extreme View Synthesis 🎯Predictive Coding

Offline Contextual Bandits in the Presence of New Actions 🔄Meta-Learning

ROAD: Adaptive Data Mixing for Offline-to-Online Reinforcement Learning via Bi-Level Optimization 🔄Meta-Learning

Decoupling KL and Trajectories: A Unified Perspective for SFT, DAgger, Offline RL, and OPD in LLM Distillation 🔄Meta-Learning
