🎯 Reinforcement Learning - asdfjllji · Scour

Towards End to End Motion Planning and Execution for Autonomous Underwater Vehicles Using Reinforcement Learning

🌐World Models Academic

Structure-Conditioned Actor-Critic Branches for Quality-Diversity Reinforcement Learning

🌐World Models Academic

Policy-Conditioned Counterfactual Credit for Verifiable Reinforcement Learning of Long-Horizon Language Agents

🌐World Models Academic

Flow-DPPO: Divergence Proximal Policy Optimization for Flow Matching Models

🌐World Models Academic

A Barrier-Modulated Architecture for Safe Affine Formation Control in Second-Order Multi-Agent Systems

♟️Game Theory Academic

Selective-Advantage Entropy-Adaptive Horizon GRPO: Asymmetric Token-Level Discounting for Efficient Reinforcement Learning of Language Models

🌐World Models Academic

SAW: Stage-Aware Dynamic Weighting for Multi-Objective Reinforcement Learning in Large Language Models

🌐World Models Academic

Learning to replenish: A hybrid deep reinforcement learning for dynamic inventory management in the pharmaceutical supply chains

🌐World Models Academic

Belief-Space Quantum-Inspired Reinforcement Learning for Partially Observable Autonomous Cyber Defense in the Internet of Vehicles

🌐World Models Academic

OrderGrad: Optimizing Beyond the Mean with Order-Statistic Policy Gradient Estimation

🌐World Models Academic

ARTA: Adaptive Reinforcement-Learning-Based Throttling Agent for RowHammer Vulnerabilities

🌐World Models Academic

Representation Learning Enables Scalable Multitask Deep Reinforcement Learning

🌐World Models Academic

Mitigating Bias in Low-SNR Financial Reinforcement Learning via Quantum Representations

🌐World Models Academic

Discovering Interpretable Multi-Parameter Control Policies for Evolutionary Algorithms Using Deep Reinforcement Learning

🌐World Models Academic

Constrained Deep Reinforcement Learning for Cognitive Radar Resource Management

🌐World Models Academic

Alpha-RTL: Test-Time Training for RTL Hardware Optimization

🌐World Models Academic

SALT: When More Rollouts Don't Help in Group-Based Policy Optimization and How to Make Them Matter

🌐World Models Academic

The Impact of Market Informedness on Market Makers' Profitability

🌐World Models Academic

SocraticPO: Policy Optimization via Interactive Guidance

🌐World Models Academic

Q-VGM: Q-Guided Value-Gradient Matching for Flow-Matching VLA Policies

📄AI Research Academic
