🎮 Reinforcement Learning - barisamiw · Scour

Temperature as a Meta-Policy: Adaptive Temperature in LLM Reinforcement Learning

arxiv.org·1d

Explainable Causal Reinforcement Learning for heritage language revitalization programs with inverse simulation verification

dev.to·7h

CM2: Reinforcement Learning with Checklist Rewards for Multi-Turn and Multi-Step Agentic Tool Use

arxiv.org·1d

🌐Distributed Systems

Multi-armed bandit

en.wikipedia.org·1d

Optimizing post-disaster road restoration with reinforcement learning: A traveler-behavior-aware approach

sciencedirect.com·2d

🌐Distributed Systems

Reinforcement Learning with R: Origins, Real-Life Applications, and Practical Implementation

dev.to·4d

Power of Agent assisted coding and learning to achieve goals faster and cheaper

osm2pgsql.org·2h

#0186: What We Let Machines Do

matthewsinclair.medium.com·3h

Forge: Scalable Agent RL Framework and Algorithm

minimax.io·1d

🌐Distributed Systems

The implementation for the drifting model

breno.bearblog.dev·1d

Painless Activation Steering (PAS): Automated, Lightweight Post‑Training for LLM Behavior

sashacui.substack.com·10h

🔀Transformers

Functional distinctions between orbitofrontal cortex and anterior cingulate cortex subregions in decision-making and autonomic regulation

nature.com·4h

🔀Transformers

Show HN: Fighting the War Against Expensive Reinforcement Learning

cadenza-landing-qtu7gbjwb-akshparekh123-3457s-projects.vercel.app·2d

Optimal timing for superintelligence

marginalrevolution.com·1d

🌐Distributed Systems

A Conceptual Framework for Exploration Hacking

lesswrong.com·2d

🔧Feature Engineering

MiniMax-AI/MiniMax-M2.5

github.com·4h

I Built a Smart Movie Recommender with Collaborative Filtering

analyticsvidhya.com·3h

🧭Vector Databases

Decoding urban soundscapes: spatial prediction and influence mechanism analysis with interpretable semi-supervised learning

sciencedirect.com·1h

🔧Feature Engineering

Completed Hyperparameter Transfer across Modules, Width, Depth, Batch and Duration

machinelearning.apple.com·1d

🔧Feature Engineering

Swift to Harbour, Slow to Berth

joehalliwell.com·2h

🌐Distributed Systems
