🎮 Reinforcement Learning - saeedesmaili · Scour

Merging model-based control with multi-agent reinforcement learning for multi-agent cooperative teaming strategies

🤖AI Agents Academic

Agents Need Work Data: A Primer on RLWD, or Reinforcement Learning on Work Data

anjalishriva.com··Hacker News

Researchers trained an open source AI search agent, Harness-1, that outperforms GPT-5.4 on recalling relevant information

🎯Fine-tuning

venturebeat.com··Hacker News

Reinforcement Learning and Optimal Control Book (RIP Dimitri Bertsekas)

📈Optimization Academic

web.mit.edu··Hacker News

Less-relevant results

Propel: Breaking the Solver Bottleneck in Task-Generator RL

🎯Fine-tuning

vmax.ai··Hacker News

Introducing North Mini Code: Cohere’s First Model For Developers

🎯Fine-tuning Blog

huggingface.co··Hacker News

Memoirs of a Learning Machine: Autobiographical Self-Training and the Self-Training Gap

🧩Cognitive Science

zenodo.org··Hacker News

Why LLMs (still) lack taste

beyondtheprior.com··Hacker News

How to Train Your Goblin

🎯Fine-tuning

goblins.mchen.workers.dev··Hacker News

Multi-agent rendezvous in fluid flows via reinforcement learning

🤖AI Agents Academic

How to Stop Shipping Low-Quality RL Environments (with Examples)

🎯Fine-tuning News

latent.space··Hacker News

Deterministic Policy Gradient for Learning Equilibrium in Time-Inconsistent Control Problems

📈Optimization Academic

I got so mad at poke(rogue)like that I trained a RL agent to beat it for me

thiagolira.blot.im··Hacker News

AI-powered living business intelligence network

atlasforgex.com··Hacker News

Phi-Actor-Critic: Steering General-Sum Games to Pareto-Efficient Correlated Equilibria

📈Optimization Academic

Risk Has an Owner, and It's Not the AI

🤖Automation Blog

aaddrick.com··Hacker News

[NEW MODEL] SupraLabs just released Supra1.5-50M Base (Experimental)!

🔤Tokenization

huggingface.co··r/LocalLLaMA

LLM Research Papers: The 2026 List (January to May)

💬Natural Language Processing News

magazine.sebastianraschka.com··Hacker News

TT-DAC-PS: Twin-Target Deterministic Actor-Critic with Policy Smoothing for Optimal Trade Execution

📈Optimization Academic

A wild idea: Abstract reality using ontology

🤖LLM Discussion

news.ycombinator.com··Hacker News
