🎯 Reinforcement Learning - 2795725893 · Scour

Goal-Conditioned Reinforcement Learning from Sub-Optimal Data on Metric Spaces

arxiv.org·15h

Mitigating Reward Hacking in RLHF via Bayesian Non-negative Reward Modeling

arxiv.org·15h

Reinforcement Learning with R: Origins, Real-Life Applications, and Practical Implementation

dev.to·2d

A multi-agent reinforcement learning approach to autonomous aircraft taxiing with taxiing time, fuel consumption, and emission optimization

sciencedirect.com·1d

Show HN: Fighting the War Against Expensive Reinforcement Learning

cadenza-landing-qtu7gbjwb-akshparekh123-3457s-projects.vercel.app·13h

Recursive self-improvement from AI models

marginalrevolution.com·2d

🎨Multimodal AI

A training principle for drifting models

breno.bearblog.dev·9h

Optimizing post-disaster road restoration with reinforcement learning: A traveler-behavior-aware approach

sciencedirect.com·4h

ashworks1706/rlhf-from-scratch: A theoretical and practical deep dive into Reinforcement Learning with Human Feedback and its applications in Large Language Models from scratch.

github.com·2d

Zero State Architecture deep dive

news.ycombinator.com·3h

JupyterPS/VBAF: Visual Business Automation Framework - PowerShell-based reinforcement learning for education and business automation

github.com·2d

A Conceptual Framework for Exploration Hacking

lesswrong.com·4h

The 4 Mixture of Experts Architectures: How to Train 100B Models at 10B Cost

pub.towardsai.net·7h

🎨Multimodal AI

Recursive Language Models: Stop Stuffing the Context Window

nlp.elvissaravia.com·29m

Observe emergent behavior in autonomous multi-agent LLM networks

agents.glide2.app·2d

Gibbs Measures from Deep Shaped Multilayer Perceptrons

link.aps.org·7h

Researchers propose a self-distillation fix for ‘catastrophic forgetting’ in LLMs

infoworld.com·10h

Robotics Motion Learning: Training Linked Robot Arms with Kuramoto Models

hackernoon.com·1d

🎨Multimodal AI

Show HN: A minimal online decision maker

decisionmaker.online·1d

How to Leverage Explainable AI for Better Business Decisions

towardsdatascience.com·5h
