🎮 Reinforcement Learning - gautam6599123 · Scour

Control Reinforcement Learning: Token-Level Mechanistic Analysis via Learned SAE Feature Steering

arxiv.org·1d

🗣️Large Language Models

Optimistic Training and Convergence of Q-Learning -- Extended Version

arxiv.org·4d

📊Optimization

Show HN: Fighting the War Against Expensive Reinforcement Learning

cadenza-landing-qtu7gbjwb-akshparekh123-3457s-projects.vercel.app·1d·

Discuss: Hacker News

∂Automatic Differentiation

BetaZero V2: A Diffusion Model for Setting Boulder Problems

evmojo37.substack.com·14h·

Discuss: Substack

∂Automatic Differentiation

A Conceptual Framework for Exploration Hacking

lesswrong.com·21h

🗣️Large Language Models

ashworks1706/rlhf-from-scratch: A theoretical and practical deep dive into Reinforcement Learning with Human Feedback and its applications in Large Language Models from scratch.

github.com·3d·

Discuss: Hacker News

🗣️Large Language Models

Robotics Motion Learning: Training Linked Robot Arms with Kuramoto Models

hackernoon.com·1d

Human-like metacognitive skills will reduce LLM slop and aid alignment and capabilities

lesswrong.com·18h

🤖Transformers

Forge: Scalable Agent RL Framework and Algorithm

minimax.io·5h·

Discuss: Hacker News

🗣️Large Language Models

[TUHS] bare m4 (was BTL summer employees)

tuhs.org·1d·

Discuss: Lobsters

🗣️Large Language Models

Show HN: A minimal online decision maker

decisionmaker.online·2d·

Discuss: Hacker News

🎯Decision Theory

Beyond Kuramoto Models: Associative Memory and Plastic Synapses in ML Ensembles

hackernoon.com·1d

🧠Neural Networks

BalatroBench Benchmarks Large Language Models Playing Balatro

balatrobench.com·2h·

Discuss: Hacker News

🗣️Large Language Models

Schedules of Reinforcement in Psychology (Examples)

simplypsychology.org·2d·

Discuss: Hacker News

🎲Probability Theory

polyrhachis/macrograd: A lightweight autograd engine inspired by PyTorch and micrograd

github.com·26m·

Discuss: Hacker News

Recursive self-improvement from AI models

marginalrevolution.com·2d·

Discuss: Hacker News

∂Automatic Differentiation

Part 2 - AI Chat Evaluation of the Formal Language in He Xin's PEPC System

news.ycombinator.com·1d·

Discuss: Hacker News

🗣️Large Language Models

Training A Small Language Model To Outperform Frontier Models On CRM-Arena

neurometric.substack.com·1d·

Discuss: Substack

🗣️Large Language Models

Transformer-Based Memory Forecasting: Leveraging Anonymized Aggregates for Personal Insights

novice.media·1d·

Discuss: Hacker News

⏱️Time Series Analysis

Digitizing the "Shokunin": How we encoded a Master's hammer strike into AI

yusukekaizen.substack.com·1d·

Discuss: Substack
