Information Gain-based Policy Optimization: A Simple and Effective Approach for Multi-Turn LLM Agents
paperium.net · 10h
🔄 Meta-Learning
Accelerated Dielectric Barrier Coating Optimization via Multi-Modal Data Fusion & Bayesian Hyperparameter Tuning
dev.to · 17h
🎯 Predictive Coding
Augmenting learning in neuro-embodied systems through neurobiological first principles
arxiv.org · 7h
🧠 Neuromorphic Computing
Dynamic Model Selection for Trajectory Prediction via Pairwise Ranking and Meta-Features
arxiv.org · 7h
🔄 Meta-Learning
Post-training methods for language models
developers.redhat.com · 5h
🔄 Meta-Learning
Adaptive Human-Computer Interaction Strategies Through Reinforcement Learning in Complex
arxiv.org · 1d
🔌 Neural Interfaces
Open-weight training practices and implications for CoT monitorability
lesswrong.com · 1h
🔄 Meta-Learning
Gated DeltaNet (Linear Attention variant in Qwen3-Next and Kimi Linear)
sebastianraschka.com · 1d
🎯 Predictive Coding
Quadratic Direct Forecast for Training Multi-Step Time-Series Forecast Models
arxiv.org · 7h
🎯 Predictive Coding
Reinforcement Learning for Resource Allocation in Vehicular Multi-Fog Computing
arxiv.org · 7h
🧠 Neuromorphic Hardware
Token-Regulated Group Relative Policy Optimization for Stable Reinforcement Learning in Large Language Models
arxiv.org · 7h
🔄 Meta-Learning
Aligning LLM agents with human learning and adjustment behavior: a dual agent approach
arxiv.org · 7h
🎯 Predictive Coding
Self-Harmony: Learning to Harmonize Self-Supervision and Self-Play in Test-Time Reinforcement Learning
arxiv.org · 7h
🔄 Meta-Learning