Model Training

BacteReason: A Reasoning Model for Antimicrobial Resistance Prediction

 📐Scaling Laws  Content type: Academic
biorxiv.org·

Architecture-Aware Reinforcement Learning Makes Sliding-Window Attention Competitive in Math Reasoning

 🎮Reinforcement Learning  Content type: Academic
arxiv.org·

Parameter-Efficient Adapter Tuning for Tabular-Image Multimodal Learning

 🧠AI Research  Content type: Academic
arxiv.org·

ApodexAI/AgentHarness: Evaluation harness for Apodex-1.0 on public deep-research benchmarks.

 📐Scaling Laws  Content type: Code
github.com · Hacker News

When Probing Accuracy Saturates, Fragility Resolves: A Complementary Metric for LLM Pre-Training Analysis

 💬LLMs  Content type: Academic
arxiv.org·

(Mis)generalization of Helpful-Only Fine-tuning

 🎮Reinforcement Learning
lesswrong.com·

ViP-VL: Vietnamese Self-supervised Speech Pretraining Model with Vector-Quantization Learning

 💬LLMs  Content type: Academic
arxiv.org·

mirkolenz/llmhop: Tiny, stateless Go router that dispatches OpenAI-compatible requests to single-model vLLM and sglang backends with zero external dependencies

 💬LLMs  Content type: Code
github.com · Hacker News

When RL Fails after SFT: Rejuvenating Model Plasticity for Robust SFT-to-RL Handoff

 🎮Reinforcement Learning  Content type: Academic
arxiv.org·

Corpus Augmentation for Sign Language Translation via LLM-Guided Video Stitching

 💬LLMs  Content type: Academic
arxiv.org·

heterodoxin/graphkv: Graph-guided KV cache compression for memory-efficient LLM inference.

 💬LLMs  Content type: Code
github.com · r/LocalLLaMA

A Unifying Lens on Supervised Fine-Tuning Through Target Distribution Design

 🎮Reinforcement Learning  Content type: Academic
arxiv.org·

If Claude Fable stops helping you, you'll never know

 💬LLMs  Content type: Blog

Training Deliberative Monitors for Black-Box Scheming Detection

 🎮Reinforcement Learning
lesswrong.com·

Simplicity Suffices for Parameter Noise Injection in Stochastic Gradient Descent

 📉Deep Learning  Content type: Academic
arxiv.org·

Does anyone know what PCIe mode was used for these benchmarks?

 💬LLMs  Content type: Code
github.com · r/LocalLLaMA

Harness In-Context Operator Learning with Chain of Operators

 💬LLMs  Content type: Academic
arxiv.org·

PriFT: Prior-Support Guided Supervised Fine-Tuning

 🎮Reinforcement Learning  Content type: Academic
arxiv.org·

SlideCheck: Guiding Self-Supervised Pretraining of Pathology Foundation Models via Dataset Distributions

 💬LLMs  Content type: Academic
arxiv.org·

The Art of Interrogation: Consistency Amplifies Factuality in Spatial Reasoning

 🎮Reinforcement Learning  Content type: Academic
arxiv.org·