🎯 Fine-tuning - saeedesmaili · Scour

How to Train Your Goblin

🎮Reinforcement Learning

goblins.mchen.workers.dev··Hacker News

Architecture-Aware Reinforcement Learning Makes Sliding-Window Attention Competitive in Math Reasoning

🧠Transformers Academic

Running Qwen 35B MoE at 450k Context on a Single 32GB GPU

🧠LLM Inference

local-llm.utop.workers.dev··Hacker News

Claude Fable 5 and new AI safety fables

🧩Cognitive Science News

interconnects.ai··Hacker News

It blocked us at 'hello!' Anthropic Fable 5 refusing innocuous prompts

🪟Context Windows News

theregister.com··Hacker News

Substrate Asymmetry in User-Side Memory: A Diagnostic Framework

🧠LLMs Academic

Show HN: Bosun – a small model that keeps an agent's memory graph clean

🔤Tokenization

huggingface.co··Hacker News

MoQ GGUFs and GSQ: Low-Bit GGUFs Are About to Get Much Better

🧠LLM Inference News Blog

kaitchup.substack.com··r/LocalLLaMA

"North Mini Code"; open weights, 30B param, Canadian coding model

🤖Data science Blog

cohere.com··Hacker News

KJLdefeated/RL.cu: RLVR training for LLM in CUDA/C++

🔬Deep Learning Code

github.com··Hacker News

GPT-2: Too Dangerous To Release (2019)

🧠Transformers Blog

naokishibuya.github.io··Hacker News

GraphInfer-Bench: Benchmarking LLM's Inference Capability on Graphs

🕸️Knowledge Graphs Academic

Stack Overflow didn't just help AI learn to code

zozo123.github.io··Hacker News

When RL Fails after SFT: Rejuvenating Model Plasticity for Robust SFT-to-RL Handoff

🎮Reinforcement Learning Academic

PhysicsIntern: From an Autonomous Benchmark-Runner to a Research Sidekick

🤖Data science Blog

huggingface.co··Hacker News

ApodexAI/AgentHarness: Evaluation harness for Apodex-1.0 on public deep-research benchmarks.

🤖Data science Code

github.com··Hacker News

Pythia 1.4B reproduces 3.6% of training samples verbatim given 950-token prompts

🤖Data science Blog

ret2libc.com··Hacker News

Doc-to-Atom: Learning to Compile and Compose Memory Atoms

🧠LLM Inference Academic

The Philosophy of the Out-of-Office Email

🪨Obsidian News

theatlantic.com··Hacker News

A Unifying Lens on Supervised Fine-Tuning Through Target Distribution Design

🎮Reinforcement Learning Academic
