📊 Gradient Accumulation - miterion · Scour

The batch size in training

breno.bearblog.dev·6h

📉Model Quantization

Towards Compressive and Scalable Recurrent Memory

arxiv.org·1d

🧩Attention Kernels

The Implicit Bias of Steepest Descent with Mini-batch Stochastic Gradient

arxiv.org·1d

🏎️TensorRT

LLM Optimization: From Research to Production

dev.to·6h·Discuss: DEV

A Neural Network Playground

playground.tensorflow.org·39m

The 5 Distributed Training Methods: How to Train Models Too Large for One GPU

pub.towardsai.net·1d

NEURAL ARCHITECTURE: The Dawn of Direct Cognitive Programming and the Question of Human Sovereignty

medium.com·5h

⚡Flash Attention

Completed Hyperparameter Transfer across Modules, Width, Depth, Batch and Duration

machinelearning.apple.com·1d

🎓Model Distillation

polyrhachis/macrograd: A lightweight autograd engine inspired by PyTorch and micrograd

github.com·1d·Discuss: Hacker News

📜TorchScript

Running an experiment with Claude Code overnight

blog.nolank.ca·3h

🤖AI Coding Tools

Tiny Recursion Models (TRM): How Tiny Networks With Recursion Beat Large Models on Hard Puzzles

pub.towardsai.net·17h

🏎️TensorRT

Gibbs Measures from Deep Shaped Multilayer Perceptrons

link.aps.org·2d

📉Model Quantization

BetaZero V2: A Diffusion Model for Setting Boulder Problems

evmojo37.substack.com·1d·Discuss: Substack

📉Model Quantization

How low-bit inference enables efficient AI

dropbox.tech·9h·Discuss: Hacker News

🎯Tensor Cores

LLMs struggle to verbalize their internal reasoning

lesswrong.com·4h

Explainable Causal Reinforcement Learning for heritage language revitalization programs with inverse simulation verification

dev.to·10h·Discuss: DEV

🎓Model Distillation

Ai’s Inner Workings Revealed By Model Trained On One Billion Data Points

quantumzeitgeist.com·2d

🎓Model Distillation

Power of Agent assisted coding and learning to achieve goals faster and cheaper

osm2pgsql.org·6h·Discuss: DEV

🤖AI Coding Tools

AI Learning Platforms

trendhunter.com·8h

🤖AI Coding Tools

Presentation: Building Embedding Models for Large-Scale Real-World Applications

infoq.com·1d

🎓Model Distillation
