Region-Aware Reconstruction Strategy for Pre-training fMRI Foundation Model
arxiv.org · 4h
🏎️TensorRT
Writing an LLM from scratch, part 27 – what's left, and what's next?
gilesthomas.com · 8h · Discuss: Hacker News
🎓Model Distillation
Fast, Scalable LDA in C++ with Stochastic Variational Inference
github.com · 17h · Discuss: r/cpp
🏎️TensorRT
Don’t Just Normalize, Batch Normalize! A Guide to Stable Neural Networks
pub.towardsai.net · 1d
📉Model Quantization
Bio-Inspired Neuron Synapse Optimization for Adaptive Learning and Smart Decision-Making
arxiv.org · 4h
👁️Attention Optimization
Post-training methods for language models
developers.redhat.com · 2h
🎓Model Distillation
Connectivity Structure and Dynamics of Nonlinear Recurrent Neural Networks
journals.aps.org · 9h
📉Model Quantization
Gated DeltaNet (Linear Attention variant in Qwen3-Next and Kimi Linear)
sebastianraschka.com · 1d · Discuss: r/LLM
👁️Attention Optimization
Machine Learning Fundamentals: Everything I Wish I Knew When I Started
dev.to · 2d · Discuss: DEV
🎓Model Distillation
A Practitioner's Guide to Kolmogorov-Arnold Networks
arxiviq.substack.com · 1d · Discuss: Substack
📉Model Quantization
Probabilistic Robustness for Free? Revisiting Training via a Benchmark
arxiv.org · 4h
📉Model Quantization
Linear Differential Vision Transformer: Learning Visual Contrasts via Pairwise Differentials
arxiv.org · 4h
👁️Attention Optimization
Attention Is All You Need for KV Cache in Diffusion LLMs
paperium.net · 4h · Discuss: DEV
🎯Tensor Cores
Yes, you should understand backprop (2016)
karpathy.medium.com · 2d · Discuss: Hacker News
📉Model Quantization
What Are Auto-regressive Models? A Deep Dive and Typical Use Cases
blog.pangeanic.com · 20h
🎓Model Distillation
MISA: Memory-Efficient LLMs Optimization with Module-wise Importance Sampling
arxiv.org · 4h
📉Model Quantization
How Transformer Models Detect Anomalies in System Logs
hackernoon.com · 15h
⏱️CUDA Events
Understanding the Design of Optimizers with me
dev.to · 1d · Discuss: DEV
🏎️TensorRT