🐿️ Scour
⚡ SIMD Vectorization

AVX Instructions, Parallel Data Processing, Compiler Optimization, Performance

Writing Speed-of-Light Flash Attention for 5090 in CUDA C++
gau-nernst.github.io·55m·
Discuss: Hacker News
🖥️Terminal Renaissance
Cracking the Density Code: Why MAF Flows Where KDE Stalls
towardsdatascience.com·20h
🔗Tailscale
When AI optimizations miss the mark: A case study in array shape calculation
questdb.com·2d·
Discuss: Hacker News, r/programming
⚡Performance Mythology
Don't Repeat Yourself, Coarse-Grained Circuit Deduplication to Accelerate Sim
danglingpointers.substack.com·22h·
Discuss: Substack
🖥️Game Emulation
Deep Dive: OpenAI's GPT-OSS
dev.to·10h·
Discuss: DEV
📊Quantization
numexpr: fast numerical array expression evaluator for Python
github.com·2d·
Discuss: Hacker News
🚀SIMD Parsing
Parallel Reduce and Scan on the GPU
cachemiss.xyz·5d·
Discuss: Hacker News
⚡SIMD Optimization
Show HN: A short story on developing a long-context World-Model with no money
francesco215.github.io·19h·
Discuss: Hacker News
🧠Learned Codecs
Get Back To WARP
binary.ninja·19h
🧪Binary Fuzzing
Compute Where It Counts: a trainable LLM sparsity enabling 4x CPU speed
crystalai.org·2d·
Discuss: Hacker News
🌊Streaming Algorithms
Sysbench for MySQL 5.6 thru 9.4 on a small server
smalldatum.blogspot.com·1d·
Discuss: smalldatum.blogspot.com
🦀Rusty Databases
Here's how Nvidia and AMD hardware are being used in surprising ways to build Nvidia's fastest GPU ever
techradar.com·2h
🖥️Terminal Renaissance
UnderColor’s spiral challenge from 1984 – part 3
subethasoftware.com·1d
📺VT100
Using large-scale search to discover fast GPU kernels in Rust
reddit.com·2d·
Discuss: r/rust
🦀Rust Macros
Show HN: Novel GPT-2 sampling and memory architecture
github.com·3h·
Discuss: Hacker News
💎Information Crystallography
You could have invented CuTe hierarchical layout (but maybe not the rest of it?)
blog.ezyang.com·1d·
Discuss: blog.ezyang.com
⟷Bidirectional Programming
Speeding Up AI Coding Assistants Using Deterministic Feedback
proxymock.io·21h·
Discuss: Hacker News
📼Tape Combinators
Cursor: 1.5x Faster MoE Training on Blackwell with MXFP8 Kernels
cursor.com·3d·
Discuss: Hacker News
🖥️Game Emulation
Fast globally optimal Truncated Least Squares point cloud registration with fixed rotation axis
arxiv.org·1d
🌀Riemannian Computing
FFmpeg 8.0 Released
ffmpeg.org·10h·
Discuss: Hacker News
🎬AV1 Encoding