Attention Is All You Need for KV Cache in Diffusion LLMs
paperium.net·3h
🎯Tensor Cores
Sorting by Strip Swaps is NP-Hard
arxiv.org·3h
🚀Compiler Optimization
Geonum – geometric number library for unlimited dimensions with O(1) complexity
github.com·17h
✂️CUTLASS
Moving past speculation: How deterministic CPUs deliver predictable AI performance
venturebeat.com·2d
🧠CPU Architecture
Low-Level Hacks
blog.raycursive.com·5h
📊Profiling Tools
Big-O Notation: Explained in 8 Minutes
blog.algomaster.io·4h
🚀Compiler Optimization
DCcluster-Opt: Benchmarking Dynamic Multi-Objective Optimization for Geo-Distributed Data Center Workloads
arxiv.org·3h
🔗NCCL
Why Multimodal AI Broke the Data Pipeline — And How Daft Is Beating Ray and Spark to Fix It
hackernoon.com·1d
🧮cuDNN
Cons Should Not Cons Its Arguments, Part II: Cheney on the MTA
web.archive.org·1d
🚀Compiler Optimization
Fast Answering Pattern-Constrained Reachability Queries with Two-Dimensional Reachability Index
arxiv.org·3h
🔗Kernel Fusion
My First Multi-GPU Kernel: Writing All-to-All for AMD MI300X
gau-nernst.github.io·1d
🎯GPU Kernels
H-FA: A Hybrid Floating-Point and Logarithmic Approach to Hardware Accelerated FlashAttention
arxiv.org·3h
Flash Attention
Don't let these 3 CPU specs trick you into paying more
xda-developers.com·12h
Flash Attention
Can-t stop till you get enough
cant.bearblog.dev·1d
📜TorchScript
Some Fun Videos on Optimizing NES Code
bumbershootsoft.wordpress.com·2d
🚀Compiler Optimization
CHIP8 – writing emulator, assembler, example game and VHDL hardware impl
blog.dominikrudnik.pl·11h
🔄SIMD Programming
Essential Things to Know Before Upgrading Your Computer Memory
buysellram.com·15h
⚙️Systems Programming
Co-Simulation Framework for Parallel DNN Execution on Chiplet-Based Systems (UW–Madison, Washington State)
semiengineering.com·11h
🌊CUDA Streams
How much disorder is there in a descending run?
morwenn.github.io·19h
🔬Static Analysis