TIL: For long-lived LLM sessions, swapping KV Cache to RAM is ~10x faster than recalculating it. Why isn't this a standard feature?
Occupancy Optimization
Some Fun Videos on Optimizing NES Code
bumbershootsoft.wordpress.com · 14h
Compiler Optimization
A hitchhiker's guide to CUDA programming
GPU Kernels
Entropy in algorithm analysis
11011110.github.io · 11h
Kernel Fusion
Fungus: The Befunge CPU (2015)
Systems Programming
Unlock Linear Solver Speed: Symbolic Preconditioning for Hyper-Performance
Tensor Cores
A unified threshold-constrained optimization framework for consistent and interpretable cross-machine condition monitoring
sciencedirect.com · 13h
Benchmarking
Challenging the Fastest OSS Workflow Engine
PTX
Procedural world gen: connections between water terrain in chunks are failing
GPU Occupancy
Q&A #80 (2025-10-31)
computerenhance.com · 1d
Profiling Tools
Opportunistically Parallel Lambda Calculus
LSP
Vectorizing for Fun and Performance
SIMD Programming
Utilizing Chiplet-Locality For Efficient Memory Mapping In MCM GPUs (ETRI, Sungkyunkwan Univ.)
semiengineering.com · 2d
Occupancy Optimization