Sorting by Strip Swaps is NP-Hard
arxiv.org·3h
🚀Compiler Optimization
Geonum – geometric number library for unlimited dimensions with O(1) complexity
✂️CUTLASS
Moving past speculation: How deterministic CPUs deliver predictable AI performance
venturebeat.com·2d
🧠CPU Architecture
Low-Level Hacks
📊Profiling Tools
Big-O Notation: Explained in 8 Minutes
blog.algomaster.io·4h
🚀Compiler Optimization
DCcluster-Opt: Benchmarking Dynamic Multi-Objective Optimization for Geo-Distributed Data Center Workloads
arxiv.org·3h
🔗NCCL
Why Multimodal AI Broke the Data Pipeline — And How Daft Is Beating Ray and Spark to Fix It
hackernoon.com·1d
🧮cuDNN
TIL: For long-lived LLM sessions, swapping KV Cache to RAM is ~10x faster than recalculating it. Why isn't this a standard feature?
📈Occupancy Optimization
Fast Answering Pattern-Constrained Reachability Queries with Two-Dimensional Reachability Index
arxiv.org·3h
🔗Kernel Fusion
H-FA: A Hybrid Floating-Point and Logarithmic Approach to Hardware Accelerated FlashAttention
arxiv.org·3h
⚡Flash Attention
Don't let these 3 CPU specs trick you into paying more
xda-developers.com·12h
⚡Flash Attention
Can-t stop till you get enough
📜TorchScript
Some Fun Videos on Optimizing NES Code
bumbershootsoft.wordpress.com·2d
🚀Compiler Optimization
CHIP8 – writing emulator, assembler, example game and VHDL hardware impl
🔄SIMD Programming
Co-Simulation Framework for Parallel DNN Execution on Chiplet-Based Systems (UW–Madison, Washington State)
semiengineering.com·11h
🌊CUDA Streams