🔲 Loop Tiling - miterion · Scour

Versor: A Geometric Sequence Architecture

arxiv.org·1d

⚡CUDA Programming Patterns

Evaluating Claude’s C Compiler Against GCC

shbhmrzd.github.io·1d·

Discuss: r/C_Programming

🚀Compiler Optimization

Proximity-driven acceleration of challenging solid-phase peptide couplings

pnas.org·2d

⚡ONNX Runtime

Benchmarking for Single Feature Attribution with Microarchitecture Cliffs

arxiv.org·16h

🧠CPU Architecture

Unleashing Computational Power: Ultimate Latency Optimization of Qwen3 and Qwen3-VL on AMD MI300X Series

lmsys.org·2d

🎛️CUDA Optimization

christopherkarani/Wax: 🍯 Memory layer for on-device AI Agents. Replace complex RAG pipelines with a serverless, single-file memory layer.

github.com·4h·

Discuss: Hacker News

⚡Flash Attention

Breaking the Tractability Barrier: A Generic Low-Level Solver for NP-Hard Instances (N=63) on Commodity 64-Bit Silicon

zenodo.org·11h·

Discuss: Hacker News

🎯Tensor Cores

Bitsum. Real-time CPU Optimization and Automation

bitsum.com·1d

📊Profiling Tools

Beyond Latency and Communication Complexity - A Tutorial on the Pipes Model

decentralizedthoughts.github.io·16h

🌊CUDA Streams

BalatroBench Benchmarks Large Language Models Playing Balatro

balatrobench.com·10h·

Discuss: Hacker News

⚡ONNX Runtime

OpenAI GPT-5.3-Codex-Spark Now Running at 1K Tokens Per Second on BIG Cerebras Chips

servethehome.com·4h·

Discuss: Hacker News

⚡Flash Attention

Minimum Energy Per Query

semiengineering.com·1d

📈Occupancy Optimization

Best CPU 2026 – the top AMD Ryzen and Intel Core processors tested

club386.com·11h

🧠CPU Architecture

SIEVE: an Efficient Turn-Key Eviction Algorithm for Web Caches

cachemon.github.io·2d·

Discuss: Hacker News

📊Profiling Tools

Zero State Architecture deep dive

news.ycombinator.com·1d·

Discuss: Hacker News

🎯Tensor Cores

Nvidia’s new technique cuts LLM reasoning costs by 8x without losing accuracy

venturebeat.com·23h·

Discuss: r/LocalLLaMA

Parallel Track Transformers: Enabling Fast GPU Inference with Reduced Synchronization

machinelearning.apple.com·3d

⏱️CUDA Events

How octorus Renders 300K Lines of Diff at High Speed

dev.to·14h·

Discuss: DEV

🏗️Build Optimization

Allocators from C to Zig

antonz.org·1d·

Discuss: Lobsters, Hacker News, r/C_Programming, r/programming

🧠CUDA Memory Management

RocksDB 10 and TidesDB 8 Benchmark Analysis on Dedicated Threadripper

tidesdb.com·22h·

Discuss: Hacker News

📊Profiling Tools