🔲 Loop Tiling - miterion · Scour

FCDP: Fully Cached Data Parallel for Communication-Avoiding Large-Scale Training

arxiv.org·1d

Rethinking Code Complexity Through the Lens of Large Language Models

arxiv.org·15h

🚀Compiler Optimization

Performance Tip of the Week #60: In-process profiling: lessons learned

abseil.io·2d

📊Profiling Tools

Operations on a B+-Tree: How the Search Works

dev.to·1d·Discuss: DEV

🔍Type Checkers

Performance Tip of the Week #93: Robots never sleep

abseil.io·2d

🏗️Build Optimization

Designing a Drift-Resistant Memory System for LLMs

dev.to·4d·Discuss: DEV

⚡CUDA Programming Patterns

Tensor‑Network Path‑Integral Algorithm for Efficient Simulation of Discrete 3‑D Quantum Gravity and its Application to Cosmological Data **Abstract** We intr...

freederia.com·4d

🏎️TensorRT

**Tensor‑Network Compression of Affine Kac‑Moody Vertex Operator Algebras for Scalable Conformal Field Theory Computations** — ### Abstract Affine Kac‑...

freederia.com·3d

24 bit multifx vs 20 bit

thegearpage.net·6d

📈Occupancy Optimization

Sampling the Oxford CS Library

blog.computationalcomplexity.org·6d·Discuss: blog.computationalcomplexity.org

🔬Static Analysis

How I Structure My Data Pipelines: The Silver Layer

loglevelinfo.substack.com·6d·Discuss: Substack

The Limit in the Loop

weaviate.io·6d·Discuss: Hacker News

📊Gradient Accumulation

I run local LLMs daily, but I'll never trust them for these tasks

xda-developers.com·4d

⚡ONNX Runtime

Computer Memory: Part I, The Fundamentals | by Tom Herbert | Feb, 2026

medium.com·6d

⚙️Systems Programming

Optimized LLM Inference Engines

rishirajacharya.com·6d

⚡ONNX Runtime

I outperformed Enterprise Engines by 225,000x on a $50 CPU. Here is the data

news.ycombinator.com·5d·Discuss: Hacker News

📈GPU Occupancy

AMD Zen 6: More cores, more cache, hardly any more surface area

igorslab.de·6d

⏱️CUDA Events

Intel attacks the workstation segment with Xeon 600 featuring up to 86 cores and a new platform

igorslab.de·6d

🧠CPU Architecture

adrianbrad/queue: ⏪️ Go package providing multiple queue implementations. Developed in a thread-safe generic way.

github.com·6d

Why Move To 2nm?

semiengineering.com·6d

🎛️CUDA Optimization
