⏱️ Instruction Scheduling - hello · Scour

AccelSync: Verifying Synchronization Coverage in Accelerator Pipeline Programs 🛡️Intel CET

Three CPU Generations That Changed Everything: A Latency-Focused History of x86 ⚡BOLT

lucisqr.substack.com

nviennot/core-to-core-latency: Measures the latency between CPU cores 📊Intel PMU

github.com·6h·Hacker News

How to Eliminate Pipeline Friction in AI Model Serving 🌀Naiad

developer.nvidia.com·1h

Efficient Remote Memory Ordering for Non-Coherent Systems 🔁Cache Coherence

danglingpointers.substack.com·6h·Substack

FPGA Spectrum Engine 🔌FPGA Programming

hackaday.io·5d

Per-Phase Fidelity Attribution for Quantum Compilers using HBR Decomposition 📊Profile-Guided Optimization

CCD-Level and Load-Aware Thread Orchestration for In-Memory Vector ANNS on Multi-Core CPUs 🚀Milvus

Beyond Static Policies: Exploring Dynamic Policy Selection for Single-Thread Performance Optimization 📍CPU Pinning

Non-Monotonic Latency in Apple MPS Decoding: KV Cache Interactions and Execution Regimes 🔁Cache Coherence

31.1 A 14.08-to-135.69Token/s ReRAM-on-Logic Stacked Outlier-Free Large-Language-Model Accelerator with Block-Clustered Weight-Compression and Adaptive Parallel-Speculative-Decoding 🌊Memory Bandwidth

AutoRAGTuner: A Declarative Framework for Automatic Optimization of RAG Pipelines 📊Performance Tools

KV-RM: Regularizing KV-Cache Movement for Static-Graph LLM Serving 🎴TAO

Janus: Compiler-Based Defense Against Transient Execution Attacks Using ARM Hardware Primitives 🏷️Memory Tagging

PoTAcc: A Pipeline for End-to-End Acceleration of Power-of-Two Quantized DNNs 🧮Intel MKL-DNN

Scratchpad Patching: Decoupling Compute from Patch Size in Byte-Level Language Models 🏷️Pointer Tagging

UniVer: A Unified Perspective for Multi-step and Multi-draft Speculative Decoding 📦Folly

Nitsum: Serving Tiered LLM Requests with Adaptive Tensor Parallelism 🦙Ollama

Triage: An Adaptive Parallel Window Decoding Scheduler for Real-time Fault-Tolerant Quantum Computation ⚛️Quantum Computing

Low-Latency Out-of-Core ANN Search in High-Dimensional Space 📊Vectorized Query Execution
