🔲 CPU Architecture - surajkadapa · Scour

GhostServe: A Lightweight Checkpointing System in the Shadow for Fault-Tolerant LLM Serving 🔁Cache Coherence

A Treasure Trove of Performance: Analyzing the IO500 Submission Data ⚡Performance

Exploring the Efficiency of 3D-Stacked AI Chip Architecture for LLM Inference with Voxel ⚡Hardware Acceleration

PipeMax: Enhancing Offline LLM Inference on Commodity GPU Servers ⚡Vectorized Execution

Caliper-in-the-Loop: Black-Box Optimization for Hyperledger Fabric Performance Tuning ⚡Low-Latency Systems

Predictive Multi-Tier Memory Management for KV Cache in Large-Scale GPU Inference 📦CPU Caches

Stochastic Sparse Attention for Memory-Bound Inference ⚡Vectorized Execution

FACT: Compositional Kernel Synthesis with a Three-Stage Agentic Workflow 🧮Compute Optimization

Replication in Graph Partitioning and Scheduling Problems 🌐Distributed Systems

Lightweight Tamper-Evident Log Integrity Verification for IoT Edge Environments: A Merkle Tree Pipeline with Adaptive Chunking 🌸Bloom Filters

Exploring Sparse Matrix Multiplication Kernels on the Cerebras CS-3 🧮Compute Optimization

On the Distortion of Partitioning Performance by Random Quantum Circuits 🧠NUMA

Efficient Training on Multiple Consumer GPUs with RoundPipe 🖥️GPU Computing

Efficient, VRAM-Constrained xLM Inference on Clients ⚡SIMD

Near-Optimal Privacy-Preserving Learning for Max-Min Fair Multi-Agent Bandits 🤝Consensus Algorithms

RaMP: Runtime-Aware Megakernel Polymorphism for Mixture-of-Experts ♟️Chess Engines

A Semantic Quantum Circuit Cache for Scalable and Distributed Quantum-Classical Workflows 🔁Cache Coherence

DAK: Direct-Access-Enabled GPU Memory Offloading with Optimal Efficiency for LLM Inference ⚡DMA

MARS: Efficient, Adaptive Co-Scheduling for Heterogeneous Agentic Systems 🔄Coroutines

A Study on the Performance of Distributed Training of Data-driven CFD Simulations 📐Data-Oriented Design
