ARMS Continuous Profiling Upgrade for Efficient and Accurate Performance Bottleneck Localization
Observability
Which Chip Is Best?
Columnar Engines
H-FA: A Hybrid Floating-Point and Logarithmic Approach to Hardware Accelerated FlashAttention
arxiv.org · 2d
Hardware Architecture
My C++ lockless-ish task scheduler project I've been working on (first real project, also first time using threads). Tell me what you think (BSD license) currentl...
Concurrency
Cutting LLM Batch Inference Time in Half: Dynamic Prefix Bucketing at Scale
Columnar Engines
eBPF Tutorial by Example: Monitoring GPU Driver Activity with Kernel Tracepoints
Memory Profilers
Co-Optimizing GPU Architecture And SW To Enhance Edge Inference Performance (NVIDIA)
semiengineering.com · 1d
Columnar Engines
The Production Generative AI Stack: Architecture and Components
thenewstack.io · 11h
Columnar Engines
Dynamic Resource Allocation in CXL-Enabled Heterogeneous Compute Clusters
Observability
Accelerating AI inferencing with external KV Cache on Managed Lustre
cloud.google.com · 6d
Columnar Engines
10 Smart Performance Hacks For Faster Python Code
blog.jetbrains.com · 1d
NumPy
Show HN: a Rust ray tracer that runs on any GPU, even in the browser
Rust Scientific
Why Multimodal AI Broke the Data Pipeline, and How Daft Is Beating Ray and Spark to Fix It
hackernoon.com · 3d
Columnar Engines