H-FA: A Hybrid Floating-Point and Logarithmic Approach to Hardware Accelerated FlashAttention
arxiv.org·7h
🔢SIMD
Low-Level Hacks
🦀Rust
Detailed Technical Documentation on AI Implementation Logic (Taking Large Language Models as an Example)
🏗️CPU Architecture
TIL: For long-lived LLM sessions, swapping KV Cache to RAM is ~10x faster than recalculating it. Why isn't this a standard feature?
📊Performance Tools
Building blobd: single-machine object store with sub-millisecond reads and 15 GB/s uploads
⚡Zig
Free Functions Don't Change Performance (Much)
🦀Rust
Dive into Systems
🖥️Operating Systems
Don't let these 3 CPU specs trick you into paying more
xda-developers.com·16h
🔬RISC-V
Reverse Engineering Google's BotGuard
🔩Assembly
Real-time stock volatility prediction with deep learning on a time-series DB
📊Performance Tools
DCcluster-Opt: Benchmarking Dynamic Multi-Objective Optimization for Geo-Distributed Data Center Workloads
arxiv.org·7h
🔀Parallel Algorithms
Playing Around with ARM Assembly
🔩Assembly