HeteroCache: A Dynamic Retrieval Approach to Heterogeneous KV Cache Compression for Long-Context LLM Inference
arxiv.org·17h
A Novel Side-channel Attack That Utilizes Memory Re-orderings (U. of Washington, Duke, UCSC et al.)
semiengineering.com·3h
Why AI Needs GPUs and TPUs: The Hardware Behind LLMs
blog.bytebytego.com·2d
Weird RAM issue
68kmla.org·15h
Bringing Data Transformations Near-Memory for Low-Latency Analytics in HTAP Environments
arxiv.org·17h
Co-optimization Approaches For Reliable and Efficient AI Acceleration (Peking University et al.)
semiengineering.com·4h
Conversation: LLMs and the what/how loop
martinfowler.com·7h