Cache Optimization

Cache Deep Dive IV — TLB, Huge Pages, and Memory-Level Parallelism

 Caching Strategies  Content type: Blog
dev.to··DEV

The Return of Rigorous Full-System Timing Simulation

 🕹️Retro Gaming
sigarch.org··Hacker News

Beyond the Memory Wall: The CPU Was Helping You All Along

 Caching Strategies  Content type: Blog
prawns.dev··Hacker News

Two Leaps to 1000 Tokens/s on a 1T-Parameter Model: On Inference Systems, Execution Boundaries, and Co-Design

 📐Design Systems  Content type: Blog
tilert.ai··Hacker News

Prompt Caching on Claude: Cut Input Costs 78% (The Math Nobody Writes Down)

 💬Prompt Engineering
pub.towardsai.net·

gburd/libumem: This is the user space slab memory allocator, umem, first available in Solaris 9. (mirror of: codeberg.org/gregburd/libumem)

 🚀Performance  Content type: Code
github.com·

Data Residency for AI in Switzerland – A Practical Latency‑Cost Guide

 🌍Edge Computing  Content type: Blog
dev.to··DEV

Release 0.17.6: Merge pull request #3782 from tigerbeetle/release-2026-06-05 · tigerbeetle/tigerbeetle

 👁Code Review  Content type: Code
github.com·

Introducing SQL Data Insights Pro

 🗄️Databases  Content type: Blog
research.ibm.com·

DuckDB Storage Engine for MariaDB. When the Sea Lion Learns to Quack.

 📊Data Pipelines (ETL)
mariadb.org··Hacker News

What is the most efficient way to evaluate poker hands at scale?

 🔌APIs  Content type: Blog
dev.to··DEV

Less-relevant results

A Case for Simulation-Driven Resilience in Agentic Data Systems

 📡Event-Driven Architecture  Content type: Blog

harshuljain13/llm-inference-at-scale: A Practitioner handbook for production llm serving.

 💸Affordable LLMs  Content type: Code
github.com··Hacker News

REST API Design: Building APIs Developers Love (2026)

 🔌APIs  Content type: Blog
dev.to··DEV

bigattichouse/packed-twin-inference: PTI achieves ~2× throughput using a single quantized model (Q5_K_M or better) by running 4 generation streams in one batched decode call. The GPU loads model weights once per step and produces 4 predictions simultaneously. KV cache overhead is ~0.8 GiB total for all 4 streams. No draft model. No quality loss

 🦙Ollama  Content type: Code
github.com··r/LocalLLaMA

How Hash Tables Achieve O(1) Lookups

 📊Algorithms  Content type: Blog
dev.to··DEV

What Building a Go PDF Engine Teaches You About Real Engineering

 🔒Security  Content type: Blog
dev.to··DEV

Great Stack to Doesn't Work #5 — Linux: "Not a Kernel Panic, an Engineer Panic"

 🚀Performance  Content type: Blog
dev.to··DEV

WebSocket Authentication Deep Dive — Tokens, Stateful Connections, and the CORS Bypass Nobody Warns You About

 🔒Security  Content type: Blog
dev.to··DEV

Why Building a PDF Engine in Go Will Help You Understand Go Concepts Better

 🚀Performance  Content type: Blog
dev.to··DEV
