Benchmarking LLM Inference on RTX 4090 / RTX 5090 / RTX PRO 6000 #2
reddit.com·5h·
Discuss: r/LocalLLaMA
🏗️LLM Infrastructure
Assuring Agent Safety Evaluations By Analysing Transcripts
lesswrong.com·13h
🏆LLM Benchmarking
Profiling Your Code: 5 Tips to Significantly Boost Performance
usenix.org·21h
Systems Performance
OBCache: Optimal Brain KV Cache Pruning for Efficient Long-Context LLM Inference
arxiv.org·19h
🧠LLM Inference
Multi-Core By Default
rfleury.com·22h
🧵Concurrency
How we built a structured Streamlit Application Framework in Snowflake
about.gitlab.com·23h
🔧Developer tools
Building a Scalable QA Automation Strategy: The 90-Day Roadmap
codemeetscapital.bearblog.dev·7h
👨‍💻Software development practices
Supercharge your Enterprise BI: How to approach your migration to AI/BI
databricks.com·2h
🏗️Infrastructure Economics
MECE — The AI Principle You’ll Never Stop Using After Reading This
pub.towardsai.net·12h
🔍AI Interpretability
simplicity • Pragmatic Dave Thomas & Sarah Taraporewalla
buzzsprout.com·6h·
Discuss: r/programming
Developer Experience
GoMem is a high-performance memory allocator library for Go
github.com·21h
🧠Memory Allocators
InferenceMAX – open-source frequent inference benchmarking
github.com·4h·
Discuss: Hacker News
🏗️LLM Infrastructure
LLMs Are Transpilers
alloc.dev·23h·
Discuss: Hacker News
🏆LLM Benchmarking
Can AI Co-Design Distributed Systems? Scaling from 1 GPU to 1k
harvard-edge.github.io·1h·
Discuss: Hacker News
🌐Distributed systems
Operable Software
ferd.ca·10h·
Discuss: Hacker News
🌐Distributed systems
How different AI engines generate and cite answers
searchengineland.com·11h
📊Feed Optimization
When Python can't thread: a deep-dive into the GIL's impact
pythonspeed.com·12h·
Discuss: Hacker News
🧵Concurrency
Scaling Time-Series Data for AI Models
singlestore.com·8h
🎛️Feed Filtering
Let's Write a Macro in Rust
hackeryarn.com·7h·
Discuss: Hacker News
🎭Rust Macros