⚡ Inference - jobz · Scour

harshuljain13/llm-inference-at-scale: A Practitioner handbook for production llm serving.

🧠LLMs Code

github.com··Hacker News

PagedAttention vs Traditional KV Cache: How vLLM Reinvented GPU Memory for LLM Inference

🧠LLMs Blog

Ollama 0.30 GPU Boost: Faster local Qwen inference on NVIDIA

🗄️Vector Databases

everylocalai.com··DEV

RKSC: Reasoning-Aware KV Cache Sharing and Confident Early Exit for Multi-Step LLM Inference

📐Embeddings Academic

Xiaomi MiMo-V2.5-Pro Just Hit 1,000 Tokens Per Second!

What Ollama Reveals About Local AI, Agents, and Open Models

🌐World Models Blog

odsc.medium.com·

Improved performance and model support with GGUF

🧠LLMs Blog

Inferoa AI harness claimed 90% cache savings. We ran it and measured 97.8%

zozo123.github.io··Hacker News

Big Blue’s Redbook on Storage Scale KV Cache management

🤖AI Agents News

blocksandfiles.com·

China's Xiaomi MiMo Is Now 15X Faster Than ChatGPT and Claude (4 minute read)

🎛️Fine-tuning News

I've tested so many desktop AI tools, but Hermes with Ollama is my new favorite - here's why

🤖AI Agents News Tutorial

Running LLM Inference on Kubernetes: What It Actually Takes

🧠LLMs Blog

fairwinds.com·

Ollama 0.30 delivers faster NVIDIA GPU performance and wider hardware support

alternativeto.net·

Google's new open model DiffusionGemma generates text from noise instead of word by word

the-decoder.com·

Using Scikit-LLM with Open-Source LLMs

machinelearningmastery.com·

DeepSeekV4 1.6T Day 0 to Day 43 Performance Over Time - Huawei, GB300 NVL72, MI355X, B200

🎛️Fine-tuning News

newsletter.semianalysis.com··Hacker News

NVIDIA Accelerates Google DeepMind’s DiffusionGemma for Local AI

👁️Multimodal AI Blog

blogs.nvidia.com·

NexusOS v2.0 – A zero-dependency pipeline streaming server chaos to Parquet

🎛️Fine-tuning

huggingface.co··Hacker News

How we fight GPU scarcity without compromise

🧠Reasoning Models Blog

equixly.com··Hacker News

Less-relevant results

The Inference Alpha: Maximizing Frontier Models on AMD

🧠LLMs Blog

digitalocean.com·
