🤖 LLM Inference - buckman · Scour

How to Optimize LLM Inference with KV Caching ⚡Inference

dev.to·6d·DEV

Databricks for Good and Virtue Foundation: Partnering to Connect Medical Volunteers to Critical Health Services in 72 Countries ⚡Apache Spark

databricks.com·18h

AMD says its $4K Ryzen AI Halo workstation practically pays for itself 📊Compute Markets

theregister.com·3h

Cerebras: The $56.4 Billion IPO Challenging NVIDIA’s Memory Wall 📊Compute Markets

artificialintelligencemadesimple.com·2d

UK sovereign LLM inference at 80% cheaper than OpenAI/Claude 💸Inference Costs

relax.ai·5d·Hacker News

Ollama vs llama.cpp vs vLLM: Which Should You Use in 2026? 🦙Ollama

dev.to·1d·DEV

I Tested KTransformers on My Laptop — 5 Hidden Features That Made 671B Models Actually Work 🔥 ⚡Performance Engineering

dev.to·23h·DEV

KV Cache Explained Like You're an LLM Engineer 💾KV Cache

dev.to·20h·DEV

Agentic LLM Inference Parameters Reference for Qwen and Gemma 🧠Context Engineering

dev.to·4d·DEV

Cerebras risked it all on dinner plate-sized AI accelerators a decade ago. Today it’s worth $66 billion ⚡Hardware Acceleration

theregister.com·6d·Hacker News

When the Sensitivity Metric Lies: A Drift-Inversion in Mixed-Precision LLM Quantization ⚡Inference

dev.to·15h·DEV

Why Agentic AI Changes Everything for Brisbane DevOps Teams — An Infrastructure Perspective 🔄AI Workflows

dev.to·3d·DEV

I squeezed my iGPU dry, then added an eGPU — a GPU buying guide for AI on mini PCs 🟩Nvidia

dev.to·3d·DEV

AutoML for Agent Fleets, Without the Vendor Bill 🦙Ollama

dev.to·5d·DEV

GPU Hardware & Driver Update: RTX 5090 Benchmarks, llama.cpp MTP, Windows 11 Fix 🟩Nvidia

dev.to·3d·DEV

How to Estimate LLM API Cost Before Shipping Your AI App 💰API Pricing

dev.to·4d·DEV

Laravel Horizon in Production: Configuring AI Queue Workloads That Actually Hold 🏛Sovereign AI Infrastructure

dev.to·6d·DEV

The Daimon Java SDK: Chat, Stream, and Query Memory from 3 Lines of Java 🦙Ollama

dev.to·3d·DEV

KTransformers' 5 Hidden Uses That Make 671B Models Run on Your Laptop 🔥 ⚡Hardware Acceleration

dev.to·23h·DEV
