Inference

Fixing a stuck Ollama runner and building a GPU watchdog

 ✍️Prompt Engineering

2x GH200 for LLM inference, Part 2: vLLM, DeepSeek V4 Flash, and MTP

 🧠LLMs  Content type: Blog
dnhkng.github.io·

Gemma 4 QAT on 10GB Laptop: Local AI with 6.7GB VRAM

 👁️Multimodal AI
everylocalai.com··DEV

Neo-X7/Neo-AI: A fully offline AI assistant powered by Ollama. Stores and retrieves conversations using SQLite + LanceDB vector search. No cloud. No API keys. Runs entirely on your machine.

 📐Embeddings  Content type: Code
github.com··DEV

Less-relevant results

AI Serving Platform That Adapts to Your Model

 🎛️Fine-tuning  Content type: Blog
databricks.com·

Tales of an Ollama Honeypot (Part 3): More Traffic, More Findings

 🧪Synthetic Data
posts.inthecyber.com·

The hidden bottleneck in LLM inference and the impact on MLPerf benchmarking

 🧠LLMs
edn.com·

WEKA software speeds long context AI inferencing on Oracle’s public cloud

 🏛️DAOs  Content type: News
blocksandfiles.com·

Alignment Collapse Under KV Cache Quantization: Diagnosis and Mitigation

 🧠LLMs  Content type: Academic
arxiv.org·

Anatomy of a high-performance EP kernel

 🔌MCP  Content type: Blog

AMD's Lemonade SDK For Local AI Adds NVIDIA CUDA Support

 ⚙️Agent Frameworks
phoronix.com·

The economics of speculative decoding

 💎Token Economics  Content type: Blog

DiffusionGemma: The Developer Guide

 🎛️Fine-tuning  Content type: Blog

Report: GKE Inference Gateway delivers up to 92% faster AI responses

 🔍RAG  Content type: Blog

Self-hosted remote access for Ollama without complicated setup

 🏛️DAOs

Two Leaps to 1000 Tokens/s on a 1T-Parameter Model: On Inference Systems, Execution Boundaries, and Co-Design

 🤖AI Agents  Content type: Blog
tilert.ai··Hacker News

A system programmer’s guide to LLM inference

 🧠LLMs  Content type: Blog

google/gemma-4-31B-it · fix: chat template — null handling, reasoning preservation, turn-tag balance, input validation

 🧠LLMs

Nvidia DGX Spark GB10 – AI Models and Guide with vLLM and Autonomous Script

 👁️Multimodal AI  Content type: Code
github.com··Hacker News

DiffusionGemma: 4x Faster Text Generation

 🔬AI Research  Content type: News  Content type: Blog
