Inference Engineering
🧠 Inference Engineering
model serving, inference optimization, LLM inference, throughput
Scoured 342 posts in 12.1 ms
MLPerf and the rise of latency-aware LLM benchmarking
⏱️ Prefill Decoding · edn.com · 5 days ago
Build a Medical Report Analyzer on Dedicated Inference with Python
💰 Inference Cost · digitalocean.com · 6 days ago
"AI" Is Eating Platform Monopolist Free Cash Flow, Not the World: CHART OF THE DAY
💰 Inference Cost · Content type: News, Blog · braddelong.substack.com · 2 days ago · Substack
🇳🇱 Go/Golang job: Senior Backend Engineer (Go) | Studio AI at Creative Fabrica (Amsterdam, Netherlands)
☁️ Cloud Infrastructure · golangprojects.com · 4 hours ago
OpenCV 5.0 Computer Vision Library Released with Rewritten DNN Engine
🚀 Model Serving · linuxiac.com · 2 days ago
Self-hosted remote access for Ollama without complicated setup
📝 Infrastructure as Code · oab.arc-i.co.uk · 3 days ago · r/selfhosted
google/gemma-4-31B-it · fix: chat template — null handling, reasoning preservation, turn-tag balance, input validation
💾 KV Cache · huggingface.co · 2 days ago · r/LocalLLaMA
Google's new open model DiffusionGemma generates text from noise instead of word by word
🎮 GPU Computing · the-decoder.com · 27 minutes ago
KaiFelixBennett/gemma4-turboquant-rdna4: Run Gemma-4-31B at full 256K context on a $1,400 AMD RDNA4 GPU (gfx1201): TurboQuant KV cache + HIP-graph-safe Flash-Attention for llama.cpp, fully measured on real hardware.
⏱️ Prefill Decoding · Content type: Code · github.com · 3 hours ago · Hacker News
How to Run Gemma 4 12B Locally - The Best AI For Consumer Laptops
💾 KV Cache · Content type: Video · youtube.com · 6 days ago
ReasonAlloc: Hierarchical Decoding-Time KV Cache Budget Allocation for Reasoning Models
⏱️ Prefill Decoding · Content type: Academic · arxiv.org · 15 hours ago
Azure OpenAI Architecture: The Decisions That Actually Matter (Part 2)
☁️ Cloud Infrastructure · techcommunity.microsoft.com · 2 days ago
Latest technical articles & videos.
💾 KV Cache · certdepot.net · 4 days ago
local AI agents for Cursor with pre-tuned marketplace/commu
🚨 Incident Response · locaible.com · 6 hours ago · Hacker News
WWDC 2026: Foundation Models (& Anarlog)
🏗️ Platform Engineering · skushagra.com · 1 day ago
Making Local LLM Go Brrr
⏱️ Prefill Decoding · seanpedersen.github.io · 6 days ago
Distributed multi-agent systems with Aspire and Microsoft Agent Framework
🔭 Observability · Content type: Blog · devblogs.microsoft.com · 1 day ago
Youssof Altoukhi (@Youssofal_)
🔢 FP8 Training · xcancel.com · 2 days ago · r/LocalLLaMA
IntentKV: Cross-Turn Intent-Aware KV Cache Pruning for Agent Inference
⚡ FlashAttention · Content type: Academic · arxiv.org · 15 hours ago
MoQ GGUFs and GSQ: Low-Bit GGUFs Are About to Get Much Better
🗜️ Quantization · Content type: News, Blog · kaitchup.substack.com · 4 days ago · r/LocalLLaMA