LLM Inference
Specific
Model Serving, Quantization, vLLM, ONNX Runtime
Scoured 203 posts in 5.9 ms
OpenCV 5 release - New DNN engine with enhanced ONNX and LLM/VLM support, Intel, Arm, and RISC-V hardware optimizations - CNX Software
🤖
LLM
Content type:
News
cnx-software.com
·
22h
22 hours ago
Running LLM Inference on Kubernetes: What It Actually Takes
🤖
LLM
Content type:
Blog
fairwinds.com
·
5d
5 days ago
China's Xiaomi MiMo Is Now 15X Faster Than ChatGPT and Claude (4 minute read)
🤖
LLM
Content type:
News
decrypt.co
·
2d
2 days ago
NVIDIA Accelerates Google DeepMind’s DiffusionGemma for Local AI
🤖
LLM
Content type:
Blog
blogs.nvidia.com
·
10h
10 hours ago
Making LLMs faster and more efficient across multiple languages
🤖
LLM
techxplore.com
·
6d
6 days ago
Show HN: Run Llama.cpp In-Process from Java with Project Panama FFM
🤖
LLM
deemwar-products.github.io
·
5d
5 days ago
·
Hacker News
Token4Token — pay-per-token inference on Gnosis + Swarm
🤖
LLM
t4t.eth.link
·
1d
1 day ago
·
Hacker News
libertywing/FlashMemory-Deepseek-V4: FlashMemory DS-V4 Retriever: a lightweight retriever that sparsifies DeepSeek-V4 CSA KV-cache. Weights available on Hugging Face.
⚡
Vllm
Content type:
Code
github.com
·
1d
1 day ago
Youssof Altoukhi (@Youssofal_)
⚡
Vllm
xcancel.com
·
3d
3 days ago
·
r/LocalLLaMA
AI Serving Platform That Adapts to Your Model
🤖
LLM
Content type:
Blog
databricks.com
·
10h
10 hours ago
Unsloth Gemma 4 QAT
🤖
LLM
unsloth.ai
·
5d
5 days ago
STAR-KV: Low-Rank KV Cache Compression via Soft Thresholding for Adaptive Rank Control
⚡
Vllm
Content type:
Academic
arxiv.org
·
1d
1 day ago
WEKA software speeds long context AI inferencing on Oracle’s public cloud
🤖
Agents
Content type:
News
blocksandfiles.com
·
11h
11 hours ago
BeeLlama.cpp DFlash on Strix Halo: 2.7x Gemma 31B, But MTP Is Still Faster
🤖
LLM
sleepingrobots.com
·
4d
4 days ago
Apples to Apples: MLX vs. Llama.cpp for Gemma 4 12B on an M1 16GB
🤖
LLM
Content type:
Blog
ziraph.com
·
5d
5 days ago
·
Hacker News
Xiaomi MiMo-V2.5-Pro Just Hit 1,000 Tokens Per Second!
⚡
Vllm
gizchina.com
·
1d
1 day ago
Machinic Psychopharmacology: Do LLMs Self-Medicate?
🤖
LLM
lesswrong.com
·
12h
12 hours ago
·
Hacker News
From GPU to Token: The 8-Layer Observability Stack for AI Infrastructure
🤖
LLM
Content type:
Blog
jimmysong.io
·
1d
1 day ago
Latest technical articles & videos.
🤖
LLM
certdepot.net
·
4d
4 days ago
Ollama 0.30 delivers faster NVIDIA GPU performance and wider hardware support
🤖
LLM
alternativeto.net
·
2d
2 days ago