LLM Inference
Specific
Model Serving, Quantization, vLLM, ONNX Runtime
Scoured 187 posts in 8.1 ms
How we fight GPU scarcity without compromise
⚙️ AI Infrastructure
Content type: Blog
equixly.com · 5 days ago · Hacker News
Less-relevant results
Token4Token — pay-per-token inference on Gnosis + Swarm
⚙️ AI Infrastructure
t4t.eth.link · 1 day ago · Hacker News
Making LLMs faster and more efficient across multiple languages
👁️ Multimodal LLMs
techxplore.com · 6 days ago
Build a local voice agent with Red Hat OpenShift AI
⚙️ AI Infrastructure
developers.redhat.com · 2 days ago
Making Local LLM Go Brrr
⚙️ AI Infrastructure
seanpedersen.github.io · 6 days ago
Breaking the Ice: Analyzing Cold Start Latency in vLLM
⚙️ AI Infrastructure
Content type: Academic
arxiv.org · 2 days ago · Hacker News
CoreML vs TFLite: iPhone 15 Pro GPU 2.3x Faster
⚡ Inference Optimization
Content type: Blog, Discussion
tildalice.io · 4 days ago
defai-digital/ax-engine: Apple Silicon LLM runtime supporting Gemma 4 and Qwen 3.6 MTP modes
⚡ Inference Optimization
Content type: Code
github.com · 21 hours ago · Hacker News
3x Faster Search: Parallel Test-Time Scaling with Instructed-Retriever-1
⚡ Inference Optimization
Content type: Blog
databricks.com · 6 days ago
OpenCV 5.0 Computer Vision Library Released with Rewritten DNN Engine
👁️ Multimodal LLMs
linuxiac.com · 2 days ago
OpenCV 5 release - New DNN engine with enhanced ONNX and LLM/VLM support, Intel, Arm, and RISC-V hardware optimizations - CNX Software
👁️ Multimodal LLMs
Content type: News
cnx-software.com · 18 hours ago
A field journal on Ray Data and Daft for multimodal data lake (14 minute read)
👁️ Multimodal LLMs
Content type: Blog
mehulbatra.medium.com · 6 days ago
Intro — Sehastrajit
👁️ Multimodal LLMs
Content type: Blog
medium.com · 2 days ago
Where to Host Your Open-Source Model (Under 10B Parameters)
⚙️ AI Infrastructure
digitalocean.com · 6 days ago
not much happened today | AINews
⚙️ AI Infrastructure
news.smol.ai · 2 days ago
The 1-Second Timeout Hack: Running Infinite Parallel Workloads Natively on Google Apps Script
⚙️ AI Infrastructure
Content type: Blog
medium.com · 19 hours ago
The Death of the Four Golden Signals: Designing Telemetry for Non-Deterministic Infrastructure
🔍 Retrieval-Augmented Generation
devops.com · 5 days ago
RKSC: Reasoning-Aware KV Cache Sharing and Confident Early Exit for Multi-Step LLM Inference
⚙️ AI Infrastructure
Content type: Academic
arxiv.org · 18 hours ago
[AINews] FrontierCode: Benchmarking for Code Quality over Slop
⚙️ AI Infrastructure
Content type: News
latent.space · 1 day ago
Ask HN: Is software engineering still a good career choice for new students?
⚡ Inference Optimization
Content type: Discussion
news.ycombinator.com · 23 hours ago · Hacker News