⚡ Fast AI Inference
Cerebras, Groq, fast LLM tokens
How to Run Gemma 4 12B Locally - The Best AI For Consumer Laptops
🤖 AI · Video · youtube.com · 23h
Sources: ByteDance has partnered with chipmaker InnoStar to develop an AI inference chip modeled after Groq's LPUs, which are built to run AI models at low cost...
🏗️ LLM Infrastructure · techmeme.com · 6d
FitMyLLM — Independent benchmarks for self-hosted AI
🏠 Self-Hosting · Discussion · lemmy.world · 2d
Nvidia Pays $400 Million for AI Software Firm Kumo
🆕 New AI · pymnts.com · 17h
Bit-Exact AI Inference Verification Without Performance Tradeoffs
🏗️ LLM Infrastructure · Academic · arxiv.org · 3d
Making Local LLM Go Brrr
🤖 AI · seanpedersen.github.io · 1d
Experimenting with TPUs, GKE Managed DRANET, and Multi-cluster Inference Gateway
🌍 Distributed Systems · Blog · cloud.google.com · 2d
jmaczan/tiny-vllm: Build your own high performance LLM inference engine in C++ and CUDA - a smaller version of vLLM
🏗️ LLM Infrastructure · Code · github.com · 6d · Hacker News
Deep X XM2 NPU: 80 TOPS Generative AI Accelerator at 5W
📱 Edge AI Optimization · armdevices.net · 9h
3-Part Series: LLM Latency in Production (Part 1)
🧠 LLM Inference · towardsai.net · 1d
Where to Host Your Open-Source Model (Under 10B Parameters)
🤖 AI · digitalocean.com · 18h
Step 3.7 Flash – 198B-A11B MoE vision-language model
🤖 AI · huggingface.co · 5d · Hacker News
Nemotron 3 Ultra now available on AI Gateway
🪄 Prompt Engineering · vercel.com · 1d
Intel's attempting to break into the AI market once more, but this time avoiding Nvidia's dominance in training by going for inference
🖥 GPUs · pcgamer.com · 3d
Qwen3.7 Plus - Intelligence, Performance & Price Analysis
💰 Tokenomics · artificialanalysis.ai · 1d · Hacker News
Deploy Hermes Agent on OpenShift AI with vLLM model serving
🏗️ LLM Infrastructure · developers.redhat.com · 3d
Your first model deployment on Foundry Local on Azure Local: from catalog to inference in 10 minutes
💻 Chips · techcommunity.microsoft.com · 2d
Qwen3.6 + MTP: Calculated context size is smaller when I use `--spec-draft-type-* q4_0`. is this normal? · ggml-org llama.cpp · Discussion #24102
🤖 AI · Discussion · Code · github.com · 4h · r/LocalLLaMA
Build Personal AI Agents on Windows PCs with New Tools from Microsoft and Nvidia
🤖 AI · Blog · developer.nvidia.com · 2d · Hacker News
YouZhi: Towards High-Concurrency Financial LLMs via Adaptive GQA-to-MLA Transition
🏗️ LLM Infrastructure · Academic · arxiv.org · 5h