Scour
🔲 ML Hardware
GPU, TPU, inference hardware, AI accelerators, CUDA
Efficient, VRAM-Constrained xLM Inference on Clients
⚡ Performance Engineering · arxiv.org · 6d
Supercharging LLM inference on Google TPUs: Achieving 3X speedups with diffusion-style speculative decoding
🤖 LLM · developers.googleblog.com · 2d · Hacker News, r/LocalLLaMA
Step-by-Step: Deploying a Multimodal AI Model with Llama 3.2 and FastAPI 0.112 on ECS 4.0
🧠 LLMs · dev.to · 16h · DEV
Google New TPU Generation is Specifically Designed for Agents and SOTA Model Training
🤖 AI Research · infoq.com · 10h
SMG: The Case for Disaggregating CPU from GPU in LLM Serving
⚡ Performance Engineering · pytorch.org · 1d · Hacker News
Distributing model weights to your AI cluster: a faster pre-flight on AKS and Slurm
☁️ Cloud Computing · techcommunity.microsoft.com · 1h
Haru-neo/qengine: Custom CUDA inference engine for Qwen3.5 hybrid models, tuned for sm_70 mining cards (CMP 100-210, V100). GPU-poor's vLLM.
⚡ Performance Engineering · github.com · 3d · DEV
Rowhammer Attack Against NVIDIA Chips
🔌 Embedded Systems · schneier.com · 9h
The Future of Linux Gaming: Why Intel Merged Jay Into Mesa
🔌 Embedded Systems · hackernoon.com · 1d
a 3D globe of every known AI compute cluster
🤖 AI Research · flopmap.com · 3d · Hacker News
* Recap alert * Here's what you missed on the AI front from our teams at @GoogleCloud, @googledevs and more last month, including Gemma 4 and our newest TPU ⬇️
🤖 AI Research · twitter.macworks.dev · 1d
RTX 5090 and M4 MacBook Air: Implementing PCI Passthrough for Gaming
🍎 Apple · scottjg.com · 1d · Hacker News, r/macgaming
Your AI, Your Rules: Running a Local LLM with GPU Acceleration on Proxmox
⚡ Performance Engineering · huggingface.co · 5d · DEV
The Powerful Lenovo Legion RTX 5090 Gaming PC Drops Below $5,000 for the First Time in 2026, and It Even Includes a Rare 64GB of DDR5 Memory
🔌 Embedded Systems · ign.com · 2d
KV Cache Locality: The Hidden Variable in Your LLM Serving Cost
⚡ Performance Engineering · ranvier.systems · 6d · Hacker News
How to Add Sentiment Analysis to Any App in 5 Minutes (Free API)
💬 NLP · rapidapi.com · 2d · DEV
The Financialization of Compute Futures
⚖️ Tech Policy · deep-research-agent.pagey.site · 3d · Hacker News
Raising the baseline for the `nvptx64-nvidia-cuda` target
⚡ Performance Engineering · blog.rust-lang.org · 5d
Building resilient networks for AI supercomputers
🏗️ System Design · techcommunity.microsoft.com · 8h
Google Cloud TPU Architecture Versions Explained: From v1 to the Eighth Generation
☁️ Cloud Computing · storage.googleapis.com · 5d · DEV