🔲 ML Hardware
GPU, TPU, inference hardware, AI accelerators, CUDA
Efficient, VRAM-Constrained xLM Inference on Clients
⚡ Performance Engineering · arxiv.org · 6d

GPU Power Prediction Tool for AI Workloads (MIT, IBM)
⚡ Performance Engineering · semiengineering.com · 1d

Part III: The evolution to AI GPUs
🤖 AI Research · jonpeddie.com · 10h

Google's New TPU Generation is Specifically Designed for Agents and SOTA Model Training
🤖 AI Research · infoq.com · 13h

Supercharging LLM inference on Google TPUs: Achieving 3X speedups with diffusion-style speculative decoding
🤖 LLM · developers.googleblog.com · 2d · Hacker News, r/LocalLLaMA

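As background for the headline above: in vanilla speculative decoding, a small draft model proposes a block of tokens and the large target model verifies them together, keeping the longest agreeing prefix. The sketch below shows only that greedy baseline, not the diffusion-style variant the post describes; `draft_next` and `target_argmax` are hypothetical stand-ins.

```python
# Toy sketch of vanilla greedy speculative decoding. `draft_next` and
# `target_argmax` stand in for the small draft model and the large target
# model; a real implementation verifies all k drafted positions in one
# batched forward pass rather than the per-position loop shown here.
def speculative_step(prefix, draft_next, target_argmax, k=4):
    # 1) The draft model cheaply proposes k tokens autoregressively.
    proposal = list(prefix)
    for _ in range(k):
        proposal.append(draft_next(proposal))
    drafted = proposal[len(prefix):]

    # 2) The target model checks each drafted position; keep the longest
    #    prefix where its greedy choice agrees with the draft.
    accepted = []
    for i, tok in enumerate(drafted):
        target_tok = target_argmax(proposal[: len(prefix) + i])
        accepted.append(target_tok)
        if target_tok != tok:  # first disagreement: keep target's token, stop
            break
    return prefix + accepted
```
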
Distributing model weights to your AI cluster: a faster pre-flight on AKS and Slurm
☁️ Cloud Computing · techcommunity.microsoft.com · 4h

Performance of CUDA Python in AI
🤖 AI Research · medium.com · 5d

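"CUDA Python" can refer to several stacks; a minimal sketch in the Numba flavor (an assumption here, since the post may instead cover NVIDIA's cuda-python bindings) shows the basic pattern of compiling a Python function into a GPU kernel.

```python
# Minimal Numba CUDA kernel: vector addition. Requires a CUDA-capable GPU.
import numpy as np
from numba import cuda

@cuda.jit
def vector_add(a, b, out):
    i = cuda.grid(1)        # global thread index across the whole grid
    if i < out.size:        # guard threads past the end of the array
        out[i] = a[i] + b[i]

n = 1 << 20
a = np.random.rand(n).astype(np.float32)
b = np.random.rand(n).astype(np.float32)
out = np.zeros_like(a)

threads = 256
blocks = (n + threads - 1) // threads
vector_add[blocks, threads](a, b, out)  # Numba copies arrays to/from device
assert np.allclose(out, a + b)
```
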
Your old GPU can still run big LLMs – you just need the right tweaks
🧠 LLMs · xda-developers.com · 13h

Fitting LLMs on Self-Hosted GPUs
⚡ Performance Engineering · anup.io · 2d

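The usual starting point for fitting an LLM on a self-hosted GPU is a back-of-envelope VRAM budget: weights plus KV cache plus runtime overhead. The sketch below is a rough rule of thumb with illustrative numbers, not figures from the post.

```python
# Rough VRAM estimate for serving an LLM. All numbers are illustrative
# assumptions; real usage depends on the runtime, attention layout, etc.
def vram_gib(params_b, bytes_per_param, n_layers, n_kv_heads, head_dim,
             seq_len, batch, kv_bytes=2, overhead=1.2):
    weights = params_b * 1e9 * bytes_per_param
    # KV cache: 2 tensors (K and V) per layer, per token, per sequence.
    kv = 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * kv_bytes
    return (weights + kv) * overhead / 2**30

# e.g. a 7B model in 4-bit (~0.5 bytes/param) with an 8K context:
print(f"{vram_gib(7, 0.5, 32, 8, 128, 8192, 1):.1f} GiB")  # ≈ 5.1 GiB
```
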
OpenAI is teaming up with other companies to improve supercomputer networking for AI training.
🤖 AI Research · theverge.com · 3h

deck.gl is a GPU-powered framework for visual exploratory data analysis of large datasets.
📊 Data Science · deck.gl · 9h

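For a sense of the deck.gl programming model, here is a minimal sketch via pydeck, the framework's Python binding (the post itself covers the JavaScript framework; the data below is made up).

```python
# Render a GPU-accelerated scatterplot with pydeck (deck.gl's Python binding).
import pydeck as pdk

# Hypothetical data: point records with [longitude, latitude] positions.
points = [{"position": [-122.45, 37.78]}, {"position": [-122.40, 37.76]}]

layer = pdk.Layer(
    "ScatterplotLayer",        # one of deck.gl's core layer types
    data=points,
    get_position="position",
    get_radius=100,            # meters
    get_fill_color=[255, 0, 0],
)

view = pdk.ViewState(longitude=-122.43, latitude=37.77, zoom=11)
pdk.Deck(layers=[layer], initial_view_state=view).to_html("scatter.html")
```
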
OpenCV's DNN Library for Optimal Model Performance on CPU, CUDA, and New Architectures
👁️ Computer Vision · armdevices.net · 1d

Google Announces TPU v8t Sunfish and TPU v8i Zebrafish
🏗️ System Design · storagereview.com · 5d

e3ntity/e3rl: Fast and simple implementation of RL algorithms, designed to run fully on GPU.
🤖 AI Research · github.com · 12h · Hacker News

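"Fully on GPU" RL usually means the environment itself is batched tensor math, so rollouts and policy updates never leave the device. The toy PyTorch REINFORCE loop below illustrates that pattern; it is an assumed sketch of the idea, not code from the e3rl repository.

```python
# Toy end-to-end-on-GPU RL loop: a batched "environment" implemented as
# tensor ops, so no host/device round-trips occur inside the loop.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
n_envs, obs_dim, act_dim = 1024, 8, 2

policy = torch.nn.Linear(obs_dim, act_dim).to(device)
opt = torch.optim.Adam(policy.parameters(), lr=3e-4)

obs = torch.randn(n_envs, obs_dim, device=device)
for step in range(100):
    logits = policy(obs)
    actions = torch.distributions.Categorical(logits=logits).sample()
    # Toy batched environment: reward and next state as pure tensor math.
    reward = (actions == 0).float() - 0.5
    obs = obs + 0.01 * torch.randn_like(obs)
    # REINFORCE-style update, entirely on device.
    logp = torch.log_softmax(logits, dim=-1).gather(1, actions[:, None]).squeeze(1)
    loss = -(logp * reward).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```
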
Boosting multimodal inference performance by >10% with a single Python dictionary
⚡ Performance Engineering · modal.com · 2d · Hacker News

Difference between revisions of "Jetson/L4T/Power"
🐹 Go · elinux.org · 19h

A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility...
🧠 LLMs · lemmy.ml · 5d

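To make "low-bit quantization" concrete, here is a generic round-to-nearest int4 scheme with per-row scales. This is the textbook baseline for illustration only, not the SOTA algorithm the post describes.

```python
# Generic symmetric int4 weight quantization with a per-row scale.
import numpy as np

def quantize_int4(w):
    # Map each row's largest magnitude onto the positive int4 limit (7).
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(4, 16).astype(np.float32)
q, s = quantize_int4(w)
print(f"mean abs error: {np.abs(w - dequantize(q, s)).mean():.4f}")
```
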
SMG: The Case for Disaggregating CPU from GPU in LLM Serving
⚡ Performance Engineering · pytorch.org · 1d · Hacker News

Why we’re at a decisive turning point for resolving data fragmentation [Q&A]
🏗️ System Design · betanews.com · 14h

AI galaxy hunters could be adding to the global GPU crunch
🤖 AI Research · techxplore.com · 1d