🎮 GPU Microarchitecture
GPU ISA, shader cores, warp scheduling, SIMT execution
What 2x GH200 delivers: memory paths for LLM inference
🔴 ROCm · dnhkng.github.io · 5d
Could NP-hard search trees be tackled through spatial mapping of computation rather than temporal execution?
🖥️ Bytecode VMs · github.com · 2d · r/compsci
Flash Attention 2 in CuteDSL: A Naive Kernel, Three Optimizations, and Where I Got Stuck
⚡ PTX · kyrieblunders.bearblog.dev · 5d · Hacker News
Microsoft announces Shader Model 6.10 preview, bringing neural rendering into the mainstream and just maybe making the games industry a bit less reliant on Nvid...
🔴 ROCm · pcgamer.com · 2d
Revealing NVIDIA Closed-Source Driver Command Streams for CPU-GPU Runtime Behavior Insight
🖥️ GPU Drivers · arxiv.org · 13h · Hacker News
On Interaction Nets and Hardware
🌐 Distributed Systems · tendrils.co · 3d · Lobsters, Hacker News
Local-Run Graph-Based Scalable AGI
🌐 Distributed Systems · boggersthefish.com · 5d · Hacker News
Kracuible Spiral 🌀 Memory Architecture
🗄️ CUDA Memory · youtube.com · 4d · r/SideProject
Microarchitecture Tailored to 3D-Stacked Near-Memory Processing LLM Decoding (U. of Edinburgh, Peking U., Cambridge et al.)
🗄️ CUDA Memory · semiengineering.com · 1d
You don't need an expensive GPU to run a local LLM that actually works
🔴 ROCm · xda-developers.com · 1d
A Matrix-Free Galerkin Multigrid Solver and Failure-Mode Screen for Single-GPU 3D SIMP Linear Systems
🔴 ROCm · arxiv.org · 13h
From 200 lines to 15: How Helion is rewriting the rules of GPU programming
🗄️ CUDA Memory · developers.redhat.com · 6d
Beginner-Friendly Shader Programming in p5.js v2 (lgm2026)
⚙️ PTX-to-SASS · cdn.media.ccc.de · 6d
Mojo language, any hardware. Systems-level performance. Pythonic syntax
⚡ PTX · modular.com · 6d · Hacker News
Efficient, VRAM-Constrained xLM Inference on Clients
🏗️ AI Infrastructure · arxiv.org · 13h
Fast Attention for Short Sequences
📡 Signal Processing · blog.qwertyforce.dev · 5d · Hacker News
Building An AI Chip: Silicon Design And Advanced Packaging
⚙️ ISA Design · semiengineering.com · 1d
Your CPU Has More Registers Than You'd Think
🔧 Custom CPUs · fp32.org · 6d · Lobsters, Hacker News
Show HN: Open-source GPU cost analysis tool
🔴 ROCm · github.com · 2d · Hacker News
Delegated Execution Sharding (DES): A hyper-parallelized zkEVM for theoretically optimal execution-layer scalability
🖥️ Bytecode VMs · ethresear.ch · 5d