SIMT Execution
GPU Programming, Warp Divergence, Thread Blocks, CUDA Model
CUDA From First Principles Part 2
pub.towardsai.net · 1d
OpenCL
[Benchmark] Qwen3.5-122B-A10B FP8 weights / bf16 KV on 8x RTX PRO 6000 (SM120): 1,985 tok/s burst, MTP 2.75x, fp8 KV silent corruption finding · Issue #19603
github.com · 7h · Discuss: r/LocalLLaMA
Performance
Porting AI Music Generation to NVIDIA Jetson
hackster.io · 7h
Hardware Acceleration
Exploiting Dependency and Parallelism: Real-Time Scheduling and Analysis for GPU Tasks
arxiv.org · 4d
WebGPU
A GPU Microarchitecture Optimized for Fully Homomorphic Encryption
semiengineering.com · 3h
Homomorphic Encryption
Maximizing GPU Utilization with NVIDIA Run:ai and NVIDIA NIM
developer.nvidia.com · 1d
WGPU
Kiln: WebGPU-Native Out-of-Core Volume Rendering for Multi-GB Datasets
dev.to · 1h · Discuss: DEV
WebGL
Visualizing DeepSpeed Ulysses: Sequence Parallelism for 1M Context Windows
darshanfofadiya.com · 3d · Discuss: Hacker News
Hardware Acceleration
C64: Putting Sprite Multiplexing to Work
bumbershootsoft.wordpress.com · 14h
Retro Computing
Styx: Blades of Greed PC Performance Analysis
dsogaming.com · 2h
Game Engines
Why Structured Kernels?
modular.com · 2d
Hardware Acceleration
A Rabbit Hole Called WebGL (8-part series on the technical background of a WebGL application w/ functional demo)
hendrik-erz.de · 15h · Discuss: r/programming
WebGL
What makes a game tick? Part 9 - Data Driven Multi-Threading Scheduler
mropert.github.io · 1d · Discuss: r/cpp
Lightweight Threads
Exposing More Parallelism Is the Hidden Reason Why Some Vectorized Loops Are Faster - Not Vectorization per se
johnnysswlab.com · 2d · Discuss: Hacker News
SIMD Programming
Optimise AI
mason.bearblog.dev · 1h
Prompt Engineering
CCCL: Node-Spanning GPU Collectives with CXL Memory Pooling
arxiv.org · 2d
CXL
Scaling ML Inference on Databricks: Liquid or Partitioned? Salted or Not?
towardsdatascience.com · 22h
Vectorized Query Execution
Simulating the IBM 360/50 mainframe from its microcode
righto.com · 1d
SIMD Programming
Running Machine Learning on Microcontrollers
dev.to · 1h · Discuss: DEV
TVM
GTX 1080 Ti for Local LLM
ariya.io · 11h
HBM