🎮 GPU Programming
CUDA, OpenCL, Vulkan, Compute Shaders, Parallel Processing
Scoured 199648 posts in 38.2 ms
Towards Compute-Aware In-Switch Computing for LLMs Tensor-Parallelism on Multi-GPU Systems
🏗️ LLM Infrastructure · arxiv.org · 6d · Hacker News
Atlas: An LLM inference engine written from scratch in Rust and CUDA
🏗️ LLM Infrastructure · atlasinference.io · 1d · Hacker News
CUDA Proves Nvidia Is a Software Company
🟩 Nvidia · hardware.slashdot.org · 2d
The cuda-oxide Book
🎮 SIMT Execution · nvlabs.github.io · 6d · Lobsters, Hacker News, r/rust
Show HN: I built a small repertoir of different computing systems
🖥️ Hardware Architecture · computers.tugdual.fr · 1d · Hacker News
Distributed Training in MLOps Break GPU Vendor Lock-In: Distributed MLOps across mixed AMD and NVIDIA Clusters
🟩 Nvidia · mlops.community · 1d
NVIDIA Releases CUDA-Oxide 0.1 For Experimental Rust-To-CUDA Compiler
🟩 Nvidia · lxer.com · 3d
Efficient and Portable Support for Overdecomposition on Distributed Memory GPGPU Platforms
🎮 SIMT Execution · arxiv.org · 14h
CUDA Proves Nvidia Is a Software Company
🟩 Nvidia · wired.com · 3d · Hacker News, r/programming
CUDAHercules: Benchmarking Hardware-Aware Expert-level CUDA Optimization for LLMs
🏗️ LLM Infrastructure · arxiv.org · 2d
TLX: Hardware-Native, Evolvable MIMW GPU Compiler for Large-scale Production Environments
🎮 GPU Microarchitecture · arxiv.org · 2d · Hacker News
CUDABeaver: Benchmarking LLM-Based Automated CUDA Debugging
🏗️ LLM Infrastructure · arxiv.org · 2d
Stencil Computations on Cerebras Wafer-Scale Engine
⚡ WebGPU Compute · arxiv.org · 3d
ShardTensor: Domain Parallelism for Scientific Machine Learning
🏗️ LLM Infrastructure · arxiv.org · 1d
Data Path Fusion in GPU for Analytical Query Processing
💎 Materialized Views · arxiv.org · 2d
EULER-ADAS: Energy-Efficient & SIMD-Unified Logarithmic-Posit Engine for Precision-Reconfigurable Approximate ADAS Acceleration
🖥️ Hardware Architecture · arxiv.org · 3d
CCD-Level and Load-Aware Thread Orchestration for In-Memory Vector ANNS on Multi-Core CPUs
⚡ WebGPU Compute · arxiv.org · 2d
DICE: Enabling Efficient General-Purpose SIMT Execution with Statically Scheduled Coarse-Grained Reconfigurable Arrays
🎮 SIMT Execution · arxiv.org · 6d
Unleashing Scalable Context Parallelism for Foundation Models Pre-Training via FCP
🏗️ LLM Infrastructure · arxiv.org · 2d
TransDot: An Area-efficient Reconfigurable Floating-Point Unit for Trans-Precision Dot-Product Accumulation for FPGA AI Engines
🎯 Emulation Accuracy · arxiv.org · 3d