CUDA Memory Management
Specific
Memory Pool, Allocation Strategy, Fragmentation, cudaMalloc
Scoured 21 posts in 8.0 ms
frankkk96/FlashQwen: From-scratch C++/CUDA inference engine for Qwen3-8B, with zero external libraries
📊 CUDA Graphs
Content type: Code
github.com · 22 hours ago
Training Cycle Halved: LoongForge End-to-End Optimization for GR00T N1.6 Delivers 2.3× Throughput
📊 CUDA Graphs
baidu-baige.github.io · 1 day ago · Hacker News
Less-relevant results
RATrain: A Resource-Aware Training Runtime for Large Language Models on Bandwidth-Constrained Heterogeneous Supercomputing Platforms
🌐 Distributed Computing
Content type: Academic
arxiv.org · 4 days ago
Making FlashAttention-4 faster for inference
🎯 Tensor Cores
Content type: Blog
modal.com · 3 days ago · Hacker News
Bring-up and testing of systems with CXL Type 3 memory expanders
⏱️ CUDA Events
edn.com · 2 days ago
Linux Kernel 7.1 Released with Rewritten NTFS Support
⚙️ Systems Programming
Content type: Release
linuxiac.com · 6 hours ago
massimo92/spark: CLI tool for serving LLMs with vLLM on NVIDIA DGX Spark. One file, zero friction.
🛠 Ml-eng
Content type: Code
github.com · 3 days ago · Hacker News
Show HN: Flashback Booth, A tactile retro photo booth in the browser
🖥️ Terminal Multiplexers
Content type: Discussion, Tutorial
flashbackbooth.me · 1 day ago · Hacker News
The Parallel Revolution: A Comprehensive Guide to GPU Computing
🔥 PyTorch
Content type: Blog
fitservers.com · 6 days ago
Mojo Nightly
📈 Occupancy Optimization
Content type: Blog
mojolang.org · 3 days ago · Hacker News
Introducing Piper: A Programmable Distributed Training System
🌊 CUDA Streams
Content type: Academic, Blog
syfi.cs.washington.edu · 4 days ago · Hacker News
Release ensu-v0.1.17 · ente-io/ente
🤖 Automation
Content type: Code
github.com · 3 days ago
Local models in mid-2026: the engineering that closed the gap
👁️ Attention Optimization
coles.codes · 3 days ago · Hacker News, r/LocalLLaMA
Can't format my 2TB
📝 Neovim
vita.hacks.guide · 4 days ago · r/VitaPiracy
8th June – Threat Intelligence Report
⚙️ Systems Programming
malware.news · 6 days ago
sgl-project/sglang-omni: SGLang Omni: High-Performance Multi-Stage Pipeline Framework for Omni Models
📈 Occupancy Optimization
Content type: Code
github.com · 5 days ago · Cited by 1 article
Coupling Complementary Simulations for Combined Performance and Energy Optimization
🌐 Distributed Computing
Content type: Academic
arxiv.org · 5 days ago
Homebrew, Again
🔄 ONNX
Content type: Blog
jerryz.bearblog.dev · 1 week ago
NetX-lab/Frontier: Frontier: A Discrete-Event Simulator for Modern LLM Serving
🔥 PyTorch
Content type: Code
github.com · 3 days ago · Hacker News
KaiFelixBennett/gemma4-turboquant-rdna4: Run Gemma-4-31B at full 256K context on a $1,400 AMD RDNA4 GPU (gfx1201): TurboQuant KV cache + HIP-graph-safe Flash-Attention for llama.cpp, fully measured on real hardware.
👁️ Attention Optimization
Content type: Code
github.com · 4 days ago · Hacker News