CUDA Memory Management

frankkk96/FlashQwen: From-scratch C++/CUDA inference engine for Qwen3-8B, with zero external libraries

 📊CUDA Graphs  Content type: Code
github.com

Training Cycle Halved: LoongForge End-to-End Optimization for GR00T N1.6 Delivers 2.3× Throughput

 📊CUDA Graphs
Less-relevant results

RATrain: A Resource-Aware Training Runtime for Large Language Models on Bandwidth-Constrained Heterogeneous Supercomputing Platforms

 🌐Distributed Computing  Content type: Academic
arxiv.org

Making FlashAttention-4 faster for inference

 🎯Tensor Cores  Content type: Blog

Bring-up and testing of systems with CXL Type 3 memory expanders

 ⏱️CUDA Events
edn.com

Linux Kernel 7.1 Released with Rewritten NTFS Support

 ⚙️Systems Programming  Content type: Release
linuxiac.com

massimo92/spark: CLI tool for serving LLMs with vLLM on NVIDIA DGX Spark. One file, zero friction.

 🛠Ml-eng  Content type: Code
github.com · Hacker News

Show HN: Flashback Booth, A tactile retro photo booth in the browser

 🖥️Terminal Multiplexers  Content type: Discussion, Tutorial

The Parallel Revolution: A Comprehensive Guide to GPU Computing

 🔥PyTorch  Content type: Blog
fitservers.com

Mojo Nightly

 📈Occupancy Optimization  Content type: Blog
mojolang.org · Hacker News

Introducing Piper: A Programmable Distributed Training System

 🌊CUDA Streams  Content type: Academic, Blog

Release ensu-v0.1.17 · ente-io/ente

 🤖Automation  Content type: Code
github.com

Local models in mid-2026: the engineering that closed the gap

 👁️Attention Optimization

Can't format my 2TB

 📝Neovim

8th June – Threat Intelligence Report

 ⚙️Systems Programming
malware.news

sgl-project/sglang-omni: SGLang Omni: High-Performance Multi-Stage Pipeline Framework for Omni Models

 📈Occupancy Optimization  Content type: Code

Coupling Complementary Simulations for Combined Performance and Energy Optimization

 🌐Distributed Computing  Content type: Academic
arxiv.org

Homebrew, Again

 🔄ONNX  Content type: Blog
jerryz.bearblog.dev

NetX-lab/Frontier: Frontier: A Discrete-Event Simulator for Modern LLM Serving

 🔥PyTorch  Content type: Code
github.com · Hacker News

KaiFelixBennett/gemma4-turboquant-rdna4: Run Gemma-4-31B at full 256K context on a $1,400 AMD RDNA4 GPU (gfx1201): TurboQuant KV cache + HIP-graph-safe Flash-Attention for llama.cpp, fully measured on real hardware.

 👁️Attention Optimization  Content type: Code
github.com · Hacker News
