CUDA
GPU programming, NVIDIA, CUDA kernels, GPU optimization
SpenseGPT: Practical One-shot Pruning Enabling Sparse and Dense GEMMs for LLM Inference
GPU Computing · Content type: Academic · arxiv.org · 1 day ago
huawei-csl/KVarN: KVarN is a native vLLM KV-cache quantization backend for your agents: 3-5x more context, throughput above FP16, and FP16-level accuracy. Calibration-free, one flag.
LLMs · Content type: Code · github.com · 6 days ago · Hacker News
DiffusionGemma: The Developer Guide- Google Developers Blog
LLMs · Content type: Blog · developers.googleblog.com · 1 day ago · r/LocalLLaMA