⚡ vLLM
Specific
vLLM inference, PagedAttention, LLM serving, throughput inference
Scoured 75 posts in 7.4 ms
Alignment Collapse Under KV Cache Quantization: Diagnosis and Mitigation
⚡ LLM Inference
Content type: Academic
arxiv.org · 1 day ago
Nvidia DGX Spark GB10 – AI Models and Guide with vLLM and Autonomous Script
⚡ LLM Inference
Content type: Code
github.com · 5 days ago · Hacker News
Less-relevant results
Google Shrank Gemma 4 by 72% and Unsloth Fixed the 4-Bit Bug Nobody Else Caught on One 4090, and 4-Bit Shouldn’t Be This Good
⚡ LLM Inference
Content type: Blog
towardsai.net · 2 days ago
GGUF vs GPTQ vs AWQ: The Plain-English Guide to LLM Quantization (and Which One to Pick)
⚡ LLM Inference
vettedconsumer.com · 4 days ago · Hacker News
#068 - Apple runs Siri on Google's Gemini, OpenAI files a secret IPO at $852B, Xiaomi clocks 1,000 tps
⚡ LLM Inference
indiehacker.news · 2 days ago
Latest technical articles & videos.
🧠 KV Cache
certdepot.net · 4 days ago
[eCHO News] Episode #104: mTLS for Cilium. Lisp for eBPF
📁 Filesystems
isovalent-9197153.hs-sites.com · 5 days ago
Google's new open model DiffusionGemma generates text from noise instead of word by word
⚡ LLM Inference
the-decoder.com · 9 hours ago
Integrate OpenShift AI and PG Airman MCP Server
🗄️ Databases
developers.redhat.com · 2 days ago
huawei-csl/KVarN: KVarN is a native vLLM KV-cache quantization backend for your agents: 3-5x more context, throughput above FP16, and FP16-level accuracy. Calibration-free, one flag.
⚡ LLM Inference
Content type: Code
github.com · 6 days ago · Hacker News
Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency
⚡ LLM Inference
Content type: News, Blog
blog.google · 5 days ago · Hacker News
Build a Medical Report Analyzer on Dedicated Inference with Python
🧠 KV Cache
digitalocean.com · 6 days ago
OpenPCC: Open and Confidential LLM Serving on Commodity TEEs
🧠 KV Cache
Content type: Academic
arxiv.org · 1 day ago
The Death of the Four Golden Signals: Designing Telemetry for Non-Deterministic Infrastructure
⚡ LLM Inference
devops.com · 5 days ago
#065 - Claude writes 80% of Anthropic's own code, Cloudflare buys Vite, ChatGPT ships Dreaming memory
⚡ LLM Inference
indiehacker.news · 6 days ago
not much happened today | AINews
🧠 KV Cache
news.smol.ai · 2 days ago
Build a local voice agent with Red Hat OpenShift AI
⚡ LLM Inference
developers.redhat.com · 3 days ago
APEX4: Efficient Pure W4A4 LLM Inference via Intra-SM Compute Rebalancing
⚡ LLM Inference
Content type: Academic
arxiv.org · 2 days ago
heterodoxin/graphkv: Graph-guided KV cache compression for memory-efficient LLM inference.
🧠 KV Cache
Content type: Code
github.com · 4 days ago · r/LocalLLaMA
OpenEnv is now owned by HF, Torch, Prime Intellect, Unsloth, Modal, Mercor, and more! Use it for training agents.
⚡ LLM Inference
Content type: Blog
huggingface.co · 3 days ago · Hacker News, r/LocalLLaMA