Vllm
The economics of speculative decoding
🤖 LLM Inference · Content type: Blog · fergusfinn.com · 3 days ago · Hacker News

OpenCV 5 release - New DNN engine with enhanced ONNX and LLM/VLM support, Intel, Arm, and RISC-V hardware optimizations - CNX Software
🤖 LLM · Content type: News · cnx-software.com · 1 day ago

heterodoxin/graphkv: Graph-guided KV cache compression for memory-efficient LLM inference.
🤖 LLM Inference · Content type: Code · github.com · 4 days ago · r/LocalLLaMA

#065 - Claude writes 80% of Anthropic's own code, Cloudflare buys Vite, ChatGPT ships Dreaming memory
🤖 LLM Inference · indiehacker.news · 6 days ago

SpectrumKV: Per-Token Mixed-Precision KV Cache Transfer for Prefill-Decode Disaggregated LLM Serving
🤖 LLM · Content type: Academic · arxiv.org · 1 day ago

1-bit and 1.58 bit LLM Benchmarking on Jetson Orin Nano Super | Bonsai LM
🤖 LLM · smolhub.com · 2 days ago · r/LocalLLaMA

Issue #390 - The ML Engineer 🤖
🤖 AI · Content type: News, Blog · machinelearning.substack.com · 3 days ago · Substack

End-to-End Context Compression at Scale
🤖 LLM Inference · Content type: Academic · arxiv.org · 1 day ago

google/gemma-4-12B-it-qat-q4_0-gguf
🤖 LLM · huggingface.co · 5 days ago

FOD#155: Continual Learning in LLMs: Why AI Models Need Sleep
🤖 LLM · turingpost.com · 2 days ago

KJLdefeated/RL.cu: RLVR training for LLM in CUDA/C++
🤖 LLM · Content type: Code · github.com · 3 days ago · Hacker News

Gated DeltaNet, From First Principles
🤖 LLM Inference · Content type: Blog · sankalp.bearblog.dev · 1 day ago

Build a Medical Report Analyzer on Dedicated Inference with Python
🤖 LLM · digitalocean.com · 6 days ago

FadeMem: Distance-Aware Memory Consolidation for Autoregressive Video Diffusion
🤖 LLM Inference · Content type: Academic · arxiv.org · 23 hours ago

How the UK Is Turning Sovereign AI Ambition Into Action With NVIDIA Technologies
🤖 LLM Inference · Content type: Blog · blogs.nvidia.com · 2 days ago

The Memory Problem is Solved: How Google’s Memory Caching Makes RNNs Smart Again
🤖 LLM · Content type: Blog · medium.com · 2 days ago

See, Act, Correct: three levers for working with a code agent
🤖 AI · Content type: Blog · blog.owulveryck.info · 6 days ago · Hacker News

How to cut the cost of long AI agent threads (without making the agent dumber)
🤖 LLM Inference · Content type: Blog · viktor.com · 2 days ago · Hacker News

Benchmarking dots.tts on Strix Halo
🤖 LLM · sleepingrobots.com · 3 days ago

Still: Amortized KV Cache Compaction in a Single Forward Pass
🤖 LLM Inference · Content type: Academic · arxiv.org · 1 day ago