⚡ vLLM
Specific
vLLM inference, PagedAttention, LLM serving, throughput inference
Scoured 74 posts in 5.9 ms
[AINews] FrontierCode: Benchmarking for Code Quality over Slop
🧠
KV Cache
Content type:
News
latent.space
·
1 day ago
Florian Brand, Prime Intellect research engineer, adopts Gemma 4 E4B 6-bit quantized as his primary local Mac LLM
🧠
KV Cache
Content type:
News
digg.com
·
3 days ago
·
Hacker News
KJLdefeated/RL.cu: RLVR training for LLM in CUDA/C++
🧠
KV Cache
Content type:
Code
github.com
·
4 days ago
·
Hacker News
Breaking the Ice: Analyzing Cold Start Latency in vLLM
⚡
LLM Inference
Content type:
Academic
arxiv.org
·
3 days ago
·
Hacker News
[AINews] not much happened today
⚡
LLM Inference
Content type:
News
latent.space
·
5 days ago
Show HN: Zerostack, an open coding agent optimized for memory footprint
🧠
KV Cache
gi-dellav.github.io
·
6 days ago
·
Hacker News
fix(gateway): fail closed for unknown model auth · openclaw/openclaw@85343ea
⚡
LLM Inference
Content type:
Code
github.com
·
5 days ago
Using local LLMs for agentic coding
🧠
KV Cache
Content type:
Blog
blog.alexewerlof.com
·
6 days ago
Five labs, five minds: building a multi-model finance drama on small models
⚡
LLM Inference
Content type:
Blog
huggingface.co
·
4 days ago
RightNow-AI/AutoMegaKernel: An agent harness that compiles a model into one provably-correct, self-retargeting CUDA megakernel and self-tunes it past cuBLAS at batch-1 LLM decode.
🧠
KV Cache
Content type:
Code
github.com
·
2 days ago
·
Hacker News
A drop-in replacement chat template for google/gemma-4-31B-it tuned for open-source agentic coding harnesses.
⚡
LLM Inference
gist.github.com
·
5 days ago
·
r/LocalLLaMA
Clairvoyant: Predictive SJF Scheduling to Mitigate Head-of-Line Blocking in Serial LLM Backends
🧠
KV Cache
Content type:
Academic
arxiv.org
·
3 days ago
mirkolenz/llmhop: Tiny, stateless Go router that dispatches OpenAI-compatible requests to single-model vLLM and sglang backends with zero external dependencies
🧠
KV Cache
Content type:
Code
github.com
·
6 days ago
·
Hacker News
Google Gemma4 12B released
⚡
LLM Inference
Content type:
Blog
medium.com
·
6 days ago
not much happened today | AINews
🧠
KV Cache
news.smol.ai
·
3 days ago
google/gemma-4-12B-it-qat-q4_0-gguf
🧠
KV Cache
huggingface.co
·
5 days ago
Does anyone know what PCIe mode was used for these benchmarks?
⚡
LLM Inference
Content type:
Code
github.com
·
4 days ago
·
r/LocalLLaMA
YouZhi: Towards High-Concurrency Financial LLMs via Adaptive GQA-to-MLA Transition
🧠
KV Cache
Content type:
Academic
arxiv.org
·
6 days ago
Introducing Granite Libraries and Project Granite Switch
🧠
KV Cache
Content type:
Blog
research.ibm.com
·
6 days ago
·
Hacker News
DiffusionGemma: The Developer Guide
🧠
KV Cache
Content type:
Blog
developers.googleblog.com
·
1 day ago