LLM Inference
Quantization, Attention Mechanisms, Batch Processing, KV Caching
Scoured 290 posts in 50.1 ms

AI · lemmy.ml · 5 days ago
Alpaca doesn't work with Ollama Cloud

LLM Infrastructure · GitHub · 2 days ago
Pipeline-parallel LLM inference across GPUs on separate machines
Discussed on Hacker News

LLM Infrastructure · medium.com · 4 days ago
The Transformer Pipeline: A Complete Mathematical and Visual Guide

Open Source AI · GitHub · 2 days ago
datalab-to/lift: Extract structured data from documents quickly and accurately.
Covered by habr.com

LLM Infrastructure · Anyscale blog posts · 6 days ago
67% Cost Savings with PD Disaggregation Using Ray and vLLM on AMD MI325X
Discussed on Hacker News

Edge AI Optimization · arxiv.org · 3 days ago
Quantization as a Malicious Task: Removing Quantization-Conditioned Backdoors via Task Arithmetic

LLM Infrastructure · arxiv.org · 6 days ago
SwiftCache: Efficient LLM Serving for Multi-turn Conversations with Heterogeneous KV Cache Sharing

LLM Infrastructure · arxiv.org · 5 days ago
AnchorKV: Safety-Aware KV Cache Compression via Soft Penalty with a Refusal Anchor

AI · GitHub · 5 days ago
fix(ollama): preserve configured API during discovery (#93729)

AI · GitHub · 6 days ago
[Bug]: ollama-cloud runtime fails DNS lookup for ai.ollama.com, while…

LLM Infrastructure · arxiv.org · 6 days ago
ReQAT: Achieving Full-Precision Reasoning Accuracy with 4-bit Floating-Point Quantization-Aware Training

LLM Infrastructure · arxiv.org · 6 days ago
Unified KV Pooling to Accelerate Long-Context LLM Serving

AI · GitHub · 3 days ago
How I Architected a Multi-Provider Fallback for Local RAG
Discussed on DEV

Inference Serving · arxiv.org · 6 days ago
PolyKV: Heterogeneous Retention and Allocation for KV Cache Compression

AI · GitHub · 6 days ago
Keep key-free web search providers opt-in (#93616)

LLM Infrastructure · arxiv.org · 3 days ago
SAC: Disaggregated KV Cache System for Sparse Attention LLMs with CXL

AI · GitHub · 3 days ago
Building a Safe, Local AI Coding Agent with Node.js
Discussed on DEV

LLM Infrastructure · arxiv.org · 6 days ago
SMEPilot: Characterizing and Optimizing LLM Inference with Scalable Matrix Extensions

LLM Infrastructure · GitHub · 3 days ago
Profile(v2.1.4) physics-aware optimizer for vLLM (31→470 tok/s on A100)
Discussed on Hacker News