Scour
🏠 Local LLM Deployment
Topics: Model Optimization, GPU Acceleration, Inference, Privacy
Scoured 200,049 posts in 33.6 ms
Ada-MK: Adaptive MegaKernel Optimization via Automated DAG-based Search for LLM Inference
⚡ LLM Optimization · arxiv.org · 1d
Gemma 3 Local LLM Deployment: Google's AI for Developers (2026)
⚡ LLM Optimization · sitepoint.com · 4d
Unraveling GPU Inference Costs for Fine-tuned Open-source Models vs. Closed Platforms
💰 Inference Cost · mlops.community · 1d
Long-Context Inference at Scale: The Hidden Infrastructure Cost
💰 Inference Cost · digitalocean.com · 6d
https://www.together.ai/blog/accelerate-inference-large-scale-workloads
⚡ LLM Optimization · together.ai · 1d
Cacheon Launching Open Inference Arena for LLM Serving Optimization
🤖 LLM · prweb.com · 2d
Local models, inference incantations and pi extensions
🏠 Self-hosted AI · gurupanguji.com · 5d
Show HN: Sipsa Inference – lossless serving at 50% off
⚡ LLM Optimization · sipsalabs.com · 2d · Hacker News
Tracing tokens through Llama 3.1 8B inference on H100s
🤖 LLM · krithik.xyz · 5d · Hacker News
Building Blocks for Foundation Model Training and Inference on AWS
🚀 Model Releases · huggingface.co · 2d
Tiny company steals AMD's thunder and challenges Nvidia with old-tech PCIe AI accelerator that runs 700B LLMs locally, sipping just 240W thanks to decade-old DD...
📊 AI Performance Profiling · techradar.com · 3d · Hacker News
Enabling Performant and Flexible Model-Internal Observability for LLM Inference
⚡ LLM Optimization · arxiv.org · 1d
DigitalOcean Inference Mode Comparison for Each Use Case
💰 Inference Cost · digitalocean.com · 6d
https://www.together.ai/blog/flexgen-high-throughput-generative-inference-of-large-language-models-with-a-single-gpu
🤖 GenAI · together.ai · 1d
Exploring LLM Speed Benchmarks
⚡ LLM Optimization · mlops.community · 1d
Efficient LLM-based Advertising via Model Compression and Parallel Verification
⚡ LLM Optimization · arxiv.org · 1d
https://www.together.ai/blog/flash-decoding-for-long-context-inference
⚡ LLM Optimization · together.ai · 1d
Concepts for Reliability of LLMs in Production
🤖 LLM · mlops.community · 1d
Reformulating the KV Cache Eviction Problem for Long-Context LLM Inference
⚡ LLM Optimization · arxiv.org · 3d
https://www.together.ai/blog/medusa
⚡ LLM Optimization · together.ai · 1d
Page 2 »