Scour — 🏠 Local LLM Deployment
Model Optimization, GPU Acceleration, Inference, Privacy
Scoured 46,674 posts in 24.0 ms
Unraveling GPU Inference Costs for Fine-tuned Open-source Models vs. Closed Platforms
💰 Inference Cost · mlops.community · 1d
Small Model Forensics
⚡ LLM Optimization · blog.0xmmo.co · 1h · Hacker News
https://www.together.ai/blog/accelerate-inference-large-scale-workloads
⚡ LLM Optimization · together.ai · 23h
Tracing tokens through Llama 3.1 8B inference on H100s
🤖 LLM · krithik.xyz · 5d · Hacker News
Gemma 4: The Next Frontier in Open-Source AI for Developers
🤖 GenAI · dev.to · 6h · DEV
Show HN: Sipsa Inference – lossless serving at 50% off
⚡ LLM Optimization · sipsalabs.com · 2d · Hacker News
Understanding KV Cache in LLMs and How It Affects Inference
⚡ LLM Optimization · pub.towardsai.net · 5d
Building Blocks for Foundation Model Training and Inference on AWS
🚀 Model Releases · huggingface.co · 2d
What Inference-Platform Benchmark Posts Leave Out
📊 AI Performance Profiling · dev.to · 19h · DEV
In-Kernel Broadcast Optimization: Co-Designing Kernels for RecSys Inference
⚡ LLM Optimization · pytorch.org · 1d · Hacker News
Tiny company steals AMD's thunder and challenges Nvidia with old-tech PCIe AI accelerator that runs 700B LLMs locally, sipping just 240W thanks to decade-old DD...
📊 AI Performance Profiling · techradar.com · 3d · Hacker News
Company behind GLiNER model releases open-source model for running LLM guardrails
🤖 LLM · pioneer.ai · 1d · Hacker News
Your GPU Is Lying to You About Its Capacity
🤖 AI News · hackernoon.com · 3d
The Inference Shift
💰 Inference Cost · stratechery.com · 2d · Hacker News
Local LLMs vs. Cloud AI APIs: Which One Should Developers Use For Real Projects?
🏆 LLM Benchmarking · dev.to · 2d · DEV
https://www.together.ai/blog/flexgen-high-throughput-generative-inference-of-large-language-models-with-a-single-gpu
🤖 GenAI · together.ai · 23h
Building a Fully Offline AI Coding Assistant with Gemma 4
💻 Codex · dev.to · 6d · DEV
OpenModels: Explore LLM Models and Inference Providers
🔌 MCP · dev.to · 2d · DEV
Physics-based adaptation slashes edge LLM energy
⚡ LLM Optimization · dev.to · 6d · DEV
Exploring LLMs Speed Benchmarks
⚡ LLM Optimization · mlops.community · 1d