💰 Inference Cost
GPU cost, inference pricing, cost per token, LLM economics
Scoured 320 posts in 5.5 ms
Google Shrank Gemma 4 by 72% and Unsloth Fixed the 4-Bit Bug Nobody Else Caught on One 4090, and 4-Bit Shouldn’t Be This Good
🧠 Inference Engineering · Content type: Blog · towardsai.net · 2 days ago
Optimal Post-Training Quantization Scales and Where to Find Them
🗜️ Quantization · Content type: Academic · arxiv.org · 16 hours ago
MoQ GGUFs and GSQ: Low-Bit GGUFs Are About to Get Much Better
🗜️ Quantization · Content type: News, Blog · kaitchup.substack.com · 4 days ago · r/LocalLLaMA
Two Leaps to 1000 Tokens/s on a 1T-Parameter Model: On Inference Systems, Execution Boundaries, and Co-Design
🧠 Inference Engineering · Content type: Blog · tilert.ai · 2 days ago · Hacker News
Domain-Specific Small Language Models (Manning)
⚙️ ML Compilers · i-programmer.info · 5 hours ago
Apple rebuilt its on-device AI stack at WWDC 2026
🔢 GEMM Optimization · Content type: Blog · ziraph.com · 1 day ago · Hacker News
Where to Host Your Open-Source Model (Under 10B Parameters)
🧠 Inference Engineering · digitalocean.com · 6 days ago
Researchers Build Self-Replicating AI Worm That Operates Entirely on Local, Open-Weight Models
🐝 eBPF · thehackernews.com · 1 day ago
Unsloth Gemma 4 QAT
🗜️ Quantization · unsloth.ai · 5 days ago
ASUS ExpertBook Ultra Flagship Business Laptop Debuts In SEA Markets, Featuring Sub-1kg Chassis & Intel Core Ultra X7 Processor
🧠 Inference Engineering · pokde.net · 5 hours ago
stable-diffusion.cpp/docs/quantization_and_gguf.md at master · leejet/stable-diffusion.cpp
🗜️ Quantization · Content type: Code · github.com · 3 days ago · r/StableDiffusion
Azure OpenAI Architecture: The Decisions That Actually Matter (Part 1)
☁️ Cloud Infrastructure · techcommunity.microsoft.com · 2 days ago
Local LLMs, Buy a GPU, and the Case for Cognitive Security
🎮 GPU Computing · briefing.forwardfuture.ai · 6 days ago
Ask HN: Is software engineering still a good career choice for new students?
🧠 Inference Engineering · Content type: Discussion · news.ycombinator.com · 21 hours ago · Hacker News
Why agentic AI needs an open inference stack
⚙️ MLOps · redhat.com · 2 days ago
LC-QAT: Data-Efficient 2-Bit QAT for LLMs via Linear-Constrained Vector Quantization
🧠 Inference Engineering · Content type: Academic · arxiv.org · 16 hours ago
TFLite Edge Model Quantizer Snippet
🔢 FP8 Training · itsevilduck.gumroad.com · 2 days ago · DEV
MLPerf and the rise of latency-aware LLM benchmarking
⏱️ Prefill Decoding · edn.com · 5 days ago
Re-quantizing a local LLM 14x faster by skipping the tensors that didn't change
🗜️ Quantization · Content type: News, Blog · andreaborio.substack.com · 7 hours ago · Substack
3x Faster Search: Parallel Test-Time Scaling with Instructed-Retriever-1
⏱️ Prefill Decoding · Content type: Blog · databricks.com · 6 days ago