⚡ Speculative Decoding
LLM Inference, Token Generation, Draft Models
Speculative Decoding in LLM Inference
ternarysearch.blogspot.com · 5d · Discuss: ternarysearch.blogspot.com · 💬 LLMs
Make Every Draft Count: Hidden State based Speculative Decoding
arxiv.org · 2d · ⚡ Quantization
Multi-token prediction technique triples LLM inference speed without auxiliary draft models
infoworld.com · 4d · 💬 LLMs
Analyzing ReLUfication Limitations: Enhancing LLM Sparsity via Up Projection
hackernoon.com · 1d · 💬 LLMs
Vectorizing the Trie: Efficient Constrained Decoding for LLM-based Generative Retrieval on Accelerators
arxiv.org · 1d · 💬 LLMs
Claude Skills and Subagents: Escaping the Prompt Engineering Hamster Wheel
towardsdatascience.com · 1h · ✍️ Prompt Engineering
"LLMs Out of Context"
lucek.ai · 1d · Discuss: Hacker News · 💬 LLMs
Scaling ML Inference on Databricks: Liquid or Partitioned? Salted or Not?
towardsdatascience.com · 3h · ⚙️ MLOps
Asura: Looped Language Models done better
neel04.github.io · 2d · Discuss: Hacker News · 💬 LLMs
Unsloth Dynamic 2.0 GGUFs
unsloth.ai · 7h · Discuss: Hacker News, r/LocalLLaMA · 🎛️ Fine-tuning
DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference
mesuvash.github.io · 14h · Discuss: Hacker News · ⚡ Hardware Acceleration
Understanding Large Language Models (LLMs)
insightsonindia.com · 2d · 💬 LLMs
Perplexity launches high-performance embedding models
testingcatalog.com · 22h · 📈 Optimization
[Python/Sage] Extended Hidden Number Problem
leetarxiv.substack.com · 1d · Discuss: r/programming · 🔐 Cryptography
Can LLM Embeddings Improve Time Series Forecasting? A Practical Feature Engineering Approach
machinelearningmastery.com · 1d · 💬 LLMs
A Positive Case for Faithfulness: LLM Self-Explanations Help Predict Model Behavior
lesswrong.com · 1d · 🎯 RLHF
The Lie algebra of XY-mixer topologies and warm starting QAOA for constrained optimization
nature.com · 1d · ⚛️ Quantum Computing
☕ AI battle
kill-the-newsletter.com · 4h · 🤖 AI
Show HN: InferShrink – Cut LLM API costs 10x with automatic model routing
pypi.org · 3d · Discuss: Hacker News · ⚙️ MLOps
Frikallo/parakeet.cpp: Ultra fast and portable Parakeet implementation for on-device inference in C++ using Axiom with MPS+Unified Memory and Cuda support
github.com · 1d · Discuss: Hacker News · ⚡ Hardware Acceleration