🧠 LLM Inference
Quantization, Attention Mechanisms, Batch Processing, KV Caching
Accuracy vs. Speed in Local LLMs: Finding Your Sweet Spot
grigio.org · 17h · Discuss: Hacker News · 🏗️ LLM Infrastructure
Compress the Easy, Explore the Hard: Difficulty-Aware Entropy Regularization for Efficient LLM Reasoning
arxiv.org · 1d · 📱 Edge AI Optimization
On language models and intuition
aleksei.dev · 8h · 🔤 Tokenization
Fast Autoscheduling for Sparse ML Frameworks
fredrikbk.com · 2h · Discuss: Hacker News · 🕯️ Candle
Multi-token prediction technique triples LLM inference speed without auxiliary draft models
infoworld.com · 4d · 🏗️ LLM Infrastructure
InnerQ: Hardware-aware Tuning-free Quantization of KV Cache for Large Language Models
arxiv.org · 1d · 🔬 RaBitQ
brendanhogan/base-model-agents
github.com · 17h · 🧮 SMT Solvers
A Positive Case for Faithfulness: LLM Self-Explanations Help Predict Model Behavior
lesswrong.com · 2d · 🏆 LLM Benchmarking
Asura: Looped Language Models done better
neel04.github.io · 2d · Discuss: Hacker News · 📦 Batch Embeddings
Context Window Optimization: Why Ranking, Not Stuffing, Is the Scaling Law for Agents
shaped.ai · 2d · 🧠 Agent Memory
Beyond Porting: How vLLM Orchestrates High-Performance Inference on AMD ROCm
blog.vllm.ai · 2d · 🏗️ LLM Infrastructure
Physical echo state network based on the nonlinearity and dynamic response of ambipolar heterostructure transistors
nature.com · 6h · ⚡ Hardware Acceleration
NVIDIA Taught LLMs to Forget
pub.towardsai.net · 3d · 🏗️ LLM Infrastructure
Adaptive drafter model uses downtime to double LLM training speed
techxplore.com · 2d · 🏗️ LLM Infrastructure
PaddlePaddle/Paddle: PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (the PaddlePaddle core framework: high-performance single-machine and distributed training, and cross-platform deployment, for deep learning & machine learning)
github.com · 1h · 🏗️ LLM Infrastructure
Finding the subjective truth - Collecting 2M Votes for GenAI model evaluation
rapidata.ai · 3h · 🎖 Text Quality Models
An FPGA-based Accelerator Addressing Bottlenecks in GNN Preprocessing (KAIST et al.)
semiengineering.com · 2d · ⚡ Hardware Acceleration
How Large Language Models Learn
blog.bytebytego.com · 5d · 🏆 LLM Benchmarking
Perplexity launches high-performance embedding models
testingcatalog.com · 1d · 🔮 pplx-embed-v1
MiniMax M2 & Agent: Ingenious in Simplicity
minimax.io · 1d · 🏗️ LLM Infrastructure