🧠 LLMs
Specific
large language models, GPT, Claude, inference
Scoured 939 posts in 14.2 ms
harshuljain13/llm-inference-at-scale: A Practitioner handbook for production llm serving.
🤖 AI Engineering · Content type: Code · github.com · 6 days ago · Hacker News, r/LLM
Intelligent inference scheduling with llm-d on Red Hat AI
📐 System Design · developers.redhat.com · 1 day ago
General-purpose large language models outperform specialized clinical AI tools on medical benchmarks
🤖 AI Engineering · Content type: Academic · nature.com · 19 hours ago
Multi-Bitwidth Quantization for LLMs Using Additive Codebooks
🔍 RAG · Content type: Academic · arxiv.org · 15 hours ago
147th airhacks tv: Local LLMs, LightMetal, ZSmith Agents, AI Rails, Saving Tokens
🖥️ Backend Development · Content type: Blog · adambien.blog · 2 days ago
Unsloth Minimax M3 GGUF
🤖 AI Engineering · huggingface.co · 4 hours ago · r/LocalLLaMA
A system programmer’s guide to LLM inference
🔍 RAG · Content type: Blog · blog.xiangpeng.systems · 4 days ago · Hacker News
Introducing LLM as a Judge: Scaling search relevance evaluation with AI
🔍 RAG · Content type: Blog · opensearch.org · 21 hours ago
Tokenization Consulting in the USA: The Ultimate Guide to RWA Compliance
🖥️ Backend Development · ziuma.com · 1 day ago
How to Run an LLM Locally: Ultimate Guide to Local AI 2026
🤖 AI Engineering · Content type: Blog · cswithsanjay.blogspot.com · 16 hours ago
Implications of Continual Learning for LLM Agents: Introduction
🛡️ AI Safety · lesswrong.com · 1 hour ago
What Are Tokens in LLMs?
🔍 RAG · Content type: Blog · bearisland.dev · 5 days ago · Hacker News
Making a Vintage LLM from Scratch
🤖 AI Engineering · crlf.link · 1 day ago · Hacker News
WhatLLM.org: Compare LLMs by Benchmarks, Price & Speed
🤝 AI Agents · Content type: Discussion, Reference · whatllm.org · 13 hours ago
Inferoa AI harness claimed 90% cache savings. We ran it and measured 97.8%
🤖 AI Engineering · zozo123.github.io · 2 days ago · Hacker News
2x GH200 for LLM inference, Part 2: vLLM, DeepSeek V4 Flash, and MTP
🤖 AI Engineering · Content type: Blog · dnhkng.github.io · 4 days ago
I ran local LLMs on my phone for a month, and now my desktop setup feels like overkill
🤖 AI Engineering · xda-developers.com · 8 hours ago
Context windows in AI: why every token is a budget decision
🖥️ Backend Development · Content type: Blog · redis.io · 2 days ago
How LLMs are Actually Trained
🔍 RAG · Content type: News, Blog · blog.algomaster.io · 1 day ago
Friday Five — June 12, 2026
🤖 AI Engineering · redhat.com · 19 hours ago