Scour
💬 LLMs
Specific
large language models, GPT, foundation models, inference
Scoured 622 posts in 7.2 ms
Running LLM Inference on Kubernetes: What It Actually Takes
☁️ Cloud Infrastructure · Content type: Blog · fairwinds.com · 6 days ago
massimo92/spark: CLI tool for serving LLMs with vLLM on NVIDIA DGX Spark. One file, zero friction.
🖥️ Hypervisors · Content type: Code · github.com · 48 minutes ago · Hacker News
The Neutral Mask: How RLHF Provides Shallow Alignment while Leaving Partisan Structure Intact in a Large Language Model
🤖 AI/ML · Content type: Academic · arxiv.org · 2 days ago
Ollama 0.30 GPU Boost: Faster local Qwen inference on NVIDIA
🚀 MLOps · everylocalai.com · 23 hours ago · DEV
Comprehensive evaluation of LLM capabilities for interpretation and analysis of genome-scale metabolic models in metabolic engineering
🚀 MLOps · Content type: Academic · biorxiv.org · 2 days ago
Why Your LLM Gets Dumber With More Context
🤖 AI/ML · siliconopera.com · 5 hours ago
Intelligent inference scheduling with llm-d on Red Hat AI
☁️ Cloud Infrastructure · developers.redhat.com · 20 hours ago
Claude vs GPT-4: Which AI API Is Better for Developers? (2026)
🧠 Agentic AI · kalyna.pro · 6 days ago · DEV
Inferoa AI harness claimed 90% cache savings. We ran it and measured 97.8%
📡 Observability · zozo123.github.io · 1 day ago · Hacker News
What Ollama Reveals About Local AI, Agents, and Open Models
🔍 AI Observability · Content type: Blog · odsc.medium.com · 21 hours ago
Prompt Caching Explained: The AI Concept That Can Save Millions of Tokens
🔍 AI Observability · Content type: Blog · sweta-nit.medium.com · 11 hours ago
My Notes on the Progression from Context to Prompt to Harness engineering in making GPT LLMs Useful: (TUESDAY) MAMLMs
🧠 Agentic AI · Content type: News, Blog · braddelong.substack.com · 2 days ago · Substack
Introducing the Third Generation of Apple’s Foundation Models
🤖 AI/ML · machinelearning.apple.com · 3 days ago · Hacker News, r/apple
AI context windows: Why context quality beats context size
🚀 MLOps · Content type: Blog · redis.io · 22 hours ago
DeepSeekV4 1.6T Day 0 to Day 43 Performance Over Time - Huawei, GB300 NVL72, MI355X, B200
🤖 AI/ML · Content type: News · newsletter.semianalysis.com · 2 days ago · Hacker News
Show HN: In-browser real LLM token counter and cost estimation
🚀 MLOps · holaclaw.ai · 4 hours ago · Hacker News
WWDC 2026: Foundation Models (& Anarlog)
🏗️ Platform Engineering · skushagra.com · 2 days ago
Improved performance and model support with GGUF
🤖 AI/ML · Content type: Blog · ollama.com · 6 days ago
CommBench: Can LLMs Write Correct and Efficient GPU Communication Code?
🚀 MLOps · uccl-project.github.io · 13 hours ago · Hacker News
Build RAG-powered AI solutions at the edge with AWS Local Zones and Outposts
☁️ Cloud Infrastructure · Content type: Blog · aws.amazon.com · 3 hours ago