🤖 LLMs
large language models, GPT, Claude, AI models
Scoured 309 posts in 7.5 ms
harshuljain13/llm-inference-at-scale: A Practitioner handbook for production llm serving.
🤖 AI · Content type: Code · github.com · 5 days ago · Hacker News, r/LLM
Comprehensive evaluation of LLM capabilities for interpretation and analysis of genome-scale metabolic models in metabolic engineering
🤖 AI · Content type: Academic · biorxiv.org · 2 days ago
Fine-tuning Multi-modal LLMs with ART: Art-based Reinforcement Training
🤖 AI · Content type: Academic · arxiv.org · 12 hours ago
Intelligent inference scheduling with llm-d on Red Hat AI
🤖 AI · developers.redhat.com · 16 hours ago
Why LLMs (still) lack taste
⚙️ DevOps · beyondtheprior.com · 2 days ago · Hacker News
Why Your LLM Gets Dumber With More Context
🤖 AI · siliconopera.com · 1 hour ago
Claude vs GPT-4: Which AI API Is Better for Developers? (2026)
💻 Software Engineering · kalyna.pro · 6 days ago · DEV
Ollama 0.30 GPU Boost: Faster local Qwen inference on NVIDIA
🔌 APIs · everylocalai.com · 19 hours ago · DEV
Timing Trick Cuts Energy Used in LLM Training by Up to 14 Percent
🏃 Running · Content type: News · spectrum.ieee.org · 1 day ago · Hacker News
What Ollama Reveals About Local AI, Agents, and Open Models
🤖 AI · Content type: Blog · odsc.medium.com · 17 hours ago
MCP Architecture Explained for Beginners: Why AI Needs a Structured Communication System
🤖 AI · Content type: Blog · medium.com · 4 hours ago
The smartest ChatGPT users are putting local AI in front of it — here's why
🤖 AI · tomsguide.com · 5 days ago
Fixing a stuck Ollama runner and building a GPU watchdog
⚙ System programming · patrickmccanna.net · 2 days ago · Hacker News
CommBench: Can LLMs Write Correct and Efficient GPU Communication Code?
🤖 AI · uccl-project.github.io · 9 hours ago · Hacker News
Built and launched a research-reading and highlighting tool with Claude over a few months. Here are the things AI was surprisingly good (and bad) at.
🤖 AI · highlyt.app · 2 days ago · r/ClaudeAI
AMD's Lemonade SDK For Local AI Adds NVIDIA CUDA Support
🤖 AI · phoronix.com · 23 hours ago · r/artificial
Running Ollama on a 15W CPU sounded ridiculous until I got it working with decent results
⚙ System programming · xda-developers.com · 1 week ago
MTP Isn't Always a Win: 1.95x on My 3090, but Speculative Decoding Is Hardware-Dependent
🇬🇧 London Tech · Content type: Blog · bric.pe.kr · 2 days ago · DEV
Improved performance and model support with GGUF
🤖 AI · Content type: Blog · ollama.com · 6 days ago
Inferoa AI harness claimed 90% cache savings. We ran it and measured 97.8%
🤖 AI · zozo123.github.io · 1 day ago · Hacker News