🧠 LLMs
Specific: large language models, GPT, prompt engineering, inference
Scoured 3145 posts in 11.5 ms

harshuljain13/llm-inference-at-scale: A Practitioner handbook for production llm serving.
🍎 Apple · Content type: Code · github.com · 4 days ago · Hacker News

Ollama 0.30 GPU Boost: Faster local Qwen inference on NVIDIA
🖥️ Retro Computing · everylocalai.com · 8 hours ago · DEV

The Shibboleth Effect: Auditing the Cross-Lingual Distributional Skew of Large Language Models
🤨 AI Criticism · Content type: Academic · arxiv.org · 1 day ago

What Ollama Reveals About Local AI, Agents, and Open Models
🤨 AI Criticism · Content type: Blog · odsc.medium.com · 6 hours ago

lightmetal: GPU LLM Inference From a Single Java 25 JAR
🍎 Apple · Content type: Blog · adambien.blog · 1 day ago

Using Scikit-LLM with Open-Source LLMs
🐍 Python · machinelearningmastery.com · 6 days ago

Fine-tuning Large Language Models (LLMs) using PEFT
🤨 AI Criticism · Content type: Blog · medium.com · 3 hours ago

Inferoa AI harness claimed 90% cache savings. We ran it and measured 97.8%
⚙️ Systems Programming · zozo123.github.io · 18 hours ago · Hacker News

Why LLMs (still) lack taste
🤨 AI Criticism · beyondtheprior.com · 2 days ago · Hacker News

Running LLM Inference on Kubernetes: What It Actually Takes
🤨 AI Criticism · Content type: Blog · fairwinds.com · 5 days ago

LLM Routing: From Strategy Selection to Production Architecture
🕸️ Networking · Content type: Blog · blog.n8n.io · 13 hours ago

Fixing a stuck Ollama runner and building a GPU watchdog
🦀 Rust · patrickmccanna.net · 2 days ago · Hacker News

How to Build a Deterministic RAG Testing Tool — and Use LLM as an Advisor, Not a Judge
🤨 AI Criticism · Content type: Blog · medium.com · 2 hours ago

I've tested so many desktop AI tools, but Hermes with Ollama is my new favorite - here's why
🍎 Apple · Content type: News, Tutorial · zdnet.com · 14 hours ago

RAG Pipeline Explained: From Query to Answer, Step by Step
🖥️ Retro Computing · Content type: Blog · medium.com · 2 days ago

How we fight GPU scarcity without compromise
🔐 Cybersecurity · Content type: Blog · equixly.com · 5 days ago · Hacker News

LLMs Are Brilliant. But They Can Be Fooled.
🤨 AI Criticism · Content type: Blog · medium.com · 19 hours ago

LangChain Explained: Understanding Models, Prompts, Chains, Memory, Indexes, and Agents
🤨 AI Criticism · Content type: Blog · towardsai.net · 2 days ago

Improved performance and model support with GGUF
🍎 Apple · Content type: Blog · ollama.com · 6 days ago

Timing Trick Cuts Energy Used in LLM Training by Up to 14 Percent
⚙️ Systems Programming · Content type: News · spectrum.ieee.org · 17 hours ago · Hacker News