💸 Affordable LLMs
Low-cost model APIs, token optimization, local alternatives
Scoured 201 posts in 12.7 ms
Optimizing Local LLM Inference on Constrained Hardware
🦙 Ollama · pub.towardsai.net · 6 hours ago
local llm on laptop 780M GPU using llama + gemma 4 qat
🦙 Ollama · Content type: Blog · alper.bearblog.dev · 4 days ago
Open-LLM-VTuber Review: Offline AI Companion with Live2D
🦙 Ollama · Content type: Blog · dev.to · 2 days ago · DEV
Neo-X7/Neo-AI: A fully offline AI assistant powered by Ollama. Stores and retrieves conversations using SQLite + LanceDB vector search. No cloud. No API keys. Runs entirely on your machine.
🦙 Ollama · Content type: Code · github.com · 8 hours ago · DEV
Ollama 0.30 delivers faster NVIDIA GPU performance and wider hardware support
🦙 Ollama · alternativeto.net · 2 days ago
Show HN: Run Llama.cpp In-Process from Java with Project Panama FFM
🦙 Ollama · deemwar-products.github.io · 5 days ago · Hacker News
Fixing a stuck Ollama runner and building a GPU watchdog
🦙 Ollama · patrickmccanna.net · 2 days ago · Hacker News
martidu4/honey-ai: 🍯 All-in-one AI honeypot powered by local LLMs. SSH, HTTP, FTP, Telnet, SMTP, MySQL, Redis, Git, VNC, RDP — with canary tokens, tarpits, GZIP bombs, and threat intel reporting.
🦙 Ollama · Content type: Code · github.com · 5 hours ago · Hacker News
Doubling Qwen3.6-27B on One RTX 3090: ollama → llama.cpp + MTP, Lever by Lever (35.7 → 80.2 tok/s)
🦙 Ollama · Content type: Blog · dev.to · 1 day ago · DEV
Large companies can add a local LLM filter layer to considerably reduce their AI costs
📝 NLP · umrashrf.github.io · 5 days ago · Hacker News
Less-relevant results
Re-quantizing a local LLM 14x faster by skipping the tensors that didn't change
🦙 Ollama · Content type: News, Blog · andreaborio.substack.com · 8 hours ago · Substack
NexusOS v2.0 – A zero-dependency pipeline streaming server chaos to Parquet
🧩 LLM Integration · huggingface.co · 2 days ago · Hacker News
LLM Inference Engineering Room — Part 3: The Orchestration Layer
🧩 LLM Integration · Content type: Blog · vimal-dwarampudi.medium.com · 1 week ago
Escalate the Model, Not the Conversation
🦙 Ollama · Content type: Blog · dev.to · 1 hour ago · DEV
zhongkaifu/TensorSharp: A C# inference engine for running large language models (LLMs) locally using GGUF model files. TensorSharp provides a console application, a web-based chatbot interface, and Ollama/OpenAI-compatible HTTP APIs for programmatic access. It supports Windows/MacOS/Linux with full GPU capability
🦙 Ollama · Content type: Code · github.com · 6 days ago · Hacker News
A system programmer’s guide to LLM inference
🦙 Ollama · Content type: Blog · blog.xiangpeng.systems · 2 days ago · Hacker News
How I benchmarked a 100% local RAG pipeline to 9/9 (zero API keys)
🗂️ Vector Databases · buy.polar.sh · 2 days ago · DEV
Show HN: Ext-Infer
🦙 Ollama · infer.displace.tech · 3 days ago · Hacker News
Token4Token — pay-per-token inference on Gnosis + Swarm
🦙 Ollama · t4t.eth.link · 1 day ago · Hacker News
I Benchmarked 3 Local LLMs on My Laptop — Here's What the Numbers Actually Show
🦙 Ollama · Content type: Blog · dev.to · 5 days ago · DEV