💸 Affordable LLMs - minezone

🦙Ollama Blog

alper.bearblog.dev

Open-LLM-VTuber Review: Offline AI Companion with Live2D

🦙Ollama Blog

dev.to · DEV

Neo-X7/Neo-AI: A fully offline AI assistant powered by Ollama. Stores and retrieves conversations using SQLite + LanceDB vector search. No cloud. No API keys. Runs entirely on your machine.

🦙Ollama Code

github.com · DEV

Ollama 0.30 delivers faster NVIDIA GPU performance and wider hardware support

🦙Ollama

alternativeto.net

Show HN: Run Llama.cpp In-Process from Java with Project Panama FFM

🦙Ollama

deemwar-products.github.io · Hacker News

Fixing a stuck Ollama runner and building a GPU watchdog

🦙Ollama

patrickmccanna.net · Hacker News

martidu4/honey-ai: 🍯 All-in-one AI honeypot powered by local LLMs. SSH, HTTP, FTP, Telnet, SMTP, MySQL, Redis, Git, VNC, RDP — with canary tokens, tarpits, GZIP bombs, and threat intel reporting.

🦙Ollama Code

github.com · Hacker News

Doubling Qwen3.6-27B on One RTX 3090: ollama llama.cpp + MTP, Lever by Lever (35.7 → 80.2 tok/s)

🦙Ollama Blog

dev.to · DEV

Large companies can add a local LLM filter layer to considerably reduce their AI costs

📝NLP

umrashrf.github.io · Hacker News

Less-relevant results

Re-quantizing a local LLM 14x faster by skipping the tensors that didn't change

🦙Ollama News Blog

andreaborio.substack.com · Substack

NexusOS v2.0 – A zero-dependency pipeline streaming server chaos to Parquet

🧩LLM Integration

huggingface.co · Hacker News

LLM Inference Engineering Room — Part 3: The Orchestration Layer

🧩LLM Integration Blog

vimal-dwarampudi.medium.com

Escalate the Model, Not the Conversation

🦙Ollama Blog

dev.to · DEV

zhongkaifu/TensorSharp: A C# inference engine for running large language models (LLMs) locally using GGUF model files. TensorSharp provides a console application, a web-based chatbot interface, and Ollama/OpenAI-compatible HTTP APIs for programmatic access. It supports Windows/MacOS/Linux with full GPU capability

🦙Ollama Code

github.com · Hacker News

A system programmer’s guide to LLM inference

🦙Ollama Blog

blog.xiangpeng.systems · Hacker News

How I benchmarked a 100% local RAG pipeline to 9/9 (zero API keys)

🗂️Vector Databases

buy.polar.sh · DEV

Show HN: Ext-Infer

🦙Ollama

infer.displace.tech · Hacker News

Token4Token — pay-per-token inference on Gnosis + Swarm

🦙Ollama

t4t.eth.link · Hacker News

I Benchmarked 3 Local LLMs on My Laptop — Here's What the Numbers Actually Show

🦙Ollama Blog

dev.to · DEV

Optimizing Local LLM Inference on Constrained Hardware

local llm on laptop 780M GPU using llama + gemma 4 qat
