Ollama

Ollama Local LLM Server, Ollama Model Management, Ollama API Server and Inference Engine

Ollama 0.30 GPU Boost: Faster local Qwen inference on NVIDIA

 SIMD Optimization
everylocalai.com··DEV

Unsloth Minimax M3 GGUF

 Ruff
huggingface.co··r/LocalLLaMA

feat(parallel): add free Parallel Search MCP as the zero-config defau… · openclaw/openclaw@983b65b

 Ruff  Content type: Code
github.com·

How to Run an LLM Locally: Ultimate Guide to Local AI 2026

 🔧LLVM  Content type: Blog

From Chatbot Hallucinations to Deterministic Agents: Forcing Local LLMs to Run Production-Grade…

 🎲Procedural Generation  Content type: Blog
medium.com·

lightmetal: GPU LLM Inference From a Single Java 25 JAR

 🔧LLVM  Content type: Blog
adambien.blog·

6. Air-Gapped Claude Code - The Claude Code SRE Handbook

 ☸️Kubernetes

Ollama's highest performance on Apple Silicon yet with MLX

 ⚙️Performance Profiling  Content type: Blog
ollama.com·

GGUF vs GPTQ vs AWQ: The Plain-English Guide to LLM Quantization (and Which One to Pick)

 🏗Computer Architecture

Orchestrate your LLM pipeline. Locally

 🔧LLVM

Ollama 0.30 delivers faster NVIDIA GPU performance and wider hardware support

 🐧Linux Kernel
alternativeto.net·

What Ollama Reveals About Local AI, Agents, and Open Models

 🎲Procedural Generation  Content type: Blog
odsc.medium.com·

local llm on laptop 780M GPU using llama + gemma 4 qat

 🔧LLVM  Content type: Blog
alper.bearblog.dev·

Ask HN: What's the best LLM model that on a 24 GB VRAM GPU?

 🧠Memory Models  Content type: Discussion

A system programmer’s guide to LLM inference

 🐫OCaml  Content type: Blog

WhatLLM.org: Compare LLMs by Benchmarks, Price & Speed

 🗄️Database Sharding  Content type: Discussion  Content type: Reference
whatllm.org·

Fixing a stuck Ollama runner and building a GPU watchdog

 🤖Automation

fix: resolve managed secretref provider auth (#92235) · openclaw/openclaw@9386d62

 📦uv  Content type: Code  Content type: Release
github.com·

Clairvoyant: Predictive SJF Scheduling to Mitigate Head-of-Line Blocking in Serial LLM Backends

 🚀Performance Engineering  Content type: Academic
arxiv.org·

Gemma 4 QAT on 10GB Laptop: Local AI with 6.7GB VRAM

 🖥️Emulation
everylocalai.com··DEV
