🦙 Ollama - minezone

Neo-X7/Neo-AI: A fully offline AI assistant powered by Ollama. Stores and retrieves conversations using SQLite + LanceDB vector search. No cloud. No API keys. Runs entirely on your machine.

💸Affordable LLMs Code

github.com · DEV

Ollama 0.30 delivers faster NVIDIA GPU performance and wider hardware support

🧩LLM Integration

alternativeto.net

Show HN: Run Llama.cpp In-Process from Java with Project Panama FFM

💸Affordable LLMs

deemwar-products.github.io · Hacker News

Qwen 3.6 27B AutoRound GGUF, need your feedback

📉Model Quantization

huggingface.co · r/LocalLLaMA

No Cloud, No Cost: Build an Offline Visual AI Agent with Gemma 4

🧩LLM Integration Blog

dev.to · DEV

martidu4/honey-ai: 🍯 All-in-one AI honeypot powered by local LLMs. SSH, HTTP, FTP, Telnet, SMTP, MySQL, Redis, Git, VNC, RDP — with canary tokens, tarpits, GZIP bombs, and threat intel reporting.

💸Affordable LLMs Code

github.com · Hacker News

Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency

🧩LLM Integration News Blog

blog.google · Hacker News

Fixing a stuck Ollama runner and building a GPU watchdog

🧩LLM Integration

patrickmccanna.net · Hacker News

Unsloth Gemma 4 QAT

🧩LLM Integration

unsloth.ai

Escalate the Model, Not the Conversation

💸Affordable LLMs Blog

dev.to · DEV

defai-digital/ax-engine: Apple Silicon LLM runtime supporting Gemma 4 and Qwen 3.6 MTP modes

📝NLP Code

github.com · Hacker News

Here's a llama.cpp CLI Command builder.

🧩LLM Integration

llamabuilding.com · r/LocalLLaMA

Apples to Apples: MLX vs. Llama.cpp for Gemma 4 12B on an M1 16GB

🧩LLM Integration Blog

ziraph.com · Hacker News

Token4Token — pay-per-token inference on Gnosis + Swarm

🧩LLM Integration

t4t.eth.link · Hacker News

Less-relevant results

Re-quantizing a local LLM 14x faster by skipping the tensors that didn't change

💸Affordable LLMs News Blog

andreaborio.substack.com · Substack

GGUF vs GPTQ vs AWQ: The Plain-English Guide to LLM Quantization (and Which One to Pick)

📉Model Quantization

vettedconsumer.com · Hacker News

KaiFelixBennett/gemma4-turboquant-rdna4: Run Gemma-4-31B at full 256K context on a $1,400 AMD RDNA4 GPU (gfx1201): TurboQuant KV cache + HIP-graph-safe Flash-Attention for llama.cpp, fully measured on real hardware.

🧩LLM Integration Code

github.com · Hacker News

Optimizing Local LLM Inference on Constrained Hardware

Local LLM on a laptop 780M GPU using llama + Gemma 4 QAT

Doubling Qwen3.6-27B on One RTX 3090: ollama llama.cpp + MTP, Lever by Lever (35.7 → 80.2 tok/s)
