🤖 AI - tionis

🧩lisp Blog

khnsakhnm.medium.com

local llm on laptop 780M GPU using llama + gemma 4 qat

🐧unix Blog

alper.bearblog.dev

What Ollama Reveals About Local AI, Agents, and Open Models

🕸️graphs Blog

odsc.medium.com

147th airhacks tv: Local LLMs, LightMetal, ZSmith Agents, AI Rails, Saving Tokens

🐧unix Blog

adambien.blog

6. Air-Gapped Claude Code - The Claude Code SRE Handbook

🧱data structures

har-ki.github.io · Hacker News

Ask HN: Any Local LLM can I run without GPU for Local Agentic workflow AI?

🧱data structures Discussion

news.ycombinator.com · Hacker News

Neo-X7/Neo-AI: A fully offline AI assistant powered by Ollama. Stores and retrieves conversations using SQLite + LanceDB vector search. No cloud. No API keys. Runs entirely on your machine.

🧱data structures Code

github.com · DEV

Improved performance and model support with GGUF

🕸️graphs Blog

ollama.com

Two old GPUs I salvaged are doing more AI work than a brand new $2000 card, and I won't be upgrading anytime soon

🧱data structures

xda-developers.com

Holding the FP8 Quality Ceiling at 8-Bit Weights and Activations: INT8 and GGUF Post-Training Quantization of Ideogram 4.0 for Consumer GPUs

🧱data structures Academic

arxiv.org

Token4Token — pay-per-token inference on Gnosis + Swarm

🧱data structures

t4t.eth.link · Hacker News

Unsloth Gemma 4 QAT

🧱data structures

unsloth.ai

On-device AI is a margin decision

🧱data structures Blog

ziraph.com · Hacker News

Fixing a stuck Ollama runner and building a GPU watchdog

🐧unix

patrickmccanna.net · Hacker News

There's one AI machine that doesn't need a nuclear power station to run, and it points to a potential way forward in the memory crisis

🧱data structures News

pcgamer.com

Show HN: Run Llama.cpp In-Process from Java with Project Panama FFM

Ollama 0.30 GPU Boost: Faster local Qwen inference on NVIDIA

DiffusionGemma 26B A4B results on my 5090

Orchestrate your LLM pipeline. Locally

MTP Isn't Always a Win: 1.95x on My 3090, but Speculative Decoding Is Hardware-Dependent

A Complete Beginner's Guide to Local LLM Inference
