🔓 Open Source AI - fediversial

🦬Emacs Blog

blog.alexewerlof.com

Fixing a stuck Ollama runner and building a GPU watchdog

🏠Self-Hosting

patrickmccanna.net · Hacker News

Why agentic AI needs an open inference stack

🔌Single-Board Computers

redhat.com

DiffusionGemma: 4x Faster Text Generation

🖥️Retro Computing News Blog

blog.google · Hacker News, r/LocalLLaMA, r/singularity

Running LLM Inference on Kubernetes: What It Actually Takes

🏠Self-Hosting Blog

fairwinds.com

Neo-X7/Neo-AI: A fully offline AI assistant powered by Ollama. Stores and retrieves conversations using SQLite + LanceDB vector search. No cloud. No API keys. Runs entirely on your machine.

🌐Fediverse Code

github.com · DEV

Alignment Collapse Under KV Cache Quantization: Diagnosis and Mitigation

🔌Single-Board Computers Academic

arxiv.org

What's in the Box? A Field Guide to AI Models

🖥️Retro Computing Blog

iankduncan.com

Show HN: Run Llama.cpp In-Process from Java with Project Panama FFM

🦬Emacs

deemwar-products.github.io · Hacker News

On-device AI is a margin decision

🔌Single-Board Computers Blog

ziraph.com · Hacker News

Xiaomi MiMo-V2.5-Pro Just Hit 1,000 Tokens Per Second!

🔌Single-Board Computers

gizchina.com

Unsloth Gemma 4 QAT

🖧BSD

unsloth.ai

NexusOS v2.0 – A zero-dependency pipeline streaming server chaos to Parquet

🌐Fediverse

huggingface.co · Hacker News

AMD's Lemonade SDK For Local AI Adds NVIDIA CUDA Support

🖥️Retro Computing

phoronix.com

Google's new open model DiffusionGemma generates text from noise instead of word by word

Domain-Specific Small Language Models (Manning)

Using Scikit-LLM with Open-Source LLMs

LeLab Is Hugging Face’s New Browser-Based GUI for the LeRobot Ecosystem

147th airhacks tv: Local LLMs, LightMetal, ZSmith Agents, AI Rails, Saving Tokens

Gemma 4 QAT on 10GB Laptop: Local AI with 6.7GB VRAM

Using local LLMs for agentic coding
