🏠 Local AI - bigkevuk

💻AI Coding News Blog

braddelong.substack.com · Substack

Making Local LLM Go Brrr

✍️Prompt Engineering

seanpedersen.github.io

lightmetal: GPU LLM Inference From a Single Java 25 JAR

💾ARM Blog

adambien.blog

Neo-X7/Neo-AI: A fully offline AI assistant powered by Ollama. Stores and retrieves conversations using SQLite + LanceDB vector search. No cloud. No API keys. Runs entirely on your machine.

⚙️AI Automation Code

github.com · DEV

LM Studio now lets you use your iPhone to talk to local models on your Mac

⌚Wearables

9to5mac.com · r/apple

Integrate on-device AI models into your app using Core AI - WWDC26 - Videos

🌐Open Source

developer.apple.com · Hacker News

Purpose-built local AI agents

🤖AI Agents Blog

samihonkonen.com · Hacker News

Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency

💾ARM News Blog

blog.google · Hacker News

NexusOS v2.0 – A zero-dependency pipeline streaming server chaos to Parquet

🔌APIs

huggingface.co · Hacker News

Large companies can add a local LLM filter layer to considerably reduce their AI costs

⚙️LLM Fine-tuning

umrashrf.github.io · Hacker News

Building & Benchmarking: LLMs on a 16GB Jetson Orin NX for Hermes Agent

🍓Raspberry Pi Blog

dnhkng.github.io

Quality Is Not a Safety Proxy Under Quantization

🛡️AI Safety Academic

arxiv.org

When AI builds itself 👷, AI is not a line item 📝, local LLMs for agentic coding 🤖

🤖AI Agents

tldr.tech

Apple rebuilt its on-device AI stack at WWDC 2026

💾ARM Blog

ziraph.com · Hacker News

WWDC 2026: Foundation Models (& Anarlog)

💾ARM

skushagra.com

Running LLM Inference on Kubernetes: What It Actually Takes

☁️Cloud Infrastructure Blog

fairwinds.com

What's in the Box? A Field Guide to AI Models

⚙️LLM Fine-tuning Blog

iankduncan.com

Show HN: Ext-Infer

🪟Windows

infer.displace.tech · Hacker News

Token4Token — pay-per-token inference on Gnosis + Swarm

GGUF vs GPTQ vs AWQ: The Plain-English Guide to LLM Quantization (and Which One to Pick)

"AI" Is Eating Platform Monopolist Free Cash Flow, Not the World: CHART OF THE DAY
