🦙 Llama - abdus · Scour

Show HN: Run Llama.cpp In-Process from Java with Project Panama FFM

deemwar-products.github.io··Hacker News

Less-relevant results

NVIDIA Accelerates Google DeepMind’s DiffusionGemma for Local AI

Blog

blogs.nvidia.com·

"AI" Is Eating Platform Monopolist Free Cash Flow, Not the World: CHART OF THE DAY

News Blog

braddelong.substack.com··Substack

Apples to Apples: MLX vs. Llama.cpp for Gemma 4 12B on an M1 16GB

Blog

ziraph.com··Hacker News

WWDC 2026: Foundation Models (& Anarlog)

skushagra.com·

2x GH200 for LLM inference, Part 2: vLLM, DeepSeek V4 Flash, and MTP

Blog

dnhkng.github.io·

Can Open-Source LLM Agents Replace Static Application Security Testing Tools? An Empirical Assessment

Academic

Running LLM Inference on Kubernetes: What It Actually Takes

Blog

fairwinds.com·

Neo-X7/Neo-AI: A fully offline AI assistant powered by Ollama. Stores and retrieves conversations using SQLite + LanceDB vector search. No cloud. No API keys. Runs entirely on your machine.

Code

github.com··DEV

Meta ties up with Ambani's Reliance for AI data center in India

channelnewsasia.com·

Why Shrinking an AI Model Often Makes It More Useful

siliconopera.com·

Intelligent inference scheduling with llm-d on Red Hat AI

developers.redhat.com·

Google Shrank Gemma 4 by 72% and Unsloth Fixed the 4-Bit Bug Nobody Else Caught on One 4090, and 4-Bit Shouldn’t Be This Good

Blog

towardsai.net·

What's in the Box? A Field Guide to AI Models

Blog

iankduncan.com·

On-device AI is a margin decision

Blog

ziraph.com··Hacker News

Running Ollama on a 15W CPU sounded ridiculous until I got it working with decent results

xda-developers.com·

RakuOS fixes the one thing that annoys me most about immutable Linux distros

News

Google's new open model DiffusionGemma generates text from noise instead of word by word

the-decoder.com·

Creating ADK Agent using locally running Gemma 4

Blog

Evaluating Hallucinations in Domain-Adapted Large Language Models

Academic
