🤖 AI new technology - Josie · Scour

harshuljain13/llm-inference-at-scale: A Practitioner handbook for production llm serving.

🤖AI Code

github.com · Hacker News, r/LLM

Orchestrate your LLM pipeline. Locally

llmforge.app · Hacker News

UniSVQ: 2-bit Unified Scalar-Vector Quantization

🤖AI Academic

Intelligent inference scheduling with llm-d on Red Hat AI

developers.redhat.com

Qwen 3.6 27B AutoRound GGUF, need your feedback

huggingface.co · r/LocalLLaMA

6. Air-Gapped Claude Code - The Claude Code SRE Handbook

har-ki.github.io · Hacker News

Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency

🤖AI News Blog

blog.google · Hacker News

Two old GPUs I salvaged are doing more AI work than a brand new $2000 card, and I won't be upgrading anytime soon

xda-developers.com

Why LLMs (still) lack taste

beyondtheprior.com · Hacker News

Ask HN: Any Local LLM can I run without GPU for Local Agentic workflow AI?

🤖AI native Discussion

news.ycombinator.com · Hacker News

If LLMs are all persona, whose persona are they?

persona.earthpilot.ai · Hacker News

Google's new open-weights model brings image-generation tricks to AI text generation

🤖AI News

theregister.com

Foundation Models: Apple Isn’t Building an AI Model. It’s Building an AI Platform.

🤖AI Blog

local llm on laptop 780M GPU using llama + gemma 4 qat

🤖AI native Blog

alper.bearblog.dev

CommBench: Can LLMs Write Correct and Efficient GPU Communication Code?

uccl-project.github.io · Hacker News

What's in the Box? A Field Guide to AI Models

🤖AI Blog

iankduncan.com

Making a Vintage LLM from Scratch

crlf.link · Hacker News

Introducing the Third Generation of Apple’s Foundation Models

machinelearning.apple.com · Hacker News, r/apple

Inferoa AI harness claimed 90% cache savings. We ran it and measured 97.8%

zozo123.github.io · Hacker News

Google open-sources speedy DiffusionGemma text diffusion model

siliconangle.com
