🤖 AI - Jayson · Scour

harshuljain13/llm-inference-at-scale: A Practitioner handbook for production llm serving.

☁️Cloud Code

github.com · Hacker News

Why LLMs (still) lack taste

beyondtheprior.com · Hacker News

Trainable Smooth-Rotation Transforms with Learned Channel Scales for LLM Quantization

📊Observability Academic

DESi — Deterministic governance for LLM pipelines

hstre.github.io · Hacker News

LLM Observability: What To Instrument and How To Act on It

⚙️DevOps Blog

Inferoa AI harness claimed 90% cache savings. We ran it and measured 97.8%

📊Observability

zozo123.github.io · Hacker News

LLM Inference Engineering Room — Part 3: The Orchestration Layer

🏗️Platform Engineering Blog

vimal-dwarampudi.medium.com

A system programmer’s guide to LLM inference

🔒Security Blog

blog.xiangpeng.systems · Hacker News

NVIDIA Accelerates Google DeepMind’s DiffusionGemma for Local AI

🔀GitOps Blog

blogs.nvidia.com

Microsoft just shared the frontier data engineering secrets

🏗️Platform Engineering

mail.bycloud.ai

Domain-Specific Small Language Models (Manning)

i-programmer.info

Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency

📊Observability News Blog

blog.google · Hacker News

What's in the Box? A Field Guide to AI Models

🔀GitOps Blog

iankduncan.com

Qwen 3.6 27B AutoRound GGUF, need your feedback

📊Observability

huggingface.co · r/LocalLLaMA

A Fun & Absurd Introduction to Vector Databases • Alexander Chatzizacharias

🔀GitOps Video

youtu.be · r/programming

GGUF vs GPTQ vs AWQ: The Plain-English Guide to LLM Quantization (and Which One to Pick)

vettedconsumer.com · Hacker News

DiffusionGemma: The Developer Guide - Google Developers Blog

🔀GitOps Blog

developers.googleblog.com · r/LocalLLaMA

The AI automation tool nobody talks about just replaced my entire workflow setup

xda-developers.com

Critical Hugging Face Transformers flaw ran attacker code on a routine model load

siliconangle.com

Siri AI at WWDC 2026

📊Observability

simonwillison.net · Hacker News
