🏗️ MLSys - kelvinyu1117 · Scour

Google Shrank Gemma 4 by 72% and Unsloth Fixed the 4-Bit Bug Nobody Else Caught on One 4090, and 4-Bit Shouldn’t Be This Good

🤖Inference Blog

towardsai.net

🇳🇱 Go/Golang job: Senior Backend Engineer (Go) | Studio AI at Creative Fabrica (Amsterdam, Netherlands)

golangprojects.com

huawei-csl/KVarN: KVarN is a native vLLM KV-cache quantization backend for your agents: 3-5x more context, throughput above FP16, and FP16-level accuracy. Calibration-free, one flag.

🤖Inference Code

github.com · Hacker News

Five labs, five minds: building a multi-model finance drama on small models

🤖Inference Blog

huggingface.co

Intel aims Crescent Island at inference

⚙️Systems Programming

jonpeddie.com

APEX4: Efficient Pure W4A4 LLM Inference via Intra-SM Compute Rebalancing

🎮GPUs Academic

Nvidia enters PC chip market

jonpeddie.com

RightNow-AI/AutoMegaKernel: An agent harness that compiles a model into one provably-correct, self-retargeting CUDA megakernel and self-tunes it past cuBLAS at batch-1 LLM decode.

🤖AI Code

github.com · Hacker News

TechLetters ☕️ Prompt injection takes Instagram AI bot. Autonomous cyber gets cheap? Red Hat npm worm spreads. AI worm reasons through networks. Gaza data breach...

substackcdn.com · Substack

GGUF vs GPTQ vs AWQ: The Plain-English Guide to LLM Quantization (and Which One to Pick)

vettedconsumer.com · Hacker News

Breaking free of a single datacenter: Practical geo-distributed AI operations with the k0smos platforms

☁️Cloud Blog

Microsoft distances Surface Laptop Ultra from Copilot+ branding amid AI hardware shift

Google's new open model DiffusionGemma generates text from noise instead of word by word

the-decoder.com

Breaking architecture barriers: Running x86 games and apps on ARM (gpn24)

⚙️Systems Programming

cdn.media.ccc.de

Supermicro and Arm advance compute for the agentic AI era

🌐Distributed Systems Blog

newsroom.arm.com

A system programmer’s guide to LLM inference

🤖Inference Blog

blog.xiangpeng.systems · Hacker News

Using protein language models for pangenome construction

🔤PLT Academic

ASUS ExpertBook Ultra Flagship Business Laptop Debuts In SEA Markets, Featuring Sub-1kg Chassis & Intel Core Ultra X7 Processor

Computex 2026 – An Epilogue Instead of an Obituary, or How I Learned to At Least Accept AI

ASTRA-sim 3.0: Next-Level Distributed Machine Learning Simulations via High-Fidelity GPU and Infrastructure Modeling

🎮GPUs Academic
