MLSys

Google Shrank Gemma 4 by 72% and Unsloth Fixed the 4-Bit Bug Nobody Else Caught on One 4090, and 4-Bit Shouldn’t Be This Good

 🤖Inference  Content type: Blog
towardsai.net

🇳🇱 Go/Golang job: Senior Backend Engineer (Go) | Studio AI at Creative Fabrica (Amsterdam, Netherlands)

 ☁️Cloud
golangprojects.com

huawei-csl/KVarN: KVarN is a native vLLM KV-cache quantization backend for your agents: 3-5x more context, throughput above FP16, and FP16-level accuracy. Calibration-free, one flag.

 🤖Inference  Content type: Code
github.com · Hacker News

Five labs, five minds: building a multi-model finance drama on small models

 🤖Inference  Content type: Blog
huggingface.co

Intel aims Crescent Island at inference

 ⚙️Systems Programming
jonpeddie.com

APEX4: Efficient Pure W4A4 LLM Inference via Intra-SM Compute Rebalancing

 🎮GPUs  Content type: Academic
arxiv.org

Nvidia enters PC chip market

 🎮GPUs
jonpeddie.com

RightNow-AI/AutoMegaKernel: An agent harness that compiles a model into one provably-correct, self-retargeting CUDA megakernel and self-tunes it past cuBLAS at batch-1 LLM decode.

 🤖AI  Content type: Code
github.com · Hacker News

TechLetters ☕️ Prompt injection takes Instagram AI bot. Autonomous cyber gets cheap? Red Hat npm worm spreads. AI worm reasons through networks. Gaza data breach...

 ☁️Cloud
substackcdn.com · Substack

GGUF vs GPTQ vs AWQ: The Plain-English Guide to LLM Quantization (and Which One to Pick)

 🤖Inference

Breaking free of a single datacenter: Practical geo-distributed AI operations with the k0smos platforms

 ☁️Cloud  Content type: Blog
cncf.io

Microsoft distances Surface Laptop Ultra from Copilot+ branding amid AI hardware shift

 🔧Hardware
4sysops.com

Google's new open model DiffusionGemma generates text from noise instead of word by word

 🧠LLMs
the-decoder.com

Breaking architecture barriers: Running x86 games and apps on ARM (gpn24)

 ⚙️Systems Programming
cdn.media.ccc.de

Supermicro and Arm advance compute for the agentic AI era

 🌐Distributed Systems  Content type: Blog
newsroom.arm.com

A system programmer’s guide to LLM inference

 🤖Inference  Content type: Blog

Using protein language models for pangenome construction

 🔤PLT  Content type: Academic
biorxiv.org

ASUS ExpertBook Ultra Flagship Business Laptop Debuts In SEA Markets, Featuring Sub-1kg Chassis & Intel Core Ultra X7 Processor

 🔧Hardware
pokde.net

Computex 2026 – An Epilogue Instead of an Obituary, or How I Learned to At Least Accept AI

 🎮GPUs
igorslab.de

ASTRA-sim 3.0: Next-Level Distributed Machine Learning Simulations via High-Fidelity GPU and Infrastructure Modeling

 🎮GPUs  Content type: Academic
arxiv.org