🏗️ MLSys
ML Infrastructure, Training, Inference Systems
Google Shrank Gemma 4 by 72% and Unsloth Fixed the 4-Bit Bug Nobody Else Caught on One 4090, and 4-Bit Shouldn’t Be This Good
🤖 Inference · Blog · towardsai.net · 2 days ago
🇳🇱 Go/Golang job: Senior Backend Engineer (Go) | Studio AI at Creative Fabrica (Amsterdam, Netherlands)
☁️ Cloud · golangprojects.com · 4 hours ago
huawei-csl/KVarN: KVarN is a native vLLM KV-cache quantization backend for your agents: 3-5x more context, throughput above FP16, and FP16-level accuracy. Calibration-free, one flag.
🤖 Inference · Code · github.com · 6 days ago · Hacker News
Five labs, five minds: building a multi-model finance drama on small models
🤖 Inference · Blog · huggingface.co · 4 days ago
Intel aims Crescent Island at inference
⚙️ Systems Programming · jonpeddie.com · 6 days ago
APEX4: Efficient Pure W4A4 LLM Inference via Intra-SM Compute Rebalancing
🎮 GPUs · Academic · arxiv.org · 1 day ago
Nvidia enters PC chip market
🎮 GPUs · jonpeddie.com · 6 days ago
RightNow-AI/AutoMegaKernel: An agent harness that compiles a model into one provably-correct, self-retargeting CUDA megakernel and self-tunes it past cuBLAS at batch-1 LLM decode.
🤖 AI · Code · github.com · 2 days ago · Hacker News
TechLetters ☕️ Prompt injection takes Instagram AI bot. Autonomous cyber gets cheap? Red Hat npm worm spreads. AI worm reasons through networks. Gaza data breach...
☁️ Cloud · substackcdn.com · 2 days ago · Substack
GGUF vs GPTQ vs AWQ: The Plain-English Guide to LLM Quantization (and Which One to Pick)
🤖 Inference · vettedconsumer.com · 4 days ago · Hacker News
Breaking free of a single datacenter: Practical geo-distributed AI operations with the k0smos platforms
☁️ Cloud · Blog · cncf.io · 2 days ago
Microsoft distances Surface Laptop Ultra from Copilot+ branding amid AI hardware shift
🔧 Hardware · 4sysops.com · 5 days ago
Google's new open model DiffusionGemma generates text from noise instead of word by word
🧠 LLMs · the-decoder.com · 25 minutes ago
Breaking architecture barriers: Running x86 games and apps on ARM (gpn24)
⚙️ Systems Programming · cdn.media.ccc.de · 3 days ago
Supermicro and Arm advance compute for the agentic AI era
🌐 Distributed Systems · Blog · newsroom.arm.com · 3 hours ago
A system programmer’s guide to LLM inference
🤖 Inference · Blog · blog.xiangpeng.systems · 2 days ago · Hacker News
Using protein language models for pangenome construction
🔤 PLT · Academic · biorxiv.org · 3 days ago
ASUS ExpertBook Ultra Flagship Business Laptop Debuts In SEA Markets, Featuring Sub-1kg Chassis & Intel Core Ultra X7 Processor
🔧 Hardware · pokde.net · 5 hours ago
Computex 2026 – An Epilogue Instead of an Obituary, or How I Learned to At Least Accept AI
🎮 GPUs · igorslab.de · 4 days ago
ASTRA-sim 3.0: Next-Level Distributed Machine Learning Simulations via High-Fidelity GPU and Infrastructure Modeling
🎮 GPUs · Academic · arxiv.org · 15 hours ago