ML Inference


Ollama's highest performance on Apple Silicon yet with MLX

 Query Engines  Content type: Blog
ollama.com

Running Qwen 35B MoE at 450k Context on a Single 32GB GPU

 🖥️GPU Computing

Real-time fraud detection for financial transactions

 ⚙️ML Systems  Content type: Blog
redis.io

Ultrafast machine learning on FPGAs via Kolmogorov-Arnold Networks

 🧠Deep Learning

How to Run an LLM Locally: Ultimate Guide to Local AI 2026

 ⚙️ML Systems  Content type: Blog

What's in the Box? A Field Guide to AI Models

 ⚙️ML Systems  Content type: Blog
iankduncan.com

4× RTX Pro 6000 Blackwell on Water, and the One Card That Wouldn't Behave

 🖥️GPU Computing  Content type: Blog

MTP Isn't Always a Win: 1.95x on My 3090, but Speculative Decoding Is Hardware-Dependent

 🖥️GPU Computing  Content type: Blog

OpenAI’s IPO Math: $25B Revenue, $27B Burn Rate

 📄Systems Papers  Content type: Blog  Content type: Discussion
tildalice.io

NVIDIA A100 vs RTX 4090 for AI Workloads: The Cost Per FLOP Reality

 🖥️GPU Computing  Content type: Blog
fitservers.com

Two Leaps to 1000 Tokens/s on a 1T-Parameter Model: On Inference Systems, Execution Boundaries, and Co-Design

 ⚙️ML Systems  Content type: Blog

Anthropic apologizes for invisible Claude Fable guardrails

 📄Systems Papers  Content type: News  5 articles covering this post

TWLA: Achieving Ternary Weights and Low-Bit Activations for LLMs via Post-Training Quantization

 ⚙️ML Systems  Content type: Academic
arxiv.org

AI Serving Platform That Adapts to Your Model

 ⚙️ML Systems  Content type: Blog
databricks.com

Apple WWDC On-Device AI Deep Dive

 🧠Deep Learning
gist.is · Hacker News

NVIDIA RTX Pro 6000 Blackwell: 96GB GDDR7 and the End of VRAM Anxiety

 🖥️GPU Computing  Content type: Blog
fitservers.com

Qwen 3.6 27B AutoRound GGUF, need your feedback

 🛠️Compilers

stable-diffusion.cpp/docs/quantization_and_gguf.md at master · leejet/stable-diffusion.cpp

 🛠️Compilers  Content type: Code

CommBench: Can LLMs Write Correct and Efficient GPU Communication Code?

 🖥️GPU Computing