🚀 ML Inference - rishabh · Scour

Ollama's highest performance on Apple Silicon yet with MLX

⚡Query Engines Blog

Running Qwen 35B MoE at 450k Context on a Single 32GB GPU

🖥️GPU Computing

local-llm.utop.workers.dev·Hacker News·Cited by 1 article

Real-time fraud detection for financial transactions

⚙️ML Systems Blog

MiniMaxAI/MiniMax-M3

⚙️ML Systems

huggingface.co·r/LocalLLaMA·Cited by 2 articles

Ultrafast machine learning on FPGAs via Kolmogorov-Arnold Networks

🧠Deep Learning

aarushgupta.io·Lobsters, Hacker News·Cited by 2 articles

How to Run an LLM Locally: Ultimate Guide to Local AI 2026

⚙️ML Systems Blog

cswithsanjay.blogspot.com

What's in the Box? A Field Guide to AI Models

⚙️ML Systems Blog

iankduncan.com

4× RTX Pro 6000 Blackwell on Water, and the One Card That Wouldn't Behave

🖥️GPU Computing Blog

sabareesh.com·Hacker News, r/LocalLLaMA

MTP Isn't Always a Win: 1.95x on My 3090, but Speculative Decoding Is Hardware-Dependent

🖥️GPU Computing Blog

bric.pe.kr·DEV·Cited by 1 article

OpenAI’s IPO Math: $25B Revenue, $27B Burn Rate

📄Systems Papers Blog Discussion

NVIDIA A100 vs RTX 4090 for AI Workloads: The Cost Per FLOP Reality

🖥️GPU Computing Blog

fitservers.com

Two Leaps to 1000 Tokens/s on a 1T-Parameter Model: On Inference Systems, Execution Boundaries, and Co-Design

⚙️ML Systems Blog

tilert.ai·Hacker News·Cited by 2 articles

Anthropic apologizes for invisible Claude Fable guardrails

📄Systems Papers News 5

Hacker News·Cited by 5 articles

TWLA: Achieving Ternary Weights and Low-Bit Activations for LLMs via Post-Training Quantization

⚙️ML Systems Academic

AI Serving Platform That Adapts to Your Model

⚙️ML Systems Blog

databricks.com

Apple WWDC On-Device AI Deep Dive - Google Docs

🧠Deep Learning

gist.is·Hacker News

NVIDIA RTX Pro 6000 Blackwell: 96GB GDDR7 and the End of VRAM Anxiety

🖥️GPU Computing Blog

fitservers.com

Qwen 3.6 27B AutoRound GGUF, need your feedback

🛠️Compilers

huggingface.co·r/LocalLLaMA

stable-diffusion.cpp/docs/quantization_and_gguf.md at master · leejet/stable-diffusion.cpp

🛠️Compilers Code

github.com·r/StableDiffusion

CommBench: Can LLMs Write Correct and Efficient GPU Communication Code?

🖥️GPU Computing

uccl-project.github.io·Hacker News
