✨ Model optimizations in LLMs - pleto · Scour

Running Qwen 35B MoE at 450k Context on a Single 32GB GPU

📊AI Performance Profiling

local-llm.utop.workers.dev · Hacker News

Apple WWDC On-Device AI Deep Dive - Google Docs

🧠Large Language Models (LLMs)

gist.is · Hacker News

2x GH200 for LLM inference, Part 2: vLLM, DeepSeek V4 Flash, and MTP

🔧Systems-level optimizations for LLM serving Blog

dnhkng.github.io

Two old GPUs I salvaged are doing more AI work than a brand new $2000 card, and I won't be upgrading anytime soon

🧠Large Language Models (LLMs)

xda-developers.com

Create Your Own Programming Language with Rust

🧠Large Language Models (LLMs)

createlang.rs · Hacker News

NetX-lab/Frontier: Frontier: A Discrete-Event Simulator for Modern LLM Serving

🔧Systems-level optimizations for LLM serving Code

github.com · Hacker News

HNSW vs LSH: How Elasticsearch hits 0.99 recall@10 at 15,000 QPS — and what it costs

🔍Retrieval-augmented generation Blog

SPEAR: A System for Post-Quantization Error-Adaptive Recovery Enabling Efficient Low-Bit LLM Serving

💬Prompt optimizations for LLM serving Academic

Alduin 4B, an uncensored Vision LLM, just released.

🚀LLM serving frameworks

huggingface.co · r/StableDiffusion

TurboQuant in PostgreSQL

🔍Retrieval-augmented generation Blog

blog.mayflower.de

Google DeepMind releases Gemma 4 QAT, but Unsloth developer Daniel Han warns naive llama.cpp conversions suffer accuracy loss

🚀LLM serving frameworks News

[AINews] FrontierCode: Benchmarking for Code Quality over Slop

🧠Large Language Models (LLMs) News


What's in the Box? A Field Guide to AI Models

🧠Large Language Models (LLMs) Blog

iankduncan.com

Google’s DiffusionGemma is 4x faster than its other Gemma models

🧠Large Language Models (LLMs)

thenewstack.io

A system programmer’s guide to LLM inference

🔧Systems-level optimizations for LLM serving Blog

blog.xiangpeng.systems · Hacker News

Xiaomi MiMo-V2.5-Pro Just Hit 1,000 Tokens Per Second!

🔧Systems-level optimizations for LLM serving

Apples to Apples: MLX vs. Llama.cpp for Gemma 4 12B on an M1 16GB

🚀LLM serving frameworks Blog

ziraph.com · Hacker News

Complexifying the Complex

🤖Agents using LLMs Academic

math.columbia.edu

How One MSAI Student Built an AI Tool to Predict Supply Chain Disruptions

🔢Quantization of LLMs Academic

cs.utexas.edu

Train Models Faster with JAX and MaxText Using NVFP4 on NVIDIA Blackwell

🧠Large Language Models (LLMs) News Blog

developer.nvidia.com
