Model optimizations in LLMs
Running Qwen 35B MoE at 450k Context on a Single 32GB GPU
📊 AI Performance Profiling
local-llm.utop.workers.dev · 4 days ago · Hacker News
Apple WWDC On-Device AI Deep Dive - Google Docs
🧠 Large Language Models (LLMs)
gist.is · 1 day ago · Hacker News
2x GH200 for LLM inference, Part 2: vLLM, DeepSeek V4 Flash, and MTP
🔧 Systems-level optimizations for LLM serving
Content type: Blog
dnhkng.github.io · 4 days ago
Two old GPUs I salvaged are doing more AI work than a brand new $2000 card, and I won't be upgrading anytime soon
🧠 Large Language Models (LLMs)
xda-developers.com · 10 hours ago
Create Your Own Programming Language with Rust
🧠 Large Language Models (LLMs)
createlang.rs · 2 days ago · Hacker News
NetX-lab/Frontier: Frontier: A Discrete-Event Simulator for Modern LLM Serving
🔧 Systems-level optimizations for LLM serving
Content type: Code
github.com · 20 hours ago · Hacker News
HNSW vs LSH: How Elasticsearch hits 0.99 recall@10 at 15,000 QPS — and what it costs
🔍 Retrieval-augmented generation
Content type: Blog
elastic.co · 3 days ago
SPEAR: A System for Post-Quantization Error-Adaptive Recovery Enabling Efficient Low-Bit LLM Serving
💬 Prompt optimizations for LLM serving
Content type: Academic
arxiv.org · 22 hours ago
Alduin 4B, an uncensored Vision LLm just released.
🚀 LLM serving frameworks
huggingface.co · 1 day ago · r/StableDiffusion
TurboQuant in PostgreSQL
🔍 Retrieval-augmented generation
Content type: Blog
blog.mayflower.de · 18 hours ago
Google DeepMind releases Gemma 4 QAT, but Unsloth developer Daniel Han warns naive llama.cpp conversions suffer accuracy loss
🚀 LLM serving frameworks
Content type: News
digg.com · 6 days ago
[AINews] FrontierCode: Benchmarking for Code Quality over Slop
🧠 Large Language Models (LLMs)
Content type: News
latent.space · 2 days ago
What's in the Box? A Field Guide to AI Models
🧠 Large Language Models (LLMs)
Content type: Blog
iankduncan.com · 3 days ago
Google’s DiffusionGemma is 4x faster than its other Gemma models
🧠 Large Language Models (LLMs)
thenewstack.io · 1 day ago
A system programmer’s guide to LLM inference
🔧 Systems-level optimizations for LLM serving
Content type: Blog
blog.xiangpeng.systems · 4 days ago · Hacker News
Xiaomi MiMo-V2.5-Pro Just Hit 1,000 Tokens Per Second!
🔧 Systems-level optimizations for LLM serving
gizchina.com · 2 days ago
Apples to Apples: MLX vs. Llama.cpp for Gemma 4 12B on an M1 16GB
🚀 LLM serving frameworks
Content type: Blog
ziraph.com · 6 days ago · Hacker News
Complexifying the Complex
🤖 Agents using LLMs
Content type: Academic
math.columbia.edu · 9 hours ago
How One MSAI Student Built an AI Tool to Predict Supply Chain Disruptions
🔢 Quantization of LLMs
Content type: Academic
cs.utexas.edu · 9 hours ago
Train Models Faster with JAX and MaxText Using NVFP4 on NVIDIA Blackwell
🧠 Large Language Models (LLMs)
Content type: News, Blog
developer.nvidia.com · 3 days ago