ML Inference
Specific
model inference, inference optimization, TensorRT, ONNX
Scoured 315 posts in 7.3 ms
Ollama's highest performance on Apple Silicon yet with MLX
⚡ Query Engines · Content type: Blog · ollama.com · 2 days ago
Running Qwen 35B MoE at 450k Context on a Single 32GB GPU
🖥️ GPU Computing · local-llm.utop.workers.dev · 6 days ago · Hacker News · Cited by 1 article
Real-time fraud detection for financial transactions
⚙️ ML Systems · Content type: Blog · redis.io · 2 days ago
MiniMaxAI/MiniMax-M3
⚙️ ML Systems · huggingface.co · 2 days ago · r/LocalLLaMA · Cited by 2 articles
Ultrafast machine learning on FPGAs via Kolmogorov-Arnold Networks
🧠 Deep Learning · aarushgupta.io · 4 days ago · Lobsters, Hacker News · Cited by 2 articles
How to Run an LLM Locally: Ultimate Guide to Local AI 2026
⚙️ ML Systems · Content type: Blog · cswithsanjay.blogspot.com · 1 day ago
What's in the Box? A Field Guide to AI Models
⚙️ ML Systems · Content type: Blog · iankduncan.com · 4 days ago
4× RTX Pro 6000 Blackwell on Water, and the One Card That Wouldn't Behave
🖥️ GPU Computing · Content type: Blog · sabareesh.com · 1 day ago · Hacker News, r/LocalLLaMA
MTP Isn't Always a Win: 1.95x on My 3090, but Speculative Decoding Is Hardware-Dependent
🖥️ GPU Computing · Content type: Blog · bric.pe.kr · 4 days ago · DEV · Cited by 1 article
OpenAI’s IPO Math: $25B Revenue, $27B Burn Rate
📄 Systems Papers · Content type: Blog, Discussion · tildalice.io · 1 day ago
NVIDIA A100 vs RTX 4090 for AI Workloads: The Cost Per FLOP Reality
🖥️ GPU Computing · Content type: Blog · fitservers.com · 4 days ago
Two Leaps to 1000 Tokens/s on a 1T-Parameter Model: On Inference Systems, Execution Boundaries, and Co-Design
⚙️ ML Systems · Content type: Blog · tilert.ai · 5 days ago · Hacker News · Cited by 2 articles
Anthropic apologizes for invisible Claude Fable guardrails
📄 Systems Papers · Content type: News · 5 articles covering this post · theverge.com · 2 days ago · Hacker News · Cited by 5 articles
TWLA: Achieving Ternary Weights and Low-Bit Activations for LLMs via Post-Training Quantization
⚙️ ML Systems · Content type: Academic · arxiv.org · 1 day ago
AI Serving Platform That Adapts to Your Model
⚙️ ML Systems · Content type: Blog · databricks.com · 3 days ago
Apple WWDC On-Device AI Deep Dive - Google Docs
🧠 Deep Learning · gist.is · 2 days ago · Hacker News
NVIDIA RTX Pro 6000 Blackwell: 96GB GDDR7 and the End of VRAM Anxiety
🖥️ GPU Computing · Content type: Blog · fitservers.com · 4 days ago
Qwen 3.6 27B AutoRound GGUF, need your feedback
🛠️ Compilers · huggingface.co · 4 days ago · r/LocalLLaMA
stable-diffusion.cpp/docs/quantization_and_gguf.md at master · leejet/stable-diffusion.cpp
🛠️ Compilers · Content type: Code · github.com · 6 days ago · r/StableDiffusion
CommBench: Can LLMs Write Correct and Efficient GPU Communication Code?
🖥️ GPU Computing · uccl-project.github.io · 2 days ago · Hacker News