Scour
🏎️ TensorRT
Specific
Inference Optimization, Model Deployment, NVIDIA, Quantization
Scoured 104 posts in 6.3 ms
Artain-AI/ignite-ms: Fast self-hosted embedding engine for search, RAG, and reindexing workloads on NVIDIA GPUs. Built in Rust + TensorRT for teams that care about scale, cost, and control.
⚡ ONNX Runtime · github.com · 23h · Hacker News
Need a second pair of eyes, this Qwen3.6 27B quant recipe consistently thinks less and is correct
⚡ ONNX Runtime · huggingface.co · 6d · r/LocalLLaMA
Architecture Dependent Temporal Observability Under Deployment Interference in Edge Inference Systems
⏱️ CUDA Events · arxiv.org · 2d
GPU Memory Math for LLMs: Formula That Tells You What Fits on Your GPU
📈 GPU Occupancy · theahmadosman.substack.com · 19h · Substack, r/LocalLLaMA
12x faster Elasticsearch vector indexing: deploying NVIDIA cuVS with GPU and CPU tiers
📈 GPU Occupancy · elastic.co · 2d
reComputer RK3576/RK3588 Edge AI computers are supported by reComputer AI Lab one-click deployment platform
⚡ ONNX Runtime · cnx-software.com · 3h
Inside the M4 Apple Neural Engine, Part 2: ANE Benchmarks
🎯 Tensor Cores · maderix.substack.com · 3d · Substack
Bolt Challenges Nvidia With a Focus on Cutting-Edge Graphics
🎯 GPU Kernels · spectrum.ieee.org · 4h · Hacker News
PyTorch vs TensorFlow Syntax: 15 Operations Side-by-Side
📜 TorchScript · tildalice.io · 2d
KV Cache and Flash Attention with interactive diagrams
🔲 Loop Tiling · kvcache.cobanov.dev · 21h · Hacker News
Deep Moats and Platform Shifts in Computing
🌊 CUDA Streams · semiconductor.substack.com · 3d · Substack
Running PyTorch Models on Apple Silicon GPUs with the ExecuTorch MLX Delegate
⚡ ONNX Runtime · pytorch.org · 2d · Hacker News
Flipper One Tech Specs
⏱️ Benchmarking · docs.flipper.net · 22h
I tried 4 LLM speedup techniques on CPU. Three made it slower.
📊 Profiling Tools · deemwar-products.github.io · 22h · Hacker News
Initial Benchmarks Of The SpacemiT K3 RVA23 RISC-V CPU With The K3 Pico-ITX
📈 Occupancy Optimization · phoronix.com · 1d · Hacker News
AMD promises to bring improved, hardware-backed FSR 4 upscaling to older Radeon GPUs
🎯 GPU Kernels · arstechnica.com · 6d
Less-relevant results
Singtech's 200 TOPS AI PC Aims to Move Large AI Models Off the Cloud
🔗 NCCL · briefglance.com · 10h
China unveils a CPU-only supercomputer capable of 1.54 exaflops — LineShine LX2 packs a frankly ridiculous 2.4 million Armv9 cores from Huawei
🌊 CUDA Streams · techradar.com · 1d
Forlinx rolls out FET3572-C SoM and OK3572-C board with Rockchip RK3572
🧠 CPU Architecture · linuxgizmos.com · 3d
LLM Inference
🎓 Model Distillation · iop.systems · 14h