🔀 SIMD Programming
Vectorization, Parallel Computing, CPU Instructions, Performance
Beating Python’s GIL: Achieving a 130x Speedup in Batch Processing with Rust and Rayon
🦀 MIR Optimization
medium.com · 1d
Iteratively optimizing an SPSC queue
🎯 Ring Buffers
blog.c21-mac.com · 3d · r/cpp
Metal Quantized Attention: pulling M5 Max ahead with Int8 matrix multiplication
🗺️ Region Inference
releases.drawthings.ai · 21h · Hacker News
facebookincubator/dispenso: The project provides high-performance concurrency, enabling highly parallel computation.
⏱️ Async Runtimes
github.com · 10h · Hacker News
APL Performance
🔗 Linear Lisp
aplwiki.com · 2d · Hacker News
MXFP8 GEMM: Up to 99% of cuBLAS Performance Using CUDA and PTX
🔬 Nanopasses
danielvegamyhre.github.io · 4d · Hacker News
Building CompilerSutra
🎓 Teaching Compilers
docs.google.com · 10h · DEV
Accelerate CPU-based AI inference workloads using Intel AMX on Amazon EC2
🗺️ Region Inference
aws.amazon.com · 2d
Intel Delivers Open, Scalable AI Performance in MLPerf Inference v6.0
🗺️ Region Inference
newsroom.intel.com · 23h
Intel Binary Optimization Tool Changes Code Execution with Heavy Vectorization
🎯 CPU Dispatch
techpowerup.com · 1d
Why I’m Building a Database Engine in C#
🗃️ Query Compilation
nockawa.github.io · 5d · Hacker News
'Performance without compromise': AMD debuts first dual 3D V-Cache Ryzen CPU in potential showdown against Threadripper and EPYC siblings
🎯 CPU Dispatch
techradar.com · 1d
MinIO AIStor and Ampere® Computing Reference Architecture for High-Performance AI Inference
🏰 Capability Machines
dzone.com · 6d
Building a Production-Grade Vector Database in Rust: What We Shipped
🚂 Cranelift Backend
ferres.io · 1d · DEV
Finding performance bottlenecks with Pyroscope and Alloy: An example using TON blockchain
🔗 Hash Algorithms
grafana.com · 2d
JetStream 3: A modern benchmark for high-performance, compute-intensive Web applications
⚡ Performance
blog.chromium.org · 1d · Hacker News, Blogger
m0at/rvllm: rvLLM: High-performance LLM inference in Rust. Drop-in vLLM replacement.
🦀 MIR Optimization
github.com · 4d · Hacker News
abdimoallim/psimd: A portable, header-only SIMD library for C (SSE2, SSE4.1, AVX/AVX2+FMA, NEON/AArch64, WebAssembly SIMD128, scalar fallback)
🔍 Peephole Optimization
github.com · 1d · r/C_Programming
yash27-lab/batch_forge: A high-performance, bare-metal inference engine for JAX and Equinox models written in Rust. Features zero-copy Safetensors loading and hand-optimized Metal/Vulkan compute kernels for Transformers, Vision Language Models, and State-Space Models
🗺️ Region Inference
github.com · 3d · Hacker News
[Benchmark] 740k QPS Single-thread / 1.45M Dual-thread on a VM. Encountering fluctuations and seeking expert analysis.
🌐 WASM Runtimes
github.com · 1d · r/java