Scour
⏩ SIMD · Specific
Vectorization, Parallel Processing, Performance, CPU Instructions

Scoured 121,651 posts in 27.2 ms

The Parallel Lanes Nobody Uses
🛣️ Highway · dev.to · 2d · DEV

Accelerate CPU-based AI inference workloads using Intel AMX on Amazon EC2
🔢 Intel AMX · aws.amazon.com · 3d

Supercharging Redpanda Streaming with profile-guided optimization
🚀 Performance · redpanda.com · 23h

Intel Binary Optimization Tool Changes Code Execution with Heavy Vectorization
📊 Profiling Tools · techpowerup.com · 2d

Show HN: PyNear – exact and approximate KNN, faster than Faiss
🎯 Qdrant · news.ycombinator.com · 4d · Hacker News

CppCon 25 Matrix Multiplication Deep Dive || Cache Blocking, SIMD & Parallelization -- Aliaksei Sala
🔀 SIMD Programming · isocpp.org · 2d

Metal Quantized Attention: pulling M5 Max ahead with Int8 matrix multiplication
⚡ Hardware Acceleration · releases.drawthings.ai · 1d · Hacker News

APL Performance
🔀 SIMD Programming · aplwiki.com · 3d · Hacker News

abdimoallim/psimd: A portable, header-only SIMD library for C (SSE2, SSE4.1, AVX/AVX2+FMA, NEON/AArch64, WebAssembly SIMD128, scalar fallback)
🔢 AVX-512 · github.com · 1d · r/C_Programming

MXFP8 GEMM: Up to 99% of cuBLAS Performance Using CUDA and PTX
🧩 mimalloc · danielvegamyhre.github.io · 4d · Hacker News

TX-Digital Twin: Visualizing Supercomputer GPU Performance Data Stream
📈 TAU · arxiv.org · 2d

'Performance without compromise': AMD debuts first dual 3D V-Cache Ryzen CPU in potential showdown against Threadripper and EPYC siblings
⚡ Hardware Acceleration · techradar.com · 2d

Simdxml for Python: a faster ElementTree you don't have to rewrite for
🎨 ART Trees · cigrainger.com · 6d · Hacker News

Geekbench investigates up to 30% jump with Intel's iBOT - performance gain attributed to newly-vectorized instructions
⚙️ CPU Microarchitecture · tomshardware.com · 2d

Iteratively optimizing an SPSC queue
⭕ Ring Buffers · blog.c21-mac.com · 4d · r/cpp

Building a Free AI Image Generator on 7 GPUs: Architecture Deep Dive
🎮 WebGPU · dev.to · 11h · DEV

Why I’m Building a Database Engine in C#
🔨 Incremental Compilation · nockawa.github.io · 5d · Hacker News

CPUBone: Efficient Vision Backbone Design for Devices with Low Parallelization Capabilities
🎯 Intel IPP · arxiv.org · 3d

Performance & Recursion
🌳 Instruction Selection · dev.to · 4d · DEV

GCC vs Clang: Same Instructions, Different Performance (AGU Insight)
🔗 GCC Link-Time Optimization · dev.to · 6d · DEV