🔀 SIMD Programming
Specific
Vectorization, Parallel Computing, CPU Instructions, Performance
Scoured 171518 posts in 27.4 ms
Automatic Vectorization
⚡ SIMD Optimization
en.wikipedia.org · 2d · Hacker News
CUTEv2: Unified and Configurable Matrix Extension for Diverse CPU Architectures with Minimal Design Overhead
⚡ Hardware Acceleration
arxiv.org · 18h
Data Oriented Design by example (2017)
⚡ Cache Optimization
nikitablack.github.io · 4d · Hacker News
Rust and AI: Building the Next Generation of High-Performance Machine Learning Systems
🍱 Nom
levelup.gitconnected.com · 1d
naresh-cn2/Axiom-Turbo-IO: High-performance C-engine for parallel data processing. Parses 10M rows in 0.08s using multithreaded mmap.
🌀 Naiad
github.com · 3d · DEV
Boost Your Spark Jobs: How Photon Accelerates Apache Spark Performance
💡 Photon
dzone.com · 1d
AEG: A Baremetal Framework for AI Acceleration via Direct Hardware Access in Heterogeneous Accelerators
⚡ Hardware Acceleration
arxiv.org · 18h
Interferences within a certifiable design methodology for high-performance multi-core platforms
🧵 OpenMP
arxiv.org · 18h
Leveraging Mathematical Reasoning of LLMs for Efficient GPU Thread Mapping
🎮 SIMT Execution
arxiv.org · 18h
Sparsity-Aware Roofline Models for Sparse Matrix-Matrix Multiplication
🔢 Sparse Matrices
arxiv.org · 5d
SPEED-Bench: A Unified and Diverse Benchmark for Speculative Decoding
📦 Folly
arxiv.org · 18h
Benchmarking Compound AI Applications for Hardware-Software Co-Design
🚀 Performance
arxiv.org · 18h
A Full-Stack Performance Evaluation Infrastructure for 3D-DRAM-based LLM Accelerators
🌊 Memory Bandwidth
arxiv.org · 4d
Tessera: Unlocking Heterogeneous GPUs through Kernel-Granularity Disaggregation
🚀 Milvus
arxiv.org · 18h
WaveTune: Wave-aware Bilinear Modeling for Efficient GPU Kernel Auto-tuning
💡 Photon
arxiv.org · 18h
Wattlytics: A Web Platform for Co-Optimizing Performance, Energy, and TCO in HPC Clusters
🏗️ System Design
arxiv.org · 4d
Technology solutions targeting the performance of gen-AI inference in resource constrained platforms
🧩 mimalloc
arxiv.org · 18h
Making Room for AI: Multi-GPU Molecular Dynamics with Deep Potentials in GROMACS
🚀 Milvus
arxiv.org · 5d
Analyzing Persistent Alltoallv RMA Implementations for High-Performance MPI Communication
⚡ RDMA
arxiv.org · 6d
SepSeq: A Training-Free Framework for Long Numerical Sequence Processing in LLMs
🌀 Naiad
arxiv.org · 4d