Implicit SIMD, Parallel Kernels, Vectorization, Gang Programming

The Story on ISPC (Intel SPMD Program Compiler)
pharr.org·1d·
Discuss: Hacker News
🔀SIMD Programming
End-to-End Transformer Acceleration Through Processing-in-Memory Architectures
arxiv.org·3h
🧠PIM
Pushing the Packed SIMD Extension Over the Line: An Update on the Progress of Key RISC-V Extension
semiwiki.com·1d
📏Picolibc
Phase-space engineering and collective dynamics in memcomputing
link.aps.org·38m
🎴SIMD Shuffles
istmarc/tenseur: C++23 Tensor, neural networks and mathematical library
github.com·13h·
Discuss: r/cpp
⚙️XLA
Field Notes on Scaling MoE Expert Parallelism with DeepEP
nousresearch.com·1d·
🧵Core Scheduling
I Made Zig Compute 33 Million Satellite Positions in 3 Seconds. No GPU Required.
atempleton.dev·1d·
Discuss: Hacker News
🛣️Highway
Polimi chip speeds up computing and drastically reduces energy consumption
polimi.it·27m·
Discuss: Hacker News
🧠PIM
Why AI Needs GPUs and TPUs: The Hardware Behind LLMs
blog.bytebytego.com·2d
Hardware Acceleration
ZOTAC unveils new Ryzen AI MAX+ 395 powered ZBOX MAGNUS mini PC
tweaktown.com·3h
🔢Intel AMX
Scientific Computing in Rust Monthly #14
scientificcomputing.rs·20h
🦀Rust Macros
Computer-on-Modules for an efficient entry into rugged embedded edge AI applications
einpresswire.com·1d
🔌Embedded Systems
AI Systems Performance Engineering
github.com·7h·
Discuss: Hacker News
🧩mimalloc
DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation
machinelearning.apple.com·1d
🌀Naiad
Building a mini PyTorch in C++ from scratch as a high school student...
dev.to·18h·
Discuss: DEV
🧮Vector Databases
Hardware-Aware Reformulation of Convolutions for Efficient Execution on Specialized AI Hardware: A Case Study on NVIDIA Tensor Cores
arxiv.org·1d
🔬Deep Learning
FlashAttention 4: Faster, Memory-Efficient Attention for LLMs
digitalocean.com·20h
🔄Hardware Transactional Memory
**Abstract:** This research proposes a novel approach to dynamic resource allocation within CUDA Streaming Multiprocessors (SMs) to enhance performance and e...
freederia.com·2d
🧩mimalloc
C++ Is An Absolute Blast
learncodethehardway.com·4h
⚙️SWC
SHADOW: Simultaneous Multi-Threading Architecture with Asymmetric Threads
danglingpointers.substack.com·1d·
Discuss: Substack
🧵Lightweight Threads
