Parallel Computing
Scoured 452 posts in 28.6 ms
CUDA-Oxide 0.2 Brings Early Improvements To Pure Rust CUDA Kernels
🎨 LUT Compression
phoronix.com · 5 days ago
Concepts in Practice: C++ MPI Bindings for the HPC Ecosystem. From a Standardizable Core to a Composable Interface
🔒 Type Safety
Content type: Academic
arxiv.org · 1 day ago
Latency-Aware, High-Throughput Homomorphic AES Evaluation with CKKS
🔐 Homomorphic Encryption
eprint.iacr.org · 2 days ago
NVIDIA Nsight Compute
🎨 LUT Compression
developer.nvidia.com · 6 days ago
jeffhuen/RustyCSV: High-performance CSV parsing for Elixir. Rust NIF with SIMD acceleration, parallel parsing, and bounded-memory streaming. Drop-in NimbleCSV replacement.
⚡ SIMD Optimization
Content type: Code
github.com · 4 days ago · Hacker News
Elasticsearch simdvec deep-dive: Walking the memory tightrope to 2x better vector throughput
🔍 Search Indexing
Content type: Blog
elastic.co · 5 days ago
Why my SIMD code was silently running as scalar, and what debugging it taught me about production environment assumptions
🚀 SIMD Parsing
Content type: Blog
coloneltoad.substack.com · 6 days ago · Substack
AMD's Lemonade SDK For Local AI Adds NVIDIA CUDA Support
🎨 LUT Compression
phoronix.com · 4 hours ago
Multiversion Concurrency Control for Multiversion B-Trees
🗄️ Database Recovery
Content type: Academic
arxiv.org · 1 day ago
Achieving Cloud-Grade SLOs for Local Mixture-of-Experts Inference through CPU-GPU Hybrid Design
⚡ SIMD Vectorization
Content type: Academic
arxiv.org · 16 hours ago
SWIFT: Shallow and SIMD-Aware CKKS Functional Bootstrapping for Low-Latency
🧮 Compute Optimization
eprint.iacr.org · 6 days ago
KJLdefeated/RL.cu: RLVR training for LLM in CUDA/C++
🎨 LUT Compression
Content type: Code
github.com · 3 days ago · Hacker News
AgentCompile: An LLM-Guided Compiler for Direct CUDA Inference
🎨 LUT Compression
Content type: Academic
arxiv.org · 1 day ago
When More Cores Hurts: The Vector Database Scaling Paradox in HPC
🗂️ Vector Databases
Content type: Academic
arxiv.org · 1 day ago
Structuring agentic AI for HPC code modernization
🔓 Hacking
Content type: Academic
arxiv.org · 1 day ago
Twelve quick tips for designing AI-driven HPC workflows
🎯 Performance Proofs
Content type: Academic
arxiv.org · 2 days ago
APEX4: Efficient Pure W4A4 LLM Inference via Intra-SM Compute Rebalancing
🎨 LUT Compression
Content type: Academic
arxiv.org · 1 day ago
LLM-Based Porting of Optimized C++ to CUDA Through Deoptimization and Reoptimization
🎨 LUT Compression
Content type: Academic
arxiv.org · 5 days ago
SET: Stream-Event-Triggered Scheduling for Efficient CUDA Graph Pipelines
🎨 LUT Compression
Content type: Academic
arxiv.org · 5 days ago
YouZhi: Towards High-Concurrency Financial LLMs via Adaptive GQA-to-MLA Transition
⚡ SMT Integration
Content type: Academic
arxiv.org · 5 days ago