Scour
🔲 Loop Tiling
Cache Optimization, Blocking, Matrix Multiplication, Locality
FlashSketch: Sketch-Kernel Co-Design for Fast Sparse Sketching on GPUs
arxiv.org · 1d
🎯 GPU Kernels
A Unified Density Operator View of Flow Control and Merging
arxiv.org · 16h
🔀 Operator Fusion
B+-Tree Structure: How Order Is Maintained at Scale
dev.to · 2d · Discuss: DEV
⚡ CUDA Programming Patterns
Show HN: LocalGPT – A local-first AI assistant in Rust with persistent memory
dev.to · 2d · Discuss: DEV
💡 LSP
Performance Tip of the Week #53: Precise C++ benchmark measurements with Hardware Performance Counters
abseil.io · 2d
📊 Profiling Tools
Performance Tip of the Week #74: Avoid sweeping street lights under rugs
abseil.io · 2d
📊 Profiling Tools
My Most Important C++ Aha! Moments...Ever
artima.com · 2d
🚀 Compiler Optimization
Implementing Auto Tiling With Just 5 Tiles
kyledunbar.dev · 5d · Discuss: Lobsters, Hacker News
📈 GPU Occupancy
Gathering Scattered I/O in C++
artima.com · 2d
📊 Profiling Tools
Understanding LLM Inference Engines: Inside Nano-vLLM (Part 2)
neutree.ai · 4d · Discuss: Hacker News
🔄 ONNX
A general optimization framework for mapping local transition-state networks
nature.com · 4d
🔗 Kernel Fusion
Both GCC and Clang generate strange/inefficient code
codingmarginalia.blogspot.com · 3d · Discuss: Hacker News, Hacker News
🚀 Compiler Optimization
The ‘Super Bowl’ standard: Architecting distributed systems for massive concurrency
infoworld.com · 5d
⏱️ CUDA Events
ggml: backend-agnostic tensor parallelism by JohannesGaessler · Pull Request #19378
github.com · 4d · Discuss: r/LocalLLaMA
🎯 Tensor Cores
ahead-of-time wasm gc in wastrel
wingolog.org · 4d · Discuss: Lobsters, Hacker News
🚀 Compiler Optimization
llOOPy lOOPs (Dave Jarvis)
dave.autonoma.ca · 4d
⚙️ Systems Programming
Clojure’s Persistent Data Structures: Immutability Without the Performance Hit
javacodegeeks.com · 5d
⚡ CUDA Programming Patterns
Virtual AI Memory Chips
pgsgrove.com · 4d
⚡ Flash Attention
Training language models on TPUs shouldn't be scary
dogac.dev · 5d · Discuss: Hacker News
🏎️ TensorRT
Great Power, Great Latency: The Spider-Sense of NUMA Tuning
mydbanotebook.org · 5d
📊 Profiling Tools