Scour
🔲 Loop Tiling
Cache Optimization, Blocking, Matrix Multiplication, Locality
Predicting Future Utility: Global Combinatorial Optimization for Task-Agnostic KV Cache Eviction
arxiv.org
·
15h
🧠
CPU Architecture
Series-Parallel-Loop Decompositions of Control-flow Graphs
arxiv.org
·
15h
🔀
Operator Fusion
An introduction to lockless algorithms [LWN.net]
lwn.net
·
1d
⚡
CUDA Programming Patterns
Parallel Track Transformers: Enabling Fast GPU Inference with Reduced Synchronization
machinelearning.apple.com
·
20h
⏱️
CUDA Events
Performance Tip of the Week #62: Identifying and reducing memory bandwidth needs
abseil.io
·
2d
📊
Profiling Tools
A Note on Flat Abstract Syntax Trees
gist.github.com
·
1d
·
Discuss:
Hacker News
🔬
Static Analysis
Faster AI Training Unlocked With New System For Massive Language Models
quantumzeitgeist.com
·
1d
🎯
Tensor Cores
Hitting 1,000 tokens per second on a single RTX 5090
blog.alpindale.net
·
1d
·
Discuss:
Hacker News
🎛️
CUDA Optimization
AFMTJ Model For In-Memory Computing (University of Arizona)
semiengineering.com
·
3h
⚡
CUDA Programming Patterns
Concurrency Deep Dive: Memory Models, Lock-Free, and RCU
dev.to
·
3d
·
Discuss:
DEV
⚡
CUDA Programming Patterns
SectorC: a C compiler in 512 bytes
blog.adafruit.com
·
22h
🚀
Compiler Optimization
LocalGPT: A local AI assistant with persistent memory in a single binary
localgpt.app
·
1d
·
Discuss:
Hacker News
⚡
ONNX Runtime
The Prospero Challenge
mattkeeter.com
·
1d
✂️
CUTLASS
Rust Memory Management: The Playroom Analogy
adacore.com
·
6h
·
Discuss:
Hacker News
✂️
CUTLASS
Passing the Torch: Reflections on ARC’s Journey and the Future of Specialized Processing
eetimes.com
·
12h
🔧
PTX
In-depth Analysis of Banker's Rounding Algorithm in C# Math.Round and Its Applications
devgex.com
·
3h
✂️
CUTLASS
tzcnt/TooManyCooks: C++20 concurrency framework with no compromises. Excellent performance, powerful features, and simple syntax.
github.com
·
1d
⚡
CUDA Programming Patterns
Unlocking core memories with GoldSrc engine and CS 1.6 (2025)
danielbrendel.com
·
2d
·
Discuss:
Hacker News
✂️
CUTLASS
The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, Second Edition
research.google
·
3h
·
Discuss:
Hacker News
🌐
Distributed Computing
Main Content || Math ∩ Programming
jeremykun.com
·
1d
📉
Model Quantization