🔲 Loop Tiling
Cache Optimization, Blocking, Matrix Multiplication, Locality
FCDP: Fully Cached Data Parallel for Communication-Avoiding Large-Scale Training
arxiv.org · 1d
🔗 NCCL
Rethinking Code Complexity Through the Lens of Large Language Models
arxiv.org · 15h
🚀 Compiler Optimization
Performance Tip of the Week #60: In-process profiling: lessons learned
abseil.io · 2d
📊 Profiling Tools
Operations on a B+-Tree: How the Search Works
dev.to · 1d · Discuss: DEV
🔍 Type Checkers
Performance Tip of the Week #93: Robots never sleep
abseil.io · 2d
🏗️ Build Optimization
Designing a Drift-Resistant Memory System for LLMs
dev.to · 4d · Discuss: DEV
⚡ CUDA Programming Patterns
Tensor‑Network Path‑Integral Algorithm for Efficient Simulation of Discrete 3‑D Quantum Gravity and its Application to Cosmological Data. Abstract: We intr...
freederia.com · 4d
🏎️ TensorRT
Tensor‑Network Compression of Affine Kac‑Moody Vertex Operator Algebras for Scalable Conformal Field Theory Computations. Abstract: Affine Kac‑...
freederia.com · 3d
✂️ CUTLASS
24 bit multifx vs 20 bit
thegearpage.net · 6d
📈 Occupancy Optimization
Sampling the Oxford CS Library
blog.computationalcomplexity.org · 6d · Discuss: blog.computationalcomplexity.org
🔬 Static Analysis
How I Structure My Data Pipelines: The Silver Layer
loglevelinfo.substack.com · 6d · Discuss: Substack
🐕 Ruff
The Limit in the Loop
weaviate.io · 6d · Discuss: Hacker News
📊 Gradient Accumulation
I run local LLMs daily, but I'll never trust them for these tasks
xda-developers.com · 4d
⚡ ONNX Runtime
Computer Memory: Part I, The Fundamentals | by Tom Herbert | Feb, 2026
medium.com · 6d
⚙️ Systems Programming
Optimized LLM Inference Engines
rishirajacharya.com · 6d
⚡ ONNX Runtime
I outperformed Enterprise Engines by 225,000x on a $50 CPU. Here is the data
news.ycombinator.com · 5d · Discuss: Hacker News
📈 GPU Occupancy
AMD Zen 6: More cores, more cache, hardly any more surface area
igorslab.de · 6d
⏱️ CUDA Events
Intel attacks the workstation segment with Xeon 600 featuring up to 86 cores and a new platform
igorslab.de · 6d
🧠 CPU Architecture
adrianbrad/queue: ⏪️ Go package providing multiple queue implementations. Developed in a thread-safe generic way.
github.com · 6d
🦀 PyO3
Why Move To 2nm?
semiengineering.com · 6d
🎛️ CUDA Optimization