🔲 Loop Tiling
Cache Optimization, Blocking, Matrix Multiplication, Locality
Benchmarking for Single Feature Attribution with Microarchitecture Cliffs
arxiv.org · 13h
🧠 CPU Architecture
DRAMPyML: A Formal Description of DRAM Protocols with Timed Petri Nets
arxiv.org · 1d
⚡ CUDA Programming Patterns
christopherkarani/Wax: 🍯 Memory layer for on-device AI Agents. Replace complex RAG pipelines with a serverless, single-file memory layer.
github.com · 1h · Discuss: Hacker News
⚡ Flash Attention
Breaking the Tractability Barrier: A Generic Low-Level Solver for NP-Hard Instances (N=63) on Commodity 64-Bit Silicon
zenodo.org · 8h · Discuss: Hacker News
🎯 Tensor Cores
Bitsum. Real-time CPU Optimization and Automation
bitsum.com · 1d
📊 Profiling Tools
Beyond Latency and Communication Complexity - A Tutorial on the Pipes Model
decentralizedthoughts.github.io · 13h
🌊 CUDA Streams
BalatroBench Benchmarks Large Language Models Playing Balatro
balatrobench.com · 7h · Discuss: Hacker News
⚡ ONNX Runtime
OpenAI GPT-5.3-Codex-Spark Now Running at 1K Tokens Per Second on BIG Cerebras Chips
servethehome.com · 1h
⚡ Flash Attention
SIEVE: an Efficient Turn-Key Eviction Algorithm for Web Caches
cachemon.github.io · 1d · Discuss: Hacker News
📊 Profiling Tools
Best CPU 2026 – the top AMD Ryzen and Intel Core processors tested
club386.com · 8h
🧠 CPU Architecture
Minimum Energy Per Query
semiengineering.com · 1d
📈 Occupancy Optimization
Zero State Architecture deep dive
news.ycombinator.com · 1d · Discuss: Hacker News
🎯 Tensor Cores
Nvidia’s new technique cuts LLM reasoning costs by 8x without losing accuracy
venturebeat.com · 20h · Discuss: r/LocalLLaMA
🔗 NCCL
Parallel Track Transformers: Enabling Fast GPU Inference with Reduced Synchronization
machinelearning.apple.com · 3d
⏱️ CUDA Events
Allocators from C to Zig
antonz.org · 1d · Discuss: Lobsters, Hacker News, r/C_Programming, r/programming
🧠 CUDA Memory Management
RocksDB 10 and TidesDB 8 Benchmark Analysis on Dedicated Threadripper
tidesdb.com · 19h · Discuss: Hacker News
📊 Profiling Tools
Intel Posts 2026 Update For Cache Aware Scheduling On Linux
phoronix.com · 21h
🧠 CPU Architecture
Our AI Orchestration Frameworks Are Reinventing Linda (1985)
otavio.cat · 5h · Discuss: Hacker News
🤖 AI Coding Tools
Co-Routines in 1-page of C (2013)
embeddedrelated.com · 5h · Discuss: Hacker News
🚀 Compiler Optimization
An introduction to lockless algorithms [LWN.net]
lwn.net · 4d
⚡ CUDA Programming Patterns