Scour
⚡ Performance
Broad
performance engineering
Scoured 181 posts in 4.5 ms
G.Skill explains how AMD EXPO ULL unlocks additional performance — expanded profiles allow memory makers to include subtiming tweaks for the first time
🧠 Memory Allocators
Content type: News
tomshardware.com · 4 days ago
Records in Production: Where They Shine and Where They Silently Fail
🧠 Memory Management
javacodegeeks.com · 10 hours ago
Intel is turning the wrong clock: The Core Ultra 7 265K shows why Arrow Lake loses more at NGU than D2D can recover
🧠 CPU Architecture
igorslab.de · 1 day ago
Apple WWDC On-Device AI Deep Dive - Google Docs
🤖 AI Agents
gist.is · 3 hours ago · Hacker News
HFT Latency Monitoring with Probabilistic Calling Context
⚙️ Compilers
hftuniversity.com · 1 day ago · Hacker News
ARTA: Adaptive Reinforcement-Learning-Based Throttling Agent for RowHammer Vulnerabilities
⏱️ Tokio
Content type: Academic
arxiv.org · 21 hours ago
Elasticsearch simdvec deep-dive: Walking the memory tightrope to 2x better vector throughput
🧠 CPU Architecture
Content type: Blog
elastic.co · 5 days ago
NVIDIA Accelerates Google DeepMind’s DiffusionGemma for Local AI
🤖 AI Agents
Content type: Blog
blogs.nvidia.com · 9 hours ago
Two Leaps to 1000 Tokens/s on a 1T-Parameter Model: On Inference Systems, Execution Boundaries, and Co-Design
🤖 AI Agents
Content type: Blog
tilert.ai · 2 days ago · Hacker News
Now available: Amazon EC2 M9g and M9gd instances powered by new AWS Graviton5 processors
🤖 AI Agents
Content type: Blog
aws.amazon.com · 10 hours ago · Hacker News
MLPerf and the rise of latency-aware LLM benchmarking
🧠 AI Research
edn.com · 5 days ago
Building & Benchmarking: LLMs on a 16GB Jetson Orin NX for Hermes Agent
📱 Edge AI
Content type: Blog
dnhkng.github.io · 2 days ago
The Inference Alpha: Maximizing Frontier Models on AMD
📱 Edge Computing
Content type: Blog
digitalocean.com · 11 hours ago
Why your database benchmarking data is probably wrong (and how I fixed mine)
⚙️ Database Internals
developers.redhat.com · 6 days ago
bigattichouse/packed-twin-inference: PTI achieves ~2× throughput using a single quantized model (Q5_K_M or better) by running 4 generation streams in one batched decode call. The GPU loads model weights once per step and produces 4 predictions simultaneously. KV cache overhead is ~0.8 GiB total for all 4 streams. No draft model. No quality loss
🧠 Memory Allocators
Content type: Code
github.com · 2 days ago · r/LocalLLaMA
SanDisk's massive 8TB SD cards are finally close to launch
🔐 Hardware Security
Content type: News
techspot.com · 12 hours ago
Tried to benchmark Google's new on-device dictation model and basically couldn't
📱 Edge AI
getonit.ai · 4 hours ago · Hacker News
Benchmarking OpenZFS vs EXT4 for my NAS | Heitor's log
🏠 Self-Hosting
heitorpb.github.io · 3 days ago
Massive AI Storage Demand Creates a New Memory Wall
📱 Edge AI
Content type: News
eetimes.com · 11 hours ago
Why My Windows Benchmarks Were Lying — CPU Pinning, Power Caps, and What Variance Actually Tells You
🐧 Linux
Content type: News, Blog
coloneltoad.substack.com · 2 days ago · Substack