⚡ Tokenizer Optimization
Specific
SIMD Processing, State Machines, Unicode Handling, Performance
Scoured 159568 posts in 17.6 ms
Accelerate CPU-based AI inference workloads using Intel AMX on Amazon EC2
🗺️ Region Inference · aws.amazon.com · 2d
Speculative Decoding: Performance or Illusion?
🗺️ Region Inference · specdecode-bench.github.io · 5d · Hacker News
Metal Quantized Attention: pulling M5 Max ahead with Int8 matrix multiplication
🗺️ Region Inference · releases.drawthings.ai · 21h · Hacker News
OmniVoice, high-quality TTS for 600+ Languages
🔄 Incremental Lexing · zhu-han.github.io · 2h · Hacker News
Beating Python’s GIL: Achieving a 130x Speedup in Batch Processing with Rust and Rayon
🦀 MIR Optimization · medium.com · 1d
Building CompilerSutra
🎓 Teaching Compilers · docs.google.com · 10h · DEV
Context Rot: How Increasing Input Tokens Impacts LLM Performance
🔍 Tokenizers · trychroma.com · 6d · DEV
General scales unlock AI evaluation with explanatory and predictive power
🪜 Recursive Descent · nature.com · 22h
APL Performance
🔀 SIMD Programming · aplwiki.com · 2d · Hacker News
Intel Delivers Open, Scalable AI Performance in MLPerf Inference v6.0
🗺️ Region Inference · newsroom.intel.com · 23h
How we chose Positron’s Python type checker
✅ Type Checking · positron.posit.co · 1d · Hacker News
yash27-lab/batch_forge: A high-performance, bare-metal inference engine for JAX and Equinox models written in Rust. Features zero-copy Safetensors loading and hand-optimized Metal/Vulkan compute kernels for Transformers, Vision Language Models, and State-Space Models
🗺️ Region Inference · github.com · 3d · Hacker News
Donald Raab: Measuring the Startup Memory Cost for Lazy Iteration Patterns in Java
🗑️ Garbage Collection · donraab.medium.com · 2d
Iteratively optimizing an SPSC queue
🎯 Ring Buffers · blog.c21-mac.com · 3d · r/cpp
MXFP8 GEMM: Up to 99% of cuBLAS Performance Using CUDA and PTX
🔬 Nanopasses · danielvegamyhre.github.io · 4d · Hacker News
Scaling AI Workloads in Java Without Breaking Your APIs
⚡ Interpreter Optimization · dzone.com · 5d
Discord Engineers Add Distributed Tracing to Elixir's Actor Model Without Performance Penalty
✨ Gleam · infoq.com · 5d
Systematic Analysis of CPU-Induced Slowdowns in Multi-GPU LLM Inference (Georgia Tech)
🗺️ Region Inference · semiengineering.com · 5d
MinIO AIStor and Ampere® Computing Reference Architecture for High-Performance AI Inference
🏰 Capability Machines · dzone.com · 6d
Designing High-Concurrency Databricks Workloads Without Performance Degradation
🗑️ Concurrent GC · dzone.com · 5d