📱 Edge AI
Specific
Model Quantization, ONNX Runtime, Embedded Inference, TinyML
Scoured 70 posts in 21.6 ms
Running Gemma 4 26B on GKE with a Single L4 GPU
🦭
Podman
dev.to
·
2d
·
DEV
What's in a GGUF, besides the weights - and what's still missing?
💸
Affordable LLMs
nobodywho.ooo
·
6d
·
Hacker News, r/LocalLLaMA
Artain-AI/ignite-ms: Fast self-hosted embedding engine for search, RAG, and reindexing workloads on NVIDIA GPUs. Built in Rust + TensorRT for teams that care about scale, cost, and control.
🚀
Performance
github.com
·
9h
·
Hacker News
InferenceBench: A Benchmark for Open-Ended Inference Optimization by AI Agents
📉
Model Quantization
inferencebench.ai
·
3h
·
Hacker News
KV Cache and Flash Attention with interactive diagrams
⚡
Cache Optimization
kvcache.cobanov.dev
·
7h
·
Hacker News
LLM Inference
💬
Prompt Engineering
iop.systems
·
18m
Quantization From First Principles: Build Your Own INT8 Inference Engine
📉
Model Quantization
medium.com
·
5d
SuperInfer: SLO-Aware Rotary Scheduling and Memory Management for LLM Inference on Superchips
⚡
Cache Optimization
supercomputing-system-ai-lab.github.io
·
2d
·
Hacker News
Embedding 685 million texts in 32 minutes
📉
Model Quantization
dev.to
·
10h
·
DEV
xxxn3m3s1sxxx/ATLAS-TQ1_0: TQ1.0 ternary inference engine for BitNet b1.58 on CPU. Pack + run Falcon3-1B/3B/7B/10B, no GPU needed.
📉
Model Quantization
github.com
·
2d
·
Hacker News
GPU Memory Math for LLMs: Formula That Tells You What Fits on Your GPU
⚡
Cache Optimization
theahmadosman.substack.com
·
6h
·
Substack, r/LocalLLaMA
MoE expert co-activations: Reordering inputs yields easy throughput gains.
🔧
DSPy
blog.doubleword.ai
·
4h
·
Hacker News
3x Faster Video Inference Without Touching the Model
📉
Model Quantization
pub.towardsai.net
·
2d
The Central Bank of Intelligence: Navigating the Token Economy
💸
Affordable LLMs
dev.to
·
6d
·
DEV
Software 3.0
💬
Prompt Engineering
dsebastien.net
·
2d
Show HN: GPT-2 inference in pure C#, 0 bytes allocated per token
📉
Model Quantization
github.com
·
3d
·
Hacker News
TensorRT `trt.Dims` SIGSEGV inside a GStreamer Python plugin
🔎
Static Analysis
dev.to
·
17h
·
DEV
CPritch/shiftpaper: Parallax wallpaper for Wayland with depth estimation, written in Rust + WGSL
📸
Visual Regression Testing
github.com
·
2d
·
Hacker News
I Open-Sourced a Browser-Based AI Background Remover — Here's the Full Architecture
🖼️
Lazy Loading
dev.to
·
23h
·
DEV
AlexRosito67/xyron-mnist-esp32: Implementation of Xyron (Neural network CLI tool in C++ with configurable layers and activation functions)
📉
Model Quantization
github.com
·
20h
·
DEV