Inference

Ultrafast machine learning on FPGAs via Kolmogorov-Arnold Networks

 🎛️Fine-tuning

How we fight GPU scarcity without compromise

 🧠LLMs  Content type: Blog
equixly.com · Hacker News

Report: GKE Inference Gateway delivers up to 92% faster AI responses

 🧠LLMs  Content type: Blog

Token4Token — pay-per-token inference on Gnosis + Swarm

 🧠LLMs

Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency

 🌐Open Source AI  Content type: News  Content type: Blog
blog.google · Hacker News

libertywing/FlashMemory-Deepseek-V4: FlashMemory DS-V4 Retriever: a lightweight retriever that sparsifies DeepSeek-V4 CSA KV-cache. Weights available on Hugging Face.

 🌐Open Source AI  Content type: Code
github.com

From GPU to Token: The 8-Layer Observability Stack for AI Infrastructure

 🧠LLMs  Content type: Blog
jimmysong.io

Speculators v0.5.0: DFlash support and online training

 🧠LLMs
developers.redhat.com
Less-relevant results

Ask HN: Is software engineering still a good career choice for new students?

 🧠LLMs  Content type: Discussion

HNSW vs LSH: How Elasticsearch hits 0.99 recall@10 at 15,000 QPS — and what it costs

 🧠LLMs  Content type: Blog
elastic.co

146th airhacks tv: Rust, Java 25, AI Agents, BCE, Web Components, zunit, zb

 🧠LLMs  Content type: Blog
adambien.blog

RKSC: Reasoning-Aware KV Cache Sharing and Confident Early Exit for Multi-Step LLM Inference

 🧠LLMs  Content type: Academic
arxiv.org

GGUF vs GPTQ vs AWQ: The Plain-English Guide to LLM Quantization (and Which One to Pick)

 🌐Open Source AI

High Bandwidth Flash | A New Memory for AI Data Centers and Edge Computing | Sandisk

 🎛️Fine-tuning
ncnonline.net

The Bill Arrives: How to Manage Agentic AI Costs at Scale

 🤖AI Agents  Content type: Blog
cockroachlabs.com

Xiaomi MiMo-V2.5-Pro Just Hit 1,000 Tokens Per Second!

 🌐Open Source AI
gizchina.com

MoQ GGUFs and GSQ: Low-Bit GGUFs Are About to Get Much Better

 🌐Open Source AI  Content type: News  Content type: Blog

Pruned YOLOv8 ONNX INT8 Fails: 3 Fixes That Work

 🎛️Fine-tuning  Content type: Blog  Content type: Discussion
tildalice.io

MiMo-v2.5-Pro-UltraSpeed: 1T model with 1000 TPS

 🎛️Fine-tuning  Content type: Blog

Build a Medical Report Analyzer on Dedicated Inference with Python

 🧠LLMs
digitalocean.com
