inference

Scoured 162 posts in 22.2 ms

DiffusionGemma: The Developer Guide - Google Developers Blog

馃AIContent type: Blog

Google's new open-weights model brings image-generation tricks to AI text generation

馃AIContent type: News
theregister.com

From GPU to Token: The 8-Layer Observability Stack for AI Infrastructure

馃AIContent type: Blog
jimmysong.io

How we fight GPU scarcity without compromise

馃AIContent type: Blog
equixly.comHacker News

Valkey: Unlocked Seattle: The Best Systems Let You Sleep At Night

馃AIContent type: Blog
valkey.io

Defense Against Prompt Inversion Attacks: An Information-Theoretic Approach for LLM Collaborative Inference

馃AIContent type: Academic
arxiv.org

MoQ GGUFs and GSQ: Low-Bit GGUFs Are About to Get Much Better

馃AIContent type: NewsContent type: Blog

AI Serving Platform That Adapts to Your Model

馃AIContent type: Blog
databricks.com

massimo92/spark: CLI tool for serving LLMs with vLLM on NVIDIA DGX Spark. One file, zero friction.

馃AIContent type: Code
github.comHacker News

Ask HN: Is software engineering still a good career choice for new students?

馃AIContent type: Discussion

MLPerf and the rise of latency-aware LLM benchmarking

馃AI
edn.com

DiffusionGemma 26B A4B results on my 5090

馃AI

Latest technical articles & videos.

馃AI
certdepot.net

146th airhacks tv: Rust, Java 25, AI Agents, BCE, Web Components, zunit, zb

馃AIContent type: Blog
adambien.blog

Agentic AI Architecture: How CockroachDB Supports Memory, Context, and Control

馃AIContent type: Blog
cockroachlabs.com

Google's new open model DiffusionGemma generates text from noise instead of word by word

馃AI
the-decoder.com

Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency

馃AIContent type: NewsContent type: Blog
blog.googleHacker News

The Bill Arrives: How to Manage Agentic AI Costs at Scale

馃AIContent type: Blog
cockroachlabs.com

GGUF vs GPTQ vs AWQ: The Plain-English Guide to LLM Quantization (and Which One to Pick)

馃AI

CommBench: Can LLMs Write Correct and Efficient GPU Communication Code?

馃AI
