Compute Costs

harshuljain13/llm-inference-at-scale: A Practitioner handbook for production llm serving.

 🗄️KV Cache  Content type: Code
github.com··Hacker News, r/LLM

lightmetal: GPU LLM Inference From a Single Java 25 JAR

 🖥️Inference Engineering  Content type: Blog
adambien.blog·

TileFuse: A Fused Mixed-Precision Kernel Library for Efficient Quantized LLM Inference on AMD NPUs

 🖥️Inference Engineering  Content type: Academic
arxiv.org·
Less-relevant results

A Complete Beginner's Guide to Local LLM Inference

 🖥️Inference Engineering  Content type: Blog
khnsakhnm.medium.com·

Introducing a new database category - the predictive database

 💰AI Economics  Content type: Blog
aito.ai··Hacker News

'The best solution is to murder him in his sleep': AI can learn violent tendencies from each other despite zero references to violence in training data

 🤖AI  Content type: News
livescience.com·

A system programmer’s guide to LLM inference

 🖥️Inference Engineering  Content type: Blog

Shadow AI Governance: How to Secure Employee AI Use in 2026

 💰AI Economics  Content type: Blog

What to look for in an AI assistant

 🤖AI
proton.me·

Running LLM Inference on Kubernetes: What It Actually Takes

 🖥️Inference Engineering  Content type: Blog
fairwinds.com·

I built a "pay as you go" dictation app because I'm tired of all the subscriptions everywhere. Am looking for beta testers for feedback :)

 🔤Tokenization  Content type: Discussion
getvoxa.app··r/SideProject

Intro — Sehastrajit

 🔤Tokenization  Content type: Blog
medium.com·

PagedAttention vs Traditional KV Cache: How vLLM Reinvented GPU Memory for LLM Inference

 🗄️KV Cache  Content type: Blog
medium.com·

Huawei chips refine DeepSeek model in major leap for China’s AI self-reliance

 🗄️KV Cache
oodaloop.com·

ASUS ExpertBook Ultra Flagship Business Laptop Debuts In SEA Markets, Featuring Sub-1kg Chassis & Intel Core Ultra X7 Processor

 💰API Pricing
pokde.net·

Intelligent inference scheduling with llm-d on Red Hat AI

 🖥️Inference Engineering
developers.redhat.com·

Unlawful by design: Exposing the human rights costs of generative AI

 💰AI Economics  Content type: PDF
amnesty.org·

Autonomous AI worm uses local models to exploit networks and repair its own code

 🤖AI
4sysops.com·

PoQ-Judge: A Multi-Architecture Evaluation Framework for Cost-Aware Proof-of-Quality in Decentralized LLM Inference

 🖥️Inference Engineering  Content type: Academic
arxiv.org·
