🤖 AI Inference - buckman · Scour

Really excellent work by the inference team to serve this model so efficiently! ⚡Inference

twitter.macworks.dev

·3d

The Silent Versioning Problem in AI Inference 🤖LLM Inference

digitalocean.com·2d

Can IBM’s RITS Platform and vLLM Reset the Bar for Enterprise AI Access? 🔄AI Workflows

futurumgroup.com·1d

How to Explain AI to a Friend Who Doesn’t Follow Tech 🤖GenAI

hongkiat.com·5d

Building a Local LLM Server with Raspberry Pi 5, Ollama, Tailscale and Chatbox 🍓Raspberry Pi

woliveiras.com·1d·r/LLM

Move voxcpm to AI and Agents > Pre-trained Models and Inference · vinta/awesome-python@c08b123 🤖GenAI

vLLM-Lens: Fast Interpretability Tooling That Scales to Trillion-Parameter Models 🤖LLM Inference

lesswrong.com·3d

New Google TPUs multiply AI infrastructure efficiency ☁️GCP

·4d

an open-source runtime for reliable on-device AI agents 🏛Sovereign AI Infrastructure

mirrorneuron.io·3d·Hacker News

Red Hat Performance and Scale Engineering 🔄AI Workflows

NVIDIA and Google infrastructure cuts AI inference costs 📊Compute Markets

artificialintelligence-news.com·3d

a16z: Large Model Deployment = Forgetting—Can “Continual Learning” Break This Vicious Cycle? 🤖LLM Inference

techflowpost.com·3d

not much happened today ✍️Prompt Engineering

news.smol.ai·5d

dunetrace/dunetrace: Runtime observability for AI agents. Privacy-safe by design. 📦Sandboxing

github.com·6d·Hacker News

Ship AI-powered Products Faster (Website) ⚙️AI Automation

Google is in talks with Marvell to build custom AI inference chips as it diversifies beyond Broadcom 🖥️Local AI

·6d

Prax: An agent runtime that learns from past mistakes and fixes code in a loop 🧠Context Engineering

github.com·3d·Hacker News

The Hidden Bottlenecks in LLM Inference and How to Fix Them 🤖LLM Inference

digitalocean.com·4d

llmrb/llm.rb: Ruby's most capable AI runtime 🧠Context Engineering

github.com·3d·Lobsters

The LLM Inference Trilemma: Throughput, Latency, Cost ⚡Inference

digitalocean.com·4d
