Most executives get that AI is a game-changer, but when it comes to wrangling LLM-based apps in production, observability is still a black box. In her InfoQ session, Sally O'Malley argues that monitoring large language models is nothing like monitoring typical microservices: LLM workloads are non-uniform, expensive to run, and throw off new signals around cost, performance, and output quality.

In a hands-on demo, she assembles an all-star open-source observability stack, instrumenting vLLM and Llama Stack with Prometheus, Tempo, and Grafana on Kubernetes, and shows how to track everything from GPU usage to RAG, agentic, and multi-turn workflows. Whether you're comparing prefill and decode behavior or diving into traces, you'll walk away ready to shine a light on your AI workloads.
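To make the metrics side of such a stack concrete, here is a minimal sketch (not from the talk itself) of pulling vLLM serving signals out of Prometheus via its HTTP query API. The Prometheus URL is a placeholder, and the metric names (`vllm:gpu_cache_usage_perc`, `vllm:time_to_first_token_seconds`, `vllm:time_per_output_token_seconds`) follow vLLM's Prometheus exporter but may vary by version and deployment:

```python
import requests

# Prometheus HTTP API endpoint; adjust to your cluster's service address
# (assumption: Prometheus is reachable here, e.g. via kubectl port-forward).
PROM_URL = "http://localhost:9090/api/v1/query"

# PromQL queries over metrics scraped from vLLM's /metrics endpoint.
# Metric names follow vLLM's exporter and may differ across versions.
QUERIES = {
    # Fraction of the GPU KV cache currently in use (0.0 - 1.0).
    "gpu_kv_cache_usage": "vllm:gpu_cache_usage_perc",
    # 99th-percentile time-to-first-token over the last 5 minutes,
    # roughly the prefill latency a user experiences.
    "ttft_p99_seconds": (
        "histogram_quantile(0.99, "
        "rate(vllm:time_to_first_token_seconds_bucket[5m]))"
    ),
    # 99th-percentile per-output-token latency, i.e. decode speed.
    "tpot_p99_seconds": (
        "histogram_quantile(0.99, "
        "rate(vllm:time_per_output_token_seconds_bucket[5m]))"
    ),
}

def query(promql: str) -> list:
    """Run one instant query and return the resulting vector."""
    resp = requests.get(PROM_URL, params={"query": promql}, timeout=10)
    resp.raise_for_status()
    body = resp.json()
    if body["status"] != "success":
        raise RuntimeError(f"Prometheus query failed: {body}")
    return body["data"]["result"]

if __name__ == "__main__":
    for name, promql in QUERIES.items():
        for series in query(promql):
            labels = series["metric"]
            _, value = series["value"]  # (timestamp, value-as-string)
            print(f"{name} {labels}: {value}")
```

In a setup like the one demoed, the same PromQL expressions would more likely back Grafana panels than a script; the script form just makes it easy to verify from a terminal which prefill and decode signals your deployment actually exposes.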