Evaluating Deep Agents using LangSmith on AWS

Covers 4 stories including LangChain: Observe, Evaluate, and Deploy Reliable AI Agents

This post combines lessons from LangChain’s work on evaluating deep agents and Anthropic’s guide to demystifying evals for AI agents into a single practical guide. You will learn how to: 1) apply five evaluation patterns for deep agents, 2) build offline evaluations using pytest and LangSmith, and 3) configure online monitoring for production. The walkthrough uses a text-to-SQL deep agent powered by Amazon Bedrock to cover the full development-to-production lifecycle.
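The offline-evaluation step can be pictured with a minimal pytest-style sketch. Everything in it is an illustrative assumption, not the post's code. A hypothetical `fake_agent` function stands in for the Bedrock-powered text-to-SQL deep agent, and a hypothetical in-memory SQLite `customers` table acts as the database. The test uses an execution-match evaluator that compares the rows each query returns. This evaluator was chosen for illustration and is not necessarily one of the post's five patterns. The LangSmith logging that the post layers on top is omitted.

```python
import sqlite3
from collections import Counter

# Hypothetical stand-in for the text-to-SQL deep agent. The real agent calls
# a model on Amazon Bedrock; here we return canned SQL per question.
def fake_agent(question: str) -> str:
    canned = {
        "How many customers are there?": "SELECT COUNT(*) FROM customers",
        "List customer names": "SELECT name FROM customers ORDER BY name",
    }
    return canned[question]


# Hypothetical fixture database: one small table in an in-memory SQLite DB.
def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO customers (name) VALUES (?)",
        [("Ada",), ("Grace",), ("Linus",)],
    )
    conn.commit()
    return conn


# Execution-match evaluator: pass if both queries return the same rows,
# ignoring row order (multiset comparison). Invalid SQL counts as a failure.
def execution_match(conn: sqlite3.Connection, predicted_sql: str, reference_sql: str) -> bool:
    try:
        predicted = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False
    reference = conn.execute(reference_sql).fetchall()
    return Counter(predicted) == Counter(reference)


# Hypothetical offline dataset of questions with reference SQL.
DATASET = [
    {"question": "How many customers are there?",
     "reference_sql": "SELECT COUNT(*) FROM customers"},
    {"question": "List customer names",
     "reference_sql": "SELECT name FROM customers"},
]


# pytest collects functions named test_* automatically; no pytest import needed.
def test_text_to_sql_execution_match():
    conn = build_db()
    for example in DATASET:
        predicted_sql = fake_agent(example["question"])
        assert execution_match(conn, predicted_sql, example["reference_sql"]), example["question"]
```

Running `pytest` on a `test_*.py` file containing this code collects `test_text_to_sql_execution_match` without extra imports. Comparing result rows as a multiset rather than comparing SQL strings lets semantically equivalent queries pass, such as the `ORDER BY` variant above. Queries that return different data or raise an error still fail.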
