Large language models require a new form of oversight: capability-based monitoring
Large language models (LLMs) have been rapidly adopted in healthcare, but oversight strategies are lacking. We propose capability-based monitoring, motivated by the fact that LLMs are generalist systems whose overlapping internal capabilities are reused across numerous downstream tasks. This approach organizes monitoring around shared capabilities to enable cross-task detection of systemic weaknesses, long-tail errors, and emergent behaviors. We describe considerations for developers, organiz...