The Unfireable Safety Kernel: Execution-Time AI Alignment for AI Agents and Other Escapable AI Systems (opens in new tab)
AI agents are granted access to tools, APIs, and other infrastructure, making them active principals in those systems. The dominant approach places controls inside the agent's own runtime: system prompts, output filters, and guardrail libraries. Any control in the agent's address space is reachable by inputs that influence it; this generalizes to any AI system with sufficient reach into its own runtime, a class we term escapable AI systems. We i...
Read the original article