TLA+, Model Checking, Safety Properties, Specifications
Can We Predict Alignment Before Models Finish Thinking? Towards Monitoring Misaligned Reasoning Models
arxiv.org·6d