Synthetic Persona Pretraining: Alignment from Token Zero

Covers 7 related stories

[2212.08073] Constitutional AI: Harmlessness from AI Feedback

[2203.02155] Training language models to follow instructions with human feedback

Apertus – Open Foundation Model for Sovereign AI

Discussed on Hacker News

Refusal in Language Models Is Mediated by a Single Direction

Discussed on Hacker News

Alignment faking in large language models

Olmo 3

assets.anthropic.com

System Card: Claude Opus 4.5 [pdf]

Discussed on Hacker News and r/ClaudeAI

Covered in 1 article

threadreaderapp.com

Thread by @jkminder on Thread Reader App