lesswrong.com · 4 weeks ago · Synthetic Persona Pretraining: Alignment from Token Zero
Covers 7 related stories
arxiv.org · 27 weeks ago · [2212.08073] Constitutional AI: Harmlessness from AI Feedback
arxiv.org · 24 weeks ago · [2203.02155] Training language models to follow instructions with human feedback
apertvs.ai · 12 hours ago · Apertus – Open Foundation Model for Sovereign AI (discussed on Hacker News)
arxiv.org · 7 weeks ago · Refusal in Language Models Is Mediated by a Single Direction (discussed on Hacker News)
arxiv.org · 78 weeks ago · Alignment faking in large language models
arxiv.org · 26 weeks ago · Olmo 3
assets.anthropic.com · 29 weeks ago · System Card: Claude Opus 4.5 [pdf] (discussed on Hacker News and r/ClaudeAI)
Covered in 1 article
threadreaderapp.com · 4 weeks ago · Thread by @jkminder on Thread Reader App