🎯 Reinforcement Learning
RLHF, reward modeling, RL agents, self-improving AI
I got so mad at poke(rogue)like that I trained a RL agent to beat it for me
✍️ Prompt Engineering
thiagolira.blot.im · 4 days ago · Hacker News
You're doing it wrong
🤖 AI · Content type: News
understandably.com · 2 days ago
AI Self-Evolution (AI自进化)
💬 LLMs
elmagnifico.tech · 23 hours ago
Don't let the LLM speak, just probe it (8 minute read)
✍️ Prompt Engineering · Content type: Blog
blog.j11y.io · 23 hours ago
Sakana AI launches its Recursive Self-Improvement Lab to build autonomous, self-improving AI systems
🤖 AI Coding · Content type: News
digg.com · 6 days ago
Why AI labs are betting big on AI coding
🤖 AI Coding
fastcompany.com · 6 hours ago
Posting for authoring
🔮 Future of Coding
turingpost.com · 4 days ago
A Unifying Lens on Reward Uncertainty in RLHF
🤖 AI · Content type: Academic
arxiv.org · 2 days ago
AI Runaway Risks, SpaceX IPO, & Orbital Data Centers
🤖 AI Coding
briefing.forwardfuture.ai · 12 hours ago
local AI agents for Cursor with pre-tuned marketplace/commu
🔮 Future of Coding
locaible.com · 1 day ago · Hacker News
Spotlight On: Dreamplug Technologies Private Limited (CRED), a New Principal Participating Organization
🏗️ System Design · Content type: Blog
blog.pcisecuritystandards.org · 3 days ago
Anthropic warns that AI will soon be able to improve itself without human intervention
🛡️ AI Safety
krdo.com · 6 days ago
dcm31/self-improving-podcast
✍️ Prompt Engineering
val.town · 2 days ago · Hacker News
sarichan777/kaizen-harness: Self-improving AI agent infrastructure: Kaizen-style retrospective optimization, council debates, self-healing, verification
✍️ Prompt Engineering · Content type: Code
github.com · 2 days ago · DEV
Why Claude Produces High-Quality Output: A Developer’s Guide to Token Efficiency and Hallucination…
💬 LLMs · Content type: Blog
medium.com · 6 days ago
Anthropic warns that AI could soon escape human control, calls for global freeze on development
🛡️ AI Safety · Content type: News
abc7news.com · 6 days ago · Hacker News
I built a machine that turns AI papers into interactive explainers
✍️ Prompt Engineering · Content type: Blog
blog.skz.dev · 6 days ago
Multilingual Sentiment Aware Text Summarization A Reinforcement Learning Approach for Consistency Maintenance
🤖 AI · Content type: Academic
arxiv.org · 2 days ago
Anthropic did not call for a pause on AI
🤖 AI
lesswrong.com · 1 day ago
Stack Overflow didn't just help AI learn to code
🤖 AI
zozo123.github.io · 4 days ago · Hacker News