ddboline's Feed · Scour

Mode-Dependent Rectification for Stable PPO Training

arxiv.org·2d

🤖reinforcement learning

Learning the Value Systems of Agents with Preference-based and Inverse Reinforcement Learning

arxiv.org·3d

🤖reinforcement learning

Why AI Agents Make Different Decisions When They Think It's Real

dev.to·20h·

Discuss: DEV

🤖reinforcement learning

A Simple Method for Commonsense Reasoning

dev.to·9h·

Discuss: DEV

🤖reinforcement learning

a proposal for AI that's on your side

r.github.io·2d·

Discuss: Hacker News

🤖reinforcement learning

Against the Orthogonality Thesis

jonasmoman.substack.com·3d·

Discuss: Substack

🤖reinforcement learning

technicalchops.com·3d·

Discuss: Hacker News

Prompt Fidelity: Measuring How Much of Your Intent an AI Agent Actually Executes

towardsdatascience.com·2d

🤖reinforcement learning

I spent 2 weeks playing god. My learnings from 597 genetic algorithm lineages

blog.silennai.com·3d·

Discuss: Hacker News

🤖reinforcement learning

Style tips for less experienced developers coding with AI

honnibal.dev·2d·

Discuss: Hacker News

Bridging AI and Skills

bridge.surf·3d·

Discuss: Hacker News

🤖reinforcement learning

Beyond Roleplay: Jailbreaking Gemini with drugs and ritual

tidepool.leaflet.pub·3d·

Discuss: Hacker News

🤖reinforcement learning

The Top 10 Best Practices for AI/BI Dashboards Performance Optimization (Part 2)

databricks.com·3d

📊linear programming

The Game That Ate Itself

seeingthesystem.com·4d·

Discuss: Hacker News

🤖reinforcement learning

Feedback Loopable

ampcode.com·3d·

Discuss: Hacker News

🤖reinforcement learning

The Agentic Trust Framework: Zero Trust Governance for AI Agents

cloudsecurityalliance.org·3d·

Discuss: Hacker News

🤖reinforcement learning

Claude Code is the Inflection Point

newsletter.semianalysis.com·3d·

Discuss: Hacker News

🧩operations research

As Rocks May Think

evjang.com·4d·

Discuss: Hacker News, r/programming

🤖reinforcement learning

Agentic Proof-Oriented Programming

risemsr.github.io·3d·

Discuss: Lobsters, Hacker News

🧩operations research

How close is AI to taking my job?

epoch.ai·2d·

Discuss: Hacker News

🤖reinforcement learning
