Alignment Research, Model Robustness, Adversarial Examples, Risk Assessment

My Minor AI Safety Research Projects (Q3 2025)
lesswrong.com·5h
🪄Prompt Engineering
Robustly detecting sneaky cyberattacks that might throw AI spacecraft off-course
techxplore.com·22h
🛡️AI Security
Internalizing Self-Consistency in Language Models: Multi-Agent Consensus Alignment
arxiv.org·10h
Discuss: Hacker News
💻Programming languages
Operationalizing trust: A C-level framework for scaling genAI responsibly
nordot.app·2h
🆕New AI
Language Models Wrestle with Gaps in Understanding
cacm.acm.org·22h
🧠LLM Inference
AI Chatbot Makers Are Facing A Reckoning Over Teen Safety
bloomberg.com·18h
🛡️AI Security
Six Frameworks for Efficient LLM Inferencing
thenewstack.io·1h
🏗️LLM Infrastructure
LAI #93: Smarter Model Choices, Multi-Agent Systems, and Cutting Through AI Noise
pub.towardsai.net·23h
🏗️LLM Infrastructure
Show HN: SandBox – AI agents simulating possible futures
github.com·21h
Discuss: Hacker News
🆕New AI
New attack on ChatGPT research agent pilfers secrets from Gmail inboxes
arstechnica.com·22h
Discuss: r/technews
🛡️AI Security
Welcome Kai!
understandingai.org·23h
🛡️AI Security
The Self-Betrayal Heuristic (SBH): A Simple Test for AI Misalignment
news.ycombinator.com·16h
Discuss: Hacker News
🛡️AI Security
An Abbreviated Survey of AI (In Fiction)
rznicolet.com·1h
Discuss: Hacker News
🎭Claude
AI Alliance forges agent-native language, knowledge base
infoworld.com·16h
🆕New AI
AI progress stalls for SEO tasks despite wave of new models
searchengineland.com·54m
🏆LLM Benchmarking
Coding as the epicenter of AI progress and the path to general agents
interconnects.ai·23h
Discuss: Hacker News
🆕New AI
What AI’s Doomers and Utopians Have in Common
theatlantic.com·3h
🎭Claude
AI is changing how students learn—or avoid learning
phys.org·32m
🪄Prompt Engineering
AI model offers accurate and explainable insights to support autism assessment
medicalxpress.com·15h
🔍AI Interpretability