🛡️ AI Safety - kevincrane

🧠LLMs Academic

biorxiv.org·

Order Is Not Control

🕸️Distributed Systems Academic

arxiv.org·

Sequent: scale and automation for higher confidence in alignment

🧠LLMs

lesswrong.com·

Less-relevant results

A free diagnostic for the Claude Certified Architect exam

🤝AI Agents Discussion Tutorial

claudecertifiedarchitects.com··Hacker News

Contra Dance at LessOnline

🕸️Distributed Systems

jefftk.com··Cited by 1 article

Complete Drosophila Nervous System Mapped

🕸️Distributed Systems

neurosciencenews.com·

Bag of Dims: Training-Free Mechanistic Interpretability via Dimension-Level Sign Patterns

🧠LLMs Academic

arxiv.org·

Announcing the Next Phase of AI Forge

🤖AI Engineering

lesswrong.com·

Existential Indifference: Self-Nonpreservation as a Necessary Architectural Condition for Aligned Superintelligence (or: The Suicidal AI)

🤝AI Agents Academic

arxiv.org·

Construct validity of Claude Opus 4.8's System Card – A commentary

🤝AI Agents

lesswrong.com·

Coming Around To Political Donations

🧠LLMs

jefftk.com··Cited by 1 article

Eigenism: Ethics for a Human-AI Future

🕸️Distributed Systems Academic

arxiv.org·

Superspace Concentration and Adversarial Robustness in Quantum Algorithms

🕸️Distributed Systems Academic

arxiv.org·

When Attribution Patching Lies: Diagnosis and a Second-Order Correction

🧠LLMs Academic

arxiv.org·

High Dynamic Range DIY Air Testing

🕸️Distributed Systems

jefftk.com··Cited by 1 article

RogueAI: A Reverse Turing Test for Detecting Licensed AI Deception in Dialogue

🧠LLMs Academic

arxiv.org·

Layer-Resolved Optimal Transport for Hallucination Detection in NMT and Abstractive Summarization

🧠LLMs Academic

arxiv.org·

The Standard Interpretable Model: A general theory of interpretable machine learning to deductively design interpretable methods using Lagrangian mechanics

🧠LLMs Academic

arxiv.org·

OpenClaw Won: How Big Tech Adopted the AI Agent

Iliad is Hiring

scMTG reconstructs single-cell temporal dynamics with Markov transition generators
