Selective Control under Noisy Perception: Governance Failures Hidden by Aggregate Metrics in Modular Networks (opens in new tab)
A content-moderation system can score well on every standard accuracy metric and still cause real harm, if its mistakes fall on the few users who connect otherwise separate communities. We show this in an agent-based model where N=240 learning agents on a community-structured network each post harmless, productive, or dangerous content, and a regulator removes or penalizes whatever a noisy classifier flags. Overall usefulness barely moves as the...
Read the original article