Computer Science > Computation and Language
arXiv:2511.06676 (cs)
Abstract: Now that AI-driven moderation has become pervasive in everyday life, we often hear claims that “the AI is biased”. Although this is often said jokingly, the light-hearted remark reflects a deeper concern: how can we be certain that an online post flagged as “inappropriate” was not simply the victim of a biased algorithm? This paper investigates the problem using a dual approach. First, I conduct a quantitative benchmark of a widely used toxicity model (unitary/toxic-bert) to measure the performance disparity between text in African-American English (AAE) and Standard American English (SAE). The benchmark reveals a clear, systematic bias: on average, the model scores AAE text 1.8 times higher for toxicity and 8.8 times higher for “identity hate”. Second, I introduce an interactive pedagogical tool that makes these abstract biases tangible. The tool’s core mechanic, a user-controlled “sensitivity threshold,” demonstrates that the biased score itself is not the only harm; the more concerning harm is the human-set, seemingly neutral policy that ultimately operationalises discrimination. This work provides both statistical evidence of disparate impact and a public-facing tool designed to foster critical AI literacy.
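The threshold mechanic described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the scores are invented placeholders, not the paper's data, and the helper names `mean_ratio` and `flag_rate` are hypothetical rather than taken from the paper or its tool. The sketch only shows how a single, seemingly neutral cut-off applied to skewed scores can produce unequal flag rates.

```python
from statistics import mean


def mean_ratio(group_a, group_b):
    """Ratio of the average score of group_a to that of group_b."""
    return mean(group_a) / mean(group_b)


def flag_rate(scores, threshold):
    """Fraction of texts whose score meets or exceeds the threshold."""
    return sum(score >= threshold for score in scores) / len(scores)


# Invented placeholder toxicity scores in [0, 1]; NOT results from the paper.
aae_scores = [0.12, 0.30, 0.45, 0.62, 0.51]
sae_scores = [0.05, 0.15, 0.22, 0.41, 0.17]

print(f"mean score ratio (AAE / SAE): {mean_ratio(aae_scores, sae_scores):.2f}")

# The same threshold is applied to both groups, yet the flag rates differ.
for threshold in (0.2, 0.4, 0.6):
    aae_rate = flag_rate(aae_scores, threshold)
    sae_rate = flag_rate(sae_scores, threshold)
    print(f"threshold={threshold:.1f}  AAE flagged={aae_rate:.0%}  SAE flagged={sae_rate:.0%}")
```

With real data, the per-text scores would come from running the toxicity model on matched AAE and SAE samples; the sketch deliberately leaves that step out so that it stays self-contained.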
| Comments: | 9 pages, 5 figures, 4 tables, 14 references |
| Subjects: | Computation and Language (cs.CL); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC) |
| Cite as: | arXiv:2511.06676 [cs.CL] (or arXiv:2511.06676v1 [cs.CL] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2511.06676 (arXiv-issued DOI via DataCite, pending registration) |
Submission history
From: Subhojit Ghimire [v1] Mon, 10 Nov 2025 03:49:58 UTC (3,452 KB)