Training a Twitch chat toxicity classifier on real VOD data at scale
Quick answer: Twitch has no public API for VOD chat replay. To build a Twitch toxicity classifier dataset, you have to page through the internal VideoCommentsByOffsetOrCursor GraphQL endpoint at scale, the same one the web player uses. The Devil Scrapes Twitch VOD Chat Archive Actor does that for $0.001 per message (~$1.05 per 1,000) and returns the structured fields (message_fragments, badges, is_subscriber) that make classifier features actually useful. If you maintain a mod-bot (StreamElements, Nightbot,...
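To illustrate why those structured fields matter, here is a minimal Python sketch that flattens one chat record into classifier features. The record shape is an assumption for illustration, not the Actor's documented output schema: each item in `message_fragments` is assumed to carry `text` and an optional `emote_id`, each badge a `set_id`, plus a boolean `is_subscriber`. The names `extract_features` and `TRUSTED_BADGES` are hypothetical.

```python
from typing import Any

# Hypothetical badge set IDs treated as "trusted" signals; adjust to your data.
TRUSTED_BADGES = {"broadcaster", "moderator", "vip"}


def extract_features(message: dict[str, Any]) -> dict[str, Any]:
    """Flatten one chat record (ASSUMED shape, see lead-in) into classifier features."""
    fragments = message.get("message_fragments") or []
    badges = message.get("badges") or []

    # Plain text for the text model: join every fragment, emote names included.
    text = "".join(f.get("text", "") for f in fragments)

    # Emote share: fraction of fragments that are emotes (emote_id is set).
    n_emotes = sum(1 for f in fragments if f.get("emote_id"))
    emote_ratio = n_emotes / len(fragments) if fragments else 0.0

    # Caps ratio over letters in non-emote fragments only, so emote names
    # such as "LUL" don't inflate it.
    plain = "".join(f.get("text", "") for f in fragments if not f.get("emote_id"))
    letters = [c for c in plain if c.isalpha()]
    caps_ratio = sum(c.isupper() for c in letters) / len(letters) if letters else 0.0

    badge_ids = {b.get("set_id") for b in badges}
    return {
        "text": text,
        "n_fragments": len(fragments),
        "emote_ratio": emote_ratio,
        "caps_ratio": caps_ratio,
        "is_subscriber": bool(message.get("is_subscriber", False)),
        "has_trusted_badge": bool(badge_ids & TRUSTED_BADGES),
        "n_badges": len(badges),
    }


if __name__ == "__main__":
    # Made-up sample record in the assumed shape; the emote ID is a placeholder.
    sample = {
        "message_fragments": [
            {"text": "WHY is this streamer so bad ", "emote_id": None},
            {"text": "LUL", "emote_id": "emote-123"},
        ],
        "badges": [{"set_id": "subscriber", "version": "12"}],
        "is_subscriber": True,
    }
    print(extract_features(sample))
```

Splitting emote fragments from plain text is the point of the sketch: a raw message string would merge the two, while fragment-level data lets you compute shouting or spam signals on the human-typed text alone. Whether badge and subscriber features actually improve a toxicity model is an empirical question for your own labeled data.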