Researchers at the University of Zurich found that AI-generated text can still be reliably distinguished from human writing. Their study shows that efforts to make models sound more natural often sacrifice accuracy.
Language models are increasingly being used to simulate human behavior in social research, for example as digital twins in surveys. The value of these methods depends heavily on how convincingly AI can imitate real people.
A specially trained BERT-based classifier distinguished AI-generated responses from human text with 70 to 80 percent accuracy, well above chance.
Model size didn’t seem to matter much. Larger models with more parameters didn’t necessarily write more human-like text than smaller systems. Base models also often outperformed versions that had undergone instruction tuning.
The BERT classifier reliably identified AI text in almost all cases. Base models without instruction tuning often mimicked human behavior better than their fine-tuned counterparts. | Image: Pagan et al.
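The paper describes the detector only as a specially trained BERT model, but a classifier of this kind typically comes down to fine-tuning an off-the-shelf BERT checkpoint for binary classification. The sketch below, using Hugging Face transformers, is an illustrative approximation; the checkpoint name, toy data, and hyperparameters are assumptions, not the authors' actual setup.

```python
# Illustrative sketch: fine-tune a BERT binary classifier to flag AI-generated posts.
# Checkpoint, data, and hyperparameters are placeholder assumptions, not the study's setup.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Toy training examples: label 0 = human-written, label 1 = AI-generated.
data = Dataset.from_dict({
    "text": [
        "lol no way that actually happened",
        "As an avid follower of this topic, I find these developments fascinating.",
    ],
    "labels": [0, 1],
})

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

train_set = data.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ai-text-detector", num_train_epochs=3,
                           per_device_train_batch_size=8, logging_steps=10),
    train_dataset=train_set,
)
trainer.train()
```

At inference time, a softmax over the two output logits gives the probability that a post is machine-written; scores from a detector of this kind were enough to separate AI from human text in 70 to 80 percent of cases.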
The researchers tested nine open language models on their ability to convincingly imitate user interactions on X, Bluesky, and Reddit. The lineup included Apertus, Deepseek-R1, Gemma 3, Qwen2.5, and various versions of Llama 3.1 and Mistral 7B.
Sophisticated techniques often backfire
Developers typically use complex strategies to make AI text sound more natural, including detailed persona descriptions and fine-tuning with specific data. The study found that these interventions often failed or even made text easier to identify as artificial.
"Some sophisticated strategies, such as fine-tuning and persona descriptions, fail to improve realism or even make text more detectable," the researchers write.
Adding style examples and context lowered detection rates (bars pointing down), while complex methods like fine-tuning often made things worse. | Image: Pagan et al.
Simpler approaches worked better. Showing the AI specific writing style examples or providing context from previous posts measurably lowered detection rates. Even so, the analysis software could usually still identify the text as AI-generated.
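In practice, these simpler strategies amount to assembling a richer prompt. The snippet below sketches how style examples and a user's post history might be combined before generation; the template wording and the build_prompt helper are hypothetical illustrations, not the study's actual prompts.

```python
# Illustrative sketch of the simpler calibration strategies: few-shot style examples
# plus the user's recent post history, assembled into a single generation prompt.
# The template and helper function are assumptions, not the authors' exact setup.

def build_prompt(style_examples, post_history, conversation):
    """Assemble a generation prompt from real writing samples and platform context."""
    examples = "\n".join(f"- {post}" for post in style_examples)
    history = "\n".join(f"- {post}" for post in post_history)
    return (
        "You are replying as a regular user on a social platform.\n\n"
        f"Examples of how this user writes:\n{examples}\n\n"
        f"The user's recent posts (context):\n{history}\n\n"
        f"Conversation to reply to:\n{conversation}\n\n"
        "Write the user's next reply in their usual style."
    )

prompt = build_prompt(
    style_examples=["tbh this take is wild", "nah, source?"],
    post_history=["was at the game last night, unreal crowd"],
    conversation="@someone: the refs decided that match, not the players",
)
print(prompt)
```

Feeding the model a prompt along these lines nudged its output closer to the target user's voice, which is what pushed detection rates down in the figure above, even though the classifier usually still caught the text.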
Human tone vs. accurate content
One of the study’s key findings is a fundamental tradeoff: optimizing for human tone and accurate content at the same time appears nearly impossible. When researchers compared AI text to real responses from the people being simulated, they found that disguising AI origins often meant drifting away from what the actual human would have said.
Selecting the most human-sounding AI response (ML Optimal) significantly reduced content alignment with actual human responses. | Image: Pagan et al.
"Our findings […] identify a trade-off: optimizing for human-likeness often comes at the cost of semantic fidelity, and vice versa," the authors write.
This creates a dilemma. Models can either nail the style, tone, and sentence length to appear human, or stay closer to what a real person would actually say. According to the study, they struggle to do both in the same response.
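The "ML Optimal" condition in the figure above can be read as a selection step: generate several candidate replies, keep the one a detector rates as most human, and then check how close that pick stays to what the real person wrote. The sketch below illustrates that two-part measurement; detector_prob_ai is a hypothetical stand-in for a trained classifier like the one above, and the embedding model is an assumption, not the study's choice.

```python
# Illustrative sketch of the trade-off measurement: pick the candidate reply a detector
# scores as most "human", then compare it semantically with the real human reply.
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

def detector_prob_ai(text: str) -> float:
    """Placeholder: in practice, return softmax P(AI) from a fine-tuned detector."""
    return 0.5

def pick_most_human(candidates):
    """'ML Optimal'-style selection: keep the reply the detector flags least."""
    return min(candidates, key=detector_prob_ai)

def semantic_fidelity(generated: str, human_reply: str) -> float:
    """Cosine similarity between embeddings of the generated and the real reply."""
    emb = embedder.encode([generated, human_reply], convert_to_tensor=True)
    return util.cos_sim(emb[0], emb[1]).item()

candidates = ["lmao not a chance", "I respectfully disagree with this assessment."]
best = pick_most_human(candidates)
print(best, semantic_fidelity(best, "nah that's not happening"))
```

In the study, optimizing the selection to evade the detector tended to lower the semantic fidelity score, meaning the chosen reply drifted away from what the person actually said.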
Emotions remain the biggest weakness
While structural features like sentence length can be tweaked, emotions remain a major sticking point. The models consistently failed to capture human tone, particularly in emotional or aggressive language, and they also struggled with platform-specific conventions such as emojis, hashtags, and emotional cues.
Deception worked fairly well on X—which doesn’t seem too surprising—but it was much harder to pull off on Reddit, where the communication style is a lot more blunt. The study warns that researchers should be cautious about using language models as substitutes for real human communication without careful checking.
Previous work by researchers at MIT and Harvard University explored automating social science research using AI agents and causal models. Even then, the biggest challenge was transferring results to real human behavior.
The findings also validate AI text detectors like Pangram, which claim high recognition rates. Contrary to earlier assumptions, AI text appears detectable by machines even without complex tools like watermarks. Whether this matters socially, however, is a cultural question rather than a technical one.