To Generate or Discriminate? Methodological Considerations for Measuring Cultural Alignment in LLMs

Abstract: Socio-demographic prompting (SDP), in which Large Language Models (LLMs) are prompted with demographic proxies to generate culturally aligned outputs, often shows LLM responses to be stereotypical and biased. While effective in assessing LLMs' cultural competency, SDP is prone to confounding factors such as prompt sensitivity, decoding parameters, and the inherent difficulty of generation relative to discrimination tasks, since generation has a larger output space. These factors complicate interpretation, making it difficult to determine whether poor performance is due to bias or to the task design. To address this, we use inverse socio-demographic prompting (ISDP), where we prompt LLMs to discriminate and predict the demographic proxy from the actual and simulated behavior of different users. We use the Goodreads-CSI dataset (Saha et al., 2025), which captures difficulty in understanding English book reviews for users from India, Mexico, and the USA, and test four LLMs with ISDP: Aya-23, Gemma-2, GPT-4o, and LLaMA-3.1. Results show that models perform better with actual behaviors than simulated ones, contrary to what SDP suggests. However, performance with both behavior types diminishes and becomes nearly equal at the individual level, indicating limits to personalization.


Comments:	IJCNLP-AACL 2025
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2601.02858 [cs.CL]
	(or arXiv:2601.02858v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2601.02858 arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Saurabh Kumar Pandey Mr
[v1] Tue, 6 Jan 2026 09:42:03 UTC (1,814 KB)
