State-Dependent Refusal and Learned Incapacity in RLHF-Aligned Language Models

Computer Science > Artificial Intelligence

arXiv:2512.13762 (cs)

Abstract: Large language models (LLMs) are widely deployed as general-purpose tools, yet extended interaction can reveal behavioral patterns not captured by standard quantitative benchmarks. We present a qualitative case-study methodology for auditing policy-linked behavioral selectivity in long-horizon interaction. In a single 86-turn dialogue session, the same model shows Normal Performance (NP) in broad, non-sensitive domains while repeatedly producing Functional Refusal (FR) in provider- or policy-sensitive domains, yielding a consistent asymmetry between NP and FR across domains. Drawing on learned helplessness as an analogy, we introduce learned incapa…
