Introduction
Popular culture often lauds the power of intuition. Albert Einstein famously characterized the intuitive mind as a “sacred gift,” while Steve Jobs attributed his groundbreaking success to following his gut feeling1,2. This admiration for intuition is also reflected in widely publicized examples like Captain Chesley “Sully” Sullenberger, who saved his passengers by making a split-second decision to land his damaged plane on the Hudson River3. Popular science books such as Malcolm Gladwell’s best-selling Blink4 have further elevated intuitive thinking, seemingly placing it on a pedestal.
Yet, at the same time, we’re all too frequently reminded to take the time to deliberate and think things through rather than to merely act on our first hunch5,6,7,8. This tension raises a fundamental question: How do we actually perceive intuitive versus deliberate reasoning? Do we want people to rely on intuition or deliberation when they reason? Are we more likely to trust someone’s advice if they relied on their intuition, or if they took the time to think things through?
For decades, cognitive science has characterized human thinking as an interplay between fast, intuitive and slower, deliberate thought processes (or System 1 and 2, as they are often referred to [e.g., 9]). While this research has illuminated the mechanics of both systems10,11,12, we lack understanding of how people perceive and value these distinct modes of thought—what might be termed a “folk theory” of fast-and-slow thinking13.
Although people may prefer intuition when making decisions about subjective choices (e.g., what partner to date14,15), some initial work does seem to suggest that humans—even from a young age16—generally tend to prefer deliberation over intuition when facing more objective, cognitively challenging reasoning tasks14,17. However, this early work often conflates thinking style and accuracy. People may prefer deliberation because they believe it will be more accurate. But what if we know that the decision maker is an experienced expert who is highly accurate? This question is particularly relevant given that celebrated cases of intuitive success—from Jobs to Sullenberger to Gladwell’s examples—typically showcase experts who are consistently accurate. So, what do we prefer if two individuals are both highly accurate? Do we trust the individual who intuitively “sees” the right solution more than the one who needs to spend time and effort to arrive at an answer?
Interestingly, reasoning research has indicated that when faced with challenging brain teasers and reasoning problems, the most accurate reasoners often arrive at sound answers intuitively10,18,19,20. Likewise, cognitive capacity seems to be more predictive of generating a sound intuition than of correcting a faulty intuition through deliberation21,22,23. If our folk beliefs have picked up on this insight, there may be good reasons to prefer intuition.
The inverse scenario presents equally compelling questions: How do we perceive extensive deliberation that ultimately leads to error? Does effortful—yet unsuccessful—analysis command more respect than quick misjudgements? These questions carry profound implications for understanding public trust in our judgment, particularly crucial in an era where we face intense exposure to others’ opinions and recommendations24.
Beyond human reasoning, insight into our folk beliefs is also relevant for Artificial Intelligence development13. Current approaches aim to enhance Large Language Models by emulating deliberate “System 2” reasoning through chain-of-thought prompts and extended computation time25,26,27. If our folk beliefs indeed favour deliberation, these developments may not only improve AI accuracy but also public trust in its recommendations—potentially helping to overcome possible algorithm aversion28.
Critically, beyond identifying our folk beliefs, it is crucial to understand their underlying nature. Specifically, if we have a preference for one reasoning mode over the other, does this preference require us to engage in deliberation or does it result from mere intuitive processing? Such insights will allow us to gauge whether our preferences are stable or if they shift depending on whether we have the time and capacity to reflect on them.
To start addressing these issues we ran 13 studies using short vignettes. Each vignette described an individual’s reasoning style (either fast intuition or slow deliberation) and past accuracy (high, low, or unspecified). Participants then evaluated the quality of the individual’s reasoning by rating, on 11-point scales, how good and smart they perceived the reasoners to be and whether they would follow their advice. Study 1 introduced this paradigm, while Studies 2–7 tested the robustness of the findings.
In Studies 8–9 we presented the vignettes to Large Language Models (ChatGPT 3.5 and 4) to test whether they had picked up on the human preferences. Finally, Studies 10–13 tested the nature of humans’ folk beliefs. We used time-pressure and load manipulations to experimentally reduce deliberate reasoning during the evaluation process. Since deliberation requires both time and cognitive resources, this allowed us to determine whether people’s preference for intuition or deliberation itself depends on intuitive versus deliberate reasoning.
Methods
Inclusion and ethics statement
Our experimental procedures were approved by the Comité d’Ethique de la Recherche, Université Paris Cité (protocol 00012023-151).
Open science and data
The research question and study design were preregistered on AsPredicted (https://aspredicted.org/). No specific analyses were preregistered. Preregistration links and dates are presented in Table 1. All data, material, preregistrations and analysis scripts can be retrieved from our OSF page: https://osf.io/6pc2e/?view_only=a5beec9676914e82baf0bbf4a10e4b64.
Participants
All human participants were recruited on the Prolific Academic platform. They gave their informed consent prior to taking part in the studies.
Studies 1–5 and 12–13
In each study we aimed to recruit 240 participants. This sample size allowed us to detect medium effects (0.23) between intuitive and deliberate vignettes with 95% power. Participants were paid at a rate of £6.00 per hour. They had English as their first language and held UK, USA, Ireland, Australia, Canada, or New Zealand nationality. The full sample in each study was requested to be gender balanced. Small sample deviations (±1 participant) occurred due to platform-related technical issues. Full demographic sample details can be found in Table 2 and SI A.
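For illustration, this power target can be reproduced with the pwr package. The sketch below assumes the 0.23 effect refers to a Cohen's d for a paired (within-subjects) comparison at a two-sided alpha of 0.05; these settings are our assumptions, not stated above.

```r
# Minimal power-analysis sketch, assuming d = 0.23 is a Cohen's d for a
# paired (within-subjects) t-test at two-sided alpha = 0.05; these
# settings are illustrative assumptions.
library(pwr)

pwr.t.test(n = 240, d = 0.23, sig.level = 0.05, type = "paired")
# returns power of roughly 0.95, in line with the stated target
```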
Studies 10–11
For the between-subject time pressure Studies 10–11 we also aimed to recruit 240 participants in each timing condition, for a total of 480 participants in each study. Other specifications (including compensation) were similar to Study 1 above. Small sample deviations (±3 participants) occurred due to platform-related technical issues, so we ended up with 242 (deadline) + 240 (forced deliberation) participants in Study 10 and 237 (deadline) + 242 (forced deliberation) participants in Study 11. Full demographic sample details can be found in Table 2 and SI A.
Studies 6–7
Participants in Study 6 (n = 240, 141 male, 94 female, 5 other/undisclosed) were of French nationality and residing in France. Participants in Study 7 were of Indian nationality (n = 184, 98 male, 84 female, 2 other/undisclosed) and residing in India. Because of the smaller pool of participants with these specifications on the platform, we increased the payment to a £9.00 per hour rate and dropped the gender balance requirement. As preregistered, since we did not meet the intended sample of 240 participants for Study 7 after two weeks, we closed the study and analyzed the available data. Full demographic sample details can be found in Table 2 and SI A.
Vignettes material
Participants were presented with six short experimental vignettes describing an individual’s reasoning style (intuitive or deliberative) and past accuracy (high, low, or unspecified). To illustrate, here are the intuitive/high-accuracy and deliberative/low-accuracy items from Study 1:
“Person A follows their intuition when reasoning about a problem. They do not spend much time or effort to arrive at a conclusion. The accuracy of their answers is very high.”
“Person B reflects deeply when reasoning about a problem. They spend a lot of time and effort to arrive at a conclusion. The accuracy of their answers is very low.”
In the “unspecified” vignettes the last sentence with the accuracy information was removed. Each of the vignettes was labelled with a unique letter and specified a unique combination of the reasoning mode and accuracy factor (see SI B for full vignette texts). Vignettes were presented in a counterbalanced order such that each combination of the reasoning mode and accuracy factor item would be presented as the first item an equal number of times. The presentation order of the remaining items was randomized. In addition to the six experimental vignettes we also presented an extra filler vignette with the text “There is no information about this person’s profile”. The general task instructions (see SI B) indicated the vignettes depicted situations where individuals encounter complex reasoning tasks.
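The counterbalancing constraint amounts to rotating which vignette appears first while shuffling the remainder. A minimal sketch of this logic follows; the condition labels are hypothetical placeholders and the filler vignette is omitted for brevity.

```r
# Sketch of the counterbalancing logic; condition labels are hypothetical.
# Each participant's first vignette cycles through the six reasoning-mode x
# accuracy combinations; the remaining five appear in random order.
conditions <- c("intuition_high", "intuition_low", "intuition_unspecified",
                "deliberation_high", "deliberation_low", "deliberation_unspecified")

vignette_order <- function(participant_id) {
  first <- conditions[(participant_id - 1) %% length(conditions) + 1]
  c(first, sample(setdiff(conditions, first)))
}

vignette_order(7)  # participant 7 starts with "intuition_high"
```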
Except for Studies 3–4 and Study 6, all studies presented the same vignettes. Study 6 adopted French translations of the English items. Study 3 used a cover story whereby we added exact numerical information to the high (95%) and low accuracy (5%) text that referred to the previous performance of the described individual (instruction text: “We will give you information about the reasoning style of different individuals. We want to know how you evaluate their reasoning skills. These people have been tested on a range of reasoning tasks and for some of them you’ll be given information on their exact performance (i.e., their mean score %).”). Study 4 modified the text of the highly accurate and intuitive profile to “Person A follows their intuition when reasoning about a problem. They do not need to spend much time or effort to arrive at a conclusion.” to stress the efficiency of fast intuition. Full vignette text and study instructions are presented in SI B.
Rating scales
For each vignette, participants rated the quality of the described individual’s reasoning by clicking on the corresponding value on an 11-point rating scale (0–10). Except for Studies 5 and 11–13, we presented three different scales (“How good is this person at reasoning?”; “How smart is this person?”; “To what extent would you follow this person’s advice about a reasoning problem?”). The scale labels were inspired by previous work measuring decision quality17,29. The three scales were presented simultaneously, directly under the vignette on a single page. For Studies 5 and 11–13 we used a single rating scale (“How good is this person at reasoning?”). End and midpoints of the scales were labelled. Higher values indicated higher perceived reasoning quality. Participants clicked on the scale number of their choice to indicate their evaluation. Participants were shown examples of the rating scales in the instructions and were given one practice trial to familiarize themselves with the response selection (see SI B for full instructions).
Ranking question
After participants had rated the four core vignettes (i.e., high/low accuracy intuitor and deliberator), they were asked to directly rank-order them by perceived intelligence. We opted for the label “intelligence” to avoid repeating the rating scale labels. Participants saw the four vignettes and clicked on a rank order for each one of them with the following instructions: “From the descriptions below, please now rank the people from the most intelligent (number 1) to the least intelligent (number 4). You can only place one person at each rank (i.e., no ties).” (see SI B).
Incentivized betting question
At the end of Study 2, participants were given the following instructions: “After this study is completed we will have four individuals with profiles matching the ones you’ve seen solve a reasoning problem. You will get to see the specific profiles in question again below and can bet on one of them. If the person you select solves the problem correctly, you will get a bonus payment of 1 pound. Click on the profile you’d like to bet on, then click on Next to validate.” Participants then saw the four profiles they previously rank-ordered. Note that given the use of deception and the hypothetical nature of our cover story, all participants were debriefed about the nature of the study and given the bonus payment.
LLM Studies 8–9
To test whether Large Language Models (LLMs) would capture human patterns, we used the OpenAI API to present ChatGPT models with our vignettes. We used ChatGPT 3.5 and ChatGPT 4 in Studies 8 and 9, respectively. To best simulate human testing conditions, we ran 240 API queries in each study and presented the vignettes in a randomized order, similarly to Study 1. We used default settings with the temperature of each model set to 1. For each query, we asked the model to rate each profile using the 3 scales. Each query was thus designed to simulate one “participant” (for further details, see SI C). These studies were not preregistered.
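A single simulated “participant” query might look like the sketch below, which uses the openai R package cited under Statistical analysis. The model identifier, prompt wording, and response-field access are illustrative assumptions; the exact materials are given in SI C.

```r
# Sketch of one simulated "participant" query (assumes OPENAI_API_KEY is
# set in the environment). Model name and prompt are placeholders.
library(openai)

vignette <- "Person A follows their intuition when reasoning about a problem. ..."
prompt <- paste(vignette,
                "On a scale from 0 to 10, how good is this person at reasoning?")

reply <- create_chat_completion(
  model = "gpt-3.5-turbo",  # "gpt-4" for Study 9
  temperature = 1,          # default setting, as described above
  messages = list(list(role = "user", content = prompt))
)
reply$choices$message.content  # the model's rating response
```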
Time pressure studies 10–11
Participants in Studies 10–11 were randomly allocated to either a time-pressure (restricted deliberation) condition or a forced deliberation condition30. Study 10 used the original 3-scale format from Study 1, while Study 11 implemented the single-scale format from Study 5. Response deadlines were calibrated based on the reaction times (RT) in the unconstrained Studies 1 and 5. These indicated that reading the vignettes and selecting a rating on the response scales took on average 20.6 s (SD = 23.2 s) with the 3 scales in Study 1 and 12.1 s (SD = 16.3 s) with one scale in Study 5. To put participants under time pressure, we based the response deadlines in Studies 10 and 11 on the (rounded) first quartile of the RT distribution in Study 1 (12 s) and Study 5 (6.3 s), respectively (a similar approach has been used previously31). A few seconds before the deadline (3 s and 1.5 s, respectively) the background color of the screen turned yellow to warn participants about the upcoming deadline. If participants missed the deadline, they were reminded to respond faster on subsequent trials.
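The calibration step reduces to taking the first quartile of the unconstrained RT distribution. A minimal sketch with simulated stand-in RTs follows; the actual calibration used the observed RTs from Studies 1 and 5.

```r
# Sketch of the deadline calibration. rt_study1 is a simulated stand-in
# for the unconstrained per-trial response times (seconds) from Study 1;
# the actual calibration used the observed distribution, whose first
# quartile was 12 s.
set.seed(1)
rt_study1 <- rlnorm(1000, meanlog = log(15), sdlog = 0.8)

deadline_study10 <- round(quantile(rt_study1, probs = 0.25))
deadline_study10
```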
In the forced deliberation condition in Studies 10 and 11 participants were instructed to reflect on their choice for at least 20 s. Ratings could not be submitted before these 20 s had passed. Note that the forced deliberation was intended as an unrestricted contrast condition for the time-pressure condition, since forcing people to take more time to reflect does not imply that deliberation will be boosted per se32,33. The point is simply that in the forced deliberation condition, deliberation was not restricted.
Participants were given 2 practice trials, in which they evaluated a mock vignette, to familiarize themselves with the timing conditions. A comparison between average response times in the time-pressure conditions in Study 10 (M = 9.9 s, SD = 1.8 s) and Study 11 (M = 5.2 s, SD = 1 s) versus the corresponding original unconstrained Studies 1 and 5 established that participants responded significantly faster in the time-pressure conditions (t(2126.16) = 38.62, p < 0.001 and t(2135.62) = 39.80, p < 0.001, respectively).
For completeness, note that for exploratory purposes we also adopted a time-pressure and forced deliberation format for the ranking question at the end of the study (with a maximum response time of 25 s in the time-pressure condition and a minimum of 50 s in the forced deliberation condition, respectively). Full study instructions can be found in SI D.
Two-response studies 12–13
Studies 12 and 13 employed a two-response paradigm34. The studies adopted the single-scale format from Study 5. Participants gave two consecutive ratings to each vignette: an initial “intuitive” rating under time pressure and cognitive load, followed by a second “deliberate” rating without restrictions.
The load task was based on previous work35. Before each vignette was initially presented, participants briefly saw a complex visual matrix pattern for 2 s (see SI E). After participants had entered their initial vignette rating, they needed to select the to-be-remembered matrix among four candidates (see SI E). Afterward, the vignette was presented again and participants could take all the time they wanted to reflect on their final evaluation without any constraints.
In Study 12 the initial response deadline was 6 s (as in Study 11) and the memorization concerned a 4-cross pattern in a 3 × 3 grid. Given that there is no established absolute timing or load threshold for deliberation10, following previous work34, we tried to further limit the theoretical possibility for deliberation during the initial response stage by further restricting the deadline (5.5 s) and increasing the load (5-cross pattern in a 4 × 4 grid, see SI E) in Study 13.
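As an illustration of the load-stimulus construction, the sketch below samples cross positions for a target pattern and builds distractors for the four-alternative memory probe. This is our illustration under the stated grid sizes, not the study's stimulus code; in practice distractors would additionally be checked against the target.

```r
# Sketch of load-pattern construction (illustrative, not the study code).
# A pattern is represented as the set of grid cells containing crosses.
make_pattern <- function(n_cells, n_crosses) sort(sample(n_cells, n_crosses))

set.seed(2)
target_easy <- make_pattern(9, 4)    # 4 crosses in a 3 x 3 grid (Study 12)
target_hard <- make_pattern(16, 5)   # 5 crosses in a 4 x 4 grid (Study 13)

# Three distractors for the 4-alternative recognition probe
distractors <- replicate(3, make_pattern(9, 4), simplify = FALSE)
```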
The background color of the screen turned yellow 1.5 s before the deadline to warn participants about the upcoming deadline. If participants missed the deadline or failed the load task, they were reminded to respond faster or to pay more attention to memorizing the load pattern on subsequent trials. Participants were familiarized with the two-response procedure in 2 practice trials with mock vignettes. Full study instructions can be found in SI E.
Trial exclusions for time-pressure and two-response Studies 10–13
In line with our preregistration, we discarded data from one participant who missed the deadline on more than 2 trials in Study 10 and further discarded 45 individual trials with missing data (i.e., trials where more than 1 of the 3 rating scale values was missing; 45 out of 3367 trials, or 1.3%). We also discarded data from two participants who missed the deadline on more than 2 trials in Study 11. We replaced single missing values with sample means in Study 10 in 80 trials (out of 3367, i.e., 2.4% of trials) and did the same in Study 11 for 88 trials (out of 3339, i.e., 2.6% of trials). See SI D for details.
In Study 12, we discarded data from 5 participants who missed the deadline on more than 2 trials and/or who failed the load task in more than half of their trials, and we similarly discarded data from 43 participants in Study 13. We replaced single missing values with sample means in Study 12 in 100 trials (out of 1645, i.e., 6.1% of trials) and did the same in Study 13 for 77 trials (out of 1379, i.e., 5.6% of trials). See SI E for details.
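A minimal sketch of this single-missing-value imputation, reading "sample mean" as the mean of the available ratings on the affected scale (our reading; the data frame and column names are hypothetical stand-ins):

```r
# Sketch of the single-missing-value imputation; the data frame and its
# column names are hypothetical stand-ins for the trial-level ratings.
library(dplyr)

ratings <- tibble(good   = c(7, NA, 9),
                  smart  = c(6, 8, NA),
                  advice = c(5, 7, 8))

ratings %>%
  mutate(across(everything(),
                ~ ifelse(is.na(.x), mean(.x, na.rm = TRUE), .x)))
```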
Statistical analysis
Data distribution for ANOVAs was assumed to be normal but this was not formally tested. Note that whenever Mauchly’s test indicated that the assumption of sphericity had been violated, we report Greenhouse–Geisser corrected ANOVA results. All analyses (all two-sided) were done using the following R packages (in alphabetical order): afex36, broom37, effectsize38, emmeans39, gginnards40, ggpubr41, ggrepel42, gt43, here44, janitor45, knitr46, openai47, patchwork48, rstatix49, scales50, tidyverse51. Computation of Cohen’s d for contrasts and additional Bayesian analyses were run in JASP52.
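To make the pipeline concrete, here is a minimal sketch of the core 3 × 2 within-subjects ANOVA and follow-up contrasts using the afex and emmeans packages cited above. The data are simulated and the variable names are hypothetical stand-ins for the actual data files.

```r
# Sketch of the core within-subjects ANOVA pipeline (simulated data;
# variable names are hypothetical stand-ins for the actual data files).
library(afex)
library(emmeans)

set.seed(3)
ratings_long <- expand.grid(participant = factor(1:20),
                            accuracy = c("high", "low", "unspecified"),
                            mode = c("intuition", "deliberation"))
ratings_long$rating <- round(runif(nrow(ratings_long), 0, 10))

fit <- aov_ez(id = "participant", dv = "rating", data = ratings_long,
              within = c("accuracy", "mode"))
# a between-subjects restriction factor (Studies 10-11) would be added
# via the between = argument
anova(fit, correction = "GG")  # Greenhouse-Geisser-corrected results

# Intuition vs. deliberation contrasts within each accuracy level
emmeans(fit, pairwise ~ mode | accuracy)
```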
Results
Study 1: General preference for deliberation over intuition
Correlations among our three rating scales were consistently high (Study 1: 0.79 ≤ rs ≤ 0.87; all other studies 0.78 ≤ rs ≤ 0.91, see SI F), so the scales were combined into a single composite preference scale reflecting the overall perceived reasoning quality. Figure 1 shows the results. As indicated in Table 3, a 3 (Accuracy Information: High, Low, Unspecified) × 2 (Reasoning Mode: Intuition, Deliberation) within-subjects ANOVA on the composite rating showed a main effect of Accuracy Information, F(1.65, 392.91) = 1264.5, p < 0.001, ηp2 = 0.84 [0.82, 1.00]. Participants preferred reasoners with high accuracy over those with low accuracy, with the unspecified condition in between, indicating that the accuracy manipulation was effective as intended.
Fig. 1: Perceived reasoning quality as a function of an individual’s portrayed past accuracy and reasoning mode in Study 1.
Violin plots showing the overall perceived reasoning quality averaged across 3 rating scales (i.e., “how good is this person at reasoning?”; “how smart is this reasoner?”; “to what extent would you follow their advice?” – see SI B) for n = 239 participants. Black dots indicate the average. Horizontal lines represent the 25th, 50th and 75th percentiles. Unsurprisingly, individuals described as having higher past accuracy are preferred over individuals with lower (or unspecified) accuracy. The key result is that individuals who are portrayed as adopting a deliberative reasoning mode (blue) are consistently rated as superior to individuals portrayed as being more intuitive (green), irrespective of past accuracy.
More critically, there was a main effect of Reasoning Mode, F(1, 238) = 348.6, p < 0.001, ηp2 = 0.59 [0.53, 1.00], and a Reasoning Mode × Accuracy interaction, F(2, 476) = 80.47, p < 0.001, ηp2 = 0.25 [0.20, 1.00], indicating a general preference for the deliberate over the intuitive reasoner that was strongest when accuracy information was absent and slightly attenuated in the high- and low-accuracy conditions. Importantly, the preference for deliberation never disappeared or reversed and remained significant across all accuracy conditions (7.6 ≤ ts ≤ 19.3, 4 × 10−50 ≤ ps ≤ 3.9 × 10−13, 0.53 ≤ ds ≤ 1.58, see Supplementary Table 6 in SI G). These findings thus indicate a general preference for deliberation over intuition, regardless of accuracy.
To check for potential bias from the within-subject design, we also ran a between-subjects analysis on the first item each participant evaluated, yielding similar results (see SI H). For further validation, after participants had rated all vignettes, they were also asked to directly rank-order four core vignettes (i.e., high/low accuracy intuitor and deliberator) by perceived reasoning quality. Average ranking scores confirmed the rating scale conclusions, with deliberate reasoners ranked higher than intuitive ones (see SI I).
Studies 2–7: Robust preference for deliberation
Studies 2–7 served as control studies to validate the robustness of the findings and rule out alternative explanations. Study 2 was a direct replication of Study 1, while Studies 3 and 4 altered vignette content to address specific concerns. One concern was that our verbal accuracy label (e.g., “The accuracy of their answers is very high”) might still allow participants to infer that deliberation implicitly yields higher accuracy than intuition, artificially boosting deliberation ratings. To counter this, Study 3 presented a cover story with exact numerical scores on a reasoning test (e.g., “The accuracy of their answers is very high (95%)”), with identical scores for both intuitive and deliberative vignettes.
Another concern was that the intuitive vignette wording (“They do not spend much time or effort to arrive at a conclusion”) might skew evaluations negatively by not sufficiently emphasizing the efficiency of fast intuition. In Study 4, we adapted the phrasing to emphasize efficiency (e.g., “They do not need to spend much time or effort to arrive at a conclusion”) following pragmatic implicature principles53. Study 5 addressed potential bias from asking participants for three consecutive ratings by adopting a single evaluation scale for overall reasoning quality.
Participants in Studies 1–5 were all native English speakers living in anglophone, Western countries. Studies 6–7 provided an initial test of whether the results generalize across languages and cultures. Study 6 tested French-speaking participants in continental Europe, while Study 7 tested Indian nationals living in India. ANOVA results are reported in Table 3.
As shown in Fig. 2A, all studies replicated the primary pattern from Study 1 (see SI G for details). Although accuracy information attenuated the preference, participants consistently favoured deliberation over intuition in all conditions (4.19 ≤ ts ≤ 20.42, 6.5 × 10−54 ≤ ps ≤ 4.3 × 10−5, 0.26 ≤ ds ≤ 1.69, see Supplementary Table 6 in SI G).
Fig. 2: Perceived reasoning quality as a function of an individual’s portrayed past accuracy and reasoning mode across all studies.
Panel (A) shows the results for Studies 1–7 (n = 239, 241, 240, 240, 241, 240 and 184, respectively) indicating that the general preference for deliberation over intuition is robust and replicated in all control studies. Panel (B) shows that Large Language Models (Study 8: ChatGPT 3.5; Study 9: ChatGPT 4; n = 240 each) reflect the human preferences and also consistently favor deliberation over intuition. Results under the “Human” label concern the data from participants in Study 1–7 (n = 1625). Panel (C) shows the results from Studies 10–13 that experimentally restricted deliberation during the evaluation process. Study labels: 10 = timing (n = 241 and 240 with and without restriction, respectively), 11 = timing-1-scale (n = 236 and 241 with and without restriction, respectively), 12 = two-response (n = 235), 13 = two-response-hard (n = 197). Light-colored dashed lines indicate conditions where deliberation was experimentally restricted, while dark-colored solid lines indicate unrestricted conditions. Results indicate that the general preference for deliberation over intuition was still observed under constraints suggesting it is primarily intuitive in nature. Note that for studies adopting multiple rating scales (Study 1–4, Study 6–10), displayed results again concern the overall average. Error bars indicate SDs, dots indicate the average.
Incentivized preference test
At the end of replication Study 2 we also tested whether participants’ verbal preference was consequential. Although participants verbally rated deliberation more highly, this preference could potentially stem from social desirability rather than genuine conviction. To test this, we used an incentivized paradigm where participants could double their participation fee by betting on the performance of either a highly accurate intuitor or deliberator (as well as a low-accuracy intuitor and deliberator, for control purposes). A cover story explained that each individual had solved a reasoning test problem, and participants could bet on one of them, receiving a £1 bonus if the individual’s answer was correct. In line with the rating findings, participants were far more likely to bet on the highly accurate deliberator (76.8%) than on the intuitor (21.6%), χ²(3) = 372, p < 0.001.
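The reported test corresponds to a goodness-of-fit chi-square over the four profiles against uniform choice. A minimal sketch follows, with illustrative counts that are merely consistent with the reported percentages for a sample of 241 (not the exact study data):

```r
# Goodness-of-fit chi-square sketch; the counts are illustrative values
# consistent with the reported percentages (n = 241), not the raw data.
bets <- c(high_deliberator = 185, high_intuitor = 52,
          low_deliberator  = 2,   low_intuitor  = 2)

chisq.test(bets)  # tests against uniform choice; cf. reported chi2(3) = 372
```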
Interestingly, before the betting question participants also rank-ordered the vignettes (unincentivized, see Study 1). Of the participants who ranked the highly accurate deliberator higher than the intuitor, 86.6% also bet on the deliberator. However, among those whose ranking showed a preference for the intuitor, only 46.4% opted for the intuitor when money was at stake in the betting question (see SI J). Thus, when choices had real monetary consequences, the preference for deliberation over intuition became even more pronounced.
Studies 8–9: Large language models reflect human preferences
In Studies 8–9, we tested whether Large Language Models (LLMs), specifically ChatGPT 3.5 and 4, would mirror the human preference pattern. Using our original vignettes and rating questions, we ran 240 API queries with each model to simulate the same number of subjects as in our individual previous studies. Figure 2B displays the LLM results alongside the aggregate human data from Studies 1–7.
As shown in Fig. 2B, both ChatGPT 3.5 and 4 produced response patterns strikingly similar to those of human participants: a consistent preference for deliberation over intuition, strongest when accuracy information was absent and slightly reduced but still significant in the high- and low-accuracy conditions (17.02 ≤ ts ≤ 56.97, 8.6 × 10−135 ≤ ps ≤ 6.4 × 10−43, 1.44 ≤ ds ≤ 4.77; see Supplementary Table 13 in SI K for details). Notably, the absolute ratings in each condition closely aligned with human responses, with condition means differing on average by 0.62 points for ChatGPT 3.5 and 0.44 points for ChatGPT 4 on the 11-point scale—representing less than 7% deviation from human ratings in Study 1 (see SI K).
These results suggest that LLMs like ChatGPT have captured human preferences for deliberation over intuition, likely because these patterns are well-represented in the language data used to train the models. This alignment underscores that the preference for deliberative reasoning may be a widely shared belief evident in common language and communication.
Studies 10–13: Intuitive nature of deliberation preference
Since Studies 1–9 consistently showed a preference for deliberation over intuition, Studies 10–13 examined the cognitive nature of this preference. One possibility is that the preference for deliberation arises intuitively, without requiring reflection. Alternatively, it might be that individuals actually have an initial intuitive preference for intuition, which shifts towards deliberation when given time to reflect. To test this, we used increasingly stringent time-pressure and load manipulations to experimentally reduce deliberate reasoning during the evaluation process in Studies 10–13. If the deliberation preference is itself intuitive, we should observe it even when deliberation is minimized. Conversely, if it requires deliberation, the preference should disappear or reverse when participants are forced to rely on intuition.
In Study 10, we used the original 3-scale format from Study 1, with one group of participants completing evaluations under time pressure (12 s deadline) and a second group under a forced deliberation condition30, which required participants to reflect for at least 20 s before responding. Study 11 implemented a similar design using the single-scale format from Study 5, allowing for a more rigorous time constraint (6 s deadline). Studies 12 and 13 employed a two-response paradigm18,54, in which participants gave an initial “intuitive” single-scale rating under time pressure and cognitive load, followed by a second “deliberate” single-scale rating without restrictions. In the cognitive load task, participants memorized complex visual patterns (4 crosses in a 3 × 3 matrix in Study 12, increasing to 5 crosses in a 4 × 4 matrix in Study 13)34. Time-pressure durations were calibrated based on the vignette reading and rating response times in the unconstrained deliberation Studies 1 and 5 (see Methods), with Study 13 incorporating a further reduction (5.5 s deadline) based on initial response times in Study 12.
Figure 2C presents the results. In each study, light-colored dashed lines indicate conditions where deliberation was experimentally restricted, while dark-colored solid lines indicate unrestricted conditions. Visual inspection of Fig. 2C shows that restricting deliberation had minimal impact on the preference for deliberation: participants consistently preferred deliberation over intuition, even under significant constraints, closely matching the patterns seen in Studies 1–7 and the unrestricted conditions in Studies 10–13. This suggests that the preference for deliberation does not depend on deliberate reasoning but is primarily intuitive.
For statistical analysis, we conducted 3 (Vignette Accuracy Information: High, Low, Unspecified) × 2 (Reasoning Mode: Intuition, Deliberation) × 2 (Deliberation Restriction: Restricted, Unrestricted) ANOVAs for each study. Deliberation Restriction was a between-subjects factor in Studies 10–11 and a within-subjects factor in Studies 12–13. Table 4 presents the results. All studies showed main effects for Accuracy (390.23 ≤ Fs ≤ 1661.59, 5.9 × 10−250 ≤ ps ≤ 4.8 × 10−77, 0.67 ≤ ηp2 ≤ 0.79) and Reasoning Mode (226.42 ≤ Fs ≤ 667.68, 1.37 × 10−92 ≤ ps ≤ 3.01 × 10−36, 0.49 ≤ ηp2 ≤ 0.59), as well as an Accuracy × Reasoning Mode interaction (19.99 ≤ Fs ≤ 145.20, 2.58 × 10−55 ≤ ps ≤ 5.43 × 10−9, 0.09 ≤ ηp2 ≤ 0.25), consistent with Studies 1–9. Crucially, in all accuracy conditions, participants consistently preferred deliberation over intuition across both restricted and unrestricted conditions (5.04 ≤ ts ≤ 19.78, 3.1 × 10−62 ≤ ps ≤ 9.3 × 10−7, 0.37 ≤ ds ≤ 1.72, see Supplementary Table 16 in SI L).
ANOVA results further revealed (marginally) significant main effects for Deliberation Restriction (2.75 ≤ Fs ≤ 19.55, 1.23 × 10−5 ≤ ps ≤ 0.099, 0.01 ≤ ηp2 ≤ 0.06) and a Restriction × Accuracy interaction (15.06 ≤ Fs ≤ 38.10, 7.55 × 10−16 ≤ ps ≤ 3.5 × 10−6, 0.03 ≤ ηp2 ≤ 0.16), indicating that unrestricted deliberation led to generally lower ratings, particularly in the low-accuracy condition. In Studies 12 and 13, we also found a Restriction × Reasoning Mode interaction (9.62 ≤ Fs ≤ 11.70, 7.59 × 10−4 ≤ ps ≤ 0.002, 0.04 ≤ ηp2 ≤ 0.06), with a three-way interaction observed in Study 13 (F(2, 392) = 4.072, p = 0.018, ηp2 = 0.02 [0.00, 1.00]). Here, lower ratings post-deliberation tended to be slightly more pronounced for intuitive vignettes, particularly in the absence of accuracy information. In Study 13’s unspecified accuracy condition, for example, the deliberation preference gap narrowed from 2.80 points (SD = 2.98) in the unrestricted condition to 1.89 points (SD = 2.77) in the restricted condition. Nonetheless, even in this specific case, the deliberation preference remained robust, neither disappearing nor reversing (t = 9.59, p < 0.001, d = 0.88 [0.68, 1.08]). Together, these results support the conclusion that the preference for deliberation over intuition is itself predominantly intuitive in nature.