Author: Giulio Vidotto
Affiliation: University of Padua
Date: January 28, 2026
Large language models produce fluent, coherent, and often convincing texts even when information is incomplete, controversial, or simply false. This combination can generate in readers the impression of intentionality: “I am being deceived.” In reality, the effect emerges from the interaction among (a) training objectives oriented toward linguistic plausibility rather than truth, (b) post-training techniques that reward helpful and persuasive responses, (c) known reliability limitations (hallucinations), and (d) users’ cognitive vulnerabilities, such as excessive reliance on automation and the “truth effect” induced by perceptual fluency. In light of recent evidence on conversational persuasion, we discuss how increased persuasive capacity may correlate with decreased factual accuracy, and how the selection of “canonical” sources may reflect geopolitical asymmetries in the production and digitization of evidence. Distinguishing structural artifacts from intentional manipulation does not reduce the impact on users, but it orients the type of regulatory intervention, defines realistic mitigation expectations, and redistributes responsibility along the technological and social supply chain. Finally, we propose a minimal set of epistemic auditing criteria, designed to separate errors, omissions, and structural biases from intentional manipulation hypotheses, enabling informed governance.
Keywords: language models, hallucinations, reliability, persuasion, automation bias, source bias, information integrity, algorithmic accountability.
When a system produces confident, fluent, and “complete” responses, the human mind tends to infer an agent behind the text. It is a cognitive heuristic: if discourse is well-formed, then “someone” is guiding it. In language models, however, the primary guide is statistical: generating word sequences that resemble human texts. The side effect is that the form can inspire more trust than the content deserves.
This dynamic is particularly problematic in topics with high polarization or high moral stakes (history, war, genocides, geopolitics). In these cases, selective omissions, implicit hierarchies among categories of victims, or choices of sources perceived as “partisan” are read as deliberate manipulation. But the most parsimonious explanation is often a mix of training, output incentives, and reception bias (Lin, Hilton, & Evans, 2022).
Distinguishing between intentional manipulation and systemic artifact is not an academic exercise. For users who modify their opinions based on distorted information, the immediate impact is identical. But for regulators, providers, educators, and policymakers, this analysis orients the type of intervention: if the problem is intentional, the response is punitive; if it is structural, the response must be systemic, transparent, and distributed. This paper proposes an “intent-free” reading of the persuasive-distortive effect of language models, not to minimize their impact, but to identify where and how to intervene effectively.
A standard language model is trained to predict the most probable continuation of a text, given a context. The objective is not “tell the truth” but “write the way texts are typically written.” Post-training with human feedback (e.g., reinforcement learning from human preferences) can improve utility and instruction adherence, but it does not automatically turn the system into a fact-checker (Ouyang et al., 2022).
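To make the point concrete, here is a toy sketch of the next-token objective (purely illustrative Python, not any provider’s training code): the loss rewards assigning high probability to whatever token actually follows in the corpus, and contains no term that checks factual correctness.

```python
import math

def next_token_loss(predicted_probs, observed_tokens):
    """Toy cross-entropy over a token sequence.

    predicted_probs: list of dicts mapping candidate token -> probability
                     assigned by the model at each position.
    observed_tokens: the tokens that actually follow in the training text.

    The loss rewards assigning high probability to whatever token the corpus
    contains next; nothing in it checks whether the statement is true.
    """
    loss = 0.0
    for probs, token in zip(predicted_probs, observed_tokens):
        loss += -math.log(probs.get(token, 1e-12))
    return loss / len(observed_tokens)

# A frequent-but-false continuation is "cheaper" than a rare-but-true one.
step_probs = [{"1492": 0.7, "1491": 0.05}]
print(next_token_loss(step_probs, ["1492"]))  # low loss: the statistically plausible continuation
print(next_token_loss(step_probs, ["1491"]))  # high loss, even if this were the true fact
```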
Moreover, training relies on large collections of available texts. This availability is not neutral: it reflects editorial power, digitization, dominant language, archival infrastructures, and institutional priorities. Consequently, what appears “standard” or “canonical” often coincides with what is most present in corpora and most cited online, not with what is epistemically “neutral” (Bender, Gebru, McMillan-Major, & Shmitchell, 2021). Recent studies empirically document these asymmetries: Noels and colleagues (2026) show how moderation varies systematically among models from different geopolitical regions, while Buyl and colleagues (2026) detect measurable ideological disparities in evaluations of political figures.
In the literature, “hallucination” refers to the production of content not supported by data or context, presented as if it were correct. It is not a marginal defect: it is a structural risk when the system must fill informational gaps while maintaining narrative coherence and style. The survey by Ji and colleagues (2023) formalizes definitions, taxonomies, and conditions that increase the probability of hallucination in generation tasks.
Here the psychological point emerges: hallucinated responses do not sound random. They sound “normal.” Linguistic normality is the vehicle of error. Typical examples include invented dates in historical biographies, citations attributed to nonexistent sources, or plausible but false biographical details. If users lack external verification (sources, documents, expertise), form becomes proof (Ji et al., 2023).
The TruthfulQA benchmark measures how much models tend to reproduce widespread false beliefs and “seductive” but incorrect answers (Lin et al., 2022). It is important because it shows that the problem is not only inventing facts, but also imitating typical human error, the culturally available and therefore more “writable” kind.
A complementary result, however, is that models can also estimate (under certain conditions) the probability that their own statement is correct. Kadavath and colleagues (2022) show that, given the right format, some models are reasonably calibrated and can produce useful uncertainty signals. The critical point is that this behavior is not guaranteed in ordinary outputs and is often not elicited by the prompt. The capacity exists; its use is intermittent.
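As an illustration of what “useful uncertainty signals” could mean in an audit, the following sketch computes a simple expected calibration error from pairs of self-reported confidence and externally checked correctness. The binning scheme and the toy data are assumptions for exposition, not the protocol used by Kadavath and colleagues (2022).

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Toy expected calibration error (ECE).

    confidences: self-reported probabilities that each answer is correct
                 (e.g., elicited with a "how likely is this to be true?" prompt).
    correct:     booleans from external fact-checking of the same answers.

    A well-calibrated model's stated confidence matches its empirical accuracy
    within each bin; large gaps signal over- or under-confidence.
    """
    bins = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append((c, ok))
    ece, n = 0.0, len(confidences)
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(ok for _, ok in b) / len(b)
        ece += (len(b) / n) * abs(avg_conf - accuracy)
    return ece

# Hypothetical audit sample: the model claims 90% confidence but is right 60% of the time.
print(expected_calibration_error([0.9, 0.9, 0.9, 0.9, 0.9],
                                 [True, True, True, False, False]))  # ~0.30 confidence-accuracy gap
```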
When a decision is supported by an automated system, humans tend to overweight the automation’s output, even when it is wrong. This phenomenon is classically discussed as misuse/disuse of automation and, in applied contexts, as automation bias (Parasuraman & Riley, 1997). The systematic review by Goddard and colleagues (2012) synthesizes mediators and mitigators of this bias.
In parallel, cognitive psychology shows that perceptual ease and fluency increase truth judgments. Reber and Schwarz (1999) demonstrate that when something is easier to process, it tends to be judged as truer. A language model is a fluency machine. Add the aura of “intelligent system,” and you get a credibility multiplier.
In recent years, experimental results have emerged on models’ persuasive power in political and policy contexts. Bai and colleagues (2025) show that messages generated by a model can influence opinions on policy issues with effectiveness comparable to texts produced by humans. Salvi and colleagues (2025) report that GPT-4, when personalized, significantly increases persuasiveness in debate conversations compared to human interlocutors.
Even more directly, Hackenburg and colleagues (2025) test persuasion levers in conversations on hundreds of political issues, with very large samples and multiple models. Their key result here is methodologically disturbing: techniques that increase persuasion (post-training and prompting) can reduce factual accuracy, and the effect is measured on an enormous number of verifiable claims. If the operational objective becomes “to convince,” truth tends to become a cost.
Saying that source bias exists does not imply that “someone” falsifies everything. It implies that available evidence is asymmetric, and that asymmetry accumulates: production (who writes), archiving (who preserves), digitization (who publishes), indexing (who makes it findable), and finally training (who enters the corpora). Bender and colleagues (2021) emphasize the epistemic and social risks of scale, including the difficulty of tracing provenance and representativeness of data.
Recent empirical studies document how moderation and censorship (both ‘hard’ and ‘soft’) vary systematically among LLMs from different geopolitical regions, with patterns reflecting the national and cultural priorities of providers (Noels et al., 2026). Buyl and colleagues (2026) demonstrate that LLMs reflect the ideology of their creators, with systematic disparities in how models from different regions evaluate political figures. These asymmetries manifest concretely: descriptions of events in Palestine, Xinjiang, or Yemen show documented differences in coverage, framing, and selection of the sources considered “authoritative,” depending on the model and the language of the query.
The practical problem is that, for users who receive asymmetric information about a geopolitical conflict or historical event, the origin of bias (supply chain vs. manipulation) is irrelevant: the effect on opinion formation is identical. A user who consults Anglo-centric descriptions of colonial history, or who is exposed to narratives filtered by state censorship, suffers epistemic asymmetry regardless of intentionality. But knowing that it derives from the supply chain indicates where to intervene: in corpus curation, in source transparency, in mandated provider plurality, not only in sanctioning the individual actor.
If the hypothesis is “it seems designed to deceive,” the scientific response is not moral. It is operational. The objective becomes separating four classes: error, omission, corpus bias, and bias induced by prompt or post-training. The risk taxonomy by Weidinger and colleagues (2022) offers a useful grid (misinformation, manipulation, exclusion, downstream harms) to avoid confusing different levels.
Minimal auditing, in this framework, does not require “trust” in the system. It requires repeatable measures. Examples of operational measures include the rate of verifiable claims per output, the geographical diversity of cited sources, sensitivity to prompt rephrasing, and stability under follow-up questions.
This type of approach is also consistent with analyses of risks to information integrity in generative AI (RAND Corporation, 2024). These metrics are not consolatory. They are diagnostic and binding: a model showing a high percentage of unverifiable claims and low geographical diversity in sources may be useful for creative brainstorming, but cannot be responsibly employed for opinion formation in electoral, judicial, or educational contexts without external controls. Auditing transforms technical analysis into usage constraint, making responsibility traceable.
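A minimal sketch of how some of these measures might be operationalized follows. The function names, the upstream claim-extraction step, and the same_conclusion comparison are hypothetical; a real audit would place human or retrieval-based fact-checking behind each of them, and stability under follow-up questions can be scored analogously to rephrasing sensitivity.

```python
from collections import Counter
import math

def verifiable_claim_rate(claims, verifiable_flags):
    """Share of extracted claims that an external checker could verify."""
    return sum(verifiable_flags) / len(claims) if claims else 0.0

def source_region_diversity(source_regions, reference_regions):
    """Shannon entropy of cited-source regions, normalized against a reference
    set of regions the auditor cares about (0 = one region, 1 = uniform coverage)."""
    counts = Counter(source_regions)
    total = sum(counts.values())
    if total == 0 or len(reference_regions) < 2:
        return 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(len(reference_regions))

def rephrasing_sensitivity(reference_answer, rephrased_answers, same_conclusion):
    """Share of rephrasings whose answer no longer supports the reference conclusion."""
    if not rephrased_answers:
        return 0.0
    flips = sum(1 for a in rephrased_answers if not same_conclusion(reference_answer, a))
    return flips / len(rephrased_answers)

# Toy audit of a single output on a historical question (data invented for illustration).
claims = ["claim 1", "claim 2", "claim 3", "claim 4"]
flags = [True, True, False, True]  # which claims an external checker could verify
regions = ["North America", "North America", "Europe", "North America", "Africa"]
world_regions = ["North America", "Europe", "Africa", "Asia", "Latin America", "Oceania"]
answers = ["same conclusion", "same conclusion", "opposite conclusion"]

print(verifiable_claim_rate(claims, flags))                     # 0.75
print(source_region_diversity(regions, world_regions))          # ~0.53: coverage skewed to two regions
print(rephrasing_sensitivity("same conclusion", answers,
                             lambda a, b: a == b))              # ~0.33: one rephrasing flips the answer
```

Normalized entropy is used here only as one convenient diversity index; a standards body could equally mandate a simpler count of distinct regions or languages among cited sources.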
The central thesis is simple: output may seem deceptive because it combines (i) optimization for plausibility, (ii) incentives for utility and persuasion, (iii) veracity limitations, and (iv) users’ cognitive biases. This “intent-free” explanation is often stronger than an “intent-based” one, because it explains why the effect recurs across different topics and with different users.
The most obvious limitation of this analysis is that, for the average user who is persuaded by a false but fluent argument, or who forms a distorted opinion about Gaza, Ukraine, or Tibet by reading asymmetric outputs, the outcome is identical whether there is intentional manipulation or a statistical artifact. A citizen who modifies their vote based on distorted content generated by an LLM suffers the same epistemic harm, regardless of whether OpenAI, Anthropic, or Baidu “intended” that result. The proposed analysis thus risks appearing as a “distinction without a difference”: academic for those who make it, irrelevant for those who suffer it, and consolatory for providers (“it’s not our fault, it’s statistics”).
However, the framework matters for three reasons, concerning intervention, expectations, and responsibility.
First: type of regulatory intervention. If the problem is intentional, the response is punitive and based on individual responsibility. Evidence of intent is sought, fines are applied, products are removed from the market. If the problem is structural, the response must be systemic: mandatory transparency standards (e.g., disclosure of reliability metrics), independent auditing obligations, mandated diversification of providers in critical sectors (public information, education, justice), usage limits in high-risk contexts (e.g., opinion formation on electoral or judicial matters), and public investments in critical literacy and fact-checking infrastructures. Confusing the two levels leads to ineffective sanctions (punishing a provider for corpus bias does not reduce bias) or regulatory inertia (ignoring intentional manipulations because “everyone is biased” is abdication).
Second: realistic mitigation expectations. If the persuasive effect is an artifact of fluency, automation bias, and corpus asymmetries, it cannot be “eliminated” without compromising system utility or without reconstructing the entire global information infrastructure. It can only be mitigated, with different strategies: explicit and granular warnings (not generic disclaimers, but per-output reliability indicators); obligations for pluralism in cited sources; active requests for counter-arguments on controversial topics; “slow information” systems that slow down uncritical consumption (e.g., forced pauses before sharing generated content on sensitive topics); and user training to recognize uncertainty signals and not to delegate judgment. Expecting absolute “neutrality” or “objectivity” from an LLM is technically naive. But requiring transparency, traceability, and usage limits is politically legitimate.
Third: redistribution of responsibility. The proposed framework does not exempt providers from responsibility. On the contrary: for a provider aware of documented structural limitations, continuing to present outputs as “objective” or “neutral,” integrating LLMs into critical contexts without appropriate warnings, or optimizing for persuasion in informational contexts where accuracy should be prioritized constitutes negligence, if not indirect manipulation. A provider who, knowing the persuasion/accuracy trade-off documented by Hackenburg and colleagues (2025), continues to optimize for persuasion in electoral, judicial, or educational contexts without explicit disclosure can no longer invoke an “unintentional artifact.” Knowledge of the risk transforms negligence into responsibility.
The framework also obliges platforms (Google, Meta, Apple) to contextualize use and not integrate LLMs into critical information flows without controls; regulators to impose standards and finance public auditing; educational institutions to train critical thinking; and users not to entirely delegate judgment. The chain of responsibility becomes longer, but also more traceable and more equitable. It is not “everyone’s fault” (which means no one’s fault), but distributed responsibility according to competencies and control.
The three methodological issues we have identified (Western bias, categorical omissions, operational metrics) are not merely technical. They are concrete examples of how the framework translates into action.
Western bias in the source supply chain. For a user in Africa or Asia who receives Anglo-centric narratives of colonial history or international relations, the effect is epistemic power asymmetry, regardless of intentionality. But knowing it derives from asymmetric digitization, dominant language, and under-representation in corpora indicates where to invest: multilingual and decentralized digitization projects, funding for non-Western corpora, linguistic diversity obligations in sources, and incentives for regional LLMs (not as substitutes, but as counterbalances).
Categorical omissions in historical events. A systematic omission (e.g., selective mention of some genocides and not others, implicit hierarchies among victims) shapes collective memory and perception of moral gravity. Explaining that it derives from corpus bias or human annotation in post-training does not console affected communities. But it indicates interventions: mandatory audits on sensitive topics, transparency in moderation choices, model plurality (not monopoly), and explicit disclaimers (“this model was trained predominantly on Western sources; for different perspectives, consult…”).
Operational auditing metrics. Measures like “rate of verifiable claims,” “geographical source diversity,” “sensitivity to prompt rephrasing,” “stability under follow-up questions” are not academic. They become labels: a model with a high percentage of unverifiable claims and geographically concentrated sources may be useful for brainstorming, but not for historical research or decision support. Making these metrics public and standardized transforms auditing into a tool for informed choices.
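To illustrate how such metrics could become labels, the sketch below maps an audit report (using the metric names from the earlier sketch) to a coarse usage recommendation. The thresholds are placeholders, not proposed standards; in practice they would be set per domain and per risk level by regulators or standardization bodies.

```python
def usage_label(metrics,
                min_verifiable=0.7,
                min_source_diversity=0.4,
                max_rephrasing_sensitivity=0.2):
    """Map audit metrics to a coarse usage recommendation.

    Threshold values are illustrative placeholders; the point is that a public,
    standardized mapping turns a technical audit into a usage constraint.
    """
    ok = (metrics["verifiable_claim_rate"] >= min_verifiable
          and metrics["source_region_diversity"] >= min_source_diversity
          and metrics["rephrasing_sensitivity"] <= max_rephrasing_sensitivity)
    return ("suitable for informational use with standard caveats" if ok
            else "creative/brainstorming use only; external verification required")

# Hypothetical audit report for a model queried on a contested historical topic.
print(usage_label({"verifiable_claim_rate": 0.45,
                   "source_region_diversity": 0.2,
                   "rephrasing_sensitivity": 0.35}))
```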
This work adopts a predominantly technical-psychological perspective. It does not directly address questions of political economy (who finances LLM development, who controls corpora, who dictates standards), nor questions of normative philosophy (what “neutrality” means in a world of radical disagreement, how to balance freedom of expression and disinformation control).
Moreover, it assumes that it is possible to empirically distinguish “intent” from “artifact,” but in many real cases this distinction is blurred: design choices can be “intentional” at the micro level (e.g., choice of a reward model) but “unintentional” at the macro level (unforeseen aggregate effect). The framework does not discuss scenarios where manipulative intent exists and is masked as “technical bias” (a real risk, especially in authoritarian regimes).
Geographic and cultural limitations. The proposed framework assumes a context of democratic pluralism, where provider diversity, independent auditing, and critical literacy are politically possible. In authoritarian regimes, where the LLM is a tool of state control, the distinction between “intent” and “artifact” collapses: the choice to train models on censored corpora is intentional, even if specific errors are stochastic. The framework proposed here does not directly apply to those contexts, where the problem is the political economy of AI, not just the epistemology of output.
Undiscussed technical limitations. We do not address the growing role of synthetic data: increasingly, LLMs are trained on outputs from other LLMs, creating bias amplification loops (model collapse). We do not distinguish between base model biases and biases introduced by conversational fine-tuning (RLHF), a relevant distinction for responsibility attribution. We do not discuss the power asymmetry between user and system: in many contexts (school, work), the LLM is imposed by the institution, not freely chosen, making the strategy of “individual responsibility” based on literacy insufficient.
These limitations indicate directions for future research and circumscribe the applicability of the proposed framework.
If a model gives you the impression of deceiving you, it is advisable to treat that impression as psychological data to measure and as a technical hypothesis to test, not as a moral diagnosis to proclaim. Science, in this context, offers no consolation: it measures. And when it measures, it often discovers a disturbing banality: truth easily loses against a sentence that flows well (Reber & Schwarz, 1999).
But scientific analysis, even when it identifies “intent-free” artifacts, is not neutral in its effects. It orients action. It distinguishes eliminable problems (intentional manipulations, identifiable and sanctionable) from mitigable problems (systemic artifacts, which require transparency, pluralism, usage limits, and literacy). Confusing them leads to paralysis (everything is bias, therefore nothing is responsibility) or to ineffective sanctions (punishing a single provider does not resolve asymmetries in global corpora).
The average user, who does not know RLHF or TruthfulQA, will still suffer the persuasive effect. But an informed ecosystem—regulators, providers, platforms, educators—can build mitigation architectures: mandatory auditing, reliability labels, model pluralism, contextual disclaimers, critical training. They will not eliminate risk. But they will make it governable. And this, in the absence of absolute neutrality, is the maximum of realistic ambition.
Bai, H., Voelkel, J. G., Muldowney, S., Eichstaedt, J. C., & Willer, R. (2025). LLM-generated messages can persuade humans on policy issues. Nature Communications, 16, 5582. https://doi.org/10.1038/s41467-025-61345-5
Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the dangers of stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’21) (pp. 610–623). https://doi.org/10.1145/3442188.3445922
Buyl, M., Rogiers, A., Noels, S., Bied, G., Dominguez-Catena, I., Heiter, E., Johary, I., Mara, A.-C., Romero, R., Lijffijt, J., & De Bie, T. (2026). Large language models reflect the ideology of their creators. npj Artificial Intelligence, 2, 7. https://doi.org/10.1038/s44387-025-00048-0
Goddard, K., Roudsari, A., & Wyatt, J. C. (2012). Automation bias: A systematic review of frequency, effect mediators, and mitigators. Journal of the American Medical Informatics Association, 19(1), 121–127. https://doi.org/10.1136/amiajnl-2011-000089
Hackenburg, K., Tappin, B. M., Hewitt, L., Saunders, E., Black, S., Lin, H., Fist, C., Margetts, H., Rand, D. G., & Summerfield, C. (2025). The levers of political persuasion with conversational AI. arXiv:2507.13919. https://arxiv.org/abs/2507.13919
Ji, Z., Lee, N., Frieske, R., Yu, T., Su, D., Xu, Y., Ishii, E., Bang, Y. J., Madotto, A., & Fung, P. (2023). Survey of hallucination in natural language generation. ACM Computing Surveys, 55(12), 1–38. https://doi.org/10.1145/3571730
Kadavath, S., Conerly, T., Askell, A., Henighan, T., Drain, D., Perez, E., Schiefer, N., Hatfield-Dodds, Z., DasSarma, N., Tran-Johnson, E., Johnston, S., El-Showk, S., Jones, A., Elhage, N., Hume, T., Chen, A., Bai, Y., Bowman, S., Fort, S., … Kaplan, J. (2022). Language models (mostly) know what they know. arXiv:2207.05221. https://arxiv.org/abs/2207.05221
Lin, S., Hilton, J., & Evans, O. (2022). TruthfulQA: Measuring how models mimic human falsehoods. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 3214–3252). https://doi.org/10.18653/v1/2022.acl-long.229
Noels, S., Bied, G., Buyl, M., Rogiers, A., Fettach, Y., Lijffijt, J., & De Bie, T. (2026). What large language models do not talk about: An empirical study of moderation and censorship practices. In R. P. Ribeiro et al. (Eds.), ECML PKDD 2025, LNAI 16013 (pp. 265–281). Springer. https://doi.org/10.1007/978-3-032-05962-8_16
Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C. L., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., Schulman, J., Hilton, J., Kelton, F., Miller, L., Simens, M., Askell, A., Welinder, P., Christiano, P., Leike, J., & Lowe, R. (2022). Training language models to follow instructions with human feedback. arXiv:2203.02155. https://arxiv.org/abs/2203.02155
Parasuraman, R., & Riley, V. (1997). Humans and automation: Use, misuse, disuse, abuse. Human Factors, 39(2), 230–253. https://doi.org/10.1518/001872097778543886
RAND Corporation. (2024). Generative AI and information integrity. RAND Perspectives, PE-A3089-1. https://www.rand.org/pubs/perspectives/PEA3089-1.html
Reber, R., & Schwarz, N. (1999). Effects of perceptual fluency on judgments of truth. Consciousness and Cognition, 8(3), 338–342. https://doi.org/10.1006/ccog.1999.0386
Salvi, F., Ribeiro, M. H., Gallotti, R., & West, R. (2025). On the conversational persuasiveness of GPT-4. Nature Human Behaviour. https://doi.org/10.1038/s41562-025-02194-6
Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P.-S., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L. A., … Gabriel, I. (2022). Taxonomy of risks posed by language models. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’22) (pp. 214–229). https://doi.org/10.1145/3531146.3533088