Personal essays
Deepfake regulation
A shower thought just came up: what if, every time an LLM produces a synthetic photo or video, the output were automatically submitted to a database as a hash, similar to the GIFCT’s Hash-Sharing Database? Social media platforms could then use it as a reference point to easily flag likely deepfake photos and videos.
There could be legislation mandating that LLM makers share with the database the hash of every photo and video generated by their models. This would result in major LLMs becoming online platforms rather than offline software tools. Due to resource constraints, there could be a quota on how much synthetic media can be generated on such platforms, which might result in an artificial scarcity of AI-generated media that would inadvertently help human artists in the short run.
However, the downsides are that those measures might be easily circumvented and that they could violate some civil liberties.
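As a purely illustrative sketch of the mechanism (the registry endpoint, its API, and the choice of SHA-256 are all hypothetical assumptions, not any existing service), the submission side could look roughly like this in Python:

```python
# Minimal sketch of the proposed hash-sharing flow.
# Assumptions (hypothetical, not an existing service or API):
#   - the registry exposes a REST endpoint at REGISTRY_URL accepting JSON
#   - a plain SHA-256 of the file bytes is used; a real system would likely
#     prefer a perceptual hash so that re-encoded copies still match
import hashlib
import json
import urllib.request

REGISTRY_URL = "https://registry.example.org/api/v1/hashes"  # hypothetical endpoint

def submit_media_hash(media_bytes: bytes, media_type: str, model_id: str) -> str:
    """Hash a generated image/video and report it to the shared registry."""
    digest = hashlib.sha256(media_bytes).hexdigest()
    payload = json.dumps({
        "hash": digest,
        "algorithm": "sha256",
        "media_type": media_type,   # e.g. "image" or "video"
        "model_id": model_id,       # which generator produced it
    }).encode("utf-8")
    request = urllib.request.Request(
        REGISTRY_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        response.read()  # ignore the body; assume a 2xx status means accepted
    return digest
```

Platforms would do the reverse: hash each upload and look it up in the registry. A plain cryptographic hash only matches byte-identical files, so a workable system would more likely use a perceptual hash that still matches after re-encoding or resizing.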
International Space Station
I am of the opinion that there should be serious efforts to save the International Space Station, given its historical significance, instead of destroying it by letting it burn up in the atmosphere.
There should be a worldwide lottery to pool as much money as possible, with the prize being the ability to send appropriately sized physical objects to be placed aboard the ISS, like heirlooms, stamp collections or hard drives containing any information, before the station is sent into a heliocentric orbit by a modified Starship. If only the Peregrine lander, which carried many time capsules and other artifacts, could have been placed into such an orbit in interplanetary space instead of being destroyed.
Signs of AI-assisted writing
Although this particular essay is imported from Wikipedia, it should not be construed as an instruction to belittle editors who use AI to assist their efforts in general situations. Instead, this guide is also useful for rooting out false positives when conducting investigations related to disruptive sockpuppetry, particularly as LLM tools may be used by anyone on Justapedia and beyond to obfuscate their writing style (including, but not limited to, adversarial stylometry), whether for legitimate reasons (such as privacy) or illegitimate ones.
This is a list of writing and formatting conventions typical of AI chatbots such as ChatGPT.
Language and tone
| Words to watch: is/stands as/serves as a testament, plays a vital/significant role, underscores its importance, continues to captivate, leaves a lasting impact, watershed moment, key turning point, deeply rooted, profound heritage, steadfast dedication, stands as a, solidifies ... |
LLM writing often puffs up the importance of the subject matter with reminders that it represents or contributes to a broader topic. There seems to be only a small repertoire of ways these reminders are phrased, so even where such a reminder is otherwise appropriate, it is best to reword it.
When talking about biology (e.g. when asked to discuss a given animal or plant species), LLMs tend to put too much emphasis on the species’ conservation status and the efforts to protect it, even if the status is unknown and no serious efforts exist.
Promotional language
| Words to watch: rich cultural heritage, rich history, breathtaking, must-visit, must-see, stunning natural beauty, enduring/lasting legacy, rich cultural tapestry, nestled, in the heart of ... |
LLMs have serious problems keeping a neutral tone, especially when writing about something that could be considered “cultural heritage”—in which case they will constantly remind the reader that it is cultural heritage.
Editorializing
| Words to watch: it’s important to note/remember/consider, it is worth, no discussion would be complete without, this article wouldn’t exist without ... |
LLMs may often introduce their own interpretation, analysis, and opinions in their writing, even when they are asked to write neutrally. Editorializing can appear through specific words or phrases or within broader sentence structures. This indicator often overlaps with other language and tone indicators in this list. Note that humans and especially new editors often make this mistake as well.
Overuse of certain conjunctions
| Words to watch: on the other hand, moreover, in addition, furthermore ... |
While moderate use of connecting words or phrases is an essential element of good prose, LLMs tend to overuse them. This overuse is often a byproduct of an essay-like tone which is typical of LLM writing but inappropriate for Wikipedia. Certain conjunctions such as ‘however’ or ‘in contrast’ inherently imply synthesis of facts, and are generally unsuitable for Wikipedia articles.
Examples
AI-generated example:

> The methodology’s strength is its grounding in iterative, mixed-method development cycles that combine theoretical analysis with practical feedback. Its emphasis on prototyping and empirical validation supports early identification of design shortcomings, while the use of scenario-based design and claims analysis helps make the rationale behind design choices explicit and testable. Furthermore, the incorporation of values—as operational design elements rather than abstract principles—helps bring ethical and societal concerns into concrete design and evaluation processes.
>
> At the same time, several areas for improvement remain. For example, while the methodology supports transdisciplinary collaboration in principle, applying it effectively in large, heterogeneous teams can be challenging. Coordinating between cognitive scientists, engineers, designers, and domain experts requires careful facilitation and often additional effort in communication and documentation.
>
> Another area for further development involves the scalability of design patterns and ontologies across domains. While abstraction is a key feature of the methodology, generalizing knowledge without oversimplifying context-specific constraints remains an ongoing tension. Similarly, methods for operationalizing and measuring values—especially those that are contested or context-dependent—can benefit from more robust frameworks and shared benchmarks.
>
> SCE continues to evolve in response to these challenges. Its current form provides a solid foundation for responsible system development, particularly in settings where human judgment, collaboration, and adaptation remain essential. However, its long-term value will depend on ongoing methodological refinement, empirical validation in diverse settings, and sustained attention to issues of interdisciplinary coordination and value negotiation.

— From Draft:Socio-cognitive engineering
Compare the above with the human-written prose below. Even disregarding the absence of buzzwords and general vapidity, the connecting phrases in the excerpt below are more varied and less conspicuous.
Human-written example:

> Social heuristics can include heuristics that use social information, operate in social contexts, or both. Examples of social information include information about the behavior of a social entity or the properties of a social system, while nonsocial information is information about something physical. Contexts in which an organism may use social heuristics can include “games against nature” and “social games”. In games against nature, the organism strives to predict natural occurrences (such as the weather) or competes against other natural forces to accomplish something. In social games, the organism is making decisions in a situation that involves other social beings. Importantly, in social games, the most adaptive course of action also depends on the decisions and behavior of the other actors. For instance, the follow-the-majority heuristic uses social information as inputs but is not necessarily applied in a social context, while the equity-heuristic uses non-social information but can be applied in a social context such as the allocation of parental resources amongst offspring.
>
> Within social psychology, some researchers have viewed heuristics as closely linked to cognitive biases. Others have argued that these biases result from the application of social heuristics depending on the structure of the environment that they operate in. Researchers in the latter approach treat the study of social heuristics as closely linked to social rationality, a field of research that applies the ideas of bounded rationality and heuristics to the realm of social environments. Under this view, social heuristics are seen as ecologically rational. In the context of evolution, research utilizing evolutionary simulation models has found support for the evolution of social heuristics and cooperation when the outcomes of social interactions are uncertain.

— From Social heuristics, a Good Article
Section summaries
| Words to watch: In summary, In conclusion, Overall ... |
LLMs will often end a paragraph or section by summarizing and restating its core idea.
Negative parallelisms
Parallel constructions involving “not”, “but”, or “however” such as “Not only ... but ...” or “It is not just about ..., it’s ...” are common in LLM writing but are often unsuitable for writing in a neutral tone.
Rule of three
LLMs overuse the ‘rule of three’—“the good, the bad, and the ugly”. This can take different forms from “adjective, adjective, adjective” to “short phrase, short phrase, and short phrase”.
Whilst the ‘rule of three’, when used sparingly, is considered good writing, LLMs seem to rely heavily on it so that superficial explanations appear more comprehensive. Furthermore, this rule is generally suited to creative or argumentative writing, not purely informational texts.
Vague attributions of opinion
| Words to watch: Industry reports, Observers have cited, Some critics argue ... |
AI chatbots tend to attribute opinions or claims to some vague authority—a practice called weasel wording—while citing only one or two sources which may or may not actually express such a view. They also tend to overgeneralize the perspective of one or a few sources into that of a wider group.
Title case in section headings
In section headings, AI chatbots strongly tend to consistently capitalize all main words (title case).
Excessive use of boldface
AI chatbots may display various phrases in boldface for emphasis in a manner that is excessive and can seem rather mechanical. One of their tendencies, inherited from readmes, fan wikis, how-tos, sales copy, listicles and other materials that might involve heavy use of boldface, is picking a type of word or object to emphasize and emphasizing every instance of it, without being able to “reflect” on the end result and evaluate it as unsatisfactory. Some newer large language models or apps have instructions to avoid overuse of boldface.
Lists
Lists that are copied and pasted from AI chatbot responses may retain their original formatting. Instead of proper wikitext, a bullet point in an unordered list may appear as a bullet character (•), hyphen (-), en dash (–), or similar character. Ordered lists (i.e. numbered lists) may use explicit numbers (such as 1.) instead of standard wikitext.
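For illustration only (this is a hypothetical helper, not an existing MediaWiki or Justapedia tool), a short Python sketch of how the pasted bullet characters and explicit numbering described above could be converted back into standard wikitext list markup:

```python
import re

# Hypothetical cleanup helper: turn bullet characters and explicit numbering
# that survive a copy-paste from a chatbot into wikitext list markup.
PASTED_BULLETS = ("\u2022", "-", "\u2013")  # bullet character, hyphen, en dash

def to_wikitext_list(text: str) -> str:
    converted = []
    for line in text.splitlines():
        stripped = line.lstrip()
        if stripped[:1] in PASTED_BULLETS and stripped[1:2] == " ":
            converted.append("* " + stripped[2:])  # unordered list item
        elif re.match(r"\d+[.)]\s+", stripped):
            converted.append("# " + re.sub(r"^\d+[.)]\s+", "", stripped))  # ordered list item
        else:
            converted.append(line)
    return "\n".join(converted)

print(to_wikitext_list("\u2022 First point\n1. Step one\n2. Step two"))
# * First point
# # Step one
# # Step two
```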
Emoji
Sometimes, AI chatbots decorate section headings or bullet points by placing emojis in front of them.
Overuse of em dashes
AI chatbots use the em dash (—) more frequently than most editors do, especially in places where human authors are much more likely to use parentheses or commas. AI chatbots may or may not add a space before and after the dash.
Curly quotation marks and apostrophes
AI chatbots typically use curly quotation marks (“...” or ‘...’) instead of straight quotation marks ("..." or '...'). In some cases, AI chatbots inconsistently use pairs of curly and straight quotation marks in the same response. Most keyboards only support straight quotation marks by default, and curly quotation marks are rarely typed manually.
They also tend to use the curly apostrophe (’; the same character as the curly right single quotation mark) instead of the straight apostrophe ('), such as in contractions and possessive forms. They may also do this inconsistently.
Curly quotes alone do not prove LLM use. Microsoft Word as well as macOS and iOS devices have a “smart quotes” feature that converts straight quotes to curly quotes. Grammar-correcting tools such as LanguageTool may also have such a feature.
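As a small illustration of this tell (and, as noted above, curly quotes alone prove nothing), a Python sketch that tallies straight versus curly quotation marks and apostrophes in a passage, so inconsistent mixing stands out:

```python
# Illustrative tally of straight vs. curly quotation marks and apostrophes.
# This is only a heuristic; curly quotes alone do not prove LLM use.
def quote_profile(text: str) -> dict:
    return {
        "straight_double": text.count('"'),
        "curly_double": text.count("\u201c") + text.count("\u201d"),  # left and right curly double quotes
        "straight_single": text.count("'"),
        "curly_single": text.count("\u2018") + text.count("\u2019"),  # left and right curly single quotes (the curly apostrophe)
    }

print(quote_profile("He said \u201cit\u2019s fine\u201d, but the draft used \"plain\" quotes."))
# {'straight_double': 2, 'curly_double': 2, 'straight_single': 0, 'curly_single': 1}
```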
Collaborative communication
In some cases, editors will paste text from an AI chatbot that it had intended as correspondence, prewriting or advice for them, and not for direct use in an article. AI chatbots may also explicitly indicate that the text is for a Wikipedia article if prompted to produce one, and may mention various policies and guidelines in their outputs; such mentions are generally inappropriate for direct inclusion in articles.
Knowledge-cutoff disclaimers and speculation about gaps in sources
A knowledge-cutoff disclaimer is a statement used by the AI chatbot to indicate that the information provided may be incomplete, inaccurate, or outdated.
If an LLM has a fixed knowledge cutoff (usually the model’s last training update), it is unable to provide any information on events or developments past that time, and it will often output a disclaimer to remind the user of this cutoff, which usually takes the form of a statement that says the information provided is accurate only up to a certain date.
If an LLM with retrieval-augmented generation (for example, an AI chatbot that can search the web) fails to find sources on a given topic, or if information is not included in sources provided to it in a prompt, it will often output a statement to that effect, which is similar to a knowledge-cutoff disclaimer. It may also pair it with speculation about what that information “likely” may be and why it is significant. This information is entirely speculative and may be based on loosely related topics or completely fabricated. It is also frequently combined with the tells above.
Prompt refusal
| Words to watch: as an AI language model, as a large language model, I’m sorry ... |
Occasionally, the AI chatbot will decline to answer a prompt as it is written, usually with an apology and a reminder that it is “an AI language model”. Attempting to be as helpful as possible, it often gives suggestions or an answer to an alternative, similar request.
Links to searches
When results appear in these searches, they are almost always problematic – but remember that it would be okay for an article to include them if, for example, they were in a relevant, attributed quote.
Phrasal templates and placeholder text
AI chatbots may generate responses with fill-in-the-blank phrasal templates (as seen in the game Mad Libs) for the LLM user to replace with words and phrases pertaining to their use case. When an LLM-using editor forgets to add the words, the result is obviously not written by the editor themselves.
Use of Markdown
AI chatbots are not proficient in wikitext, the markup language used to instruct Justapedia’s MediaWiki software how to format an article. As wikitext is mostly tied to a specific platform running specific software (a wiki running on MediaWiki), it is a niche markup language, lacking wider exposure beyond Wikipedia and other MediaWiki-based platforms like Miraheze. As such, LLMs tend to lack wikitext-formatted training data: while chatbots’ training corpora did ingest millions of Wikipedia articles, these articles would not have been processed as text files containing wikitext syntax. This is compounded by the fact that most chatbots are factory-tuned to use another, conceptually similar but much more diversely applied markup language: Markdown. Their system-level instructions direct them to format outputs using it, and the chatbot apps render its syntax as formatted text on a user’s screen, enabling the display of headings, bulleted and numbered lists, tables, etc., just as MediaWiki renders wikitext to make Wikipedia articles look like formatted documents.
When asked about its “formatting guidelines”, a chatbot willing to reveal some of its system-level instructions will typically disclose some variation of the following (this is Microsoft Copilot in mid-2025):
## Formatting Guidelines
- All output uses GitHub-flavored Markdown.
- Use a single main title (`#`) and clear primary subheadings (`##`).
- Keep paragraphs short (3–5 sentences, ≤150 words).
- Break large topics into labeled subsections.
- Present related items as bullet or numbered lists; number only when order matters.
- Always leave a blank line before and after each paragraph.
- Avoid bold or italic styling in body text unless explicitly requested.
- Use horizontal dividers (`---`) between major sections.
- Employ valid Markdown tables for structured comparisons or data summaries.
- Refrain from complex Unicode symbols; stick to simple characters.
- Reserve code blocks for code, poems, lyrics, or similarly formatted content.
- For mathematical expressions, use LaTeX outside of code blocks.
As the above already suggests, Markdown’s syntax is completely different from wikitext’s: Markdown uses asterisks (`*`) or underscores (`_`) instead of paired apostrophes (`''` and `'''`) for italic and bold formatting, hash symbols (`#`) instead of equals signs (`=`) for section headings, parentheses (`( )`) instead of square brackets (`[ ]`) around URLs, and three or more hyphens, asterisks, or underscores (`---`, `***`, or `___`) instead of four hyphens (`----`) for thematic breaks.
Even when they are told to do so explicitly, chatbots generally struggle to generate text using syntactically correct wikitext, as their inherent architectural biases and training data lead to a drastically greater affinity for and fluency in Markdown. When told by a user to “generate an article”, a chatbot will typically default to using Markdown for the generated output, which would be preserved in clipboard text by the copy functions on some chatbot platforms. If instructed by a user to generate content for Wikipedia, the chatbot might itself “realize” the need to generate Wikipedia-compatible code, and might include something like “Would you like me to ... turn this into actual Wikipedia markup format (`wikitext`)?” in its output. If told to proceed, the resulting syntax will generally be rudimentary, syntactically incorrect, or both. The chatbot might put its attempted-wikitext content in a Markdown-style fenced code block (its syntax for WP:PRE) surrounded by Markdown-based syntax and content, which may also be preserved by platform-specific copy-to-clipboard functions, leading to a telling footprint of both markup languages’ syntax. This might include the appearance of three backticks in the text, such as a stray `` ```wikitext `` marker.
The presence of faulty wikitext syntax that mixes in Markdown syntax is a strong indicator that content is LLM-generated, especially if in the form of a fenced Markdown code block. However, Markdown alone is not such a strong indicator. Particularly, software developers, researchers, technical writers, and internet users in general frequently use Markdown in tools like Obsidian and GitHub, and on platforms like Reddit, Discord, and Slack. Software that editors may use to write content intended for Wikipedia, such as iOS Notes, Google Docs, and Windows Notepad, may support Markdown editing or exporting. The contemporary ubiquity of Markdown may also lead new editors to expect or assume Wikipedia to support Markdown by default.
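As a purely illustrative sketch (not an existing MediaWiki or Justapedia feature), a short Python check for the Markdown constructs described above in a pasted draft might look like this:

```python
import re

# Illustrative detector: flag common Markdown constructs in a pasted draft.
# Wikitext does not use any of these, so their presence is a hint that the
# text originated in a Markdown-producing tool such as a chatbot.
MARKDOWN_PATTERNS = {
    "ATX heading (#)": re.compile(r"^#{1,6}\s", re.MULTILINE),
    "bold/italic (** or __)": re.compile(r"(\*\*|__)[^*_]+\1"),
    "Markdown link [text](url)": re.compile(r"\[[^\]]+\]\(https?://[^)]+\)"),
    "fenced code block (```)": re.compile(r"^```", re.MULTILINE),
    "thematic break (--- / *** / ___)": re.compile(r"^(-{3,}|\*{3,}|_{3,})\s*$", re.MULTILINE),
}

def markdown_traces(draft: str) -> list:
    """Return the names of Markdown constructs found in the draft."""
    return [name for name, pattern in MARKDOWN_PATTERNS.items() if pattern.search(draft)]

sample = "## History\nThe town was **founded** in 1850 ([source](https://example.org)).\n---"
print(markdown_traces(sample))
# ['ATX heading (#)', 'bold/italic (** or __)', 'Markdown link [text](url)', 'thematic break (--- / *** / ___)']
```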
Broken wikitext
As explained above, AI-chatbots are not proficient in wikitext and Wikipedia templates, leading to faulty syntax.
turn0search0
ChatGPT may include citeturn0search0 (surrounded by Unicode points in the Private Use Area) at the ends of sentences, with the “search” number increasing as the text progresses. These mark places where the chatbot linked to an external site; when a human pastes the conversation into Wikipedia, the link is converted into placeholder code.
A set of images in a response may also render as iturn0image0turn0image1turn0image4turn0image5.
attribution and attributableIndex
ChatGPT may add JSON-formatted code at the end of sentences in the form of ({"attribution":{"attributableIndex":"X-Y"}}), with X and Y being increasing numeric indices.
utm_source=
ChatGPT may add the URL search parameter utm_source=chatgpt.com to URLs that it is using as sources. Likewise, other AI tools such as Copilot, Gemini, DeepSeek, Grok, or Meta AI may add a similar query parameter to URLs.
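As an illustrative sketch (the exact Private Use Area wrappers vary between pastes, so the patterns below are deliberately loose assumptions rather than a definitive specification), the artifacts described in this section could be stripped from pasted text like this in Python:

```python
import re

# Illustrative cleanup for the chatbot paste artifacts described above.
# The citation markers are wrapped in Private Use Area characters in the raw
# paste, so the patterns tolerate them on either side.
ARTIFACT_PATTERNS = [
    r"[\ue000-\uf8ff]*citeturn\d+(?:search|image|news)\d+[\ue000-\uf8ff]*",   # citeturn0search0 and friends
    r"[\ue000-\uf8ff]*iturn\d+image\d+(?:turn\d+image\d+)*[\ue000-\uf8ff]*",  # iturn0image0turn0image1...
    r'\(\{"attribution":\{"attributableIndex":"\d+-\d+"\}\}\)',               # attribution JSON
    r"[?&]utm_source=[\w.]+",  # naive: assumes the tracking parameter is the last one in the URL
]

def strip_chatbot_artifacts(text: str) -> str:
    """Remove known chatbot paste artifacts; everything else is left alone."""
    for pattern in ARTIFACT_PATTERNS:
        text = re.sub(pattern, "", text)
    return text

sample = "The study found X.citeturn0search0 See https://example.org/a?utm_source=chatgpt.com"
print(strip_chatbot_artifacts(sample))
# The study found X. See https://example.org/a
```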
Abrupt cut offs
AI tools may suddenly stop generating content, for example if they predict the end-of-text token (appearing as <|endoftext|>) next. Also, the number of tokens in a single response is usually limited, and further output requires the user to select “continue generating”.
This method is not foolproof, as a malformed copy/paste from one’s local computer can also result in a similar situation. It may also be indicative of a copyright violation rather than the use of an LLM.
Discrepancies in writing style and variety of English
A sudden shift in an editor’s writing style, such as unexpectedly flawless grammar, may indicate the use of AI tools.
Another discrepancy is a mismatch between the user’s location, the topic’s national ties to a particular variety of English, and the variety of English actually used. A human writer from India writing about an Indian university would probably not use American English; yet the default variety of LLM outputs is American English, and such a user’s AI-generated content will exhibit this variety (unless the chatbot was specifically prompted to use Indian English). However, note that non-fluent English speakers commonly mix up English varieties. Such signs should only raise suspicion if there is a sudden and complete shift in an editor’s English variety.
How X’s national origin label is not a magic 8-ball at all
I use VPNs all the time for safety reasons, because of my work exposing organizational scandals and past work in high-stakes political areas such as the 2019–2020 Hong Kong protests, so the national-origin label on X can be just baloney rather than a magic 8-ball. That’s not to mention the second-hand account markets, which have existed for some time and are about to grow massively, along with services like Mailinator and Yopmail, which act as a blatant, de facto MCU-style “The Void” for accounts.
Meanwhile, a major side effect is that interactions on X will become even more toxic as they devolve into playground insults based on national origins and the associated stereotypes. For example, users in Israel would become more vulnerable to pile-on harassment by antisemitic trolls.
Tyrannical governments will also find it much easier to identify X accounts for subsequent transnational repression. There are many IP addresses that were used by VPNs in the past but have not yet been detected as such, due to sophisticated obfuscation measures, including the use of remote desktop software like AnyDesk, TeamViewer and VNC. So, once again, just like Wikipedia’s CheckUser feature, the national-origin label is not a magic 8-ball at all.
Some suspect that X might get sued under the GDPR in the future. In that case, the ideal way out could be to show exact national origins by default only for blue-check accounts, while the labels of all other accounts display only their continental region by default.
In the end, it’s better to use common sense and due diligence instead of blind faith when interpreting labels like these. My philosophy is that there should be a perfect balance between transparency and privacy, as there probably should be with many things.