
7 Prompt Engineering Tricks to Mitigate Hallucinations in LLMs
Introduction
Large language models (LLMs) exhibit outstanding abilities to reason over, summarize, and creatively generate text. Still, they remain susceptible to the common problem of hallucinations, which consists of generating confident-looking but false, unverifiable, or sometimes even nonsensical information.
LLMs generate text based on intricate statistical and probabilistic patterns rather than relying primarily on verifying grounded truths. In some critical fields, this issue can cause major negative impacts. Robust prompt engineering, which involves the craftsmanship of elaborating well-structured prompts with instructions, constraints, and context, can be an effective strategy to mitigate hallucinations.
The seven techniques listed in this article, each illustrated with an example prompt template, show how both standalone LLMs and retrieval-augmented generation (RAG) systems can become more robust against hallucinations simply by building these patterns into your queries.
1. Encourage Abstention and “I Don’t Know” Responses
LLMs typically produce answers that sound confident even when they are uncertain (check this article to understand in detail how LLMs generate text), sometimes fabricating facts as a result. Explicitly allowing abstention helps counter this false confidence. Let's look at an example prompt to do this:
“You are a fact-checking assistant. If you are not confident in an answer, respond: ‘I don’t have enough information to answer that.’ If confident, give your answer with a short justification.”
The above prompt would be followed by an actual question or fact check.
A sample expected response would be:
“I don’t have enough information to answer that.”
or
“Based on the available evidence, the answer is … (reasoning).”
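If you call the model programmatically, the abstention rule fits naturally into a system message. Below is a minimal sketch assuming the OpenAI Python SDK and an illustrative model name; any chat-style LLM API follows the same pattern.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

SYSTEM_PROMPT = (
    "You are a fact-checking assistant. If you are not confident in an answer, "
    "respond: 'I don't have enough information to answer that.' "
    "If confident, give your answer with a short justification."
)

def fact_check(question: str) -> str:
    # The abstention rule lives in the system message, so it applies to every query.
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name; substitute your own
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
        temperature=0,  # a low temperature further discourages speculative answers
    )
    return response.choices[0].message.content

print(fact_check("Who won the 2030 FIFA World Cup?"))
```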
This is a good first line of defense, but nothing is stopping an LLM from disregarding those directions with some regularity. Let’s see what else we can do.
2. Structured, Chain-of-Thought Reasoning
Asking a language model to reason step by step encourages internal consistency and mitigates the logic gaps that can sometimes cause hallucinations. The chain-of-thought (CoT) strategy essentially consists of giving the model an algorithm-like list of steps or stages to tackle sequentially in order to address the overall task at hand. Once more, the example template below is assumed to be accompanied by a problem-specific prompt of your own.
“Please think through this problem step by step: 1) What information is given? 2) What assumptions are needed? 3) What conclusion follows logically?”
A sample expected response:
“1) Known facts: A, B. 2) Assumptions: C. 3) Therefore, conclusion: D.”
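Because the scaffold stays fixed while the problem changes, it is convenient to keep the template in a small helper. The sketch below is a minimal example; build_cot_prompt is just an illustrative name.

```python
def build_cot_prompt(problem: str) -> str:
    """Wrap a problem statement in an explicit step-by-step scaffold."""
    return (
        "Please think through this problem step by step:\n"
        "1) What information is given?\n"
        "2) What assumptions are needed?\n"
        "3) What conclusion follows logically?\n\n"
        f"Problem: {problem}"
    )

# Example usage: send the resulting string as the user message.
print(build_cot_prompt(
    "A train leaves at 09:00 travelling at 80 km/h. "
    "When does it reach a station 200 km away?"
))
```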
3. Grounding with “According To”
This prompt engineering trick ties the answer you are looking for to named sources. The effect is to discourage invented content and encourage fact-based reasoning. This strategy combines naturally with trick number 1 discussed earlier.
“According to the World Health Organization (WHO) report from 2023, explain the main drivers of antimicrobial resistance. If the report doesn’t provide enough detail, say ‘I don’t know.’”
A sample expected response:
“According to the WHO (2023), the main drivers include overuse of antibiotics, poor sanitation, and unregulated drug sales. Further details are unavailable.”
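Since the wording pattern is fixed, it can also be templated, combining the named source with the abstention fallback from trick 1. The helper below is a hypothetical sketch.

```python
def grounded_prompt(source: str, question: str) -> str:
    """Tie the question to a named source and allow abstention (trick 1)."""
    return (
        f"According to {source}, {question} "
        f"If {source} doesn't provide enough detail, say 'I don't know.'"
    )

print(grounded_prompt(
    "the World Health Organization (WHO) report from 2023",
    "explain the main drivers of antimicrobial resistance.",
))
```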
4. RAG with Explicit Instruction and Context
RAG grants the model access to a knowledge base or document store containing verified or up-to-date text. Even so, the risk of hallucinations persists in RAG systems unless a well-crafted prompt instructs the model to rely exclusively on the retrieved text.
*[Assume two retrieved documents: X and Y]* “Using only the information in X and Y, summarize the main causes of deforestation in the Amazon basin and related infrastructure projects. If the documents don’t cover a point, say ‘insufficient data.’”
A sample expected response:
“According to Doc X and Doc Y, key causes include agricultural expansion and illegal logging. For infrastructure projects, insufficient data.”
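In practice, the retrieved passages and the "use only this context" instruction are stitched together before the model is called. Below is a minimal sketch; the retriever is assumed to have already returned the passages as strings, and the document snippets are placeholders.

```python
def build_rag_prompt(question: str, documents: dict[str, str]) -> str:
    """Assemble retrieved passages plus an 'only use this context' instruction."""
    context = "\n\n".join(
        f"[Doc {name}]\n{text}" for name, text in documents.items()
    )
    doc_names = " and ".join(documents)
    return (
        f"{context}\n\n"
        f"Using only the information in {doc_names}, {question} "
        "If the documents don't cover a point, say 'insufficient data.'"
    )

# Placeholder passages standing in for real retrieval results.
docs = {
    "X": "Cattle ranching and soy expansion account for most recent clearing...",
    "Y": "Illegal logging persists along newly built access roads...",
}
print(build_rag_prompt(
    "summarize the main causes of deforestation in the Amazon basin "
    "and related infrastructure projects.",
    docs,
))
```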
5. Output Constraints and Limiting Scope
Tightly controlling the format and length of generated outputs helps reduce hallucinations that appear as speculative or tangential statements, such as unsupported causal claims, over-elaborated chains of reasoning, or made-up statistics, and it keeps the response from drifting away from source material.
Constraining the “degrees of freedom” over the answer space increases the odds of returning verifiable information rather than filling the gaps “no matter what.”
“In no more than 100 words, summarize the role of mitochondria in human cells. If unsure, respond ‘I don’t know.’”
A sample expected response:
“Mitochondria generate ATP via oxidative phosphorylation, regulate calcium levels, and influence apoptosis. I lack sufficient data on additional roles.”
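The prompt-level word limit can be backed up by a hard token cap at the API level. The sketch below again assumes the OpenAI Python SDK and an illustrative model name.

```python
from openai import OpenAI

client = OpenAI()

def constrained_answer(question: str, word_limit: int = 100) -> str:
    prompt = (
        f"In no more than {word_limit} words, {question} "
        "If unsure, respond 'I don't know.'"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",   # illustrative model name
        messages=[{"role": "user", "content": prompt}],
        max_tokens=200,        # hard cap that backs up the prompt-level word limit
        temperature=0,
    )
    return response.choices[0].message.content

print(constrained_answer("summarize the role of mitochondria in human cells."))
```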
6. RAG-based Correction Loops: Chain-of-Verification
This approach combines retrieval with self-checking, letting the model question and double-check its own claims against external sources of evidence, primarily in RAG systems. It mainly targets hallucinations caused by overconfidence.
“Step 1: Generate an initial answer to the question: ‘When did the Berlin Wall fall?’ Step 2: Retrieve and read relevant passages from a trusted historical database. Step 3: Compare the retrieved evidence to your answer. Step 4: If discrepancies exist, correct the answer and cite the retrieved source.”
A sample expected response:
“Initial answer: 1989. Retrieved evidence: Historical archives confirm the Berlin Wall opened on 9 November 1989. Final verified answer: The Berlin Wall fell on 9 November 1989 when East Berlin opened border crossings.”
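A minimal way to orchestrate the four steps is a small verification loop. In the sketch below, ask and retrieve are placeholders you supply: ask wraps any LLM call, and retrieve is a hypothetical search over a trusted corpus that returns passage strings.

```python
def chain_of_verification(question: str, ask, retrieve) -> str:
    """Draft an answer, check it against retrieved evidence, then revise."""
    # Step 1: generate an initial answer.
    draft = ask(f"Answer briefly: {question}")

    # Step 2: retrieve relevant passages for the same question.
    passages = "\n".join(retrieve(question))

    # Steps 3 and 4: compare the draft with the evidence and correct if needed.
    return ask(
        f"Question: {question}\n"
        f"Initial answer: {draft}\n"
        f"Retrieved evidence:\n{passages}\n\n"
        "Compare the evidence to the initial answer. If they disagree, "
        "correct the answer and cite the evidence; otherwise confirm it."
    )
```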
7. Domain-Specific Prompts, Disclaimers, and Safety Guardrails
In high-stakes application domains like medicine, it is essential to set explicit domain boundaries and require citations to sources, reducing the risk of speculative claims that could lead to real-world harm. Here is an example of doing so:
“You are a certified medical information assistant. Using peer-reviewed studies or official guidelines published before 2024, explain the first-line treatment for moderate persistent asthma in adults. If you cannot cite such a guideline, respond: ‘I cannot provide a recommendation; consult a medical professional.’”
A sample expected response:
“According to the Global Initiative for Asthma (GINA) 2023 guideline, first-line therapy for moderate persistent asthma is a low-dose inhaled corticosteroid with a long-acting β₂-agonist such as budesonide/formoterol. For patient-specific adjustments, consult a clinician.”
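Such guardrails can be encoded once as a system prompt and paired with a simple post-check on the response. The sketch below is illustrative only: the keyword list is a crude placeholder for a real citation validator.

```python
MEDICAL_SYSTEM_PROMPT = (
    "You are a certified medical information assistant. Use only peer-reviewed "
    "studies or official guidelines published before 2024, and cite them. "
    "If you cannot cite such a source, respond exactly: "
    "'I cannot provide a recommendation; consult a medical professional.'"
)

FALLBACK = "I cannot provide a recommendation; consult a medical professional."

def passes_guardrail(answer: str) -> bool:
    """Crude post-check: the answer must either abstain or name a source."""
    cites_source = any(
        token in answer for token in ("GINA", "WHO", "guideline", "et al.")
    )
    return answer.strip().strip("'\"") == FALLBACK or cites_source
```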
Wrapping Up
Below is a summary of the seven strategies we discussed.
| Technique | Description | 
|---|---|
| Encourage abstention and “I don’t know” responses | Allow the model to say “I don’t know” and avoid speculation. **Non-RAG**. | 
| Structured, Chain-of-Thought Reasoning | Step-by-step reasoning to improve consistency in responses. **Non-RAG**. | 
| Grounding with “According To” | Use explicit references to named sources to ground responses. **Non-RAG**. | 
| RAG with Explicit Instruction and Context | Explicitly instruct the model to rely on evidence retrieved. **RAG**. | 
| Output Constraints and Limiting Scope | Restrict format and length of responses to minimize speculative elaboration and make answers more verifiable. **Non-RAG**. | 
| RAG-based Correction Loops: Chain-of-Verification | Tell the model to verify its own outputs against retrieved knowledge. **RAG**. | 
| Domain-Specific Prompts, Disclaimers, and Safety Guardrails | Constrain prompts with domain boundaries, citation requirements, or disclaimers in high-stakes scenarios. **Non-RAG**. | 
This article presented seven useful prompt engineering tricks, based on versatile templates for multiple scenarios, that, when applied to LLMs or RAG systems, can help reduce hallucinations: a common and persistent problem in these otherwise remarkably capable models.