Hallucination is the single biggest obstacle to LLM reliability in production. Models generate fluent, confident text that is factually wrong, internally inconsistent, or unsupported by any source. This section covers the taxonomy of hallucination types, practical detection techniques (self-consistency, citation verification, natural language inference), and mitigation strategies ranging from RAG grounding to constrained generation and calibrated abstention.
1. Hallucination Taxonomy
| Type | Description | Example |
|---|---|---|
| Factual fabrication | Inventing facts that sound plausible | Citing a non-existent paper |
| Intrinsic hallucination | Contradicting the provided source text | Summarizing a document with wrong numbers |
| Extrinsic hallucination | Adding information not in the source | Introducing claims absent from context |
| Self-contradiction | Making inconsistent statements within one response | Saying "X is true" then "X is false" |
| Outdated knowledge | Stating facts that were true at training time but not now | Reporting a CEO who has since been replaced |
2. Self-Consistency Detection
```python
from openai import OpenAI

client = OpenAI()

def self_consistency_check(question: str, n_samples: int = 5, temperature: float = 0.8):
    """Generate multiple answers and check agreement."""
    responses = []
    for _ in range(n_samples):
        r = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": question}],
            temperature=temperature,
        )
        responses.append(r.choices[0].message.content.strip())

    # Use an LLM to check consistency across the sampled responses
    check_prompt = f"""Given these {n_samples} answers to the same question, are they consistent with each other? Report any contradictions.

Question: {question}

Answers:
""" + "\n".join(f"{i+1}. {r}" for i, r in enumerate(responses))
    verdict = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": check_prompt}],
        temperature=0.0,
    )
    return {"responses": responses, "consistency_check": verdict.choices[0].message.content}
```
NLI-Based Hallucination Detection
```python
from transformers import pipeline

# bart-large-mnli is an NLI model: given a (premise, hypothesis) pair it
# predicts entailment, neutral, or contradiction.
nli = pipeline("text-classification", model="facebook/bart-large-mnli")

def check_faithfulness(source: str, claim: str) -> dict:
    """Check whether a claim is entailed by the source using NLI."""
    # Source is the premise, the claim is the hypothesis; score all three labels.
    scores = nli({"text": source, "text_pair": claim}, top_k=None)
    best = max(scores, key=lambda s: s["score"])
    return {
        "claim": claim,
        "supported": best["label"].lower() == "entailment",
        "label": best["label"],
        "confidence": round(best["score"], 3),
    }

source = "The Eiffel Tower was completed in 1889 and is 330 meters tall."
print(check_faithfulness(source, "The Eiffel Tower is 330 meters tall."))
print(check_faithfulness(source, "The Eiffel Tower was built in 1920."))
```
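An NLI check operates on one claim at a time, but a RAG answer usually contains several. One common pattern is to split the answer into sentence-level claims and audit each one. The sketch below is illustrative: `split_into_claims` uses a naive regex splitter (production systems often use an LLM for claim extraction), and the demo stands in for the NLI model with a placeholder substring heuristic so the flow is visible without a GPU.

```python
import re
from typing import Callable

def split_into_claims(answer: str) -> list[str]:
    """Naive sentence splitter; real systems often use an LLM for claim extraction."""
    parts = re.split(r"(?<=[.!?])\s+", answer.strip())
    return [p.strip() for p in parts if p.strip()]

def unsupported_claims(source: str, answer: str,
                       is_supported: Callable[[str, str], bool]) -> list[str]:
    """Run a faithfulness check (e.g. the NLI check above) on each sentence."""
    return [c for c in split_into_claims(answer) if not is_supported(source, c)]

# Demo with a trivial substring heuristic standing in for the NLI model
answer = "The tower is 330 meters tall. It was built in 1920."
flagged = unsupported_claims(
    "The Eiffel Tower was completed in 1889 and is 330 meters tall.",
    answer,
    is_supported=lambda src, claim: "330 meters" in claim,  # placeholder, not NLI
)
# flagged == ["It was built in 1920."]
```

In practice `is_supported` would wrap the NLI pipeline, e.g. `lambda src, c: check_faithfulness(src, c)["supported"]`.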
3. Mitigation Strategies
```python
import re
from openai import OpenAI

client = OpenAI()

def calibrated_abstention(question: str, context: str, threshold: float = 0.7):
    """Generate an answer with a confidence score; abstain if below threshold."""
    prompt = f"""Based ONLY on the context below, answer the question.
After your answer, rate your confidence from 0.0 to 1.0.

Context: {context}

Question: {question}

Format:
Answer: [your answer]
Confidence: [0.0 to 1.0]"""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0,
    )
    text = response.choices[0].message.content

    # Parse the model's self-reported confidence
    conf_match = re.search(r"Confidence:\s*([\d.]+)", text)
    confidence = float(conf_match.group(1)) if conf_match else 0.0

    if confidence < threshold:
        return {
            "answer": "I don't have enough information to answer confidently.",
            "confidence": confidence,
            "abstained": True,
        }
    return {"answer": text, "confidence": confidence, "abstained": False}
```
LLM self-reported confidence scores are not well calibrated. Models tend to express high confidence even when wrong. Use self-consistency (agreement across multiple samples) as a more reliable confidence proxy than asking the model to rate its own certainty.
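For short factual answers, the agreement-as-confidence idea can be approximated cheaply without an LLM judge: normalize the sampled answers and take the majority vote's share as the confidence score. The sketch below is a minimal illustration; `sampled_answers` is a hypothetical set of model outputs, and the normalization (lowercasing, stripping trailing periods) is a simplifying assumption that only works for short-form answers.

```python
from collections import Counter

def agreement_confidence(answers: list[str]) -> tuple[str, float]:
    """Return the majority-vote answer and its agreement rate across samples."""
    normalized = [a.strip().lower().rstrip(".") for a in answers]
    top_answer, count = Counter(normalized).most_common(1)[0]
    return top_answer, count / len(normalized)

# Hypothetical samples from five runs of the same question
sampled_answers = ["Paris", "paris", "Paris.", "Lyon", "Paris"]
answer, confidence = agreement_confidence(sampled_answers)
# 4 of 5 samples agree on "paris" -> confidence 0.8
```

This score can then drive the same abstention logic as above: abstain whenever the agreement rate falls below the threshold.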
RAG reduces but does not eliminate hallucination. Models can hallucinate even with perfect context if the answer requires reasoning that the model performs incorrectly, or if the model ignores the context and relies on parametric knowledge instead. Always verify RAG outputs against their cited sources.
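A first line of verification is purely structural: check that every citation in the answer actually refers to a document that was retrieved. The sketch below assumes a hypothetical `[doc-N]` citation format and a set of retrieved document IDs; it catches fabricated citations but not content-level mismatches, which still require an NLI-style check against the cited text.

```python
import re

def verify_citations(answer: str, retrieved_ids: set[str]) -> dict:
    """Flag citations in an answer that don't match any retrieved document.

    Assumes the hypothetical [doc-N] citation format used in this sketch.
    """
    cited = set(re.findall(r"\[(doc-\d+)\]", answer))
    return {
        "cited": sorted(cited),
        "unknown": sorted(cited - retrieved_ids),          # fabricated citations
        "uncited_sources": sorted(retrieved_ids - cited),  # retrieved but unused
    }

result = verify_citations(
    "The tower is 330 m tall [doc-1] and was built in 1920 [doc-7].",
    retrieved_ids={"doc-1", "doc-2"},
)
# result["unknown"] == ["doc-7"] -> the model cited a document it never saw
```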
For high-stakes applications (medical, legal, financial), the best strategy is calibrated abstention: the system should refuse to answer when confidence is low rather than risk a convincing but wrong response. Users prefer "I don't know" over a confident hallucination that could lead to harmful decisions.
Knowledge Check
1. What is the difference between intrinsic and extrinsic hallucination?
2. How does self-consistency detect hallucination?
3. Why is NLI useful for hallucination detection in RAG systems?
4. What is calibrated abstention and when should it be used?
5. Why doesn't RAG completely solve the hallucination problem?
Key Takeaways
- Hallucinations come in five types: factual fabrication, intrinsic contradiction, extrinsic addition, self-contradiction, and outdated knowledge.
- Self-consistency detection generates multiple samples and measures agreement; low agreement signals potential hallucination.
- NLI-based detection checks whether output claims are entailed by the source context, catching both intrinsic and extrinsic hallucination.
- RAG grounding reduces hallucination by anchoring responses to retrieved documents, but does not eliminate it entirely.
- Calibrated abstention is the safest strategy for high-stakes applications: refuse to answer when confidence is low.
- LLM self-reported confidence is not well calibrated; use self-consistency or NLI scores as more reliable confidence proxies.