Module 26 · Section 26.6

Hallucination & Reliability

Types of hallucination, detection methods, mitigation strategies, constrained generation, confidence calibration, and abstention
★ Big Picture

Hallucination is the single biggest obstacle to LLM reliability in production. Models generate fluent, confident text that is factually wrong, internally inconsistent, or unsupported by any source. This section covers the taxonomy of hallucination types, practical detection techniques (self-consistency, citation verification, natural language inference), and mitigation strategies ranging from RAG grounding to constrained generation and calibrated abstention.

1. Hallucination Taxonomy

| Type | Description | Example |
| --- | --- | --- |
| Factual fabrication | Inventing facts that sound plausible | Citing a non-existent paper |
| Intrinsic hallucination | Contradicting the provided source text | Summarizing a document with wrong numbers |
| Extrinsic hallucination | Adding information not in the source | Introducing claims absent from context |
| Self-contradiction | Making inconsistent statements within one response | Saying "X is true" then "X is false" |
| Outdated knowledge | Stating facts that were true at training time but not now | Reporting a CEO who has since been replaced |
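For programmatic use, such as tagging evaluation results, the taxonomy can be encoded as an enum. A minimal sketch; the label strings are illustrative, not a standard:

```python
from enum import Enum

class HallucinationType(Enum):
    """Labels for the taxonomy above (illustrative names)."""
    FACTUAL_FABRICATION = "factual_fabrication"  # invented but plausible facts
    INTRINSIC = "intrinsic"                      # contradicts the provided source
    EXTRINSIC = "extrinsic"                      # adds claims absent from the source
    SELF_CONTRADICTION = "self_contradiction"    # inconsistent within one response
    OUTDATED_KNOWLEDGE = "outdated_knowledge"    # stale parametric knowledge

# Round-trip from a stored label string back to the enum member
assert HallucinationType("intrinsic") is HallucinationType.INTRINSIC
```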

2. Self-Consistency Detection

from openai import OpenAI

client = OpenAI()

def self_consistency_check(question: str, n_samples: int = 5, temperature: float = 0.8):
    """Generate multiple answers and check agreement."""
    responses = []
    for _ in range(n_samples):
        r = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": question}],
            temperature=temperature,
        )
        responses.append(r.choices[0].message.content.strip())

    # Use LLM to check consistency across responses
    check_prompt = f"""Given these {n_samples} answers to the same question, are they
consistent with each other? Report any contradictions.

Question: {question}

Answers:
""" + "\n".join(f"{i+1}. {r}" for i, r in enumerate(responses))

    verdict = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": check_prompt}],
        temperature=0.0,
    )
    return {"responses": responses, "consistency_check": verdict.choices[0].message.content}
[Figure: three detection methods. Self-consistency: sample N responses at high temperature and check agreement; low agreement suggests hallucination. Citation verification: extract cited claims from the output and verify each against the source documents; unsupported claims indicate extrinsic hallucination. NLI-based: use an NLI model with the source as premise and the output as hypothesis; a contradiction label indicates intrinsic hallucination.]
Figure 26.6.1: Three complementary approaches detect different types of hallucination at different stages.
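As a cheaper alternative to using an LLM judge for the consistency check, agreement can be approximated by normalized majority voting over the sampled answers. A minimal sketch; it works best for short, factoid-style answers:

```python
from collections import Counter

def agreement_score(answers: list[str]) -> float:
    """Fraction of samples that match the most common normalized answer."""
    normalized = [a.strip().lower().rstrip(".") for a in answers]
    top_count = Counter(normalized).most_common(1)[0][1]
    return top_count / len(normalized)

# A low score suggests the samples diverge, i.e. possible hallucination.
print(agreement_score(["Paris", "paris.", "Paris", "Lyon", "Paris"]))  # → 0.8
```

This avoids a second model call per check, at the cost of missing paraphrases that an LLM judge would recognize as equivalent.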

NLI-Based Hallucination Detection

from transformers import pipeline

nli = pipeline("text-classification", model="facebook/bart-large-mnli")

def check_faithfulness(source: str, claim: str) -> dict:
    """Check if a claim is supported by the source using NLI.
    The source text is the premise; the claim is the hypothesis."""
    result = nli({"text": source, "text_pair": claim})[0]
    return {
        "claim": claim,
        "supported": result["label"] == "entailment",
        "label": result["label"],
        "confidence": round(result["score"], 3),
    }

source = "The Eiffel Tower was completed in 1889 and is 330 meters tall."
print(check_faithfulness(source, "The Eiffel Tower is 330 meters tall."))
print(check_faithfulness(source, "The Eiffel Tower was built in 1920."))
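LLM outputs usually contain several claims, so the check is typically applied per sentence. A sketch that takes the checker as a parameter (e.g. the `check_faithfulness` function above); the sentence splitter is a simple regex, not a full segmenter:

```python
import re
from typing import Callable

def check_output_faithfulness(source: str, output: str,
                              checker: Callable[[str, str], dict]) -> list[dict]:
    """Split the model output into sentences and run a faithfulness
    check on each one against the source."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", output) if s.strip()]
    return [checker(source, s) for s in sentences]
```

Aggregating the per-sentence labels (e.g. flagging any "contradiction") gives a document-level faithfulness verdict.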

3. Mitigation Strategies

import re

from openai import OpenAI

client = OpenAI()

def calibrated_abstention(question: str, context: str, threshold: float = 0.7):
    """Generate answer with confidence score; abstain if below threshold."""
    prompt = f"""Based ONLY on the context below, answer the question.
After your answer, rate your confidence from 0.0 to 1.0.

Context: {context}
Question: {question}

Format:
Answer: [your answer]
Confidence: [0.0 to 1.0]"""

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0,
    )
    text = response.choices[0].message.content

    # Parse the self-reported confidence
    conf_match = re.search(r"Confidence:\s*([\d.]+)", text)
    confidence = float(conf_match.group(1)) if conf_match else 0.0

    if confidence < threshold:
        return {"answer": "I don't have enough information to answer confidently.",
                "confidence": confidence, "abstained": True}
    return {"answer": text, "confidence": confidence, "abstained": False}
[Figure: the mitigation spectrum, from lower effort / partial coverage to higher effort / better coverage: RAG grounding (anchor to source docs) → constrained generation (JSON schema, regex) → confidence calibration (score + threshold) → abstention ("I don't know").]
Figure 26.6.2: Mitigation strategies range from RAG grounding (simple but partial) to calibrated abstention (most reliable for high-stakes use cases).
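Choosing the threshold is the "calibrated" part: sweep it over a labeled evaluation set and inspect the coverage/accuracy trade-off. A sketch, assuming hypothetical records that pair each question's confidence score with whether the answer was correct:

```python
def coverage_at_threshold(records: list[dict], threshold: float) -> dict:
    """Given records with 'confidence' and 'correct' fields, compute the
    fraction of questions answered (coverage) and the accuracy among them."""
    answered = [r for r in records if r["confidence"] >= threshold]
    if not answered:
        return {"coverage": 0.0, "accuracy": None}
    accuracy = sum(r["correct"] for r in answered) / len(answered)
    return {"coverage": len(answered) / len(records), "accuracy": accuracy}
```

Raising the threshold trades coverage for accuracy; pick the lowest threshold that meets the accuracy bar your application requires.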
⚠ Warning

LLM self-reported confidence scores are not well calibrated. Models tend to express high confidence even when wrong. Use self-consistency (agreement across multiple samples) as a more reliable confidence proxy than asking the model to rate its own certainty.

📝 Note

RAG reduces but does not eliminate hallucination. Models can hallucinate even with perfect context if the answer requires reasoning that the model performs incorrectly, or if the model ignores the context and relies on parametric knowledge instead. Always verify RAG outputs against their cited sources.
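Part of that verification can be automated with the citation-verification approach from Figure 26.6.1. A minimal sketch that checks bracketed citation markers against the retrieved document ids — the `[doc_id]` citation format is an assumption, not a standard, and this only catches citations of documents that were never retrieved, not misreadings of retrieved ones:

```python
import re

def verify_citations(answer: str, retrieved_ids: set[str]) -> dict:
    """Check that every bracketed citation like [doc1] in the answer
    refers to a document that was actually retrieved."""
    cited = set(re.findall(r"\[(\w+)\]", answer))
    unverified = cited - retrieved_ids
    return {"cited": sorted(cited), "unverified": sorted(unverified),
            "all_resolved": not unverified}

print(verify_citations("Revenue rose 12% [doc1], margins fell [doc9].",
                       {"doc1", "doc2"}))
```

Claims whose citations resolve should still be passed through an NLI faithfulness check against the cited passage.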

★ Key Insight

For high-stakes applications (medical, legal, financial), the best strategy is calibrated abstention: the system should refuse to answer when confidence is low rather than risk a convincing but wrong response. Users prefer "I don't know" over a confident hallucination that could lead to harmful decisions.

Knowledge Check

1. What is the difference between intrinsic and extrinsic hallucination?

Show Answer
Intrinsic hallucination contradicts the provided source text (e.g., stating wrong numbers from a document). Extrinsic hallucination introduces information not present in any source (e.g., adding fabricated claims). Intrinsic hallucinations are easier to detect because they conflict with available evidence.

2. How does self-consistency detect hallucination?

Show Answer
Self-consistency generates multiple responses to the same question at high temperature and checks whether they agree. When the model is confident and correct, different samples tend to converge on the same answer. When the model is hallucinating, samples diverge because there is no grounded answer to converge on. Low agreement signals potential hallucination.

3. Why is NLI useful for hallucination detection in RAG systems?

Show Answer
NLI (Natural Language Inference) models classify whether a hypothesis is entailed by, contradicts, or is neutral to a premise. In RAG, the retrieved context is the premise and each claim in the LLM output is a hypothesis. Claims labeled as "contradiction" indicate intrinsic hallucination; "neutral" claims may indicate extrinsic hallucination.

4. What is calibrated abstention and when should it be used?

Show Answer
Calibrated abstention is a strategy where the system refuses to answer when its confidence falls below a threshold, responding with "I don't have enough information" instead. It should be used in high-stakes domains (medical, legal, financial) where a confident but wrong answer could cause real harm, and where users would rather receive no answer than an unreliable one.

5. Why doesn't RAG completely solve the hallucination problem?

Show Answer
RAG provides relevant context, but the model can still hallucinate for several reasons: it may perform incorrect reasoning over the context, ignore the context and rely on parametric knowledge, combine information from multiple passages in invalid ways, or extrapolate beyond what the context actually states. RAG reduces but does not eliminate the fundamental tendency to generate unsupported claims.

Key Takeaways