Module 26 · Section 26.9

LLM Risk Governance & Audit

Enterprise model inventory, risk classification, SR 11-7, NIST AI RMF, ISO 42001, and audit trail implementation
★ Big Picture

Enterprise AI governance requires structured frameworks that map every LLM deployment to a risk classification, assign ownership, and maintain auditable records. Established frameworks like SR 11-7 (banking model risk), NIST AI RMF, and ISO 42001 provide the scaffolding. This section covers how to build a practical AI governance program that satisfies regulators while remaining lightweight enough for engineering teams to follow.

1. Governance Frameworks Comparison

| Framework | Origin | Scope | Key Contribution |
| --- | --- | --- | --- |
| SR 11-7 | US Federal Reserve | Banking / financial | Three lines of defense, independent validation |
| NIST AI RMF | US NIST | Cross-sector | Govern, Map, Measure, Manage lifecycle |
| ISO 42001 | ISO | International | AI management system certification |
| EU AI Act | European Parliament | EU market | Risk-based obligations, conformity assessment |
Figure 26.9.1: The NIST AI RMF defines four core functions; Govern is the overarching function while Map, Measure, and Manage form a continuous cycle.
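One lightweight way to operationalize the four functions is a governance checklist that maps each function to required artifacts and flags gaps. The artifact names below are illustrative assumptions for the sketch, not terms taken from the NIST publication:

```python
# Hypothetical sketch: the four NIST AI RMF functions as a checklist of
# governance artifacts. Artifact names are illustrative assumptions.
RMF_CHECKLIST = {
    "GOVERN": ["ai_policy", "roles_and_accountability"],
    "MAP": ["context_statement", "stakeholder_register", "risk_register"],
    "MEASURE": ["test_plan", "monitoring_metrics"],
    "MANAGE": ["mitigation_plan", "incident_response_runbook"],
}

def missing_artifacts(completed: set[str]) -> dict[str, list[str]]:
    """Return, per RMF function, the artifacts not yet produced."""
    return {
        function: [a for a in artifacts if a not in completed]
        for function, artifacts in RMF_CHECKLIST.items()
        if any(a not in completed for a in artifacts)
    }

gaps = missing_artifacts({"ai_policy", "roles_and_accountability", "test_plan"})
print(gaps)  # GOVERN is complete; MAP, MEASURE, MANAGE still have gaps
```

A report like this makes the "continuous cycle" concrete: Govern artifacts are established once and maintained, while the other three functions are re-checked every review cycle.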

2. Model Inventory and Risk Classification

from dataclasses import dataclass
from enum import Enum
from datetime import datetime

class RiskTier(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"

@dataclass
class ModelInventoryEntry:
    """Enterprise model inventory record for governance tracking."""
    model_id: str
    model_name: str
    use_case: str
    owner: str
    risk_tier: RiskTier
    deployment_date: str
    last_validation: str
    next_review: str
    data_sources: list[str]
    regulations: list[str]

    def needs_review(self) -> bool:
        return datetime.fromisoformat(self.next_review) <= datetime.now()

    def to_dict(self):
        return {
            "model_id": self.model_id,
            "model_name": self.model_name,
            "use_case": self.use_case,
            "owner": self.owner,
            "risk_tier": self.risk_tier.value,
            "overdue": self.needs_review(),
        }

entry = ModelInventoryEntry(
    model_id="LLM-CS-001", model_name="Customer Support Bot v2",
    use_case="Customer service automation", owner="ML Platform Team",
    risk_tier=RiskTier.MEDIUM, deployment_date="2025-01-15",
    last_validation="2025-01-10", next_review="2025-07-10",
    data_sources=["support_tickets", "knowledge_base"],
    regulations=["GDPR", "EU AI Act (limited risk)"],
)
print(entry.to_dict())

3. Audit Trail Implementation

import hashlib
import json
from datetime import datetime, timezone

class AuditTrail:
    """Tamper-evident audit log for LLM interactions."""

    def __init__(self):
        self.entries = []

    def log(self, request_id: str, model: str, input_text: str,
            output_text: str, user_id: str, metadata: dict | None = None):
        entry = {
            "request_id": request_id,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "model": model,
            "user_id": user_id,
            # Store hashes, not raw text, so the log itself holds no PII
            "input_hash": hashlib.sha256(input_text.encode()).hexdigest()[:16],
            "output_hash": hashlib.sha256(output_text.encode()).hexdigest()[:16],
            "metadata": metadata or {},
        }
        # Chain entries for tamper detection; sort_keys makes the JSON
        # serialization deterministic so the hash is reproducible
        if self.entries:
            entry["prev_hash"] = hashlib.sha256(
                json.dumps(self.entries[-1], sort_keys=True).encode()
            ).hexdigest()[:16]
        self.entries.append(entry)
        return entry

    def verify_chain(self) -> bool:
        """Recompute every link; any modified entry breaks the chain."""
        for i in range(1, len(self.entries)):
            expected = hashlib.sha256(
                json.dumps(self.entries[i - 1], sort_keys=True).encode()
            ).hexdigest()[:16]
            if self.entries[i].get("prev_hash") != expected:
                return False
        return True
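A minimal end-to-end demonstration of the chaining idea, re-implemented inline on plain dicts so it runs standalone (the `chain_hash` helper mirrors the chaining logic of `AuditTrail.log` and `verify_chain`):

```python
# Illustrative sketch of hash-chain verification and tamper detection.
import hashlib
import json

def chain_hash(entry: dict) -> str:
    """Hash one log entry the way the chain links entries together."""
    return hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()[:16]

log = [{"request_id": "r1", "output_hash": "ab12"}]
log.append({"request_id": "r2", "prev_hash": chain_hash(log[0])})
log.append({"request_id": "r3", "prev_hash": chain_hash(log[1])})

def verify(entries: list[dict]) -> bool:
    return all(
        entries[i].get("prev_hash") == chain_hash(entries[i - 1])
        for i in range(1, len(entries))
    )

ok_before = verify(log)         # True: chain is intact
log[0]["output_hash"] = "XXXX"  # tamper with the first entry
ok_after = verify(log)          # False: the recomputed link no longer matches
print(ok_before, ok_after)
```

Note that the in-memory list is still mutable; in production the entries would be written to append-only storage, with the chain providing detection rather than prevention of tampering.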
Figure 26.9.2: SR 11-7's three lines of defense separate model development, independent validation, and audit oversight.
⚠ Warning

Many organizations track traditional ML models but forget to inventory their LLM deployments. Every use of an LLM API, whether it is a direct OpenAI call, a LangChain chain, or an embedded copilot feature, should be registered in the enterprise model inventory with a risk classification and assigned owner.

📝 Note

ISO 42001 is the first international standard for AI management systems. It provides a certifiable framework for organizations to demonstrate responsible AI practices, similar to how ISO 27001 certifies information security management. Certification may become a market differentiator as AI regulation increases.

★ Key Insight

Audit trails for LLM systems should use hash chaining (similar to blockchain) to ensure tamper resistance. Each log entry includes a hash of the previous entry, creating a tamper-evident chain. If any entry is modified after the fact, the chain verification fails, alerting auditors to potential tampering.

Knowledge Check

1. What are the four core functions of the NIST AI RMF?

Answer:
Govern (establish policies, roles, and accountability), Map (identify context, stakeholders, and risks), Measure (assess risks through testing and metrics), and Manage (mitigate risks and respond to incidents). Govern is the overarching function, while Map, Measure, and Manage form a continuous operational cycle.

2. What is SR 11-7 and why does it matter for LLM deployments in banking?

Answer:
SR 11-7 is the US Federal Reserve's guidance on model risk management for banking institutions. It requires a three-lines-of-defense approach: model developers own the first line, independent validation teams provide the second line of challenge, and internal audit provides the third line of oversight. Any LLM used in banking decisions (credit, fraud, compliance) must comply with SR 11-7.

3. Why should audit trail entries use hash chaining?

Answer:
Hash chaining creates a tamper-evident log where each entry includes a cryptographic hash of the previous entry. If any entry is modified after the fact, the hash chain breaks, and the verify_chain function returns False. This provides auditors with assurance that the log has not been altered, which is essential for regulatory compliance and incident investigation.

4. What should an enterprise model inventory capture for each LLM deployment?

Answer:
At minimum: model ID, name, use case description, risk tier, owner/responsible team, deployment date, last validation date, next review date, data sources, applicable regulations, performance metrics, and known limitations. The inventory should also track dependencies (API providers, frameworks) and trigger alerts when reviews are overdue.

5. How does ISO 42001 differ from the NIST AI RMF?

Answer:
ISO 42001 is a certifiable management system standard (similar to ISO 27001 for security), while NIST AI RMF is a voluntary framework. ISO 42001 specifies requirements for establishing, implementing, and continuously improving an AI management system, with formal audit and certification processes. NIST AI RMF provides guidance and best practices without a certification mechanism.

Key Takeaways