Module 26 · Section 26.9

LLM Risk Governance & Audit

Enterprise model inventory, risk classification, SR 11-7, NIST AI RMF, ISO 42001, and audit trail implementation
★ Big Picture

Enterprise AI governance requires structured frameworks that map every LLM deployment to a risk classification, assign ownership, and maintain auditable records. Established frameworks like SR 11-7 (banking model risk), NIST AI RMF, and ISO 42001 provide the scaffolding. This section covers how to build a practical AI governance program that satisfies regulators while remaining lightweight enough for engineering teams to follow.

1. Governance Frameworks Comparison

| Framework | Origin | Scope | Key Contribution |
| --- | --- | --- | --- |
| SR 11-7 | US Federal Reserve | Banking / financial | Three lines of defense, independent validation |
| NIST AI RMF | US NIST | Cross-sector | Govern, Map, Measure, Manage lifecycle |
| ISO 42001 | ISO | International | AI management system certification |
| EU AI Act | European Parliament | EU market | Risk-based obligations, conformity assessment |
Figure 26.9.1: The NIST AI RMF defines four core functions; Govern is the overarching function while Map, Measure, and Manage form a continuous cycle.
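One lightweight way to operationalize the four functions is a governance checklist that maps each function to required artifacts and flags gaps. The artifact names below are illustrative assumptions for the sketch, not terms taken from the NIST publication:

```python
# Hypothetical sketch: the four NIST AI RMF functions as a checklist of
# governance artifacts. Artifact names are illustrative assumptions.
RMF_CHECKLIST = {
    "GOVERN": ["ai_policy", "roles_and_accountability"],
    "MAP": ["context_statement", "stakeholder_register", "risk_register"],
    "MEASURE": ["test_plan", "monitoring_metrics"],
    "MANAGE": ["mitigation_plan", "incident_response_runbook"],
}

def missing_artifacts(completed: set[str]) -> dict[str, list[str]]:
    """Return, per RMF function, the artifacts not yet produced."""
    return {
        function: [a for a in artifacts if a not in completed]
        for function, artifacts in RMF_CHECKLIST.items()
        if any(a not in completed for a in artifacts)
    }

gaps = missing_artifacts({"ai_policy", "roles_and_accountability", "test_plan"})
print(gaps)  # GOVERN is complete; MAP, MEASURE, MANAGE still have gaps
```

A report like this makes the "continuous cycle" concrete: Govern artifacts are established once and maintained, while the other three functions are re-checked every review cycle.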

2. Model Inventory and Risk Classification

from dataclasses import dataclass
from enum import Enum
from datetime import datetime

class RiskTier(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"

@dataclass
class ModelInventoryEntry:
    """Enterprise model inventory record for governance tracking."""
    model_id: str
    model_name: str
    use_case: str
    owner: str
    risk_tier: RiskTier
    deployment_date: str
    last_validation: str
    next_review: str
    data_sources: list[str]
    regulations: list[str]

    def needs_review(self) -> bool:
        return datetime.fromisoformat(self.next_review) <= datetime.now()

    def to_dict(self):
        return {
            "model_id": self.model_id,
            "model_name": self.model_name,
            "use_case": self.use_case,
            "owner": self.owner,
            "risk_tier": self.risk_tier.value,
            "overdue": self.needs_review(),
        }

entry = ModelInventoryEntry(
    model_id="LLM-CS-001", model_name="Customer Support Bot v2",
    use_case="Customer service automation", owner="ML Platform Team",
    risk_tier=RiskTier.MEDIUM, deployment_date="2025-01-15",
    last_validation="2025-01-10", next_review="2025-07-10",
    data_sources=["support_tickets", "knowledge_base"],
    regulations=["GDPR", "EU AI Act (limited risk)"],
)
print(entry.to_dict())

3. Audit Trail Implementation

import hashlib
import json
from datetime import datetime, timezone

class AuditTrail:
    """Tamper-evident audit log for LLM interactions."""

    def __init__(self):
        self.entries = []

    def log(self, request_id: str, model: str, input_text: str,
            output_text: str, user_id: str, metadata: dict | None = None):
        entry = {
            "request_id": request_id,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "model": model,
            "user_id": user_id,
            # Store hashes, not raw text, so the log itself holds no PII
            "input_hash": hashlib.sha256(input_text.encode()).hexdigest()[:16],
            "output_hash": hashlib.sha256(output_text.encode()).hexdigest()[:16],
            "metadata": metadata or {},
        }
        # Chain entries for tamper detection; sort_keys makes the JSON
        # serialization deterministic so the hash is reproducible
        if self.entries:
            entry["prev_hash"] = hashlib.sha256(
                json.dumps(self.entries[-1], sort_keys=True).encode()
            ).hexdigest()[:16]
        self.entries.append(entry)
        return entry

    def verify_chain(self) -> bool:
        """Recompute every link; any modified entry breaks the chain."""
        for i in range(1, len(self.entries)):
            expected = hashlib.sha256(
                json.dumps(self.entries[i - 1], sort_keys=True).encode()
            ).hexdigest()[:16]
            if self.entries[i].get("prev_hash") != expected:
                return False
        return True
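A minimal end-to-end demonstration of the chaining idea, re-implemented inline on plain dicts so it runs standalone (the `chain_hash` helper mirrors the chaining logic of `AuditTrail.log` and `verify_chain`):

```python
# Illustrative sketch of hash-chain verification and tamper detection.
import hashlib
import json

def chain_hash(entry: dict) -> str:
    """Hash one log entry the way the chain links entries together."""
    return hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()[:16]

log = [{"request_id": "r1", "output_hash": "ab12"}]
log.append({"request_id": "r2", "prev_hash": chain_hash(log[0])})
log.append({"request_id": "r3", "prev_hash": chain_hash(log[1])})

def verify(entries: list[dict]) -> bool:
    return all(
        entries[i].get("prev_hash") == chain_hash(entries[i - 1])
        for i in range(1, len(entries))
    )

ok_before = verify(log)         # True: chain is intact
log[0]["output_hash"] = "XXXX"  # tamper with the first entry
ok_after = verify(log)          # False: the recomputed link no longer matches
print(ok_before, ok_after)
```

Note that the in-memory list is still mutable; in production the entries would be written to append-only storage, with the chain providing detection rather than prevention of tampering.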
Figure 26.9.2: SR 11-7's three lines of defense separate model development, independent validation, and audit oversight.
⚠ Warning

Many organizations track traditional ML models but forget to inventory their LLM deployments. Every use of an LLM API, whether it is a direct OpenAI call, a LangChain chain, or an embedded copilot feature, should be registered in the enterprise model inventory with a risk classification and assigned owner.

📝 Note

ISO 42001 is the first international standard for AI management systems. It provides a certifiable framework for organizations to demonstrate responsible AI practices, similar to how ISO 27001 certifies information security management. Certification may become a market differentiator as AI regulation increases.

★ Key Insight

Audit trails for LLM systems should use hash chaining (similar to blockchain) to ensure tamper resistance. Each log entry includes a hash of the previous entry, creating a tamper-evident chain. If any entry is modified after the fact, the chain verification fails, alerting auditors to potential tampering.

Knowledge Check

1. What are the four core functions of the NIST AI RMF?

Answer:
Govern (establish policies, roles, and accountability), Map (identify context, stakeholders, and risks), Measure (assess risks through testing and metrics), and Manage (mitigate risks and respond to incidents). Govern is the overarching function, while Map, Measure, and Manage form a continuous operational cycle.

2. What is SR 11-7 and why does it matter for LLM deployments in banking?

Answer:
SR 11-7 is the US Federal Reserve's guidance on model risk management for banking institutions. It requires a three-lines-of-defense approach: model developers own the first line, independent validation teams provide the second line of challenge, and internal audit provides the third line of oversight. Any LLM used in banking decisions (credit, fraud, compliance) must comply with SR 11-7.

3. Why should audit trail entries use hash chaining?

Answer:
Hash chaining creates a tamper-evident log where each entry includes a cryptographic hash of the previous entry. If any entry is modified after the fact, the hash chain breaks, and the verify_chain function returns False. This provides auditors with assurance that the log has not been altered, which is essential for regulatory compliance and incident investigation.

4. What should an enterprise model inventory capture for each LLM deployment?

Answer:
At minimum: model ID, name, use case description, risk tier, owner/responsible team, deployment date, last validation date, next review date, data sources, applicable regulations, performance metrics, and known limitations. The inventory should also track dependencies (API providers, frameworks) and trigger alerts when reviews are overdue.

5. How does ISO 42001 differ from the NIST AI RMF?

Answer:
ISO 42001 is a certifiable management system standard (similar to ISO 27001 for security), while NIST AI RMF is a voluntary framework. ISO 42001 specifies requirements for establishing, implementing, and continuously improving an AI management system, with formal audit and certification processes. NIST AI RMF provides guidance and best practices without a certification mechanism.

Key Takeaways