Module 11

Hybrid ML+LLM Architectures & Decision Frameworks

Part III: Working with LLMs

Chapter Overview

In production systems, LLMs rarely work in isolation. The most effective architectures combine large language models with classical machine learning, rules engines, and traditional software in carefully designed pipelines. The challenge is knowing when to use an LLM, when a simpler model will do, and how to orchestrate both into a system that maximizes quality while minimizing cost and latency.

This module provides a principled decision framework for choosing between LLMs and classical ML. It covers patterns for using LLMs as feature extractors, building hybrid triage and escalation pipelines, optimizing total cost of ownership, and extracting structured information from unstructured text. Each pattern is grounded in real production scenarios with concrete benchmarks, code examples, and cost analyses.
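To make the triage-and-escalation pattern concrete, here is a minimal sketch of one such pipeline: a cheap classical classifier answers high-confidence cases directly, and only ambiguous inputs escalate to an LLM. All names here (`classical_classifier`, `classify_with_llm`, `CONFIDENCE_THRESHOLD`) are illustrative stand-ins, not an API from this module; the classical model is a toy keyword rule so the sketch runs end to end.

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.9  # tuned against a validation set in practice


@dataclass
class Decision:
    label: str
    source: str       # "classical" or "llm" -- which path answered
    confidence: float  # confidence reported by the classical model


def classical_classifier(text: str) -> tuple[str, float]:
    # Stand-in for a trained model (e.g. logistic regression over TF-IDF).
    # A toy keyword rule keeps this sketch self-contained and runnable.
    if "refund" in text.lower():
        return "billing", 0.95
    return "other", 0.55


def classify_with_llm(text: str) -> str:
    # Stand-in for an LLM API call; in production this would prompt a model
    # with the text plus a fixed label set and parse its response.
    return "support"


def route(text: str) -> Decision:
    # Cheap path first; escalate only when the classical model is unsure.
    label, conf = classical_classifier(text)
    if conf >= CONFIDENCE_THRESHOLD:
        return Decision(label, "classical", conf)
    return Decision(classify_with_llm(text), "llm", conf)


if __name__ == "__main__":
    print(route("I want a refund for my order"))
    print(route("The app crashes on startup"))
```

The key design choice is the confidence threshold: raising it sends more traffic to the LLM (higher quality ceiling, higher cost and latency), while lowering it keeps more traffic on the cheap path. The module's cost analyses return to exactly this trade-off.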

By the end of this module, you will be able to evaluate any ML task against a rigorous decision matrix, design hybrid architectures that route work to the right model at the right cost, and build production information extraction pipelines that combine classical NLP with LLM capabilities.

Learning Objectives

Prerequisites

Sections