A comprehensive, hands-on course covering the full stack of modern Large Language Model technology: from foundational NLP to production-grade AI agent systems.
Prerequisite refresher covering core machine learning concepts and hands-on PyTorch programming. Ensures all students share a common foundation before diving into NLP and LLMs.
Topics include nn.Module (layers, parameters, forward pass); Dataset, DataLoader, transforms, and batching; state_dict and checkpoints; and torch.profiler.
Build intuition for how machines understand text: from bag-of-words to dense vector spaces. Covers classical and neural word representations that underpin all modern LLM work.
Tokenization is the critical first step of every LLM pipeline. Understand the algorithms behind BPE, WordPiece, and SentencePiece, and learn how tokenizer choice affects model behavior, cost, and multilingual capability.
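To make the BPE idea concrete, here is a minimal toy sketch of the training loop: repeatedly find the most frequent adjacent symbol pair in a corpus and merge it into a new symbol. The corpus, helper names, and merge count are all illustrative; production tokenizers (e.g. Hugging Face tokenizers) add pre-tokenization, byte-level fallback, and special tokens.

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs across the corpus; return the most frequent."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0]

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

# Toy corpus: word (pre-split into characters) -> frequency.
corpus = {tuple("lower"): 5, tuple("lowest"): 2, tuple("newer"): 6, tuple("wider"): 3}
for _ in range(3):  # learn 3 merges
    pair = most_frequent_pair(corpus)
    corpus = merge_pair(corpus, pair)
    print("merged:", pair)
```

On this corpus the first learned merge is `('e', 'r')`, because "er" is the most frequent adjacent pair; subword units like "wer" and "lo" emerge from later merges, which is exactly how BPE builds a vocabulary of frequent fragments.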
Trace the evolution from RNNs to the attention mechanism: the key breakthrough that enabled transformers. Build deep intuition for how attention works mathematically and conceptually.
Hands-on with nn.MultiheadAttention; visualize attention weight distributions and entropy.
Deep dive into the full transformer architecture: the foundation of every modern LLM. Understand every component, from positional encoding to layer normalization, and implement one from scratch.
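The core computation these modules build toward, scaled dot-product attention, softmax(QKᵀ/√d)·V, fits in a few lines of pure Python. This is an illustrative single-head sketch over toy 2-d vectors; a real implementation would use batched tensors and nn.MultiheadAttention.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d)) V, single head."""
    d = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)  # attention distribution over the keys
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# One query attending over three key/value pairs (toy 2-d vectors).
Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
out = attention(Q, K, V)
print(out)
```

The query most closely matches the first key, so the output is pulled toward the first value vector; visualizing `weights` for each query is exactly the attention-map analysis the module describes.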
Understand how LLMs generate text token-by-token. Master the algorithms that control quality, diversity, and speed of generation: from greedy search to speculative decoding. Identified as a gap vs. Stanford CS336 and CMU ANLP.
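Two of the simplest controls covered here, temperature and top-k, can be sketched over raw logits in pure Python. The function name and toy logits are illustrative; real decoders operate on full vocabulary-sized tensors.

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=None, rng=None):
    """Sample a token index from logits with temperature scaling and top-k filtering."""
    rng = rng or random.Random()
    scaled = [l / temperature for l in logits]
    # Keep only the k highest-scoring candidates (all of them if top_k is None).
    idx = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    if top_k is not None:
        idx = idx[:top_k]
    # Softmax over the surviving candidates, then sample from the distribution.
    m = max(scaled[i] for i in idx)
    exps = [math.exp(scaled[i] - m) for i in idx]
    total = sum(exps)
    probs = [e / total for e in exps]
    r, acc = rng.random(), 0.0
    for i, p in zip(idx, probs):
        acc += p
        if r <= acc:
            return i
    return idx[-1]

logits = [2.0, 1.0, 0.1, -1.0]
# As temperature -> 0, sampling approaches greedy decoding (always the argmax).
print(sample_next_token(logits, temperature=0.01))
```

Lowering the temperature sharpens the distribution toward greedy search; top-k (and its cousin top-p) trades diversity against the risk of sampling low-probability tokens.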
Understand how LLMs are trained at scale: pre-training objectives, data curation pipelines, scaling laws, and the computational infrastructure behind modern foundation models. Expanded with deeper treatment of scaling laws and data curation per Stanford CS336.
Survey the current state of LLMs, both closed and open-source, and understand the architectural innovations, reasoning capabilities, and multilingual dimensions of modern models.
Master the techniques that make LLM inference fast and affordable: from quantization and KV cache optimization to speculative decoding and high-throughput serving. Identified as a gap vs. Stanford CS336 and CMU ANLP.
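The basic arithmetic behind quantization can be shown with a toy symmetric per-tensor int8 scheme: store weights as 8-bit integers plus one float scale, recovering w ≈ scale · q. Function names and the example tensor are illustrative; production schemes (GPTQ, AWQ, per-channel scales) are considerably more sophisticated.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: w ~= scale * q, q in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 codes and the scale."""
    return [scale * qi for qi in q]

w = [0.5, -1.27, 0.02, 1.0]
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
max_err = max(abs(a - b) for a, b in zip(w, w_hat))
print(q, round(max_err, 6))
```

Each weight now costs 1 byte instead of 4, and the rounding error is bounded by half the scale, which is why quantization cuts memory and bandwidth at a small accuracy cost.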
Master the practical skills of calling, configuring, and optimizing LLM APIs from all major providers.
Prompting is programming with natural language. Learn systematic techniques from basic few-shot to advanced reasoning chains, reflection patterns, and automated prompt optimization.
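The few-shot pattern the module starts from is simple enough to sketch as a template builder: an instruction, worked input/output examples, then the new query. The function name and example task are illustrative, not a prescribed format.

```python
def few_shot_prompt(task, examples, query):
    """Assemble a few-shot prompt: instruction, worked examples, then the query."""
    lines = [task, ""]
    for inp, out in examples:
        lines += [f"Input: {inp}", f"Output: {out}", ""]
    # End with an open "Output:" so the model completes the final example.
    lines += [f"Input: {query}", "Output:"]
    return "\n".join(lines)

prompt = few_shot_prompt(
    "Classify sentiment as positive or negative.",
    [("great movie", "positive"), ("terrible plot", "negative")],
    "decent acting",
)
print(prompt)
```

The examples both demonstrate the task and fix the output format, which is why few-shot prompts often outperform bare instructions on structured tasks.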
In production, LLMs rarely work alone. Learn when to use an LLM vs. classical ML, how to combine them in hybrid architectures, and how to make principled cost-performance tradeoffs. Addresses the #1 gap identified across all three executive perspectives.
Synthetic data is the backbone of this course's project. Learn to generate high-quality, diverse, and domain-specific datasets, and use LLMs as simulators for evaluation and testing.
Learn the complete workflow of fine-tuning LLMs: from data preparation and formatting to training, monitoring, and evaluating adapted models.
Train large models on consumer hardware by only updating a fraction of parameters. Master LoRA, QLoRA, and other PEFT methods that democratize fine-tuning.
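The core LoRA idea is small enough to sketch without a framework: keep the pretrained weight W frozen and learn a low-rank update, y = Wx + (α/r)·B(Ax), where A is r×d_in and B is d_out×r. The helper names and toy dimensions below are illustrative; in practice you would use a PEFT library on real linear layers.

```python
def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha=1.0, r=1):
    """y = W x + (alpha / r) * B (A x): frozen base weight plus low-rank update."""
    base = matvec(W, x)            # frozen pretrained path
    delta = matvec(B, matvec(A, x))  # trainable low-rank path
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]

# Toy 2x2 frozen weight with a rank-1 adapter (A: 1x2, B: 2x1).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 0.0]]
B = [[0.5], [0.0]]
y = lora_forward(W, A, B, x=[2.0, 3.0])
print(y)
```

The savings come from the parameter count: a full update to a d_out×d_in weight costs d_out·d_in trainable parameters, while the adapter costs only r·(d_in + d_out), a tiny fraction when r is small.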
Create smaller, faster models that retain the capabilities of larger ones. Learn distillation techniques and model merging strategies that are widely used in the open-source LLM community. Identified as a gap: distillation is the core technique behind Phi, Orca, and the distilled DeepSeek-R1 models.
Align LLMs with human preferences using reinforcement learning and direct optimization methods.
Peer inside the black box. Understand how and why LLMs produce their outputs using probing, attention analysis, and mechanistic interpretability techniques. Identified as a gap vs. Berkeley CS294-267 Understanding LLMs.
Master the retrieval infrastructure that powers RAG systems.
Build production-quality RAG systems: from naive implementations to advanced architectures with re-ranking, query transformation, knowledge graphs, and deep research agents.
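The retrieval step at the heart of every RAG system reduces to ranking document embeddings by similarity to a query embedding. A minimal cosine-similarity sketch over hypothetical toy 3-d embeddings (a real system would use an embedding model and a vector index):

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def retrieve(query_vec, docs, k=2):
    """Return the texts of the k documents most similar to the query embedding."""
    scored = sorted(docs, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    return [d["text"] for d in scored[:k]]

# Toy document store: pre-computed (hypothetical) embeddings.
docs = [
    {"text": "refund policy", "vec": [0.9, 0.1, 0.0]},
    {"text": "shipping times", "vec": [0.1, 0.9, 0.0]},
    {"text": "returns and refunds", "vec": [0.8, 0.2, 0.1]},
]
print(retrieve([1.0, 0.0, 0.0], docs, k=2))
```

The retrieved texts are then stuffed into the LLM's prompt as context; the advanced architectures in this module (re-ranking, query transformation) refine exactly this ranking step.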
Design and implement robust conversational AI: from simple chatbots to complex multi-turn dialogue systems with state management, memory, personas, and personality.
Build autonomous AI agents that reason, plan, use tools, and take actions. Covers the four core agentic patterns: reflection, tool use, planning, and multi-agent collaboration.
Scale from single agents to multi-agent architectures. Learn modern agent frameworks, orchestration patterns, and how to build complex systems where multiple agents collaborate.
Extend LLMs beyond text into image, audio, video, and 3D generation. Understand the architectures behind the most impactful generative AI systems.
Survey the most impactful real-world applications of LLMs across industries. For each domain, understand the architecture patterns, unique challenges, risks, and the current state of the art.
You can't improve what you can't measure. Learn systematic approaches to evaluating LLM outputs, designing rigorous experiments, testing agent behavior, and monitoring production systems.
Take LLM applications from notebook to production. Cover deployment, scaling, security hardening, and the ethical and regulatory frameworks for responsible AI systems.
The business and organizational layer that turns LLM technology into business value. Covers strategy, product thinking, ROI measurement, vendor evaluation, and compute planning. Addresses critical gaps from the Head of AI and Head of Data Science perspectives.
Integrate everything from the course into a complete, deployable conversational AI system built on synthetic data, demonstrating mastery of the full LLM application stack.
Reference appendix covering the essential mathematical background for understanding LLMs: linear algebra, calculus, probability, information theory, and optimization.
Core machine learning concepts that underpin LLM training and evaluation: learning paradigms, loss functions, training pipelines, evaluation metrics, and classical algorithms.
Python tooling and practices essential for LLM development: environment management, key libraries, async programming, type safety, and debugging.
Step-by-step guides for setting up local and cloud development environments for LLM work: GPU setup, model serving, API keys, cloud instances, and containerization.
Version control and collaboration practices tailored for machine learning projects: experiment tracking, data versioning, and notebook management.
Comprehensive alphabetical glossary of 300+ technical terms used throughout the course. Each entry includes a concise definition and a reference to the module where it is first introduced.
Quick-reference tables for GPU specifications, VRAM requirements, training cost estimates, and guidance on when to use different compute tiers.
One-page summaries for the 20 most-used models, covering architecture type, parameter count, context window, license, key strengths, and API access.
Ready-to-use prompt templates organized by task type: classification, extraction, summarization, code generation, evaluation, synthetic data, and agent systems.
Comprehensive reference for major LLM benchmarks and datasets, organized by category. For each: what it measures, size, known limitations, and contamination status.
Key libraries, frameworks, and platforms used throughout the course.
Core ML & LLM
Inference & Serving
LLM APIs & SDKs
RAG & Vector Search
Agents & Orchestration
Data & Evaluation
Observability & Deployment
NLP & Interpretability