Capstone Project: End-to-End LLM System

Project Overview

The capstone project is the culminating experience of this course. You will design, build, and present a complete LLM-powered system that demonstrates mastery across the full stack: data preparation, model training and adaptation, retrieval-augmented generation, agent orchestration, production deployment, evaluation, and business strategy.

Unlike individual module labs that focus on a single technique, the capstone requires you to make architectural decisions that balance competing concerns: model quality versus latency, accuracy versus cost, flexibility versus reliability. Making and justifying these tradeoffs is what distinguishes a production system from a course exercise.
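
One simple way to make such a tradeoff explicit is to treat latency and cost as hard budgets and maximize quality among the options that fit. The sketch below illustrates that idea only. The `Candidate` fields, the candidate names, and every number are made-up placeholders, not measurements or course requirements.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    """One architecture option with hypothetical, illustrative metrics."""
    name: str
    quality: float               # e.g. accuracy on a held-out set, 0..1
    p95_latency_ms: float        # 95th-percentile response time
    cost_per_1k_requests: float  # USD


def pick_candidate(candidates, max_latency_ms, max_cost) -> Optional[Candidate]:
    """Return the highest-quality candidate that fits both budgets, or None."""
    feasible = [
        c for c in candidates
        if c.p95_latency_ms <= max_latency_ms
        and c.cost_per_1k_requests <= max_cost
    ]
    return max(feasible, key=lambda c: c.quality, default=None)


# Placeholder numbers for illustration only.
CANDIDATES = [
    Candidate("large-hosted-model", 0.91, 2400.0, 18.0),
    Candidate("small-finetuned-model", 0.86, 450.0, 2.5),
    Candidate("small-model-plus-rag", 0.89, 900.0, 4.0),
]

choice = pick_candidate(CANDIDATES, max_latency_ms=1000.0, max_cost=5.0)
print(choice.name if choice else "no candidate fits the budget")
```

With these placeholder values the highest-quality option is excluded by both budgets, so the selection falls to the best option that remains. In your report, replace the numbers with your own measurements and state the budgets you chose and why.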

You will work on this project over approximately 4 to 6 weeks. The project culminates in a GitHub repository with working code, a model and dataset published on Hugging Face Hub, a written technical report, and a 15-minute presentation.

What Makes a Strong Capstone
Integration over novelty: The goal is not to invent a new architecture but to demonstrate that you can combine multiple techniques into a coherent, working system.
Production mindset: Include evaluation suites, monitoring hooks, safety guardrails, and deployment configuration. A demo that works on a laptop is not enough.
Business grounding: Frame the project around a real use case with measurable success criteria and an ROI estimate, not just a technical exercise.
Honest evaluation: Report what does not work as thoroughly as what does. Identifying limitations demonstrates deeper understanding than cherry-picked results.
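
For the ROI estimate mentioned under business grounding, a back-of-the-envelope calculation is often enough to start with. The following is a minimal sketch, assuming a fixed monthly benefit, a fixed monthly running cost, and a one-time build cost. The function name `roi_estimate` and all figures are hypothetical, and a real analysis would also need to account for uncertainty in each input.

```python
def roi_estimate(monthly_benefit, monthly_run_cost, build_cost, months=12):
    """Return (roi, payback_months) over a fixed horizon.

    roi            = (total benefit - total cost) / total cost
    payback_months = build cost / net monthly benefit,
                     or None if the system never pays for itself.
    """
    total_benefit = monthly_benefit * months
    total_cost = build_cost + monthly_run_cost * months
    roi = (total_benefit - total_cost) / total_cost

    net_monthly = monthly_benefit - monthly_run_cost
    payback_months = build_cost / net_monthly if net_monthly > 0 else None
    return roi, payback_months


# Hypothetical figures in USD, for illustration only.
roi, payback = roi_estimate(
    monthly_benefit=8_000,   # e.g. support hours saved, priced at loaded cost
    monthly_run_cost=2_000,  # inference, hosting, monitoring
    build_cost=30_000,       # one-time engineering effort
)
print(f"12-month ROI: {roi:.0%}, payback after {payback:.1f} months")
# prints "12-month ROI: 78%, payback after 5.0 months"
```

Pair an estimate like this with the measurable success criteria you defined, so that each input to the calculation traces back to something your evaluation can verify.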

Learning Objectives

Design an end-to-end LLM system architecture that balances quality, cost, latency, and safety
Prepare and publish a synthetic or curated dataset suitable for fine-tuning
Fine-tune or adapt a language model using techniques from Modules 12 to 17
Build a RAG pipeline with vector search, reranking, and citation generation
Implement an agent with tool use, planning, and multi-step reasoning
Deploy the system with appropriate security, monitoring, and observability instrumentation
Design and execute a rigorous evaluation suite with both automated and human evaluation
Produce a technical report with architecture diagrams, evaluation results, and honest limitation analysis
Present the project in a clear, concise 15-minute format suitable for technical and business audiences
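
To make the RAG objective above concrete, here is a toy, dependency-free sketch of the retrieve, rerank, and cite flow. It is not a recommended implementation: term-count vectors stand in for learned embeddings, a term-overlap score stands in for a real reranker model, and the corpus, document IDs, and function names are all invented for illustration.

```python
import math
import re
from collections import Counter

# Tiny invented corpus; in a real system these would be indexed chunks.
DOCS = {
    "doc-1": "LoRA adds low-rank adapter matrices to frozen model weights.",
    "doc-2": "Rerankers rescore retrieved passages with a more expensive model.",
    "doc-3": "Vector search retrieves passages whose embeddings are close to the query.",
}


def embed(text):
    """Stand-in for an embedding model: a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, k=2):
    """First stage: rank all documents by cosine similarity, keep the top k."""
    q = embed(query)
    ranked = sorted(DOCS, key=lambda d: cosine(q, embed(DOCS[d])), reverse=True)
    return ranked[:k]


def rerank(query, doc_ids):
    """Second stage: reorder candidates by a different score (term overlap here)."""
    q_terms = set(embed(query))
    return sorted(
        doc_ids, key=lambda d: len(q_terms & set(embed(DOCS[d]))), reverse=True
    )


def build_prompt(query, doc_ids):
    """Number the sources so the model can cite them as [n] in its answer."""
    sources = "\n".join(
        f"[{i}] ({d}) {DOCS[d]}" for i, d in enumerate(doc_ids, start=1)
    )
    return (
        "Answer using only the sources and cite them as [n].\n"
        f"{sources}\nQuestion: {query}"
    )


query = "How does vector search retrieve passages?"
top_docs = rerank(query, retrieve(query))
print(build_prompt(query, top_docs))
```

The numbered source list is what makes citations checkable: an evaluator can map each `[n]` in the generated answer back to a document ID and verify that the cited passage supports the claim.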

Capstone Sections

C.1 Requirements & Deliverables: Detailed technical requirements (synthetic dataset, fine-tuned model, RAG system, agent with tools, deep research, production deployment, security, evaluation suite, hybrid architecture, ROI analysis, risk governance) and deliverable specifications (GitHub repo, HF Hub artifacts, technical report, demo, presentation).

Suggested Timeline (6 weeks)

Week 1 Design: Select use case, define requirements, design architecture, identify datasets

Week 2 Data + Model: Prepare synthetic dataset, begin fine-tuning or adapter training

Week 3 RAG + Agent: Build RAG pipeline, implement agent with tools, integrate components

Week 4 Deploy + Evaluate: Deploy to cloud, set up monitoring, run evaluation suite

Week 5 Refine: Address evaluation findings, add safety guardrails, optimize performance

Week 6 Report + Present: Write technical report, prepare presentation, publish artifacts

Deliverable Summary

GitHub Repository with clean code, README, and deployment instructions
Hugging Face Hub artifacts: fine-tuned model and curated dataset
Technical Report (8 to 12 pages) with architecture, evaluation, and limitations
Interpretability Analysis documenting attention patterns, token attributions, or probing results
Live Demo (deployed or screencast) showing the system in action
Presentation (15 minutes) covering motivation, architecture, results, and lessons learned