Module 14

Parameter-Efficient Fine-Tuning (PEFT)

Part IV: Training & Adapting LLMs

Chapter Overview

Full fine-tuning of a 7B-parameter model requires roughly 14 GB of memory just for the weights in FP16, and once gradients and Adam optimizer states are added, a standard mixed-precision accounting (about 16 bytes per parameter) pushes the total past 100 GB. For most practitioners, this puts full fine-tuning out of reach without expensive multi-GPU setups. Parameter-efficient fine-tuning (PEFT) methods solve this problem by training only a tiny fraction of the parameters (often less than 1%) while achieving quality that rivals full fine-tuning.
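This memory arithmetic is easy to check with a back-of-envelope calculation. The sketch below assumes a common mixed-precision Adam accounting: 2 bytes each for FP16 weights and gradients, plus 12 bytes of FP32 optimizer state (a master copy of the weights and two Adam moments) per parameter. Actual frameworks vary, so treat this as an estimate, not an exact figure.

```python
# Back-of-envelope memory estimate for full fine-tuning with Adam in
# mixed precision. Assumed accounting per parameter: 2 B FP16 weights,
# 2 B FP16 gradients, 12 B FP32 state (master weights + two moments).
def full_ft_memory_gb(n_params: float) -> dict:
    GB = 1e9  # decimal gigabytes
    return {
        "weights_fp16": n_params * 2 / GB,
        "gradients_fp16": n_params * 2 / GB,
        "optimizer_fp32": n_params * 12 / GB,
        "total": n_params * 16 / GB,
    }

est = full_ft_memory_gb(7e9)
print(f"weights: {est['weights_fp16']:.0f} GB, total: {est['total']:.0f} GB")
# For 7B parameters: 14 GB of weights, ~112 GB total before activations.
```

Activation memory comes on top of this and depends on batch size and sequence length, which is why even the 16-bytes-per-parameter figure understates the real requirement.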

This module covers the most important PEFT techniques in depth, starting with LoRA and QLoRA (the dominant methods in practice) and extending to newer approaches like DoRA, LoRA+, and adapter-based methods. You will learn not just the theory behind each method, but also how to configure hyperparameters, select target modules, and merge adapters for efficient deployment.
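To see where the "less than 1%" figure comes from, the snippet below counts trainable LoRA parameters for an illustrative Llama-7B-style model. The shapes (hidden size 4096, 32 layers, LoRA on the query and value projections, rank 8) are assumptions for the sake of the example, not values prescribed by this module.

```python
# Hedged sketch: trainable-parameter count for LoRA. Each adapted
# weight W (d_out x d_in) is frozen; only the low-rank factors
# B (d_out x r) and A (r x d_in) are trained.
def lora_trainable_params(target_shapes, r):
    return sum(r * (d_out + d_in) for d_out, d_in in target_shapes)

# Assumed Llama-7B-style shapes: q_proj and v_proj are 4096x4096,
# adapted in each of 32 layers, with rank r=8.
hidden, layers, r = 4096, 32, 8
targets = [(hidden, hidden)] * (2 * layers)
trainable = lora_trainable_params(targets, r)
print(f"{trainable:,} trainable params ({100 * trainable / 7e9:.3f}% of 7B)")
# ~4.2M trainable parameters, about 0.06% of the full model.
```

Raising the rank or adding more target modules (e.g. the MLP projections) increases this count, but even generous configurations typically stay well under 1% of the base model.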

The final section surveys the rapidly evolving ecosystem of training tools, from Unsloth (which reports roughly 2x speedups with substantially lower memory use) to configuration-driven training frameworks like Axolotl and LLaMA-Factory. By the end of this module, you will be able to fine-tune most open-weight models on a single consumer GPU.

Learning Objectives

Prerequisites

Sections