Module 05

Decoding Strategies & Text Generation

Part I: Foundations

Chapter Overview

A language model learns a probability distribution over sequences of tokens, but that distribution alone does not produce text. The bridge between a trained model and the words it generates is the decoding strategy: the algorithm that selects which token comes next (or, in newer paradigms, which tokens appear all at once). The choice of decoding method profoundly affects quality, diversity, coherence, speed, and even the safety of generated output.
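To make the idea concrete, here is a minimal sketch of the simplest decoding strategy, greedy selection, over a toy next-token distribution. The token names and logit values are illustrative only, not taken from any real model:

```python
import math

# Hypothetical logits for the next token, as a model might emit them
# (values chosen for illustration).
logits = {"the": 2.1, "a": 1.3, "cat": 0.2, "<eos>": -0.5}

def softmax(scores):
    """Turn raw logits into a probability distribution."""
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - m) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

probs = softmax(logits)

# Greedy decoding: deterministically pick the single most probable token.
next_token = max(probs, key=probs.get)
print(next_token)  # → the
```

Every decoding strategy in this chapter is, at bottom, a different rule for turning a distribution like `probs` into a chosen token (or a set of tokens).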

This chapter walks through the full landscape of text generation, from the simplest deterministic methods (greedy search, beam search) through stochastic sampling techniques (temperature, top-k, top-p, min-p) to advanced and emerging approaches (contrastive decoding, speculative decoding, structured generation, watermarking, and diffusion-based language models). By the end, you will understand not just what each method does, but when and why to choose one over another.
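As a preview of the stochastic methods above, the sketch below combines two of them, temperature scaling and top-k truncation, in a single sampling step. The vocabulary, logits, and parameter values are illustrative assumptions, and the details of each method are developed later in the chapter:

```python
import math
import random

def sample_top_k(logits, k=2, temperature=0.8, rng=random.Random(0)):
    """Sample a next token: keep the k highest-scoring tokens,
    rescale their logits by the temperature, then draw one at random
    in proportion to the resulting probabilities."""
    # Truncate to the k most likely candidates (top-k filtering).
    top = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)[:k]
    # Temperature < 1 sharpens the distribution; > 1 flattens it.
    scaled = [(tok, score / temperature) for tok, score in top]
    m = max(score for _, score in scaled)
    weights = [math.exp(score - m) for _, score in scaled]
    tokens = [tok for tok, _ in scaled]
    return rng.choices(tokens, weights=weights)[0]

logits = {"the": 2.1, "a": 1.3, "cat": 0.2, "<eos>": -0.5}
print(sample_top_k(logits))  # always one of "the" or "a", since k=2
```

Unlike greedy search, repeated calls with different random seeds can yield different tokens, which is exactly the diversity/coherence trade-off the sampling sections examine.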

Learning Objectives

Sections

Prerequisites