Module 09

Working with LLM APIs

Part III: Working with LLMs

Chapter Overview

Large language models are only as useful as the interface through which you access them. For the vast majority of production applications, that interface is an API: a set of HTTP endpoints exposed by OpenAI, Anthropic, Google, or an open-source serving framework. Knowing how to call these APIs correctly, efficiently, and reliably is a core skill for any engineer building with LLMs.
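To make the "set of HTTP endpoints" concrete, here is a minimal sketch of assembling a single-turn chat request using only the Python standard library. The endpoint URL, model name, and payload shape follow the OpenAI-style chat completions convention; the key and model here are placeholders, not values from this chapter.

```python
import json
import urllib.request

# Placeholder values for illustration only; in real code, read the key
# from an environment variable or a secrets manager.
API_URL = "https://api.openai.com/v1/chat/completions"
API_KEY = "sk-..."

def build_chat_request(prompt: str, model: str = "gpt-4o-mini") -> urllib.request.Request:
    """Assemble the HTTP POST request for a single-turn chat completion."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )

# Sending the request is one more line (omitted here to avoid a live call):
# with urllib.request.urlopen(build_chat_request("Hello!")) as resp:
#     reply = json.loads(resp.read())["choices"][0]["message"]["content"]
```

In practice you would use a provider SDK or an HTTP client with connection pooling, but every such client ultimately produces a request shaped like this one.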

This chapter covers the full lifecycle of working with LLM APIs. We begin with the landscape of providers and their architectural differences, then move into structured output techniques and tool integration patterns that let models interact with external systems. Finally, we tackle the engineering challenges of running LLM calls in production: routing across providers, caching, retry strategies, circuit breakers, cost management, and observability.
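Of the production patterns listed above, retries are the one nearly every caller needs first. Below is a minimal sketch of retrying a transient failure with exponential backoff and jitter; the function name `call_with_retries` and its parameters are illustrative, not from a specific library. A production version would also honor `Retry-After` headers and distinguish retryable errors (timeouts, 429s, 5xx) from permanent 4xx failures.

```python
import random
import time

def call_with_retries(fn, max_attempts=4, base_delay=0.5, retryable=(TimeoutError,)):
    """Call fn(), retrying transient errors with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error to the caller
            # Backoff doubles each attempt (0.5s, 1s, 2s, ...); the random
            # jitter keeps many clients from retrying in lockstep.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)
```

Wrapping an LLM call is then `call_with_retries(lambda: client.complete(prompt))`. The same wrapper is a natural place to hang the other patterns from this chapter: a circuit breaker can short-circuit `fn` when the error rate spikes, and a cache lookup can run before the first attempt.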

Learning Objectives

Sections

Prerequisites