Why structured output matters: LLMs generate free-form text by default, but production applications need predictable, parseable data. Whether you are extracting entities from documents, generating API parameters, or building agent pipelines, you need the model's output to conform to a specific schema. This section covers two complementary approaches: constrained output formats (JSON mode, response schemas) that guarantee structural validity, and tool/function calling that lets models invoke external systems as part of their reasoning process.
1. The Structured Output Problem
Consider a simple task: extract the name, email, and sentiment from a customer support message. If you ask an LLM to do this with a plain text prompt, you might get the information scattered across prose, formatted inconsistently, or wrapped in unnecessary explanation. To build reliable pipelines, you need the model to return a specific JSON structure, every time, without deviation.
Without enforcement, an LLM asked for JSON might return: Here is the JSON: {"name": "Alice"... (wrapped in prose), or {"name": "Alice", "sentiment": "frustrated"} (missing required fields), or almost-valid JSON with trailing commas that crashes json.loads(). Teams that skip structured output enforcement spend significant engineering time writing parsing heuristics, handling edge cases, and retrying failures. The patterns in this section eliminate that entire category of bugs.
There are three levels of structured output enforcement, each with increasing reliability:
- Prompt-based: You ask the model to return JSON in the system prompt. This works most of the time, but the model can still return malformed JSON or add commentary outside the JSON block.
- JSON mode: The API guarantees the response is valid JSON, but does not enforce a specific schema. The model chooses the keys and structure.
- Schema-constrained: The API guarantees the response matches a specific JSON Schema. This is the most reliable approach, as it enforces both validity and structure at the decoding level.
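The difference between these levels shows up directly in application code. A common defensive pattern at the prompt-based level is to scan the response for the outermost JSON object before parsing. The sketch below uses a hypothetical extract_json helper to illustrate the scaffolding that the stronger enforcement modes make unnecessary:

```python
import json

def extract_json(text: str) -> dict:
    """Best-effort extraction of the first JSON object from prose-wrapped model output.

    Note: this simple brace-matching does not handle braces inside string values.
    """
    start = text.find("{")
    if start == -1:
        raise ValueError("No JSON object found in response")
    depth = 0
    for i, ch in enumerate(text[start:], start):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:  # matching close brace found
                return json.loads(text[start:i + 1])
    raise ValueError("Unbalanced braces in response")

# A prose-wrapped response that would crash a bare json.loads()
raw = 'Here is the JSON: {"name": "Alice", "sentiment": "positive"} Hope that helps!'
print(extract_json(raw))  # {'name': 'Alice', 'sentiment': 'positive'}
```

Code like this is exactly the parsing-heuristic burden that JSON mode and schema-constrained output eliminate.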
2. JSON Mode and Response Schemas
2.1 OpenAI JSON Mode
OpenAI's simplest structured output option is JSON mode, activated by setting response_format={"type": "json_object"}. This guarantees the response is valid JSON but does not enforce any particular schema. You must include the word "JSON" in your prompt for this mode to work reliably.
```python
from openai import OpenAI
import json

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    response_format={"type": "json_object"},
    messages=[
        {"role": "system", "content": "Extract contact info. Return JSON with keys: name, email, sentiment."},
        {"role": "user", "content": "Hi, I'm Sarah Chen (sarah@example.com) and I love your product!"}
    ]
)

data = json.loads(response.choices[0].message.content)
print(json.dumps(data, indent=2))
```
2.2 OpenAI Structured Outputs with JSON Schema
For maximum reliability, OpenAI's Structured Outputs feature lets you provide a complete JSON Schema. The model is constrained at the decoding level to produce output that conforms exactly to your schema. This means required fields are always present, types are always correct, and enum values are always valid.
```python
from openai import OpenAI
import json

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "contact_extraction",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string", "description": "Full name of the person"},
                    "email": {"type": "string", "description": "Email address"},
                    "sentiment": {
                        "type": "string",
                        "enum": ["positive", "negative", "neutral"],
                        "description": "Overall sentiment"
                    },
                    "confidence": {
                        "type": "number",
                        "description": "Confidence score between 0 and 1"
                    }
                },
                "required": ["name", "email", "sentiment", "confidence"],
                "additionalProperties": False
            }
        }
    },
    messages=[
        {"role": "user", "content": "Hi, I'm Sarah Chen (sarah@example.com) and I love your product!"}
    ]
)

data = json.loads(response.choices[0].message.content)
print(json.dumps(data, indent=2))
print(f"Type of confidence: {type(data['confidence']).__name__}")
print(f"Sentiment is valid enum: {data['sentiment'] in ['positive', 'negative', 'neutral']}")
```
Schema-constrained decoding: When you use strict: true, the provider modifies the decoding process itself. At each token generation step, only tokens that could lead to valid JSON according to your schema are considered. This is fundamentally different from post-hoc validation; the model literally cannot produce invalid output. The tradeoff is that the first request with a new schema incurs a small latency penalty while the schema is compiled into decoding constraints.
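For contrast, here is what post-hoc validation looks like: a minimal, hypothetical checker covering only the required-keys, type, and enum rules of the schema above. With strict: true the provider enforces all of this during decoding, so this entire layer of code becomes redundant:

```python
def validate_contact(data: dict) -> list[str]:
    """Return a list of violations of the contact_extraction schema (empty if valid)."""
    errors = []
    required = [("name", str), ("email", str), ("sentiment", str), ("confidence", (int, float))]
    for key, expected in required:
        if key not in data:
            errors.append(f"missing required field: {key}")
        elif not isinstance(data[key], expected):
            errors.append(f"wrong type for {key}: {type(data[key]).__name__}")
    # Enum check (skipped if the field is absent; that is already reported above)
    if data.get("sentiment") not in ("positive", "negative", "neutral", None):
        errors.append(f"invalid enum value for sentiment: {data.get('sentiment')}")
    return errors

print(validate_contact({"name": "Sarah Chen", "email": "sarah@example.com",
                        "sentiment": "positive", "confidence": 0.95}))  # []
print(validate_contact({"name": "Sarah Chen", "sentiment": "great"}))
```

With schema-constrained decoding, the first call's empty error list is guaranteed for every response; the second call's failure mode simply cannot occur.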
3. Pydantic and the Instructor Library
While raw JSON Schema works, defining schemas as dictionaries is verbose and error-prone. The Instructor library bridges this gap by letting you define your output schema as a Pydantic model (a Python class with typed fields) and automatically converting it to the appropriate API format. Instructor works with OpenAI, Anthropic, Google, and many other providers.
3.1 Basic Instructor Usage
```python
import instructor
from pydantic import BaseModel, Field
from openai import OpenAI

# Patch the OpenAI client with Instructor
client = instructor.from_openai(OpenAI())

# Define your schema as a Pydantic model
class ContactInfo(BaseModel):
    name: str = Field(description="Full name of the person")
    email: str = Field(description="Email address")
    sentiment: str = Field(description="positive, negative, or neutral")
    confidence: float = Field(ge=0.0, le=1.0, description="Confidence score")

# Extract structured data; the response is a Pydantic object, not raw JSON
contact = client.chat.completions.create(
    model="gpt-4o",
    response_model=ContactInfo,
    messages=[
        {"role": "user", "content": "Hi, I'm Sarah Chen (sarah@example.com) and I love your product!"}
    ]
)

print(f"Name: {contact.name}")
print(f"Email: {contact.email}")
print(f"Sentiment: {contact.sentiment}")
print(f"Confidence: {contact.confidence}")
print(f"Type: {type(contact).__name__}")
```
3.2 Nested Models and Enums
Pydantic models support rich type hierarchies, including nested objects, lists, enums, and optional fields. This allows you to define complex extraction schemas that would be tedious to express as raw JSON Schema.
```python
import instructor
from pydantic import BaseModel, Field
from enum import Enum
from openai import OpenAI

class Sentiment(str, Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"

class Issue(BaseModel):
    category: str = Field(description="Issue category like billing, technical, shipping")
    description: str = Field(description="Brief description of the issue")
    severity: int = Field(ge=1, le=5, description="Severity from 1 (low) to 5 (critical)")

class TicketExtraction(BaseModel):
    customer_name: str
    customer_email: str
    sentiment: Sentiment
    issues: list[Issue] = Field(description="List of issues mentioned")
    requires_escalation: bool = Field(description="Whether this needs manager attention")
    summary: str = Field(max_length=200, description="Brief summary of the ticket")

client = instructor.from_openai(OpenAI())

ticket = client.chat.completions.create(
    model="gpt-4o",
    response_model=TicketExtraction,
    messages=[
        {"role": "user", "content": (
            "From: alex.wong@company.com\n"
            "Subject: Multiple issues with my order\n\n"
            "Hi, I'm Alex Wong. My order #4521 arrived damaged and I was "
            "charged twice on my credit card. The box was completely crushed "
            "and two items were broken. I need this resolved immediately. "
            "This is the third time I've had shipping problems."
        )}
    ]
)

print(f"Customer: {ticket.customer_name} ({ticket.customer_email})")
print(f"Sentiment: {ticket.sentiment.value}")
print(f"Escalation needed: {ticket.requires_escalation}")
print(f"Issues found: {len(ticket.issues)}")
for issue in ticket.issues:
    print(f"  [{issue.severity}/5] {issue.category}: {issue.description}")
```
Validation and retries: Instructor includes built-in retry logic. If the model's response fails Pydantic validation (for example, a severity value of 6 when the maximum is 5), Instructor automatically retries the request with the validation error included in the prompt, guiding the model to fix its output. You can configure the maximum number of retries with the max_retries parameter.
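That retry-with-feedback loop can be sketched in plain Python. Everything below is hypothetical scaffolding (fake_model stands in for the LLM call, check_severity for Pydantic validation), but the control flow mirrors the pattern: validate, and on failure feed the error back and ask again:

```python
def retry_with_feedback(call_model, validate, max_retries=2):
    """Call the model, validate the result, and feed validation errors back on failure."""
    feedback = None
    for attempt in range(max_retries + 1):
        raw = call_model(feedback)
        try:
            return validate(raw)
        except ValueError as e:
            # Next attempt sees the validation error, guiding the model to fix it
            feedback = f"Your previous response was invalid: {e}. Please fix it."
    raise RuntimeError("Model failed validation after all retries")

# Hypothetical stand-in: the first "model" answer has severity out of range
answers = iter([{"severity": 6}, {"severity": 5}])

def fake_model(feedback):
    return next(answers)

def check_severity(data):
    if not 1 <= data["severity"] <= 5:
        raise ValueError(f"severity must be 1-5, got {data['severity']}")
    return data

print(retry_with_feedback(fake_model, check_severity))  # {'severity': 5}
```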
4. Function Calling and Tool Use
Structured output and function calling serve fundamentally different purposes, even though both produce JSON. Structured output constrains the model's response format (data extraction, classification). Function calling enables the model to request external actions (API calls, database queries, calculations). Think of structured output as "give me data in this shape" and function calling as "do this thing for me." They are complementary: you can use structured output to validate function call arguments, and function results can be returned as structured data.
Function calling (also called "tool use") is a mechanism that lets the model indicate it wants to invoke an external function rather than produce a text response. The model does not actually execute the function; instead, it generates a structured JSON object containing the function name and arguments. Your application code executes the function and sends the result back to the model, which then incorporates it into its response.
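This division of labor is easy to see in a minimal dispatch loop. The registry and the tool_call dict below are hypothetical stand-ins for what a provider returns, but the pattern (look up the function by name, execute it with the parsed arguments, return the result as a string) is common to every provider:

```python
import json

# Hypothetical registry mapping tool names to local Python functions
def get_weather(city: str) -> str:
    data = {"Paris": "22C, sunny"}
    return data.get(city, f"No data for {city}")

TOOL_REGISTRY = {"get_weather": get_weather}

def dispatch(tool_call: dict) -> str:
    """Execute a model-proposed tool call. The model itself never runs code."""
    fn = TOOL_REGISTRY.get(tool_call["name"])
    if fn is None:
        return f"Error: unknown tool {tool_call['name']}"
    args = json.loads(tool_call["arguments"])  # arguments arrive as a JSON string
    return fn(**args)

# What a parsed tool call might look like, independent of provider format
print(dispatch({"name": "get_weather", "arguments": '{"city": "Paris"}'}))  # 22C, sunny
```

Returning an error string for unknown tools (rather than raising) lets the model see the failure and recover in its next turn.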
4.1 OpenAI Function Calling
In the OpenAI API, you define available tools in the tools parameter, each with a name, description, and a JSON Schema for its parameters. When the model decides a tool is needed, it returns a response with finish_reason="tool_calls" instead of producing text.
```python
from openai import OpenAI
import json

client = OpenAI()

# Define the available tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                    "units": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit"
                    }
                },
                "required": ["city"]
            }
        }
    }
]

# Simulated weather function
def get_weather(city: str, units: str = "celsius") -> str:
    data = {"Paris": "22C, sunny", "London": "15C, cloudy", "Tokyo": "28C, humid"}
    return data.get(city, f"No data for {city}")

# Step 1: Send user message with tools
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools
)

# Step 2: Check if the model wants to call a tool
message = response.choices[0].message
if message.tool_calls:
    tool_call = message.tool_calls[0]
    args = json.loads(tool_call.function.arguments)
    print(f"Model wants to call: {tool_call.function.name}({args})")

    # Step 3: Execute the function
    result = get_weather(**args)

    # Step 4: Send the result back to the model
    final_response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "user", "content": "What's the weather in Paris?"},
            message,  # The assistant's tool_call message
            {"role": "tool", "tool_call_id": tool_call.id, "content": result}
        ],
        tools=tools
    )
    print(f"Final answer: {final_response.choices[0].message.content}")
```
4.2 Anthropic Tool Use
Anthropic's tool use follows the same conceptual pattern but with different API conventions. Tools are defined with input_schema (instead of parameters), and the response uses content blocks with a tool_use type. The tool result is sent back as a tool_result content block.
```python
import anthropic

client = anthropic.Anthropic()

tools = [
    {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
                "units": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["city"]
        }
    }
]

def get_weather(city, units="celsius"):
    data = {"Paris": "22C, sunny", "London": "15C, cloudy"}
    return data.get(city, f"No data for {city}")

# Step 1: Send request with tools
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=300,
    tools=tools,
    messages=[{"role": "user", "content": "What's the weather in Paris?"}]
)

# Step 2: Find the tool_use block
for block in response.content:
    if block.type == "tool_use":
        print(f"Tool call: {block.name}({block.input})")
        result = get_weather(**block.input)

        # Step 3: Send the tool result back
        final = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=300,
            tools=tools,
            messages=[
                {"role": "user", "content": "What's the weather in Paris?"},
                {"role": "assistant", "content": response.content},
                {"role": "user", "content": [
                    {"type": "tool_result", "tool_use_id": block.id, "content": result}
                ]}
            ]
        )
        print(f"Final: {final.content[0].text}")
```
4.3 Google Gemini Function Calling
Google Gemini uses function_declarations for tool definitions and returns function calls as structured parts in the response. The syntax differs from both OpenAI and Anthropic, but the conceptual loop (define tools, receive a call, execute, return results) is identical.
```python
from google import genai
from google.genai import types

client = genai.Client()

# Define tools using function_declarations
weather_tool = types.Tool(
    function_declarations=[
        types.FunctionDeclaration(
            name="get_weather",
            description="Get current weather for a city",
            parameters=types.Schema(
                type="OBJECT",
                properties={
                    "city": types.Schema(type="STRING", description="City name"),
                },
                required=["city"],
            ),
        )
    ]
)

def get_weather(city: str) -> str:
    data = {"Paris": "22C, sunny", "Tokyo": "28C, humid"}
    return data.get(city, f"No data for {city}")

# Step 1: Send request with tools
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="What's the weather in Paris?",
    config=types.GenerateContentConfig(tools=[weather_tool]),
)

# Step 2: Extract the function call from the response
part = response.candidates[0].content.parts[0]
print(f"Function call: {part.function_call.name}({dict(part.function_call.args)})")
result = get_weather(**dict(part.function_call.args))

# Step 3: Send the function response back
final = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[
        types.Content(parts=[types.Part(text="What's the weather in Paris?")], role="user"),
        response.candidates[0].content,
        types.Content(parts=[types.Part(function_response=types.FunctionResponse(
            name="get_weather", response={"result": result}
        ))], role="user"),
    ],
    config=types.GenerateContentConfig(tools=[weather_tool]),
)
print(f"Final: {final.text}")
```
Malformed tool call arguments: Although models are generally reliable at producing valid JSON for tool calls, they can occasionally generate malformed arguments, especially with complex schemas. Always wrap your json.loads() call in a try/except block and implement a retry strategy. When using Instructor with tools, this retry logic is handled automatically.
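A minimal guard for this failure mode is sketched below. The single-repair fallback (stripping trailing commas, one of the most common defects) is a hypothetical illustration; a production version would combine it with a retry against the model:

```python
import json
import re

def parse_tool_arguments(raw: str) -> dict:
    """Parse tool-call arguments defensively, repairing trailing commas if needed."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Common defect: a trailing comma before } or ]. Remove it and retry once.
        # Caveat: this naive regex could corrupt strings that contain ",}" or ",]".
        repaired = re.sub(r",\s*([}\]])", r"\1", raw)
        return json.loads(repaired)  # still raises if the repair was not enough

print(parse_tool_arguments('{"city": "Paris", "units": "celsius"}'))
print(parse_tool_arguments('{"city": "Paris", "units": "celsius",}'))  # repaired
```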
5. Parallel and Sequential Tool Calls
Modern LLMs can request multiple tool calls in a single response. For instance, if a user asks "What is the weather in Paris and Tokyo?", the model may emit two get_weather calls simultaneously. Your application should detect multiple tool calls and execute them in parallel for better performance.
```python
from openai import OpenAI
import json

client = OpenAI()

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"}
                },
                "required": ["city"]
            }
        }
    }
]

def get_weather(city):
    data = {"Paris": "22C, sunny", "Tokyo": "28C, humid", "London": "15C, cloudy"}
    return data.get(city, "Unknown")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Weather in Paris and Tokyo?"}],
    tools=tools
)

message = response.choices[0].message
print(f"Number of tool calls: {len(message.tool_calls)}")

# Execute all tool calls and collect results
tool_messages = []
for tc in message.tool_calls:
    args = json.loads(tc.function.arguments)
    result = get_weather(**args)
    print(f"  {tc.function.name}({args}) = {result}")
    tool_messages.append({
        "role": "tool",
        "tool_call_id": tc.id,
        "content": result
    })

# Send all results back at once
final = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Weather in Paris and Tokyo?"},
        message,
        *tool_messages
    ],
    tools=tools
)
print(f"\nFinal: {final.choices[0].message.content}")
```
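For simplicity, that loop executes tool calls one at a time. When tools do real I/O (HTTP requests, database queries), running them concurrently cuts latency. Here is a sketch using the standard library's ThreadPoolExecutor, with a hypothetical run_tool helper and plain dicts standing in for the provider's tool-call objects:

```python
import json
from concurrent.futures import ThreadPoolExecutor

def get_weather(city):
    data = {"Paris": "22C, sunny", "Tokyo": "28C, humid"}
    return data.get(city, "Unknown")

def run_tool(tool_call: dict) -> dict:
    """Execute one tool call and shape the result as a tool message."""
    args = json.loads(tool_call["arguments"])
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "content": get_weather(**args),
    }

# Hypothetical parsed tool calls, as if taken from message.tool_calls
tool_calls = [
    {"id": "call_1", "arguments": '{"city": "Paris"}'},
    {"id": "call_2", "arguments": '{"city": "Tokyo"}'},
]

# pool.map preserves input order, so results line up with their tool_call_ids
with ThreadPoolExecutor(max_workers=8) as pool:
    tool_messages = list(pool.map(run_tool, tool_calls))

for msg in tool_messages:
    print(msg["tool_call_id"], "->", msg["content"])
```

Threads are a good fit here because tool calls are typically I/O-bound; for CPU-bound tools, a process pool or a job queue is the better choice.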
6. Cross-Provider Tool Use Comparison
| Aspect | OpenAI | Anthropic | Google Gemini |
|---|---|---|---|
| Tool definition key | tools[].function.parameters | tools[].input_schema | tools[].function_declarations |
| Tool call in response | message.tool_calls[] | content block of type tool_use | function_call part |
| Tool result role | "tool" | "user" with tool_result block | "function" response part |
| Parallel calls | Yes (multiple tool_calls) | Yes (multiple tool_use blocks) | Yes |
| Force tool use | tool_choice: "required" | tool_choice: {"type": "any"} | tool_config: {mode: "ANY"} |
Tools + structured output = reliable agents: Function calling and structured output serve complementary roles. Structured output constrains what the model produces (data extraction). Tool use enables what the model can do (action execution). Combining both lets you build agent loops where the model reasons about which actions to take, invokes tools with validated parameters, and returns structured results. This combination is the foundation of the agentic architectures we will explore in later modules.
The tool definitions shown in this section are provider-specific (OpenAI format, Anthropic format, Gemini format). The Model Context Protocol (MCP) is an emerging open standard that defines a provider-agnostic way to expose tools, data sources, and prompts to LLMs. Rather than defining tools in each provider's format, you define them once in MCP format, and compatible clients handle the translation. As the agentic ecosystem matures, MCP (or a successor) is likely to become the standard interface between LLMs and external systems.
Key Takeaways
- Three levels of structure: prompt-based JSON (works most of the time), JSON mode (valid JSON guaranteed, but no schema), and schema-constrained output (validity and structure both guaranteed). Always use the strongest enforcement your provider supports.
- Pydantic + Instructor simplifies everything: Define schemas as Python classes with type annotations. Instructor handles conversion to provider-specific formats, validation, and retries automatically.
- Function calling is a proposal, not execution: The model generates structured arguments for a function call. Your code executes the function and returns the result. This separation keeps the model sandboxed.
- Tool definitions differ across providers: OpenAI uses parameters, Anthropic uses input_schema, and Google uses function_declarations. The conceptual pattern is identical, but the JSON structures require provider-specific handling.
- Parallel tool calls improve performance: Modern models can request multiple tool calls simultaneously. Execute them concurrently and return all results together.
- Tools + structured output = reliable agents: Structured output constrains the format; tool use enables external actions. Together, they form the backbone of agentic architectures.
- Beyond Pydantic: For teams that need compile-time type safety across multiple services, BAML offers a dedicated schema language that compiles to type-safe client code. See Section 11.5 for BAML in action with information extraction.