Module 09 · Section 9.2

Structured Output & Tool Integration

Teaching LLMs to produce machine-readable output and interact with external systems through function calling
★ Big Picture

Why structured output matters: LLMs generate free-form text by default, but production applications need predictable, parseable data. Whether you are extracting entities from documents, generating API parameters, or building agent pipelines, you need the model's output to conform to a specific schema. This section covers two complementary approaches: constrained output formats (JSON mode, response schemas) that guarantee structural validity, and tool/function calling that lets models invoke external systems as part of their reasoning process.

1. The Structured Output Problem

Consider a simple task: extract the name, email, and sentiment from a customer support message. If you ask an LLM to do this with a plain text prompt, you might get the information scattered across prose, formatted inconsistently, or wrapped in unnecessary explanation. To build reliable pipelines, you need the model to return a specific JSON structure, every time, without deviation.

🎯 The Pain Without Structured Output

Without enforcement, an LLM asked for JSON might return: Here is the JSON: {"name": "Alice"... (wrapped in prose), or {"name": "Alice", "sentiment": "frustrated"} (missing required fields), or sometimes valid JSON with trailing commas that crash json.loads(). Teams that skip structured output enforcement spend significant engineering time writing parsing heuristics, handling edge cases, and retrying failures. The patterns in this section eliminate that entire category of bugs.

There are three levels of structured output enforcement, each with increasing reliability:

  1. Prompt-based: You ask the model to return JSON in the system prompt. This works most of the time, but the model can still return malformed JSON or add commentary outside the JSON block.
  2. JSON mode: The API guarantees the response is valid JSON, but does not enforce a specific schema. The model chooses the keys and structure.
  3. Schema-constrained: The API guarantees the response matches a specific JSON Schema. This is the most reliable approach, as it enforces both validity and structure at the decoding level.
[Figure: Structured Output: Three Levels of Enforcement]

Level 1: Prompt-Based ("Return your answer as JSON"). Reliability ~85%. Pros: simple to implement; works with any model. Cons: may include extra text; no schema guarantee; can produce invalid JSON.

Level 2: JSON Mode (response_format: json_object). Reliability ~98%. Pros: always valid JSON; API-level guarantee. Cons: schema not enforced; keys may vary; types may differ.

Level 3: Schema-Constrained (json_schema with strict: true). Reliability ~100%. Pros: exact schema match; type-safe fields; enum enforcement. Cons: provider-specific; schema complexity limits.

Reliability and enforcement strength increase from Level 1 to Level 3.
Figure 9.3: Three levels of structured output enforcement. Production systems should use Level 3 (schema-constrained) whenever the provider supports it.

2. JSON Mode and Response Schemas

2.1 OpenAI JSON Mode

OpenAI's simplest structured output option is JSON mode, activated by setting response_format={"type": "json_object"}. This guarantees the response is valid JSON but does not enforce any particular schema. You must include the word "JSON" in your prompt for this mode to work reliably.

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    response_format={"type": "json_object"},
    messages=[
        {"role": "system", "content": "Extract contact info. Return JSON with keys: name, email, sentiment."},
        {"role": "user", "content": "Hi, I'm Sarah Chen (sarah@example.com) and I love your product!"}
    ]
)

import json
data = json.loads(response.choices[0].message.content)
print(json.dumps(data, indent=2))
{
  "name": "Sarah Chen",
  "email": "sarah@example.com",
  "sentiment": "positive"
}

2.2 OpenAI Structured Outputs with JSON Schema

For maximum reliability, OpenAI's Structured Outputs feature lets you provide a complete JSON Schema. The model is constrained at the decoding level to produce output that conforms exactly to your schema. This means required fields are always present, types are always correct, and enum values are always valid.

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "contact_extraction",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string", "description": "Full name of the person"},
                    "email": {"type": "string", "description": "Email address"},
                    "sentiment": {
                        "type": "string",
                        "enum": ["positive", "negative", "neutral"],
                        "description": "Overall sentiment"
                    },
                    "confidence": {
                        "type": "number",
                        "description": "Confidence score between 0 and 1"
                    }
                },
                "required": ["name", "email", "sentiment", "confidence"],
                "additionalProperties": False
            }
        }
    },
    messages=[
        {"role": "user", "content": "Hi, I'm Sarah Chen (sarah@example.com) and I love your product!"}
    ]
)

import json
data = json.loads(response.choices[0].message.content)
print(json.dumps(data, indent=2))
print(f"Type of confidence: {type(data['confidence']).__name__}")
print(f"Sentiment is valid enum: {data['sentiment'] in ['positive', 'negative', 'neutral']}")
{
  "name": "Sarah Chen",
  "email": "sarah@example.com",
  "sentiment": "positive",
  "confidence": 0.95
}
Type of confidence: float
Sentiment is valid enum: True
✓ Key Insight

Schema-constrained decoding: When you use strict: true, the provider modifies the decoding process itself. At each token generation step, only tokens that could lead to valid JSON according to your schema are considered. This is fundamentally different from post-hoc validation; the model literally cannot produce invalid output. The tradeoff is that the first request with a new schema incurs a small latency penalty while the schema is compiled into decoding constraints.
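For providers or models that lack strict schema enforcement, you can approximate the guarantee with post-hoc validation plus a retry. A minimal, stdlib-only sketch is shown below for a flat schema like the contact-extraction one above (the dedicated jsonschema package handles the full JSON Schema specification; this hand-rolled checker is illustrative only):

```python
# A simplified, flat JSON Schema of the kind used in this section.
SCHEMA = {
    "properties": {
        "name": {"type": "string"},
        "email": {"type": "string"},
        "sentiment": {"type": "string", "enum": ["positive", "negative", "neutral"]},
        "confidence": {"type": "number"},
    },
    "required": ["name", "email", "sentiment", "confidence"],
    "additionalProperties": False,
}

# Map JSON Schema type names to Python types for isinstance checks.
PY_TYPES = {"string": str, "number": (int, float), "boolean": bool}

def validate(data: dict, schema: dict) -> list[str]:
    """Return a list of violations; an empty list means the object conforms."""
    errors = []
    props = schema["properties"]
    for key in schema["required"]:
        if key not in data:
            errors.append(f"missing required field: {key}")
    for key, value in data.items():
        if key not in props:
            if not schema.get("additionalProperties", True):
                errors.append(f"unexpected field: {key}")
            continue
        spec = props[key]
        if not isinstance(value, PY_TYPES[spec["type"]]):
            errors.append(f"{key}: expected {spec['type']}")
        elif "enum" in spec and value not in spec["enum"]:
            errors.append(f"{key}: not in enum {spec['enum']}")
    return errors
```

The key difference from constrained decoding remains: post-hoc validation can only detect violations after generation, so you still need a retry path for failures.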

3. Pydantic and the Instructor Library

While raw JSON Schema works, defining schemas as dictionaries is verbose and error-prone. The Instructor library bridges this gap by letting you define your output schema as a Pydantic model (a Python class with typed fields) and automatically converting it to the appropriate API format. Instructor works with OpenAI, Anthropic, Google, and many other providers.

3.1 Basic Instructor Usage

import instructor
from pydantic import BaseModel, Field
from openai import OpenAI

# Patch the OpenAI client with Instructor
client = instructor.from_openai(OpenAI())

# Define your schema as a Pydantic model
class ContactInfo(BaseModel):
    name: str = Field(description="Full name of the person")
    email: str = Field(description="Email address")
    sentiment: str = Field(description="positive, negative, or neutral")
    confidence: float = Field(ge=0.0, le=1.0, description="Confidence score")

# Extract structured data; the response is a Pydantic object, not raw JSON
contact = client.chat.completions.create(
    model="gpt-4o",
    response_model=ContactInfo,
    messages=[
        {"role": "user", "content": "Hi, I'm Sarah Chen (sarah@example.com) and I love your product!"}
    ]
)

print(f"Name: {contact.name}")
print(f"Email: {contact.email}")
print(f"Sentiment: {contact.sentiment}")
print(f"Confidence: {contact.confidence}")
print(f"Type: {type(contact).__name__}")
Name: Sarah Chen
Email: sarah@example.com
Sentiment: positive
Confidence: 0.95
Type: ContactInfo

3.2 Nested Models and Enums

Pydantic models support rich type hierarchies, including nested objects, lists, enums, and optional fields. This allows you to define complex extraction schemas that would be tedious to express as raw JSON Schema.

import instructor
from pydantic import BaseModel, Field
from enum import Enum
from typing import Optional
from openai import OpenAI

class Sentiment(str, Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"

class Issue(BaseModel):
    category: str = Field(description="Issue category like billing, technical, shipping")
    description: str = Field(description="Brief description of the issue")
    severity: int = Field(ge=1, le=5, description="Severity from 1 (low) to 5 (critical)")

class TicketExtraction(BaseModel):
    customer_name: str
    customer_email: str
    sentiment: Sentiment
    issues: list[Issue] = Field(description="List of issues mentioned")
    requires_escalation: bool = Field(description="Whether this needs manager attention")
    summary: str = Field(max_length=200, description="Brief summary of the ticket")

client = instructor.from_openai(OpenAI())

ticket = client.chat.completions.create(
    model="gpt-4o",
    response_model=TicketExtraction,
    messages=[
        {"role": "user", "content": (
            "From: alex.wong@company.com\n"
            "Subject: Multiple issues with my order\n\n"
            "Hi, I'm Alex Wong. My order #4521 arrived damaged and I was "
            "charged twice on my credit card. The box was completely crushed "
            "and two items were broken. I need this resolved immediately. "
            "This is the third time I've had shipping problems."
        )}
    ]
)

print(f"Customer: {ticket.customer_name} ({ticket.customer_email})")
print(f"Sentiment: {ticket.sentiment.value}")
print(f"Escalation needed: {ticket.requires_escalation}")
print(f"Issues found: {len(ticket.issues)}")
for issue in ticket.issues:
    print(f"  [{issue.severity}/5] {issue.category}: {issue.description}")
Customer: Alex Wong (alex.wong@company.com)
Sentiment: negative
Escalation needed: True
Issues found: 2
  [4/5] shipping: Order arrived with crushed box and two broken items
  [5/5] billing: Customer was charged twice on credit card
ⓘ Note

Validation and retries: Instructor includes built-in retry logic. If the model's response fails Pydantic validation (for example, a severity value of 6 when the maximum is 5), Instructor automatically retries the request with the validation error included in the prompt, guiding the model to fix its output. You can configure the maximum number of retries with the max_retries parameter.
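The retry pattern itself is simple enough to sketch without the library. The loop below is a conceptual illustration of how validation-error feedback drives a retry, not Instructor's actual internals; `call_model` and `validate` are stand-ins for the API call and the Pydantic validation step:

```python
def extract_with_retries(call_model, validate, max_retries: int = 3):
    """Conceptual sketch of Instructor-style validation retries.

    call_model(feedback) returns a candidate dict; validate(candidate) raises
    ValueError describing what failed. On failure, the error text is fed back
    so the next attempt can correct the output.
    """
    feedback = None
    for _ in range(max_retries):
        candidate = call_model(feedback)
        try:
            validate(candidate)
            return candidate
        except ValueError as exc:
            feedback = str(exc)  # in the real library this becomes part of the next prompt
    raise RuntimeError(f"validation still failing after {max_retries} attempts: {feedback}")
```

In Instructor itself you simply pass `max_retries` to the patched `create` call; the error-feedback plumbing is handled for you.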

4. Function Calling and Tool Use

🎯 Important Distinction

Structured output and function calling serve fundamentally different purposes, even though both produce JSON. Structured output constrains the model's response format (data extraction, classification). Function calling enables the model to request external actions (API calls, database queries, calculations). Think of structured output as "give me data in this shape" and function calling as "do this thing for me." They are complementary: you can use structured output to validate function call arguments, and function results can be returned as structured data.

Function calling (also called "tool use") is a mechanism that lets the model indicate it wants to invoke an external function rather than produce a text response. The model does not actually execute the function; instead, it generates a structured JSON object containing the function name and arguments. Your application code executes the function and sends the result back to the model, which then incorporates it into its response.

[Figure: Function Calling: The Request/Execute/Respond Loop]

Step 1: User query: "What's the weather in Paris?"
Step 2: Model decides a tool is needed (finish_reason: "tool_calls").
Step 3: Model emits tool call JSON: {"name": "get_weather", "arguments": {"city": "Paris"}}.
Step 4: Your code executes: result = get_weather("Paris") = "22C, sunny".
Step 5: Model generates the final response using the tool result.

Key principle: the model never executes functions directly.
Figure 9.4: The function calling loop. The model proposes a tool call, your code executes it, and the result is sent back for the model to formulate a response.
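Step 4 of the loop, "your code executes it", is usually implemented as a dispatch table mapping tool names to Python callables. A minimal sketch, with an illustrative `get_weather` stub standing in for a real integration:

```python
# Dispatch table: tool name -> Python callable (names here are illustrative).
def get_weather(city: str, units: str = "celsius") -> str:
    data = {"Paris": "22C, sunny", "Tokyo": "28C, humid"}
    return data.get(city, f"No data for {city}")

TOOL_REGISTRY = {"get_weather": get_weather}

def execute_tool_call(name: str, args: dict) -> str:
    """Look up the model-requested tool and run it with the supplied arguments.
    Returning an error string (rather than raising) lets the model see and
    recover from a bad tool name."""
    if name not in TOOL_REGISTRY:
        return f"error: unknown tool {name}"
    return TOOL_REGISTRY[name](**args)
```

The same registry works across providers, since every API in this section ultimately hands you a tool name plus an arguments object.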

4.1 OpenAI Function Calling

In the OpenAI API, you define available tools in the tools parameter, each with a name, description, and a JSON Schema for its parameters. When the model decides a tool is needed, it returns a response with finish_reason="tool_calls" instead of producing text.

from openai import OpenAI
import json

client = OpenAI()

# Define the available tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                    "units": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit"
                    }
                },
                "required": ["city"]
            }
        }
    }
]

# Simulated weather function
def get_weather(city: str, units: str = "celsius") -> str:
    data = {"Paris": "22C, sunny", "London": "15C, cloudy", "Tokyo": "28C, humid"}
    return data.get(city, f"No data for {city}")

# Step 1: Send user message with tools
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools
)

# Step 2: Check if model wants to call a tool
message = response.choices[0].message
if message.tool_calls:
    tool_call = message.tool_calls[0]
    args = json.loads(tool_call.function.arguments)
    print(f"Model wants to call: {tool_call.function.name}({args})")

    # Step 3: Execute the function
    result = get_weather(**args)

    # Step 4: Send result back to model
    final_response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "user", "content": "What's the weather in Paris?"},
            message,  # The assistant's tool_call message
            {"role": "tool", "tool_call_id": tool_call.id, "content": result}
        ],
        tools=tools
    )
    print(f"Final answer: {final_response.choices[0].message.content}")
Model wants to call: get_weather({'city': 'Paris'})
Final answer: The current weather in Paris is 22°C and sunny.

4.2 Anthropic Tool Use

Anthropic's tool use follows the same conceptual pattern but with different API conventions. Tools are defined with input_schema (instead of parameters), and the response uses content blocks with a tool_use type. The tool result is sent back as a tool_result content block.

import anthropic
import json

client = anthropic.Anthropic()

tools = [
    {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
                "units": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["city"]
        }
    }
]

def get_weather(city, units="celsius"):
    data = {"Paris": "22C, sunny", "London": "15C, cloudy"}
    return data.get(city, f"No data for {city}")

# Step 1: Send request with tools
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=300,
    tools=tools,
    messages=[{"role": "user", "content": "What's the weather in Paris?"}]
)

# Step 2: Find the tool_use block
for block in response.content:
    if block.type == "tool_use":
        print(f"Tool call: {block.name}({block.input})")
        result = get_weather(**block.input)

        # Step 3: Send tool result back
        final = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=300,
            tools=tools,
            messages=[
                {"role": "user", "content": "What's the weather in Paris?"},
                {"role": "assistant", "content": response.content},
                {"role": "user", "content": [
                    {"type": "tool_result", "tool_use_id": block.id, "content": result}
                ]}
            ]
        )
        print(f"Final: {final.content[0].text}")
Tool call: get_weather({'city': 'Paris'})
Final: The current weather in Paris is 22°C and sunny.

4.3 Google Gemini Function Calling

Google Gemini uses function_declarations for tool definitions and returns function calls as structured parts in the response. The syntax differs from both OpenAI and Anthropic, but the conceptual loop (define tools, receive a call, execute, return results) is identical.

from google import genai
from google.genai import types

client = genai.Client()

# Define tools using function_declarations
weather_tool = types.Tool(
    function_declarations=[
        types.FunctionDeclaration(
            name="get_weather",
            description="Get current weather for a city",
            parameters=types.Schema(
                type="OBJECT",
                properties={
                    "city": types.Schema(type="STRING", description="City name"),
                },
                required=["city"],
            ),
        )
    ]
)

def get_weather(city: str) -> str:
    data = {"Paris": "22C, sunny", "Tokyo": "28C, humid"}
    return data.get(city, f"No data for {city}")

# Step 1: Send request with tools
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="What's the weather in Paris?",
    config=types.GenerateContentConfig(tools=[weather_tool]),
)

# Step 2: Extract the function call from the response
part = response.candidates[0].content.parts[0]
print(f"Function call: {part.function_call.name}({dict(part.function_call.args)})")
result = get_weather(**dict(part.function_call.args))

# Step 3: Send function response back
from google.genai.types import Content, Part

final = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[
        Content(parts=[Part(text="What's the weather in Paris?")], role="user"),
        response.candidates[0].content,
        Content(parts=[Part(function_response=types.FunctionResponse(
            name="get_weather", response={"result": result}
        ))], role="user"),
    ],
    config=types.GenerateContentConfig(tools=[weather_tool]),
)
print(f"Final: {final.text}")
Function call: get_weather({'city': 'Paris'})
Final: The current weather in Paris is 22°C and sunny.
⚠ Warning

Malformed tool call arguments: Although models are generally reliable at producing valid JSON for tool calls, they can occasionally generate malformed arguments, especially with complex schemas. Always wrap your json.loads() call in a try/except block and implement a retry strategy. When using Instructor with tools, this retry logic is handled automatically.
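A small helper makes the defensive pattern concrete. This sketch returns a success flag plus either the parsed arguments or an error message; on failure the caller would re-request the tool call from the model rather than crash the loop:

```python
import json

def parse_tool_arguments(raw: str) -> tuple[bool, object]:
    """Parse model-generated tool arguments without letting a malformed
    payload crash the loop. Returns (True, args) on success, or
    (False, error_message) for the caller to feed back to the model."""
    try:
        return True, json.loads(raw)
    except json.JSONDecodeError as exc:
        return False, f"invalid JSON at position {exc.pos}: {exc.msg}"
```

Including the position and message in the error string gives the model useful context when you ask it to regenerate the call.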

5. Parallel and Sequential Tool Calls

Modern LLMs can request multiple tool calls in a single response. For instance, if a user asks "What is the weather in Paris and Tokyo?", the model may emit two get_weather calls simultaneously. Your application should detect multiple tool calls and execute them in parallel for better performance.

from openai import OpenAI
import json

client = OpenAI()

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"}
                },
                "required": ["city"]
            }
        }
    }
]

def get_weather(city):
    data = {"Paris": "22C, sunny", "Tokyo": "28C, humid", "London": "15C, cloudy"}
    return data.get(city, "Unknown")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Weather in Paris and Tokyo?"}],
    tools=tools
)

message = response.choices[0].message
print(f"Number of tool calls: {len(message.tool_calls)}")

# Execute all tool calls and collect results
tool_messages = []
for tc in message.tool_calls:
    args = json.loads(tc.function.arguments)
    result = get_weather(**args)
    print(f"  {tc.function.name}({args}) = {result}")
    tool_messages.append({
        "role": "tool",
        "tool_call_id": tc.id,
        "content": result
    })

# Send all results back at once
final = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Weather in Paris and Tokyo?"},
        message,
        *tool_messages
    ],
    tools=tools
)
print(f"\nFinal: {final.choices[0].message.content}")
Number of tool calls: 2
  get_weather({'city': 'Paris'}) = 22C, sunny
  get_weather({'city': 'Tokyo'}) = 28C, humid

Final: Here's the current weather:
- Paris: 22°C and sunny
- Tokyo: 28°C and humid
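The loop above executes tool calls one at a time. Since the calls are independent, a thread pool can run them concurrently. In this sketch each call is represented as an (id, name, raw JSON arguments) tuple mimicking the OpenAI shape, and dispatch is hardcoded to a stub `get_weather` for brevity:

```python
import json
from concurrent.futures import ThreadPoolExecutor

def get_weather(city: str) -> str:
    data = {"Paris": "22C, sunny", "Tokyo": "28C, humid", "London": "15C, cloudy"}
    return data.get(city, "Unknown")

def run_tool_calls_concurrently(tool_calls):
    """Execute independent tool calls in parallel threads and return the
    role="tool" messages to send back, preserving the original order."""
    def run_one(call):
        call_id, name, raw_args = call
        result = get_weather(**json.loads(raw_args))  # real code would dispatch on `name`
        return {"role": "tool", "tool_call_id": call_id, "content": result}

    with ThreadPoolExecutor() as pool:
        return list(pool.map(run_one, tool_calls))
```

Threads are a good fit here because tool calls are typically I/O-bound (HTTP requests, database queries); `pool.map` also keeps results in submission order, which simplifies matching them to tool_call ids.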

6. Cross-Provider Tool Use Comparison

| Aspect | OpenAI | Anthropic | Google Gemini |
|---|---|---|---|
| Tool definition key | tools[].function.parameters | tools[].input_schema | tools[].function_declarations |
| Tool call in response | message.tool_calls[] | content block of type tool_use | function_call part |
| Tool result | role "tool" message | "user" message with tool_result block | function_response part |
| Parallel calls | Yes (multiple tool_calls) | Yes (multiple tool_use blocks) | Yes |
| Force tool use | tool_choice: "required" | tool_choice: {"type": "any"} | tool_config: {mode: "ANY"} |
✓ Key Insight

Tools + structured output = reliable agents: Function calling and structured output serve complementary roles. Structured output constrains what the model produces (data extraction). Tool use enables what the model can do (action execution). Combining both lets you build agent loops where the model reasons about which actions to take, invokes tools with validated parameters, and returns structured results. This combination is the foundation of the agentic architectures we will explore in later modules.
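The agent loop this insight describes can be sketched in a few lines. This is a conceptual illustration, not a production agent: `model` is a stand-in callable that returns either a tool decision or a final answer, where a real implementation would use the provider APIs shown earlier in this section:

```python
import json

def agent_loop(model, tools, user_message: str, max_steps: int = 5) -> str:
    """Minimal sketch of the agent pattern: the model reasons, optionally
    requests a tool, sees the structured result, and eventually answers.

    `model(history)` returns either {"tool": name, "args": {...}} or
    {"answer": text}; `tools` maps tool names to callables.
    """
    history = [{"role": "user", "content": user_message}]
    for _ in range(max_steps):
        decision = model(history)
        if "answer" in decision:
            return decision["answer"]
        result = tools[decision["tool"]](**decision["args"])
        history.append({"role": "tool", "content": json.dumps({"result": result})})
    raise RuntimeError("agent did not finish within max_steps")
```

The `max_steps` bound matters in practice: without it, a model that keeps requesting tools can loop indefinitely and burn tokens.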

💡 Emerging Standard: Model Context Protocol (MCP)

The tool definitions shown in this section are provider-specific (OpenAI format, Anthropic format, Gemini format). The Model Context Protocol (MCP) is an emerging open standard that defines a provider-agnostic way to expose tools, data sources, and prompts to LLMs. Rather than defining tools in each provider's format, you define them once in MCP format, and compatible clients handle the translation. As the agentic ecosystem matures, MCP (or a successor) is likely to become the standard interface between LLMs and external systems.

Knowledge Check

1. What is the difference between JSON mode and schema-constrained structured output?
Answer:
JSON mode guarantees the response is valid JSON but does not enforce any particular schema; the model chooses the keys and structure. Schema-constrained output (using json_schema with strict: true) guarantees the response matches a specific JSON Schema, including required fields, correct types, and valid enum values. Schema-constrained output modifies the decoding process itself, making it impossible for the model to produce output that violates the schema.
2. In function calling, does the model execute the function?
Answer:
No. The model generates a JSON object containing the function name and arguments, but your application code is responsible for executing the function. The result is then sent back to the model in a subsequent message, and the model uses that result to formulate its final text response.
3. What advantage does the Instructor library provide over raw JSON Schema?
Answer:
Instructor lets you define output schemas as Pydantic models (Python classes with typed fields), which are more readable and maintainable than raw JSON Schema dictionaries. It also provides automatic conversion to the appropriate API format for different providers, built-in retry logic when validation fails, and returns typed Python objects instead of raw JSON strings.
4. How do you send tool results back to the Anthropic API?
Answer:
In the Anthropic API, tool results are sent as a message with role "user" containing a content block of type "tool_result". The block must include the tool_use_id from the original tool call and the result as a string in the content field. This differs from OpenAI, which uses a dedicated "tool" role.
5. What happens when a model wants to call multiple tools in a single response?
Answer:
The model returns multiple tool calls in a single response (for example, multiple entries in message.tool_calls for OpenAI, or multiple tool_use content blocks for Anthropic). Your application should execute all the tool calls (ideally in parallel for better performance), collect all results, and send them all back in the next request so the model can incorporate all the information into its final response.

Key Takeaways