Module 21 · Section 21.1

Foundations of AI Agents

The perception-reasoning-action loop, agentic design patterns, ReAct, cognitive architectures, and the building blocks of autonomous LLM systems
★ Big Picture

An AI agent is an LLM operating in a loop. Instead of producing a single response, an agent repeatedly perceives its environment, reasons about what to do, takes an action, and observes the result. This perception-reasoning-action cycle is the fundamental abstraction that transforms language models from passive text generators into active problem solvers. Understanding this loop, and the design patterns built on top of it, is essential for building any agentic system.

1. What Makes an Agent?

The term "agent" has been used loosely across the AI community, often applied to anything from a simple prompt chain to a fully autonomous system. To build effective agentic systems, we need precise definitions. An AI agent is a system that uses a language model to decide which actions to take and in what order, operating in a loop until a task is complete or a stopping condition is met. The critical distinction is autonomy in action selection: the model itself determines the next step rather than following a predetermined sequence.

The Perception-Reasoning-Action Loop

Every agent, regardless of its complexity, follows the same fundamental cycle. The agent perceives its environment by receiving input (user messages, tool outputs, observations from previous actions). It then reasons about what to do next using the language model. Finally, it takes an action, which could be calling a tool, generating a response, or requesting more information. The results of that action become new perceptions, and the cycle repeats.
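This cycle can be sketched in a few lines of Python. The `reason` and `act` helpers below are toy stand-ins (a real agent would call an LLM and real tools); the point is the shape of the control flow:

```python
def run_agent(task, reason, act, max_steps=10):
    """Minimal perceive-reason-act loop; helpers are caller-supplied stubs."""
    observations = [task]                 # initial perception: the user's task
    for _ in range(max_steps):
        decision = reason(observations)   # REASON: decide the next step
        if decision["done"]:              # stopping condition reached
            return decision["answer"]
        result = act(decision["action"])  # ACT: take the chosen action
        observations.append(result)       # OBSERVE: result becomes a new perception
    return "Stopped: max steps reached."

# Toy helpers: "reason" answers once it has seen one action result.
reason = lambda obs: ({"done": True, "answer": obs[-1]} if len(obs) > 1
                      else {"done": False, "action": "lookup"})
act = lambda action: "42"
print(run_agent("What is the answer?", reason, act))  # -> 42
```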

[Figure: PERCEIVE (gather observations, tool outputs, user input) → REASON (LLM decides next step: think, plan, reflect) → ACT (call tool, respond, or request info) → OBSERVE (collect action results) → back to PERCEIVE]
Figure 21.1: The perception-reasoning-action loop that defines all AI agents

Agents vs. Chains vs. Workflows

Understanding the spectrum from simple to complex orchestration helps clarify where agents fit. A chain is a fixed sequence of LLM calls with predetermined steps. A workflow uses conditional logic (if/else, loops) but with control flow defined by the developer. An agent gives the LLM itself control over the execution path. The model decides which tools to call, in what order, and when to stop.
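The distinction is easiest to see in code. The `llm` function below is a stub standing in for a model call; the three functions differ only in who controls the execution path:

```python
def llm(prompt: str) -> str:
    return f"<answer to: {prompt}>"  # stub standing in for a real model call

def chain(doc: str) -> str:
    # Chain: fixed sequence; every step always runs, in the same order
    summary = llm(f"Summarize: {doc}")
    return llm(f"Translate to French: {summary}")

def workflow(doc: str) -> str:
    # Workflow: a developer-written conditional decides the path
    if len(doc) > 1000:
        doc = llm(f"Summarize: {doc}")
    return llm(f"Answer questions about: {doc}")

def agent(doc: str, tools: dict) -> str:
    # Agent: the model's own output selects the next tool (crudely sketched)
    choice = llm(f"Which tool should I use for: {doc}? Options: {list(tools)}")
    tool = next((t for name, t in tools.items() if name in choice), None)
    return tool(doc) if tool else llm(doc)
```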

| Aspect | Chain | Workflow | Agent |
|---|---|---|---|
| Control flow | Fixed sequence | Developer-defined conditionals | LLM-determined |
| Steps known in advance | Yes, always | Paths defined, selection dynamic | No, emergent |
| Determinism | High | Medium | Low |
| Error handling | Static retry logic | Branching on error type | Model reasons about recovery |
| Complexity | Simple | Moderate | High |
| Best for | Predictable pipelines | Structured tasks with variants | Open-ended problem solving |
Key Insight

Start with the simplest approach that works. Anthropic and other leading AI labs recommend using agents only when simpler patterns fail. Chains are easiest to debug and most predictable. Workflows add flexibility with manageable complexity. Agents provide maximum flexibility but introduce non-determinism, higher latency, and harder debugging. Choose the right level of autonomy for your use case.

2. The Four Agentic Design Patterns

Andrew Ng identified four foundational agentic design patterns that appear across virtually all agent architectures. These patterns can be used individually or composed together, and understanding them provides a vocabulary for designing and analyzing agentic systems.

Pattern 1: Reflection

In the reflection pattern, the LLM reviews its own output and iteratively improves it. This can be as simple as asking the model to critique its response, or as sophisticated as having separate "generator" and "critic" roles. Reflection is powerful because it lets the model catch errors, improve quality, and refine its approach without external feedback.

import openai

client = openai.OpenAI()

def reflect_and_improve(task: str, max_rounds: int = 3) -> str:
    """Generate a response, then iteratively improve it via self-reflection."""

    # Step 1: Generate initial response
    draft = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": task}]
    ).choices[0].message.content

    for round_num in range(max_rounds):
        # Step 2: Critique the current draft
        critique = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": "You are a critical reviewer. Find flaws, "
                    "gaps, and areas for improvement. Be specific. If the draft is "
                    "acceptable as-is, reply exactly: no major issues."},
                {"role": "user", "content": f"Task: {task}\n\nDraft:\n{draft}\n\n"
                    f"Provide specific, actionable critique."}
            ]
        ).choices[0].message.content

        # Step 3: Check if quality is satisfactory
        if "no major issues" in critique.lower():
            break

        # Step 4: Revise based on critique
        draft = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": "Revise the draft to address all critique points."},
                {"role": "user", "content": f"Original task: {task}\n\n"
                    f"Current draft:\n{draft}\n\n"
                    f"Critique:\n{critique}\n\nRevised version:"}
            ]
        ).choices[0].message.content

    return draft

Pattern 2: Tool Use

Tool use extends the LLM beyond text generation by giving it the ability to call external functions: searching the web, querying databases, executing code, sending emails, or interacting with any API. The model receives tool descriptions, decides when and which tools to call, and incorporates the results into its reasoning. This is covered in depth in Section 21.2.
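To make the pattern concrete, here is the general shape of an OpenAI-style tool schema and a dispatch function that routes a model-issued call to a Python function. The `get_weather` tool and its registry are illustrative stand-ins, not part of any library:

```python
import json

# An OpenAI-style tool schema: the model sees the name, description, and
# a JSON Schema describing the parameters it may pass.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

def dispatch(tool_name: str, arguments_json: str, registry: dict) -> str:
    """Route a model-issued tool call to the matching Python function."""
    args = json.loads(arguments_json)
    if tool_name not in registry:
        return f"Error: unknown tool '{tool_name}'"
    return str(registry[tool_name](**args))

registry = {"get_weather": lambda city: f"Sunny in {city}"}
print(dispatch("get_weather", '{"city": "Paris"}', registry))  # -> Sunny in Paris
```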

Pattern 3: Planning

Planning involves the LLM decomposing a complex task into subtasks before executing them. Rather than acting step by step reactively, a planning agent creates an explicit plan, then executes each step while potentially revising the plan based on intermediate results. Plan-and-execute architectures, reflection loops, and tree search methods all fall under this pattern. Section 21.3 covers planning in detail.

Pattern 4: Multi-Agent Collaboration

In the multi-agent pattern, multiple LLM instances (each potentially with different system prompts, tools, or roles) collaborate to solve a problem. One agent might research while another writes; a supervisor agent might coordinate workers; or agents might debate to reach a consensus. Module 22 is dedicated entirely to multi-agent architectures.

[Figure: four panels. REFLECTION: generate → critique → revise (self-improvement loop). TOOL USE: LLM reasons → calls function → observes result (external capabilities). PLANNING: decompose task → create plan → execute + revise (structured decomposition). MULTI-AGENT: Agent A (research) ↔ Agent B (write) ↔ Agent C (review), collaborating in specialized roles.]
Figure 21.2: The four agentic design patterns (Ng, 2024)

3. The ReAct Framework

ReAct (Reasoning + Acting) is the most widely adopted agent architecture. Introduced by Yao et al. in 2022, ReAct interleaves reasoning traces ("Thought") with actions ("Action") and observations ("Observation") in a structured loop. The key insight is that explicit reasoning before each action dramatically improves decision quality compared to acting without thinking or thinking without acting.

from typing import Callable

class ReActAgent:
    """Minimal ReAct agent: Thought -> Action -> Observation loop."""

    def __init__(self, client, tools: dict[str, Callable], model: str = "gpt-4o"):
        self.client = client
        self.tools = tools
        self.model = model

    def run(self, task: str, max_steps: int = 10) -> str:
        # Build tool descriptions for the system prompt
        tool_desc = "\n".join(
            f"- {name}: {func.__doc__}" for name, func in self.tools.items()
        )

        system_prompt = f"""You are a ReAct agent. For each step:
1. Thought: Reason about the current state and what to do next
2. Action: Call a tool using the format: ACTION: tool_name(args)
3. Wait for Observation (tool result)

When you have the final answer, respond: FINAL ANSWER: [your answer]

Available tools:
{tool_desc}"""

        messages = [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": task}
        ]

        for step in range(max_steps):
            response = self.client.chat.completions.create(
                model=self.model,
                messages=messages
            ).choices[0].message.content

            messages.append({"role": "assistant", "content": response})

            # Check for final answer
            if "FINAL ANSWER:" in response:
                return response.split("FINAL ANSWER:")[1].strip()

            # Parse and execute action
            if "ACTION:" in response:
                action_str = response.split("ACTION:")[1].strip()
                observation = self._execute_action(action_str)
                messages.append({
                    "role": "user",
                    "content": f"Observation: {observation}"
                })

        return "Max steps reached without final answer."

    def _execute_action(self, action_str: str) -> str:
        # Parse "tool_name(args)" format and execute
        try:
            name = action_str.split("(")[0].strip()
            args_str = action_str.split("(", 1)[1].rsplit(")", 1)[0]
            if name in self.tools:
                return str(self.tools[name](args_str))
            return f"Error: Unknown tool '{name}'"
        except Exception as e:
            return f"Error executing action: {e}"
Note

The ReAct implementation above uses text parsing for simplicity. In production, you would use the provider's native function calling API (covered in Section 21.2), which gives structured JSON outputs instead of requiring text parsing. The conceptual loop is the same: think, act, observe.

ReAct Trace Example

A typical ReAct trace shows the interleaved thought-action-observation pattern. Notice how the agent explicitly reasons before each action, and how observations feed back into the next reasoning step.

# Example trace for: "What is the population of the capital of France?"

Thought: I need to find the capital of France, then look up its population.
         The capital of France is Paris, but let me verify and get the
         current population figure.

Action: search("Paris population 2024")

Observation: Paris has a city population of approximately 2.1 million
             and a metropolitan area population of about 12.3 million.

Thought: I now have the information. The capital of France is Paris,
         with a city population of about 2.1 million. I should provide
         both the city and metro figures for completeness.

FINAL ANSWER: The capital of France is Paris, with a city population
of approximately 2.1 million and a metropolitan area population
of about 12.3 million.

4. Cognitive Architectures and State Machines

As agents grow more complex, the simple ReAct loop becomes insufficient. Cognitive architectures provide a richer framework for organizing agent behavior by introducing explicit state management, memory systems, and structured decision-making processes. A cognitive architecture defines how an agent thinks, not just what it thinks about.

Agent State Machines

Many production agents are best modeled as state machines, where the agent transitions between well-defined states based on its observations and decisions. This provides predictability and debuggability while still allowing the LLM to make autonomous decisions within each state.

from enum import Enum
from dataclasses import dataclass, field

class AgentState(Enum):
    PLANNING = "planning"
    EXECUTING = "executing"
    REFLECTING = "reflecting"
    WAITING_FOR_HUMAN = "waiting_for_human"
    COMPLETE = "complete"
    ERROR = "error"

@dataclass
class AgentContext:
    """Tracks the full state of an agent's execution."""
    task: str
    state: AgentState = AgentState.PLANNING
    plan: list[str] = field(default_factory=list)
    completed_steps: list[str] = field(default_factory=list)
    observations: list[dict] = field(default_factory=list)
    current_step_index: int = 0
    error_count: int = 0
    max_errors: int = 3

class StatefulAgent:
    """Agent that operates as a state machine with explicit transitions."""

    def __init__(self, client, tools):
        self.client = client
        self.tools = tools
        self.transitions = {
            AgentState.PLANNING: self._handle_planning,
            AgentState.EXECUTING: self._handle_executing,
            AgentState.REFLECTING: self._handle_reflecting,
            AgentState.ERROR: self._handle_error,
        }

    def run(self, task: str) -> str:
        ctx = AgentContext(task=task)

        while ctx.state not in (AgentState.COMPLETE, AgentState.WAITING_FOR_HUMAN):
            handler = self.transitions.get(ctx.state)
            if handler:
                ctx = handler(ctx)
            else:
                break

        return self._format_result(ctx)

    def _handle_planning(self, ctx: AgentContext) -> AgentContext:
        # LLM creates a step-by-step plan
        plan = self._call_llm(
            f"Break this task into concrete steps:\n{ctx.task}"
        )
        ctx.plan = self._parse_plan(plan)
        ctx.state = AgentState.EXECUTING
        return ctx

    def _handle_executing(self, ctx: AgentContext) -> AgentContext:
        if ctx.current_step_index >= len(ctx.plan):
            ctx.state = AgentState.REFLECTING
            return ctx

        step = ctx.plan[ctx.current_step_index]
        try:
            result = self._execute_step(step, ctx)
            ctx.observations.append({"step": step, "result": result})
            ctx.completed_steps.append(step)
            ctx.current_step_index += 1
        except Exception as e:
            ctx.error_count += 1
            ctx.state = AgentState.ERROR if ctx.error_count >= ctx.max_errors \
                else AgentState.EXECUTING
        return ctx

    def _handle_reflecting(self, ctx: AgentContext) -> AgentContext:
        # LLM reviews results and decides: complete or replan
        assessment = self._call_llm(
            f"Task: {ctx.task}\nCompleted: {ctx.completed_steps}\n"
            f"Results: {ctx.observations}\n\n"
            f"Is the task fully complete? If not, what remains?"
        )
        if "complete" in assessment.lower():
            ctx.state = AgentState.COMPLETE
        else:
            ctx.state = AgentState.PLANNING  # Replan with new context
        return ctx

    def _handle_error(self, ctx: AgentContext) -> AgentContext:
        # Too many consecutive failures: stop rather than loop forever
        ctx.state = AgentState.COMPLETE
        return ctx

    def _call_llm(self, prompt: str) -> str:
        return self.client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}]
        ).choices[0].message.content

    def _parse_plan(self, plan_text: str) -> list[str]:
        # One step per non-empty line of the model's plan
        return [line.strip() for line in plan_text.splitlines() if line.strip()]

    def _execute_step(self, step: str, ctx: AgentContext) -> str:
        # Simplified: in practice, route the step to the appropriate tool
        return self._call_llm(f"Execute this step and report the result:\n{step}")

    def _format_result(self, ctx: AgentContext) -> str:
        return (f"Finished in state {ctx.state.value} "
                f"after {len(ctx.completed_steps)} completed steps.")
[Figure: state machine. PLANNING → EXECUTING → REFLECTING → COMPLETE, with a "replan" edge from REFLECTING back to PLANNING and an edge from EXECUTING to ERROR after repeated failures.]
Figure 21.3: Agent state machine with planning, executing, reflecting, and error states

5. Agent Memory Systems

Effective agents require memory that goes beyond the conversation history within a single context window. Agent memory can be categorized into three types, each serving a different purpose and operating at a different timescale.

Working Memory (Short-Term)

Working memory holds the current conversation context, including the system prompt, user messages, tool calls and their results, and the agent's reasoning traces. This maps directly to the LLM's context window and is the most straightforward form of memory. The challenge is that it is bounded: as the agent takes more actions, the context window fills up.

Episodic Memory (Session-Based)

Episodic memory stores records of past interactions, allowing agents to recall previous conversations, successful strategies, and common user preferences. This is typically implemented via vector stores or structured databases that the agent can query.
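A minimal sketch of similarity-based recall, using a toy bag-of-words "embedding" in place of a real embedding model and vector store:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; in practice, use a real embedding model
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def recall(query: str, episodes: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k stored episodes most similar to the query."""
    q = embed(query)
    ranked = sorted(episodes, key=lambda e: cosine(q, embed(e)), reverse=True)
    return ranked[:top_k]

episodes = [
    "User asked about refund policy; resolved via the policy doc",
    "Debugged a failing deploy pipeline with the user",
    "Handled a refund question about damaged items",
]
print(recall("refund question", episodes, top_k=2))
```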

Semantic Memory (Long-Term Knowledge)

Semantic memory stores factual knowledge, learned procedures, and domain-specific information. This includes the agent's tool documentation, domain knowledge bases, and procedural memory about how to accomplish recurring tasks. RAG systems (Module 19) are the primary mechanism for semantic memory.

from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class AgentMemory:
    """Three-tier memory system for an AI agent."""

    # Working memory: current context window contents
    working: list[dict] = field(default_factory=list)
    max_working_tokens: int = 100_000

    # Episodic memory: past interaction summaries
    episodes: list[dict] = field(default_factory=list)

    # Semantic memory: learned facts and procedures
    knowledge: dict[str, str] = field(default_factory=dict)

    def add_to_working(self, message: dict):
        """Add a message to working memory, evicting old entries if needed."""
        self.working.append(message)
        self._evict_if_needed()

    def save_episode(self, summary: str, outcome: str):
        """Save a completed interaction to episodic memory."""
        self.episodes.append({
            "timestamp": datetime.now().isoformat(),
            "summary": summary,
            "outcome": outcome
        })

    def recall_relevant(self, query: str, top_k: int = 3) -> list[dict]:
        """Retrieve relevant episodes (in production, use vector similarity)."""
        # Simplified: in practice, embed query and search vector store
        return self.episodes[-top_k:]

    def _evict_if_needed(self):
        """Summarize and evict old messages when context is too large."""
        # Estimate token count (rough: 4 chars per token)
        total = sum(len(str(m)) // 4 for m in self.working)
        while total > self.max_working_tokens and len(self.working) > 2:
            # Summarize oldest messages and replace them
            removed = self.working.pop(1)  # Keep system prompt at index 0
            self.save_episode(str(removed)[:200], "evicted")
            total = sum(len(str(m)) // 4 for m in self.working)
Warning

Token budgets are the primary constraint on agent capabilities. Every tool call result, observation, and reasoning trace consumes tokens from the context window. A single web search might return several thousand tokens. An agent that calls ten tools could easily consume 50,000+ tokens before generating its final response. Careful management of what goes into and out of the context window is essential for agents that need to take many steps.

6. Token Budget Management

Token management is one of the most practical challenges in building agents. Unlike a single-turn completion where you control the input size, agents accumulate context over many iterations. Without careful budgeting, agents hit context limits, lose important early context, or incur excessive costs.

Strategies for Managing Token Budgets

| Strategy | Token Savings | Implementation | Risk |
|---|---|---|---|
| Summarize tool outputs | 50-90% | LLM-based or rule-based extraction | May lose relevant details |
| Sliding window | Variable | Drop oldest N messages | Loses early context |
| Tiered priority eviction | 30-60% | Score and rank all messages | Complex priority logic |
| Lazy tool loading | 20-40% | Tool registry with on-demand injection | Extra LLM call to select tools |
| Hard step limits | Bounded | Counter in agent loop | May not complete complex tasks |
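Two of these strategies, rule-based output truncation and a sliding window, can be sketched in a few lines (the thresholds here are illustrative, not recommendations):

```python
def truncate_tool_output(output: str, max_chars: int = 500) -> str:
    """Rule-based stand-in for summarization: keep the head and tail."""
    if len(output) <= max_chars:
        return output
    half = max_chars // 2
    return output[:half] + "\n...[truncated]...\n" + output[-half:]

def sliding_window(messages: list[dict], max_messages: int = 20) -> list[dict]:
    """Keep the system prompt (index 0) plus the most recent messages."""
    if len(messages) <= max_messages:
        return messages
    return [messages[0]] + messages[-(max_messages - 1):]
```

In production, the dropped messages would be summarized into episodic memory rather than discarded outright, so the agent can still recall them on demand.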
Key Insight

The best agents are frugal with their context. Every token in the context window should earn its place. Production agents typically combine multiple strategies: summarizing tool outputs immediately, using a sliding window for conversation history, and imposing step limits as a safety net. The goal is to maintain the information density of the context while staying well within token limits.

7. Designing for Failure

Agents fail in ways that are qualitatively different from non-agentic systems. A simple chain either succeeds or produces an error. An agent can get stuck in loops, waste tokens on unproductive actions, misinterpret tool outputs, or take increasingly erratic actions as its context window degrades. Robust agent design requires anticipating and handling these failure modes.

Common Agent Failure Modes
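One cheap guard against the most common failure mode, an agent repeating the same action, is a loop detector over recent actions. A minimal sketch (the window size is illustrative):

```python
from collections import deque

class LoopDetector:
    """Flags an agent that repeats the same action too many times in a row."""

    def __init__(self, window: int = 3):
        self.recent = deque(maxlen=window)

    def record(self, action: str) -> bool:
        """Record an action; return True if the agent appears stuck."""
        self.recent.append(action)
        return (len(self.recent) == self.recent.maxlen
                and len(set(self.recent)) == 1)

detector = LoopDetector(window=3)
for action in ["search(a)", "search(a)", "search(a)"]:
    stuck = detector.record(action)
print(stuck)  # -> True
```

When the detector fires, the agent loop can inject a corrective message ("You have repeated the same action; try a different approach") or escalate to a human, rather than burning tokens on an unproductive cycle.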

Knowledge Check

1. What is the fundamental difference between an AI agent and a workflow?
Show Answer
In a workflow, the developer defines the control flow (which steps run and in what order, including conditionals and branches). In an agent, the LLM itself determines the control flow, deciding which tools to call, in what order, and when to stop. The key distinction is who controls action selection: the developer (workflow) or the model (agent).
2. Name the four agentic design patterns and briefly describe each.
Show Answer
Reflection: The LLM reviews and iteratively improves its own output. Tool Use: The LLM calls external functions to extend its capabilities. Planning: The LLM decomposes complex tasks into subtasks before execution. Multi-Agent: Multiple LLM instances with different roles collaborate to solve a problem.
3. In the ReAct framework, what are the three components of each iteration, and why is the "Thought" step important?
Show Answer
Each ReAct iteration consists of a Thought (reasoning trace), an Action (tool call or response), and an Observation (action result). The Thought step is important because explicit reasoning before each action improves decision quality, provides a chain-of-thought for debugging, and helps the agent maintain focus on the overall task rather than acting reflexively.
4. What are the three types of agent memory, and how do they differ in timescale?
Show Answer
Working memory (short-term) holds the current context window contents, lasting for a single agent run. Episodic memory (session-based) stores records of past interactions and can persist across sessions. Semantic memory (long-term) stores factual knowledge and learned procedures, persisting indefinitely via vector stores or knowledge bases.
5. Why is token budget management critical for agents, and what is the most effective strategy?
Show Answer
Token budget management is critical because agents accumulate context over many iterations. Every tool call, observation, and reasoning trace adds tokens. Without management, agents hit context limits, lose important early context, or incur excessive costs. The most effective strategy is combining multiple approaches: summarize tool outputs immediately to reduce their size by 50-90%, use a sliding window with summarization for old messages, and impose hard step limits as a safety net.

Key Takeaways