Multi-agent systems gain their power from the division of labor between specialized agents, but their effectiveness depends entirely on how those agents are organized and coordinated. Architecture patterns define the communication topology, decision authority, and information flow between agents. Choosing the right pattern determines whether your system produces robust, high-quality results or descends into incoherent chaos. This section covers the four foundational patterns (supervisor, debate, pipeline, hierarchical), their implementation details, and the surprising ways that multi-agent deliberation can go wrong.
1. The Supervisor Pattern
In the supervisor pattern, a single "manager" agent receives the user's request, decomposes it into subtasks, delegates each subtask to a specialist agent, and synthesizes the results into a final response. The supervisor maintains full control over task routing and can dynamically decide which agent to invoke based on intermediate results. This is the most common multi-agent pattern in production systems because it provides clear accountability and predictable behavior.
```python
from typing import TypedDict, Annotated, Literal
from langgraph.graph import StateGraph, END
from langgraph.graph.message import add_messages

class SupervisorState(TypedDict):
    messages: Annotated[list, add_messages]
    next_agent: str
    task_complete: bool

def supervisor_node(state: SupervisorState) -> dict:
    """The supervisor decides which agent to invoke next."""
    response = supervisor_llm.invoke([
        {"role": "system", "content": (
            "You are a supervisor managing a team of agents: "
            "researcher, writer, reviewer. Decide which agent "
            "should act next, or return FINISH if the task is done. "
            "Respond with JSON: {\"next\": \"agent_name_or_FINISH\"}"
        )},
        *state["messages"]
    ])
    decision = parse_json(response.content)
    return {
        "next_agent": decision["next"],
        "task_complete": decision["next"] == "FINISH"
    }

def route_to_agent(state: SupervisorState) -> Literal[
    "researcher", "writer", "reviewer", "end"
]:
    if state["task_complete"]:
        return "end"
    return state["next_agent"]

# Build the supervisor graph
graph = StateGraph(SupervisorState)
graph.add_node("supervisor", supervisor_node)
graph.add_node("researcher", researcher_node)
graph.add_node("writer", writer_node)
graph.add_node("reviewer", reviewer_node)
graph.set_entry_point("supervisor")
graph.add_conditional_edges("supervisor", route_to_agent, {
    "researcher": "researcher",
    "writer": "writer",
    "reviewer": "reviewer",
    "end": END
})

# All workers route back to supervisor
for agent in ["researcher", "writer", "reviewer"]:
    graph.add_edge(agent, "supervisor")

app = graph.compile()
```
2. The Debate Pattern
In the debate pattern, multiple agents with potentially different perspectives or approaches independently analyze the same problem, then engage in structured argumentation. A judge agent (or the user) evaluates the competing analyses and selects the best solution. This pattern is particularly valuable for tasks where diverse viewpoints improve quality, such as code review, risk assessment, and strategic planning.
```python
import asyncio
from typing import TypedDict, Annotated
from langgraph.graph.message import add_messages

class DebateState(TypedDict):
    messages: Annotated[list, add_messages]
    proposals: list[dict]   # Each agent's position
    rebuttals: list[dict]   # Responses to other positions
    verdict: str            # Judge's final decision
    round_number: int

async def parallel_proposals(state: DebateState) -> dict:
    """All debaters generate their initial positions in parallel."""
    debaters = [
        {"name": "Optimist", "stance": "Argue for the opportunity"},
        {"name": "Skeptic", "stance": "Argue for caution and risks"},
        {"name": "Pragmatist", "stance": "Focus on practical trade-offs"},
    ]
    tasks = [generate_proposal(d, state) for d in debaters]
    proposals = await asyncio.gather(*tasks)
    return {"proposals": proposals}

def judge_node(state: DebateState) -> dict:
    """An impartial judge evaluates all proposals and rebuttals."""
    context = format_debate_transcript(state)
    verdict = judge_llm.invoke([
        {"role": "system", "content": (
            "You are an impartial judge. Evaluate the arguments presented, "
            "identify the strongest points from each side, and render a "
            "balanced verdict that synthesizes the best insights."
        )},
        {"role": "user", "content": context}
    ])
    return {"verdict": verdict.content}
```
Multi-agent debate is vulnerable to conformity effects. Research presented at ICLR 2025 demonstrated that LLM agents in multi-agent discussions tend to converge on a single answer even when their initial responses are diverse and correct. This "groupthink" phenomenon means that debate does not always improve accuracy. In some cases, the agents abandon correct individual answers in favor of a socially dominant but incorrect consensus. Monitor for conformity and consider preserving independent assessments alongside group deliberations.
3. The Pipeline Pattern
The pipeline pattern arranges agents in a linear sequence where each agent's output becomes the next agent's input. Unlike the supervisor pattern (which allows dynamic routing), pipelines follow a fixed order. This makes them simpler to reason about, test, and debug. Pipelines excel when the task naturally decomposes into sequential stages with clear handoff points.
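The fixed ordering can be sketched without any framework at all. In the sketch below, the stage names and state fields are illustrative placeholders, and simple string transformations stand in for the LLM calls a real system would make:

```python
from typing import Callable

State = dict  # the running pipeline state, passed stage to stage

def outline_stage(state: State) -> State:
    # A real stage would call an LLM; a string stands in here.
    return {**state, "outline": f"outline for {state['topic']}"}

def draft_stage(state: State) -> State:
    return {**state, "draft": f"draft based on {state['outline']}"}

def review_stage(state: State) -> State:
    return {**state, "final": state["draft"] + " (reviewed)"}

def run_pipeline(stages: list[Callable[[State], State]], state: State) -> State:
    for stage in stages:  # fixed order: no dynamic routing, no loops
        state = stage(state)
    return state

result = run_pipeline([outline_stage, draft_stage, review_stage], {"topic": "RAG"})
```

In LangGraph, the same structure is a StateGraph whose nodes are connected by plain `add_edge` calls with no conditional edges, which is what makes pipelines easy to test stage by stage.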
4. Hierarchical Agents
Hierarchical architectures extend the supervisor pattern by nesting supervisors within supervisors. A top-level orchestrator delegates to mid-level managers, who in turn delegate to specialized workers. This mirrors organizational structures and scales well for complex, multi-domain tasks. Each level of the hierarchy can use different models, tools, and prompting strategies appropriate to its scope of responsibility.
```python
from langgraph.graph import StateGraph, END

# Each team is itself a compiled graph (subgraph)
research_team = build_research_subgraph()  # Has its own supervisor
writing_team = build_writing_subgraph()    # Separate supervisor
review_team = build_review_subgraph()      # Separate supervisor

# Top-level orchestrator manages the teams
def orchestrator_node(state: OrchestratorState) -> dict:
    """Top-level orchestrator delegates to team supervisors."""
    response = orchestrator_llm.invoke([
        {"role": "system", "content": (
            "You manage three teams: research, writing, and review. "
            "Decide which team should work next. Each team has its "
            "own manager who handles internal delegation."
        )},
        *state["messages"]
    ])
    return {"next_team": parse_decision(response)}

# Compose subgraphs into the top-level graph
top_graph = StateGraph(OrchestratorState)
top_graph.add_node("orchestrator", orchestrator_node)
top_graph.add_node("research_team", research_team)
top_graph.add_node("writing_team", writing_team)
top_graph.add_node("review_team", review_team)
# ... add conditional edges for routing
```
5. Shared Memory and Message Passing
Multi-agent systems require mechanisms for agents to share information. The two primary approaches are shared memory (all agents read and write to a common state store) and message passing (agents communicate by sending messages through channels). Each has trade-offs that affect system design.
| Aspect | Shared Memory | Message Passing |
|---|---|---|
| Coordination | Implicit via state reads/writes | Explicit via messages |
| Coupling | Higher (shared data structure) | Lower (interface-based) |
| Debugging | Inspect state at any point | Trace message history |
| Scalability | Contention on shared state | Naturally distributable |
| Consistency | Immediate (same state object) | Eventual (async delivery) |
| Framework Example | LangGraph TypedDict state | AutoGen agent messages |
Most production systems use a hybrid approach. LangGraph, for instance, uses shared state as the primary coordination mechanism, but agents effectively communicate through messages stored in that state. The practical choice is not "shared memory versus message passing" but rather "how much of the communication should be structured state versus free-form messages." Structured state fields (like `task_complete` or `next_agent`) provide reliable coordination signals, while message lists carry rich conversational context.
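The hybrid can be sketched as a single state object that carries both kinds of channel. The field names below are illustrative, following the LangGraph-style TypedDict convention used earlier in this section:

```python
from typing import TypedDict

class HybridState(TypedDict):
    messages: list[dict]   # free-form conversational context
    next_agent: str        # structured routing signal
    task_complete: bool    # structured termination signal

def writer_update(state: HybridState) -> HybridState:
    # An agent both appends a message (rich context) and sets
    # structured fields (reliable coordination signals).
    return {
        "messages": state["messages"] + [
            {"role": "assistant", "content": "Draft done."}
        ],
        "next_agent": "reviewer",
        "task_complete": False,
    }

state: HybridState = {"messages": [], "next_agent": "writer", "task_complete": False}
state = writer_update(state)
```

Routing logic reads only the structured fields, so it stays reliable even when the message content is unpredictable.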
6. Conformity Effects in Multi-Agent Systems
A critical finding from recent research is that LLM agents in multi-agent discussions are susceptible to conformity effects analogous to human groupthink. When agents share their outputs with each other, they tend to converge on a single answer regardless of whether that answer is correct. This is especially problematic in debate and consensus-based architectures.
6.1 The ICLR 2025 Groupthink Finding
Research presented at ICLR 2025 studied multi-agent debate systems and found that when LLM agents are asked to discuss and reach consensus, they frequently abandon correct individual answers in favor of incorrect but socially dominant positions. The study showed that agents exhibited classic conformity behavior: they changed their answers to match the majority even when their original reasoning was sound. This effect was more pronounced with certain models and became worse as the number of discussion rounds increased.
```python
import asyncio

# Mitigation: Preserve independent assessments alongside group debate
class ConformityAwareDebate:
    def __init__(self, agents: list, judge_llm):
        self.agents = agents
        self.judge = judge_llm

    async def run(self, question: str) -> dict:
        # Step 1: Get independent answers (no cross-contamination)
        independent = await asyncio.gather(*[
            agent.answer_independently(question)
            for agent in self.agents
        ])

        # Step 2: Run debate rounds (agents see each other's answers)
        debate_transcript = await self.run_debate_rounds(
            question, independent, max_rounds=2
        )

        # Step 3: Judge sees BOTH independent and post-debate answers
        verdict = await self.judge.invoke({
            "independent_answers": independent,
            "debate_transcript": debate_transcript,
            "instruction": (
                "Compare independent answers with post-debate positions. "
                "Flag any cases where agents changed correct answers to "
                "match the group. Weight independent reasoning heavily."
            )
        })
        return {"independent": independent, "verdict": verdict}
```
Practical mitigations for conformity effects include: (1) always preserving and presenting independent assessments to the judge alongside debate outputs, (2) limiting debate rounds to prevent excessive convergence, (3) using diverse model types (different providers, sizes, or temperatures) to maintain genuine diversity of reasoning, and (4) explicitly instructing the judge to watch for conformity patterns where agents abandon well-reasoned positions.
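Mitigation (3) can be enforced mechanically before a debate starts. The `DebaterConfig` class, the model names, and the diversity check below are all hypothetical illustrations, not part of any framework:

```python
from dataclasses import dataclass

@dataclass
class DebaterConfig:
    name: str
    model: str         # in practice, mix providers and sizes here
    temperature: float

configs = [
    DebaterConfig("Optimist", "model-a", 0.9),   # high temperature: explores
    DebaterConfig("Skeptic", "model-b", 0.2),    # low temperature: conservative
    DebaterConfig("Pragmatist", "model-c", 0.5),
]

def is_diverse(debaters: list[DebaterConfig]) -> bool:
    # Genuine diversity: no two debaters share a model or a temperature,
    # reducing the chance they reason toward the same dominant answer.
    models = {d.model for d in debaters}
    temps = {d.temperature for d in debaters}
    return len(models) == len(debaters) and len(temps) == len(debaters)
```

Running such a check at startup turns "use diverse models" from a guideline into an invariant the system refuses to violate.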
7. Pattern Selection Guide
| Pattern | When to Use | Risks |
|---|---|---|
| Supervisor | Dynamic task routing, clear accountability needed | Supervisor becomes bottleneck; single point of failure |
| Debate | Multiple valid approaches; need diverse perspectives | Conformity/groupthink; higher token costs |
| Pipeline | Sequential stages with clear handoffs | No dynamic rerouting; errors compound downstream |
| Hierarchical | Complex, multi-domain tasks; large agent teams | Coordination overhead; deep nesting slows iteration |
Knowledge Check
1. In the supervisor pattern, what happens when the supervisor itself makes a mistake in routing?
2. Why does the debate pattern generate higher token costs than the supervisor pattern?
3. What is the key difference between shared memory and message passing for agent coordination?
4. How does the ICLR 2025 groupthink research affect how we should design multi-agent debate systems?
5. When would you choose a hierarchical architecture over a flat supervisor pattern?
Key Takeaways
- The supervisor pattern is the most common production choice: it centralizes routing decisions, provides clear accountability, and is straightforward to debug and monitor.
- The debate pattern improves quality through diverse perspectives but is vulnerable to conformity effects where agents abandon correct answers to match the group consensus.
- The pipeline pattern offers simplicity and predictability: fixed sequential stages with clear handoffs are easy to test, but lack the flexibility to reroute dynamically.
- Hierarchical architectures scale to complex, multi-domain tasks by nesting supervisors within supervisors, but add coordination overhead at each level.
- Conformity effects are a real risk: always preserve independent assessments, limit debate rounds, use diverse models, and instruct judges to watch for groupthink patterns.