LangGraph Multi-Agent Orchestration: A Reference Architecture for Mid-Market Teams

June 26, 2026 · 16 min read · Ahmed Sheikh

TL;DR

LangGraph is the production standard for multi-agent AI orchestration in 2026. This post covers the reference architecture I use for mid-market deployments: the five-node pattern (router → specialists → tools → evaluator → synthesizer), typed state management, conditional routing, human-in-the-loop approval gates, and the checkpointing setup that makes it debuggable and fault-tolerant.

LangGraph is the framework used in production multi-agent systems at Uber, JPMorgan, LinkedIn, and Klarna. It's also the framework most engineers hit a wall with because the concepts — graphs, nodes, edges, state — are unfamiliar outside of computer science backgrounds.

This post maps those concepts to a concrete reference architecture. By the end, you should be able to design the node structure for any multi-agent use case and understand why the architecture choices are made.

Why LangGraph Won

The multi-agent framework landscape in 2024–2025 had three serious contenders: LangGraph, CrewAI, and AutoGen. All three can build multi-agent systems. They differ in where they put the control.

| Framework | Control model | State | Best for |
| --- | --- | --- | --- |
| LangGraph | Explicit graph: you define every node, edge, and condition | Typed TypedDict, fully inspectable | Production systems requiring fine-grained control and debuggability |
| CrewAI | Role-based: agents have roles, tasks, and sequential/parallel modes | Implicit, managed by the framework | Rapid prototyping with opinionated agent roles |
| AutoGen | Conversation-driven: agents send messages to each other | Message history | Conversational multi-agent tasks, research workflows |

LangGraph won for production use cases because of three properties the others lack: explicit control flow (you can read the graph and understand exactly what the agent can do), inspectable state (every node receives a typed dict — no hidden message passing), and checkpointing (execution can pause, persist, and resume across human approval gates or failures). For enterprise systems that need audit trails and human oversight, these properties are essential.

Core Concepts in 4 Minutes

State

LangGraph state is a typed dictionary that flows through every node in the graph. Every node reads from it and writes to it. Nothing is hidden.

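from typing import Annotated, List, Optional, TypedDict
from langgraph.graph.message import add_messages

class AgentState(TypedDict):
    # Input
    user_query: str

    # Routing
    intent: str          # "lookup" | "analysis" | "action"
    confidence: float

    # Retrieval
    retrieved_docs: List[dict]

    # Messages: add_messages appends new messages and replaces existing ones by ID
    messages: Annotated[List, add_messages]

    # Evaluation (written by specialists and the evaluator)
    specialist_output: str
    quality_score: float

    # Output
    final_answer: str
    citations: List[str]

    # Control
    should_escalate: bool
    should_cancel: bool               # set by a human reviewer to stop a pending action
    pending_action: Optional[dict]    # action awaiting human approval
    retry_count: int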
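
To make the "partial state update" idea concrete, here is a minimal pure-Python sketch of how an update could be merged into state. It is an illustration only, not LangGraph's implementation: `merge_update`, `append_list`, and `REDUCERS` are hypothetical names, and the real `add_messages` reducer also handles message IDs.

```python
from typing import Any, Callable, Dict, List


def append_list(current: List[Any], new: List[Any]) -> List[Any]:
    """Reducer: combine the existing list with newly returned items."""
    return list(current) + list(new)


# Hypothetical registry: fields listed here are combined, all others are overwritten.
REDUCERS: Dict[str, Callable[[Any, Any], Any]] = {"messages": append_list}


def merge_update(state: Dict[str, Any], update: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new state with a node's partial update applied."""
    merged = dict(state)
    for key, value in update.items():
        reducer = REDUCERS.get(key)
        if reducer is not None and key in merged:
            merged[key] = reducer(merged[key], value)
        else:
            merged[key] = value
    return merged


state = {"user_query": "Q3 churn?", "messages": ["hi"], "retry_count": 0}
# A node returns only the keys it changed; everything else is carried over.
state = merge_update(state, {"intent": "analysis", "messages": ["routing"]})
```

The point of the sketch is the asymmetry: `messages` accumulates because it has a reducer, while `intent` is simply set and `retry_count` is left untouched.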

Nodes

Nodes are plain Python functions. Each one receives the current state and returns a partial state update, which LangGraph merges into the shared state. A node can call an LLM, run tools, or compute routing fields.

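import json
from langchain_core.messages import HumanMessage, SystemMessage

# llm (a chat model) and ROUTER_SYSTEM_PROMPT are assumed to be defined elsewhere.
def router_node(state: AgentState) -> dict:
    """Classifies intent and sets routing fields."""
    response = llm.invoke([
        SystemMessage(content=ROUTER_SYSTEM_PROMPT),
        HumanMessage(content=state["user_query"])
    ])

    # The prompt must instruct the model to reply with JSON
    # containing "intent" and "confidence" keys.
    parsed = json.loads(response.content)
    return {
        "intent": parsed["intent"],
        "confidence": parsed["confidence"]
    }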
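
The router above trusts the model to return valid JSON. A defensive parser is one way to keep a malformed reply from crashing the graph. The sketch below is an assumption on my part rather than part of the router shown: `parse_router_output` and `VALID_INTENTS` are hypothetical names, and the intent values come from the comment in the state definition.

```python
import json
from typing import Any, Dict

VALID_INTENTS = {"lookup", "analysis", "action"}


def parse_router_output(raw: str) -> Dict[str, Any]:
    """Parse the router's JSON reply; fall back to zero confidence on bad output."""
    fallback = {"intent": "unknown", "confidence": 0.0}
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return fallback
    if not isinstance(parsed, dict):
        return fallback

    intent = parsed.get("intent")
    confidence = parsed.get("confidence")
    if not isinstance(intent, str) or intent not in VALID_INTENTS:
        return fallback
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        return fallback
    # Clamp to the [0, 1] range the routing threshold expects.
    return {"intent": intent, "confidence": min(1.0, max(0.0, float(confidence)))}
```

Because the fallback reports a confidence of 0.0, any routing rule that checks for low confidence first will treat an unparseable reply as a request for clarification.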

Conditional Edges

Edges define control flow. Conditional edges route to different nodes based on state.

def route_after_router(state: AgentState) -> str:
    """Returns the name of the next node based on intent."""
    if state["confidence"] < 0.6:
        return "clarification_node"

    routing = {
        "lookup":   "retrieval_agent",
        "analysis": "analysis_agent",
        "action":   "action_agent",
    }
    return routing.get(state["intent"], "fallback_node")
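
A routing function returns node names as plain strings, so a typo or a missing node only surfaces at runtime. One cheap safeguard is to run the router over sample states and compare its targets with the registered node names. This is a sketch under assumptions: `unregistered_targets`, `samples`, and the `nodes` set are hypothetical, and the routing function is repeated so the block runs on its own.

```python
from typing import Any, Callable, Dict, Iterable, Set


def route_after_router(state: Dict[str, Any]) -> str:
    """Same logic as the routing function above."""
    if state["confidence"] < 0.6:
        return "clarification_node"
    routing = {
        "lookup": "retrieval_agent",
        "analysis": "analysis_agent",
        "action": "action_agent",
    }
    return routing.get(state["intent"], "fallback_node")


def unregistered_targets(
    route: Callable[[Dict[str, Any]], str],
    sample_states: Iterable[Dict[str, Any]],
    registered_nodes: Set[str],
) -> Set[str]:
    """Return route targets that are not registered node names."""
    return {route(s) for s in sample_states} - registered_nodes


samples = [
    {"intent": "lookup", "confidence": 0.9},
    {"intent": "analysis", "confidence": 0.9},
    {"intent": "action", "confidence": 0.9},
    {"intent": "other", "confidence": 0.9},   # unknown intent
    {"intent": "lookup", "confidence": 0.2},  # low confidence
]
# Hypothetical set of names passed to builder.add_node(...)
nodes = {"retrieval_agent", "analysis_agent", "action_agent", "clarification_node"}
missing = unregistered_targets(route_after_router, samples, nodes)
```

Here `missing` contains `"fallback_node"`, which flags that the fallback route points at a node nobody registered.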

The Reference Architecture

For mid-market enterprise deployments, the five-node pattern covers the majority of use cases cleanly:

router: Entry point

Classifies intent from user query. Sets routing fields. Routes to the appropriate specialist. Handles low-confidence cases with a clarification request.

in: user_query
out: intent, confidence

specialist_agents: Domain experts

One node per task category. Each has a domain-specific system prompt and can invoke tools. Returns partial state with its findings. Multiple specialists can run in parallel via Send().

in: user_query, retrieved_docs
out: specialist_output, messages

tool_nodes: Deterministic execution

External system calls such as CRM queries, database lookups, and API calls. Return structured results. Failures return structured error dicts, not exceptions.

in: tool_call from specialist
out: tool_result

evaluator: Quality gate

Validates specialist output. Checks faithfulness to retrieved context. Scores confidence. Routes back to specialist for retry if quality < threshold (max 2 retries).

in: specialist_output, retrieved_docs
out: quality_score, should_retry

synthesizer: Response assembly

Assembles final answer from multi-specialist outputs. Adds citations. Formats for the target interface (API JSON, Slack markdown, email prose). Returns final_answer.

in: all specialist outputs
out: final_answer, citations
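
The tool_nodes rule that failures return structured error dicts rather than exceptions can be enforced with a small wrapper. The following is a minimal sketch, not a prescribed implementation: `safe_tool`, the result keys (`ok`, `result`, `error_type`, `message`, `tool`), and the in-memory `lookup_account` tool are all hypothetical.

```python
import functools
from typing import Any, Callable, Dict


def safe_tool(fn: Callable[..., Any]) -> Callable[..., Dict[str, Any]]:
    """Wrap a tool so failures become structured error dicts."""

    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Dict[str, Any]:
        try:
            return {"ok": True, "result": fn(*args, **kwargs)}
        except Exception as exc:  # deliberately broad: a tool should not raise into the graph
            return {
                "ok": False,
                "error_type": type(exc).__name__,
                "message": str(exc),
                "tool": fn.__name__,
            }

    return wrapper


@safe_tool
def lookup_account(account_id: str) -> Dict[str, Any]:
    """Hypothetical CRM lookup backed by an in-memory dict."""
    accounts = {"A-1": {"name": "Acme", "tier": "gold"}}
    return accounts[account_id]  # raises KeyError for unknown IDs


found = lookup_account("A-1")
missing = lookup_account("A-404")
```

A specialist can then branch on `ok` and decide whether to retry, try another tool, or report the failure, instead of the whole run aborting on an unhandled exception.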

Wiring It Together

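from langgraph.graph import StateGraph, START, END
from langgraph.checkpoint.memory import MemorySaver
from langgraph.prebuilt import ToolNode, tools_condition

# Node functions and the `tools` list are assumed to be defined elsewhere.
builder = StateGraph(AgentState)

# Add nodes (names must match what route_after_router returns)
builder.add_node("router",             router_node)
builder.add_node("clarification_node", clarification_node)
builder.add_node("fallback_node",      fallback_node)
builder.add_node("retrieval_agent",    retrieval_agent)
builder.add_node("analysis_agent",     analysis_agent)
builder.add_node("action_agent",       action_agent)
# One ToolNode per specialist so results return to the agent that made the call
builder.add_node("retrieval_tools",    ToolNode(tools))
builder.add_node("analysis_tools",     ToolNode(tools))
builder.add_node("evaluator",          evaluator_node)
builder.add_node("synthesizer",        synthesizer_node)

# Entry
builder.add_edge(START, "router")

# Conditional routing after router
builder.add_conditional_edges("router", route_after_router)
builder.add_edge("clarification_node", END)
builder.add_edge("fallback_node",      END)

# Specialist → tools if the last message has tool calls, otherwise → evaluator
builder.add_conditional_edges("retrieval_agent", tools_condition,
                              {"tools": "retrieval_tools", END: "evaluator"})
builder.add_conditional_edges("analysis_agent", tools_condition,
                              {"tools": "analysis_tools", END: "evaluator"})
builder.add_edge("retrieval_tools", "retrieval_agent")  # Return to caller
builder.add_edge("analysis_tools",  "analysis_agent")   # Return to caller

# Assumption: action results go straight to the synthesizer
builder.add_edge("action_agent", "synthesizer")

# Evaluator: retry the specialist or pass to the synthesizer
builder.add_conditional_edges("evaluator", route_after_eval)

# Synthesizer → END
builder.add_edge("synthesizer", END)

# Checkpointing for persistence + human-in-the-loop
checkpointer = MemorySaver()  # Use PostgresSaver in production
graph = builder.compile(
    checkpointer=checkpointer,
    interrupt_before=["action_agent"]  # Pause before destructive actions
)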
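
The wiring references `route_after_eval` without defining it. A minimal sketch of what it could look like follows, under these assumptions: the evaluator writes `quality_score` to state and increments `retry_count` each time it scores an attempt below threshold, node names follow the routing table in `route_after_router`, and the 0.7 threshold is an illustrative value I picked, not one from this architecture.

```python
from typing import Any, Dict

MAX_RETRIES = 2          # from the evaluator description: max 2 retries
QUALITY_THRESHOLD = 0.7  # assumed value; tune per use case

# Which specialist to send a failed attempt back to, keyed by intent.
SPECIALIST_FOR_INTENT = {
    "lookup": "retrieval_agent",
    "analysis": "analysis_agent",
}


def route_after_eval(state: Dict[str, Any]) -> str:
    """Send good output to the synthesizer; otherwise retry the specialist."""
    if state["quality_score"] >= QUALITY_THRESHOLD:
        return "synthesizer"
    # retry_count already includes the attempt that just failed.
    if state["retry_count"] > MAX_RETRIES:
        # Retry budget exhausted: stop looping and let the synthesizer respond.
        return "synthesizer"
    return SPECIALIST_FOR_INTENT.get(state["intent"], "synthesizer")
```

Keeping the retry ceiling in the routing function, rather than inside the specialist, means the loop bound is visible in one place when you read the graph.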

Human-in-the-Loop: The Pattern That Makes Enterprise Trust It

Enterprise AI agents take actions — updating CRM records, sending emails, triggering workflows. These actions need human approval before execution. LangGraph's interrupt_before makes this a first-class concern, not a workaround.

# Step 1: Run until the approval gate
config = {"configurable": {"thread_id": "session-123"}}
graph.invoke({"user_query": user_input}, config)

# The graph pauses before "action_agent"; snapshot.next names the pending node
snapshot = graph.get_state(config)

# Step 2: Show pending action to human
# Assumes an upstream node (for example the router) wrote "pending_action"
# to state, and that "pending_action" and "should_cancel" are declared
# in AgentState.
pending_action = snapshot.values["pending_action"]
human_approved = display_for_approval(pending_action)  # your UI; returns True/False

# Step 3: Resume with human decision
if human_approved:
    # Resume: graph continues from interrupt point
    final = graph.invoke(None, config)
else:
    # Cancel: update state before resuming; action_agent must check this flag
    graph.update_state(config, {"should_cancel": True})
    final = graph.invoke(None, config)

The checkpointer persists the full agent state at each step, so the messages, tool results, and intermediate decisions that nodes write to state all survive the human review pause. The checkpoint history for a thread gives you an audit trail for every action the agent takes.
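
Pausing only helps if the gated node respects the reviewer's decision when the graph resumes. Below is a sketch of an action node that checks a cancel flag before doing anything with side effects. The names are assumptions for illustration: `make_action_agent`, `fake_crm_update`, and the `pending_action` / `should_cancel` state keys are hypothetical and would need to be declared in the state schema.

```python
from typing import Any, Callable, Dict, List


def make_action_agent(execute: Callable[[Dict[str, Any]], Dict[str, Any]]):
    """Build an action node around `execute`, a hypothetical side-effecting callable."""

    def action_agent(state: Dict[str, Any]) -> Dict[str, Any]:
        if state.get("should_cancel"):
            # Human rejected the action: skip the side effect and say so.
            return {"final_answer": "Action cancelled by reviewer.", "pending_action": None}
        result = execute(state["pending_action"])
        return {
            "final_answer": f"Action completed: {result['status']}",
            "pending_action": None,
        }

    return action_agent


calls: List[Dict[str, Any]] = []


def fake_crm_update(action: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for a real write API; records the call instead of performing it."""
    calls.append(action)
    return {"status": "updated"}


node = make_action_agent(fake_crm_update)
approved = node({"pending_action": {"op": "update_crm"}, "should_cancel": False})
rejected = node({"pending_action": {"op": "update_crm"}, "should_cancel": True})
```

Injecting `execute` keeps the approval logic testable without touching a real CRM: in the run above the write function is called once, for the approved action only.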

Production Deployment Checklist

PostgresSaver or RedisSaver instead of MemorySaver

MemorySaver is in-process only — doesn't survive restarts

interrupt_before on all state-mutating nodes

Any node that calls write APIs, sends messages, or modifies records

LangSmith tracing enabled

LANGCHAIN_TRACING_V2=true plus LANGCHAIN_API_KEY in env: full graph traces, zero code changes

Max retry count in state

Prevents evaluator → specialist loops from running indefinitely

Token budget per run

Hard ceiling on total tokens per graph invocation — log and alert on approach

Typed state with validation

A Pydantic BaseModel state schema validates node inputs at runtime and catches type errors before they corrupt downstream nodes; a TypedDict alone is not validated at runtime
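
The token budget item can be implemented as a small counter that every LLM-calling node charges against. This is a sketch under stated assumptions: `TokenBudget`, `TokenBudgetExceeded`, and the 80% warning ratio are hypothetical, and in a real deployment the warnings would go to your logging and alerting stack rather than a list.

```python
from dataclasses import dataclass, field
from typing import List


class TokenBudgetExceeded(RuntimeError):
    """Raised when a run would exceed its hard token ceiling."""


@dataclass
class TokenBudget:
    limit: int
    warn_ratio: float = 0.8   # assumed alert threshold
    used: int = 0
    warnings: List[str] = field(default_factory=list)

    def charge(self, tokens: int, node: str) -> None:
        """Record usage for one LLM call; warn near the limit, raise past it."""
        if self.used + tokens > self.limit:
            raise TokenBudgetExceeded(
                f"{node} needs {tokens} tokens, {self.limit - self.used} left"
            )
        self.used += tokens
        if self.used >= self.limit * self.warn_ratio:
            self.warnings.append(f"{node}: {self.used}/{self.limit} tokens used")


budget = TokenBudget(limit=1000)
budget.charge(500, "router")
budget.charge(350, "analysis_agent")   # crosses the 80% warning line
try:
    budget.charge(200, "synthesizer")  # would exceed the ceiling
    exceeded = False
except TokenBudgetExceeded:
    exceeded = True
```

Checking before adding means a rejected call never counts against the budget, so the graph can still fall back to a cheaper path with the tokens that remain.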

Frequently Asked

What is LangGraph and why is it used for multi-agent orchestration?

LangGraph is a framework for building stateful, multi-actor AI applications as directed graphs. Each node is a function; edges define control flow; state is a typed dictionary shared across all nodes. It's preferred for production systems because it makes control flow explicit, state inspectable, and execution checkpointable.

What is the difference between LangGraph and CrewAI?

LangGraph gives you explicit control over state, control flow, and checkpointing — it's a low-level orchestration primitive. CrewAI is a higher-level framework with opinionated agent roles and communication patterns. LangGraph is preferred for production systems where you need fine-grained control and debuggable execution traces. CrewAI is faster to prototype.

How do you add human-in-the-loop to a LangGraph agent?

LangGraph supports human-in-the-loop via the interrupt_before and interrupt_after parameters passed when compiling the graph. When the graph reaches a designated node, execution pauses and returns control to your application. You can present the pending action for human approval, then resume by calling graph.invoke(None, config) with the same thread_id in config. State is persisted across the interruption, which is why a checkpointer is required.

What checkpointer should I use in production LangGraph?

Use PostgresSaver or RedisSaver in production. MemorySaver is in-process only and does not survive restarts. PostgresSaver persists the full agent state to a Postgres database, enabling pause/resume across server restarts, cross-session history, and complete audit trails for human-in-the-loop workflows.

Written by

Ahmed Sheikh

Cloud-Native Agentic AI Engineer · worldwide
