What does Ahmed Sheikh do?

Ahmed Sheikh is a cloud-native agentic AI engineer who helps medium and large businesses move AI agents from stalled pilots to reliable, governed production. He specialises in multi-agent orchestration, agentic RAG, LLMOps, AI evaluation, and cloud-native AI infrastructure — serving clients worldwide.

What does Ahmed Sheikh build, and what does it cost?

Ahmed builds production-ready AI applications as three fixed-price packages, each with an optional monthly retainer for ongoing work after launch: a Full-Stack AI App ($800, or $350/month), an Enterprise Full-Stack Agentic AI build with multi-agent orchestration, agentic RAG, evals and observability ($2,000, or $800/month), and a Custom Product scoped and quoted on a call. Payments are processed worldwide via Paddle.

Why do most enterprise AI agent pilots fail to reach production?

Research shows only 11–14% of enterprise AI agent pilots reach production at scale. The primary reasons are orchestration complexity, inadequate evaluation and output validation, LLMOps debt, poor retrieval quality in RAG pipelines, and lack of governance controls. These are the exact failure modes Ahmed's engagements are designed to address.

Does Ahmed Sheikh work with medium and large businesses globally?

Yes. Ahmed works with medium and large businesses worldwide, remotely. He delivers fixed-price AI application builds with optional monthly retainers — including full-stack AI apps, enterprise agentic AI systems with multi-agent orchestration via LangGraph, agentic RAG implementation, and LLMOps infrastructure.

What technologies does Ahmed Sheikh use for agentic AI systems?

Ahmed builds agentic AI systems using LangGraph for multi-agent orchestration, LangChain for RAG pipelines, Python and FastAPI for backend services, Next.js and TypeScript for interfaces, and cloud-native infrastructure on Vercel, AWS, and Docker/Kubernetes. He also designs evaluation harnesses, output validation layers, and LLMOps monitoring stacks.

What is the reference architecture for a production LangGraph multi-agent system?

The production reference architecture has five node types: (1) router: classifies intent and routes to the appropriate specialist; (2) specialist agents: domain-specific nodes for each task category; (3) tool nodes: deterministic functions for external system calls; (4) evaluator: validates output quality before responding; (5) synthesizer: assembles the final response from multi-agent outputs. State is a TypedDict shared across all nodes. Conditional edges handle routing. A checkpointer provides persistence.

LangGraph Multi-Agent Orchestration: A Reference Architecture for Mid-Market Teams

TL;DR

LangGraph is the production standard for multi-agent AI orchestration in 2026. This post covers the reference architecture I use for mid-market deployments: the five-node pattern (router → specialists → tools → evaluator → synthesizer), typed state management, conditional routing, human-in-the-loop approval gates, and the checkpointing setup that makes it debuggable and fault-tolerant.

LangGraph is the framework used in production multi-agent systems at Uber, JPMorgan, LinkedIn, and Klarna. It's also the framework most engineers hit a wall with because the concepts — graphs, nodes, edges, state — are unfamiliar outside of computer science backgrounds.

This post maps those concepts to a concrete reference architecture. By the end, you should be able to design the node structure for any multi-agent use case and understand why the architecture choices are made.

Why LangGraph Won

The multi-agent framework landscape in 2024–2025 had three serious contenders: LangGraph, CrewAI, and AutoGen. All three can build multi-agent systems. They differ in where they put the control.

| Framework | Control model | State | Best for |
|---|---|---|---|
| LangGraph | Explicit graph: you define every node, edge, and condition | Typed TypedDict, fully inspectable | Production systems requiring fine-grained control and debuggability |
| CrewAI | Role-based: agents have roles, tasks, and sequential/parallel modes | Implicit, managed by framework | Rapid prototyping with opinionated agent roles |
| AutoGen | Conversation-driven: agents send messages to each other | Message history | Conversational multi-agent tasks, research workflows |

LangGraph won for production use cases because of three properties the others lack: explicit control flow (you can read the graph and understand exactly what the agent can do), inspectable state (every node receives a typed dict — no hidden message passing), and checkpointing (execution can pause, persist, and resume across human approval gates or failures). For enterprise systems that need audit trails and human oversight, these properties are essential.

Core Concepts in 4 Minutes

State

LangGraph state is a typed dictionary that flows through every node in the graph. Every node reads from it and writes to it. Nothing is hidden.

from typing import TypedDict, Annotated, List
from langgraph.graph.message import add_messages

class AgentState(TypedDict):
    # Input
    user_query: str

    # Routing
    intent: str          # "lookup" | "analysis" | "action"
    confidence: float

    # Retrieval
    retrieved_docs: List[dict]

    # Messages — add_messages handles deduplication + ordering
    messages: Annotated[List, add_messages]

    # Output
    final_answer: str
    citations: List[str]

    # Control
    should_escalate: bool
    retry_count: int
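
To picture how partial updates fold into this state, here is a framework-free sketch. It assumes a simplified model in which keys with a reducer are combined with the existing value and every other key is overwritten. `apply_update` and `REDUCERS` are hypothetical names for illustration, not LangGraph API, and `operator.add` stands in for `add_messages`.

```python
import operator
from typing import Any, Callable, Dict

# Hypothetical reducer table: keys listed here are combined, all others are overwritten.
# operator.add (list concatenation) stands in for add_messages in this sketch.
REDUCERS: Dict[str, Callable[[Any, Any], Any]] = {"messages": operator.add}

def apply_update(state: Dict[str, Any], update: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new state with one node's partial update merged in."""
    merged = dict(state)
    for key, value in update.items():
        if key in REDUCERS and key in merged:
            merged[key] = REDUCERS[key](merged[key], value)
        else:
            merged[key] = value
    return merged

state = {"user_query": "Q3 churn?", "messages": ["hi"], "retry_count": 0}
# A router-like update: sets a new key and appends to messages
state = apply_update(state, {"intent": "analysis", "messages": ["routing to analysis"]})
# An evaluator-like update: overwrites a plain key
state = apply_update(state, {"retry_count": 1})
```

The point of the sketch is that a node only returns the keys it changed; everything else in the state carries through untouched.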

Nodes

Nodes are plain Python functions. Each one receives the current state and returns a partial state update containing only the keys it changed. Inside a node you can call LLMs, run tools, or compute routing fields.

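import json
from langchain_core.messages import HumanMessage, SystemMessage

# Assumes `llm` (a chat model) and ROUTER_SYSTEM_PROMPT are defined elsewhere,
# and that the prompt tells the model to reply with a JSON object only.
def router_node(state: AgentState) -> dict:
    """Classifies intent and sets routing fields."""
    response = llm.invoke([
        SystemMessage(content=ROUTER_SYSTEM_PROMPT),
        HumanMessage(content=state["user_query"])
    ])

    # Raises json.JSONDecodeError if the model replies with anything but JSON
    parsed = json.loads(response.content)
    return {
        "intent": parsed["intent"],
        "confidence": parsed["confidence"]
    }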
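
The `json.loads` call in the router is the fragile step: models sometimes wrap JSON in prose or invent an intent. A minimal sketch of a defensive parser follows, assuming the three intents listed in the state comments. `parse_router_output` and `VALID_INTENTS` are hypothetical helpers, not part of LangGraph.

```python
import json

# Assumed to mirror the intents listed in AgentState's comments
VALID_INTENTS = {"lookup", "analysis", "action"}

def parse_router_output(raw: str) -> dict:
    """Parse the router's JSON reply; fall back to zero confidence on any problem."""
    fallback = {"intent": "unknown", "confidence": 0.0}
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return fallback
    if not isinstance(parsed, dict):
        return fallback

    intent = parsed.get("intent")
    confidence = parsed.get("confidence")
    if not isinstance(intent, str) or intent not in VALID_INTENTS:
        return fallback
    # bool is a subclass of int, so reject it explicitly
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        return fallback

    # Clamp the score into [0, 1] so downstream thresholds behave predictably
    return {"intent": intent, "confidence": min(1.0, max(0.0, float(confidence)))}
```

Because the fallback carries a confidence of 0.0, a malformed reply ends up on whatever low-confidence path the routing function defines instead of crashing the run.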

Conditional Edges

Edges define control flow. Conditional edges route to different nodes based on state.

def route_after_router(state: AgentState) -> str:
    """Returns the name of the next node based on intent."""
    if state["confidence"] < 0.6:
        return "clarification_node"

    routing = {
        "lookup":   "retrieval_agent",
        "analysis": "analysis_agent",
        "action":   "action_agent",
    }
    return routing.get(state["intent"], "fallback_node")
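
One optional refinement is to spell out the possible destinations in the return type. As I understand the `add_conditional_edges` documentation, a `Literal` return annotation (or an explicit path map) lets the graph visualization show only the real targets instead of assuming any node is reachable. The sketch below restates the routing function that way; `RoutingState`, `Destination`, and `ROUTES` are names introduced here for illustration.

```python
from typing import Dict, Literal, TypedDict

class RoutingState(TypedDict):
    """Only the fields the routing function reads."""
    intent: str
    confidence: float

# Every node name the router is allowed to return
Destination = Literal[
    "clarification_node", "retrieval_agent", "analysis_agent",
    "action_agent", "fallback_node",
]

ROUTES: Dict[str, Destination] = {
    "lookup": "retrieval_agent",
    "analysis": "analysis_agent",
    "action": "action_agent",
}

def route_after_router(state: RoutingState) -> Destination:
    """Same logic as above, with the destinations visible in the signature."""
    if state["confidence"] < 0.6:
        return "clarification_node"
    return ROUTES.get(state["intent"], "fallback_node")
```

Routing functions like this are cheap to unit test because they never touch the LLM: feed in a dict, check the returned node name.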

The Reference Architecture

For mid-market enterprise deployments, the five-node pattern covers the majority of use cases cleanly:

router (entry point)

Classifies intent from user query. Sets routing fields. Routes to the appropriate specialist. Handles low-confidence cases with a clarification request.

in: user_query
out: intent, confidence

specialist_agents (domain experts)

One node per task category. Each has a domain-specific system prompt and can invoke tools. Returns partial state with its findings. Multiple specialists can run in parallel via Send().

in: user_query, retrieved_docs
out: specialist_output, messages

tool_nodes (deterministic execution)

External system calls: CRM queries, database lookups, API calls. Return structured results. Failures return structured error dicts, not exceptions.

in: tool_call from specialist
out: tool_result

evaluator (quality gate)

Validates specialist output. Checks faithfulness to retrieved context. Scores confidence. Routes back to specialist for retry if quality < threshold (max 2 retries).

in: specialist_output, retrieved_docs
out: quality_score, should_retry

synthesizer (response assembly)

Assembles final answer from multi-specialist outputs. Adds citations. Formats for the target interface (API JSON, Slack markdown, email prose). Returns final_answer.

in: all specialist outputs
out: final_answer, citations
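
The tool-node rule above (failures come back as structured error dicts, not exceptions) can be sketched with a small decorator. The result shape (`ok`, `data`, `error`) and the names `safe_tool` and `lookup_account` are assumptions made for this example; the post does not prescribe a format.

```python
import functools
from typing import Any, Callable, Dict

def safe_tool(fn: Callable[..., Any]) -> Callable[..., Dict[str, Any]]:
    """Wrap a tool so failures are returned as data instead of raised."""
    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Dict[str, Any]:
        try:
            return {"ok": True, "data": fn(*args, **kwargs), "error": None}
        except Exception as exc:  # broad on purpose: the graph should keep running
            return {
                "ok": False,
                "data": None,
                "error": {"type": type(exc).__name__, "message": str(exc)},
            }
    return wrapper

@safe_tool
def lookup_account(account_id: str) -> dict:
    """Hypothetical CRM lookup backed by an in-memory dict."""
    accounts = {"acme": {"plan": "enterprise"}}
    return accounts[account_id]  # raises KeyError for unknown ids

found = lookup_account("acme")
missing = lookup_account("nope")
```

With this shape the specialist (or the evaluator) can branch on `ok` and decide whether to retry, rephrase the tool call, or escalate, and the error is part of the state that gets checkpointed.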

Wiring It Together

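from langgraph.graph import StateGraph, START, END
from langgraph.checkpoint.memory import MemorySaver
from langgraph.prebuilt import ToolNode, tools_condition

# Assumes the node functions, route_after_eval, and the `tools` list are defined elsewhere.
builder = StateGraph(AgentState)

# Add nodes. Names must match the strings returned by route_after_router.
builder.add_node("router",             router_node)
builder.add_node("retrieval_agent",    retrieval_agent)
builder.add_node("analysis_agent",     analysis_agent)
builder.add_node("action_agent",       action_agent)
builder.add_node("clarification_node", clarification_node)
builder.add_node("fallback_node",      fallback_node)
# One ToolNode per specialist so each tool result returns to its own caller
builder.add_node("retrieval_tools",    ToolNode(tools))
builder.add_node("analysis_tools",     ToolNode(tools))
builder.add_node("evaluator",          evaluator_node)
builder.add_node("synthesizer",        synthesizer_node)

# Entry
builder.add_edge(START, "router")

# Conditional routing after router
builder.add_conditional_edges("router", route_after_router)

# Specialist → tools when the last message has tool calls, otherwise → evaluator.
# The path map replaces a separate static edge, which would fire on every step.
builder.add_conditional_edges("retrieval_agent", tools_condition,
                              {"tools": "retrieval_tools", END: "evaluator"})
builder.add_conditional_edges("analysis_agent", tools_condition,
                              {"tools": "analysis_tools", END: "evaluator"})
builder.add_edge("retrieval_tools", "retrieval_agent")  # Return to caller
builder.add_edge("analysis_tools",  "analysis_agent")   # Return to caller

# Remaining router branches
builder.add_edge("action_agent",       "evaluator")
builder.add_edge("clarification_node", END)
builder.add_edge("fallback_node",      END)

# Evaluator: pass (→ synthesizer) or retry (→ specialist)
builder.add_conditional_edges("evaluator", route_after_eval)

# Synthesizer → END
builder.add_edge("synthesizer", END)

# Checkpointing for persistence + human-in-the-loop
checkpointer = MemorySaver()  # Use PostgresSaver in production
graph = builder.compile(
    checkpointer=checkpointer,
    interrupt_before=["action_agent"]  # Pause before destructive actions
)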
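
The wiring references `route_after_eval` without defining it. A minimal sketch follows, assuming the evaluator node writes `quality_score` and increments `retry_count` in state, and that node names match the strings returned by `route_after_router`. The threshold value is an assumption; the only number the architecture fixes is the limit of two retries.

```python
from typing import TypedDict

MAX_RETRIES = 2          # the "max 2 retries" rule from the evaluator description
QUALITY_THRESHOLD = 0.7  # assumed value; tune against your own eval set

class EvalState(TypedDict):
    """Only the fields this routing function reads."""
    intent: str
    quality_score: float
    retry_count: int

# Specialists that are safe to re-run; anything else goes straight to synthesis
RETRY_TARGETS = {"lookup": "retrieval_agent", "analysis": "analysis_agent"}

def route_after_eval(state: EvalState) -> str:
    """Pass good output on, retry weak output, and stop retrying at the limit."""
    if state["quality_score"] >= QUALITY_THRESHOLD:
        return "synthesizer"
    if state["retry_count"] >= MAX_RETRIES:
        # Retries exhausted: hand over what we have rather than loop forever
        return "synthesizer"
    return RETRY_TARGETS.get(state["intent"], "synthesizer")
```

The action path is deliberately left out of `RETRY_TARGETS` in this sketch, since re-running a node that sits behind an approval gate would prompt the reviewer a second time.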

Human-in-the-Loop: The Pattern That Makes Enterprise Trust It

Enterprise AI agents take actions — updating CRM records, sending emails, triggering workflows. These actions need human approval before execution. LangGraph's interrupt_before makes this a first-class concern, not a workaround.

# Assumes AgentState also declares pending_action and should_cancel, and that
# human_approved is a bool collected from your review UI.

# Step 1: Run until the approval gate
config = {"configurable": {"thread_id": "session-123"}}
result = graph.invoke({"user_query": user_input}, config)

# The graph pauses because of interrupt_before=["action_agent"].
# result holds the state values at the pause; graph.get_state(config).next
# names the node that is waiting to run.

# Step 2: Show pending action to human
pending_action = result["pending_action"]
display_for_approval(pending_action)

# Step 3: Resume with human decision
if human_approved:
    # Resume: passing None continues from the interrupt point
    final = graph.invoke(None, config)
else:
    # Cancel: update state before resuming; action_agent must check this flag
    graph.update_state(config, {"should_cancel": True})
    final = graph.invoke(None, config)

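The agent state at every step, including messages, tool results, and any intermediate decision a node writes to state, is persisted by the checkpointer across the human review pause. That history is your audit trail for every action the agent takes. Anything a node does not write to state (raw LLM call details, for example) has to come from tracing instead.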
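
Note that the cancel branch still resumes the graph, so the next node to run is `action_agent`. The flag only helps if that node checks it. A minimal sketch of such a guard follows; `make_action_agent`, `fake_execute`, and the `action_executed` key are hypothetical, and the executor is injected so the example runs without any external system.

```python
from typing import Any, Callable, Dict

def make_action_agent(
    execute: Callable[[Dict[str, Any]], str],
) -> Callable[[Dict[str, Any]], Dict[str, Any]]:
    """Build an action node that refuses to run when the reviewer cancelled."""
    def action_agent(state: Dict[str, Any]) -> Dict[str, Any]:
        if state.get("should_cancel"):
            # Skip the side effect entirely and record why
            return {"final_answer": "Action cancelled by reviewer.", "action_executed": False}
        result = execute(state["pending_action"])
        return {"final_answer": result, "action_executed": True}
    return action_agent

# Stand-in for a real CRM or email client; records what it was asked to do
calls = []

def fake_execute(action: Dict[str, Any]) -> str:
    calls.append(action)
    return f"Done: {action['type']}"

agent = make_action_agent(fake_execute)
approved = agent({"pending_action": {"type": "update_crm"}, "should_cancel": False})
cancelled = agent({"pending_action": {"type": "send_email"}, "should_cancel": True})
```

Keeping the guard inside the node means the decision and its outcome both end up in the checkpointed state, which is what the audit trail relies on.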

Production Deployment Checklist

✓ PostgresSaver or RedisSaver instead of MemorySaver: MemorySaver is in-process only and does not survive restarts.

✓ interrupt_before on all state-mutating nodes: any node that calls write APIs, sends messages, or modifies records.

✓ LangSmith tracing enabled: set LANGCHAIN_TRACING_V2=true and LANGCHAIN_API_KEY in the environment for full graph traces with no code changes.

✓ Max retry count in state: prevents evaluator → specialist loops from running indefinitely.

✓ Token budget per run: a hard ceiling on total tokens per graph invocation; log and alert as usage approaches it.

✓ Typed state with validation: a TypedDict is not checked at runtime, whereas a Pydantic BaseModel state schema adds runtime validation that catches type errors before they corrupt downstream nodes.
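
The token budget item is easy to leave vague, so here is one way it could look. This is a sketch under assumptions: `TokenBudget` and `TokenBudgetExceeded` are hypothetical names, the 80% warning ratio is arbitrary, and how you obtain token counts depends on your model client and is not shown.

```python
import logging

logger = logging.getLogger("token_budget")

class TokenBudgetExceeded(RuntimeError):
    """Raised when a charge would push the run past its hard ceiling."""

class TokenBudget:
    """Tracks token usage for one graph invocation."""

    def __init__(self, limit: int, warn_ratio: float = 0.8) -> None:
        self.limit = limit
        self.warn_ratio = warn_ratio
        self.used = 0

    def charge(self, tokens: int) -> bool:
        """Record usage; return True once usage is at or past the warning line."""
        if tokens < 0:
            raise ValueError("tokens must be non-negative")
        if self.used + tokens > self.limit:
            raise TokenBudgetExceeded(
                f"{self.used + tokens} tokens would exceed the limit of {self.limit}"
            )
        self.used += tokens
        near_limit = self.used >= self.limit * self.warn_ratio
        if near_limit:
            logger.warning("Token budget at %d of %d", self.used, self.limit)
        return near_limit

budget = TokenBudget(limit=1000)
first = budget.charge(500)    # well under the warning line
second = budget.charge(350)   # 850 total, past 80% of 1000
```

One option is to keep the running total in graph state and call `charge` from each LLM-calling node, so that a runaway evaluator loop fails loudly instead of quietly running up a bill.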

Frequently Asked

What is LangGraph and why is it used for multi-agent orchestration?

LangGraph is a framework for building stateful, multi-actor AI applications as directed graphs. Each node is a function; edges define control flow; state is a typed dictionary shared across all nodes. It's preferred for production systems because it makes control flow explicit, state inspectable, and execution checkpointable.

What is the difference between LangGraph and CrewAI?

LangGraph gives you explicit control over state, control flow, and checkpointing — it's a low-level orchestration primitive. CrewAI is a higher-level framework with opinionated agent roles and communication patterns. LangGraph is preferred for production systems where you need fine-grained control and debuggable execution traces. CrewAI is faster to prototype.

How do you add human-in-the-loop to a LangGraph agent?

LangGraph supports human-in-the-loop via the interrupt_before and interrupt_after parameters passed to compile(), together with a checkpointer. When the graph reaches a designated node, execution pauses and returns control to your application. You can present the pending action for human approval, then resume by calling graph.invoke(None, config) with the same thread_id in the config. State is persisted across the interruption.

What checkpointer should I use in production LangGraph?

Use PostgresSaver or RedisSaver in production. MemorySaver is in-process only and does not survive restarts. PostgresSaver persists the full agent state to a Postgres database, enabling pause/resume across server restarts, cross-session history, and complete audit trails for human-in-the-loop workflows.