We wired three Pydantic AI agents into a LangGraph workflow, hit run, and watched a research pipeline extract entities, score sentiment, and generate a validated summary — all in a single async pass with zero malformed outputs. The secret was letting each framework do exactly what it does best: LangGraph handles the when and where of execution, while Pydantic AI enforces the what at every data boundary. The result is agent systems that are simultaneously flexible and bulletproof.
If you have built multi-agent pipelines before, you know the usual tradeoff. You can have autonomous agents that adapt to messy real-world inputs, or you can have rigid pipelines that guarantee valid output formats. Pick one. We spent months stuck in that false dichotomy, chaining LLM calls together with duct-tape parsing and silent prayer. One malformed JSON response three steps deep would blow up the entire run, and the stack trace told us nothing useful.
LangGraph and Pydantic AI dissolve that tradeoff entirely. LangGraph models your workflow as a directed graph where nodes are processing steps and edges control execution flow, including conditional branches, parallel paths, and cycles. Pydantic AI wraps every agent call in schema validation, so you know exactly what shape data will have when it arrives at the next node — malformed output fails loudly at the boundary instead of corrupting later steps. Together, they give you the orchestration flexibility of a full workflow engine with the data guarantees of a strongly typed interface.
The Two Halves of the Problem
What LangGraph and Pydantic AI Actually Do
LangGraph is a framework for building stateful, multi-agent applications with LLMs. It models AI workflows as directed graphs where nodes represent distinct processing steps and edges define execution flow. Unlike simple sequential chains, LangGraph treats both nodes and transitions as first-class citizens. You can build workflows with conditional branching, parallel execution, and cycles, all while maintaining persistent state across the entire execution.
Pydantic AI brings type safety and data validation to LLM outputs. Built on the foundation of Pydantic (the same library powering FastAPI), it ensures that every piece of data flowing through your system conforms to well-defined schemas. When an LLM generates a response, Pydantic AI validates it against your models and catches errors before they propagate downstream.
The combination works because these tools occupy different layers of the stack. LangGraph decides which agent runs next, manages shared state between steps, and routes execution down different paths based on conditions. Pydantic AI ensures that at every handoff point, the data conforms to expectations. No more wondering if the LLM included all required fields. No more hoping the format is correct.
Where We Got It Wrong First
Our first attempt at combining these frameworks was a mess. We defined Pydantic models for every possible output, wired them into a LangGraph state machine, and tried to force every piece of intermediate data through strict validation. The result? A system that rejected half its own outputs. An agent would return a perfectly reasonable summary, but because it included an extra field or formatted a date slightly differently, Pydantic would throw a ValidationError and the workflow would halt.
The fix was counterintuitive: we stopped validating everything and started validating only at the boundaries between agents. Internal scratch-pad data could be loose. But every time data crossed from one agent to another, it had to pass through a Pydantic model. That single architectural decision cut our error rate by an order of magnitude while keeping the pipeline flexible enough to handle unexpected LLM behavior.
KEY INSIGHT: Validate at agent boundaries, not inside them. Strict typing between nodes plus loose handling within nodes gives you reliability without brittleness.
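To make the boundary rule concrete, here is a minimal sketch (the model and function names are illustrative, not from our codebase) of loose data inside a node and strict validation only at the handoff:

```python
from pydantic import BaseModel, Field, ValidationError
from typing import List

class AgentHandoff(BaseModel):
    """Schema enforced only where data crosses from one agent to the next."""
    summary: str
    key_points: List[str] = Field(..., min_items=1)

def finish_node(scratchpad: dict) -> AgentHandoff:
    """Inside the node, the scratchpad can hold anything; the boundary is strict."""
    try:
        # Extra keys in the scratchpad are simply ignored at the boundary
        return AgentHandoff(
            summary=scratchpad.get("draft_summary", ""),
            key_points=scratchpad.get("points", []),
        )
    except ValidationError as exc:
        # Fail loudly at the handoff, not three nodes later
        raise RuntimeError(f"Handoff rejected: {exc}") from exc
```

Anything extra in the scratchpad is ignored; anything missing or malformed fails right at the handoff, where the stack trace still means something.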
Seeing the Integration in Action
Here is what it looks like when these two frameworks work together. A research query goes in, validated data comes out, and LangGraph manages the flow between steps:
```python
from langgraph.graph import StateGraph, END
from pydantic import BaseModel, Field
from pydantic_ai import Agent
from typing import List, Dict, Optional, TypedDict

# Define our data models with Pydantic
class ResearchQuery(BaseModel):
    """Structured representation of a research query."""
    topic: str = Field(..., description="Main research topic")
    subtopics: List[str] = Field(default_factory=list, description="Specific areas to explore")
    max_sources: int = Field(default=5, ge=1, le=20, description="Maximum number of sources to find")

class ResearchResult(BaseModel):
    """Validated research output."""
    summary: str = Field(..., description="Executive summary of findings")
    key_insights: List[str] = Field(..., min_items=1, description="Main discoveries")
    sources: List[Dict[str, str]] = Field(..., description="Citations with title and URL")
    confidence_score: float = Field(..., ge=0, le=1, description="Confidence in the findings")

# Create Pydantic AI agents for each step
research_agent = Agent(
    'openai:gpt-4o',
    result_type=ResearchResult,
    system_prompt="You are a research specialist. Analyze information and extract key insights.",
)

# Define the LangGraph state
class ResearchState(TypedDict):
    query: ResearchQuery
    raw_data: Optional[str]
    research_result: Optional[ResearchResult]
    status: str

# Build the workflow graph
workflow = StateGraph(ResearchState)

# Node functions that leverage Pydantic AI agents
async def gather_data(state: ResearchState) -> Dict:
    """Gather raw data based on the research query."""
    # In a real implementation, this might call APIs or search databases
    # For now, we'll simulate with a simple prompt
    query = state["query"]
    data = f"Simulated data about {query.topic} covering {', '.join(query.subtopics)}"
    return {"raw_data": data, "status": "data_gathered"}

async def analyze_data(state: ResearchState) -> Dict:
    """Use Pydantic AI agent to analyze and structure the data."""
    result = await research_agent.run(
        f"Analyze this data and extract insights: {state['raw_data']}"
    )
    # The result is guaranteed to be a valid ResearchResult
    return {"research_result": result.data, "status": "analysis_complete"}

# Wire up the graph
workflow.add_node("gather", gather_data)
workflow.add_node("analyze", analyze_data)
workflow.add_edge("gather", "analyze")
workflow.add_edge("analyze", END)
workflow.set_entry_point("gather")

# Compile and use
app = workflow.compile()
```

Notice how naturally these pieces fit. LangGraph handles the flow — deciding what happens when, managing state between steps, routing to different paths based on conditions. Pydantic AI ensures that at every step, the data matches our schemas. The ResearchResult model guarantees that key_insights has at least one entry and confidence_score stays between 0 and 1, without a single manual check in our code.
How LangGraph Compares to Other Orchestration Approaches
To appreciate what LangGraph brings to the table, compare it with traditional approaches:
| Feature | LangGraph | Traditional Chains | Workflow Engines |
|---|---|---|---|
| State Management | Built-in, persistent across nodes | Limited, often passed manually | Varies, often external |
| Conditional Logic | Native support via edges | Requires workarounds | Usually supported |
| Parallel Execution | First-class with proper state handling | Difficult to implement | Depends on engine |
| Error Recovery | Can route to error handlers | Typically fails entire chain | Varies by implementation |
| Debugging | Visualizable graph structure | Linear, harder to trace | Often good tooling |
| LLM Integration | Designed for it | Retrofitted | Usually generic |
LangGraph was built specifically for LLM applications, not adapted from generic workflow engines. That shows in features like checkpointing (saving state for long-running workflows) and native support for the conditional logic common in agent systems.

Figure 1: LangGraph vs. Sequential Chains — LangGraph’s graph-based approach supports conditional branching and multiple paths to the same endpoint, while sequential chains force a rigid, linear flow.
Architecture: Four Layers Working Together
How the Layers Stack Up
Understanding the integration requires looking at the system as four distinct layers, each with a clear responsibility.

Figure 2: Integrated Architecture — Data flows from the client through LangGraph’s orchestration layer, where Pydantic AI validates all inputs and outputs before they interact with LLMs, tools, or storage systems.
Here is what each layer does:
- Orchestration Layer (LangGraph) — The top layer maintains application state and manages transitions between processing nodes. It decides which node to execute next based on current state, manages parallel execution when multiple paths are viable, handles checkpoints for long-running workflows, and provides retry and error recovery mechanisms.
- Validation Layer (Pydantic AI) — Every piece of data flowing between nodes passes through this layer. Pydantic AI confirms that LLM outputs conform to expected schemas, tool inputs are properly formatted before execution, state transitions only happen with valid data, and type mismatches get caught immediately instead of three steps later (see the sketch after this list).
- Execution Layer — Where the actual work happens. This layer coordinates LLM invocations with proper prompt formatting, tool execution with validated parameters, external API calls with error handling, and result aggregation from parallel operations.
- Integration Layer — The bottom layer handles all external interactions: database connections for state persistence, API integrations for external services, message queues for async processing, and monitoring infrastructure.
State Transitions and Data Flow
One of the most powerful aspects of this integration is how state transitions work. Here is a practical document analysis pipeline that extracts entities, scores sentiment, and generates summaries, with Pydantic validating every state transition:
```python
from langgraph.graph import StateGraph, END
from pydantic import BaseModel, Field, validator
from pydantic_ai import Agent
from typing import List, Dict, Optional, Literal
from enum import Enum

# Define workflow states with Pydantic models
class DocumentStatus(str, Enum):
    PENDING = "pending"
    EXTRACTING = "extracting"
    ANALYZING = "analyzing"
    REVIEWING = "reviewing"
    COMPLETE = "complete"
    ERROR = "error"

class DocumentAnalysisState(BaseModel):
    """State that flows through our document analysis workflow."""
    document_id: str
    content: str
    status: DocumentStatus = DocumentStatus.PENDING
    extracted_entities: Optional[List[str]] = None
    sentiment_score: Optional[float] = None
    summary: Optional[str] = None
    review_notes: Optional[str] = None
    error_message: Optional[str] = None

    @validator('sentiment_score')
    def validate_sentiment(cls, v):
        if v is not None and not -1 <= v <= 1:
            raise ValueError("Sentiment score must be between -1 and 1")
        return v

# Create specialized agents for each task
entity_extractor = Agent(
    'openai:gpt-4o',
    system_prompt="Extract all named entities (people, organizations, locations) from the text.",
)

sentiment_analyzer = Agent(
    'openai:gpt-4o',
    system_prompt="Analyze the sentiment of the text. Return a score between -1 (negative) and 1 (positive).",
)

summarizer = Agent(
    'openai:gpt-4o',
    system_prompt="Create a concise summary of the document's main points.",
)

# Define node functions
async def extract_entities(state: DocumentAnalysisState) -> Dict:
    """Extract entities from the document."""
    try:
        result = await entity_extractor.run(state.content)
        # Parse the result and extract entity list
        entities = parse_entities(result.data)  # Custom parsing function
        return {
            "extracted_entities": entities,
            "status": DocumentStatus.ANALYZING,
        }
    except Exception as e:
        return {
            "status": DocumentStatus.ERROR,
            "error_message": f"Entity extraction failed: {str(e)}",
        }

async def analyze_sentiment(state: DocumentAnalysisState) -> Dict:
    """Analyze document sentiment."""
    try:
        result = await sentiment_analyzer.run(state.content)
        score = float(result.data)  # Pydantic AI ensures this is valid
        return {
            "sentiment_score": score,
            "status": DocumentStatus.REVIEWING,
        }
    except Exception as e:
        return {
            "status": DocumentStatus.ERROR,
            "error_message": f"Sentiment analysis failed: {str(e)}",
        }

async def create_summary(state: DocumentAnalysisState) -> Dict:
    """Generate document summary."""
    # Include extracted entities in the summary for context
    context = f"Document contains entities: {', '.join(state.extracted_entities or [])}"
    prompt = f"{context}\n\nDocument: {state.content}"

    result = await summarizer.run(prompt)
    return {
        "summary": result.data,
        "status": DocumentStatus.COMPLETE,
    }

# Build the workflow
def create_document_workflow():
    workflow = StateGraph(DocumentAnalysisState)

    # Add nodes
    workflow.add_node("extract", extract_entities)
    workflow.add_node("sentiment", analyze_sentiment)
    workflow.add_node("summarize", create_summary)

    # Define the flow
    workflow.set_entry_point("extract")

    # Conditional routing based on status
    def route_after_extraction(state: DocumentAnalysisState) -> str:
        if state.status == DocumentStatus.ERROR:
            return END
        return "sentiment"

    workflow.add_conditional_edges("extract", route_after_extraction)
    workflow.add_edge("sentiment", "summarize")
    workflow.add_edge("summarize", END)

    return workflow.compile()

# Use the workflow
app = create_document_workflow()

# Process a document
initial_state = DocumentAnalysisState(
    document_id="doc123",
    content="Apple Inc. announced record profits in Q4 2024...",
)

result = await app.ainvoke(initial_state)
print(f"Summary: {result.summary}")
print(f"Sentiment: {result.sentiment_score}")
print(f"Entities: {result.extracted_entities}")
```

The critical detail here is how failures get handled gracefully. If entity extraction fails, the workflow routes to an error state instead of crashing the entire pipeline. The state is preserved, so you can resume from where things went wrong. No partial results lost, no silent corruption.
Transition Mechanisms: Conditional Routing and Map-Reduce
LangGraph provides several powerful patterns for controlling flow through your workflow:

Figure 3: Transition Mechanisms in LangGraph — Conditional edges handle dynamic routing while map-reduce handles parallel processing. These two patterns cover the vast majority of real-world workflow needs.
Here is a map-reduce pattern for parallel document processing. We fan out across multiple documents, process each one through the full analysis pipeline, then aggregate the results:
```python
from langgraph.graph import StateGraph
from pydantic import BaseModel, Field
from pydantic_ai import Agent
from typing import List, Dict, Optional
import asyncio

class BatchProcessingState(BaseModel):
    """State for processing multiple documents in parallel."""
    documents: List[Dict[str, str]]
    processed_results: List[Dict] = Field(default_factory=list)
    aggregated_summary: Optional[str] = None

async def map_documents(state: BatchProcessingState) -> Dict:
    """Map phase - process each document in parallel."""
    async def process_single_doc(doc):
        # Create individual workflow state for each document
        doc_state = DocumentAnalysisState(
            document_id=doc["id"],
            content=doc["content"],
        )

        # Process through our document workflow
        result = await app.ainvoke(doc_state)
        return result.dict()

    # Process all documents in parallel
    tasks = [process_single_doc(doc) for doc in state.documents]
    results = await asyncio.gather(*tasks)

    return {"processed_results": results}

async def reduce_results(state: BatchProcessingState) -> Dict:
    """Reduce phase - aggregate results into final summary."""
    # Collect all summaries
    summaries = [r["summary"] for r in state.processed_results if r.get("summary")]

    # Use an agent to create an aggregated summary
    aggregator = Agent(
        'openai:gpt-4o',
        system_prompt="Create a comprehensive summary combining multiple document summaries.",
    )

    combined_text = "\n\n".join(f"Document {i+1}: {s}" for i, s in enumerate(summaries))
    result = await aggregator.run(combined_text)

    return {"aggregated_summary": result.data}

# Create batch processing workflow
batch_workflow = StateGraph(BatchProcessingState)
batch_workflow.add_node("map", map_documents)
batch_workflow.add_node("reduce", reduce_results)
batch_workflow.add_edge("map", "reduce")
batch_workflow.set_entry_point("map")

batch_app = batch_workflow.compile()
```

Instead of processing documents one at a time, you fan out across all of them simultaneously while still maintaining type safety and validation at every step. For a batch of 20 documents, this can cut total processing time from minutes to seconds.
KEY INSIGHT: Map-reduce is the natural scaling pattern for LangGraph workflows. Fan out for parallel agent calls, fan in for aggregation, and let Pydantic validate at both boundaries.
Building Production-Ready Systems
Error Handling and Recovery
LLMs time out. APIs go down. Sometimes the output just does not make sense. Our first production deployment taught us this the hard way: a workflow that ran perfectly for 200 test documents started failing on document 47 in a production batch because a network hiccup caused a timeout that cascaded into a full pipeline crash. We lost 46 documents worth of already-completed work.
The fix required three layers of resilience: retries with exponential backoff, fallback to cheaper models when the primary fails, and partial result preservation so a failure at step 3 does not destroy the work from steps 1 and 2.
```python
from langgraph.graph import StateGraph, END
from pydantic import BaseModel, Field
from pydantic_ai import Agent
from typing import Optional, List, Dict
import asyncio
from datetime import datetime

class ResilientState(BaseModel):
    """State with built-in error tracking and recovery."""
    task_id: str
    input_data: str
    current_step: str = "start"
    retry_count: int = 0
    max_retries: int = 3
    errors: List[Dict] = Field(default_factory=list)
    partial_results: Dict = Field(default_factory=dict)
    final_result: Optional[str] = None

    def record_error(self, step: str, error: Exception):
        """Record an error for debugging and recovery."""
        self.errors.append({
            "step": step,
            "error": str(error),
            "timestamp": datetime.now().isoformat(),
            "retry_count": self.retry_count,
        })

class RetryableAgent:
    """Wrapper for Pydantic AI agents with retry logic."""

    def __init__(self, agent: Agent, max_retries: int = 3, backoff_factor: float = 2.0):
        self.agent = agent
        self.max_retries = max_retries
        self.backoff_factor = backoff_factor

    async def run_with_retry(self, prompt: str, state: ResilientState) -> Optional[str]:
        """Execute agent with exponential backoff retry."""
        for attempt in range(self.max_retries):
            try:
                result = await self.agent.run(prompt)
                return result.data
            except Exception as e:
                state.record_error(state.current_step, e)

                if attempt < self.max_retries - 1:
                    # Exponential backoff
                    wait_time = self.backoff_factor ** attempt
                    await asyncio.sleep(wait_time)
                else:
                    # Final attempt failed
                    return None

        return None

# Create resilient workflow
def create_resilient_workflow():
    workflow = StateGraph(ResilientState)

    # Initialize retryable agents
    analyzer = RetryableAgent(
        Agent('openai:gpt-4o', system_prompt="Analyze the provided text.")
    )

    enhancer = RetryableAgent(
        Agent('openai:gpt-4o', system_prompt="Enhance and improve the analysis.")
    )

    async def analyze_with_fallback(state: ResilientState) -> Dict:
        """Analyze with fallback options."""
        state.current_step = "analysis"

        # Try primary analysis
        result = await analyzer.run_with_retry(state.input_data, state)

        if result:
            return {
                "partial_results": {"analysis": result},
                "current_step": "enhancement",
            }

        # Fallback to simpler analysis
        fallback_agent = Agent(
            'openai:gpt-3.5-turbo',
            system_prompt="Provide a basic analysis of the text.",
        )

        fallback_result = await fallback_agent.run(state.input_data)
        return {
            "partial_results": {"analysis": fallback_result.data, "used_fallback": True},
            "current_step": "enhancement",
        }

    async def enhance_with_recovery(state: ResilientState) -> Dict:
        """Enhance analysis with error recovery."""
        state.current_step = "enhancement"

        if not state.partial_results.get("analysis"):
            return {
                "final_result": "Analysis failed - no results to enhance",
                "current_step": "complete",
            }

        analysis = state.partial_results["analysis"]
        result = await enhancer.run_with_retry(
            f"Enhance this analysis: {analysis}", state
        )

        if result:
            return {
                "final_result": result,
                "current_step": "complete",
            }

        # If enhancement fails, return original analysis
        return {
            "final_result": analysis,
            "current_step": "complete",
        }

    # Build workflow with error paths
    workflow.add_node("analyze", analyze_with_fallback)
    workflow.add_node("enhance", enhance_with_recovery)

    workflow.set_entry_point("analyze")
    workflow.add_edge("analyze", "enhance")
    workflow.add_edge("enhance", END)

    return workflow.compile()
```

This gives you multiple layers of resilience stacked together: automatic retries with exponential backoff, fallback to cheaper models if the primary fails, partial result preservation so you never lose completed work, and detailed error tracking for post-mortem debugging.
State Management That Scales
Managing state effectively is the difference between a demo and a production system. The key challenge: as workflows grow, serializing and deserializing the entire state at every checkpoint becomes a bottleneck. We learned to keep the in-flight state minimal and shunt large data to external storage.
```python
from langgraph.graph import StateGraph, END
from langgraph.checkpoint.memory import MemorySaver
from pydantic import BaseModel, Field
from typing import Any, Dict, List, Optional
import json

class CheckpointableState(BaseModel):
    """State designed for efficient checkpointing."""

    # Minimal core state
    workflow_id: str
    current_phase: str

    # Separate large data that can be stored externally
    data_references: Dict[str, str] = Field(
        default_factory=dict,
        description="References to external data storage",
    )

    # Computed properties that don't need persistence
    _cache: Dict = {}  # Not persisted (strict Pydantic code would use a PrivateAttr)

    class Config:
        # Only persist defined fields
        fields = {"workflow_id", "current_phase", "data_references"}

    def store_large_data(self, key: str, data: Any) -> str:
        """Store large data externally and keep reference."""
        # In production, this would use S3, Redis, etc.
        reference = f"s3://bucket/{self.workflow_id}/{key}"
        # Simulate storage
        self._cache[reference] = data
        self.data_references[key] = reference
        return reference

    def retrieve_data(self, key: str) -> Any:
        """Retrieve data using reference."""
        reference = self.data_references.get(key)
        if reference:
            # In production, fetch from external storage
            return self._cache.get(reference)
        return None

# Create workflow with checkpointing
def create_checkpointed_workflow():
    # Initialize with memory saver for checkpointing
    checkpointer = MemorySaver()

    workflow = StateGraph(CheckpointableState)

    async def process_phase_1(state: CheckpointableState) -> Dict:
        """First phase - might take a long time."""
        # Simulate processing
        result = {"processed": "large amount of data"}

        # Store large result externally
        state.store_large_data("phase1_result", result)

        return {
            "current_phase": "phase2",
            "data_references": state.data_references,
        }

    async def process_phase_2(state: CheckpointableState) -> Dict:
        """Second phase - uses results from phase 1."""
        # Retrieve previous results
        phase1_data = state.retrieve_data("phase1_result")

        if not phase1_data:
            raise ValueError("Phase 1 data not found")

        # Continue processing
        final_result = f"Completed processing of {phase1_data}"

        return {
            "current_phase": "complete",
        }

    workflow.add_node("phase1", process_phase_1)
    workflow.add_node("phase2", process_phase_2)

    workflow.set_entry_point("phase1")
    workflow.add_edge("phase1", "phase2")
    workflow.add_edge("phase2", END)

    # Compile with checkpointing
    return workflow.compile(checkpointer=checkpointer)

# Use with checkpointing
app = create_checkpointed_workflow()

# Start processing
initial_state = CheckpointableState(
    workflow_id="job-123",
    current_phase="phase1",
)

# This can be interrupted and resumed
config = {"configurable": {"thread_id": "job-123"}}
result = await app.ainvoke(initial_state, config=config)

# Later, resume from checkpoint if needed
# resumed_result = await app.ainvoke(None, config=config)
```

Multi-Agent Orchestration
Real-world applications often require multiple specialized agents collaborating on a single task. Here is a content creation pipeline where a research agent feeds a writing agent, a fact-checker verifies the output, and the workflow can loop back to research if accuracy falls below threshold:

Figure 4: Multi-Agent Orchestration Workflow — Three specialized agents (research, writing, fact-checking) collaborate in a LangGraph workflow. Pydantic AI validates data at every handoff point, and conditional edges route the flow back to research if fact-checking scores drop below 0.7.
```python
from langgraph.graph import StateGraph, END
from pydantic import BaseModel, Field
from pydantic_ai import Agent
from typing import List, Dict, Optional

# Define specialized schemas for each agent
class ResearchFindings(BaseModel):
    """Output from research agent."""
    topic: str
    key_facts: List[str] = Field(..., min_items=3, max_items=10)
    sources: List[str] = Field(..., min_items=1)
    confidence: float = Field(..., ge=0, le=1)

class WrittenContent(BaseModel):
    """Output from writing agent."""
    title: str
    introduction: str
    body_paragraphs: List[str] = Field(..., min_items=2)
    conclusion: str
    word_count: int

class FactCheckReport(BaseModel):
    """Output from fact-checking agent."""
    verified_facts: List[Dict[str, bool]]
    accuracy_score: float = Field(..., ge=0, le=1)
    corrections_needed: List[str] = Field(default_factory=list)
    recommendation: str

# Create specialized agents
research_agent = Agent(
    'openai:gpt-4o',
    result_type=ResearchFindings,
    system_prompt="""You are a meticulous research specialist.
    Find accurate, relevant information and always cite sources.""",
)

writing_agent = Agent(
    'openai:gpt-4o',
    result_type=WrittenContent,
    system_prompt="""You are a skilled content writer.
    Create engaging, well-structured content based on research findings.""",
)

fact_check_agent = Agent(
    'openai:gpt-4o',
    result_type=FactCheckReport,
    system_prompt="""You are a fact-checking expert.
    Verify claims, check sources, and ensure accuracy.""",
)

# Define the multi-agent workflow state
class ContentCreationState(BaseModel):
    """State for multi-agent content creation workflow."""
    user_request: str
    research_findings: Optional[ResearchFindings] = None
    written_content: Optional[WrittenContent] = None
    fact_check_report: Optional[FactCheckReport] = None
    final_content: Optional[str] = None
    workflow_status: str = "started"

# Implement workflow nodes
async def conduct_research(state: ContentCreationState) -> Dict:
    """Research phase using specialized agent."""
    result = await research_agent.run(
        f"Research this topic thoroughly: {state.user_request}"
    )

    return {
        "research_findings": result.data,
        "workflow_status": "research_complete",
    }

async def write_content(state: ContentCreationState) -> Dict:
    """Writing phase using research findings."""
    research = state.research_findings

    # Prepare context for writing agent
    context = f"""
    Topic: {research.topic}
    Key Facts:
    {chr(10).join(f'- {fact}' for fact in research.key_facts)}

    Sources: {', '.join(research.sources)}

    Write comprehensive content about this topic.
    """

    result = await writing_agent.run(context)

    return {
        "written_content": result.data,
        "workflow_status": "writing_complete",
    }

async def fact_check_content(state: ContentCreationState) -> Dict:
    """Fact-checking phase."""
    content = state.written_content
    research = state.research_findings

    # Prepare fact-checking context
    check_context = f"""
    Original Research Facts:
    {chr(10).join(f'- {fact}' for fact in research.key_facts)}

    Written Content to Verify:
    Title: {content.title}
    Introduction: {content.introduction}
    Body: {chr(10).join(content.body_paragraphs)}

    Verify all claims against the research facts.
    """

    result = await fact_check_agent.run(check_context)

    return {
        "fact_check_report": result.data,
        "workflow_status": "fact_check_complete",
    }

async def finalize_content(state: ContentCreationState) -> Dict:
    """Final phase - incorporate fact-checking feedback."""
    content = state.written_content
    fact_check = state.fact_check_report

    if fact_check.accuracy_score >= 0.9:
        # Content is accurate enough
        final = f"{content.title}\n\n{content.introduction}\n\n"
        final += "\n\n".join(content.body_paragraphs)
        final += f"\n\n{content.conclusion}"
    else:
        # Need to revise based on fact-checking
        revision_agent = Agent(
            'openai:gpt-4o',
            system_prompt="Revise content to address fact-checking concerns.",
        )

        revision_context = f"""
        Original content needs revision.

        Corrections needed:
        {chr(10).join(f'- {correction}' for correction in fact_check.corrections_needed)}

        Original content:
        {content.title}
        {content.introduction}
        {chr(10).join(content.body_paragraphs)}

        Revise to ensure accuracy.
        """

        revision = await revision_agent.run(revision_context)
        final = revision.data

    return {
        "final_content": final,
        "workflow_status": "complete",
    }

# Build the multi-agent workflow
def create_content_workflow():
    workflow = StateGraph(ContentCreationState)

    # Add all nodes
    workflow.add_node("research", conduct_research)
    workflow.add_node("write", write_content)
    workflow.add_node("fact_check", fact_check_content)
    workflow.add_node("finalize", finalize_content)

    # Define the flow
    workflow.set_entry_point("research")
    workflow.add_edge("research", "write")
    workflow.add_edge("write", "fact_check")

    # Conditional edge based on fact-checking results
    def route_after_fact_check(state: ContentCreationState) -> str:
        if state.fact_check_report.accuracy_score < 0.7:
            # Too many errors, go back to research
            return "research"
        return "finalize"

    workflow.add_conditional_edges(
        "fact_check",
        route_after_fact_check,
        {
            "research": "research",
            "finalize": "finalize",
        },
    )

    workflow.add_edge("finalize", END)

    return workflow.compile()

# Use the multi-agent system
content_app = create_content_workflow()

# Create content on any topic
initial_request = ContentCreationState(
    user_request="Write about the latest advances in quantum computing"
)

result = await content_app.ainvoke(initial_request)
print(f"Final content:\n{result.final_content}")
```

Several patterns make this system robust. Each agent has a specialized role with its own validation schema. Data flows between agents are type-safe and validated. The workflow can loop back if quality checks fail. And the entire system is modular — you can swap agents or add new ones without touching the rest of the graph.
Performance Optimization
Parallel Processing at Scale
When production workloads arrive, performance becomes the constraint that matters. The research platform below demonstrates how to split a complex query into independent subtopics, research them all in parallel, and synthesize the results:

Figure 5: Research Platform Architecture — Multiple topics are researched simultaneously through parallel async calls, with results flowing into a synthesis phase that combines findings into a unified report.
```python
from langgraph.graph import StateGraph, END
from pydantic import BaseModel, Field
from pydantic_ai import Agent
from typing import List, Dict, Optional
import asyncio

class TopicResearchTask(BaseModel):
    """Individual research task."""
    topic_id: str
    topic_name: str
    search_queries: List[str]
    max_sources: int = 5

class TopicResearchResult(BaseModel):
    """Results from researching a single topic."""
    topic_id: str
    topic_name: str
    findings: List[str]
    sources: List[str]
    relevance_score: float

class ResearchPlatformState(BaseModel):
    """State for high-performance research platform."""
    main_query: str
    generated_topics: List[TopicResearchTask] = Field(default_factory=list)
    research_results: List[TopicResearchResult] = Field(default_factory=list)
    synthesis: Optional[str] = None
    performance_metrics: Dict = Field(default_factory=dict)

# Specialized agents
topic_generator = Agent(
    'openai:gpt-4o',
    system_prompt="""Break down complex research queries into specific,
    researchable subtopics. Each topic should be focused and independent.""",
)

topic_researcher = Agent(
    'openai:gpt-4o',
    result_type=TopicResearchResult,
    system_prompt="Research the given topic thoroughly and provide structured findings.",
)

synthesis_agent = Agent(
    'openai:gpt-4o',
    system_prompt="""Synthesize multiple research findings into a comprehensive,
    coherent report that addresses the original query.""",
)

# High-performance workflow implementation
async def generate_research_topics(state: ResearchPlatformState) -> Dict:
    """Generate parallel research topics."""
    import time
    start_time = time.time()

    # Generate topics using the agent
    prompt = f"""
    Break down this research query into 3-5 independent subtopics:
    {state.main_query}

    For each topic, provide:
    1. A focused topic name
    2. 2-3 specific search queries
    """

    result = await topic_generator.run(prompt)

    # Parse result into structured topics
    # In production, you'd have more robust parsing
    topics = []
    for i in range(3):  # Simplified for example
        topics.append(TopicResearchTask(
            topic_id=f"topic_{i}",
            topic_name=f"Subtopic {i+1}",
            search_queries=[f"query {i}.1", f"query {i}.2"],
        ))

    generation_time = time.time() - start_time

    return {
        "generated_topics": topics,
        "performance_metrics": {"topic_generation_time": generation_time},
    }

async def parallel_research(state: ResearchPlatformState) -> Dict:
    """Research all topics in parallel."""
    import time
    start_time = time.time()

    async def research_single_topic(topic: TopicResearchTask) -> TopicResearchResult:
        """Research a single topic asynchronously."""
        # Simulate API calls or database queries
        search_results = await asyncio.gather(*[
            simulate_search(query) for query in topic.search_queries
        ])

        # Use agent to analyze search results
        context = f"""
        Topic: {topic.topic_name}
        Search Results: {search_results}

        Analyze and summarize the findings.
        """

        result = await topic_researcher.run(context)
        return result.data

    # Execute all topic research in parallel
    research_tasks = [
        research_single_topic(topic) for topic in state.generated_topics
    ]

    results = await asyncio.gather(*research_tasks)

    research_time = time.time() - start_time
    metrics = state.performance_metrics.copy()
    metrics["parallel_research_time"] = research_time
    metrics["topics_researched"] = len(results)

    return {
        "research_results": results,
        "performance_metrics": metrics,
    }

async def synthesize_findings(state: ResearchPlatformState) -> Dict:
    """Synthesize all research into final report."""
    # Prepare synthesis context
    findings_text = "\n\n".join([
        f"Topic: {r.topic_name}\n" + "\n".join(f"- {finding}" for finding in r.findings)
        for r in state.research_results
    ])

    synthesis_prompt = f"""
    Original Query: {state.main_query}

    Research Findings:
    {findings_text}

    Create a comprehensive synthesis that addresses the original query.
    """

    result = await synthesis_agent.run(synthesis_prompt)

    return {"synthesis": result.data}

# Simulate external operations
async def simulate_search(query: str) -> str:
    """Simulate an external search operation."""
    await asyncio.sleep(0.1)  # Simulate network delay
    return f"Results for '{query}'"

# Build the high-performance workflow
def create_research_platform():
    workflow = StateGraph(ResearchPlatformState)

    workflow.add_node("generate_topics", generate_research_topics)
    workflow.add_node("parallel_research", parallel_research)
    workflow.add_node("synthesize", synthesize_findings)

    workflow.set_entry_point("generate_topics")
    workflow.add_edge("generate_topics", "parallel_research")
    workflow.add_edge("parallel_research", "synthesize")
    workflow.add_edge("synthesize", END)

    return workflow.compile()

# Performance monitoring wrapper
async def run_with_monitoring(app, state):
    """Run workflow with performance monitoring."""
    import time

    start_time = time.time()
    result = await app.ainvoke(state)
    total_time = time.time() - start_time

    print("Performance Metrics:")
    print(f"- Total execution time: {total_time:.2f}s")
    print(f"- Topic generation: {result.performance_metrics['topic_generation_time']:.2f}s")
    print(f"- Parallel research: {result.performance_metrics['parallel_research_time']:.2f}s")
    print(f"- Topics processed: {result.performance_metrics['topics_researched']}")
    print(f"- Speedup vs sequential: {result.performance_metrics['topics_researched']:.1f}x")

    return result
```

Four Optimization Strategies That Matter
Beyond parallelization, there are four areas that consistently deliver performance gains in production LangGraph + Pydantic AI systems:

Figure 6: Performance Optimization Strategies — Four pillars of optimization: state minimization to reduce checkpoint overhead, validation caching to avoid redundant schema checks, parallelization to maximize throughput, and response caching to eliminate duplicate LLM calls.
Here are implementations of each strategy:
```python
from functools import lru_cache
from typing import Any, Dict, List, Optional
from pydantic_ai import Agent
import asyncio
import hashlib
import json

class OptimizedWorkflowComponents:
    """Collection of optimization techniques for LangGraph + Pydantic AI."""

    def __init__(self):
        self.validation_cache = {}
        self.llm_response_cache = {}

    # 1. State Optimization
    def minimize_state(self, state: Dict[str, Any]) -> Dict[str, Any]:
        """Remove unnecessary data from state before transitions."""
        # Define fields that need to persist
        essential_fields = {'id', 'status', 'current_data', 'next_step'}

        # Store large data externally and keep references
        minimized = {}
        for key, value in state.items():
            if key in essential_fields:
                minimized[key] = value
            elif isinstance(value, (list, dict)) and len(str(value)) > 1000:
                # Store large objects externally
                ref = self.store_large_object(value)
                minimized[f"{key}_ref"] = ref

        return minimized

    def store_large_object(self, obj: Any) -> str:
        """Store large object and return reference."""
        # In production, use S3, Redis, etc.
        obj_hash = hashlib.md5(json.dumps(obj, sort_keys=True).encode()).hexdigest()
        # Simulate storage
        return f"ref://{obj_hash}"

    # 2. Validation Optimization
    @lru_cache(maxsize=1000)
    def cached_validation(self, model_class: type, data_hash: str) -> bool:
        """Cache validation results for repeated data."""
        # This would be called with a hash of the data
        # In practice, implement actual validation
        return True

    # 3. Parallel Agent Execution
    async def parallel_agent_calls(self, agents: List[Agent], prompts: List[str]) -> List[Any]:
        """Execute multiple agents in parallel."""
        tasks = [
            agent.run(prompt)
            for agent, prompt in zip(agents, prompts)
        ]

        results = await asyncio.gather(*tasks, return_exceptions=True)

        # Handle any failures
        processed_results = []
        for result in results:
            if isinstance(result, Exception):
                # Fallback or retry logic
                processed_results.append(None)
            else:
                processed_results.append(result.data)

        return processed_results

    # 4. Response Caching
    def get_cached_response(self, agent_id: str, prompt: str) -> Optional[str]:
        """Check if we have a cached response for this prompt."""
        cache_key = f"{agent_id}:{hashlib.md5(prompt.encode()).hexdigest()}"
        return self.llm_response_cache.get(cache_key)

    def cache_response(self, agent_id: str, prompt: str, response: str):
        """Cache an LLM response."""
        cache_key = f"{agent_id}:{hashlib.md5(prompt.encode()).hexdigest()}"
        self.llm_response_cache[cache_key] = response

# Example: Optimized document processing workflow
class OptimizedDocumentProcessor:
    """Document processor with performance optimizations."""

    def __init__(self):
        self.optimizer = OptimizedWorkflowComponents()
        self.summary_agent = Agent(
            'openai:gpt-3.5-turbo',  # Cheaper, faster model
            system_prompt="Summarize documents concisely.",
        )
        self.analysis_agent = Agent(
            'openai:gpt-4o',  # More capable model for complex analysis
            system_prompt="Provide deep analysis of document content.",
        )

    async def process_documents_batch(self, documents: List[str]) -> Dict:
        """Process multiple documents with optimizations."""
        # 1. Use cheaper model for initial filtering
        summaries = await self.optimizer.parallel_agent_calls(
            [self.summary_agent] * len(documents),
            [f"Summarize: {doc[:500]}" for doc in documents],  # Only send first 500 chars
        )

        # 2. Filter documents that need deep analysis
        documents_for_analysis = []
        for i, (doc, summary) in enumerate(zip(documents, summaries)):
            if self.needs_deep_analysis(summary):
                documents_for_analysis.append((i, doc))

        # 3. Parallel deep analysis only for selected documents
        analyses = []  # stays empty if nothing needs deep analysis
        if documents_for_analysis:
            analysis_prompts = [
                f"Analyze this document in detail: {doc}"
                for _, doc in documents_for_analysis
            ]

            analyses = await self.optimizer.parallel_agent_calls(
                [self.analysis_agent] * len(documents_for_analysis),
                analysis_prompts,
            )

        # 4. Combine results
        results = {
            "total_documents": len(documents),
            "analyzed_documents": len(documents_for_analysis),
            "summaries": summaries,
            "detailed_analyses": {
                i: analysis
                for (i, _), analysis in zip(documents_for_analysis, analyses)
            },
        }

        return results

    def needs_deep_analysis(self, summary: str) -> bool:
        """Determine if document needs deep analysis based on summary."""
        if not summary:
            return False  # a failed summary call means we skip deep analysis
        # Simple heuristic - in practice, use more sophisticated logic
        keywords = ['important', 'critical', 'urgent', 'complex']
        return any(keyword in summary.lower() for keyword in keywords)
```

KEY INSIGHT: Use a cheap, fast model (like GPT-3.5) for triage and filtering, then route only the documents that need it to an expensive model (like GPT-4o). This two-tier approach can cut LLM costs by 60-80% on batch workloads.
Real-World Application: E-commerce Order Processing
To ground these patterns in something concrete, here is a full e-commerce order processing system. It handles inventory checks, payment processing, and shipping coordination as a single LangGraph workflow with Pydantic validation at every step:
```python
from langgraph.graph import StateGraph, END
from pydantic import BaseModel, Field, validator
from pydantic_ai import Agent
from typing import List, Dict, Optional, Literal
from datetime import datetime
from decimal import Decimal

# Domain models
class OrderItem(BaseModel):
    """Individual item in an order."""
    product_id: str
    quantity: int = Field(gt=0)
    unit_price: Decimal = Field(gt=0, decimal_places=2)

    @property
    def total_price(self) -> Decimal:
        return self.quantity * self.unit_price

class Order(BaseModel):
    """E-commerce order with validation."""
    order_id: str
    customer_id: str
    items: List[OrderItem] = Field(min_items=1)
    shipping_address: str
    billing_address: str
    payment_method: Literal["credit_card", "paypal", "bank_transfer"]
    status: str = "pending"

    @property
    def total_amount(self) -> Decimal:
        return sum(item.total_price for item in self.items)

    @validator('items')
    def validate_items(cls, items):
        # Check for duplicate products
        product_ids = [item.product_id for item in items]
        if len(product_ids) != len(set(product_ids)):
            raise ValueError("Duplicate products in order")
        return items

class OrderProcessingState(BaseModel):
    """State for order processing workflow."""
    order: Order
    inventory_checked: bool = False
    payment_verified: bool = False
    shipping_arranged: bool = False
    notifications_sent: List[str] = Field(default_factory=list)
    processing_errors: List[str] = Field(default_factory=list)

# Specialized agents
inventory_agent = Agent(
    'openai:gpt-4o',
    system_prompt="""You are an inventory management specialist.
    Check product availability and reserve items.""",
)

payment_agent = Agent(
    'openai:gpt-4o',
    system_prompt="""You are a payment processing specialist.
    Verify payment methods and process transactions securely.""",
)

shipping_agent = Agent(
    'openai:gpt-4o',
    system_prompt="""You are a shipping coordinator.
    Arrange optimal shipping based on destination and items.""",
)

# Workflow implementation
async def check_inventory(state: OrderProcessingState) -> Dict:
    """Check inventory for all items."""
    # In production, this would query real inventory systems
    inventory_query = "\n".join([
        f"Product {item.product_id}: {item.quantity} units"
        for item in state.order.items
    ])

    result = await inventory_agent.run(
        f"Check availability for:\n{inventory_query}"
    )

    # Simulate inventory check
    all_available = True  # In reality, parse agent response

    if all_available:
        return {
            "inventory_checked": True,
            "order": {**state.order.dict(), "status": "inventory_confirmed"},
        }
    else:
        return {
            "processing_errors": state.processing_errors + ["Insufficient inventory"],
            "order": {**state.order.dict(), "status": "failed"},
        }

async def process_payment(state: OrderProcessingState) -> Dict:
    """Process payment for the order."""
    if not state.inventory_checked:
        return {
            "processing_errors": state.processing_errors + ["Cannot process payment before inventory check"]
        }

    payment_context = f"""
    Order Total: ${state.order.total_amount}
    Payment Method: {state.order.payment_method}
    Customer ID: {state.order.customer_id}
    """

    result = await payment_agent.run(
        f"Process payment:\n{payment_context}"
    )

    # Simulate payment processing
    payment_successful = True  # In reality, integrate with payment gateway

    if payment_successful:
        return {
            "payment_verified": True,
            "order": {**state.order.dict(), "status": "payment_confirmed"},
            "notifications_sent": state.notifications_sent + ["payment_confirmation"],
        }
    else:
        return {
            "processing_errors": state.processing_errors + ["Payment failed"],
            "order": {**state.order.dict(), "status": "payment_failed"},
        }

async def arrange_shipping(state: OrderProcessingState) -> Dict:
    """Arrange shipping for the order."""
    if not state.payment_verified:
        return {
            "processing_errors": state.processing_errors + ["Cannot ship before payment"]
        }

    shipping_context = f"""
    Destination: {state.order.shipping_address}
    Items: {len(state.order.items)} items, Total weight: TBD
    Priority: Standard
    """

    result = await shipping_agent.run(
        f"Arrange shipping:\n{shipping_context}"
    )

    return {
        "shipping_arranged": True,
        "order": {**state.order.dict(), "status": "shipped"},
        "notifications_sent": state.notifications_sent + ["shipping_confirmation"],
    }

# Build the order processing workflow
def create_order_workflow():
    workflow = StateGraph(OrderProcessingState)

    workflow.add_node("inventory", check_inventory)
    workflow.add_node("payment", process_payment)
    workflow.add_node("shipping", arrange_shipping)

    # Define flow with conditional routing
    workflow.set_entry_point("inventory")

    def route_after_inventory(state: OrderProcessingState) -> str:
        if state.inventory_checked and not state.processing_errors:
            return "payment"
        return END

    def route_after_payment(state: OrderProcessingState) -> str:
        if state.payment_verified and not state.processing_errors:
            return "shipping"
        return END

    workflow.add_conditional_edges("inventory", route_after_inventory)
    workflow.add_conditional_edges("payment", route_after_payment)
    workflow.add_edge("shipping", END)

    return workflow.compile()

# Usage example
order_processor = create_order_workflow()

# Process an order
test_order = Order(
    order_id="ORD-12345",
    customer_id="CUST-789",
    items=[
        OrderItem(product_id="PROD-A", quantity=2, unit_price=Decimal("29.99")),
        OrderItem(product_id="PROD-B", quantity=1, unit_price=Decimal("49.99")),
    ],
    shipping_address="123 Main St, Anytown, USA",
    billing_address="123 Main St, Anytown, USA",
    payment_method="credit_card",
)

initial_state = OrderProcessingState(order=test_order)
result = await order_processor.ainvoke(initial_state)

print(f"Order Status: {result.order.status}")
print(f"Notifications sent: {result.notifications_sent}")
```

This example demonstrates the full integration pattern at work: complex business logic with multiple steps, Pydantic validation at every stage, conditional routing based on success or failure, integration points for external systems, and comprehensive error handling that prevents partial failures from corrupting the order state.
Benefits, Challenges, and Honest Tradeoffs
What You Gain
After building several production systems with this stack, five advantages stand out:
- Workflow flexibility — LangGraph’s graph-based approach lets you model complex workflows naturally. Need to add a fraud check step? Add a node. Want parallel processing? Add parallel edges. The graph evolves with your requirements.
- Type safety throughout — Pydantic’s validation ensures data integrity at every step. You are not hoping the LLM returns the right format. You are guaranteeing it.
- Debuggable failures — When something goes wrong, you can see exactly which node failed and why. The combination of structured logs and validated data makes debugging tractable instead of maddening.
- Reusable components — Both frameworks encourage modular design. Agents can be reused across workflows, and workflow patterns can be templated for similar use cases.
- Horizontal scalability — The graph structure naturally supports scaling out. Different nodes can run on different machines, and the state management keeps everything coordinated.
What You Will Struggle With
The challenges are real too:
- Learning curve — If you are coming from simple LLM chains, graph-based thinking requires adjustment. Start with simple linear workflows and gradually add complexity.
- State management complexity — As workflows grow, managing state becomes the hardest part of the system. Use checkpointing features and external storage for large data from the beginning, not as an afterthought.
- Performance overhead — Validation and state management add latency. Profile your workflows and optimize critical paths with caching and parallelization.
- Testing difficulty — Testing stateful workflows is harder than testing individual functions. Build comprehensive test suites that cover both happy paths and edge cases, and invest in deterministic mock agents for CI.
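One way to get those deterministic mock agents is a small stub that mimics the `run()` interface; this sketch assumes your node functions only depend on `await agent.run(prompt)` returning an object with a `.data` attribute, so the stub can be injected in place of a real agent during CI:

```python
import asyncio
from dataclasses import dataclass

@dataclass
class StubResult:
    data: str

class StubAgent:
    """Deterministic stand-in for a Pydantic AI agent in unit tests."""
    def __init__(self, canned_response: str):
        self.canned_response = canned_response
        self.calls: list[str] = []

    async def run(self, prompt: str) -> StubResult:
        self.calls.append(prompt)  # record prompts for assertions
        return StubResult(data=self.canned_response)

async def test_summarize_node():
    agent = StubAgent("canned summary")
    # Imagine injecting this stub wherever the real summarizer would be used
    result = await agent.run("Summarize: hello world")
    assert result.data == "canned summary"
    assert agent.calls == ["Summarize: hello world"]

asyncio.run(test_summarize_node())
```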
What Comes Next
Patterns on the Horizon
Four emerging patterns are worth watching:
- Adaptive workflows — Systems that modify their own graph topology based on performance metrics or user feedback.
- Cross-organization workflows — Federated graphs that span multiple organizations while maintaining security and privacy boundaries.
- Real-time collaboration — Multiple agents and humans working together in the same workflow with live state updates.
- Advanced state persistence — More sophisticated checkpointing with automatic recovery and schema migration capabilities.
Getting Started: A Practical Checklist
If you are ready to build production AI systems with LangGraph and Pydantic AI, here is what we recommend based on our own trial-and-error:
- Sketch the workflow as a graph first — Before writing any code, draw out your nodes, edges, and state requirements. The visual representation catches design flaws early.
- Define your Pydantic models before your agents — Comprehensive data models for all inter-node data is an investment that pays off within the first week.
- Build incrementally — Start with a simple linear workflow. Add conditional edges, then parallel execution, then error recovery. Test thoroughly at each step.
- Instrument everything — Track not just errors but also performance metrics, state transitions, and agent interactions. You cannot optimize what you cannot measure.
- Design for failure from day one — Use conditional edges to handle failures gracefully. Build the resilience patterns (retries, fallbacks, partial results) into the first version, not the second.
KEY INSIGHT: Start with a two-node linear workflow that actually runs in production. Then add complexity one edge at a time. Teams that try to build the full graph on day one never ship.
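For reference, that two-node starting point can be as small as this (node names and state fields are placeholders):

```python
from typing import TypedDict
from langgraph.graph import StateGraph, END

class MiniState(TypedDict):
    text: str
    result: str

def clean(state: MiniState) -> dict:
    # Node 1: trivial preprocessing
    return {"text": state["text"].strip()}

def label(state: MiniState) -> dict:
    # Node 2: trivial "analysis" standing in for an agent call
    return {"result": f"processed:{state['text']}"}

graph = StateGraph(MiniState)
graph.add_node("clean", clean)
graph.add_node("label", label)
graph.set_entry_point("clean")
graph.add_edge("clean", "label")
graph.add_edge("label", END)

app = graph.compile()
print(app.invoke({"text": "  hello  ", "result": ""}))
```

Once this runs end to end, swapping the placeholder nodes for real Pydantic AI agents and adding conditional edges is an incremental change rather than a rewrite.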
Conclusion
LangGraph and Pydantic AI solve different problems that, combined, address the central challenge of production AI systems: how do you build agent workflows that are both flexible enough to handle real-world complexity and reliable enough to trust with real-world data?
LangGraph gives you the orchestration — conditional routing, parallel execution, stateful checkpoints, and error recovery paths. Pydantic AI gives you the guarantees — type-safe outputs, validated state transitions, and schema enforcement at every boundary. Neither framework alone gets you to production-ready. Together, they do.
The future of AI development is not about better models or cleverer prompts. It is about building robust systems that can handle the messiness of production while maintaining the reliability that businesses demand. With LangGraph and Pydantic AI, we finally have the right tools for that job.