LangGraph SDK Advanced

What You'll Learn

This interactive notebook demonstrates advanced monitoring patterns for production LangGraph applications through a realistic travel planning system with multiple specialized agents.

Key Topics Covered:

  • Multi-agent workflow monitoring and orchestration

  • Custom instrumentation with decorators and span wrappers

  • Combining auto-instrumentation with fine-grained manual spans

  • Conversation tracking across complex interactions

  • Production configuration for high-volume scenarios

  • Advanced error handling and recovery patterns

  • Business intelligence integration and analytics

Interactive Tutorial

The notebook walks through building a comprehensive travel planning application featuring hotel search, weather analysis, itinerary planning, and supervisor agents working together.

Open the Advanced Observability Notebook in Google Colab

Or download the notebook directly from GitHub

Custom Instrumentation Tutorial

For hands-on examples of decorator-based and manual instrumentation, including @trace(), span wrappers, and async support:

Open the Custom Instrumentation Notebook in Google Colab

Or download the notebook directly from GitHub

Custom Instrumentation Patterns

The SDK supports three instrumentation approaches. You can use them individually or combine them in the same application. For complete API reference, see the Instrumentation Methods section in the integration guide.

Combining Auto-Instrumentation with Decorators

Use LangGraphInstrumentor for automatic LangGraph/LangChain tracing, then add @trace() decorators to capture custom business logic that runs outside the framework:
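The sketch below illustrates this combined pattern. The stand-in @trace() is defined locally so the example runs without the SDK; in a real application you would import trace() and LangGraphInstrumentor from the Fiddler LangGraph package (exact import paths depend on your installed version) and call LangGraphInstrumentor().instrument() once at startup.

```python
# Sketch of the combined pattern. The stand-in @trace() below is defined
# locally so the example runs without the SDK installed.
import functools

def trace(name=None):  # stand-in for the SDK decorator
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # The real decorator opens a span (named after the function by
            # default) around the call, nested under the active trace.
            return fn(*args, **kwargs)
        return wrapper
    return decorator

# Auto-instrumentation covers graph/LLM/tool calls made by the framework;
# decorate the business logic that runs outside it:
@trace()
def rank_hotels(hotels):
    """Custom ranking step invoked between graph nodes."""
    return sorted(hotels, key=lambda h: h["price"])

print(rank_hotels([{"price": 200}, {"price": 90}]))
# [{'price': 90}, {'price': 200}]
```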

Multi-Agent Decorator Patterns

When building multi-agent systems, @trace() decorators automatically establish parent-child span relationships through nested function calls:
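A runnable model of that mechanism (not the SDK's actual implementation): a stack tracks the currently open span, and each new span records the top of the stack as its parent, so nested decorated calls become parent-child spans.

```python
# Model of parent-child span propagation through nested function calls.
import functools

_span_stack = []
recorded_spans = []  # (span_name, parent_name) tuples

def trace():
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            parent = _span_stack[-1] if _span_stack else None
            recorded_spans.append((fn.__name__, parent))
            _span_stack.append(fn.__name__)
            try:
                return fn(*args, **kwargs)
            finally:
                _span_stack.pop()
        return wrapper
    return decorator

@trace()
def hotel_search_agent(city):
    return f"hotels in {city}"

@trace()
def supervisor_agent(city):
    # Calling a decorated agent from a decorated supervisor nests the spans.
    return hotel_search_agent(city)

supervisor_agent("Lisbon")
print(recorded_spans)
# [('supervisor_agent', None), ('hotel_search_agent', 'supervisor_agent')]
```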

Using Span Wrappers for Typed Attributes

Span wrapper classes provide typed helper methods for setting semantic attributes on LLM calls, tool invocations, and chain operations. Use them with start_as_current_span() for fine-grained control:
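To illustrate the idea, the stand-in below shows how typed helper methods translate into semantic attribute keys. The real FiddlerGeneration wraps a live OpenTelemetry span opened with start_as_current_span(); a plain dict stands in for it here.

```python
# Illustrative stand-in for a typed span wrapper (NOT the SDK class).
class GenerationWrapper:
    def __init__(self, attributes):
        self._attrs = attributes  # the real wrapper holds the live span

    def set_model(self, model):
        self._attrs["gen_ai.request.model"] = model

    def set_user_prompt(self, prompt):
        self._attrs["gen_ai.llm.input.user"] = prompt

    def set_completion(self, text):
        self._attrs["gen_ai.llm.output"] = text

attrs = {}
gen = GenerationWrapper(attrs)
gen.set_model("gpt-4o")
gen.set_user_prompt("Find hotels in Lisbon")
gen.set_completion("Here are three options...")
print(attrs["gen_ai.request.model"])  # gpt-4o
```

The benefit over raw set_attribute() calls is that the attribute keys are fixed by the helper, so typos cannot silently produce unqueryable data.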

For the complete list of helper methods on each span wrapper class, see the Span Types and Helper Methods reference.

Production Configuration Best Practices

Before deploying LangGraph applications to production, configure the SDK for your specific workload characteristics.

High-Volume Applications

Optimize for applications processing thousands of traces per minute:
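One way to tune this is through the standard OpenTelemetry BatchSpanProcessor and sampler environment variables, set before the tracer provider is created. The values below are illustrative starting points, not recommendations from the SDK:

```python
# High-volume tuning via standard OpenTelemetry environment variables.
import os

os.environ["OTEL_BSP_MAX_QUEUE_SIZE"] = "4096"         # buffer more spans
os.environ["OTEL_BSP_MAX_EXPORT_BATCH_SIZE"] = "1024"  # fewer, larger exports
os.environ["OTEL_BSP_SCHEDULE_DELAY"] = "10000"        # export every 10 s
os.environ["OTEL_TRACES_SAMPLER"] = "traceidratio"     # head sampling
os.environ["OTEL_TRACES_SAMPLER_ARG"] = "0.1"          # keep ~10% of traces
```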

Low-Latency Requirements

Optimize for applications requiring sub-second trace export:
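For low latency, shrink the export interval and batch size so spans leave the process quickly (illustrative values, using the standard OpenTelemetry environment variables):

```python
# Low-latency tuning: flush small batches frequently.
import os

os.environ["OTEL_BSP_SCHEDULE_DELAY"] = "500"         # flush every 500 ms
os.environ["OTEL_BSP_MAX_EXPORT_BATCH_SIZE"] = "64"   # small batches leave sooner
os.environ["OTEL_BSP_EXPORT_TIMEOUT"] = "2000"        # fail fast on slow exports
```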

Memory-Constrained Environments

Configure conservative limits for edge deployments or containerized environments:
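A conservative sketch using the standard OpenTelemetry queue and attribute-limit environment variables (values are illustrative; tune against your container's memory budget):

```python
# Memory-constrained tuning: smaller buffers, capped attribute sizes.
import os

os.environ["OTEL_BSP_MAX_QUEUE_SIZE"] = "256"             # smaller span buffer
os.environ["OTEL_BSP_MAX_EXPORT_BATCH_SIZE"] = "128"
os.environ["OTEL_SPAN_ATTRIBUTE_COUNT_LIMIT"] = "64"      # cap attrs per span
os.environ["OTEL_ATTRIBUTE_VALUE_LENGTH_LIMIT"] = "2048"  # truncate large values
```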

Development vs Production Configurations

Development Configuration:
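For development, a common setup (sketched here with standard OpenTelemetry environment variables) keeps every trace and mirrors spans to stdout for quick inspection:

```python
# Development: no sampling, spans printed locally for debugging.
import os

os.environ["OTEL_TRACES_SAMPLER"] = "always_on"        # keep every trace
os.environ["OTEL_TRACES_EXPORTER"] = "console"         # print spans to stdout
os.environ["OTEL_EXPORTER_OTLP_COMPRESSION"] = "none"  # easier to inspect
```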

Production Configuration:
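For production, the equivalent sketch samples a fraction of traces and compresses exports (values are illustrative starting points):

```python
# Production: ratio-based sampling, gzip compression, larger queue.
import os

os.environ["OTEL_TRACES_SAMPLER"] = "parentbased_traceidratio"
os.environ["OTEL_TRACES_SAMPLER_ARG"] = "0.25"         # keep ~25% of root traces
os.environ["OTEL_EXPORTER_OTLP_COMPRESSION"] = "gzip"  # ~70-80% smaller payloads
os.environ["OTEL_BSP_MAX_QUEUE_SIZE"] = "2048"
```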

Best Practices for Context and Conversation IDs

Structure your identifiers for maximum analytical value:
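For example, a hypothetical helper (the function name and exact format are a suggested convention, matching the "user-123_support_..." filter example later in this page) builds IDs of the form "<user>_<channel>_<timestamp>", so prefix filters in the Fiddler UI can slice traffic by user, channel, or day:

```python
# Hypothetical helper: structured conversation IDs for prefix filtering.
from datetime import datetime, timezone

def build_conversation_id(user_id, channel, started_at):
    return f"{user_id}_{channel}_{started_at.strftime('%Y-%m-%dT%H-%M-%S')}"

cid = build_conversation_id(
    "user-123", "support", datetime(2026, 6, 15, 9, 30, tzinfo=timezone.utc)
)
print(cid)  # user-123_support_2026-06-15T09-30-00
```

Pass the resulting string to set_conversation_id() so every span in the session carries it as gen_ai.conversation.id.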

Prerequisites

  • Fiddler account with API credentials

  • OpenAI API key for example interactions

  • Basic familiarity with LangGraph concepts

Time Required

  • Complete tutorial: 45-60 minutes

  • Quick overview: 15-20 minutes

Telemetry Data Reference

Understanding the data captured by the Fiddler LangGraph SDK.

Span Attributes

The SDK automatically captures these OpenTelemetry attributes:

| Attribute | Type | Description |
| --- | --- | --- |
| gen_ai.agent.name | str | Name of the AI agent (auto-extracted from LangGraph, configurable for LangChain) |
| gen_ai.agent.id | str | Unique identifier (format: trace_id:agent_name) |
| gen_ai.conversation.id | str | Session identifier set via set_conversation_id() |
| fiddler.span.type | str | Span classification: chain, tool, llm, or other |
| gen_ai.llm.input.system | str | System prompt content |
| gen_ai.llm.input.user | str | User input/prompt |
| gen_ai.llm.output | str | Model response text |
| gen_ai.llm.context | str | Custom context set via set_llm_context() |
| gen_ai.request.model | str | Model identifier (e.g., "gpt-4o-mini") |
| gen_ai.llm.token_count | int | Token usage metrics |
| gen_ai.tool.name | str | Tool function name |
| gen_ai.tool.input | str | Tool input parameters (JSON) |
| gen_ai.tool.output | str | Tool execution results (JSON) |
| gen_ai.tool.definitions | str | Tool definitions available to the LLM (JSON array of OpenAI-format tool schemas) |
| gen_ai.input.messages | str | Complete message history provided as input to the LLM (JSON array) |
| gen_ai.output.messages | str | Output messages generated by the LLM, including tool calls (JSON array) |
| duration_ms | float | Span duration in milliseconds |
| fiddler.error.message | str | Error message (if span failed) |
| fiddler.error.type | str | Error type classification |

Setting Attributes with Span Wrappers

When using manual instrumentation, span wrapper classes provide typed helper methods that set these attributes automatically. For example, FiddlerGeneration.set_model("gpt-4o") sets gen_ai.request.model, and FiddlerTool.set_tool_name("search") sets gen_ai.tool.name.

| Span Wrapper | Key Methods | Attributes Set |
| --- | --- | --- |
| FiddlerGeneration | set_model(), set_system_prompt(), set_user_prompt(), set_completion(), set_usage(), set_messages(), set_output_messages(), set_tool_definitions() | gen_ai.request.model, gen_ai.llm.input.*, gen_ai.llm.output, gen_ai.usage.*, gen_ai.input.messages, gen_ai.output.messages, gen_ai.tool.definitions |
| FiddlerTool | set_tool_name(), set_tool_input(), set_tool_output(), set_tool_definitions() | gen_ai.tool.name, gen_ai.tool.input, gen_ai.tool.output, gen_ai.tool.definitions |
| FiddlerChain | set_input(), set_output() | Input/output data attributes |
| FiddlerSpan | set_attribute(), set_agent_name(), set_conversation_id() | Any custom or standard attribute |

For the complete method reference, see Span Types and Helper Methods.

Querying and Filtering in Fiddler

Use these attributes in the Fiddler UI to:

  • Filter by agent: gen_ai.agent.name = "hotel_search_agent"

  • Find conversations: gen_ai.conversation.id = "user-123_support_2026-06-15..."

  • Analyze by model: gen_ai.request.model = "gpt-4o"

  • Track errors: fiddler.error.type EXISTS

Who Should Use This

  • AI engineers building production LangGraph applications

  • DevOps teams monitoring agentic systems

  • Technical leaders evaluating observability strategies

Limitations and Considerations

Current Limitations

  • Framework Support: LangGraph is fully supported with automatic agent name extraction

    • LangChain applications require manual agent name configuration

    • Non-LangGraph Python code can use @trace() decorators or manual context managers for custom instrumentation (see Instrumentation Methods)

  • Protocol Support: Currently uses HTTP-based OTLP

    • gRPC support planned for future releases

  • Attribute Limits: Default OpenTelemetry limits apply

    • Configurable via span_limits parameter

    • Very large attribute values may be truncated

Performance Considerations

Overhead: Typical performance impact is < 5% with default settings

  • Use sampling to reduce overhead in high-volume scenarios

  • Adjust batch processing delays based on latency requirements

Memory: Span queue size affects the memory footprint

  • Default queue (100 spans) uses ~1-2MB

  • Increase OTEL_BSP_MAX_QUEUE_SIZE for high throughput

  • Decrease for memory-constrained environments

Network: Compression significantly reduces bandwidth usage

  • Gzip compression: ~70-80% reduction

  • Use Compression.NoCompression only for debugging

Production Deployment Checklist

Before deploying to production:

When to Tune Each Setting

| Scenario | Configuration |
| --- | --- |
| High-volume production | Increase queue size and batch size; sample a lower percentage of traces |
| Low-latency requirements | Decrease the schedule delay; use smaller batches |
| Memory constraints | Decrease span limits, queue size, and batch size |
| Development/debugging | Disable sampling; enable the console tracer |
| Cost optimization | Lower the sampling percentage; enable compression |

Next Steps

After completing the tutorial: