LangGraph SDK Advanced
What You'll Learn
This interactive notebook demonstrates advanced monitoring patterns for production LangGraph applications through a realistic travel planning system with multiple specialized agents.
Key Topics Covered:
Multi-agent workflow monitoring and orchestration
Conversation tracking across complex interactions
Production configuration for high-volume scenarios
Advanced error handling and recovery patterns
Business intelligence integration and analytics
Interactive Tutorial
The notebook walks through building a comprehensive travel planning application featuring hotel search, weather analysis, itinerary planning, and supervisor agents working together.
Open the Advanced Observability Notebook in Google Colab →
Or download the notebook directly from GitHub →
Production Configuration Best Practices
Before deploying LangGraph applications to production, configure the SDK for your specific workload characteristics.
High-Volume Applications
Optimize for applications processing thousands of traces per minute:
```python
import os
from opentelemetry.sdk.trace import SpanLimits, sampling
from opentelemetry.exporter.otlp.proto.http.trace_exporter import Compression
from fiddler_langgraph import FiddlerClient

# Configure batch processing BEFORE initializing FiddlerClient
os.environ['OTEL_BSP_MAX_QUEUE_SIZE'] = '500'         # Increased from default 100
os.environ['OTEL_BSP_SCHEDULE_DELAY_MILLIS'] = '500'  # Faster export than default 1000ms
os.environ['OTEL_BSP_MAX_EXPORT_BATCH_SIZE'] = '50'   # Larger batches than default 10
os.environ['OTEL_BSP_EXPORT_TIMEOUT'] = '10000'       # Longer timeout than default 5000ms

# Increase span limits to capture more data
production_limits = SpanLimits(
    max_events=128,                  # Default: 32
    max_links=64,                    # Default: 32
    max_span_attributes=128,         # Default: 32
    max_event_attributes=64,         # Default: 32
    max_link_attributes=32,          # Default: 32
    max_span_attribute_length=8192,  # Default: 2048
)

# Sample 5-10% of traces to manage data volume
production_sampler = sampling.TraceIdRatioBased(0.05)

client = FiddlerClient(
    api_key=os.getenv("FIDDLER_ACCESS_TOKEN"),
    application_id=os.getenv("FIDDLER_APPLICATION_ID"),
    url=os.getenv("FIDDLER_URL"),
    console_tracer=False,
    span_limits=production_limits,
    sampler=production_sampler,
    compression=Compression.Gzip,
)
```
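If `FiddlerClient` accepts any standard OpenTelemetry sampler through its `sampler` parameter (an assumption; these examples only show `TraceIdRatioBased`), wrapping the ratio sampler in `ParentBased` is a common refinement: the ratio is evaluated only at root spans, and child spans inherit the decision made at the root or by an upstream service. A minimal sketch:

```python
from opentelemetry.sdk.trace import sampling

# Evaluate the 5% ratio only at root spans; child spans inherit the
# parent's decision, so traces are kept or dropped as a whole.
production_sampler = sampling.ParentBased(sampling.TraceIdRatioBased(0.05))
```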
Low-Latency Requirements
Optimize for applications requiring sub-second trace export:
```python
# Reduce batch delay for faster exports
os.environ['OTEL_BSP_SCHEDULE_DELAY_MILLIS'] = '100'  # Export every 100ms
os.environ['OTEL_BSP_MAX_EXPORT_BATCH_SIZE'] = '5'    # Smaller batches

client = FiddlerClient(
    api_key=os.getenv("FIDDLER_ACCESS_TOKEN"),
    application_id=os.getenv("FIDDLER_APPLICATION_ID"),
    url=os.getenv("FIDDLER_URL"),
    compression=Compression.Gzip,  # Still use compression
)
```
Memory-Constrained Environments
Configure conservative limits for edge deployments or containerized environments:
```python
memory_constrained_limits = SpanLimits(
    max_events=16,                   # Minimal event capture
    max_links=16,                    # Minimal linking
    max_span_attributes=32,          # Reduced attributes
    max_event_attributes=16,         # Reduced event attributes
    max_link_attributes=16,          # Reduced link attributes
    max_span_attribute_length=1024,  # Shorter attribute values
)

os.environ['OTEL_BSP_MAX_QUEUE_SIZE'] = '50'        # Smaller queue
os.environ['OTEL_BSP_MAX_EXPORT_BATCH_SIZE'] = '5'  # Smaller batches

client = FiddlerClient(
    api_key=os.getenv("FIDDLER_ACCESS_TOKEN"),
    application_id=os.getenv("FIDDLER_APPLICATION_ID"),
    url=os.getenv("FIDDLER_URL"),
    span_limits=memory_constrained_limits,
    sampler=sampling.TraceIdRatioBased(0.1),  # Sample 10%
    compression=Compression.Gzip,
)
```
Development vs Production Configurations
Development Configuration:
```python
# Capture everything with verbose debugging
dev_client = FiddlerClient(
    api_key=os.getenv("FIDDLER_API_KEY"),
    application_id=os.getenv("FIDDLER_APPLICATION_ID"),
    url=os.getenv("FIDDLER_URL"),
    console_tracer=True,  # Print spans to console
    sampler=None,         # Capture 100% of traces
)
```
Production Configuration:
```python
# Optimized for performance and cost
prod_client = FiddlerClient(
    api_key=os.getenv("FIDDLER_API_KEY"),
    application_id=os.getenv("FIDDLER_APPLICATION_ID"),
    url=os.getenv("FIDDLER_URL"),
    console_tracer=False,                      # No console output
    sampler=sampling.TraceIdRatioBased(0.05),  # Sample 5%
    compression=Compression.Gzip,              # Reduce bandwidth
    span_limits=production_limits,             # Controlled limits
)
```
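To keep a single code path across environments, a startup helper can choose the profile. A minimal sketch, assuming an `APP_ENV` variable of your own choosing (not an SDK convention):

```python
import os

from opentelemetry.sdk.trace import sampling
from fiddler_langgraph import FiddlerClient

def make_client() -> FiddlerClient:
    # Connection settings shared by both profiles.
    common = dict(
        api_key=os.getenv("FIDDLER_API_KEY"),
        application_id=os.getenv("FIDDLER_APPLICATION_ID"),
        url=os.getenv("FIDDLER_URL"),
    )
    if os.getenv("APP_ENV", "development") == "production":
        # Production: quiet console, sampled traces.
        return FiddlerClient(console_tracer=False,
                             sampler=sampling.TraceIdRatioBased(0.05), **common)
    # Development: verbose console, every trace captured.
    return FiddlerClient(console_tracer=True, sampler=None, **common)

client = make_client()
```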
Best Practices for Context and Conversation IDs
Structure your identifiers for maximum analytical value:
```python
from fiddler_langgraph.tracing.instrumentation import set_llm_context, set_conversation_id
import uuid

# Set meaningful, searchable context labels
set_llm_context(model, 'Customer Support - Tier 1 - Billing Inquiries')
set_llm_context(model, 'Content Generation - Marketing Copy - Blog Posts')
set_llm_context(model, 'Data Analysis - Financial Reports - Q4 2025')

# Use structured conversation IDs with metadata
user_id = 'user-12345'
session_type = 'support'
timestamp = '2025-10-17'
conversation_id = f'{user_id}_{session_type}_{timestamp}_{uuid.uuid4()}'
set_conversation_id(conversation_id)
# Example: user-12345_support_2025-10-17_550e8400-e29b-41d4-a716-446655440000
```
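To keep this format consistent across services, the pieces can be wrapped in a helper. A sketch; `build_conversation_id` is illustrative, not part of the SDK:

```python
import uuid
from datetime import date

from fiddler_langgraph.tracing.instrumentation import set_conversation_id

# Illustrative helper: builds IDs in the user_type_date_uuid format above.
def build_conversation_id(user_id: str, session_type: str) -> str:
    return f"{user_id}_{session_type}_{date.today().isoformat()}_{uuid.uuid4()}"

set_conversation_id(build_conversation_id('user-12345', 'support'))
```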
Prerequisites
Fiddler account with API credentials
OpenAI API key for example interactions
Basic familiarity with LangGraph concepts
Time Required
Complete tutorial: 45-60 minutes
Quick overview: 15-20 minutes
Telemetry Data Reference
Understanding the data captured by the Fiddler LangGraph SDK.
Span Attributes
The SDK automatically captures these OpenTelemetry attributes:
| Attribute | Type | Description |
| --- | --- | --- |
| `gen_ai.agent.name` | str | Name of the AI agent (auto-extracted from LangGraph, configurable for LangChain) |
| `gen_ai.agent.id` | str | Unique identifier (format: `trace_id:agent_name`) |
| `gen_ai.conversation.id` | str | Session identifier set via `set_conversation_id()` |
| `fiddler.span.type` | str | Span classification: `chain`, `tool`, `llm`, or `other` |
| `gen_ai.llm.input.system` | str | System prompt content |
| `gen_ai.llm.input.user` | str | User input/prompt |
| `gen_ai.llm.output` | str | Model response text |
| `gen_ai.llm.context` | str | Custom context set via `set_llm_context()` |
| `gen_ai.llm.model` | str | Model identifier (e.g., `"gpt-4o-mini"`) |
| `gen_ai.llm.token_count` | int | Token usage metrics |
| `gen_ai.tool.name` | str | Tool function name |
| `gen_ai.tool.input` | str | Tool input parameters (JSON) |
| `gen_ai.tool.output` | str | Tool execution results (JSON) |
| `duration_ms` | float | Span duration in milliseconds |
| `fiddler.error.message` | str | Error message (if span failed) |
| `fiddler.error.type` | str | Error type classification |
Querying and Filtering in Fiddler
Use these attributes in the Fiddler UI to:
Filter by agent: `gen_ai.agent.name = "hotel_search_agent"`
Find conversations: `gen_ai.conversation.id = "user-123_support_2025-10-17..."`
Analyze by model: `gen_ai.llm.model = "gpt-4o"`
Track errors: `fiddler.error.type EXISTS`
Who Should Use This
AI engineers building production LangGraph applications
DevOps teams monitoring agentic systems
Technical leaders evaluating observability strategies
Limitations and Considerations
Current Limitations
Framework Support: Only LangGraph is fully supported with automatic agent name extraction
LangChain applications require manual agent name configuration
Other frameworks must use the Client API directly
Protocol Support: Currently uses HTTP-based OTLP
gRPC support planned for future releases
Attribute Limits: Default OpenTelemetry limits apply
Configurable via the `span_limits` parameter
Very large attribute values may be truncated
Performance Considerations
Overhead: Typical performance impact is < 5% with default settings
Use sampling to reduce overhead in high-volume scenarios
Adjust batch processing delays based on latency requirements
Memory: Span queue size affects the memory footprint
Default queue (100 spans) uses ~1-2MB
Increase `OTEL_BSP_MAX_QUEUE_SIZE` for high throughput
Decrease it for memory-constrained environments
Network: Compression significantly reduces bandwidth usage
Gzip compression: ~70-80% reduction
Use `Compression.NoCompression` only for debugging, as sketched below
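For example, a local debugging client might disable compression so exported payloads are readable with a local proxy or packet capture. A sketch, reusing the connection parameters from the earlier examples:

```python
import os

from opentelemetry.exporter.otlp.proto.http.trace_exporter import Compression
from fiddler_langgraph import FiddlerClient

# Debug only: uncompressed payloads are easier to inspect on the wire,
# at the cost of roughly 3-5x the bandwidth of Gzip.
debug_client = FiddlerClient(
    api_key=os.getenv("FIDDLER_API_KEY"),
    application_id=os.getenv("FIDDLER_APPLICATION_ID"),
    url=os.getenv("FIDDLER_URL"),
    console_tracer=True,
    compression=Compression.NoCompression,
)
```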
Production Deployment Checklist
Before deploying to production:
Disable the console tracer (`console_tracer=False`)
Configure a sampling ratio appropriate for your traffic volume
Enable Gzip compression to reduce bandwidth
Set span limits that match your workload
Load credentials from environment variables rather than hardcoding them
When to Tune Each Setting
| Scenario | What to tune |
| --- | --- |
| High-volume production | Increase queue size and batch size; lower the sampling ratio |
| Low-latency requirements | Decrease the schedule delay; use smaller batches |
| Memory constraints | Decrease span limits, queue size, and batch size |
| Development/debugging | Disable sampling; enable the console tracer |
| Cost optimization | Lower the sampling ratio; enable compression |
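These scenarios can be encoded as reusable profiles. A sketch (the profile table and helper are illustrative, not an SDK feature) that applies the batch-processor settings from the sections above; it must run before the client is constructed:

```python
import os

# Illustrative profiles built from the values used earlier on this page.
PROFILES = {
    "high_volume": {
        "OTEL_BSP_MAX_QUEUE_SIZE": "500",
        "OTEL_BSP_MAX_EXPORT_BATCH_SIZE": "50",
    },
    "low_latency": {
        "OTEL_BSP_SCHEDULE_DELAY_MILLIS": "100",
        "OTEL_BSP_MAX_EXPORT_BATCH_SIZE": "5",
    },
    "low_memory": {
        "OTEL_BSP_MAX_QUEUE_SIZE": "50",
        "OTEL_BSP_MAX_EXPORT_BATCH_SIZE": "5",
    },
}

def apply_profile(name: str) -> None:
    # The batch span processor reads these variables at initialization,
    # so call this before constructing FiddlerClient.
    os.environ.update(PROFILES[name])

apply_profile("high_volume")
```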
Next Steps
After completing the tutorial:
Technical Reference: Fiddler LangGraph SDK Documentation
Production Deployment: Adapt the demonstrated patterns for your specific use case