Why Alert Integration Matters
AI models fail in unique ways: drift, data quality issues, performance degradation, and safety violations. Fiddler’s alert integrations ensure your team can respond immediately:
- Unified Incident Management - AI alerts flow into the same systems as infrastructure alerts
- Faster Response Times - On-call engineers notified via existing escalation policies
- Context-Rich Alerts - Model context, affected predictions, and root cause analysis included
- Reduced Alert Fatigue - Intelligent grouping and deduplication across tools
- Automated Remediation - Trigger workflows to rollback models or scale resources
Integration Categories
📊 Observability Platforms
Send Fiddler metrics and alerts to enterprise observability platforms for unified monitoring.
Supported Platforms:
- Datadog - Application performance monitoring and infrastructure observability ✓ GA
Use Cases:
- Correlate AI model issues with infrastructure metrics
- Build unified dashboards combining Fiddler + infrastructure data
- Use Datadog’s anomaly detection on Fiddler metrics
- Alert on compound conditions (model drift + high latency)
🚨 Incident Management
Connect alerts to on-call systems for immediate engineer notification.
Supported Platforms:
- PagerDuty - Incident management and on-call scheduling ✓ GA
Use Cases:
- Page on-call ML engineers for critical model failures
- Escalate unresolved AI incidents automatically
- Track MTTR (Mean Time To Resolution) for model issues
- Integrate with incident runbooks and response workflows
💬 Team Collaboration
Send alerts to team communication tools for visibility and collaboration.
Supported Platforms:
- Slack - Team messaging and collaboration (Coming Soon)
- Microsoft Teams - Enterprise communication platform (Coming Soon)
Use Cases:
- Notify ML team channel when drift is detected
- Alert data science team on data quality issues
- Share model performance reports automatically
- Collaborative incident triage in team channels
Observability Platform Integrations
Datadog
Integrate Fiddler with Datadog for unified application and AI monitoring.
Why Datadog + Fiddler:
- Unified Dashboards - Combine infrastructure, application, and AI model metrics
- Correlated Alerts - Alert on compound conditions (e.g., “high model drift + high API latency”)
- Service Map Integration - See model health in Datadog service dependency graphs
- Anomaly Detection - Leverage Datadog’s ML-based alerting on Fiddler metrics
- Metric Export - Send Fiddler drift, performance, and data quality metrics to Datadog
- Event Streaming - Stream model events (predictions, drift detections) as Datadog events
- Alert Forwarding - Route Fiddler alerts to Datadog for unified incident management
- Tag Propagation - Maintain consistent tagging across platforms (model, environment, team)
Incident Management Integrations
PagerDuty
Route critical AI alerts to on-call engineers via PagerDuty.
Why PagerDuty + Fiddler:
- On-Call Escalation - Page the right ML engineer based on escalation policies
- Incident Deduplication - Prevent alert storms from related model issues
- Incident Timeline - Track when AI issues were detected, acknowledged, resolved
- Postmortem Integration - Include model context in incident reports
- Severity Mapping - Map Fiddler alert criticality to PagerDuty severity levels
- Service Integration - Associate alerts with PagerDuty services (e.g., “Fraud Detection Service”)
- Custom Payloads - Include model metadata, drift scores, affected predictions
- Bidirectional Updates - Acknowledge/resolve incidents in PagerDuty or Fiddler
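The severity mapping, custom payload, and deduplication points above can be sketched as a small payload builder. This is a minimal sketch, not Fiddler's implementation: the Fiddler severity labels, the `build_pagerduty_event` helper, and the `dedup_key` format are assumptions made for illustration. The field names follow the PagerDuty Events API v2 event format, whose `severity` field accepts `critical`, `error`, `warning`, or `info`.

```python
import json

# Hypothetical Fiddler severity labels mapped to the four severities
# accepted by the PagerDuty Events API v2.
SEVERITY_MAP = {
    "CRITICAL": "critical",
    "HIGH": "error",
    "MEDIUM": "warning",
    "LOW": "info",
}


def build_pagerduty_event(routing_key, model, alert_type, fiddler_severity, details):
    """Build a 'trigger' event body in the Events API v2 shape (nothing is sent)."""
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        # Same model + alert type yields the same key, so repeated
        # triggers are grouped into one incident instead of many.
        "dedup_key": f"fiddler:{model}:{alert_type}",
        "payload": {
            "summary": f"[{fiddler_severity}] {alert_type} alert on model {model}",
            "source": "fiddler",
            # Unknown labels fall back to "warning" rather than failing.
            "severity": SEVERITY_MAP.get(fiddler_severity.upper(), "warning"),
            "custom_details": details,
        },
    }


event = build_pagerduty_event(
    routing_key="EXAMPLE_ROUTING_KEY",  # placeholder, load from a secret store
    model="fraud_detector",
    alert_type="drift",
    fiddler_severity="CRITICAL",
    details={"drift_score": 0.42, "environment": "production"},
)
print(json.dumps(event, indent=2))
```

A body like this would be posted to the Events API v2 endpoint (`https://events.pagerduty.com/v2/enqueue`). Sending a later event with the same `dedup_key` and an `event_action` of `acknowledge` or `resolve` updates the existing incident.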
Team Collaboration Integrations
Slack (Coming Soon)
Planned Features:
- Channel Notifications - Post alerts to team Slack channels
- Interactive Messages - Acknowledge, snooze, or resolve alerts from Slack
- Scheduled Reports - Daily/weekly model performance summaries
- Threaded Discussions - Collaborate on incident resolution in threads
Microsoft Teams (Coming Soon)
Planned Features:
- Adaptive Cards - Rich, interactive alert notifications
- Team Channels - Route alerts to relevant team channels
- Bot Commands - Query model status from Teams chat
- Integration with Workflows - Trigger Teams workflows on alerts
Alert Routing Patterns
Pattern 1: Severity-Based Routing
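A minimal sketch of severity-based routing, assuming alerts are plain dictionaries with a `severity` field. The severity labels, destination names, and the `route_by_severity` helper are illustrative assumptions, not Fiddler's configuration schema:

```python
# Hypothetical routing table: severity label -> destinations to notify.
ROUTES = {
    "critical": ["pagerduty", "datadog"],  # page on-call and record the event
    "warning": ["datadog"],                # visible on dashboards, no page
    "info": ["webhook"],                   # low-priority internal endpoint
}


def route_by_severity(alert, routes=ROUTES, default=("datadog",)):
    """Return the destinations for an alert based on its severity."""
    return list(routes.get(alert["severity"].lower(), default))


alert = {"model": "fraud_detector", "type": "drift", "severity": "CRITICAL"}
print(route_by_severity(alert))
```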
Route alerts to different channels based on severity.
Pattern 2: Team-Based Routing
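A minimal sketch of team-based routing, assuming each alert carries a `type` field. The alert types, team names, and the `route_by_team` helper are hypothetical:

```python
# Hypothetical ownership table: alert type -> owning team.
TEAM_RULES = {
    "drift": "data-science",
    "data_quality": "data-engineering",
    "performance": "ml-platform",
}


def route_by_team(alert, rules=TEAM_RULES, default="ml-oncall"):
    """Return the team that should receive this alert."""
    return rules.get(alert["type"], default)


print(route_by_team({"model": "fraud_detector", "type": "data_quality"}))
```

Keeping ownership in one table makes it easy to review who receives which class of alert, and the default team catches alert types nobody has claimed.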
Different teams get different alerts.
Pattern 3: Composite Alerting
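A minimal sketch of a compound condition such as "model drift + high latency", evaluated over a snapshot of metric values. The thresholds, the latency metric name, and the `Condition` and `composite_breached` helpers are assumptions for illustration. In practice this logic would usually live in the observability platform's own monitor definition:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    metric: str
    threshold: float

    def breached(self, metrics):
        # A metric missing from the snapshot is treated as not breached.
        return metrics.get(self.metric, float("-inf")) > self.threshold


def composite_breached(conditions, metrics):
    """Fire only when every condition is breached at the same time."""
    return all(c.breached(metrics) for c in conditions)


conditions = [
    Condition("fiddler.model.drift.score", 0.3),
    Condition("api.latency.p95_ms", 500.0),  # hypothetical infrastructure metric
]
snapshot = {"fiddler.model.drift.score": 0.42, "api.latency.p95_ms": 850.0}
print(composite_breached(conditions, snapshot))
```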
Alert on compound conditions across multiple platforms.
Metric Export Patterns
Export Fiddler Metrics to Datadog
- `fiddler.model.drift.score` - Overall drift score (0-1)
- `fiddler.model.drift.feature.<feature_name>` - Per-feature drift
- `fiddler.model.performance.<metric>` - Model performance metrics
- `fiddler.model.data_quality.score` - Data quality score
- `fiddler.model.predictions.count` - Prediction volume
- `fiddler.model.predictions.latency` - Prediction latency percentiles
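To make the export format concrete, the sketch below builds a request body for one of these metrics in the shape used by the Datadog v2 metrics series endpoint. The `build_series` helper and the tag values are hypothetical. The native integration may export metrics differently, so treat this as an illustration of the data shape and check it against the Datadog API reference:

```python
import json
import time

GAUGE = 3  # metric type value for "gauge" in the Datadog v2 series format


def build_series(metric, value, tags, timestamp=None):
    """Build a v2 series body for a single gauge point (nothing is sent)."""
    if timestamp is None:
        timestamp = time.time()
    return {
        "series": [
            {
                "metric": metric,
                "type": GAUGE,
                "points": [{"timestamp": int(timestamp), "value": float(value)}],
                # Consistent tags let dashboards join AI and infrastructure data.
                "tags": list(tags),
            }
        ]
    }


payload = build_series(
    "fiddler.model.drift.score",
    0.42,
    tags=["model:fraud_detector", "environment:production", "team:risk"],
    timestamp=1700000000,
)
print(json.dumps(payload, indent=2))
```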
Query Fiddler Metrics in Datadog
Alert Lifecycle Management
Alert States
Fiddler alerts transition through a sequence of states, and state changes are synchronized with each platform:
- PagerDuty: Bidirectional state sync (acknowledge, resolve)
- Datadog: Event-based updates
- Slack: Interactive message updates
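The lifecycle can be modeled as a small state machine. The state names below (`triggered`, `acknowledged`, `resolved`) are an assumption based on the acknowledge and resolve actions mentioned above, not a definitive list of Fiddler's alert states:

```python
from enum import Enum


class AlertState(Enum):
    TRIGGERED = "triggered"
    ACKNOWLEDGED = "acknowledged"
    RESOLVED = "resolved"


# Allowed transitions; a resolved alert is terminal in this sketch.
ALLOWED = {
    AlertState.TRIGGERED: {AlertState.ACKNOWLEDGED, AlertState.RESOLVED},
    AlertState.ACKNOWLEDGED: {AlertState.RESOLVED},
    AlertState.RESOLVED: set(),
}


def transition(current, target):
    """Return the new state, or raise if the transition is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"Cannot move from {current.value} to {target.value}")
    return target


state = AlertState.TRIGGERED
state = transition(state, AlertState.ACKNOWLEDGED)
state = transition(state, AlertState.RESOLVED)
print(state.value)
```

Validating transitions in one place keeps a late or duplicated update from an external tool from reopening an alert that is already resolved.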
Alert Deduplication
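A minimal sketch of time-window deduplication, assuming each alert can be reduced to a stable key such as model plus alert type. The `Deduplicator` class and the 300-second window are illustrative choices:

```python
class Deduplicator:
    """Suppress repeats of the same alert key within a time window."""

    def __init__(self, window_seconds=300):
        self.window = window_seconds
        self._last_sent = {}  # alert key -> timestamp of last delivery

    def should_send(self, key, now):
        last = self._last_sent.get(key)
        if last is not None and now - last < self.window:
            return False  # still inside the window: suppress
        self._last_sent[key] = now
        return True


dedup = Deduplicator(window_seconds=300)
key = "fraud_detector:drift"
print(dedup.should_send(key, now=0))    # first occurrence
print(dedup.should_send(key, now=120))  # repeat inside the window
print(dedup.should_send(key, now=400))  # window has elapsed
```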
Prevent alert storms with intelligent deduplication.
Custom Webhook Integrations
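A minimal sketch of preparing a generic webhook delivery with the Python standard library. The endpoint URL, the `X-Signature-SHA256` header, and the payload fields are hypothetical. Signing the body with HMAC is a common way to let the receiver verify the sender, not a documented Fiddler behavior:

```python
import hashlib
import hmac
import json
import urllib.request


def build_webhook_request(url, secret, alert):
    """Build a signed JSON POST request for an alert (nothing is sent here)."""
    body = json.dumps(alert, sort_keys=True).encode("utf-8")
    signature = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "X-Signature-SHA256": signature,  # hypothetical header name
        },
        method="POST",
    )


request = build_webhook_request(
    "https://alerts.example.com/hooks/fiddler",  # placeholder endpoint
    secret="example-secret",                      # load from a secret store
    alert={"model": "fraud_detector", "type": "drift", "severity": "critical"},
)
print(request.get_method(), request.full_url)
# To deliver: urllib.request.urlopen(request, timeout=10)
```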
For platforms not natively supported, use generic webhooks.
Monitoring Integration Health
Track Integration Status
Alerts on Integration Failures
Best Practices
Alert Fatigue Prevention
1. Use Appropriate Severity Levels
Incident Response Runbooks
Include runbook links in alert payloads.
Security & Compliance
Secure Credential Management
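A minimal sketch of loading integration credentials from the environment instead of source code. The variable name `PAGERDUTY_ROUTING_KEY` and the `load_credential` helper are assumptions. A secrets manager would serve the same purpose:

```python
import os


def load_credential(name):
    """Read a credential from the environment and fail fast if it is missing."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


# Usage (the variable is expected to be set by the deployment environment):
# routing_key = load_credential("PAGERDUTY_ROUTING_KEY")
```

Failing at startup when a credential is absent is usually preferable to discovering the gap when the first critical alert cannot be delivered.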
Never hardcode credentials.
Alert Data Privacy
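A minimal sketch of redacting PII from an alert payload before it leaves the platform. The list of sensitive keys, the email pattern, and the `redact` helper are illustrative assumptions. Real redaction rules depend on your data and compliance requirements:

```python
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SENSITIVE_KEYS = {"email", "ssn", "phone"}  # hypothetical field names


def redact(value):
    """Recursively mask sensitive fields and email addresses in free text."""
    if isinstance(value, dict):
        return {
            k: "[REDACTED]" if k.lower() in SENSITIVE_KEYS else redact(v)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [redact(v) for v in value]
    if isinstance(value, str):
        return EMAIL_RE.sub("[REDACTED_EMAIL]", value)
    return value


payload = {
    "model": "fraud_detector",
    "sample": {"email": "jane@example.com", "amount": 120.5},
    "note": "Reported by jane@example.com",
}
print(redact(payload))
```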
PII Redaction in Alerts
Audit Logging
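A minimal sketch of recording each delivery attempt as a structured log line with Python's standard `logging` module. The logger name, record fields, and the `log_delivery` helper are hypothetical:

```python
import json
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO, format="%(message)s")
audit_logger = logging.getLogger("alert_audit")


def log_delivery(alert_id, destination, status):
    """Emit one JSON audit record per delivery attempt and return it."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "alert_id": alert_id,
        "destination": destination,
        "status": status,
    }
    audit_logger.info(json.dumps(record))
    return record


log_delivery("alert-123", "pagerduty", "delivered")
```

One JSON object per line keeps the audit trail easy to ship to a log platform and to query by alert ID or destination.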
Track alert delivery.
Troubleshooting
Common Issues
Alerts Not Delivered:
- Verify integration credentials are valid and not expired
- Check network connectivity from Fiddler to external platform
- Ensure webhook endpoints are reachable (not blocked by firewall)
- Validate alert thresholds are actually being triggered
Duplicate Alerts:
- Enable alert deduplication with appropriate time windows
- Check if multiple notification channels are configured
- Verify integration isn’t configured twice
Missing Context in Alerts:
- Ensure `include_context=True` in alert configuration
- Check payload template includes necessary fields
- Verify external platform supports rich payloads (some SMS gateways don’t)
Integration Selector
Choose the right integration for your use case:
| Your Need | Recommended Integration | Why |
|---|---|---|
| On-call engineer paging | PagerDuty | Escalation policies, incident management |
| Infrastructure correlation | Datadog | Unified metrics, correlated dashboards |
| Team notifications | Slack (Coming Soon) | Channel-based, collaborative triage |
| Custom internal tools | Generic Webhooks | Flexible, integrate with any HTTP endpoint |
| Multi-tool strategy | Datadog + PagerDuty | Metrics + incidents in one workflow |
Related Integrations
- Cloud Platforms - Deploy Fiddler on AWS, Azure, GCP
- Data Platforms - Ingest data from Snowflake, Kafka
- ML Platforms - Integrate with Databricks, MLflow
- Agentic AI - Monitor LangGraph and Strands Agents