Integration With S3
This guide explains how to integrate AWS S3 with Fiddler to retrieve baseline or production data for model monitoring. You'll learn how to:
Extract data from S3 buckets using different authentication methods
Load data efficiently based on your needs
Connect the extracted data with Fiddler's monitoring capabilities
How to Integrate Fiddler with AWS S3
Prerequisites
Before getting started, ensure you have:
An AWS account with access to the required S3 bucket
Required Python packages installed: boto3, pandas, and fiddler-client
Appropriate AWS credentials or profile configuration
Basic familiarity with Python and AWS S3 concepts
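The three packages listed above can be installed with pip:

```shell
pip install boto3 pandas fiddler-client
```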
AWS Authentication Methods
Method 1: Using AWS Access Keys
If you're using AWS access keys for authentication, use this approach:
Method 2: Using AWS Profiles (Recommended)
For enhanced security, we recommend using AWS profiles instead of hardcoding credentials:
Data Loading Options
Option 1: Direct Memory Loading
For smaller datasets that fit in memory, load directly into a pandas DataFrame as shown in the examples above.
Option 2: File System Loading
For larger datasets or when memory constraints exist, save to disk first:
Using AWS S3 Data with Fiddler
For Baseline Datasets
After loading your data, you can use it to create a baseline dataset in Fiddler. See the Creating a Baseline Dataset guide for more details.
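The upload step can be sketched as follows. This assumes the fiddler-client 3.x API and a reachable Fiddler deployment with `fdl.init(url=..., token=...)` already called; the project, model, and dataset names are placeholders:

```python
def publish_baseline_from_s3(df, project_name: str, model_name: str,
                             dataset_name: str):
    """Sketch: upload a DataFrame loaded from S3 as a Fiddler baseline.
    Assumes fiddler-client 3.x and that fdl.init(...) was called first."""
    import fiddler as fdl  # requires the fiddler-client package

    project = fdl.Project.from_name(name=project_name)
    model = fdl.Model.from_name(name=model_name, project_id=project.id)
    job = model.publish(
        source=df,
        environment=fdl.EnvType.PRE_PRODUCTION,
        dataset_name=dataset_name,
    )
    job.wait()  # block until the upload job completes
    return job
```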
For Production Traffic
To publish production data for monitoring, refer to the batch publishing guide. For additional publishing options, see the other publishing guides.
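A batch publish can be sketched as below, again assuming the fiddler-client 3.x API with `fdl.init(...)` already called; names are placeholders, and the return value depends on the source type, so it is passed through unchanged:

```python
def publish_production_batch(df, project_name: str, model_name: str):
    """Sketch: publish a batch of production events (e.g. loaded from S3)
    to Fiddler for monitoring. Assumes fiddler-client 3.x."""
    import fiddler as fdl  # requires the fiddler-client package

    project = fdl.Project.from_name(name=project_name)
    model = fdl.Model.from_name(name=model_name, project_id=project.id)
    # Publish the batch against the production environment.
    return model.publish(source=df, environment=fdl.EnvType.PRODUCTION)
```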
Best Practices
Always use AWS profiles instead of hardcoded credentials in production environments
Implement proper error handling around S3 operations
Consider data size when choosing between memory and file system loading
Use appropriate AWS IAM roles and permissions
Monitor memory usage when working with large datasets