# SageMaker Integration
Integrating AWS SageMaker with Fiddler allows you to monitor your deployed models easily. This guide shows you how to use an AWS Lambda function, leveraging the Fiddler Python client, to read SageMaker inference logs from an Amazon S3 bucket and send those inferences to your Fiddler instance. This setup simplifies model monitoring and provides real-time visibility into how your SageMaker models perform and behave.
## Prerequisites

- An actively served SageMaker model with:
  - Data capture enabled (see the sketch after this list)
  - Inference logs persisted to S3 in JSONL format
- Access to a Fiddler environment
- Your SageMaker model onboarded to Fiddler. Check out our ML Monitoring - Simple Quick Start Guide for onboarding your models.
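If data capture is not yet enabled on your endpoint, it can be turned on at deployment time. Below is a minimal sketch using the SageMaker Python SDK, assuming an existing `model` object and a placeholder S3 bucket of your own:

```py
from sagemaker.model_monitor import DataCaptureConfig

# Placeholder bucket: substitute your own S3 destination
data_capture_config = DataCaptureConfig(
    enable_capture=True,
    sampling_percentage=100,  # capture every request/response pair
    destination_s3_uri='s3://your-data-capture-bucket/datacapture',
)

# 'model' is assumed to be an existing sagemaker.model.Model instance
predictor = model.deploy(
    initial_instance_count=1,
    instance_type='ml.m5.large',
    data_capture_config=data_capture_config,
)
```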
At a high level, the integration involves four steps:

1. Configure SageMaker for data capture
2. Onboard your SageMaker model to Fiddler
3. Create an AWS Lambda function for data integration between SageMaker and Fiddler
4. Monitor and analyze your model in Fiddler
## Detailed Steps
This guide assumes that your SageMaker model is set up and onboarded to Fiddler, as noted in the prerequisites.
### Create a new AWS Lambda function
Begin by creating a new AWS Lambda function.
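You can create the function in the AWS console or programmatically. The sketch below uses boto3; the function name, IAM role ARN, and deployment package name are hypothetical placeholders:

```py
import boto3

lambda_client = boto3.client('lambda')

# Hypothetical names: substitute your own function name, role, and package
with open('fiddler_publisher.zip', 'rb') as package:
    lambda_client.create_function(
        FunctionName='sagemaker-to-fiddler',
        Runtime='python3.11',
        Role='arn:aws:iam::123456789012:role/lambda-s3-fiddler-role',
        Handler='lambda_function.lambda_handler',
        Code={'ZipFile': package.read()},
        Timeout=120,
    )
```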
### Set Up Environment Variables

In your Lambda function, create the following environment variables:

- `FIDDLER_URL` - The URL of your Fiddler environment, including `https://` (e.g., `https://your_company_name.fiddler.ai`).
- `FIDDLER_TOKEN` - Your Fiddler authorization token. See the Fiddler Python client documentation for more details on connecting to Fiddler.
- `FIDDLER_MODEL_UUID` - The unique identifier of your Fiddler model, which can be found in the UI on the model card page, or via `model.id` if you have a reference to the model in your notebook using the Fiddler Python client.
- `FIDDLER_MODEL_COLUMNS` - Your Fiddler model's input columns. These should align with the values expected from the SageMaker event's JSONL "inputs" data and must be in the same order as sent in the event.
- `FIDDLER_MODEL_OUTPUT_COLUMN` - The name of the model output column in Fiddler. Its value comes from the SageMaker event's JSONL "outputs" data.
- `FIDDLER_TIMESTAMP_COLUMN` - Optionally, the name of the timestamp column in Fiddler. This column is optionally pre-defined when you onboard your model to Fiddler, and it tells Fiddler to look for this column in your inferences for the datetime the event occurred. The alternative is to not include a timestamp and let Fiddler insert the current datetime as soon as each inference is uploaded; this works well for streaming real-time and near-real-time inferences.
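Environment variables can be set in the Lambda console or with boto3. A minimal sketch, assuming the function name from the earlier example and placeholder values for your Fiddler environment:

```py
import boto3

lambda_client = boto3.client('lambda')

# Placeholder values: substitute your own Fiddler environment details
lambda_client.update_function_configuration(
    FunctionName='sagemaker-to-fiddler',
    Environment={
        'Variables': {
            'FIDDLER_URL': 'https://your_company_name.fiddler.ai',
            'FIDDLER_TOKEN': '<your-fiddler-token>',
            'FIDDLER_MODEL_UUID': '<your-model-uuid>',
            'FIDDLER_MODEL_COLUMNS': 'feature_1,feature_2,feature_3',
            'FIDDLER_MODEL_OUTPUT_COLUMN': 'predicted_score',
            'FIDDLER_TIMESTAMP_COLUMN': 'timestamp',
        }
    },
)
```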
### Set Up Trigger for Lambda Function
Ensure that you configure a trigger for your Lambda function so that it is invoked upon “Object creation” events in the S3 bucket associated with your model.
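A sketch of configuring this trigger with boto3, assuming the bucket and function names from the earlier examples (the console's "Add trigger" flow achieves the same thing and also adds the required invoke permission automatically):

```py
import boto3

s3 = boto3.client('s3')

# Assumes the Lambda's resource policy already allows S3 to invoke it
# (via lambda_client.add_permission with principal 's3.amazonaws.com').
# Note: this call replaces the bucket's existing notification configuration.
s3.put_bucket_notification_configuration(
    Bucket='your-data-capture-bucket',
    NotificationConfiguration={
        'LambdaFunctionConfigurations': [
            {
                'LambdaFunctionArn': 'arn:aws:lambda:us-east-1:123456789012:function:sagemaker-to-fiddler',
                'Events': ['s3:ObjectCreated:*'],
            }
        ]
    },
)
```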
### Add Code to Your Lambda Function
Paste the example script below into your new Lambda function.
#### Customize

The output of your SageMaker model's endpoint may differ from this example, which would at the least require adjusting the dictionary keys used to extract the inference values in the `process_jsonl_content` function.

This example is not a complete production solution. You may wish to consider validation, logging, and other requirements dictated by your organization's standards.
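For example, if your endpoint captures JSON payloads rather than CSV, `process_jsonl_content` might look closer to the following sketch, usable as a drop-in replacement within the script below (the `'prediction'` output key is a hypothetical placeholder; match it to your endpoint's output schema):

```py
def process_jsonl_content(event: Dict[str, Any]) -> Dict[str, Any]:
    """Variant for endpoints whose data capture payloads are JSON, not CSV."""
    input_payload = json.loads(event['captureData']['endpointInput']['data'])
    output_payload = json.loads(event['captureData']['endpointOutput']['data'])

    row = {column: input_payload[column] for column in model_columns.split(',')}
    # 'prediction' is a hypothetical key; adjust to your endpoint's output
    row[model_output_column] = output_payload['prediction']
    return row
```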
### Python Script for Lambda Function
```py
import os
import json
import logging
from typing import Dict, List, Any

import boto3
import fiddler as fdl

logger = logging.getLogger()
logger.setLevel(logging.INFO)

url = os.getenv('FIDDLER_URL')
token = os.getenv('FIDDLER_TOKEN')
model_uuid = os.getenv('FIDDLER_MODEL_UUID')
model_columns = os.getenv('FIDDLER_MODEL_COLUMNS')
model_output_column = os.getenv('FIDDLER_MODEL_OUTPUT_COLUMN')
timestamp_column = os.getenv('FIDDLER_TIMESTAMP_COLUMN')

s3_client = boto3.client('s3')

# Connect to Fiddler and fetch the model definition once per Lambda container
fdl.init(url=url, token=token)
fiddler_model = fdl.Model.get(id_=model_uuid)


def get_all_columns() -> List[str]:
    # The model's input columns, in event order, plus the output column
    columns = [column.strip() for column in model_columns.split(',')]
    columns.append(model_output_column)
    return columns


def process_jsonl_content(event: Dict[str, Any]) -> Dict[str, Any]:
    # Map one SageMaker data capture record to a Fiddler inference row
    input_data = event['captureData']['endpointInput']['data']
    input_values = input_data.split(',')  # Split the CSV string into a list
    output_value = event['captureData']['endpointOutput']['data']

    row = dict(zip(get_all_columns(), input_values + [output_value]))
    if timestamp_column:
        row[timestamp_column] = event['eventMetadata']['inferenceTime']
    return row


def parse_sagemaker_log(log_file_path: str) -> List[Dict[str, Any]]:
    try:
        # Collect all events in a list, 1 per JSON line in the file
        event_rows = []
        with open(log_file_path, 'r') as file:
            for line in file:
                event = json.loads(line.strip())
                row = process_jsonl_content(event)
                event_rows.append(row)
        return event_rows
    except Exception:
        logger.exception('Failed to parse SageMaker log file %s', log_file_path)
        raise


def publish_to_fiddler(inferences: List[Dict[str, Any]], model: fdl.Model):
    # There are multiple options for publishing data to Fiddler; check
    # the online documentation for batch, streaming, and REST API options.
    # The publish call below uses a streaming approach managed by the
    # Fiddler Python client internally, based on the volume of inferences.
    model.publish(source=inferences)
    logger.info('Published %d inferences to Fiddler', len(inferences))


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    # Triggered by S3 "Object creation" events: download the new data
    # capture file, parse it, and publish the inferences to Fiddler
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = event['Records'][0]['s3']['object']['key']
    local_path = f"/tmp/{os.path.basename(key)}"
    s3_client.download_file(bucket, key, local_path)

    inferences = parse_sagemaker_log(local_path)
    publish_to_fiddler(inferences, fiddler_model)
    return {'statusCode': 200, 'body': json.dumps(f'Published {len(inferences)} inferences')}
```
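To smoke-test the handler before wiring up the trigger, you can invoke it locally with a simulated S3 event; the bucket and key below are hypothetical placeholders:

```py
# Hypothetical S3 "Object creation" event payload for a local test
sample_event = {
    'Records': [
        {
            's3': {
                'bucket': {'name': 'your-data-capture-bucket'},
                'object': {'key': 'datacapture/your-endpoint/capture-001.jsonl'},
            }
        }
    ]
}

print(lambda_handler(sample_event, context=None))
```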
Note: If you have provisioned Fiddler via the AWS Marketplace, you will also need to set three additional environment variables for the Fiddler Python client to properly authenticate.