
Uploading Model Artifacts

To upload your model to Fiddler, you need to create a model package.

A model package is essentially a directory that contains:

  • a set of instructions for how the model will operate in production (called package.py)

  • some metadata about the model (called model.yaml)

  • your model

  • additional assets such as preprocessing pipelines

package.py

At the heart of a model package is the package.py script.

This is a script that tells Fiddler how to feed data into your model. package.py is fully customizable, and allows for custom preprocessing pipelines and predict functions.


Here is what package.py should look like in the most general sense:

%%writefile package.py

import pickle
from pathlib import Path
import pandas as pd

PACKAGE_PATH = Path(__file__).parent

"""
Here, we create a FiddlerModel object that contains all the necessary
    components for the model to run smoothly.
"""

class FiddlerModel:

    def __init__(self):
        """
        Here we can load in the model and any other necessary
            serialized objects from the PACKAGE_PATH.
        """

    def transform_input(self, input_df):
        """
        The transform_input() function lets us apply any necessary
            preprocessing to our input before feeding it into our model.
        It should return the transformed version of input_df.
        """

    def predict(self, input_df):
        """
        The predict() function should return a DataFrame of predictions
            whose columns correspond to the outputs of your model.

        For regression models, this DataFrame typically has a single column
            that stores the continuous output of your model.
        For binary classification models, this DataFrame typically has a
            single column that stores the probability prediction for the
            positive class.
        For multiclass classification models, this DataFrame typically has
            the same number of columns as it does classes (one for each
            class probability prediction).
        """

def get_model():
    return FiddlerModel()

Some notes about package.py:

  • Every package.py should define a predict() function and a transform_input() function on the model class, along with a module-level get_model() function that returns an instance of it. These are the only functions Fiddler invokes directly.

  • You can write your own functions and add them to package.py. The complexity of package.py will depend on the framework you are using and the task you are performing.

  • You can incorporate other .py scripts into your package.py with relative imports. Just add them to the model directory along with package.py.
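For instance, a small helper module could live next to package.py and be imported from it. The file name text_utils.py and its function below are purely illustrative:

```python
# text_utils.py -- a hypothetical helper module saved alongside package.py

def normalize_text(s):
    """Lowercase a string and collapse runs of whitespace."""
    return " ".join(s.lower().split())

# Inside package.py you could then write:
#     from .text_utils import normalize_text
# and call normalize_text() from transform_input().
```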

A note about transform_input():

  • Generally, you should call transform_input() from within the predict() function before making predictions.

  • If you don't need to transform your input in any way, you can simply return the original input_df from transform_input() as shown below, and skip calling it inside predict().

def transform_input(self, input_df):
    return input_df

model.yaml

Fiddler also requires that you save some model metadata in YAML form and include it in the model package.

You can generate this metadata by creating a Fiddler ModelInfo object from an existing Fiddler dataset.

import fiddler as fdl

model_info = fdl.ModelInfo.from_dataset_info(
    dataset_id=DATASET_ID,        # your Fiddler dataset ID
    dataset_info=dataset_info,    # your Fiddler DatasetInfo object
    target=TARGET_COLUMN,         # the name of your target column
    outputs=OUTPUT_COLUMNS        # the names of your model output columns
)

Once you have your ModelInfo object, you can call its to_dict() method to convert it to a dictionary, then dump that dictionary to a YAML file as shown below.

import yaml

with open('model.yaml', 'w') as yaml_file:
    yaml.dump({'model': model_info.to_dict()}, yaml_file)
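As an optional sanity check, you can load the file back with yaml.safe_load and confirm the structure survives the round trip. The metadata dict below is a minimal stand-in for what model_info.to_dict() would produce:

```python
import yaml

# A minimal stand-in metadata dict -- your real model_info.to_dict() call
# would produce a larger structure with the same top-level 'model' key
metadata = {'model': {'name': 'example_model', 'input-type': 'structured'}}

with open('model.yaml', 'w') as yaml_file:
    yaml.dump(metadata, yaml_file)

# Round-trip: parse the file back and confirm the contents are intact
with open('model.yaml') as yaml_file:
    loaded = yaml.safe_load(yaml_file)
```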

Model Assets

The final step in creating a model package is to include any assets you need in order to make predictions.

This includes your model artifact and any serialized preprocessing objects that you may need.

These serialized objects should be loaded by the __init__() function you wrote inside package.py.
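The flip side of that loading code is serializing the assets at training time. Here is a minimal sketch; the dicts stand in for your real fitted objects (for example an sklearn estimator and preprocessing pipeline), since any picklable Python object is handled the same way:

```python
import pickle
from pathlib import Path

# Hypothetical training-time stand-ins for your real fitted objects
model = {"weights": [0.1, 0.2, 0.3]}
preprocessor = {"scale": 2.0}

package_dir = Path("model_package")
package_dir.mkdir(exist_ok=True)

# Serialize each asset into the model package directory so that the
# __init__() in package.py can load it back with pickle.load()
for filename, obj in [("model.pkl", model), ("preprocessor.pkl", preprocessor)]:
    with open(package_dir / filename, "wb") as f:
        pickle.dump(obj, f)
```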

The Complete Model Package

At this point, your model package (directory) should contain:

  • package.py

  • model.yaml

  • Any model assets
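Put together, a package directory might look like this (the asset file names are illustrative):

```
my_model_package/
├── package.py        # loading / prediction logic
├── model.yaml        # model metadata
├── model.pkl         # serialized model artifact
└── preprocessor.pkl  # serialized preprocessing pipeline
```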

package.py examples

simple Sklearn model package.py
from pathlib import Path
# with the `sklearn_wrapper.py` file dropped into the top-level org directory
from ...sklearn_wrapper import SimpleSklearnModel

PACKAGE_PATH = Path(__file__).parent
MODEL_FILE_NAME = 'model.pkl'
PRED_COLUMN_NAMES = ['setosa', 'versicolor', 'virginica']

def get_model():
    return SimpleSklearnModel(PACKAGE_PATH / MODEL_FILE_NAME,
                              PRED_COLUMN_NAMES, is_classifier=True,
                              is_multiclass=True)
custom package.py for a serialized preprocessor pipeline

Once you have your model package completed, you can further customize package.py to meet your needs.

Consider a case where your model assets include your model artifact and a serialized preprocessor pipeline. Below is an example of how you might incorporate these model assets into package.py.

%%writefile package.py

import pickle
from pathlib import Path
import pandas as pd

PACKAGE_PATH = Path(__file__).parent


class FiddlerModel:

    def __init__(self):

        # Load in our serialized model
        with open(PACKAGE_PATH / 'model.pkl', 'rb') as pkl_file:
            self.model = pickle.load(pkl_file)

        # Load in our serialized preprocessor
        with open(PACKAGE_PATH / 'preprocessor.pkl', 'rb') as pkl_file:
            self.preprocessor = pickle.load(pkl_file)

        # Name our output columns to match the outputs specified in ModelInfo
        self.outputs = ['predicted_value']


    def transform_input(self, input_df):

        # Transform the input using the preprocessor we loaded in
        transformed_df = self.preprocessor.transform(input_df)

        return transformed_df


    def predict(self, input_df):

        # Transform model input prior to prediction with transform_input()
        transformed_df = self.transform_input(input_df)

        # Make predictions on the transformed input
        predictions = self.model.predict(transformed_df)

        # Store the predictions in the output column(s)
        prediction_df = pd.DataFrame(predictions, columns=self.outputs)

        return prediction_df

def get_model():
    return FiddlerModel()
custom package.py for a TensorFlow text classifier
import pathlib
import pickle as pk
import pandas as pd
import tensorflow as tf
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing.sequence import pad_sequences

MODEL_DIR = pathlib.Path(__file__).parent

SAVED_MODEL_PATH = MODEL_DIR / 'spam_keras.h5'
TOKENIZER_PATH = MODEL_DIR / 'tokenizer.pkl'

TEXT_FIELD = 'text'
OUTPUT_COLUMNS = ['spam_probability']

MAX_SEQUENCE_LENGTH = 50

class FiddlerModel:
    def __init__(self):
        # load the tokenizer from a pickle file
        with open(TOKENIZER_PATH, 'rb') as handle:
            self.tokenizer = pk.load(handle)

        # create persistent TensorFlow session
        self.sess = tf.Session()

        # load model into that session
        with self.sess.as_default():
            self.model = load_model(SAVED_MODEL_PATH)

    def transform_input(self, input_df):
        # tokenize the raw strings
        input_tokens = [self.tokenizer.texts_to_sequences([x])[0]
                        for x in input_df[TEXT_FIELD].values]

        # pad the token lists to a fixed length
        input_tokens = pad_sequences(input_tokens, MAX_SEQUENCE_LENGTH)

        return pd.DataFrame(input_tokens.tolist())

    def predict(self, input_df):
        # transform the raw input
        transformed_input_df = self.transform_input(input_df)
        # apply model to transformed input
        with self.sess.as_default():
            predictions = self.model.predict(transformed_input_df)

        return pd.DataFrame(data=predictions, columns=OUTPUT_COLUMNS)

def get_model():
    return FiddlerModel()
custom package.py for a PyTorch model
import pathlib
import pickle
import fastai.text
import pandas as pd
import torch
import torch.nn.functional as F
PACKAGE_PATH = pathlib.Path(__file__).parent
PREPROCESSOR_PATH = PACKAGE_PATH / 'preprocessor.pkl'
MODEL_WEIGHTS_PATH = PACKAGE_PATH / 'model_weights.pth'
OUTPUT_NAME = 'sentiment'
VOCAB_SIZE = 60000  # placeholder -- set to the vocabulary size used in training

class FiddlerModel:
    def __init__(self):
        # set up preprocessing functions
        with PREPROCESSOR_PATH.open('rb') as f:
            self.preprocessor = pickle.load(f)

        # load the fast.ai pytorch model and set to eval mode
        model = fastai.text.learner.get_text_classifier(
            arch=fastai.text.models.awd_lstm.AWD_LSTM,
            vocab_sz=VOCAB_SIZE,
            n_class=2)
        loaded_state_dict = torch.load(MODEL_WEIGHTS_PATH, map_location='cpu')
        state_dict = dict(zip(model.state_dict().keys(),
                              loaded_state_dict['model'].values()))
        model.load_state_dict(state_dict)
        model.eval()
        self.model = model

    def transform_input(self, input_df):
        return self.preprocessor(input_df)

    def predict(self, input_df):
        # transform the raw input into a tensor before prediction
        input_tensor = self.transform_input(input_df)
        with torch.no_grad():
            pre_softmax_pred, _, _ = self.model.forward(input_tensor)
            pred = F.softmax(pre_softmax_pred, dim=1)[:, 1].numpy()
        return pd.DataFrame({OUTPUT_NAME: pred})

def get_model():
    return FiddlerModel()