Customizing Your Model Schema

You will often need to modify your fdl.ModelSchema object when fdl.Model.from_data infers something incorrectly.

Let's walk through an example of how to do this.


Suppose you've loaded in a dataset as a pandas DataFrame.

import fiddler as fdl
import pandas as pd

df = pd.read_csv('example_dataset.csv')

Below is an example of what is displayed upon inspection.


Suppose you create an fdl.Model object by inferring the details from this DataFrame.

model = fdl.Model.from_data(
  name='my_model',
  project_id=PROJECT_ID,
  source=df
)

Below is an example of what is displayed upon inspection of model.schema.

Looking more closely, you notice a few things are wrong.

  1. The value range of output_column is set to [0.01, 0.99], when it should really be [0.0, 1.0].
  2. There are no possible values set for feature_3.
  3. The data type of feature_3 is set to fdl.DataType.STRING, when it should really be fdl.DataType.CATEGORY.

Let's see how we can address these issues.

Modifying a column’s value range

Let's say we want to modify the range of output_column in the above fdl.Model object to be [0.0, 1.0].

You can do this by setting the min and max of the output_column column.

model.schema['output_column'].min = 0.0
model.schema['output_column'].max = 1.0
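
If you adjust ranges for several columns, it can help to guard against swapping the two bounds by accident. The following is a minimal sketch, not part of the Fiddler client: set_value_range is a hypothetical helper, and the SimpleNamespace object is only a stand-in for model.schema['output_column'] so that the example runs without a Fiddler connection.

```python
from types import SimpleNamespace


def set_value_range(column, lo, hi):
    """Set column.min and column.max after checking that lo <= hi.

    `column` can be any object with writable `min` and `max` attributes.
    """
    if lo > hi:
        raise ValueError(f"min ({lo}) must not exceed max ({hi})")
    column.min = lo
    column.max = hi
    return column


# Stand-in for model.schema['output_column'] with the inferred range.
output_column = SimpleNamespace(name="output_column", min=0.01, max=0.99)

set_value_range(output_column, 0.0, 1.0)
print(output_column.min, output_column.max)
```

In real use, you would presumably pass model.schema['output_column'] in place of the stand-in object.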

Modifying a column’s possible values

Let's say we want to modify the possible values of feature_3 to be ['Yes', 'No'].

You can do this by setting the categories of the feature_3 column.

model.schema['feature_3'].categories = ['Yes', 'No']
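
Rather than typing the list by hand, you may prefer to derive it from the data. The sketch below is one way to do that, under the assumption that missing entries appear as None or NaN. The unique_values helper is hypothetical and not part of the Fiddler client. It accepts any iterable, so a pandas Series such as df['feature_3'] should work as well as the plain list used here.

```python
import math


def unique_values(values):
    """Return the distinct non-missing values, in first-seen order."""
    seen = {}
    for value in values:
        if value is None:
            continue  # skip missing entries
        if isinstance(value, float) and math.isnan(value):
            continue  # skip NaN, which pandas uses for missing data
        seen.setdefault(value, None)
    return list(seen)


# Illustrative raw values for feature_3, including missing entries.
raw_values = ["Yes", "No", "Yes", None, float("nan"), "No"]

categories = unique_values(raw_values)
print(categories)  # ['Yes', 'No']
```

The resulting list could then be assigned to the categories of the feature_3 column.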

Modifying a column’s data type

Let's say we want to modify the data type of feature_3 to be fdl.DataType.CATEGORY.

You can do this by setting the data_type of the feature_3 column.

model.schema['feature_3'].data_type = fdl.DataType.CATEGORY

🚧

Note when modifying a column's data type to Category

When you change a column's data type to Category, you must also set the column's categories to the list of unique values for that column.

model.schema['feature_3'].data_type = fdl.DataType.CATEGORY
model.schema['feature_3'].categories = ['Yes', 'No']
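
Because this requirement is easy to overlook, a quick check before onboarding the model can catch Category columns that still have no values listed. The following is a rough sketch under stated assumptions: DataType and Column are simplified stand-ins defined here for illustration, not the Fiddler classes, and feature_4 and feature_5 are made-up column names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class DataType(Enum):
    """Simplified stand-in for fdl.DataType."""
    STRING = "str"
    CATEGORY = "category"


@dataclass
class Column:
    """Simplified stand-in for a schema column."""
    name: str
    data_type: DataType
    categories: Optional[List[str]] = None


def category_columns_missing_values(columns):
    """Return names of Category columns with no categories set."""
    return [
        column.name
        for column in columns
        if column.data_type is DataType.CATEGORY and not column.categories
    ]


columns = [
    Column("feature_3", DataType.CATEGORY),              # values forgotten
    Column("feature_4", DataType.CATEGORY, ["A", "B"]),  # complete
    Column("feature_5", DataType.STRING),                # not a category
]

missing = category_columns_missing_values(columns)
print(missing)  # ['feature_3']
```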