Calculates the mutual information (MI) between variables over a specified dataset.
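This section does not document how the service estimates MI. It does not say how continuous features are binned, which log base is used, or which normalization `normalized=True` applies. As a hedged illustration of the quantity itself, here is a minimal pure-Python sketch for discrete features: I(X; Y) = Σ p(x, y) · log(p(x, y) / (p(x) p(y))), estimated from counts. The NMI shown divides MI by the arithmetic mean of the two marginal entropies. That is one common variant, assumed here, and it may differ from the service's definition. The function names are hypothetical and are not part of the client API.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (natural log) of a sequence of discrete values."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def mutual_information(x, y):
    """Plug-in estimate of I(X; Y) from paired discrete observations."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    px = Counter(x)           # marginal counts of X
    py = Counter(y)           # marginal counts of Y
    pxy = Counter(zip(x, y))  # joint counts of (X, Y)
    mi = 0.0
    for (a, b), c in pxy.items():
        # p(a, b) / (p(a) p(b)) simplifies to c * n / (count(a) * count(b))
        mi += (c / n) * math.log((c * n) / (px[a] * py[b]))
    return mi


def normalized_mutual_information(x, y):
    """MI divided by the arithmetic mean of H(X) and H(Y) (assumed variant)."""
    denom = (entropy(x) + entropy(y)) / 2
    if denom == 0:
        # Both variables are constant: MI is 0 and NMI is undefined;
        # this sketch returns 0.0 by convention.
        return 0.0
    return mutual_information(x, y) / denom


print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1]))             # ln 2
print(normalized_mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))  # independent
```

Identical variables reach the maximum MI, here their shared entropy ln 2, and an NMI of 1. Independent variables give 0 for both.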

| Input Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| project_id | str | None | The unique identifier for the project. |
| dataset_id | str | None | The unique identifier for the dataset. |
| features | list | None | A list of features for which to compute mutual information. |
| normalized | Optional[bool] | False | If True, computes normalized mutual information (NMI) instead of MI. |
| slice_query | Optional[str] | None | A SQL query. If specified, mutual information is calculated only over the dataset slice selected by the query. |
| sample_size | Optional[int] | None | If specified, only sample_size samples are used in the mutual information calculation. |
| seed | Optional[float] | 0.25 | The random seed used for sampling when sample_size is specified. |
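This page does not say how the service draws its sample. The general idea behind `sample_size` and `seed` is that a fixed seed makes a random subset reproducible across calls. The sketch below illustrates that idea locally with Python's `random.Random`. `sample_rows` is a hypothetical helper, not part of the client API, and its default seed simply mirrors the documented default of 0.25.

```python
import random


def sample_rows(rows, sample_size=None, seed=0.25):
    """Return a reproducible random subset of rows (hypothetical helper).

    If sample_size is None or not smaller than the data, all rows are returned.
    """
    rows = list(rows)
    if sample_size is None or sample_size >= len(rows):
        return rows
    # A dedicated Random instance leaves the global random state untouched;
    # the same seed always produces the same sample from the same rows.
    rng = random.Random(seed)
    return rng.sample(rows, sample_size)


first = sample_rows(range(100), sample_size=10)
second = sample_rows(range(100), sample_size=10)
print(first == second)  # True: same seed, same sample
```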

# Example 1: mutual information over the full dataset.
# Assumes `client` is an already-initialized API client object.
PROJECT_ID = 'example_project'
DATASET_ID = 'example_dataset'

mutual_information_features = [
    'feature_1',
    'feature_2',
    'feature_3'
]

mutual_information = client.get_mutual_information(
    project_id=PROJECT_ID,
    dataset_id=DATASET_ID,
    features=mutual_information_features
)
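# Example 2: mutual information restricted to a dataset slice.
# MODEL_ID is referenced in the slice query's table name and must be defined;
# 'example_model' is a placeholder value.
PROJECT_ID = 'example_project'
DATASET_ID = 'example_dataset'
MODEL_ID = 'example_model'

mutual_information_features = [
    'feature_1',
    'feature_2',
    'feature_3'
]

slice_query = f""" SELECT * FROM "{DATASET_ID}.{MODEL_ID}" WHERE feature_1 < 20.0 LIMIT 100 """

mutual_information = client.get_mutual_information(
    project_id=PROJECT_ID,
    dataset_id=DATASET_ID,
    features=mutual_information_features,
    slice_query=slice_query
)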

| Return Type | Description |
| --- | --- |
| dict | A dictionary containing the mutual information results. |