Calculates the mutual information (MI) between variables over a specified dataset.
Input Parameter | Type | Default | Description |
---|---|---|---|
project_id | str | None | The unique identifier for the project. |
dataset_id | str | None | The unique identifier for the dataset. |
features | list | None | A list of features for which to compute mutual information. |
normalized | Optional[bool] | False | If True, computes normalized mutual information (NMI) instead. |
slice_query | Optional[str] | None | A SQL query. If specified, mutual information is calculated only over the dataset slice selected by the query. |
sample_size | Optional[int] | None | If specified, only sample_size samples are used in the mutual information calculation. |
seed | Optional[float] | 0.25 | The random seed used for sampling when sample_size is specified. |
# Example: mutual information over the full dataset
PROJECT_ID = 'example_project'
DATASET_ID = 'example_dataset'

mutual_information_features = [
    'feature_1',
    'feature_2',
    'feature_3'
]

mutual_information = client.get_mutual_information(
    project_id=PROJECT_ID,
    dataset_id=DATASET_ID,
    features=mutual_information_features
)
# Example: mutual information over a dataset slice
PROJECT_ID = 'example_project'
DATASET_ID = 'example_dataset'
MODEL_ID = 'example_model'  # placeholder; the slice query below references it

mutual_information_features = [
    'feature_1',
    'feature_2',
    'feature_3'
]

slice_query = f""" SELECT * FROM "{DATASET_ID}.{MODEL_ID}" WHERE feature_1 < 20.0 LIMIT 100 """

mutual_information = client.get_mutual_information(
    project_id=PROJECT_ID,
    dataset_id=DATASET_ID,
    features=mutual_information_features,
    slice_query=slice_query
)
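Neither call above exercises `sample_size` or `seed`. The documentation does not say how the sample is drawn, so the following is only a local sketch of why a fixed seed matters: the same seed yields the same subsample, which makes repeated calculations comparable. The helper name `sample_rows` is hypothetical and is not part of the client.

```python
import random


def sample_rows(rows, sample_size=None, seed=0.25):
    """Return at most sample_size rows, chosen reproducibly for a given seed.

    Hypothetical helper for illustration only; it does not reproduce
    the sampling performed by get_mutual_information.
    """
    if sample_size is None or sample_size >= len(rows):
        return list(rows)
    # A local generator keeps the global random state untouched.
    rng = random.Random(seed)
    return rng.sample(rows, sample_size)


rows = list(range(1000))
first = sample_rows(rows, sample_size=100, seed=0.25)
second = sample_rows(rows, sample_size=100, seed=0.25)
print(first == second)  # same seed, same subsample
```

Changing the seed would generally select a different subset, so results computed with different seeds may differ slightly.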
Return Type | Description |
---|---|
dict | A dictionary containing mutual information results. |
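To give a sense of the quantity being reported, here is a minimal sketch of a plug-in mutual information estimate for two discrete columns, with an optional normalized variant. It is not the implementation behind `get_mutual_information`. The function names are hypothetical, and the normalization shown (dividing by the arithmetic mean of the two entropies) is an assumed convention, since the source does not state which one is used.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in nats) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in Counter(values).values())


def mutual_information(xs, ys, normalized=False):
    """Plug-in MI estimate between two equally long discrete sequences."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    # Sum of p(x, y) * log(p(x, y) / (p(x) * p(y))) over observed pairs.
    mi = sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )
    if not normalized:
        return mi
    # Assumed convention: divide by the arithmetic mean of the entropies.
    denom = (entropy(xs) + entropy(ys)) / 2
    return mi / denom if denom > 0 else 0.0


identical = mutual_information([0, 0, 1, 1], [0, 0, 1, 1], normalized=True)
independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
print(identical, independent)
```

Under this convention, two identical columns give an NMI of 1 (up to floating-point rounding), and two columns whose values are independent in the sample give an MI of 0.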