BigQuery Integration
In this article, we will look at loading data from BigQuery tables and using that data for the following tasks:
Uploading baseline data to Fiddler
Onboarding a model to Fiddler and creating a surrogate model
Publishing production data to Fiddler
Before importing data from BigQuery into Fiddler, we first need to enable the BigQuery API. This can be done as follows:
In the GCP console, go to the navigation menu and click APIs & Services. From there, click + Enable APIs and Services. In the search bar, enter BigQuery API and click Enable.
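If you prefer the command line, the same API can be enabled with the gcloud CLI. A minimal sketch, assuming the Google Cloud SDK is installed and authenticated against your project:

```bash
# Enable the BigQuery API for the currently configured GCP project
gcloud services enable bigquery.googleapis.com
```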
To make requests to the API enabled in Step 1, you need to create a service account and download an authentication file for your Jupyter Notebook. To do so, navigate to the Credentials tab in the APIs & Services console, click Create Credentials, and select Service account from the dropdown.
Enter the service account name and description. You can assign the BigQuery Admin role under Grant this service account access to the project, then click Done. The new service account now appears on the Credentials screen. Click the pencil icon next to it, click Add Key, choose JSON, and click Create. A JSON file containing the auth key will be downloaded; note the download path, as it will be used for authentication.
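As a rough command-line equivalent of the console steps above (a sketch only; the service account name fiddler-bq-demo, the project ID my-project-id, and the key path are placeholders for your own values):

```bash
# Create the service account (name is a placeholder)
gcloud iam service-accounts create fiddler-bq-demo \
    --display-name="Fiddler BigQuery demo"

# Grant the BigQuery Admin role on the project (project ID is a placeholder)
gcloud projects add-iam-policy-binding my-project-id \
    --member="serviceAccount:fiddler-bq-demo@my-project-id.iam.gserviceaccount.com" \
    --role="roles/bigquery.admin"

# Download a JSON key file for the service account
gcloud iam service-accounts keys create ./bq-key.json \
    --iam-account="fiddler-bq-demo@my-project-id.iam.gserviceaccount.com"
```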
We will now use the generated key to connect to BigQuery tables from a Jupyter Notebook.
Install the following libraries in your Python environment so they can be loaded in Jupyter:
google-cloud
google-cloud-bigquery[pandas]
google-cloud-storage
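For example, from a notebook cell or a shell (a minimal sketch; the brackets are quoted so the pandas extras install correctly under all shells):

```bash
pip install google-cloud "google-cloud-bigquery[pandas]" google-cloud-storage
```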
Set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of the key file that was generated above
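A minimal sketch; the key path below is a placeholder for wherever your JSON file was downloaded:

```python
import os

# Point the Google client libraries at the service account key downloaded earlier
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/bq-key.json"
```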
Import the Google Cloud client library and initialize the BigQuery service
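For example, using the standard client constructor, which picks up the credentials set in the previous step:

```python
from google.cloud import bigquery

# Credentials are read from GOOGLE_APPLICATION_CREDENTIALS
client = bigquery.Client()
```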
Specify the query that will be used to import the data from BigQuery
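For instance (the project, dataset, and table names below are placeholders; substitute your own):

```python
# Placeholder query; replace with the table you want to pull into Fiddler
query = """
    SELECT *
    FROM `my-project-id.my_dataset.my_table`
    LIMIT 10000
"""
```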
Run the query and read the results into a pandas DataFrame
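A sketch continuing from the client and query defined above:

```python
# Execute the query and load the result set into a pandas DataFrame
df = client.query(query).to_dataframe()

# Quick sanity check of what was imported
print(df.shape)
df.head()
```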
Now that we have imported data from BigQuery into a DataFrame, we can refer to the following pages to upload baseline data, onboard a model and create a surrogate, and publish production data to Fiddler.