BigQuery Integration
In this article, we will look at loading data from BigQuery tables and using that data for the following tasks:
Uploading baseline data to Fiddler
Onboarding a model to Fiddler and creating a surrogate model
Publishing production data to Fiddler
Before importing data from BigQuery into Fiddler, we first need to enable the BigQuery API. This can be done as follows:
In the GCP console, go to the navigation menu and click APIs & Services. From there, click + Enable APIs and Services. In the search bar, enter BigQuery API and click Enable.
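If you prefer the command line, the same API can be enabled with the gcloud CLI. A minimal sketch, assuming the Google Cloud SDK is installed and authenticated against your project:

```bash
# Enable the BigQuery API for the currently configured GCP project
gcloud services enable bigquery.googleapis.com
```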
To make requests to the API enabled in Step 1, you need to create a service account and download an authentication file for your Jupyter Notebook. To do so, navigate to the Credentials tab in the APIs & Services console, click Create Credentials, and select Service account from the dropdown.
Enter the service account name and description. You can assign the BigQuery Admin role under Grant this service account access to the project, then click Done. The new service account now appears on the Credentials screen. Click the pencil icon next to it, click Add Key, choose JSON, and click Create. A JSON file containing the auth key will be downloaded; note the download path, as it will be used for authentication.
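As a rough command-line equivalent of the console steps above (a sketch only; the service account name fiddler-bq-demo, the project ID my-project-id, and the key path are placeholders for your own values):

```bash
# Create the service account (name is a placeholder)
gcloud iam service-accounts create fiddler-bq-demo \
    --display-name="Fiddler BigQuery demo"

# Grant the BigQuery Admin role on the project (project ID is a placeholder)
gcloud projects add-iam-policy-binding my-project-id \
    --member="serviceAccount:fiddler-bq-demo@my-project-id.iam.gserviceaccount.com" \
    --role="roles/bigquery.admin"

# Download a JSON key file for the service account
gcloud iam service-accounts keys create ./bq-key.json \
    --iam-account="fiddler-bq-demo@my-project-id.iam.gserviceaccount.com"
```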
We will now use the generated key to connect to BigQuery tables from a Jupyter Notebook.
Install the following libraries in your Python environment so they can be loaded in Jupyter:
google-cloud
google-cloud-bigquery[pandas]
google-cloud-storage
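For example, from a notebook cell or a shell (a minimal sketch; the brackets are quoted so the pandas extras install correctly under all shells):

```bash
pip install google-cloud "google-cloud-bigquery[pandas]" google-cloud-storage
```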
Set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of the key file that was generated above
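A minimal sketch; the key path below is a placeholder for wherever your JSON file was downloaded:

```python
import os

# Point the Google client libraries at the service account key downloaded earlier
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/bq-key.json"
```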
Import the Google Cloud client library and initialize the BigQuery service
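For example, using the standard client constructor, which picks up the credentials set in the previous step:

```python
from google.cloud import bigquery

# Credentials are read from GOOGLE_APPLICATION_CREDENTIALS
client = bigquery.Client()
```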
Specify the query that will be used to import the data from BigQuery
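For instance (the project, dataset, and table names below are placeholders; substitute your own):

```python
# Placeholder query; replace with the table you want to pull into Fiddler
query = """
    SELECT *
    FROM `my-project-id.my_dataset.my_table`
    LIMIT 10000
"""
```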
Run the query and read the results into a pandas DataFrame
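A sketch continuing from the client and query defined above:

```python
# Execute the query and load the result set into a pandas DataFrame
df = client.query(query).to_dataframe()

# Quick sanity check of what was imported
print(df.shape)
df.head()
```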
Now that we have imported data from BigQuery into a DataFrame, we can refer to the following pages to upload baseline data, onboard a model and create a surrogate, and publish production data to Fiddler.