Databricks Integration

Databricks is a web-based platform for working with Spark, that provides automated cluster management and IPython-style notebooks for data engineering and machine learning.

This guide will walk you through the process of getting data from Databricks tables into a Pandas dataframe. Once you have a dataframe ready you can easily upload that data into Fiddler.

Create the Dataframe in your Notebook

Start by creating a Notebook in your Databricks workspace. Databricks has a lot of pre-installed libraries like Spark and Pandas. Using the Spark library you can interact with all your delta lake assets.

To get your data into a Pandas dataframe use the following code snippet. Just replace table_name with your desired table in your Databricks environment.

spark_dataframe = spark.read.table(table_name)
baseline_df = spark_dataframe.toPandas()

Upload the Dataframe to Fiddler

Now that we have a dataframe, you are ready to upload it to Fiddler. You will need to do the following:

  1. Authorize the Fiddler client
  2. Create a Project
  3. Upload the Baseline Dataset

The following code snippet combines all the steps mentioned above

!pip install -q fiddler-client
import fiddler as fdl
import pandas as pd

#set up fiddler client
URL = '' # Make sure to include the full URL (including https://). For example, https://abc.xyz.ai
ORG_ID = '' # Found in General section under the settings tab 
AUTH_TOKEN ='' # Found in the Credentials section under the settings tab 

# Initiate Fiddler client
client = fdl.FiddlerApi(
    url=URL,
    org_id=ORG_ID,
    auth_token=AUTH_TOKEN
)

#create a project
PROJECT_ID = 'project_name'

client.create_project(PROJECT_ID)

#let Fiddler understand your data
dataset_info = fdl.DatasetInfo.from_dataframe(baseline_df, max_inferred_cardinality=100)
dataset_info

#Upload baseline dataset
DATASET_ID = 'dataset_name'

client.upload_dataset(
    project_id=PROJECT_ID,
    dataset_id=DATASET_ID,
    dataset={
        'baseline': baseline_df
    },
    info=dataset_info
)

Your dataset should be available in Fiddler UI listed under the project you just created. Now you can onboard a model for this dataset.


What’s Next