Databricks Integration

Databricks is a web-based platform for working with Spark that provides automated cluster management and IPython-style notebooks for data engineering and machine learning.

This guide walks you through getting data from Databricks tables into a Pandas dataframe. Once you have the dataframe ready, you can easily upload that data into Fiddler.

Create the Dataframe in your Notebook

Start by creating a notebook in your Databricks workspace. Databricks comes with many pre-installed libraries, including Spark and Pandas. Using the Spark library, you can interact with all of your Delta Lake assets.

To get your data into a Pandas dataframe, use the following code snippet. Just replace table_name with the desired table in your Databricks environment.

spark_dataframe = spark.table("table_name")
baseline_df = spark_dataframe.toPandas()
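Note that toPandas() collects the entire table onto the driver node, so before uploading it is worth a quick sanity check on the converted dataframe. A minimal sketch in plain Pandas (the column names below are hypothetical stand-ins for your table's schema):

```python
import pandas as pd

# Hypothetical stand-in for spark_dataframe.toPandas(); your real
# baseline_df will have the columns of the Databricks table you chose.
baseline_df = pd.DataFrame({
    "feature_a": [0.1, 0.2, 0.3],
    "label": [0, 1, 0],
})

# Quick sanity checks before uploading to Fiddler
assert not baseline_df.empty
print(baseline_df.shape)   # (number of rows, number of columns)
print(baseline_df.dtypes)  # confirm the column types survived the conversion
```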

Upload the Dataframe to Fiddler

Now that you have a dataframe, you are ready to upload it to Fiddler. You will need to do the following:

  1. Authorize the Fiddler client
  2. Create a Project
  3. Upload the Baseline Dataset

The following code snippet combines all of the steps mentioned above:

!pip install -q fiddler-client

import fiddler as fdl
import pandas as pd

# Set up the Fiddler client
URL = ''  # Make sure to include the full URL (including https://)
ORG_ID = ''  # Found in the General section under the Settings tab
AUTH_TOKEN = ''  # Found in the Credentials section under the Settings tab

# Initialize the Fiddler client
client = fdl.FiddlerApi(url=URL, org_id=ORG_ID, auth_token=AUTH_TOKEN)

# Create a project
PROJECT_ID = 'project_name'
client.create_project(PROJECT_ID)

# Let Fiddler infer the schema of your data
dataset_info = fdl.DatasetInfo.from_dataframe(baseline_df, max_inferred_cardinality=100)

# Upload the baseline dataset
DATASET_ID = 'dataset_name'
client.upload_dataset(
    project_id=PROJECT_ID,
    dataset_id=DATASET_ID,
    dataset={'baseline': baseline_df},
    info=dataset_info
)
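The max_inferred_cardinality argument controls which columns Fiddler infers as categorical based on their number of distinct values. As a rough, hypothetical illustration of that idea (this is a sketch, not Fiddler's actual implementation), a column qualifies when its distinct-value count is at or below the threshold:

```python
import pandas as pd

# Hypothetical sketch of cardinality-based type inference, loosely
# mirroring what max_inferred_cardinality controls; this is NOT
# Fiddler's actual implementation.
def infer_categorical_columns(df: pd.DataFrame, max_inferred_cardinality: int = 100):
    """Return columns whose distinct-value count is <= the threshold."""
    return [
        col for col in df.columns
        if df[col].nunique() <= max_inferred_cardinality
    ]

example = pd.DataFrame({
    "country": ["US", "DE", "US", "FR"],  # 3 distinct values
    "user_id": [1, 2, 3, 4],              # unique per row
})
print(infer_categorical_columns(example, max_inferred_cardinality=3))  # ['country']
```

Raising the threshold would make more columns (such as user_id above) count as categorical, so the default of 100 is a trade-off between catching real categories and misclassifying high-cardinality identifiers.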

Your dataset should now be listed in the Fiddler UI under the project you just created. From there, you can onboard a model for this dataset.

What’s Next