Airflow Integration
Apache Airflow is an open source platform for managing a company's complex workflows, including ETL. Companies are increasingly integrating their ML model pipelines into Airflow DAGs to manage and monitor all the components of their ML model systems.
By integrating Fiddler into an existing Airflow DAG, you can train, manage, and onboard your models while actively monitoring performance and data quality and troubleshooting degradations across your models.
Fiddler can be integrated into your existing Airflow DAG for an ML model pipeline. A notebook that publishes events can be orchestrated to run as part of your Airflow DAG using a "Papermill Operator".
Steps for the walkthrough
Set up Airflow on your local machine or in Docker. These steps can be followed: Link
Add the Jupyter notebook containing the publishing code to your Airflow home directory. In this example we use two different notebooks -
Add the orchestration code to your Airflow directory. Airflow will pick it up and construct a DAG as defined. The orchestration code contains the "Papermill Operator", which orchestrates the Jupyter notebooks used to onboard models and publish events to Fiddler. Please refer to our orchestration code.
The run interval can be set in the orchestration code as "schedule_interval" in the DAG class. This interval can be based on the frequency of training and inference of your ML model.
Once the DAGs are set up, they can be monitored in the Airflow UI. Below, dummy DAGs have been set up with placeholder nodes for "data preparation ETL" and "model training/inference". We have two DAGs -
a. To set up Fiddler model registration after preparing baseline data (training pipeline)
b. To publish events to Fiddler after data preparation and ML model inference (inference pipeline)
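The steps above can be sketched roughly as follows. This is a minimal illustration rather than Fiddler's own orchestration code: the notebook directory, the notebook file names, the DAG ids, the `run_date` notebook parameter, and the helper names `executed_copy` and `build_dag` are all hypothetical. It assumes Airflow 2.3 or later with the `apache-airflow-providers-papermill` package installed. Newer Airflow releases prefer the `schedule` argument over `schedule_interval`.

```python
from datetime import datetime, timedelta
from pathlib import PurePosixPath

# Hypothetical location of the notebooks inside the Airflow home directory.
NOTEBOOK_DIR = PurePosixPath("/opt/airflow/notebooks")

# Retry settings applied to every task in the DAG (values are illustrative).
DEFAULT_ARGS = {"owner": "airflow", "retries": 1, "retry_delay": timedelta(minutes=5)}


def executed_copy(notebook: str) -> str:
    """Return the path for the executed notebook copy.

    "{{ ds }}" is an Airflow template that is rendered to the run date.
    """
    stem = PurePosixPath(notebook).stem
    return str(NOTEBOOK_DIR / "output" / (stem + "_{{ ds }}.ipynb"))


def build_dag(dag_id: str, notebook: str, schedule_interval: str = "@daily"):
    """Build a DAG: placeholder ETL -> placeholder model step -> Fiddler notebook."""
    # Imported lazily so this module can be read without Airflow installed.
    from airflow import DAG
    from airflow.operators.empty import EmptyOperator
    from airflow.providers.papermill.operators.papermill import PapermillOperator

    with DAG(
        dag_id=dag_id,
        default_args=DEFAULT_ARGS,
        schedule_interval=schedule_interval,  # match your training/inference cadence
        start_date=datetime(2024, 1, 1),
        catchup=False,
    ) as dag:
        prepare = EmptyOperator(task_id="data_preparation_etl")
        model_step = EmptyOperator(task_id="model_training_or_inference")
        fiddler = PapermillOperator(
            task_id="run_fiddler_notebook",
            input_nb=str(NOTEBOOK_DIR / notebook),
            output_nb=executed_copy(notebook),
            parameters={"run_date": "{{ ds }}"},  # hypothetical notebook parameter
        )
        prepare >> model_step >> fiddler
    return dag


# Airflow only discovers DAG objects bound at module level, so a real DAG file
# would end with lines such as (names are hypothetical):
# training_dag = build_dag("fiddler_model_registration", "register_model.ipynb")
# inference_dag = build_dag("fiddler_publish_events", "publish_events.ipynb")
```

Keeping the Airflow imports inside `build_dag` is only a convenience for reading or testing the helpers without Airflow. In a regular DAG file the imports would normally sit at the top of the module.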
Label Update
An important business use case is integrating Fiddler's "Label Update" into your ML workflow using Airflow. Label update can be used to update the ground truth feature in your data. This is done with the "publish_event" API by passing the event and event_id parameters and setting the update_event parameter to "True". The code to update labels can be found in the notebook. This notebook can be integrated to run as part of your Airflow DAG using the sample code.
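As a rough sketch of the label update call described above, the helper below wraps `publish_event` with `update_event=True`. The helper name `update_label`, the `project_id` and `model_id` arguments, and the example column name are assumptions for illustration. Only `publish_event`, `event`, `event_id`, and `update_event` come from the text above, so check the call against the Fiddler client version you have installed.

```python
from typing import Any, Dict


def update_label(
    client: Any,
    project_id: str,    # assumed argument, not named in the text above
    model_id: str,      # assumed argument, not named in the text above
    event_id: str,
    label_column: str,
    label_value: Any,
) -> Dict[str, Any]:
    """Send a ground-truth update for a previously published event.

    `client` is expected to be an already connected Fiddler client object
    that exposes a `publish_event` method.
    """
    # The update only needs to carry the ground truth column being changed.
    event = {label_column: label_value}
    client.publish_event(
        project_id=project_id,
        model_id=model_id,
        event=event,
        event_id=event_id,    # must match the id used when the event was first published
        update_event=True,    # marks this call as an update instead of a new event
    )
    return event
```

In a notebook run by the Papermill Operator, a function like this could be called once per event whose label has arrived, for example `update_label(client, "my_project", "my_model", "evt-123", "churned", 1)`, where all of the argument values are placeholders.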
Papermill Operator
Airflow DAG
Below is an example of Model Registration Airflow DAG run history
Model Registration Airflow DAG flow