Flexible Model Deployment

This guide covers customizing your model deployments for artifact-based or surrogate-based model explainability.

Fiddler Platform supports explainability for models with varying dependencies. Each model runs in its own pod, provisioned with the resources and dependencies unique to that model. For example, if your team has two models developed with the same libraries but different versions, you can run both by specifying the exact version each model was built with.

📘

Note

Follow this page if you want to upload a model artifact or generate a surrogate model. For monitoring-only models, with no artifact uploaded, this is not required.

When you add a model artifact to Fiddler (see add_artifact), you can specify the deployment needed to run the model.

add_artifact takes a deployment_params argument where you can specify the following information using DeploymentParams:

  • image_uri: The Docker image used to create a new runtime to serve the model. Choose a base image from the list below whose pre-installed dependencies match your model:

    md-base/python/machine-learning:1.4.0
        catboost==1.2.1
        fiddler-client==2.4.0
        flask==2.2.5
        gevent==23.9.0
        gunicorn==20.1.0
        joblib==1.2.0
        lightgbm==3.3.0
        nltk==3.8.1
        numpy==1.23.4
        pandas==1.5.1
        prometheus-flask-exporter==0.21.0
        pyarrow==14.0.1
        pydantic==1.10.13
        scikit-learn==1.1.1
        shap==0.40.0
        xgboost==1.7.1

    md-base/python/deep-learning:1.6.0
        fiddler-client==2.4.0
        flask==2.2.5
        gevent==23.9.0
        gunicorn==20.1.0
        joblib==1.2.0
        nltk==3.8.1
        numpy==1.23.4
        pandas==1.5.1
        Pillow==10.3.0
        prometheus-flask-exporter==0.21.0
        pyarrow==14.0.1
        pydantic==1.10.13
        tensorflow==2.11.1
        torch==1.13.1
        torchvision==0.14.1
        transformers==4.36.0

    md-base/python/python-38:1.3.0
        fiddler-client==2.4.0
        flask==2.2.5
        gevent==23.9.0
        gunicorn==20.1.0
        prometheus-flask-exporter==0.21.0
        pyarrow==14.0.1
        pydantic==1.10.13

    md-base/python/python-39:1.3.0
        fiddler-client==2.4.0
        flask==2.2.5
        gevent==23.9.0
        gunicorn==20.1.0
        prometheus-flask-exporter==0.21.0
        pyarrow==14.0.1
        pydantic==1.10.13

    md-base/python/java:1.3.0
        fiddler-client==2.4.0
        flask==2.2.5
        gevent==23.9.0
        gunicorn==20.1.0
        h2o==3.44.0.2
        prometheus-flask-exporter==0.21.0
        pyarrow==14.0.1
        pydantic==1.10.13

    md-base/python/rpy2:1.3.0
        fiddler-client==2.4.0
        flask==2.2.5
        gevent==23.9.0
        gunicorn==20.1.0
        prometheus-flask-exporter==0.21.0
        pyarrow==14.0.1
        pydantic==1.10.13
        rpy2==3.5.1

📘

Image upgrades

Images are upgraded regularly to resolve package vulnerabilities, and the image tag is updated accordingly.

Each base image comes with a set of pre-installed libraries. You can override or extend these by adding a requirements.txt file to the model artifact directory where package.py is defined, as sketched below.
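
For example, a model artifact directory that re-pins scikit-learn might look like the sketch below. The model file name and the pinned version are illustrative, not requirements of Fiddler:

    model_dir/
        package.py          # defines how Fiddler loads and invokes your model
        model.pkl           # serialized model artifact (name is illustrative)
        requirements.txt    # contains, e.g., scikit-learn==1.3.2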

md-base/python/python-38 and md-base/python/python-39 are the images with the fewest pre-installed dependencies; use one of these if none of the other images matches your requirements.

🚧

Be aware

Installing new dependencies at runtime takes time and is prone to network errors.

  • replicas: The number of replicas running the model.
  • memory: The amount of memory (in mebibytes) reserved per replica. NLP models may need more memory, so be sure to allocate sufficient resources.

🚧

Be aware

Your model might need more memory than the default setting, so be sure to set an appropriate amount of resources. A ModelServeError when adding a model typically means the deployment was not given enough memory.

  • cpu: The amount of CPU (in millicpus) reserved per replica.

Both the add_artifact and update_artifact methods accept deployment_params. For example:

# Specify deployment parameters
deployment_params = fdl.DeploymentParams(
    image_uri="md-base/python/machine-learning:1.4.0",
    cpu=250,
    memory=512,
    replicas=1,
)

# Add the model artifact; model_dir is the path to the directory
# containing your model artifacts and package.py
job = model.add_artifact(
    model_dir="path/to/model_dir",
    deployment_params=deployment_params,
)

# add_artifact runs asynchronously; block until the job completes
job.wait()
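
If you later rebuild the artifact, update_artifact accepts the same parameter. A minimal sketch, reusing the model object from above; the path and resource values are illustrative:

# Replace the existing artifact and move it to a larger deployment
job = model.update_artifact(
    model_dir="path/to/model_dir",
    deployment_params=fdl.DeploymentParams(
        image_uri="md-base/python/machine-learning:1.4.0",
        cpu=500,       # illustrative values; size to your model
        memory=1024,
        replicas=1,
    ),
)
job.wait()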

Once the model is added to Fiddler, you can fine-tune the model deployment to match your scaling requirements using update_model_deployment. This function allows you to:

  • Horizontal scaling: Scale horizontally via the replicas parameter. This creates multiple Kubernetes pods internally to handle requests.
  • Vertical scaling: Scale vertically via the cpu and memory parameters. Some models need more memory to load their artifacts or to process requests.
  • Scale down: Scale the deployment down via the active parameter to avoid allocating resources when the model is not in use.
  • Scale up: Re-activating the deployment re-creates the model deployment Kubernetes pods with the resource values stored in the database.
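
For example, a scaling call might look like the sketch below. This assumes a connected client object exposing the update_model_deployment method named above; the identifiers and resource values are illustrative, and the exact signature may vary by client version.

# Scale out to two replicas, each with more CPU and memory
client.update_model_deployment(
    project_id="my_project",   # hypothetical project name
    model_id="my_model",       # hypothetical model name
    replicas=2,                # horizontal scaling
    cpu=500,                   # millicpus reserved per replica
    memory=1024,               # mebibytes reserved per replica
    active=True,               # set False to scale the deployment down
)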