Flexible Model Deployment

Fiddler supports explainability for models with varying dependencies by running each model in its own dedicated container, which provides the resources and dependencies unique to that model. For example, if your team has two models built with the same libraries but different versions of them, you can run both models by specifying the exact versions each was built with.

📘 Note
For models that require monitoring features only, there is no need to upload your model artifact or create a surrogate model as these are only used to support explainability features.

When adding a model artifact to your Fiddler model (see add_artifact), you specify the deployment configuration needed to run it using the DeploymentParams argument. Fiddler provides a set of starter images from which to select the configuration most appropriate for running your model. These images vary by included libraries and Python version. You can also customize an image by including your own requirements.txt file in the model artifact package.

DeploymentParams Arguments

* `image_uri`: The Docker image used to create a new runtime to serve the model. Choose the base image from the following list whose requirements match your model:

| Image URI | Dependencies |
| --- | --- |
| md-base/python/python-39:2.0.3 | fiddler-client==3.0.3 flask==2.2.5 gevent==23.9.1 gunicorn==23.0.0 prometheus-flask-exporter==0.21.0 pyarrow==14.0.1 pydantic==1.10.13 |
| md-base/python/python-310:1.0.1 | fiddler-client==3.0.3 flask==2.2.5 gevent==23.9.1 gunicorn==23.0.0 prometheus-flask-exporter==0.21.0 pyarrow==14.0.1 pydantic==1.10.13 |
| md-base/python/python-311:1.0.1 | fiddler-client==3.0.3 flask==2.2.5 gevent==23.9.1 gunicorn==23.0.0 prometheus-flask-exporter==0.21.0 pyarrow==14.0.1 pydantic==1.10.13 |
| md-base/python/python-312:1.0.1 | fiddler-client==3.0.3 flask==2.2.5 gevent==23.9.1 gunicorn==23.0.0 prometheus-flask-exporter==0.21.0 pyarrow==14.0.1 pydantic==1.10.13 |
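
One way to avoid hard-coding a tag is to look up the image that matches the Python version your model was built with. The following is a minimal sketch: `BASE_IMAGES` and `pick_image_uri` are hypothetical names, not part of the Fiddler client, and the tags are copied from the image list above, so they will go stale as the images are upgraded.

```python
import sys

# Tags copied from the image list above (an assumption that they are current).
BASE_IMAGES = {
    (3, 9): "md-base/python/python-39:2.0.3",
    (3, 10): "md-base/python/python-310:1.0.1",
    (3, 11): "md-base/python/python-311:1.0.1",
    (3, 12): "md-base/python/python-312:1.0.1",
}


def pick_image_uri(version=None):
    """Return the base image URI for a (major, minor) Python version.

    Hypothetical helper; defaults to the interpreter running this code.
    """
    if version is None:
        version = sys.version_info[:2]
    key = tuple(version[:2])
    try:
        return BASE_IMAGES[key]
    except KeyError:
        # No image is listed for this version, so fail loudly rather than guess.
        raise ValueError(
            f"No base image listed for Python {key[0]}.{key[1]}"
        ) from None
```

The returned string could then be passed as `image_uri` when building the deployment parameters.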

📘 Image upgrades
These Docker images are upgraded routinely to resolve security vulnerabilities and the image tag is updated accordingly. Unsupported Python versions are not provided.

🚧 Be aware
Model version features are supported with the image versions listed above. Images below 2.x for python-39 will continue to work for existing models that use a single version. From 24.5 onwards, model versions have first-class support, and they require the new model deployment base image tags.

Each base image comes with a few pre-installed libraries. You can override or add to them by including a requirements.txt file in your model artifact directory, alongside package.py.
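
As an illustration, the requirements.txt file can be generated next to package.py before the artifact is uploaded. This is a sketch only: `write_requirements` is a hypothetical helper, and the package pins you pass in are your own; nothing here is prescribed by Fiddler beyond the file name and its location.

```python
from pathlib import Path


def write_requirements(model_dir, pins):
    """Write requirements.txt into the artifact directory that holds package.py.

    `pins` maps package names to exact versions, e.g. {"some-package": "1.2.3"}.
    Hypothetical helper, not part of the Fiddler client.
    """
    model_dir = Path(model_dir)
    # The file must sit in the directory where package.py is defined.
    if not (model_dir / "package.py").is_file():
        raise FileNotFoundError(f"package.py not found in {model_dir}")
    # Sort by name so the output is stable across runs.
    lines = [f"{name}=={version}" for name, version in sorted(pins.items())]
    req_path = model_dir / "requirements.txt"
    req_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return req_path
```

Pinning exact versions keeps the runtime install reproducible, which matters because dependencies are installed when the container starts.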

🚧 Be aware
Installing new dependencies at runtime will take time and is prone to network errors.

* `replicas`: The number of Docker image replicas running the model.
* `memory`: The amount of memory (in mebibytes) reserved per replica. NLP models might need more memory, so make sure to allocate enough.

🚧 Be aware
Your model might require more memory than the default setting. Please ensure you set a sufficient amount of resources. If you see a ModelServeError when adding a model, the current settings were not enough to run your model.

* `cpu`: The amount of CPU (in millicpus) reserved per replica. A large number of features or a complex model can require more CPU.
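
Because `memory` is given in mebibytes and `cpu` in millicpus, it can help to convert from the units you usually think in. The sketch below assumes the Kubernetes convention that 1000 millicpus equal one CPU core, plus the standard 1 GiB = 1024 MiB; the helper names are hypothetical.

```python
def cores_to_millicpu(cores):
    """Convert CPU cores to millicpus (assumes 1 core == 1000 millicpus)."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return int(round(cores * 1000))


def gib_to_mib(gib):
    """Convert gibibytes to mebibytes (1 GiB == 1024 MiB)."""
    if gib <= 0:
        raise ValueError("gib must be positive")
    return int(round(gib * 1024))
```

For instance, a quarter of a core and half a gibibyte correspond to `cpu=250` and `memory=512`.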

Both add_artifact and update_artifact methods support passing deployment_params. For example:

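import fiddler as fdl

# Placeholder: path to your model directory containing the model artifacts and package.py
MODEL_DIR = "model_dir"

# Specify deployment parameters
deployment_params = fdl.DeploymentParams(
    image_uri="md-base/python/python-312:1.0.1",
    cpu=250,      # millicpus per replica
    memory=512,   # mebibytes per replica
    replicas=1,
)

# Add model artifact (`model` is your existing Fiddler model object)
job = model.add_artifact(
    model_dir=MODEL_DIR,
    deployment_params=deployment_params,  # optional; may be omitted
)
# add_artifact returns an async job; block until it finishes
job.wait()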

Once the model is added to Fiddler, you can fine-tune the model deployment to match your scaling requirements using update_model_deployment. This function supports:

* Horizontal scaling: Set the replicas parameter. This creates multiple Kubernetes pods internally to handle concurrent requests.
* Vertical scaling: Set the cpu and memory parameters. Some models might need more memory to load their artifacts or to process requests.
* Scale down: Set the active parameter to False to scale down a deployment and avoid allocating resources while the model is not in use.
* Scale up: Set the active parameter to True to scale the deployment back up.
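
The effect of these four options on reserved resources can be reasoned about with a small local model. The sketch below does not call Fiddler at all: `DeploymentSettings` and the helper functions are hypothetical, the default values are taken from the example above rather than from Fiddler's actual defaults, and treating an inactive deployment as reserving nothing is an assumption based on the scale-down description.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DeploymentSettings:
    """Hypothetical local record of the deployment parameters described above."""
    replicas: int = 1
    cpu: int = 250       # millicpus per replica
    memory: int = 512    # mebibytes per replica
    active: bool = True


def scale_horizontally(settings, replicas):
    """Change the number of replicas (pods) serving the model."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return replace(settings, replicas=replicas)


def scale_vertically(settings, cpu=None, memory=None):
    """Change per-replica cpu and/or memory; None keeps the current value."""
    return replace(
        settings,
        cpu=settings.cpu if cpu is None else cpu,
        memory=settings.memory if memory is None else memory,
    )


def set_active(settings, active):
    """Scale the deployment down (False) or back up (True)."""
    return replace(settings, active=active)


def total_reserved(settings):
    """Return (millicpus, mebibytes) reserved across all replicas.

    Assumes an inactive deployment reserves no resources.
    """
    if not settings.active:
        return (0, 0)
    return (settings.replicas * settings.cpu, settings.replicas * settings.memory)
```

Horizontal scaling multiplies the total reservation by the replica count, while vertical scaling changes what each replica gets, so the two can be combined when a model needs both more memory per request and more concurrency.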

