Creating a Baseline Dataset
To monitor drift or data integrity issues in production data, you need baseline data to compare against. A baseline dataset is a representative sample of the data you expect to see in production, and it represents the data your model performs best on. For this reason, a baseline dataset should be sampled from your model’s training set.
A few things to keep in mind when designing a baseline dataset:
It’s important to include enough data to ensure you have a representative sample of the training set.
Consider including the extreme values (min/max) of each column in your training set so that range violations in production data can be monitored properly. If you choose not to, you can manually specify these ranges before uploading; see customizing your dataset schema.
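The two points above can be sketched with pandas. This is only an illustration, not part of any product API: the function name `build_baseline` and its parameters are hypothetical, and it assumes the training set is a DataFrame with a unique index and that every numeric column has at least one non-null value. It draws a random sample of the training set, then adds the rows that hold the minimum and maximum of each numeric column so the baseline covers the full training range.

```python
import pandas as pd


def build_baseline(train: pd.DataFrame, n_rows: int, seed: int = 0) -> pd.DataFrame:
    """Sample the training set and add the rows holding each numeric column's min and max.

    Hypothetical helper: assumes a unique index and no all-null numeric columns.
    """
    # Never ask for more rows than the training set contains.
    n_rows = min(n_rows, len(train))
    sample = train.sample(n=n_rows, random_state=seed)

    # Collect the index labels of rows that hold an extreme value.
    numeric = train.select_dtypes(include="number")
    extreme_labels = []
    for col in numeric.columns:
        extreme_labels.append(numeric[col].idxmin())
        extreme_labels.append(numeric[col].idxmax())
    extremes = train.loc[pd.Index(extreme_labels).unique()]

    # Combine the sample with the extreme rows, dropping rows picked twice.
    baseline = pd.concat([sample, extremes])
    return baseline[~baseline.index.duplicated(keep="first")]
```

For example, `build_baseline(train_df, n_rows=10_000)` would return roughly 10,000 sampled rows plus at most two extra rows per numeric column. A larger `n_rows` gives a more representative sample at the cost of a bigger upload.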
Baseline Type: Static Pre-production
Baseline Type: Static Production
Baseline Type: Rolling Production
List Baselines