This tool gives you basic insights into the operational health of your ML service in production.
What is being tracked?
- Traffic - the number of prediction requests the model receives over time
- Latency - the average time the model takes to respond to a prediction request, in milliseconds
- Errors - the number of prediction requests that result in an error
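As a minimal sketch of how these three metrics relate to individual requests, the tracker below records one latency sample per request and a flag for whether it errored. The class and method names are illustrative, not the tool's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class HealthMetrics:
    """In-process tracker for the three basic service-health metrics."""
    latencies_ms: list = field(default_factory=list)  # one entry per request
    error_count: int = 0

    def record_request(self, latency_ms: float, error: bool = False) -> None:
        """Record one prediction request: its latency and whether it errored."""
        self.latencies_ms.append(latency_ms)
        if error:
            self.error_count += 1

    @property
    def traffic(self) -> int:
        """Total number of prediction requests received."""
        return len(self.latencies_ms)

    @property
    def avg_latency_ms(self) -> float:
        """Average response time in milliseconds."""
        if not self.latencies_ms:
            return 0.0
        return sum(self.latencies_ms) / len(self.latencies_ms)

m = HealthMetrics()
m.record_request(12.0)
m.record_request(20.0, error=True)
print(m.traffic, m.avg_latency_ms, m.error_count)  # 2 16.0 1
```

In a real deployment these counters would live in a metrics backend and be aggregated per time window, but the per-request bookkeeping is the same.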
Why is it being tracked?
- These are the basic high-level metrics that inform us of the overall system health.
What steps should I take when I see an outlier?
- A dip or spike in traffic needs to be investigated. For example, a dip could be due to a production model server going down; a spike could be an adversarial attack.
- An increase in model latency also needs to be investigated. It could indicate requests queuing up under high QPS (queries per second).
- An increase in error counts could, for example, point to data pipeline issues.
- See our article on The Rise of MLOps Monitoring
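One simple way to surface the dips and spikes described above is a rolling-window z-score check: flag any new value that deviates too far from recent history. This is an illustrative sketch of the idea, not the tool's actual alerting logic; the threshold and window size are assumptions:

```python
from collections import deque
from statistics import mean, stdev

def is_outlier(history, value, z_threshold=3.0):
    """Flag a metric value more than z_threshold standard deviations
    away from the rolling history (e.g. per-minute traffic counts)."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return value != mu  # flat history: any change is an outlier
    return abs(value - mu) / sigma > z_threshold

# Rolling window of the last 60 observations (e.g. one hour of per-minute traffic).
window = deque(maxlen=60)
for v in [100, 102, 98, 101, 99, 100]:
    window.append(v)

print(is_outlier(window, 101))  # normal value -> False
print(is_outlier(window, 500))  # traffic spike -> True
```

The same check applies symmetrically: a traffic dip (say, 5 against the window above) is flagged just like a spike, which matches the guidance that both dips and spikes warrant investigation.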