This tool gives you basic insights into the operational health of your ML service in production.
What is being tracked?
- Traffic — The volume of traffic received by the model over time.
- Latency — The average latency of the model, i.e., the average time (in milliseconds) it takes to respond to a prediction request.
- Errors — The number of errors that occurred while the model was serving prediction requests.
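As a rough illustration of how these three metrics can be derived from raw serving logs, the sketch below groups request records into hourly buckets and computes traffic, average latency, and error count for each bucket. The `Request` record, its fields, and the hourly bucket size are hypothetical assumptions made for this example. They are not this tool's data model or API.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Request:
    # Hypothetical log record for one prediction request (assumed schema).
    timestamp: datetime
    latency_ms: float
    is_error: bool


@dataclass
class BucketMetrics:
    traffic: int           # number of requests received in the bucket
    avg_latency_ms: float  # mean response time in milliseconds
    errors: int            # number of requests that ended in an error


def bucket_start(ts: datetime) -> datetime:
    # Truncate a timestamp to the start of its hour (hourly buckets are an assumption).
    return ts.replace(minute=0, second=0, microsecond=0)


def summarize(requests: list[Request]) -> dict[datetime, BucketMetrics]:
    """Compute traffic, average latency, and error count per hourly bucket."""
    groups: dict[datetime, list[Request]] = defaultdict(list)
    for r in requests:
        groups[bucket_start(r.timestamp)].append(r)

    summary: dict[datetime, BucketMetrics] = {}
    for start in sorted(groups):
        batch = groups[start]
        summary[start] = BucketMetrics(
            traffic=len(batch),
            avg_latency_ms=sum(r.latency_ms for r in batch) / len(batch),
            errors=sum(r.is_error for r in batch),  # True counts as 1
        )
    return summary


# Example: three requests spread over two hours (made-up values).
log = [
    Request(datetime(2024, 1, 1, 9, 5), 40.0, False),
    Request(datetime(2024, 1, 1, 9, 30), 60.0, True),
    Request(datetime(2024, 1, 1, 10, 15), 50.0, False),
]
for start, metrics in summarize(log).items():
    print(start, metrics)
# 2024-01-01 09:00:00 BucketMetrics(traffic=2, avg_latency_ms=50.0, errors=1)
# 2024-01-01 10:00:00 BucketMetrics(traffic=1, avg_latency_ms=50.0, errors=0)
```

Plotting each field of `BucketMetrics` over time gives the traffic, latency, and error charts described above.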
Why is it being tracked?
- These are basic high-level metrics that inform us of the overall system health.
What steps should I take when I see an outlier?
- A dip or spike in traffic needs to be investigated. For example, a dip could be caused by a production model server going down, while a spike could indicate an adversarial attack.
- An increase in model latency also needs to be investigated. It could indicate that requests are queuing up under heavy load (a high number of queries per second, or QPS).
- An increase in error counts could, for example, point to data pipeline issues.
- See our article "The Rise of MLOps Monitoring".
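One simple way to decide whether a new value is an outlier is to compare it with a rolling baseline built from the preceding time buckets. The sketch below computes a z-score against the previous `window` values. This heuristic, the function names, and the defaults `window=6` and `threshold=3.0` are assumptions chosen for illustration, not the detection method this tool uses. Traffic is checked in both directions (dips and spikes), while latency and error counts are checked only for increases.

```python
import math
from statistics import mean, pstdev


def zscore(value: float, baseline: list[float]) -> float:
    """How many standard deviations `value` lies from the baseline mean."""
    mu = mean(baseline)
    sigma = pstdev(baseline)
    diff = value - mu
    if sigma == 0:
        # Flat baseline: treat any change at all as an unbounded deviation.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / sigma


def flag_outliers(values: list[float], window: int = 6,
                  threshold: float = 3.0, direction: str = "both") -> list[int]:
    """Return indices whose value deviates from the preceding `window` values.

    direction="both" flags dips and spikes (e.g. traffic);
    direction="up" flags only increases (e.g. latency, error counts).
    window and threshold are illustrative defaults, not tuned values.
    """
    flagged = []
    for i in range(window, len(values)):
        z = zscore(values[i], values[i - window:i])
        if direction == "up" and z > threshold:
            flagged.append(i)
        elif direction == "both" and abs(z) > threshold:
            flagged.append(i)
    return flagged


# Made-up hourly request counts with a sudden dip (e.g. a server going down).
hourly_traffic = [100, 102, 98, 101, 99, 100, 5, 100]
print(flag_outliers(hourly_traffic))  # [6]

# Made-up hourly average latencies (ms) with a jump in the last hour.
hourly_latency = [50, 52, 49, 51, 50, 48, 120]
print(flag_outliers(hourly_latency, direction="up"))  # [6]
```

When the baseline is flat (for example, zero errors in every preceding hour), the standard deviation is zero, so this sketch flags any change at all. A real alerting rule would probably also require a minimum absolute change before raising an alert.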