This tool gives you basic insight into the operational health of your ML service in production through three metrics:
- Traffic — The volume of traffic received by the model over time.
- Latency — The average latency of the model, i.e. the time it takes to respond to prediction requests (in milliseconds).
- Errors — The number of errors the model has made in its predictions.
- Together, these basic, high-level metrics indicate the overall health of the system.
- A dip or spike in traffic needs to be investigated. For example, a dip could mean that a production model server has gone down; a spike could indicate an adversarial attack.
- An increase in model latency also needs to be investigated. It could indicate that requests are queuing up because of a high query rate (QPS, queries per second).
- An increase in error counts could, for example, point to data pipeline issues.
- See our article on The Rise of MLOps Monitoring
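
To make the three metrics concrete, here is a minimal sketch of how traffic, average latency, and error counts could be derived from raw request logs and checked for sudden changes. It is not this tool's implementation or API: `RequestRecord`, `summarize`, `flag_changes`, the 60-second window, and the 2x change ratio are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RequestRecord:
    """One prediction request (hypothetical log schema)."""
    timestamp: float   # seconds since some reference point
    latency_ms: float  # time taken to respond, in milliseconds
    is_error: bool     # whether the request was counted as an error


def summarize(records, window_s=60):
    """Compute traffic, average latency, and error count per fixed time window."""
    buckets = {}
    for r in records:
        buckets.setdefault(int(r.timestamp // window_s), []).append(r)
    if not buckets:
        return []
    summary = []
    # Walk every window in the covered range so that windows with
    # no requests at all show up as traffic == 0 instead of vanishing.
    for key in range(min(buckets), max(buckets) + 1):
        rs = buckets.get(key, [])
        summary.append({
            "window_start": key * window_s,
            "traffic": len(rs),
            # Average latency is undefined for an empty window.
            "avg_latency_ms": mean(r.latency_ms for r in rs) if rs else None,
            "errors": sum(r.is_error for r in rs),
        })
    return summary


def flag_changes(summary, metric, ratio=2.0):
    """Return the start times of windows whose metric rose or fell
    by more than `ratio` times relative to the previous window."""
    flagged = []
    for prev, cur in zip(summary, summary[1:]):
        p, c = prev[metric], cur[metric]
        if p is None or c is None:
            continue  # nothing to compare against
        if c > p * ratio or c * ratio < p:
            flagged.append(cur["window_start"])
    return flagged
```

Calling `flag_changes(summarize(records), "traffic")` returns the windows where traffic dipped or spiked; passing `"avg_latency_ms"` or `"errors"` checks the other two metrics in the same way. Comparing each window only with the previous one is a deliberately simple heuristic; a production setup would more likely compare against a longer baseline.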
[^1]: Join our community Slack to ask any questions
Updated 5 months ago