Service Metrics

This tool gives you basic insights into the operational health of your ML service in production.

What is being tracked?

  • Traffic — The volume of traffic received by the model over time.
  • Latency — The average latency of the model, i.e. the time it takes to respond to prediction requests (in milliseconds).
  • Errors — The number of errors the model has made in its predictions.
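
As a rough illustration of how these three metrics relate to raw request data, the sketch below groups request records into fixed time windows and computes traffic, average latency, and error count for each window. The RequestRecord type, its field names, and the summarize function are hypothetical names chosen for this example; they are not part of the tool, and the tool may aggregate differently.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RequestRecord:
    # Hypothetical record layout, assumed for illustration only.
    timestamp_s: float  # arrival time of the request, in seconds
    latency_ms: float   # time taken to respond, in milliseconds
    is_error: bool      # True if the request was counted as an error


def summarize(records, window_s=60):
    """Group records into fixed windows and compute the three metrics per window."""
    buckets = defaultdict(list)
    for r in records:
        # Key each record by the start time of the window it falls into.
        window_start = int(r.timestamp_s // window_s) * window_s
        buckets[window_start].append(r)

    summary = {}
    for window_start in sorted(buckets):
        group = buckets[window_start]
        summary[window_start] = {
            "traffic": len(group),
            "avg_latency_ms": sum(r.latency_ms for r in group) / len(group),
            "errors": sum(1 for r in group if r.is_error),
        }
    return summary


if __name__ == "__main__":
    sample = [
        RequestRecord(0, 10.0, False),
        RequestRecord(30, 30.0, True),
        RequestRecord(61, 50.0, False),
    ]
    for start, metrics in summarize(sample).items():
        print(start, metrics)
```

Windows that received no requests do not appear in the result, so a consumer of this summary would need to treat a missing window as zero traffic.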

Why is it being tracked?

  • These high-level metrics indicate the overall health of the system.

What steps should I take when I see an outlier?

  • A dip or spike in traffic should be investigated. For example, a dip could mean that a production model server has gone down; a spike could indicate an adversarial attack.
  • An increase in model latency should also be investigated. It could indicate that requests are queuing up because of a high request rate (queries per second, QPS).
  • An increase in error counts could, for example, point to data pipeline issues.
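
One simple way to decide what counts as an outlier is to compare each new value of a metric against its recent history. The sketch below flags points that deviate from the mean of the preceding window by more than a chosen number of standard deviations. This is an assumption made for illustration, not the method the tool uses; flag_outliers, window, and threshold are hypothetical names and defaults.

```python
from statistics import mean, stdev


def flag_outliers(values, window=5, threshold=3.0):
    """Return indices of values that deviate strongly from the trailing window.

    A value is flagged when it lies more than `threshold` standard deviations
    from the mean of the `window` values immediately before it.
    `window` must be at least 2 so that a standard deviation can be computed.
    """
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: any change at all is treated as an outlier.
            if values[i] != mu:
                flagged.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # Requests per minute, with a spike at index 6.
    traffic = [100, 102, 98, 101, 99, 100, 400, 101]
    print(flag_outliers(traffic))
```

The same function applies to traffic, latency, or error counts, and it catches dips as well as spikes because it uses the absolute deviation. Note that a large outlier inflates the standard deviation of the windows that follow it, so later anomalies close to it may be missed.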

