Version: Cloud

Monitor a Deployment

Metrics

Metrics can be monitored on the Monitoring page. You can either choose a custom start and end date and time, or select a predefined period. For some metrics, only certain predefined timeframes are available; these can be selected in the metric's own section. The Monitoring page contains four sub-tabs.

Traffic

There are three traffic metrics:

  • Activity: number of predictions
  • Errors: failed requests
  • Response time: average response time per request

Performance

Performance metrics are only available if you have supplied a sufficient number of actuals. For classification models, the following metrics are available:

  • Accuracy
  • Precision
  • Recall
  • F1
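For reference, these follow the standard definitions in terms of true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN); for multi-class models they are typically averaged over classes:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN},\qquad \text{Precision} = \frac{TP}{TP + FP}$$

$$\text{Recall} = \frac{TP}{TP + FN},\qquad F_1 = 2\cdot\frac{\text{Precision}\cdot\text{Recall}}{\text{Precision} + \text{Recall}}$$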

For regression models, the following metrics are available:

  • Root mean squared error (RMSE)
  • Mean absolute error (MAE)
  • Mean absolute percentage error (MAPE)
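These follow the standard definitions, with actuals $y_i$, predictions $\hat{y}_i$ and $n$ data points:

$$\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i-\hat{y}_i)^2},\qquad \text{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i-\hat{y}_i\right|,\qquad \text{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i-\hat{y}_i}{y_i}\right|$$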

For text generation models, the following metrics are available:

  • BLEU score (computed with equal weights for 1-, 2-, 3-, and 4-grams)
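With equal weights, this corresponds to the standard BLEU definition (shown here for reference only, not necessarily the exact implementation):

$$\text{BLEU} = BP \cdot \exp\left(\sum_{n=1}^{4}\tfrac{1}{4}\log p_n\right),\qquad BP = \min\left(1,\, e^{\,1 - r/c}\right)$$

where $p_n$ is the modified $n$-gram precision, $c$ the length of the prediction and $r$ the length of the reference.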
tip

To make sure you can view the correct metrics, define a problem type in your Deployment's metadata.

The following problem types can be specified in metadata.json:

  1. classification
  2. regression
  3. classificationWithProbabilities
  4. textGeneration
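
As a minimal sketch, a metadata.json that declares the problem type could look like the snippet below. Note that the problemType key name is an assumption based on this page; consult the metadata reference for the exact schema.

{
  "problemType": "textGeneration"
}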

Additional information for textGeneration models

info

The BLEU score is available only for a single prediction and can be found in the prediction log of that prediction.

To compute the BLEU score, ensure that the metadata defines the problem type 'textGeneration'.

Additionally, the prediction and actuals should follow the specification below:

Prediction (use the 'predictions' key for the v1 format, or 'outputs' for the v2 format)

{
  "predictions": [
    "This is a test sentence for BLEU"
  ]
}

An actual (used for the v1 or v2 format, consistent with the predictions) can have a single reference or an array of references.

Single reference

{
  "predictions": [
    "This is a test sentence for BLEU with single reference"
  ]
}

Multiple references

{
  "predictions": [
    ["This is a test sentence for BLEU with multiple reference - ref1"],
    ["This is a test sentence for BLEU with multiple reference - ref2"]
  ]
}
danger

The BLEU score is available when both the prediction and at least one reference have 4 or more tokens. Any shorter reference is ignored.

Evaluation

Evaluation metrics are only available if you have supplied a sufficient number of evaluations. Since these metrics are specific to Deeploy, they are explained in more detail below.

info

A disagreement is a situation in which the outcome of a prediction does not match the desired response indicated by an evaluator.

There are two evaluation metrics:

  • Disagreement ratio: the ratio of 'Disagree' evaluations to the total number of evaluations
  • Disagreement per class: how disagreements are distributed across outcomes
    • For classification models: each of the classes in the graph represents an outcome class of the model, with a limit of 5 classes. If the number of classes is higher, the 4 classes with the most disagreements are shown, with the other outcome classes grouped into one.
    • For regression models: outcomes are grouped into 5 classes, with the number of outcomes split evenly among the classes.
tip

To enhance the disagreement per class graph, define features in your metadata.json.
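
A minimal, hypothetical sketch of such a metadata.json is shown below; only the idea of listing feature names is taken from this page, and the exact keys (features, name) are assumptions, so consult the metadata reference for the actual schema.

{
  "features": [
    { "name": "age" },
    { "name": "income" }
  ]
}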

Drift

Drift metrics are calculated on input data and are only available if you have supplied a sufficient number of predictions.

  • Input validation: The range of input data is presented in a boxplot, for a single specified feature.
tip

To enhance your input validation boxplot, define features in your metadata.json.

Drift monitoring:

  • Feature drift: A statistical measure (JSD) of the difference between the input data distribution and a baseline distribution, for a single specified feature (see the definition below this list).

    • One metric for drift monitoring is supported: the Jensen-Shannon divergence (JSD). This metric is near 0 when the two data distributions are almost identical, and near 1 when they differ maximally.
  • Feature distribution: A histogram showing the input data and/or the baseline distribution for the selected timeframe. If a baseline distribution is available, histogram binning is performed automatically to match the binning of the baseline distribution. If no baseline is supplied, a histogram of the production data is shown, based on automatic binning.

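For reference, the Jensen-Shannon divergence between two distributions $P$ and $Q$ is defined as follows (with base-2 logarithms it is bounded between 0 and 1); this is the standard definition, shown only for context:

$$\mathrm{JSD}(P \parallel Q) = \tfrac{1}{2}\,D_{\mathrm{KL}}(P \parallel M) + \tfrac{1}{2}\,D_{\mathrm{KL}}(Q \parallel M),\qquad M = \tfrac{1}{2}(P + Q)$$
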
tip

To be able to use statistical drift monitoring, define the data distribution of the baseline in your metadata.json. An example can be found here: example-sklearn-census.

Alerts

Deployment owners can create and manage alerts for metrics on the Alerts page of the Deployment. Alerts are useful for receiving notifications when a metric reaches a defined threshold (e.g. when the response time of your model exceeds 100 ms).

As a Deployment owner, you receive emails and notifications for all alerts by default. To change this, update your email and notification preferences on the Account Preferences page.