Version: 1.37

Advanced deployment options

Advanced configuration options are available for models, explainers, and transformers. These options can be enabled and disabled in the Model, Explainer, and Transformer steps when creating a Deployment, or on the Details tab of a Deployment. Depending on the selected framework, some options may not be available.

Serverless

Models, explainers, and transformers are deployed non-serverless by default. Non-serverless deployment is particularly well suited for real-time applications: because these deployments are always active and continuously consume resources, they can respond to incoming requests without delay.

Serverless models, explainers, or transformers operate differently by not continuously consuming resources. Instead, they activate when the first request is received and shut down completely if there is no further activity for a period of time. This approach is cost-effective when your deployment is not actively used for fetching predictions. However, it's important to be aware of a potential drawback known as a "cold-start." This refers to the delay in serving the first request as the application needs to boot up.

Serverless vs. non-serverless?

Both non-serverless and serverless Deployment methods offer automatic scaling to handle incoming traffic. This means that as the usage of your model increases, additional instances of your model will be deployed to accommodate the growing traffic. Conversely, if there are more instances than required to serve the incoming traffic, your deployment will scale down accordingly.

If your application is a real-time application, choose the non-serverless deployment. If your application is not real-time (e.g. it runs once a day), go with the serverless option. If, for whatever reason, you are not sure, we suggest the non-serverless option.

Customized model resources

Teams on all plans can configure the CPU request and limit and the memory request and limit. Teams on Private Cloud plans can also select a node type. Deeploy reserves a 25% buffer for all resources. For example, on a node type with 4 CPU cores, 1 core is reserved for the Deeploy stack itself, leaving 3 cores available for Deployments.
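
The small sketch below only illustrates the arithmetic of the 25% buffer described above; the node sizes used in it are hypothetical examples, not defaults.

# Illustrative only: shows what remains for Deployments on a node after
# Deeploy's 25% resource buffer. The example node sizes are hypothetical.
RESOURCE_BUFFER = 0.25  # fraction reserved for the Deeploy stack itself


def allocatable(total: float, buffer: float = RESOURCE_BUFFER) -> float:
    """Return the portion of a node resource available to Deployments."""
    return total * (1 - buffer)


print(allocatable(4))   # 4 CPU cores   -> 3.0 cores available
print(allocatable(16))  # 16 GiB memory -> 12.0 GiB available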

Private object storage

Select a Docker or Blob credential so the private object storage can be accessed. Please note that credentials need to be added before creating the Deployment.

Integrated explainer

This option is only available for models. Select it when using a custom Docker image that contains an explainer (e.g. Captum for PyTorch).

Using the Captum text explainer dashboard

If you want to use the Captum text explainer dashboard, make sure to use the following format in your handler.py file.

{
  "explanations": [
    {
      "raw_input_ids": [[ids]],
      "word_attributions": [[word_attributions]],
      "pred_class": [class_string],
      "attr_score": [score_numeric],
      "attr_class": [class_string]
    }
  ]
}
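
As a minimal sketch, the snippet below shows one way a handler.py could assemble this structure from Captum-style outputs. It is illustrative only: the function name and its inputs (token ids, per-token attributions, predicted class, attribution score) are assumptions; only the returned dictionary follows the format above.

# Illustrative sketch: wrap Captum-style attribution output in the response
# format expected by the Captum text explainer dashboard. Function name and
# inputs are assumptions; only the returned structure follows the docs format.
from typing import Dict, List


def build_explanation_response(
    input_ids: List[int],
    word_attributions: List[float],
    pred_class: str,
    attr_score: float,
    attr_class: str,
) -> Dict:
    """Return the explanation payload in the expected dashboard format."""
    return {
        "explanations": [
            {
                "raw_input_ids": [input_ids],
                "word_attributions": [word_attributions],
                "pred_class": [pred_class],
                "attr_score": [attr_score],
                "attr_class": [attr_class],
            }
        ]
    }


if __name__ == "__main__":
    # Dummy values, purely to show the resulting structure.
    response = build_explanation_response(
        input_ids=[101, 2023, 2003, 2307, 102],
        word_attributions=[0.0, 0.12, 0.05, 0.61, 0.0],
        pred_class="positive",
        attr_score=0.78,
        attr_class="positive",
    )
    print(response)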