Advanced deployment options
Configure advanced options for models, explainers, and transformers. Enable or disable these options in the Model, Explainer, and Transformer steps when you create a Deployment, or on the Details page of an existing Deployment. Depending on your selected framework, not all options are available.
Minimum and maximum replicas
Set minimum and maximum replicas for models, explainers, and transformers to control scaling behavior. If you set minimum replicas to at least 1, your Deployment maintains at least 1 active replica, which works well for real-time applications. These Deployments are always active and consistently use resources, so they can respond to incoming requests without delay.
When you set minimum replicas to 0, your Deployment operates in serverless mode and doesn't continuously consume resources. It activates when the first request arrives and shuts down completely after a period of inactivity. This approach is cost-effective when your Deployment isn't actively serving predictions. However, be aware of a potential "cold-start" delay: the first request takes longer to serve because the application needs to boot up.
Maximum replicas enable automatic scaling to handle incoming traffic. As usage of your model increases, additional instances deploy up to the maximum replicas to accommodate growing traffic. When more instances exist than required to serve incoming traffic, your Deployment scales down to the minimum replicas.
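The bounding behavior described above can be sketched as follows. The helper name `replica_count` is hypothetical and not part of the Deeploy API; it only illustrates how the actual replica count stays between the configured minimum and maximum:

```python
def replica_count(desired: int, min_replicas: int, max_replicas: int) -> int:
    # The Deployment never runs fewer than min_replicas
    # or more than max_replicas, regardless of demand.
    return max(min_replicas, min(desired, max_replicas))

replica_count(5, min_replicas=1, max_replicas=3)  # -> 3 (capped at maximum)
replica_count(0, min_replicas=0, max_replicas=3)  # -> 0 (serverless: scale to zero)
```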
Autoscaling metric and target
Configure the autoscaling metric for models, explainers, and transformers in the Model step; the metric you select there also applies to the explainer or transformer, if included.
Set the autoscaling target individually for models, explainers, and transformers. Autoscaling is enabled when you set the maximum number of replicas higher than the minimum number of replicas. The autoscaling metric has two options: Concurrency and CPU.
Concurrency ensures that your model, explainer, and transformer automatically scale when the number of concurrent requests exceeds the configured autoscaling target. With CPU, the containers autoscale when the current CPU load exceeds the configured autoscaling target, expressed as a percentage (maximum of 100) of your requested CPU amount.
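With the Concurrency metric, the number of replicas needed can be illustrated as the concurrent request count divided by the target, rounded up. The function name `desired_replicas` is a hypothetical sketch, not a Deeploy API; the result is still bounded by your minimum and maximum replicas:

```python
import math

def desired_replicas(concurrent_requests: int, target: int) -> int:
    # Each replica should handle at most `target` concurrent requests,
    # so the replica count grows with the concurrent load.
    return math.ceil(concurrent_requests / target)

desired_replicas(25, target=10)  # -> 3
desired_replicas(10, target=10)  # -> 1
```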
Customized model resources
CPU request and limit, and memory request and limit, are configurable on all plans. On Private Cloud plans, you can also select a node type. Deeploy reserves a buffer of 25% of all resources. For example, on a node type with 4 CPU cores, 1 core is reserved for the Deeploy stack itself. When you don't customize model resources, Deeploy uses default settings.
On Private Cloud plans, make sure the requested CPU and memory are available in the cluster where Deeploy runs.
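The 25% buffer can be worked out as a simple calculation. The helper below is a hypothetical illustration of the example above (4 CPU cores, 1 reserved), not a Deeploy function:

```python
def usable_resources(node_capacity: float, buffer: float = 0.25) -> float:
    # Deeploy reserves a 25% buffer of node resources for its own stack,
    # so only the remainder is available for your models.
    return node_capacity * (1 - buffer)

usable_resources(4)  # -> 3.0 cores available; 1 core reserved
```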
Private registry
Select a Docker credential so Deeploy can access the private registry. Note that you need to add Docker credentials before you create the Deployment, or use temporary credentials.
Private object storage
Select a blob credential so Deeploy can access the private object storage. Note that you need to add blob credentials before you create the Deployment, or use temporary credentials.
Environment variables
Select the environment variables to insert in your container. Different environment variables can be selected for your model, explainer, or transformer. Note that you need to create an environment variable before you create the Deployment.
Additional config
This option is only available on Private Cloud installations.
Add additional configuration to have more control over how your model, explainer, and transformer are deployed. Additional configuration accepts a JSON value with annotations and/or affinity keys present. Define both annotations and affinity according to their respective Kubernetes API spec.
The following example shows how to use additional configuration:
{
  "annotations": {
    "my-first-key": "first-value",
    "my-second-key": "second-value"
  },
  "affinity": {
    "nodeAffinity": {
      "requiredDuringSchedulingIgnoredDuringExecution": {
        "nodeSelectorTerms": [
          {
            "matchExpressions": [
              {
                "key": "kubernetes.io/os",
                "operator": "In",
                "values": ["linux"]
              }
            ]
          }
        ]
      }
    }
  }
}
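Since additional configuration must be a JSON value with only annotations and/or affinity keys, you can sanity-check it locally before pasting it in. This is a hypothetical check, not something Deeploy requires you to run:

```python
import json

# A shortened version of the example above.
raw = '{"annotations": {"my-first-key": "first-value"}}'

config = json.loads(raw)  # raises ValueError if the value is not valid JSON
unexpected = set(config) - {"annotations", "affinity"}
assert not unexpected, f"unexpected top-level keys: {unexpected}"
```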