GPU Support
Using GPUs for your deployments is currently only available for Enterprise subscriptions.
Deeploy enables using GPUs for your deployments. Make sure GPU nodes have been set up during installation. In the advanced configuration options, select a node with an available GPU to automatically claim a GPU for that component.
Model, explainer and transformer frameworks
When you select a GPU node and use a framework that supports GPUs by default for your model, explainer or transformer, it will be served with the GPU version of the serving framework. Note that this requires your tensors to be loaded onto the GPU device; see this PyTorch handler.py as an example. If you use a custom Docker image and want to make use of the GPU, you need to install CUDA and cuDNN in the image. Make sure they are compatible with each other and with the selected GPU. Check out this compatibility matrix for an overview.
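As a sketch of what a GPU-ready custom image could look like, an NVIDIA CUDA base image already bundles matching CUDA and cuDNN versions. The tag, framework version and entrypoint below are illustrative assumptions, not Deeploy requirements:

```dockerfile
# Sketch only: the base image tag and torch wheel index are assumptions.
# Pick versions that match the compatibility matrix for your GPU and driver.
FROM nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04

# The CUDA runtime image does not ship Python; install it explicitly.
RUN apt-get update && apt-get install -y --no-install-recommends \
        python3 python3-pip && \
    rm -rf /var/lib/apt/lists/*

# Install a GPU-enabled framework build matching the CUDA version above.
RUN pip3 install torch --index-url https://download.pytorch.org/whl/cu118

COPY . /app
WORKDIR /app
CMD ["python3", "serve.py"]  # hypothetical serving entrypoint
```

Pinning the CUDA/cuDNN pair through the base image tag keeps the image consistent with the framework wheel you install on top of it.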
Sharing GPUs for your deployments
Sharing GPUs is possible when your cluster is configured for GPU sharing, for example with Multi-Instance GPU (MIG) as explained in this blog by NVIDIA. In that case, deployments in Deeploy can be configured to run on a MIG device. By default, MIG mode is not enabled in Deeploy.
Enabling Single MIG mode
- Make sure the prerequisites mentioned here are configured.
- When creating or updating a deployment, customize the model resources to use a `modelInstanceType` that matches the `nvidia.com/gpu.product` label. For example, if an NVIDIA A100-SXM4-40GB is configured with 2x 3g.20gb MIG devices (4x 5GB memory, 3x 14 SMs), the `modelInstanceType` can be configured to match `nvidia.com/gpu.product: A100-SXM4-40GB MIG 3g.20gb`. Configuring this option is optional if you only have one MIG device available in your cluster.
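Under the hood this match follows standard Kubernetes node-label selection. A minimal sketch of the equivalent node selector, using the example label value above (the exact value reported by your cluster's GPU feature discovery may be formatted differently, so verify it with `kubectl get nodes --show-labels`):

```yaml
# Hypothetical Kubernetes-style equivalent of the modelInstanceType match.
# The label value is taken from the example in this section; confirm the
# actual value on your nodes before using it.
nodeSelector:
  nvidia.com/gpu.product: "A100-SXM4-40GB MIG 3g.20gb"
```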
Enabling Mixed MIG mode
This feature is expected to be available from Deeploy v1.43.0
- Make sure the prerequisites mentioned here are configured.
- Selecting Mixed MIG mode requires changing the `nvidia.com/gpu: 1` resource limit to a node label prefixed with `nvidia.com/mig-<sm_slice_count>g-<memory_size>gb`.
- When creating or updating a deployment, customize the model resources to use a `modelInstanceType` that matches the `nvidia.com/gpu.product` label. For example, if an NVIDIA A100-SXM4-40GB is configured with 2x 3g.20gb MIG devices (4x 5GB memory, 3x 14 SMs), the `modelInstanceType` can be configured to match `nvidia.com/gpu.product: A100-SXM4-40GB MIG 3g.20gb`. Configuring this option is optional if you only have one MIG device available in your cluster.
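With NVIDIA's mixed MIG strategy, each MIG profile is exposed as its own Kubernetes resource name. A hedged sketch of what the resource limit change could look like in a Kubernetes resources block (the slice name below follows NVIDIA's device plugin naming and must match a MIG device actually partitioned on the node):

```yaml
# Sketch only: replaces the full-GPU limit with a MIG slice resource.
# "3g.20gb" is an example profile; use the profile configured on your node.
resources:
  limits:
    # nvidia.com/gpu: 1          # full-GPU request (single strategy)
    nvidia.com/mig-3g.20gb: 1    # MIG slice request (mixed strategy)
```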