GPU Support
Using GPUs for your deployments is currently only available for Enterprise subscriptions.
Deeploy supports GPUs for your deployments. Make sure GPU nodes were set up during installation and are available. In the advanced configuration options, select a node that has a GPU available; a GPU is then claimed automatically for that component.
Model, explainer and transformer frameworks
When you select a GPU node and use one of the frameworks that support GPUs by default for your model, explainer or transformer, it is served with the GPU version of the serving framework. Note that this requires your tensors to be loaded to the GPU device. See this PyTorch handler.py as an example. If you use a custom Docker image and want to make use of the GPU, you need to install CUDA and cuDNN in that image. Make sure they are compatible with each other and with the selected GPU. Check out this compatibility matrix for an overview.
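As a minimal sketch of what loading tensors to the GPU device can look like in PyTorch: both the model weights and the input tensors are moved to the same device, with a CPU fallback when no GPU is visible. This is not the handler.py referenced above; the `TinyModel` class and the input shape are hypothetical placeholders for your own model.

```python
import torch
from torch import nn


class TinyModel(nn.Module):
    """Hypothetical stand-in for your own model."""

    def __init__(self) -> None:
        super().__init__()
        self.linear = nn.Linear(4, 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear(x)


# Use the GPU when one is visible to the container, otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = TinyModel().to(device)  # move the weights to the chosen device
model.eval()

batch = torch.rand(3, 4)        # tensors are created on the CPU by default
batch = batch.to(device)        # inputs must live on the same device as the model

with torch.no_grad():
    output = model(batch)

print(output.device)
```

If the model is on the GPU while the inputs stay on the CPU (or the other way around), PyTorch raises a device mismatch error at inference time, so it helps to move both explicitly.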
Sharing GPUs for your deployments
Sharing GPUs is possible when your cluster is configured for GPU sharing, for example with Multi-Instance GPU (MIG), as explained in this blog by NVIDIA. In that case, deployments in Deeploy can be configured to run on a MIG device. By default, MIG mode is not enabled in Deeploy.
Enabling Single MIG mode
- Make sure the prerequisites as mentioned here are configured.
- When creating or updating a deployment, customize the model resources to use a `modelInstanceType` that matches the `nvidia.com/gpu.product` label. For example, if an NVIDIA A100-SXM4-40GB is configured with 2x 3g.20gb MIG devices (4x5GB memory, 3×14 SMs), the `modelInstanceType` can be configured to match `nvidia.com/gpu.product: A100-SXM4-40GB MIG 3g.20gb`. Configuring this option is optional if you only have one MIG device available in your cluster.
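To illustrate the matching rule in the steps above, the sketch below filters a set of node labels for nodes whose `nvidia.com/gpu.product` label equals a chosen `modelInstanceType`. The node names, the `nodes_matching` helper and the label values on the second and third node are hypothetical. This is not how Deeploy schedules deployments internally; it only shows that the two strings need to be identical.

```python
GPU_PRODUCT_LABEL = "nvidia.com/gpu.product"


def nodes_matching(instance_type: str, nodes: dict[str, dict[str, str]]) -> list[str]:
    """Return the names of nodes whose GPU product label equals instance_type."""
    return sorted(
        name
        for name, labels in nodes.items()
        if labels.get(GPU_PRODUCT_LABEL) == instance_type
    )


# Hypothetical node labels, keyed by node name.
nodes = {
    "gpu-node-a": {GPU_PRODUCT_LABEL: "A100-SXM4-40GB MIG 3g.20gb"},
    "gpu-node-b": {GPU_PRODUCT_LABEL: "A100-SXM4-40GB"},
    "cpu-node-c": {},
}

print(nodes_matching("A100-SXM4-40GB MIG 3g.20gb", nodes))
```

A check like this can be handy before updating a deployment: an empty result suggests the chosen `modelInstanceType` does not correspond to any labelled node.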
Enabling Mixed MIG mode
This feature is expected to be available from Deeploy v1.43.0
- Make sure the prerequisites as mentioned here are configured.
- Selecting Mixed MIG mode requires changing the `nvidia.com/gpu: 1` resource limit to a node label prefixed with `nvidia.com/mig-<sm_slice_count>g-<memory_size>gb`.
- When creating or updating a deployment, customize the model resources to use a `modelInstanceType` that matches the `nvidia.com/gpu.product` label. For example, if an NVIDIA A100-SXM4-40GB is configured with 2x 3g.20gb MIG devices (4x5GB memory, 3×14 SMs), the `modelInstanceType` can be configured to match `nvidia.com/gpu.product: A100-SXM4-40GB MIG 3g.20gb`. Configuring this option is optional if you only have one MIG device available in your cluster.
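The resource-limit change described in the steps above can be sketched as a small helper that swaps the whole-GPU limit for a MIG resource. The `use_mig_resource` helper is hypothetical and does not call any Deeploy API, and the resource name used in the example call is an assumed value; use the exact name your nodes advertise.

```python
WHOLE_GPU = "nvidia.com/gpu"
MIG_PREFIX = "nvidia.com/mig-"


def use_mig_resource(limits: dict[str, int], mig_resource: str) -> dict[str, int]:
    """Return a copy of limits with the whole-GPU entry swapped for a MIG resource."""
    if not mig_resource.startswith(MIG_PREFIX):
        raise ValueError(
            f"expected a name starting with {MIG_PREFIX!r}, got {mig_resource!r}"
        )
    # Drop the whole-GPU limit but keep every other resource as it was.
    updated = {name: count for name, count in limits.items() if name != WHOLE_GPU}
    # Carry over the requested count; default to 1 if no whole-GPU limit was set.
    updated[mig_resource] = limits.get(WHOLE_GPU, 1)
    return updated


limits = {"nvidia.com/gpu": 1}
# Assumed example name; check what your cluster actually exposes.
print(use_mig_resource(limits, "nvidia.com/mig-3g.20gb"))
```

Returning a copy rather than mutating the input keeps the original limits available, which makes it easier to revert to a whole-GPU claim.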