Version: 1.44

Creating an external deployment with IBM watsonx

An IBM watsonx deployment can be created using the External Deployment type. This article shows how to integrate the IBM watsonx Granite 13 Billion chat V2 model in Deeploy.

Prerequisites

Create a deployment using watsonx

We will use the granite-13b-chat-v2 foundation model, which is developed by IBM Research and only available on watsonx. To create an external deployment in Deeploy, we need the following:

  • The model endpoint
  • An access token to communicate with the model

The foundation model inference endpoint

In IBM watsonx, a single endpoint serves inference requests for multiple foundation models, so the endpoint is not model-specific:

  • https://{cloud_url}/ml/v1/text/generation?version=2023-05-29, for example https://eu-de.ml.cloud.ibm.com/ml/v1/text/generation?version=2023-05-29. For more information, see the IBM watsonx documentation.
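Because the endpoint is shared, the model is selected in the request body rather than in the URL. The sketch below shows how such a request could be assembled. The body fields (`model_id`, `input`, `parameters`, `project_id`), the model identifier `ibm/granite-13b-chat-v2`, and the helper name `build_generation_request` are assumptions made for illustration and should be checked against the IBM watsonx API reference.

```python
def build_generation_request(cloud_url, bearer_token, project_id, prompt,
                             model_id="ibm/granite-13b-chat-v2"):
    """Assemble the URL, headers, and JSON body for a text generation call.

    Hypothetical helper: the field names in the body are assumptions.
    """
    # The URL is the same for every foundation model
    url = f"https://{cloud_url}/ml/v1/text/generation?version=2023-05-29"
    headers = {
        "Authorization": f"Bearer {bearer_token}",
        "Content-Type": "application/json",
        "Accept": "application/json",
    }
    # The model is chosen here, in the body (assumed field names)
    payload = {
        "model_id": model_id,
        "input": prompt,
        "parameters": {"max_new_tokens": 100},
        "project_id": project_id,
    }
    return url, headers, payload
```

The returned values can be passed to an HTTP client, for example `requests.post(url, headers=headers, json=payload)`.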

Retrieving an access token

IBM watsonx uses a two-step approach to retrieve tokens:

  1. An API key can be generated in the IBM watsonx web client > Profile and settings > API key
  2. A token to call the inference endpoint can be generated as follows (example in Python):
import requests

apikey = "<your-api-key>"  # the API key generated in step 1

token_url = "https://iam.cloud.ibm.com/identity/token"
token_body = f"grant_type=urn:ibm:params:oauth:grant-type:apikey&apikey={apikey}"
token_headers = {
    "Content-Type": "application/x-www-form-urlencoded",
}

# Exchange the API key for a short-lived bearer token
token_response = requests.post(
    token_url,
    headers=token_headers,
    data=token_body,
)

if token_response.status_code != 200:
    raise Exception("Non-200 response: " + str(token_response.text))

bearer_token = token_response.json()["access_token"]
Note

Be aware that tokens expire after 1 hour.
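Since tokens are short-lived, a client that calls the model repeatedly may want to reuse a token until it is close to expiring and only then request a new one. The following is a minimal sketch of that idea, not part of Deeploy or watsonx. The class name `TokenCache`, the five-minute safety margin, and the `fetch_token` callable (any function that returns a fresh token, such as one wrapping the token request shown earlier) are assumptions for illustration.

```python
import time

TOKEN_LIFETIME_SECONDS = 3600  # tokens expire after 1 hour


class TokenCache:
    """Cache a bearer token and refresh it shortly before it expires."""

    def __init__(self, fetch_token, lifetime=TOKEN_LIFETIME_SECONDS,
                 margin=300, clock=time.monotonic):
        self._fetch_token = fetch_token  # callable returning a new token
        self._lifetime = lifetime        # assumed token lifetime in seconds
        self._margin = margin            # refresh this many seconds early
        self._clock = clock              # injectable clock, useful in tests
        self._token = None
        self._expires_at = 0.0

    def get(self):
        """Return a cached token, fetching a new one when needed."""
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            self._token = self._fetch_token()
            self._expires_at = now + self._lifetime
        return self._token
```

With this in place, calling `cache.get()` before each request returns the cached token and triggers a new token request only when the previous token is missing or about to expire.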

Connecting watsonx in Deeploy

Connecting watsonx to Deeploy can be done either in the Deeploy UI or locally with the Python client.

Complete the following steps in the Deeploy UI:

  1. Log in to the Deeploy UI
  2. Navigate to the workspace where you want to add watsonx as an external deployment
  3. Click Create > External
  4. Follow the steps. In step 3, you are expected to add:
    • The retrieved inference endpoint (e.g., https://eu-de.ml.cloud.ibm.com/ml/v1/text/generation?version=2023-05-29)
    • The access token as a Bearer token
  5. Once the connection check is successful, the deployment can be created.
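Before entering the values in step 3, it can help to sanity-check them locally. The sketch below is a hypothetical helper, not part of Deeploy or its connection check. It only verifies that the endpoint looks like the watsonx text generation URL described above and that a token is present.

```python
from urllib.parse import urlparse, parse_qs


def check_connection_details(endpoint, access_token):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    parsed = urlparse(endpoint)
    if parsed.scheme != "https":
        problems.append("endpoint should use https")
    if not parsed.path.endswith("/ml/v1/text/generation"):
        problems.append("endpoint path should end with /ml/v1/text/generation")
    if "version" not in parse_qs(parsed.query):
        problems.append("endpoint is missing the version query parameter")
    if not access_token.strip():
        problems.append("access token is empty")
    return problems
```

An empty result does not guarantee that the connection check in Deeploy succeeds; for instance, an expired token still passes this local check.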