
Inference

Use the Python client to run inference on your Deployment.
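
All examples below assume that an authenticated client instance and the IDs of the target Workspace and Deployment are already available. A minimal sketch with hypothetical placeholder values:

# `client` is assumed to be an authenticated Python client instance,
# created as described in the client setup documentation.
# The ID values are hypothetical placeholders; use your own.
workspace_id = "<your-workspace-id>"
deployment_id = "<your-deployment-id>"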

Predict

Make a call to the /predict endpoint of the Deployment.

request_body = {
    "instances": [
        [39, 7, 1, 1, 1, 1, 4, 1, 2174, 0, 40, 9]
    ]
}

prediction = client.predict(workspace_id, deployment_id, request_body)
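
The exact return format depends on the model and client version. As a sketch, assuming the call returns the endpoint's parsed JSON body in the V1 inference protocol style ({"predictions": [...]}), the results can be paired with the input instances:

# Assumption: `prediction` is a dict shaped like {"predictions": [...]},
# with one prediction per input instance, in order.
for instance, pred in zip(request_body["instances"], prediction["predictions"]):
    print(instance, "->", pred)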

Explain

Make a call to the /explain endpoint of the Deployment.

request_body = {
    "instances": [
        [39, 7, 1, 1, 1, 1, 4, 1, 2174, 0, 40, 9]
    ]
}

explanation = client.explain(workspace_id, deployment_id, request_body)
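
The shape of the explanation depends on the explainer attached to the Deployment (for example, per-feature SHAP values). A minimal sketch, assuming the call returns the parsed JSON body as a dict:

# Assumption: `explanation` is the parsed JSON body returned by the explainer;
# inspect it to see the structure produced by your Deployment's explainer.
print(explanation)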

Completions

Make a call to the /completions endpoint of the Deployment. Only available for generative Hugging Face models.

request_body = {
    "prompt": [
        "Tell me a joke",
        "Give a random fact"
    ],
    "logprobs": True,
    "max_tokens": 40
}

completions = client.completions(workspace_id, deployment_id, request_body)  # without explain

# only available for generative Hugging Face Deployments with a standard explainer
completions = client.completions(workspace_id, deployment_id, request_body, True)  # with explain
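
As a sketch for reading the result, assuming the response follows an OpenAI-style completions schema with one choice per prompt (an assumption; verify against your Deployment's actual response):

# Assumption: `completions` is a dict with an OpenAI-style "choices" list,
# one entry per prompt, in the same order as the request.
for choice in completions["choices"]:
    print(choice["text"])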

Chat completions

Make a call to the /chat/completions endpoint of the Deployment. Only available for generative Hugging Face models.

request_body = {
    "logprobs": True,
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant that gives specific answers."
        },
        {
            "role": "user",
            "content": "what is 1 + 1?"
        }
    ],
    "model": "model",  # only required when calling an external Deployment
    "max_tokens": 50
}

chat_completions = client.chat_completions(workspace_id, deployment_id, request_body)
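
A sketch for reading the assistant's reply, assuming an OpenAI-style chat completions response schema (an assumption; verify against your Deployment's actual response):

# Assumption: `chat_completions` is a dict following the OpenAI-style
# chat completions schema.
print(chat_completions["choices"][0]["message"]["content"])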

Embeddings

Make a call to the /embeddings endpoint of the Deployment. Only available for embedding Hugging Face models.

request_body = {
    "input": [
        "Tell me a joke",
        "Wonderful world"
    ],
    "model": "model"  # only required when calling an external Deployment
}

embeddings = client.embeddings(workspace_id, deployment_id, request_body)
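
A sketch for reading the vectors, assuming an OpenAI-style embeddings response schema (an assumption; verify against your Deployment's actual response):

# Assumption: `embeddings` is a dict with a "data" list containing one
# embedding vector per input string, in order.
for item in embeddings["data"]:
    print(len(item["embedding"]))  # dimensionality of each vector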