
Inference

Use the Python client to run inference on your Deployment.
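
All examples below assume that an authenticated client instance and the IDs of the target Workspace and Deployment are already available. A minimal sketch with hypothetical placeholder values:

# `client` is assumed to be an authenticated Python client instance,
# created as described in the client setup documentation.
# The ID values are hypothetical placeholders; use your own.
workspace_id = "<your-workspace-id>"
deployment_id = "<your-deployment-id>"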

Predict

Make a call to the /predict endpoint of the Deployment.

request_body = {
    "instances": [
        [39, 7, 1, 1, 1, 1, 4, 1, 2174, 0, 40, 9]
    ]
}

prediction = client.predict(workspace_id, deployment_id, request_body)
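
The exact return format depends on the model and client version. As a sketch, assuming the call returns the endpoint's parsed JSON body in the V1 inference protocol style ({"predictions": [...]}), the results can be paired with the input instances:

# Assumption: `prediction` is a dict shaped like {"predictions": [...]},
# with one prediction per input instance, in order.
for instance, pred in zip(request_body["instances"], prediction["predictions"]):
    print(instance, "->", pred)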

Explain

Make a call to the /explain endpoint of the Deployment.

request_body = {
    "instances": [
        [39, 7, 1, 1, 1, 1, 4, 1, 2174, 0, 40, 9]
    ]
}

explanation = client.explain(workspace_id, deployment_id, request_body)
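
The shape of the explanation depends on the explainer attached to the Deployment (for example, per-feature SHAP values). A minimal sketch, assuming the call returns the parsed JSON body as a dict:

# Assumption: `explanation` is the parsed JSON body returned by the explainer;
# inspect it to see the structure produced by your Deployment's explainer.
print(explanation)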

Completions

Make a call to the /completions endpoint of the Deployment. Only available for generative Hugging Face models.

request_body = {
    "prompt": [
        "Tell me a joke",
        "Give a random fact"
    ],
    "logprobs": True,
    "max_tokens": 40
}

completions = client.completions(workspace_id, deployment_id, request_body)  # without explain

# only available for generative Hugging Face Deployments with a standard explainer
completions = client.completions(workspace_id, deployment_id, request_body, True)  # with explain
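
As a sketch for reading the result, assuming the response follows an OpenAI-style completions schema with one choice per prompt (an assumption; verify against your Deployment's actual response):

# Assumption: `completions` is a dict with an OpenAI-style "choices" list,
# one entry per prompt, in the same order as the request.
for choice in completions["choices"]:
    print(choice["text"])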

Chat completions

Make a call to the /chat/completions endpoint of the Deployment. Only available for generative Hugging Face models.

request_body = {
    "logprobs": True,
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant that gives specific answers."
        },
        {
            "role": "user",
            "content": "what is 1 + 1?"
        }
    ],
    "model": "model",  # only required when calling an external Deployment
    "max_tokens": 50
}

chat_completions = client.chat_completions(workspace_id, deployment_id, request_body)
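
A sketch for reading the assistant's reply, assuming an OpenAI-style chat completions response schema (an assumption; verify against your Deployment's actual response):

# Assumption: `chat_completions` is a dict following the OpenAI-style
# chat completions schema.
print(chat_completions["choices"][0]["message"]["content"])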

Embeddings

Make a call to the /embeddings endpoint of the Deployment. Only available for embedding Hugging Face models.

request_body = {
    "input": [
        "Tell me a joke",
        "Wonderful world"
    ],
    "model": "model"  # only required when calling an external Deployment
}

embeddings = client.embeddings(workspace_id, deployment_id, request_body)
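
A sketch for reading the vectors, assuming an OpenAI-style embeddings response schema (an assumption; verify against your Deployment's actual response):

# Assumption: `embeddings` is a dict with a "data" list containing one
# embedding vector per input string, in order.
for item in embeddings["data"]:
    print(len(item["embedding"]))  # dimensionality of each vector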