Inference
Use the Python client to run inference on your Deployment.
Predict
Make a call to the /predict endpoint of the Deployment.
request_body = {
"instances": [
[39, 7, 1, 1, 1, 1, 4, 1, 2174, 0, 40, 9]
]
}
prediction = client.predict(workspace_id, deployment_id, request_body)
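The "instances" list accepts multiple rows per request, so several predictions can be fetched in one call. A minimal sketch of batching (the second row is made-up example data in the same column order as the single-row example above):

```python
# Build a batch request from several feature rows; each inner list is
# one instance with the same column order as the example above.
rows = [
    [39, 7, 1, 1, 1, 1, 4, 1, 2174, 0, 40, 9],
    [45, 3, 2, 1, 0, 1, 4, 1, 0, 0, 38, 9],  # made-up second row
]
request_body = {"instances": rows}
```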
Explain
Make a call to the /explain endpoint of the Deployment.
request_body = {
"instances": [
[39, 7, 1, 1, 1, 1, 4, 1, 2174, 0, 40, 9]
]
}
explanation = client.explain(workspace_id, deployment_id, request_body)
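The shape of the explanation depends on the configured explainer; feature-attribution explainers often return one score per input feature. A minimal sketch of pairing inputs with such scores (the attribution values here are made up for illustration):

```python
features = [39, 7, 1, 1, 1, 1, 4, 1, 2174, 0, 40, 9]
# Hypothetical per-feature attribution scores, same length as the input row.
attributions = [0.12, -0.05, 0.0, 0.03, -0.01, 0.02, 0.07, 0.0, 0.21, 0.0, 0.04, -0.02]
# Pair each feature value with its score for inspection.
paired = list(zip(features, attributions))
```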
Completions
Make a call to the /completions endpoint of the Deployment. Only available for generative Hugging Face models.
request_body = {
"prompt": [
"Tell me a joke",
"Give a random fact"
],
"logprobs": True,
"max_tokens": 40
}
completions = client.completions(workspace_id, deployment_id, request_body) # without explain
# only available for generative Hugging Face Deployments with a standard explainer
completions = client.completions(workspace_id, deployment_id, request_body, True) # with explain
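Many completion endpoints return an OpenAI-style body with a "choices" list; if that holds here (an assumption — verify against the response you actually receive), the generated text can be collected like this:

```python
# Example response body in the assumed OpenAI-style shape.
completions = {
    "choices": [
        {"text": "Why did the chicken cross the road?"},
        {"text": "Honey never spoils."},
    ]
}
# Collect the generated text for each prompt in the batch.
texts = [choice["text"] for choice in completions["choices"]]
```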
Chat completions
Make a call to the /chat/completions endpoint of the Deployment. Only available for generative Hugging Face models.
request_body = {
"logprobs": True,
"messages": [
{
"role": "system",
"content": "You are a helpful assistant that gives specific answers."
},
{
"role": "user",
"content": "what is 1 + 1?"
}
],
"model": "model", # only required when calling an external Deployment
"max_tokens": 50
}
chat_completions = client.chat_completions(workspace_id, deployment_id, request_body)
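A small helper (illustrative, not part of the client) can keep the request shape consistent across calls:

```python
def make_chat_body(system_prompt, user_message, max_tokens=50):
    # Assemble the /chat/completions request shape shown above.
    return {
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
        "max_tokens": max_tokens,
    }

request_body = make_chat_body(
    "You are a helpful assistant that gives specific answers.",
    "what is 1 + 1?",
)
```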
Embeddings
Make a call to the /embeddings endpoint of the Deployment. Only available for embedding Hugging Face models.
request_body = {
"input": [
"Tell me a joke",
"Wonderful world"
],
"logprobs": True,
"model": "model", # only required when calling an external Deployment
"max_tokens": 40
}
embeddings = client.embeddings(workspace_id, deployment_id, request_body)
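Embedding vectors are commonly compared with cosine similarity. A self-contained sketch (the two short vectors below stand in for vectors taken from the response):

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

vec_a = [0.1, 0.3, 0.5]  # placeholder embedding vectors
vec_b = [0.2, 0.1, 0.4]
similarity = cosine_similarity(vec_a, vec_b)
```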