Inference data payload
All managed Deployments must follow the data plane formats described on this page. External Deployments don't have to adhere to this protocol. Managed Deployments that use a different data payload format can make use of custom mapping.
V1
Models that use Data Plane v1 follow the TensorFlow V1 HTTP API protocol.
Predict
For the /predict endpoint, the protocol is structured as follows:
Request
{
"instances": [ <value>|<(nested)list>|<list-of-objects> ]
}
The instances field contains the content of the input tensor.
Response
{
"predictions": [ <value>|<(nested)list>|<list-of-objects> ]
}
The predictions field contains the content of the output tensor.
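As an illustration, a predict call can be made with any HTTP client. The following sketch uses Python with the requests library; the endpoint URL and API token are placeholders, and the bearer-token header is an assumption that depends on how your Deployment is secured.

```python
import requests

# Placeholder values; replace with your own Deployment's endpoint and credentials.
PREDICT_URL = "https://example.com/deployments/my-deployment/predict"
API_TOKEN = "my-api-token"

# Two input rows, matching the "instances" field of the V1 request format.
payload = {"instances": [[1.0, 2.0, 5.0], [3.0, 4.0, 6.0]]}

response = requests.post(
    PREDICT_URL,
    json=payload,
    headers={"Authorization": f"Bearer {API_TOKEN}"},  # assumed auth scheme
)
response.raise_for_status()
print(response.json()["predictions"])  # content of the output tensor
```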
Explain
For the /explain endpoint, the protocol is structured as follows:
Request
{
"instances": [ <value>|<(nested)list>|<list-of-objects> ]
}
The instances field contains the content of the input tensor.
Response
{
"explanations": [ <value>|<(nested)list>|<list-of-objects> ]
}
The explanations field contains the content of the output tensor.
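The explain call mirrors the predict call; only the endpoint path and the response key differ. A minimal sketch, again with a placeholder URL:

```python
import requests

EXPLAIN_URL = "https://example.com/deployments/my-deployment/explain"  # placeholder

response = requests.post(EXPLAIN_URL, json={"instances": [[1.0, 2.0, 5.0]]})
response.raise_for_status()
print(response.json()["explanations"])  # explanation values for the instances
```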
V2
The V2 protocol provides increased utility and portability. Because not every Deployment framework supports V2 yet, all supported frameworks currently expose only V1. When creating your own custom Docker image, you can adopt V2.
The protocol for both the /predict and /explain endpoints is structured as follows:
Request
{
"name": $string,
"shape": [ $number, ... ],
"datatype": $string,
"parameters": $parameters,
"data": [ <value>|<(nested)list>|<list-of-objects> ]
}
- name: name of the input tensor
- shape: shape of the input tensor. Each dimension is an integer
- datatype: datatype of tensor input elements as defined in the tensor data types documentation
- parameters (optional): object containing 0 or more parameters as explained in the parameters documentation
- data: content of the input tensor. More information can be found in the tensor data documentation
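For example, a single FP32 input tensor of shape [2, 2] could be encoded as in the sketch below; the tensor name and values are illustrative, and the datatype string follows the tensor data types documentation.

```python
# Illustrative V2 request body for one 2x2 FP32 input tensor.
v2_request = {
    "name": "input-0",             # name of the input tensor (illustrative)
    "shape": [2, 2],               # one integer per dimension
    "datatype": "FP32",            # element type from the tensor data types docs
    "parameters": {},              # optional; may be omitted
    "data": [1.0, 2.0, 3.0, 4.0],  # content of the input tensor
}
```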
Response
{
"model_name": $string,
"model_version": $string,
"id": $string,
"parameters": $parameters,
"outputs": [$response_output, ... ]
}
- model_name: name of the model
- model_version: version of the model
- id: the identifier given to the request
- parameters (optional): object containing 0 or more parameters as explained in the parameters documentation
- outputs: see response_output below
Response output
{
"name": $string
"shape": [$number, ... ],
"datatype": $string,
"data": [$tensor_data, ... ]
}
- name: name of the output tensor
- shape: shape of the output tensor. Each dimension is an integer
- datatype: datatype of tensor output elements as defined in the tensor data types documentation
- data: content of the output tensor. More information can be found in the tensor data documentation
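A response in this format can be unpacked per output tensor. The sketch below parses a hypothetical response with a single output:

```python
# Hypothetical V2 response with one output tensor.
v2_response = {
    "model_name": "my-model",
    "model_version": "1",
    "id": "42",
    "outputs": [
        {"name": "output-0", "shape": [2], "datatype": "FP32", "data": [0.1, 0.9]},
    ],
}

for output in v2_response["outputs"]:
    print(output["name"], output["shape"], output["data"])
```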
Completions
The completions protocol is available for:
- External Deployments (when the endpoint is a /completions endpoint)
- Hugging Face generative model Deployments
- Custom Docker Deployments (when the endpoint is a /completions endpoint)
You can also pass additional OpenAI format parameters such as temperature, max_tokens, and others.
/completions endpoint
Request
{
"prompt": [ < list-of-prompt-strings > ]
}
For an explain request with a standard explainer deployed:
{
"prompt": [ < list-of-prompt-strings > ],
"explain": true
}
Response
{
  "id": < id >,
  "model": < model name >,
  "choices": [
    {
      "index": 0,
      "text": < response >,
      ...
    }
  ]
}
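Putting this together, a completions request with extra OpenAI format parameters could look like the sketch below; the URL is a placeholder and the parameter values are illustrative.

```python
import requests

COMPLETIONS_URL = "https://example.com/deployments/my-llm/completions"  # placeholder

payload = {
    "prompt": ["Write a haiku about inference protocols."],
    "temperature": 0.7,  # optional OpenAI format parameter
    "max_tokens": 64,    # optional OpenAI format parameter
    # "explain": True,   # only when a standard explainer is deployed
}

response = requests.post(COMPLETIONS_URL, json=payload)
response.raise_for_status()
for choice in response.json()["choices"]:
    print(choice["index"], choice["text"])
```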
Chat Completions
The chat completion protocol is available for:
- External Deployments (when the endpoint is a chat completion endpoint)
- Hugging Face generative model Deployments
- Custom Docker Deployments (when the endpoint is a chat completion endpoint)
You can also pass additional OpenAI format parameters such as temperature, max_tokens, and others.
/chat/completions endpoint
Request
{
  "messages": [
    {
      "role": "< role >",
      "content": "< message >"
    }
  ],
  ...
}
Response
{
  "id": < id >,
  "model": < model name >,
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "reasoning_content": null,
        "content": "...."
      },
      ...
    },
    ...
  ]
}
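A chat completions call follows the same pattern. The sketch below assumes a placeholder URL and illustrative messages:

```python
import requests

CHAT_URL = "https://example.com/deployments/my-llm/chat/completions"  # placeholder

payload = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the V2 inference protocol."},
    ],
    "temperature": 0.2,  # optional OpenAI format parameter
    "max_tokens": 128,   # optional OpenAI format parameter
}

response = requests.post(CHAT_URL, json=payload)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```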
Embeddings
The embeddings protocol is available for:
- External Deployments (when the endpoint is an embedding endpoint)
- Hugging Face generative model Deployments
- Custom Docker Deployments (when the endpoint is an embedding endpoint)
You can also pass additional OpenAI format parameters.
/embeddings endpoint
Request
{
"input": [
< array of text to get embedding for >
]
}
Response
{
  "embedding": [
    < array of embeddings >
  ],
  ...
}
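An embeddings call sends the texts in the input field and reads the embedding field of the response. The URL below is a placeholder:

```python
import requests

EMBEDDINGS_URL = "https://example.com/deployments/my-embedder/embeddings"  # placeholder

payload = {"input": ["first text to embed", "second text to embed"]}

response = requests.post(EMBEDDINGS_URL, json=payload)
response.raise_for_status()
print(response.json()["embedding"])  # array of embeddings, one per input text
```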