
Inference data payload

All managed Deployments must follow the data plane formats described below. External Deployments don't have to adhere to these protocols. Managed Deployments that use a different data payload format can make use of custom mapping.

V1

Models that use Data Plane v1 follow the TensorFlow V1 HTTP API protocol.

Predict

For the /predict endpoint, the protocol is structured as follows:

Request

{
  "instances": [ <value>|<(nested)list>|<list-of-objects> ]
}

The instances field contains the content of the input tensor.

Response

{
  "predictions": [ <value>|<(nested)list>|<list-of-objects> ]
}

The predictions field contains the content of the output tensor.
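
For example, a request for a hypothetical classifier that takes four numeric features per instance could look like this (the values are purely illustrative):

{
  "instances": [
    [5.1, 3.5, 1.4, 0.2],
    [6.2, 2.9, 4.3, 1.3]
  ]
}

with a matching response containing one prediction per instance:

{
  "predictions": [0, 1]
}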

Explain

For the /explain endpoint, the protocol is structured as follows:

Request

{
  "instances": [ <value>|<(nested)list>|<list-of-objects> ]
}

The instances field contains the content of the input tensor.

Response

{
  "explanations": [ <value>|<(nested)list>|<list-of-objects> ]
}

The explanations field contains the content of the output tensor.
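
Continuing the illustrative example above, an explainer could return one attribution score per input feature, so a response might look like this (the exact shape depends on the explainer used):

{
  "explanations": [
    [0.42, 0.18, 0.31, 0.09],
    [0.05, 0.12, 0.61, 0.22]
  ]
}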

V2

The V2 protocol provides increased utility and portability. Not all Deployment frameworks currently support the V2 protocol, which is why the supported frameworks are limited to V1. When creating your own custom Docker image, you can adopt V2.

The protocol for both the /predict and /explain endpoints is structured as follows:

Request

{
  "name": $string,
  "shape": [ $number, ... ],
  "datatype": $string,
  "parameters": $parameters,
  "data": [ <value>|<(nested)list>|<list-of-objects> ]
}
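
As an illustration, a request carrying a single 2x2 FP32 input tensor could look like this (the tensor name and values are made up for the example):

{
  "name": "input-0",
  "shape": [2, 2],
  "datatype": "FP32",
  "parameters": {},
  "data": [[1.0, 2.0], [3.0, 4.0]]
}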

Response

{
  "model_name": $string,
  "model_version": $string,
  "id": $string,
  "parameters": $parameters,
  "outputs": [ $response_output, ... ]
}
  • model_name: name of the model
  • model_version: version of the model
  • id: the identifier given to the request
  • parameters (optional): object containing 0 or more parameters as explained in the parameters documentation
  • outputs: see response_output below

Response output

{
  "name": $string,
  "shape": [ $number, ... ],
  "datatype": $string,
  "data": [ $tensor_data, ... ]
}
  • name: name of the output tensor
  • shape: shape of the output tensor. Each dimension is an integer
  • datatype: datatype of tensor output elements as defined in the tensor data types documentation
  • data: content of the output tensor. More information can be found in the tensor data documentation
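
Putting the response schema and the response output together, a complete V2 response might look like this (the model name, id, and values are illustrative):

{
  "model_name": "my-model",
  "model_version": "1",
  "id": "e4b0c2aa",
  "parameters": {},
  "outputs": [
    {
      "name": "output-0",
      "shape": [2],
      "datatype": "FP32",
      "data": [0.93, 0.07]
    }
  ]
}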

Completions

The completions protocol is available for:

  • External Deployments (when the endpoint is a /completions endpoint)
  • Hugging Face generative model Deployments
  • Custom Docker Deployments (when the endpoint is a /completions endpoint)

You can also pass additional OpenAI-format parameters, such as temperature and max_tokens.

/completions endpoint

Request

{
  "prompt": [ < list-of-prompt-strings > ]
}

For an explain request with a standard explainer deployed:

{
  "prompt": [ < list-of-prompt-strings > ],
  "explain": true
}

Response

{
  "id": < id >,
  "model": < model name >,
  "choices": [
    {
      "index": 0,
      "text": < response >,
      ...
    }
  ]
}
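
As a worked example, a completion request that also sets the OpenAI-format parameters mentioned above, together with a possible response, could look like this (the prompt, model name, and generated text are illustrative):

{
  "prompt": ["Once upon a time"],
  "temperature": 0.7,
  "max_tokens": 50
}

{
  "id": "cmpl-123",
  "model": "my-model",
  "choices": [
    {
      "index": 0,
      "text": " there was a small village by the sea."
    }
  ]
}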

Chat Completions

The chat completion protocol is available for:

  • External Deployments (when the endpoint is a chat completion endpoint)
  • Hugging Face generative model Deployments
  • Custom Docker Deployments (when the endpoint is a chat completion endpoint)

You can also pass additional OpenAI-format parameters, such as temperature and max_tokens.

/chat/completions endpoint

Request

{
  "messages": [
    {
      "role": "< role >",
      "content": "< message >"
    }
  ],
  ...
}

Response

{
  "id": < id >,
  "model": < model name >,
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "reasoning_content": null,
        "content": "...."
      },
      ...
    },
    ...
  ]
}
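
For example, a minimal chat exchange could look like this (the message contents and model name are illustrative):

{
  "messages": [
    {
      "role": "user",
      "content": "What is the capital of France?"
    }
  ],
  "temperature": 0.2
}

{
  "id": "chatcmpl-123",
  "model": "my-model",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "reasoning_content": null,
        "content": "The capital of France is Paris."
      }
    }
  ]
}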

Embeddings

The embeddings protocol is available for:

  • External Deployments (when the endpoint is an embedding endpoint)
  • Hugging Face generative model Deployments
  • Custom Docker Deployments (when the endpoint is an embedding endpoint)

You can also pass additional OpenAI-format parameters, such as encoding_format and dimensions.

/embeddings endpoint

Request

{
  "input": [
    < array of text to get embeddings for >
  ]
}

Response

{
  "embedding": [
    < array of embeddings >
  ],
  ...
}
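
For example, an embeddings exchange for a single input string could look like this (the embedding vector is truncated and the values are illustrative):

{
  "input": ["The quick brown fox"]
}

{
  "embedding": [
    [0.0023, -0.0091, 0.0154, ...]
  ]
}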