Inference data payload
All managed Deployments must follow the data plane formats described on this page. External Deployments don't have to adhere to this protocol. Managed Deployments that use a different data payload format can make use of custom mapping.
V1
Models that use Data Plane v1 follow the TensorFlow V1 HTTP API protocol.
Predict
For the /predict endpoint, the protocol is structured as follows:
Request
{
"instances": [ <value>|<(nested)list>|<list-of-objects> ]
}
The instances field contains the content of the input tensor.
Response
{
"predictions": [ <value>|<(nested)list>|<list-of-objects> ]
}
The predictions field contains the content of the output tensor.
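As an illustration, a predict call can be made with any HTTP client. The following sketch uses Python with the requests library; the endpoint URL and API token are placeholders, and the bearer-token header is an assumption that depends on how your Deployment is secured.

```python
import requests

# Placeholder values; replace with your own Deployment's endpoint and credentials.
PREDICT_URL = "https://example.com/deployments/my-deployment/predict"
API_TOKEN = "my-api-token"

# Two input rows, matching the "instances" field of the V1 request format.
payload = {"instances": [[1.0, 2.0, 5.0], [3.0, 4.0, 6.0]]}

response = requests.post(
    PREDICT_URL,
    json=payload,
    headers={"Authorization": f"Bearer {API_TOKEN}"},  # assumed auth scheme
)
response.raise_for_status()
print(response.json()["predictions"])  # content of the output tensor
```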
Explain
For the /explain endpoint, the protocol is structured as follows:
Request
{
"instances": [ <value>|<(nested)list>|<list-of-objects> ]
}
The instances field contains the content of the input tensor.
Response
{
"explanations": [ <value>|<(nested)list>|<list-of-objects> ]
}
The explanations field contains the content of the output tensor.
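The explain call mirrors the predict call; only the endpoint path and the response key differ. A minimal sketch, again with a placeholder URL:

```python
import requests

EXPLAIN_URL = "https://example.com/deployments/my-deployment/explain"  # placeholder

response = requests.post(EXPLAIN_URL, json={"instances": [[1.0, 2.0, 5.0]]})
response.raise_for_status()
print(response.json()["explanations"])  # explanation values for the instances
```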
V2
The V2 protocol provides increased utility and portability. Because not every Deployment framework supports V2 yet, all supported frameworks currently expose only V1. When creating your own custom Docker image, you can adopt V2.
The protocol for both the /predict and /explain endpoints is structured as follows:
Request
{
"name": $string,
"shape": [ $number, ... ],
"datatype": $string,
"parameters": $parameters,
"data": [ <value>|<(nested)list>|<list-of-objects> ]
}
- name: name of the input tensor
- shape: shape of the input tensor. Each dimension is an integer
- datatype: datatype of tensor input elements as defined in the tensor data types documentation
- parameters (optional): object containing 0 or more parameters as explained in the parameters documentation
- data: content of the input tensor. More information can be found in the tensor data documentation
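For example, a single FP32 input tensor of shape [2, 2] could be encoded as in the sketch below; the tensor name and values are illustrative, and the datatype string follows the tensor data types documentation.

```python
# Illustrative V2 request body for one 2x2 FP32 input tensor.
v2_request = {
    "name": "input-0",             # name of the input tensor (illustrative)
    "shape": [2, 2],               # one integer per dimension
    "datatype": "FP32",            # element type from the tensor data types docs
    "parameters": {},              # optional; may be omitted
    "data": [1.0, 2.0, 3.0, 4.0],  # content of the input tensor
}
```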
Response
{
"model_name": $string,
"model_version": $string,
"id": $string,
"parameters": $parameters,
"outputs": [$response_output, ... ]
}
- model_name: name of the model
- model_version: version of the model
- id: the identifier given to the request
- parameters (optional): object containing 0 or more parameters as explained in the parameters documentation
- outputs: see response_output below
Response output
{
"name": $string
"shape": [$number, ... ],
"datatype": $string,
"data": [$tensor_data, ... ]
}
- name: name of the output tensor
- shape: shape of the output tensor. Each dimension is an integer
- datatype: datatype of tensor output elements as defined in the tensor data types documentation
- data: content of the output tensor. More information can be found in the tensor data documentation
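A response in this format can be unpacked per output tensor. The sketch below parses a hypothetical response with a single output:

```python
# Hypothetical V2 response with one output tensor.
v2_response = {
    "model_name": "my-model",
    "model_version": "1",
    "id": "42",
    "outputs": [
        {"name": "output-0", "shape": [2], "datatype": "FP32", "data": [0.1, 0.9]},
    ],
}

for output in v2_response["outputs"]:
    print(output["name"], output["shape"], output["data"])
```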
Completions
The completions protocol is available for:
- External Deployments (when the endpoint is a /completions endpoint)
- Hugging Face generative model Deployments
- Custom Docker Deployments (when the endpoint is a /completions endpoint)
You can also pass additional OpenAI format parameters such as temperature, max_tokens, and others.
/completions endpoint
Request
{
"prompt": [ < list-of-prompt-strings > ]
}
For an explain request with a standard explainer deployed:
{
"prompt": [ < list-of-prompt-strings > ],
"explain": true
}
Response
{
  "id": < id >,
  "model": < model name >,
  "choices": [
    {
      "index": 0,
      "text": < response >,
      ...
    }
  ]
}
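Putting this together, a completions request with extra OpenAI format parameters could look like the sketch below; the URL is a placeholder and the parameter values are illustrative.

```python
import requests

COMPLETIONS_URL = "https://example.com/deployments/my-llm/completions"  # placeholder

payload = {
    "prompt": ["Write a haiku about inference protocols."],
    "temperature": 0.7,  # optional OpenAI format parameter
    "max_tokens": 64,    # optional OpenAI format parameter
    # "explain": True,   # only when a standard explainer is deployed
}

response = requests.post(COMPLETIONS_URL, json=payload)
response.raise_for_status()
for choice in response.json()["choices"]:
    print(choice["index"], choice["text"])
```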
Chat Completions
The chat completion protocol is available for:
- External Deployments (when the endpoint is a chat completion endpoint)
- Hugging Face generative model Deployments
- Custom Docker Deployments (when the endpoint is a chat completion endpoint)
You can also pass additional OpenAI format parameters such as temperature, max_tokens, and others.
/chat/completions endpoint
Request
{
  "messages": [
    {
      "role": "< role >",
      "content": "< message >"
    }
  ],
  ...
}
Response
{
  "id": < id >,
  "model": < model name >,
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "reasoning_content": null,
        "content": "...."
      },
      ...
    },
    ...
  ]
}
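A chat completions call follows the same pattern. The sketch below assumes a placeholder URL and illustrative messages:

```python
import requests

CHAT_URL = "https://example.com/deployments/my-llm/chat/completions"  # placeholder

payload = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the V2 inference protocol."},
    ],
    "temperature": 0.2,  # optional OpenAI format parameter
    "max_tokens": 128,   # optional OpenAI format parameter
}

response = requests.post(CHAT_URL, json=payload)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```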
Embeddings
The embeddings protocol is available for:
- External Deployments (when the endpoint is an embedding endpoint)
- Hugging Face generative model Deployments
- Custom Docker Deployments (when the endpoint is an embedding endpoint)
You can also pass additional OpenAI format parameters.
/embeddings endpoint
Request
{
"input": [
< array of text to get embedding for >
]
}
Response
{
  "embedding": [
    < array of embeddings >
  ],
  ...
}
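An embeddings call sends the texts in the input field and reads the embedding field of the response. The URL below is a placeholder:

```python
import requests

EMBEDDINGS_URL = "https://example.com/deployments/my-embedder/embeddings"  # placeholder

payload = {"input": ["first text to embed", "second text to embed"]}

response = requests.post(EMBEDDINGS_URL, json=payload)
response.raise_for_status()
print(response.json()["embedding"])  # array of embeddings, one per input text
```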