Model Inference
Inference methods
After creating a Deployment, you can run inference on your model in several ways. First verify that everything works on the test page of your Deployment. Then integrate it into your pipelines or applications using Deeploy's Python Client or REST API.
Authentication
Inference requests to your Deployment must be authenticated. Authenticate using the Python Client, or authenticate to the REST API with one of the following methods:
- Basic auth: create a personal key pair on your account page. Base64 encode the access key and secret key, separated by a colon: echo -n "access_key:secret_key" | base64 (the -n flag keeps a trailing newline out of the encoded value). Use the Base64 encoded string in the Authorization header, e.g. Authorization: Basic <base64 encoded key>
- Bearer token: create a Deployment token on your Deployment's Authentication page. Use the token in the Authorization header, e.g. Authorization: Bearer <token>. Alternatively, you can use a JSON Web Token from your OpenID Connect provider as the Bearer token. See configure OpenID Connect for more information.
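The two header formats above can be sketched in Python using only the standard library. This is a minimal illustration, not part of Deeploy's Python Client; the helper names basic_auth_header and bearer_auth_header are hypothetical.

```python
import base64


def basic_auth_header(access_key: str, secret_key: str) -> dict:
    """Build a Basic Authorization header from a personal key pair.

    Hypothetical helper for illustration; not part of Deeploy's client.
    """
    # Join the keys with a colon and encode them without a trailing newline.
    raw = f"{access_key}:{secret_key}".encode("utf-8")
    encoded = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {encoded}"}


def bearer_auth_header(token: str) -> dict:
    """Build a Bearer Authorization header from a Deployment token or a JWT."""
    return {"Authorization": f"Bearer {token}"}
```

The returned dictionary can be passed as the headers argument of whichever HTTP library you use to call the REST API.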
Accepted content types
Models and explainers accept various content types. Most standard model frameworks accept only JSON input (application/json); custom Docker artifacts can accept more content types, e.g. an image or a PDF. Deeploy accepts the following content types: 'application/json', 'application/pdf', 'application/octet-stream', 'image/*', 'text/*', 'binary/octet-stream'. Make sure to set the correct Content-Type header when making an inference request. If no header is set, it defaults to application/json. If your model or explainer returns a type other than application/json, the response to your request will be in binary format.
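As a rough client-side sketch of these rules, the snippet below checks a content type against the accepted list before sending a request, and decides how to read the response. The function names are hypothetical, and the wildcard matching is an assumption about how 'image/*' and 'text/*' are meant to be read; Deeploy's server-side validation may differ.

```python
import json
from fnmatch import fnmatchcase

# Content types listed in this section as accepted by Deeploy.
ACCEPTED_CONTENT_TYPES = [
    "application/json",
    "application/pdf",
    "application/octet-stream",
    "image/*",
    "text/*",
    "binary/octet-stream",
]
DEFAULT_CONTENT_TYPE = "application/json"


def media_type(content_type: str) -> str:
    """Return the bare media type, dropping parameters such as charset."""
    return content_type.split(";", 1)[0].strip().lower()


def is_accepted(content_type=None) -> bool:
    """Check a request content type; a missing header counts as JSON."""
    mt = media_type(content_type or DEFAULT_CONTENT_TYPE)
    return any(fnmatchcase(mt, pattern) for pattern in ACCEPTED_CONTENT_TYPES)


def decode_response(content_type: str, body: bytes):
    """Parse JSON responses; return the raw bytes for any other type."""
    if media_type(content_type) == "application/json":
        return json.loads(body)
    return body
```

For example, is_accepted("image/png") is true while is_accepted("video/mp4") is false, and a PDF response is handed back as bytes rather than parsed.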
Request and Prediction logs
For each inference request, Deeploy creates a single request log and one or more prediction logs. The request is split into multiple prediction logs if the request body contains multiple entries in the instances or inputs array, provided that the predictions and explanations in the response body have the same length.
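The splitting rule can be illustrated with a small function that estimates how many prediction logs a request would produce. This is one interpretation of the rule, not Deeploy's implementation: it assumes that at least one of predictions or explanations is present in the response and that each one present must have as many entries as the request array. The name count_prediction_logs is hypothetical.

```python
def count_prediction_logs(request_body: dict, response_body: dict) -> int:
    """Estimate the number of prediction logs for one inference request.

    Illustrative only; the length-matching rule below is an assumption.
    """
    # The request entries live in either "instances" or "inputs".
    entries = request_body.get("instances")
    if entries is None:
        entries = request_body.get("inputs")
    if not isinstance(entries, list) or len(entries) <= 1:
        return 1

    # Collect whichever result arrays the response actually contains.
    results = [
        response_body[key]
        for key in ("predictions", "explanations")
        if key in response_body
    ]
    if not results:
        return 1

    # Split only when every result array lines up with the request entries.
    for values in results:
        if not isinstance(values, list) or len(values) != len(entries):
            return 1
    return len(entries)
```

Under these assumptions, a request with three instances and a response with three predictions yields three prediction logs, while a length mismatch falls back to a single log.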
The prediction logs contain the request and response bodies. These are stored only when the content type is recognized as application/json. For other content types, the request body is stored as a binary file in object storage.