Creating Hugging Face Deployments
Deploying Hugging Face models generally follows the steps outlined in Create a Deployment. However, there are additional configuration steps specific to Hugging Face models. In this article, we will highlight only the parts unique to Hugging Face Deployments.
Hugging Face model deployments are still experimental.
Prerequisites
You added a Repository that adheres to the requirements.
You can create a reference.json for the model in one of two formats.
Include a Blob URL reference, as illustrated in this example:
{
  "reference": {
    "blob": {
      "url": "s3://path-to-model"
    }
  }
}
Include a Hugging Face model ID for open-source models that do not require authorization:
{
  "reference": {
    "huggingface": {
      "model": "bigscience/bloom-560m"
    }
  }
}
Currently, private models or models that require approval can only be deployed using a blob reference.
The deployment steps follow the usual approach, with a few options specific to Hugging Face models. These are listed below.
Model
Select Hugging Face in the Model framework dropdown.
Explainer
For text-to-text generation and text generation models, we provide two standard explainers: saliency and attention. See saliency and attention in standard explainers. To use these explainers, select Standard explainer and choose an option in the Explainer framework dropdown.
Inferencing
The API endpoint you use for requests depends on the type of Hugging Face model being deployed. If your model is a generative text model or an embedding model, inference cannot be made through the /predict or /explain endpoints. More specifically (illustrative request bodies are sketched after this list):
- For generative models such as microsoft/phi2, inference can only be made through the /completions and /chat/completions endpoints.
- For embedding models, inference can only be made through the /embeddings endpoint.
- For all other models, inference is done through /predict.
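As an illustrative sketch only (the exact request schema depends on your deployment; the prompt, max_tokens, and input fields below are assumptions modeled on common completion and embedding APIs, not confirmed field names), a request body for /completions might look like:
{
  "prompt": "Once upon a time",
  "max_tokens": 50
}
and a request body for /embeddings might look like:
{
  "input": ["The quick brown fox jumps over the lazy dog"]
}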
The model type is inferred from the model itself. If this inference fails, you can specify a model argument to state the task explicitly:
{
  "task": "table_question_answering" | "question_answering" | "token_classification" |
          "sequence_classification" | "fill_mask" | "text_generation" | "text2text_generation" |
          "multiple_choice" | "text_embedding"
}
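For example, a minimal sketch that forces a model to be served as an embedding model; where exactly this argument is supplied depends on your deployment configuration:
{
  "task": "text_embedding"
}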
When a standard explainer is deployed with Hugging Face generative models, explanations are only possible through the /completions endpoint and can be obtained by passing "explain": true in the request body.
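For example, a minimal sketch of a /completions request with explanations enabled ("explain": true is the documented flag; the prompt field is an assumption modeled on common completion APIs):
{
  "prompt": "Once upon a time",
  "explain": true
}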