XGBoost complains about feature names
Starting with XGBoost 2.0.0, training on a pandas DataFrame automatically saves the feature names in the XGBoost model artefact. This can cause problems when XGBoost is used in combination with explainers, or when the V1 protocol is used to run inference on the model, because the data sent to the model may not carry the same feature names. The request can then fail with an HTTP 500 (Internal Server Error) status like the following:
An error has occurred while calling the model endpoint, server returned HTTP 500: Internal Server Error training data did not have the following fields: ...
To solve this error, there are two options:
Option 1
Convert your DataFrame to a NumPy array and train the model on a DMatrix built from that array, so that no feature names are stored in the model.
# Import libraries
from xgboost import DMatrix, train
import pandas as pd
import os
# Create df and specify X and y
df = pd.DataFrame({
    'A': [1, 4, 7],
    'B': [2, 5, 8],
    'C': [3, 6, 9],
    'target': [0, 1, 1]
})
X = df[['A', 'B', 'C']]
y = df['target']
# Convert X and y to NumPy arrays (this drops the column names)
X_array = X.to_numpy()
y_array = y.to_numpy()
# Create DMatrix for optimal memory efficiency and training speed
data_dmatrix = DMatrix(data=X_array, label=y_array)
# Set XGBoost parameters
params = {
    'objective': 'binary:logistic',
    'learning_rate': 0.1,
    'random_state': 42
}
# Train the model
model = train(params, data_dmatrix)
# Create the 'model' folder if it does not exist yet
model_dir = "./model"
JSON_FILE = "model.json"
os.makedirs(model_dir, exist_ok=True)
# Save the model object as 'model.json' inside that folder
model_file = os.path.join(model_dir, JSON_FILE)
model.save_model(model_file)
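To confirm that a saved model carries no feature names, the JSON file can be inspected directly. The following is a minimal sketch, not an official XGBoost utility. The helper name `stored_feature_names` is hypothetical. It assumes the model was saved in JSON format and that feature names sit either under a top-level `featureNames` key (the key this page refers to) or under `learner.feature_names` (assumed here to be the location used by XGBoost's native JSON layout).

```python
import json


def stored_feature_names(model_path):
    """Return the feature names found in a saved JSON model, or an empty list."""
    with open(model_path, "r") as f:
        model = json.load(f)

    # Assumption: a top-level 'featureNames' key, as named in this article.
    names = model.get("featureNames") or []

    # Assumption: XGBoost's native JSON layout keeps names under 'learner'.
    learner = model.get("learner")
    if not names and isinstance(learner, dict):
        names = learner.get("feature_names") or []

    return list(names)
```

For example, `stored_feature_names("./model/model.json")` should return an empty list for a model trained on NumPy arrays, provided the layout assumptions above hold for your XGBoost version.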
Option 2
If you already trained your model on a DataFrame, clear the contents of the featureNames key in the existing model.json file using the following code snippet.
import json
# Open the JSON file and load it
with open('model.json', 'r') as f:
    d = json.load(f)
# Modify the 'featureNames' key
d['featureNames'] = []
# Save the updated dictionary back to a new JSON file
with open('model_no_feature_names.json', 'w') as f:
    json.dump(d, f, indent=4)
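Depending on how the file was produced, the feature names may not sit at the top level of the JSON document. The sketch below is a hedged variant of the snippet above, wrapped in a hypothetical helper called `strip_feature_names`. It clears the `featureNames` key when present and also clears `learner.feature_names`, which is assumed here to be where XGBoost's native JSON layout stores the names. It only empties keys that already exist and writes the result to a separate file, so the original model is left untouched.

```python
import json


def strip_feature_names(src_path, dst_path):
    """Write a copy of a JSON model with its stored feature names emptied."""
    with open(src_path, "r") as f:
        model = json.load(f)

    # The key named in this article; only touched if it already exists.
    if "featureNames" in model:
        model["featureNames"] = []

    # Assumption: XGBoost's native JSON layout keeps names under 'learner'.
    learner = model.get("learner")
    if isinstance(learner, dict) and "feature_names" in learner:
        learner["feature_names"] = []

    with open(dst_path, "w") as f:
        json.dump(model, f, indent=4)

    return model
```

A call such as `strip_feature_names("model.json", "model_no_feature_names.json")` mirrors the file names used above. It is worth loading the resulting file and running a test prediction before deploying it.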