I have trained an XGBoost model using caret, and I am now calculating the mean SHAP value of each predictor with the SHAPforxgboost package, using the following code:
    library(dplyr)           # provides %>%, select() and all_of()
    library(SHAPforxgboost)

    # variables to select in the training set;
    # the first one is the outcome and needs to be removed
    to_select <- names(caret.xgb$trainingData)[-1]

    shap_values <- shap.values(
      xgb_model = caret.xgb$finalModel,
      X_train = data_train %>% select(all_of(to_select)) %>% as.matrix()
    )

    shap_long <- shap.prep(
      shap_contrib = shap_values$shap_score,
      X_train = data_train %>% select(all_of(to_select)) %>% as.matrix()
    )

However, I get the following error:
    Error in predict.xgb.Booster(xgb_model, (X_train), predcontrib = TRUE) :
      Feature names stored in `object` and `newdata` are different!

But I am already selecting the same features as in the training set of the model, and when I use the function identical(), the output is TRUE.
Thank you!
I tried selecting the same features as in the training set in the model, in case the order of the variables was different, but the error is still the same. I also looked at the intersection of colnames() of each dataset, and it was complete.
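
For reference, the kind of name comparison described above can be sketched as follows. This is only a minimal illustration in base R: `compare_feature_names` is a hypothetical helper (it is not part of xgboost, caret, or SHAPforxgboost), and the toy name vectors are made up to show a case where the intersection is complete but the order differs.

```r
# Hypothetical helper (not from any package): compares the feature names
# a model expects with the column names of the data passed to it.
compare_feature_names <- function(model_names, data_names) {
  list(
    identical       = identical(model_names, data_names),  # same names, same order
    same_set        = setequal(model_names, data_names),   # same names, any order
    missing_in_data = setdiff(model_names, data_names),    # expected but absent
    extra_in_data   = setdiff(data_names, model_names)     # present but not expected
  )
}

# Toy example: the two vectors contain the same names in a different order.
model_names <- c("age", "bmi", "glucose")
data_names  <- c("bmi", "age", "glucose")

res <- compare_feature_names(model_names, data_names)
print(res)
```

With the objects from my code, the call would presumably be `compare_feature_names(caret.xgb$finalModel$feature_names, colnames(X))`, where `X` is the matrix passed as `X_train`. This assumes the booster exposes its names as a `feature_names` field, which may depend on the installed xgboost version.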