Error when calculating SHAP value in xgboost model - feature names are different?

Question

I have trained an XGBoost model using caret, and I am now calculating the mean SHAP value of each predictor with the SHAPforxgboost package, using the following code:

library(SHAPforxgboost)
library(dplyr)  # provides %>%, select() and all_of()

# Variables to select in the training set;
# the first one is the outcome and needs to be removed
to_select <- names(caret.xgb$trainingData)[-1]

shap_values <- shap.values(
  xgb_model = caret.xgb$finalModel,
  X_train = data_train %>% select(all_of(to_select)) %>% as.matrix()
)

shap_long <- shap.prep(
  shap_contrib = shap_values$shap_score,
  X_train = data_train %>% select(all_of(to_select)) %>% as.matrix()
)

However, I get the following error:

Error in predict.xgb.Booster(xgb_model, (X_train), predcontrib = TRUE) : Feature names stored in `object` and `newdata` are different!

But I am already selecting the same features as in the model's training set, and when I compare them with identical() the output is TRUE.

Thank you!

I tried selecting the same features as in the training set in the model, in case the order of the variables was different, but the error is still the same. I also looked at the intersection of colnames() of each dataset, and it was complete.

a12456 · Accepted Answer · 2024-09-25 09:37:36Z

I found the error! The xgboost function had internally changed one column name from my dataset, which is why I got the error. Here is the code I used to find it; it lists the names stored in the model that are missing from to_select:

caret.xgb$coefnames %>% as_tibble() %>% filter(!value %in% to_select)
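A more general version of the same check, as a minimal sketch in base R: it compares two name vectors in both directions and also reports whether the order matches. The helper diff_feature_names() and the toy column names are hypothetical, and make.names() is used only to simulate a rename; what actually renamed the column in the case above is not stated.

```r
# Hypothetical helper (not part of caret, xgboost or SHAPforxgboost):
# report names that appear on one side only, and whether both vectors
# are exactly the same, order included.
diff_feature_names <- function(model_names, data_names) {
  list(
    only_in_model = setdiff(model_names, data_names),
    only_in_data  = setdiff(data_names, model_names),
    same_order    = identical(model_names, data_names)
  )
}

# Toy example: a column name containing a space is made syntactic,
# which simulates a name being changed somewhere in the pipeline.
data_names  <- c("age", "body mass", "height")
model_names <- make.names(data_names)

res <- diff_feature_names(model_names, data_names)
print(res)

# With a real model one would pass, for example (assumption: an xgboost
# version whose booster exposes `feature_names`):
# diff_feature_names(caret.xgb$finalModel$feature_names,
#                    colnames(as.matrix(data_train[, to_select])))
```

Comparing against the names stored in the fitted booster, rather than against names(caret.xgb$trainingData), matters here because the two can differ, which would explain why an identical() check on the training data can return TRUE while predict() still fails. Once the mismatched column is found, renaming the matrix columns to match the model's names before calling shap.values() should resolve the error.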
