I have trained an XGBoost model using caret, and I am now calculating the mean SHAP value of each predictor with the SHAPforxgboost package, using the following code:

library(SHAPforxgboost)
library(dplyr)  # provides %>%, select() and all_of()

# Variables to select in the training set.
# The first one is the outcome and needs to be removed.
to_select <- names(caret.xgb$trainingData)[-1]

shap_values <- shap.values(
  xgb_model = caret.xgb$finalModel,
  X_train = data_train %>% select(all_of(to_select)) %>% as.matrix()
)

shap_long <- shap.prep(
  shap_contrib = shap_values$shap_score,
  X_train = data_train %>% select(all_of(to_select)) %>% as.matrix()
)

However, I get the following error:

Error in predict.xgb.Booster(xgb_model, (X_train), predcontrib = TRUE) : Feature names stored in `object` and `newdata` are different! 

But I am already selecting the same features as in the model's training set, and when I check them with identical(), the output is TRUE.

Thank you!

I tried selecting the same features as in the training set in the model, in case the order of the variables was different, but the error is still the same. I also looked at the intersection of colnames() of each dataset, and it was complete.

1 Answer

I found the error! The xgboost function had internally changed one column name from my dataset, which is why the feature names no longer matched. Here is the code I used to find the changed name:

caret.xgb$coefnames %>% as_tibble() %>% filter(!value %in% to_select) 
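
The same check can be run in both directions with base R. The following is a minimal sketch, not the exact code from the question: the feature names are made up for illustration, and the dummy-style names on the model side are only an assumed example of how a stored name might differ. In practice the model side would come from caret.xgb$coefnames and the data side from colnames() of the matrix passed as X_train.

```r
# Hypothetical feature names, for illustration only.
# Model side: what the fitted model stored (e.g. caret.xgb$coefnames).
# Data side: colnames() of the matrix passed as X_train.
model_features <- c("age", "income", "regionnorth", "regionsouth")
data_features  <- c("age", "income", "region")

# Names the model expects but the new matrix does not contain
missing_in_data <- setdiff(model_features, data_features)

# Names in the new matrix that the model does not know
missing_in_model <- setdiff(data_features, model_features)

# identical() compares both names and order, so it also flags a reordering
same_names_and_order <- identical(model_features, data_features)

print(missing_in_data)
print(missing_in_model)
print(same_names_and_order)
```

If either setdiff() result is non-empty, renaming or rebuilding the columns of the matrix so that they match the model-side names should make the comparison pass before calling shap.values().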

1 Comment

I don't think xgboost changes the order. Maybe caret?
