I have a CatBoostRanker model (trained with groups) on a dataset of ~1M rows and ~300 features. Three of these features are ~90% invalid (they carry a special category marking the value as invalid). My data is time-series based, so these features are invalid for all but the most recent data.
For some reason, these three features are among the top 5 most important according to their Shapley values (the sum of the absolute SHAP values for a given feature). When I look at the SHAP values for each individual object, ALL of them are positive, meaning they all contribute positively to the prediction of the binary target (0 or 1).
I don't understand why the 90% of invalid values for these features all carry a positive Shapley value, since the invalid category should, in theory, convey no helpful information. I've read several explanations of SHAP values and understand their mathematical basis, but I'm still no closer to an answer. Any suggestions would be much appreciated.
On the other hand, CatBoost's own permutation-based get_feature_importance method ranks these features low, which seems far more intuitive.
Method: I used shap.TreeExplainer on the trained model, then extracted the SHAP values from the explainer by passing it a CatBoost Pool containing my training data.
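To make the check concrete, here is a minimal sketch of how one feature's SHAP values could be summarized and split by the invalid category. The array below is a synthetic stand-in rather than my real explainer output, and the names summarize_feature, shap_column and invalid_mask are hypothetical; in practice shap_column would be one column of the SHAP matrix and invalid_mask would flag the rows holding the invalid category.

```python
import numpy as np

def summarize_feature(shap_column, invalid_mask):
    """Summarize one feature's SHAP values, split by the invalid category."""
    shap_column = np.asarray(shap_column, dtype=float)
    invalid_mask = np.asarray(invalid_mask, dtype=bool)
    return {
        # Global importance: sum of absolute SHAP values over all rows.
        "sum_abs": float(np.abs(shap_column).sum()),
        # Share of rows whose contribution is strictly positive.
        "frac_positive": float((shap_column > 0).mean()),
        # Average contribution for invalid rows versus valid rows.
        "mean_invalid": float(shap_column[invalid_mask].mean()),
        "mean_valid": float(shap_column[~invalid_mask].mean()),
    }

# Synthetic stand-in (assumption): 10 rows, 9 of them in the invalid category.
shap_column = np.array([0.2] * 9 + [0.5])
invalid_mask = np.array([True] * 9 + [False])

summary = summarize_feature(shap_column, invalid_mask)
print(summary)
```

Comparing mean_invalid with mean_valid shows whether the invalid category alone accounts for the large total, since many small same-signed contributions over 90% of the rows can add up to a high sum.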