How to interpret Shapley value plot for a model?

Question

I was trying to use the Shapley value approach to understand my model's predictions. I am trying this on an XGBoost model. My plot looks like this:

Can someone help me interpret this, or confirm whether my understanding is correct?

My interpretation

1) High values of Feature 5 (indicated by the rose/purple points) lead to prediction 1.

2) Low values of Feature 5 (indicated by blue) lead to prediction 0.

3) Points 1 and 2 apply to Feature 1 as well.

4) Low values of Feature 6 lead to prediction 1, and high values of Feature 6 lead to prediction 0.

5) Low values of Feature 8 lead to prediction 1, and high values of Feature 8 lead to prediction 1 as well. If the points lie toward the extreme of the x-axis (meaning x in (1, 2) or (2, 3)), it means that low values of this feature (in this case) have a huge impact on prediction 1. Am I right?

6) Why don't I see all 45 of my features in the plot, regardless of their importance/influence? Shouldn't features with no importance still appear, just without color? Why do I only see around 12-14 features?

7) What role do Feature 43, Feature 55 and Feature 14 play in the prediction output?

8) Why do the SHAP values range from -2 to 2?

Can someone help me with this?

Noah Weber · Accepted Answer · 2019-12-23 09:59:15Z

1 and 2. Not always: there are some blue points there as well.

3, 4 and 5. Yes.

6. It depends on the SHAP plot you are using; for some of them the default is to suppress less important features and not plot them at all.

7. They are discriminative, but less so than the features above them. You can cross-check them with another feature selection technique and decide whether you want to keep them.

8. The range of the SHAP values is bounded only by the output range of the model you are explaining. The SHAP values of an instance sum to the difference between its output and the base (expected) value, but when features have canceling effects, some SHAP values may have a larger magnitude than the model output for that instance. If you are explaining a model that outputs a probability, the SHAP values will lie between -1 and 1, because the model output lies between 0 and 1. If you are explaining a model that outputs a real number or log odds, the SHAP values can be larger, since the model outputs can be larger.
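To make points 6 to 8 concrete, here is a minimal pure-Python sketch (not part of the original answer). It computes exact interventional Shapley values by brute force for a hypothetical three-feature logistic model; the weights, the background rows, the helper names and the `top_k` cut-off are all made up for illustration. It shows that the base value plus the SHAP values adds up to the model output, that attributions on the log-odds scale can exceed 1 in magnitude while attributions of the probability output stay within [-1, 1], and that ranking features by mean |SHAP| and keeping only the top few drops the weakest ones, roughly what a summary plot's display limit does (`shap.summary_plot` shows at most 20 features unless `max_display` is changed). As far as I know, `shap.TreeExplainer` explains an XGBoost binary classifier on its raw log-odds output by default (`model_output="raw"`), which would account for a -2 to 2 range; it is worth checking against the shap version you use.

```python
import math
from itertools import combinations

# Hypothetical toy model (NOT the asker's XGBoost model): logistic regression on 3 features.
WEIGHTS = [1.5, -2.0, 0.0]  # feature 2 has no effect at all
BIAS = -0.5

# Hypothetical background rows used to "switch off" features (each column has mean 0).
BACKGROUND = [[1, 1, 0], [-1, -1, 0], [1, -1, 5], [-1, 1, -5]]


def log_odds(x):
    """Raw (log-odds) output of the toy model."""
    return BIAS + sum(w * v for w, v in zip(WEIGHTS, x))


def probability(x):
    """Probability output: the sigmoid of the log-odds."""
    return 1.0 / (1.0 + math.exp(-log_odds(x)))


def exact_shap(f, x, background=BACKGROUND):
    """Brute-force interventional Shapley values of model f at point x.

    Returns (base_value, phi) with base_value + sum(phi) == f(x).
    """
    n = len(x)

    def value(coalition):
        # Features in the coalition take their value from x; the rest come from background rows.
        rows = [[x[i] if i in coalition else b[i] for i in range(n)] for b in background]
        return sum(f(r) for r in rows) / len(rows)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(len(others) + 1):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of size k.
            weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return value(set()), phi


def rank_features(f, rows, top_k):
    """Order features by mean |SHAP| over rows and keep the top_k (a display cut-off)."""
    n = len(rows[0])
    mean_abs = [0.0] * n
    for x in rows:
        _, phi = exact_shap(f, x)
        for i, p in enumerate(phi):
            mean_abs[i] += abs(p) / len(rows)
    order = sorted(range(n), key=lambda i: mean_abs[i], reverse=True)
    return order[:top_k], mean_abs


if __name__ == "__main__":
    x = [1.0, -1.0, 3.0]
    for name, f in [("log-odds", log_odds), ("probability", probability)]:
        base, phi = exact_shap(f, x)
        print(name, "base:", round(base, 3), "phi:", [round(p, 3) for p in phi],
              "base + sum(phi):", round(base + sum(phi), 3), "f(x):", round(f(x), 3))
    rows = [x, [-1.0, 1.0, 0.0], [0.5, 0.0, -2.0]]
    top, mean_abs = rank_features(log_odds, rows, top_k=2)
    print("shown features:", top, "mean |SHAP|:", [round(m, 3) for m in mean_abs])
```

On the log-odds scale, features 0 and 1 get attributions of 1.5 and 2.0 (outside [-1, 1]), while the probability attributions stay within [-1, 1]; feature 2, which has zero weight, gets exactly zero and is cut from the top-2 list. Brute-force enumeration is exponential in the number of features, so this is only for intuition; TreeSHAP avoids the enumeration for tree ensembles.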

Hi, thanks for the response. Upvoted. One quick question about point 8: what do you mean by the output magnitude range of the model? In my case, the output labels are 0 and 1, so shouldn't the range be 0 to 1? But I see -2 to +2. I know I am making a blunder; it would be helpful if you could help me with this.
– The Great, Commented Dec 23, 2019 at 11:05
@TheGreat I think this video, which clarifies the difference between odds and log(odds), can help you understand his answer to point 8.
– Mario, Commented Jan 21, 2021 at 18:07
I am facing an issue related to this post: stackoverflow.com/questions/71493858/… Can you help me?
– The Great, Commented Mar 16, 2022 at 8:35
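On the odds vs. log(odds) point raised in the comments: a binary classifier's log-odds output z maps to a probability p through the logistic (sigmoid) function, so a SHAP range of -2 to +2 on the log-odds scale is not a probability range. As a worked example, assuming a hypothetical base value of 0 (probability 0.5), a single contribution of +2 or -2 on its own would move the prediction to roughly 0.881 or 0.119:

$$
\text{odds} = \frac{p}{1-p}, \qquad z = \ln\frac{p}{1-p}, \qquad p = \sigma(z) = \frac{1}{1 + e^{-z}}
$$

$$
\sigma(0) = 0.5, \qquad \sigma(2) = \frac{1}{1 + e^{-2}} \approx 0.881, \qquad \sigma(-2) = \frac{1}{1 + e^{2}} \approx 0.119
$$

Because the sigmoid is non-linear, individual log-odds SHAP values cannot be converted one by one into probability contributions; only the total (base value plus the sum of the SHAP values) maps to the predicted probability.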
