
I used XGBoost to fit a model with an AUC of around 0.73, and I printed out my last booster:

booster[599]:
0:[userkn_hometypecnt<22] yes=1,no=2,missing=1
    1:[userkn_60d_opencardniu_days<40] yes=3,no=4,missing=3
        3:[userkn_30d_opencardniu_days<13] yes=7,no=8,missing=7
            7:[userkn_60d_opencardniu_days<24] yes=15,no=16,missing=15
                15:[userkn_timeminperiod_firstday<1029] yes=29,no=30,missing=29
                    29:leaf=0.000352735
                    30:leaf=-0.0100666
                16:[userkn_rate_aopencardniusum_actiondaycnt<0.972506] yes=31,no=32,missing=31
                    31:leaf=0.000398097
                    32:leaf=-0.0129448
            8:[userkn_hometyperate<0.0977183] yes=17,no=18,missing=17
                17:leaf=0.0239075
                18:[userkn_rate_aopencardniusum_actiondaycnt<0.957994] yes=35,no=36,missing=35
                    35:leaf=-0.00201536
                    36:leaf=0.00858442
        4:[userkn_newacitoncntactiondayavg<8.82511] yes=9,no=10,missing=9
            9:[userkn_mingap_importcard_open<297306] yes=19,no=20,missing=19
                19:[userkn_rate_aopencardniusum_actiondaycnt<0.974763] yes=37,no=38,missing=37
                    37:leaf=-0.0138254
                    38:leaf=0.00521038
                20:[userkn_onlinetime_firstday<1961.5] yes=39,no=40,missing=39
                    39:leaf=0.0247849
                    40:leaf=-0.00297016
            10:[userkn_60d_opencardniu_days<59] yes=21,no=22,missing=21
                21:[userkn_rate_repeatcntmaxactionrepeatcnt_actioncnt<0.124787] yes=41,no=42,missing=41
                    41:leaf=0.0101992
                    42:leaf=-0.0222082
                22:leaf=0.0145614
    2:[userkn_hometyperate_firstday<0.25266] yes=5,no=6,missing=5
        5:[userkn_aenterapplyloanpagecntactiondayavg<0.787338] yes=11,no=12,missing=11
            11:[userkn_newacitoncntactiondayavg<8.48678] yes=23,no=24,missing=23
                23:[userkn_worktimeactionrate<0.36514] yes=43,no=44,missing=43
                    43:leaf=-0.0178327
                    44:leaf=0.0168168
                24:leaf=0.0254048
            12:[userkn_newacitontyperate_firstday<0.794737] yes=25,no=26,missing=25
                25:[userkn_newacitoncntactiondayavg<7.14581] yes=47,no=48,missing=47
                    47:leaf=0.0175715
                    48:leaf=-0.00748876
                26:leaf=0.0174804
        6:[userkn_aopencardniurate_firstday<0.0458042] yes=13,no=14,missing=13
            13:[userkn_avgperday_opencardniu_cnt<7.44167] yes=27,no=28,missing=27
                27:leaf=0.00171541
                28:leaf=-0.0229204
            14:leaf=0.00968641

If I am right, each leaf value is a log-odds value that can be converted into a probability with the sigmoid function. However, in this last booster all the leaf values correspond to probabilities of around 0.5.
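For example, plugging a few of the leaf values above into the sigmoid (a quick check in Python; the leaf scores are copied from the dump, and treating a single leaf value as the full log-odds is exactly the assumption I am asking about):

import numpy as np

# a few leaf scores copied from booster[599] above
leaves = np.array([0.000352735, -0.0100666, 0.0239075, -0.0222082, 0.0254048])

# sigmoid: log-odds -> probability
print(1.0 / (1.0 + np.exp(-leaves)))  # all roughly 0.494..0.506, i.e. about 0.5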

Does that mean all the samples would be marked as good/bad cases half and half, so the model is no different from a random guess at binary classification?

Am I right? Any other opinions would be much appreciated!


1 Answer


Could you clarify what you mean by "However in the last booster all the leaf values changed to around 0.5 probability"?

My understanding is that when computing predicted probabilities, you'd need to sum the leaf scores over all trees (not just the last booster) and add the base score (default = 0.5, which is a global bias of $\log\frac{0.5}{1-0.5} = 0$ on the log-odds scale), like so:

$\hat{p} = \frac{\exp\left(b + \sum_{t} w_t\right)}{1 + \exp\left(b + \sum_{t} w_t\right)}$

where $w_t$ is the estimated leaf score the sample falls into in tree $t$, and $b$ is the base score transformed to the log-odds scale. So the leaves of booster[599] being close to zero only means the last tree adds a small correction on top of the previous 599; it does not make the overall prediction 0.5.
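A quick way to check this is to compare the sigmoid of the margin output with the predicted probabilities (a minimal sketch on toy data, assuming the default base_score = 0.5; make_classification is just a stand-in for your data):

import numpy as np
import xgboost as xgb
from sklearn.datasets import make_classification

# toy stand-in data (assumption: any binary-labelled data works here)
X, y = make_classification(n_samples=200, random_state=0)
dtrain = xgb.DMatrix(X, label=y)
bst = xgb.train({"objective": "binary:logistic"}, dtrain, num_boost_round=600)

margin = bst.predict(dtrain, output_margin=True)  # global bias + sum of leaf scores over all 600 trees
prob = bst.predict(dtrain)                        # predicted probabilities

# sigmoid(margin) reproduces the probabilities (up to float32 rounding)
print(np.abs(prob - 1.0 / (1.0 + np.exp(-margin))).max())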

Below is a link to the default xgboost parameters in the Python API: https://xgboost.readthedocs.io/en/latest/python/python_api.html

class xgboost.XGBClassifier(max_depth=3, learning_rate=0.1, n_estimators=100, silent=True, objective='binary:logistic', booster='gbtree', n_jobs=1, nthread=None, gamma=0, min_child_weight=1, max_delta_step=0, subsample=1, colsample_bytree=1, colsample_bylevel=1, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, base_score=0.5, random_state=0, seed=None, missing=None, **kwargs) 

base_score: The initial prediction score of all instances, global bias.
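For instance, here is a small sketch (toy random data; with zero boosting rounds no trees are built, so only the global bias is left) showing how base_score enters the prediction:

import numpy as np
import xgboost as xgb

# toy data; no trees are built below, so only the global bias matters
X = np.random.rand(50, 3)
y = np.random.randint(0, 2, size=50)
dtrain = xgb.DMatrix(X, label=y)

# zero boosting rounds: the model is just base_score
bst = xgb.train({"objective": "binary:logistic", "base_score": 0.5},
                dtrain, num_boost_round=0)
print(bst.predict(dtrain)[:3])                      # [0.5 0.5 0.5]
print(bst.predict(dtrain, output_margin=True)[:3])  # [0. 0. 0.], i.e. logit(0.5)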

Does this answer your question?

