Timeline for Adding weights for highly skewed data sets in logistic regression
Current License: CC BY-SA 3.0
6 events
| when | what | by | license | comment |
|---|---|---|---|---|
| Feb 3, 2014 at 13:35 | comment added | probabilityislogic | | An extreme distribution also presents a problem by increasing the chance of quasi-complete separation, especially if you have categorical predictors. Penalisation helps here as well. |
| Jul 25, 2013 at 16:45 | comment added | Adam Bailey | | In case anyone else misreads the above as I did at first: the 20:1 in the question is the ratio of negative to positive observations. The 15:1 in Frank Harrell's answer is something else: the ratio of positive observations to candidate independent variables. |
| Jul 24, 2013 at 16:03 | comment added | Gavin Simpson | | Thanks Frank, the 15:1 issue was what I was most after. I have some publications on the small-sample bias and Firth's method, but if you do come across something I'd be most grateful if you let me know what it is. |
| Jul 24, 2013 at 15:58 | comment added | Frank Harrell | | There are papers on small-sample bias and the value of the Firth penalization. I don't have those handy. Regarding 15:1, see biostat.mc.vanderbilt.edu/wiki/pub/Main/FrankHarrell/… |
| Jul 24, 2013 at 15:53 | comment added | Gavin Simpson | | Frank, is there a reference or something to support your "15 times..." detail? I have a similar imbalance in some data that I am using logistic regression for in place of a ROC method some other researchers developed. I recently came across the small-sample bias and added Firth's bias reduction as a fitting option in my code/package. As I'm writing this up for a journal, it would be useful to have something to cite alongside rules of thumb like this. Apologies if the reference is in your RMS book; it is sitting on my shelves, but I haven't looked there yet. |
| Jul 24, 2013 at 11:48 | answered | Frank Harrell | CC BY-SA 3.0 | |
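Adam Bailey's comment separates two ratios: 20:1 (negatives to positives) describes the data, while 15:1 (positives per candidate predictor) caps model size. A minimal sketch of that arithmetic follows; the function name `max_candidate_predictors` and the sample size of 2100 are hypothetical, chosen only for illustration.

```python
import math


def max_candidate_predictors(n_total: int, neg_to_pos: float,
                             events_per_variable: float = 15.0) -> int:
    """Rough cap on candidate predictors under an events-per-variable rule of thumb.

    neg_to_pos: ratio of negative to positive observations (20 for 20:1).
    events_per_variable: positive observations required per candidate
    predictor (15 for the 15:1 rule discussed in the comments).
    """
    n_pos = n_total / (1.0 + neg_to_pos)  # number of positive (minority) observations
    return math.floor(n_pos / events_per_variable)


if __name__ == "__main__":
    # Hypothetical sample: 2100 observations at 20:1 gives 100 positives.
    print(max_candidate_predictors(2100, 20))  # prints 6
```

The point the comment makes shows up directly: the 20:1 imbalance alone does not fix the cap; what matters is the absolute number of positive observations it leaves you with.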
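probabilityislogic points out that extreme imbalance raises the chance of (quasi-)complete separation, and Gavin Simpson mentions adding Firth's bias reduction as a fitting option. Below is a minimal NumPy sketch of Firth-penalized logistic regression, for illustration only: the names `firth_logistic` and `_penalized_loglik`, the step-halving scheme, and the toy data are assumptions, not Gavin Simpson's code or any package's API. It maximizes the log-likelihood plus 0.5·log det of the Fisher information, taking Newton steps on Firth's modified score. On the completely separated toy data the ordinary maximum-likelihood slope would diverge, while the Firth estimate stays finite.

```python
import numpy as np


def _penalized_loglik(X, y, beta):
    """Log-likelihood plus Firth's penalty 0.5 * log det(X'WX)."""
    eta = X @ beta
    p = 0.5 * (1.0 + np.tanh(0.5 * eta))  # numerically stable sigmoid
    w = p * (1.0 - p)
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))
    _, logdet = np.linalg.slogdet(X.T @ (X * w[:, None]))
    return loglik + 0.5 * logdet


def firth_logistic(X, y, max_iter=100, tol=1e-8):
    """Firth-penalized logistic regression (sketch).

    X: (n, k) design matrix including an intercept column; y: (n,) array of 0/1.
    """
    n, k = X.shape
    beta = np.zeros(k)
    for _ in range(max_iter):
        eta = X @ beta
        p = 0.5 * (1.0 + np.tanh(0.5 * eta))
        w = p * (1.0 - p)                                  # binomial variance weights
        info_inv = np.linalg.inv(X.T @ (X * w[:, None]))   # inverse Fisher information
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)   # hat-matrix diagonal
        score = X.T @ (y - p + h * (0.5 - p))              # Firth-modified score
        step = info_inv @ score
        # Step halving: shrink the step until the penalized log-likelihood does not drop.
        current = _penalized_loglik(X, y, beta)
        for _halving in range(30):
            if _penalized_loglik(X, y, beta + step) >= current:
                break
            step = step / 2.0
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


if __name__ == "__main__":
    # Hypothetical, completely separated toy data: y = 1 exactly when x > 3.
    x = np.arange(1.0, 7.0)
    y = np.array([0, 0, 0, 1, 1, 1], dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    print(firth_logistic(X, y))  # finite intercept and slope
```

For real analyses, a maintained implementation such as the R package logistf is a safer choice than this sketch.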