Timeline for Using limited independent variables in a multivariable regression model
Current License: CC BY-SA 3.0
23 events
| when | what | | by | license | comment |
|---|---|---|---|---|---|
| Apr 13, 2017 at 12:44 | history | edited | CommunityBot | replaced http://stats.stackexchange.com/ with https://stats.stackexchange.com/ | |
| Dec 9, 2016 at 15:09 | history | edited | Giuseppe Biondi-Zoccai | CC BY-SA 3.0 | typo correction and integration |
| Apr 17, 2016 at 7:33 | vote | accept | Giuseppe Biondi-Zoccai | ||
| Apr 17, 2016 at 7:33 | vote | accept | Giuseppe Biondi-Zoccai | ||
| Apr 17, 2016 at 7:09 | history | bounty ended | CommunityBot | ||
| Apr 17, 2016 at 7:09 | history | notice removed | CommunityBot | ||
| Apr 9, 2016 at 7:36 | comment | added | Marquis de Carabas | @GiuseppeBiondi-Zoccai I admit my ignorance of limited independent variables... Is that just like a limited dependent variable, but for independent variables? :) That is, the independent variable is categorical, count, etc.? Can you explain why the limited independent variable might be a problem? The original Heckman model was actually used for a continuous outcome, but nowadays there are several flavors, including probit and even Poisson. | |
| Apr 9, 2016 at 7:17 | answer | added | Umberto benedetto | timeline score: 1 | |
| Apr 9, 2016 at 7:05 | comment | added | Giuseppe Biondi-Zoccai | @marquisdecarabas: I tried to look into Heckman type corrections, but I found they focus on limited dependent variables, and not limited independent variables (but possibly I am mistaken...): stats.stackexchange.com/questions/172508/… | |
| Apr 9, 2016 at 6:57 | history | tweeted | twitter.com/StackStats/status/718694216407367680 | ||
| Apr 9, 2016 at 6:45 | comment | added | Marquis de Carabas | @Björn, based on the OP's comment here, the covariates are not missing at random, hence my original suggestion for using some kind of Heckman correction. Obviously, including the people who did not take the exercise stress test is clinically interesting and important here, because they are already different from those who took the exercise test. Of course, OP can drop the 1,000 or so patients missing the exercise test, but the results would have limited generalizability. | |
| Apr 9, 2016 at 6:24 | comment | added | Giuseppe Biondi-Zoccai | (+1) Thanks @Björn. My problem is that eventually I might want to generate a clinical risk prediction score for those completing the exercise test (thus including all covariates and using them to predict risk), but also for those not doing the exercise test (so including only some variables). Thus, my problem is two-faceted: using a single model encompassing all patients and all variables (despite several variables being missing in some patients, not at random) to adjust for confounders; then creating separate models for risk prediction in the two main strata. | |
| Apr 9, 2016 at 5:33 | comment | added | Björn | I saw a lot of hits on Google Scholar for +"model selection" +"missing covariate", as well as +"model building" +"missing covariate". I suspect that one possibility, if it is plausible that covariates are simply missing at random, would be to impute them using multiple imputation, do whatever model building you do, and combine the results across imputations. I believe there are also models that implicitly impute them. However, if covariates will also be missing in practice when people try to use the prediction score, that would be an even harder problem. | |
| Apr 9, 2016 at 5:24 | history | bounty started | Giuseppe Biondi-Zoccai | ||
| Apr 9, 2016 at 5:24 | history | notice added | Giuseppe Biondi-Zoccai | Draw attention | |
| Apr 1, 2016 at 10:53 | history | edited | Scortchi♦ | CC BY-SA 3.0 | fixed typos |
| Apr 1, 2016 at 9:45 | history | edited | Giuseppe Biondi-Zoccai | CC BY-SA 3.0 | substantial integration as recommended by commentators |
| Apr 1, 2016 at 9:42 | comment | added | Giuseppe Biondi-Zoccai | The question is not peregrine. Basically, I want to create a clinical prediction score for patients undergoing myocardial perfusion imaging. The imaging test follows an exercise stress test in fit patients, and a pharmacologic stress test in those who are not fit. The latter test is worse than the former, and does not provide several important prognostic features (e.g. maximum heart rate, or workload), so I must include exercise test variables in the multivariable model. But if I do so, I lose more than 1000 patients who only underwent a pharmacologic stress test. I also added this to the question. | |
| Apr 1, 2016 at 9:35 | comment | added | Repmat | You can make some arbitrary assumptions and do data imputation. But for the sample data posted I don't see the need; you do not lose an entire variable. But yeah, sure, you will lose data... | |
| Apr 1, 2016 at 9:28 | comment | added | Giuseppe Biondi-Zoccai | I am not sure I follow you. If I use all the covariates in the model I lose several cases (those with NA). If I only use cov1, cov2, and cov3 I don't use the information in cov4 and cov5... | |
| Apr 1, 2016 at 9:19 | comment | added | Repmat | The $x_i$ can have any features, except that they cannot be constant or linear combinations of each other. If there is not much variation in $x_i$ then the standard error will be larger than otherwise. In itself this is not a problem. | |
| Apr 1, 2016 at 8:27 | history | edited | Giuseppe Biondi-Zoccai | CC BY-SA 3.0 | minor integration |
| Mar 31, 2016 at 21:33 | history | asked | Giuseppe Biondi-Zoccai | CC BY-SA 3.0 |
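
The trade-off raised in the comments (use all covariates and lose the cases with NA, or use only cov1 to cov3 and ignore cov4 and cov5) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the data are synthetic, the outcome is treated as continuous, the model is ordinary least squares, and the helper `fit_ols` is hypothetical. It does not address covariates that are missing not at random, and it is not the method used in the question or the answer.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic, hypothetical data: cov1..cov3 are observed for everyone,
# cov4..cov5 only for patients who completed the exercise test.
X = rng.normal(size=(n, 5))
y = X @ np.array([1.0, 0.5, -0.5, 0.8, 0.3]) + rng.normal(scale=0.1, size=n)
no_exercise = rng.random(n) < 0.4
X[no_exercise, 3:] = np.nan


def fit_ols(X, y, cols):
    """Least-squares fit using only rows where all chosen columns are observed."""
    Xc = X[:, cols]
    keep = ~np.isnan(Xc).any(axis=1)
    design = np.column_stack([np.ones(keep.sum()), Xc[keep]])  # add intercept
    coef, *_ = np.linalg.lstsq(design, y[keep], rcond=None)
    return coef, int(keep.sum())


# Full model: all five covariates, complete cases only.
coef_full, n_full = fit_ols(X, y, [0, 1, 2, 3, 4])
# Reduced model: cov1..cov3 only, so every patient is retained.
coef_reduced, n_reduced = fit_ols(X, y, [0, 1, 2])

print("rows used (full model):   ", n_full)
print("rows used (reduced model):", n_reduced)
```

The full model drops every row with a missing exercise-test covariate, while the reduced model keeps all rows but cannot use the information in the last two columns, which is the dilemma described in the thread.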