Timeline for Using limited independent variables in a multivariable regression model

Current License: CC BY-SA 3.0

23 events

when	what		by	license	comment
Apr 13, 2017 at 12:44	history	edited	CommunityBot		replaced http://stats.stackexchange.com/ with https://stats.stackexchange.com/
Dec 9, 2016 at 15:09	history	edited	Giuseppe Biondi-Zoccai	CC BY-SA 3.0	typo correction and integration
Apr 17, 2016 at 7:33	vote	accept	Giuseppe Biondi-Zoccai
Apr 17, 2016 at 7:33	vote	accept	Giuseppe Biondi-Zoccai
S Apr 17, 2016 at 7:09	history	bounty ended	CommunityBot
S Apr 17, 2016 at 7:09	history	notice removed	CommunityBot
Apr 9, 2016 at 7:36	comment	added	Marquis de Carabas		@GiuseppeBiondi-Zoccai I admit my ignorance of limited independent variables...is that just like limited dependent variable but for independent variables? :) That is, the independent variable is categorical, count, etc? Can you explain why the limited independent variable might be a problem? The original Heckman model was actually used for a continuous outcome, but nowadays, there are several flavors, including probit and even Poisson.
Apr 9, 2016 at 7:17	answer	added	Umberto benedetto		timeline score: 1
Apr 9, 2016 at 7:05	comment	added	Giuseppe Biondi-Zoccai		@marquisdecarabas: I tried to look into Heckman type corrections, but I found they focus on limited dependent variables, and not limited independent variables (but possibly I am mistaken...): stats.stackexchange.com/questions/172508/…
Apr 9, 2016 at 6:57	history	tweeted			twitter.com/StackStats/status/718694216407367680
Apr 9, 2016 at 6:45	comment	added	Marquis de Carabas		@Björn, based on the OP's comment here, the covariates are not missing at random, hence my original suggestion of using some kind of Heckman correction. Obviously, including the people who did not take the exercise stress test is clinically interesting and important here, because they are already different from those who took the exercise test. Of course, the OP can drop the 1,000 or so patients missing the exercise test, but the results would have limited generalizability.
Apr 9, 2016 at 6:24	comment	added	Giuseppe Biondi-Zoccai		(+1) Thanks @Björn. My problem is that eventually I might want to generate a clinical risk prediction score for those completing the exercise test (thus including all covariates and using them to predict risk), but also for those not doing the exercise test (so including only some variables). Thus, my problem is two-faceted: using a single model encompassing all patients and all variables (despite several variables being missing in some patients, not at random) to adjust for confounders; then creating separate models for risk prediction in the two main strata.
Apr 9, 2016 at 5:33	comment	added	Björn		I saw a lot of hits on google.scholar for +"model selection" +"missing covariate", as well as +"model building" +"missing covariate". I suspect that one option, if it is plausible that the covariates are simply missing at random, would be to impute them using multiple imputation, do whatever model building you do, and combine the results across imputations. I believe there are also models that implicitly impute them. However, if covariates will also be missing in practice when people are trying to use the prediction score, that would be an even harder problem.
S Apr 9, 2016 at 5:24	history	bounty started	Giuseppe Biondi-Zoccai
S Apr 9, 2016 at 5:24	history	notice added	Giuseppe Biondi-Zoccai		Draw attention
Apr 1, 2016 at 10:53	history	edited	Scortchi♦	CC BY-SA 3.0	fixed typos
Apr 1, 2016 at 9:45	history	edited	Giuseppe Biondi-Zoccai	CC BY-SA 3.0	substantial integration as recommended by commentators
Apr 1, 2016 at 9:42	comment	added	Giuseppe Biondi-Zoccai		The question is not far-fetched. Basically, I want to create a clinical prediction score for patients undergoing myocardial perfusion imaging. The imaging test follows an exercise stress test in fit patients, and a pharmacologic stress test in those who are not fit. The latter test is worse than the former, and does not provide several important prognostic features (e.g., maximum heart rate or workload), so I must include exercise test variables in the multivariable model. But if I do so, I lose more than 1000 patients who only underwent a pharmacologic stress test. I have also added this to the question.
Apr 1, 2016 at 9:35	comment	added	Repmat		You can make some arbitrary assumptions and do data imputation. But for the sample data posted I don't see the need; you do not lose an entire variable. But yeah, sure, you will lose data...
Apr 1, 2016 at 9:28	comment	added	Giuseppe Biondi-Zoccai		I am not sure I follow you. If I use all the covariates in the model I lose several cases (those with NA). If I only use cov1, cov2, and cov3 I don't use the information in cov4 and cov5...
Apr 1, 2016 at 9:19	comment	added	Repmat		The $x_i$ can have any features, except that they cannot be constant or linear combinations of each other. If there is not much variation in $x_i$, then the standard error will be larger than otherwise. In itself this is not a problem.
Apr 1, 2016 at 8:27	history	edited	Giuseppe Biondi-Zoccai	CC BY-SA 3.0	minor integration
Mar 31, 2016 at 21:33	history	asked	Giuseppe Biondi-Zoccai	CC BY-SA 3.0
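
As a rough illustration of the "combine the results across imputations" step mentioned in the comments, here is a minimal sketch of pooling a single regression coefficient with Rubin's rules, the usual pooling method for multiple imputation. The comment itself does not name a specific method, so this choice is an assumption. The function name `pool_rubin` and all numbers are hypothetical and do not come from the question's data. Note that this pooling is generally justified only when the covariates are plausibly missing at random, which the comments above suggest may not hold here.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed datasets (Rubin's rules).

    estimates: the coefficient estimated in each imputed dataset
    variances: the squared standard error of that coefficient in each dataset
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    pooled = mean(estimates)         # pooled point estimate
    within = mean(variances)         # average within-imputation variance
    between = variance(estimates)    # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical estimates from m = 5 imputed datasets (illustrative only).
est = [0.50, 0.54, 0.46, 0.52, 0.48]
var = [0.010, 0.012, 0.011, 0.009, 0.010]
pooled, total_var = pool_rubin(est, var)
print(f"pooled estimate = {pooled:.3f}, pooled SE = {total_var ** 0.5:.3f}")
```

The pooled standard error is the square root of the total variance, so it grows with disagreement between imputations as well as with the uncertainty within each one.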
