2
$\begingroup$

I had a model (fit with VW, log loss) built on a base set of predictors (p in the thousands). It did not predict well.

I added predictor set A (p ≈ 5), and the model improved immensely.

I then added predictor set B (p in the thousands) to the base set, without set A, and the model was only slightly better than the base model.

I then fit a model on the base predictors + A + B, and it performed terribly, much worse than even the base model. In most of the models the coefficients range from about -3.5 to 3.5. In the model with all the predictors, there is only one negative coefficient (-0.93); the rest are positive, ranging up to 8.0.

I suspect that collinearity is the culprit. How should I test whether groups of predictors are collinear?

$\endgroup$

1 Answer

1
$\begingroup$

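First, collinearity by itself doesn't hurt prediction: it inflates the variance of the individual coefficients, but the fitted values stay stable as long as new data have the same correlation structure as the training data.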

Second, thousands of predictors? If I had to guess, I'd say overfitting is a problem. What's your sample size?

Third, changes in the coefficients could easily be due to collinearity, but if your goal is prediction, why are you looking at the coefficients at all? That only makes sense if explanation is also important. But if explanation is important, then thousands of predictors are going to be a mess: each coefficient describes the relationship between one IV and the DV after controlling for all the thousands of other IVs. If you can understand that, then you are much smarter than I am (and probably smarter than the people you will have to talk to).

Finally, to answer your exact question: collinearity diagnostics automatically look at groups of predictors. If you use condition indexes (which, in my opinion, are the best; more importantly, that's also David Belsley's opinion, and he is the maven of collinearity), you will get a table of variance-decomposition proportions, with one row per condition index and one column per predictor. Predictors that all have a large proportion in the same high-condition-index row are collinear as a group. The only problem will be reading a table with thousands of columns, but that's not really a stats problem. You could probably sort the table to group certain variables, either in the stats program itself or in a spreadsheet program or something else.
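For concreteness, here is a minimal sketch of how condition indexes and the variance-decomposition table can be computed with NumPy. It is not tied to VW; the function name `collinearity_diagnostics` and the toy data are made up for illustration, and it assumes the design matrix fits in memory as a dense array. The columns are scaled to unit length without centering, and the cutoffs (condition index above 30, two or more proportions above 0.5 in that row) are the usual rule of thumb, not hard limits.

```python
import numpy as np

def collinearity_diagnostics(X):
    """Return (condition_indexes, proportions) for an n x p design matrix X.

    Hypothetical helper for illustration; X should include the intercept
    column if the model has one.
    """
    X = np.asarray(X, dtype=float)
    # Scale each column to unit Euclidean length (no centering).
    Xs = X / np.linalg.norm(X, axis=0)
    # Singular values d and right singular vectors V of the scaled matrix.
    _, d, Vt = np.linalg.svd(Xs, full_matrices=False)
    # One condition index per singular value: largest singular value / this one.
    cond_idx = d.max() / d
    # phi[j, k] = V[j, k]^2 / d[k]^2 is the part of var(b_j) tied to component k.
    phi = (Vt.T ** 2) / d ** 2
    # proportions[k, j]: share of var(b_j) associated with condition index k.
    proportions = (phi / phi.sum(axis=1, keepdims=True)).T
    return cond_idx, proportions

# Toy data (made up): x2 is almost a copy of x1, x3 is unrelated.
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2, x3])

cond_idx, proportions = collinearity_diagnostics(X)

# Flag rows with a high condition index, then list the predictors that
# load heavily on each such row; those predictors form a collinear group.
names = ["intercept", "x1", "x2", "x3"]
for k in np.where(cond_idx > 30)[0]:
    group = [names[j] for j in np.where(proportions[k] > 0.5)[0]]
    print(f"condition index {cond_idx[k]:.1f}: {group}")
```

With thousands of predictors, filtering like the final loop (keep only rows with a high condition index, and within each row only the columns with a large proportion) is one way to avoid reading the full table by eye.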

$\endgroup$
