5
$\begingroup$

I find a lot of resources online that explain the dummy variable trap and say that you should remove one category of your dummy variable before fitting a multiple linear regression model, to avoid multicollinearity. While I understand what you should do, I don't understand why you should do it, in terms of a mathematical explanation. To take a concrete example: I have a variable Gender with values Male or Female. If I write out the multiple linear regression equation I get:

$$y = B_0 + x_1B_1 + x_2B_2$$

with $x_1 = 1$ and $x_2 = 0$. So I get $y=0+1\times1 + 0\times1$. How is that different from $y=0+1\times1$ (with the second dummy variable removed)? Could someone give me a concrete mathematical example of how this "trap" works? Thanks

$\endgroup$
2
  • 8
    $\begingroup$ You seem to describe a model for just a single observation! With $n$ observations, the column multiplying $B_0$ is a vector of ones, $x_1$ is a binary vector, $x_2$ is a binary vector, and (by construction) the vector of ones equals $x_1+x_2$: that's collinearity. Regardless, your question itself contains the concrete example you ask for. $\endgroup$ Commented Apr 13, 2018 at 14:39
  • 2
    $\begingroup$ Oh ok I see, that makes sense. Thanks for your answer :) $\endgroup$ Commented Apr 14, 2018 at 11:32

1 Answer

1
$\begingroup$

We can see from Wikipedia that:

Multicollinearity refers to a situation in which two or more explanatory variables in a multiple regression model are highly linearly related.

In your case, that means that
$$x_1 = 1-x_2$$
and hence your equation becomes
\begin{align}
y &= B_0 + x_1B_1 + x_2B_2 \\
&= B_0 + B_1(1-x_2) + B_2 x_2 \\
&= (B_0+B_1) + (B_2-B_1)x_2
\end{align}

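The expression $(B_0+B_1) + (B_2-B_1)x_2$ has the form $\alpha + \beta x_2$, which involves only one variable. The data can therefore determine only the two combinations $\alpha = B_0+B_1$ and $\beta = B_2-B_1$, not the three coefficients separately: for any constant $c$, replacing $(B_0, B_1, B_2)$ with $(B_0+c,\ B_1-c,\ B_2-c)$ gives exactly the same $y$, because $x_1 + x_2 = 1$. That non-uniqueness is the "trap".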
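As a small numerical illustration of the argument above, here is a sketch in Python with NumPy. The six observations and the response values `y` are made up for the example, and the variable names are hypothetical. It checks that the design matrix with an intercept and both dummies is rank-deficient, that two different coefficient vectors produce identical fitted values, and that dropping one dummy gives a full-rank matrix with a unique least-squares solution.

```python
import numpy as np

# Made-up data for illustration: 1 = male, 0 = female, six observations.
male = np.array([1, 0, 1, 1, 0, 0], dtype=float)
female = 1.0 - male                      # x_1 = 1 - x_2 by construction
y = np.array([5.0, 3.0, 6.0, 4.0, 2.0, 4.0])
ones = np.ones_like(male)

# Intercept column plus BOTH dummies: the dummy variable trap.
X_full = np.column_stack([ones, male, female])
rank_full = np.linalg.matrix_rank(X_full)
print("columns:", X_full.shape[1], "rank:", rank_full)

# Shift the coefficients by c as (B0 + c, B1 - c, B2 - c):
# the fitted values do not change, so the coefficients are not unique.
b_a = np.array([3.0, 2.0, 0.0])
c = 10.0
b_b = b_a + np.array([c, -c, -c])
same_fit = np.allclose(X_full @ b_a, X_full @ b_b)
print("identical fitted values:", same_fit)

# Drop one dummy: the matrix has full column rank and the fit is unique.
X_drop = np.column_stack([ones, male])
rank_drop = np.linalg.matrix_rank(X_drop)
coef, *_ = np.linalg.lstsq(X_drop, y, rcond=None)
print("columns:", X_drop.shape[1], "rank:", rank_drop)
print("intercept, male effect:", coef)
```

With one dummy dropped, the intercept is the mean response of the omitted category (female here) and the remaining coefficient is the difference between the two group means, which matches $\alpha$ and $\beta$ in the derivation.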

Reference: Dummy Variable Trap

$\endgroup$
