> How the formal definition works... I'm not sure how a linear combination constructed in a certain way can tell you the variables are independent or not.
Perhaps a more intuitive definition of linear dependence is as follows:
A set of vectors is linearly dependent if **one of its vectors is a linear combination of the other vectors**.
In other words, one of its vectors "depends linearly" on the other vectors. This definition can be deduced from the formal definition:
If $\{ \mathbf{v}_1, \dots, \mathbf{v}_n \}$ is linearly dependent, then there exist scalars $\alpha_1, \dots, \alpha_n$, not all zero, such that
$$ \alpha_1 \mathbf{v}_1 + \alpha_2 \mathbf{v}_2 + \dots + \alpha_n \mathbf{v}_n = \mathbf{0} $$
Without loss of generality, suppose $\alpha_1 \neq 0$; then we can divide by $\alpha_1$ and rearrange to get
$$ \mathbf{v}_1 = - \frac{\alpha_2}{\alpha_1} \mathbf{v}_2 - \dots - \frac{\alpha_n}{\alpha_1} \mathbf{v}_n $$
So $\mathbf{v}_1$ is a linear combination of the other vectors.
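To make this concrete, here is a small numerical sketch (using NumPy; the specific vectors are my own example, not from the question). We build a set where $\mathbf{v}_1$ is deliberately a combination of the other two, confirm dependence via the rank of the matrix whose columns are the vectors, and recover the coefficients:

```python
import numpy as np

# Three vectors in R^3; v1 is constructed as a linear combination
# of v2 and v3, so {v1, v2, v3} is linearly dependent.
v2 = np.array([1.0, 0.0, 1.0])
v3 = np.array([0.0, 1.0, 1.0])
v1 = 2.0 * v2 - 3.0 * v3          # v1 "depends linearly" on v2 and v3

# Stack the vectors as columns: rank < number of columns
# means the columns are linearly dependent.
A = np.column_stack([v1, v2, v3])
print(np.linalg.matrix_rank(A))   # 2, but there are 3 columns

# Recover the coefficients expressing v1 in terms of v2 and v3.
coeffs, *_ = np.linalg.lstsq(np.column_stack([v2, v3]), v1, rcond=None)
print(coeffs)                     # approximately [ 2. -3.]
```

The rank test is equivalent to asking whether some nontrivial combination of the columns equals $\mathbf{0}$, which is exactly the formal definition.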
> Why set the linear combination equations to $\mathbf{0}$. I don't see how setting to zero helps determine independence or not.
In the proof above, we did not set the right-hand side to zero. It is zero only because everything is moved to the left-hand side.
> Why choose $\alpha_k$ to be non-zero in one case. It seems arbitrary.
In the proof above, we need not work with $\alpha_1$. We could have chosen any non-zero $\alpha_k$ to show that $\mathbf{v}_k$ is a linear combination of the other vectors.
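As a concrete illustration (with vectors of my own choosing): if three vectors satisfy

$$ 2 \mathbf{v}_1 - \mathbf{v}_2 + 0 \cdot \mathbf{v}_3 = \mathbf{0}, $$

then we may solve for $\mathbf{v}_1$ (giving $\mathbf{v}_1 = \tfrac{1}{2} \mathbf{v}_2$) or for $\mathbf{v}_2$ (giving $\mathbf{v}_2 = 2 \mathbf{v}_1$), since their coefficients are non-zero, but we cannot solve for $\mathbf{v}_3$, whose coefficient is zero. Any $\alpha_k \neq 0$ works equally well; the requirement "non-zero" is only there so that dividing by $\alpha_k$ is legal.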