The AES MixColumns operator ensures that the 8 bytes (the 4 bytes of the input column and the 4 bytes of the output column) form a codeword of an MDS code over $GF(2^8)$ whose minimum weight is 5. In other words, whenever the input column is nonzero, the input and output columns together contain at least 5 nonzero bytes.
Any nonzero byte contributes 1 to the weight, by the definition of Hamming weight over $GF(2^8)$: a nonzero symbol has weight 1 regardless of how many of its eight bits are nonzero.
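To make the byte-versus-bit distinction concrete, here is a minimal Python sketch, assuming the standard AES field polynomial $x^8+x^4+x^3+x+1$ and the standard MixColumns coefficients. The helper names `gf_mul`, `mix_column`, `byte_weight` and `bit_weight` are hypothetical, chosen for illustration. An input column with a single nonzero byte maps to an output column with four nonzero bytes, so that codeword has weight exactly 5 over $GF(2^8)$, while its binary weight is a different number.

```python
# AES field GF(2^8), reduction polynomial x^8 + x^4 + x^3 + x + 1 (0x11B).
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in the AES field (shift-and-add)."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^8) is XOR
        a <<= 1
        if a & 0x100:
            a ^= 0x11B           # reduce modulo the AES polynomial
        b >>= 1
    return result

# Standard MixColumns matrix A acting on one 4-byte column.
A = [
    [2, 3, 1, 1],
    [1, 2, 3, 1],
    [1, 1, 2, 3],
    [3, 1, 1, 2],
]

def mix_column(col):
    """Return A * col computed over GF(2^8)."""
    out = []
    for row in A:
        acc = 0
        for coef, byte in zip(row, col):
            acc ^= gf_mul(coef, byte)
        out.append(acc)
    return out

def byte_weight(vec):
    """Hamming weight over GF(2^8): the number of nonzero bytes."""
    return sum(1 for b in vec if b != 0)

def bit_weight(vec):
    """Binary Hamming weight: the number of 1 bits, for comparison."""
    return sum(bin(b).count("1") for b in vec)

x = [0x01, 0x00, 0x00, 0x00]   # one nonzero input byte
y = mix_column(x)
codeword = x + y               # the 8-byte (input, output) pair
print([hex(b) for b in y])     # ['0x2', '0x1', '0x1', '0x3']
print(byte_weight(codeword))   # 5
print(bit_weight(codeword))    # 6
```

The byte weight is what the MDS property bounds; the bit weight plays no role in it.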
See the answer to this question for more.
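Edit: The Singleton bound states that the minimum distance $d$ of a linear $[n,k]$ code is at most $n-k+1$. Here $n=8$ and $k=4$, so $d \le 5$. A code that meets the bound with equality is called MDS (maximum distance separable), and proving that a particular code is MDS depends on its structure. Look up MDS codes and Reed-Solomon codes.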
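As a worked illustration (our own, not part of the original argument) of why weight 5 cannot be beaten here, take the input column whose first $k-1=3$ bytes are zero:

$$
x = (0,0,0,a),\quad a \neq 0
\quad\Longrightarrow\quad
\operatorname{wt}\big((x, Ax)\big) \le \underbrace{1}_{\text{input}} + \underbrace{4}_{\text{output}} = 5 = n-k+1 .
$$

The same idea proves the bound for any linear $[n,k]$ code: imposing $k-1$ linear conditions (the first $k-1$ coordinates are zero) on a $k$-dimensional space leaves a nonzero codeword, and that codeword has at most $n-k+1$ nonzero coordinates.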
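More concretely, a linear code is the nullspace of its parity-check matrix. So if every collection of $d-1$ columns of that matrix is linearly independent (over $GF(2^8)$ here), then every nonzero codeword has weight at least $d$. Writing the code as $\{(x, Ax)\}$, where $A$ is the MixColumns matrix, its parity-check matrix is $[A \mid I]$, since in characteristic 2 we have $Ax + y = 0$ exactly when $y = Ax$. Moreover, a code is MDS if and only if its dual is MDS, so it makes no difference that $[A \mid I]$ can equally be read as the generator matrix of the dual code. One then observes that all collections of 4 columns of $[A \mid I]$ are indeed linearly independent (equivalently, every square submatrix of $A$ is nonsingular). So $d \geq 5$. But by the Singleton bound $d \leq 5$. QED.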
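The "observe" step can be checked by brute force. Below is a minimal Python sketch, assuming the standard AES field and MixColumns coefficients; the helpers `gf_inv`, `rank_gf256` and `every_k_columns_independent` are hypothetical names. It builds $[A \mid I]$ and verifies that each of the $\binom{8}{4} = 70$ choices of 4 columns has rank 4 over $GF(2^8)$, using Gaussian elimination where subtraction is XOR.

```python
from itertools import combinations

def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in the AES field GF(2^8) (polynomial 0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result

def gf_inv(a: int) -> int:
    """Inverse of a nonzero byte: a^254, since a^255 = 1 for a != 0."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result

def rank_gf256(rows):
    """Rank of a matrix over GF(2^8) via Gaussian elimination."""
    m = [list(r) for r in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        # Find a row at or below `rank` with a nonzero entry in this column.
        pivot = next((r for r in range(rank, n_rows) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = gf_inv(m[rank][col])
        m[rank] = [gf_mul(inv, v) for v in m[rank]]   # scale pivot to 1
        for r in range(n_rows):
            if r != rank and m[r][col] != 0:
                factor = m[r][col]
                # Row operation; XOR is subtraction in characteristic 2.
                m[r] = [v ^ gf_mul(factor, p) for v, p in zip(m[r], m[rank])]
        rank += 1
        if rank == n_rows:
            break
    return rank

def every_k_columns_independent(H, k):
    """True if every choice of k columns of H is linearly independent."""
    for cols in combinations(range(len(H[0])), k):
        sub = [[row[c] for c in cols] for row in H]
        if rank_gf256(sub) < k:
            return False
    return True

A = [
    [2, 3, 1, 1],
    [1, 2, 3, 1],
    [1, 1, 2, 3],
    [3, 1, 1, 2],
]
# Parity-check matrix [A | I] of the code {(x, Ax)}.
H = [A[i] + [1 if j == i else 0 for j in range(4)] for i in range(4)]
print(every_k_columns_independent(H, 4))   # True
```

Picking $i$ columns from $A$ and $4-i$ from $I$ reduces to an $i \times i$ minor of $A$, so these 70 subsets cover exactly the 69 square submatrices of $A$ plus the trivial all-identity choice.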
See the following link for more details: