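First, consider the case $y_i = 1$. For $x_i$ to be classified correctly we need $w^Tx_i + b > 0$, and therefore $y_i(w^Tx_i + b) > 0$. The constraint $y_i(w^Tx_i + b) \geq 1 - \xi_i$ can then be met with some $\xi_i \in [0, 1)$. The smallest admissible slack, which is also the value $\xi_i$ takes at the optimum, is $\xi_i = \max(0, 1 - y_i(w^Tx_i + b))$. It equals $0$ when $y_i(w^Tx_i + b) \geq 1$ and lies strictly between $0$ and $1$ otherwise.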
Now suppose the classifier again outputs $w^Tx_i + b > 0$, but incorrectly, i.e., $y_i = -1$. Since $y_i < 0$, we have $y_i(w^Tx_i + b) < 0$. For this quantity to be $\geq 1 - \xi_i$, we need $1 - \xi_i < 0$, i.e., $\xi_i > 1$.
Similar reasoning applies when $w^Tx_i + b < 0$. In both cases, misclassification implies $\xi_i > 1$.
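To make the cases concrete, here is a small numerical sketch in Python with NumPy. It assumes a hypothetical one-dimensional classifier with $w = 1$, $b = 0$ and computes, for each point, the smallest slack allowed by the constraints $y_i(w^Tx_i + b) \geq 1 - \xi_i$ and $\xi_i \geq 0$, namely $\max(0, 1 - y_i(w^Tx_i + b))$. The helper name `minimal_slack` and the data points are illustrative only.

```python
import numpy as np

def minimal_slack(X, y, w, b):
    """Smallest xi_i satisfying y_i (w^T x_i + b) >= 1 - xi_i and xi_i >= 0."""
    margins = y * (X @ w + b)            # y_i (w^T x_i + b) for each point
    return np.maximum(0.0, 1.0 - margins)

# Hypothetical 1-D example: w = [1], b = 0, so w^T x_i + b = x_i.
w = np.array([1.0])
b = 0.0
X = np.array([[2.0], [0.5], [-0.5], [0.5]])
y = np.array([1, 1, 1, -1])
# point 0: y = +1, f = 2.0   -> correct, outside the margin
# point 1: y = +1, f = 0.5   -> correct, inside the margin
# point 2: y = +1, f = -0.5  -> misclassified with w^T x + b < 0
# point 3: y = -1, f = 0.5   -> misclassified with w^T x + b > 0
xi = minimal_slack(X, y, w, b)
print(xi.tolist())  # [0.0, 0.5, 1.5, 1.5]
```

The two correctly classified points get $\xi_i \in [0, 1)$, while both misclassified points, one on each side of the boundary, get $\xi_i = 1.5 > 1$, matching the case analysis.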
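Since $\xi_i > 1$ for every misclassified point, $\sum_{i \in \text{misclassified}} \xi_i \geq \sum_{i \in \text{misclassified}} 1$, and the right-hand side is simply the number of misclassified points. (The inequality is strict whenever at least one point is misclassified; with no misclassified points both sides are $0$.)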
Since $\xi_i \geq 0$ for every correctly classified point, $\sum_{i \in \text{correct}} \xi_i \geq 0$.
Adding the two sums gives

$$\sum_i \xi_i = \sum_{i \in \text{misclassified}} \xi_i + \sum_{i \in \text{correct}} \xi_i \geq (\text{number of misclassified points}) + 0,$$

so $\sum_i \xi_i$ is an upper bound on the number of misclassified points.
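The bound can also be spot-checked numerically. The sketch below, again assuming NumPy and using a hypothetical helper `check_slack_bound`, draws random data, random labels and an arbitrary (untrained) hyperplane, sets each slack to its smallest feasible value, and checks that the total slack is never less than the number of misclassified points. Because any feasible slack is at least this minimum, the minimal values are the hardest case for the bound. Points exactly on the boundary ($w^Tx_i + b = 0$) are counted as misclassified here; they have $\xi_i \geq 1$, so the bound still holds for them.

```python
import numpy as np

def check_slack_bound(X, y, w, b):
    """Return (sum of minimal slacks, number of misclassified points)."""
    f = X @ w + b                          # decision values w^T x_i + b
    xi = np.maximum(0.0, 1.0 - y * f)      # smallest feasible slack per point
    errors = int(np.sum(y * f <= 0))       # boundary points counted as errors
    return xi.sum(), errors

rng = np.random.default_rng(0)
n, d = 50, 3
for _ in range(1000):
    X = rng.normal(size=(n, d))
    y = rng.choice([-1, 1], size=n)
    w = rng.normal(size=d)                 # arbitrary hyperplane, not a trained SVM
    b = rng.normal()
    total_slack, errors = check_slack_bound(X, y, w, b)
    assert total_slack >= errors
print("bound held in all 1000 trials")
```

Nothing in the argument uses optimality of $(w, b)$, which is why an arbitrary hyperplane is a fair test: the bound holds for any feasible choice of $w$, $b$ and $\xi$.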