How can we prove that the sum of slacks $\sum_i \xi_i$ in the objective function of the soft-margin SVM formulation is an upper bound on the number of misclassified training examples?
Some references: https://www.cs.colostate.edu/~asa/pdfs/howto.pdf
My current knowledge: the optimization problem is
$$\min_{w,b,\xi} \quad \frac{1}{2}\|w\|^2 + C \sum_i \xi_i$$
$$\text{subject to} \quad y_i (w^T x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0.$$
But the reference states this bound as if it were obvious; I am looking for how to actually prove it.
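To make the claim concrete, here is a small numerical illustration (not a proof) of the quantity in question. For any fixed $(w, b)$, the tightest feasible slack is $\xi_i = \max(0,\, 1 - y_i(w^T x_i + b))$, and a misclassified point has $y_i(w^T x_i + b) < 0$, which forces $\xi_i > 1$. The data and hyperplane below are made up purely for illustration:

```python
import numpy as np

# Toy check: for a fixed (w, b), the tightest slack of each training point is
#   xi_i = max(0, 1 - y_i * (w @ x_i + b)).
# A point is misclassified exactly when y_i * (w @ x_i + b) < 0,
# which forces xi_i > 1, so sum(xi) >= number of misclassified points.

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))           # 20 random points in 2D (made-up data)
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
w, b = np.array([0.5, -0.3]), 0.1      # an arbitrary, non-optimal hyperplane

margins = y * (X @ w + b)
xi = np.maximum(0.0, 1.0 - margins)    # slack variables
misclassified = int(np.sum(margins < 0))

print(xi.sum(), misclassified)
assert xi.sum() >= misclassified       # the bound holds for any (w, b)
```

The assertion holds for any choice of data and hyperplane, which is exactly the bound I would like to see proved formally.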