This answer is basically the same as Paul Garrett's. --- First I'll state the question as follows.
Let $V$ be a finite dimensional vector space over a field $K$, and let $S$ and $T$ be diagonalizable endomorphisms of $V$. We say that $S$ and $T$ are simultaneously diagonalizable if (and only if) there is a basis of $V$ which diagonalizes both. The theorem is
$S$ and $T$ are simultaneously diagonalizable if and only if they commute.
If $S$ and $T$ are simultaneously diagonalizable, they clearly commute. For the converse, I'll just refer to Theorem 5.1 of The minimal polynomial and some applications by Keith Conrad. [Harvey Peng pointed out in a comment that the link to Keith Conrad's text was broken. I hope the link will be restored, but in the meantime here is a link to the Wayback Machine version. Edit: original link just updated.]
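As a quick numerical sanity check of the theorem (a sketch, not part of the proof), here is the special case where $S$ has distinct eigenvalues: any $T$ commuting with such an $S$ is automatically diagonalized by $S$'s eigenbasis. The matrices below are my own illustrative choices.

```python
import numpy as np

# S has distinct eigenvalues 1 and 3; T commutes with S.
S = np.array([[2., 1.], [1., 2.]])
T = np.array([[0., 1.], [1., 0.]])
assert np.allclose(S @ T, T @ S)  # S and T commute

# Columns of P form an eigenbasis of S.
eigvals, P = np.linalg.eig(S)
Pinv = np.linalg.inv(P)

# In this common basis, both operators become diagonal.
DS = Pinv @ S @ P
DT = Pinv @ T @ P
assert np.allclose(DS, np.diag(np.diag(DS)))
assert np.allclose(DT, np.diag(np.diag(DT)))
```

When $S$ has repeated eigenvalues the argument is subtler (one diagonalizes $T$ on each eigenspace of $S$), which is where Conrad's Theorem 5.1 does the real work.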
EDIT. The key statement to prove the above theorem is Theorem 4.11 of Keith Conrad's text, which says:
Let $A: V \to V$ be a linear operator. Then $A$ is diagonalizable if and only if its minimal polynomial in $F[T]$ splits in $F[T]$ and has distinct roots.
[$F$ is the ground field, $T$ is an indeterminate, and $V$ is finite dimensional.]
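Theorem 4.11 can be illustrated on two small examples (my own, using sympy): a symmetric matrix whose minimal polynomial has distinct roots, and a Jordan block whose minimal polynomial has a repeated root.

```python
import sympy as sp

I2 = sp.eye(2)

# A has eigenvalues 1 and 3: (A - I)(A - 3I) = 0, so the minimal
# polynomial (x - 1)(x - 3) splits with distinct roots, and A is diagonalizable.
A = sp.Matrix([[2, 1], [1, 2]])
assert (A - I2) * (A - 3 * I2) == sp.zeros(2, 2)
assert A.is_diagonalizable()

# J is a Jordan block: (J - I) != 0 but (J - I)^2 = 0, so the minimal
# polynomial is (x - 1)^2, with a repeated root, and J is not diagonalizable.
J = sp.Matrix([[1, 1], [0, 1]])
assert (J - I2) != sp.zeros(2, 2)
assert (J - I2) ** 2 == sp.zeros(2, 2)
assert not J.is_diagonalizable()
```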
The key point to prove Theorem 4.11 is to check the equality $$V=E_{\lambda_1}+\cdots+E_{\lambda_r},$$ where the $\lambda_i$ are the distinct eigenvalues and the $E_{\lambda_i}$ are the corresponding eigenspaces. One can prove this by using Lagrange's interpolation formula: put $$f:=\sum_{i=1}^r\ \prod_{j\not=i}\ \frac{T-\lambda_j}{\lambda_i-\lambda_j}\ \in F[T]$$ and observe that $f(A)$ is the identity of $V$. Indeed, $f-1$ has degree $<r$ and vanishes at the $r$ points $\lambda_1,\dots,\lambda_r$, so $f=1$ identically; moreover each summand of $f(A)$ maps $V$ into $E_{\lambda_i}$, since $(A-\lambda_i)\prod_{j\not=i}(A-\lambda_j)$ is a scalar multiple of the minimal polynomial of $A$ evaluated at $A$, hence zero.
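The interpolation identity $f(A)=\mathrm{id}_V$ can be checked numerically; below is a sketch for the (assumed, illustrative) matrix $A$ with eigenvalues $1$ and $3$.

```python
import numpy as np

# f(A) = sum_i prod_{j != i} (A - lam_j I) / (lam_i - lam_j)
A = np.array([[2., 1.], [1., 2.]])  # eigenvalues 1 and 3
lams = [1., 3.]
I2 = np.eye(2)

fA = np.zeros_like(A)
for i, li in enumerate(lams):
    term = I2.copy()
    for j, lj in enumerate(lams):
        if j != i:
            term = term @ (A - lj * I2) / (li - lj)
    fA += term  # each term projects V into the eigenspace E_{lam_i}

# f(A) is the identity, so every vector is a sum of eigenvectors.
assert np.allclose(fA, I2)
```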