It follows by the pigeonhole principle. Here's an excerpt from my Dec 11 2007 sci.math post:
Recall (proof below) $\rm\; AB \:\:=\:\:\: I \:\;\Rightarrow\; BA \:\:=\:\: I\;\;\:$ easily reduces to:
THEOREM $\;$ $\rm\;\;B\;$ injective $\rm\;\Rightarrow\:\: B\;$ surjective, $\:$ for linear $\rm\:B\:$ on a finite dim vector space $\rm\:V$
Proof $\rm\ \ \ B\;$ injective $\rm\;\Rightarrow\ B\;$ preserves proper inclusions of subspaces: $\rm\;R < S \;\Rightarrow\; BR < BS\;$
Hence for $\rm\;\;\; R \;\: < \;\; S < \cdots < \; V\;\;$ a chain of maximum length (= dim $\rm V\:$)
its image $\rm\;BR < BS < \cdots < BV \le V\;$ is a chain of the same length, so if $\rm\ BV < V\:,\ $ appending $\rm\:V\:$ would yield
a chain of greater length, a contradiction. Hence $\rm\ BV = V\:,\;$ i.e. $\rm\; B \;$ is surjective. $\;$ QED
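Over a finite field the theorem is literally the pigeonhole principle: $\rm\:V = \mathbb F_p^{\,n}\:$ is a finite set, so a map from it to itself is injective iff it is surjective. The brute-force sketch below checks this for every $\rm\:n\times n\:$ matrix over $\rm\:\mathbb F_p\:$, assuming $\rm\:p\:$ is prime (so arithmetic mod $\rm\:p\:$ is a field) and $\rm\:n\:$ is tiny. The function names are hypothetical, chosen for illustration only.

```python
from itertools import product

def apply(M, v, p):
    """Multiply the matrix M (a list of rows) by the vector v, working mod p."""
    return tuple(sum(m * x for m, x in zip(row, v)) % p for row in M)

def injective_and_surjective(M, n, p):
    """Return (injective?, surjective?) by enumerating all p**n vectors."""
    space = set(product(range(p), repeat=n))
    image = {apply(M, v, p) for v in space}
    # No two vectors collide  <=>  the image has as many elements as the space.
    injective = len(image) == len(space)
    # The image is a subset of the space, so it is everything iff sizes agree.
    surjective = image == space
    return injective, surjective

def count_bijections(n, p):
    """Check injective <=> surjective for every n x n matrix over GF(p)
    (p prime) and return how many of them are bijective."""
    count = 0
    for entries in product(range(p), repeat=n * n):
        M = [entries[i * n:(i + 1) * n] for i in range(n)]
        inj, surj = injective_and_surjective(M, n, p)
        assert inj == surj, M
        count += inj
    return count

print(count_bijections(3, 2))  # 168
```

The count 168 agrees with $\rm\:|GL_3(\mathbb F_2)| = (8-1)(8-2)(8-4)\:$. The chain argument above is what replaces counting elements when the field is infinite.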
Notice how this form of proof dramatically emphasizes the essence of the matter, namely that
injective maps cannot decrease heights (here = dimension = length of max subspace chain).
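The height argument can be checked concretely over $\rm\:\mathbb Q\:$. Take the maximal chain $\rm\:0 < \langle e_1\rangle < \langle e_1,e_2\rangle < \cdots < V\:$; the image of $\rm\:\langle e_1,\ldots,e_k\rangle\:$ under $\rm\:B\:$ is spanned by the first $\rm\:k\:$ columns of $\rm\:B\:$, so its dimension is their rank. The sketch below (hypothetical helper names, exact `Fraction` arithmetic to avoid round-off) lists these dimensions: for injective $\rm\:B\:$ they strictly increase up to $\rm\:n\:$, while a non-injective $\rm\:B\:$ makes the chain stall.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) by exact Gaussian elimination over Q."""
    M = [[Fraction(x) for x in row] for row in rows]
    r = 0
    ncols = len(M[0]) if M else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def image_chain_dims(B):
    """Return dim B(span(e_1..e_k)) for k = 0..n, i.e. the ranks of the
    first k columns of the square matrix B."""
    n = len(B)
    dims = [0]
    for k in range(1, n + 1):
        cols = [[B[i][j] for i in range(n)] for j in range(k)]  # columns as rows
        dims.append(rank(cols))
    return dims

print(image_chain_dims([[1, 2, 0], [0, 1, 3], [4, 0, 1]]))  # [0, 1, 2, 3]
print(image_chain_dims([[1, 2, 3], [2, 4, 6], [0, 0, 1]]))  # [0, 1, 1, 2]
```

The first matrix is invertible (determinant 25), so its image chain reaches height 3 = dim V. The second sends $\rm\:(2,-1,0)\:$ to $\rm\:0\:$, and its chain stalls at step 2, so its image is a proper subspace.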
Below is the standard reduction to this (in/sur)jective form. See the sci.math post for much more,
including references to folklore generalizations, e.g. work of Armendariz and Vasconcelos in the seventies.
First, notice that $\rm\;\;\ AB = I \;\Rightarrow\: B\:$ injective, since $\rm\;A\;$ times $\rm\;B\:x = B\:y\;$ yields $\rm\;x = y\:,\:$ and
$\rm\ B\ $ surjective $\rm\ \Rightarrow\ BA = I \;\;$ since for every $\rm\;x\;$ there exists $\rm\; y : \;\; x = B\:y = B(AB)\:y = BA\:(B\:y) = BA \: x$
Combining them: $\rm\: AB = I \;\Rightarrow\: B\:$ injective $\rm\;\Rightarrow\; B\:$ surjective $\rm\;\Rightarrow\: BA = I$
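One way to see the combined statement numerically: Gauss-Jordan reduction of $\rm\:[\,B \mid I\,]\:$ to $\rm\:[\,I \mid A\,]\:$ records the row operations as a matrix $\rm\:A\:$ with $\rm\:AB = I\:$, i.e. a left inverse. The theorem predicts that this $\rm\:A\:$ is automatically a right inverse too. A minimal sketch over $\rm\:\mathbb Q\:$, assuming square input; the helper names are hypothetical.

```python
from fractions import Fraction

def left_inverse(B):
    """Gauss-Jordan on [B | I]; the accumulated row operations form a matrix
    A with A B = I.  Returns None if B is singular (no such A exists)."""
    n = len(B)
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(B)]
    for c in range(n):
        pivot = next((i for i in range(c, n) if M[i][c] != 0), None)
        if pivot is None:
            return None  # column c depends on earlier columns: B is singular
        M[c], M[pivot] = M[pivot], M[c]
        p = M[c][c]
        M[c] = [x / p for x in M[c]]  # scale the pivot row so the pivot is 1
        for i in range(n):
            if i != c and M[i][c] != 0:
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[c])]
    return [row[n:] for row in M]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def is_identity(M):
    return all(M[i][j] == (1 if i == j else 0)
               for i in range(len(M)) for j in range(len(M)))

B = [[1, 2, 0], [0, 1, 3], [4, 0, 1]]
A = left_inverse(B)
print(is_identity(matmul(A, B)), is_identity(matmul(B, A)))  # True True
```

Only $\rm\:AB = I\:$ is built in by construction; $\rm\:BA = I\:$ is the nontrivial conclusion, which holds here because the space is finite dimensional.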