It follows simply by the pigeonhole principle. Here is an excerpt from my [old post][1]:
Recall (below) $\rm\; BA \:=\:\: I \:\;\Rightarrow\; AB \:=\: I\;\;$ easily reduces to:
THEOREM $\;$ $\rm\;A\;$ injective $\rm\;\Rightarrow\; A\;$ surjective, $\:$ for linear $\rm\:A\:$ on a finite-dimensional vector space $\rm\:V$
Proof: $\rm\;A\;$ injective $\rm\;\Rightarrow A\;$ preserves strict inclusions of subspaces: $\rm\;R < S \Rightarrow AR < AS\;$
Hence for $\rm\;\;\; R \;\; < \;\; S < \cdots < V\;$ a subspace chain of maximum length
its image $\rm\;AR < AS < \cdots < AV \le V\;$ would have greater length
if $\rm\;AV < V,\:$ therefore instead $\rm\;\; AV = V\:,\;$ i.e. $\rm\; A \;$ is surjective. $\;\;$ QED
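
To make the length count explicit, here is a sketch of the same step with labels. The names $\rm\,n = \dim V\,$ and $\rm\,V_i\,$ are introduced only for illustration and do not appear in the original post. Take a maximal chain and apply $\rm\,A\,$:

$$\rm 0 = V_0 < V_1 < \cdots < V_n = V \;\;\Rightarrow\;\; 0 = AV_0 < AV_1 < \cdots < AV_n = AV$$

If $\rm\,AV < V\,$ then appending $\rm\,V\,$ gives

$$\rm AV_0 < AV_1 < \cdots < AV_n < V$$

a chain with $\rm\,n+1\,$ strict inclusions, contradicting that $\rm\,n\,$ is the maximum chain length in $\rm\,V.$
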
Notice how this form of proof dramatically emphasizes the essence of the matter, namely that
injective maps cannot decrease heights (here = dimension = length of max subspace chain).
Below is the standard reduction to the injective/surjective form. See the [sci.math post][1] for much more,
including references to folklore generalizations, e.g. work of Vasconcelos in the seventies.
First, notice that $\rm\;\;\: BA = I \;\Rightarrow\: A\:$ injective, since $\rm\;B\;$ times $\rm\;Ax = Ay\;$ yields $\rm\;x = y\:,\:$ and
$\rm A\:$ surjective $\rm\;\Rightarrow\: AB = I \;\;$ since for all $\rm\;x\;$ there exists $\rm\;y\;$ with $\rm\; x = Ay = A(BA)y = AB(Ay) = AB \: x$
Combining them: $\rm\; BA = I \;\Rightarrow\: A\:$ injective $\rm\;\Rightarrow\; A\:$ surjective $\rm\;\Rightarrow\: AB = I$

[1]: http://google.com/[email protected]
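
As a small sanity check of the conclusion, not a proof, here is a minimal Python sketch using exact rational arithmetic. The matrices `A` and `B` and the helpers `matmul` and `identity` are hypothetical examples chosen for illustration. It verifies that this particular left inverse is also a right inverse.

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def identity(n):
    """Return the n x n identity matrix with Fraction entries."""
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

# Illustrative 2 x 2 example: B was chosen by hand so that BA = I.
A = [[Fraction(2), Fraction(1)],
     [Fraction(1), Fraction(1)]]
B = [[Fraction(1), Fraction(-1)],
     [Fraction(-1), Fraction(2)]]

BA = matmul(B, A)  # left-inverse product
AB = matmul(A, B)  # right-inverse product, which the theorem says is also I

print(BA == identity(2), AB == identity(2))
```

Exact fractions are used instead of floats so that the comparison with the identity matrix involves no rounding.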