It follows simply by the pigeonhole principle. Here is an excerpt from my old post:
Recall (below) $\rm\; BA \:=\:\: I \:\;\Rightarrow\; AB \:=\: I\;\;$ easily reduces to:
THEOREM $\;$ $\rm\;A\;$ injective $\rm\;\Rightarrow\; A\;$ surjective, $\:$ for linear $\rm\:A\:$ on a finite-dimensional vector space $\rm\:V$
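Proof: $\rm\;A\;$ injective $\rm\;\Rightarrow A\;$ preserves injections, i.e. proper inclusions of subspaces: $\rm\;R < S \Rightarrow AR < AS\;$ (if $\rm\:AR = AS\:$ then each $\rm\:s\in S\:$ has $\rm\:As = Ar\:$ for some $\rm\:r\in R,\:$ so $\rm\:s = r\in R$).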
Hence for $\rm\;\;\; R \;\; < \;\; S < \cdots < V\;$ a subspace chain of maximum length
its image $\rm\;AR < AS < \cdots < AV \le V\;$ would have greater length
if $\rm\;AV < V\;$, therefore instead $\rm\;\; AV = V\:,\;$ i.e. $\rm\; A \;$ is surjective. $\;\;$ QED
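
To see why the finite-dimension hypothesis matters, here is a minimal Python sketch. It is an illustration only: the names `right_shift` and `left_shift` are hypothetical, and lists stand in for finitely supported sequences with trailing zeros implied. On that infinite-dimensional space there is no subspace chain of maximum length, and the right shift $\rm\:A\:$ is injective but not surjective, so its left inverse $\rm\:B\:$ satisfies $\rm\:BA = I\:$ while $\rm\:AB \ne I$.

```python
def right_shift(x):
    """A: (x0, x1, ...) -> (0, x0, x1, ...). Injective, but never hits x0 != 0."""
    return [0] + list(x)


def left_shift(x):
    """B: (x0, x1, x2, ...) -> (x1, x2, ...). A left inverse of A."""
    return list(x[1:])


x = [3, 1, 4]

ba = left_shift(right_shift(x))  # B(A(x)) recovers x
ab = right_shift(left_shift(x))  # A(B(x)) loses the first coordinate

print("BA x =", ba)
print("AB x =", ab)
```

Here `ba` equals `x`, while `ab` has its first entry replaced by zero, so $\rm\:AB\:$ is not the identity.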
Below is the standard reduction to the injective/surjective form above. See my sci.math post for much more,
including references to folklore generalizations, e.g. work of Vasconcelos in the seventies.
First, notice that $\rm\;\;\; BA = I \Rightarrow A\;\:$ injective, since $\rm\;B\;$ times $\rm\;Ax = Ay\;$ yields $\rm\;x = y\;$, and
$\rm A\;$ surjective $\rm\;\Rightarrow AB = I\;$, since for every $\rm\;x\;$ there exists $\rm\;y\;$ with $\rm\;x = Ay = A(BA)y = AB(Ay) = ABx$
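
As a concrete sanity check of the finite-dimensional case, the following Python sketch verifies both products for one $2\times 2$ integer example. The matrices and the helper names `matmul` and `apply` are assumptions chosen for illustration, not taken from the original post. It also traces the last step of the reduction: for a given $\rm\:x,\:$ the vector $\rm\:y = Bx\:$ is one witness with $\rm\:Ay = x$.

```python
def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def apply(X, v):
    """Matrix-vector product X v."""
    return [sum(X[i][k] * v[k] for k in range(len(v))) for i in range(len(X))]


I2 = [[1, 0], [0, 1]]
A = [[2, 1], [1, 1]]    # example matrix (assumed for illustration)
B = [[1, -1], [-1, 2]]  # chosen so that B A = I

BA = matmul(B, A)
AB = matmul(A, B)

x = [5, 7]
y = apply(B, x)   # a preimage of x under A
Ay = apply(A, y)  # should give back x, i.e. x = A y = A B x

print("BA =", BA)
print("AB =", AB)
print("y =", y, " Ay =", Ay)
```

A single example proves nothing, of course; it only makes the identity $\rm\:x = Ay = ABx\:$ tangible.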