It's a simple consequence of the pigeonhole principle. Here's an excerpt from my old post:
Recall (see the reduction below) that $\rm\;\; BA = I \Rightarrow AB = I\;$ reduces to: $\rm\;A\;$ injective $\rm\;\Rightarrow A \;$ surjective. $\;$ But this is precisely the following.
THEOREM $\;$ $\rm\;A\;$ injective $\rm\;\Rightarrow A\;$ surjective, $\:$ for all linear $\rm\;A\;$ on finite dimensional $\rm\;V\;$
Proof: $\rm\;A\;$ injective $\rm\;\Rightarrow A\;$ preserves proper inclusions of subspaces: $\rm\;R < S \Rightarrow AR < AS\;$
Hence for $\rm\;\;\; R \;\; < \;\; S < \cdots < V\;$ a subspace chain of maximum length
its image $\rm\;AR < AS < \cdots < AV \le V\;$ would have greater length
if $\rm\;AV < V\;$, therefore instead $\rm\;\; AV = V\:,\;$ i.e. $\rm\; A \;$ is surjective. $\;\;$ QED
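To spell out the counting behind the proof (a sketch only; the names $\rm n$ and $\rm V_i$ are my own labels, not from the original post): a proper inclusion $\rm R < S$ stays proper under an injective $\rm A$, because if $\rm s \in S\setminus R$ and $\rm As = Ar$ for some $\rm r \in R$, then injectivity gives $\rm s = r \in R$, a contradiction. Dimensions strictly increase along a chain of subspaces, so with $\rm n = \dim V$ a chain has at most $\rm n+1$ members, and a chain of maximum length attains this:

$$\rm 0 = V_0 < V_1 < \cdots < V_n = V, \qquad \dim V_i = i.$$

If $\rm AV < V$ held, then

$$\rm AV_0 < AV_1 < \cdots < AV_n = AV < V$$

would be a chain with $\rm n+2$ members, exceeding the maximum. So $\rm AV = V$.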
Below is the standard reduction to injective/surjective form mentioned above. See the sci.math post excerpted here for much more.
First, notice that $\rm\;\;\; BA = I \Rightarrow A\;$ injective, since $\rm\;B\;$ times $\rm\;Ax = Ay\;$ yields $\rm\;x = y\;$, and
$\rm A\;$ surjective $\rm\;\Rightarrow AB = I\;$, since for every $\rm\;x\;$ there exists $\rm\;y\;$ with $\rm\;x = Ay = A(BA)y = ABx$.
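Putting these two observations together with the theorem gives the full implication. Written out as one chain (a sketch; the regrouping $\rm A(BA)y = (AB)(Ay)$ is just associativity, made explicit here):

$$\rm BA = I \;\Rightarrow\; A \text{ injective} \;\Rightarrow\; A \text{ surjective} \;\Rightarrow\; AB = I$$

$$\rm x = Ay = A(I\,y) = A(BA)y = (AB)(Ay) = (AB)x \quad \text{for all } x \in V.$$

Only the middle implication uses finite dimensionality; the first and last hold for any maps with $\rm BA = I$.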