This is treated under Configuration model. Note that, there can be self-edges (i.e. it's not a loop-free graph) and there can also be multiple edges between two nodes. And, the formula above is a result of a series of assumptions and is not exactly correct.
Let's find the expected value of number of edges between vertices $i$ and $j$ and setup the mathematical model. We can think of vertices with half-edges (or stubs as wikipedia article mentions) attached to them. Each vertex, $i$, has $k_i$ half-edges attached. If an half-edge from vertex $i$ is connected another half-edge from vertex $j$, we have an edge between $i$ and $j$. If an half-edge from vertex $i$ is connected to another half-edge from the same vertex, we have a self-loop for vertex $i$.
Let $X_{i}^l$ be the binary random variable that represents the $l$-th half-edge of vertex $i$. It's $1$ if it's connected to vertex $j$ and $0$ otherwise. So, the expected value of $X_{i}^l$ is the probability of the half-edge $l$ of vertex $i$ connected to vertex $j$. There are $2m-1$ half-edges in total except this one and there are $k_j$ that are of interest. So, in a random setup, this probability is simply $$\mathbb E[X_i^l]=\frac{k_j}{2m-1}$$
The number of edges between the vertices $i$ and $j$ is just the sum of all $X_i^l$ for vertex $i$. So, the expected value of this is $$\mathbb E[\text{# edges between i and j}] = \mathbb E\left[\sum_{l=1}^{k_i} X_i^l\right]=\sum_{l=1}^{k_i} E[X_i^l]=\sum_{l=1}^{k_i} \frac{k_j}{2m-1}=\frac{k_ik_j}{2m-1}$$
This is not exactly the probability of vertices $i$ and $j$ being connected. It's the expected number of edges between these two and is expected to be close to the probability we want if the graph (and the number of edges) is large. In this case, it can also be approximated with $2m-1\approx 2m$ in the denominator. Because, if the graph is large, and say the degrees of the vertices is small compared to the overall number of edges, the overcounting by adding the probabilities of independent events (if this expected value is assumed to be the probability as per the original question) is getting smaller because for every half-edge there are so many other possibilities to connect to and the likelihood of selecting the same half-edge from vertex $j$ is small for two given half-edges of $i$.
A Counter Example
Moreover, returning to the original question, one should not always expect to have $k_ik_j\leq 2m$ because the denominator is just a sum of the degrees and the numerator is the multiplication of some of them. Simply, in a graph with only two vertices, we can easily have $k_1=4, k_2=4$ and therefore $2m=8$, but the "probability" greater than $1$.
The actual probability in this case is $$1-\overbrace{3/7}^{\text{prob. of first stub of $i$ connected to $i$}} \times \overbrace{1/5}^{\text{prob. of the remaining stub of $i$ is connected to $i$}} = 32/35$$
But, the expected number of edges between the two vertices is $k_ik_j/(2m-1) = 16 / 7$, which is far greater.
Reference
The treatment of this topic/subject is inside Section 13.2.1 of Newman's book, Networks: An Introduction.