There has been much debate, confusion, and contradiction on this topic, both on stats.stackexchange and in the scientific literature.

A useful reference is the [2004 JMLR paper by Bengio and Grandvalet][1], which shows that the **variance of the cross-validation estimator** is a linear combination of three moments:

$$\operatorname{Var} = \frac{1}{n^2} \sum_{i,j} \operatorname{Cov}(e_i, e_j)$$
$$= \frac{1}{n}\sigma^2 + \frac{m-1}{n}\,\omega + \frac{n-m}{n}\,\gamma$$

where $n$ is the total number of observations, $m = n/K$ is the number of observations in each test block, and each moment is a particular component of the $n \times n$ covariance matrix $\Sigma$ of the cross-validation errors $\mathbf{e} = (e_1, \dots, e_n)^T$:

[![enter image description here][2]][2]
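The block structure of $\Sigma$ can be written out explicitly (a sketch reconstructed from the decomposition above, with $m = n/K$ the test-block size):

$$\Sigma_{ij} = \begin{cases} \sigma^2, & i = j \\ \omega, & i \neq j, \text{ $i$ and $j$ in the same test block} \\ \gamma, & \text{$i$ and $j$ in different test blocks.} \end{cases}$$

Summing all $n^2$ entries — $n$ diagonal terms, $n(m-1)$ same-block pairs, and $n(n-m)$ cross-block pairs — and dividing by $n^2$ recovers the three-term formula above.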

As @amoeba points out in [this answer][3], the estimate is always an average of $n$ errors, regardless of $K$:

> k-fold CV with any value of k produces an error for each of the n observations. So the MSE estimate always has the denominator n. This denominator does not change between LOOCV and e.g. 10-fold CV. This is your main confusion here.
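To see this concretely, here is a minimal simulation (my own illustration, not from the paper) using a mean predictor as the "model": whatever $K$ is, the CV loop emits exactly $n$ errors.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(size=100)
n = len(y)

def cv_errors(y, K):
    """Squared errors of a mean predictor under K-fold CV: one per observation."""
    idx = np.arange(len(y))
    errors = np.empty(len(y))
    for test in np.array_split(idx, K):
        train = np.setdiff1d(idx, test)
        pred = y[train].mean()            # "model" fit on the training folds
        errors[test] = (y[test] - pred) ** 2
    return errors

for K in (2, 10, n):                      # K = n is LOOCV
    e = cv_errors(y, K)
    print(K, len(e), e.mean())            # len(e) is always n = 100
```

The mean of `e` is the CV estimate of the generalization error; its denominator is $n$ for every $K$.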

Now, there is more subtlety in this variance decomposition than meets the eye. In particular, the moments $\omega$ and $\gamma$ depend on the *correlations* induced by overlapping training sets and on the *instability* of the learning algorithm, and since $m = n/K$, the weights on $\omega$ and $\gamma$ themselves change with $K$. You will need to work through the extensive (and technical) literature to understand deeply what this means.
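As a sanity check on the decomposition, one can estimate $\sigma^2$, $\omega$, and $\gamma$ empirically by replicating the whole CV procedure over many independently drawn datasets (a toy sketch of my own with a mean predictor; the fold assignment is held fixed across replications):

```python
import numpy as np

rng = np.random.default_rng(1)
n, K, R = 20, 4, 2000
m = n // K                                 # observations per test block
folds = np.array_split(np.arange(n), K)    # fixed fold assignment
idx = np.arange(n)

# same_block[i, j] is True when i and j sit in the same test block
same_block = np.zeros((n, n), dtype=bool)
for f in folds:
    same_block[np.ix_(f, f)] = True

E = np.empty((R, n))                       # CV errors, one row per replication
for r in range(R):
    y = rng.normal(size=n)
    for test in folds:
        train = np.setdiff1d(idx, test)
        pred = y[train].mean()             # mean predictor as a toy model
        E[r, test] = (y[test] - pred) ** 2

Sigma = np.cov(E, rowvar=False)            # empirical n x n covariance of errors
diag = np.eye(n, dtype=bool)
sigma2 = Sigma[diag].mean()                # variance: diagonal entries
omega  = Sigma[same_block & ~diag].mean()  # covariance within a test block
gamma  = Sigma[~same_block].mean()         # covariance across test blocks

lhs = np.var(E.mean(axis=1), ddof=1)       # variance of the CV estimate itself
rhs = sigma2 / n + (m - 1) / n * omega + (n - m) / n * gamma
print(lhs, rhs)                            # the two sides agree
```

Because the sample variance of a mean equals the corresponding quadratic form of the sample covariance matrix, the two sides match exactly (up to floating point), mirroring the identity in the paper.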
 [1]: http://www.jmlr.org/papers/volume5/grandvalet04a/grandvalet04a.pdf
 [2]: https://i.sstatic.net/HRtCF.png
 [3]: https://stats.stackexchange.com/a/358138/192854