In a typical linear regression model, we assume that the errors/residuals of the predictions are i.i.d. and follow a normal distribution with a variance that is the same for all observations.
That is, if we have features/covariates $\mathbf{X}_{m \times n}$, a target/response variable $\mathbf{y}_{m}$, and fitted coefficients $\mathbf{\beta}_{n}$, then: $$ (\mathbf{X} \mathbf{\beta} - \mathbf{y}) \sim N(\mathbf{0}_{m}, \sigma^2 \mathbf{I}_{m}) $$
If the residuals are correlated and/or have different variances, i.e.: $$ (\mathbf{X} \mathbf{\beta} - \mathbf{y}) \sim N(\mathbf{0}_{m}, \mathbf{\Sigma}_{m \times m}) $$ then coefficients estimated by simply minimizing squared errors will have certain undesirable properties (they remain unbiased but are no longer efficient, and the usual standard errors are wrong), and better coefficients may instead be obtained by "generalized least squares" (GLS), which takes those correlations and differing variances into account. Per Wikipedia, GLS estimates the coefficients $\mathbf{\beta}_{\text{GLS}}$ as: $$ \mathbf{\beta}_{\text{GLS}} = ( \mathbf{X}^T \mathbf{\Sigma}^{-1} \mathbf{X} )^{-1} \mathbf{X}^T \mathbf{\Sigma}^{-1} \mathbf{y} $$
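To make the formula concrete, here is a minimal numpy sketch of that GLS estimator on simulated data where $\mathbf{\Sigma}$ is *known* (a diagonal, heteroskedastic case for simplicity); all variable names and the simulation setup are illustrative assumptions, not part of the question:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 200, 3
X = rng.normal(size=(m, n))
beta_true = np.array([1.0, -2.0, 0.5])

# Heteroskedastic noise: each observation gets its own (known) variance.
variances = rng.uniform(0.1, 5.0, size=m)
Sigma = np.diag(variances)
y = X @ beta_true + rng.normal(scale=np.sqrt(variances))

# GLS: beta = (X' Sigma^-1 X)^-1 X' Sigma^-1 y
# Solve the linear system instead of forming the explicit inverse.
Sigma_inv = np.linalg.inv(Sigma)
beta_gls = np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv @ y)

# Plain OLS for comparison (ignores the differing variances).
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
```

Note this sidesteps the question entirely: the whole difficulty is that here `Sigma` is handed to us, whereas in practice it must somehow be estimated.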
Now my question is: assuming we don't know the explicit relationships between residuals beforehand, and that the data will not follow some obvious sparsity pattern like autoregressive data, how can such a correlation or covariance matrix $\mathbf{\Sigma}$ be obtained? From a logical point of view, that kind of matrix has more free parameters ($m(m+1)/2$ for a symmetric $m \times m$ matrix) than the $m$ observations can support. Is there an estimator for it that imposes some restrictions? I guess in theory it could be bootstrapped, but I doubt a bootstrap estimate would do unless the number of resamples is larger than what's reasonable to run on a computer.