Consider the following data, simulated in R according to the model for a one-factor ANOVA with three levels of the factor and ten replications at each level. The intended common variance is $\sigma^2 = 3^2 = 9$ (although, as written, the code below draws the third group with standard deviation 4).
set.seed(2020)
x1 = rnorm(10, 20, 3)
x2 = rnorm(10, 21, 3)
x3 = rnorm(10, 22, 4)
x = c(x1,x2,x3)
gp = as.factor(rep(1:3, each=10))
Here is a stripchart in R showing the ten observations in each group.
stripchart(x ~ gp, pch="|", ylim=c(.5,3.5))

The ANOVA table is given below:
anova(lm(x~gp))
Analysis of Variance Table

Response: x
          Df Sum Sq Mean Sq F value  Pr(>F)
gp         2 140.48  70.240   4.463 0.02115 *
Residuals 27 424.93  15.738
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
MS(Residuals) = $15.7382$ is the average of the sample variances within the three groups. (A simple average suffices because the groups are all the same size.) This is one way to estimate $\sigma^2.$ [Never mind that it is not a very good estimate; with only 30 observations altogether, we can't expect a really close estimate.]
mean(c(var(x1),var(x2),var(x3)))
[1] 15.7382
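The plain average works here only because the design is balanced. As a minimal sketch (the names `grps`, `n.i`, `v.i` and `ms.resid` are introduced here for illustration), the general pooled estimate weights each group variance by its degrees of freedom, and with ten observations per group it should reduce to the simple mean above:

```r
# Re-create the data exactly as in the simulation above
set.seed(2020)
x1 = rnorm(10, 20, 3); x2 = rnorm(10, 21, 3); x3 = rnorm(10, 22, 4)

grps <- list(x1, x2, x3)
n.i  <- sapply(grps, length)   # group sizes (all 10 here)
v.i  <- sapply(grps, var)      # within-group sample variances

# Pooled variance: weight each variance by its degrees of freedom n.i - 1
ms.resid <- sum((n.i - 1) * v.i) / sum(n.i - 1)
ms.resid
sum(n.i - 1)                   # residual degrees of freedom

# With equal group sizes the weights cancel, leaving the simple average
all.equal(ms.resid, mean(v.i))
```

With unequal group sizes the two quantities would generally differ, and the weighted version is the one that matches the Residuals line of an ANOVA table.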
If all three groups had the same mean $\mu$ (the assumption of the null hypothesis), then the three group means $(\bar X_1,\bar X_2, \bar X_3)$ would each have a normal distribution with mean $\mu$ and variance $\sigma^2/10.$ So, if $H_0$ were true, we could also estimate $\sigma^2$ as $10$ times the variance of the 'sample' of three $\bar X_i$s:
10*var(c(mean(x1),mean(x2),mean(x3)))
[1] 70.23971
Thus MS(Group) = $70.2397.$ [Because $H_0$ is false, this estimate is much too large; the three means also express the differences among groups.]
So ANOVA "knows" how to get the two variance estimates by way of the two procedures we have just seen.
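To tie the two procedures together, here is a small sketch that assembles the F-ratio by hand and compares it with the value reported by `anova`. The names `ms.group`, `ms.resid`, `f.hand` and `f.table` are introduced here for illustration only.

```r
# Re-create the data exactly as in the simulation above
set.seed(2020)
x1 = rnorm(10, 20, 3); x2 = rnorm(10, 21, 3); x3 = rnorm(10, 22, 4)
x  = c(x1, x2, x3)
gp = as.factor(rep(1:3, each = 10))

# Numerator: 10 times the variance of the three group means
ms.group <- 10 * var(c(mean(x1), mean(x2), mean(x3)))

# Denominator: average of the three within-group variances
ms.resid <- mean(c(var(x1), var(x2), var(x3)))

f.hand  <- ms.group / ms.resid
f.table <- anova(lm(x ~ gp))[1, "F value"]   # F value from the 'gp' row

c(by.hand = f.hand, from.anova = f.table)
```

The two numbers should agree, which shows that the table contains nothing beyond these two variance estimates and their ratio.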
If $H_0$ is true the two variance estimates tend to be about the same so that the F-ratio would tend to be about $1.$ The larger the F-ratio is above $1,$ the stronger the evidence against $H_0.$ In our case $F = 4.463 > 1.$ Taking numerator and denominator degrees of freedom into account, $4.463$ is judged to be "significantly" larger than $1.$
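The claim about the F-ratio under $H_0$ can be checked by simulation. The following is a rough sketch, not part of the original analysis: it assumes a common mean of 20 and standard deviation of 3 for all groups, uses an arbitrary seed, and the names `one.f` and `f.sim` are hypothetical.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
k <- 3; n <- 10                  # three groups, ten replications each

# One simulated F-ratio when H0 is true (all groups share mean 20, SD 3)
one.f <- function() {
  y <- matrix(rnorm(k * n, 20, 3), nrow = n)   # each column is one group
  ms.group <- n * var(colMeans(y))             # n times variance of group means
  ms.resid <- mean(apply(y, 2, var))           # average within-group variance
  ms.group / ms.resid
}

f.sim <- replicate(10000, one.f())

mean(f.sim)                                    # average F-ratio under H0
mean(f.sim > qf(0.95, k - 1, k * (n - 1)))     # share beyond the 5% critical value
```

The average should land near $1$ (the exact mean of $\mathsf{F}(2, 27)$ is $27/25 = 1.08$), and roughly 5% of the simulated ratios should exceed the critical value.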
The variance estimate in the numerator of $F$ involves both $\sigma^2$ and the differences among the group population means $\mu_i.$ The variance estimate in the denominator involves only $\sigma^2.$
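This can be made explicit with the standard expected mean squares for a balanced one-factor design. The notation is introduced here for illustration: $k$ groups, $n$ observations per group, and $\bar\mu = \frac 1k \sum_i \mu_i.$

$$E[\text{MS(Group)}] = \sigma^2 + \frac{n}{k-1}\sum_{i=1}^k (\mu_i - \bar\mu)^2, \qquad E[\text{MS(Residuals)}] = \sigma^2.$$

With the population means $20, 21, 22$ used in the simulation, $n = 10$ and $k = 3,$ the extra term is $\frac{10}{2}\left[(-1)^2 + 0^2 + 1^2\right] = 10.$ Under $H_0$ the extra term is $0,$ so both mean squares estimate $\sigma^2.$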
Here is a plot of the density function of the distribution $\mathsf{F}(2, 27).$ The (tiny) area under the density curve to the right of the vertical dotted line is the P-value $0.02115.$

curve(df(x, 2, 27), 0, 10, lwd=2, ylab="PDF", xlab="F", main="Density of F(2,27)")
abline(v = 4.463, col="red", lwd=2, lty="dotted")
abline(h=0, col="green2"); abline(v=0, col="green2")
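The area to the right of the dotted line can also be computed directly. A brief sketch, with `f.obs`, `p.val` and `crit` as illustrative names:

```r
f.obs <- 4.463                                  # observed F-ratio from the table

# Upper-tail area of F(2, 27) beyond the observed F-ratio
p.val <- pf(f.obs, 2, 27, lower.tail = FALSE)

# Critical value that cuts off 5% in the upper tail
crit <- qf(0.95, 2, 27)

round(p.val, 5)
round(crit, 3)
```

The first value should agree, up to rounding, with the P-value in the ANOVA table. The observed $F$ lies beyond the 5% critical value, which is what the single star in the table indicates.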