Bravo for having the intuition that, knowing nothing else, predicting the mean of $y$ every time is the best you can do (at least when "best" is measured in terms of squared deviations between observed and predicted values). I believe this is a critical component of understanding what $R^2$ and its generalizations mean.
There are many equivalent ways of writing $R^2$ in the simple cases, such as in-sample for ordinary least squares linear regression. Using standard notation where $n$ is the sample size, $y_i$ are the observed values, $\hat y_i$ are the predicted values, and $\bar y$ is the usual mean of all $y_i$, the one that makes the most sense to me is the following:
$$ R^2=1-\left(\dfrac{ \overset{n}{\underset{i=1}{\sum}}\left( y_i-\hat y_i \right)^2 }{ \overset{n}{\underset{i=1}{\sum}}\left( y_i-\bar y \right)^2 }\right) $$
(For in-sample OLS linear regression, this turns out to equal the squared correlation between the predicted and observed values and, in a simple linear regression, also the squared correlation between the $x$ and $y$ variables.)
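In case a numerical check helps, here is a minimal sketch (Python, with simulated data and variable names that are purely illustrative) that computes $R^2$ from the formula above and confirms it matches both squared correlations for an in-sample simple OLS fit:

```python
# Sketch only: simulated data; names are illustrative, not canonical.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 + 1.5 * x + rng.normal(size=200)      # linear signal plus noise

slope, intercept = np.polyfit(x, y, deg=1)    # in-sample OLS fit of y on x
y_hat = intercept + slope * x                 # predicted values

ss_res = np.sum((y - y_hat) ** 2)             # sum of squared residuals
ss_tot = np.sum((y - y.mean()) ** 2)          # total sum of squares about the mean

r_squared = 1.0 - ss_res / ss_tot
corr_obs_pred = np.corrcoef(y, y_hat)[0, 1]   # correlation(observed, predicted)
corr_xy = np.corrcoef(x, y)[0, 1]             # correlation(x, y)

print(r_squared, corr_obs_pred ** 2, corr_xy ** 2)  # all three agree up to rounding
```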
A slight modification of the notation gives a relationship to variance.
$$ R^2=1-\left(\dfrac{ \dfrac{1}{n}\overset{n}{\underset{i=1}{\sum}}\left( y_i-\hat y_i \right)^2 }{ \dfrac{1}{n}\overset{n}{\underset{i=1}{\sum}}\left( y_i-\bar y \right)^2 }\right) $$
Since the $\dfrac{1}{n}$ factors in the numerator and denominator cancel, this equals the earlier formula. But now the numerator and denominator are, respectively, the variance of the residuals and the variance of the original data (using the divide-by-$n$ variance).
$$ R^2=1-\left(\dfrac{ \dfrac{1}{n}\overset{n}{\underset{i=1}{\sum}}\left( y_i-\hat y_i \right)^2 }{ \dfrac{1}{n}\overset{n}{\underset{i=1}{\sum}}\left( y_i-\bar y \right)^2 }\right) = 1 - \dfrac{ \mathbb V\text{ar}\left( Y - \hat Y \right) }{ \mathbb V\text{ar}\left( Y \right) } $$
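Continuing the sketch above (same simulated `y` and `y_hat`), the variance-ratio form gives the same number, provided you use the divide-by-$n$ variance so that it matches the $\dfrac{1}{n}$ sums:

```python
# Continuing from the sketch above; np.var uses ddof=0 by default,
# i.e., the divide-by-n variance that matches the formula.
var_resid = np.var(y - y_hat)     # variance of the residuals
var_y = np.var(y)                 # variance of the original data

print(1.0 - var_resid / var_y)    # same value as 1 - SSRes/SSTotal
```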
Next, I will borrow some of the explanation from another answer of mine.
$$ y_i-\bar{y} = (y_i - \hat{y_i} + \hat{y_i} - \bar{y}) = (y_i - \hat{y_i}) + (\hat{y_i} - \bar{y}) $$
$$( y_i-\bar{y})^2 = \Big[ (y_i - \hat{y_i}) + (\hat{y_i} - \bar{y}) \Big]^2 = (y_i - \hat{y_i})^2 + (\hat{y_i} - \bar{y})^2 + 2(y_i - \hat{y_i})(\hat{y_i} - \bar{y}) $$
$$SSTotal := \sum_i ( y_i-\bar{y})^2 = \sum_i(y_i - \hat{y_i})^2 + \sum_i(\hat{y_i} - \bar{y})^2 + 2\sum_i\Big[ (y_i - \hat{y_i})(\hat{y_i} - \bar{y}) \Big]$$
$$ :=SSRes + SSReg + Other $$
Divide through by the sample size $n$ (or $n-1$) to get variance estimates.
In OLS linear regression with an intercept (the usual setting), the cross-product term $Other$ is exactly zero. Consequently, all of the variance in $Y$ is accounted for by the residual variance (unexplained) and the regression variance (explained). We can therefore describe the proportion of the total variance explained by the regression: the variance explained by the regression model $(SSReg/n)$ divided by the total variance $(SSTotal/n)$.
$$ \begin{aligned} \dfrac{SSReg/n}{SSTotal/n} &= \dfrac{SSReg}{SSTotal} \\ &= \dfrac{SSTotal -SSRes-Other}{SSTotal} \\ &= 1-\dfrac{SSRes}{SSTotal} \\ &= 1-\left(\dfrac{ \overset{n}{\underset{i=1}{\sum}}\left( y_i-\hat y_i \right)^2 }{ \overset{n}{\underset{i=1}{\sum}}\left( y_i-\bar y \right)^2 }\right) \end{aligned} $$
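And, still using the sketch above, you can verify numerically that the cross term vanishes and that the two expressions for the explained proportion coincide:

```python
# Continuing from the sketch above: check the decomposition
# SSTotal = SSRes + SSReg + Other, with Other ~ 0 for OLS with an intercept.
ss_reg = np.sum((y_hat - y.mean()) ** 2)                # "explained" sum of squares
other = 2.0 * np.sum((y - y_hat) * (y_hat - y.mean()))  # cross-product term

print(other)                                   # ~0, up to floating-point error
print(ss_reg / ss_tot, 1.0 - ss_res / ss_tot)  # the two expressions agree
```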
For an intuition, suppose you observe some phenomenon and record the values it produces. Noticing that they are not all equal, you begin to wonder why. Different starting conditions (values of the features) can account for some of that variability. As an example, consider why people are not all the same height. One reason is that not everyone is the same age, and people tend to get taller as they grow up. If you only consider adults (so age is a feature), you will see a much narrower range of heights than if you consider all people. If you then bring in genetics and lifestyle, you might get a rather tight distribution of plausible heights, thus explaining much of the variation in human heights overall.