I was reading the book "An Introduction to Statistical Learning with Applications in R". On page 306, when discussing the objective function of the tree model, the book says:
"The goal is to find boxes $R_1,\ldots,R_J$ that minimize the RSS, given by $$\sum_{j=1}^J\sum_{i\in R_j}(y_i-\hat{y}_{R_j})^2,$$ where $\hat{y}_{R_j}$ is the mean response for the training observations within the $j$th box. Unfortunately, it is computationally infeasible to consider every possible partition of the feature space into $J$ boxes."
My question is: isn't the optimal solution to this RSS minimization obvious? If we partition the whole feature space into $N$ rectangles, where $N$ is the number of training observations, so that each rectangle contains exactly one data point, then $\hat{y}_{R_j}$ equals the single $y_i$ in that box and the RSS is zero. Put test performance aside for now: if we only want the $J$ and $\{R_j\}_{j=1}^J$ that minimize the above RSS, shouldn't we simply partition the feature space so that each rectangle contains exactly one training data point?
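
To make the question concrete, here is a minimal sketch in Python (the book itself uses R). The toy data, the function name `rss_for_partition`, and the restriction to a single feature with distinct values are my own assumptions, not anything from the book. It evaluates the RSS above for a given set of cut points and compares one box, two boxes, and one box per training point.

```python
import numpy as np

def rss_for_partition(x, y, cuts):
    """RSS of the piecewise-constant fit on the 1-D boxes defined by sorted cut points.

    Hypothetical helper for illustration only: each box predicts the mean
    response of the training points that fall inside it.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cuts = np.asarray(cuts, dtype=float)
    # Index of the box each observation falls into (0 .. len(cuts)).
    box = np.searchsorted(cuts, x, side="right")
    rss = 0.0
    for j in np.unique(box):
        y_j = y[box == j]
        rss += np.sum((y_j - y_j.mean()) ** 2)
    return float(rss)

# Made-up training data with distinct feature values.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 6.0])

# J = 1: a single box containing everything.
rss_one_box = rss_for_partition(x, y, cuts=[])

# J = 2: one cut between x = 2 and x = 3.
rss_two_boxes = rss_for_partition(x, y, cuts=[2.5])

# J = N: cut at every midpoint, so each box holds exactly one point.
x_sorted = np.sort(x)
midpoints = (x_sorted[:-1] + x_sorted[1:]) / 2
rss_n_boxes = rss_for_partition(x, y, cuts=midpoints)

print(rss_one_box, rss_two_boxes, rss_n_boxes)
```

With one point per box, each box mean equals its only response, so every squared residual vanishes. This is the "obvious" zero-RSS partition I am asking about.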