I'm attempting to validate my Partial Least Squares (PLS) regression model using sklearn.
The documentation states that the score function "returns the R^2 of self.predict(X) wrt. y". From this and other readings on PLS regression, I've come to understand that there are generally two metrics used to evaluate the algorithm's performance. $R^2$ is calculated as 1 minus the ratio of the residual sum of squares (RSS) to the total sum of squares (TSS):
$$ R^2 = 1 - RSS/TSS $$
$$ RSS = \sum(y-\hat{\mathbf{y}})^2 $$
$$ TSS = \sum(y-\bar{\mathbf{y}})^2 $$

Additionally, from my understanding, I need to validate the model using a test set, on which I can then calculate the predictive performance ($Q^2$) of the model. $Q^2$ is calculated as 1 minus the ratio of the predictive residual error sum of squares (PRESS) to the total sum of squares (TSS):
$$ Q^2 = 1 - PRESS/TSS $$
$$ PRESS = \sum(y-\hat{\mathbf{y}})^2 $$
The calculations for $R^2$ and $Q^2$ are almost identical, the only difference being that RSS is calculated from the data on which the algorithm is trained, while PRESS is calculated from held-out data.
In that case, could I simply call:

```python
r2 = pls.score(x_train, y_train)
q2 = pls.score(x_test, y_test)
```

My question:
In view of a training/test split of the data, is it appropriate to call $R^2$ a measure of how well the algorithm fits the training data, and $Q^2$ a measure of the algorithm's predictive performance on the test data?
Side question: Is it good practice to scale Y in the same manner as X in PLS regression?