I'm attempting to validate my Partial Least Squares (PLS) regression model using sklearn.
The documentation states that the score function "returns the R^2 of self.predict(X) wrt. y". From this and other readings on PLS regression, I've come to understand that there are generally two metrics used to evaluate the algorithm's performance. $R^2$ is calculated as 1 minus the ratio of the residual sum of squares (RSS) to the total sum of squares (TSS):
$$ R^2 = 1 - RSS/TSS $$
$$ RSS = \sum(y-\hat{\mathbf{y}})^2 $$
$$ TSS = \sum(y-\bar{\mathbf{y}})^2 $$

Additionally, from my understanding, I need to validate the model using a test set, on which I can then calculate the predictive performance ($Q^2$) of the model. $Q^2$ is calculated as 1 minus the ratio of the predictive residual error sum of squares (PRESS) to the total sum of squares (TSS):
$$ Q^2 = 1 - PRESS/TSS $$
$$ PRESS = \sum(y-\hat{\mathbf{y}})^2 $$
The calculations for $R^2$ and $Q^2$ are almost identical, the only difference being that RSS is calculated from the data on which the algorithm is trained, while PRESS is calculated from held-out data.
In that case, could I simply call:

```python
r2 = pls.score(x_train, y_train)
q2 = pls.score(x_test, y_test)
```

My question:
In view of a training/test split of the data, is it appropriate to call $R^2$ a measure of how well the algorithm fits the training data, and $Q^2$ a measure of the algorithm's predictive performance on the test data?
Side question: Is it good practice to scale Y in the same manner as X in PLS regression?