3
$\begingroup$

I have developed a log-log model which gives me an RMSE of 0.1. I want to compare the results with an SVM model. In the SVM I didn't initially use the log-transformed variables. The RMSE from the non-transformed predictors is 3.9.

If I am to compare the two models, should I use the transformed variables in the SVM and then compare that RMSE with that of the linear model, or is there a way to back-transform the RMSE from the linear model to compare it with the SVM model?

Regards

$\endgroup$
2
  • $\begingroup$ Are you saying that you are trying to predict y, and you train a linear model with log(y) as the output and an SVM model with y as the output? $\endgroup$ Commented May 1, 2016 at 23:50
  • $\begingroup$ @Romain: Yes. For the linear model I transformed both the response and the predictors due to non-constant variance, but since those assumptions don't hold for SVM I modeled using the original predictors rather than the transformed ones. $\endgroup$ Commented May 2, 2016 at 0:09

2 Answers

2
$\begingroup$

Let's consider a classic ML problem: $X_{train}$ (the data for training), $y_{train}$ (the response for training), $X_{test}$ (the data for testing), $y_{test}$ (the response for testing).

You are using 2 models: linear regression ($LinReg$) and the $SVM$ and you train them in the following way:

  • Linear Regression:

    transform some variables $X_{train,transform} = f(X_{train})$

    Then train: $log(y_{train}) = LinReg(X_{train,transform})$

  • SVM:

    Train $y_{train} = SVM(X_{train})$

To predict you go through the same steps:

  • Linear Regression:

    transform using previous transformation $f$. $X_{test,transform} = f(X_{test})$

    Then get the $y's$: $\hat{y}_{test} = exp(LinReg(X_{test,transform}))$

  • SVM:

    Predict $\hat{y}_{test} = SVM(X_{test})$

If you want to compare the 2 models you can use either a log-scale or an original-scale metric, as long as both models are scored on the same scale. Without the log:

  • $RMSE^{SVM} = \|\hat{y}_{test} - y_{test}\|/\sqrt{n} = \|SVM(X_{test}) - y_{test} \|/\sqrt{n}$

  • $RMSE^{Reg} = \|\hat{y}_{test} - y_{test}\|/\sqrt{n} = \|exp(LinReg(X_{test,transform})) - y_{test} \|/\sqrt{n}$

With the log:

  • $RMSE^{SVM} = \|log(\hat{y}_{test}) - log(y_{test})\|/\sqrt{n} = \|log(SVM(X_{test})) - log(y_{test}) \|/\sqrt{n}$

  • $RMSE^{Reg} = \|log(\hat{y}_{test}) - log(y_{test})\|/\sqrt{n} = \|LinReg(X_{test,transform}) - log(y_{test}) \|/\sqrt{n}$

With $n$ the number of points in the testing set and $\|.\|$ the euclidean norm.
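As a rough illustration of the two comparisons above, here is a minimal sketch in plain Python. The numbers are made up for the example (they are not the asker's data), the names `lin_reg_log_pred` and `svm_pred` are hypothetical stand-ins for the outputs of the two fitted models, and the natural log is assumed.

```python
import math

def rmse(pred, actual):
    """Root mean squared error between two equal-length sequences."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / n)

# Made-up illustrative values, not results from the question.
y_test = [10.0, 20.0, 40.0]
# Hypothetical output of the linear model: predictions of log(y).
lin_reg_log_pred = [math.log(11.0), math.log(18.0), math.log(44.0)]
# Hypothetical output of the SVM: predictions of y itself.
svm_pred = [12.0, 19.0, 37.0]

# Original scale: exponentiate the linear model's output before scoring.
lin_reg_pred = [math.exp(v) for v in lin_reg_log_pred]
rmse_reg = rmse(lin_reg_pred, y_test)
rmse_svm = rmse(svm_pred, y_test)

# Log scale: take the log of the SVM predictions and of y_test.
# This requires every SVM prediction to be strictly positive.
log_y_test = [math.log(v) for v in y_test]
rmse_reg_log = rmse(lin_reg_log_pred, log_y_test)
rmse_svm_log = rmse([math.log(v) for v in svm_pred], log_y_test)

print(rmse_reg, rmse_svm)          # both on the original scale
print(rmse_reg_log, rmse_svm_log)  # both on the log scale
```

Either pair of numbers is a like-for-like comparison; mixing one value from each pair is not.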

Finally, if you want, you can also re-compute the training RMSE (with or without log) by just replacing $test$ with $train$ in the above equations. Hope this answers your question.

$\endgroup$
1
  • $\begingroup$ Just wanted to point out that the above is incorrect. Taking the log of a transformed value and an untransformed value does not make the two values comparable. See my answer below $\endgroup$ Commented Jan 4, 2019 at 22:31
1
$\begingroup$

$\hat{y}$ and $y_{test}$ should be back-transformed ("un-logged") so that the RMSE of the log-log regression model is comparable to that of the SVM. With $\hat{y}_i$ and $y_{i,test}$ both on the log scale:

MSE (log-log model) = $\frac{1}{n}\sum_{i=1}^{n}\left(e^{\hat{y}_i} - e^{y_{i,test}}\right)^{2}$

RMSE (log-log model) = $\sqrt{MSE}$

Compute RMSE for SVM as normal
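
A minimal sketch of that back-transformation in plain Python, assuming the natural log was used and that both arguments are on the log scale. The function name `rmse_original_scale` and the sample values are hypothetical, chosen only for illustration.

```python
import math

def rmse_original_scale(log_pred, log_actual):
    """RMSE on the original scale, given log-scale predictions and targets."""
    n = len(log_actual)
    # Exponentiate both sides before taking the squared difference.
    mse = sum((math.exp(p) - math.exp(a)) ** 2
              for p, a in zip(log_pred, log_actual)) / n
    return math.sqrt(mse)

# Made-up values: true responses 2 and 8, predicted 3 and 6, stored as logs.
log_actual = [math.log(2.0), math.log(8.0)]
log_pred = [math.log(3.0), math.log(6.0)]
result = rmse_original_scale(log_pred, log_actual)
print(result)
```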

$\endgroup$
