Regression analysis: Log-transformation to meet assumptions?

Question

For my master's thesis I'm exploring the relationship between attitude towards the advertisement (Aad), brand type (boutique and high street) and willingness to recommend (willing or not). Therefore, I need to run two regression analyses:

A bivariate regression (Aad = B0 + B1BrandType + e)
A multiple regression (Willingness to recommend = B0 + B1Aad + B2BrandType + B3Aad*BrandType + e)

Note that the Aad data are sentiment scores derived from Instagram comments (ranging from -0.9 to 0.9, where negative values indicate a negative attitude towards the ad). The variables willingness and brand type are dummy coded.

However, when I run the two regressions, none of the assumptions (normality, linearity, homoscedasticity) are met. So my question is: would it make sense to log-transform the variable Aad so that the assumptions are met? One problem is that I'm dealing with negative values, so I would need to add a constant (+100, for instance) before log-transforming. But if I transform the values that way, I wouldn't be able to recognize the negative attitude anymore, would I? Then I could only say whether there is a relationship, but not which brand type receives significantly more positive attitudes. Right?

It's unlikely that a log transformation would help here, even with an added constant, if only for the reason you give. Plot log x over the range [99.1, 100.9] to see one reason. I think we need information on the data to give more specific advice. How big is your dataset? Can you show a histogram for Aad?
– Nick Cox, Commented Apr 6, 2020 at 16:38
Hi Nick, thank you very much for your response. I've added my two histograms to my question above. I have a sample size of 2631, so following the CLT the assumption of normality would be met. However, I still have the other assumptions that are not met.
– Renée Stalman, Commented Apr 6, 2020 at 17:23
Sorry, but as specified, I wanted to see a histogram for Aad, namely the original outcome. The residuals don't give any kind of clear signal about whether the original variable should be transformed. Curious: what is this? SPSS?
– Nick Cox, Commented Apr 6, 2020 at 17:48
OK. I don't think transforming that is either necessary or likely to be helpful. Your regression for willingness to recommend should perhaps be a logit regression.
– Nick Cox, Commented Apr 6, 2020 at 18:52
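
The suggestion in the comments to plot log x over [99.1, 100.9] can also be checked numerically. The following is a minimal sketch that assumes only the Aad range of -0.9 to 0.9 stated in the question; the evenly spaced grid and the variable names are illustrative, not the asker's data.

```python
import math

# Illustrative grid over the stated Aad range (-0.9 to 0.9); not real data.
aad = [-0.9 + 0.01 * i for i in range(181)]
shifted_log = [math.log(a + 100) for a in aad]

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Correlation between Aad and log(Aad + 100).
r = pearson(aad, shifted_log)

# Largest gap between log(100 + a) and its straight-line
# approximation log(100) + a / 100 over the same grid.
max_gap = max(abs(math.log(100 + a) - (math.log(100) + a / 100)) for a in aad)

print("correlation:", r)
print("largest deviation from a straight line:", max_gap)
```

Over so narrow an interval, log(Aad + 100) is very nearly a straight-line function of Aad, so the transformed variable has almost the same distributional shape as the original and the residual diagnostics would be expected to look essentially unchanged.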

Cole Wagner · Accepted Answer · 2025-05-13 03:23:13Z

I agree with others who have counseled not to arbitrarily transform the Aad variable, and want to add some additional perspective that may be useful:

First, I assume from the equations you've provided that you're fitting two linear regressions. For your second model, which predicts willingness to recommend, this is not the correct approach. Because your outcome is binary (yes/no), I would recommend logistic regression instead. This also takes care of your trouble with assumptions, as logistic regression does not make the same distributional assumptions as linear regression. The assumptions for logistic regression are that the observations are independent and that the predictors are linearly related to the log odds of the outcome.
Second, when talking about assumptions for linear regression, keep in mind that the normality and homoscedasticity assumptions apply to the residuals, not to the variables themselves. Therefore, you don't need to worry that the Aad variable is not normally distributed. It is important, however, that each of your predictors is linearly related to the outcome, so a scatterplot of each predictor against the outcome would be a good visual check of that assumption. To assess normality and homoscedasticity, you need to create residual plots.
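
To make the logistic alternative concrete, the second model can be written on the log-odds scale as log(p / (1 - p)) = B0 + B1Aad + B2BrandType + B3Aad*BrandType, where p is the probability of being willing to recommend. Below is a minimal sketch, not the asker's analysis: the data are simulated, the "true" coefficients and the coding 1 = boutique, 0 = high street are assumptions made for illustration, and the hand-written Newton-Raphson fitter stands in for the logistic regression routine that a statistics package would normally provide.

```python
import math
import random

def sigmoid(z):
    """Inverse logit: maps a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_logit(rows, y, iterations=25):
    """Maximum-likelihood logistic regression via Newton-Raphson."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k                      # score vector
        info = [[0.0] * k for _ in range(k)]  # information matrix
        for x, yi in zip(rows, y):
            p = sigmoid(sum(b * v for b, v in zip(beta, x)))
            for i in range(k):
                grad[i] += (yi - p) * x[i]
                for j in range(k):
                    info[i][j] += p * (1.0 - p) * x[i] * x[j]
        beta = [b + s for b, s in zip(beta, solve(info, grad))]
    return beta

# Simulated data only: these "true" coefficients are invented for illustration.
rng = random.Random(0)
true_beta = [-0.5, 1.5, 0.4, 1.0]  # intercept, Aad, BrandType, Aad*BrandType
rows, willing = [], []
for _ in range(2631):
    aad = rng.uniform(-0.9, 0.9)
    brand = rng.randint(0, 1)  # assumed coding: 1 = boutique, 0 = high street
    x = [1.0, aad, float(brand), aad * brand]
    p = sigmoid(sum(b * v for b, v in zip(true_beta, x)))
    rows.append(x)
    willing.append(1 if rng.random() < p else 0)

beta_hat = fit_logit(rows, willing)
slope_high_street = beta_hat[1]             # Aad effect when BrandType = 0
slope_boutique = beta_hat[1] + beta_hat[3]  # Aad effect when BrandType = 1
print("estimates:", [round(b, 2) for b in beta_hat])
print("log-odds slope of Aad, high street:", round(slope_high_street, 2))
print("log-odds slope of Aad, boutique:", round(slope_boutique, 2))
```

Because Aad enters untransformed, the sign of its coefficient keeps its meaning: a positive slope says that more positive sentiment goes with higher odds of recommending, and the interaction term shows how that slope differs between brand types.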

Peter Flom · Accepted Answer · 2020-04-07 14:23:03Z

Rather than transforming the DV just to meet assumptions (and using a somewhat arbitrary transformation such as log(DV + 100)), I suggest using a method that does not rely on those assumptions about the errors. Two possibilities are robust regression and quantile regression.
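
For the first model, where the only predictor is the brand-type dummy, median (0.5-quantile) regression has a simple closed form: the intercept is the median of the reference group and the slope is the difference between the two group medians. The sketch below uses made-up scores and an assumed coding (0 = high street, 1 = boutique) to compare that fit with least squares. It illustrates quantile regression only in this special case and does not cover robust regression.

```python
from statistics import mean, median

# Hypothetical sentiment scores, invented purely for illustration.
high_street = [-0.1, 0.0, 0.1, 0.2, 0.9]  # BrandType = 0 (reference group)
boutique = [-0.9, 0.1, 0.2, 0.3, 0.4]     # BrandType = 1

def fit_dummy(group0, group1, centre):
    """Fit Aad = b0 + b1*BrandType when BrandType is a single 0/1 dummy.

    With centre=mean this is the least-squares fit; with centre=median
    it is the median-regression (least absolute deviations) fit.
    """
    b0 = centre(group0)
    b1 = centre(group1) - b0
    return b0, b1

def abs_loss(b0, b1):
    """Sum of absolute residuals, the criterion median regression minimises."""
    return (sum(abs(v - b0) for v in high_street)
            + sum(abs(v - (b0 + b1)) for v in boutique))

ols_fit = fit_dummy(high_street, boutique, mean)
median_fit = fit_dummy(high_street, boutique, median)

print("least squares (b0, b1):", ols_fit)
print("median regression (b0, b1):", median_fit)
print("absolute loss, least squares:", abs_loss(*ols_fit))
print("absolute loss, median regression:", abs_loss(*median_fit))
```

In this invented example a single extreme comment in each group is enough to give the least-squares slope the opposite sign from the median-based slope, which is the kind of sensitivity these methods are meant to reduce. The outcome stays on its original scale, so negative and positive attitudes remain directly interpretable.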
