Linked Questions
0 votes, 2 answers, 5k views
Mean centering and setting standard deviation to 1 in data [duplicate]
I was wondering whether it is necessary to mean-center and scale to unit standard deviation both my xs and ys in linear regression, or whether doing that to just the xs is enough. Let's say I use a different model, say a neural ...
421 votes, 7 answers, 433k views
When conducting multiple regression, when should you center your predictor variables & when should you standardize them?
In some literature, I have read that a regression with multiple explanatory variables in different units needs to be standardized. (Standardizing consists in subtracting the mean and dividing ...
69 votes, 2 answers, 102k views
Are mean normalization and feature scaling needed for k-means clustering?
What are the best (recommended) pre-processing steps before performing k-means?
38 votes, 3 answers, 61k views
What algorithms need feature scaling, besides SVM?
I am working with many algorithms: RandomForest, DecisionTrees, NaiveBayes, SVM (kernel=linear and rbf), KNN, LDA and XGBoost. All of them were pretty fast except for SVM. That is when I got to know ...
32 votes, 2 answers, 99k views
When to normalize data in regression? [duplicate]
Under what circumstances should the data be normalized/standardized when building a regression model? When I asked this question to a stats major, he gave me an ambiguous answer: "depends on the data". ...
46 votes, 1 answer, 49k views
When and how to use standardized explanatory variables in linear regression
I have 2 simple questions about linear regression: When is it advised to standardize the explanatory variables? Once estimation is carried out with standardized values, how can one predict with new ...
24 votes, 3 answers, 34k views
Why do we divide by the standard deviation and not some other standardizing factor before doing PCA?
I was reading the following justification (from the cs229 course notes) on why we divide the raw data by its standard deviation: even though I understand what the explanation is saying, it is not clear to ...
10 votes, 3 answers, 2k views
Statistics for online dating sites
I'm curious how online dating systems might use survey data to determine matches. Suppose they have outcome data from past matches (e.g., 1 = happily married, 0 = no 2nd date). Next, let's ...
4 votes, 1 answer, 9k views
Interpretation of regression coefficients when dependent variable is standardized
When the dependent variable is standardized, how does one interpret the regression coefficients of continuous or categorical independent variables? For instance, if we have $K$ groups in the data and ...
7 votes, 1 answer, 14k views
If you standardize X, must you always standardize y?
Related reading: When conducting multiple regression, when should you center your predictor variables & when should you standardize them? When and how to use standardized explanatory variables in ...
6 votes, 2 answers, 3k views
Standardize non-normal predictors before performing binomial GLMM using mean and sd?
I am planning to predict a binomial variable (1/0: a point used by an animal, or a point available to an animal in its range) using several continuous, distance-based predictor variables (distance to ...
6 votes, 2 answers, 7k views
How to compare coefficients of a negative binomial regression for determining relative importance?
I'm working in R, using glm.nb (of the MASS package) to model count data with a negative binomial regression model. I'd like to compare the relative importance of each of my predictor variables ...
3 votes, 1 answer, 2k views
Correct setup for leave-one-subject-out cross-validation
I've got a question concerning leave-one-subject-out cross-validation of a classifier and correct outlier handling in this case. Let's suppose I've got 5 subjects. Within each subject the features ...
2 votes, 1 answer, 3k views
data normalization after dimension reduction for classification
The classifier is KNN or RBF-SVM. After dimension reduction (e.g., PCA, LDA or KPCA, KLDA), is normalization needed before classification? In LIBSVM ...
5 votes, 0 answers, 2k views
Why would SVD be 'unstable' if you don't standardize your data first?
I'm reading an article about Direct Linear Transformation which processes data using SVD, and the data set is standardized so that it has zero mean and unit standard deviation (n.b., some people call ...
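The operation most of these questions turn on is z-score standardization: subtract the mean and divide by the standard deviation. Below is a minimal sketch in plain Python, assuming a single numeric feature stored as a list. The helper names `fit_standardizer`, `standardize`, and `unstandardize` are hypothetical, chosen here for illustration. It shows one common answer to "how can one predict with new values": new observations are scaled with the mean and standard deviation estimated on the training data, not with their own.

```python
from statistics import fmean, pstdev

# Hypothetical helpers for illustration; not from any particular library.

def fit_standardizer(values):
    """Estimate (mean, sd) from the training values only."""
    mu = fmean(values)
    # Population SD; statistics.stdev (sample SD) is an equally common choice.
    sd = pstdev(values)
    if sd == 0:
        raise ValueError("constant feature: standard deviation is zero")
    return mu, sd

def standardize(values, mu, sd):
    """Subtract the mean and divide by the standard deviation."""
    return [(v - mu) / sd for v in values]

def unstandardize(z_values, mu, sd):
    """Map standardized values back to the original units."""
    return [z * sd + mu for z in z_values]

train_x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mu, sd = fit_standardizer(train_x)
z_train = standardize(train_x, mu, sd)

# New observations reuse the training mean and SD rather than their own.
new_x = [3.0, 10.0]
z_new = standardize(new_x, mu, sd)
print(mu, sd, z_new)  # prints 5.0 2.0 [-1.0, 2.5]
```

Keeping the fitted `(mu, sd)` pair separate from the transform is what makes it possible to apply the same scaling to new data, or to undo it with `unstandardize` when a standardized response has to be reported in its original units.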