Standardizing features by one specific feature

Question

I am working on a project with a dataset that looks something like the following:

   velocity  accel_amp    f_vert   tau_vert  f_pitch_filt         tau
0  3.778595  -5.796777  2.400000  32.753227      1.600000   27.844535
1  1.970611  -6.087134  2.272727  32.638705      1.704545   30.639998
2  3.581163  -6.241817  2.400000  32.850969      1.600000   30.449256
3  4.735210  -6.109532  1.400000  28.809865      1.000000  127.749313
4  5.340568  -6.614317  1.400000  20.249699      1.000000  124.549628

It was suggested that I standardize the last 5 features by velocity in order to improve my PCA. Does this simply mean taking each element in these last 5 columns, subtracting the mean of the velocity column, and then dividing by the standard deviation of the velocity column?

That is how I interpreted the suggestion. Is there functionality in Python for doing this? Any suggestions or clarification would be appreciated.

Thanks.

The suggested answer does what you described, but I wonder what this kind of standardization means. Maybe the other features are directly correlated with velocity? What about their distributions: are they normally distributed? I am not sure I understand what is going on here. You may want to ask this question at stats.stackexchange.com
– TwinPenguins, Commented Apr 3, 2020 at 7:14
1) It's not clear why it was suggested to standardize by velocity; I guess it was a misunderstanding. Maybe you could clarify with the person who suggested it? 2) What does "improve PCA" mean? 3) In general, it is advisable to standardize the data before PCA if the columns have significantly different scales. However, one uses the mean and standard deviation of each particular column. See the following question and the questions/answers linked therein: stats.stackexchange.com/questions/69157/…
– aivanov, Commented Jan 10, 2021 at 21:21

Guilherme Marques · Accepted Answer · 2020-04-02 21:29:41Z

If this is a pandas DataFrame:

    vel_mean = df.velocity.mean()
    vel_std = df.velocity.std()
    # Note: apply() runs over every column, so velocity itself is rescaled too.
    df = df.apply(lambda x: (x - vel_mean) / vel_std)
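If only the last five columns should be rescaled, as the question describes, the same idea can be restricted to those columns. What follows is a minimal sketch, assuming the column names shown in the question; the DataFrame is rebuilt from the five sample rows purely for illustration, and `other_cols` and `scaled` are names introduced here.

```python
import pandas as pd

# Sample rows copied from the question, for illustration only.
df = pd.DataFrame({
    "velocity": [3.778595, 1.970611, 3.581163, 4.735210, 5.340568],
    "accel_amp": [-5.796777, -6.087134, -6.241817, -6.109532, -6.614317],
    "f_vert": [2.400000, 2.272727, 2.400000, 1.400000, 1.400000],
    "tau_vert": [32.753227, 32.638705, 32.850969, 28.809865, 20.249699],
    "f_pitch_filt": [1.600000, 1.704545, 1.600000, 1.000000, 1.000000],
    "tau": [27.844535, 30.639998, 30.449256, 127.749313, 124.549628],
})

vel_mean = df["velocity"].mean()
vel_std = df["velocity"].std()  # pandas default is the sample std (ddof=1)

# Every column except velocity (assumed to be the five features to rescale).
other_cols = df.columns.drop("velocity")

# Work on a copy so the original values stay available.
scaled = df.copy()
scaled[other_cols] = (df[other_cols] - vel_mean) / vel_std
```

Here the velocity column is left untouched, and the result is written to a copy rather than overwriting `df`.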

In case it is a numpy array:

    import numpy as np
    data = (data - np.mean(data, axis=0)[0]) / np.std(data, axis=0)[0]  # assumes velocity is column 0
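One thing to watch: `Series.std()` in pandas defaults to the sample standard deviation (`ddof=1`), while `np.std` defaults to the population value (`ddof=0`), so the pandas and numpy versions give slightly different scales unless `ddof` is set explicitly. For comparison, the conventional standardization mentioned in the comments uses each column's own mean and standard deviation. The sketch below shows both side by side on the first three columns of the question's sample rows, assuming velocity is column 0; `by_velocity` and `per_column` are names introduced here.

```python
import numpy as np

# First three columns (velocity, accel_amp, f_vert) of the question's sample rows.
data = np.array([
    [3.778595, -5.796777, 2.400000],
    [1.970611, -6.087134, 2.272727],
    [3.581163, -6.241817, 2.400000],
    [4.735210, -6.109532, 1.400000],
    [5.340568, -6.614317, 1.400000],
])

# Scaling by velocity (column 0); ddof=1 matches the pandas default.
vel_mean = data[:, 0].mean()
vel_std = data[:, 0].std(ddof=1)
by_velocity = data.copy()
by_velocity[:, 1:] = (data[:, 1:] - vel_mean) / vel_std

# Conventional z-score: each column uses its own mean and std.
per_column = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
```

After the per-column version, every column has mean 0 and unit sample standard deviation, which is not true of the velocity-scaled columns in general.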
