I have a huge dataframe and am trying to figure out the most efficient way to normalize each value in a column, and in turn go through all the columns, using the mean and standard deviation.

A sample of the dataframe is as follows:

    TimeStamp          340          341          342          343
 0   10:27:30     1.953036     2.110234     1.981548     1.705684
 1   10:28:30     1.973408     2.046361     1.806923     1.496244
 2   10:29:30     0.000000     0.000000     0.014881     0.198947
 3   10:30:30     2.567976     3.169928     3.479591     3.557881
 4   10:31:30  4415.498729  5075.996948  5653.925541  6133.202200
 5   10:32:30  4473.930295  5146.802497  5736.030854  6224.380260

I want to find the mean for col["340"]:

 for column in df.iteritems():
     df.mean()
     df.std()

...further calculations for normalizing

However, I am extremely new to pandas and it is not working. I can find the mean per column, but I have 2500 columns.

1 Answer

If you're looking to normalize the data, you can do this:

 (df.iloc[:, 1:] - df.iloc[:, 1:].mean()) / df.iloc[:, 1:].std()

This assumes you want the (X - mean) / standard deviation normalization. Note: df.iloc[:, 1:] is used to exclude the first column, TimeStamp.
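As a rough end-to-end sketch of the same idea, here is the normalization applied to the sample rows from the question. The names `values`, `normalized`, and `result` are made up for illustration, and selecting the numeric columns with `drop(columns="TimeStamp")` is just one way to do it, assuming TimeStamp is the only non-numeric column. Note that pandas `std()` uses the sample standard deviation (ddof=1) by default.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question; column labels are kept as strings.
df = pd.DataFrame({
    "TimeStamp": ["10:27:30", "10:28:30", "10:29:30",
                  "10:30:30", "10:31:30", "10:32:30"],
    "340": [1.953036, 1.973408, 0.000000, 2.567976, 4415.498729, 4473.930295],
    "341": [2.110234, 2.046361, 0.000000, 3.169928, 5075.996948, 5146.802497],
    "342": [1.981548, 1.806923, 0.014881, 3.479591, 5653.925541, 5736.030854],
    "343": [1.705684, 1.496244, 0.198947, 3.557881, 6133.202200, 6224.380260],
})

# Assumption: TimeStamp is the only non-numeric column, so drop it by name.
values = df.drop(columns="TimeStamp")

# mean() and std() return one value per column; the subtraction and division
# align on column labels, so every column is normalized in one vectorized step.
normalized = (values - values.mean()) / values.std()

# Re-attach the TimeStamp column in front of the normalized values.
result = pd.concat([df[["TimeStamp"]], normalized], axis=1)
print(result)
```

Because no Python-level loop is involved, the same three lines should scale to 2500 columns without changes.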

1 Comment

:) Thank you!! The nested loops were giving me trouble....that is perfect! Does exactly what I needed it to do!!!
