Python: normalizing some of the columns of a pandas DataFrame

Question

I have a DataFrame from which I want to normalize some arbitrary columns using another arbitrary column:

```python
import itertools as it

import numpy as np
import pandas as pd

header = ('h_seqNum', 'h_stamp', 'user_id')
joints = ('head', 'neck', 'torso')
attribs = ('pos_x', 'pos_y', 'pos_z')

# Transpose the (joint, attrib) pairs into one sequence of joint labels
# and one sequence of attrib labels
joint_labels, attrib_labels = zip(*it.product(joints, attribs))

multiind_first = list(it.chain(['header'] * len(header), joint_labels, ['pose']))
multiind_second = list(it.chain(header, attrib_labels, ['pose']))

df = pd.DataFrame(np.random.rand(65).reshape(5, 13),
                  columns=pd.MultiIndex.from_arrays([multiind_first, multiind_second],
                                                    names=['joint', 'attrib']))
```

The resulting DataFrame is something like this one:

```
joint   header                      head                 neck                 torso                pose
attrib  h_seqNum  h_stamp  user_id  pos_x  pos_y  pos_z  pos_x  pos_y  pos_z  pos_x  pos_y  pos_z  pose
0          0.681    0.059    0.607  0.093  0.504  0.975  0.317  0.739  0.129  0.759  0.254  0.814     1
1          0.914    0.420    0.305  0.242  0.700  0.180  0.324  0.171  0.477  0.943  0.877  0.069     0
2          0.522    0.395    0.118  0.739  0.653  0.326  0.947  0.517  0.036  0.647  0.079  0.227     0
3          0.475    0.815    0.792  0.208  0.472  0.427  0.213  0.544  0.440  0.033  0.636  0.527     2
4          0.767    0.774    0.983  0.646  0.949  0.947  0.402  0.015  0.913  0.734  0.192  0.032     0
```
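For comparison, a sketch of building the same two-level column layout directly from (joint, attrib) tuples with `pd.MultiIndex.from_tuples`, assuming Python 3 and a NumPy recent enough to provide `np.random.default_rng`. The helper `make_frame` and its `seed` parameter are hypothetical, added only to make the random data reproducible:

```python
import itertools as it

import numpy as np
import pandas as pd

header = ('h_seqNum', 'h_stamp', 'user_id')
joints = ('head', 'neck', 'torso')
attribs = ('pos_x', 'pos_y', 'pos_z')


def make_frame(n_rows=5, seed=0):
    """Hypothetical helper: random data with ('joint', 'attrib') columns."""
    # One tuple per column: the header fields, every (joint, attrib) pair, then pose
    tuples = ([('header', h) for h in header]
              + list(it.product(joints, attribs))
              + [('pose', 'pose')])
    columns = pd.MultiIndex.from_tuples(tuples, names=['joint', 'attrib'])
    rng = np.random.default_rng(seed)
    return pd.DataFrame(rng.random((n_rows, len(tuples))), columns=columns)


df = make_frame()
print(df.shape)  # (5, 13)
```

Each column label is a full (joint, attrib) pair, which is what the assignment problem below revolves around.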

I want to normalize all the columns (attribs) belonging to an arbitrary joint (e.g. 'head') using another arbitrary joint (e.g. 'torso'), for instance something like:

```python
df['head'] = df['head'] - df['torso']
df['neck'] = df['neck'] - df['torso']
# Note that torso remains "unnormalized"
```

To do so I wrote a function:

```python
def normalize_joints(df, from_joint):
    joint_names = set(joints) - set([from_joint,])
    for j in list(joint_names):
        df[j] = df[j] - df[from_joint]
```

However, when I execute this function I get the following error:

```
normalize_joints(df, 'torso')
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-414-47f39f04716d> in <module>()
----> 1 normalize_joints(df, 'torso')

<ipython-input-407-cf13a67fabd8> in normalize_joints(df, from_joint)
      2     joint_names = set(joints) - set([from_joint,])
      3     for j in list(joint_names):
----> 4         df[j] = df[j] - df[from_joint]

/Library/Python/2.7/site-packages/pandas/core/frame.pyc in __setitem__(self, key, value)
   2117                                          fill_value, limit, takeable=takeable)
   2118
-> 2119         return frame
   2120
   2121     def _reindex_index(self, new_index, method, copy, level, fill_value=NA,

/Library/Python/2.7/site-packages/pandas/core/frame.pyc in _set_item(self, key, value)
   2164     @Appender(_shared_docs['reindex_axis'] % _shared_doc_kwargs)
   2165     def reindex_axis(self, labels, axis=0, method=None, level=None, copy=True,
-> 2166                      limit=None, fill_value=np.nan):
   2167         return super(DataFrame, self).reindex_axis(labels=labels, axis=axis,
   2168                                                    method=method, level=level,

/Library/Python/2.7/site-packages/pandas/core/generic.pyc in _set_item(self, key, value)
    677
    678     __bool__ = __nonzero__
--> 679
    680     def bool(self):
    681         """ Return the bool of a single element PandasObject

/Library/Python/2.7/site-packages/pandas/core/internals.pyc in set(self, item, value)
   1768     def sp_index(self):
   1769         return self.values.sp_index
-> 1770
   1771     @property
   1772     def kind(self):

/Library/Python/2.7/site-packages/pandas/core/internals.pyc in _reset_ref_locs(self)
   1054         # see if we can align other
   1055         if hasattr(other, 'reindex_axis'):
-> 1056             if align:
   1057                 axis = getattr(other, '_info_axis_number', 0)
   1058                 other = other.reindex_axis(self.items, axis=axis,

/Library/Python/2.7/site-packages/pandas/core/internals.pyc in _rebuild_ref_locs(self)
   1062
   1063         # make sure that we can broadcast
-> 1064         is_transposed = False
   1065         if hasattr(other, 'ndim') and hasattr(values, 'ndim'):
   1066             if values.ndim != other.ndim or values.shape == other.shape[::-1]:

AttributeError: _ref_locs
```

After several tries I have not been able to locate the source of my error. If I perform the operation

df['head'] - df['torso']

it returns a DataFrame with the correct result. However, when I try to assign this DataFrame to df['head'], I get the error shown above.

Is there any way to perform this assignment?

Moreover, I was wondering whether there is a better way to perform this normalization than the one I am trying, perhaps using groupby and then applying the normalize function to the selected DataFrame?

EDIT:

This error occurred with numpy 1.6 and pandas 0.12

After upgrading to numpy 1.8 and pandas 0.13 the following operation is valid:

df['head'] = df['head'] - df['torso']
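A minimal sketch of the loop from the question on a recent pandas release, assuming (as the edit reports from 0.13 onwards) that assigning a DataFrame to a top-level key of MultiIndex columns is supported. The frame holds made-up values and only the joint columns:

```python
import numpy as np
import pandas as pd

joints = ('head', 'neck', 'torso')
attribs = ('pos_x', 'pos_y', 'pos_z')

# Made-up frame with only the joint columns, to keep the example small
columns = pd.MultiIndex.from_product([joints, attribs], names=['joint', 'attrib'])
df = pd.DataFrame(np.arange(18, dtype=float).reshape(2, 9), columns=columns)


def normalize_joints(df, from_joint):
    """Subtract the reference joint from every other joint, in place."""
    for j in joints:
        if j != from_joint:
            # Assign to the top-level key; the attrib columns of the
            # right-hand side line up with the attrib level under j
            df[j] = df[j] - df[from_joint]


normalize_joints(df, 'torso')
print(df)
```

With these values every head column ends up as -6 and every neck column as -3, while torso is left unchanged.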

In your first code block, you need to replace multiind_first with mi_level_one and multiind_second with mi_level_two. – LondonRob, Commented Feb 17, 2014 at 15:21

Alvaro Fuentes · Accepted Answer · 2014-02-17 15:23:42Z

The problem is that your columns are a MultiIndex. Try this:

```python
def normalize_joints(df, from_joint):
    joint_names = set(joints) - set([from_joint,])
    for j in list(joint_names):
        # Use the complete (joint, attrib) keys instead of the partial key j
        keys = [(j, c) for c in attribs]
        df[keys] = df[j] - df[from_joint]

print(df)
normalize_joints(df, 'torso')
print(df)
```

Output:

```
joint      header                             head                             neck                            torso                             pose
attrib   h_seqNum    h_stamp    user_id      pos_x      pos_y      pos_z      pos_x      pos_y      pos_z      pos_x      pos_y      pos_z       pose
0        0.067366   0.957394   0.983969   0.602662   0.505270   0.990675   0.753841   0.598397   0.846479   0.757155   0.220009   0.328470   0.686525
1        0.806405   0.800388   0.302178   0.935559   0.180360   0.322767   0.230457   0.617555   0.602589   0.109482   0.181803   0.311266   0.929481
2        0.649677   0.237286   0.963088   0.370463   0.471590   0.489256   0.060383   0.070885   0.858312   0.306232   0.511731   0.257015   0.283287
3        0.054800   0.127925   0.099985   0.700160   0.211256   0.026782   0.820380   0.922593   0.600130   0.100745   0.418157   0.869735   0.597275
4        0.678372   0.334520   0.247894   0.616133   0.914610   0.229628   0.317488   0.224910   0.620222   0.952499   0.946568   0.539502   0.838473

joint      header                             head                             neck                            torso                             pose
attrib   h_seqNum    h_stamp    user_id      pos_x      pos_y      pos_z      pos_x      pos_y      pos_z      pos_x      pos_y      pos_z       pose
0        0.067366   0.957394   0.983969  -0.154493   0.285261   0.662205  -0.003314   0.378387   0.518009   0.757155   0.220009   0.328470   0.686525
1        0.806405   0.800388   0.302178   0.826077  -0.001443   0.011501   0.120975   0.435752   0.291322   0.109482   0.181803   0.311266   0.929481
2        0.649677   0.237286   0.963088   0.064231  -0.040141   0.232241  -0.245850  -0.440846   0.601297   0.306232   0.511731   0.257015   0.283287
3        0.054800   0.127925   0.099985   0.599414  -0.206900  -0.842953   0.719635   0.504436  -0.269605   0.100745   0.418157   0.869735   0.597275
4        0.678372   0.334520   0.247894  -0.336366  -0.031958  -0.309874  -0.635011  -0.721658   0.080719   0.952499   0.946568   0.539502   0.838473
```
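A self-contained Python 3 sketch of the same full-key approach, using made-up values and only the joint columns. Passing `attribs` as a default argument instead of relying only on the global is a small change from the answer. The keys are listed in the same order as the attrib columns of `df[j]`, so each computed column lands on the matching (joint, attrib) pair:

```python
import numpy as np
import pandas as pd

joints = ('head', 'neck', 'torso')
attribs = ('pos_x', 'pos_y', 'pos_z')

columns = pd.MultiIndex.from_product([joints, attribs], names=['joint', 'attrib'])
df = pd.DataFrame(np.arange(18, dtype=float).reshape(2, 9), columns=columns)


def normalize_joints(df, from_joint, attribs=attribs):
    """Subtract `from_joint` from every other joint using full (joint, attrib) keys."""
    for j in set(joints) - {from_joint}:
        # Spell out the complete column keys instead of the partial key j
        keys = [(j, c) for c in attribs]
        df[keys] = df[j] - df[from_joint]


normalize_joints(df, 'torso')
print(df)
```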

Thanks, @xndrme. Your answer has raised another question for me: if df['head'] - df['torso'] produces a pd.DataFrame with the same results as your answer, why is it not possible to assign it to df['head']? I understand that it must be related to the MultiIndex, but I do not see why.
The problem is that df['head'] on a MultiIndex is only a partial key: it works for getting the data, but for setting it seems you have to provide the complete multi-level keys. I think it has something to do with the implementation of pandas; maybe one of its developers could answer your question better ;)
Somehow it seems that the developers had this issue in mind. Upgrading to numpy 1.8 and pandas 0.13 fixed the problem.
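A small sketch, on made-up data, of what the partial key does: `df['head']` drops the `joint` level, so its columns are plain attrib labels that no longer say which joint they belong to, while `DataFrame.xs(..., drop_level=False)` keeps the complete (joint, attrib) keys:

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_product(
    [('head', 'neck', 'torso'), ('pos_x', 'pos_y', 'pos_z')],
    names=['joint', 'attrib'])
df = pd.DataFrame(np.zeros((2, 9)), columns=columns)

# Partial key: the 'joint' level is dropped from the result
partial = df['head']
print(list(partial.columns))  # ['pos_x', 'pos_y', 'pos_z']

# Cross-section that keeps the full (joint, attrib) labels
full = df.xs('head', axis=1, level='joint', drop_level=False)
print(list(full.columns))  # [('head', 'pos_x'), ('head', 'pos_y'), ('head', 'pos_z')]
```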

VGonPa · Accepted Answer · 2014-02-17 18:08:33Z

I believe that I have found a rather simple solution:

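```python
def normalize(df, from_joint):
    # Drop the non-joint groups and the reference joint, then subtract the
    # reference joint from the remaining joints, matching on the attrib level
    return (df.drop(['header', 'pose', from_joint], axis=1, level='joint')
              .sub(df[from_joint], level=1))

df.update(normalize(df, 'torso'))
```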
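A self-contained sketch of this answer on a made-up frame that also contains the `header` and `pose` groups, so the `drop` call has something to remove. `DataFrame.update` modifies `df` in place and only touches the columns present in the returned frame, so the header, pose and torso columns keep their values:

```python
import itertools as it

import numpy as np
import pandas as pd

header = ('h_seqNum', 'h_stamp', 'user_id')
joints = ('head', 'neck', 'torso')
attribs = ('pos_x', 'pos_y', 'pos_z')

tuples = ([('header', h) for h in header]
          + list(it.product(joints, attribs))
          + [('pose', 'pose')])
columns = pd.MultiIndex.from_tuples(tuples, names=['joint', 'attrib'])
df = pd.DataFrame(np.arange(26, dtype=float).reshape(2, 13), columns=columns)
original = df.copy()


def normalize(df, from_joint):
    """Return every joint except `from_joint`, minus `from_joint`."""
    # Remove the non-joint groups and the reference joint itself ...
    others = df.drop(['header', 'pose', from_joint], axis=1, level='joint')
    # ... then subtract the reference joint, broadcasting over level 1 (attrib)
    return others.sub(df[from_joint], level=1)


df.update(normalize(df, 'torso'))
print(df)
```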
