
I discovered some strange behavior in pandas and want to ask whether it is my fault or an actual bug. Take the following DataFrame:

```python
from pandas import DataFrame

data = DataFrame({'k2': [1, 2, 3], 'name': ['joe', 'mark', 'carl']})
data.set_index('name', drop=False, inplace=True)
```

If I create a function that returns a Series object, like this:

```python
def my_test(i, x):
    x['interrel'] = x.apply(lambda row: i['k2'] - row['k2'] if i['name'] != row['name'] else 0, axis=1)
    print(x['interrel'])
    return x['interrel']
```

and apply that function to the DataFrame with

```python
data.apply(lambda row: my_test(row, data), axis=1)
```

all I get in the output is the last calculated Series repeated three times, once for each row. However, the print statement in my_test shows that each individual calculation is correct, so it seems that the individual Series objects are not combined correctly.

Can you reproduce this problem? Did I get anything wrong about how apply works?

Please note that this is only an example; I am not asking for another way to compute pairwise differences in pandas.

Any help is appreciated.

  • Well, you're passing the df as the second param, so the final result will be the last operation. Commented May 11, 2015 at 10:06
  • Thanks, you are right: if I use data.copy() it works flawlessly. Could you please elaborate on this a little more in a dedicated answer? I did not understand why the last row is broadcast to every row. Commented May 11, 2015 at 10:12

1 Answer


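Because you pass your data DataFrame by reference and assign the new column directly to it on every call, each call made by apply overwrites the interrel column written by the previous call, so the final result shows only the last operation. In the pandas version used here, the Series returned by x['interrel'] still refers to the DataFrame's column data rather than being an independent copy, so when the next call writes new values into that same column, every Series returned earlier changes too. Newer pandas versions may behave differently; with Copy-on-Write enabled, writing to a DataFrame never modifies a Series obtained from it earlier. (The output below shows four calls for three rows because older versions of DataFrame.apply evaluated the first row twice.)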

```
In [20]:
def my_test(i, x):
    x['interrel'] = x.apply(lambda row: i['k2'] - row['k2'] if i['name'] != row['name'] else 0, axis=1)
    #print(x['interrel'])
    print("x-----", x, "\n-------")
    return x['interrel']

data.apply(lambda row: my_test(row, data), axis=1)

x-----       k2  name  interrel
name
joe    1   joe         0
mark   2  mark        -1
carl   3  carl        -2
-------
x-----       k2  name  interrel
name
joe    1   joe         0
mark   2  mark        -1
carl   3  carl        -2
-------
x-----       k2  name  interrel
name
joe    1   joe         1
mark   2  mark         0
carl   3  carl        -1
-------
x-----       k2  name  interrel
name
joe    1   joe         2
mark   2  mark         1
carl   3  carl         0
-------
Out[20]:
name  joe  mark  carl
name
joe     2     1     0
mark    2     1     0
carl    2     1     0
```
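The same aliasing effect can be reproduced without pandas. The sketch below is only a plain-Python analogy (the fill helper and the shared list are hypothetical and not part of pandas): a function that overwrites a shared object in place and returns a reference to it hands back the same object on every call, so all collected results show the last write.

```python
def fill(shared, value):
    # Hypothetical helper: overwrite the shared list in place, then return a
    # reference to it (analogous to assigning x['interrel'] and returning it)
    shared[:] = [value] * 3
    return shared

shared = [0, 0, 0]
results = [fill(shared, v) for v in (0, 1, 2)]
print(results)  # [[2, 2, 2], [2, 2, 2], [2, 2, 2]]

# Passing a fresh copy gives each call its own object, like data.copy()
copied = [fill(list(shared), v) for v in (0, 1, 2)]
print(copied)   # [[0, 0, 0], [1, 1, 1], [2, 2, 2]]
```

All three entries of results are the very same list object, which is why they all show the value from the last call.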

As you've found, if you pass a copy of data instead, each call operates on its own copy and the correct result is returned. You can also see that data itself is not modified by these calls; the interrel column printed at the end is the one left over from the earlier In [20] run:

```
In [22]:
def my_test(i, x):
    x['interrel'] = x.apply(lambda row: i['k2'] - row['k2'] if i['name'] != row['name'] else 0, axis=1)
    #print(x['interrel'])
    print("x-----", x, "\n-------")
    return x['interrel']

print(data.apply(lambda row: my_test(row, data.copy()), axis=1))
print(data)

x-----       k2  name  interrel
name
joe    1   joe         0
mark   2  mark        -1
carl   3  carl        -2
-------
x-----       k2  name  interrel
name
joe    1   joe         0
mark   2  mark        -1
carl   3  carl        -2
-------
x-----       k2  name  interrel
name
joe    1   joe         1
mark   2  mark         0
carl   3  carl        -1
-------
x-----       k2  name  interrel
name
joe    1   joe         2
mark   2  mark         1
carl   3  carl         0
-------
name  joe  mark  carl
name
joe     0    -1    -2
mark    1     0    -1
carl    2     1     0
      k2  name  interrel
name
joe    1   joe         2
mark   2  mark         1
carl   3  carl         0
```
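For reference, here is a minimal self-contained version of the data.copy() fix, built from the question's DataFrame and function (the debug print is dropped). It is a sketch assuming a current pandas install; it starts from a fresh data, so unlike the session above no interrel column is left over afterwards.

```python
import pandas as pd

data = pd.DataFrame({'k2': [1, 2, 3], 'name': ['joe', 'mark', 'carl']})
data.set_index('name', drop=False, inplace=True)

def my_test(i, x):
    # Same logic as in the question: write a column into x, then return it
    x['interrel'] = x.apply(
        lambda row: i['k2'] - row['k2'] if i['name'] != row['name'] else 0,
        axis=1,
    )
    return x['interrel']

# Each call receives its own copy, so no call can overwrite another's result
result = data.apply(lambda row: my_test(row, data.copy()), axis=1)
print(result)
print(data.columns.tolist())  # data itself never gains the interrel column
```

Since my_test only needs the returned values, dropping the assignment to x['interrel'] and returning the result of x.apply directly would avoid the side effect without making a copy.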