
I find myself doing the same repetitive tasks on various pandas DataFrames, so I wrote a function to do the processing. How do I modify df inside the function process_df(df) so that the caller sees all changes (without assigning a return value)?

A simplified version of the code:

    def process_df(df):
        df.columns = map(str.lower, df.columns)

    df = pd.DataFrame({'A': [1], 'B': [2]})
    process_df(df)
    print df

Output:

       A  B
    0  1  2

EDIT new code:

    def process_df(df):
        df = df.loc[:, 'A']

    df = pd.DataFrame({'A': [1], 'B': [2]})
    process_df(df)
    print df

Output:

       A  B
    0  1  2
  • IIUC your code is working: after calling process_df, the column names become [a, b]. Commented Feb 2, 2016 at 5:16
  • Indeed it is. My bad. In the process of simplifying I left out the part that does not work. I will repost as a new question. Commented Feb 2, 2016 at 5:27

1 Answer

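Indexing a DataFrame with loc, iloc, etc. (or ix in older pandas versions) is a read operation: it returns a new object, which may be a view or a copy of the underlying data. Assigning that result to df inside the function only rebinds the local name, so the caller's frame is left untouched. To modify the contents of the caller's frame you need to use in-place transforms. For example,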

    import pandas as pd

    def process_df(df):
        # drop all columns except for A
        df.drop(df.columns[df.columns != 'A'], axis=1, inplace=True)

    df = pd.DataFrame({'A': [1, 2, 3], 'B': [1, 2, 3]})
    process_df(df)
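To make the difference between rebinding and in-place modification concrete, here is a minimal sketch. The function names rebind_df and mutate_df are hypothetical and only serve the comparison; the sketch assumes a pandas version where drop accepts the columns keyword.

```python
import pandas as pd

def rebind_df(df):
    # Hypothetical helper: rebinds the local name only.
    # The caller's DataFrame object is not changed.
    df = df.loc[:, ['A']]

def mutate_df(df):
    # Hypothetical helper: modifies the object that was passed in,
    # so the caller sees the change.
    df.drop(columns=[c for c in df.columns if c != 'A'], inplace=True)

frame = pd.DataFrame({'A': [1, 2, 3], 'B': [1, 2, 3]})

rebind_df(frame)
cols_after_rebind = list(frame.columns)  # both columns are still present

mutate_df(frame)
cols_after_mutate = list(frame.columns)  # only 'A' remains
```

The function receives a reference to the same object the caller holds, so only operations that change that object (rather than assign a new one to the local name) are visible outside.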

To change the order of columns, you can do something like this:

    def process_df(df):
        # swap A and B
        df.columns = ['B', 'A']
        df[['B', 'A']] = df[['A', 'B']]
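For more than two columns, one possible alternative (not from the answer above) is to pop each column and re-append it in the desired order. This is a sketch under a few assumptions: reorder_columns_inplace is a hypothetical helper name, column labels are unique, and order lists every column that should move; columns left out of order stay at the front.

```python
import pandas as pd

def reorder_columns_inplace(df, order):
    # Hypothetical helper: pop() removes a column from df and returns it,
    # and assigning it back appends it as the last column. Doing this for
    # each name in `order` leaves the columns in that order on the same object.
    for col in order:
        df[col] = df.pop(col)

frame = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6]})
same_object = frame  # second reference, to confirm no new frame is created

reorder_columns_inplace(frame, ['C', 'A', 'B'])
new_order = list(frame.columns)
```

Because the helper never assigns to the name df, the caller's frame is the object being changed and no return value is needed.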

3 Comments

How would I go about re-arranging the order of multiple columns? All the examples I have seen to do this begin with df = ... which wouldn't be an in-place transform.
Added an example above. It's not pretty. Out of curiosity, why the restriction that you can't use the return value of process_df?
Code (and error) reduction. The function is always used to modify the existing DataFrame.
