I am just starting to use user-defined functions, so this is probably not a very complex question, forgive me.

I have a few dataframes, which all have a column named 'interval_time' (for example) and I would like to rename this column 'Timestamp', and then make this renamed column into the index.

I know that I can do this manually like this:

df = df.rename(index=str, columns={'interval_time': 'Timestamp'})
df = df.set_index('Timestamp')

but now I would like to define a function called rename that does this for me. I have seen that this works:

def rename_col(data, col_in='tempus_interval_time', col_out='Timestamp'):
    return data.rename(index=str, columns={col_in: col_out}, inplace=True)

But when I try to add the second step, the function does not seem to do anything, although the second part does seem to work if I define it as its own function and run it.

I am trying this:

def rename_n_index(data, col_in='tempus_interval_time', col_out='Timestamp'):
    return data.rename(index=str, columns={col_in: col_out}, inplace=True)
    return data.set_index('Timestamp', inplace=True)

The dataframes that I am using have the following form;

df_scada
         interval_time    A  ...          X          Y
0  2010-11-01 00:00:00  0.0  ...  396.36710  381.68860
1  2010-11-01 00:05:00  0.0  ...  392.97974  381.40634
2  2010-11-01 00:10:00  0.0  ...  390.15695  379.99493
3  2010-11-01 00:15:00  0.0  ...  389.02786  379.14810
  • Have you tried chaining them together? return df.rename(...).set_index(...) Commented Jul 6, 2018 at 14:44
  • When a return statement gets evaluated in Python, it quits out of the function call. Any further return statements are ignored. A common way to return multiple objects at once is to return a tuple containing the objects. However, as the answer by Martijn points out, you don't have to return anything if you are modifying objects in place. Commented Jul 6, 2018 at 14:48
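
To illustrate the comment above, here is a minimal sketch. The function name `two_returns` and the list `log` are made up for this example, and a plain list stands in for a dataframe so that no pandas is needed. The second `return` is never reached, and a method that mutates in place, such as `list.append`, hands back `None`:

```python
def two_returns(log):
    # The first return ends the call immediately.
    return log.append("first")
    # Unreachable: Python never executes this line.
    return log.append("second")


log = []
result = two_returns(log)

print(log)     # ['first']
print(result)  # None, because list.append mutates the list and returns None
```

The function in the question has the same shape: only its first `return` line is ever evaluated.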

2 Answers

4

You don't need to return anything, because your operations are done in place. You can do the in-place changes in your function:

def rename_n_index(data, col_in='tempus_interval_time', col_out='Timestamp'):
    data.rename(index=str, columns={col_in: col_out}, inplace=True)
    data.set_index('Timestamp', inplace=True)

and any other references to the dataframe you pass into the function will see the changes made:

>>> import pandas as pd
>>> df = pd.DataFrame({'interval_time': pd.to_datetime(['2010-11-01 00:00:00', '2010-11-01 00:05:00', '2010-11-01 00:10:00', '2010-11-01 00:15:00']),
...                    'A': [0.0] * 4}, index=range(4))
>>> df
     A       interval_time
0  0.0 2010-11-01 00:00:00
1  0.0 2010-11-01 00:05:00
2  0.0 2010-11-01 00:10:00
3  0.0 2010-11-01 00:15:00
>>> def rename_n_index(data, col_in='tempus_interval_time', col_out='Timestamp'):
...     data.rename(index=str, columns={col_in: col_out}, inplace=True)
...     data.set_index('Timestamp', inplace=True)
...
>>> rename_n_index(df, 'interval_time')
>>> df
                       A
Timestamp
2010-11-01 00:00:00  0.0
2010-11-01 00:05:00  0.0
2010-11-01 00:10:00  0.0
2010-11-01 00:15:00  0.0

In the above example, the df reference to the dataframe shows the changes made by the function.

If you remove the inplace=True arguments, the method calls return a new dataframe object. You can store an intermediate result as a local variable, then apply the second method to the dataframe referenced in that local variable:

def rename_n_index(data, col_in='tempus_interval_time', col_out='Timestamp'):
    renamed = data.rename(index=str, columns={col_in: col_out})
    return renamed.set_index('Timestamp')

or you can chain the method calls directly to the returned object:

def rename_n_index(data, col_in='tempus_interval_time', col_out='Timestamp'):
    return data.rename(index=str, columns={col_in: col_out})\
               .set_index('Timestamp')

When you store the intermediate result in a local variable, renamed is already a new dataframe, so you can also apply the set_index() call in place to that object and then return just renamed:

def rename_n_index(data, col_in='tempus_interval_time', col_out='Timestamp'):
    renamed = data.rename(index=str, columns={col_in: col_out})
    renamed.set_index('Timestamp', inplace=True)
    return renamed

Either way, this returns a new dataframe object, leaving the original dataframe unchanged:

>>> def rename_n_index(data, col_in='tempus_interval_time', col_out='Timestamp'):
...     renamed = data.rename(index=str, columns={col_in: col_out})
...     return renamed.set_index('Timestamp')
...
>>> df = pd.DataFrame({'interval_time': pd.to_datetime(['2010-11-01 00:00:00', '2010-11-01 00:05:00', '2010-11-01 00:10:00', '2010-11-01 00:15:00']),
...                    'A': [0.0] * 4}, index=range(4))
>>> rename_n_index(df, 'interval_time')
                       A
Timestamp
2010-11-01 00:00:00  0.0
2010-11-01 00:05:00  0.0
2010-11-01 00:10:00  0.0
2010-11-01 00:15:00  0.0
>>> df
     A       interval_time
0  0.0 2010-11-01 00:00:00
1  0.0 2010-11-01 00:05:00
2  0.0 2010-11-01 00:10:00
3  0.0 2010-11-01 00:15:00

2 Comments

Method chaining is a 4th possibility, i.e. renamed = data.rename(...)\ .set_index(...). Some find it aesthetically pleasing to see method calls visually aligned.
@jpp: yes, but so ugly I don't know if I want to go there ;-)
2

See @MartijnPieters' explanation for resolving the errors in your code.

However, note that the Pandorable method is to use method chaining. Some find it aesthetically pleasing to see method names visually aligned. Here's an example:

def rename_n_index(data, col_in='tempus_interval_time', col_out='Timestamp'):
    renamed = data.rename(index=str, columns={col_in: col_out})\
                  .set_index('Timestamp')
    return renamed

Then to apply these to a sequence of dataframes as in your previous question:

dfs = [df.pipe(rename_n_index) for df in (df1, df2, df3)] 
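
Here is a self-contained sketch of that last step, assuming the frames use 'interval_time' rather than the function's default `col_in`. The three sample frames and the `make_frame` helper are hypothetical. This variant also passes `col_out` to `set_index` instead of hard-coding 'Timestamp', which is a small change from the code above. Extra keyword arguments given to `DataFrame.pipe` are forwarded to the function:

```python
import pandas as pd


def rename_n_index(data, col_in='tempus_interval_time', col_out='Timestamp'):
    # index=str is left out here because set_index replaces the index anyway.
    return data.rename(columns={col_in: col_out}).set_index(col_out)


def make_frame(offset):
    # Hypothetical sample data shaped like the frames in the question.
    times = pd.date_range('2010-11-01', periods=4, freq='5min')
    return pd.DataFrame({'interval_time': times, 'A': [float(offset)] * 4})


df1, df2, df3 = (make_frame(i) for i in range(3))

# pipe forwards col_in to rename_n_index; each call returns a new frame.
dfs = [df.pipe(rename_n_index, col_in='interval_time') for df in (df1, df2, df3)]

print([d.index.name for d in dfs])      # ['Timestamp', 'Timestamp', 'Timestamp']
print('interval_time' in df1.columns)   # True: the originals are left unchanged
```

Without the `col_in` argument, the default 'tempus_interval_time' would not match these frames, so the rename would be a no-op and `set_index` would raise a `KeyError`.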
