3

I am trying to process and update rows in a dataframe through a function, and return the dataframe to finish using it. When I try to return the dataframe to the original function call, it returns a series and not the expected column updates. A simple example is below:

    df = pd.DataFrame(['adam', 'ed', 'dra','dave','sed','mike'], index = ['a', 'b', 'c', 'd', 'e', 'f'], columns=['A'])

    def get_item(data):
        comb=pd.DataFrame()
        comb['Newfield'] = data
        #create new columns
        comb['AnotherNewfield'] = 'y'
        return pd.DataFrame(comb)

Calling the function using apply:

    >>> newdf = df['A'].apply(get_item)
    >>> newdf
    a    A Newfield AnotherNewfield a adam st...
    b    A Newfield AnotherNewfield e sed st...
    c    A Newfield AnotherNewfield d dave st...
    d    A Newfield AnotherNewfield d dave st...
    e    A Newfield AnotherNewfield s NaN st...
    f    A Newfield AnotherNewfield m NaN str(...
    Name: A, dtype: object
    >>> type(newdf)
    <class 'pandas.core.series.Series'>

I assume that apply() is bad here, but am not quite sure how I 'should' be updating this dataframe via function otherwise.

Edit: I apologize, but it seems I accidentally deleted the sample function in an edit. I have added it back here while I try a few other things I found in other posts.

Testing in a slightly different manner, with individual variables and returning multiple Series, seems to work, so I will see whether I can do this in my actual case and update.

    def get_item(data):
        value = data
        #create new columns
        AnotherNewfield = 'y'
        return pd.Series(value),pd.Series(AnotherNewfield)

    df['B'], df['C'] = zip(*df['A'].apply(get_item))
  • df['A'] = df['A'].apply(get_item) Commented Sep 1, 2021 at 23:23
  • Thanks for the response. It would seem that this would only return one column, A, right? Essentially, I was hoping to return the new columns as well as a dataframe. Are you saying that I may need to do each column individually? Commented Sep 2, 2021 at 1:53
  • drum's answer doesn't return anything. It modifies column 'A' of the dataframe rather than assigning the modified series to a new variable. Commented Sep 2, 2021 at 15:09
  • Your dataframes are stored in column A, so you have a series of dataframes. Commented Sep 2, 2021 at 20:44
  • Yes, that seems to be the case. I was able to get something close to what I wanted (i.e. if I create new columns, return them) using this: def get_item(data): value = data #create new columns AnotherNewfield = 'y' return pd.Series(value),pd.Series(AnotherNewfield) df['B'], df['C'] = zip(*df['A'].apply(get_item)) I'll see if that is something I can apply to my actual code and update again. Commented Sep 2, 2021 at 21:03

3 Answers

1

You could use groupby with apply to get a dataframe back from the apply call, like this:

    import pandas as pd

    # add new column B for groupby - we need single group only to do the trick
    df = pd.DataFrame(
        {'A':['adam', 'ed', 'dra','dave','sed','mike'], 'B': [1,1,1,1,1,1]},
        index=['a', 'b', 'c', 'd', 'e', 'f'])

    def get_item(data):
        # create empty dataframe to be returned
        comb=pd.DataFrame(columns=['Newfield', 'AnotherNewfield'], data=None)
        # append series data (or any data) to dataframe's columns
        # (pd.concat is used because Series.append was removed in pandas 2.0)
        comb['Newfield'] = pd.concat([comb['Newfield'], data['A']], ignore_index=True)
        comb['AnotherNewfield'] = 'y'
        # return complete dataframe
        return comb

    # use column B for group to get tuple instead of dataframe
    newdf = df.groupby('B').apply(get_item)

    # after processing the dataframe newdf contains MultiIndex - simply remove the 0-level (index col B with value 1 gained from groupby operation)
    newdf.droplevel(0)

Output:

      Newfield AnotherNewfield
    0     adam               y
    1       ed               y
    2      dra               y
    3     dave               y
    4      sed               y
    5     mike               y
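
For comparison, here is a minimal sketch of another way to get a dataframe back from apply without the helper group column. It is not part of the answer above; it assumes the function can be rewritten to take one row at a time (the `row` parameter is my own naming) and relies on `DataFrame.apply` with `axis=1` building a dataframe when the function returns a Series. Note that this variant keeps the original index (a to f) instead of producing 0 to 5.

```python
import pandas as pd

df = pd.DataFrame(
    {'A': ['adam', 'ed', 'dra', 'dave', 'sed', 'mike']},
    index=['a', 'b', 'c', 'd', 'e', 'f'],
)

def get_item(row):
    # With axis=1, `row` is one row of df, passed in as a Series.
    # Returning a Series lets apply() assemble the results into a DataFrame,
    # using the Series labels as column names.
    return pd.Series({'Newfield': row['A'], 'AnotherNewfield': 'y'})

newdf = df.apply(get_item, axis=1)
print(type(newdf))
print(newdf)
```

Row-wise apply calls the function once per row, so for large frames a vectorized assignment is likely to be faster.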

1

This will work:

    df = pd.DataFrame(['adam', 'ed', 'dra','dave','sed','mike'], index =['a', 'b', 'c', 'd', 'e', 'f'], columns=['A'])

    def get_item(data):
        comb=pd.DataFrame()
        comb['Newfield'] = data
        #create new columns
        comb['AnotherNewfield'] = 'y'
        return comb

    new_df = get_item(df)
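
As a rough illustration of why a direct call behaves differently from apply: the function runs once on the whole column, so it returns a single dataframe rather than one object per element. The sketch below is a variation on the code above, not the answerer's exact code. It passes the column `df['A']` instead of the whole frame, and the name `combined` and the `join` step are my own additions to show how the new columns could be attached to the original frame.

```python
import pandas as pd

df = pd.DataFrame(
    ['adam', 'ed', 'dra', 'dave', 'sed', 'mike'],
    index=['a', 'b', 'c', 'd', 'e', 'f'],
    columns=['A'],
)

def get_item(data):
    comb = pd.DataFrame()
    comb['Newfield'] = data          # data is the whole column (a Series)
    comb['AnotherNewfield'] = 'y'    # the scalar is broadcast to every row
    return comb

# One call on the column, no apply(): the result is a DataFrame.
new_df = get_item(df['A'])

# Attach the new columns to the original frame, aligned on the index a..f.
combined = df.join(new_df)
print(combined)
```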

0

For anyone looking for a potential answer to this: I got the desired result with the code below, which I found in another post. I will post the author's name to credit them. It essentially allowed me to edit the function and get the data created in the different columns via the apply function:

    def get_item(data):
        value = data
        #create new columns using variables
        AnotherNewfield = 'y'
        return pd.Series(value),pd.Series(AnotherNewfield)

    >>> df['B'], df['C'] = zip(*df['A'].apply(get_item))
    >>> df
          A        B     C
    a  adam  (adam,)  (y,)
    b    ed    (ed,)  (y,)
    c   dra   (dra,)  (y,)
    d  dave  (dave,)  (y,)
    e   sed   (sed,)  (y,)
    f  mike  (mike,)  (y,)

The only problem is that the parentheses and comma come along with the data. I intend to strip them in the code outside of the function. Perhaps this:

    >>> import re
    >>> df['B'] = df['B'].apply(lambda x: re.sub(r"[^a-zA-Z0-9-]+", ' ', str(x)))
    >>> df
          A     B     C
    a  adam  adam  (y,)
    b    ed    ed  (y,)
    c   dra   dra  (y,)
    d  dave  dave  (y,)
    e   sed   sed  (y,)
    f  mike  mike  (y,)
    >>> df['C'] = df['C'].apply(lambda x: re.sub(r"[^a-zA-Z0-9-]+", ' ', str(x)))
    >>> df
          A     B  C
    a  adam  adam  y
    b    ed    ed  y
    c   dra   dra  y
    d  dave  dave  y
    e   sed   sed  y
    f  mike  mike  y
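
The parentheses appear to come from wrapping each scalar in `pd.Series` before returning it. A minimal sketch, assuming the function only needs to hand back plain values: returning an ordinary tuple of scalars keeps the same `zip(*...)` pattern and should make the regex cleanup unnecessary. The variable name `another_new_field` is just my renaming of `AnotherNewfield`.

```python
import pandas as pd

df = pd.DataFrame(
    ['adam', 'ed', 'dra', 'dave', 'sed', 'mike'],
    index=['a', 'b', 'c', 'd', 'e', 'f'],
    columns=['A'],
)

def get_item(data):
    value = data
    another_new_field = 'y'
    # Return plain scalars in a tuple instead of one-element Series.
    return value, another_new_field

# apply() yields one (value, 'y') tuple per row; zip(*...) regroups them
# into one sequence per new column.
df['B'], df['C'] = zip(*df['A'].apply(get_item))
print(df)
```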
