1

I have a dataframe with a column contain multiple values. I need to split the values and assigned them to separate but matching columns. Example Fruits column contain different fruits. I need to splits these fruits and assign apple to apple column, banana to banana column ..etc.

 fruits 0 apple/mango 1 grapes 2 orange/apple 3 grapes/orange/apple/banana 4 banana/grapes/apple 

I created data frame but when I split values in a row ,these values are added to multiple rows instead of single row. There should be 5 rows in new data frame but I'm getting 12 rows

columns_name =['apple','banana','grapes','orange','mango'] fruits =df['fruits'].to_list() columnlists_list =[] for column in columns_name: column_list = column column_list =[] columnlists_list.append(column_list) for fruit in fruits: one_fruit = fruit.split('/') for i in range(0,len(one_fruit)): if column == one_fruit[i]: column_list.append('1') else: column_list.append('0') #data frame fruits_df = pd.DataFrame(columnlists_list) fruits_df =fruits_df.transpose() fruits_df.columns =[columns_name] 

This the result I got.

 apple banana grapes orange mango 0 1 0 0 0 0 1 0 0 0 0 1 2 0 0 1 0 0 3 0 0 0 1 0 4 1 0 0 0 0 5 0 0 1 0 0 6 0 0 0 1 0 

Please advise how to correct the error. Thank you

2
  • 1
    What is your expected output? Commented Feb 17, 2022 at 17:48
  • It should be 1st row : Apple and Mango 1 rest 0, 2nd row: only Grapes is one, 3rd row :both Orange and apple 1 rest 0..like that. New dataframe should contain only similar number of rows to original dataframe. that is 5 rows Commented Feb 17, 2022 at 17:50

3 Answers 3

1

Try using the get_dummies method in the str accessor (.str) from pandas working with text data:

df['fruits'].str.get_dummies(sep='/') 

Output:

 apple banana grapes mango orange 0 1 0 0 1 0 1 0 0 1 0 0 2 1 0 0 0 1 3 1 1 1 0 1 4 1 1 1 0 0 
Sign up to request clarification or add additional context in comments.

Comments

0

You can start by unnesting your data using explode().This requires a column of list type values which you can achieve with str.split('/').

Then you can use get_dummies() which will use all the unique values of your 'fruits' column, and chain groupby to get one row per index:

# Unnest your dataframe expl_df = df.fruits.str.split('/').explode().reset_index() >>> print(expl_df) index fruits 0 0 apple 1 0 mango 2 1 grapes 3 2 orange 4 2 apple 5 3 grapes 6 3 orange 7 3 apple 8 3 banana 9 4 banana 10 4 grapes 11 4 apple # Convert your fruits to columns pd.get_dummies(expl_df, columns=['fruits'],prefix='', prefix_sep='').groupby(['index'], as_index=False).sum().drop('index',axis=1) apple banana grapes mango orange 0 1 0 0 1 0 1 0 0 1 0 0 2 1 0 0 0 1 3 1 1 1 0 1 4 1 1 1 0 0 

1 Comment

Yes its worked. Thank oyu
0

Here is my attempt:

df['fruits']=df['fruits'].str.split("/") columns_name =['apple','banana','grapes','orange','mango'] membership_info = [] for elem in df['fruits']: membership = [] for fruit in columns_name: membership.append(1) if fruit in elem else membership.append(0) membership_info.append(membership) result_df = pd.DataFrame(membership_info, columns= columns_name) print(result_df) 

Output:

 apple banana grapes orange mango 0 1 0 0 0 1 1 0 0 1 0 0 2 1 0 0 1 0 3 1 1 1 1 0 4 1 1 1 0 0 

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.