
I have artists, each with a matching id, who make music in anything from a single genre to a combination of several genres.

This is what I am trying to do

    Artist | Id | Genre                | Jazz | Blues | Rock | Trap | Rap | Hip-Hop | Pop | Rb
    -------|----|----------------------|------|-------|------|------|-----|---------|-----|----
    Bob    | 1  | [Jazz, Blues]        | 1    | 1     | 0    | 0    | 0   | 0       | 0   | 0
    Fred   | 2  | [Rock, Jazz]         | 1    | 0     | 1    | 0    | 0   | 0       | 0   | 0
    Jeff   | 3  | [Trap, Rap, Hip-Hop] | 0    | 0     | 0    | 1    | 1   | 1       | 0   | 0
    Amy    | 4  | [Pop, Rock, Jazz]    | 1    | 0     | 1    | 0    | 0   | 0       | 1   | 0
    Mary   | 5  | [Hip-Hop, Jazz, Rb]  | 1    | 0     | 0    | 0    | 0   | 1       | 0   | 1

This is the error I get:

    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-50-7a4ed81e14d7> in <module>
         11 for index, row in artist_df.iterrows():
         12     x.append(index)
    ---> 13     for i in row['genre']:
         14         artists_with_genres.at[index, genre] = 1
         15

    TypeError: 'float' object is not iterable

These artist genres are attributes I will use to help determine similar artists when combined with other factors such as years, songs, or demographics.

The new columns I am creating and iterating through will specify whether an artist is in a genre, using a binary representation of attributes: 1 if the artist is rock/hip-hop/trap etc., and 0 if not.

This is the current dataframe:

[image: screenshot of the current dataframe]

I took my data frame and split the genres into individual values so I can convert them to a 1/0 binary representation.

Do I need to set genre as the index?

First, I took a data frame like this:

    Artist | Id | Genre
    -------|----|-----------------
    Bob    | 1  | Jazz|Blues
    Fred   | 2  | Rock|Jazz
    Jeff   | 3  | Trap|Rap|Hip-Hop
    Amy    | 4  | Pop|Rock|Jazz
    Mary   | 5  | Hip-Hop|Jazz|Rb


Every genre is separated by a |, so we simply have to call the split function on |.

    artist_df['genres'] = artist_df.genres.str.split('|')
    artist_df.head()
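As a small illustration of the split step, the sketch below uses a made-up three-row series (the sample values and the name `split` are assumptions, not the asker's data). It shows that each string becomes a list, while a missing value stays missing rather than becoming a list, which is one plausible way to end up iterating over a non-iterable value later.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; the last entry is deliberately missing
genres = pd.Series(["Jazz|Blues", "Trap|Rap|Hip-Hop", np.nan], name="genres")

# A one-character pattern is treated as a literal separator
split = genres.str.split("|")
print(split)

# Strings become lists; the missing entry is still missing
print(isinstance(split.iloc[0], list))
print(pd.isna(split.iloc[2]))
```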

First, make a deep copy of the data frame into artists_with_genres.

    artists_with_genres = df.copy(deep=True)

Then iterate through the data frame and add each artist's genres as columns of 1s and 0s:

1 if the artist at the current index is in that column's genre, and 0 if not.

    x = []
    for index, row in artist_df.iterrows():
        x.append(index)
        for genre in row['genres']:
            artists_with_genres.at[index, genre] = 1

**Confirm that every row has been iterated and acted upon.**

    print(len(x) == len(artist_df))
    artists_with_genres.head(30)

Fill in the NaN values with 0 to show that an artist doesn't have that column's genre.

    artists_with_genres = artists_with_genres.fillna(0)
    artists_with_genres.head(3)
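The loop-based steps above can be sketched end to end as follows. This is a minimal sketch on made-up data, not the asker's actual frame: the sample rows, the artist "Zed" with a missing genre, and the `genre_columns` helper are all assumptions. The `isinstance` guard is one possible way to avoid `'float' object is not iterable` when a row's genre value is missing.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the last row has a missing genre value
artist_df = pd.DataFrame({
    "artist": ["Bob", "Fred", "Jeff", "Zed"],
    "genres": ["Jazz|Blues", "Rock|Jazz", "Trap|Rap|Hip-Hop", np.nan],
})
artist_df["genres"] = artist_df["genres"].str.split("|")

artists_with_genres = artist_df.copy(deep=True)

for index, row in artist_df.iterrows():
    genres = row["genres"]
    # A missing value is not a list, so skip it instead of iterating
    if not isinstance(genres, list):
        continue
    for genre in genres:
        artists_with_genres.at[index, genre] = 1

# Only the newly created genre columns are filled and cast to int
genre_columns = artists_with_genres.columns.difference(artist_df.columns)
artists_with_genres[genre_columns] = (
    artists_with_genres[genre_columns].fillna(0).astype(int)
)
print(artists_with_genres)
```

Restricting `fillna(0)` to the genre columns keeps the original `genres` column untouched, so a missing genre list is not silently replaced by 0.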
Comments

  • Have you looked at one hot encoding? pd.get_dummies() might just give you what you need. Commented Jun 4, 2020 at 18:24
  • No, I never heard of one hot encoding using pd.get_dummies(). But I will look into it. Commented Jun 4, 2020 at 18:30
  • Trying, but running into an issue with it being a float. Commented Jun 4, 2020 at 21:06
  • Use artist_df['genre'] = artist_df['genre'].astype(str) Commented Jun 4, 2020 at 23:51

1 Answer

Try this using get_dummies:

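    import pandas as pd

    # Split each pipe-separated string into a list of genres
    df['Genre'] = df['Genre'].str.split('|')
    # One row per (artist, genre), one-hot encode, then sum back per artist.
    # groupby(level=0).sum() replaces sum(level=0), which newer pandas versions no longer accept.
    dfx = pd.get_dummies(pd.DataFrame(df['Genre'].tolist()).stack()).groupby(level=0).sum()
    df = pd.concat([df, dfx], axis=1).drop(columns=['Genre'])
    print(df)

      Artist  Id  Blues  Hip-Hop  Jazz  Pop  Rap  Rb  Rock  Trap
    0    Bob   1      1        0     1    0    0   0     0     0
    1   Fred   2      0        0     1    0    0   0     1     0
    2   Jeff   3      0        1     0    0    1   0     0     1
    3    Amy   4      0        0     1    1    0   0     1     0
    4   Mary   5      0        1     1    0    0   1     0     0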

For a detailed explanation, see "Pandas column of lists to separate columns".


10 Comments

Getting this error: 'AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas'
I think it is because it is a float64: artist_df['genre'].dtypes gives dtype('float64')
How is Genre a float?
No clue, maybe it is how I cleaned the data. I was able to convert it to an object:
df3['genre'] = df3['genre'].astype(str)
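The thread does not establish why the column was float64. One way this can happen, offered only as a guess about the asker's data, is that every value in the column is missing, since NaN is a float. The sketch below uses a hypothetical all-missing series to show how the dtype and the number of missing values could be checked before reaching for `.str`.

```python
import numpy as np
import pandas as pd

# Hypothetical column in which every genre value is missing
all_missing = pd.Series([np.nan, np.nan, np.nan], name="genre")

print(all_missing.dtype)         # dtype of the column
print(all_missing.isna().sum())  # how many values are missing

# The .str accessor is rejected on a float column
raised = False
try:
    all_missing.str.split("|")
except AttributeError as err:
    raised = True
    print("str accessor failed:", err)
```

If the check shows the column is mostly or entirely missing, casting with `astype(str)` only hides the problem, so it may be worth revisiting the cleaning step that produced the column.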