Generate columns of top ranked values in Pandas

Question

I have a dataframe topic_data that contains the output of an LDA topic model:

    topic_data.head(15)

        topic                      word     score
    0       0                Automobile  0.063986
    1       0                   Vehicle  0.017457
    2       0                Horsepower  0.015675
    3       0                    Engine  0.014857
    4       0                   Bicycle  0.013919
    5       1                     Sport  0.032938
    6       1      Association_football  0.025324
    7       1                Basketball  0.020949
    8       1                  Baseball  0.016935
    9       1  National_Football_League  0.016597
    10      2                     Japan  0.051454
    11      2                      Beer  0.032839
    12      2                   Alcohol  0.027909
    13      2                     Drink  0.019494
    14      2                     Vodka  0.017908

This shows the top 5 terms for each topic and the score (weight) of each term. I want to reshape it so that the index is the rank of the term, the columns are the topic IDs, and the values are formatted strings built from the word and score columns (something along the lines of "%s (%.02f)" % (word, score)). The new dataframe should look something like this:

    Topic                  0                            1  ...
    Rank
    0      Automobile (0.06)                 Sport (0.03)  ...
    1         Vehicle (0.02)  Association_football (0.03)  ...
    ...                  ...                          ...  ...

What's the right way of going about this? I assume it involves a combination of index-setting, unstacking, and ranking, but I'm not sure of the right approach.

CT Zhu · Accepted Answer · 2015-11-06 22:13:57Z

It would be something like this. Note that Rank has to be generated first:

    import pandas as pd

    # df is the question's topic_data frame.
    # Rank terms within each topic by descending score (0 = highest score).
    df['Rank'] = df.groupby('topic')['score'].rank(ascending=False, method='first').astype(int) - 1
    # Build the display string, e.g. "Automobile (0.06)".
    df['New_str'] = df['word'] + df['score'].apply(' ({0:.2f})'.format)
    df2 = df.sort_values(['Rank', 'score'])[['New_str', 'topic', 'Rank']]
    # One row per rank, one column per topic.
    print(df2.pivot(index='Rank', columns='topic', values='New_str'))

    topic                  0                                1               2
    Rank
    0      Automobile (0.06)                     Sport (0.03)    Japan (0.05)
    1         Vehicle (0.02)      Association_football (0.03)     Beer (0.03)
    2      Horsepower (0.02)                Basketball (0.02)  Alcohol (0.03)
    3          Engine (0.01)                  Baseball (0.02)    Drink (0.02)
    4         Bicycle (0.01)  National_Football_League (0.02)    Vodka (0.02)
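The question guessed that the solution would combine index-setting, unstacking, and ranking. The following is a minimal sketch of that route, assuming a reasonably recent pandas; it is one possible variant, not the method from the answer. The function name `top_terms_table` and the column name `label` are introduced here for illustration only, and the sample data is copied from the question.

```python
import pandas as pd

# Sample data copied from the question: top 5 terms for 3 topics.
topic_data = pd.DataFrame({
    "topic": [0] * 5 + [1] * 5 + [2] * 5,
    "word": [
        "Automobile", "Vehicle", "Horsepower", "Engine", "Bicycle",
        "Sport", "Association_football", "Basketball", "Baseball",
        "National_Football_League",
        "Japan", "Beer", "Alcohol", "Drink", "Vodka",
    ],
    "score": [
        0.063986, 0.017457, 0.015675, 0.014857, 0.013919,
        0.032938, 0.025324, 0.020949, 0.016935, 0.016597,
        0.051454, 0.032839, 0.027909, 0.019494, 0.017908,
    ],
})


def top_terms_table(data):
    """Return a Rank x topic table of 'word (score)' strings.

    Hypothetical helper, named here for illustration.
    """
    # Put the highest score first within each topic.
    ordered = data.sort_values(["topic", "score"], ascending=[True, False])
    # cumcount numbers the rows of each group 0, 1, 2, ... in their current order,
    # which after the sort is the rank by descending score.
    ordered["Rank"] = ordered.groupby("topic").cumcount()
    # Build the display string, e.g. "Automobile (0.06)".
    ordered["label"] = ordered["word"] + ordered["score"].map(" ({:.2f})".format)
    # Index by (Rank, topic), then move topic from the index into the columns.
    return ordered.set_index(["Rank", "topic"])["label"].unstack("topic")


table = top_terms_table(topic_data)
print(table)
```

Sorting before `cumcount` means the result does not depend on the input rows already being ordered by score, and the original `topic_data` frame is left unmodified because `sort_values` returns a new frame.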
