I have a dataframe topic_data that contains the output of an LDA topic model:

    topic_data.head(15)

        topic                      word     score
    0       0                Automobile  0.063986
    1       0                   Vehicle  0.017457
    2       0                Horsepower  0.015675
    3       0                    Engine  0.014857
    4       0                   Bicycle  0.013919
    5       1                     Sport  0.032938
    6       1      Association_football  0.025324
    7       1                Basketball  0.020949
    8       1                  Baseball  0.016935
    9       1  National_Football_League  0.016597
    10      2                     Japan  0.051454
    11      2                      Beer  0.032839
    12      2                   Alcohol  0.027909
    13      2                     Drink  0.019494
    14      2                     Vodka  0.017908

This shows the top 5 terms for each topic, and the score (weight) for each. What I'm trying to do is reformat so that the index is the rank of the term, the columns are the topic IDs, and the values are formatted strings generated from the word and score columns (something along the lines of "%s (%.02f)" % (word,score)). That means the new dataframe should look something like this:

    Topic                  0                            1  ...
    Rank
    0      Automobile (0.06)                 Sport (0.03)  ...
    1         Vehicle (0.02)  Association_football (0.03)  ...
    ...                  ...                          ...  ...

What's the right way of going about this? I assume it involves a combination of index-setting, unstacking, and ranking, but I'm not sure of the right approach.
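
For reference, here is a minimal sketch of one possible way to get that shape, not necessarily the right or best one. It rebuilds the sample data shown above, and it assumes the rows are already sorted by descending score within each topic, so a per-topic running count can serve as the rank. The names `rank`, `label`, and `result` are made up for illustration.

```python
import pandas as pd

# Sample data copied from the head(15) output above.
topic_data = pd.DataFrame({
    "topic": [0] * 5 + [1] * 5 + [2] * 5,
    "word": [
        "Automobile", "Vehicle", "Horsepower", "Engine", "Bicycle",
        "Sport", "Association_football", "Basketball", "Baseball",
        "National_Football_League",
        "Japan", "Beer", "Alcohol", "Drink", "Vodka",
    ],
    "score": [
        0.063986, 0.017457, 0.015675, 0.014857, 0.013919,
        0.032938, 0.025324, 0.020949, 0.016935, 0.016597,
        0.051454, 0.032839, 0.027909, 0.019494, 0.017908,
    ],
})

# Assumption: rows are already ordered by descending score within each
# topic, so the 0-based position inside the group is the term's rank.
ranked = topic_data.assign(
    rank=topic_data.groupby("topic").cumcount(),
    # Build the "word (score)" strings with the format from the question.
    label=[
        "%s (%.02f)" % (word, score)
        for word, score in zip(topic_data["word"], topic_data["score"])
    ],
)

# Reshape: one row per rank, one column per topic ID.
result = ranked.pivot(index="rank", columns="topic", values="label")
result.index.name = "Rank"
result.columns.name = "Topic"

print(result)
```

If the rows were not pre-sorted, sorting by `topic` and descending `score` before calling `cumcount()` would be needed for the ranks to be meaningful.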