
How to name topics created using LDA?

I have categorized 800,000 documents into 500 categories using Mahout's topic modelling.

Instead of representing each topic by its top 5 or 10 words, I want to infer a generic name for the group using an existing algorithm. For the time being, I have used the following procedure to arrive at a name for each topic:

For each topic

  • Take all the documents belonging to the topic (using the document-topic distribution output)
  • Run Python NLTK to extract the noun phrases
  • Create a term-frequency (TF) file from the output
  • The name for the topic is the top phrase by term frequency (limited to a maximum of 5 words)
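
The steps above can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the exact pipeline used: the document-topic distribution is assumed to be available as a dictionary of per-document topic probabilities, each document is assigned to its most probable topic, and the names `candidate_phrases` and `name_topics` are hypothetical. To keep the sketch free of external dependencies, the noun-phrase step is replaced by a simple stand-in that treats runs of non-stopwords as candidate phrases; an NLTK-based chunker could be passed in through the `extract` parameter instead.

```python
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword list (an assumption; use a fuller list in practice).
STOPWORDS = {"a", "an", "the", "of", "in", "on", "for", "to", "and", "or",
             "is", "are", "was", "were", "with", "by", "at", "from", "that", "this"}


def candidate_phrases(text, max_words=5):
    """Stand-in for noun-phrase extraction (NOT NLTK chunking):
    returns runs of consecutive non-stopwords, at most max_words long."""
    phrases, current = [], []
    # Tokenize into words and single punctuation marks; stopwords and
    # punctuation both end the current phrase.
    for token in re.findall(r"[a-z0-9]+|[^\sa-z0-9]", text.lower()):
        if token.isalnum() and token not in STOPWORDS:
            current.append(token)
        else:
            if current:
                phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return [p for p in phrases if len(p.split()) <= max_words]


def name_topics(doc_topic, documents, extract=candidate_phrases):
    """doc_topic: {doc_id: {topic_id: probability}} (assumed input shape).
    documents: {doc_id: text}.
    Returns {topic_id: most frequent phrase among that topic's documents}."""
    phrase_counts = defaultdict(Counter)
    for doc_id, dist in doc_topic.items():
        topic = max(dist, key=dist.get)  # assign the document to its dominant topic
        phrase_counts[topic].update(extract(documents[doc_id]))
    # The Counter plays the role of the TF file; the top entry becomes the name.
    return {t: c.most_common(1)[0][0] for t, c in phrase_counts.items() if c}


if __name__ == "__main__":
    docs = {
        1: "Trouble in the stock market, and the stock market is down.",
        2: "Analysts on the stock market.",
        3: "The football match was long, and the football match is over.",
    }
    dist = {1: {0: 0.9, 1: 0.1}, 2: {0: 0.7, 1: 0.3}, 3: {0: 0.2, 1: 0.8}}
    print(name_topics(dist, docs))
```

Counting raw phrase frequency per topic tends to favour phrases that are common across the whole corpus, which may be one reason the resulting names feel generic.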

Please suggest an approach to arrive at more relevant names for the topics.

adihere