adihere

How to give name to topics created using LDA?

I have categorized 800,000 (8 lakh) documents into 500 categories using Mahout topic modelling.

Instead of representing each topic by its top 5 or 10 words, I want to infer a generic name for the group using an existing algorithm. For the time being, I have used the following procedure to arrive at a name for each topic:

For each topic:

  • Take all the documents belonging to the topic (using the document-topic distribution output)
  • Run Python NLTK to extract the noun phrases
  • Create a term-frequency (TF) file from the output
  • The name for the topic is the phrase (limited to a maximum of 5 words)
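
To make the procedure above concrete, here is a minimal sketch of it in Python. Several details are assumptions, not part of my actual pipeline: the function names are hypothetical, a "noun phrase" is taken to be a run of adjectives and nouns ending in a noun, phrases longer than the limit are cut to their last 5 words, and the most frequent phrase is the one chosen as the name. `tag_document` needs NLTK and its tokenizer and tagger data; the other two functions work on already tagged text.

```python
from collections import Counter


def tag_document(text):
    """POS-tag one document with NLTK; returns a list of tagged sentences."""
    import nltk  # imported lazily so the functions below run without NLTK
    return [nltk.pos_tag(nltk.word_tokenize(sentence))
            for sentence in nltk.sent_tokenize(text)]


def noun_phrases(tagged_sentence, max_words=5):
    """Yield lower-cased adjective/noun runs that end in a noun.

    Assumption: this simple pattern stands in for whatever chunker is
    really used; long phrases are cut to their last `max_words` words.
    """
    run = []
    # The sentinel token flushes a run that reaches the end of the sentence.
    for word, tag in list(tagged_sentence) + [("", "END")]:
        if tag.startswith(("JJ", "NN")):
            run.append((word.lower(), tag))
            continue
        # Drop trailing adjectives so the phrase ends in a noun.
        while run and not run[-1][1].startswith("NN"):
            run.pop()
        if run:
            yield " ".join(w for w, _ in run[-max_words:])
        run = []


def name_topic(tagged_docs, max_words=5):
    """Return the most frequent noun phrase over a topic's documents."""
    counts = Counter()  # plays the role of the TF file
    for doc in tagged_docs:
        for sentence in doc:
            counts.update(noun_phrases(sentence, max_words))
    return counts.most_common(1)[0][0] if counts else None
```

For one topic this would be called as `name_topic(tag_document(text) for text in docs_for_topic)`, where `docs_for_topic` is a hypothetical list holding the raw text of the documents assigned to that topic.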

Please suggest an approach to arrive at more relevant names for the topics.