How to give name to topics created using LDA?

Question

I have categorized 800,000 documents into 500 categories using Mahout topic modelling.

Instead of representing each topic by its top 5 or 10 words, I want to infer a generic name for the group using any existing algorithm. For the time being, I have used the following algorithm to arrive at the name for a topic:

For each topic:

Take all the documents belonging to the topic (using the document-topic distribution output).
Run Python NLTK to get the noun phrases.
Create the term-frequency (TF) file from the output.
The name for the topic is the most frequent phrase (limited to at most 5 words).
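
The steps above could be sketched roughly as follows. This is only a minimal illustration, not the code actually used: it assumes the noun phrases have already been extracted (for example with NLTK) and are passed in per document, and `name_topic`, `doc_phrases`, and `max_words` are hypothetical names chosen for the example.

```python
from collections import Counter


def name_topic(doc_phrases, max_words=5):
    """Pick the most frequent noun phrase as a topic name.

    doc_phrases: list of lists; each inner list holds the noun phrases
    already extracted from one document assigned to the topic.
    """
    tf = Counter()
    for phrases in doc_phrases:
        for phrase in phrases:
            # Normalize case and whitespace so variants are counted together.
            phrase = " ".join(phrase.lower().split())
            if phrase and len(phrase.split()) <= max_words:
                tf[phrase] += 1
    if not tf:
        return None
    # Highest count wins; ties are broken alphabetically for determinism.
    return min(tf, key=lambda p: (-tf[p], p))


# Toy input, invented for illustration.
topic_docs = [
    ["machine learning", "neural network"],
    ["Machine Learning", "training data"],
    ["neural network", "machine learning"],
]
print(name_topic(topic_docs))
```

Raw term frequency tends to favor phrases that are common across the whole corpus, which may be one reason the resulting names are not very specific.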

Please suggest an approach to arrive at more relevant names for the topics.

Emre · Accepted Answer · 2016-01-07 17:32:32Z

I can suggest several papers on this topic:

Automatic Labelling of Topic Models
Automatic Labeling Hierarchical Topics
Representing Topics Labels for Exploring Digital Libraries

You can find more by looking at their citations.

Thanks, I will check the papers (in particular the first one).
– adihere, Commented Jan 8, 2016 at 19:14

chewpakabra · Accepted Answer · 2016-01-07 10:03:23Z

If you don't want to dig deep into NLP for this task, I suggest generating a set of the most frequent n-grams (of lengths 2-5) from your documents and finding the most distinctive n-grams for each category. Use the TF*IDF metric as a measure of the importance of a particular n-gram (normalizing the measure by word count), and select those n-grams that are used in a particular category and are not (or only rarely) used in others.
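
A minimal sketch of this idea, under a few assumptions that are not in the answer itself: each category is treated as one big document, the term frequency is normalized by the total number of n-grams in the category (one possible reading of "normalizing by word count"), and the IDF is a plain `log(N / df)`. The names `ngrams` and `distinctive_ngrams` are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n_min=2, n_max=5):
    """Yield all n-grams of length n_min..n_max as space-joined strings."""
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def distinctive_ngrams(category_docs, top_k=3):
    """Rank n-grams per category by TF*IDF, one 'document' per category.

    category_docs: dict mapping a category name to a list of tokenized documents.
    """
    counts = {}
    for cat, docs in category_docs.items():
        c = Counter()
        for tokens in docs:
            c.update(ngrams(tokens))
        counts[cat] = c

    n_cats = len(counts)
    # Number of categories in which each n-gram occurs.
    df = Counter()
    for c in counts.values():
        df.update(c.keys())

    result = {}
    for cat, c in counts.items():
        total = sum(c.values()) or 1
        # An n-gram used in every category gets IDF 0 and therefore score 0.
        scores = {g: (freq / total) * math.log(n_cats / df[g]) for g, freq in c.items()}
        ranked = sorted(scores, key=lambda g: (-scores[g], g))
        result[cat] = ranked[:top_k]
    return result


# Toy corpus, invented for illustration.
docs = {
    "sports": ["the team won the match".split(), "the team won again".split()],
    "cooking": ["bake the cake".split(), "bake the cake slowly".split()],
}
print(distinctive_ngrams(docs))
```

With 500 categories it may be worth pruning rare n-grams before scoring, since most 4- and 5-grams occur only once.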

Thanks for the suggestion. I had initially tried n-grams (3 words) with a tf-idf approach, but the labels generated were not that meaningful. Can you suggest any NLP approach that would be more helpful?
– adihere, Commented Jan 8, 2016 at 19:13

CpILL · Accepted Answer · 2018-05-16 01:02:33Z

You might try averaging the word vectors of the top N words in a topic and then using cosine similarity to find the closest word in the corpus.

Just a quick and dirty idea...
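
As a rough sketch of what that could look like: the snippet below averages the vectors of a topic's top words and returns the vocabulary word with the highest cosine similarity to that average. The 3-dimensional embeddings are invented toy values, and `label_topic` is a hypothetical name; in practice the vectors would come from a trained word-embedding model.

```python
import math


def mean_vector(vectors):
    """Element-wise average of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def label_topic(top_words, embeddings):
    """Return the vocabulary word closest to the centroid of the top words."""
    known = [w for w in top_words if w in embeddings]
    if not known:
        return None
    centroid = mean_vector([embeddings[w] for w in known])
    return max(embeddings, key=lambda w: cosine(centroid, embeddings[w]))


# Toy embeddings, invented purely for illustration.
embeddings = {
    "goal": [0.9, 0.1, 0.0],
    "striker": [0.8, 0.3, 0.0],
    "referee": [0.7, 0.0, 0.2],
    "football": [0.8, 0.13, 0.07],
    "recipe": [0.0, 0.1, 0.9],
}
print(label_topic(["goal", "striker", "referee"], embeddings))
```

Note that the nearest word can be one of the top words themselves; filtering those out of the candidates is a design choice left open here.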

I have tried this approach. I also added tf-idf so that the words are unique to each topic, but the results are not that encouraging.
– adihere, Commented May 17, 2018 at 10:26

Thanks, I was thinking of trying this myself but won't bother now.
– CpILL, Commented May 22, 2018 at 18:59

Learning stats by example · Accepted Answer · 2020-08-06 02:36:24Z

A few ideas you'll often see:

Generate a list of candidate labels from Wikipedia titles, extract keyphrases, predict the related Wikipedia pages, and use the keyphrases.
Generate a hand-labeled dataset.
Use a graph populated with topics and the relations between words and topics to predict the most likely topics.
Use abstractive summarization and keyphrase extraction.
