3
$\begingroup$

I have a large number of unlabelled tweets on a particular topic, say 'ABC', and I want to perform multi-class sentiment analysis on them. I tried several unsupervised clustering techniques from sklearn (KMeans, DBSCAN, agglomerative clustering), but the highest silhouette score I have reached is 0.31, and KMeans gives a large negative score. I have cleaned the tweets and encoded them with BERT embeddings and Word2Vec, but nothing seems to change.

Suppose I take some other labelled multi-class dataset, build a classifier on it, and then use that classifier to identify sentiment in my target data. Will that be good enough? Is this approach correct and logical?

I have found these general speech datasets. Will they be sufficient for getting correct sentiments for the "ABC" tweets dataset?

I also found another emotion dataset related to tweets.

$\endgroup$

2 Answers

0
$\begingroup$

A supervised learning model would definitely be a better approach. There are two options:

(1) Use a transformer model that was trained on a different sentiment task, such as movie or restaurant reviews. First check how well this model works for your use case, and if it does well, use it to label your unlabelled data.

(2) Label some tweets yourself (around 100-200) and then fine-tune an existing sentiment transformer model on this data. That way you need to label far less data than if you started from scratch.
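
Whichever option you pick, a small hand-labelled sample of your own tweets lets you measure how well a model transfers before you trust its labels. Here is a minimal sketch of that check. The `keyword_predict` function is a hypothetical stand-in for whatever pretrained model you try, and the tweets and labels are made up for illustration:

```python
from collections import Counter
from typing import Callable, Sequence


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over the classes present in y_true."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gets a false positive
            fn[truth] += 1  # true class gets a false negative
    scores = []
    for label in set(y_true):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


def evaluate(predict: Callable[[str], str],
             tweets: Sequence[str],
             labels: Sequence[str]) -> float:
    """Run the model on each hand-labelled tweet and return macro-F1."""
    predictions = [predict(tweet) for tweet in tweets]
    return macro_f1(labels, predictions)


def keyword_predict(text: str) -> str:
    """Hypothetical placeholder; swap in a real pretrained model here."""
    text = text.lower()
    if "love" in text or "great" in text:
        return "positive"
    if "hate" in text or "awful" in text:
        return "negative"
    return "neutral"


# Made-up hand-labelled sample; in practice use 100-200 of your own tweets.
sample_tweets = [
    "I love ABC",
    "ABC is awful",
    "ABC announced a new release",
    "Not great, ABC",
]
sample_labels = ["positive", "negative", "neutral", "negative"]

score = evaluate(keyword_predict, sample_tweets, sample_labels)
print(f"macro-F1 on the hand-labelled sample: {score:.2f}")
```

Macro-F1 is used here rather than accuracy because sentiment classes in tweets are often imbalanced. If the score is poor, that is a sign you need option (2).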

$\endgroup$
0
$\begingroup$

The natural approach is to use a labelled dataset and a supervised learning technique. You can start with something simple, such as generating features with tf-idf and training a simple logistic regression model.

I think this is the first thing you should try: it seems more likely to succeed than the unsupervised techniques, and it is simple enough.
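
A minimal sketch of that baseline with scikit-learn follows. The training texts, labels, and tweets are made up for illustration, and the hyperparameters (`ngram_range`, `max_iter`) are assumptions rather than tuned values:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labelled data (made up); replace with a real labelled sentiment dataset.
train_texts = [
    "I love this, it is great",
    "What a wonderful experience",
    "This is terrible and I hate it",
    "Awful service, very disappointed",
    "The event starts at nine",
    "They released a statement today",
]
train_labels = [
    "positive", "positive",
    "negative", "negative",
    "neutral", "neutral",
]

# tf-idf features feeding a multi-class logistic regression.
model = make_pipeline(
    TfidfVectorizer(lowercase=True, ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(train_texts, train_labels)

# Unlabelled target tweets (made up).
unlabelled_tweets = ["ABC is great, love it", "ABC is terrible"]
predicted = model.predict(unlabelled_tweets)
probabilities = model.predict_proba(unlabelled_tweets)

for tweet, label, probs in zip(unlabelled_tweets, predicted, probabilities):
    print(f"{tweet!r} -> {label} (max probability {probs.max():.2f})")
```

The predicted probabilities are worth inspecting: low maximum probabilities on the target tweets can hint that the training domain is too far from the tweets.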

$\endgroup$
1
  • $\begingroup$ I have an unlabelled tweets dataset on topic "ABC" and a labelled dataset that is just normal conversational text. Do you think a model trained on normal conversational text will predict correct labels for the "ABC" tweets dataset? @David Masip $\endgroup$ Commented Jun 28, 2020 at 14:07
