Sentiment analysis of tweets (Train model on a labelled dataset and use on some other unlabelled data)

Question

I have a large number of tweets on a particular topic, say 'ABC', and the data is not labelled. I want to perform multi-class sentiment analysis on these tweets. I tried several unsupervised clustering techniques from sklearn (KMeans, DBSCAN, agglomerative clustering), but the highest silhouette score I have reached is 0.31, and KMeans gives a large negative score. I have cleaned the tweets and encoded them using BERT embeddings and Word2Vec, but nothing seems to change.

Suppose I take some other labelled multi-class dataset, build a classifier on it, and then use that classifier to identify sentiment in my target data. Will that be good enough? Is this approach correct and logical?

I have found these general speech datasets. Will they serve my purpose of getting correct sentiments for the "ABC" tweets dataset?

I also found this emotion dataset related to tweets.

technik · Accepted Answer · 2022-01-08 12:29:57Z

A better approach would definitely be a supervised learning model. There are two ways you could go:

(1) You could use a transformer model that was trained on another sentiment task, such as movie or restaurant reviews. First check how well this model works for your use case, and then use it to label your unlabelled data.

(2) Or you could label some tweets yourself (around 100-200) and then fine-tune another sentiment transformer model on this data. You would then need to label far less data than if you started from scratch.
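
Option (1) hinges on checking the pretrained model against your own tweets before trusting its labels. Below is a minimal sketch of that check, assuming a small hand-labelled sample and a `predict` callable that wraps whatever model you pick. The names `evaluate`, `toy_predict` and `sample` are hypothetical, and the keyword-based stand-in exists only so the example runs; it is not a real sentiment model.

```python
from __future__ import annotations

from collections import Counter
from typing import Callable, Iterable


def evaluate(
    predict: Callable[[str], str],
    labelled: Iterable[tuple[str, str]],
) -> tuple[float, Counter]:
    """Return accuracy and a Counter keyed by (true_label, predicted_label)."""
    confusion: Counter = Counter()
    total = 0
    correct = 0
    for text, true_label in labelled:
        predicted = predict(text)
        confusion[(true_label, predicted)] += 1
        correct += int(predicted == true_label)
        total += 1
    accuracy = correct / total if total else 0.0
    return accuracy, confusion


def toy_predict(text: str) -> str:
    # Hypothetical stand-in for a pretrained sentiment model.
    lowered = text.lower()
    if any(word in lowered for word in ("love", "great", "good")):
        return "positive"
    if any(word in lowered for word in ("hate", "awful", "bad")):
        return "negative"
    return "neutral"


# Made-up hand-labelled tweets; in practice this would be your own sample.
sample = [
    ("I love ABC", "positive"),
    ("ABC is awful", "negative"),
    ("ABC released an update today", "neutral"),
    ("Not a good day for ABC", "negative"),
]

accuracy, confusion = evaluate(toy_predict, sample)
print(f"accuracy on hand-labelled sample: {accuracy:.2f}")
for (true_label, predicted), count in sorted(confusion.items()):
    print(f"true={true_label:<8} predicted={predicted:<8} n={count}")
```

If accuracy on your sample is poor, or one class is consistently confused with another, that is a sign to fall back on option (2) and fine-tune on your own labels rather than labelling the full dataset with the off-the-shelf model.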

David Masip · Accepted Answer · 2020-06-28 08:17:36Z

The natural approach is to use a labelled dataset and a supervised learning technique. You can start with something simple, such as using tf-idf for feature generation and training a simple logistic regression model.

I think this is the first thing you should try. I see it as more likely to succeed than the unsupervised techniques, and it is simple enough.
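
A minimal sketch of such a baseline with scikit-learn, assuming the labelled corpus is available as parallel lists of texts and labels. The tiny in-line dataset is made up purely so the example runs and is far too small for real use; the bigram range and `max_iter` value are illustrative choices, not recommendations from the answer.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Made-up labelled data standing in for a real labelled sentiment dataset.
train_texts = [
    "I love this, it is wonderful",
    "What a great and happy day",
    "Absolutely fantastic experience",
    "I hate this, it is terrible",
    "What an awful and sad day",
    "Absolutely horrible experience",
    "The meeting is at noon",
    "The report was published today",
    "The store opens on Monday",
]
train_labels = [
    "positive", "positive", "positive",
    "negative", "negative", "negative",
    "neutral", "neutral", "neutral",
]

# tf-idf features feeding a multi-class logistic regression.
model = make_pipeline(
    TfidfVectorizer(lowercase=True, ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(train_texts, train_labels)

# Made-up unlabelled tweets on the target topic.
unlabelled = [
    "ABC is wonderful, I love it",
    "ABC was terrible today",
    "ABC opens on Monday",
]
predictions = model.predict(unlabelled)
probabilities = model.predict_proba(unlabelled)

for text, label, probs in zip(unlabelled, predictions, probabilities):
    print(f"{label:<8} (max prob {probs.max():.2f})  {text}")
```

The predicted probabilities are worth inspecting alongside the labels: when the training text and the target tweets differ in vocabulary, many tweets may share few tf-idf features with the training data and receive low-confidence predictions.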

I have an unlabelled tweets dataset on topic "ABC" and a labelled dataset which is just normal conversational text. Do you think a model trained on the normal conversational text will predict correct labels for the "ABC" tweets dataset, @David Masip?
– Doofenshmirtz, Commented Jun 28, 2020 at 14:07
