I have a large number of unlabelled tweets on a particular topic, say 'ABC', and I want to perform multi-class sentiment analysis on them. I tried several unsupervised clustering techniques from sklearn (KMeans, DBSCAN, agglomerative clustering), but the highest silhouette score I have reached is 0.31, and KMeans gives a large negative score. I have cleaned the tweets and encoded them with BERT embeddings and with Word2Vec, but nothing seems to change.
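For context, here is a minimal sketch of the kind of silhouette comparison described above. It is not the exact code that was run: the random `embeddings` array is only a placeholder for the real BERT or Word2Vec vectors, and the range of cluster counts is an assumption.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Placeholder (assumption): stands in for the real tweet embeddings,
# one row per tweet, one column per embedding dimension.
embeddings = rng.normal(size=(300, 32))

scores = {}
for k in range(2, 7):
    # Cluster the embeddings into k groups and get one cluster id per tweet.
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(embeddings)
    # Silhouette is always in [-1, 1]; higher means better-separated clusters.
    scores[k] = silhouette_score(embeddings, labels)

# Cluster count with the highest silhouette score.
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

Note that `silhouette_score` is bounded between -1 and 1, whereas `KMeans.score` returns the negative of the k-means objective (inertia), which can be a very large negative number. If the large negative value came from `KMeans.score`, it is not comparable to the 0.31 silhouette figure.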
Suppose I take some other labelled multi-class dataset, build a classifier on it, and then use that classifier to identify sentiment in my target data. Will that be good enough? Is this approach correct and logical?
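To make the proposed approach concrete, here is a minimal sketch of it, assuming a TF-IDF plus logistic regression pipeline from sklearn. The texts, labels, and variable names are all hypothetical toy data; a real run would substitute the labelled source dataset and the 'ABC' tweets, and could swap in BERT embeddings as features.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labelled source dataset (assumption, for illustration only).
source_texts = [
    "I love this, it is wonderful",
    "what a great and happy day",
    "this is terrible and I hate it",
    "awful experience, very angry",
    "the meeting is at noon",
    "the report has ten pages",
]
source_labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

# Hypothetical unlabelled target tweets about 'ABC'.
target_tweets = ["ABC is wonderful", "I hate ABC", "ABC released a report"]

# Train on the labelled source data only.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(source_texts, source_labels)

# Apply the trained classifier to the unlabelled target tweets.
predicted = clf.predict(target_tweets)
# Highest class probability per tweet, usable as a rough confidence signal.
confidence = clf.predict_proba(target_tweets).max(axis=1)

for tweet, label, conf in zip(target_tweets, predicted, confidence):
    print(f"{label:>8}  {conf:.2f}  {tweet}")
```

Whether this is good enough depends on how closely the source data resembles the target tweets. One way to check would be to hand-label a small random sample of the 'ABC' tweets and measure the classifier's accuracy on that sample.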
I have found these general speech datasets. Will they be sufficient for getting correct sentiments for the "ABC" tweets dataset?
I also found another emotion dataset related to tweets.