Machine learning algorithm to classify only positive and unlabeled data

Question

I am trying to classify text when I have only positive examples and unlabeled data. I just want the algorithm to identify the positive data and mark everything else as negative. What would be a good machine learning algorithm to classify such data? I tried using different algorithms in Weka, but almost all classifiers give a lot of false positives.

user2566092 · Accepted Answer · 2014-04-04 21:25:15Z

If you believe that the unlabelled data is mostly negatives, then probably the best thing to do is to label all unlabelled data as "negative" and run your classifier of choice. Note that when an unlabelled test point is predicted to be positive, the prediction is not necessarily wrong, because some of your unlabelled data could actually be positive. That makes it hard to judge how well your classifier is doing in your setting. If you believe that your unlabelled data might be biased toward the positives, then you're probably better off using so-called "one-class classifiers" trained on the positive data alone; one-class SVM is a popular example.
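To make the first suggestion concrete, here is a minimal sketch in plain Python, assuming a tiny made-up corpus and a hand-rolled multinomial naive Bayes as the "classifier of choice" (the answer does not name one, so any classifier could be substituted). The helper names `tokenize`, `train_naive_bayes`, `positive_log_odds` and `predict`, as well as the example sentences, are hypothetical and only serve the illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (deliberately simplistic)."""
    return text.lower().split()


def train_naive_bayes(positive_docs, unlabeled_docs):
    """Fit multinomial naive Bayes, treating every unlabeled doc as negative."""
    docs = {"pos": positive_docs, "neg": unlabeled_docs}
    counts = {label: Counter() for label in docs}
    for label, texts in docs.items():
        for text in texts:
            counts[label].update(tokenize(text))
    vocab = set(counts["pos"]) | set(counts["neg"])
    totals = {label: sum(c.values()) for label, c in counts.items()}
    n_docs = len(positive_docs) + len(unlabeled_docs)
    priors = {label: math.log(len(texts) / n_docs) for label, texts in docs.items()}
    return {"counts": counts, "totals": totals, "vocab": vocab, "priors": priors}


def positive_log_odds(model, text):
    """Log-odds of 'pos' versus 'neg' using add-one (Laplace) smoothing."""
    vocab_size = len(model["vocab"])
    score = model["priors"]["pos"] - model["priors"]["neg"]
    for token in tokenize(text):
        if token not in model["vocab"]:
            continue  # ignore words never seen during training
        for label, sign in (("pos", 1), ("neg", -1)):
            numerator = model["counts"][label][token] + 1
            denominator = model["totals"][label] + vocab_size
            score += sign * math.log(numerator / denominator)
    return score


def predict(model, text, threshold=0.0):
    """Return True when the text is classified as positive."""
    return positive_log_odds(model, text) > threshold


# Hypothetical data: known positives plus an unlabeled pool.
positive = [
    "cheap flights to paris",
    "book cheap flights today",
    "flights and hotel deals",
]
unlabeled = [
    "weather forecast for monday",
    "football match results",
    "cheap hotel flights deals",  # unlabeled, but looks like a positive
    "recipe for apple pie",
    "stock market news today",
]

model = train_naive_bayes(positive, unlabeled)
flagged = [doc for doc in unlabeled if predict(model, doc)]
print(flagged)
```

Any unlabeled document that ends up in `flagged` is a candidate hidden positive rather than a confirmed false positive, which is the evaluation caveat described above. Raising `threshold` is one simple way to trade recall for fewer positive predictions.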

Thank you for the response. I have a training set in which the positive class has features that classifiers can learn, but the negative class has no distinguishing features. The positive class is easily identifiable, while the negative class is just random. So I think I should be able to classify the text as positive or not. Would that be a one-class classifier?
