
I am trying to classify text for which I have only positive examples and unlabeled data. I want the algorithm to identify the positive data and mark everything else as negative. What would be a good machine learning algorithm for this kind of data? I tried several algorithms in Weka, but almost all of the classifiers produce a lot of false positives.

1 Answer


If you believe that the unlabelled data is mostly negative, then probably the best thing to do is to label all of it as "negative" and run your classifier of choice. Note that when an unlabelled test point is predicted to be positive, this does not necessarily mean the prediction is wrong, because some of your unlabelled data could in fact be positive. That makes it hard to judge how well your classifier is doing in your setting. If you believe that your unlabelled data might be biased toward the positives, then you're probably better off training a so-called "one-class classifier" on the positive data alone; a popular example is the one-class SVM.
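To make the two options concrete, here is a minimal sketch of both. It uses scikit-learn rather than Weka, purely as an assumption for illustration. The toy documents, the choice of `LogisticRegression` as the "classifier of choice", and the `class_weight` and `nu` settings are all hypothetical and would need tuning on real data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.svm import OneClassSVM

# Hypothetical toy data: a few known positives and an unlabelled pool.
positive_docs = [
    "cheap flights to paris book now",
    "book cheap flights and hotels",
    "last minute flights discount",
    "discount hotels and flights deals",
]
unlabelled_docs = [
    "the weather is nice today",
    "my cat sleeps all day",
    "cheap flights to rome this weekend",  # may secretly be a positive
    "football results from last night",
    "how to bake sourdough bread",
    "new phone review and benchmarks",
]

# Build one shared vocabulary so both approaches see the same features.
vectorizer = TfidfVectorizer()
X_all = vectorizer.fit_transform(positive_docs + unlabelled_docs)
X_pos = X_all[: len(positive_docs)]

# Option 1: treat every unlabelled document as negative and train an
# ordinary binary classifier. class_weight="balanced" is an assumption
# meant to offset the usually much larger unlabelled pool.
y = [1] * len(positive_docs) + [0] * len(unlabelled_docs)
binary_clf = LogisticRegression(class_weight="balanced")
binary_clf.fit(X_all, y)

# Option 2: fit a one-class SVM on the positives only. nu is roughly an
# upper bound on the fraction of training positives treated as outliers;
# 0.1 is an illustrative value, not a recommendation.
one_class_clf = OneClassSVM(kernel="linear", nu=0.1)
one_class_clf.fit(X_pos)

new_docs = ["flights to berlin on discount", "recipe for apple pie"]
X_new = vectorizer.transform(new_docs)

binary_pred = binary_clf.predict(X_new)        # 1 = positive, 0 = negative
one_class_pred = one_class_clf.predict(X_new)  # +1 = inlier, -1 = outlier

for doc, b, o in zip(new_docs, binary_pred, one_class_pred):
    print(f"{doc!r}: binary={b}, one-class={o}")
```

Keep in mind the caveat above when reading the output: a "positive" prediction on an unlabelled document is not necessarily a false positive, so accuracy measured against the assumed negative labels will understate how well either model is doing.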


1 Comment

Thank you for the response. I have a training set in which the positive class has features that classifiers can learn, but the negative class has no characteristic features. The positive class is easily identifiable, while the negative class is just random. So I think I should be able to classify text as positive or not. Would that be a one-class classifier?
