
So far, I have come across a lot of advice and many papers on PU learning and unary (one-class) classification.

TL;DR: Can anyone suggest a semi-supervised binary classification method for the case where the labeled data all come from one class and the unlabeled data can come from either class? I also do not know the ratio of class A to class B within the unlabeled data.

The simplest answer I have found is a one-class SVM (see "Binary semi-supervised classification with positive only and unlabeled data set"), but I have far more unlabeled examples than labeled ones. I am also unsure whether either the positive or the negative class is rare enough for anomaly detection to apply.

Another suggested method is the two-step approach, which first extracts a set of reliable negative examples from the unlabeled data. However, I cannot really identify any subset of my data as reliably negative (https://www.cs.uic.edu/~liub/publications/ICDM-03.pdf).
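For illustration, one commonly described way to carry out the first step of a two-step approach is a "spy" heuristic: hide a few labeled positives among the unlabeled data, score everything, and treat unlabeled points that score below every spy as reliable negatives. The sketch below is only a toy under stated assumptions, not the linked paper's procedure: the nearest-centroid scorer is a stand-in for a real classifier, and the function names, `spy_fraction`, and the sample points are all hypothetical.

```python
import random
from statistics import mean

def centroid(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [mean(dim) for dim in zip(*points)]

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def positive_score(x, pos_center, unl_center):
    """Stand-in scorer: higher means closer to the positive centroid."""
    return sq_dist(x, unl_center) - sq_dist(x, pos_center)

def reliable_negatives(positives, unlabeled, spy_fraction=0.2, seed=0):
    """Return unlabeled points that score below every hidden spy."""
    rng = random.Random(seed)
    shuffled = positives[:]
    rng.shuffle(shuffled)
    n_spies = max(1, int(len(shuffled) * spy_fraction))
    spies, kept = shuffled[:n_spies], shuffled[n_spies:]
    pos_center = centroid(kept)
    # Spies are treated as if they were unlabeled while scoring.
    unl_center = centroid(unlabeled + spies)
    threshold = min(positive_score(s, pos_center, unl_center) for s in spies)
    return [u for u in unlabeled
            if positive_score(u, pos_center, unl_center) < threshold]

# Hypothetical toy data: labeled positives sit near the origin.
positives = [[0.0, 0.0], [0.2, 0.1], [-0.1, 0.3], [0.1, -0.2], [0.3, 0.2]]
unlabeled = [[-0.5, -0.5], [5.0, 5.0], [5.2, 4.9], [-0.4, -0.6], [4.8, 5.1]]
negatives = reliable_negatives(positives, unlabeled)
```

With so few spies the threshold is noisy, so on real data some unlabeled positives can fall below it. The second step would then train an ordinary binary classifier on the labeled positives versus the selected negatives.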

A third method is a weighted SVM (http://users.csc.tntech.edu/~weberle/Spring2011/CSC6910/Papers/posonly.pdf), but I am unsure whether I can make the same assumption as the authors, namely that my labeled positives are a random subset of all the positive data. I used a criterion to decide which examples were positive, so I assume the labeled data are biased.
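To make the role of that assumption concrete: if the labeled positives really are a random subset of all positives, a classifier trained to separate labeled from unlabeled examples outputs g(x) = p(labeled | x), and p(positive | x) = g(x) / c, where c = p(labeled | positive) can be estimated as the average of g over held-out labeled positives. The sketch below only shows that arithmetic. The scores are made-up numbers standing in for a real classifier's output, and the function names are hypothetical. If the labels were chosen by a criterion, as described above, c is not constant across positives and this correction would be biased.

```python
from statistics import mean

def estimate_label_frequency(held_out_positive_scores):
    """Estimate c = p(labeled | positive) as the mean score on held-out labeled positives."""
    return mean(held_out_positive_scores)

def positive_probability(score, c):
    """Convert g(x) = p(labeled | x) into p(positive | x) = g(x) / c, clipped to [0, 1]."""
    return min(1.0, max(0.0, score / c))

# Hypothetical scores from a labeled-vs-unlabeled classifier (assumed, not real data).
held_out_positive_scores = [0.5, 0.4, 0.6, 0.5]
c = estimate_label_frequency(held_out_positive_scores)

unlabeled_scores = [0.05, 0.25, 0.45]
probabilities = [positive_probability(g, c) for g in unlabeled_scores]
```

The clipping is a practical guard, since an estimated c can make g(x) / c exceed 1 for some examples.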

Overall, I have a lot of labeled data of the positive class, that is, the class I am looking for, and far more unlabeled data. (Though in a way, the labeled data could just as well be treated as the negative class.) I do not know the proportion of positive and negative examples in the unlabeled data: the two classes could be equally represented, or the positive class could be rarer than the negative class.

  • SO is for questions about programming. This is mostly about machine learning/stats, so it probably belongs on Cross Validated, another Stack Exchange site. Commented Jul 27, 2017 at 19:32
  • Your question is not a good fit for Stack Overflow, as it appears to be focused on pure machine learning. Please consider visiting Data Science SE or Cross Validated instead. Commented Jul 27, 2017 at 19:33
  • Ah, I tried Cross Validated, but only a few people looked at the question and none have answered. Commented Jul 27, 2017 at 20:06
