Can I hack weighted loss function by creating multiple copies of data

Suppose we want to build a binary classifier with a weighted loss, i.e., one that penalizes different types of errors (false positives and false negatives) differently. At the same time, the software we are using does not support a weighted loss.

Can I hack it by manipulating my data?

For example, suppose we are working on a fraud detection problem (let's assume the prior is 50% fraud vs. 50% normal here, although most real fraud detection problems are extremely imbalanced), where we can afford some false positives (false alerts on normal transactions) but really want to avoid false negatives (missed fraud transactions).

Let's say we want the loss ratio to be 1:5 (false positive : false negative). Can we simply make 5 copies of each fraud transaction?

Intuitively, making these copies changes the prior distribution, so the model becomes more likely to label a transaction as fraud, and the number of false negatives should drop.
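For the 0-1 / Bayes-optimal case, this intuition can be made precise with a short calculation (a sketch, assuming the classifier thresholds a calibrated posterior $p(x) = P(\text{fraud} \mid x)$ at $1/2$):

```latex
% A cost-sensitive Bayes rule with costs c_FP = 1, c_FN = 5 predicts
% fraud whenever the expected cost of saying "normal" exceeds that of
% saying "fraud":
\[
  5\,P(\text{fraud}\mid x) > 1\cdot P(\text{normal}\mid x)
  \iff P(\text{fraud}\mid x) > \tfrac{1}{6}.
\]
% Duplicating every fraud example 5 times multiplies the prior odds by 5,
% and hence the posterior odds as well:
\[
  \frac{p'(x)}{1-p'(x)} = 5\cdot\frac{p(x)}{1-p(x)},
\]
% so thresholding the new posterior p' at 1/2 is the same as thresholding
% the original posterior p at 1/6, which is exactly the cost-sensitive
% Bayes rule above.
```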

My guess is that if we are truly minimizing 0-1 loss, this can do the trick, but if we are minimizing a proxy loss such as logistic or hinge loss (see this post), then this hack will not work well.

Any formal/mathematical explanations?
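One empirical check of the duplication idea: for any training objective that is a plain sum of per-example losses, 5 copies of an example contribute exactly the same terms as one copy with weight 5, so duplication should reproduce per-sample weighting exactly. The sketch below (assuming scikit-learn's `LogisticRegression`, which does support `sample_weight`, purely to verify the equivalence) compares the two on synthetic data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic binary data: one informative feature, one noise feature.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 2))
y = (X[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(int)

# (a) Duplicate every positive ("fraud") example so it appears 5 times.
pos = X[y == 1]
X_dup = np.vstack([X, np.repeat(pos, 4, axis=0)])  # 4 extra copies + original = 5
y_dup = np.concatenate([y, np.ones(4 * len(pos), dtype=int)])
clf_dup = LogisticRegression(tol=1e-8).fit(X_dup, y_dup)

# (b) Keep the data as-is but give positives sample weight 5.
w = np.where(y == 1, 5.0, 1.0)
clf_w = LogisticRegression(tol=1e-8).fit(X, y, sample_weight=w)

# The two training objectives are term-by-term identical, so the fitted
# parameters should agree up to solver tolerance.
print(np.max(np.abs(clf_dup.coef_ - clf_w.coef_)))
```

Note this only shows duplication equals *weighting the surrogate loss* (logistic here, and the same identity would hold for hinge loss); neither is the same thing as re-weighting the 0-1 errors directly, which is part of what the question is asking about.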
