Can I hack weighted loss function by creating multiple copies of data

Suppose we want to build a binary classifier with a weighted loss, i.e., one that penalizes different types of errors (false positives and false negatives) differently. At the same time, the software we are using does not support a weighted loss.

Can I hack it by manipulating my data?

For example, suppose we are working on a fraud detection problem (let's assume the prior is 50% fraud vs. 50% normal here, although most real fraud detection problems are extremely imbalanced), where we can afford some false positives (false alerts on normal transactions) but really want to avoid false negatives (missed fraud transactions).

Let's say we want the loss ratio to be 1:5 (false positive : false negative). Can we simply make 5 copies of each fraud transaction?

Intuitively, making these copies changes the prior distribution, so the model becomes more likely to label a transaction as fraud, and the number of false negatives should drop.
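For the 0-1 / Bayes-optimal case, this intuition can be made precise with a short calculation (a sketch, assuming the classifier thresholds a calibrated posterior $p(x) = P(\text{fraud} \mid x)$ at $1/2$):

```latex
% A cost-sensitive Bayes rule with costs c_FP = 1, c_FN = 5 predicts
% fraud whenever the expected cost of saying "normal" exceeds that of
% saying "fraud":
\[
  5\,P(\text{fraud}\mid x) > 1\cdot P(\text{normal}\mid x)
  \iff P(\text{fraud}\mid x) > \tfrac{1}{6}.
\]
% Duplicating every fraud example 5 times multiplies the prior odds by 5,
% and hence the posterior odds as well:
\[
  \frac{p'(x)}{1-p'(x)} = 5\cdot\frac{p(x)}{1-p(x)},
\]
% so thresholding the new posterior p' at 1/2 is the same as thresholding
% the original posterior p at 1/6, which is exactly the cost-sensitive
% Bayes rule above.
```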

My guess is that if we are truly minimizing 0-1 loss, this can do the trick, but if we are minimizing a proxy loss such as logistic or hinge loss (see this post), then this hack will not work well.

Any formal/mathematical explanations?
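One empirical check of the duplication idea: for any training objective that is a plain sum of per-example losses, 5 copies of an example contribute exactly the same terms as one copy with weight 5, so duplication should reproduce per-sample weighting exactly. The sketch below (assuming scikit-learn's `LogisticRegression`, which does support `sample_weight`, purely to verify the equivalence) compares the two on synthetic data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic binary data: one informative feature, one noise feature.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 2))
y = (X[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(int)

# (a) Duplicate every positive ("fraud") example so it appears 5 times.
pos = X[y == 1]
X_dup = np.vstack([X, np.repeat(pos, 4, axis=0)])  # 4 extra copies + original = 5
y_dup = np.concatenate([y, np.ones(4 * len(pos), dtype=int)])
clf_dup = LogisticRegression(tol=1e-8).fit(X_dup, y_dup)

# (b) Keep the data as-is but give positives sample weight 5.
w = np.where(y == 1, 5.0, 1.0)
clf_w = LogisticRegression(tol=1e-8).fit(X, y, sample_weight=w)

# The two training objectives are term-by-term identical, so the fitted
# parameters should agree up to solver tolerance.
print(np.max(np.abs(clf_dup.coef_ - clf_w.coef_)))
```

Note this only shows duplication equals *weighting the surrogate loss* (logistic here, and the same identity would hold for hinge loss); neither is the same thing as re-weighting the 0-1 errors directly, which is part of what the question is asking about.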
