I have what should be a rather easy classification task, with a set of features and a class output, that I would like to solve with a machine learning algorithm.

But I have issues and doubts about the data generation. The features my algorithm uses as inputs are processed beforehand by other algorithms and, more importantly, they also feed back into the algorithm I want to change.

Basically, the better my algorithm gets, the fewer false positives there should be. But with fewer false positives, the data I have to work with becomes more and more imbalanced, which makes the algorithm harder to train. I could deliberately reduce the performance of my algorithm in order to generate data, but then I am not sure whether the data I get is meaningful at all, because of the feedback loop.

To me this seems like a chicken-and-egg problem.

2 Answers

Are you perhaps doing ensembles?

Usually, for an imbalanced dataset, the easiest approach is to oversample or undersample the data: you either repeat samples from the classes with few examples or drop samples from the classes with very high frequency, so that you end up with a balanced dataset.
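As a minimal sketch, assuming your data sits in NumPy arrays `X` and `y` and the minority class is labeled `1` (both are placeholders here), random oversampling with scikit-learn's `resample` could look like this:

```python
# Random-oversampling sketch; X, y and the 0/1 labels are placeholders.
import numpy as np
from sklearn.utils import resample

X = np.random.randn(1000, 5)                        # placeholder features
y = np.r_[np.zeros(950), np.ones(50)].astype(int)   # placeholder imbalanced labels

X_min, y_min = X[y == 1], y[y == 1]   # minority class
X_maj, y_maj = X[y == 0], y[y == 0]   # majority class

# Oversample the minority class (with replacement) up to the majority size
X_min_up, y_min_up = resample(X_min, y_min, replace=True,
                              n_samples=len(X_maj), random_state=42)

X_balanced = np.vstack([X_maj, X_min_up])
y_balanced = np.concatenate([y_maj, y_min_up])
```

Undersampling works the same way, just with `replace=False` on the majority class and `n_samples=len(X_min)`.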

Another technique is to weight the classes inversely to their frequencies.
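For example, most scikit-learn classifiers accept a `class_weight` argument; `class_weight='balanced'` reweights classes inversely proportional to their frequencies (the choice of classifier below is just illustrative):

```python
# Class-weighting sketch; LogisticRegression is only an illustrative choice.
from sklearn.linear_model import LogisticRegression

clf = LogisticRegression(class_weight='balanced')  # weights ~ 1 / class frequency
clf.fit(X, y)                                      # X, y as in the snippet above
```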

Yet another option is to build a model that generates artificial samples, as in generative adversarial networks.
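A full GAN is beyond the scope of an answer, but a toy sketch of the idea (assuming PyTorch; the layer sizes, learning rates, and the `X_minority` placeholder are all arbitrary choices, not a recipe) might look like this:

```python
# Toy GAN sketch for synthesizing extra minority-class feature vectors.
import torch
import torch.nn as nn

n_features = 10   # placeholder: dimensionality of your feature vectors
latent_dim = 8    # placeholder: size of the generator's noise input

generator = nn.Sequential(
    nn.Linear(latent_dim, 32), nn.ReLU(),
    nn.Linear(32, n_features),
)
discriminator = nn.Sequential(
    nn.Linear(n_features, 32), nn.ReLU(),
    nn.Linear(32, 1), nn.Sigmoid(),
)

opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
bce = nn.BCELoss()

X_minority = torch.randn(100, n_features)  # placeholder: real minority-class samples

for epoch in range(200):
    # Discriminator step: tell real minority samples apart from generated ones
    fake = generator(torch.randn(len(X_minority), latent_dim)).detach()
    d_loss = (bce(discriminator(X_minority), torch.ones(len(X_minority), 1))
              + bce(discriminator(fake), torch.zeros(len(fake), 1)))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator step: try to fool the discriminator
    gen_out = generator(torch.randn(len(X_minority), latent_dim))
    g_loss = bce(discriminator(gen_out), torch.ones(len(X_minority), 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()

# Draw synthetic minority-class samples to augment the training set
synthetic_samples = generator(torch.randn(500, latent_dim)).detach()
```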

This does sound like a bad idea, since you are selecting your data beforehand and are therefore likely to introduce sample bias. Have you looked at anomaly detection approaches?
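In that framing you would treat the rare events as outliers rather than as a second class. A minimal sketch with scikit-learn's `IsolationForest` (just one possible detector; the data and the `contamination` value are placeholders) could be:

```python
# Anomaly-detection sketch; IsolationForest is only one possible detector.
import numpy as np
from sklearn.ensemble import IsolationForest

X = np.random.randn(1000, 5)         # placeholder: your feature vectors

detector = IsolationForest(contamination=0.05, random_state=0)  # expected outlier rate is a guess
detector.fit(X)                       # fit on the (mostly normal) data
labels = detector.predict(X)          # +1 = inlier, -1 = anomaly / rare event
```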
