I have an algorithm that performs what should be a rather easy classification task: it takes a set of features and outputs a class. I would like to solve this with a machine learning algorithm instead.
But I have doubts about how to generate the training data. The features my algorithm takes as input are processed beforehand by other algorithms and, more importantly, they are also part of a feedback loop with the very algorithm I want to change.
Basically, the better my algorithm gets, the fewer false positives there should be. But with fewer false positives, the data I have to work with becomes more and more imbalanced, which makes the algorithm harder to train. I could reduce the performance of my algorithm on purpose to generate data, but then I am not sure whether the data I get is meaningful at all, because of the feedback loop.
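To make the imbalance concern concrete, here is a minimal sketch with made-up numbers. It assumes that only the samples my algorithm flags end up in the training data, and that the recall stays fixed while the false positive rate drops. The function names and all the counts are hypothetical, chosen only for illustration.

```python
def flagged_class_counts(n_positive, n_negative, recall, false_positive_rate):
    """Return (true positives, false positives) among the flagged samples."""
    true_positives = round(n_positive * recall)
    false_positives = round(n_negative * false_positive_rate)
    return true_positives, false_positives


def negative_share(true_positives, false_positives):
    """Fraction of flagged samples that are false positives."""
    total = true_positives + false_positives
    return false_positives / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical population: 1000 real positives, 1000 real negatives.
    for fpr in (0.20, 0.05, 0.01):
        tp, fp = flagged_class_counts(1000, 1000, recall=0.9, false_positive_rate=fpr)
        share = negative_share(tp, fp)
        print(f"FPR={fpr:.2f}: {tp} true positives, {fp} false positives, "
              f"negative share={share:.3f}")
```

Under these assumptions, the share of negative examples in the collected data shrinks as the false positive rate goes down, which is exactly the imbalance I am worried about.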
To me this seems like a chicken-and-egg problem.