3

I have created a dataset from sensor measurements and labels, and I have run some classification on it with good results. However, since the amount of data in my dataset is relatively small (1400 examples), I want to generate more data based on it. Each row of my dataset consists of 32 numeric values and a label.

What would be the best approach to generate more data based on my existing dataset? So far I have looked at Generative Adversarial Networks and Autoencoders, but I don't think these methods are suitable in my case.

Until now I have worked with Scikit-learn, but I could use other libraries as well.

2
  • It depends strongly on the data. If you understand how the sensor forms the signal, you can simulate it; if not, a GAN is usually not the worst idea, though it is a fairly complex one. If the sensors are real, physical sensors, adding different noise models to the signal will help. Commented Jul 29, 2019 at 9:42
  • @Dschoni I don't know how the sensors build the data, I mean they are just weight and vibration sensors, so I don't think I can simulate them. But if I understood correctly, GANs build data from a random distribution, and I want to generate "whole examples" if it is possible, meaning the 32 numerical columns plus the label. Is there another method of doing that or something close to it? Commented Jul 29, 2019 at 9:46

1 Answer

5

The keyword here is Data Augmentation. You take your available data and modify it slightly to generate additional data that differs a little from your source data.

Please take a look at this link. The author uses data augmentation to rotate and flip a cat image, generating 6 additional images with different perspectives from a single source image. If you transfer this idea to your sensor data, you can add some kind of random noise to your data to enlarge the dataset. You can find a simple example of data augmentation for time series data here.
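As a rough illustration of the noise idea, here is a minimal sketch using NumPy. The function name `augment_with_noise`, the parameters `n_copies` and `noise_scale`, and the choice of Gaussian noise scaled by each column's standard deviation are all assumptions made for this example; the stand-in arrays `X` and `y` only mimic the shape described in the question (1400 rows, 32 values, one label). A suitable noise level depends on your sensors.

```python
import numpy as np

def augment_with_noise(X, y, n_copies=3, noise_scale=0.05, seed=0):
    """Return the original rows plus n_copies noisy copies of each row.

    Hypothetical helper: Gaussian noise is an assumption, not a sensor model.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Scale the noise per column so features with different units
    # are perturbed by a comparable relative amount.
    sigma = noise_scale * X.std(axis=0)
    noisy = [X + rng.normal(0.0, 1.0, size=X.shape) * sigma
             for _ in range(n_copies)]
    X_aug = np.vstack([X, *noisy])
    # Each noisy copy keeps the label of the row it was derived from.
    y_aug = np.concatenate([y] * (n_copies + 1))
    return X_aug, y_aug

# Stand-in data with the shape described in the question.
rng = np.random.default_rng(42)
X = rng.normal(size=(1400, 32))
y = rng.integers(0, 2, size=1400)

X_aug, y_aug = augment_with_noise(X, y)
print(X_aug.shape, y_aug.shape)
```

If you evaluate a classifier on this, augment only the training split. Noisy copies of a test row that end up in the training set would make the results look better than they are.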

Another approach is to window the data and move the window by a small step, so that the data in each window differ slightly from the previous one.
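The windowing idea can be sketched as follows. This assumes you still have access to a longer raw signal from which the 32-value rows are cut, which the question does not confirm. The function name `sliding_windows` and the `window` and `step` values are chosen only for illustration.

```python
import numpy as np

def sliding_windows(signal, window=32, step=4):
    """Cut overlapping windows of length `window` from a 1-D signal.

    Hypothetical helper: consecutive windows start `step` samples apart.
    """
    signal = np.asarray(signal)
    if len(signal) < window:
        raise ValueError("signal is shorter than the window length")
    starts = range(0, len(signal) - window + 1, step)
    return np.stack([signal[s:s + window] for s in starts])

# Stand-in raw signal of 100 samples.
signal = np.arange(100, dtype=float)
windows = sliding_windows(signal, window=32, step=4)
print(windows.shape)
```

A smaller `step` yields more examples, but neighbouring windows overlap more and are therefore more strongly correlated.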

The people at the statistics Stack Exchange have written about this as well. Please check this for additional information.
