I'm using the following generator:

datagen = ImageDataGenerator(
    fill_mode='nearest',
    cval=0,
    rescale=1. / 255,
    rotation_range=90,
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.5,
    horizontal_flip=True,
    vertical_flip=True,
    validation_split=0.5,
)

train_generator = datagen.flow_from_dataframe(
    dataframe=traindf,
    directory=train_path,
    x_col="id",
    y_col=classes,
    subset="training",
    batch_size=8,
    seed=123,
    shuffle=True,
    class_mode="other",
    target_size=(64, 64))

STEP_SIZE_TRAIN = train_generator.n // train_generator.batch_size

valid_generator = datagen.flow_from_dataframe(
    dataframe=traindf,
    directory=train_path,
    x_col="id",
    y_col=classes,
    subset="validation",
    batch_size=8,
    seed=123,
    shuffle=True,
    class_mode="raw",
    target_size=(64, 64))

STEP_SIZE_VALID = valid_generator.n // valid_generator.batch_size

The problem is that the validation data is also being augmented, which I guess is not something you'd want when validating during training. How do I avoid this? I don't have separate directories for training and validation; I want to use a single dataframe to train the network. Any suggestions?

4 Answers

3

The solution my friend found was to use a second generator for validation, one that applies no augmentation but has the same validation split, and to turn shuffling off.

datagen = ImageDataGenerator(
    # featurewise_center=True,
    # featurewise_std_normalization=True,
    rescale=1. / 255,
    rotation_range=90,
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.5,
    horizontal_flip=True,
    vertical_flip=True,
    validation_split=0.15,
)

valid_datagen = ImageDataGenerator(rescale=1. / 255, validation_split=0.15)

and then you can define the two generators as

train_generator = datagen.flow_from_dataframe(
    dataframe=traindf,
    directory=train_path,
    x_col="id",
    y_col=classes,
    subset="training",
    batch_size=64,
    seed=123,
    shuffle=False,
    class_mode="raw",
    target_size=(224, 224))

STEP_SIZE_TRAIN = train_generator.n // train_generator.batch_size

valid_generator = valid_datagen.flow_from_dataframe(
    dataframe=traindf,
    directory=train_path,
    x_col="id",
    y_col=classes,
    subset="validation",
    batch_size=64,
    seed=123,
    shuffle=False,
    class_mode="raw",
    target_size=(224, 224))

STEP_SIZE_VALID = valid_generator.n // valid_generator.batch_size
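
Since this approach relies on two different ImageDataGenerator objects selecting complementary rows of the same dataframe, it may be worth checking that no file ends up in both subsets. The sketch below is one way to do that. The helper name find_overlap and the file names are made up for illustration, and it assumes the iterators returned by flow_from_dataframe expose a filenames attribute, which you should confirm on your installed version.

```python
def find_overlap(train_files, valid_files):
    """Return the sorted filenames that appear in both subsets."""
    return sorted(set(train_files) & set(valid_files))


# Hypothetical filename lists standing in for train_generator.filenames
# and valid_generator.filenames (attribute name is an assumption).
train_files = ["img_003.png", "img_004.png", "img_005.png"]
valid_files = ["img_001.png", "img_002.png"]

overlap = find_overlap(train_files, valid_files)
if overlap:
    raise ValueError(f"{len(overlap)} files are in both subsets: {overlap[:5]}")
print("no overlap between training and validation files")
```

With the real generators you would call find_overlap(train_generator.filenames, valid_generator.filenames) right after creating them, before starting training.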

1

You can resolve this issue with a small change to your code. Add one more ImageDataGenerator object, named test_datagen, to which you pass only the rescale parameter and no augmentation. The augmentation settings then live in a separate object (datagen, in your case). You also have to split your training and testing data into separate directories before passing them to the train and test data generators. Here is some sample code from TensorFlow that you can also refer to.

# For training data
train_datagen = ImageDataGenerator(
    rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

# For testing data: rescaling only, no augmentation
test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    'data/train',
    target_size=(150, 150),
    batch_size=32,
    class_mode='binary')

validation_generator = test_datagen.flow_from_directory(
    'data/validation',
    target_size=(150, 150),
    batch_size=32,
    class_mode='binary')

# Note: fit_generator is deprecated in TensorFlow 2.1+, where model.fit
# accepts generators directly.
model.fit_generator(
    train_generator,
    steps_per_epoch=2000,
    epochs=50,
    validation_data=validation_generator,
    validation_steps=800)

1 Comment

I can't use flow from directory... I have a dataframe :(
1

You should see this related question's answer: When using Data augmentation is it ok to validate only with the original images?

It says to use ImageDataGenerator with empty parameters when loading validation data, such as:

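# aug_params is a dict of augmentation keyword arguments
train_gen = ImageDataGenerator(**aug_params).flow_from_directory(train_dir)
# No arguments: validation images are loaded without augmentation
valid_gen = ImageDataGenerator().flow_from_directory(valid_dir)
model.fit_generator(train_gen, validation_data=valid_gen)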

0

Try splitting your dataframe into separate dataframes. Then you can just create a separate generator for each dataframe.
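
A minimal sketch of that idea follows, reusing the names traindf, train_path and classes from the question. The helpers split_indices and make_generators are hypothetical, the augmentation settings are only placeholders, and the code assumes a TensorFlow version that still ships ImageDataGenerator. The dataframe is split once by row position, and only the training generator gets augmentation.

```python
import random


def split_indices(n_rows, valid_frac=0.15, seed=123):
    """Shuffle row positions once and split them into (train, valid) lists."""
    positions = list(range(n_rows))
    random.Random(seed).shuffle(positions)
    n_valid = int(n_rows * valid_frac)
    return positions[n_valid:], positions[:n_valid]


def make_generators(traindf, train_path, classes,
                    batch_size=64, target_size=(224, 224)):
    """Build an augmenting training generator and a plain validation one."""
    # Imported here so split_indices stays usable without TensorFlow.
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    train_idx, valid_idx = split_indices(len(traindf))
    train_df = traindf.iloc[train_idx].reset_index(drop=True)
    valid_df = traindf.iloc[valid_idx].reset_index(drop=True)

    # Augmentation only for training (placeholder settings);
    # validation images are just rescaled.
    train_datagen = ImageDataGenerator(
        rescale=1. / 255, rotation_range=90, horizontal_flip=True)
    valid_datagen = ImageDataGenerator(rescale=1. / 255)

    common = dict(directory=train_path, x_col="id", y_col=classes,
                  class_mode="raw", batch_size=batch_size,
                  target_size=target_size)
    train_gen = train_datagen.flow_from_dataframe(
        dataframe=train_df, shuffle=True, seed=123, **common)
    valid_gen = valid_datagen.flow_from_dataframe(
        dataframe=valid_df, shuffle=False, **common)
    return train_gen, valid_gen
```

Because the split happens before any generator is created, neither ImageDataGenerator needs validation_split or subset, and the training generator can keep shuffle=True.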
