I have a large dataset (5 GB) that I want to use to train a neural network built with Keras. Although I am using an Nvidia Tesla P100 GPU, training is very slow: each epoch takes ~60-70 s with a batch size of 10000. After some reading and searching, I found that I might be able to speed up training by using Keras's fit_generator instead of the usual fit. To that end, I wrote the following:
    from __future__ import print_function
    import numpy as np
    from keras import Sequential
    from keras.layers import Dense
    import keras
    from sklearn.model_selection import train_test_split


    def generator(C, r, batch_size):
        samples_per_epoch = C.shape[0]
        number_of_batches = samples_per_epoch / batch_size
        counter = 0

        while 1:
            X_batch = np.array(C[batch_size * counter:batch_size * (counter + 1)])
            y_batch = np.array(r[batch_size * counter:batch_size * (counter + 1)])
            counter += 1
            yield X_batch, y_batch

            # restart counter to yield data in the next epoch as well
            if counter >= number_of_batches:
                counter = 0


    if __name__ == "__main__":
        X, y = readDatasetFromFile()
        X_tr, X_ts, y_tr, y_ts = train_test_split(X, y, test_size=.2)

        model = Sequential()
        model.add(Dense(16, input_dim=X.shape[1]))
        model.add(keras.layers.advanced_activations.PReLU())
        model.add(Dense(16))
        model.add(keras.layers.advanced_activations.PReLU())
        model.add(Dense(16))
        model.add(keras.layers.advanced_activations.PReLU())
        model.add(Dense(1, activation='sigmoid'))

        model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

        batch_size = 1000
        model.fit_generator(generator(X_tr, y_tr, batch_size),
                            epochs=200,
                            steps_per_epoch=X.shape[0] / batch_size,
                            validation_data=generator(X_ts, y_ts, batch_size * 2),
                            validation_steps=X.shape[0] / batch_size * 2,
                            verbose=2,
                            use_multiprocessing=True)

        loss, accuracy = model.evaluate(X_ts, y_ts, verbose=0)
        print(loss, accuracy)

After switching to fit_generator, the training time improved slightly but is still slow (each epoch now takes ~40-50 s). Running nvidia-smi in the terminal shows GPU utilization of only ~15%, which makes me wonder whether my code is wrong. I am posting my code above to ask whether there is a bug that is slowing down the GPU.
Thank you,
CUDA_VISIBLE_DEVICES?
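Separately from the device question, the step arithmetic in the posted code may be worth checking. steps_per_epoch is computed from X.shape[0] (the full dataset) rather than from X_tr, and `X.shape[0] / batch_size * 2` is evaluated left to right, so it multiplies the step count by 2 instead of dividing by the doubled batch size. Both would make each epoch loop over more batches than the split actually contains. Whether this accounts for the low GPU utilization is not established here. Below is a minimal sketch of how the step counts could be derived from each split instead. The helper names `batch_generator` and `steps_for` and the toy arrays are hypothetical and stand in for the real data:

```python
import math

import numpy as np


def batch_generator(features, labels, batch_size):
    """Yield (features, labels) batches forever, restarting after each full pass."""
    n_batches = math.ceil(features.shape[0] / batch_size)
    while True:
        for i in range(n_batches):
            start = i * batch_size
            stop = start + batch_size
            # The last batch of a pass may be shorter than batch_size.
            yield features[start:stop], labels[start:stop]


def steps_for(n_samples, batch_size):
    """Number of batches needed to cover n_samples exactly once."""
    return math.ceil(n_samples / batch_size)


# Toy stand-ins for X_tr / X_ts (hypothetical sizes, not the real dataset).
rng = np.random.default_rng(0)
X_tr, y_tr = rng.random((2500, 4)), rng.integers(0, 2, size=2500)
X_ts, y_ts = rng.random((600, 4)), rng.integers(0, 2, size=600)

batch_size = 1000
train_steps = steps_for(X_tr.shape[0], batch_size)    # based on X_tr, not X
val_steps = steps_for(X_ts.shape[0], batch_size * 2)  # parentheses around the doubled batch size

train_gen = batch_generator(X_tr, y_tr, batch_size)
rows_seen = sum(next(train_gen)[0].shape[0] for _ in range(train_steps))
print(train_steps, val_steps, rows_seen)
```

With values computed this way, `train_steps` would be passed as steps_per_epoch and `val_steps` as validation_steps in the fit_generator call, so that one epoch corresponds to one pass over each split.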