
I have a large dataset (about 5 GB) that I want to use for training a neural network model designed using Keras. Although I am using an Nvidia Tesla P100 GPU, training is really slow (each epoch takes ~60-70 s with a batch size of 10000). After reading and searching, I found out that I can improve the training speed by using Keras's fit_generator instead of the typical fit. To do so, I coded the following:

from __future__ import print_function
import numpy as np
from keras import Sequential
from keras.layers import Dense
import keras
from sklearn.model_selection import train_test_split


# Generator that yields consecutive batches of C (features) and r (labels)
def generator(C, r, batch_size):
    samples_per_epoch = C.shape[0]
    number_of_batches = samples_per_epoch / batch_size
    counter = 0
    while 1:
        X_batch = np.array(C[batch_size * counter:batch_size * (counter + 1)])
        y_batch = np.array(r[batch_size * counter:batch_size * (counter + 1)])
        counter += 1
        yield X_batch, y_batch

        # restart counter to yield data in the next epoch as well
        if counter >= number_of_batches:
            counter = 0


if __name__ == "__main__":
    X, y = readDatasetFromFile()
    X_tr, X_ts, y_tr, y_ts = train_test_split(X, y, test_size=.2)

    model = Sequential()
    model.add(Dense(16, input_dim=X.shape[1]))
    model.add(keras.layers.advanced_activations.PReLU())
    model.add(Dense(16))
    model.add(keras.layers.advanced_activations.PReLU())
    model.add(Dense(16))
    model.add(keras.layers.advanced_activations.PReLU())
    model.add(Dense(1, activation='sigmoid'))

    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

    batch_size = 1000
    model.fit_generator(generator(X_tr, y_tr, batch_size), epochs=200,
                        steps_per_epoch=X.shape[0] / batch_size,
                        validation_data=generator(X_ts, y_ts, batch_size * 2),
                        validation_steps=X.shape[0] / batch_size * 2,
                        verbose=2, use_multiprocessing=True)

    loss, accuracy = model.evaluate(X_ts, y_ts, verbose=0)
    print(loss, accuracy)

After switching to fit_generator, the training time improved a little, but it is still slow (each epoch now takes ~40-50 s). Running nvidia-smi in the terminal shows GPU utilization of only ~15%, which makes me wonder if my code is wrong. I am posting my code above to kindly ask whether there is a bug that is slowing down the GPU.
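For what it's worth, here is the quick sanity check I could use to confirm that TensorFlow even sees the P100 (just a small sketch, separate from the training script above):

from tensorflow.python.client import device_lib

# Print every device TensorFlow can see; the Tesla P100 should appear as a GPU entry.
# If only CPU devices are listed, the model is silently training on the CPU.
for device in device_lib.list_local_devices():
    print(device.name, device.device_type)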

Thank you,

  • Did you try forcefully assigning a GPU to it by using CUDA_VISIBLE_DEVICES? Commented Jun 27, 2019 at 17:00
  • @ParthasarathySubburaj Thank you for your quick response! How do I do that? Commented Jun 27, 2019 at 17:01

1 Answer


Just try assigning GPUs forcefully like so:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # or, to use more than one GPU, set it to "0,1"
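To double-check that TensorFlow actually picks up the GPU once the variable is set, a quick check like this should print True (a small sketch; tf.test.is_gpu_available() is the standard call in TF 1.x):

import tensorflow as tf

# True means the P100 (made visible via CUDA_VISIBLE_DEVICES) is usable by TensorFlow.
print(tf.test.is_gpu_available())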

2 Comments

Thank you very much. Do I have to use this assignment before importing tensorflow?
It's always better to import os first and set all your environment variables before importing other packages.
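To illustrate that ordering, a minimal sketch (the GPU id "0" is only an example; the point is that the environment variable is set before TensorFlow/Keras is imported):

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"   # set environment variables first

import numpy as np                          # ...then import everything else
import keras
from keras import Sequential
from keras.layers import Dense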
