GPU performance is still slow even with the Keras fit_generator method

Question

I have a large dataset (5 GB) that I want to use to train a neural network model built with Keras. Although I am using an Nvidia Tesla P100 GPU, training is really slow: each epoch takes ~60-70 s with a batch size of 10000. After some reading and searching, I found that I could improve the training speed by using Keras' fit_generator instead of the usual fit. To do so, I wrote the following:

    from __future__ import print_function
    import numpy as np
    from keras import Sequential
    from keras.layers import Dense
    import keras
    from sklearn.model_selection import train_test_split


    def generator(C, r, batch_size):
        samples_per_epoch = C.shape[0]
        number_of_batches = samples_per_epoch / batch_size
        counter = 0

        while 1:
            X_batch = np.array(C[batch_size * counter:batch_size * (counter + 1)])
            y_batch = np.array(r[batch_size * counter:batch_size * (counter + 1)])
            counter += 1
            yield X_batch, y_batch

            # restart counter to yield data in the next epoch as well
            if counter >= number_of_batches:
                counter = 0


    if __name__ == "__main__":
        X, y = readDatasetFromFile()
        X_tr, X_ts, y_tr, y_ts = train_test_split(X, y, test_size=.2)

        model = Sequential()
        model.add(Dense(16, input_dim=X.shape[1]))
        model.add(keras.layers.advanced_activations.PReLU())
        model.add(Dense(16))
        model.add(keras.layers.advanced_activations.PReLU())
        model.add(Dense(16))
        model.add(keras.layers.advanced_activations.PReLU())
        model.add(Dense(1, activation='sigmoid'))

        model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

        batch_size = 1000
        model.fit_generator(generator(X_tr, y_tr, batch_size),
                            epochs=200,
                            steps_per_epoch=X.shape[0] / batch_size,
                            validation_data=generator(X_ts, y_ts, batch_size * 2),
                            validation_steps=X.shape[0] / batch_size * 2,
                            verbose=2,
                            use_multiprocessing=True)

        loss, accuracy = model.evaluate(X_ts, y_ts, verbose=0)
        print(loss, accuracy)

With fit_generator, the training time improved a little, but it is still slow (each epoch now takes ~40-50 s). Running nvidia-smi in the terminal shows that GPU utilization is only ~15%, which makes me wonder whether my code is wrong. I am posting my code above to ask whether there is a bug that is slowing down the GPU.

Thank you,

Did you try forcefully assigning a GPU to it by using CUDA_VISIBLE_DEVICES?
– Parthasarathy Subburaj, Commented Jun 27, 2019 at 17:00
@ParthasarathySubburaj Thank you for your quick response! How do I do that?
– Ethan C., Commented Jun 27, 2019 at 17:01

Sunderam Dubey · Accepted Answer · 2022-11-28 13:10:07Z

1

Try explicitly assigning the GPU like this:

    import os

    # Expose only GPU 0; for more than one GPU, use a single comma-separated string such as "0,1"
    os.environ["CUDA_VISIBLE_DEVICES"] = "0"
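As a minimal sketch of the value format: CUDA_VISIBLE_DEVICES takes one string of comma-separated device indices. The helper name `set_visible_gpus` below is hypothetical and only there for illustration; it is not part of Keras, TensorFlow, or CUDA.

```python
import os


def set_visible_gpus(*device_ids):
    """Hypothetical helper: expose only the given GPU indices to CUDA.

    Builds the single comma-separated string that CUDA_VISIBLE_DEVICES
    expects and stores it in the process environment.
    """
    value = ",".join(str(i) for i in device_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Single GPU (index 0), as in the answer above.
set_visible_gpus(0)
```

Calling `set_visible_gpus(0, 1)` instead would store the string "0,1", making the first two GPUs visible to the process.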

edited Nov 28, 2022 at 13:10

Sunderam Dubey


answered Jun 27, 2019 at 17:04

Parthasarathy Subburaj



2 Comments

Ethan C. Over a year ago

Thank you very much. Do I have to use this assignment before importing tensorflow?

Parthasarathy Subburaj Over a year ago

It's always better to import os first and set all your environment variables before importing other packages.
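To illustrate the ordering described in this comment, here is a rough sketch that sets the variable and reports whether the framework had already been imported at that point. The function name `configure_gpu_before_import` is hypothetical, and the check assumes the framework is imported under the module name "tensorflow". Setting the variable after CUDA has been initialized may have no effect, which is why the sketch warns in that case.

```python
import os
import sys


def configure_gpu_before_import(devices="0", framework="tensorflow"):
    """Hypothetical helper: set CUDA_VISIBLE_DEVICES and report ordering.

    Returns True if `framework` had not been imported yet when the
    variable was set, False if it was already loaded.
    """
    already_loaded = framework in sys.modules
    os.environ["CUDA_VISIBLE_DEVICES"] = devices
    return not already_loaded


if not configure_gpu_before_import("0"):
    print("Warning: tensorflow was imported before CUDA_VISIBLE_DEVICES was set")

# Framework imports would follow here, for example:
# import tensorflow as tf
```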
