4

I have a new PC (Ubuntu 18.04) with a 2080Ti GPU. I'm trying to get it set up for training neural networks in Python using Keras (in an Anaconda environment), but I get a "Segmentation fault (core dumped)" error when trying to fit the model.

The code I'm using works completely fine on my Windows PC at work (which has a 1080Ti GPU). The error seems to be related to GPU memory, and I can see something odd happening when I run 'nvidia-smi': prior to fitting the model, around 800 MB of the available 11 GB of GPU memory is in use, but once I compile the model the available memory is all taken up. In the processes section I can see this is related to the Anaconda environment (i.e. ...ics-link/anaconda3/envs/py35/bin/python = 9677MiB).

    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 415.25       Driver Version: 415.25       CUDA Version: 10.0     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  GeForce RTX 208...  On   | 00000000:04:00.0  On |                  N/A |
    | 28%   44C    P2    51W / 250W |  10491MiB / 10986MiB |      7%      Default |
    +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID   Type   Process name                             Usage      |
    |=============================================================================|
    |    0      1507      G   /usr/lib/xorg/Xorg                            30MiB |
    |    0      1538      G   /usr/bin/gnome-shell                          57MiB |
    |    0      1844      G   /usr/lib/xorg/Xorg                           309MiB |
    |    0      1979      G   /usr/bin/gnome-shell                         177MiB |
    |    0      3816      G   /usr/lib/firefox/firefox                       6MiB |
    |    0      5451      G   ...-token=169F1B80118E535BC5002C22A81DD0FA    90MiB |
    |    0      5896      G   ...-token=631C5DCD90ADCF80959770937CE797E7   128MiB |
    |    0      6485      C   ...ics-link/anaconda3/envs/py35/bin/python  9677MiB |
    +-----------------------------------------------------------------------------+

Here is the code, just for reference:

    from __future__ import print_function
    import keras
    from keras.datasets import cifar10
    from keras.models import Sequential
    from keras.layers import Dense, Dropout, Flatten
    from keras.layers import Conv2D, MaxPooling2D, Activation, BatchNormalization
    from keras.callbacks import ModelCheckpoint, CSVLogger
    from keras import backend as K
    import numpy as np

    batch_size = 64
    num_classes = 10
    epochs = 10

    # input image dimensions
    img_rows, img_cols = 32, 32

    # the data, shuffled and split between train and test sets
    (x_train, y_train), (x_test, y_test) = cifar10.load_data()

    if K.image_data_format() == 'channels_first':
        x_train = x_train.reshape(x_train.shape[0], 3, img_rows, img_cols)
        x_test = x_test.reshape(x_test.shape[0], 3, img_rows, img_cols)
        input_shape = (1, img_rows, img_cols)
    else:
        x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 3)
        x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 3)
        input_shape = (img_rows, img_cols, 3)

    x_train = x_train.astype('float32')
    x_test = x_test.astype('float32')

    # normalise pixel values
    mean = np.mean(x_train,axis=(0,1,2,3))
    std = np.std(x_train,axis=(0,1,2,3))
    x_train = (x_train-mean)/(std+1e-7)
    x_test = (x_test-mean)/(std+1e-7)

    print('x_train shape:', x_train.shape)
    print(x_train.shape[0], 'train samples')
    print(x_test.shape[0], 'test samples')

    # convert class vectors to binary class matrices
    y_train = keras.utils.to_categorical(y_train, num_classes)
    y_test = keras.utils.to_categorical(y_test, num_classes)

    model = Sequential()
    model.add(Conv2D(32, kernel_size=(3, 3),
                     activation='relu',
                     input_shape=input_shape))
    model.add(Conv2D(64, (3, 3)))
    #model.add(BatchNormalization())
    model.add(Activation("relu"))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(128, (3, 3)))
    #model.add(BatchNormalization())
    model.add(Activation("relu"))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(256, (3, 3)))
    #model.add(BatchNormalization())
    model.add(Activation("relu"))
    model.add(MaxPooling2D(pool_size=(2, 2)))

    model.add(Flatten())
    model.add(Dense(1024))
    model.add(Activation("relu"))
    model.add(Dropout(0.25))
    model.add(Dense(1024))
    model.add(Activation("relu"))
    model.add(Dropout(0.25))
    model.add(Dense(1024))
    model.add(Activation("relu"))
    model.add(Dropout(0.25))
    model.add(Dense(num_classes, activation='softmax'))

    model.compile(loss=keras.losses.categorical_crossentropy,
                  optimizer=keras.optimizers.Adadelta(),
                  metrics=['accuracy'])

    #load weights from previous run
    #model.load_weights('model07_weights_best.hdf5')

    from keras.preprocessing.image import ImageDataGenerator

    datagen = ImageDataGenerator(
        featurewise_center=False,  # set input mean to 0 over the dataset
        samplewise_center=False,  # set each sample mean to 0
        featurewise_std_normalization=False,  # divide inputs by std of the dataset
        samplewise_std_normalization=False,  # divide each input by its std
        zca_whitening=False,  # apply ZCA whitening
        rotation_range=0.1,  # randomly rotate images in the range (degrees, 0 to 180)
        width_shift_range=0.1,  # randomly shift images horizontally (fraction of total width)
        height_shift_range=0.1,  # randomly shift images vertically (fraction of total height)
        horizontal_flip=True,  # randomly flip images
        vertical_flip=False)  # randomly flip images

    # Compute quantities required for feature-wise normalization
    # (std, mean, and principal components if ZCA whitening is applied).
    datagen.fit(x_train)

    #save weights and log
    checkpoint = ModelCheckpoint("model14_weights_best.hdf5", monitor='val_acc', verbose=1, save_best_only=True, mode='max')
    csv_logger = CSVLogger('model14_loss_log.csv', append=True, separator=';')
    callbacks_list = [checkpoint,csv_logger]

    # Fit the model on the batches generated by datagen.flow().
    model.fit_generator(datagen.flow(x_train, y_train,
                                     batch_size=batch_size),
                        epochs=epochs,
                        validation_data=(x_test, y_test),
                        callbacks = callbacks_list
                        )

I'm not expecting anything to take up a great deal of space on the GPU, but it seems to be saturated. As I mentioned, it works on my Windows PC.

Any ideas as to what might cause this?

1
  • I'll just add that I'm installing tensorflow-gpu and keras via Anaconda, so cuda and cudnn get installed automatically. Commented Jan 24, 2019 at 13:20

2 Answers

2

I don’t believe this has anything to do with memory size. I have been dealing with this recently. In my experience, a segmentation fault here points to a failure in the parallelized, GPU-side part of the training process rather than to an out-of-memory condition: you would not get this error if the process were running sequentially, no matter how big your dataset is. There is also no need to worry about your deep learning settings.

Since you are setting up a new machine, I believe there are two likely causes of the segmentation fault in your context.

First, I would check that the GPU is installed correctly. Based on the details you provided, though, I believe the issue is more likely the second cause, the module (Keras in your case):

  • In this case, you may have something wrong in your installation of the module or one of its dependencies. I would recommend removing it, cleaning everything up, and reinstalling it.

  • Are you sure tensorflow-gpu is installed properly? What about cuda and cudnn?

If you believe Keras is correctly installed, try this test code:

    from tensorflow.python.client import device_lib
    print(device_lib.list_local_devices())

This prints the devices TensorFlow can see, which tells you whether it is using a CPU or a GPU backend.
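The device list can be long, so here is a minimal sketch that keeps only the GPU entries. The helper names `gpu_device_names` and `visible_gpus` are made up for illustration; the sketch assumes each entry returned by `device_lib.list_local_devices()` exposes `name` and `device_type`, and that TensorFlow imports cleanly.

```python
def gpu_device_names(devices):
    """Return the names of all entries whose device_type is 'GPU'."""
    return [d.name for d in devices if d.device_type == "GPU"]


def visible_gpus():
    """List the GPUs TensorFlow can see (needs a working TensorFlow install)."""
    # Imported lazily so gpu_device_names() stays usable without TensorFlow.
    from tensorflow.python.client import device_lib
    return gpu_device_names(device_lib.list_local_devices())
```

If `visible_gpus()` returns an empty list, TensorFlow is most likely falling back to the CPU, which would suggest an installation problem rather than a memory one.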

I doubt you will get the segmentation fault again if all of the above steps went well.

Check this reference for testing TensorFlow on a GPU.
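If the crash persists after reinstalling, it may help to find out which Python call was running when it happened. The sketch below uses the standard-library `faulthandler` module for that; it is only one possible way to narrow things down, and the function name `enable_crash_trace` and the log file name are assumptions chosen for illustration.

```python
import faulthandler


def enable_crash_trace(log_path="segfault_trace.log"):
    """Dump the Python traceback to log_path if the interpreter crashes."""
    # The file has to stay open for as long as faulthandler is enabled,
    # so it is returned to the caller instead of being closed here.
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file
```

Calling `enable_crash_trace()` once at the top of the training script, before the model is built, should leave a traceback in the log file when the segmentation fault occurs. That traceback shows the Python line that triggered the native crash, not the faulty native code itself.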


1 Comment

I agree with your statement that this error is not caused by out-of-memory, but disagree with two of your potential reasons. I ran the same project on the same machine successfully but just failed when I changed the model to perform a different task.
1

If it's a memory issue, you should be able to train with a lower batch size. Try reducing the batch size to 32 and, if that doesn't work, keep reducing it down to a batch size of 1 while observing the GPU usage.

Also add the following code at the top of your script. It allocates GPU memory dynamically, so you will be able to see how much GPU memory is actually used/required with smaller batch sizes.
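As a small sketch of that schedule, the hypothetical helper below (the name `halving_batch_sizes` is an assumption) lists the batch sizes to try, halving from the current value down to 1.

```python
def halving_batch_sizes(start=64):
    """Yield start, start // 2, start // 4, ... down to 1."""
    size = start
    while size >= 1:
        yield size
        size //= 2
```

Since a segmentation fault kills the whole Python process, each batch size is probably best tried in a separate run of the script rather than inside one loop.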

    import tensorflow as tf
    from keras.backend.tensorflow_backend import set_session

    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True  # dynamically grow the memory used on the GPU
    config.log_device_placement = True  # to log device placement (on which device the operation ran)
                                        # (nothing gets printed in Jupyter, only if you run it standalone)
    sess = tf.Session(config=config)
    set_session(sess)  # set this TensorFlow session as the default session for Keras

Source: https://github.com/keras-team/keras/issues/4161
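To observe the GPU usage from a script rather than by reading the full `nvidia-smi` table, something like the following sketch could be used. It assumes `nvidia-smi` is on the PATH and relies on its `--query-gpu` and `--format=csv,noheader,nounits` options; the function names are made up for illustration.

```python
import subprocess


def parse_memory_csv(text):
    """Parse 'used, total' lines (values in MiB) into (used, total) tuples."""
    rows = []
    for line in text.strip().splitlines():
        used, total = (int(field) for field in line.split(","))
        rows.append((used, total))
    return rows


def gpu_memory_mib():
    """Return (used, total) in MiB for each GPU reported by nvidia-smi."""
    out = subprocess.check_output(
        ["nvidia-smi",
         "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        universal_newlines=True,
    )
    return parse_memory_csv(out)
```

Calling `gpu_memory_mib()` before and after compiling the model, with `allow_growth` enabled, should show how much memory the model actually needs instead of the near-total allocation TensorFlow makes by default.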

I hope it will help.
