I have a new PC running Ubuntu 18.04 with a 2080Ti GPU. I'm trying to set it up for training neural networks in Python using Keras (in an Anaconda environment), but I get a "Segmentation fault (core dumped)" error when I try to fit the model.
The same code works completely fine on my Windows PC at work (which has a 1080Ti GPU). The error seems to be related to GPU memory, and I can see something odd happening: when I run 'nvidia-smi' before fitting the model, around 800MB of the available 11GB of GPU memory is in use, but once I compile the model nearly all of the remaining memory is taken up. In the Processes section I can see that this is down to the Python process in the Anaconda environment (i.e. ...ics-link/anaconda3/envs/py35/bin/python = 9677MiB):
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 415.25       Driver Version: 415.25       CUDA Version: 10.0     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  GeForce RTX 208...  On   | 00000000:04:00.0  On |                  N/A |
    | 28%   44C    P2    51W / 250W |  10491MiB / 10986MiB |      7%      Default |
    +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID   Type   Process name                             Usage      |
    |=============================================================================|
    |    0      1507      G   /usr/lib/xorg/Xorg                            30MiB |
    |    0      1538      G   /usr/bin/gnome-shell                          57MiB |
    |    0      1844      G   /usr/lib/xorg/Xorg                           309MiB |
    |    0      1979      G   /usr/bin/gnome-shell                         177MiB |
    |    0      3816      G   /usr/lib/firefox/firefox                       6MiB |
    |    0      5451      G   ...-token=169F1B80118E535BC5002C22A81DD0FA    90MiB |
    |    0      5896      G   ...-token=631C5DCD90ADCF80959770937CE797E7   128MiB |
    |    0      6485      C   ...ics-link/anaconda3/envs/py35/bin/python  9677MiB |
    +-----------------------------------------------------------------------------+

Here is the code, just for reference:
    from __future__ import print_function
    import keras
    from keras.datasets import cifar10
    from keras.models import Sequential
    from keras.layers import Dense, Dropout, Flatten
    from keras.layers import Conv2D, MaxPooling2D, Activation, BatchNormalization
    from keras.callbacks import ModelCheckpoint, CSVLogger
    from keras import backend as K
    import numpy as np

    batch_size = 64
    num_classes = 10
    epochs = 10

    # input image dimensions
    img_rows, img_cols = 32, 32

    # the data, shuffled and split between train and test sets
    (x_train, y_train), (x_test, y_test) = cifar10.load_data()

    if K.image_data_format() == 'channels_first':
        x_train = x_train.reshape(x_train.shape[0], 3, img_rows, img_cols)
        x_test = x_test.reshape(x_test.shape[0], 3, img_rows, img_cols)
        input_shape = (3, img_rows, img_cols)
    else:
        x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 3)
        x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 3)
        input_shape = (img_rows, img_cols, 3)

    x_train = x_train.astype('float32')
    x_test = x_test.astype('float32')

    # normalise pixel values
    mean = np.mean(x_train, axis=(0, 1, 2, 3))
    std = np.std(x_train, axis=(0, 1, 2, 3))
    x_train = (x_train - mean) / (std + 1e-7)
    x_test = (x_test - mean) / (std + 1e-7)

    print('x_train shape:', x_train.shape)
    print(x_train.shape[0], 'train samples')
    print(x_test.shape[0], 'test samples')

    # convert class vectors to binary class matrices
    y_train = keras.utils.to_categorical(y_train, num_classes)
    y_test = keras.utils.to_categorical(y_test, num_classes)

    model = Sequential()
    model.add(Conv2D(32, kernel_size=(3, 3),
                     activation='relu',
                     input_shape=input_shape))
    model.add(Conv2D(64, (3, 3)))
    #model.add(BatchNormalization())
    model.add(Activation("relu"))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(128, (3, 3)))
    #model.add(BatchNormalization())
    model.add(Activation("relu"))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(256, (3, 3)))
    #model.add(BatchNormalization())
    model.add(Activation("relu"))
    model.add(MaxPooling2D(pool_size=(2, 2)))

    model.add(Flatten())
    model.add(Dense(1024))
    model.add(Activation("relu"))
    model.add(Dropout(0.25))
    model.add(Dense(1024))
    model.add(Activation("relu"))
    model.add(Dropout(0.25))
    model.add(Dense(1024))
    model.add(Activation("relu"))
    model.add(Dropout(0.25))
    model.add(Dense(num_classes, activation='softmax'))

    model.compile(loss=keras.losses.categorical_crossentropy,
                  optimizer=keras.optimizers.Adadelta(),
                  metrics=['accuracy'])

    #load weights from previous run
    #model.load_weights('model07_weights_best.hdf5')

    from keras.preprocessing.image import ImageDataGenerator

    datagen = ImageDataGenerator(
        featurewise_center=False,  # set input mean to 0 over the dataset
        samplewise_center=False,  # set each sample mean to 0
        featurewise_std_normalization=False,  # divide inputs by std of the dataset
        samplewise_std_normalization=False,  # divide each input by its std
        zca_whitening=False,  # apply ZCA whitening
        rotation_range=0.1,  # randomly rotate images in the range (degrees, 0 to 180)
        width_shift_range=0.1,  # randomly shift images horizontally (fraction of total width)
        height_shift_range=0.1,  # randomly shift images vertically (fraction of total height)
        horizontal_flip=True,  # randomly flip images
        vertical_flip=False)  # randomly flip images

    # Compute quantities required for feature-wise normalization
    # (std, mean, and principal components if ZCA whitening is applied).
    datagen.fit(x_train)

    #save weights and log
    checkpoint = ModelCheckpoint("model14_weights_best.hdf5", monitor='val_acc',
                                 verbose=1, save_best_only=True, mode='max')
    csv_logger = CSVLogger('model14_loss_log.csv', append=True, separator=';')
    callbacks_list = [checkpoint, csv_logger]

    # Fit the model on the batches generated by datagen.flow().
    model.fit_generator(datagen.flow(x_train, y_train,
                                     batch_size=batch_size),
                        epochs=epochs,
                        validation_data=(x_test, y_test),
                        callbacks=callbacks_list)

I'm not expecting anything in this model to take up a great deal of space on the GPU, but the memory seems to be saturated.
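As a rough sanity check on those numbers, here is a minimal sketch that tallies the per-process figures, copied by hand from the nvidia-smi output above. The names `usage_mib` and `summarize` are just my own illustration and not part of any library. Everything other than the Python process adds up to 797MiB, which matches the roughly 800MB I see before compiling, so the Python process alone seems to account for the jump.

```python
# Per-process GPU memory in MiB, keyed by PID, copied by hand from the
# "Processes" section of the nvidia-smi output in this post.
usage_mib = {
    1507: 30,    # /usr/lib/xorg/Xorg
    1538: 57,    # /usr/bin/gnome-shell
    1844: 309,   # /usr/lib/xorg/Xorg
    1979: 177,   # /usr/bin/gnome-shell
    3816: 6,     # /usr/lib/firefox/firefox
    5451: 90,    # ...-token=... (truncated by nvidia-smi)
    5896: 128,   # ...-token=... (truncated by nvidia-smi)
    6485: 9677,  # ...ics-link/anaconda3/envs/py35/bin/python
}
PYTHON_PID = 6485
TOTAL_MIB = 10986  # total GPU memory reported in the header row


def summarize(usage, python_pid, total_mib):
    """Split the listed usage into the Python process and everything else."""
    listed = sum(usage.values())
    python_mib = usage[python_pid]
    return {
        "listed_mib": listed,
        "other_mib": listed - python_mib,
        "python_share_of_total": python_mib / total_mib,
    }


if __name__ == "__main__":
    summary = summarize(usage_mib, PYTHON_PID, TOTAL_MIB)
    print("listed total (MiB):", summary["listed_mib"])
    print("non-Python processes (MiB):", summary["other_mib"])
    print("Python share of GPU memory: {:.1%}".format(
        summary["python_share_of_total"]))
```

The listed processes sum to 10474MiB, slightly under the 10491MiB that nvidia-smi reports as used overall.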
As I mentioned, the same code works on my Windows PC.
Any ideas as to what might cause this?