I'm having some difficulty understanding exactly why GPU and CPU training speeds are similar for small networks (the CPU is sometimes faster), while the GPU is faster for larger networks. The code at the bottom of the question runs in 103.7 seconds on an i7-6700K, but in 29.5 seconds when using tensorflow-gpu.
However, when I train a network with 100 hidden neurons, instead of the 1000 in the example below, I get ~20 seconds using the GPU and ~15 seconds using the CPU.
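For reference, this is roughly the harness I'm timing with. It's a minimal sketch assuming the TensorFlow backend; the data is synthetic so it runs standalone, and time_fit is just a throwaway helper of mine, not anything from Keras:

import time
import numpy as np
import tensorflow as tf
from keras.models import Sequential
from keras.layers.core import Dense

# Synthetic MNIST-shaped data so the sketch is self-contained
X = np.random.rand(60000, 784).astype('float32')
y = np.random.rand(60000, 10).astype('float32')

def time_fit(device, width):
    """Build a small MLP pinned to `device` and return the wall time of fit()."""
    with tf.device(device):
        model = Sequential()
        model.add(Dense(width, input_dim=784, activation='relu'))
        model.add(Dense(10, activation='softmax'))
        model.compile(loss='categorical_crossentropy', optimizer='nadam')
        start = time.time()
        model.fit(X, y, batch_size=128, nb_epoch=2, verbose=0)
    return time.time() - start

for width in (100, 1000):
    print('width=%4d  cpu=%.1fs  gpu=%.1fs'
          % (width, time_fit('/cpu:0', width), time_fit('/gpu:0', width)))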
I read in another Stack Overflow answer that CPU->GPU transfers take a long time; I'm assuming this refers to loading the data batches onto the GPU.
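To check where ops actually end up, I turned on TensorFlow's device-placement logging before building the model (a sketch assuming the TensorFlow backend; this only logs placements, it doesn't change them):

import tensorflow as tf
from keras import backend as K

# Print the device each op is placed on, so I can see whether the big
# matmuls really land on the GPU and what has to cross the PCIe bus.
config = tf.ConfigProto(log_device_placement=True)
K.set_session(tf.Session(config=config))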
Can someone explain why this occurs, and possibly suggest a change to the code that would maximize speed?
import numpy as np
import tensorflow as tf
import keras
from keras.models import Sequential
from keras.utils import np_utils
from keras.layers.core import Dense, Activation, Flatten, Dropout
from sklearn.preprocessing import normalize

## Importing the MNIST dataset using Keras
from keras.datasets import mnist
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# reshape for vector input
N, x, y = X_train.shape
X_train = normalize(np.reshape(X_train, (N, x * y)))

N, x, y = X_test.shape
X_test = normalize(np.reshape(X_test, (N, x * y)))

# one-hot encoding
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)

model = Sequential()
model.add(Dense(output_dim=750, input_dim=784))
model.add(Activation('relu'))
model.add(Dropout(0.2))

model.add(Dense(150))
model.add(Activation('relu'))
model.add(Dropout(0.2))

model.add(Dense(50))
model.add(Activation('relu'))
model.add(Dropout(0.2))

model.add(Dense(50))
model.add(Activation('relu'))
model.add(Dropout(0.2))

model.add(Dense(10))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='Nadam',
              metrics=['accuracy'])

fit = model.fit(X_train, y_train, batch_size=128, nb_epoch=10, verbose=0)

## Printing the accuracy of our model, according to the loss function specified in model.compile above
score = model.evaluate(X_test, y_test, verbose=0)
print('Test score:', score[0])
print('Test accuracy:', score[1])
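In case it's relevant, the one variation I've tried is sweeping batch_size, on the theory that larger batches mean fewer CPU->GPU copies per epoch. This is a rough sketch reusing model, X_train, and y_train from the script above; I'm not claiming it's the right fix:

import time

# Sweep batch sizes; larger batches should amortize per-batch transfer
# and kernel-launch overhead if that is what dominates the small network.
for bs in (128, 512, 2048):
    start = time.time()
    model.fit(X_train, y_train, batch_size=bs, nb_epoch=1, verbose=0)
    print('batch_size=%d: %.1fs per epoch' % (bs, time.time() - start))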