$\begingroup$

I copied the TFLearn DCGAN example code into my local Jupyter environment. I also changed some comments and added with tf.device('/gpu:0'): right before the call to gan.fit(...), resulting in the following code:

    # coding: utf-8

    # In[1]:

    get_ipython().magic('matplotlib inline')

    from __future__ import division, print_function, absolute_import

    import matplotlib.pyplot as plt
    import numpy as np
    import tensorflow as tf
    import tflearn

    # In[2]:

    # Data loading and preprocessing
    import tflearn.datasets.mnist as mnist
    X, Y, testX, testY = mnist.load_data()
    X = np.reshape(X, newshape=[-1, 28, 28, 1])

    # In[3]:

    # Noise data input
    z_dim = 200
    total_samples = len(X)

    # In[4]:

    # Generator
    def generator(x, reuse=False):
        with tf.variable_scope('Generator', reuse=reuse):
            x = tflearn.fully_connected(x, n_units=7 * 7 * 128)
            x = tflearn.batch_normalization(x)
            x = tf.nn.tanh(x)
            x = tf.reshape(x, shape=[-1, 7, 7, 128])
            x = tflearn.upsample_2d(x, 2)
            x = tflearn.conv_2d(x, 64, 5, activation='tanh')
            x = tflearn.upsample_2d(x, 2)
            x = tflearn.conv_2d(x, 1, 5, activation='sigmoid')
            return x

    # In[5]:

    # Discriminator
    def discriminator(x, reuse=False):
        with tf.variable_scope('Discriminator', reuse=reuse):
            x = tflearn.conv_2d(x, 64, 5, activation='tanh')
            x = tflearn.avg_pool_2d(x, 2)
            x = tflearn.conv_2d(x, 128, 5, activation='tanh')
            x = tflearn.avg_pool_2d(x, 2)
            x = tflearn.fully_connected(x, 1028, activation='tanh')
            x = tflearn.fully_connected(x, 2)
            x = tf.nn.softmax(x)
            return x

    # In[6]:

    # Input data
    gen_input = tflearn.input_data(shape=[None, z_dim], name='input_gen_noise')
    input_disc_noise = tflearn.input_data(shape=[None, z_dim], name='input_disc_noise')
    input_disc_real = tflearn.input_data(shape=[None, 28, 28, 1], name='input_disc_real')

    # In[7]:

    # Build discriminator
    disc_fake = discriminator(generator(input_disc_noise))
    disc_real = discriminator(input_disc_real, reuse=True)
    disc_net = tf.concat([disc_fake, disc_real], axis=0)

    # In[8]:

    # Build stacked Generator/Discriminator
    gen_net = generator(gen_input, reuse=True)
    stacked_gan_net = discriminator(gen_net, reuse=True)

    # In[9]:

    # Build training ops for Discriminator
    # Each network optimization should only update its own variable, thus we need
    # to retrieve each network variable (with get_layer_variables_by_name)
    disc_vars = tflearn.get_layer_variables_by_name('Discriminator')
    # We need 2 target placeholders, for both the real and fake image target
    disc_target = tflearn.multi_target_data(['target_disc_fake', 'target_disc_real'],
                                            shape=[None, 2])
    disc_model = tflearn.regression(disc_net, optimizer='adam',
                                    placeholder=disc_target,
                                    loss='categorical_crossentropy',
                                    trainable_vars=disc_vars,
                                    batch_size=64, name='target_disc',
                                    op_name='DISC')

    # In[10]:

    # Build training ops for Generator
    gen_vars = tflearn.get_layer_variables_by_name('Generator')
    gan_model = tflearn.regression(stacked_gan_net, optimizer='adam',
                                   loss='categorical_crossentropy',
                                   trainable_vars=gen_vars,
                                   batch_size=64, name='target_gen',
                                   op_name='GEN')

    # In[11]:

    # Define GAN model, that outputs the generated images
    gan = tflearn.DNN(gan_model, tensorboard_verbose=3)

    # In[12]:

    # Training
    # Prepare input data to feed to the discriminator
    disc_noise = np.random.uniform(-1., 1., size=[total_samples, z_dim])
    # Prepare target data to feed to the discriminator (0: fake image, 1: real image)
    y_disc_fake = np.zeros(shape=[total_samples])
    y_disc_real = np.ones(shape=[total_samples])
    y_disc_fake = tflearn.data_utils.to_categorical(y_disc_fake, 2)
    y_disc_real = tflearn.data_utils.to_categorical(y_disc_real, 2)

    # In[13]:

    # Prepare input data to feed to the stacked Generator/Discriminator
    gen_noise = np.random.uniform(-1., 1., size=[total_samples, z_dim])
    # Prepare target data to feed to the Discriminator
    # The Generator tries to fool the Discriminator, thus target is 1 (real images)
    y_gen = np.ones(shape=[total_samples])
    y_gen = tflearn.data_utils.to_categorical(y_gen, 2)

    # In[14]:

    # Start training, feed both noise and real images
    with tf.device('/gpu:0'):
        gan.fit(X_inputs={'input_gen_noise': gen_noise,
                          'input_disc_noise': disc_noise,
                          'input_disc_real': X},
                Y_targets={'target_gen': y_gen,
                           'target_disc_fake': y_disc_fake,
                           'target_disc_real': y_disc_real},
                n_epoch=10)

    # In[15]:

    # Create another model from the Generator graph to generate some samples
    # for testing (re-using the same session to re-use the weights learnt)
    gen = tflearn.DNN(gen_net, session=gan.session)

    # In[16]:

    f, a = plt.subplots(4, 10, figsize=(10, 4))
    for i in range(10):
        # Noise input
        z = np.random.uniform(-1., 1., size=[4, z_dim])
        g = np.array(gen.predict({'input_gen_noise': z}))
        for j in range(4):
            # Generate image from noise. Extend to 3 channels for matplot figure.
            img = np.reshape(np.repeat(g[j][:, :, np.newaxis], 3, axis=2),
                             newshape=(28, 28, 3))
            a[j][i].imshow(img)

    f.show()
    plt.draw()

I want to run this code on my NVIDIA GPU. I already have CUDA and cuDNN installed on my machine. When I watch Windows Task Manager during training, I see that my CPU is under heavy load while my GPU sits dormant.

[Screenshot: Windows Task Manager during training]

Can anyone advise on how to use with tf.device('/gpu:0'): properly? The code above clearly does not run on my NVIDIA GPU.

$\endgroup$

1 Answer

$\begingroup$

The part of your code that needs to be inside the with tf.device('/gpu:0'): context is the actual computation graph, where the neural network lives. All parameters that should be updated on the GPU have to be defined there. I don't know tflearn very well, but I assume that if you wrap the code from In[6] to In[12] inside the context manager, it should work.

When you define the variables (weights) and operations in your graph, you not only create them but also have to place them somewhere and keep a reference to them. Normally this happens in RAM, and the variable itself is an address in RAM. TensorFlow differs here: sometimes you want to place some or all of these things on your GPU, because it is inherently better suited to the kinds of computation TensorFlow performs. The context managers are how you say where each piece should be placed.

The fit call that you put inside the context manager only runs training operations that were already defined, and therefore already placed, on your CPU.

Doing it this way seems slightly verbose, but it gives you a lot of flexibility. You can spread your graph over multiple GPUs, or run part of the computation on the CPU, either because it is faster there or because you don't have enough GPU RAM for the full graph.
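As a rough illustration of that flexibility, the sketch below decides which device each part of a graph could be defined on, given a memory budget per GPU. It is plain Python and not part of TensorFlow or tflearn: plan_placement is a hypothetical helper, the greedy first-fit rule is just one possible policy, and the sizes are made up for readability.

```python
def plan_placement(layers, gpu_budgets_mb, fallback='/cpu:0'):
    """Greedily assign each layer to the first GPU with enough free memory.

    layers: list of (name, size_mb) tuples in graph order.
    gpu_budgets_mb: dict mapping a device string to its free memory in MB.
    Layers that fit on no GPU are assigned to `fallback`.
    """
    free = dict(gpu_budgets_mb)  # copy so the caller's dict is not modified
    plan = {}
    for name, size_mb in layers:
        for dev in free:
            if free[dev] >= size_mb:
                free[dev] -= size_mb
                plan[name] = dev
                break
        else:
            # No GPU had room, so this part stays on the CPU.
            plan[name] = fallback
    return plan


# Hypothetical sizes, chosen only to make the example readable.
layers = [('generator', 300), ('discriminator', 500), ('summaries', 400)]
plan = plan_placement(layers, {'/gpu:0': 600, '/gpu:1': 600})
print(plan)
# {'generator': '/gpu:0', 'discriminator': '/gpu:1', 'summaries': '/cpu:0'}
```

Each entry in the resulting plan would then correspond to one with tf.device(...): block wrapped around the code that defines that part of the graph.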

$\endgroup$
