Note: this question was initially asked on GitHub, but I was asked to post it here instead.

I'm having trouble running TensorFlow on GPU, and it does not seem to be the usual CUDA configuration problem, because everything indicates that CUDA is properly set up.

The main symptom: when running tensorflow, my gpu is not detected (the code being run, and its output).

What differs from the usual issues is that CUDA seems properly installed, and running ./deviceQuery from the CUDA samples succeeds (output).

I have two graphics cards:

  • an old GTX 650 used for my monitors (I don't want to use that one with tensorflow)
  • a GTX 1060 that I want to dedicate to tensorflow

I use:

I've tried:

  • adding /usr/local/cuda/bin/ to $PATH
  • forcing GPU placement in the TensorFlow script using with tf.device('/gpu:1'): (and with tf.device('/gpu:0'): when it failed, for good measure)
  • whitelisting the GPU I wanted to use with CUDA_VISIBLE_DEVICES, in case the presence of my old unsupported card was causing problems
  • running the script with sudo (because why not)

Here are the outputs of nvidia-smi and nvidia-debugdump -l, in case it's useful.

At this point, I feel like I have followed all the breadcrumbs and have no idea what else I could try. I'm not even sure whether I'm looking at a bug or a configuration problem. Any advice on how to debug this would be greatly appreciated. Thanks!

Update: with the help of Yaroslav on GitHub, I gathered more debugging info by raising the log level, but it doesn't seem to say much about the device selection: https://gist.github.com/oelmekki/760a37ca50bf58d4f03f46d104b798bb

Update 2: Theano detects the GPU correctly, but interestingly it complains about cuDNN being too recent, then falls back to CPU (code ran, output). Maybe that could be the problem with TensorFlow as well?

  • As another sanity check, you could try another framework (like Theano) with GPU to see if it works; perhaps your GPU setup is broken in a way that deviceQuery does not detect. Commented Feb 19, 2017 at 15:16
  • Good idea, thanks. I'll try that and report it in the question body. Commented Feb 19, 2017 at 15:20
  • that output is suspiciously small, here's what I see when I run with VLOG=1 -- pastebin.com/LQF0j3Ri Commented Feb 19, 2017 at 15:25
  • Yep, I've truncated it past the device selection, as the rest is probably irrelevant. Here is the full log : gist.github.com/oelmekki/25ea3b1186c2ee7aaa23448547bc23b2 Commented Feb 19, 2017 at 15:26

8 Answers 8

79

From the log output, it looks like you are running the CPU version of TensorFlow (PyPI: tensorflow), and not the GPU version (PyPI: tensorflow-gpu). Running the GPU version would either log information about the CUDA libraries, or an error if it failed to load them or open the driver.

If you run the following commands, you should be able to use the GPU in subsequent runs:

$ pip uninstall tensorflow
$ pip install tensorflow-gpu
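To confirm which TensorFlow distribution is actually installed before and after running these commands, a small check along the following lines may help. This is only a sketch: the helper name installed_tensorflow_packages is made up for illustration, and it merely lists installed distributions whose names start with "tensorflow" (for example tensorflow or tensorflow-gpu); it does not test whether the GPU works.

```python
from importlib import metadata


def installed_tensorflow_packages():
    """Return {distribution name: version} for installed TensorFlow-related packages.

    Hypothetical helper: it only inspects package metadata in the current
    environment and never imports TensorFlow itself.
    """
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        # Broken installs can have no name; skip those.
        if name and name.lower().startswith("tensorflow"):
            found[name] = dist.version
    return found


if __name__ == "__main__":
    packages = installed_tensorflow_packages()
    if not packages:
        print("No TensorFlow distribution is installed in this environment.")
    for name, version in sorted(packages.items()):
        print(f"{name}=={version}")
```

If both tensorflow and tensorflow-gpu show up, or neither does, that is a hint that the uninstall/install steps did not leave the environment in the expected state.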

8 Comments

Oh, indeed. I followed this doc [1] while tensorflow was already installed; I wasn't aware it needed another package. Thanks! -- [1] tensorflow.org/tutorials/using_gpu
Whenever I install tensorflow-gpu, it reinstalls tensorflow. Is this supposed to happen? I can't get it to detect my devices.
I have the same issue. I followed your steps ("uninstall tensorflow" and "install tensorflow-gpu") and got this error: AttributeError: 'module' object has no attribute 'Session'
This answer is most likely deprecated in 2021. Any other solutions?
This is deprecated!
28

None of the other answers here worked for me. After a bit of tinkering, I found that this fixed my issues when dealing with TensorFlow installed from a binary package:


Step 0: Uninstall protobuf

pip uninstall protobuf 

Step 1: Uninstall tensorflow

pip uninstall tensorflow
pip uninstall tensorflow-gpu

Step 2: Force reinstall Tensorflow with GPU support

pip install --upgrade --force-reinstall tensorflow-gpu 

Step 3: If you haven't already, set CUDA_VISIBLE_DEVICES

So for me with 2 GPUs it would be

export CUDA_VISIBLE_DEVICES=0,1 
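The same restriction can also be applied from inside a Python script, as long as it happens before TensorFlow (or any other CUDA-using library) is imported. The sketch below assumes that; the helper name restrict_visible_gpus is made up for illustration. It also sets CUDA_DEVICE_ORDER=PCI_BUS_ID so that the indices should line up with the ordering nvidia-smi reports, which is optional.

```python
import os


def restrict_visible_gpus(indices):
    """Set CUDA_VISIBLE_DEVICES to the given GPU indices and return the value set.

    Hypothetical helper; it only edits environment variables of this process.
    """
    value = ",".join(str(i) for i in indices)
    # Enumerate devices by PCI bus ID rather than "fastest first".
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Must run before TensorFlow is imported, otherwise it has no effect.
restrict_visible_gpus([0, 1])
# import tensorflow as tf  # import only after the variables are set
```

Setting the variable in the script keeps the choice next to the code that depends on it, while the export in the shell applies to every process started from that shell.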

2 Comments

You saved my day!
Glad to hear it :)
18

In my case, running:

pip3 uninstall tensorflow 

was not enough, because reinstalling with:

pip3 install tensorflow-gpu 

still installed the CPU build of TensorFlow, not the GPU build. So, before installing tensorflow-gpu, I removed all the TensorFlow-related folders in site-packages and uninstalled protobuf, and it worked!

In summary:

pip3 uninstall tensorflow 

Remove all TensorFlow-related folders in ~\Python35\Lib\site-packages, then run:

pip3 uninstall protobuf
pip3 install tensorflow-gpu
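Finding the leftover folders by hand is error-prone, so a read-only listing can help decide what to remove. The following is a rough sketch: the helper name find_leftovers and the "tensorflow"/"tensorboard" prefixes are assumptions for illustration, and nothing is deleted.

```python
import site
from pathlib import Path


def find_leftovers(package_dir, prefixes=("tensorflow", "tensorboard")):
    """List entries in package_dir whose names start with one of the prefixes.

    Hypothetical helper; it only reads the directory and deletes nothing.
    """
    root = Path(package_dir)
    if not root.is_dir():
        return []
    return sorted(
        entry for entry in root.iterdir()
        if entry.name.lower().startswith(prefixes)
    )


if __name__ == "__main__":
    # Check the global site-packages directories and the per-user one.
    for directory in site.getsitepackages() + [site.getusersitepackages()]:
        for leftover in find_leftovers(directory):
            print(leftover)
```

Anything printed after the uninstall step is a candidate for manual removal before reinstalling.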

1 Comment

Typo suggestion: I suggest you use pip3 everywhere in your post. In my case, I removed tensorboard and tensorflow-1.3.0.dist-info from dist-packages, and that unblocked the issue.
9

Might seem dumb but a sudo reboot has fixed the exact same problem for me and a couple others.

2 Comments

Rebooting was all I needed too. ^^
saved my day ^_^
2

The answer that saved my day came from Mark Sonn. Simply add this to .bashrc and source ~/.bashrc if you are on Linux:

export CUDA_VISIBLE_DEVICES=0,1 

Previously I had to use this workaround to get TensorFlow to recognize my GPU:

import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices(device_type="GPU")
tf.config.experimental.set_visible_devices(devices=gpus[0], device_type="GPU")
tf.config.experimental.set_memory_growth(device=gpus[0], enable=True)

Even though the code still worked, adding these lines every time is clearly not something I want to do. My version of TensorFlow was built from source, following the documentation, so that v2.3 would support CUDA 10.2 and cuDNN 7.6.5.

If anyone is having trouble with that, I suggest a quick skim over the docs. It took 1.5 hours to build with bazel. Make sure you have gcc7 and bazel installed.

0

This error may be caused by your GPU's compute capability. The prebuilt TensorFlow binaries only support GPUs at or above a minimum compute capability (3.5 in the log below); you can look up your GPU's compute capability here: https://en.wikipedia.org/wiki/CUDA

In my case, the error was like this:

Ignoring visible gpu device (device: 0, name: GeForce GT 640M, pci bus id: 0000:01:00.0, compute capability: 3.0) with Cuda compute capability 3.0. The minimum required Cuda capability is 3.5.

For now, the only way around the 3.5 minimum is to compile TensorFlow from source on Linux (or macOS).
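When the log is long, a message like the one quoted above is easy to miss. The sketch below scans a log line for the "compute capability: X.Y" field and compares it with a minimum. The function names are made up for illustration, and the 3.5 minimum is taken from the quoted message; other TensorFlow builds may require a different value.

```python
import re

# Minimum taken from the log message quoted above; other builds may differ.
MIN_CAPABILITY = (3, 5)

CAPABILITY_RE = re.compile(r"compute capability: (\d+)\.(\d+)")


def parse_capability(log_line):
    """Return (major, minor) from a device log line, or None if absent."""
    match = CAPABILITY_RE.search(log_line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_supported(log_line, minimum=MIN_CAPABILITY):
    """True if the line reports a capability at or above the minimum."""
    capability = parse_capability(log_line)
    return capability is not None and capability >= minimum
```

Comparing (major, minor) tuples avoids the pitfalls of comparing version numbers as floats or strings.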

0

This can be caused by various system incompatibilities.

The required library versions depend on the version of TensorFlow.

When Python runs TensorFlow, a lot of useful information is printed to stderr. For TensorFlow 2.0 or later, I suggest running:

python3.8 -c "import tensorflow as tf; print('tf version:', tf.__version__); print(tf.config.list_physical_devices())"

After running this command, the output will show which libraries (or which versions of them) needed for GPU support are missing, in addition to the requirements:

P.S. CUDA_VISIBLE_DEVICES is not specific to TensorFlow. It is a more general mechanism that controls which GPUs are visible to every process launched with it set.
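As a rough complement to reading the stderr output, the standard library can report whether the dynamic linker is able to locate a given shared library. The sketch below is only a hint, not a definitive check: the list of library names is an assumption (the exact set and versions depend on the TensorFlow release), the helper name locate_libraries is made up, and ctypes.util.find_library does not verify version compatibility.

```python
from ctypes.util import find_library

# Assumed library names; the exact set depends on the TensorFlow version.
CUDA_LIBRARIES = (
    "cudart", "cublas", "cufft", "curand", "cusolver", "cusparse", "cudnn",
)


def locate_libraries(names=CUDA_LIBRARIES):
    """Map each library name to what find_library reports (None if not found)."""
    return {name: find_library(name) for name in names}


if __name__ == "__main__":
    for name, location in locate_libraries().items():
        print(f"{name:10s} {location or 'NOT FOUND'}")
```

A library reported as missing here is worth cross-checking against the messages TensorFlow itself prints when it starts.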

0

For Anaconda users: I installed tensorflow-gpu via the GUI using Anaconda Navigator and configured the NVIDIA GPU as described in the TensorFlow guide, but TensorFlow still couldn't find the GPU. I then uninstalled tensorflow, again via the GUI (see here), and reinstalled it from the command line in an Anaconda prompt by running:

conda install -c anaconda tensorflow-gpu 

and then tensorflow could find the GPU correctly.
