My script doesn't seem to be executed on GPU, although Tensorflow-gpu is installed

Question

I have a machine with CUDA 10.1 and both tensorflow and tensorflow-gpu 1.14.0 installed. I am running a Python script that trains a CNN in a virtualenv. I indicate in the source code that I want to use the GPU, as follows:

import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

However, when I run the script, the training epochs take a long time to finish. Here is the output of my nvidia-smi:

What I find strange is that the GPU utilization is so low and that my Python script does not appear in the process list. Here are the outputs of some commands I have tried:

>>> import tensorflow as tf
>>> sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

The output is:

2019-10-14 09:53:12.674719: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-14 09:53:12.679047: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2019-10-14 09:53:12.784993: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-10-14 09:53:12.785744: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55f155c59650 executing computations on platform CUDA. Devices:
2019-10-14 09:53:12.785771: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): GeForce RTX 2080 Ti, Compute Capability 7.5
2019-10-14 09:53:12.806453: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3600000000 Hz
2019-10-14 09:53:12.807345: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55f15605dfc0 executing computations on platform Host. Devices:
2019-10-14 09:53:12.807408: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): ,
2019-10-14 09:53:12.807829: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-10-14 09:53:12.808859: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:01:00.0
2019-10-14 09:53:12.809148: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:
2019-10-14 09:53:12.809313: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:
2019-10-14 09:53:12.809481: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:
2019-10-14 09:53:12.809531: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:
2019-10-14 09:53:12.809572: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:
2019-10-14 09:53:12.809611: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:
2019-10-14 09:53:12.811997: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-10-14 09:53:12.812038: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1663] Cannot dlopen some GPU libraries. Skipping registering GPU devices...
2019-10-14 09:53:12.812059: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-14 09:53:12.812067: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
2019-10-14 09:53:12.812072: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
Device mapping:
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
2019-10-14 09:53:12.812372: I tensorflow/core/common_runtime/direct_session.cc:296] Device mapping:
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device

Another command I tried is:

>>> with tf.Session() as sess:
...     devices = sess.list_devices()

The output is

2019-10-14 09:55:52.398317: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-10-14 09:55:52.399249: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:01:00.0
2019-10-14 09:55:52.399355: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:
2019-10-14 09:55:52.399399: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:
2019-10-14 09:55:52.399437: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:
2019-10-14 09:55:52.399475: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:
2019-10-14 09:55:52.399509: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:
2019-10-14 09:55:52.399544: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:
2019-10-14 09:55:52.399552: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-10-14 09:55:52.399557: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1663] Cannot dlopen some GPU libraries. Skipping registering GPU devices...
2019-10-14 09:55:52.402143: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-14 09:55:52.402162: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]

Finally, I also tried this

>>> from tensorflow.python.client import device_lib
>>> print(device_lib.list_local_devices())

With the following output

2019-10-14 10:00:52.389511: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-10-14 10:00:52.390582: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:01:00.0
2019-10-14 10:00:52.390741: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:
2019-10-14 10:00:52.390811: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:
2019-10-14 10:00:52.390854: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:
2019-10-14 10:00:52.390897: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:
2019-10-14 10:00:52.390934: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:
2019-10-14 10:00:52.390968: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:
2019-10-14 10:00:52.390975: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-10-14 10:00:52.390980: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1663] Cannot dlopen some GPU libraries. Skipping registering GPU devices...
2019-10-14 10:00:52.390990: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-14 10:00:52.390994: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
2019-10-14 10:00:52.390998: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 17281747132467712783
, name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 3885020928213180904
physical_device_desc: "device: XLA_GPU device"
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 15667518323180153095
physical_device_desc: "device: XLA_CPU device"
]

Interestingly, when I run these commands, the Python process appears in the nvidia-smi monitor.

What am I missing here?

GPhilo · Accepted Answer · 2019-10-14 08:24:03Z

From your log:

Could not dlopen library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory;
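The same kind of line appears for several libraries in the question's output. As a rough sketch (not part of TensorFlow), the failed library names can be pulled out of a pasted log with a regular expression. The function name `missing_libraries` is hypothetical, and the pattern assumes the message wording shown in the log above:

```python
import re

# Matches loader messages of the form seen in the question's log:
#   Could not dlopen library 'libcudart.so.10.0'; dlerror: ...
DLOPEN_FAILURE = re.compile(r"Could not dlopen library '([^']+)'")


def missing_libraries(log_text):
    """Return the library names that failed to load, in first-seen order."""
    seen = []
    for name in DLOPEN_FAILURE.findall(log_text):
        if name not in seen:  # the same library may be reported repeatedly
            seen.append(name)
    return seen


if __name__ == "__main__":
    # Two lines copied from the question's log, plus one success line.
    sample = (
        "Could not dlopen library 'libcudart.so.10.0'; dlerror: "
        "libcudart.so.10.0: cannot open shared object file: "
        "No such file or directory;\n"
        "Could not dlopen library 'libcublas.so.10.0'; dlerror: "
        "libcublas.so.10.0: cannot open shared object file: "
        "No such file or directory;\n"
        "Successfully opened dynamic library libcudnn.so.7\n"
    )
    for name in missing_libraries(sample):
        print(name)
```

Every name it reports for the log in the question ends in `.so.10.0`, which is the detail that matters here.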

You installed CUDA 10.1, but your TF-GPU build looks for the CUDA 10.0 libraries (note the .so.10.0 suffix in the log), so you need to install CUDA 10.0. There is no need to uninstall 10.1; the two versions can coexist.
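After installing CUDA 10.0, one way to check the result without starting TensorFlow is to ask the dynamic loader for the same library names. This is only a sketch: the names are copied from the log in the question, `check_loadable` is a hypothetical helper, and it assumes that `ctypes.CDLL` searches roughly the same places (including `LD_LIBRARY_PATH`) as TensorFlow's own loader:

```python
import ctypes

# Library names copied from the "Could not dlopen library" lines in the log.
CUDA_10_0_LIBS = [
    "libcudart.so.10.0",
    "libcublas.so.10.0",
    "libcufft.so.10.0",
    "libcurand.so.10.0",
    "libcusolver.so.10.0",
    "libcusparse.so.10.0",
]


def check_loadable(names):
    """Map each library name to True if the dynamic loader can open it."""
    result = {}
    for name in names:
        try:
            ctypes.CDLL(name)  # raises OSError when the library is not found
            result[name] = True
        except OSError:
            result[name] = False
    return result


if __name__ == "__main__":
    for name, ok in check_loadable(CUDA_10_0_LIBS).items():
        print(("found    " if ok else "MISSING  ") + name)
```

Run it in the same shell and virtualenv as the training script, so that it sees the same `LD_LIBRARY_PATH`.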

Gwang-Jin Kim · Accepted Answer · 2019-10-14 08:33:12Z

I recently sent friends instructions for installing CUDA and tf-gpu using conda (because this is the fastest way). After some searching on the internet, my protocol is this:

##########################
# Install Miniconda
##########################
mkdir -p ~/install
cd ~/install
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
# I guess on a mac you should do
# wget https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh

#########################
# install nvidia driver
# so these are the linux (ubuntu) commands
# for mac, maybe one should follow the scheme
# removing nvidia drivers first
# and then download newest nvidia driver
# and install it
# and reboot
#
# If you are using a laptop without gpu, just skip this block
#########################
sudo apt purge nvidia-* # remove all nvidia driver first
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt install nvidia-driver-418
sudo apt install nvidia-cuda-toolkit
# reboot
sudo reboot

#########################
# install machine learning stuff keras tensorflow-gpu
#
# if you are installing in a laptop without gpu,
# replace 'tensorflow-gpu' by 'tensorflow'!
#########################
conda create --name keras
conda activate keras
conda install python ipython jupyter pandas scipy seaborn scikit-learn tensorflow-gpu keras pytest openpyxl graphviz

#########################
# finally, test a successful installation by:
# entering: ipython
# and there trying:
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
# should list gpu
# sth like:
physical_device_desc: "device: 0, name: GeForce GTX 1050 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1"
, name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 14085000268159177816
physical_device_desc: "device: XLA_GPU device"
]
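Reading the device list by eye is easy to get wrong, because an XLA_GPU entry shows up even when GPU registration was skipped (as in the question's output). A small sketch of a stricter check follows. `has_regular_gpu` is a hypothetical helper, and it assumes that a usable GPU appears with `device_type` equal to "GPU", as in TF 1.x device listings:

```python
from collections import namedtuple


def has_regular_gpu(devices):
    """True if any device reports device_type == "GPU".

    XLA_GPU entries are deliberately not counted: the question's output
    lists XLA_GPU:0 even though registering GPU devices was skipped.
    """
    return any(getattr(d, "device_type", None) == "GPU" for d in devices)


if __name__ == "__main__":
    try:
        from tensorflow.python.client import device_lib
        devices = device_lib.list_local_devices()
    except ImportError:
        # Stand-in objects (hypothetical) mirroring the question's output,
        # so the sketch still runs where TensorFlow is not installed.
        Device = namedtuple("Device", ["name", "device_type"])
        devices = [
            Device("/device:CPU:0", "CPU"),
            Device("/device:XLA_GPU:0", "XLA_GPU"),
            Device("/device:XLA_CPU:0", "XLA_CPU"),
        ]
    print("regular GPU registered:", has_regular_gpu(devices))
```

With the device list from the question this check comes out false, which matches the "Skipping registering GPU devices" warning in the log.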
