
My current setup, as reported by nvidia-smi:

```
nvidia-smi
Wed Aug 4 01:40:39 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.79       Driver Version: 410.79       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:00:0C.0 Off |                    0 |
| N/A   34C    P0    37W / 300W |      0MiB / 16130MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  On   | 00000000:00:0D.0 Off |                    0 |
| N/A   34C    P0    36W / 300W |      0MiB / 16130MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2...  On   | 00000000:00:0E.0 Off |                    0 |
| N/A   33C    P0    39W / 300W |      0MiB / 16130MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2...  On   | 00000000:00:0F.0 Off |                    0 |
| N/A   37C    P0    41W / 300W |      0MiB / 16130MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```

I want to install TensorFlow 2.3 or 2.4, so I need to upgrade CUDA to at least 10.1, preferably within Conda. I know how to install the CUDA toolkit (cudatoolkit) in Conda:

```
conda install cudatoolkit=10.1
```

But this does not seem to be enough; TensorFlow still reports:

```
Status: CUDA driver version is insufficient for CUDA runtime version
```

If I want to keep the existing CUDA 10.0 installation, can I still get CUDA 10.1 through Conda? This doesn't work:

```
conda install cuda=10.1
```

I am using Python 3.8. If I can't keep CUDA 10.0, how can I upgrade directly to CUDA 10.1, with or without Conda? Ideally I would do the upgrade in Conda.

ADDITION:

I installed cudatoolkit=10.1, but the CUDA driver is still not sufficient. The package list of my Conda environment shows:

```
cudatoolkit               10.1.243             h6bb024c_0
tensorflow-gpu            2.3.0                    pypi_0    pypi
```

The following test worked:

```
import tensorflow as tf
2021-08-04 04:21:31.110443: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1

In [3]: print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
2021-08-04 04:21:34.499432: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2021-08-04 04:21:34.665738: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-04 04:21:34.666369: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:00:0c.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.75GiB deviceMemoryBandwidth: 836.37GiB/s
2021-08-04 04:21:34.666459: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-04 04:21:34.667017: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 1 with properties:
pciBusID: 0000:00:0d.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.75GiB deviceMemoryBandwidth: 836.37GiB/s
2021-08-04 04:21:34.667064: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-04 04:21:34.667613: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 2 with properties:
pciBusID: 0000:00:0e.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.75GiB deviceMemoryBandwidth: 836.37GiB/s
2021-08-04 04:21:34.667644: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2021-08-04 04:21:34.670275: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2021-08-04 04:21:34.672971: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2021-08-04 04:21:34.673378: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2021-08-04 04:21:34.676043: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2021-08-04 04:21:34.677370: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2021-08-04 04:21:34.681850: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2021-08-04 04:21:34.681989: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-04 04:21:34.682604: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-04 04:21:34.683196: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-04 04:21:34.683782: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-04 04:21:34.684353: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-04 04:21:34.684961: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-04 04:21:34.685513: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0, 1, 2
Num GPUs Available: 3
```

But the following test failed:

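```python
import tensorflow as tf

# Place a small matrix multiplication explicitly on the first GPU.
with tf.device('/gpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)

# TF 2.x executes eagerly and has no tf.Session, so the TF 1.x-style
# `with tf.Session() as sess: print(sess.run(c))` is replaced by printing directly.
print(c.numpy())
```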

The error message:

```
2021-08-04 04:27:30.934969: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0, 1, 2
2021-08-04 04:27:30.935028: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
---------------------------------------------------------------------------
InternalError                             Traceback (most recent call last)
......
InternalError: cudaGetDevice() failed. Status: CUDA driver version is insufficient for CUDA runtime version
```

I already installed cudatoolkit=10.1 in Conda, so if the following statement from the answer is true, why is my installation still broken?

If you want to install a GPU driver, you could install a newer CUDA toolkit, which will have a newer GPU driver (installer) bundled with it. 

Do cudatoolkit and the CUDA driver still not match?


1 Answer


No, you can't update the GPU driver via Conda, and a driver update is exactly what your system needs in order to support CUDA 10.1 or newer. From the Anaconda documentation:

Anaconda requires that the user has installed a recent NVIDIA driver that meets the version requirements in the table below.

(the up-to-date table is here)
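To make the table concrete, here is a minimal Python sketch that checks an installed driver version against per-CUDA minimums. The three Linux x86_64 minimum driver versions in `MIN_LINUX_DRIVER` are transcribed from NVIDIA's CUDA Toolkit release notes and should be verified against the current table; `parse_version` and `driver_supports` are hypothetical helper names, not part of any library.

```python
# Minimum Linux x86_64 driver versions for a few CUDA toolkit releases,
# transcribed from NVIDIA's CUDA Toolkit release notes (verify before relying on them).
MIN_LINUX_DRIVER = {
    "10.0": "410.48",
    "10.1": "418.39",
    "10.2": "440.33",
}


def parse_version(text: str) -> tuple:
    """Turn a dotted version string such as '410.79' into (410, 79) for comparison."""
    return tuple(int(part) for part in text.split("."))


def driver_supports(driver_version: str, cuda_version: str) -> bool:
    """Return True if the installed driver meets the minimum for the given CUDA release."""
    required = MIN_LINUX_DRIVER[cuda_version]
    return parse_version(driver_version) >= parse_version(required)


if __name__ == "__main__":
    installed = "410.79"  # driver version from the nvidia-smi output in the question
    for cuda in MIN_LINUX_DRIVER:
        print(cuda, "OK" if driver_supports(installed, cuda) else "driver too old")
```

With the 410.79 driver from the question, only the 10.0 entry prints `OK`, which matches the `CUDA Version: 10.0` that nvidia-smi reports; CUDA 10.1 needs a 418-series driver or newer.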

If you want to install a GPU driver, you could install a newer CUDA toolkit, which will have a newer GPU driver (installer) bundled with it. Alternatively, you can retrieve a driver here and install it on its own. By "newer CUDA toolkit" I mean the CUDA toolkit installers provided by NVIDIA, which are available here, not the conda package. The conda cudatoolkit package only provides the CUDA runtime libraries (such as libcudart.so.10.1); it does not include the driver, so the driver update cannot be done via Conda.

I suggest you study the CUDA Linux installation guide, because the method used to install the previous driver (runfile or package manager) is probably the one you should use for the new driver.

As an alternative (for example if you don't have or can't get admin access to the system), you can investigate CUDA forward compatibility. (This may also be of interest regarding compatibility.)
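Whichever route you take, one way to confirm the fix is to compare the highest CUDA version the driver supports with the version of the CUDA runtime that TensorFlow loads; the "driver version is insufficient" error means the former is older than the latter. The sketch below calls the real CUDA functions `cuDriverGetVersion` (in `libcuda.so.1`, installed with the driver) and `cudaRuntimeGetVersion` (in `libcudart.so.10.1`, provided by `cudatoolkit=10.1`) through `ctypes`. It assumes both libraries can be found by the dynamic loader (for example with the Conda environment activated); `decode_cuda_version` and `query_version` are hypothetical helper names.

```python
import ctypes


def decode_cuda_version(encoded: int) -> str:
    """CUDA encodes versions as 1000 * major + 10 * minor, e.g. 10010 -> '10.1'."""
    major, rest = divmod(encoded, 1000)
    return f"{major}.{rest // 10}"


def query_version(lib_name: str, func_name: str) -> int:
    """Call a CUDA `*GetVersion(int *)` function from a shared library via ctypes."""
    lib = ctypes.CDLL(lib_name)  # raises OSError if the library cannot be found
    version = ctypes.c_int(0)
    status = getattr(lib, func_name)(ctypes.byref(version))
    if status != 0:  # 0 is CUDA_SUCCESS / cudaSuccess
        raise RuntimeError(f"{func_name} failed with error code {status}")
    return version.value


def main() -> None:
    try:
        # libcuda.so.1 comes with the NVIDIA driver, not with the conda cudatoolkit.
        driver = query_version("libcuda.so.1", "cuDriverGetVersion")
        # libcudart.so.10.1 is the runtime library that cudatoolkit=10.1 provides.
        runtime = query_version("libcudart.so.10.1", "cudaRuntimeGetVersion")
    except (OSError, AttributeError, RuntimeError) as exc:
        print(f"Could not query CUDA versions: {exc}")
        return
    print("Driver supports CUDA up to", decode_cuda_version(driver))
    print("Runtime library is CUDA", decode_cuda_version(runtime))
    if driver < runtime:
        print("Driver is older than the runtime: update the NVIDIA driver.")


if __name__ == "__main__":
    main()
```

On the machine in the question, the driver value should decode to 10.0 (the same number nvidia-smi prints as `CUDA Version`) and the runtime to 10.1, which is exactly the mismatch behind the error. After a successful driver update (or with forward compatibility set up), the driver value should be at least the runtime value.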


2 Comments

Hi Robert, please see my addition. I installed cudatoolkit=10.1, but the CUDA driver version is still insufficient.
I've edited my answer; it was unclear what I meant by "newer CUDA toolkit". You have to install CUDA with an NVIDIA CUDA installer; this cannot be done via Conda.
