My current:
nvidia-smi Wed Aug 4 01:40:39 2021 +-----------------------------------------------------------------------------+ | NVIDIA-SMI 410.79 Driver Version: 410.79 CUDA Version: 10.0 | |-------------------------------+----------------------+----------------------+ | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | |===============================+======================+======================| | 0 Tesla V100-SXM2... On | 00000000:00:0C.0 Off | 0 | | N/A 34C P0 37W / 300W | 0MiB / 16130MiB | 0% Default | +-------------------------------+----------------------+----------------------+ | 1 Tesla V100-SXM2... On | 00000000:00:0D.0 Off | 0 | | N/A 34C P0 36W / 300W | 0MiB / 16130MiB | 0% Default | +-------------------------------+----------------------+----------------------+ | 2 Tesla V100-SXM2... On | 00000000:00:0E.0 Off | 0 | | N/A 33C P0 39W / 300W | 0MiB / 16130MiB | 0% Default | +-------------------------------+----------------------+----------------------+ | 3 Tesla V100-SXM2... On | 00000000:00:0F.0 Off | 0 | | N/A 37C P0 41W / 300W | 0MiB / 16130MiB | 0% Default | +-------------------------------+----------------------+----------------------+ +-----------------------------------------------------------------------------+ | Processes: GPU Memory | | GPU PID Type Process name Usage | |=============================================================================| | No running processes found | +-----------------------------------------------------------------------------+ I want to install Tensorflow 2.3/2.4, so I need to upgrade cuda to 10.1 at least in Conda. I know how to install cudakit in conda:
conda install cudatoolkit=10.1 But this seems not enough:
Status: CUDA driver version is insufficient for CUDA runtime version If I want to keep the old version cuda 10.0, can I update cuda to 10.1 through Conda? This won't work:
conda install cuda=10.1 I am using Python 3.8. If I can't keep cuda 10.0, how to directly upgrade cuda to 10.1 with or without conda? It's best if I can upgrade in Conda.
ADDITION:
I installed cudatoolkit=10.1, but the cuda driver still not good. My conda env list shows:
cudatoolkit 10.1.243 h6bb024c_0 tensorflow-gpu 2.3.0 pypi_0 pypi The following test is good:
import tensorflow as tf 2021-08-04 04:21:31.110443: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1 In [3]: print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU'))) 2021-08-04 04:21:34.499432: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1 2021-08-04 04:21:34.665738: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2021-08-04 04:21:34.666369: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: pciBusID: 0000:00:0c.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0 coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.75GiB deviceMemoryBandwidth: 836.37GiB/s 2021-08-04 04:21:34.666459: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2021-08-04 04:21:34.667017: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 1 with properties: pciBusID: 0000:00:0d.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0 coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.75GiB deviceMemoryBandwidth: 836.37GiB/s 2021-08-04 04:21:34.667064: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2021-08-04 04:21:34.667613: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 2 with properties: pciBusID: 0000:00:0e.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0 coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.75GiB deviceMemoryBandwidth: 836.37GiB/s 2021-08-04 04:21:34.667644: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1 2021-08-04 04:21:34.670275: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10 2021-08-04 04:21:34.672971: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10 2021-08-04 04:21:34.673378: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10 2021-08-04 04:21:34.676043: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10 2021-08-04 04:21:34.677370: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10 2021-08-04 04:21:34.681850: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7 2021-08-04 04:21:34.681989: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2021-08-04 04:21:34.682604: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2021-08-04 04:21:34.683196: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2021-08-04 04:21:34.683782: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2021-08-04 04:21:34.684353: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2021-08-04 04:21:34.684961: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2021-08-04 04:21:34.685513: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0, 1, 2 Num GPUs Available: 3 But the following test failed:
import tensorflow as tf with tf.device('/gpu:0'): a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a') b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b') c = tf.matmul(a, b) with tf.Session() as sess: print (sess.run(c)) The error message:
2021-08-04 04:27:30.934969: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0, 1, 2 2021-08-04 04:27:30.935028: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1 --------------------------------------------------------------------------- InternalError Traceback (most recent call last) ...... InternalError: cudaGetDevice() failed. Status: CUDA driver version is insufficient for CUDA runtime version If this statement is true, why my installation is still bad, because I already installed cudatoolkit=10.1 in Conda:
If you want to install a GPU driver, you could install a newer CUDA toolkit, which will have a newer GPU driver (installer) bundled with it. cudatoolkit and cuda driver still not match?