I have a new Windows 10 laptop with a discrete NVIDIA graphics card (all drivers up to date). The first example from the documentation of CUDAFunctionLoad[] in v11.2 fails as shown below, even though CUDAQ[] evaluates to True and OpenCLLink appears to be working fine.

    Needs["CUDALink`"]
    code = "
      __global__ void addTwo(mint * in, mint * out, mint length) {
          int index = threadIdx.x + blockIdx.x*blockDim.x;
          if (index < length)
              out[index] = in[index] + 2;
      }";
    cudaFun = CUDAFunctionLoad[code, "addTwo", {{_Integer, _, "Input"}, {_Integer, _, "Output"}, _Integer}, 256, "ShellOutputFunction" -> Print]

Here's the output (note the message "nvcc fatal : Host compiler targets unsupported OS."):

    In[3]:= cudaFun = CUDAFunctionLoad[code, "addTwo", {{_Integer, _, "Input"}, {_Integer, _, "Output"}, _Integer}, 256, "ShellOutputFunction" -> Print, "ShellCommandFunction" :> Print]

    During evaluation of In[3]:= CUDAFunctionLoad::cmpf: The kernel compilation failed. Consider setting the option "ShellOutputFunction"->Print to display the compiler error message.

    During evaluation of In[3]:= call "C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Auxiliary\Build\vcvarsall.bat" amd64
    "C:\Users\micha\AppData\Roaming\Mathematica\Paclets\Repository\CUDAResources-Win64-11.2.22\CUDAToolkit\bin\nvcc.exe" -cubin -m64 -arch=sm_61 -DUSING_CUDA_FUNCTION=1 -Dmint="long long" -DReal_t=double -DUSING_DOUBLE_PRECISIONQ=1 -o "C:\Users\micha\AppData\Roaming\Mathematica\ApplicationData\CUDALink\BuildFolder\sbp-14988\Working-sbp-14988-15020-1\CUDAFunction-1595.cubin" "C:\Users\micha\AppData\Roaming\Mathematica\ApplicationData\CUDALink\BuildFolder\sbp-14988\Working-sbp-14988-15020-1\CUDAFunction-1595.cu"

    During evaluation of In[3]:= C:\Users\micha\AppData\Roaming\Mathematica\ApplicationData\CUDALink\BuildFolder\sbp-14988\Working-sbp-14988-15020-1>call "C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Auxiliary\Build\vcvarsall.bat" amd64
    **********************************************************************
    ** Visual Studio 2017 Developer Command Prompt v15.5.2
    ** Copyright (c) 2017 Microsoft Corporation
    **********************************************************************
    [vcvarsall.bat] Environment initialized for: 'x64'
    Microsoft (R) C/C++ Optimizing Compiler Version 19.12.25831 for x64
    Copyright (C) Microsoft Corporation. All rights reserved.
    tmpxft_00000f74_00000000-1.cpp
    **nvcc fatal : Host compiler targets unsupported OS.**

    Out[3]= CUDAFunctionLoad["
      __global__ void addTwo(mint * in, mint * out, mint length) {
          int index = threadIdx.x + blockIdx.x*blockDim.x;
          if (index < length)
              out[index] = in[index] + 2;
      }", "addTwo", {{_Integer, _, "Input"}, {_Integer, _, "Output"}, _Integer}, 256, "ShellOutputFunction" -> Print, "ShellCommandFunction" :> Print]

Here's the result of my CUDAResourcesInformation[]:

    {{"Name" -> "CUDAResources", "Version" -> "11.2.22", "BuildNumber" -> "",
      "Qualifier" -> "Win64", "WolframVersion" -> "11.2",
      "SystemID" -> {"Windows-x86-64"},
      "Description" -> "{ToolkitVersion -> v8.0, MinimumDriver -> 290}",
      "Category" -> "", "Creator" -> "", "Publisher" -> "", "Support" -> "",
      "Internal" -> False,
      "Location" -> "C:\\Users\\micha\\AppData\\Roaming\\Mathematica\\Paclets\\Repository\\CUDAResources-Win64-11.2.22",
      "Context" -> {}, "Enabled" -> True, "Loading" -> Manual,
      "Hash" -> "9c9b3bf9dfc07e0cc2376f0ef13e01d5"}}

Machine details: I'm on Windows 10 Pro v1703 with an NVIDIA GeForce 2016 card, the CUDA 9.1 Toolkit, and Visual Studio 2017 Community installed.