I have a problem launching a kernel. I launch it with a grid size of (3000000, 16), and CUDA reports an "invalid argument" runtime error. I tried different maxPixelCount values and found that with maxPixelCount set to 200000 the error is reported, while with 50000 the launch proceeds without error.

    dim3 dimGrid(maxPixelCount, imageCount);
    printf("grid: %d * %d * %d", dimGrid.x, dimGrid.y, dimGrid.z);
    mcudaGetGrayDataKernel <<< dimGrid, 1 >>> (deviceDestDataPtrs, deviceImageDataPtrs, deviceSizes);
    cudaStatus = cudaGetLastError();
    if (cudaStatus != cudaSuccess) {
        printf("cuda start kernel error\n%s", cudaGetErrorString(cudaStatus));
        goto Error;
    }

I checked the maximum grid size my card supports, using the following statement:

    printf(" - max grid size: %d * %d * %d\n", prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);

I got the following output:

     - max grid size: 2147483647 * 65535 * 65535

I think this means my grid dimensions are within the supported range. So why does the error appear?
My IDE is Visual Studio 2013
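This problem has been solved. To use the full grid size limit, the Device -> Code Generation option has to match the GPU's architecture. For my GPU I changed it to compute_30,sm_30. Code built for compute capability 2.x is limited to 65535 blocks in the x dimension, which is consistent with 50000 working and 200000 failing.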
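The following is a minimal sketch for checking this outside Visual Studio, not the original project code. It assumes a device of compute capability 3.0 or higher and a CUDA toolkit that still supports sm_30 (recent toolkits have dropped it, so a newer architecture would have to be substituted). The kernel dummyKernel and the file name gridcheck.cu are hypothetical stand-ins.

```cuda
// gridcheck.cu (hypothetical file name)
// Build for compute capability 3.0, the command-line counterpart of
// the Visual Studio setting compute_30,sm_30:
//   nvcc -gencode arch=compute_30,code=sm_30 gridcheck.cu -o gridcheck
// compute_XX names the virtual (PTX) architecture, sm_XX the real GPU
// architecture the machine code is generated for.
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical empty kernel standing in for mcudaGetGrayDataKernel.
__global__ void dummyKernel() {}

int main() {
    cudaDeviceProp prop;
    cudaError_t status = cudaGetDeviceProperties(&prop, 0);
    if (status != cudaSuccess) {
        printf("cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(status));
        return 1;
    }

    // These are the limits of the hardware, not of the architecture
    // the binary was compiled for.
    printf("compute capability: %d.%d\n", prop.major, prop.minor);
    printf("max grid size: %d * %d * %d\n",
           prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);

    // x = 200000 exceeds the 65535 limit of compute capability 2.x.
    dim3 dimGrid(200000, 16);
    dummyKernel<<<dimGrid, 1>>>();

    // Launch configuration errors are reported by cudaGetLastError().
    status = cudaGetLastError();
    printf("launch status: %s\n", cudaGetErrorString(status));

    cudaDeviceSynchronize();
    return status == cudaSuccess ? 0 : 1;
}
```

If the same file is built for a compute capability 2.x target instead, the launch should report the "invalid argument" error from the question even though maxGridSize[0] still prints the larger hardware limit.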
-arch=sm_30 on the compile command line might be all you need.

[...] Code Generation option to compute_20,sm_30. In Host, I modified the Additional Compiler Options option to -arch=sm_30. But the problem still remains. And a compile warning was reported:
    1>cl : command line warning D9002: ignored unknown option "-arch=sm_30"

compute_20,sm_30 won't work. You should choose compute_30,sm_30. And you seem to have changed more than just the code generation option (wherever you added -arch=sm_30, remove that). Since you're struggling with this, you could also just take your code and drop it into the vectorAdd CUDA sample project, and compile it there. And of course you will need a cc3.0 or higher GPU to run it on.

[...] compute and sm mean respectively?