Feature request: let clang interpret openmp target simd as equivalent to acc vector for nvptx and amdgpu targets like it is done in current gcc-15.2 #163335

@bschulz81

Description

Hi there, GPU APIs for parallel programming often support three nested levels of parallelism.

In OpenACC, this is reflected by the loop clauses gang, worker and vector.
OpenMP has the three parallelization levels teams distribute, parallel for and simd.

In gcc-15.2, the target simd construct now corresponds to acc vector.

I.e. the construct

#pragma acc parallel loop gang vector

is equivalent to

#pragma omp target teams distribute parallel for simd

for gcc-15.2
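To make the claimed correspondence concrete, here is a minimal sketch that puts both directives on the same loop. The function names `scale_acc` and `scale_omp` and the data clauses are illustrative assumptions, not taken from gcc or clang documentation. Whether the two forms really map to the same hardware level depends on the compiler, as described above.

```c
#include <stddef.h>

/* OpenACC form: the loop iterations are spread over gangs and vector lanes. */
void scale_acc(float *x, size_t n, float a)
{
    #pragma acc parallel loop gang vector copy(x[0:n])
    for (size_t i = 0; i < n; i++)
        x[i] *= a;
}

/* OpenMP form that this issue describes as equivalent under gcc-15.2. */
void scale_omp(float *x, size_t n, float a)
{
    #pragma omp target teams distribute parallel for simd map(tofrom: x[0:n])
    for (size_t i = 0; i < n; i++)
        x[i] *= a;
}
```

Built without -fopenacc or -fopenmp, the pragmas are ignored and both functions run on the host, so the results can be compared directly before offloading is enabled.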

The code (note the element access over a strides array, which is common for blas routines)

    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char** argv)
    {
        int x[600];
        const int stride[2] = {1, 2};
        #pragma omp target data map(tofrom: x[0:600]) map(to: stride[0:2]) device(omp_get_default_device())
        #pragma omp target simd device(omp_get_default_device())
        for (size_t i = 0; i < 200; i++)
            x[i*stride[0] + i*stride[1]] = -1;
        printf("%d", x[597]);
        return 0;
    }

when compiled with gcc-15.2 and the following options
-fopenmp -foffload=nvptx-none -fno-stack-protector -O3 -Wall

compiles without warnings and runs on the device.

If compiled with clang 21.1.3 and the following options
-fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -O3 -Wall

one gets the following warning from clang

warning: loop not vectorized: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]

The CUDA code generated by clang is very efficient compared to gcc. However, since clang already implements OpenACC with its vector construct, and if OpenACC is implemented on top of the OpenMP runtime, it would make sense to turn omp target simd into an acc vector equivalent. The same should then apply to omp target parallel for simd and omp target teams distribute parallel for simd, together with the collapse, reduction and tile constructs.
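As a rough illustration of the kind of loop this request has in mind, the sketch below combines the full construct with collapse and reduction. The helper `matrix_sum` and its map clauses are hypothetical and only show the directive combination. No claim is made about how any particular compiler schedules the simd level.

```c
#include <stddef.h>

/* Hypothetical helper: sums all entries of a row-major rows x cols matrix. */
double matrix_sum(const double *m, size_t rows, size_t cols)
{
    double sum = 0.0;
    /* collapse(2) merges both loops into one iteration space;
       reduction(+:sum) combines the partial sums of all threads and lanes. */
    #pragma omp target teams distribute parallel for simd collapse(2) \
        reduction(+:sum) map(to: m[0:rows*cols]) map(tofrom: sum)
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            sum += m[i * cols + j];
    return sum;
}
```

With a vector-level mapping of simd, the innermost work of such a loop could be spread across the lanes of a warp or wavefront instead of being handed to the host-style loop vectorizer.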
