Description
Hi there, GPU APIs for parallel programming often support three nested levels of parallelism.
In OpenACC, this is reflected by the directive clauses gang, worker, and vector.
OpenMP has the three corresponding parallelization levels: teams distribute, parallel for, and simd.
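To make the correspondence concrete, here is a minimal sketch, assuming the usual mapping gang ↔ teams distribute, worker ↔ parallel for, vector ↔ simd (the saxpy-style loop body is only illustrative):

// OpenACC: all three levels requested on one loop
#pragma acc parallel loop gang worker vector
for (int i = 0; i < n; i++) y[i] += a * x[i];

// OpenMP offload equivalent, one construct per level
#pragma omp target teams distribute parallel for simd
for (int i = 0; i < n; i++) y[i] += a * x[i];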
In GCC 15.2, the target simd construct now corresponds to acc vector.
That is, the construct
#pragma acc parallel loop gang vector
is equivalent to
#pragma omp target teams distribute parallel for simd
under GCC 15.2.
The code below (note the element access through a stride array, which is common for BLAS routines),
#include <omp.h>
#include <stdio.h>

int main(int argc, char** argv)
{
    int x[600];
    const int stride[2] = {1, 2};
    #pragma omp target data map(tofrom: x[0:600]) map(to: stride[0:2]) device(omp_get_default_device())
    #pragma omp target simd device(omp_get_default_device())
    for (size_t i = 0; i < 200; i++)
        x[i*stride[0] + i*stride[1]] = -1;
    printf("%d", x[597]);
    return 0;
}

when compiled with gcc-15.2 and the following options
-fopenmp -foffload=nvptx-none -fno-stack-protector -O3 -Wall
compiles without warnings and runs on the device.
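For reference, a full invocation would look like this (the compiler binary name and the source file name stride.c are placeholders):

gcc-15.2 -fopenmp -foffload=nvptx-none -fno-stack-protector -O3 -Wall stride.c -o stride
./stride    # expected to print -1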
If compiled with Clang 21.1.3 and the following options
-fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -O3 -Wall
one gets the following warning from Clang:
warning: loop not vectorized: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
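For completeness, the corresponding invocation would be (source file name again a placeholder):

clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -O3 -Wall stride.c -o stride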
The CUDA code generated by Clang is very efficient compared to GCC's. However, since you already implement OpenACC with its vector construct, and if OpenACC is implemented on top of the OpenMP runtime, it would make sense to turn omp target simd into an acc vector equivalent, and to support the combined constructs omp target parallel for simd and omp target teams distribute parallel for simd, as well as the collapse and reduction clauses and the tile construct for these.
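A sketch of the kind of loop this request would cover, using the combined construct with the collapse and reduction clauses (the 2D summation is purely illustrative):

#include <stdio.h>

#define N 64

int main(void)
{
    double a[N*N], sum = 0.0;
    for (int k = 0; k < N*N; k++) a[k] = 1.0;
    /* combined target construct with collapse and reduction; the request is
       for Clang to vectorize the simd level on device, like acc vector */
    #pragma omp target teams distribute parallel for simd collapse(2) reduction(+: sum) map(to: a[0:N*N])
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum += a[i*N + j];
    printf("%f\n", sum);    /* expect 4096.0 */
    return 0;
}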