Commit 2b97bc4

[NFC] Add documentation for declare target as requested
1 parent 9fd5ec3 commit 2b97bc4

1 file changed

+124
-0
lines changed

@@ -0,0 +1,124 @@
<!--===- docs/OpenMP-declare-target.md

   Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
   See https://llvm.org/LICENSE.txt for license information.
   SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

-->

# Introduction to Declare Target

In OpenMP, `declare target` is a directive that can be applied to a function or variable (primarily global) to tell the compiler that it should be generated for a particular device's environment; in essence, whether something should be emitted for host, device, or both. Examples of its usage for both data and functions can be seen below.

```Fortran
module test_0
  integer :: sp = 0
  !$omp declare target link(sp)
end module test_0

program main
  use test_0
  !$omp target map(tofrom:sp)
  sp = 1
  !$omp end target
end program
```

In the above example, we create a variable in a separate module, mark it as `declare target`, and then map it, embedding it into the device IR and assigning to it.

```Fortran
function func_t_device() result(i)
  !$omp declare target to(func_t_device) device_type(nohost)
  integer :: i
  i = 1
end function func_t_device

program main
  integer :: x
  integer, external :: func_t_device
  !$omp target map(from:x)
  ! A function is invoked in an expression rather than with CALL.
  x = func_t_device()
  !$omp end target
end program
```

In the above example, we use `declare target` to state that a function is required on device and that we will not be using it on host, so the compiler is in theory free to remove or ignore it for the host. A user could also, in this case, leave the `declare target` off the function, and it would be implicitly marked `declare target any` (for both host and device), as it is used within a target region.

# Declare Target as represented in the OpenMP Dialect

In the OpenMP Dialect, `declare target` is not represented by a specific `operation`. Instead, it is an OpenMP dialect specific `attribute` that can be applied to any operation in any dialect. This simplifies its use: rather than replacing or modifying existing global or function `operations` in a dialect, it is attached to them as extra metadata that the lowering can use in different ways as necessary.

The `attribute` is composed of multiple fields representing the clauses you would find on the `declare target` directive, i.e. the device type (`nohost`, `any`, `host`) and the capture clause (`link` or `to`). A small example of `declare target` applied to a Fortran `real` can be found below:

```MLIR
fir.global internal @_QFEi {omp.declare_target = #omp.declaretarget<device_type = (any), capture_clause = (to)>} : f32 {
  %0 = fir.undefined f32
  fir.has_value %0 : f32
}
```

This would look similar for function style `operations`.

Applying and accessing this attribute is aided by an OpenMP Dialect MLIR Interface named `DeclareTargetInterface`, which can be used on operations to access the appropriate interface functions, e.g.:

```C++
// dyn_cast yields a null interface when the operation does not implement it.
if (auto declareTargetGlobal =
        llvm::dyn_cast<mlir::omp::DeclareTargetInterface>(Op.getOperation()))
  declareTargetGlobal.isDeclareTarget();
```

# Declare Target Fortran OpenMP Lowering

The initial lowering of `declare target` to MLIR for both use-cases is done inside the usual OpenMP lowering in flang/lib/Lower/OpenMP.cpp. However, some direct calls to `declare target` related functions are made from Flang's Bridge in flang/lib/Lower/Bridge.cpp.

The marking of operations with the declare target attribute happens in two phases. The second is optional and only needed when the first fails to apply the attribute because the operation has not been generated yet; the main case where this currently occurs is when a Fortran function interface has been marked.

The initial phase happens when the declare target directive and its clauses are first processed. The primary data gathering for the directive and its clauses happens in a function called `getDeclareTargetInfo`, whose result is fed to the `markDeclareTarget` function, which does the actual marking using the `DeclareTargetInterface`. If it encounters something that has been marked twice over multiple directives with two differing device types (e.g. `host`, `nohost`), it will swap the device type to `any`.

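The device type rule from the initial phase can be sketched as a small standalone function. This is a simplified model under stated assumptions, not Flang's implementation: `DeviceType` and `mergeDeviceType` are hypothetical names introduced here for illustration, whereas the real marking goes through `markDeclareTarget` and the `DeclareTargetInterface`.

```C++
#include <cassert>

// Hypothetical stand-in for the device type clause values.
enum class DeviceType { Host, NoHost, Any };

// Combine the device type already recorded on an operation with the one
// requested by a later directive. Differing requests widen to Any.
constexpr DeviceType mergeDeviceType(DeviceType existing,
                                     DeviceType requested) {
  return existing == requested ? existing : DeviceType::Any;
}

// Compile-time check of the conflicting case described in the text.
static_assert(mergeDeviceType(DeviceType::Host, DeviceType::NoHost) ==
              DeviceType::Any);
```

Under this model, marking the same symbol twice with the same device type leaves it unchanged, and a symbol that is already `any` stays `any`.
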
Whenever we invoke `genFIR` on an `OpenMPDeclarativeConstruct` from Bridge, we also invoke another function called `gatherOpenMPDeferredDeclareTargets`, which gathers information relevant to the application of the `declare target` attribute (the symbol that it should be applied to, the device type clause, and the capture clause) when processing `declare target` declarations, storing the data in a vector that is part of Bridge's instantiation of the `AbstractConverter`. This data is only stored if we encounter a function or variable symbol that does not have an operation instantiated for it yet. Unfortunately, this cannot happen as part of the initial marking, as we must store this data in Bridge and only have access to the abstract version of the converter via the OpenMP lowering.

This information is used in the second phase, which is a form of deferred processing of the `declare target` marked operations that have delayed generation. This is done via the function `markOpenMPDeferredDeclareTargetFunctions`, which is called from Bridge at the end of the lowering process. It iterates over the data gathered by `gatherOpenMPDeferredDeclareTargets`, checks whether any of the recorded symbols have now had their corresponding operations instantiated, and applies the attribute where possible using `markDeclareTarget`.
However, it is still possible for operations not to be generated for certain symbols, in particular function interfaces that are not directly used or defined within the current module. This means we cannot emit errors for left-over unmarked symbols; these must (and should) be caught by the initial semantic analysis.

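The two-phase scheme can be illustrated with a minimal standalone sketch. Everything here is a hypothetical model: `GeneratedOps`, `markOrDefer` and `markDeferred` are names invented for this example and only mimic the roles that `gatherOpenMPDeferredDeclareTargets` and `markOpenMPDeferredDeclareTargetFunctions` play in the text; no Flang or MLIR API is used.

```C++
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// All types below are hypothetical stand-ins, not Flang or MLIR classes.
enum class DeviceType { Host, NoHost, Any };
enum class CaptureClause { To, Link };

struct DeclareTargetInfo {
  std::string symbol;
  DeviceType deviceType;
  CaptureClause captureClause;
};

// Models the operations generated so far: symbol name -> "is marked" flag.
using GeneratedOps = std::map<std::string, bool>;

// Phase one: mark the operation if it already exists, otherwise remember the
// request so it can be retried once lowering has finished.
void markOrDefer(GeneratedOps &ops, std::vector<DeclareTargetInfo> &deferred,
                 const DeclareTargetInfo &info) {
  auto it = ops.find(info.symbol);
  if (it != ops.end())
    it->second = true;
  else
    deferred.push_back(info);
}

// Phase two: retry every deferred request. Symbols that still have no
// operation are skipped without an error; their count is returned.
std::size_t markDeferred(GeneratedOps &ops,
                         const std::vector<DeclareTargetInfo> &deferred) {
  std::size_t unresolved = 0;
  for (const DeclareTargetInfo &info : deferred) {
    auto it = ops.find(info.symbol);
    if (it != ops.end())
      it->second = true;
    else
      ++unresolved;
  }
  return unresolved;
}
```

The second phase deliberately tolerates unresolved symbols, mirroring the point that left-over unmarked symbols cannot be treated as errors during lowering.
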
NOTE: `declare target` can be applied to implicit `SAVE` attributed variables. However, by default Flang does not represent these as `GlobalOp`s, which means we cannot tag and lower them as `declare target` normally. Instead, similarly to the way `threadprivate` handles these cases, we raise and initialize the variable as an internal `GlobalOp` and apply the attribute. This occurs in the flang/lib/Lower/OpenMP.cpp function `genDeclareTargetIntGlobal`.

# Declare Target Transformation Passes for Flang

There are currently two passes within Flang that are related to the processing of `declare target`:
* `OMPMarkDeclareTarget` - This pass is in charge of marking functions captured in (called from) `target` regions or other `declare target` marked functions as `declare target`. It does so recursively, i.e. nested calls will also be implicitly marked. It currently tries to mark things as conservatively as possible: a function captured in a `target` region is given `nohost`, unless it is already marked `host`, in which case it is given the `any` device type (if it is already `any`, it is left untouched). Functions captured by other `declare target` functions are handled similarly, except that we use the parent's device type where possible.
* `OMPFunctionFiltering` - This is executed after `OMPMarkDeclareTarget`, and currently only for device. Its job is to conservatively remove functions from the module where possible. This helps make sure incompatible code from the host is not lowered for device (a user can still inject incompatible code themselves, but this mechanism allows them to avoid that). Functions with `target` regions in them are preserved as they may be necessary to maintain (e.g. for reverse offloading in the future). Otherwise, we remove any function marked as a `declare target host` function, and any uses are replaced with `undef`s so that other passes can appropriately clean them up and in the meantime we don't break verification.

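The recursive marking performed by `OMPMarkDeclareTarget` can be approximated by the following standalone sketch. It is a model under assumptions, not the pass itself: `Function`, `Module`, `markNested` and `markFromTargetRegion` are hypothetical names, the call graph is a plain map rather than MLIR operations, and only the device type is tracked.

```C++
#include <cassert>
#include <map>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical model types; the real pass walks MLIR operations instead.
enum class DeviceType { Host, NoHost, Any };

struct Function {
  std::optional<DeviceType> deviceType; // empty: not declare target
  std::vector<std::string> callees;
};
using Module = std::map<std::string, Function>;

// Mark `name` and everything it calls, using the parent's device type.
void markNested(Module &module, const std::string &name, DeviceType parent,
                std::set<std::string> &visited) {
  auto it = module.find(name);
  if (it == module.end() || !visited.insert(name).second)
    return; // unknown symbol, or already handled in this traversal
  Function &fn = it->second;
  if (!fn.deviceType)
    fn.deviceType = parent; // unmarked: inherit from the parent
  else if (*fn.deviceType != parent)
    fn.deviceType = DeviceType::Any; // conflicting marking: widen
  for (const std::string &callee : fn.callees)
    markNested(module, callee, parent, visited);
}

// Functions called from a target region are captured as nohost.
void markFromTargetRegion(Module &module,
                          const std::vector<std::string> &calledFromTarget) {
  std::set<std::string> visited;
  for (const std::string &name : calledFromTarget)
    markNested(module, name, DeviceType::NoHost, visited);
}
```

The `visited` set is a design choice of this sketch that keeps recursion finite when functions call each other in a cycle.
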
While this infrastructure is generally applicable to more than just Flang, we currently only use these passes in the Flang frontend, and they are part of the Flang codebase rather than the OpenMP dialect codebase.

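As a rough illustration of the removal rule described for `OMPFunctionFiltering`, the decision can be written as a single predicate. This is a sketch only: `FunctionInfo` and `shouldFilter` are hypothetical names, and the actual pass additionally rewrites uses of removed functions, which is not modelled here.

```C++
#include <cassert>
#include <optional>

// Hypothetical model types, not part of Flang or MLIR.
enum class DeviceType { Host, NoHost, Any };

struct FunctionInfo {
  std::optional<DeviceType> deviceType; // empty: not marked declare target
  bool containsTargetRegion;
};

// True when the function can be dropped from the module at this stage.
bool shouldFilter(const FunctionInfo &fn, bool compilingForDevice) {
  if (!compilingForDevice)
    return false; // filtering currently only runs for device
  if (fn.containsTargetRegion)
    return false; // preserved: the target region is still needed
  return fn.deviceType == DeviceType::Host;
}
```
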
# Declare Target OpenMP Dialect To LLVM-IR Lowering

The OpenMP dialect lowering of `declare target` is currently a little unique. As it is an `attribute` rather than an `operation`, we process it using the LLVM Target lowering's `amendOperation`, which is invoked immediately after an operation has been lowered to LLVM-IR. As the attribute can apply to multiple different operations, we must specialise this function for each operation type that we may encounter. Currently these are `GlobalOp`s and `FuncOp`s.

When we encounter a `FuncOp`, the processing is fairly simple. If we are processing device code, we finish the removal of `host` marked functions: anything we could not remove previously is removed now, e.g. a function that had a `target` directive in it (which we need to keep hold of up to this point to actually outline the `target` kernel for device). This should leave only `any`, `nohost` or undeterminable functions in the module to lower further, reducing the possibility of device incompatible code being in the module.

For `GlobalOp`s, the processing is a little more complex. We currently leverage two functions that we inherited from Clang and moved to the `OMPIRBuilder` to share across the two compiler frontends: `registerTargetGlobalVariable` and `getAddrOfDeclareTargetVar`. These two functions are mutually recursive, invoking each other depending on the clauses and options provided to the `OMPIRBuilder` (in particular unified shared memory). The main functionality they provide is the generation of a new global pointer for device with a "ref_" prefix, and the enqueuing of metadata generation by the `OMPIRBuilder` at the end of the module, for both host and device, that links the newly generated device global pointer and the host pointer together across the two modules (and resulting binaries).

There are two things of note about the `GlobalOp` processing. The first is that, similarly to other metadata (e.g. for `TargetOp`) that is shared across both host and device modules, the device needs access to the previously generated host IR file, which is provided through another `attribute` applied to the `ModuleOp` by the compiler frontend. The file is loaded and consumed by the `OMPIRBuilder` to populate its `OffloadInfoManager` data structures, keeping host and device appropriately synchronised.

The second (and more important to remember) is that, as we effectively replace the original LLVM-IR generated for the `declare target` marked `GlobalOp`, we have some corrections to make for `TargetOp`s (or other region operations that use them directly), which still refer to the original lowered global operation. This is done via `handleDeclareTargetMapVar`, which is invoked as the final alteration to the lowered `target` region. It is only invoked for device, as it is only required in the case where we have emitted the "ref" pointer, and it effectively replaces all uses of the originally lowered global symbol with our new global ref pointer's symbol. Currently we do not remove or delete the old symbol, because the same symbol can be used across multiple target regions; if we removed it, we would risk breaking lowerings of target regions that are processed later. To appropriately delete these no longer necessary symbols we would need a deferred removal process at the end of the module, which is currently not in place. It may be possible to store this information in the `OMPIRBuilder` and then perform this cleanup on finalization, but this is still open for discussion and implementation.

# Current Support

For the moment, `declare target` should work for:
* Marking functions/subroutines and function/subroutine interfaces for generation on host, device or both.
* Implicit function/subroutine capture for calls emitted in a `target` region or in an explicitly marked `declare target` function/subroutine. Note: calls made via arguments passed to other functions must still be themselves marked `declare target`. For example, when passing a `C` function pointer and invoking it, the interface and the `C` function in the other module must be marked `declare target`, with the same type of marking as indicated by the specification.
* Marking global variables with `declare target`'s `link` clause and mapping the data to the device data environment using `declare target` (may not work for all types yet, but it should for scalars and arrays of scalars).

Doesn't work for, or needs further testing for:
* Marking the following types with `declare target link` (needs further testing):
    * Descriptor based types, e.g. pointers/allocatables.
    * Derived types.
    * Members of derived types (use-case needs legality checking with OpenMP specification).
* Marking global variables with `declare target`'s `to` clause. A lot of the lowering should exist, but it needs further testing and likely some further changes to fully function.
