[OpenMP][OMPT] Use global thread id for codeptr_ra in end_critical #169826
Add this suggestion to a batch that can be applied as a single commit. This suggestion is invalid because no changes were made to the code. Suggestions cannot be applied while the pull request is closed. Suggestions cannot be applied while viewing a subset of changes. Only one suggestion per line can be applied in a batch. Add this suggestion to a batch that can be applied as a single commit. Applying suggestions on deleted lines is not supported. You must change the existing code in this line in order to create a valid suggestion. Outdated suggestions cannot be applied. This suggestion has been applied or marked resolved. Suggestions cannot be applied from pending reviews. Suggestions cannot be applied on multi-line comments. Suggestions cannot be applied while the pull request is queued to merge. Suggestion cannot be applied right now. Please check back later.
When a critical construct has finished, it will trigger a critical-released event. If a tool is attached, and the
mutex_releasedcallback was registered, the tool with receive an event containing thecodeptr_ra, the return address of the callback invocation.All the way back in 82e94a5, this
codeptr_rawas implemented by calling__ompt_load_return_addresswith a fixed global thread id of0. However, this approach results in a race-condition, and can yield incorrect results to the tool.__ompt_load_return_address(0)points to the current return address of the thread 0 in__kmp_threads. This thread may already execute some other construct. A tool might therefore receive the return address of e.g. somelibompinternals, or other parts of the user code. Additionally, a call to__ompt_load_return_addressresets theth.ompt_thread_info.return_addresstoNULL, therefore also affecting the return address of thread 0. Another dispatched event, e.g. parallel-begin might therefore not transfer anycodeptr_ra.To fix this, replace the fixed thread id by the
global_tid, which is stored just before dispatching themutex_releasedcallback.Context beyond the commit message:
Click to open
We ran into this issue while investigating a race-condition on our own in Score-P. After fixing our issue with nested parallelism and critical constructs, we've noticed that the resulting call trees were off.
A code like this:
was causing a broken call tree, e.g.:
Where critical regions were not resolved by
addr2line(because they pointed e.g. tolibompinternals), orfor/parallelconstructs were missing the code position entirely. Looking at individual memory addresses closely in a debug build, I noticed that multiple threads were writing toth.ompt_thread_info.return_address, which looked incorrect to me.It would be great to have a test for this, but I haven't been able to trigger this consistently, especially not with a light-weight tool. If anyone has an idea for this, please give suggestions.