Skip to content

Conversation

@eyalkoren
Copy link
Contributor

@eyalkoren eyalkoren commented Aug 18, 2021

What does this PR do?

This PR started as a fix to an issue reported in the forum.

The issue is related to a change we did when tests of gRPC version 1.35 started failing due to duplicated client spans. In order to get rid of the extra span, we excluded io.grpc.internal.ManagedChannelImpl$RealChannel from our ChannelInstrumentation. However, this broke other scenarios, so this PR restores instrumentation of said channel and adds special handling for the io.grpc.internal.DelayedClientCall (added in 1.32) instead. This new client call type is created as a placeholder and starts going through the client call lifecycle, but at some point (different between 1.32 and 1.35) it starts referring to the real client call and delegate invocations to it. The new lifecycle management allows creation of duplicated spans, but when io.grpc.internal.DelayedClientCalls are used, their corresponding spans replace the ones created for the "real" calls once they are set as the delegate of the placeholder call, and the spans created for the real calls are discarded.

In addition, we now use ensureInstrumented to instrument client calls, server calls, client listeners and server listeners. By doing that, we can both be very efficient in type matching and still support custom types. While customer client/server calls are sort of edge case, custom listeners are probably more expected. Our tests now include custom client call and client/server listeners.

Lastly, I refactored all instrumentations to extract advice methods into separate (inner) classes, so to ensure proper dependencies visibility.

In a separate PR I intend to apply some enhancements to SpanConcurrentHashMap and then replace all maps within GrpcHelper with such.

Checklist

  • This is an enhancement of existing features, or a new feature in existing plugins
    • I have updated CHANGELOG.asciidoc
    • I have added tests that prove my fix is effective or that my feature works
    • The reporting user confirmed general functionality and resolution of the reported issue
@ghost
Copy link

ghost commented Aug 18, 2021

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2021-08-22T08:38:06.566+0000

  • Duration: 60 min 49 sec

  • Commit: ce7d2ed

Test stats 🧪

Test Results
Failed 0
Passed 2381
Skipped 17
Total 2398

Trends 🧪

Image of Build Times

Image of Tests

💚 Flaky test report

Tests succeeded.

Expand to view the summary

Test stats 🧪

Test Results
Failed 0
Passed 2381
Skipped 17
Total 2398

@eyalkoren eyalkoren marked this pull request as ready for review August 18, 2021 12:36
@eyalkoren eyalkoren requested a review from SylvainJuge August 18, 2021 12:36
Copy link
Member

@SylvainJuge SylvainJuge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, only a few questions/comments.

}
}

if (spanToMap != null && !spanToMap.isDiscarded()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[question] in which case does the span would have been discarded ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if it's a real possibility, but since we are discarding spans I think it worth the check - we don't need to continue with mappings for discarded spans.
Theoretical option: two internal gRPC channels delegate newCall() from one to the other, where the nested call already sets the created call as the real call in the corresponding DelayedClientCall, causing the active span to be discarded already before the outer call exits.

}

@Test
@Disabled // disabled for now as it's flaky on CI
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤞

Comment on lines +40 to +45
<dependency>
<groupId>io.grpc</groupId>
<artifactId>grpc-netty-shaded</artifactId>
<version>${grpc.version}</version>
<scope>test</scope>
</dependency>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[question] why do we need this extra dependency for testing ? is it required only for newer versions ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used this for getting sources so I can debug an entire version. Since it's in test scope, I think worth keeping it for the same purpose.

@eyalkoren eyalkoren merged commit 72895b0 into elastic:master Aug 22, 2021
@eyalkoren eyalkoren deleted the grpc-fix branch August 22, 2021 10:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

2 participants