
  • try: `cuobjdump -all -ptx ...` Commented Nov 26 at 17:58
  • Thank you very much, @RobertCrovella. Using that command, I can now see the PTX, though I still don't understand why `--dump-ptx` didn't show anything about `ncclDevKernel`. Unfortunately, running a simple `all_gather` with PyTorch raises this error: `Error in rank 0: CUDA error: no kernel image is available for execution on the device` Commented Nov 27 at 12:17
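The `no kernel image is available` error generally means the binary contains neither SASS nor PTX that the driver can use on the current GPU. Below is a minimal sketch of the CUDA compatibility rules. SASS built for `sm_XY` runs only on devices with the same major version and an equal or higher minor version. PTX built for `compute_XY` can be JIT-compiled by the driver for any device with an equal or newer compute capability. The helper `has_usable_image` is hypothetical and assumes you have collected the embedded architecture names yourself (for example, from `cuobjdump` output). Arch-specific targets such as `sm_90a` are not modeled.

```python
import re

# Hypothetical helper (not part of any CUDA or NCCL API).
# Matches plain targets like "sm_80" (SASS/cubin) or "compute_80" (PTX).
# Entries with suffixes such as "sm_90a" are deliberately ignored.
_ARCH_RE = re.compile(r"^(sm|compute)_(\d{2,})$")


def has_usable_image(archs, device_cc):
    """Return True if any listed architecture could run on a device
    with compute capability device_cc = (major, minor)."""
    dev_major, dev_minor = device_cc
    for arch in archs:
        m = _ARCH_RE.match(arch)
        if m is None:
            continue  # unrecognized or arch-specific entry: skip it
        kind, digits = m.group(1), m.group(2)
        # "80" -> (8, 0), "100" -> (10, 0): last digit is the minor version
        major, minor = int(digits[:-1]), int(digits[-1])
        if kind == "sm":
            # SASS: same major version, equal or newer minor version only
            if dev_major == major and dev_minor >= minor:
                return True
        else:
            # PTX: driver can JIT it for any equal or newer compute capability
            if (dev_major, dev_minor) >= (major, minor):
                return True
    return False
```

For instance, a library built with only `sm_80` SASS and no PTX would have no usable image on a compute capability 9.0 device. Adding `compute_80` PTX would let the driver JIT-compile a kernel for that device.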