Add version details to NVSHMEM version mismatch errors #32

Open
AkiRusProd wants to merge 1 commit into NVIDIA:devel from AkiRusProd:feature/detail-versions
@AkiRusProd

Add Version Details to NVSHMEM Version Mismatch Errors

Problem: When the NVSHMEM device and host library versions do not match, the error message is only a generic warning with no version numbers, which makes compatibility issues difficult to diagnose.

Solution: Print both version numbers in major.minor.patch format whenever a version mismatch is detected:

  • In nvshmemid_hostlib_init_attr function during library initialization
  • In nvshmemi_cuobject_init_common function during CUmodule/CUlibrary initialization

Error output example now shows:

NVSHMEM device library version (3.3.24) does not match with NVSHMEM host library version (3.2.5) 
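
For illustration, the check and message formatting could look roughly like the sketch below. This is not the code from this PR: the `lib_version_t` type and the `versions_match` and `format_mismatch` helpers are hypothetical names invented for the example, and the actual NVSHMEM types and logging macros may differ.

```c
#include <stdio.h>

/* Hypothetical version triple; the real NVSHMEM type and field names may differ. */
typedef struct {
    int major;
    int minor;
    int patch;
} lib_version_t;

/* Returns 1 when all three components are equal, 0 otherwise. */
int versions_match(lib_version_t a, lib_version_t b) {
    return a.major == b.major && a.minor == b.minor && a.patch == b.patch;
}

/* Writes the detailed mismatch message into buf.
 * Returns the value of snprintf (the length the full message needs). */
int format_mismatch(char *buf, size_t len, lib_version_t device, lib_version_t host) {
    return snprintf(buf, len,
                    "NVSHMEM device library version (%d.%d.%d) does not match with "
                    "NVSHMEM host library version (%d.%d.%d)",
                    device.major, device.minor, device.patch,
                    host.major, host.minor, host.patch);
}
```

A caller would test `versions_match(device, host)` first and, on failure, pass the formatted buffer to whatever error reporting path the library already uses.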

Benefits:

  • Accelerates compatibility issue diagnosis
  • Allows precise identification of which version needs updating
  • Simplifies debugging in heterogeneous environments with different library versions