[Bugfix][feat]: add support for fp16 and bf16 dtype aliases for collective operations. #46

Open
foraxe wants to merge 1 commit into NVIDIA:devel from foraxe:fix_perf

foraxe commented Dec 12, 2025

Problem

Running perftest with -d fp16 or -d bf16 fails with:

AttributeError: module 'nvshmem.bindings' has no attribute 'fp16_sum_reduce_on_stream' 
AttributeError: module 'nvshmem.bindings' has no attribute 'bf16_sum_reduce_on_stream' 

Root Cause

collective_on_buffer constructs binding function names using the user-provided dtype directly (e.g., fp16_sum_reduce_on_stream), but the actual bindings use:

  • half for fp16

  • bfloat16 for bf16

Solution

Added a dtype alias mapping in nvshmem4py/nvshmem/core/collective.py to normalize user-friendly shorthand names to their binding-compatible equivalents:

  • fp16 → half

  • bf16 → bfloat16
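
A minimal sketch of the normalization described above, not the code from the PR itself. The names `DTYPE_ALIASES`, `normalize_dtype`, and `binding_name` are hypothetical, and the `<dtype>_<op>_reduce_on_stream` naming pattern is inferred from the error messages quoted in the Problem section.

```python
# Hypothetical illustration; the actual change lives in
# nvshmem4py/nvshmem/core/collective.py and may differ in detail.
DTYPE_ALIASES = {
    "fp16": "half",
    "bf16": "bfloat16",
}


def normalize_dtype(dtype: str) -> str:
    """Return the binding-compatible name for a user-supplied dtype."""
    # Unknown names pass through unchanged, so existing dtypes are unaffected.
    return DTYPE_ALIASES.get(dtype, dtype)


def binding_name(dtype: str, op: str) -> str:
    """Build a reduce binding name following the pattern seen in the errors."""
    return f"{normalize_dtype(dtype)}_{op}_reduce_on_stream"


print(binding_name("fp16", "sum"))   # half_sum_reduce_on_stream
print(binding_name("bf16", "sum"))   # bfloat16_sum_reduce_on_stream
print(binding_name("float", "sum"))  # float_sum_reduce_on_stream
```

Using `dict.get` with the input as the default keeps the mapping additive: only the two shorthand names are rewritten, and every other dtype reaches the name construction exactly as before.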

Changes

  • nvshmem4py/nvshmem/core/collective.py: Added a dtype_aliases dict and applied normalization before constructing binding function names.

Testing

fp16

Command:

OMPI_ALLOW_RUN_AS_ROOT=1 \
OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1 \
mpirun -np 4 -N 4 --bind-to none \
  python nvshmem4py/perftest/reduction_on_stream.py \
  -b 2M -e 2M -f 2 \
  -d fp16 -o sum \
  -w 5 -n 20

Former error:

AttributeError: module 'nvshmem.bindings' has no attribute 'fp16_sum_reduce_on_stream'. Did you mean: 'int16_sum_reduce_on_stream'? 

Now:

size(B)  count    type      latency(us)  min_lat(us)  max_lat(us)  algbw(GB/s)  busbw(GB/s)
2097152  1048576  half-sum  27.4416002   25.088       33.824       76.422       114.634

bf16

Command:

OMPI_ALLOW_RUN_AS_ROOT=1 \
OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1 \
mpirun -np 4 -N 4 --bind-to none \
  python nvshmem4py/perftest/reduction_on_stream.py \
  -b 2M -e 2M -f 2 \
  -d bf16 -o sum \
  -w 5 -n 20

Former error:

AttributeError: module 'nvshmem.bindings' has no attribute 'bf16_sum_reduce_on_stream'. Did you mean: 'int16_sum_reduce_on_stream'? 

Now:

size(B)  count    type          latency(us)  min_lat(us)  max_lat(us)  algbw(GB/s)  busbw(GB/s)
2097152  1048576  bfloat16-sum  34.4000001   25.280       147.360      60.964       91.446
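
As a sanity check on the two result rows, the bandwidth columns can be recomputed from the size and latency. The sketch below assumes the perftest follows the common all-reduce convention of algbw = size / latency and busbw = algbw × 2(n − 1)/n; that formula is an assumption and is not stated in this PR, though the reported figures are consistent with it for n = 4 ranks (`-np 4`). The function name `bandwidths` is hypothetical.

```python
def bandwidths(size_bytes: int, latency_us: float, n_ranks: int):
    """Return (algbw, busbw) in GB/s under the assumed all-reduce convention."""
    # bytes per microsecond equals MB/s (1e6 B/s); divide by 1e3 to get GB/s.
    algbw = size_bytes / latency_us / 1e3
    # Assumed bus-bandwidth correction factor for a reduction across n ranks.
    busbw = algbw * 2 * (n_ranks - 1) / n_ranks
    return algbw, busbw


# Size and average latency are taken from the fp16 and bf16 rows above.
for label, latency in (("half-sum", 27.4416002), ("bfloat16-sum", 34.4000001)):
    algbw, busbw = bandwidths(2097152, latency, n_ranks=4)
    print(f"{label}: algbw={algbw:.3f} GB/s, busbw={busbw:.3f} GB/s")
```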

Known Issue (Out of Scope)

The following dtypes are listed in the perftest argument parser choices but have no corresponding bindings:

  • ulonglong — missing ulonglong_*_reduce_on_stream bindings

  • ptrdiff — missing ptrdiff_*_reduce_on_stream bindings

These will fail at runtime if used. Consider adding bindings or removing these from the supported choices in a future PR.
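
One way to catch such gaps before runtime is to check each parser choice against the bindings module. The sketch below is illustrative only: `missing_bindings` is a hypothetical helper, and `fake_bindings` is a stand-in namespace used so the example runs without NVSHMEM installed. With the real package, the module `nvshmem.bindings` would presumably be passed instead.

```python
from types import SimpleNamespace


def missing_bindings(bindings, dtypes, op="sum"):
    """Return the dtypes for which `bindings` has no <dtype>_<op>_reduce_on_stream."""
    return [
        dtype
        for dtype in dtypes
        if not hasattr(bindings, f"{dtype}_{op}_reduce_on_stream")
    ]


# Stand-in for the real bindings module (assumption: attribute lookup by name
# is all that matters here, so placeholder values are enough).
fake_bindings = SimpleNamespace(
    int_sum_reduce_on_stream=None,
    half_sum_reduce_on_stream=None,
)

print(missing_bindings(fake_bindings, ["int", "half", "ulonglong", "ptrdiff"]))
# ['ulonglong', 'ptrdiff']
```

A check like this could run once at argument-parsing time, so that an unsupported dtype is reported with a clear message rather than surfacing later as an AttributeError.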

Dtype Compatibility Analysis

User Input  Binding Function               Exists?  Notes
int         int_sum_reduce_on_stream       Yes
int32       int32_sum_reduce_on_stream     Yes
uint32      uint32_sum_reduce_on_stream    Yes
int64       int64_sum_reduce_on_stream     Yes
uint64      uint64_sum_reduce_on_stream    Yes
long        long_sum_reduce_on_stream      Yes
longlong    longlong_sum_reduce_on_stream  Yes
ulonglong   -                              No       Missing: no ulonglong_sum_reduce_on_stream
size        size_sum_reduce_on_stream      Yes
ptrdiff     -                              No       Missing: no ptrdiff_sum_reduce_on_stream
float       float_sum_reduce_on_stream     Yes
double      double_sum_reduce_on_stream    Yes
fp16        half_sum_reduce_on_stream      Yes      Maps to half_sum_reduce_on_stream (with this fix)
bf16        bfloat16_sum_reduce_on_stream  Yes      Maps to bfloat16_sum_reduce_on_stream (with this fix)