[Bugfix][feat]: add support for `fp16` and `bf16` dtype aliases for collective operations. by foraxe · Pull Request #46 · NVIDIA/nvshmem

foraxe · 2025-12-12T06:25:20Z

Problem

Running perftest with -d fp16 or -d bf16 fails with:

AttributeError: module 'nvshmem.bindings' has no attribute 'fp16_sum_reduce_on_stream'

AttributeError: module 'nvshmem.bindings' has no attribute 'bf16_sum_reduce_on_stream'

Root Cause

collective_on_buffer constructs binding function names using the user-provided dtype directly (e.g., fp16_sum_reduce_on_stream), but the actual bindings use:

half for fp16
bfloat16 for bf16
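
The failure mode can be reproduced without a GPU. This is a minimal sketch, not the actual nvshmem4py code: `fake_bindings` is a hypothetical stand-in for `nvshmem.bindings`, `lookup` is an illustrative helper, and the exact name pattern is assumed from the error messages above.

```python
from types import SimpleNamespace

# Hypothetical stand-in for nvshmem.bindings: only the names the real
# module is reported to expose (half/bfloat16, not fp16/bf16).
fake_bindings = SimpleNamespace(
    half_sum_reduce_on_stream=lambda *args: "half",
    bfloat16_sum_reduce_on_stream=lambda *args: "bfloat16",
    int16_sum_reduce_on_stream=lambda *args: "int16",
)

def lookup(bindings, dtype, op, coll="reduce"):
    """Build the binding name from the raw dtype string and resolve it."""
    name = f"{dtype}_{op}_{coll}_on_stream"
    return getattr(bindings, name)

failure = None
try:
    lookup(fake_bindings, "fp16", "sum")  # user-facing shorthand
except AttributeError as exc:
    failure = str(exc)  # mentions 'fp16_sum_reduce_on_stream'

# The binding-compatible name resolves without error.
resolved = lookup(fake_bindings, "half", "sum")
```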

Solution

Added a dtype alias mapping in nvshmem4py/nvshmem/core/collective.py to normalize user-friendly shorthand names to their binding-compatible equivalents:

fp16 → half
bf16 → bfloat16
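
A sketch of the normalization step, assuming it is a plain dictionary lookup applied before the name is formatted. Only the `dtype_aliases` name and its two entries come from this PR; `normalize_dtype` and `binding_name` are hypothetical helpers used for illustration.

```python
# Shorthand names mapped to the names used by the bindings.
dtype_aliases = {
    "fp16": "half",
    "bf16": "bfloat16",
}

def normalize_dtype(dtype: str) -> str:
    """Return the binding-compatible dtype name; unknown names pass through."""
    return dtype_aliases.get(dtype, dtype)

def binding_name(dtype: str, op: str, coll: str = "reduce") -> str:
    """Format the binding function name after alias normalization."""
    return f"{normalize_dtype(dtype)}_{op}_{coll}_on_stream"
```

Using `dict.get` with the input as the default keeps every dtype that already matches a binding name (for example `float` or `int32`) unchanged.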

Changes

nvshmem4py/nvshmem/core/collective.py: Added a dtype_aliases dict and applied normalization before constructing binding function names.

Testing

fp16

Command:

OMPI_ALLOW_RUN_AS_ROOT=1 \
OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1 \
mpirun -np 4 -N 4 --bind-to none \
  python nvshmem4py/perftest/reduction_on_stream.py \
  -b 2M -e 2M -f 2 \
  -d fp16 -o sum \
  -w 5 -n 20

Former error:

AttributeError: module 'nvshmem.bindings' has no attribute 'fp16_sum_reduce_on_stream'. Did you mean: 'int16_sum_reduce_on_stream'?

Now:

size(B)  count    type      latency(us)  min_lat(us)  max_lat(us)  algbw(GB/s)  busbw(GB/s)
2097152  1048576  half-sum  27.4416002   25.088       33.824       76.422       114.634

bf16

Command:

OMPI_ALLOW_RUN_AS_ROOT=1 \
OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1 \
mpirun -np 4 -N 4 --bind-to none \
  python nvshmem4py/perftest/reduction_on_stream.py \
  -b 2M -e 2M -f 2 \
  -d bf16 -o sum \
  -w 5 -n 20

Former error:

AttributeError: module 'nvshmem.bindings' has no attribute 'bf16_sum_reduce_on_stream'. Did you mean: 'int16_sum_reduce_on_stream'?

Now:

size(B)  count    type          latency(us)  min_lat(us)  max_lat(us)  algbw(GB/s)  busbw(GB/s)
2097152  1048576  bfloat16-sum  34.4000001   25.280       147.360      60.964       91.446
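
As a sanity check on the two result rows, the bandwidth columns are consistent with algbw = size / latency and busbw = algbw × 2(n − 1)/n for n = 4 processes. That relationship is inferred from the numbers (it matches the allreduce convention used by nccl-tests) and is not confirmed from the perftest source; `bandwidths` is a hypothetical helper.

```python
def bandwidths(size_bytes: int, latency_us: float, n_pes: int):
    """Return (algbw, busbw) in GB/s under the assumed allreduce convention."""
    # bytes per microsecond equals MB/s * 1e0, so divide by 1e3 to get GB/s.
    algbw = size_bytes / latency_us / 1e3
    # Assumed bus-bandwidth correction factor for an allreduce-style reduction.
    busbw = algbw * 2 * (n_pes - 1) / n_pes
    return algbw, busbw

fp16_algbw, fp16_busbw = bandwidths(2097152, 27.4416002, 4)
bf16_algbw, bf16_busbw = bandwidths(2097152, 34.4000001, 4)
```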

Known Issue (Out of Scope)

The following dtypes are listed in the perftest argument parser choices but have no corresponding bindings:

ulonglong — missing ulonglong_*_reduce_on_stream bindings
ptrdiff — missing ptrdiff_*_reduce_on_stream bindings

These will fail at runtime if used. Consider adding bindings or removing these from the supported choices in a future PR.

Dtype Compatibility Analysis

User Input	Binding Function Exists?	Notes
int	✅	int_sum_reduce_on_stream
int32	✅	int32_sum_reduce_on_stream
uint32	✅	uint32_sum_reduce_on_stream
int64	✅	int64_sum_reduce_on_stream
uint64	✅	uint64_sum_reduce_on_stream
long	✅	long_sum_reduce_on_stream
longlong	✅	longlong_sum_reduce_on_stream
ulonglong	❌	Missing: no ulonglong_sum_reduce_on_stream
size	✅	size_sum_reduce_on_stream
ptrdiff	❌	Missing: no ptrdiff_sum_reduce_on_stream
float	✅	float_sum_reduce_on_stream
double	✅	double_sum_reduce_on_stream
fp16	✅	Maps to half_sum_reduce_on_stream (with this fix)
bf16	✅	Maps to bfloat16_sum_reduce_on_stream (with this fix)
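
A table like the one above could be regenerated with a small `hasattr` check. The sketch below runs against a hypothetical stand-in namespace populated from the table rather than the real `nvshmem.bindings` module; `missing_bindings` is an illustrative helper, and in practice the real module would be passed in its place.

```python
from types import SimpleNamespace

# dtype choices as listed in the compatibility table.
perftest_dtypes = [
    "int", "int32", "uint32", "int64", "uint64", "long", "longlong",
    "ulonglong", "size", "ptrdiff", "float", "double", "fp16", "bf16",
]

dtype_aliases = {"fp16": "half", "bf16": "bfloat16"}

# Stand-in for nvshmem.bindings (assumption): names marked present in the table.
present = ["int", "int32", "uint32", "int64", "uint64", "long", "longlong",
           "size", "float", "double", "half", "bfloat16"]
fake_bindings = SimpleNamespace(
    **{f"{d}_sum_reduce_on_stream": None for d in present}
)

def missing_bindings(bindings, dtypes, op="sum"):
    """Return the dtypes with no matching binding after alias normalization."""
    missing = []
    for dtype in dtypes:
        name = f"{dtype_aliases.get(dtype, dtype)}_{op}_reduce_on_stream"
        if not hasattr(bindings, name):
            missing.append(dtype)
    return missing

unsupported = missing_bindings(fake_bindings, perftest_dtypes)
```

Running the same check at argument-parsing time would let the perftest reject unsupported dtypes up front instead of failing inside the collective call.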

feat: add support for fp16 and bf16 dtype aliases for collective …

5063bea

…operations.

foraxe mentioned this pull request Dec 15, 2025

[Issue]: AttributeError: module 'nvshmem.bindings' has no attribute 'fp16_sum_reduce_on_stream' #47

Open

foraxe wants to merge 1 commit into NVIDIA:devel from foraxe:fix_perf
