
NumPy has complex64, which corresponds to a pair of float32 values.

It also has float16, but no complex32.

How come? I have a signal-processing calculation involving FFTs where I think I'd be fine with complex32, but I don't see how to get there. In particular, I was hoping for a speedup on an NVIDIA GPU with CuPy.

However, it seems that float16 is slower on the GPU rather than faster.

Why is half-precision unsupported and/or overlooked?

Relatedly, why don't we have complex integers? That may also present an opportunity for a speedup.
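To illustrate the kind of workaround currently available: since there is no complex32 dtype, one can store a signal as interleaved float16 real/imaginary pairs (half the memory of complex64) and upcast only for the transform itself. This is a hypothetical sketch of that approach in plain NumPy, not an officially supported complex32 path:

```python
import numpy as np

# Store the signal as interleaved float16 real/imag pairs:
# [re0, im0, re1, im1, ...] -- half the memory of complex64.
n = 1024
rng = np.random.default_rng(0)
signal_f16 = np.zeros(2 * n, dtype=np.float16)
signal_f16[0::2] = rng.standard_normal(n).astype(np.float16)  # real parts
signal_f16[1::2] = rng.standard_normal(n).astype(np.float16)  # imag parts

# Upcast to float32 and reinterpret the interleaved pairs as
# complex64 via .view() for the FFT.
spectrum = np.fft.fft(signal_f16.astype(np.float32).view(np.complex64))

# NumPy's FFT computes and returns complex128 regardless of input
# dtype, so downcast explicitly before repacking into float16 storage.
result_f16 = spectrum.astype(np.complex64).view(np.float32).astype(np.float16)
print(result_f16.dtype, result_f16.shape)  # float16 (2048,)
```

Note that this only saves memory at rest; the arithmetic itself still happens at higher precision, which is part of why no speedup materializes on the CPU side.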

  • Why were you expecting a speedup? Commented Jun 26, 2019 at 19:43
  • Because half the bits to push around. Commented Jun 27, 2019 at 0:28
  • But what if the processor (and C code) is optimized for 32- and 64-bit processing? Most of us aren't using 8-bit processors any more! Commented Jun 27, 2019 at 0:42
  • With respect to what cupy has or has not implemented, that's probably just a matter of development priority. cupy is still pretty new (at least compared to CUDA or numpy, for example). You might express your desire to the cupy developers in the form of an issue or pull request. Asking a random question on SO is unlikely to reach the cupy development team; a better way would be to contact them directly (on GitHub, for example) and provide a specific example, and maybe even a specific use case, for motivation. Commented Jun 29, 2019 at 15:04
  • However it seems that float16 is slower on GPU rather than faster. It's certainly possible for an FP16 FFT on a GPU to be faster than a corresponding FP32 (or FP64) FFT. GPU type matters, of course. It also seems like you may have pointed this out in an oblique fashion in your comments, so I'm not sure why you would leave that statement in your question unedited. So I'll just leave this here for future readers. Commented Jun 29, 2019 at 15:22

1 Answer


This issue has been raised in the CuPy repo for some time:

https://github.com/cupy/cupy/issues/3370

But there's no concrete work plan yet; most of the work is still exploratory.

One of the reasons it's not trivial to work out is that there's no numpy.complex32 dtype that we can directly import (note that all of CuPy's dtypes are just aliases of NumPy's), so there would be problems whenever a device-to-host transfer is requested. The other issue is that there are no native mathematical functions for complex32 on either CPU or GPU, so we would need to write them all ourselves to handle casting, ufuncs, and so on. In the linked issue there is a link to a NumPy discussion, and my impression is that it's currently not being considered...
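To make the dtype gap concrete, here is a quick check (my illustration, not part of the original answer) showing that NumPy defines complex64 as two float32 values but has no complex32 counterpart built from two float16 values:

```python
import numpy as np

# complex64 is two float32s: 8 bytes per element.
print(np.dtype(np.complex64).itemsize)  # 8

# There is no complex32 attribute or dtype string, even though
# float16 exists and a complex32 would naturally be two float16s:
print(hasattr(np, "complex32"))  # False
try:
    np.dtype("complex32")
except TypeError as e:
    print("no complex32:", e)
```

Since CuPy's dtypes are aliases of NumPy's, CuPy cannot simply add complex32 on its own without breaking round-tripping of arrays between device and host.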


1 Comment

I would like to add that during preliminary testing to support half-precision FFT in CuPy (github.com/cupy/cupy/pull/4407), we did see that the expected 2x speedup can be obtained on certain architectures. @RobertCrovella It would be great if you could help us understand better why Pascal is not performant there 🙂
