
I'm trying to send float16 data to an Nvidia P100 card from some Cython code. When I was using float32, I could define my types in Cython like so:

    DTYPE = np.float32
    ctypedef np.float32_t DTYPE_t
    cdef np.ndarray[DTYPE_t, ndim=2] mat = np.empty((100, 100), dtype=DTYPE)

But Cython can't find a defined type for np.float16_t, so I can't just replace the 32 with 16. If I try to provide another type that takes the same amount of space, like np.uint16_t, I get errors like:

Does not understand character buffer dtype format string ('e') 

When I Google, all I can find is a thread from 2011 of people trying to figure out how to support it. Surely there must be a solution by now?

Comments:
  • It isn't completely obvious to me how numpy supports it, given that GCC doesn't seem to on x86. Commented Nov 21, 2017 at 20:16
  • (Possibly useful... github.com/numpy/numpy/blob/…) Commented Nov 21, 2017 at 20:20

1 Answer


I think the answer is "sort of, but it's a reasonable amount of work if you want to do any real calculations".

The basic problem is that C doesn't support a 16-bit float type on most PCs (because the processor instructions don't exist). Therefore, what numpy has done is typedefed a 16-bit unsigned int to store the 16-bit float, and then written a set of functions to convert that to/from the supported float types. Any calculations using np.float16 are actually done on 32-bit or 64-bit floats but the data is stored in the 16-bit format between calculations.
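That storage trick is easy to see from pure Python (a sketch, no Cython needed): a float16 array's elements are two-byte IEEE 754 half-precision bit patterns, and viewing them as uint16 exposes those patterns directly. The constant 0x3C00 is the half-precision encoding of 1.0:

```python
import numpy as np

# float16 data occupies 2 bytes per element...
a = np.array([1.0], dtype=np.float16)
assert a.itemsize == 2

# ...and reinterpreting those bytes as uint16 shows the raw
# IEEE 754 half-precision bit pattern (1.0 -> 0x3C00)
bits = a.view(np.uint16)
print(hex(bits[0]))  # 0x3c00

# arithmetic on float16 upcasts internally, but the result
# is stored back in 16-bit form
b = a + a
print(b.dtype)  # float16
```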

The consequence is that Cython has no easy way of generating valid C code for any calculations it needs to do, so you may need to write that C code out yourself.

There are a number of levels of complexity, depending on what you actually want to do:

1) Don't type anything

Cython doesn't actually need you to specify any types - it compiles Python code happily. Therefore, don't assign types to the half-float arrays and just let them be Python objects. This may not be terribly fast, but it's worth remembering that it will work.
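As a pure-Python sketch of what that untyped code looks like (the scale function here is hypothetical, for illustration only): NumPy dispatches on the array's dtype, so float16 arrays work without any Cython typing and the result stays in float16:

```python
import numpy as np

def scale(x, factor):
    # no Cython typing needed - NumPy handles the half-float
    # conversions internally and keeps the result in float16
    return x * factor

x = np.ones(4, dtype=np.float16)
y = scale(x, 2)
print(y.dtype)  # float16
```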

2) To move data you can view it as uint16

If you're just shuffling data around then you can define uint16 arrays and use those to copy it from one place to another. Use the numpy view function to get the data into a format that Cython recognises and to get it back afterwards. However, you can't do maths in this mode (the answers will be meaningless).

    from libc.stdint cimport uint16_t
    import numpy as np

    def just_move_stuff(x):
        assert x.dtype == np.float16
        # I've used memoryviews but cdef np.ndarray should be fine too
        cdef uint16_t[:] x_as_uint = x.view(np.uint16)
        cdef uint16_t[:] y_as_uint = np.empty(x.shape, dtype=np.uint16)
        for n in range(x_as_uint.shape[0]):
            y_as_uint[n] = x_as_uint[n]
        return np.asarray(y_as_uint).view(dtype=np.float16)

The view function doesn't make a copy so is pretty cheap to use.
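A quick pure-Python check of that claim (a sketch): the uint16 view shares memory with the float16 array, so writing a bit pattern through one side is visible as a float value through the other:

```python
import numpy as np

a = np.zeros(1, dtype=np.float16)
b = a.view(np.uint16)        # reinterpretation, not a copy
assert np.shares_memory(a, b)

b[0] = 0x3C00                # half-precision bit pattern for 1.0
print(a[0])                  # 1.0
```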

3) Do maths with manual conversions

If you want to do any calculations you'll need to use numpy's conversion functions to change your "half-float" data to full floats and back. If you forget to do this the answers you get will be meaningless. Start by including them from numpy/halffloat.h:

    from libc.stdint cimport uint16_t
    import numpy as np

    cdef extern from "numpy/halffloat.h":
        ctypedef uint16_t npy_half
        # conversion functions
        float npy_half_to_float(npy_half h)
        npy_half npy_float_to_half(float f)

    def do_some_maths(x):
        assert x.dtype == np.float16
        cdef uint16_t[:] x_as_uint = x.view(np.uint16)
        cdef uint16_t[:] y_as_uint = np.empty(x.shape, dtype=np.uint16)
        for n in range(x_as_uint.shape[0]):
            y_as_uint[n] = npy_float_to_half(2 * npy_half_to_float(x_as_uint[n]))
        return np.asarray(y_as_uint).view(dtype=np.float16)

This code requires you to link against the numpy core math library:

    from distutils.core import setup
    from distutils.extension import Extension
    from Cython.Build import cythonize
    from numpy.distutils.misc_util import get_info

    info = get_info('npymath')
    ext_modules = [Extension("module_name", ["module_name.pyx"], **info)]
    setup(ext_modules=cythonize(ext_modules))
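When testing the compiled module, a pure-NumPy reference implementation is handy for comparison (a sketch; do_some_maths_reference is a hypothetical name). It performs the same half -> float -> half round trip as the Cython loop above, just vectorised via astype:

```python
import numpy as np

def do_some_maths_reference(x):
    # upcast to float32, do the arithmetic, store back as float16 -
    # the same round trip the Cython loop does element by element
    return (2 * x.astype(np.float32)).astype(np.float16)

x = np.array([0.5, 1.5, 3.0], dtype=np.float16)
print(do_some_maths_reference(x))  # [1. 3. 6.]
```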

2 Comments

Wouldn't this be at best the same speed as 32-bit floats? It would only help with memory constraints?
@DanErez Yes - I'd think there would be a speed penalty, and that the main value would be using less memory.
