ENH: allow get_dummies to accept dtype argument #18330
Codecov Report
@@ Coverage Diff @@
## master #18330 +/- ##
==========================================
- Coverage 91.35% 91.33% -0.02%
==========================================
Files 163 163
Lines 49714 49719 +5
==========================================
- Hits 45415 45410 -5
- Misses 4299 4309 +10
Continue to review full report at Codecov.
|
pandas/core/reshape/reshape.py Outdated
| drop_first : bool, default False | ||
| Whether to get k-1 dummies out of k categorical levels by removing the | ||
| first level. | ||
| dtype : dtype, default np.uint8 |
add a versionadded tag
| assert res.columns.tolist() == ['CAP', 'low', 'value'] | ||
| | ||
| | ||
| class TestGetDummies(object): |
instead of doing this define a fixture that returns the various dtypes that you are testing
doc/source/whatsnew/v0.22.0.txt Outdated
| ^^^^^^^^^ | ||
| | ||
| - | ||
| - :func:`get_dummies` now supports ``dtype`` argument |
add a little more expl, add the PR number as the issue number. Move to Other Enhancements section.
| All done. Also updated sparse tests to use fixtures as well. And added one test to verify effective dtype is uint8 when dtype argument is None. |
jreback left a comment
looks good generally. thanks for parametrizing the tests!
doc/source/whatsnew/v0.22.0.txt Outdated
| | ||
| - Better support for :func:`Dataframe.style.to_excel` output with the ``xlsxwriter`` engine. (:issue:`16149`) | ||
| - :func:`pandas.tseries.frequencies.to_offset` now accepts leading '+' signs e.g. '+1h'. (:issue:`18171`) | ||
| - |
make a separate sub-section for this
doc/source/whatsnew/v0.22.0.txt Outdated
| - Better support for :func:`Dataframe.style.to_excel` output with the ``xlsxwriter`` engine. (:issue:`16149`) | ||
| - :func:`pandas.tseries.frequencies.to_offset` now accepts leading '+' signs e.g. '+1h'. (:issue:`18171`) | ||
| - | ||
| - :func:`pandas.get_dummies` now supports ``dtype`` argument, which forces specific dtype for new columns. (:issue:`18330`) |
say default is the same (uint8)
doc/source/whatsnew/v0.22.0.txt Outdated
| - | ||
| - :func:`pandas.get_dummies` now supports ``dtype`` argument, which forces specific dtype for new columns. (:issue:`18330`) | ||
| | ||
| .. code-block:: ipython |
use an ipython block, show the original as well (first)
pandas/tests/reshape/test_reshape.py Outdated
| self.df = DataFrame({'A': ['a', 'b', 'a'], | ||
| 'B': ['b', 'b', 'c'], | ||
| 'C': [1, 2, 3]}) | ||
| @pytest.fixture(params=['uint8', 'float64']) |
cycle thru more dtypes here that are valid (doesn't have to be all, but include int64, bool, IOW valid for both sparse/dense)
add None as well
we should prob raise on object dtype I think.
pandas/tests/reshape/test_reshape.py Outdated
| expected = DataFrame({'a': [1, 0, 0], | ||
| 'b': [0, 1, 0], | ||
| 'c': [0, 0, 1]}, dtype=dtype) | ||
| result = get_dummies(s_list, **kwargs) |
I wouldn't construct the kwargs, actually just pass directly
pandas/tests/reshape/test_reshape.py Outdated
| | ||
| # Sparse dataframes do not allow nan labelled columns, see #GH8822 | ||
| res_na = get_dummies(s, dummy_na=True, sparse=self.sparse) | ||
| res_na = get_dummies(s, dummy_na=True, **kwargs) |
see my comment above, don't use kwargs generally for passing args to test functions, rather pass directly
3d504fb to d93ee28 Compare
| Hello @Scorpil! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on November 22, 2017 at 13:15 Hours UTC |
0850ac7 to cb3156a Compare
| - :func:`pandas.tseries.frequencies.to_offset` now accepts leading '+' signs e.g. '+1h'. (:issue:`18171`) | ||
| - | ||
| | ||
| ``get_dummies`` now supports ``dtype`` argument |
add a ref here as well.
doc/source/whatsnew/v0.22.0.txt Outdated
| ``get_dummies`` now supports ``dtype`` argument | ||
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
| | ||
| :func:`get_dummies` function now accepts ``dtype`` argument, which forces specific dtype for new columns. When ``dtype`` is not specified or equals to ``None``, new columns will have dtype ``uint8`` (as before), so this change is backwards compatible. (:issue:`18330`) |
The :func:`get_dummies` function now accepts a ``dtype`` argument, which forces a specific dtype for the new columns. The default is ``uint8`` if ``dtype`` is not specified or ``None``.
pandas/core/reshape/reshape.py Outdated
| if dtype is None: | ||
| dtype = np.uint8 | ||
| | ||
| if np.dtype(dtype) is np.dtype('O'): |
use is_object_dtype
pandas/core/reshape/reshape.py Outdated
| | ||
| if np.dtype(dtype) is np.dtype('O'): | ||
| raise TypeError("'object' is not a valid type for get_dummies") | ||
| |
this could be a ValueError; also say dtype=object is not a valid dtype for get_dummies
pandas/core/reshape/reshape.py Outdated
| | ||
| def _get_dummies_1d(data, prefix, prefix_sep='_', dummy_na=False, | ||
| sparse=False, drop_first=False): | ||
| sparse=False, drop_first=False, dtype=np.uint8): |
should this be passthru?
?
Right, missed this one. The idea was to treat this one as internal and allow the "wrapper" to set dtype, but passthru has its advantages and I don't mind either way, so I'll move the dtype-related conversions to this method.
TomAugspurger left a comment
Thanks.
I'll finish taking a look later but my only real concern is how many times you parametrize by the dtype. It's great to do that for some tests like test_basic_dtype and a few others, but I'm not sure about all the prefix / sep tests. Do you have specific concerns that you're trying to test there?
doc/source/whatsnew/v0.22.0.txt Outdated
| ``get_dummies`` now supports ``dtype`` argument | ||
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
| | ||
| :func:`get_dummies` function now accepts ``dtype`` argument, which forces specific dtype for new columns. When ``dtype`` is not specified or equals to ``None``, new columns will have dtype ``uint8`` (as before), so this change is backwards compatible. (:issue:`18330`) |
- Can remove "function" after get_dummies.
- now accepts a dtype argument
- replace forces with specifies a
Replace the second sentence with
When ``dtype`` is not specified, the dtype will be ``uint8`` as before.
doc/source/whatsnew/v0.22.0.txt Outdated
| .. ipython:: python | ||
| | ||
| df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]}) | ||
| pd.get_dummies(df, columns=['c']) |
I think you can remove the "Previous behavior" section since this is backwards compatible.
I'd just do
pd.get_dummies(df, columns=['c']).dtypes
pd.get_dummies(df, columns=['c'], dtype=bool).dtypes
| | ||
| def get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False, | ||
| columns=None, sparse=False, drop_first=False): | ||
| columns=None, sparse=False, drop_first=False, dtype=None): |
Any reason not to use 'uint8' or np.uint8 here?
I've tried to mirror API of DataFrame, Series, Panel etc. where passing None explicitly is allowed and means "dtype will be inferred".
@jreback @TomAugspurger So this is the last question to answer. Do you accept my argument about None or should I change it to np.uint8?
this is ok here, it follows a similar style elsewhere
| .. versionadded:: 0.18.0 | ||
| dtype : dtype, default np.uint8 |
Can we also accept arguments to np.dtype like the string 'i8', and handle those appropriately?
he is using np.dtype()?
Yes, that should work already, I'll add it to the tests.
| # e.g. TestGetDummies::test_basic[uint8-sparse] instead of [uint8-True] | ||
| return request.param == 'sparse' | ||
| | ||
| def effective_dtype(self, dtype): |
If we make the default np.uint8 you can remove this.
| 'C': [1, 2, 3]}) | ||
| | ||
| @pytest.fixture(params=['uint8', 'int64', np.float64, bool, None]) | ||
| def dtype(self, request): |
If we use this fixture in many places it's going to add a ton of tests.
these are all quick, so this is ok
Initially I only had 'uint8' and 'float64', but @jreback reasonably suggested adding some more. What would be a good balance here? If I remove this fixture from all the unrelated tests, like the prefix / separator tests, and move None to a separate stand-alone test, would ['uint8', 'i8', np.float64, bool] be OK? That is still x4 the number of tests, but each item uses a different way to specify the dtype, so I think it's a meaningful set of fixtures.
I think Tom's point is to not apply this fixture to every test (just the relevant ones)
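A minimal sketch of the split discussed above: only the dtype-specific test is parametrized, while a prefix / separator test runs once. The test names and the `DTYPES` list are hypothetical, and `pytest.mark.parametrize` is used here instead of the class-level fixture from the PR to keep the example short.

```python
import numpy as np
import pandas as pd
import pytest

# Hypothetical list: each entry spells a dtype in a different way.
DTYPES = ["uint8", "i8", np.float64, bool]


@pytest.mark.parametrize("dtype", DTYPES)
def test_dummies_use_requested_dtype(dtype):
    # The dtype argument is what this test is about, so it is parametrized.
    result = pd.get_dummies(pd.Series(list("abca")), dtype=dtype)
    assert (result.dtypes == np.dtype(dtype)).all()


def test_prefix_sep_is_independent_of_dtype():
    # Column naming does not depend on dtype, so one run is enough.
    result = pd.get_dummies(pd.DataFrame({"A": ["a", "b"]}), prefix_sep="..")
    assert list(result.columns) == ["A..a", "A..b"]
```

With this layout the dtype test runs four times and the separator test once, rather than every test being multiplied by the number of dtypes.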
pandas/tests/reshape/test_reshape.py Outdated
| def test_dataframe_dummies_all_obj(self): | ||
| df = self.df[['A', 'B']] | ||
| result = get_dummies(df, sparse=self.sparse) | ||
| def test_dataframe_dummies_all_obj(self, df, sparse, dtype): |
What's the benefit of parametrizing by dtype here?
pandas/tests/reshape/test_reshape.py Outdated
| assert_frame_equal(result, expected) | ||
| | ||
| def test_dataframe_dummies_prefix_str(self): | ||
| def test_dataframe_dummies_prefix_str(self, df, sparse, dtype): |
Same question. Is there any relationship between the prefix and dtype? They seem orthogonal.
pandas/tests/reshape/test_reshape.py Outdated
| | ||
| def test_dataframe_dummies_subset(self): | ||
| df = self.df | ||
| def test_dataframe_dummies_subset(self, df, sparse, dtype): |
Same question: Any interaction between subset and dtype?
pandas/tests/reshape/test_reshape.py Outdated
| def test_dataframe_dummies_prefix_sep(self): | ||
| df = self.df | ||
| result = get_dummies(df, prefix_sep='..', sparse=self.sparse) | ||
| def test_dataframe_dummies_prefix_sep(self, df, sparse, dtype): |
Same question :)
| - :func:`pandas.tseries.frequencies.to_offset` now accepts leading '+' signs e.g. '+1h'. (:issue:`18171`) | ||
| - :class:`pandas.io.formats.style.Styler` now has method ``hide_index()`` to determine whether the index will be rendered in ouptut (:issue:`14194`) | ||
| - :class:`pandas.io.formats.style.Styler` now has method ``hide_columns()`` to determine whether columns will be hidden in output (:issue:`14194`) | ||
| |
can you add a ref here
I'm sorry, I'm not used to working with sphinx. Do you mean something like this here:
- :func:`get_dummies` now supports ``dtype`` argument, see :ref:`here <whatsnew_0220.enhancements.get_dummies_dtype>` for more (:issue: `18330`)
and then this before the actual description block:
.. _whatsnew_0220.enhancements.get_dummies_dtype
?
| | ||
| def get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False, | ||
| columns=None, sparse=False, drop_first=False): | ||
| columns=None, sparse=False, drop_first=False, dtype=None): |
this is ok here, it follows a similar style elsewhere
pandas/core/reshape/reshape.py Outdated
| | ||
| if dtype is None: | ||
| dtype = np.uint8 | ||
| |
need a dtype = np.dtype(dtype)
pandas/core/reshape/reshape.py Outdated
| | ||
| def _get_dummies_1d(data, prefix, prefix_sep='_', dummy_na=False, | ||
| sparse=False, drop_first=False): | ||
| sparse=False, drop_first=False, dtype=np.uint8): |
?
afbf368 to 6d447c3 Compare
TomAugspurger left a comment
Gave another quick glance, and things look good here.
pandas/tests/reshape/test_reshape.py Outdated
| # not that you should do this... | ||
| df = self.df | ||
| result = get_dummies(df, prefix='bad', sparse=self.sparse) | ||
| df[['C']] = df[['C']].astype(np.uint8) |
Why the change here?
This test is a bit weird... Having 2 columns with identical names caused ValueError when I tried expected['C'] = expected['C']. Now, when you mentioned it, I see in diff that expected = expected.astype({"C": np.int64}) should work, I'll put it back.
| def test_dataframe_dummies_with_na(self, df, sparse, dtype): | ||
| df.loc[3, :] = [np.nan, np.nan, np.nan] | ||
| result = get_dummies(df, dummy_na=True, sparse=self.sparse) | ||
| result = get_dummies(df, dummy_na=True, |
Did you add the sorting because the output changed, or to make the test easier to write?
I slightly prefer the explicit ordering rather than sorting, though that'll be covered elsewhere so changing it isn't a huge deal.
Just a bit easier to write.
jreback left a comment
lgtm. small doc changes. have a look in reshaping.rst if any doc updates are needed.
doc/source/whatsnew/v0.22.0.txt Outdated
| - :class:`pandas.io.formats.style.Styler` now has method ``hide_index()`` to determine whether the index will be rendered in ouptut (:issue:`14194`) | ||
| - :class:`pandas.io.formats.style.Styler` now has method ``hide_columns()`` to determine whether columns will be hidden in output (:issue:`14194`) | ||
| - Improved wording of ``ValueError`` raised in :func:`to_datetime` when ``unit=`` is passed with a non-convertible value (:issue:`14350`) | ||
| - :func:`get_dummies` now supports ``dtype`` argument, see :ref:`here <whatsnew_0220.enhancements.get_dummies_dtype>` for more (:issue:`18330`) |
you can remove this line, it's already covered in the sub-section. move the sub-section before other enhancements
pandas/tests/reshape/test_reshape.py Outdated
| return np.uint8 | ||
| return dtype | ||
| | ||
| def test_throws_on_dtype_object(self, df): |
throws -> raises
8ab9859 to a1de373 Compare
| pd.get_dummies(df, dtype=bool).dtypes | ||
| .. versionadded:: 0.22.0 |
can you move to before the example
doc/source/whatsnew/v0.22.0.txt Outdated
| ``get_dummies`` now supports ``dtype`` argument | ||
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
| | ||
| The :func:`get_dummies` now accepts a ``dtype`` argument, which specifies a specific dtype for the new columns. When ``dtype`` is not specified or ``None``, the dtype will be ``uint8`` as before. (:issue:`18330`) |
just say the default remains uint8
Done. Also removed useless 'specific' in "specifies a specific dtype".
| df3.unstack() | ||
| .. versionadded: 0.18.0 | ||
| .. versionadded:: 0.18.0 |
This was a typo, right? There are a couple more places where the second colon is missing:
pandas/core/frame.py
4516: .. versionadded: 0.18.0
4679: .. versionadded: 0.16.1
pandas/core/generic.py
968: .. versionadded: 0.21.0
pandas/core/series.py
1629: .. versionadded: 0.19.0
2216: .. versionadded: 0.18.0
pandas/core/tools/datetimes.py
117: .. versionadded: 0.18.1
143: .. versionadded: 0.16.1
181: .. versionadded: 0.20.0
187: .. versionadded: 0.22.0
pandas/tseries/offsets.py
778: .. versionadded: 0.16.1
882: .. versionadded: 0.18.1
hmm yes looks that way. would be great if you can update those! (if you really want to could also add a lint rule to search for these and fail the build if they are found) (also in doc dir too).
Separate PR or this will do?
separate PR prob better. (the one you changed already is fine). I think we DO want to add some more generic checks for these formatting tags, I guess sphinx doesn't complain
Yeah, it's just comments for sphinx. I'll create an issue then, and see what I can do when I have time to look into it. Or somebody will pick it up before that, which is also fine :)
pandas/core/reshape/reshape.py Outdated
| | ||
| if dtype is None: | ||
| dtype = np.uint8 | ||
| else: |
if dtype is None:
    dtype = np.uint8
dtype = np.dtype(dtype)
a bit more idiomatic
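The normalization requested in the review comments can be sketched in isolation as follows. `normalize_dummy_dtype` is a hypothetical stand-alone helper, not the function in pandas/core/reshape/reshape.py, and it compares against `np.dtype(object)` directly rather than using `is_object_dtype` so that it depends only on NumPy.

```python
import numpy as np


def normalize_dummy_dtype(dtype=None):
    """Hypothetical helper: resolve the dtype used for dummy columns."""
    # None means "use the historical default", which is uint8.
    if dtype is None:
        dtype = np.uint8
    # np.dtype accepts strings such as 'i8' as well as types such as bool.
    dtype = np.dtype(dtype)
    # Object dtype makes no sense for 0/1 indicator columns, so reject it.
    if dtype == np.dtype(object):
        raise ValueError("dtype=object is not a valid dtype for get_dummies")
    return dtype
```

Calling `np.dtype` unconditionally after the `None` check is what lets string spellings like 'i8' and Python types like `bool` go through the same code path.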
jreback left a comment
small comments. ping on green.
Use pytest fixtures. Add test for dtype=None.
fecb047 to d19d81f Compare
| @jreback it's green. |
| thanks @Scorpil nice patch! keep em coming! |
git diff upstream/master -u -- "*.py" | flake8 --diff
An update in version 0.19.0 made get_dummies return uint8 values instead of floats (#8725). While I agree with the argument that get_dummies should output integers by default (to save some memory), in many cases it would be beneficial for the user to choose another dtype.
In my case there was a serious performance degradation between versions 0.18 and 0.19. After investigation, the reason behind it turned out to be the change to the get_dummies output type. A DataFrame with dummy values was used as an argument to np.dot in an optimization function (the second argument was a matrix of floats). Since there were lots of iterations involved, and on each iteration np.dot was converting all uint8 values to float64, the conversion overhead took an unreasonably long time. It is possible to work around this issue by converting the dummy columns "manually" afterwards, but that adds unnecessary complexity to the code and is clearly less convenient than calling get_dummies with dtype=float.
Apart from performance considerations, I can imagine dtype=bool to be a common use case.
get_dummies(data, dtype=None) is allowed and will return uint8 values to match the DataFrame interface (where None allows inferring datatype, which is default behavior).
I've extended the test suite to run all the get_dummies tests (except for those that don't deal with internal dtypes, like test_just_na) twice, once with uint8 and once with float64.
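A minimal sketch of the use case described above, assuming a pandas version that includes the dtype argument. The data and the names `df`, `weights` and `product` are made up for illustration. Requesting float columns up front means the array handed to np.dot already matches the float matrix, so no per-call conversion is needed.

```python
import numpy as np
import pandas as pd

# Illustrative data: one categorical column with three levels.
df = pd.DataFrame({"key": ["a", "b", "a", "c"]})

# Ask for float dummy columns directly instead of converting afterwards.
dummies = pd.get_dummies(df["key"], dtype=float)

# A float matrix, standing in for the second argument of the optimization step.
weights = np.array([[0.5], [1.5], [2.5]])

# Both operands are float64, so np.dot has nothing to cast.
product = np.dot(dummies.values, weights)
```

The manual workaround mentioned in the description would be an astype(float) call on the result of get_dummies, which produces the same values with one extra step.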