Fix nonzero of a SparseArray #21175

babky · 2018-05-22T20:27:37Z

The nonzero operation returned the nonzero locations within the underlying sparse index. However, we need the nonzero locations in the real array. To make this lookup faster, an inverse index structure would be beneficial, or it could be implemented using binary search.

For example, calling nonzero on

sa = pd.SparseArray([float('nan'), float('nan'), 1, 0, 0, 2, 0, 0, 0, 3, 0, 0])

returned 0, 3, 7. The positions are shifted by two because of the two leading NaNs, which is why 0, 3, 7 are returned. The correct result is 2, 5, 9, which is what the method returns with this fix.

For the above sample the code works. However for other implementations of SparseIndex it could be broken.
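The shift described above can be reproduced with plain NumPy. This is a minimal sketch, not the pandas implementation: `sp_index` and `sp_values` are hypothetical stand-ins for the positions and values that a sparse array with a NaN fill value would store.

```python
import numpy as np

dense = np.array([np.nan, np.nan, 1, 0, 0, 2, 0, 0, 0, 3, 0, 0])

# Hypothetical model of sparse storage: keep only the non-NaN entries,
# together with the dense positions they came from.
sp_index = np.flatnonzero(~np.isnan(dense))   # dense positions 2..11
sp_values = dense[sp_index]                   # [1, 0, 0, 2, 0, 0, 0, 3, 0, 0]

# Calling nonzero on the stored values alone gives positions in the
# compressed storage, which is the shifted result from the report.
buggy = np.flatnonzero(sp_values)

# Mapping those positions back through sp_index gives dense positions.
fixed = sp_index[buggy]

# The inverse lookup (dense position -> storage position) can use binary
# search because sp_index is sorted.
pos = np.searchsorted(sp_index, 5)

print(buggy, fixed, pos)
```

Here `buggy` holds the shifted positions 0, 3, 7 and `fixed` holds 2, 5, 9. Other index layouts (for example block-based ones) would need their own mapping step, which is the concern raised in the description.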

closes Method dropna does not work on SparseDataFrames #21172
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry


pep8speaks · 2018-05-22T20:27:40Z

Hello @babky! Thanks for updating the PR.

There are no PEP8 issues in the file pandas/tests/sparse/frame/test_frame.py !
There are no PEP8 issues in the file pandas/tests/sparse/test_indexing.py !

Comment last updated on November 08, 2018 at 09:20 Hours UTC

jbrockmendel · 2018-07-03T18:29:27Z

@babky int32 isn’t imported, needs to be changed to np.int32.

jbrockmendel · 2018-08-17T19:40:20Z

@babky small fixup needed to get this working

babky · 2018-08-20T08:05:13Z

hi, i will fix them in near future, was on a vacation...

jreback · 2018-08-20T10:28:37Z

needs tests, whatsnew entry, and should be a vectorized soln.

jorisvandenbossche · 2018-08-20T11:38:29Z

cc @TomAugspurger pinging you here since you are working on sparse, to make sure this PR would not conflict with your work

TomAugspurger · 2018-08-20T11:40:36Z

At a glance, this approach won’t work with my implementation since it calls super to get ndarray.nonzero

babky · 2018-11-08T07:48:56Z

The issue appears to be gone in current master. I've added some unit tests to cover it. From a quick look through the existing tests, I think these tests are worth adding.

codecov · 2018-11-08T09:59:57Z

Codecov Report

Merging #21175 into master will increase coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #21175      +/-   ##
==========================================
+ Coverage   92.25%   92.25%   +<.01%
==========================================
  Files         161      161
  Lines       51390    51390
==========================================
+ Hits        47411    47412       +1
+ Misses       3979     3978       -1

Flag	Coverage Δ
#multiple	`90.65% <ø> (ø)`	⬆️
#single	`42.31% <ø> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/core/arrays/sparse.py	`91.94% <0%> (+0.12%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update db2066b...2c912f1. Read the comment docs.

babky · 2018-11-09T04:57:45Z

@jbrockmendel @TomAugspurger: I believe the issue is gone in master branch and I just added the tests which should prevent a regression.

TomAugspurger · 2018-11-09T22:43:52Z

pandas/tests/sparse/test_indexing.py

 s.iloc[indexer]


+class TestSparseArray(object):


Could you move this to pandas/tests/arrays/sparse/test_array.py?

Both things completed 👍, sorry for it.

TomAugspurger

Can you also add a whatsnew in 0.24.0.txt? I don't see one saying that we fixed nonzero.

As required in PR

TomAugspurger · 2018-11-11T12:52:17Z

Moved it one more time. We reorganized things.

jreback · 2018-11-11T16:46:33Z

doc/source/whatsnew/v0.24.0.txt

 - Bug in ``DataFrame.groupby`` not including ``fill_value`` in the groups for non-NA ``fill_value`` when grouping by a sparse column (:issue:`5078`)
 - Bug in unary inversion operator (``~``) on a ``SparseSeries`` with boolean values. The performance of this has also been improved (:issue:`22835`)
 - Bug in :meth:`SparseArary.unique` not returning the unique values (:issue:`19595`)
+- Bug in ``SparseArray.nonzero`` and `SparseDataFrame.dropna` returning shifted/invalid results (:issue:`21172`)


can you add :func: to reference the api of these

jreback · 2018-11-11T16:46:42Z

pandas/tests/arrays/sparse/test_array.py

 tm.assert_sp_array_equal(res, exp)

+ def test_nonzero(self):
+ sa = pd.SparseArray([


can you add the issue number

jreback · 2018-11-11T16:47:14Z

pandas/tests/arrays/sparse/test_array.py

+ 2, 0, 0, 0,
+ 3, 0, 0
+ ])
+ tm.assert_numpy_array_equal(np.array([2, 5, 9], dtype=np.int32),


can you use

result =
expected =
tm.assert_numpy_array_equal(result, expected)
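A sketch of the requested result/expected layout, with the issue number as a comment. It uses `np.testing.assert_array_equal` in place of pandas' internal `tm` helper so that it runs standalone, and `nonzero_positions` is a hypothetical helper that only models the lookup; neither is taken from the PR.

```python
import numpy as np


def nonzero_positions(dense):
    """Hypothetical helper: dense positions of nonzero, non-NaN entries."""
    stored_at = np.flatnonzero(~np.isnan(dense))
    return stored_at[np.flatnonzero(dense[stored_at])]


def test_nonzero():
    # GH 21172
    dense = np.array([np.nan, np.nan, 1, 0, 0, 2, 0, 0, 0, 3, 0, 0])

    result = nonzero_positions(dense)
    expected = np.array([2, 5, 9])

    np.testing.assert_array_equal(result, expected)
```

Naming the two sides `result` and `expected` keeps the assertion line uniform across tests and makes failures easier to read.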

jreback · 2018-11-11T16:47:31Z

pandas/tests/sparse/frame/test_frame.py

 assert type(res[column]) is SparseSeries
+
+ def test_dropna(self):
+ tm.assert_sp_frame_equal(


use the result= and expected= format

add the issue number as a comment

jreback · 2018-11-11T16:47:54Z

pandas/tests/sparse/frame/test_frame.py

+ pd.SparseDataFrame({"F2": [0, 1]}),
+ pd.SparseDataFrame(
+ {"F1": [float('nan'), float('nan')], "F2": [0, 1]}
+ ).dropna(axis=1, inplace=False, how='all')


can you test for inplace=True/False, how='all', 'any'

via parametrization

babky · 2018-11-16T08:08:00Z

@jreback the comments should be resolved now

TomAugspurger · 2018-11-16T12:24:38Z

df_info.txt

@@ -0,0 +1,8 @@
+<class 'pandas.core.frame.DataFrame'>


Committed by accident?

Yes :-( 👍

TomAugspurger · 2018-11-16T12:25:11Z

doc/source/whatsnew/v0.24.0.rst

 - Bug in ``DataFrame.groupby`` not including ``fill_value`` in the groups for non-NA ``fill_value`` when grouping by a sparse column (:issue:`5078`)
 - Bug in unary inversion operator (``~``) on a ``SparseSeries`` with boolean values. The performance of this has also been improved (:issue:`22835`)
 - Bug in :meth:`SparseArary.unique` not returning the unique values (:issue:`19595`)
+- Bug in :funct:``SparseArray.nonzero`` and :func:``SparseDataFrame.dropna`` returning shifted/invalid results (:issue:`21172`)


function -> func.

Just single backticks around SparseArray.nonzero and SparseDataFrame.dropna

TomAugspurger · 2018-11-16T12:27:38Z

pandas/tests/sparse/frame/test_frame.py

+ def test_dropna(self):
+ # Tests regression #21172.
+ expected = pd.SparseDataFrame({"F2": [0, 1]})
+ for inplace, how in product((True, False), ('all', 'any')):


This could be parametrized over two things: inplace and how: https://docs.pytest.org/en/latest/parametrize.html#pytest-mark-parametrize-parametrizing-test-functions

@pytest.mark.parametrize("inplace", [True, False])
@pytest.mark.parametrize("how", ["all", "any"])
def test_dropna(self, how, inplace):
    ....
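A fuller sketch of the parametrized form, under stated assumptions: it uses a plain `pd.DataFrame` rather than the `SparseDataFrame` from the PR, is written as a module-level function instead of a method, and compares with the public `pd.testing.assert_frame_equal`. The test body is illustrative and is not the code that was merged.

```python
import numpy as np
import pandas as pd
import pytest


@pytest.mark.parametrize("inplace", [True, False])
@pytest.mark.parametrize("how", ["all", "any"])
def test_dropna(how, inplace):
    # GH 21172
    df = pd.DataFrame({"F1": [np.nan, np.nan], "F2": [0, 1]})
    expected = pd.DataFrame({"F2": [0, 1]})

    # With inplace=True, dropna mutates df and returns None.
    returned = df.dropna(axis=1, how=how, inplace=inplace)
    result = df if inplace else returned

    pd.testing.assert_frame_equal(result, expected)
```

Stacking the two decorators makes pytest generate one test per combination, so the four cases are reported separately instead of being hidden inside a loop.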

ouch, did not know that, thank you 🌮

jreback

if you remove the df_info file and ping on green.

jreback · 2018-11-17T22:16:02Z

thanks @babky

mroeschke added the Sparse Sparse Data Type label May 23, 2018

babky closed this Aug 20, 2018

babky reopened this Aug 20, 2018

babky added 7 commits November 7, 2018 21:42

Fix PEP8 line length

349b847

Fix np.int32 imports

eb1c706

Add nonzero tests

41f07f5

Merge pandas-dev/master

02dd63d

Merge remote-tracking branch 'pandas/master'

475eacf

Add tests for the bug which has already been resolved

9425a24

Remove unnecessary diff

80fe5f3

babky added 2 commits November 8, 2018 09:32

Fix linter issues

29bba39

Fix linter issues

e53e3fd

TomAugspurger reviewed Nov 9, 2018


babky added 2 commits November 11, 2018 09:27

Split tests

ccb0cfb

As required in PR

Mention the bugfix in changelog

e25366b

TomAugspurger added this to the 0.24.0 milestone Nov 11, 2018

TomAugspurger approved these changes Nov 11, 2018


Move to arrays

539e91f

Fix lingint issues

1fa410c

jreback requested changes Nov 11, 2018


babky added 4 commits November 11, 2018 18:16

Added :func: in changelog

95432d9

Try improving the tests

1961e04

Merge remote-tracking branch 'remotes/pandas/master'

165feae

Fix tests

e70206e

TomAugspurger reviewed Nov 16, 2018


jreback requested changes Nov 16, 2018


babky added 4 commits November 16, 2018 19:43

Resolve comments from PR

400d8a0

Remove unused import

19a3a42

Merge remote-tracking branch 'pandas/master'

ab63531

Trigger Travis

2c912f1

jreback approved these changes Nov 17, 2018


jreback merged commit 3abff0d into pandas-dev:master Nov 17, 2018

tm9k1 pushed a commit to tm9k1/pandas that referenced this pull request Nov 19, 2018

Fix nonzero of a SparseArray (pandas-dev#21175)

7a3ebce

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

Fix nonzero of a SparseArray (pandas-dev#21175)

b631fa0

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

Fix nonzero of a SparseArray (pandas-dev#21175)

38c908d


7 participants
