ENH: Add is_sparse method to check for sparse columns in a DataFrame (GH26706) #37279

avinashpancham · 2020-10-20T16:44:53Z

closes #26706 (Recommended way to check for sparse data (DataFrame or Series))
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

…(GH26706)

avinashpancham · 2020-10-20T16:46:13Z

Implemented the method df.is_sparse and not the method df.sparse.is_sparse, since using the sparse accessor requires that all columns are sparse.
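To make the accessor limitation concrete, here is a minimal sketch, assuming a pandas version in which the DataFrame .sparse accessor validates that every column is sparse and raises AttributeError otherwise. The helper name has_sparse_accessor is hypothetical and exists only for illustration; it is not part of this PR or of pandas.

```python
import numpy as np
import pandas as pd


def has_sparse_accessor(df: pd.DataFrame) -> bool:
    """Return True if df.sparse is usable (hypothetical helper, not a pandas API)."""
    try:
        # Accessing the accessor triggers its validation of the column dtypes.
        df.sparse
    except AttributeError:
        return False
    return True


# One sparse column and one dense column, as in the examples in this thread.
mixed = pd.DataFrame({"A": pd.arrays.SparseArray([1, np.nan, 1]), "B": [1, 2, 3]})
# Keeping only the sparse column gives a frame where every column is sparse.
all_sparse = mixed[["A"]]

print(has_sparse_accessor(mixed))
print(has_sparse_accessor(all_sparse))
```

Because of this validation, a check that lives on the accessor cannot currently be called on a frame with mixed dense and sparse columns.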

topper-123 · 2020-10-20T18:09:40Z

Thanks, @avinashpancham.

Is this not too specific? I'd prefer an is_dtype method, e.g. df.is_dtype("sparse"), which would be more generally usable.

@jorisvandenbossche

topper-123 · 2020-10-20T18:20:50Z

Considering it again, I think:

>>> df = DataFrame({"A": pd.arrays.SparseArray([1, np.nan, 1]), "B": [1, 2, 3]})
>>> df.dtypes == "sparse"
Series([True, False], index=["A", "B"])

would achieve the same and be more general. Does the added method add anything over the above?

avinashpancham · 2020-10-20T18:24:55Z

The current implementation does only that. That is also (part of) the reason why not everyone agreed on adding such a method.

topper-123 · 2020-10-20T21:35:16Z

Ok. IMO this method is too narrow to add to the DataFrame namespace, so I'm -1 on this.

avinashpancham · 2020-10-22T23:27:21Z

No problem, I will wait for others to share their opinion, and then we can decide whether to continue with or close this PR.

jorisvandenbossche · 2020-10-23T06:19:37Z

The actual proposal in the issue was to add this not in the "top-level" DataFrame namespace, but in the sparse accessor, so you would write something like df.sparse.is_sparse. Given that we already have the .sparse accessor for sparse-related things, I personally don't think that it is "too specific".

Now, there is already a lot of discussion about it on the issue (#26706), so let's discuss there whether we want this or not.

> Implemented the method df.is_sparse and not the method df.sparse.is_sparse, since using the sparse accessor requires that all columns are sparse.

Yes, to do that, we need to change the sparse accessor to also work when not all columns are sparse.
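As a rough illustration of an accessor that tolerates mixed columns, the sketch below registers a separate, hypothetical accessor through pd.api.extensions.register_dataframe_accessor. The accessor name sparse_info, the class name, and the per-column semantics of is_sparse are all assumptions made for this example; this is not how the built-in .sparse accessor is implemented, nor what this PR does.

```python
import numpy as np
import pandas as pd


@pd.api.extensions.register_dataframe_accessor("sparse_info")  # hypothetical name
class SparseInfoAccessor:
    """Illustrative accessor that does not require all columns to be sparse."""

    def __init__(self, pandas_obj: pd.DataFrame) -> None:
        # Deliberately no validation here, so frames with dense columns are accepted.
        self._obj = pandas_obj

    @property
    def is_sparse(self) -> pd.Series:
        # Assumed semantics: one boolean per column, True where the dtype is sparse.
        dtypes = self._obj.dtypes
        return pd.Series(
            [isinstance(dtype, pd.SparseDtype) for dtype in dtypes],
            index=dtypes.index,
            dtype=bool,
        )


df = pd.DataFrame({"A": pd.arrays.SparseArray([1, np.nan, 1]), "B": [1, 2, 3]})
print(df.sparse_info.is_sparse)
```

Whether a real df.sparse.is_sparse should return a per-column Series or a single boolean is exactly the kind of API question left to the issue.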

> Considering it again, I think: ... df.dtypes == "sparse" ... would achieve the same and be more general.

@topper-123 that's potentially a nice idea, but to be clear, that doesn't work currently, as far as I can see:

In [2]: df = DataFrame({"A": pd.arrays.SparseArray([1, np.nan, 1]), "B": [1, 2, 3]})
In [3]: df.dtypes == "sparse"
Out[3]:
A    False
B    False
dtype: bool

(but the proposal can then be to make this work?)
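For reference, a check that can be written with existing pandas objects is an isinstance test against pd.SparseDtype for each column. This is a minimal sketch; the function name sparse_column_mask is hypothetical and not something proposed in this thread.

```python
import numpy as np
import pandas as pd


def sparse_column_mask(df: pd.DataFrame) -> pd.Series:
    """Return a boolean Series marking sparse columns (hypothetical helper)."""
    return pd.Series(
        [isinstance(dtype, pd.SparseDtype) for dtype in df.dtypes],
        index=df.columns,
        dtype=bool,
    )


df = pd.DataFrame({"A": pd.arrays.SparseArray([1, np.nan, 1]), "B": [1, 2, 3]})

# The string form of a sparse dtype carries the subtype and fill value,
# so it is not simply the word "sparse".
print(str(df.dtypes["A"]))

mask = sparse_column_mask(df)
print(mask)
# Reduce the mask to answer "any sparse column?" or "all columns sparse?".
print(bool(mask.any()), bool(mask.all()))
```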

github-actions · 2020-11-23T00:12:02Z

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

mroeschke · 2021-07-10T02:28:14Z

Thanks for the PR, but it appears we need further discussion on #26706 about the API before moving forward with a PR. Closing.

ENH: Add is_sparse method to check for sparse columns in a DataFrame …

701bc6f

…(GH26706)

topper-123 added the Sparse (Sparse Data Type), DataFrame (DataFrame data structure), and Enhancement labels Oct 20, 2020

github-actions bot added the Stale label Nov 23, 2020

arw2019 added the Needs Discussion (Requires discussion from core team before further action) label Dec 11, 2020

github-actions bot removed the Stale label Jan 23, 2021

mroeschke closed this Jul 10, 2021
