@avinashpancham
Contributor Author

Implemented the method df.is_sparse and not the method df.sparse.is_sparse, since using the sparse accessor requires that all columns are sparse.
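A minimal sketch of that restriction, assuming a recent pandas release: the built-in `.sparse` accessor checks that every column has a `SparseDtype` and raises `AttributeError` otherwise. As a result, a `df.sparse.is_sparse` method could never even be reached on a frame that mixes sparse and dense columns.

```python
import numpy as np
import pandas as pd

# Every column is sparse, so the built-in accessor is available.
all_sparse = pd.DataFrame({"A": pd.arrays.SparseArray([1, np.nan, 1])})
print(all_sparse.sparse.density)  # fraction of stored (non-fill) values

# One sparse and one dense column: merely accessing .sparse fails validation.
mixed = pd.DataFrame({"A": pd.arrays.SparseArray([1, np.nan, 1]), "B": [1, 2, 3]})
try:
    mixed.sparse
    accessor_available = True
except AttributeError as exc:
    accessor_available = False
    print(f"AttributeError: {exc}")
```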

@topper-123
Contributor

topper-123 commented Oct 20, 2020

Thanks, @avinashpancham.

Isn't this too specific? I'd prefer an is_dtype method, e.g. df.is_dtype("sparse"), which would be more generally usable.

@jorisvandenbossche
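A rough sketch of what such a method could look like, written as a standalone function because `DataFrame.is_dtype` does not exist in pandas. The name `is_dtype` and the `DTYPE_ALIASES` table are hypothetical illustrations of the idea, not part of the pandas API.

```python
import numpy as np
import pandas as pd

# Hypothetical mapping from string aliases to pandas dtype classes (not a pandas API).
DTYPE_ALIASES = {
    "sparse": pd.SparseDtype,
    "category": pd.CategoricalDtype,
    "datetime64tz": pd.DatetimeTZDtype,
}


def is_dtype(df: pd.DataFrame, kind: str) -> pd.Series:
    """Hypothetical df.is_dtype(kind): one boolean per column."""
    dtype_cls = DTYPE_ALIASES[kind]
    return pd.Series(
        [isinstance(dtype, dtype_cls) for dtype in df.dtypes],
        index=df.columns,
        dtype=bool,
    )


df = pd.DataFrame(
    {
        "A": pd.arrays.SparseArray([1, np.nan, 1]),
        "B": [1, 2, 3],
        "C": pd.Categorical(["x", "y", "x"]),
    }
)
print(is_dtype(df, "sparse"))
print(is_dtype(df, "category"))
```

Because the check is keyed on the dtype class rather than on sparse specifically, the same helper covers other extension dtypes, which is the generality argued for here.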

@topper-123
Contributor

topper-123 commented Oct 20, 2020

Considering it again, I think:

    >>> df = DataFrame({"A": pd.arrays.SparseArray([1, np.nan, 1]), "B": [1, 2, 3]})
    >>> df.dtypes == "sparse"
    Series([True, False], index=["A", "B"])

would achieve the same and be more general. Does the added method add anything over the above?

topper-123 added the Sparse, DataFrame, and Enhancement labels on Oct 20, 2020
@avinashpancham
Contributor Author

The current implementation does only that. That's also (part of) the reason why not everyone agreed on adding such a method.

@topper-123
Contributor

OK. IMO this method is too narrow to add to the DataFrame namespace, so I'm -1 on this.

@avinashpancham
Contributor Author

No problem, I will wait for others to share their opinions, and then we can decide whether to continue with this PR or close it.

@jorisvandenbossche
Member

The actual proposal in the issue was to add this not in the "top-level" DataFrame namespace but in the sparse accessor, so you would call df.sparse.is_sparse. Given that we already have the .sparse accessor for sparse-related things, I personally don't think that it is "too specific".

Now, there is already a lot of discussion about it on the issue (#26706), so let's have the discussion about whether we want this there.

> Implemented the method df.is_sparse and not the method df.sparse.is_sparse, since using the sparse accessor requires that all columns are sparse.

Yes, to do that, we need to change the sparse accessor to also work when not all columns are sparse.
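One way to prototype an accessor that tolerates mixed frames, without modifying pandas itself, is `pd.api.extensions.register_dataframe_accessor`. The accessor name `sparse_info` and its `is_sparse` property below are hypothetical; the actual change discussed here would instead relax the validation inside the built-in `.sparse` accessor.

```python
import numpy as np
import pandas as pd


# Hypothetical accessor name, chosen so the built-in .sparse accessor is left untouched.
@pd.api.extensions.register_dataframe_accessor("sparse_info")
class LenientSparseAccessor:
    """Sketch of a sparse-aware accessor that accepts mixed dense/sparse frames."""

    def __init__(self, pandas_obj: pd.DataFrame) -> None:
        # Unlike the built-in .sparse accessor, do not require all columns to be sparse.
        self._obj = pandas_obj

    @property
    def is_sparse(self) -> pd.Series:
        # One boolean per column: True where the column holds a SparseDtype.
        return pd.Series(
            [isinstance(dtype, pd.SparseDtype) for dtype in self._obj.dtypes],
            index=self._obj.columns,
            dtype=bool,
        )


df = pd.DataFrame({"A": pd.arrays.SparseArray([1, np.nan, 1]), "B": [1, 2, 3]})
print(df.sparse_info.is_sparse)
```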

> Considering it again, I think: ... df.dtypes == "sparse" ... would achieve the same and be more general.

@topper-123 that's potentially a nice idea, but to be clear, that doesn't work currently, as far as I can see:

    In [2]: df = DataFrame({"A": pd.arrays.SparseArray([1, np.nan, 1]), "B": [1, 2, 3]})

    In [3]: df.dtypes == "sparse"
    Out[3]:
    A    False
    B    False
    dtype: bool

(But the proposal could then be to make this work?)

@github-actions
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

github-actions bot added the Stale label on Nov 23, 2020
arw2019 added the Needs Discussion label (requires discussion from core team before further action) on Dec 11, 2020
github-actions bot removed the Stale label on Jan 23, 2021
@mroeschke
Member

Thanks for the PR, but it appears we need further discussion of the API on #26706 before moving forward with a PR. Closing.

mroeschke closed this on Jul 10, 2021

Labels: DataFrame, Enhancement, Needs Discussion, Sparse

5 participants