
API: SparseArray.astype behaviour to always preserve sparseness #34457

@jorisvandenbossche

Description


Currently, SparseArray.astype always converts the specified target dtype to a sparse dtype if it is not one already. For example:

    In [64]: arr = pd.arrays.SparseArray([1, 0, 0, 2])

    In [65]: arr
    Out[65]:
    [1, 0, 0, 2]
    Fill: 0
    IntIndex
    Indices: array([0, 3], dtype=int32)

    In [66]: arr.astype(float)
    Out[66]:
    [1.0, 0.0, 0.0, 2.0]
    Fill: 0.0
    IntIndex
    Indices: array([0, 3], dtype=int32)

This ensures that a simple astype doesn't densify the sparse array (and you don't need to do astype(pd.SparseDtype(float, fill_value))).
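Spelling out the sparse target dtype, as in the astype(pd.SparseDtype(float, fill_value)) form, keeps the result sparse explicitly. A minimal sketch, assuming a pandas version that exposes pd.arrays.SparseArray and pd.SparseDtype; the fill value 0.0 is chosen here to match the original fill value of 0:

```python
import numpy as np
import pandas as pd

arr = pd.arrays.SparseArray([1, 0, 0, 2])

# Explicit sparse target: float64 values with fill value 0.0.
sparse_float = arr.astype(pd.SparseDtype(float, 0.0))

# Only the non-fill values are stored; the positions are unchanged.
print(sparse_float.sp_values)   # [1. 2.]
print(np.asarray(sparse_float)) # densified view: [1. 0. 0. 2.]
```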
Note that Series.astype(..) inherits this behaviour for a Series backed by a SparseArray.
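The same explicit form works at the Series level. A small sketch, assuming the Series holds a SparseArray so that the .sparse accessor is available; the density check simply confirms that only 2 of the 4 values are stored:

```python
import pandas as pd

# A Series backed by a SparseArray (int64 values, fill value 0).
s = pd.Series(pd.arrays.SparseArray([1, 0, 0, 2]))

# Casting to an explicit SparseDtype keeps the Series sparse.
s_float = s.astype(pd.SparseDtype(float, 0.0))

print(s_float.sparse.density)   # 0.5 (2 stored values out of 4)
```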

But this also introduces an inconsistency: arr.astype(target_dtype).dtype != target_dtype, so you cannot rely on getting back an array of the dtype you actually specified.
See e.g. the workaround I needed to add for this in #34338.
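One way to get the invariant result.dtype == target_dtype today, independent of whether astype wraps a non-sparse target in a SparseDtype, is to densify explicitly before casting. The helper astype_exact below is hypothetical (it is not part of pandas and is not necessarily the workaround used in #34338); it is a sketch that assumes pandas.api.types.pandas_dtype for normalizing the requested dtype:

```python
import numpy as np
import pandas as pd


def astype_exact(arr, dtype):
    """Hypothetical helper: cast ``arr`` so that ``result.dtype == dtype``.

    Sparse targets go straight to SparseArray.astype; non-sparse targets
    are applied to a densified copy, so no SparseDtype wrapping can occur.
    """
    dtype = pd.api.types.pandas_dtype(dtype)
    if isinstance(dtype, pd.SparseDtype):
        # Sparse target requested: keep the data sparse.
        return arr.astype(dtype)
    # Densify first (fill values included).
    dense = np.asarray(arr)
    if isinstance(dtype, np.dtype):
        return dense.astype(dtype)
    # Other extension dtypes (e.g. "Int64") are built with pd.array.
    return pd.array(dense, dtype=dtype)


arr = pd.arrays.SparseArray([1, 0, 0, 2])

dense = astype_exact(arr, float)
print(dense)   # [1. 0. 0. 2.]

sparse = astype_exact(arr, pd.SparseDtype(float, 0.0))
print(sparse.dtype == pd.SparseDtype(float, 0.0))   # True
```

The trade-off of this design is that a non-sparse target always costs the full dense memory, which is exactly what the requested dtype implies.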
