Description
I am opening this suggestion on @jorisvandenbossche's recommendation.
This issue follows in the footsteps of #18213 and #19850.
As noted in #18213, _from_array differs from the Series constructor in only one way, namely how it handles SparseArrays:

    # return a sparse series here
    if isinstance(arr, ABCSparseArray):
        from pandas.core.sparse.series import SparseSeries
        cls = SparseSeries

This process could be achieved in a similar way in Series.__new__; something along the lines of:

    def __new__(cls, *args, **kwargs):
        # arr is mandatory: either the first positional argument or the `arr` keyword.
        # Look it up lazily so a keyword-only call does not fail on args[0].
        arr = kwargs['arr'] if 'arr' in kwargs else args[0]
        if isinstance(arr, ABCSparseArray):
            from pandas.core.sparse.series import SparseSeries
            cls = SparseSeries
        obj = object.__new__(cls)
        obj.__init__(*args, **kwargs)
        return obj

What's the issue?
As @jorisvandenbossche pointed out, a change like this will result in a change of the API, as this:

    >>> s = pd.Series(pd.SparseArray([1, 0, 0, 2, 0]))
    >>> type(s)
    <class 'pandas.core.series.Series'>

will become this:

    >>> s = pd.Series.from_array(pd.SparseArray([1, 0, 0, 2, 0]))
    >>> type(s)
    <class 'pandas.core.sparse.series.SparseSeries'>

I'm not familiar with sparse data structures, but according to the docs all functionality is kept between Series and SparseSeries. Furthermore, a simple

    >>> s = s.to_dense()
    >>> type(s)
    <class 'pandas.core.series.Series'>

should be enough to get back to a Series.
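
To make the proposed behaviour concrete, here is a minimal, pandas-free sketch of the class swap in `__new__` and of the round trip through `to_dense()`. All three classes (`SparseArrayLike`, `ToySeries`, `ToySparseSeries`) are hypothetical stand-ins invented for this illustration; they are not the pandas classes and only mimic the dispatch logic discussed above.

```python
class SparseArrayLike(list):
    """Hypothetical stand-in for pandas' SparseArray."""


class ToySeries:
    """Hypothetical stand-in for Series; not the pandas class."""

    def __new__(cls, data=None, *args, **kwargs):
        # Swap the class before the object is created when sparse input is seen.
        if cls is ToySeries and isinstance(data, SparseArrayLike):
            cls = ToySparseSeries
        return object.__new__(cls)

    def __init__(self, data=None):
        self.data = list(data) if data is not None else []

    def to_dense(self):
        # A plain list is not sparse, so this always yields the base class.
        return ToySeries(list(self.data))


class ToySparseSeries(ToySeries):
    """Hypothetical stand-in for SparseSeries."""


sparse = ToySeries(SparseArrayLike([1, 0, 0, 2, 0]))
dense = sparse.to_dense()
print(type(sparse).__name__, type(dense).__name__)
```

Because `ToySparseSeries` is a subclass of `ToySeries`, Python calls `__init__` on the returned object by itself, so this sketch does not invoke it explicitly inside `__new__`.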
Why change it, then?
Currently, Series._from_array is called in only two functions: DataFrame._ixs and DataFrame._box_col_values. With the proposed change, those calls could be replaced by the default constructor.
That being the case, when subclassing pandas objects, one would be able to declare a more complex _constructor_sliced such as this:

    @property
    def _constructor_sliced(self):
        def f(*args, **kwargs):
            # adapted from https://github.com/pandas-dev/pandas/issues/13208#issuecomment-326556232
            return DerivedSeries(*args, **kwargs).__finalize__(self, method='inherit')
        return f

This would allow for a richer relationship between the subclassed DataFrame and its sliced version, including the transfer of metadata according to the user's specification in __finalize__.
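
The closure pattern above can be illustrated without pandas. In the sketch below, `ToySeries`, `DerivedSeries`, `DerivedFrame`, and the `units` attribute are all hypothetical names made up for the example, and the `__finalize__` shown is a simplified imitation, not pandas' implementation. Note that a plain function such as `f` has no `_from_array` attribute, so a closure like this can only work if the frame calls the sliced constructor directly.

```python
class ToySeries:
    """Hypothetical stand-in for Series."""

    _metadata = []

    def __init__(self, data, name=None):
        self.data = list(data)
        self.name = name

    def __finalize__(self, other, method=None):
        # Copy every attribute named in _metadata from `other`, if present.
        for attr in self._metadata:
            if hasattr(other, attr):
                setattr(self, attr, getattr(other, attr))
        return self


class DerivedSeries(ToySeries):
    _metadata = ["units"]


class DerivedFrame:
    """Hypothetical stand-in for a DataFrame subclass."""

    def __init__(self, columns, units=None):
        self.columns = dict(columns)
        self.units = units

    @property
    def _constructor_sliced(self):
        def f(*args, **kwargs):
            # The closure captures `self`, so frame metadata reaches the slice.
            return DerivedSeries(*args, **kwargs).__finalize__(self, method="inherit")
        return f

    def __getitem__(self, key):
        # The slice is built with a plain call, so a function works as constructor.
        return self._constructor_sliced(self.columns[key], name=key)


frame = DerivedFrame({"a": [1, 2]}, units="m")
column = frame["a"]
print(type(column).__name__, column.name, column.units)
```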