Closed
Labels
- API - Consistency: Internal Consistency of API/Behavior
- Categorical: Categorical Data Type
- Needs Discussion: Requires discussion from core team before further action
- Reshaping: Concat, Merge/Join, Stack/Unstack, Explode
Description
Index._concat (used by Index.append) is a thin wrapper around concat_compat. It is overridden by CategoricalIndex so that CategoricalDtype is retained more often than it is in concat_compat. We should make these match.
If we just rip out CategoricalIndex._concat, we break 6 tests, all of which boil down to:

    def test_append_category_objects(self, ci):
        # with objects
        result = ci.append(Index(["c", "a"]))
        expected = CategoricalIndex(list("aabbcaca"), categories=ci.categories)
    >   tm.assert_index_equal(result, expected, exact=True)

If we go the other way and change concat_compat, we break 6 different tests, all of which involve all-empty arrays or arrays that can be losslessly cast to the Categorical's dtype, e.g. (edited for legibility):

    def test_concat_empty_series_dtype_category_with_array(self):
        # GH#18515
        left = Series(np.array([]), dtype="category")
        right = Series(dtype="float64")
        result = concat([left, right])
    >   assert result.dtype == "float64"

    def test_concat_categorical_coercion(self):
        # GH 13524
        # category + not-category => not-category
        s1 = Series([1, 2, np.nan], dtype="category")
        s2 = Series([2, 1, 2])
        exp = Series([1, 2, np.nan, 2, 1, 2], dtype="object")
    >   tm.assert_series_equal(pd.concat([s1, s2], ignore_index=True), exp)
    E   AssertionError: Attributes of Series are different
    E
    E   Attribute "dtype" are different
    E   [left]:  CategoricalDtype(categories=[1, 2], ordered=False)
    E   [right]: object

Changing concat_compat results in much more convenient behavior, but it is textbook "values-dependent behavior" that in general we want to avoid (cc @jorisvandenbossche)
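
To see the two code paths side by side, here is a minimal sketch that only uses the public API (Index.append and pd.concat). The `ci` construction is an assumption standing in for the test fixture, chosen so that it matches the expected values in test_append_category_objects, and `compare_concat_paths` is a hypothetical helper name. The resulting dtypes are printed rather than asserted, since they depend on the pandas version and on how this issue was resolved.

```python
import numpy as np
import pandas as pd


def compare_concat_paths():
    """Run the Index.append path and the pd.concat path on similar inputs."""
    # Assumed stand-in for the `ci` fixture used in the tests above.
    ci = pd.CategoricalIndex(list("aabbca"), categories=list("cab"))

    # Path 1: Index.append, which goes through CategoricalIndex._concat.
    # Every appended value is already one of ci's categories.
    appended_in_categories = ci.append(pd.Index(["c", "a"]))

    # Same path, but "d" is not among ci's categories.
    appended_new_value = ci.append(pd.Index(["d"]))

    # Path 2: pd.concat on Series, which goes through concat_compat.
    s1 = pd.Series([1, 2, np.nan], dtype="category")
    s2 = pd.Series([2, 1, 2])
    concatenated = pd.concat([s1, s2], ignore_index=True)

    return appended_in_categories, appended_new_value, concatenated


appended_in_categories, appended_new_value, concatenated = compare_concat_paths()

# Inspect the dtypes on your installed pandas version.
print("append, values within categories:", appended_in_categories.dtype)
print("append, value outside categories:", appended_new_value.dtype)
print("concat, category + int64 Series: ", concatenated.dtype)
```

If the first two printed dtypes differ, that shows the override in CategoricalIndex is itself values-dependent: the outcome is decided by whether the appended values happen to fall within the existing categories, not by the input dtypes alone.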