API/BUG: freq retention in value_counts #33830

@jbrockmendel

    dti = pd.date_range('2016-01-01', periods=5)
    dti.value_counts().index.freq  # <-- None
    dti.factorize()[1].freq  # <-- None

    mi = pd.MultiIndex.from_arrays([dti, dti])
    mi.levels[0].freq  # <-- None

There is a comment in test_factorize (tests.indexes.datetimes.test_datetime) suggesting that freq should be preserved by factorize, but the test never actually checks freq, and it would fail if it did:

    # freq must be preserved
    idx3 = date_range("2000-01", periods=4, freq="M", tz="Asia/Tokyo")
    exp_arr = np.array([0, 1, 2, 3], dtype=np.intp)
    arr, idx = idx3.factorize()
    tm.assert_numpy_array_equal(arr, exp_arr)
    tm.assert_index_equal(idx, idx3)
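For context on why a test like this can pass without freq being preserved: as far as I can tell, `assert_index_equal` compares values, dtype, and names, but not `freq`. A minimal sketch using the public `pd.testing.assert_index_equal`; the variable names are made up for illustration, and a daily freq stands in for the monthly one in the test:

```python
import pandas as pd

with_freq = pd.date_range("2000-01-01", periods=4, freq="D")

# Rebuilding from the raw datetime64 values keeps the values,
# but the new index carries no freq (nothing is inferred by default).
without_freq = pd.DatetimeIndex(with_freq.to_numpy())

# Passes: same values and dtype, even though the freq attributes differ.
pd.testing.assert_index_equal(without_freq, with_freq)

# Only an explicit check on .freq tells the two apart.
freq_lost = with_freq.freq is not None and without_freq.freq is None
print(with_freq.freq, without_freq.freq, freq_lost)
```

So a test that really wants to pin down freq retention would presumably need an explicit assertion on `idx.freq` in addition to `assert_index_equal`.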

So the question: do we want to try to preserve freq in factorize?
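Independent of what gets decided, a caller who needs the freq back can re-infer it from the values. This is only a sketch of a user-side workaround, not a proposal for how factorize itself should behave; `with_inferred_freq` is a hypothetical helper name, and it relies on the public `pd.infer_freq` plus the `freq` argument of the `DatetimeIndex` constructor:

```python
import pandas as pd


def with_inferred_freq(idx: pd.DatetimeIndex) -> pd.DatetimeIndex:
    """Hypothetical helper: return idx with freq re-inferred where possible."""
    try:
        # infer_freq raises ValueError for fewer than 3 values
        # and returns None when the spacing is irregular.
        inferred = pd.infer_freq(idx)
    except (ValueError, TypeError):
        inferred = None
    if inferred is None:
        return idx  # nothing to restore; leave the index untouched
    return pd.DatetimeIndex(idx, freq=inferred)


dti = pd.date_range("2016-01-01", periods=5)
codes, uniques = dti.factorize()

# Whether or not factorize kept the freq, the restored index has one.
restored = with_inferred_freq(uniques)
print(uniques.freq, restored.freq)
```

Re-inferring costs a pass over the values and only works for regularly spaced data, which is part of why retaining the freq inside factorize would be attractive when the input is already known to be unique.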

xref #33677 for the MultiIndex case

Update: one more case, Categorical:

    dti = pd.date_range('2016-01-01', periods=5)
    cat = pd.Categorical(dti)
    cat.categories.freq  # <-- None

Labels: Algos, Bug, Frequency, freq retention
