Index automatically replaced when creating a new column out of it

Question

I am currently doing some exercises on a Pandas DataFrame indexed by date (DD/MM/YY). The current exercise requires me to group by year to obtain average yearly values, so I tried to create a new column containing only the years extracted from the DataFrame's index. The code I wrote is:

    data["year"] = [t.year for t in data.index]
    data.groupby("year").mean()

but for some reason the new "year" column ends up replacing the previous full-date index (which does not even become a regular column; it simply disappears), which came as a bit of a surprise. How can this be?

Thanks in advance!
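The behaviour can be reproduced with a small sketch. It assumes illustrative data (a frame with a DatetimeIndex and a single numeric `value` column), not the asker's real frame. What it shows is that `groupby("year").mean()` returns a new, aggregated DataFrame with one row per year, indexed by `year`, while `data` itself keeps its date index. So the "replaced" index is most likely what the printed groupby result looks like, not a change made to `data`.

```python
import pandas as pd

# Illustrative data (an assumption, not the asker's real frame):
# a DataFrame indexed by dates with one numeric column.
data = pd.DataFrame(
    {"value": [1, 2, 3, 4, 5]},
    index=pd.to_datetime(
        ["2016-01-22", "2014-02-02", "2014-08-27", "2016-01-23", "2014-03-18"]
    ),
)

# The asker's two statements, one per line.
data["year"] = [t.year for t in data.index]
yearly = data.groupby("year").mean()

# groupby(...).mean() builds a NEW, aggregated DataFrame: one row per year,
# indexed by the grouping key, so the individual dates cannot survive that step.
print(yearly)

# The original frame is untouched: it still has its date index.
print(type(data.index).__name__)  # DatetimeIndex
print(data.columns.tolist())      # ['value', 'year']
```

To keep the yearly averages around, the aggregated result has to be assigned to its own name (here `yearly`); nothing is written back into `data`.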

Can you include a sample of your dataframe in your question? (rahlf23, Nov 2, 2018 at 23:31)

rahlf23 · Accepted Answer · 2018-11-02 23:39:26Z

For a sample dataframe:

                value
    2016-01-22      1
    2014-02-02      2
    2014-08-27      3
    2016-01-23      4
    2014-03-18      5
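To experiment with the answer, the sample frame can be built directly. This sketch assumes the dates are ISO `YYYY-MM-DD` strings and that the index should be a DatetimeIndex; it also shows, as an aside, how an explicit `format` string would parse dates in the asker's DD/MM/YY layout.

```python
import pandas as pd

# Rebuild the sample frame: a DatetimeIndex plus one "value" column.
data = pd.DataFrame(
    {"value": [1, 2, 3, 4, 5]},
    index=pd.to_datetime(
        ["2016-01-22", "2014-02-02", "2014-08-27", "2016-01-23", "2014-03-18"]
    ),
)
print(data)

# The asker's dates are DD/MM/YY; an explicit format string parses that layout.
parsed = pd.to_datetime(["22/01/16"], format="%d/%m/%y")
print(parsed[0].date())  # 2016-01-22
```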

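If you would like to keep your logic, you just need to select the column you want to average, call transform('mean') on it, and assign the result back to the value column. Unlike groupby(...).mean(), which collapses each year into a single row and is therefore indexed by year, transform() returns one value per original row, aligned with the existing date index: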

    data['year'] = [t.year for t in data.index]
    data['value'] = data.groupby('year')['value'].transform('mean')

Yields:

                   value  year
    2016-01-22  2.500000  2016
    2014-02-02  3.333333  2014
    2014-08-27  3.333333  2014
    2016-01-23  2.500000  2016
    2014-03-18  3.333333  2014
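A runnable sketch of the answer on the same sample data. Two deliberate deviations, both editorial choices rather than part of the answer: the year is taken from `data.index.year` (a vectorized attribute of DatetimeIndex) instead of a list comprehension, and the per-year mean goes into a new column with the hypothetical name `yearly_mean`, so the original values stay available. The last step shows the aggregated alternative for comparison.

```python
import pandas as pd

# Same illustrative sample frame as above.
data = pd.DataFrame(
    {"value": [1, 2, 3, 4, 5]},
    index=pd.to_datetime(
        ["2016-01-22", "2014-02-02", "2014-08-27", "2016-01-23", "2014-03-18"]
    ),
)

# DatetimeIndex exposes .year directly, so no list comprehension is needed.
data["year"] = data.index.year

# transform("mean") returns one value per ORIGINAL row, aligned to the date index,
# so it can be assigned straight back ("yearly_mean" is a hypothetical column name).
data["yearly_mean"] = data.groupby("year")["value"].transform("mean")

# If one row per year is wanted instead, aggregate; the result is indexed by year.
yearly = data.groupby("year")["value"].mean()

print(data)
print(yearly)
```

The trade-off is shape: transform keeps all five dated rows and repeats each year's mean, while the plain aggregation gives two rows (2014 and 2016) and drops the dates by design.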
