
I am trying to use the .groupby() function with pandas DataFrames, but I keep losing the column that I am trying to group by. I tried to group by the year, and the grouping itself succeeds, but the column name gets removed, so I am unable to access the column. An extra row is added that has the column name, but I am unable to access it. Am I doing something wrong?

For example, I ran the code below:

stats2 = stats.groupby('yearID').mean() 

and I get this as the result:

              2B        3B        HR        BB        1B
yearID
1956    0.035939  0.007809  0.024694  0.096666  0.164637
1957    0.036462  0.007220  0.023651  0.087744  0.167484
1958    0.036856  0.007120  0.024353  0.088281  0.166760

Any ideas on what I am doing wrong and how I can fix this?

Thanks.

2 Answers


Use the as_index=False option when grouping:

stats2 = stats.groupby('yearID', as_index=False).mean()

As the other answer explains, by default the group key becomes the index of the result. Passing as_index=False prevents this and keeps yearID as a regular column.
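A minimal sketch of this, assuming a small hypothetical `stats` frame (the real data from the question isn't shown, so the values below are made up):

```python
import pandas as pd

# Hypothetical per-player rate stats standing in for the question's `stats` frame.
stats = pd.DataFrame({
    "yearID": [1956, 1956, 1957, 1957],
    "2B": [0.030, 0.040, 0.036, 0.038],
    "HR": [0.020, 0.030, 0.022, 0.024],
})

# as_index=False keeps yearID as an ordinary column instead of moving it into the index.
stats2 = stats.groupby("yearID", as_index=False).mean()

print(stats2.columns.tolist())    # ['yearID', '2B', 'HR']
print(stats2["yearID"].tolist())  # [1956, 1957]
```

With as_index=False the result gets a plain 0..n-1 integer index, and yearID can be accessed like any other column.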


The column you group by becomes the index of the result. The yearID label printed on its own line under the column headers is the index name; that is the "extra row" you are seeing.
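A minimal sketch showing where yearID ends up with the default grouping, again using hypothetical sample data. The group keys are not lost; they can still be read from stats2.index or used as labels with .loc:

```python
import pandas as pd

# Hypothetical sample data standing in for the question's `stats` frame.
stats = pd.DataFrame({
    "yearID": [1956, 1956, 1957, 1957],
    "2B": [0.030, 0.040, 0.036, 0.038],
    "HR": [0.020, 0.030, 0.022, 0.024],
})

stats2 = stats.groupby("yearID").mean()

# yearID is no longer a column, so column access fails...
try:
    stats2["yearID"]
except KeyError:
    print("yearID is not a column")

# ...because it is now the index, whose name is displayed on its own line.
print(stats2.columns.tolist())  # ['2B', 'HR']
print(stats2.index.name)        # yearID

# The group keys are still available through the index.
years = stats2.index.tolist()   # [1956, 1957]
row_1957 = stats2.loc[1957]     # one year's averages, selected by index label
```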

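If you want to recover it as a regular column, call stats2 = stats2.reset_index(). Note that reset_index() returns a new DataFrame rather than modifying stats2 in place, so assign the result.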
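A minimal sketch, with hypothetical data, of recovering yearID as a column after a default groupby:

```python
import pandas as pd

# Hypothetical sample data standing in for the question's `stats` frame.
stats = pd.DataFrame({
    "yearID": [1956, 1956, 1957, 1957],
    "2B": [0.030, 0.040, 0.036, 0.038],
    "HR": [0.020, 0.030, 0.022, 0.024],
})

# Default groupby: yearID becomes the index.
stats2 = stats.groupby("yearID").mean()

# reset_index() moves the index back into a column (inserted first) and
# returns a new DataFrame, so the result is assigned back to stats2.
stats2 = stats2.reset_index()

print(stats2.columns.tolist())    # ['yearID', '2B', 'HR']
print(stats2["yearID"].tolist())  # [1956, 1957]
```

This ends up with the same layout as grouping with as_index=False; reset_index() is handy when you already have a grouped result and want the key back as a column.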
