
I am new to pandas. I have a dataframe that looks like this:

       sitename   name            date        count
    0  chess.com  Autobiographer  2012-05-01  2
    1  chess.com  Autobiographer  2012-05-05  1
    2  chess.com  Autobiographer  2012-05-15  1
    3  chess.com  Autobiographer  2012-05-01  1
    4  chess.com  Autobiographer  2012-05-15  1
    5  chess.com  Autobiographer  2012-05-01  1
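If you want to try the answers yourself, here is one way to rebuild the table above as a DataFrame. This is a sketch: the variable name `dataframe` matches the question, and storing `date` as plain strings instead of datetimes is an assumption.

```python
import pandas as pd

# Rebuild the sample table from the question (dates kept as strings)
dataframe = pd.DataFrame({
    "sitename": ["chess.com"] * 6,
    "name": ["Autobiographer"] * 6,
    "date": ["2012-05-01", "2012-05-05", "2012-05-15",
             "2012-05-01", "2012-05-15", "2012-05-01"],
    "count": [2, 1, 1, 1, 1, 1],
})
print(dataframe)
```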

How can I merge the rows based on date and sum up the count for each date, like this SQL query:

select sitename, name, date, sum(count) from table group by sitename, name, date
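In SQL, COUNT(*) counts the rows in each group while SUM(count) adds up the `count` column. A rough pandas sketch of both, assuming the sample data from the question, maps them to `GroupBy.size()` and to summing the `count` column. The names `summed`, `row_counts` and `rows` are illustrative only.

```python
import pandas as pd

dataframe = pd.DataFrame({
    "sitename": ["chess.com"] * 6,
    "name": ["Autobiographer"] * 6,
    "date": ["2012-05-01", "2012-05-05", "2012-05-15",
             "2012-05-01", "2012-05-15", "2012-05-01"],
    "count": [2, 1, 1, 1, 1, 1],
})
keys = ["sitename", "name", "date"]

# Like SUM(count) ... GROUP BY sitename, name, date
summed = dataframe.groupby(keys, as_index=False)["count"].sum()

# Like COUNT(*) ... GROUP BY sitename, name, date (number of rows per group)
row_counts = dataframe.groupby(keys).size().reset_index(name="rows")

print(summed)
print(row_counts)
```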
  • pandas.pydata.org/pandas-docs/stable/groupby.html Commented May 28, 2014 at 18:52
  • I used df = dataframe.groupby('date'). I got this error: <pandas.core.groupby.DataFrameGroupBy object at 0x7f0d2de6f9d0> <pandas.core.groupby.DataFrameGroupBy object at 0x32bdb90> Commented May 28, 2014 at 18:58
  • @user3527975: that's not an error. That's simply what a groupby object looks like when you print it: you want to perform some operation on it (like selecting a column, performing a sum, etc.). Commented May 28, 2014 at 19:14
  • @DSM: Thanks. Yes, I want to perform a sum based on the date, but I want all the columns to be preserved in the updated dataframe. Commented May 28, 2014 at 19:45
  • @DSM: I have posted one more question on this site: stackoverflow.com/questions/23901459/…. I haven't gotten any answers on that one. Do you have any idea about it? Commented May 28, 2014 at 20:55
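To illustrate DSM's point: `groupby` returns a grouping object that describes how rows are split, and you only get a result table once you apply an aggregation to it. A minimal sketch using the sample data (the names `grouped` and `per_date` are illustrative):

```python
import pandas as pd

dataframe = pd.DataFrame({
    "sitename": ["chess.com"] * 6,
    "name": ["Autobiographer"] * 6,
    "date": ["2012-05-01", "2012-05-05", "2012-05-15",
             "2012-05-01", "2012-05-15", "2012-05-01"],
    "count": [2, 1, 1, 1, 1, 1],
})

# Printing this shows only <...DataFrameGroupBy object at 0x...>
grouped = dataframe.groupby("date")
print(type(grouped).__name__)

# Select the column to aggregate, then sum it within each date
per_date = grouped["count"].sum()
print(per_date)
```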

2 Answers


If you want to keep sitename and name in your dataframe, you can do:

df = dataframe.groupby(['date', 'sitename', 'name']).sum() 

EDIT: See @DSM's comment below on resetting the index to get a flat (non-indexed) dataframe.
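A runnable version of this answer on the sample data, as a sketch. It shows that the three key columns survive, but as levels of a MultiIndex, and that `reset_index()` turns them back into ordinary columns. The name `flat` is illustrative.

```python
import pandas as pd

dataframe = pd.DataFrame({
    "sitename": ["chess.com"] * 6,
    "name": ["Autobiographer"] * 6,
    "date": ["2012-05-01", "2012-05-05", "2012-05-15",
             "2012-05-01", "2012-05-15", "2012-05-01"],
    "count": [2, 1, 1, 1, 1, 1],
})

# The group keys become a MultiIndex; 'count' is summed within each group
df = dataframe.groupby(["date", "sitename", "name"]).sum()
print(df.index.names)

# Move the index levels back into regular columns
flat = df.reset_index()
print(flat)
```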


4 Comments

This works perfectly. Thanks for saving me time. I was writing functions and doing a dataframe.apply.
This combines sitename, date and name into one single column, but these have to be separate columns. Any suggestions?
@user3527975: no, it doesn't. You're probably confusing the index (which in this case is a multiindex) with a column. You can add .reset_index(), or add as_index=False to groupby, e.g. groupby(["date", "sitename", "name"], as_index=False).sum().
@DSM: I should have said it combines sitename, date and name and uses them as the index. as_index=False does the job. Thank you.
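The two fixes DSM mentions should give the same flat result. A small sketch comparing them on the sample data (the names `via_reset` and `via_as_index` are illustrative):

```python
import pandas as pd

dataframe = pd.DataFrame({
    "sitename": ["chess.com"] * 6,
    "name": ["Autobiographer"] * 6,
    "date": ["2012-05-01", "2012-05-05", "2012-05-15",
             "2012-05-01", "2012-05-15", "2012-05-01"],
    "count": [2, 1, 1, 1, 1, 1],
})
keys = ["date", "sitename", "name"]

# Option 1: aggregate, then turn the MultiIndex back into columns
via_reset = dataframe.groupby(keys).sum().reset_index()

# Option 2: tell groupby not to use the keys as the index at all
via_as_index = dataframe.groupby(keys, as_index=False).sum()

print(via_as_index)
print(via_reset.equals(via_as_index))
```

`as_index=False` avoids building the MultiIndex in the first place, while `reset_index()` is handy when you already have an indexed result.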
df = dataframe.groupby('date').sum() 

2 Comments

This one keeps only the date and the sums in the dataframe, whereas I want the structure shown in the post.
You can add the columns and then, if you want to print the result and keep the headers/columns, add this: , header=True, index=True)
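A hedged note on `groupby('date').sum()`: how it treats the text columns depends on the pandas version. Older releases dropped them, as described in the comment above, while recent releases may concatenate the strings instead. Summing only the `count` column sidesteps that. If "keep all the columns" means keeping every original row, `transform("sum")` is one alternative: it writes each date's total back onto each row. The column name `date_total` is hypothetical.

```python
import pandas as pd

dataframe = pd.DataFrame({
    "sitename": ["chess.com"] * 6,
    "name": ["Autobiographer"] * 6,
    "date": ["2012-05-01", "2012-05-05", "2012-05-15",
             "2012-05-01", "2012-05-15", "2012-05-01"],
    "count": [2, 1, 1, 1, 1, 1],
})

# One row per date, summing only the numeric column
per_date = dataframe.groupby("date", as_index=False)["count"].sum()
print(per_date)

# Keep every original row and add that row's per-date total
dataframe["date_total"] = dataframe.groupby("date")["count"].transform("sum")
print(dataframe)
```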
