144

I am using the mtcars dataset. I want to find the number of records for a particular combination of data. Something very similar to the count(*) group by clause in SQL. ddply() from plyr is working for me

library(plyr) ddply(mtcars, .(cyl,gear),nrow) 

has output

 cyl gear V1 1 4 3 1 2 4 4 8 3 4 5 2 4 6 3 2 5 6 4 4 6 6 5 1 7 8 3 12 8 8 5 2 

Using this code

library(dplyr) g <- group_by(mtcars, cyl, gear) summarise(g, length(gear)) 

has output

 length(cyl) 1 32 

I found various functions to pass in to summarise() but none seem to work for me. One function I found is sum(G), which returned

Error in eval(expr, envir, enclos) : object 'G' not found 

Tried using n(), which returned

Error in n() : This function should not be called directly 

What am I doing wrong? How can I get group_by() / summarise() to work for me?

13
  • I can't reproduce this. I get the same output as from ddply. What version of dplyr are you on? Try updating? Commented Mar 31, 2014 at 17:15
  • I have the latest version 0.1.3. Do you have 0.1.2? Commented Mar 31, 2014 at 17:54
  • Nope. Your example works just fine for me with 0.1.3. Commented Mar 31, 2014 at 17:56
  • 1
    What version of R do you have? Could that be causing the difference in behavior? I also tried this on a computer at home which is using Ubuntu, same thing.. Commented Mar 31, 2014 at 18:03
  • 2
    Thanks!!! That solved it. If I have both plyr and dplyr packages attached, summarise does not work as expected. As soon as I restarted the session (and did not attach all normal packages by default) I was able to make it work. Phew. Commented Mar 31, 2014 at 18:23

6 Answers 6

198

There's a special function n() in dplyr to count rows (potentially within groups):

library(dplyr) mtcars %>% group_by(cyl, gear) %>% summarise(n = n()) #Source: local data frame [8 x 3] #Groups: cyl [?] # # cyl gear n # (dbl) (dbl) (int) #1 4 3 1 #2 4 4 8 #3 4 5 2 #4 6 3 2 #5 6 4 4 #6 6 5 1 #7 8 3 12 #8 8 5 2 

But dplyr also offers a handy count function which does exactly the same with less typing:

count(mtcars, cyl, gear) # or mtcars %>% count(cyl, gear) #Source: local data frame [8 x 3] #Groups: cyl [?] # # cyl gear n # (dbl) (dbl) (int) #1 4 3 1 #2 4 4 8 #3 4 5 2 #4 6 3 2 #5 6 4 4 #6 6 5 1 #7 8 3 12 #8 8 5 2 
Sign up to request clarification or add additional context in comments.

Comments

25

I think what you are looking for is as follows.

cars_by_cylinders_gears <- mtcars %>% group_by(cyl, gear) %>% summarise(count = n()) 

This is using the dplyr package. This is essentially the longhand version of the count () solution provided by docendo discimus.

1 Comment

Works for me. The crucial part is NOT specifying the quotes around column names to group by.
23

Another approach is to use the double colons as this will help avoid potential conflicts with functions with similar names from other packages.

mtcars %>% dplyr::group_by(cyl, gear) %>% dplyr::summarise(length(gear)) 

1 Comment

Edited to add why using colons would be useful although I think the main point of this answer is the use of length
3

Another option, not necesarily more elegant, but does not require to refer to a specific column:

mtcars %>% group_by(cyl, gear) %>% do(data.frame(nrow=nrow(.))) 

This is equivalent to using count():

library(dplyr, warn.conflicts = FALSE) all.equal(mtcars %>% group_by(cyl, gear) %>% do(data.frame(n=nrow(.))) %>% ungroup(), count(mtcars, cyl, gear), check.attributes=FALSE) #> [1] TRUE 

2 Comments

not sure what you mean, it gives nrow of each group, as in count?
I'm not sure as this was 6 years ago. Likely some of the functions have changed. I've removed my old comment.
3

One could simply do this using dplyr:

library(dplyr) # Use the function add_count() and name the new variable as "count" mtcars %>% add_count(cyl, gear, name = "count") 

This way you do not need to group the columns as the add_count() function does that for you when you mention variables. Also, this allows you to retain other variables in the data frame (if any).

Comments

1

Another option is using the function tally from dplyr. Here is a reproducible example:

library(dplyr) mtcars %>% group_by(cyl, gear) %>% tally() #> # A tibble: 8 × 3 #> # Groups: cyl [3] #> cyl gear n #> <dbl> <dbl> <int> #> 1 4 3 1 #> 2 4 4 8 #> 3 4 5 2 #> 4 6 3 2 #> 5 6 4 4 #> 6 6 5 1 #> 7 8 3 12 #> 8 8 5 2 

Created on 2022-09-11 with reprex v2.0.2

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.