group_by does not summarize

Question

I am having a hard time working with the dplyr library. I have been trying to implement a relatively simple piece of code, but for some reason, when I group by one variable and sum to get the total for that variable, I get only NA values. Here are my files:

https://www.dropbox.com/sh/zhxfj6cm6gru0t1/AAA-DgeTrngJ0md12W2bEzi0a

And this is the code:

    library(dplyr)

    # we set the working directory
    setwd("~/asado/R/emp")

    ## we list the files
    list.files()

    ## we load the csv files
    emp1 <- read.csv("AI_EMP_CT_A.csv", sep=',')
    ## emp1 contains employment information for US counties with naics classification
    ## empva is another part of the same dataset
    empva <- read.csv("AI_EMP_CT_VA_A.csv", sep=',')

    ## we merge our files, they have the same dimensions so rbind works
    emp <- data.frame(rbind(emp1, empva))

    ## we create a variable to summarize our data
    ## and make sure it is stored as character
    emp$naics <- as.character(substring(emp$Mnemonic, 3, 6))

    ## we try to summarize by the variable naics, summing for Dec.2013
    useemp <- emp %.% group_by(naics) %.% summarize(total=sum(Dec.2013, na.rm=T))

    ## the resulting data frame shows NA
    head(useemp)

Any idea what's going on?

I didn't test your data, but try two things: update dplyr to the latest version (where %>% replaced %.%, although it can still be used) and use dplyr::summarize(total=sum(Dec.2013, na.rm=T)) to make sure you're not in conflict with plyr. Does that change anything?
– talat, Commented Aug 8, 2014 at 16:29
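
The two suggestions in this comment can be sketched on a small made-up data frame. The `toy` data and its values are assumptions for illustration only, not the poster's files. The sketch pipes with `%>%` and calls `summarize` through the `dplyr::` namespace, so a `summarize` from plyr, if that package happens to be loaded, is not picked up by mistake.

```r
library(dplyr)

# Hypothetical stand-in for the merged `emp` data frame (values are invented)
toy <- data.frame(
  naics    = c("2111", "2111", "2121", "2121"),
  Dec.2013 = c(1.5, NA, 2.0, 3.0),
  stringsAsFactors = FALSE
)

# Pipe with %>% and name the dplyr version of summarize explicitly
useemp <- toy %>%
  group_by(naics) %>%
  dplyr::summarize(total = sum(Dec.2013, na.rm = TRUE))

print(useemp)
```

With `na.rm = TRUE`, the missing value in the first group is dropped before summing, so each `naics` code should get one row with a non-NA total.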

BBrill · Accepted Answer · 2014-08-08 17:11:02Z

This works for me, but your empva file was complicated to read because the last column, Dec.2013, was filled with ; characters that were not separated from the values. Are you sure it is read as numeric?

    useemp <- emp %>% group_by(naics) %>% summarize(total=sum(Dec.2013, na.rm=T))
    head(useemp)

    Source: local data frame [6 x 2]

      naics     total
    1  2111 132.04674
    2  2121  24.84666
    3  2122  23.90470
    4  2123  17.57697
    5  2131  77.20557
    6  2211 119.30697
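
One way to check the concern raised in this answer, that stray ; characters might keep Dec.2013 from being read as numeric, is sketched below in base R. The `raw` vector is a hypothetical example of what such a column could look like after import; it is not taken from the actual files.

```r
# Hypothetical raw values with ';' characters still attached (invented data)
raw <- c("132.04;", "24.84;;", "23.90", ";;")

# A column like this is character, not numeric, so it cannot be summed as-is
is.numeric(raw)

# Strip the ';' characters, then convert; entries left empty become NA
cleaned <- as.numeric(gsub(";", "", raw, fixed = TRUE))

# The cleaned column is numeric and can be summed, skipping the NA entries
is.numeric(cleaned)
total <- sum(cleaned, na.rm = TRUE)
```

On the real data, running `class(emp$Dec.2013)` or `str(emp)` right after `read.csv` would show whether the column came in as numeric, character, or factor.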

Joran, the reason why I am using dplyr is that plyr will take forever. The file I uploaded is just a sample of the original file with a million observations and variables.
Yes, Dec.2013 is read as numeric. I think using %.% rather than %>% was the problem. Thanks.
