1

I am having a hard time working with the dplyr library. I have been trying to implement a relatively simple piece of code, but for some reason, when I group by one variable and sum to get the total for that variable, I get only NA values. Here are my files:

https://www.dropbox.com/sh/zhxfj6cm6gru0t1/AAA-DgeTrngJ0md12W2bEzi0a

And this is my code:

    library(dplyr)

    # we set the working directory
    setwd("~/asado/R/emp")

    ## we list the files
    list.files()

    ## we load the csv files
    emp1 <- read.csv("AI_EMP_CT_A.csv", sep=',')
    ## emp1 contains employment information for US counties with naics classification
    ## empva is another part of the same dataset
    empva <- read.csv("AI_EMP_CT_VA_A.csv", sep=',')

    ## we merge our files, they have the same dimensions so rbind works
    emp <- data.frame(rbind(emp1, empva))

    ## we create a variable to summarize our data
    ## and make sure it is stored as character
    emp$naics <- as.character(substring(emp$Mnemonic,3,6))

    ## we try to summarize by the variable naics, summing for Dec.2013
    useemp <- emp %.% group_by(naics) %.% summarize(total=sum(Dec.2013, na.rm=T))

    ## the resulting dataframe shows NA
    head(useemp)

Any idea what's going on?

2
  • It's na.rm not rm.na. Commented Aug 8, 2014 at 16:23
  • I didn't test your data, but try two things: update dplyr to the latest version (where %>% replaced %.%, although it can still be used) and use dplyr::summarize(total=sum(Dec.2013, na.rm=T)) to make sure you're not in conflict with plyr. Does that change anything? Commented Aug 8, 2014 at 16:29
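
The two suggestions in the comment above can be sketched roughly as follows. The data frame here is a small made-up stand-in (its values and naics codes are assumptions, not the asker's data). It uses the %>% pipe and calls dplyr::summarize with an explicit namespace, so that a summarize from another attached package such as plyr cannot be picked up by mistake.

```r
library(dplyr)

# Hypothetical toy data standing in for the asker's emp data frame
emp <- data.frame(
  naics    = c("2111", "2111", "2121", "2121"),
  Dec.2013 = c(1.5, NA, 2, 3),
  stringsAsFactors = FALSE
)

# Group by naics and sum Dec.2013, ignoring missing values;
# the dplyr:: prefix forces dplyr's summarize regardless of load order
useemp <- emp %>%
  group_by(naics) %>%
  dplyr::summarize(total = sum(Dec.2013, na.rm = TRUE))

print(useemp)
```

If the totals still come back as NA with the namespace made explicit, the cause is more likely in the data than in a function name clash.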

1 Answer

2

This works for me, but your empva file was complicated to read because the last column, Dec.2013, was filled with ; characters that were not separated from the values. Are you sure it is read as numeric?

    useemp <- emp %>% group_by(naics) %>% summarize(total=sum(Dec.2013, na.rm=T))
    head(useemp)

    Source: local data frame [6 x 2]

      naics     total
    1  2111 132.04674
    2  2121  24.84666
    3  2122  23.90470
    4  2123  17.57697
    5  2131  77.20557
    6  2211 119.30697
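
One way to check whether Dec.2013 was really read as numeric is sketched below. The inline CSV text is a hypothetical sample that mimics a last column with trailing ; characters; the Mnemonic values are made up, and the real files may look different. The sketch inspects the column type, strips the trailing semicolons, converts to numeric, and then repeats the grouped sum.

```r
library(dplyr)

# Hypothetical sample mimicking a file whose last column carries trailing ";"
csv_text <- "Mnemonic,Dec.2013
AI2111A,10.5;
AI2111B,4.5;
AI2121A,3;"

emp <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Because of the trailing ";" the column is read as character, not numeric
read_as_numeric <- is.numeric(emp$Dec.2013)
print(read_as_numeric)

# Strip trailing semicolons and convert to numeric
emp$Dec.2013 <- as.numeric(sub(";+$", "", emp$Dec.2013))

# Same grouping key as in the question: characters 3 to 6 of Mnemonic
emp$naics <- substring(emp$Mnemonic, 3, 6)

useemp <- emp %>%
  group_by(naics) %>%
  summarize(total = sum(Dec.2013, na.rm = TRUE))

print(useemp)
```

Running str(emp) right after read.csv is a quick way to do the same type check on the real files.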

3 Comments

They are asking about dplyr, not plyr.
Joran, the reason why I am using dplyr is that plyr will take forever. The file I uploaded is just a sample of the original file with a million observations and variables.
Yes, Dec.2013 works as numeric. I think using %.% rather than %>% was the problem. Thanks.
