Split a column by group [duplicate]

Question

I have some data that looks a little bit like this:

    test.frame <- read.table(text = "name amounts
    JEAN 318.5,45
    GREGORY 1518.5,67,8
    WALTER 518.5
    LARRY 518.5,55,1
    HARRY 318.5,32
    ", header = TRUE, sep = "")

I'd like it to look more like this ...

    name     amount
    JEAN     318.5
    JEAN     45
    GREGORY  1518.5
    GREGORY  67
    GREGORY  8
    WALTER   518.5
    LARRY    518.5
    LARRY    55
    LARRY    1
    HARRY    318.5
    HARRY    32

It seems like there should be a straightforward way to break out the "amounts" column, but I'm not coming up with it. Happy to take a "RTFM page for this particular command" answer. What's the command I'm looking for?

rawr · Accepted Answer · 2014-06-16 19:12:11Z

    (test.frame <- read.table(text = "name amounts
    JEAN 318.5,45
    GREGORY 1518.5,67,8
    WALTER 518.5
    LARRY 518.5,55,1
    HARRY 318.5,32
    ", header = TRUE, sep = ""))
    #      name     amounts
    # 1    JEAN    318.5,45
    # 2 GREGORY 1518.5,67,8
    # 3  WALTER       518.5
    # 4   LARRY  518.5,55,1
    # 5   HARRY    318.5,32

    tmp <- setNames(strsplit(as.character(test.frame$amounts), split = ','), test.frame$name)
    data.frame(name = rep(names(tmp), sapply(tmp, length)), amounts = unlist(tmp), row.names = NULL)
    #       name amounts
    # 1     JEAN   318.5
    # 2     JEAN      45
    # 3  GREGORY  1518.5
    # 4  GREGORY      67
    # 5  GREGORY       8
    # 6   WALTER   518.5
    # 7    LARRY   518.5
    # 8    LARRY      55
    # 9    LARRY       1
    # 10   HARRY   318.5
    # 11   HARRY      32

+1! I had a very similar solution, but you were faster :) Mine was: x <- strsplit(as.character(test.frame$amounts), ","); data.frame(name = rep(test.frame$name, sapply(x, length)), amount = unlist(x))
Yeah, the setNames isn't necessary; it's just there for clarity. Also, your way is better for factors, which is what you get by default in the example given. It'd be nice if stringsAsFactors = FALSE were the default, but setting that option in the .Rprofile isn't too much effort. It's just a habit by now.
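To make the factor point in the comments concrete, here is a minimal base R sketch of the same strsplit-and-rep idea wrapped in a function. The helper name `split_rows` and its arguments are hypothetical (not from the thread); it also converts the split pieces to numeric, which the answer above leaves as character. Note that since R 4.0.0, `stringsAsFactors = FALSE` is the default, so the `as.character()` guard mainly matters on older versions or with factor input.

```r
# Hypothetical helper (not from the thread): split a delimited column into
# one row per value, repeating the grouping column alongside each value.
split_rows <- function(df, group_col, value_col, sep = ",") {
  # as.character() guards against the column having been read as a factor
  parts <- strsplit(as.character(df[[value_col]]), split = sep, fixed = TRUE)
  out <- data.frame(
    rep(df[[group_col]], lengths(parts)),  # one copy of the group per piece
    as.numeric(unlist(parts)),             # convert the pieces to numbers
    stringsAsFactors = FALSE
  )
  names(out) <- c(group_col, value_col)
  out
}

test.frame <- read.table(text = "name amounts
JEAN 318.5,45
GREGORY 1518.5,67,8
WALTER 518.5
LARRY 518.5,55,1
HARRY 318.5,32", header = TRUE, stringsAsFactors = FALSE)

long <- split_rows(test.frame, "name", "amounts")
str(long)
```

Passing the original `name` column to `rep()` (rather than the names of the split list) keeps whatever type that column already has, factor or character.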

David Arenburg · Accepted Answer · 2014-06-16 18:16:12Z

Probably the fastest way is to use data.table:

    library(data.table)
    setDT(test.frame)[, lapply(.SD, function(x) unlist(strsplit(as.character(x), ','))),
                      .SDcols = "amounts", by = name]
    ##        name amounts
    ##  1:    JEAN   318.5
    ##  2:    JEAN      45
    ##  3: GREGORY  1518.5
    ##  4: GREGORY      67
    ##  5: GREGORY       8
    ##  6:  WALTER   518.5
    ##  7:   LARRY   518.5
    ##  8:   LARRY      55
    ##  9:   LARRY       1
    ## 10:   HARRY   318.5
    ## 11:   HARRY      32

+1 My cSplit function is a generalization of this approach.

A5C1D2H2I1M1N2O1R2T1 · Accepted Answer · 2014-06-17 04:00:05Z

A generalization of David Arenburg's solution would be to use my cSplit function. Get it from the GitHub Gist (https://gist.github.com/mrdwab/11380733) or load it with "devtools":

    # library(devtools)
    # source_gist(11380733)

The "long" format would be what you are looking for...

    cSplit(test.frame, "amounts", ",", "long")
    #        name amounts
    #  1:    JEAN   318.5
    #  2:    JEAN      45
    #  3: GREGORY  1518.5
    #  4: GREGORY      67
    #  5: GREGORY       8
    #  6:  WALTER   518.5
    #  7:   LARRY   518.5
    #  8:   LARRY      55
    #  9:   LARRY       1
    # 10:   HARRY   318.5
    # 11:   HARRY      32

But the function can create wide output formats too:

    cSplit(test.frame, "amounts", ",", "wide")
    #       name amounts_1 amounts_2 amounts_3
    # 1:    JEAN     318.5        45        NA
    # 2: GREGORY    1518.5        67         8
    # 3:  WALTER     518.5        NA        NA
    # 4:   LARRY     518.5        55         1
    # 5:   HARRY     318.5        32        NA

One advantage with this function is being able to split multiple columns at once.
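As a rough illustration of splitting several columns in one call, here is a sketch that assumes the splitstackshape package (which provides `cSplit`) is installed. The `two.cols` data frame and its `codes` column are made up for this example and are not part of the question's data.

```r
# Sketch only: requires the splitstackshape package; the data is invented.
if (requireNamespace("splitstackshape", quietly = TRUE)) {
  library(splitstackshape)

  two.cols <- data.frame(
    name    = c("JEAN", "WALTER"),
    amounts = c("318.5,45", "518.5"),
    codes   = c("a,b", "c"),
    stringsAsFactors = FALSE
  )

  # Pass a vector of column names to split both columns in one call
  long2 <- cSplit(two.cols, c("amounts", "codes"), sep = ",", direction = "long")
  print(long2)
}
```

Here both columns hold the same number of pieces in each row, so the pieces line up one-to-one in the long result.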

@Amanda, glad you liked it. Check out my "splitstackshape" package, where this function will eventually be housed.

MrFlick · Accepted Answer · 2014-06-16 18:02:06Z

This isn't a super standard format, but here is one way you can transform your data. First, I would use stringsAsFactors=FALSE with your read.table to make sure everything is a character variable rather than a factor. Alternatively, you can call as.character() on those columns.

Next, I split the values in the amounts column on the comma, then combine the resulting values with the name column:

md <- do.call(rbind, Map(cbind, test.frame$name, strsplit(test.frame$amounts, ",")))

Then I paste everything back together and send it to read.table to do the variable conversion

read.table(text=apply(md,1,paste, collapse="\t"), sep="\t", col.names=names(test.frame))

Alternatively you could just make a data.frame from the md matrix and do the class conversions yourself

data.frame(names=md[,1], amount=as.numeric(md[,2]))

rrs · Accepted Answer · 2014-06-16 18:20:35Z

Here is a plyr solution:

    Split.Amounts <- function(x) {
      amounts <- unlist(strsplit(as.character(x$amounts), ","))
      return(data.frame(name = x$name, amounts = amounts, stringsAsFactors=FALSE))
    }

    library(plyr)
    ddply(test.frame, .(name), Split.Amounts)

Using dplyr:

    library(dplyr)
    test.frame %>%
      group_by(name) %>%
      do(Split.Amounts(.))
