5

I have some data that looks a little bit like this:

test.frame <- read.table(text = "name amounts JEAN 318.5,45 GREGORY 1518.5,67,8 WALTER 518.5 LARRY 518.5,55,1 HARRY 318.5,32 ",header = TRUE,sep = "") 

I'd like it to look more like this ...

name amount JEAN 318.5 JEAN 45 GREGORY 1518.5 GREGORY 67 GREGORY 8 WALTER 518.5 LARRY 518.5 LARRY 55 LARRY 1 HARRY 318.5 HARRY 32 

It seems like there should be a straightforward way to break out the "amounts" column, but I'm not coming up with it. Happy to take a "RTFM page for this particular command" answer. What's the command I'm looking for?

0

5 Answers 5

6
(test.frame <- read.table(text = "name amounts JEAN 318.5,45 GREGORY 1518.5,67,8 WALTER 518.5 LARRY 518.5,55,1 HARRY 318.5,32 ",header = TRUE,sep = "")) # name amounts # 1 JEAN 318.5,45 # 2 GREGORY 1518.5,67,8 # 3 WALTER 518.5 # 4 LARRY 518.5,55,1 # 5 HARRY 318.5,32 tmp <- setNames(strsplit(as.character(test.frame$amounts), split = ','), test.frame$name) data.frame(name = rep(names(tmp), sapply(tmp, length)), amounts = unlist(tmp), row.names = NULL) # name amounts # 1 JEAN 318.5 # 2 JEAN 45 # 3 GREGORY 1518.5 # 4 GREGORY 67 # 5 GREGORY 8 # 6 WALTER 518.5 # 7 LARRY 518.5 # 8 LARRY 55 # 9 LARRY 1 # 10 HARRY 318.5 # 11 HARRY 32 
Sign up to request clarification or add additional context in comments.

2 Comments

+1! I had a very similar solution but you were faster :) mine was x <- strsplit(as.character(test.frame$amounts), ","); data.frame(name = rep(test.frame$name, sapply(x, length)), amount = unlist(x))
yeah the setNames isnt necessary, just for clarity. also, your way is better for factors which is what comes about by default in the example given. It'd be nice if stringsAsFactors = FALSE was the default, but setting that option in the rprofile isn't too much effort. Just a habit by now
6

The fastest way (probably) will be data.table

library(data.table) setDT(test.frame)[, lapply(.SD, function(x) unlist(strsplit(as.character(x), ','))), .SDcols = "amounts", by = name] ## name amounts ## 1: JEAN 318.5 ## 2: JEAN 45 ## 3: GREGORY 1518.5 ## 4: GREGORY 67 ## 5: GREGORY 8 ## 6: WALTER 518.5 ## 7: LARRY 518.5 ## 8: LARRY 55 ## 9: LARRY 1 ## 10: HARRY 318.5 ## 11: HARRY 32 

1 Comment

+1 My cSplit function is a generalization of this approach.
4

A generalization of David Arenburg's solution would be to use my cSplit function. Get it from the Git Hub Gist (https://gist.github.com/mrdwab/11380733) or load it with "devtools":

# library(devtools) # source_gist(11380733) 

The "long" format would be what you are looking for...

cSplit(test.frame, "amounts", ",", "long") # name amounts # 1: JEAN 318.5 # 2: JEAN 45 # 3: GREGORY 1518.5 # 4: GREGORY 67 # 5: GREGORY 8 # 6: WALTER 518.5 # 7: LARRY 518.5 # 8: LARRY 55 # 9: LARRY 1 # 10: HARRY 318.5 # 11: HARRY 32 

But the function can create wide output formats too:

cSplit(test.frame, "amounts", ",", "wide") # name amounts_1 amounts_2 amounts_3 # 1: JEAN 318.5 45 NA # 2: GREGORY 1518.5 67 8 # 3: WALTER 518.5 NA NA # 4: LARRY 518.5 55 1 # 5: HARRY 318.5 32 NA 

One advantage with this function is being able to split multiple columns at once.

1 Comment

@Amanda, glad you liked it. Check out my "splitstackshape" package, where this function will eventually be housed.
1

This isn't a super standard format, but here is one way you can transform your data. First, I would use stringsAsFactors=F with your read.table to make sure everything is a character variable rather than a factor. Alternatively you can do as.character() on those columns.

First I split the values in the amounts using the comma then I combine values with the names column

md <- do.call(rbind, Map(cbind, test.frame$name, strsplit(test.frame$amounts, ","))) 

Then I paste everything back together and send it to read.table to do the variable conversion

read.table(text=apply(md,1,paste, collapse="\t"), sep="\t", col.names=names(test.frame)) 

Alternatively you could just make a data.frame from the md matrix and do the class conversions yourself

data.frame(names=md[,1], amount=as.numeric(md[,2])) 

Comments

1

Here is a plyr solution:

Split.Amounts <- function(x) { amounts <- unlist(strsplit(as.character(x$amounts), ",")) return(data.frame(name = x$name, amounts = amounts, stringsAsFactors=FALSE)) } library(plyr) ddply(test.frame, .(name), Split.Amounts) 

Using dplyr:

library(dplyr) test.frame %>% group_by(name) %>% do(Split.Amounts(.)) 

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.