R Include lists of Strings in Dataframe

Question

I am trying to create an artificial dataframe of the words added and deleted by Wikipedia users in each edit that they make. The end result should look like this:

I created some artificial data to build such a frame, but I'm having problems with the variables "Tokens Added" and "Tokens Deleted".

I thought creating them as lists of lists would allow me to include them in a dataframe even if the elements do not always have equal length. But apparently that's not the case. Instead, R creates a variable for each individual token. That's not feasible because it would create millions of variables. Here is some code to exemplify:

    a <- c(1, 2, 3)
    e <- list(b = as.list(c("a", "b")),
              c = as.list(c(1L, 3L, 5L, 4L)),
              d = as.list(c(TRUE, FALSE, TRUE)))
    DF <- cbind(a, e)
    U <- data.frame(a, e)

I would like to have it like this:

Is this possible at all in R with dataframes? (I tried searching for answers already, but they were either for different questions or too technical for me.) Any help is much appreciated!

I don't think this is possible in the sense that you want. Data.frames are lists of equal-length vectors. What you need to do/want to do is create a vector of lists. As far as I know this is not possible (see stackoverflow.com/questions/2624791/…)
– Mike H., Commented May 11, 2017 at 15:19
A different option would be to have each element be a character string that is just a pasted-together version of what the list would be.
– Mike H., Commented May 11, 2017 at 15:21

Nate · Accepted Answer · 2017-05-20 12:58:38Z

You can do exactly what you want if you are willing to use library(tibble):

    library(tibble)
    a <- c(1, 2, 3)
    e <- list(b = as.list(c("a", "b")),
              c = as.list(c(1L, 3L, 5L, 4L)),
              d = as.list(c(TRUE, FALSE, TRUE)))
    tibble(a, e)
    # A tibble: 3 × 2
          a e
      <dbl> <list>
    1     1 <list [2]>
    2     2 <list [4]>
    3     3 <list [3]>

A tibble or tbl_df will behave just like you are used to with a traditional data.frame but allow you some nice extra functionality like storing lists of various lengths in a column.

Thanks for the suggestion! If I try it on my example though, it just produces this error: Error: Variables must be length 1 or 9. Problem variables: 'a'
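The data behind the error in this comment is not shown, so the following is only a guess at the cause. A list column needs exactly one top-level element per row, and 9 happens to equal 2 + 4 + 3, the total number of tokens in the example. That would fit a nested list that was flattened before being passed in. The sketch below uses base R only, and `e_nested` and `e_flat` are hypothetical names chosen for illustration:

```r
a <- c(1, 2, 3)

# Nested list: one top-level element per row of `a`
e_nested <- list(b = as.list(c("a", "b")),
                 c = as.list(c(1L, 3L, 5L, 4L)),
                 d = as.list(c(TRUE, FALSE, TRUE)))

# Flattening one level yields one element per token (assumed cause of the error)
e_flat <- unlist(e_nested, recursive = FALSE)

length(e_nested)   # 3, matches length(a)
length(e_flat)     # 9, does not match length(a)
lengths(e_nested)  # b = 2, c = 4, d = 3

# Quick check before building the table
length(e_nested) == length(a)
```

If the check on the last line is FALSE for your own data, the list probably needs to be regrouped so that each row gets a single list element.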

Community · Accepted Answer · 2017-05-23 12:18:21Z

I don't think what you want is possible using a vector of lists (as you suggest in your question). This is mainly because you can't create a vector of lists in R (see: How to create a vector of lists in R?)

However, one option (if you really want a data.frame) would be to coerce everything to a character (the most flexible type in R). Something like this might work for you:

    a <- c(1, 2, 3)
    e <- c(paste0(c("a", "b"), collapse = ","),
           paste0(c(1L, 3L, 5L, 4L), collapse = ","),
           paste0(c(TRUE, FALSE, TRUE), collapse = ","))
    # Note the spelling: stringsAsFactors, not stringAsFactors
    U <- data.frame(a, e, stringsAsFactors = FALSE)
    U
    #   a               e
    # 1 1             a,b
    # 2 2         1,3,5,4
    # 3 3 TRUE,FALSE,TRUE

Then you can back out the value of each cell with a split. Something like:

strsplit(U$e, ",")
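One thing to keep in mind with this approach is that `strsplit()` always returns character vectors, so the original types have to be restored by hand. Here is a minimal sketch of the full round trip, assuming the tokens themselves never contain a comma. The names `tokens`, `collapsed`, and `parts` are made up for illustration:

```r
a <- c(1, 2, 3)
tokens <- list(c("a", "b"), c(1L, 3L, 5L, 4L), c(TRUE, FALSE, TRUE))

# Collapse each element into a single comma-separated string
collapsed <- vapply(tokens, paste, character(1), collapse = ",")
U <- data.frame(a, e = collapsed, stringsAsFactors = FALSE)

# Split back into a list of character vectors
parts <- strsplit(U$e, ",", fixed = TRUE)

# Everything comes back as character; convert where needed
as.integer(parts[[2]])   # 1 3 5 4
as.logical(parts[[3]])   # TRUE FALSE TRUE
```

If a token can contain a comma, a separator that cannot occur in the data would be needed instead.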

Ju Ko · Accepted Answer · 2017-05-19 12:23:40Z

Thanks for all the suggestions everyone! I think I found a simpler solution though. Just in case anyone else has a similar problem in the future, this is what I did:

    a <- c(1, 2, 3)
    b <- c("a", "b")
    c <- c(1L, 3L, 5L, 4L)
    d <- c(TRUE, FALSE, TRUE)
    e <- list(b, c, d); e
    DF <- data.frame(a, I(e)); DF

The I() inhibit function apparently prevents the lists from being converted and the column behaves just like a list of lists as far as I can tell so far. The class of the e column is however not "list" but "AsIs". I don't know whether this might cause problems further down the line, if so, I will update this answer!

EDIT

So it turns out that some functions do not take the AsIs class as input. To convert it back to a useful character string, you can simply use unlist() on every row.
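For anyone who wants to avoid the "AsIs" class altogether, a different base-R technique (not the one used in this answer) is to build the data frame first and then assign the list with `$<-`. The sketch below compares the two approaches and also shows one way to collapse each row into a single string. The names `tokens`, `DF_asis`, `DF_list`, and `e_chr` are hypothetical:

```r
tokens <- list(c("a", "b"), c(1L, 3L, 5L, 4L), c(TRUE, FALSE, TRUE))

# Approach from this answer: protect the list with I()
DF_asis <- data.frame(a = c(1, 2, 3), e = I(tokens))
class(DF_asis$e)   # "AsIs"

# Alternative: create the frame, then assign the list as a column
DF_list <- data.frame(a = c(1, 2, 3))
DF_list$e <- tokens
class(DF_list$e)   # "list"

# Each row keeps its own length and type
lengths(DF_list$e) # 2 4 3
DF_list$e[[2]]     # 1 3 5 4

# One string per row, e.g. for functions that need a plain character column
DF_list$e_chr <- vapply(DF_list$e,
                        function(x) paste(unlist(x), collapse = ","),
                        character(1))
```

Both columns hold the same data; only the class attribute differs, which is what some functions trip over.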

timfaber · Accepted Answer · 2017-05-11 15:17:51Z

Try this:

cbind(a,lapply(e,function(x) paste(unlist(x),collapse=",")))