3

I am trying to create an artificial data frame of the words added and deleted by Wikipedia users in each edit they make. The end result should look like this:

Example of Dataframe

I created some artificial data to build such a frame, but I'm having problems with the variables "Tokens Added" and "Tokens Deleted".

I thought creating them as lists of lists would allow me to include them in data frames even if the elements do not always have equal length. But apparently that's not the case. Instead, R creates a variable for each individual token. That's not feasible because it would create millions of variables. Here is some code to illustrate:

a <- c(1,2,3)
e <- list(b = as.list(c("a","b")),
          c = as.list(c(1L,3L,5L,4L)),
          d = as.list(c(TRUE,FALSE,TRUE)))
DF <- cbind(a,e)
U <- data.frame(a,e)

I would like to have it like this:

Example of desired Frame

Is this possible at all in R with data frames? (I already tried searching for answers, but they were either for different questions or too technical for me.) Any help is much appreciated!

2
  • I don't think this is possible in the sense that you want. Data.frames are lists of equal length vectors. What you need to do/want to do is create a vector of lists. As far as I know this is not possible (see stackoverflow.com/questions/2624791/…) Commented May 11, 2017 at 15:19
  • A different option would be to store each element as a single character string, i.e. a pasted-together version of what the list would contain. Commented May 11, 2017 at 15:21

4 Answers

2

You can do exactly what you want if you are willing to use library(tibble):

library(tibble)
a <- c(1,2,3)
e <- list(b = as.list(c("a","b")),
          c = as.list(c(1L,3L,5L,4L)),
          d = as.list(c(TRUE,FALSE,TRUE)))
tibble(a,e)
# A tibble: 3 × 2
      a          e
  <dbl>     <list>
1     1 <list [2]>
2     2 <list [4]>
3     3 <list [3]>

A tibble (or tbl_df) behaves much like the traditional data.frame you are used to, but adds some nice extra functionality, such as storing lists of various lengths in a column.


1 Comment

Thanks for the suggestion! If I try it on my example though, it just produces this error: Error: Variables must be length 1 or 9. Problem variables: 'a'
1

I don't think what you want is possible using a vector of lists (as you suggest in your question). This is mainly because you can't create a vector of lists in R (see: How to create a vector of lists in R?)

However, one option (if you really want a data.frame) would be to coerce everything to a character (the most flexible type in R). Something like this might work for you:

a <- c(1,2,3)
# Collapse each vector into a single comma-separated string
e <- c(paste0(c("a","b"), collapse = ","),
       paste0(c(1L,3L,5L,4L), collapse = ","),
       paste0(c(TRUE,FALSE,TRUE), collapse = ","))
U <- data.frame(a, e, stringsAsFactors = FALSE)
U
#   a               e
# 1 1             a,b
# 2 2         1,3,5,4
# 3 3 TRUE,FALSE,TRUE

Then you can back out the value of each cell with a split. Something like:

strsplit(U$e, ",") 
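One caveat with this approach: strsplit() always returns character vectors, so the original types are lost and have to be restored by hand. Here is a minimal sketch of the round trip, assuming you know the intended type of each row and that no token contains a comma itself (the names tokens, parts, ints and flags are only illustrative):

```r
# Collapse each variable-length vector into one comma-separated string.
tokens <- list(c("a", "b"), c(1L, 3L, 5L, 4L), c(TRUE, FALSE, TRUE))
U <- data.frame(
  a = c(1, 2, 3),
  e = vapply(tokens, paste, character(1), collapse = ","),
  stringsAsFactors = FALSE
)

# Split the strings again; every element comes back as character.
parts <- strsplit(U$e, ",", fixed = TRUE)

# Restore the original types manually where they matter.
ints  <- as.integer(parts[[2]])
flags <- as.logical(parts[[3]])
```

If the tokens can contain commas, a separator that cannot occur in the data would be needed instead.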


1

Thanks for all the suggestions everyone! I think I found a simpler solution though. Just in case anyone else has a similar problem in the future, this is what I did:

a <- c(1,2,3)
b <- c("a","b")
c <- c(1L,3L,5L,4L)
d <- c(TRUE,FALSE,TRUE)
e <- list(b,c,d)
e
DF <- data.frame(a,I(e))
DF

The I() ("inhibit") function apparently prevents the list from being converted, and as far as I can tell so far the column behaves just like a list of lists. The class of the e column, however, is not "list" but "AsIs". I don't know whether this might cause problems further down the line; if so, I will update this answer!

EDIT

So it turns out that some functions do not take the AsIs class as input. To convert it back to a useful character string, you can simply use unlist() on every row.
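If the AsIs class gets in the way, it may also be possible to avoid it altogether. Here is a minimal sketch in base R, using the same toy data (the names DF2, n_tokens and as_text are only illustrative): assigning a list with one element per row via `$<-` appears to store it as a plain list column, and unclass() should drop the AsIs wrapper from an existing column.

```r
a <- c(1, 2, 3)
e <- list(c("a", "b"), c(1L, 3L, 5L, 4L), c(TRUE, FALSE, TRUE))

# Variant 1: I() keeps the list intact but tags the column as "AsIs".
DF <- data.frame(a, e = I(e))
class(DF$e)

# Variant 2: add the list after creating the frame; no I() involved.
DF2 <- data.frame(a)
DF2$e <- e
class(DF2$e)

# Drop the AsIs class from the existing column in DF.
DF$e <- unclass(DF$e)

# Per-row summaries: number of tokens and a pasted-together string.
n_tokens <- lengths(DF2$e)
as_text  <- vapply(DF2$e, paste, character(1), collapse = ",")
```

Each row still holds a vector of its own length and type, so DF2$e[[2]] returns the four integers of the second row unchanged.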


0

Try this:

cbind(a,lapply(e,function(x) paste(unlist(x),collapse=","))) 
