I am trying to create an artificial data frame of the words contributed and deleted by Wikipedia users for each edit they make. The end result should look like this:
I created some artificial data to build such a frame, but I'm having problems with the variables "Tokens Added" and "Tokens Deleted".
I thought that creating them as lists of lists would let me include them in a data frame even though the elements do not always have equal length. Apparently that's not the case. Instead, R creates a separate variable for each individual token. That's not feasible, because with real data it would create millions of variables. Here is some code to illustrate:
    a <- c(1, 2, 3)
    e <- list(b = as.list(c("a", "b")),
              c = as.list(c(1L, 3L, 5L, 4L)),
              d = as.list(c(TRUE, FALSE, TRUE)))
    DF <- cbind(a, e)
    U <- data.frame(a, e)

I would like to have it like this:
Is this possible at all in R with data frames? (I have already tried searching for answers, but they were either for different questions or too technical for me.) Any help is much appreciated!


Data frames are lists of equal-length vectors. What you want is a column that is itself a vector of lists. As far as I know this is not possible (see stackoverflow.com/questions/2624791/…)
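
That said, base R does seem to accept a list as a single column when it is wrapped in `I()`, which tells `data.frame()` to store the object as is instead of expanding it. Below is a minimal sketch of that workaround, not a definitive solution. The column names and token values are made up for illustration, and each list must have exactly one element per row.

```r
# Hypothetical example data: three edits, each with a different number of tokens.
# Each list has one element per row; an element may hold zero or more tokens.
tokens_added   <- list(c("the", "cat"), character(0), c("a", "b", "c"))
tokens_deleted <- list("dog", c("x", "y"), character(0))

# I() marks each list as "AsIs", so data.frame() keeps it as a single list column
# rather than creating one variable per token.
edits <- data.frame(
  user           = c("u1", "u2", "u3"),
  tokens_added   = I(tokens_added),
  tokens_deleted = I(tokens_deleted),
  stringsAsFactors = FALSE
)

ncol(edits)                   # number of columns stays at 3
lengths(edits$tokens_added)   # number of tokens added in each edit
edits$tokens_added[[3]]       # tokens added in the third edit
```

Individual cells are then read with `[[`, as in the last line. Many functions that expect atomic columns may not handle list columns gracefully, so this is probably best suited to storage and per-row lookups.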