
I'm trying to convert a list of vectors (a multidimensional array essentially) into a data frame, but every time I try I'm getting unexpected results.

My aim is to instantiate a blank list, populate it in a for loop with vectors containing information about that iteration of the loop, then convert it into a data frame after it's finished.

    vectorList <- list()
    for (i in 1:5) {
      vectorList[[i]] <- c("number" = i, "square root" = sqrt(i))
    }
    vectorList

Outputs:

    [[1]]
         number square root
              1           1

    [[2]]
         number square root
       2.000000    1.414214

    [[3]]
         number square root
       3.000000    1.732051

    [[4]]
         number square root
              4           2

    [[5]]
         number square root
       5.000000    2.236068

Now I want this to become a data frame with 5 observations of 2 variables, but creating a data frame from 'vectorList' with

    numbers <- data.frame(vectorList)

results in 2 observations of 5 variables.
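A minimal base-R sketch of what is going on, rebuilding the same list with lapply() for brevity: data.frame() treats each element of an unnamed list as a column, so five vectors of length 2 become five columns with two rows each.

```r
# Same contents as the loop above: five named numeric vectors of length 2
vectorList <- lapply(1:5, function(i) c("number" = i, "square root" = sqrt(i)))

# Every list element has two values...
lengths(vectorList)
#> [1] 2 2 2 2 2

# ...and each element becomes one column, giving 2 rows x 5 columns
numbers <- data.frame(vectorList)
dim(numbers)
#> [1] 2 5
```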

Weirdly, it won't even be coerced with reshape2 (which I know would be an awful workaround, but I tried).

Anyone got any insight?

  • Just a general note about your approach: you should not grow lists like this inside a for loop, if you can avoid it. When you add something to the end of a list, R has to copy the whole list. This is fine for small cases, but if your list is big (and it's getting bigger and bigger, in your case) this can be quite inefficient. Commented Apr 27, 2017 at 15:55
  • For your data construction, you could have used lapply like this: vectorList <- lapply(1:5, function(x) c(x, sqrt(x))). Commented Jul 6, 2017 at 16:55
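A minimal sketch of both suggestions from these comments, assuming the final length n is known in advance: preallocate the list with vector("list", n) instead of growing it, or let lapply() build it. Unlike the one-liner in the comment, this lapply() version keeps the element names, which later become the column names.

```r
n <- 5

# Option 1: preallocate a list of the final length, then fill it in the loop
vectorList <- vector("list", n)
for (i in seq_len(n)) {
  vectorList[[i]] <- c("number" = i, "square root" = sqrt(i))
}

# Option 2: let lapply() allocate and fill the list in one step;
# naming the vector elements keeps "number" and "square root"
vectorList2 <- lapply(seq_len(n), function(x) c("number" = x, "square root" = sqrt(x)))

identical(vectorList, vectorList2)
#> [1] TRUE
```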

4 Answers

69

You can use:

as.data.frame(do.call(rbind, vectorList)) 

Or:

    library(data.table)
    rbindlist(lapply(vectorList, as.data.frame.list))

Or:

    library(dplyr)
    bind_rows(lapply(vectorList, as.data.frame.list))
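A short base-R sketch of why the first option works: do.call() passes each list element to rbind() as a separate argument, so the call is equivalent to writing out rbind(vectorList[[1]], ..., vectorList[[5]]). Binding equal-length named vectors as rows gives a 5 x 2 matrix whose column names come from the vector names.

```r
vectorList <- lapply(1:5, function(i) c("number" = i, "square root" = sqrt(i)))

# Each list element becomes one argument to rbind(), i.e. one row
m <- do.call(rbind, vectorList)
dim(m)
#> [1] 5 2
colnames(m)
#> [1] "number"      "square root"

# Convert the numeric matrix to a data frame
df <- as.data.frame(m)
```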

2 Comments

The first one returns a warning:

    Warning message:
    In (function (..., deparse.level = 1) :
      number of columns of result is not a multiple of vector length (arg 3)

The second and third ones return the error:

    Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
      arguments imply differing number of rows: 1, 0
@PM0087 It works perfectly fine for me. Did you use the data as in the question?
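One possible cause of the rbind() warning above, offered as an assumption since the commenter's data isn't shown: if the vectors in the list have different lengths, rbind() recycles the shorter ones and warns that the number of columns is not a multiple of the vector length. The ragged input below is hypothetical; checking lengths() before binding catches the problem early.

```r
# Hypothetical ragged input: the second vector has an extra element
ragged <- list(
  c("number" = 1, "square root" = 1),
  c("number" = 2, "square root" = sqrt(2), "extra" = 0)
)

# The row-binding answers assume every vector has the same length
lengths(ragged)
#> [1] 2 3
length(unique(lengths(ragged))) == 1
#> [1] FALSE

# rbind() recycles the shorter vector and signals a warning;
# here the warning message is captured instead of printed
msg <- tryCatch(do.call(rbind, ragged), warning = function(w) conditionMessage(w))
```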
18

The fastest and most efficient way that I know is the data.table::transpose function (provided each vector has only a few elements):

as.data.frame(data.table::transpose(vectorList), col.names = names(vectorList[[1]]))

However, you need to set the column names manually (hence the col.names argument), because data.table::transpose drops them. There is also a purrr::transpose function that keeps the column names, but it seems to be slower. Below is a small benchmark that includes the other users' suggestions:

    vectorList = lapply(1:1000, function(i) (c("number" = i, "square root" = sqrt(i))))
    bench = microbenchmark::microbenchmark(
      dplyr = dplyr::bind_rows(lapply(vectorList, as.data.frame.list)),
      rbindlist = data.table::rbindlist(lapply(vectorList, as.data.frame.list)),
      Reduce = Reduce(rbind, vectorList),
      transpose_datatable = as.data.frame(data.table::transpose(vectorList), col.names = names(vectorList[[1]])),
      transpose_purrr = data.table::as.data.table(purrr::transpose(vectorList)),
      do.call = as.data.frame(do.call(rbind, vectorList)),
      times = 10)
    bench
    # Unit: microseconds
    #                 expr        min         lq        mean      median         uq        max neval cld
    #                dplyr 286963.036 292850.136 320345.1137 310159.7380 341654.619 385399.851    10   b
    #            rbindlist 285830.750 289935.336 306120.7257 309581.1895 318131.031 324217.413    10   b
    #               Reduce   8573.474   9073.649  12114.5559   9632.1120  11153.511  33446.353    10  a
    #  transpose_datatable    372.572    424.165    500.8845    479.4990    532.076    701.822    10  a
    #      transpose_purrr    539.953    590.365    672.9531    671.1025    718.757    911.343    10  a
    #              do.call    452.915    537.591    562.9144    570.0825    592.334    641.958    10  a

    # now use bigger list and disregard the slowest
    vectorList = lapply(1:100000, function(i) (c("number" = i, "square root" = sqrt(i))))
    bench.big = microbenchmark::microbenchmark(
      transpose_datatable = as.data.frame(data.table::transpose(vectorList), col.names = names(vectorList[[1]])),
      transpose_purrr = data.table::as.data.table(purrr::transpose(vectorList)),
      do.call = as.data.frame(do.call(rbind, vectorList)),
      times = 10)
    bench.big
    # Unit: milliseconds
    #                 expr       min        lq       mean     median         uq       max neval cld
    #  transpose_datatable  3.470901   4.59531   4.551515   4.708932   4.873755   4.91235    10  a
    #      transpose_purrr 61.007574  62.06936  68.634732  65.949067  67.477948  97.39748    10   b
    #              do.call 97.680252 102.04674 115.669540 104.983596 138.193644 151.30886    10    c
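The idea behind the transpose approach is to build the data frame column by column instead of row by row. A minimal base-R sketch of that idea (not data.table's implementation, and with no claim about its speed) collects the k-th element of every vector into column k with vapply(). It assumes all vectors have the same length and the same element order.

```r
vectorList <- lapply(1:5, function(i) c("number" = i, "square root" = sqrt(i)))

# Take the column names from the first vector
col_names <- names(vectorList[[1]])

# For each position k, gather the k-th value of every vector into one column
cols <- lapply(seq_along(col_names), function(k) {
  vapply(vectorList, function(v) v[[k]], numeric(1))
})
names(cols) <- col_names

# check.names = FALSE keeps "square root" rather than converting it to "square.root"
df <- data.frame(cols, check.names = FALSE)
```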


14

You can also use Reduce:

    Reduce(rbind, vectorList)
    #      number square root
    # init      1    1.000000
    #           2    1.414214
    #           3    1.732051
    #           4    2.000000
    #           5    2.236068

1 Comment

Note that Reduce(rbind, vectorList) returns a matrix, so you'd want to wrap it in data.frame to return a data.frame object.
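A minimal sketch of that wrapping step. Reduce() passes its running result to rbind() as a variable named init, which is why the first row is labelled "init" in the output of Reduce(rbind, vectorList); this version clears the row names before converting. Since each rbind() call copies the growing matrix, the total work grows quadratically with the number of vectors, which is consistent with Reduce trailing do.call in the benchmark of the previous answer.

```r
vectorList <- lapply(1:5, function(i) c("number" = i, "square root" = sqrt(i)))

# Reduce() binds one vector at a time onto the accumulated matrix
m <- Reduce(rbind, vectorList)
is.matrix(m)
#> [1] TRUE

# Drop the leftover row names ("init", "", ...), then convert to a data frame
rownames(m) <- NULL
df <- as.data.frame(m)
```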
8

An alternative solution using purrr:

    purrr::map_dfr(vectorList, as.list)
    # # A tibble: 5 x 2
    #   number `square root`
    #    <dbl>         <dbl>
    # 1      1          1
    # 2      2          1.41
    # 3      3          1.73
    # 4      4          2
    # 5      5          2.24

The code converts each vector to a list and binds the results row-wise into a single data frame (a tibble).

1 Comment

The great thing about the tidyverse methods (both dplyr::bind_rows() and purrr::map_dfr()) is that they can deal with list elements of different lengths, and with named vectors whose element order varies from element to element. This is very useful, for example, when converting the output of xml2::xml_attrs() into rectangular data.
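For comparison, a minimal base-R sketch of the alignment these functions take care of: collect the union of all element names, then index every vector by that union, so missing names become NA and differing order no longer matters. The input is hypothetical, and this is not how dplyr or purrr implement it.

```r
# Hypothetical input: vectors differ in length and in element order
ragged <- list(c(a = 1, b = 2), c(b = 3), c(b = 4, a = 5))

# Union of all names, in order of first appearance
all_names <- unique(unlist(lapply(ragged, names)))

# Indexing by an absent name returns NA, so every row gets the same columns
# in the same order; unname() drops the per-element names before binding
rows <- lapply(ragged, function(v) unname(v[all_names]))
m <- do.call(rbind, rows)
colnames(m) <- all_names

df <- as.data.frame(m)
df
```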
