36

I have a data frame where the last column is a column of lists. Below is how it looks:

Col1 | Col2 | ListCol
--------------------------
na   | na   | [obj1, obj2]
na   | na   | [obj1, obj2]
na   | na   | [obj1, obj2]

What I want is

Col1 | Col2 | Col3 | Col4
--------------------------
na   | na   | obj1 | obj2
na   | na   | obj1 | obj2
na   | na   | obj1 | obj2

I know that all the lists have the same number of elements.

Edit:

Every element in ListCol is a list with two elements.

4
  • It depends a lot on how ListCol is structured. If it contains a data frame or named list for each row, just tidyr::unnest will work. If it's some other structure, you may need to rearrange first. To get a better answer, edit with the result of calling dput on your sample data so we can reproduce the exact structure. Commented Jun 15, 2018 at 19:15
  • Hello. I've tried unnest, but it separates the objects into different rows rather than columns. Every row of ListCol is a list. Commented Jun 15, 2018 at 19:23
  • The simplest way to get it to expand sideways instead of down is to make each list element a 1-row data frame, e.g. with df$ListCol <- lapply(df$ListCol, function(x) as.data.frame(t(x))) (with dplyr and purrr, if you prefer) and then calling unnest. Commented Jun 15, 2018 at 21:43
  • Here's alistaire's solution for a similar problem, using invoke_map and tibble: stackoverflow.com/questions/49889246/how-to-unnest-column-list. And here are several other solutions: stackoverflow.com/questions/49689927/… Commented Jul 7, 2019 at 2:00
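The comment above about turning each list element into a 1-row data frame can be sketched as follows. This is a minimal example, assuming each element of the list column is an atomic vector of equal length; the toy data frame `df` and the result name `wide` are made up for illustration.

```r
library(tidyr)
library(tibble)

# Toy data (assumption): every element of b is a numeric vector of length 2
df <- tibble(a = c(1, 2, 3), b = list(c(2, 3), c(4, 5), c(6, 7)))

# Turn each list element into a 1-row data frame (columns V1 and V2)
df$b <- lapply(df$b, function(x) as.data.frame(t(x)))

# unnest() now binds the 1-row data frames, so the values spread sideways
wide <- unnest(df, cols = b)
wide
```

Because every element is already one row, unnest has nothing to stack downward and the values end up in separate columns.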

5 Answers

31

Currently, the tidyverse answer would be:

library(dplyr)
library(tidyr)

data %>%
  unnest_wider(ListCol, names_sep = "_")

3 Comments

You can also add the names_sep parameter if you need to keep the name of the nested column (e.g. data %>% unnest_wider(ListCol, names_sep="_") would lead to ListCol_Col3, which is handy when unnesting several columns at once).
Not only can you add names_sep, it's also required for unnamed list-cols.
@MyNameHere indeed, since tidyr version 1.3.0. Thanks!
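To make the names_sep discussion concrete, here is a small sketch on data shaped like the question's. The tibble `data` and its contents are assumptions made up for illustration; with an unnamed list column, the new columns are named from the list column's name plus the element position.

```r
library(tidyr)
library(tibble)

# Toy data (assumption): ListCol holds unnamed character vectors of length 2
data <- tibble(
  Col1 = c(NA, NA, NA),
  Col2 = c(NA, NA, NA),
  ListCol = list(c("obj1", "obj2"), c("obj1", "obj2"), c("obj1", "obj2"))
)

# names_sep joins the outer column name and the element index
wide <- unnest_wider(data, ListCol, names_sep = "_")
names(wide)
```

If the original names Col3 and Col4 are wanted, the result can be renamed afterwards, for example with dplyr::rename.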
11

Here is one approach, using unnest and tidyr::spread...

library(dplyr)
library(tidyr)

# example df
df <- tibble(a = c(1, 2, 3), b = list(c(2, 3), c(4, 5), c(6, 7)))

df %>%
  unnest(b) %>%
  group_by(a) %>%
  mutate(col = seq_along(a)) %>%  # add a column indicator
  spread(key = col, value = b)

      a   `1`   `2`
  <dbl> <dbl> <dbl>
1    1.    2.    3.
2    2.    4.    5.
3    3.    6.    7.

3 Comments

in your example you are just doing cbind(df[1],do.call(rbind,df$b)) or even cbind(df[1],t(data.frame(df$b)))
@Onyambu, don't you want to write a complete answer to this post? It helped me and could help others but I missed it the first time.
The help page for spread says it's superseded by pivot_wider, which works nicely as well.
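As the last comment notes, pivot_wider can stand in for spread. A minimal sketch of the same pipeline, assuming the example `df` from this answer; the `b_` prefix and the name `wide` are arbitrary choices for illustration, not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Same toy data as in the answer above
df <- tibble(a = c(1, 2, 3), b = list(c(2, 3), c(4, 5), c(6, 7)))

wide <- df %>%
  unnest(b) %>%                      # one row per list element
  group_by(a) %>%
  mutate(col = row_number()) %>%     # position of the element within each list
  ungroup() %>%
  pivot_wider(names_from = col, values_from = b, names_prefix = "b_")

wide
```

The names_prefix argument avoids the purely numeric column names that the spread version produces.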
7

Comparison of two great answers

There are two great one-liner suggestions in this thread:

(1) cbind(df[1], t(data.frame(df$b)))

This is from @Onyambu, using base R. Arriving at this answer requires knowing that a data frame is a list, plus a bit of creativity.

(2) df %>% unnest_wider(b)

This is from @iago, using the tidyverse. It requires extra packages and familiarity with the nest verbs, but it is arguably more readable.
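For readers who want to see the base R idea in (1) step by step, here is a sketch using the do.call(rbind, ...) variant mentioned in an earlier comment. The data frame `df` and the column names `b1`, `b2` are assumptions chosen for illustration.

```r
# Toy data (assumption): b is a list column of equal-length numeric vectors
df <- data.frame(a = 1:3)
df$b <- list(c(2, 3), c(4, 5), c(6, 7))

# Stack the list elements as rows of a matrix: one row per original row
mat <- do.call(rbind, df$b)

# Name the new columns explicitly instead of relying on default names
colnames(mat) <- paste0("b", seq_len(ncol(mat)))

# Bind the remaining columns to the widened matrix
wide <- cbind(df["a"], mat)
wide
```

Naming the matrix columns up front keeps the result independent of how cbind labels unnamed matrix columns.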

Now let's compare performance

library(dplyr)
library(tidyr)
library(purrr)
library(microbenchmark)

N <- 100
df <- tibble(a = 1:N, b = map2(1:N, 1:N, c))

tidy_foo <- function() suppressMessages(df %>% unnest_wider(b, names_sep = "-"))
base_foo <- function() cbind(df[1], t(data.frame(df$b))) %>% as_tibble  # To be fair

microbenchmark(tidy_foo(), base_foo(), times = 1000)

Unit: milliseconds
       expr      min       lq     mean   median       uq     max neval
 tidy_foo() 6.538002 7.142651 7.935855 7.434001 7.945101 70.0057  1000
 base_foo() 6.000001 6.423951 7.110651 6.636401 6.991952 13.8205  1000

Conclusion

The tidyr solution is about 1.1 times slower on average (mean 7.94 ms vs. 7.11 ms), but about 5 times slower in the worst case (max 70.0 ms vs. 13.8 ms).

2 Comments

Using my real-world data & problem, I see less of a difference. My data.frame has 100k rows, 65 cols, and I'm unnesting a single pair of variables. The tidyr solution takes 12.5 seconds, the base R solution takes 11 seconds, so the base R solution is 1.14x faster. Users might want to test on their own data.
@SamFirke I have relaunched microbenchmark with times = 1000 and found results similar to yours.
2

Here's an option with data.table and base::unlist.

library(data.table)

DT <- data.table(
  a = list(1, 2, 3),
  b = list(list(1, 2), list(2, 1), list(1, 1))
)

for (i in seq_len(nrow(DT))) {
  set(
    DT,
    i = i,
    j = c('b1', 'b2'),
    value = unlist(DT[i][['b']], recursive = FALSE)
  )
}

DT

This requires a for loop on every row... Not ideal and very anti-data.table. I wonder if there's some way to avoid creating the list column in the first place...
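One way to avoid the row loop, offered as a sketch rather than a drop-in replacement: if each element of the list column is an atomic vector (an assumption that differs from the nested lists above), data.table::transpose can turn the column into one vector per new column in a single step.

```r
library(data.table)

# Toy data (assumption): each element of b is a numeric vector of length 2,
# not a nested list as in the example above
DT <- data.table(a = 1:3, b = list(c(1, 2), c(2, 1), c(1, 1)))

# transpose() converts a list of n length-2 vectors into 2 vectors of length n,
# which := then assigns by reference as two new columns
DT[, c("b1", "b2") := transpose(b)]

DT
```

This keeps the assignment by reference and vectorized, which is closer to the usual data.table style than calling set once per row.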


1

@Alec data.table offers the tstrsplit function to split a column into multiple columns.

library(data.table)

DT = data.table(x=c("A/B", "A", "B"), y=1:3)
DT[]
#      x y
# 1: A/B 1
# 2:   A 2
# 3:   B 3

DT[, c("c1") := tstrsplit(x, "/", fixed=TRUE, keep=1L)][] # keep only first
#      x y c1
# 1: A/B 1  A
# 2:   A 2  A
# 3:   B 3  B

DT[, c("c1", "c2") := tstrsplit(x, "/", fixed=TRUE)][]
#      x y c1   c2
# 1: A/B 1  A    B
# 2:   A 2  A <NA>
# 3:   B 3  B <NA>

