Split a list column into multiple columns

Question

I have a data frame where the last column is a column of lists. Below is how it looks:

    Col1 | Col2 | ListCol
    --------------------------
    na   | na   | [obj1, obj2]
    na   | na   | [obj1, obj2]
    na   | na   | [obj1, obj2]

What I want is

    Col1 | Col2 | Col3 | Col4
    --------------------------
    na   | na   | obj1 | obj2
    na   | na   | obj1 | obj2
    na   | na   | obj1 | obj2

I know that all the lists have the same number of elements.

Edit:

Every element in ListCol is a list with two elements.

It depends a lot on how ListCol is structured. If it contains a data frame or named list for each row, just `tidyr::unnest` will work. If it's some other structure, you may need to rearrange first. To get a better answer, edit with the result of calling `dput` on your sample data so we can reproduce the exact structure.
– alistaire, Commented Jun 15, 2018 at 19:15
Hello. I've tried `unnest`, but it separates the objects into different rows rather than different columns. Every row of ListCol is a list.
– Santi, Commented Jun 15, 2018 at 19:23
The simplest way to get it to expand sideways instead of down is to make each list element a 1-row data frame, e.g. with `df$ListCol <- lapply(df$ListCol, function(x) as.data.frame(t(x)))` (or with dplyr and purrr, if you prefer), and then call `unnest`.
– alistaire, Commented Jun 15, 2018 at 21:43
Here's alistaire's solution for a similar problem, using `invoke_map` and `tibble`: stackoverflow.com/questions/49889246/how-to-unnest-column-list. And here are several other solutions: stackoverflow.com/questions/49689927/…
– Arthur Yip, Commented Jul 7, 2019 at 2:00

MyNameHere · Accepted Answer · 2024-11-27 06:48:34Z

31

Currently, the tidyverse answer would be:

    library(dplyr)
    library(tidyr)

    data %>% unnest_wider(ListCol, names_sep = "_")

edited Nov 27, 2024 at 6:48

MyNameHere


answered Nov 6, 2020 at 9:15

iago



3 Comments

Mario Reutter Over a year ago

You can also add the names_sep parameter if you need to keep the name of the nested column (e.g. data %>% unnest_wider(ListCol, names_sep="_") would lead to ListCol_Col3, which is handy when unnesting several columns at once).

MyNameHere Nov 26, 2024 at 4:40

Not only can you add names_sep, it's also required for unnamed list-cols.

iago Nov 27, 2024 at 6:49

@MyNameHere indeed, since tidyr version 1.3.0. Thanks!
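A minimal sketch of the point made in these comments, assuming tidyr 1.3.0 or later and a made-up tibble (the name `data` and its values are hypothetical, chosen to mirror the question). With an unnamed list-column, `names_sep` builds the new column names from the list-column name and the element position; a `rename()` afterwards gives the `Col3`/`Col4` layout asked for.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: every element of ListCol is an unnamed list of length two
data <- tibble(
  Col1 = c(NA, NA, NA),
  Col2 = c(NA, NA, NA),
  ListCol = list(
    list("obj1", "obj2"),
    list("obj1", "obj2"),
    list("obj1", "obj2")
  )
)

wide <- data %>%
  unnest_wider(ListCol, names_sep = "_") %>%  # creates ListCol_1 and ListCol_2
  rename(Col3 = ListCol_1, Col4 = ListCol_2)  # rename to the names in the question

wide
```

The rename step is optional; without it the columns simply keep the `ListCol_1`, `ListCol_2` names.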

Andrew Gustar · Accepted Answer · 2018-06-15 19:36:24Z

11

Here is one approach, using unnest and tidyr::spread...

    library(dplyr)
    library(tidyr)

    # example df
    df <- tibble(a = c(1, 2, 3), b = list(c(2, 3), c(4, 5), c(6, 7)))

    df %>%
      unnest(b) %>%
      group_by(a) %>%
      mutate(col = seq_along(a)) %>%  # add a column indicator
      spread(key = col, value = b)

    #>       a   `1`   `2`
    #>   <dbl> <dbl> <dbl>
    #> 1    1.    2.    3.
    #> 2    2.    4.    5.
    #> 3    3.    6.    7.

answered Jun 15, 2018 at 19:36

Andrew Gustar


3 Comments

Onyambu Over a year ago

in your example you are just doing cbind(df[1],do.call(rbind,df$b)) or even cbind(df[1],t(data.frame(df$b)))

JMarcelino Over a year ago

@Onyambu, don't you want to write a complete answer to this post? It helped me and could help others but I missed it the first time.

ARobertson Over a year ago

Help doc for Spread says it's superseded by pivot_wider, which works nicely as well.
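As a sketch of what that last comment suggests, the same pipeline can be written with `pivot_wider` in place of `spread`. This reuses the example `df` from the answer; the `names_prefix = "b"` argument and the resulting names `b1`, `b2` are an illustrative choice, not part of the original answer. Like the original, it assumes the values in `a` identify each row uniquely.

```r
library(dplyr)
library(tidyr)

# Same example data as in the answer above
df <- tibble(a = c(1, 2, 3), b = list(c(2, 3), c(4, 5), c(6, 7)))

wide <- df %>%
  unnest(b) %>%                  # one row per list element
  group_by(a) %>%
  mutate(col = row_number()) %>% # position of each element within its row
  ungroup() %>%
  pivot_wider(
    names_from = col,
    values_from = b,
    names_prefix = "b"           # hypothetical prefix, gives b1 and b2
  )

wide
```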

pietrodito · Accepted Answer · 2024-04-11 09:51:09Z

Comparison of two great answers

There are two great one-liner suggestions in this thread:

(1) `cbind(df[1], t(data.frame(df$b)))`

This is from @Onyambu and uses only base R. Arriving at it takes some creativity and the knowledge that a data frame is a list.

(2) `df %>% unnest_wider(b)`

This is from @iago and uses the tidyverse. It requires extra packages and familiarity with the tidyr nesting verbs, but it is arguably more readable.
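To unpack the base R one-liner in (1), here is a small sketch using the `do.call(rbind, ...)` variant that @Onyambu also mentions. The toy data and the column names `b1`, `b2` are assumptions made for illustration, and it presumes every list element has the same length.

```r
# Toy data as a base data frame with a list column
df <- data.frame(a = c(1, 2, 3))
df$b <- list(c(2, 3), c(4, 5), c(6, 7))

# Stack the list elements as rows of a matrix: one row per element of df$b
m <- do.call(rbind, df$b)
colnames(m) <- c("b1", "b2")  # hypothetical column names

# Bind the remaining column(s) to the new matrix columns
wide <- cbind(df[1], m)
wide
```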

Now let's compare performance

    library(dplyr)
    library(tidyr)
    library(purrr)
    library(microbenchmark)

    N <- 100
    df <- tibble(a = 1:N, b = map2(1:N, 1:N, c))

    tidy_foo <- function() suppressMessages(df %>% unnest_wider(b, names_sep = "-"))
    base_foo <- function() cbind(df[1], t(data.frame(df$b))) %>% as_tibble # To be fair

    microbenchmark(tidy_foo(), base_foo(), times = 1000)

    Unit: milliseconds
           expr      min       lq     mean   median       uq     max neval
     tidy_foo() 6.538002 7.142651 7.935855 7.434001 7.945101 70.0057  1000
     base_foo() 6.000001 6.423951 7.110651 6.636401 6.991952 13.8205  1000

Conclusion

The tidyr solution is about 1.1 times slower on average (mean 7.94 ms vs 7.11 ms), but its worst case is about 5 times slower (max 70.0 ms vs 13.8 ms).

Using my real-world data & problem, I see less of a difference. My data.frame has 100k rows, 65 cols, and I'm unnesting a single pair of variables. The tidyr solution takes 12.5 seconds, the base R solution takes 11 seconds, so the base R solution is 1.14x faster. Users might want to test on their own data.
@SamFirke I have relaunched microbenchmark with times = 1000 and found results similar to yours.

Alec · Accepted Answer · 2018-06-15 20:36:18Z

Here's an option with data.table and base::unlist.

    library(data.table)

    DT <- data.table(
      a = list(1, 2, 3),
      b = list(list(1, 2), list(2, 1), list(1, 1))
    )

    for (i in seq_len(nrow(DT))) {
      set(
        DT,
        i = i,
        j = c('b1', 'b2'),
        value = unlist(DT[i][['b']], recursive = FALSE)
      )
    }

    DT

This requires a for loop on every row... Not ideal and very anti-data.table. I wonder if there's some way to avoid creating the list column in the first place...
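One possible way to drop the explicit row loop, sketched here under the assumption that every element of `b` has exactly two entries: build each new column by extracting the k-th entry from every row's list, then assign both columns in a single `:=` call. This is an illustrative alternative, not the answer author's method.

```r
library(data.table)

DT <- data.table(
  a = list(1, 2, 3),
  b = list(list(1, 2), list(2, 1), list(1, 1))
)

# For each position k (1 and 2), pull the k-th entry out of every row's list;
# lapply returns a list of two vectors, which := assigns to b1 and b2
DT[, c("b1", "b2") := lapply(1:2, function(k) sapply(b, `[[`, k))]

DT
```

This still iterates internally over the list column, so it is tidier rather than fundamentally faster.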

Matthew Son · Accepted Answer · 2021-03-18 16:22:57Z

@Alec data.table offers the `tstrsplit` function, which splits a character column into multiple columns.

    library(data.table)

    DT = data.table(x = c("A/B", "A", "B"), y = 1:3)
    DT[]
    #      x y
    # 1: A/B 1
    # 2:   A 2
    # 3:   B 3

    DT[, c("c1") := tstrsplit(x, "/", fixed = TRUE, keep = 1L)][] # keep only first
    #      x y c1
    # 1: A/B 1  A
    # 2:   A 2  A
    # 3:   B 3  B

    DT[, c("c1", "c2") := tstrsplit(x, "/", fixed = TRUE)][]
    #      x y c1   c2
    # 1: A/B 1  A    B
    # 2:   A 2  A <NA>
    # 3:   B 3  B <NA>
