R Concatenate column names into new column while sorting by their value

Question

I'm trying to build a string that lists the column names in order of their values.

    library(dplyr)
    library(tibble)

    set.seed(100)
    df <- tibble(id = 1:5,
                 col1 = sample(1:50, 5),
                 col2 = sample(1:50, 5),
                 col3 = sample(1:50, 5)) %>%
      mutate_at(vars(-id), ~if_else(. <= 20, NA_integer_, .))

    # A tibble: 5 x 4
    #      id  col1  col2  col3
    #   <int> <int> <int> <int>
    # 1     1    NA    44    NA
    # 2     2    38    23    34
    # 3     3    48    22    NA
    # 4     4    25    NA    48
    # 5     5    NA    NA    43

    res <- df %>%
      add_column(order = c('col2', 'col2_col3_col1', 'col2_col1', 'col1_col3', 'col3'))

    # A tibble: 5 x 5
    #      id  col1  col2  col3 order
    #   <int> <int> <int> <int> <chr>
    # 1     1    NA    44    NA col2
    # 2     2    38    23    34 col2_col3_col1
    # 3     3    48    22    NA col2_col1
    # 4     4    25    NA    48 col1_col3
    # 5     5    NA    NA    43 col3

My current data is in the form of df, while the column I'm trying to add is the order column in res. The ordering of the elements in the string is determined by the value of each column, and it also needs to skip over NAs. The values are times in days, so I'm trying to identify the sequence in which each ID populates a value in each column. However, not all IDs will have a value in all columns, so there are missing values throughout. I usually work within tidyverse, but any solution or thoughts would be much appreciated.

akrun · Accepted Answer · 2020-03-26 21:40:24Z

An easier option is apply: loop over the rows (MARGIN = 1), remove the NA elements, order the remaining non-NA values, use that index to get the column names, and paste them together.

    df$order <- apply(df[-1], 1, function(x) {
      x1 <- x[!is.na(x)]
      paste(names(x1)[order(x1)], collapse = "_")
    })
    df$order
    #[1] "col2" "col2_col3_col1" "col2_col1" "col1_col3" "col3"
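To make the individual steps visible, here is a small sketch that runs the same logic on a single row. The vector x is typed in by hand from row 2 of the example data (an assumption for illustration; inside apply, each row arrives as a named vector like this one).

```r
# Row 2 of the example data, written out by hand as a named integer vector.
x <- c(col1 = 38L, col2 = 23L, col3 = 34L)

x1  <- x[!is.na(x)]   # drop missing values; the names are kept
idx <- order(x1)      # positions of the values from smallest to largest: 2 3 1
nms <- names(x1)[idx] # column names in that order: "col2" "col3" "col1"

result <- paste(nms, collapse = "_")
result
# "col2_col3_col1"
```

The key point is that subsetting with !is.na(x) keeps the names attached, so the index returned by order() can be used directly on names(x1).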

Or using tidyverse

    library(dplyr)
    library(tidyr)
    library(stringr)

    df %>%
      pivot_longer(cols = -id, values_drop_na = TRUE) %>%
      arrange(id, value) %>%
      group_by(id) %>%
      summarise(order = str_c(name, collapse = "_")) %>%
      right_join(df) %>%
      select(names(df), order)
    # A tibble: 5 x 5
    #     id  col1  col2  col3 order
    #  <int> <int> <int> <int> <chr>
    #1     1    NA    44    NA col2
    #2     2    38    23    34 col2_col3_col1
    #3     3    48    22    NA col2_col1
    #4     4    25    NA    48 col1_col3
    #5     5    NA    NA    43 col3

Or using pmap from purrr

    library(purrr)

    df %>%
      mutate(order = pmap_chr(select(., starts_with('col')), ~ {
        x <- c(...)
        x1 <- x[!is.na(x)]
        str_c(names(x1)[order(x1)], collapse = "_")
      }))
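In the pmap_chr call, the columns of each row are passed to the function as named arguments, and c(...) collects them into a named vector, which is what lets names() recover the column names. A minimal base R sketch of that mechanism follows; row_order is a hypothetical name used only for illustration, and the values are typed in from row 4 of the example data.

```r
# Hypothetical stand-in for the anonymous function passed to pmap_chr.
row_order <- function(...) {
  x  <- c(...)        # named arguments become a named vector
  x1 <- x[!is.na(x)]  # drop missing values, keeping names
  paste(names(x1)[order(x1)], collapse = "_")
}

# Row 4 of the example data, passed as named arguments.
row4 <- row_order(col1 = 25L, col2 = NA, col3 = 48L)
row4
# "col1_col3"
```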

This doesn't handle NA correctly; it returns 1 for the first row. Try names(x)[order(x, na.last = NA)].

Thank you for the multiple solutions. The apply approach is certainly more straightforward, which isn't surprising, as rowwise operations tend to seem more complicated than they should be with the tidyverse. I figured that purrr::pmap would be able to accomplish this well, so thank you for the example.
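The na.last = NA suggestion from the comments can be sketched as follows. With na.last = NA, order() drops the NA positions itself, so the separate filtering step is not needed. The helper name order_names is hypothetical, and the data frame is typed in from the printed tibble in the question rather than regenerated with sample().

```r
# Hypothetical helper: names of x ordered by value, with NAs dropped by order().
order_names <- function(x) {
  paste(names(x)[order(x, na.last = NA)], collapse = "_")
}

# Example data copied by hand from the tibble printed in the question.
df <- data.frame(
  id   = 1:5,
  col1 = c(NA, 38L, 48L, 25L, NA),
  col2 = c(44L, 23L, 22L, NA, NA),
  col3 = c(NA, 34L, NA, 48L, 43L)
)

# Apply the helper to every row, skipping the id column.
df$order <- apply(df[-1], 1, order_names)
df$order
```

For a row where every column is NA, this helper returns an empty string, since there are no names left to paste.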
