Create new column that is a random subset of other columns

Question

I'd like to create a new column where each value is a random subset of other values from that row in my data.

```r
# Example data:
library(dplyr)
library(stringr)

df <- data.frame(matrix(nrow = 57, ncol = 6)) %>%
  mutate(
    X1 = round(rnorm(n = 57, mean = 0, sd = 1), 1),
    X2 = round(rnorm(n = 57, mean = 0, sd = 1), 1),
    X3 = round(rnorm(n = 57, mean = 0, sd = 1), 1),
    X4 = round(rnorm(n = 57, mean = 0, sd = 1), 1),
    X5 = round(rnorm(n = 57, mean = 0, sd = 1), 1),
    X6 = round(rnorm(n = 57, mean = 0, sd = 1), 1)
  )

# my failed attempt at a new column
df %>%
  rowwise() %>%
  mutate(X7 = str_c(df[, sample(1:6, 3, replace = F)]), sep = ", ")
```

Forget the rowwise and use sample(1:6, 1, replace = F). Sample just one column, not 3. BTW, why str_c? Don't you want to fill X7 with numbers? This way you will get characters. – Rui Barradas, Commented Sep 1, 2017 at 5:10
@RuiBarradas I want each value of X7 to be a vector of 3 random values from its own row. – Joe, Commented Sep 1, 2017 at 5:14
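If X7 should literally hold a numeric vector of 3 values per row, as in the clarification above, one option is a list-column, which avoids converting the numbers to character. This is a minimal base R sketch under that assumption. The compact data construction (`matrix(round(rnorm(57 * 6), 1), ...)`) and the helper name `row_values` are illustrative, not from the question.

```r
# Example data: 57 rows, numeric columns X1 to X6 (same shape as in the question)
set.seed(123)
df <- data.frame(matrix(round(rnorm(57 * 6), 1), nrow = 57, ncol = 6))

# For each row i, draw 3 of its 6 values without replacement and keep them as a
# numeric vector; lapply() returns a list, so X7 becomes a list-column
df$X7 <- lapply(seq_len(nrow(df)), function(i) {
  row_values <- unlist(df[i, 1:6], use.names = FALSE)
  sample(row_values, 3)
})

df$X7[[1]]  # a numeric vector of length 3
```

The trade-off is that a list-column stays numeric but prints less compactly than a comma-separated string, so the string approaches in the answers below may be more convenient for display.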

www · Accepted Answer · 2017-09-01 05:42:41Z

Here is a solution using the tidyverse. The key is to split the data frame by row and apply a function that samples the values within each one-row subset. map_df applies that function to every row and combines all the outputs into a single data frame. df2 is the final output.

```r
# Load package
library(tidyverse)

# Set seed
set.seed(123)

# Create example data frame
df <- data.frame(matrix(nrow = 57, ncol = 6)) %>%
  mutate(
    X1 = round(rnorm(n = 57, mean = 0, sd = 1), 1),
    X2 = round(rnorm(n = 57, mean = 0, sd = 1), 1),
    X3 = round(rnorm(n = 57, mean = 0, sd = 1), 1),
    X4 = round(rnorm(n = 57, mean = 0, sd = 1), 1),
    X5 = round(rnorm(n = 57, mean = 0, sd = 1), 1),
    X6 = round(rnorm(n = 57, mean = 0, sd = 1), 1)
  )

# Process the data
df2 <- df %>%
  rowid_to_column() %>%
  split(f = .$rowid) %>%
  map_df(function(dt) {
    dt_sub <- dt %>%
      select(-rowid) %>%
      select(sample(1:6, 3, replace = FALSE)) %>%
      unite(X7, everything(), sep = ", ")
    return(dt_sub)
  }) %>%
  bind_cols(df) %>%
  select(paste0("X", 1:7))

df2
```

```
    X1   X2   X3   X4   X5   X6             X7
1 -0.6  0.6  0.5  0.1  0.9  0.1  0.1, 0.5, 0.9
2 -0.2  0.1  0.3  0.0 -1.0  0.2  0.1, 0.3, 0.2
3  1.6  0.2  0.1  2.1  2.0  1.6  1.6, 2.1, 0.1
4  0.1  0.4 -0.6 -0.7 -0.1 -0.2 0.1, 0.4, -0.6
5  0.1 -0.5 -0.8 -1.1  0.2  0.2 0.1, 0.2, -0.5
6  1.7 -0.3 -1.0  0.0 -0.7  1.2 -1, -0.7, -0.3
7  0.5 -1.0  0.1  0.3 -0.6  1.1  0.5, -0.6, -1
...
```
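The split/map_df/bind_cols pipeline above can probably be shortened by iterating over the columns in parallel instead of splitting the data frame. This is a sketch assuming purrr's `pmap_chr()`, which calls the function once per row with that row's X1 to X6 values as arguments. It is an alternative formulation, not the answer's original code, and the compact data construction is illustrative.

```r
library(dplyr)
library(purrr)

# Example data: 57 rows, numeric columns X1 to X6
set.seed(123)
df <- data.frame(matrix(round(rnorm(57 * 6), 1), nrow = 57, ncol = 6))

# pmap_chr() walks over X1..X6 in parallel, so each call receives one row's six
# values through `...`; sample() picks 3 without replacement and toString()
# joins them with ", "
df2 <- df %>%
  mutate(X7 = pmap_chr(list(X1, X2, X3, X4, X5, X6), function(...) {
    toString(sample(c(...), 3))
  }))

head(df2)
```

Because X7 is built inside `mutate()`, the row order and the original columns are kept without the `rowid_to_column()` / `bind_cols()` round trip.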

Rui Barradas · Accepted Answer · 2017-09-01 06:04:08Z

I believe the best way is to use the base R functions replicate, sample and sapply.

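```r
# Each row of inx holds 3 column indices drawn without replacement
inx <- t(replicate(nrow(df), sample(1:6, 3, replace = FALSE)))

# Paste the selected values of row i into one comma-separated string
df$X7 <- sapply(seq_len(nrow(df)), function(i) paste(df[i, inx[i, ]], collapse = ", "))
```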
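A shorter base R variant, assuming only the comma-separated string is needed, is to let `apply()` hand each row to the sampling function directly, so no separate index matrix is required. This is a sketch and not the answer's code. `apply()` first converts the selected columns to a numeric matrix, which is fine here because X1 to X6 are all numeric. The data construction and the name `row_values` are illustrative.

```r
# Example data: 57 rows, numeric columns X1 to X6
set.seed(123)
df <- data.frame(matrix(round(rnorm(57 * 6), 1), nrow = 57, ncol = 6))

# MARGIN = 1 passes each row of X1..X6 as a numeric vector; sample() draws 3 of
# its values without replacement and toString() joins them with ", "
df$X7 <- apply(df[, 1:6], 1, function(row_values) toString(sample(row_values, 3)))

head(df)
```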

Sven Hohenstein · Accepted Answer · 2017-09-01 06:36:44Z

This is a solution in dplyr:

```r
library(dplyr)

df %>%
  group_by(idx = seq(n())) %>%  # one group per row
  do({
    res <- select(., -idx)
    # Sample 3 of the row's values and join them into one string
    bind_cols(res, X7 = toString(sample(unlist(res), 3, replace = FALSE)))
  }) %>%
  ungroup() %>%
  select(-idx)
```

The result:

```
# A tibble: 57 x 7
      X1    X2    X3    X4    X5    X6              X7
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>           <chr>
 1   0.4   0.4  -0.1   3.4   0.9  -0.4   0.4, 0.9, 0.4
 2   1.5   0.9  -0.7   1.5  -1.1  -0.3 -0.7, 1.5, -1.1
 3  -0.1  -0.5  -0.6  -0.8  -0.3   2.3 -0.3, 2.3, -0.8
 4   0.7  -1.0   0.3   0.2  -0.5  -0.3   -1, 0.3, -0.3
 5   0.6   0.9   0.4   1.9  -0.7  -2.0    0.4, -2, 0.9
 6   0.3   0.7   1.3   0.6   1.3  -0.2  0.7, -0.2, 1.3
 7   0.5   0.3   1.1  -0.2  -0.4  -0.8   0.5, 1.1, 0.3
 8   0.4  -1.9   0.8  -0.6  -1.1   0.4 0.4, -1.9, -0.6
 9   0.2  -1.5  -1.9   1.0   0.0   0.6       0, 1, 0.6
10  -0.2   0.7  -0.5   1.4   0.3  -0.1 -0.2, 0.3, -0.5
```

@ycw Good idea, thanks for pointing that out! I modified my answer accordingly.
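`do()` is marked as superseded in current dplyr. With dplyr 1.0.0 or later, the same per-row sampling can likely be written with `rowwise()` and `c_across()`, which is also close to what the question originally attempted. This is a sketch under that version assumption, not the answer's original code, and the data construction is illustrative.

```r
library(dplyr)

# Example data: 57 rows, numeric columns X1 to X6
set.seed(123)
df <- data.frame(matrix(round(rnorm(57 * 6), 1), nrow = 57, ncol = 6))

df3 <- df %>%
  rowwise() %>%                                      # evaluate mutate() one row at a time
  mutate(X7 = toString(sample(c_across(X1:X6), 3))) %>%  # 3 of this row's values, no replacement
  ungroup()                                          # drop the row-wise grouping

df3
```

Compared with the question's attempt, the key difference is that `c_across()` refers to the current row's values, whereas `df[, ...]` inside `mutate()` refers to the whole data frame.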
