
I have two columns (V1 and V2) containing character data. I want to create a third column with the "sum" of these characters, i.e. the unique comma-separated values found across both columns (V1 and V2).

I want to go from this:

Example data:

```r
df <- data.frame(V1 = c('A','A','A','A','B','B','','C'),
                 V2 = c('A, B','A','B','','A, C','A, B','A',''))
```

```
  V1   V2
1  A A, B
2  A    A
3  A    B
4  A
5  B A, C
6  B A, B
7       A
8  C
```

To this:

```
   V3
1  AB
2   A
3  AB
4   A
5 ABC
6  AB
7   A
8   C
```
  • You want to sort them as well? Row five should result in ABC and not BAC? Commented Jun 3, 2022 at 15:54
  • This doesn't really matter, thanks Commented Jun 3, 2022 at 16:02

4 Answers


Here is a tidyverse approach using dplyr, purrr and stringr. It could probably be condensed into fewer lines, but this version is readable:

  1. Split the text on the comma.
  2. Combine the two columns, drop duplicates, and sort.
  3. Paste them back together.
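```r
library(dplyr)
library(purrr)
library(stringr)

df %>%
  # Split every cell on ", " (V1 and V2 become list columns)
  modify(str_split, ",\\s") %>%
  # compose() applies right to left: c() the two vectors, then unique(), then sort()
  mutate(V3 = map2(V1, V2, compose(sort, unique, c))) %>%
  # Paste each row's values back into one string
  mutate(V3 = map_chr(V3, paste, collapse = ""))
```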

We can split column V2, take its union with V1, and paste the sorted result together:

```r
# The \(x, y) lambda shorthand requires R >= 4.1
data.frame(V3 = mapply(\(x, y) paste(sort(union(x, y)), collapse = ""),
                       strsplit(df$V2, ",\\s*"), df$V1))
```

Output:

```
   V3
1  AB
2   A
3  AB
4   A
5 ABC
6  AB
7   A
8   C
```


This approach first pastes V1 and V2 together, removes the ", " separators with gsub, splits the string into single characters with strsplit, keeps only the unique characters, and collapses them back together.

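```r
# Paste V1 and V2, strip ", ", split into single characters,
# then keep the sorted unique characters of each row
df$V3 <- sapply(strsplit(gsub(",\\s", "", paste0(df$V1, df$V2)), ""),
                function(x) paste0(sort(unique(x)), collapse = ""))
```

```
   V3
1  AB
2   A
3  AB
4   A
5 ABC
6  AB
7   A
8   C
```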
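Splitting into single characters works here because every value in the example is one letter; a multi-character value such as "AB, CD" would be broken apart. A minimal sketch that instead splits both columns on commas, assuming values are comma-separated and surrounding spaces are not significant (`split_tokens` is a hypothetical helper, not part of any package):

```r
# Example data from the question (stringsAsFactors = FALSE only matters for R < 4.0)
df <- data.frame(V1 = c('A','A','A','A','B','B','','C'),
                 V2 = c('A, B','A','B','','A, C','A, B','A',''),
                 stringsAsFactors = FALSE)

# Hypothetical helper: split one cell on commas, trim whitespace,
# and drop empty tokens
split_tokens <- function(s) {
  tokens <- trimws(unlist(strsplit(s, ",")))
  tokens[tokens != ""]
}

# For each row, take the sorted union of the tokens from both columns
df$V3 <- mapply(function(a, b) {
                  paste(sort(union(split_tokens(a), split_tokens(b))), collapse = "")
                },
                df$V1, df$V2, USE.NAMES = FALSE)
```

With genuinely multi-character values, `collapse = ", "` would keep the tokens distinguishable in V3.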


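With a single regular expression: `(.)(?=.*\\1)` deletes any character that occurs again later in the string, and `,| ` deletes the commas and spaces. Because the last occurrence of each character is kept, the result is not sorted (row 5 gives "BAC"):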

```r
gsub("(.)(?=.*\\1)|,| ", "", paste(df$V1, df$V2), perl = TRUE)
# [1] "AB" "A" "AB" "A" "BAC" "AB" "A" "C"
```
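If a sorted result is wanted after all, the regex output can be post-processed by sorting the characters of each string. A minimal sketch, assuming every value is a single character (the intermediate name `dedup` is just for illustration):

```r
# Example data from the question (stringsAsFactors = FALSE only matters for R < 4.0)
df <- data.frame(V1 = c('A','A','A','A','B','B','','C'),
                 V2 = c('A, B','A','B','','A, C','A, B','A',''),
                 stringsAsFactors = FALSE)

# Regex from the answer: drop characters that appear again later,
# plus commas and spaces
dedup <- gsub("(.)(?=.*\\1)|,| ", "", paste(df$V1, df$V2), perl = TRUE)

# Sort the remaining characters within each string
df$V3 <- vapply(strsplit(dedup, ""),
                function(x) paste(sort(x), collapse = ""),
                character(1))
```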
