
I have the following data frame:

```r
library(tidyverse)

dat <- structure(
  list(
    seq_name = c("Peptide_set1.r1", "Peptide_set2.r1"),
    peptide  = c("KSKLRHGC", "AAYVYVNQF")
  ),
  .Names = c("seq_name", "peptide"),
  row.names = c(NA, -2L),
  class = c("tbl_df", "tbl", "data.frame")
)
dat
#> # A tibble: 2 x 2
#>   seq_name        peptide
#>   <chr>           <chr>
#> 1 Peptide_set1.r1 KSKLRHGC
#> 2 Peptide_set2.r1 AAYVYVNQF
```

What I want to do is convert it into this named list of character vectors:

```
$Peptide_set1.r1
[1] "K" "S" "K" "L" "R" "H" "G" "C"

$Peptide_set2.r1
[1] "A" "A" "Y" "V" "Y" "V" "N" "Q" "F"
```

How can I do that?

  • @RonakShah Thanks. But not quite. I need the named list. Commented Apr 12, 2018 at 3:31
  • @RonakShah No. The name should be taken from `seq- Commented Apr 12, 2018 at 3:35

1 Answer


We can split each string into its individual characters with `strsplit` (using `""` as the split pattern) and assign the names with `setNames`:

```r
setNames(strsplit(dat$peptide, ""), dat$seq_name)
#$Peptide_set1.r1
#[1] "K" "S" "K" "L" "R" "H" "G" "C"

#$Peptide_set2.r1
#[1] "A" "A" "Y" "V" "Y" "V" "N" "Q" "F"
```
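For readers not using the tidyverse, the same two steps should work on a plain base-R data frame. This is a minimal sketch that assumes the data is rebuilt with `data.frame()` instead of the tibble above; the names `dat_df`, `chars` and `res` are illustrative only. `stringsAsFactors = FALSE` matters on R versions before 4.0, where `data.frame()` turned strings into factors by default and `strsplit()` refuses factor input.

```r
# Same data as the question, as a plain data.frame (assumption: no tidyverse).
# stringsAsFactors = FALSE keeps the columns as character on R < 4.0.
dat_df <- data.frame(
  seq_name = c("Peptide_set1.r1", "Peptide_set2.r1"),
  peptide  = c("KSKLRHGC", "AAYVYVNQF"),
  stringsAsFactors = FALSE
)

# An empty split pattern ("") splits each string into single characters;
# the result is an unnamed list with one character vector per row.
chars <- strsplit(dat_df$peptide, "")

# setNames() (from the stats package) returns the list with names attached.
res <- setNames(chars, dat_df$seq_name)
print(res)
```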

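To select the columns by index instead of by name, we can use `pull` to turn a column into a vector. This is needed because single-bracket indexing of a tibble (`dat[2]`) returns another tibble, not a vector: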

```r
library(dplyr)
setNames(strsplit(pull(dat[2]), ""), pull(dat[1]))
#$Peptide_set1.r1
#[1] "K" "S" "K" "L" "R" "H" "G" "C"

#$Peptide_set2.r1
#[1] "A" "A" "Y" "V" "Y" "V" "N" "Q" "F"
```
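The difference between `[` and `[[` can be seen with base R alone. This sketch again assumes a plain `data.frame` copy of the data (`dat_df`, `one_col`, `col_vec` and `msg` are illustrative names). Single brackets keep the container, so the result is still a one-column data frame, and `strsplit()` only accepts character vectors. The same reasoning probably explains why an attempt like `strsplit(as.list(dat[1]), "")` fails: a list is not a character vector either.

```r
dat_df <- data.frame(
  seq_name = c("Peptide_set1.r1", "Peptide_set2.r1"),
  peptide  = c("KSKLRHGC", "AAYVYVNQF"),
  stringsAsFactors = FALSE
)

# `[` keeps the container: the result is a one-column data frame
one_col <- dat_df[2]
# `[[` drops it: the result is the column itself, a character vector
col_vec <- dat_df[[2]]

print(class(one_col))   # "data.frame"
print(class(col_vec))   # "character"

# strsplit() accepts only character input, so the one-column data frame
# raises an error; capture its message instead of stopping the script
msg <- tryCatch(strsplit(one_col, ""), error = function(e) conditionMessage(e))
print(msg)
```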

We can also do this inside a dplyr chain, storing the result in a new list column:

```r
library(tidyverse)
dat1 <- dat %>%
  mutate(new = setNames(strsplit(pull(dat[2]), ""), pull(dat[1])))
dat1$new
#$Peptide_set1.r1
#[1] "K" "S" "K" "L" "R" "H" "G" "C"

#$Peptide_set2.r1
#[1] "A" "A" "Y" "V" "Y" "V" "N" "Q" "F"
```
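Inside `mutate()` the columns can also be referred to by their bare names, so the chain does not have to reach back to `dat`. This is a minimal sketch assuming dplyr is installed; it rebuilds the example data with `tibble()` (re-exported by dplyr), and `res` is an illustrative name. To avoid relying on whether a list column keeps its element names, the names are attached after the chain with `setNames()`.

```r
library(dplyr)

dat <- tibble(
  seq_name = c("Peptide_set1.r1", "Peptide_set2.r1"),
  peptide  = c("KSKLRHGC", "AAYVYVNQF")
)

# Refer to the column by name inside mutate(); strsplit() returns a list,
# which becomes a list column holding one character vector per row.
dat1 <- dat %>%
  mutate(new = strsplit(peptide, ""))

# Attach the sequence names afterwards to get the named list.
res <- setNames(dat1$new, dat1$seq_name)
print(res)
```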

And as @thelatemail commented, we can extract the columns with `[[` instead of `pull`:

```r
setNames(strsplit(dat[[2]], ""), dat[[1]])
```
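Because `[[` works on both plain data frames and tibbles, the one-liner can be wrapped in a small reusable function. The helper name `peptides_to_list` and its `name_col`/`seq_col` parameters are hypothetical, not part of the answer; this is a sketch showing that the same call accepts either column positions or column names.

```r
# Hypothetical helper: build a named list of single characters, choosing
# the name and sequence columns by position or by name via `[[`.
peptides_to_list <- function(df, name_col = 1, seq_col = 2) {
  seqs <- as.character(df[[seq_col]])   # as.character() also guards against factors
  setNames(strsplit(seqs, ""), as.character(df[[name_col]]))
}

dat_df <- data.frame(
  seq_name = c("Peptide_set1.r1", "Peptide_set2.r1"),
  peptide  = c("KSKLRHGC", "AAYVYVNQF"),
  stringsAsFactors = FALSE
)

by_index <- peptides_to_list(dat_df, 1, 2)
by_name  <- peptides_to_list(dat_df, "seq_name", "peptide")
print(by_index)
```

Converting with `as.character()` inside the helper is a defensive choice, so a factor column from an older `data.frame()` call is split by its labels rather than raising an error.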

3 Comments

Thanks. How can I modify your code so that it takes the column index instead of the name, e.g. instead of `dat$peptide` or `dat$seq_name`? I tried this, but it failed: `setNames(strsplit(as.list(dat[1]), ""), dat[2,])`
`setNames(strsplit(dat[[2]], ""), dat[[1]])` works too, given that `[[` is the R primitive operation for extracting from lists, which a tbl still is under all these layers. `pull` is supposed to replace `[[`, so the syntax should actually be `pull(dat, 2)`; otherwise you're invoking `[` then `[[`.
@thelatemail I forgot about `[[`; thanks for the explanation of `pull` as well. :)
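Following the comment above, `pull()` can take the column position directly instead of being applied to `dat[2]`. A minimal sketch assuming dplyr is installed; the data is rebuilt with `tibble()`, and `seqs`, `ids` and `res` are illustrative names. `pull()` accepts a positive position (counted from the left), a negative position (counted from the right) or a bare column name.

```r
library(dplyr)

dat <- tibble(
  seq_name = c("Peptide_set1.r1", "Peptide_set2.r1"),
  peptide  = c("KSKLRHGC", "AAYVYVNQF")
)

seqs <- pull(dat, 2)   # second column, same as dat[[2]]
ids  <- pull(dat, 1)   # first column, same as dat[[1]]

res <- setNames(strsplit(seqs, ""), ids)
print(res)
```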
