I have a data frame containing the names of the supervisors and advisors of students' dissertations in a faculty, for example:
```r
DF <- data.frame(Names = c(
  "Name : Ali , Family : Ahmadi , Type : First supervisor Name : Aram , Family : Rezaeei , Type : Advisor Name : Omid , Family : Saeedi , Type : Advisor 1 Name : Nima , Family : Shaki , Type : Advisor 2 Name : Sohrab , Family : Karimi , Type : Advisor 3",
  "Name : Ali , Family : Ahmadi , Type : First supervisor Name : Aram , Family : Rezaeei , Type : Advisor Name : Omid , Family : Saeedi , Type : Advisor 1 Name : Nima , Family : Shaki , Type : Advisor 2 Name : Sohrab , Family : Karimi , Type : Advisor 3",
  "Name : Ali , Family : Ahmadi , Type : First supervisor Name : Aram , Family : Rezaeei , Type : Advisor Name : Omid , Family : Saeedi , Type : Advisor 1 Name : Nima , Family : Shaki , Type : Advisor 2 Name : Sohrab , Family : Karimi , Type : Advisor 3"))
```

I want to split the supervisors and advisors into two separate columns, so that the result looks like this:
```r
DF1 <- data.frame(
  Supervisor = c("Ali Ahmadi", "Ali Ahmadi", "Ali Ahmadi"),
  Advisors   = c("Aram Rezaeei, Omid Saeedi, Nima Shaki, Sohrab Karimi",
                 "Aram Rezaeei, Omid Saeedi, Nima Shaki, Sohrab Karimi",
                 "Aram Rezaeei, Omid Saeedi, Nima Shaki, Sohrab Karimi"))

DF1
  Supervisor                                             Advisors
1 Ali Ahmadi Aram Rezaeei, Omid Saeedi, Nima Shaki, Sohrab Karimi
2 Ali Ahmadi Aram Rezaeei, Omid Saeedi, Nima Shaki, Sohrab Karimi
3 Ali Ahmadi Aram Rezaeei, Omid Saeedi, Nima Shaki, Sohrab Karimi
```

I tried the following code:
```r
DF1 <- strsplit(DF$Names, "Name :")
stopwords = c(":", "Type", "Family", "Name", "1", "2", "3", "Advisor", "Family")
DF2 <- lapply(DF1, function(x) unlist(strsplit(x, " ")))
DF3 <- lapply(DF2, function(x) x[!x %in% stopwords])
DF4 <- lapply(DF3, function(x) paste(x, collapse = " "))
```

However, the result below is not what I expected, and it would apparently need further work to be converted into a data frame:
```
DF4
[[1]]
[1] " Ali , Ahmadi , First supervisor Aram , Rezaeei , Omid , Saeedi , Nima , Shaki , Sohrab , Karimi ,"

[[2]]
[1] " Ali , Ahmadi , First supervisor Aram , Rezaeei , Omid , Saeedi , Nima , Shaki , Sohrab , Karimi ,"

[[3]]
[1] " Ali , Ahmadi , First supervisor Aram , Rezaeei , Omid , Saeedi , Nima , Shaki , Sohrab , Karimi ,"
```

Is there a simpler way to solve this problem? I have read that regular expressions could help, but I don't know how to use them, at least not for my example. Thanks in advance for any answer.
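For reference, one possible regex-based approach could look like the base R sketch below. It assumes that every person in a cell follows exactly the `Name : X , Family : Y , Type : Z` pattern shown above, and that any `Type` containing the word "supervisor" marks a supervisor while every other `Type` marks an advisor. The helper name `split_people` is hypothetical. The idea is to split each cell just before every `Name :` (a lookahead keeps `Name :` in each piece), capture the three fields of each record with `sub()`, and then join the full names per role.

```r
# Example data: the same string three times, as in the question
x <- "Name : Ali , Family : Ahmadi , Type : First supervisor Name : Aram , Family : Rezaeei , Type : Advisor Name : Omid , Family : Saeedi , Type : Advisor 1 Name : Nima , Family : Shaki , Type : Advisor 2 Name : Sohrab , Family : Karimi , Type : Advisor 3"
DF <- data.frame(Names = rep(x, 3), stringsAsFactors = FALSE)

# Hypothetical helper: turns one cell into c(Supervisor = ..., Advisors = ...)
split_people <- function(s) {
  # Split on the whitespace that precedes each "Name :"; the lookahead
  # (?=Name :) keeps "Name :" at the start of every record (needs perl = TRUE)
  records <- strsplit(s, "\\s+(?=Name :)", perl = TRUE)[[1]]

  # Assumed record layout: "Name : <first> , Family : <last> , Type : <role>"
  pattern <- "^Name : (.+) , Family : (.+) , Type : (.+)$"
  first  <- sub(pattern, "\\1", records, perl = TRUE)
  family <- sub(pattern, "\\2", records, perl = TRUE)
  type   <- sub(pattern, "\\3", records, perl = TRUE)

  # trimws() is only defensive, in case of irregular spacing around the fields
  full <- paste(trimws(first), trimws(family))

  # Assumption: any Type mentioning "supervisor" is a supervisor
  is_sup <- grepl("supervisor", type, ignore.case = TRUE)
  c(Supervisor = paste(full[is_sup], collapse = ", "),
    Advisors   = paste(full[!is_sup], collapse = ", "))
}

# vapply returns a 2 x n matrix (row names taken from FUN.VALUE); transpose it
parts <- t(vapply(DF$Names, split_people,
                  FUN.VALUE = c(Supervisor = "", Advisors = ""),
                  USE.NAMES = FALSE))
DF1 <- data.frame(parts, stringsAsFactors = FALSE)
DF1
```

If a cell happened to contain more than one supervisor, this sketch would simply join them with ", " in the `Supervisor` column, just as it does for the advisors.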