
I have one table (1) that looks like this (it is an all-by-all distance matrix transformed into a tab-separated list):

    sample1 sample2 405
    sample3 sample4 400
    sample5 sample6 1
    sample7 sample8 20
    sample1 sample3 40

I have another table (2) that contains the samples that meet certain criteria:

    sample1
    sample2
    sample8

How can I parse the first table (1) to extract only those rows in which the values in both column 1 and column 2 are found in table (2)?

That is, the desired comparisons are only:

    sample1 sample2 405
    sample2 sample8 40
    sample8 sample1 100
  • The desired comparisons do not make sense to me. Is that what you want as output, or are those values not accurate? Commented Dec 28, 2017 at 22:37
  • Sorry, the values are made up. I just want to filter table (1) for all-vs-all pairwise comparisons, keeping only the samples found in table (2). Commented Dec 28, 2017 at 22:52
  • Understood. dplyr can be used to join data frames. Commented Dec 28, 2017 at 23:04
  • Please use dput() to show your data! Commented Feb 16, 2018 at 14:09

3 Answers


Here is a base R solution:

    rawData1 <- "first second distance
    sample1 sample2 405
    sample3 sample4 400
    sample5 sample6 1
    sample7 sample8 20
    sample1 sample3 40"

    rawData2 <- "sample
    sample1
    sample2
    sample8"

    a <- read.table(textConnection(rawData1), stringsAsFactors = FALSE, header = TRUE)
    b <- read.table(textConnection(rawData2), stringsAsFactors = FALSE, header = TRUE)

    a[a$first %in% b$sample & a$second %in% b$sample, ]

...and the output:

    > a[a$first %in% b$sample & a$second %in% b$sample, ]
        first  second distance
    1 sample1 sample2      405
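The same filter can be wrapped in a small function so it can be reused on other tables. This is only a sketch: `filter_pairs` and its arguments are hypothetical names introduced for illustration, and the data are the example rows from the question.

```r
# Hypothetical helper (not part of the original answer): keep only the rows
# whose two sample columns are both found in `keep`.
filter_pairs <- function(dist, keep, cols = 1:2) {
  in_keep <- dist[[cols[1]]] %in% keep & dist[[cols[2]]] %in% keep
  dist[in_keep, , drop = FALSE]
}

a <- read.table(text = "first second distance
sample1 sample2 405
sample3 sample4 400
sample5 sample6 1
sample7 sample8 20
sample1 sample3 40", header = TRUE, stringsAsFactors = FALSE)

keep <- c("sample1", "sample2", "sample8")

filtered <- filter_pairs(a, keep)
print(filtered)
```

Passing the column positions as an argument means the helper does not depend on the column names `first` and `second`.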


I tried a similar setup, using a data frame for your table (1) and a vector for your table (2).

    table_one <- data.frame(col_1 = c("a", "b", "c", "d"),
                            col_2 = c("b", "d", "f", "g"),
                            col_3 = c(1, 2, 3, 4))

    table_two <- c("b", "d")

When you set it up that way, something like this should work:

    library(tidyverse)

    table_one %>%
      filter(col_1 %in% table_two, col_2 %in% table_two)
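If there are more than two sample columns, or you would rather not repeat the `%in%` test, the same condition can be written once with `if_all()`. This is a sketch that assumes dplyr 1.0.4 or later (where `if_all()` is available); the name `both_in` is introduced here only for illustration.

```r
library(dplyr)

table_one <- data.frame(col_1 = c("a", "b", "c", "d"),
                        col_2 = c("b", "d", "f", "g"),
                        col_3 = c(1, 2, 3, 4),
                        stringsAsFactors = FALSE)

table_two <- c("b", "d")

# Keep rows where every listed column has a value found in table_two.
both_in <- table_one %>%
  filter(if_all(c(col_1, col_2), ~ .x %in% table_two))

print(both_in)
```

On this toy data only the row with `col_1 = "b"` and `col_2 = "d"` passes both tests.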



One option is to use inner_join twice, once on the first column and once on the second, and then take the intersect of the two result sets.

    library(dplyr)

    df1 <- read.table(text = "Samp1 Samp2 Val
    sample1 sample2 405
    sample3 sample4 400
    sample5 sample6 1
    sample7 sample8 20
    sample1 sample3 40", header = TRUE, stringsAsFactors = FALSE)

    > df1
        Samp1   Samp2 Val
    1 sample1 sample2 405
    2 sample3 sample4 400
    3 sample5 sample6   1
    4 sample7 sample8  20
    5 sample1 sample3  40

    df2 <- data.frame(Samp = c("sample1", "sample2", "sample8"),
                      stringsAsFactors = FALSE)

    > df2
         Samp
    1 sample1
    2 sample2
    3 sample8

    # use inner_join between Samp1 and Samp, and then again between Samp2 and Samp
    intersect(inner_join(df1, df2, by = c("Samp1" = "Samp")),
              inner_join(df1, df2, by = c("Samp2" = "Samp")))

The result will be:

        Samp1   Samp2 Val
    1 sample1 sample2 405

2 Comments

I think that the result set should only contain distances where both Samp1 AND Samp2 are in df2$Samp, which is different than what is produced by your code. If you use intersect() instead of union() your code produces the correct output.
@LenGreski You have pointed that out correctly. If both Samp1 and Samp2 must be in df2$Samp, then intersect() is needed.
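To make the point in these comments concrete, here is a sketch comparing union() and intersect() on the two joins, using the example data from the answer. The intermediate names `by_first`, `by_second`, `either` and `both` are introduced here only for illustration.

```r
library(dplyr)

df1 <- read.table(text = "Samp1 Samp2 Val
sample1 sample2 405
sample3 sample4 400
sample5 sample6 1
sample7 sample8 20
sample1 sample3 40", header = TRUE, stringsAsFactors = FALSE)

df2 <- data.frame(Samp = c("sample1", "sample2", "sample8"),
                  stringsAsFactors = FALSE)

# Rows whose first sample is in df2, and rows whose second sample is in df2.
by_first  <- inner_join(df1, df2, by = c("Samp1" = "Samp"))
by_second <- inner_join(df1, df2, by = c("Samp2" = "Samp"))

either <- union(by_first, by_second)      # Samp1 OR Samp2 is in df2
both   <- intersect(by_first, by_second)  # Samp1 AND Samp2 are in df2

print(nrow(either))
print(nrow(both))
```

With this data, union() keeps three rows (any row with at least one listed sample), while intersect() keeps only the sample1/sample2 row, which is what the question asks for.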
