I have two datasets. For each row in dataset1, I want to calculate the difference with every row in dataset2, replacing any negative difference with 0. Here is a simple example of my two datasets (my real datasets are around 1000*1000).
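A minimal base-R sketch of the core step for a single row, assuming the ID column has already been dropped. `row2` holds the second row of df1 from the example data and `m2` the measurement columns of df2; both names are only for illustration:

```r
# Illustrative data: second row of df1 and all rows of df2, without the ID column
row2 <- c(Obs = 2, var = 5)
m2 <- matrix(c(3, 2, 7, 3), nrow = 2,
             dimnames = list(NULL, c("Obs", "var")))

# t(m2) has one column per row of df2; row2 is recycled down each column,
# so this computes row2 minus every row of m2, then transposes back
d <- t(row2 - t(m2))

# Replace negative differences with 0 and add a row total
d[d < 0] <- 0
cbind(d, total = rowSums(d))
```

Each row of the result corresponds to one row of df2, so these two rows match the last two rows (ID 2) of the desired output.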
```r
df1 <- data.frame(ID = c(1, 2), Obs = c(1.0, 2.0), var = c(2.0, 5.0))
df2 <- data.frame(ID = c(2, 1), Obs = c(3.0, 2.0), var = c(7.0, 3.0))
```

```
df1
  ID Obs var
1  1   1   2
2  2   2   5

df2
  ID Obs var
1  2   3   7
2  1   2   3
```

```r
for (i in 1:nrow(df1)) {
  b1 = as.matrix(df1)
  b2 = as.matrix(df2)
  diff = b1 - b2
  diff[which(diff < 0)] <- 0
  diff.data = data.frame(cbind(diff, total = rowSums(diff)))
}
diff.data
```

```
  ID Obs var total
1  0   0   0     0
2  1   0   2     3
```

This is what I have been able to do: I computed the difference between the two datasets, replaced the negative values with 0, and then summed the values across the columns. However, this only subtracts the rows element-wise (row 1 from row 1, row 2 from row 2). What I want is, for the first row of df1, the differences with all the rows of df2; then, for the second row of df1, the differences with all the rows of df2; and so on.

Note that I should not calculate the difference between the IDs (I don't know how to do that; maybe by changing `diff= b1-b2` to `diff= b1[,-1]-b2[,-1]`?). I want to keep the ID from df1 so I can keep track of my patients (that is what my dataset contains). I would like to get something like this:
```
diff.data
  ID Obs var total
1  1   0   0     0
2  1   0   0     0
3  2   0   0     0
4  2   0   2     2
```

Thanks in advance for your help.
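One way to build the full table, sketched in base R: repeat each row index of df1 once per row of df2, cycle through the row indices of df2, and subtract only the measurement columns. It assumes column 1 is the ID in both datasets; `i1`, `i2` and `d` are illustrative names, not part of any package:

```r
df1 <- data.frame(ID = c(1, 2), Obs = c(1.0, 2.0), var = c(2.0, 5.0))
df2 <- data.frame(ID = c(2, 1), Obs = c(3.0, 2.0), var = c(7.0, 3.0))

# Every row of df1 repeated once per row of df2,
# and all rows of df2 cycled once per row of df1
i1 <- rep(seq_len(nrow(df1)), each  = nrow(df2))
i2 <- rep(seq_len(nrow(df2)), times = nrow(df1))

# Differences on the measurement columns only (column 1 is the ID)
d <- as.matrix(df1[i1, -1]) - as.matrix(df2[i2, -1])
d[d < 0] <- 0  # replace negative differences with 0

# Keep the ID from df1 and add the row total
diff.data <- data.frame(ID = df1$ID[i1], d, total = rowSums(d))
rownames(diff.data) <- NULL
diff.data
```

With the example data this gives the four rows of the desired output. The result always has nrow(df1) * nrow(df2) rows, so it grows quickly for large inputs.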
Here is what I have based on your answer. I wanted to wrap it in a simple function, but I would like the inputs to be either matrices or data frames. So far I have only managed to throw an error when they are not data frames:
```r
difference = function(df1, df2) {
  # is.data.frame() is safer than class(x) != "data.frame": class() of a matrix
  # has length 2, and a length-2 operand to || is an error in R >= 4.3.0
  if (!is.data.frame(df1) || !is.data.frame(df2)) stop(" df1 or df2 is not a dataframe!")
  df1 = data.frame(df1)
  df2 = data.frame(df2)
  ID1 = seq(nrow(df1))
  ID2 = seq(nrow(df2))
  # Repeat each row of df1 once per row of df2
  new_df1 = df1[rep(ID1, each = nrow(df2)), ]
  # Subtract the cycled rows of df2, skipping the ID column
  new_df1[-1] = new_df1[-1] - df2[rep(seq(nrow(df2)), nrow(df1)), -1]
  new_df1[new_df1 < 0] = 0
  new_df1$total = rowSums(new_df1[-1])
  rownames(new_df1) = NULL
  output = new_df1
  return(output)
}
```

I know that, as written, the function requires df1 and df2 to be data frames; I just don't know how to also allow them to be matrices.
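A possible sketch of a version that accepts either a data frame or a matrix for each argument: test the input with `is.data.frame()` or `is.matrix()`, then convert with `as.data.frame()` so the rest of the body only has to handle data frames. It keeps the assumption that the first column is the ID; the helper `is_tabular` is a hypothetical name:

```r
difference <- function(df1, df2) {
  # Hypothetical helper: TRUE for a data frame or a matrix
  is_tabular <- function(x) is.data.frame(x) || is.matrix(x)
  if (!is_tabular(df1) || !is_tabular(df2)) {
    stop("df1 and df2 must each be a data frame or a matrix")
  }

  # Convert matrices to data frames (data frames pass through unchanged)
  df1 <- as.data.frame(df1)
  df2 <- as.data.frame(df2)

  # Pair every row of df1 with every row of df2
  i1 <- rep(seq_len(nrow(df1)), each  = nrow(df2))
  i2 <- rep(seq_len(nrow(df2)), times = nrow(df1))

  out <- df1[i1, , drop = FALSE]
  # Differences on all columns except the first (assumed to be the ID)
  vals <- out[-1] - df2[i2, -1, drop = FALSE]
  vals[vals < 0] <- 0
  out[-1] <- vals
  out$total <- rowSums(vals)
  rownames(out) <- NULL
  out
}
```

Because matrices are converted up front, a matrix without column names gets default names (V1, V2, ...), and the ID column is still identified by position rather than by name.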
Thank you again in advance for your help.