I have two datasets. For each row in dataset 1, I want to calculate the difference against every row in dataset 2, and replace any negative difference with 0. Here is a small example of the two datasets (my real ones are around 1000 × 1000).

    df1 <- data.frame(ID = c(1, 2), Obs = c(1.0, 2.0), var=c(2.0,5.0))
    df2 <- data.frame(ID = c(2, 1), Obs = c(3.0, 2.0),var=c(7.0,3.0))

    df1
      ID Obs var
    1  1   1   2
    2  2   2   5

    df2
      ID Obs var
    1  2   3   7
    2  1   2   3

    for(i in 1:nrow(df1)){
      b1=as.matrix(df1)
      b2=as.matrix(df2)
      diff= b1-b2
      diff[which(diff < 0 )] <- 0
      diff.data= data.frame(cbind(diff, total = rowSums(diff)))
    }

    diff.data
      ID Obs var total
    1  0   0   0     0
    2  1   0   2     3

This is what I have been able to do so far: I take the difference between the two datasets, replace the negative values with 0, and then sum the values across the columns. What I actually want is different. For the first row in df1, I would like to calculate the difference against every row in df2, then do the same for the second row in df1, and so on. Note that the IDs should not be subtracted (I don't know how to do that; maybe by changing diff= b1-b2 to diff= b1[,-1]-b2[,-1]?). I want to keep the ID from df1 so that I can keep track of my patients. I would like to get something like this:

    diff.data
    ID Obs var total
     1   0   0     0
     1   0   0     0
     2   0   0     0
     2   0   2     2

Thank you in advance for your help.

Here is what I have after using your answer. I wanted to wrap it in a simple function that accepts either matrices or data frames, but so far I have only managed to raise an error when the inputs are not data frames:

    difference=function(df1,df2){
      if(class(df1) != "data.frame" || class(df2) != "data.frame")
        stop(" df1 or df2 is not a dataframe!")
      df1=data.frame(df1)
      df2=data.frame(df2)
      ID1=seq(nrow(df1))
      ID2=seq(nrow(df2))
      new_df1 = df1[rep(ID1, each = nrow(df2)), ]
      new_df1[-1] = new_df1[-1] - df2[rep(seq(nrow(df2)), nrow(df1)), -1]
      new_df1[new_df1 < 0] = 0
      new_df1$total = rowSums(new_df1[-1])
      rownames(new_df1) = NULL
      output=new_df1
      return(output)
    }

I know that writing df1=data.frame(df1) assumes the input is a data frame; I just don't know how to also allow it to be a matrix.

Thank you again in advance for your help.

1 Answer


You can repeat each row of df1 nrow(df2) times, and repeat the whole of df2 nrow(df1) times, so that both data frames have the same number of rows and the values can be subtracted directly.

    # Repeat each row of df1 nrow(df2) times
    # (indexing by row position, so the IDs need not be 1, 2, ..., n)
    new_df1 <- df1[rep(seq_len(nrow(df1)), each = nrow(df2)), ]
    # Repeat the rows of df2 and subtract, skipping the ID column
    new_df1[-1] <- new_df1[-1] - df2[rep(seq(nrow(df2)), nrow(df1)), -1]
    # Replace negative values with 0
    new_df1[new_df1 < 0] <- 0
    # Add row-wise sum of the non-ID columns
    new_df1$total <- rowSums(new_df1[-1])
    # Remove rownames
    rownames(new_df1) <- NULL
    new_df1

    #  ID Obs var total
    #1  1   0   0     0
    #2  1   0   0     0
    #3  2   0   0     0
    #4  2   0   2     2
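One thing worth checking is that the row expansion does not depend on what the ID values are. The sketch below is only an illustration: the IDs 101 and 205 and the names idx1, idx2, vals and res are made up for this example. It builds the two row-index vectors from row positions and subtracts only the non-ID columns.

```r
# Hypothetical data: IDs are not 1..n, so they cannot serve as row indices
df1 <- data.frame(ID = c(101, 205), Obs = c(1.0, 2.0), var = c(2.0, 5.0))
df2 <- data.frame(ID = c(205, 101), Obs = c(3.0, 2.0), var = c(7.0, 3.0))

# Row positions of df1, each repeated nrow(df2) times: 1 1 2 2
idx1 <- rep(seq_len(nrow(df1)), each = nrow(df2))
# Row positions of df2, cycled nrow(df1) times: 1 2 1 2
idx2 <- rep(seq_len(nrow(df2)), times = nrow(df1))

# Subtract only the measurement columns; the ID column is dropped with -1
vals <- as.matrix(df1[idx1, -1]) - as.matrix(df2[idx2, -1])
# Replace negative differences with 0
vals[vals < 0] <- 0

# Reattach the IDs from df1 and add the row-wise total
res <- data.frame(ID = df1$ID[idx1], vals, total = rowSums(vals))
rownames(res) <- NULL
res
```

Working on a plain numeric matrix for the subtraction keeps the ID column out of both the difference and the negative-value replacement.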

5 Comments

Thank you @Ronak Shah for your help! I really appreciate it! Could you please take a look at my code and tell me what I should do if I want my datasets to be either matrices or data frames? Actually, I don't know whether I should post my modified code as an answer or just edit my question; I'm still new to Stack. Thank you!
You can edit your question to include the code that you have.
Thank you @Ronak! I edited my post; I would appreciate your help.
Why not use is.data.frame and is.matrix to check whether the data passed in is a data frame or a matrix? Something like this: if(!(is.data.frame(df1) && is.data.frame(df2) || is.matrix(df1) && is.matrix(df2))) stop('')
Thank you @Ronak! It is working. At first I thought of just using if(class(df1) == "character" || class(df2) == "character") and then stopping, but I was afraid the user could pass in some other type and the calculation would carry on. This way I am sure. Thank you again for your help and patience!
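For reference, the check suggested in the comments could be folded into the function roughly as follows. This is only a sketch under a few assumptions: the first column holds the ID, all remaining columns are numeric, and (unlike the condition in the comment, which requires both inputs to be the same kind) each argument may independently be a data frame or a matrix. The helper name ok is made up for this example.

```r
# Sketch: accept a data frame or a matrix for either argument.
# Assumes column 1 is the ID and every other column is numeric.
difference <- function(df1, df2) {
  ok <- function(x) is.data.frame(x) || is.matrix(x)
  if (!ok(df1) || !ok(df2)) {
    stop("df1 and df2 must each be a data frame or a matrix")
  }
  # as.data.frame() leaves a data frame as it is and converts a matrix
  df1 <- as.data.frame(df1)
  df2 <- as.data.frame(df2)

  # Each df1 row repeated nrow(df2) times; df2 rows cycled nrow(df1) times
  idx1 <- rep(seq_len(nrow(df1)), each = nrow(df2))
  idx2 <- rep(seq_len(nrow(df2)), times = nrow(df1))

  # Difference of the non-ID columns, with negatives set to 0
  vals <- as.matrix(df1[idx1, -1, drop = FALSE]) -
    as.matrix(df2[idx2, -1, drop = FALSE])
  vals[vals < 0] <- 0

  out <- data.frame(ID = df1[idx1, 1], vals, total = rowSums(vals))
  rownames(out) <- NULL
  out
}

df1 <- data.frame(ID = c(1, 2), Obs = c(1, 2), var = c(2, 5))
df2 <- data.frame(ID = c(2, 1), Obs = c(3, 2), var = c(7, 3))

res_df  <- difference(df1, df2)                        # data frame inputs
res_mat <- difference(as.matrix(df1), as.matrix(df2))  # matrix inputs
```

Using is.data.frame() and is.matrix() rather than comparing class() to a string also avoids trouble with objects whose class() has more than one element, which is the case for matrices in recent R versions.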
