I have a data frame with the following structure:
```r
df <- data.frame(
  Replicate = c(rep("N1", 50), rep("N2", 50)),
  feature1  = rnorm(100, 0, 1),
  feature2  = rnorm(100, 0, 3),
  feature3  = rnorm(100, 0.1, 1)
)
```

I am calculating the correlation between my (biological) replicates for each of my data columns (here "feature1"-"feature3") with the following code:
```r
library(dplyr)

# Seed the results table with a dummy row, then append one row per feature
results_table <- data.frame(feature = NA, correlation = NA)

for (i in colnames(df)[2:4]) {
  # Correlate the values of column i between replicate N1 and replicate N2
  cor_i <- cor(df %>% filter(Replicate == "N1") %>% pull(i),
               df %>% filter(Replicate == "N2") %>% pull(i),
               use = "pairwise.complete")
  results_table_temp <- data.frame(feature = i, correlation = cor_i)
  results_table <- rbind(results_table, results_table_temp)
}

# Drop the dummy first row
results_table <- results_table[2:nrow(results_table), ]
results_table
```

I basically filter my initial data frame for the respective replicate and calculate the correlation between these replicates for each column (using a for loop with cor() and storing the output in a data frame).
For my dataset (240 rows and >7000 columns), the computing time is quite long! Is there a more efficient way to calculate this? Maybe a specific function or some preprocessing of the data to make the computation more efficient?
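For example, I was wondering whether something like the sketch below would already help: splitting the data frame into the two replicates once and then pairing the matched columns with mapply(), so that cor() is called once per feature without any filtering or rbind() inside the loop. This is only a sketch and it assumes the N1 and N2 rows are aligned by their order in the data frame (the same assumption my pull()-based loop makes); I have not checked whether it is actually faster on the full dataset.

```r
library(dplyr)

# Split once instead of filtering inside the loop.
# Assumes rows of N1 and N2 are already matched by order.
n1 <- df %>% filter(Replicate == "N1") %>% select(-Replicate)
n2 <- df %>% filter(Replicate == "N2") %>% select(-Replicate)

# mapply() pairs the i-th column of n1 with the i-th column of n2,
# so each feature is correlated exactly once.
correlations <- mapply(cor, n1, n2,
                       MoreArgs = list(use = "pairwise.complete.obs"))

results_table <- data.frame(feature     = names(correlations),
                            correlation = unname(correlations))
results_table
```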