Is there a way to vectorize a nested loop that calculates the Spearman correlation and its p-values?

Question

I have a matrix m with 8300 columns and 18 rows. Each column represents a gene and each row a sample. I want to calculate the adjacency matrix (using Spearman correlation) and the corresponding p-value matrix.
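For reference, Spearman's rho between two genes is the Pearson correlation of their ranks across the samples, so the whole adjacency matrix can be sketched with one rank transform and one `np.corrcoef` call. This is a minimal sketch, not code from the thread: `spearman_matrix` is a hypothetical helper name, it assumes SciPy 1.4 or newer (for the `axis` argument of `rankdata`), and the p-values use the two-sided t-distribution approximation with n − 2 degrees of freedom, which is the approximation `scipy.stats.spearmanr` uses by default.

```python
import numpy as np
from scipy import stats


def spearman_matrix(m):
    """Hypothetical helper: Spearman rho and two-sided p-values between the columns of m.

    Rows are samples, columns are variables (genes).
    """
    n_samples = m.shape[0]
    # Rank each column across the samples; ties get their average rank.
    ranks = stats.rankdata(m, axis=0)
    # Pearson correlation of the ranks = Spearman correlation of the data.
    rho = np.corrcoef(ranks, rowvar=False)
    dof = n_samples - 2
    # t statistic for each rho; on the diagonal (rho == 1) this becomes inf,
    # so silence the divide-by-zero warning and let the p-value come out as 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        t = rho * np.sqrt(np.clip(dof / ((rho + 1.0) * (1.0 - rho)), 0, None))
    p = 2 * stats.t.sf(np.abs(t), dof)
    return rho, p
```

Because every pair is handled by matrix operations, there is no Python-level loop over the gene pairs.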

The code I've got so far is:

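    import numpy as np
    import scipy.stats as st

    n_genes = m.shape[1]  # m: 18 samples (rows) x 8300 genes (columns)
    W = np.zeros((n_genes, n_genes))
    P = np.zeros((n_genes, n_genes))
    for i in range(0, n_genes):
        for j in range(0, n_genes):
            W[i, j], P[i, j] = st.spearmanr(m[:, i], m[:, j])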

This is extremely inefficient: it takes around 11 hours to run on Google Colab, even with a GPU runtime. Is there a way to vectorize this?

Thanks a lot!

Zach Favakeh · Accepted Answer · 2019-06-26 19:25:28Z

https://docs.scipy.org/doc/scipy-0.16.1/reference/generated/scipy.stats.spearmanr.html

It looks like with this function you can pass your entire m matrix as a single argument, and it will compute the correlations and p-values between all pairs of columns, which it interprets as the variables (the rows being samples of those variables). It then returns the correlations and p-values as matrices, so you can drop the for loops and produce both matrices in one go. Even without doing this in one pass, your loops go through all the data twice even though the result is symmetric; I would have written the inner loop as "for j in range(i, n_genes):" and filled in both entries [i,j] and [j,i] in the body of the loop.
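A minimal sketch of both suggestions, assuming random placeholder data in place of the real 18 × 8300 matrix m (the data and the 40-gene size are illustrative only). The single `st.spearmanr(m)` call and the upper-triangle loop should produce the same matrices.

```python
import numpy as np
import scipy.stats as st

rng = np.random.default_rng(42)
m = rng.normal(size=(18, 40))  # placeholder: 18 samples (rows) x 40 genes (columns)
n_genes = m.shape[1]

# One call: with a 2-D input and the default axis=0, columns are treated as the
# variables, so both results are n_genes x n_genes matrices
# (with exactly two columns, spearmanr returns scalars instead).
W_fast, P_fast = st.spearmanr(m)

# Loop alternative that only visits the upper triangle (j >= i)
# and mirrors each result into [j, i], since both matrices are symmetric.
W = np.zeros((n_genes, n_genes))
P = np.zeros((n_genes, n_genes))
for i in range(n_genes):
    for j in range(i, n_genes):
        r, p = st.spearmanr(m[:, i], m[:, j])
        W[i, j] = W[j, i] = r
        P[i, j] = P[j, i] = p
```

The triangular loop halves the number of calls, but it is still one Python-level call per pair; the single call avoids that overhead entirely. With the real data, each 8300 × 8300 float64 matrix takes about 550 MB, so W and P together need roughly 1.1 GB plus temporaries.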

You are totally right. I don't know what I was thinking. In any case, now the process takes 15 seconds.
Hahaha, it's often the smallest corrections that save the most time. 11 hours!
