I have a matrix m with 8300 columns and 18 rows. Each column represents a gene, and each row a sample. I want to calculate the adjacency matrix (using Spearman correlation) and the corresponding p-value matrix.

The code I've got so far is:

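    import numpy as np
    from scipy import stats as st

    n_genes = m.shape[1]  # one column per gene
    W = np.zeros((n_genes, n_genes))  # correlation (adjacency) matrix
    P = np.zeros((n_genes, n_genes))  # p-value matrix
    for i in range(0, n_genes):
        for j in range(0, n_genes):
            W[i,j], P[i,j] = st.spearmanr(m[:,i], m[:,j])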

This is extremely inefficient (it takes around 11 hours to run on Google Colab with a GPU runtime). Is there a way to vectorize this?

Thanks a lot!

1 Answer

https://docs.scipy.org/doc/scipy-0.16.1/reference/generated/scipy.stats.spearmanr.html

It looks like you can pass your entire m matrix to this function as a single argument and leave the second argument out. It then computes correlations and p-values between all of the columns, which it treats as the variables (the rows being samples of those variables), and returns both results as matrices. So you can get rid of the for loops and produce the correlation and p-value matrices in one go. Note that passing m as both arguments would stack the two copies together and return matrices twice as large in each dimension.

Even without doing this in one pass, your loops visit every pair of genes twice to fill a symmetric matrix. I would have written the inner loop as "for j in range(i, n_genes):" and then filled both entries, [i,j] and [j,i], in the body of the loop.
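As a rough sketch of the one-call approach, the snippet below runs scipy.stats.spearmanr on a whole matrix at once. The random 18 x 50 matrix is only a stand-in for the real data (an assumption for illustration, not the asker's matrix), and the names W and P are reused from the question.

```python
import numpy as np
from scipy import stats

# Toy stand-in for the real data (assumption): 18 samples (rows) x 50 genes (columns).
rng = np.random.default_rng(0)
m = rng.normal(size=(18, 50))

# With a single 2-D array and the default axis=0, every column is treated
# as one variable, so this one call covers all column pairs.
W, P = stats.spearmanr(m)

# Both results are square, with one row and one column per gene.
print(W.shape, P.shape)
```

On the real 18 x 8300 matrix the same call should return two 8300 x 8300 arrays, so it is worth checking that they fit in memory.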

2 Comments

You are totally right. I don't know what I was thinking. In any case, now the process takes 15 seconds.
Hahaha it's often the smallest corrections that save the most time. 11 hours!
