This is an optimization problem that I'm hoping you creative SO users may have an answer to.
I have a large matrix (5 million x 2) with two columns: time and type. In essence, each "type" is its own time series; the data below represents three different time series (one for A, one for B, and one for C). There are 2000 different "types".
    mat
         time type
    [1,]   50    A
    [2,]   50    A
    [3,]   12    B
    [4,]   24    B
    [5,]   80    B
    [6,]   92    B
    [7,]   43    C
    [8,]   69    C

What is the most efficient way to compute the correlations between these 2000 time series? Currently I build a matrix with one bin for each time at which an event could have occurred, and fill it with the number of events of each "type" that fell into each time slot. After populating that matrix, I loop over every pair of "type"s and compute their correlation. This is extremely slow (about 5 hours).
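For concreteness, the binning step can be done without an explicit loop by cross-tabulating the bin index against the type with `table()`. This is a minimal sketch: the bin width of 25 and the data-frame layout (columns `time` and `type`) are assumptions for illustration, and the toy data is the eight-row example above.

```r
# Toy data mirroring the example above; the real data would have
# ~5 million rows and 2000 distinct types.
events <- data.frame(
  time = c(50, 50, 12, 24, 80, 92, 43, 69),
  type = c("A", "A", "B", "B", "B", "B", "C", "C")
)

# ASSUMPTION: bin width chosen only for illustration.
bin_width <- 25

# Map each event time to an integer bin index (0, 1, 2, ...).
bin <- floor(events$time / bin_width)

# Keep every bin in the observed range, even empty ones, so all
# types share the same time grid.
bin <- factor(bin, levels = seq(min(bin), max(bin)))

# Cross-tabulate: rows are time bins, columns are types, cells are counts.
counts <- unclass(table(bin, events$type))  # drop the "table" class
print(counts)
```

The result has one row per time bin and one column per type, which is the layout the correlation step operates on.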
Is there a way to get something like a `by = 'type'` option in R's `cor` function? That alone would solve my whole problem.
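Base R's `cor()` already behaves much like a "by type" option when it is given a matrix: `cor(x)` returns the correlation between every pair of columns of `x` in one call. So if the count matrix has one column per type, a single `cor()` call can replace the pairwise loop. The sketch below uses a small random matrix (hypothetical data, 100 bins x 5 types) and checks that the one-call result matches an explicit pairwise loop.

```r
# Hypothetical stand-in for the bins-by-types count matrix.
set.seed(1)
counts <- matrix(rpois(100 * 5, lambda = 3), nrow = 100,
                 dimnames = list(NULL, paste0("type", 1:5)))

# One call computes every pairwise column correlation at once.
cm <- cor(counts)

# The explicit pairwise loop, for comparison.
n <- ncol(counts)
loop_cm <- matrix(1, n, n, dimnames = list(colnames(counts), colnames(counts)))
for (i in seq_len(n - 1)) {
  for (j in (i + 1):n) {
    r <- cor(counts[, i], counts[, j])
    loop_cm[i, j] <- r
    loop_cm[j, i] <- r
  }
}

# Both approaches agree up to floating-point tolerance.
print(all.equal(cm, loop_cm))
```

With 2000 types the result is a 2000 x 2000 matrix of doubles, roughly 32 MB.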
Thanks for any insight.