When is it advisable to use sparse matrices in R?

Question

I have been tinkering with power simulations recently and I have the following code:

library(MASS) library(Matrix) simdat <- data.frame(mmm = rep(rep(factor(1:2, labels=c("m1", "m2")), each = 2), times = 2800), ttt = rep(factor(1:2, labels = c("t1", "t2")), times = 5600), sss = rep(factor(1:70), each = 160), iii = rep(rep(factor(1:40), each = 4), times = 70)) beta <- c(1, 2) X1 <- model.matrix(~ mmm, data = simdat) Z1 <- model.matrix(~ ttt, data = simdat)

X1 and Z1 are 11200x2 matrices. With the help of Stackoverflow I managed to make my calculations a lot more efficient than they were before:

funab <- function(){ ran_sub <- mvrnorm(70, mu = c(0,0), Sigma = matrix(c(10, 3, 3, 2), ncol = 2)) ran_ite <- mvrnorm(40, mu = c(0,0), Sigma = matrix(c(10, 3, 3, 2), ncol = 2)) Mb <- as.vector(X1 %*% beta) M1 <- rowSums(Z1 * ran_sub[rep(1:70, each = 160),]) M2 <- rowSums(Z1 * ran_ite[rep(rep(1:40, each = 4), times = 70),]) Mout <- Mb + M1 + M2 Y <- as.vector(Mout) + rnorm(length(Mout), mean = 0 , sd = 0.27) }

Y will then be a vector of length 11200. I then replicate this function a lot (say 1000 times):

sim <- replicate(n = 1000, expr = funab()}, simplify = FALSE)

sim will be a 11200x1000 list. Given that I want to do this a lot more and possibly include more code into funab() I wonder if it is advisable to use sparse matrices for X1 and Z1 in the calculations in funab() as it is now?

Are you familiar with microbenchmark? You can use it to compare performance across functions, aka. benchmarking. Simply, install.packages(c("microbenchmark"), dependencies = TRUE), require(microbenchmark) and example(microbenchmark), you know the drill. I've used microbenchmark in this SO answer. — Eric Fail
– Eric Fail, Commented Aug 3, 2014 at 20:26
If you do run a test it would be interesting to add that to your question. — Eric Fail
– Eric Fail, Commented Aug 5, 2014 at 13:53

lord.garbage · Accepted Answer · 2014-08-06 15:52:23Z

Ok, I've tried to follow an advice given in the comments to my question and ran a test with the microbenchmark package. To make copy and pasting easier I will repeat the code from above:

library(MASS) library(Matrix) simdat <- data.frame(mmm = rep(rep(factor(1:2, labels=c("m1", "m2")), each = 2), times = 2800), ttt = rep(factor(1:2, labels = c("t1", "t2")), times = 5600), sss = rep(factor(1:70), each = 160), iii = rep(rep(factor(1:40), each = 4), times = 70)) beta <- c(1, 2) X1 <- model.matrix(~ mmm, data = simdat) Z1 <- model.matrix(~ ttt, data = simdat)

I now create the same matrices as sparse matrices:

sparseX1 <- sparse.model.matrix(~ mmm, data = simdat) sparseZ1 <- sparse.model.matrix(~ ttt, data = simdat)

I then set up the two functions:

funab_sparse <- function(){ ran_sub <- mvrnorm(70, mu = c(0,0), Sigma = matrix(c(10, 3, 3, 2), ncol = 2)) ran_ite <- mvrnorm(40, mu = c(0,0), Sigma = matrix(c(10, 3, 3, 2), ncol = 2)) Mb <- as.vector(sparseX1 %*% beta) M1 <- Matrix::rowSums(sparseZ1 * ran_sub[rep(1:70, each = 160),]) M2 <- Matrix::rowSums(sparseZ1 * ran_ite[rep(rep(1:40, each = 4), times = 70),]) Mout <- Mb + M1 + M2 Y <- as.vector(Mout) + rnorm(length(Mout), mean = 0 , sd = 0.27) } funab <- function(){ ran_sub <- mvrnorm(70, mu = c(0,0), Sigma = matrix(c(10, 3, 3, 2), ncol = 2)) ran_ite <- mvrnorm(40, mu = c(0,0), Sigma = matrix(c(10, 3, 3, 2), ncol = 2)) Mb <- as.vector(X1 %*% beta) M1 <- rowSums(Z1 * ran_sub[rep(1:70, each = 160),]) M2 <- rowSums(Z1 * ran_ite[rep(rep(1:40, each = 4), times = 70),]) Mout <- Mb + M1 + M2 Y <- as.vector(Mout) + rnorm(length(Mout), mean = 0 , sd = 0.27) } library(microbenchmark) res <- microbenchmark(funab(), funab_sparse(), times = 1000)

and get the results:

> res <- microbenchmark(funab(), funab_sparse(), times = 1000) > res Unit: milliseconds expr min lq median uq max neval funab() 2.200342 2.277006 2.309587 2.481627 69.99895 1000 funab_sparse() 8.419564 8.568157 9.666248 9.874024 75.88907 1000

Assuming that I did not make any substantial mistakes I can conclude that with this particular way of doing the calculations using sparse matrices will not speed up my code.

Collectives™ on Stack Overflow

When is it advisable to use sparse matrices in R?

1 Answer 1

Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Linked

Related