4

I have been tinkering with power simulations recently and I have the following code:

library(MASS) library(Matrix) simdat <- data.frame(mmm = rep(rep(factor(1:2, labels=c("m1", "m2")), each = 2), times = 2800), ttt = rep(factor(1:2, labels = c("t1", "t2")), times = 5600), sss = rep(factor(1:70), each = 160), iii = rep(rep(factor(1:40), each = 4), times = 70)) beta <- c(1, 2) X1 <- model.matrix(~ mmm, data = simdat) Z1 <- model.matrix(~ ttt, data = simdat) 

X1 and Z1 are 11200x2 matrices. With the help of Stackoverflow I managed to make my calculations a lot more efficient than they were before:

funab <- function(){ ran_sub <- mvrnorm(70, mu = c(0,0), Sigma = matrix(c(10, 3, 3, 2), ncol = 2)) ran_ite <- mvrnorm(40, mu = c(0,0), Sigma = matrix(c(10, 3, 3, 2), ncol = 2)) Mb <- as.vector(X1 %*% beta) M1 <- rowSums(Z1 * ran_sub[rep(1:70, each = 160),]) M2 <- rowSums(Z1 * ran_ite[rep(rep(1:40, each = 4), times = 70),]) Mout <- Mb + M1 + M2 Y <- as.vector(Mout) + rnorm(length(Mout), mean = 0 , sd = 0.27) } 

Y will then be a vector of length 11200. I then replicate this function a lot (say 1000 times):

sim <- replicate(n = 1000, expr = funab()}, simplify = FALSE) 

sim will be a 11200x1000 list. Given that I want to do this a lot more and possibly include more code into funab() I wonder if it is advisable to use sparse matrices for X1 and Z1 in the calculations in funab() as it is now?

4
  • Are you familiar with microbenchmark? You can use it to compare performance across functions, aka. benchmarking. Simply, install.packages(c("microbenchmark"), dependencies = TRUE), require(microbenchmark) and example(microbenchmark), you know the drill. I've used microbenchmark in this SO answer. Commented Aug 3, 2014 at 20:26
  • I wasn't until now. :) I will check that out today! Commented Aug 4, 2014 at 7:44
  • If you do run a test it would be interesting to add that to your question. Commented Aug 5, 2014 at 13:53
  • I will do that. I hope I get round to it this weekend! Commented Aug 5, 2014 at 14:24

1 Answer 1

1

Ok, I've tried to follow an advice given in the comments to my question and ran a test with the microbenchmark package. To make copy and pasting easier I will repeat the code from above:

library(MASS) library(Matrix) simdat <- data.frame(mmm = rep(rep(factor(1:2, labels=c("m1", "m2")), each = 2), times = 2800), ttt = rep(factor(1:2, labels = c("t1", "t2")), times = 5600), sss = rep(factor(1:70), each = 160), iii = rep(rep(factor(1:40), each = 4), times = 70)) beta <- c(1, 2) X1 <- model.matrix(~ mmm, data = simdat) Z1 <- model.matrix(~ ttt, data = simdat) 

I now create the same matrices as sparse matrices:

sparseX1 <- sparse.model.matrix(~ mmm, data = simdat) sparseZ1 <- sparse.model.matrix(~ ttt, data = simdat) 

I then set up the two functions:

funab_sparse <- function(){ ran_sub <- mvrnorm(70, mu = c(0,0), Sigma = matrix(c(10, 3, 3, 2), ncol = 2)) ran_ite <- mvrnorm(40, mu = c(0,0), Sigma = matrix(c(10, 3, 3, 2), ncol = 2)) Mb <- as.vector(sparseX1 %*% beta) M1 <- Matrix::rowSums(sparseZ1 * ran_sub[rep(1:70, each = 160),]) M2 <- Matrix::rowSums(sparseZ1 * ran_ite[rep(rep(1:40, each = 4), times = 70),]) Mout <- Mb + M1 + M2 Y <- as.vector(Mout) + rnorm(length(Mout), mean = 0 , sd = 0.27) } funab <- function(){ ran_sub <- mvrnorm(70, mu = c(0,0), Sigma = matrix(c(10, 3, 3, 2), ncol = 2)) ran_ite <- mvrnorm(40, mu = c(0,0), Sigma = matrix(c(10, 3, 3, 2), ncol = 2)) Mb <- as.vector(X1 %*% beta) M1 <- rowSums(Z1 * ran_sub[rep(1:70, each = 160),]) M2 <- rowSums(Z1 * ran_ite[rep(rep(1:40, each = 4), times = 70),]) Mout <- Mb + M1 + M2 Y <- as.vector(Mout) + rnorm(length(Mout), mean = 0 , sd = 0.27) } library(microbenchmark) res <- microbenchmark(funab(), funab_sparse(), times = 1000) 

and get the results:

> res <- microbenchmark(funab(), funab_sparse(), times = 1000) > res Unit: milliseconds expr min lq median uq max neval funab() 2.200342 2.277006 2.309587 2.481627 69.99895 1000 funab_sparse() 8.419564 8.568157 9.666248 9.874024 75.88907 1000 

Assuming that I did not make any substantial mistakes I can conclude that with this particular way of doing the calculations using sparse matrices will not speed up my code.

Sign up to request clarification or add additional context in comments.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.