5

I have a square matrix with dimension ranging from 100x100 to 10,000x10,000. The matrix represents parameter values for a function. I go through a loop where I try various combinations of parameters. Every iteration W has a different value so I have to update the locations in the matrix which have the value W. This happens to be the even entries of the diagonal, so [2,2], [4,4], etc. I am curious if there is a more efficient way to do this than my current approach:

W<-treeDepth*newVar for (iW in evenDiag) { matrixSource[iW,iW]<-W } 

So far I've only tested with 134x134 size matrices, but looping appears to be the fastest approach according to analysis with profvis.

When I try

diag(matrixSource)[evenDiag]<-W 

It appears to take similar amounts of time, but it starts calling < GC> after an average of every 5 calls or so, but the actual times < GC> is called appear to be random.

I think < GC> is garbage collection, but whatever it is, it takes forever, and it's rarely if ever called when I have the loop version above.

Am I wrong to think there's a better approach than looping one by one. Would writing the loop in Rcpp make it faster? Hadley in "Advanced R" doesn't say that writing values into a matrix would be faster with Rcpp so it probably doesn't. If it did, how would I change that one little line(s) to Rcpp without making a C++ function or anything complicated (I don't know any C++).

According to my research it isn't possible to just write something like

matrixSource[evenDiag,evenDiag]<-W 

but that if it were, R would excel at vectorized processing.

What is the best approach for this?

If it's helpful, the context is that the matrix needs to be fed into

negLogLik<- -mvtnorm::dmvnorm(flattenedData,sigma=matrixSource,log=T,checkSymmetry = FALSE) 

and internally to that function it's fed into chol() (which sometimes calls < GC> in parallel)

So if there's a way I can modify that function to work with only partial matrices that aren't fully created so that maybe I only have to assign the value of W once, that'd also be good.

I also need a way to assign the diagonal directly above and below the main diagonal all the same value, Z. Is there a good way to do that?

Thank you <3

2
  • 1
    matrixSource[cbind(iW,iW)]<-W should do what you want, can't speak to efficiency offhand. Commented Jan 14, 2024 at 8:04
  • you could also try regular vector indexing, e.g. matrixSource[(iW - 1)*nrow(matrixSource) + iW] <- W (might be off-by-one), or changing to sparse matrix representation of appropriate. Commented Jan 14, 2024 at 8:15

1 Answer 1

3

I have not found memory problems with square matrices up to dimension 1000.
@MichaelChirico's suggestion seems to be the most efficient. It runs in time constant with the matrix dimension, unlike diag<-.
The two methods used below are

# create an index i <- rep(c(FALSE, TRUE), ncol(A) %/% 2) j <- which(i) # can also be diag(A)[i] diag(A)[j] <- W # cannot be cbind(i, i) A[cbind(j, j)] <- 0 

cbind is clearly more efficient.

library(microbenchmark) library(ggplot2) testFun <- function(N) { out <- lapply(seq.int(N)[-1L], \(n) { A <- matrix(1:n^2, n, n) W <- rep(0L, ncol(A) %/% 2L) i <- rep(c(FALSE, TRUE), ncol(A) %/% 2) j <- which(i) mb <- microbenchmark( diag = {diag(A)[i] <- W}, cbind = {A[cbind(j, j)] <- W} ) mb$dim <- n mb }) out <- do.call(rbind, out) aggregate(time ~ ., out, median) } testData <- testFun(1000) ggplot(testData, aes(dim, time, colour = expr)) + geom_line() + geom_point() + theme_bw() 

Created on 2024-01-14 with reprex v2.0.2


Edit

Another way of assigning the diagonal elements of a square matrix is proposed by user20650 in comment. Change the timimg instruction call above to

mb <- microbenchmark( diag = {diag(A)[i] <- W}, cbind = {A[cbind(j, j)] <- W}, sq = {A[j*j] <- W} ) 

and the plot (without diag) becomes

enter image description here

Sign up to request clarification or add additional context in comments.

5 Comments

wow. this is amazing. thanks! I only run cbind once since the entries that take the value W are always the same. But it does appear that running cbind once and then executing matrixSource[evenDiagBinded]<-W each iteration is a little faster than running for (iW in evenDiag) { matrixSource[iW,iW]<-W } each iteration after compilation (not sure about uncompiled). Probably the matrixSource[evenDiagBinded]<-W syntax gives the most efficient code for changing the matrix values?
@AFriendlyFish If you run cbind to see its code you will see .Internal(cbind(deparse.level, ...)). cbind is written in optimized C and therefore I was expecting it to be efficient. Even difficult to improve upon. But the indexing has code to handle non-square matrices so it might be further optimized, I guess.
as updating the diagonal of a square matrix could use sq = {A[j*j] <- W} which I'd think would be faster (only tested on small matrices)
@user20650 You are right, I have tested it on matrices with dim = 2:1000 and it's faster.
Thanks @RuiBarradas. Interesting that something seems to happen around 600 dimension.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.