2

I have a large matrix (as class Matrix) in R. It is sparse (only containing 01).

What I do is (if M is the Matrix)

j<-list() for(i in 1:dim(M)[1]){ which(M[i,]==1)->j[[i]] } 

this is normally fast but on such a large matrix(dim 1.7 Mil to 5000) it is very slow. I just cant believe that there is no faster way to obtain the indices of those cols that are 1 in every row....

1
  • Having a "Matrix" you could use summary(M) and, then, split($j, $i) or, probably more efficient, split(rep(seq_len(ncol(M)), diff(M@p)), M@i + 1L) Commented Apr 26, 2016 at 12:29

3 Answers 3

2

I would rather opt for a vectorized approach,and use split instead of these apply/lapply family functions:

M = matrix(c(1,1,0,0,1,0,1,1,1,1,1,1), 4) with(data.frame(which(!!M, arr.ind=T)), split(col, row)) #$`1` #[1] 1 2 3 #$`2` #[1] 1 3 #$`3` #[1] 2 3 #$`4` #[1] 2 3 
Sign up to request clarification or add additional context in comments.

4 Comments

any chance to circumvent the Error in which(!(!M), arr.ind = T) : error in evaluating the argument 'x' in selecting a method for function 'which': Error in asMethod(object) : Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 105 error here? and still make it fast? It is not a normal matrix but a Matrix
you posted which(!(!M), arr.ind = T) whereas it is simply which(!!M, arr.ind = T)
so I get with(data.frame(which(!!M, arr.ind=T)), split(col, row))->j Error in which(!(!M), arr.ind = T) : error in evaluating the argument 'x' in selecting a method for function 'which': Error in asMethod(object) : Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 105
if you change which(!!M, arr.ind = T) to which(M==1, arr.ind = T) ? it is certainly due to the sparse matrix format, you can try converting it of a matrix if it does not take too much time ...otherwise I will look at it later but there is certainly a way to handle which with sparse matrix.
2

Using the example by @zx8754

M <- matrix(c(1,1,0,0,1,0,1,1,1,1,1,1), 4) 

we can define an auxiliary matrix that contains the row and column indices of the entries equal to 1:

oneMat <- which(M==1, arr.ind=TRUE) 

From this auxiliary matrix we can create a list that contains the column numbers that are equal to one in each row with

oneList <- lapply(1:nrow(M), function(x) oneMat[oneMat[,1] == x, 2]) #[[1]] #[1] 1 2 3 # #[[2]] #[1] 1 3 # #[[3]] #[1] 2 3 # #[[4]] #[1] 2 3 

If the matrix M is large and sparse, the matrix oneMat should be much smaller than M. In that case I think that the lapply() loop used in the second step could lead to a speedup with respect to the for loop described in the OP.


After some tests, I regretfully have to admit that this answer is particularly slow. The solution by @ColonelBeauvel is the winner:

j <- list() set.seed(123) M <- matrix(rbinom(1e5,1,0.01),ncol=100) library(microbenchmark) f_which_and_lappy <- function(x) {oneMat <- which(x==1, arr.ind=TRUE); lapply(1:nrow(x), function(i) oneMat[oneMat[,1] == i, 2])} f_only_apply <- function(x) {apply(x, 1, function(i) which(i == 1))} f_with_data.frame <- function(x) {with(data.frame(which(!!x, arr.ind=T)), split(col, row))} f_OP <- function(x) {for(i in 1:dim(x)[1]){which(x[i,]==1)->j[[i]]}} res <- microbenchmark( f_which_and_lappy(M), f_only_apply(M), f_with_data.frame(M), f_OP(M),times=1000L) #> res #Unit: microseconds # expr min lq mean median uq max neval cld # f_which_and_lappy(M) 11063.170 11254.032 12090.9506 11351.1830 11570.662 31313.48 1000 d # f_only_apply(M) 3204.572 3359.410 4117.4971 3456.3960 3610.945 25352.35 1000 b # f_with_data.frame(M) 739.556 811.906 912.4726 918.0315 946.700 18623.77 1000 a # f_OP(M) 5642.639 5854.751 6955.9980 5969.3685 6151.209 148847.22 1000 c 

7 Comments

is it possible to benchamrk - it really still takes quite a bit of time - is there soem way to print the index at whcih it currently is?
To display the current row, you might try oneList <- lapply(1:nrow(M), function(x){cat("x=",x,"\r"); oneMat[oneMat[,1] == x, 2]})
hi that is even slower than the solution that I proposed above. I really dont know why but it does 1 row per 20 sec or so ... is there no chance to get it a bit faster?
The output with cat slows it down. However, I'm not sure that the solution is inherently faster. I'll try to make benchmark tests.
thanks a lot...but it is just really slow - it is definitely because it is not a normal matrix but a Matrix - as in a sparse matrix
|
1

Edit after comments:

apply(M, 1, function(i) which(i == 1)) # [[1]] # [1] 1 2 3 # # [[2]] # [1] 1 3 # # [[3]] # [1] 2 3 # # [[4]] # [1] 2 3 

Try this example:

#data M <- matrix(c(1,1,0,0,1,0,1,1,1,1,1,1), 4) # [,1] [,2] [,3] # [1,] 1 1 1 # [2,] 1 0 1 # [3,] 0 1 1 # [4,] 0 1 1 # index of rows with all ones which(rowSums(M == 1) == ncol(M)) # [1] 1 # index of cols with all ones which(colSums(M == 1) == nrow(M)) # [1] 3 

8 Comments

hi - sorry it is not about finding the row will all 1- it is about finding the 1 in a specific row - there are no rows in which all values are 1 - i just need the index where they are 1
Then just try which(M == 1).
in your example , I need for row 2 the results 1 and 3, for row 3, 2 and 3 for row 4 2 and 3
but that provides a Matrix back - which I then need to parse again - right?
@kutyw Maybe: apply(M, 1, function(i) which(i == 1))?
|

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.