In R, I'm trying to work with a large matrix (39,146,166 rows by 127 columns), and several operations on it are running into memory problems. I've determined that about 35% of the entries in the matrix are non-zero, and the remainder are all zeros. Is this sparse enough that I would save memory by representing the matrix with one of R's sparse matrix classes? What is a good rule of thumb for deciding when a matrix is worth representing sparsely?

1 Answer

I don't think the sparse representation will be that much more compact. In triplet storage you need three numbers for each entry other than the implicit zeros: a row index, a column index, and the value itself. Even if the two indices are 4-byte integers, each stored entry occupies 16 bytes of memory, twice the 8 bytes a dense ("serial") layout spends per entry.
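
To make that arithmetic concrete, here is a back-of-envelope sketch (my own, using the dimensions and density from the question; real sparse classes add a small amount of per-object overhead on top of these figures):

    ## Back-of-envelope estimate assuming triplet (i, j, x) storage:
    ## two 4-byte integer indices plus one 8-byte double per nonzero.
    n_rows  <- 39146166
    n_cols  <- 127
    density <- 0.35

    n_total <- n_rows * n_cols
    n_nz    <- n_total * density

    dense_gb   <- n_total * 8 / 1024^3         # 8 bytes per double
    triplet_gb <- n_nz * (4 + 4 + 8) / 1024^3  # 16 bytes per stored entry

    dense_gb    # ~37 GB
    triplet_gb  # ~26 GB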

By this reasoning, anything above 50% density will take more storage space in sparse form, but I'm posting from an iPhone under SF Bay so I cannot test with object.size.
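
For anyone who can run it, a quick experiment along these lines would settle the question empirically (my sketch; rsparsematrix() from the Matrix package generates a random dgCMatrix at a given density):

    library(Matrix)

    set.seed(1)
    n <- 10000; p <- 127  # small stand-in for the real dimensions

    for (d in c(0.10, 0.35, 0.50, 0.70)) {
      m_sparse <- rsparsematrix(n, p, density = d)  # dgCMatrix
      m_dense  <- as.matrix(m_sparse)
      cat(sprintf("density %.2f: sparse %s, dense %s\n", d,
                  format(object.size(m_sparse), units = "MB"),
                  format(object.size(m_dense),  units = "MB")))
    }

At 35% density the dgCMatrix comes out noticeably smaller than the dense matrix; above roughly two-thirds density it overtakes it.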

4 Comments

There are a number of sparse matrix formats, and not all of them require 3 numbers per nonzero entry. For example, this format requires about 2 per entry in my case: netlib.org/linalg/html_templates/node92.html
@RyanThompson: That format requires three vectors, not two.
Only two of those vectors have an entry per nonzero value. The last vector has just one element per column, which is negligible in my case.
If that were one of the representations in the R Matrix package, you could use it as a basis for estimation. But as far as I can tell, neither the T-matrix nor the C-matrix versions use such a method.
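
For what it's worth, the compressed-sparse-column layout at that netlib link stores an 8-byte value and a 4-byte row index per nonzero, plus one 4-byte pointer per column; the Matrix package's dgCMatrix class uses the same i/p/x layout. Redoing the estimate with those costs (my sketch, same assumptions as above):

    ## Compressed-sparse-column cost: 8 bytes (value) + 4 bytes (row
    ## index) per nonzero, plus a 4-byte pointer per column (+1).
    n_rows  <- 39146166
    n_cols  <- 127
    n_nz    <- n_rows * n_cols * 0.35

    csc_gb   <- (n_nz * (8 + 4) + (n_cols + 1) * 4) / 1024^3
    dense_gb <- n_rows * n_cols * 8 / 1024^3

    csc_gb    # ~19 GB
    dense_gb  # ~37 GB
    ## Break-even: 12 * density = 8  =>  density of about 2/3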
