6
$\begingroup$

I have a sparse matrix, and I am interested in find k largest elements and their positions in the matrix. I do it in the following manner:

matrix=SparseArray[Automatic, {4, 4}, 0, { 1, {{0, 2, 5, 8, 10}, {{2}, {3}, {1}, {3}, {4}, {1}, {2}, {4}, {2}, { 3}}}, {1.6608834828359216`, 1.3176250784021715`, 1.6608834828359216`, 3.8979590937707167`, 2.058499550409593, 1.3176250784021715`, 3.8979590937707167`, 1.0569052092863416`, 2.058499550409593, 1.0569052092863416`}}]; k = 2; TakeLargest[Association@Most[ArrayRules[matrix]], k] 

I look for a more efficient way to do it, any suggestion on how to speed up this calculation in huge matrices are welcome.

$\endgroup$
2
  • $\begingroup$ @kglr I did not mean to diss you away. Your methods are still viable for dense matrices. $\endgroup$ Commented Jul 15, 2018 at 14:19
  • $\begingroup$ @HenrikSchumacher, not to worry:) I deleted it because the question is explicitly about sparse arrays. $\endgroup$ Commented Jul 15, 2018 at 14:24

1 Answer 1

10
$\begingroup$

For sparse matrices, the key is to perform the search only on the list of nonzero values.

With[{idx = Ordering[matrix["NonzeroValues"], -k]}, AssociationThread[ matrix["NonzeroPositions"][[idx]], matrix["NonzeroValues"][[idx]] ] ] 

Beware that this completely neglects the zero valued entries.

Remark

This will take postive effect only for really large and sparse matrices. For example, the following matrix would need 745 GB(!) of memory if it was stored (and sorted) as a dense array.

n = 100000; m = 600000; A = SparseArray[RandomInteger[{1, n}, {m, 2}] -> RandomReal[{0, 1}, m], {n, n}]; k = 100; 

But still, finding the 100 largest nonzero elements needs about an 8th part of a second.

a = With[{idx = Ordering[A["NonzeroValues"], -k]}, AssociationThread[A["NonzeroPositions"][[idx]], A["NonzeroValues"][[idx]]] ]; // RepeatedTiming 

0.124

Edit

Incorporating Carl's and kglr's suggestion to use Nearest like in this post, we can get even faster:

a = With[{idx = Nearest[A["NonzeroValues"] -> Automatic, Max[A], k]}, AssociationThread[ A["NonzeroPositions"][[idx]], A["NonzeroValues"][[idx]] ] ]; // RepeatedTiming // First 

0.0051

Note that this returns the largest value first. Of course, we can Reverse the output of Nearest, if needed.

$\endgroup$
4
  • 2
    $\begingroup$ Where are these properties documented as I don't see them in the SparseArray documentation. $\endgroup$ Commented Jul 15, 2018 at 11:53
  • 1
    $\begingroup$ @Edmund Sadly, they are documented nowhere. The more important are SE posts like this and this. I learnt these things solely from this site. What a shame! $\endgroup$ Commented Jul 15, 2018 at 12:46
  • 2
    $\begingroup$ Ordering[matrix["NonzeroValues"], -k] with k greater than 1 will sort the data, and so it is quite slow. You could instead use Nearest[matrix["NonzeroValues"]->"Index", Max[data], k] as in my answer to (154944), which will be much faster when the nonzero length is large. $\endgroup$ Commented Jul 15, 2018 at 12:53
  • 1
    $\begingroup$ Carl, very good point! Wouldn't that suggest the reimplementation of the two-argument version of Ordering, at least for vectors? $\endgroup$ Commented Jul 15, 2018 at 13:06

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.