10

I have a sparse matrix. I need to sort this matrix row-by-row and create another [sparse] matrix. Code may explain it better:

    # The `rand` function requires a newer version of scipy.
    from scipy.sparse import *

    m = rand(6, 6, density=0.6)
    d = m.getrow(0)
    print(d)

Output1

    (0, 5)    0.874881629788
    (0, 4)    0.352559852239
    (0, 2)    0.504791645463
    (0, 1)    0.885898140175

Given this matrix m, I want to create a new matrix, new_m, that holds a sorted version of m. Row 0 of the new matrix should look like this:

    new_d = new_m.getrow(0)
    print(new_d)

Output2

    (0, 1)    0.885898140175
    (0, 5)    0.874881629788
    (0, 2)    0.504791645463
    (0, 4)    0.352559852239

That way I can see which columns hold the larger values, for example:

    print(new_d.indices)

Output3

array([1, 5, 2, 4]) 

Of course, every row should be sorted independently in the same way.

I have one solution for this problem but it is not elegant.

6
  • So is it the case that you want to sort a sparse matrix row by row? Commented Apr 4, 2012 at 9:17
  • Yes. Sorry I explain this simple thing like an atomic bomb. Commented Apr 4, 2012 at 9:20
  • How is output2 the sorted version of output1? It seems you're forgetting the element at index 0... Commented Apr 4, 2012 at 12:11
  • @larsmans I didn't understand you. Are you asking about (0, 0)? If so, that cell is 0, and a sparse matrix doesn't bother to store it. Commented Apr 4, 2012 at 13:42
  • @Thorn: yes, that's what I meant. So when you sort the entries, you're not interested in the zeros? Commented Apr 4, 2012 at 13:49

2 Answers

8

If you're willing to ignore the zero-value elements of the matrix, the code below should work. It is also much faster than implementations that use the getrow method, which is rather slow.

    def sort_coo(m):
        tuples = zip(m.row, m.col, m.data)
        return sorted(tuples, key=lambda x: (x[0], x[2]))

For example:

    >>> from numpy.random import rand
    >>> from scipy.sparse import coo_matrix
    >>>
    >>> d = rand(10, 20)
    >>> d[d > .05] = 0
    >>> s = coo_matrix(d)
    >>> sort_coo(s)
    [(0, 2, 0.004775589084940246),
     (3, 12, 0.029941507166614145),
     (5, 19, 0.015030386789436245),
     (7, 0, 0.0075044957259399192),
     (8, 3, 0.047994403933129481),
     (8, 5, 0.049401058471327031),
     (9, 15, 0.040011608000125043),
     (9, 8, 0.048541825332137023)]

Depending on your needs, you may want to tweak the sort keys in the lambda or process the output further. If you want everything in a dictionary indexed by row, you could do:

    from collections import defaultdict

    sorted_rows = defaultdict(list)
    for i in sort_coo(m):
        sorted_rows[i[0]].append((i[1], i[2]))
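If the goal is a new sparse matrix rather than a list of tuples, the same idea can be sketched with `numpy.lexsort`. This is only a sketch under some assumptions: the name `sort_coo_rows` and its `descending` flag are made up for illustration, the data are assumed to be signed numbers (so negating them is safe), and zeros are ignored because they are not stored.

```python
import numpy as np
from scipy.sparse import coo_matrix


def sort_coo_rows(m, descending=True):
    """Return a COO matrix whose stored entries are ordered by row,
    then by value within each row (hypothetical helper)."""
    m = coo_matrix(m)  # accepts any sparse format or a dense array
    # Negating the values turns an ascending sort into a descending one.
    values = -m.data if descending else m.data
    # np.lexsort uses the LAST key as the primary key: row first, then value.
    order = np.lexsort((values, m.row))
    return coo_matrix(
        (m.data[order], (m.row[order], m.col[order])), shape=m.shape
    )


dense = np.array([[0.0, 0.9, 0.5, 0.0],
                  [0.3, 0.0, 0.0, 0.7]])
sorted_m = sort_coo_rows(coo_matrix(dense))
print(sorted_m.col)  # [1 2 3 0]: columns per row, largest value first
```

The result holds the same values as the input; only the order of the stored `row`, `col` and `data` arrays changes.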

2

My bad solution is like this:

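    from scipy.sparse import coo_matrix
    import numpy as np

    a = []
    for i in range(m.shape[0]):  # assume m is a square matrix.
        d = m.getrow(i)
        n = len(d.indices)
        s = zip([i] * n, d.indices, d.data)
        # sort this row's (row, col, value) triples by value, largest first
        sorted_s = sorted(s, key=lambda v: v[2], reverse=True)
        a.extend(sorted_s)
    a = np.array(a)
    # np.array(a) is a float array, so cast the index columns back to int
    rows, cols = a[:, 0].astype(int), a[:, 1].astype(int)
    new_m = coo_matrix((a[:, 2], (rows, cols)), shape=m.shape)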

There may be some simple mistakes above because I have not tested it yet, but I think the idea is clear. Is there a better solution?

Edit

Creating this new matrix may be pointless, because calling the getrow method breaks the order again. Only coo_matrix.col keeps the order.
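To illustrate the point about `coo_matrix.col`: a COO matrix built from explicit arrays keeps them in the order given, so a row's column order can be read with a boolean mask instead of `getrow`. The helper name `row_order` is hypothetical, and this is only a small sketch that assumes the matrix is still in COO format.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Entries are listed in the order we want to keep:
# row 0 as columns 3 then 1, row 1 as column 2.
data = np.array([0.9, 0.4, 0.7])
rows = np.array([0, 0, 1])
cols = np.array([3, 1, 2])
new_m = coo_matrix((data, (rows, cols)), shape=(2, 4))


def row_order(m, i):
    """Column indices of row i in stored order (m must be in COO format)."""
    return m.col[m.row == i]


print(row_order(new_m, 0))  # [3 1]
```

Converting to another format (which `getrow` does internally) is not guaranteed to preserve this order, so the lookup has to happen on the COO arrays themselves.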

Another Solution

This one is not an exact solution, but it may be helpful:

    def sortSparseMatrix(m, rev=True, only_indices=True):
        """Sort a sparse matrix and return column index dictionary"""
        col_dict = dict()
        for i in range(m.shape[0]):  # assume m is a square matrix.
            d = m.getrow(i)
            s = zip(d.indices, d.data)
            # honor the rev argument instead of always sorting descending
            sorted_s = sorted(s, key=lambda v: v[1], reverse=rev)
            if only_indices:
                col_dict[i] = [element[0] for element in sorted_s]
            else:
                col_dict[i] = sorted_s
        return col_dict

    >>> print(sortSparseMatrix(m))
    {0: [5, 1, 0], 1: [1, 3, 5], 2: [1, 2, 3, 4], 3: [1, 5, 2, 4], 4: [0, 3, 5, 1], 5: [3, 4, 2]}
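Since the accepted answer notes that `getrow` is rather slow, the same dictionary could probably be built by slicing the CSR arrays directly. The sketch below is one way to do that, not a tested drop-in replacement; the function name `sorted_columns_by_row` is made up for illustration, and ties between equal values are returned in no particular guaranteed order.

```python
import numpy as np
from scipy.sparse import csr_matrix


def sorted_columns_by_row(m, descending=True):
    """Map each row index to its stored column indices, ordered by value."""
    csr = csr_matrix(m)  # accepts any sparse format or a dense array
    result = {}
    for i in range(csr.shape[0]):
        # indptr[i]:indptr[i + 1] is the slice of stored entries for row i
        start, end = csr.indptr[i], csr.indptr[i + 1]
        row_data = csr.data[start:end]
        row_cols = csr.indices[start:end]
        order = np.argsort(row_data, kind="stable")
        if descending:
            order = order[::-1]
        result[i] = row_cols[order].tolist()
    return result


dense = np.array([[0.0, 0.9, 0.5, 0.0],
                  [0.3, 0.0, 0.0, 0.7],
                  [0.0, 0.0, 0.0, 0.0]])
print(sorted_columns_by_row(dense))  # {0: [1, 2], 1: [3, 0], 2: []}
```

Rows with no stored entries map to an empty list, and the input does not need to be square.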
