I spotted an unexpected behavior in scipy.sparse.csr_matrix that looks like a bug to me. Can anyone confirm that this is not normal? I am not an expert in sparse structures, so I may be misunderstanding the proper usage.

    >>> import scipy.sparse
    >>> a=scipy.sparse.csr_matrix((1,1))
    >>> b=scipy.sparse.csr_matrix((1,1))
    >>> b[0,0]=1
    /home/marco/anaconda3/envs/py35/lib/python3.5/site-packages/scipy/sparse/compressed.py:730: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
      SparseEfficiencyWarning)
    >>> a/b
    matrix([[ nan]])

On the other hand, numpy properly handles this:

    >>> import numpy as np
    >>> a=np.zeros((1,1))
    >>> b=np.ones((1,1))
    >>> a/b
    array([[ 0.]])

Thanks

  • Did you try with (a/b).toarray()? Commented Jul 22, 2016 at 1:13
  • Looks like a bug to me. Commented Jul 22, 2016 at 1:20
  • (a/b).tolist() returns [[nan]]. a/b is of type matrix, so there is no toarray or todense. Commented Jul 22, 2016 at 1:24
  • Submitted a bug report: github.com/scipy/scipy/issues/6401 Commented Jul 22, 2016 at 1:30

1 Answer

For sparse matrix / sparse matrix division, the relevant code is in

scipy/sparse/compressed.py

    if np.issubdtype(r.dtype, np.inexact):
        # Eldiv leaves entries outside the combined sparsity
        # pattern empty, so they must be filled manually. They are
        # always nan, so that the matrix is completely full.
        out = np.empty(self.shape, dtype=self.dtype)
        out.fill(np.nan)
        r = r.tocoo()
        out[r.row, r.col] = r.data
        out = np.matrix(out)

The action is explained in the comment in this section: entries outside the combined sparsity pattern are filled with nan.

Try this with slightly larger matrices:

    In [69]: a=sparse.csr_matrix([[1.,0],[0,1]])
    In [70]: b=sparse.csr_matrix([[1.,1],[0,1]])
    In [72]: (a/b)
    Out[72]:
    matrix([[  1.,  nan],
            [ nan,   1.]])

So wherever a has 0s (no stored values), the result is nan. That includes position [0,1], where the true quotient 0/1 is 0. The division returns a dense matrix that has been pre-filled with nan.
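As a rough illustration of that fill step, here is a minimal NumPy-only sketch. The helper name `nan_filled_division` is made up for this example and is not part of scipy; it only mimics the scatter-into-nan logic of the quoted snippet, using the two stored diagonal entries of the 2x2 quotient.

```python
import numpy as np

def nan_filled_division(rows, cols, data, shape):
    """Hypothetical helper: mimic the quoted snippet by starting from an
    all-nan dense array and scattering the stored quotient entries into it."""
    out = np.empty(shape, dtype=float)
    out.fill(np.nan)
    out[rows, cols] = data
    return out

# Stored entries of the sparse quotient in the 2x2 example: diagonal only
rows = np.array([0, 1])
cols = np.array([0, 1])
data = np.array([1.0, 1.0])

result = nan_filled_division(rows, cols, data, (2, 2))
print(result)
```

Every position that was not stored in the sparse quotient keeps the nan it was initialised with, whatever the denominator holds there.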

Without the nan-filling code quoted above, the sparse element-by-element division produces a sparse matrix with those 'empty' off-diagonal slots:

    In [73]: a._binopt(b,'_eldiv_')
    Out[73]:
    <2x2 sparse matrix of type '<class 'numpy.float64'>'
        with 2 stored elements in Compressed Sparse Row format>
    In [74]: a._binopt(b,'_eldiv_').A
    Out[74]:
    array([[ 1.,  0.],
           [ 0.,  1.]])

The reverse division might be instructive:

    In [76]: b/a
    Out[76]:
    matrix([[  1.,  inf],
            [ nan,   1.]])
    In [77]: b._binopt(a,'_eldiv_').A
    Out[77]:
    array([[  1.,  inf],
           [  0.,   1.]])
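For comparison, this is a small sketch of what plain NumPy division gives on the dense equivalents of the same a and b. It sidesteps the sparse division code entirely, so it is a reference for the expected values rather than a fix; the variable names `dense_a_over_b` and `dense_b_over_a` are just for this example.

```python
import numpy as np
from scipy import sparse

# Same 2x2 matrices as in the session above
a = sparse.csr_matrix(np.array([[1.0, 0.0], [0.0, 1.0]]))
b = sparse.csr_matrix(np.array([[1.0, 1.0], [0.0, 1.0]]))

# Silence the divide-by-zero and 0/0 warnings; the inf/nan values are kept
with np.errstate(divide="ignore", invalid="ignore"):
    dense_a_over_b = a.toarray() / b.toarray()
    dense_b_over_a = b.toarray() / a.toarray()

print(dense_a_over_b)
print(dense_b_over_a)
```

The dense b/a agrees with Out[76], while the dense a/b has 0 at [0,1] where Out[72] shows nan, so the disagreement is confined to positions the numerator does not store.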

It looks like the combined sparsity pattern is determined by the numerator. In further tests, it looks like this is the numerator's pattern after eliminate_zeros, i.e. explicitly stored zeros are not counted:

    In [138]: a1=sparse.csr_matrix(np.ones((2,2)))
    In [139]: a1
    Out[139]:
    <2x2 sparse matrix of type '<class 'numpy.float64'>'
        with 4 stored elements in Compressed Sparse Row format>
    In [140]: a1[0,1]=0
    In [141]: a1
    Out[141]:
    <2x2 sparse matrix of type '<class 'numpy.float64'>'
        with 4 stored elements in Compressed Sparse Row format>
    In [142]: a1/b
    Out[142]:
    matrix([[  1.,  nan],
            [ inf,   1.]])
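To see the difference between a stored zero and an empty slot in isolation, here is a minimal sketch. It builds the explicit zero directly through the (data, indices, indptr) constructor instead of by assignment, which is my choice for the example and not what the session above does.

```python
import numpy as np
from scipy import sparse

# 2x2 CSR matrix with all four slots stored; the [0,1] entry is an explicit 0
data = np.array([1.0, 0.0, 1.0, 1.0])
indices = np.array([0, 1, 0, 1])
indptr = np.array([0, 2, 4])
a1 = sparse.csr_matrix((data, indices, indptr), shape=(2, 2))

stored_before = a1.nnz   # counts stored entries, including the explicit zero
a1.eliminate_zeros()     # drops stored zeros in place
stored_after = a1.nnz

print(stored_before, stored_after)  # prints: 4 3
```

The dense values are the same before and after; only the sparsity pattern changes, and that pattern is what decided which slots were filled with nan in Out[142].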

1 Comment

Yep, that's the cause of the bug. I've submitted a fix here: github.com/scipy/scipy/pull/6405
