
I have a problem that I don't know how to describe clearly, so I'll give an example. Let's say we have this array (B) in Python:

    [[ 1  1]
     [ 7 11]
     [ 1 20]
     [20  1]
     [26 11]
     [31 11]]

The first column represents the users and the second the tags. Now I want to create a matrix that has 1s where edges exist and 0s otherwise. We have 5 and 4 different users and tags respectively, that is, a 6*5 matrix. If I write:

    zero = np.zeros((6, 5)).astype(int)  # it needs one more row and column
    for line in B:
        if line[2]:
            zero[line[0], line[1]] = 1

the error is:

 zero[line[0],line[1]] = 1 

IndexError: index 7 is out of bounds for axis 0 with size 7

How can I map the values in B to row and column positions? I want the element "31" to be the fifth row and the element "11" the fourth column.

  • Can you please show the desired output in matrix format? Commented Dec 22, 2016 at 14:07
  • Why is the matrix 6*5? Commented Dec 22, 2016 at 14:12
  • You only have 3 tags Commented Dec 22, 2016 at 14:13
  • Yes you are right! Three tags. I made a typo mistake. Commented Dec 22, 2016 at 14:17
  • @angelk Please accept an answer if it solved your problem. This will help others see what worked best. Commented Dec 24, 2016 at 23:51

2 Answers


Use pandas and numpy

    >>> import numpy as np
    >>> import pandas as pd
    >>> tagsArray = np.unique([1,11,20,1,11,11])
    >>> userArray = np.unique([1,7,20,26,31])
    >>> aa = [[ 1,1],[ 7, 11],[1, 20],[20, 1],[26, 11],[31, 11]]
    >>> df = pd.DataFrame(index=userArray,columns=tagsArray)
    >>> for s in aa:
    ...     df.loc[s[0],s[1]] = 1
    ...
    >>> df.fillna(0,inplace=True)
    >>> df
        1   11  20
    1    1   0   1
    7    0   1   0
    20   1   0   0
    26   0   1   0
    31   0   1   0
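A loop-free variant of the same idea may also work here. The following is only a sketch, assuming the pairs are available as a plain list; the names `edges` and `adjacency` and the column labels "user" and "tag" are illustrative and not part of the original answer. It uses `pd.crosstab` to count each (user, tag) pair and then caps the counts at 1.

```python
import pandas as pd

# Same user/tag pairs as in the question.
aa = [[1, 1], [7, 11], [1, 20], [20, 1], [26, 11], [31, 11]]

# "user" and "tag" are illustrative column names.
edges = pd.DataFrame(aa, columns=["user", "tag"])

# crosstab counts how often each (user, tag) pair occurs;
# clip(upper=1) turns those counts into 0/1 edge indicators.
adjacency = pd.crosstab(edges["user"], edges["tag"]).clip(upper=1)
print(adjacency)
```

The rows and columns stay labelled with the original user and tag values, so `adjacency.loc[31, 11]` looks up an edge directly.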

1 Comment

And df.fillna(0,inplace=True).

Staying close to your initial attempt, listed below is a NumPy based approach. We can use np.unique(..,return_inverse=1) for those two columns to give us unique IDs that could be used as row and column indices respectively for indexing into the output. Thereafter, we would simply initialize the output array and index into it to give us the desired result.
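To make the role of `return_inverse` concrete, here is a small sketch on the user column alone (the variable names `users`, `uniq` and `inv` are illustrative). The inverse array gives, for every original value, its position among the sorted unique values, which is exactly the compact row index needed.

```python
import numpy as np

# First column of B from the question (the user IDs).
users = np.array([1, 7, 1, 20, 26, 31])

uniq, inv = np.unique(users, return_inverse=True)
print(uniq)  # [ 1  7 20 26 31]
print(inv)   # [0 1 0 2 3 4]

# Indexing the unique values with the inverse rebuilds the input.
print(np.array_equal(uniq[inv], users))  # True
```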

Thus, an implementation would be -

    r,c = [np.unique(i,return_inverse=1)[1] for i in B.T]
    out = np.zeros((r.max()+1,c.max()+1),dtype=int)
    out[r,c] = 1

Alternatively, a more explicit way to get r and c would be like so -

    r = np.unique(B[:,0],return_inverse=1)[1]
    c = np.unique(B[:,1],return_inverse=1)[1]

Sample input, output -

    In [27]: B # Input array
    Out[27]:
    array([[ 1,  1],
           [ 7, 11],
           [ 1, 20],
           [20,  1],
           [26, 11],
           [31, 11]])

    In [28]: out # Output
    Out[28]:
    array([[1, 0, 1],
           [0, 1, 0],
           [1, 0, 0],
           [0, 1, 0],
           [0, 1, 0]])
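If you also need to know which user or tag a given row or column stands for, one option is to keep the unique values that `np.unique` returns. The sketch below wraps the approach in a helper; `edge_matrix` is a hypothetical name, not something from NumPy or from the code above.

```python
import numpy as np

def edge_matrix(pairs):
    """Build a 0/1 matrix from (user, tag) pairs.

    Returns (matrix, users, tags) where matrix[i, j] == 1 exactly when
    the pair (users[i], tags[j]) occurs in `pairs`.
    """
    pairs = np.asarray(pairs)
    # Sorted unique labels plus, for each input value, its index among them.
    users, r = np.unique(pairs[:, 0], return_inverse=True)
    tags, c = np.unique(pairs[:, 1], return_inverse=True)
    out = np.zeros((users.size, tags.size), dtype=int)
    out[r, c] = 1
    return out, users, tags

B = np.array([[1, 1], [7, 11], [1, 20], [20, 1], [26, 11], [31, 11]])
out, users, tags = edge_matrix(B)

# users and tags are sorted, so searchsorted maps a label to its index.
row = np.searchsorted(users, 31)
col = np.searchsorted(tags, 11)
print(row, col, out[row, col])
```

With this input, user 31 lands in the last row (index 4) and tag 11 in the middle column (index 1), since there are only three distinct tags.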
