
I'm reading the book Introduction to Machine Learning with Python, and this code is on page 69. I can't understand why I get this result.

import numpy as np

X = np.array([[0, 1, 0, 1],
              [1, 0, 1, 1],
              [0, 0, 0, 1],
              [1, 0, 1, 0]])
y = np.array([0, 1, 0, 1])
for label in np.unique(y):
    print(X[y == label])

result:

[[0 1 0 1]
 [0 0 0 1]]
[[1 0 1 1]
 [1 0 1 0]]
  • what were you expecting to get? Commented Jul 3, 2021 at 1:53

1 Answer


Let's break this down piece by piece.

  1. np.unique(y)

    np.unique returns all of the unique elements of the given array. In this case, that's [0, 1] (since there are only two unique elements in y, 0 and 1).

    So running

    for label in np.unique(y): 

    will iterate twice. The first time label will equal 0, and the second time it will equal 1. You can inspect this for yourself by running

    for label in np.unique(y):
        print(label)
  2. y == label

    If you compare an array against a scalar, numpy returns a boolean array of the same shape. So on the first iteration of the for loop, it'll be running y == 0, which gives us [ True False True False], since the first and third elements of y are 0. Then on the second iteration, it'll be running y == 1, giving us the inverse, [False True False True].

    You can inspect this for yourself by running:

    for label in np.unique(y):
        print(y == label)
  3. X[y == label]

    Now that we know what the rest of the logic is doing, the last piece is determining what happens when y == label is passed as a "selector" for elements in X. When you pass a boolean array as a selector, you're telling numpy which elements of the original array you want to return. In this case, X is a 2-dimensional array (shape 4x4), and if we pass a boolean array of length 4, it is matched against the first axis, so we're telling numpy which rows we want to select.

    As mentioned above, on the first iteration y == label is [ True False True False], so we're saying we want the first and third rows of X. So print(X[y == label]) gives

    [[0 1 0 1]
     [0 0 0 1]]

    And then on the second iteration, per the above, y == label will be selecting the second and fourth rows, giving:

    [[1 0 1 1]
     [1 0 1 0]]
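If it helps, the three steps above can be put together in one runnable sketch that prints the label, the boolean mask, and the selected rows on each iteration. The names labels, masks, and groups are only illustrative (they are not from the book); X and y are the arrays from the question.

```python
import numpy as np

X = np.array([[0, 1, 0, 1],
              [1, 0, 1, 1],
              [0, 0, 0, 1],
              [1, 0, 1, 0]])
y = np.array([0, 1, 0, 1])

# Step 1: the distinct labels in y, in sorted order.
labels = np.unique(y)

masks = {}   # illustrative: label -> boolean mask
groups = {}  # illustrative: label -> rows of X with that label
for label in labels:
    # Step 2: elementwise comparison gives one boolean per element of y.
    mask = y == label
    # Step 3: a length-4 boolean mask selects rows of the 4x4 array X.
    rows = X[mask]
    masks[int(label)] = mask
    groups[int(label)] = rows
    print("label:", label)
    print("mask: ", mask)
    print(rows)
```

Each mask has one entry per row of X, which is why the selection happens along the rows.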

Hope that clarifies!


3 Comments

Good explanation!
Regarding #3, X[y == label]: with X[[True, False, True, False]], how do you obtain the first and third rows of X? Can you explain?
The array [True, False, True, False] chooses rows of X based on which positions are True. Since the first and third elements of the array are True (and the others are False), you're telling numpy you want to select the first and third rows of X. So for example, if you instead passed in [False, False, True, False], you'd be selecting only the third row of X, since only the third element is True.
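A small sketch of that last example, in case it is useful: it selects with the mask [False, False, True, False] and then shows that the same row comes back when indexing by the position of the True entry. The names mask, selected, rows, and same are just for illustration, and np.flatnonzero is one way (among others) to get those positions.

```python
import numpy as np

X = np.array([[0, 1, 0, 1],
              [1, 0, 1, 1],
              [0, 0, 0, 1],
              [1, 0, 1, 0]])

# Only the third position is True, so only the third row is kept.
mask = np.array([False, False, True, False])
selected = X[mask]
print(selected)

# Positions of the True entries in the mask (zero-based).
rows = np.flatnonzero(mask)
print(rows)

# Indexing with those integer positions picks the same rows.
same = X[rows]
print(np.array_equal(selected, same))
```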
