4

I have a 1D numpy array of integers, in which I want to replace zeros with the previous non-zero value if and only if the next non-zero value is the same.

For example, an array of:

    in:  x = np.array([1,0,1,1,0,0,2,0,3,0,0,0,3,1,0,1])
    out: [1,0,1,1,0,0,2,0,3,0,0,0,3,1,0,1]

should become

out: [1,1,1,1,0,0,2,0,3,3,3,3,3,1,1,1] 

Is there a vectorized way to do this? I found a way to forward-fill zeros here, but not how to do it with exceptions, i.e. without filling the zeros that lie between two different non-zero values.
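To pin the rule down, here is a loop-based sketch of the intended behaviour. It is not the vectorized version being asked for, only a reference to compare against; the name `fill_if_same_loop` is made up for illustration, and zeros before the first or after the last non-zero value are assumed to stay untouched.

```python
import numpy as np

def fill_if_same_loop(x):
    """Reference (non-vectorized) sketch: fill a run of zeros only when the
    non-zero values on both sides of the run are equal."""
    x = np.asarray(x)
    out = x.copy()
    nz = np.flatnonzero(x)  # positions of the non-zero entries
    # Walk over consecutive pairs of non-zero positions.
    for left, right in zip(nz[:-1], nz[1:]):
        if x[left] == x[right]:
            # Empty slice when the two positions are adjacent.
            out[left + 1:right] = x[left]
    return out

x = np.array([1, 0, 1, 1, 0, 0, 2, 0, 3, 0, 0, 0, 3, 1, 0, 1])
print(fill_if_same_loop(x))
```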

5
  • 1
    What do you mean by exceptions? Do you have non-numeric values? Commented Jan 14, 2018 at 16:06
  • I mean the iff statement, to not fill the zeros that are within integers with different value. Commented Jan 14, 2018 at 16:09
  • 1
    Aha, ok, I read "exceptions" in terms of programmatic exceptions, not exceptions to a rule. Sorry. Commented Jan 14, 2018 at 16:13
  • 1
    @BramZijlstra is it always the case that only one zero is between the same elements i.e. [1, 0, 1..., 3, 0, 3, ..., 1,0,1] ? Commented Jan 14, 2018 at 16:31
  • No, more than one zero can occur. Commented Jan 14, 2018 at 16:35

2 Answers

4

Here's a vectorized approach. The forward-filling part takes inspiration from the NumPy-based forward-filling in this solution; the rest is masking and slicing -

    import numpy as np

    def forward_fill_ifsame(x):
        # Get mask of non-zeros and use it to build forward-filled indices
        mask = x != 0
        idx = np.where(mask, np.arange(len(x)), 0)
        np.maximum.accumulate(idx, axis=0, out=idx)

        # Additional requirement: fill only if the previous and next
        # non-zero values are the same.
        # Work on a copy so the input data is not modified
        x1 = x.copy()

        # Get the non-zero elements
        xm = x1[mask]

        # Among the non-zero elements, zero out each one whose next
        # non-zero element is different
        xm[:-1][xm[1:] != xm[:-1]] = 0

        # Assign back to x1. The invalid ones become zero.
        x1[mask] = xm

        # Use idx for indexing to do the forward filling
        out = x1[idx]

        # Restore the original non-zero values that were zeroed out above
        out[mask] = x[mask]
        return out

Sample runs -

    In [289]: x = np.array([1,0,1,1,0,0,2,0,3,0,0,0,3,1,0,1])

    In [290]: np.vstack((x, forward_fill_ifsame(x)))
    Out[290]:
    array([[1, 0, 1, 1, 0, 0, 2, 0, 3, 0, 0, 0, 3, 1, 0, 1],
           [1, 1, 1, 1, 0, 0, 2, 0, 3, 3, 3, 3, 3, 1, 1, 1]])

    In [291]: x = np.array([1,0,1,1,0,0,2,0,3,0,0,0,1,1,0,1])

    In [292]: np.vstack((x, forward_fill_ifsame(x)))
    Out[292]:
    array([[1, 0, 1, 1, 0, 0, 2, 0, 3, 0, 0, 0, 1, 1, 0, 1],
           [1, 1, 1, 1, 0, 0, 2, 0, 3, 0, 0, 0, 1, 1, 1, 1]])

    In [293]: x = np.array([1,0,1,1,0,0,2,0,3,0,0,0,1,1,0,2])

    In [294]: np.vstack((x, forward_fill_ifsame(x)))
    Out[294]:
    array([[1, 0, 1, 1, 0, 0, 2, 0, 3, 0, 0, 0, 1, 1, 0, 2],
           [1, 1, 1, 1, 0, 0, 2, 0, 3, 0, 0, 0, 1, 1, 0, 2]])
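To see how the steps of `forward_fill_ifsame` fit together, the intermediate arrays for the first sample input can be traced as below. This is only an illustrative walk-through that repeats the same operations at module level; the values in the comments were worked out by hand for this one input.

```python
import numpy as np

x = np.array([1, 0, 1, 1, 0, 0, 2, 0, 3, 0, 0, 0, 3, 1, 0, 1])

# Step 1: for every position, the index of the last non-zero seen so far.
mask = x != 0
idx = np.maximum.accumulate(np.where(mask, np.arange(len(x)), 0))
print(idx)  # [ 0  0  2  3  3  3  6  6  8  8  8  8 12 13 13 15]

# Step 2: among the non-zero values, zero out those whose successor differs.
xm = x[mask]                        # [1 1 1 2 3 3 1 1]
xm[:-1][xm[1:] != xm[:-1]] = 0
print(xm)                           # [1 1 0 0 3 0 1 1]

# Step 3: write them back and forward-fill through idx.
x1 = x.copy()
x1[mask] = xm
out = x1[idx]
print(out)                          # [1 1 1 0 0 0 0 0 3 3 3 3 0 1 1 1]

# Step 4: restore the original non-zero values that step 2 zeroed out.
out[mask] = x[mask]
print(out)                          # [1 1 1 1 0 0 2 0 3 3 3 3 3 1 1 1]
```

The zeroing in step 2 is what blocks the fill: a zeroed "source" value propagates zeros into the gap that follows it, and step 4 then puts the source itself back.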

2 Comments

@Divakar When I studied Tensor decompositions, one of the tasks was to analyse what terms people commonly used to hyperlink the pages. It was done on a real dataset. And the analysis came out to be not so good. Because, almost always people used terms like "see here", "in this post", "another blog", "at this link" etc., which were not so interesting; So, when hyperlinking, it'd be a good idea to use the question topic instead. viz. "Most efficient way to forward-fill NaN values" which should make the link title informative :) and a bit more nice as well
@Divakar, you always inspire me with your ability to comprehend seemingly complex requirements and find straightforward, probably most efficient solutions
0
    import numpy as np

    a = np.array([1, 0, 1, 1, 0, 0, 2, 0, 3, 0, 0, 0, 3, 1, 0, 1])
    a = np.asarray(a)

    # Index of the previous non-zero element (at or before each position)
    forwardFillIdx = np.maximum.accumulate(np.where(a != 0, np.arange(len(a)), -1))
    # Index of the next non-zero element (at or after each position)
    revFillIdx = np.minimum.accumulate(np.where(a[::-1] != 0, np.arange(len(a))[::-1], len(a)))[::-1]

    aa = a[forwardFillIdx]
    bb = a[revFillIdx]
    # aa: [1 1 1 1 1 1 2 2 3 3 3 3 3 1 1 1]
    # bb: [1 1 1 1 2 2 2 3 3 3 3 3 3 1 1 1]

    shouldFill = (aa == bb)
    res = np.where(shouldFill, a[forwardFillIdx], a)
    print(res)
    # [1 1 1 1 0 0 2 0 3 3 3 3 3 1 1 1]

If you want separate functions:

    import numpy as np

    def forward_fill(array):
        """Replaces each zero with the previous non-zero value."""
        indices = np.where(array != 0, np.arange(len(array)), -1)
        filled_array = array[np.maximum.accumulate(indices)]
        return filled_array

    def backward_fill(array):
        """Replaces each zero with the next non-zero value."""
        indices = np.where(array[::-1] != 0, np.arange(len(array))[::-1], len(array))
        filled_array = array[np.minimum.accumulate(indices)][::-1]
        return filled_array

    def fill_missing_values(array):
        """Fills zeros only where the previous and next non-zero values agree."""
        forward_filled = forward_fill(array)
        backward_filled = backward_fill(array)
        should_fill = forward_filled == backward_filled
        filled_array = np.where(should_fill, forward_filled, array)
        return filled_array

    a = np.array([1, 0, 1, 1, 0, 0, 2, 0, 3, 0, 0, 0, 3, 1, 0, 1])
    result = fill_missing_values(a)
    print(result)
    # Output: [1 1 1 1 0 0 2 0 3 3 3 3 3 1 1 1]
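One caveat with the `-1` and `len(array)` sentinel values: a leading zero gets index `-1`, which NumPy wraps around to the last element, and a trailing zero gets index `len(array)`, which raises an `IndexError`. A possible way to guard against both, sketched under the assumption that zeros without a non-zero neighbour on both sides should simply stay zero (the name `fill_if_same_safe` is made up for this example):

```python
import numpy as np

def fill_if_same_safe(a):
    """Sketch: same previous/next comparison, but leading and trailing
    zeros are left untouched instead of wrapping around or raising."""
    a = np.asarray(a)
    n = len(a)
    pos = np.arange(n)
    nonzero = a != 0

    # -1 means "no previous non-zero", n means "no next non-zero".
    prev_idx = np.maximum.accumulate(np.where(nonzero, pos, -1))
    next_idx = np.minimum.accumulate(np.where(nonzero, pos, n)[::-1])[::-1]

    # Only positions with a non-zero value on both sides may be filled.
    has_both = (prev_idx >= 0) & (next_idx < n)

    # Clip so the sentinel indices are safe to use; has_both masks them out.
    prev_val = a[np.clip(prev_idx, 0, n - 1)]
    next_val = a[np.clip(next_idx, 0, n - 1)]

    return np.where(has_both & (prev_val == next_val), prev_val, a)

print(fill_if_same_safe(np.array([0, 1, 0, 1, 0])))  # [0 1 1 1 0]
```

For inputs that start and end with a non-zero value, this should give the same result as the version above.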

Just for fun:

    import numpy as np

    a = np.array([1, 0, 1, 1, 0, 0, 2, 0, 3, 0, 0, 0, 3, 1, 0, 1])

    # Boolean mask that is True from the first 1 to the last 1 (inclusive)
    mask = np.maximum.accumulate(a == 1) & np.maximum.accumulate((a == 1)[::-1])[::-1]

    # XOR with the non-zero mask: inside that span this marks the zeros
    res = mask ^ (a != 0)
    print(res.astype(int))
    # [0 1 0 0 1 1 0 1 0 1 1 1 0 0 1 0]

    # Same idea on the integer values: running max from the left AND from the right
    mask1 = np.maximum.accumulate(a) & np.maximum.accumulate(a[::-1])[::-1]
    print(mask1)
    # [1 1 1 1 1 1 2 2 3 3 3 3 3 1 1 1]

    res1 = mask1 ^ (a != 0)
    print(res1)
    # [0 1 0 0 1 1 3 2 2 3 3 3 2 0 1 0]
