
I have a time-sorted pandas DataFrame whose columns are dates. Each row holds boolean values saying whether a person was present on that date. If they were present, I want to persist that 'present' to all the following columns (the columns are sorted chronologically).

I've reduced the problem to a simpler NumPy problem. Say I have this ndarray:

```python
import numpy as np

ndarr = np.array([[0.0, 0.0, 1.0],
                  [1.0, 0.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 1.0, 0.0]])
# array([[0., 0., 1.],
#        [1., 0., 0.],
#        [1., 0., 0.],
#        [0., 1., 0.],
#        [0., 1., 0.]])
```

How do I make a 1 that appears in one column persist to every column to its right?

My current solution iterates over the columns in Python, and I'm wondering whether there is a more elegant solution.

Current solution:

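```python
nd_store = np.ones(ndarr.shape[0])
# Walk the columns from right to left. This relies on every row containing
# exactly one 1: the last column is always set to 1.
for i in reversed(range(ndarr.shape[1])):
    tmp = np.copy(ndarr[:, i])
    ndarr[:, i] = nd_store
    # nd_store stays 1 until we pass the column holding the row's original 1,
    # after which it becomes 0.
    nd_store = (tmp != nd_store) * 1.0

ndarr
# array([[0., 0., 1.],
#        [1., 1., 1.],
#        [1., 1., 1.],
#        [0., 1., 1.],
#        [0., 1., 1.]])
```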
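The loop above only gives the right answer when each row holds exactly one 1 (a row of all zeros would end with a 1, and a row with two 1s would toggle). A minimal sketch of a column-by-column loop without that assumption is below; the function name `persist_right_loop` is hypothetical, and it still loops in Python once per column, just more generally.

```python
import numpy as np


def persist_right_loop(arr):
    """Return a copy of arr in which every 1 is carried to all columns on its right.

    The Python loop runs once per column (arr.shape[1] times), while each
    step is a vectorized operation over all rows.
    """
    out = np.array(arr, dtype=float, copy=True)
    for i in range(1, out.shape[1]):
        # A cell becomes 1 if it already was 1 or the cell to its left is 1.
        out[:, i] = np.maximum(out[:, i], out[:, i - 1])
    return out


ndarr = np.array([[0.0, 0.0, 1.0],
                  [1.0, 0.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 1.0, 0.0]])
print(persist_right_loop(ndarr))

# Rows the original loop would mishandle: all zeros, and a 1 in the middle.
print(persist_right_loop(np.array([[0.0, 0.0, 0.0], [0.0, 1.0, 0.0]])))
```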

3 Answers


Use `np.logical_or.accumulate`. `.accumulate()` makes any binary ufunc behave the way `cumsum()` does for addition or `cumprod()` does for multiplication.

```python
nd_store = np.logical_or.accumulate(ndarr, axis=1).astype(ndarr.dtype)
nd_store
# Out[]:
# array([[0., 0., 1.],
#        [1., 1., 1.],
#        [1., 1., 1.],
#        [0., 1., 1.],
#        [0., 1., 1.]])
```
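To make the `cumsum()`/`cumprod()` analogy concrete, here is a small sketch. Casting the input to `bool` before accumulating is my own choice for clarity, not part of the answer above.

```python
import numpy as np

x = np.array([1, 2, 3, 4])

# ufunc.accumulate applies a binary ufunc cumulatively along an axis.
running_sum = np.add.accumulate(x)        # same values as np.cumsum(x)
running_prod = np.multiply.accumulate(x)  # same values as np.cumprod(x)

ndarr = np.array([[0.0, 0.0, 1.0],
                  [1.0, 0.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 1.0, 0.0]])

# Each output cell is the OR of that cell and every cell to its left,
# which is exactly "once present, stays present".
present = np.logical_or.accumulate(ndarr.astype(bool), axis=1)
result = present.astype(ndarr.dtype)

print(running_sum, running_prod)
print(result)
```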

1 Comment

That's real nice! Thanks!!

You can loop over every position that holds a 1 and fill the rest of that row from there:

```python
for r, c in zip(*np.where(ndarr == 1)):
    ndarr[r, c:] = 1
```

1 Comment

Too slow. I have millions of rows. Probably should have mentioned that in the OP
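Since a per-entry Python loop is too slow for millions of rows, a vectorized alternative is worth sketching. `np.maximum.accumulate` and `np.cumsum(...) > 0` are substitutes I'm suggesting, not part of this answer; both assume the array holds only 0s and 1s. The snippet checks them against the loop on hypothetical random data:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical test data: random 0/1 rows, shaped like the question's array.
arr = (rng.random((1000, 6)) < 0.2).astype(float)

# Per-entry loop from the answer above, applied to a copy.
looped = arr.copy()
for r, c in zip(*np.where(looped == 1)):
    looped[r, c:] = 1

# Vectorized alternatives: no Python-level loop over rows or entries.
via_max = np.maximum.accumulate(arr, axis=1)  # keeps the float dtype
via_cumsum = (np.cumsum(arr, axis=1) > 0).astype(arr.dtype)

print(np.array_equal(looped, via_max), np.array_equal(looped, via_cumsum))
```

`np.maximum.accumulate` is the same `ufunc.accumulate` idea as `np.logical_or.accumulate`, but it works directly on the float 0/1 values, so no cast back is needed.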

I would do this with pandas:

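```python
# Turn the zeros into NaN so that forward-fill has gaps to fill
df[df == 0] = np.nan
# Carry each 1 to the right along its row, then turn the leading NaNs back into 0
df.ffill(axis=1).fillna(0.0)
```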

1 Comment

posted the same answer as you :) deleted mine
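Here is a self-contained sketch of the pandas approach on a frame shaped like the one in the question. The names, dates, and values are hypothetical. It works on a copy so the original frame is left untouched, and it assumes 0/1 float values (a boolean frame could be converted with `.astype(float)` first). `DataFrame.cummax(axis=1)` is an alternative I'm adding for comparison, not part of the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one row per person, one column per date (chronological).
dates = pd.date_range("2024-01-01", periods=3, freq="D")
df = pd.DataFrame(
    [[0.0, 0.0, 1.0],
     [1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0]],
    index=["a", "b", "c", "d"],
    columns=dates,
)

# The answer's approach, on a copy: zeros -> NaN, forward-fill along rows,
# then leading NaNs (before the first 1) back to 0.
tmp = df.copy()
tmp[tmp == 0] = np.nan
persisted = tmp.ffill(axis=1).fillna(0.0)

# Alternative for 0/1 data: a running maximum along each row.
via_cummax = df.cummax(axis=1)

print(persisted)
```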
