Get row count in DataFrame without for loop

Question

I need to check whether the last value of dataframe['position'] is different from 0 and, if so, count the previous values (going backwards) until they change, storing that count (the number of rows before the change) in a variable. I need to do this without for loops, for example with loc or iloc.

dataframe:

       position
    0         1
    1         0
    2         1   <4
    3         1   <3
    4         1   <2
    5         1   <1

    count = 4

I achieved this by a for loop, but I need to avoid it:

    count = 1
    if data['position'].iloc[-1] != 0:
        for i in data['position']:
            if data['position'].iloc[-count] == data['position'].iloc[-1]:
                count = count + 1
            else:
                break
        if data['position'].iloc[-count] != data['position'].iloc[-1]:
            count = count - 1

Comment by Scott Boston (Sep 29, 2022 at 2:14):

    df['col1'][::-1].cumprod().sum()
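A minimal sketch of the comment's idea, assuming the column holds only 0 and 1 (the column name `col1` comes from the comment; `df2` is a hypothetical counterexample). After reversing, the cumulative product stays 1 until the first 0 and is 0 from then on, so the sum is the length of the trailing run of 1s. With other values the products grow instead of staying at 1, so the sum is no longer a count.

```python
import pandas as pd

# Column from the question, containing only 0/1 values
df = pd.DataFrame({'col1': [1, 0, 1, 1, 1, 1]})

# Reverse, then take the cumulative product: the first 0 zeroes out everything after it
binary_count = df['col1'][::-1].cumprod().sum()
print(binary_count)  # 4

# Hypothetical non-binary column: reversed values 2, 2, 0, 1 give products 2, 4, 0, 0
df2 = pd.DataFrame({'col1': [1, 0, 2, 2]})
non_binary_sum = df2['col1'][::-1].cumprod().sum()
print(non_binary_sum)  # 6, although only 2 trailing values are nonzero
```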

mozway · Accepted Answer · 2022-09-29 03:01:12Z

You can reverse your Series, convert it to boolean using the target condition (here "not equal to 0", with ne), apply cummin to propagate the first False to every value after it, and sum to count the trailing True values:

count = df.loc[::-1, 'position'].ne(0).cummin().sum()

Output: 4
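The chain above can be unpacked step by step on the question's data. This is a sketch for illustration; `df2` is a hypothetical second example showing that the condition is "not equal to 0", so any nonzero value extends the run, not only values equal to the last one.

```python
import pandas as pd

# Data from the question
df = pd.DataFrame({'position': [1, 0, 1, 1, 1, 1]})

reversed_pos = df.loc[::-1, 'position']  # index 5, 4, 3, 2, 1, 0
nonzero = reversed_pos.ne(0)             # True, True, True, True, False, True
run = nonzero.cummin()                   # True, True, True, True, False, False
count = run.sum()
print(count)  # 4

# Reversed values 1, 2, 0, 1: both 1 and 2 count as nonzero
df2 = pd.DataFrame({'position': [1, 0, 2, 1]})
count2 = df2.loc[::-1, 'position'].ne(0).cummin().sum()
print(count2)  # 2
```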

If you have multiple columns:

counts = df.loc[::-1].ne(0).cummin().sum()
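As a sketch, the multi-column form applies the same chain to each column and returns one count per column name. The `other` column here is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'position': [1, 0, 1, 1, 1, 1],
    'other':    [0, 1, 1, 0, 2, 3],  # hypothetical second column
})

# Reverse the rows, flag nonzero values, stop each column's run at its first 0, count
counts = df.loc[::-1].ne(0).cummin().sum()
print(counts)  # position 4, other 2
```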

alternative

A slightly faster alternative (~25% faster) is to find the last zero and use indexing. It relies on two assumptions: the column contains at least one zero, and the index has no duplicated labels.

    m = df['position'].eq(0)
    count = len(df.loc[m[m].index[-1]:]) - 1
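A short sketch of what the two lines do on the question's data, and of the first assumption: if the column has no zero, `m[m]` is empty and asking for its last label raises `IndexError`. The `no_zero` frame is a hypothetical example.

```python
import pandas as pd

df = pd.DataFrame({'position': [1, 0, 1, 1, 1, 1]})

m = df['position'].eq(0)             # True where the value is 0
last_zero = m[m].index[-1]           # label of the last zero: 1
count = len(df.loc[last_zero:]) - 1  # rows from the last zero onward, minus the zero itself
print(count)  # 4

# Hypothetical column without any zero: m2[m2] is empty, so .index[-1] fails
no_zero = pd.DataFrame({'position': [1, 1, 1]})
m2 = no_zero['position'].eq(0)
raised = False
try:
    m2[m2].index[-1]
except IndexError:
    raised = True
print(raised)  # True
```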

Without the requirement to have at least one zero:

    m = df['position'].eq(0)
    m = m[m]
    count = len(df) if m.empty else len(df.loc[m.index[-1]:]) - 1
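The remaining assumption is a unique index, which the label-based `.loc` slice needs. One way around it, sketched here as a swapped-in variant rather than the answer's code, is to work with positions from NumPy instead of labels. `count_after_last_zero` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

def count_after_last_zero(s: pd.Series) -> int:
    """Hypothetical helper: number of values after the last 0,
    or all values if there is no 0.

    Uses positions (np.flatnonzero) instead of label-based .loc slicing,
    so duplicated index labels do not matter.
    """
    values = s.to_numpy()
    zeros = np.flatnonzero(values == 0)  # positions of every 0
    if zeros.size == 0:
        return len(values)
    return int(len(values) - zeros[-1] - 1)

# Duplicated index labels on purpose
s = pd.Series([1, 0, 1, 1, 1, 1], index=[0, 0, 1, 1, 2, 2])
print(count_after_last_zero(s))                     # 4
print(count_after_last_zero(pd.Series([1, 1, 1])))  # 3
```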

Michael Delgado · Accepted Answer · 2022-09-29 00:45:42Z

This should do the trick:

((data.iloc[-1] != 0) & (data[::-1] == data.iloc[-1])).cumprod().sum()

This builds a condition, ((data.iloc[-1] != 0) & (data[::-1] == data.iloc[-1])), that is True for each value (counting backwards from the end) when the last value is nonzero and the value equals it. The booleans are then treated as 0 or 1 and the cumulative product is taken, so the first non-matching value (a 0) breaks the sequence and every value after it becomes 0. Finally, the flags are summed to get the count of consecutive matching values.
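The intermediate steps can be inspected on a small hypothetical column. This sketch converts the mask to `int` explicitly before `cumprod`. It also contrasts this condition ("equal to a nonzero last value") with the `ne(0)` condition from the other answer ("nonzero"), which give different counts once the trailing values are not all the same.

```python
import pandas as pd

# Hypothetical data: reversed values are 1, 1, 2, 0, 1
data = pd.DataFrame({'position': [1, 0, 2, 1, 1]})

last = data.iloc[-1]                       # last row, one value per column
mask = (last != 0) & (data[::-1] == last)  # reversed rows equal to a nonzero last value
run = mask.astype(int).cumprod()           # first 0 zeroes out everything after it
count = run.sum()
print(mask['position'].tolist())  # [True, True, False, False, True]
print(count['position'])          # 2

# The "not equal to 0" condition also counts the 2, stopping only at the 0
nonzero_count = data.loc[::-1, 'position'].ne(0).cummin().sum()
print(nonzero_count)  # 3
```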

Depending on your data, though, stepping iteratively backwards from the end may be faster. This solution is vectorized, but it requires working with the entire column of data and doing several computations which are the same size as the original series.
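For comparison, an iterative version that steps backwards from the end and stops at the first mismatch might look like this. It is a sketch, not a benchmark: `trailing_run_length` is a hypothetical helper, and whether it beats the vectorized expression depends on the data (it only reads the trailing run, but it loops in Python).

```python
import numpy as np

def trailing_run_length(values) -> int:
    """Hypothetical helper: length of the run of values equal to the last one,
    counted from the end; 0 if the last value is 0 (matching the answer's condition)."""
    arr = np.asarray(values)
    if arr.size == 0 or arr[-1] == 0:
        return 0
    last = arr[-1]
    count = 0
    # Walk backwards and stop at the first mismatch, so only the trailing run is read
    for v in arr[::-1]:
        if v != last:
            break
        count += 1
    return count

print(trailing_run_length([1, 0, 1, 1, 1, 1]))  # 4
print(trailing_run_length([1, 1, 1]))           # 3
print(trailing_run_length([1, 2, 0]))           # 0
```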

Example:

    In [12]: data = pd.DataFrame(np.random.randint(0, 3, size=(10, 5)), columns=list('ABCDE'))
        ...: data
    Out[12]:
       A  B  C  D  E
    0  2  0  1  2  0
    1  1  0  1  2  1
    2  2  1  2  1  0
    3  1  0  1  2  2
    4  1  1  0  0  2
    5  2  2  1  0  2
    6  2  1  1  2  2
    7  0  0  0  1  0
    8  2  2  0  0  1
    9  2  0  0  2  1

    In [13]: ((data.iloc[-1] != 0) & (data[::-1] == data.iloc[-1])).cumprod().sum()
    Out[13]:
    A    2
    B    0
    C    0
    D    1
    E    2
    dtype: int64
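Because the example above uses random data, a reproducible version can hard-code the values printed in Out[12]. This sketch uses the answer's expression unchanged; the result should match Out[13].

```python
import pandas as pd

# Values copied from Out[12] so the result is reproducible
data = pd.DataFrame(
    [[2, 0, 1, 2, 0],
     [1, 0, 1, 2, 1],
     [2, 1, 2, 1, 0],
     [1, 0, 1, 2, 2],
     [1, 1, 0, 0, 2],
     [2, 2, 1, 0, 2],
     [2, 1, 1, 2, 2],
     [0, 0, 0, 1, 0],
     [2, 2, 0, 0, 1],
     [2, 0, 0, 2, 1]],
    columns=list('ABCDE'),
)

result = ((data.iloc[-1] != 0) & (data[::-1] == data.iloc[-1])).cumprod().sum()
print(result)  # A 2, B 0, C 0, D 1, E 2
```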
