Consider the following dataframe...

        month             term
    0  Jun-22              one
    1  Jun-22          one two
    2  Jul-22              one
    3  Jul-22            three
    4  Jul-22       three four
    5  Jul-22  three four five

I would like to add a column that counts, for each row, how many other rows in the same month contain that row's 'term' (the row itself is excluded).

This is the desired output...

        month             term  term_count
    0  Jun-22              one           1
    1  Jun-22          one two           0
    2  Jul-22              one           0
    3  Jul-22            three           2
    4  Jul-22       three four           1
    5  Jul-22  three four five           0

So, for row 0, the term 'one' exists in one other row for Jun-22, so term_count is 1.

For row 1, no other row in Jun-22 contains 'one two', so term_count is 0.

For row 2, no other row in Jul-22 contains 'one', so term_count is 0.

For row 3, two other rows in Jul-22 contain 'three', so term_count is 2.

etc

What is the best way to achieve this using pandas please?

Thank you.

1 Answer

Use a custom lambda function in GroupBy.transform to count, within each month group, how many values in the current and later rows contain each term. Subtract 1 so that a row does not count itself:

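    import pandas as pd

    # Sample data from the question
    df = pd.DataFrame({
        'month': ['Jun-22', 'Jun-22', 'Jul-22', 'Jul-22', 'Jul-22', 'Jul-22'],
        'term': ['one', 'one two', 'one', 'three', 'three four', 'three four five'],
    })

    # For each term, count the current and later rows of the group that contain it
    f = lambda x: [sum(z in w for w in x.tolist()[i:]) for i, z in enumerate(x)]
    # Subtract 1 so that a row does not count itself
    df['term_count'] = df.groupby('month')['term'].transform(f).sub(1)
    print(df)

        month             term  term_count
    0  Jun-22              one           1
    1  Jun-22          one two           0
    2  Jul-22              one           0
    3  Jul-22            three           2
    4  Jul-22       three four           1
    5  Jul-22  three four five           0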

2 Comments

What is the purpose of x.tolist()[i:]? Won't this ignore previous rows? I think this will give me what I want if I just change this to x.tolist() as it is possible that rows before may contain the text. Thanks.
@Adam - I understood the question to need only the rows after each row tested, not the previous ones as well. If you also need the previous rows, use f = lambda x: [sum(z in w for w in x) for z in x]
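To make the difference between the two lambdas concrete, here is a minimal plain-Python sketch of the per-group counting, without pandas. The helper names `count_forward` and `count_all` and the sample group are hypothetical and only for illustration. Both already subtract 1 for the row itself.

```python
def count_forward(terms):
    """Count matches among the current and later terms only, minus the term itself."""
    return [sum(z in w for w in terms[i:]) - 1 for i, z in enumerate(terms)]


def count_all(terms):
    """Count matches among every term in the group, minus the term itself."""
    return [sum(z in w for w in terms) - 1 for z in terms]


# Hypothetical group in which the longer term comes BEFORE the shorter one
terms = ["one two", "one"]

forward_counts = count_forward(terms)
all_counts = count_all(terms)
print(forward_counts)  # [0, 0] -> 'one' never sees the earlier 'one two'
print(all_counts)      # [0, 1] -> 'one' is found in the earlier 'one two'
```

On the sample data in the question the two versions agree, because each longer term happens to appear after the shorter term it contains. Note also that `z in w` on strings is a substring test, so 'one' would also match a term such as 'someone'. If whole-word matching is needed, that check would have to be changed.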
