
I have the following DataFrame:

         Date Type Local
0  2015-01-02    B   A12
1  2015-01-02    B   A12
2  2015-01-02    B   B23
3  2015-01-02    B   B23
4  2015-01-02    B    C4

I want to keep only the rows whose 'Local' value appears more than 100 times in the DataFrame.

I have tried:

df = df[(df["local"].isin(df["local"].value_counts() > 100) == True)] 
df = df[(df["local"] == (df["local"].value_counts() > 100)] 
df = df[(df["local"] == (df["local"].value_counts() > 100)) == True] 

None of them worked. What am I missing?
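To see what these attempts actually compare against, here is a minimal sketch on the five sample rows. It assumes the column is spelled `Local` as in the printed frame (the attempts spell it `local`) and lowers the threshold from 100 to 1 so the tiny sample has something to keep.

```python
import pandas as pd

# The five sample rows from the question.
df = pd.DataFrame({
    "Date": ["2015-01-02"] * 5,
    "Type": ["B"] * 5,
    "Local": ["A12", "A12", "B23", "B23", "C4"],
})

# A boolean Series: its index holds each distinct Local value,
# its values are True/False (threshold lowered from 100 to 1).
counts = df["Local"].value_counts() > 1
print(counts)

# Series.isin tests membership against the *values* of the Series passed in,
# i.e. True/False, so no string in "Local" matches and every row is dropped.
attempt = df[df["Local"].isin(counts)]
print(len(attempt))  # 0
```

The `==` attempts fail for a different reason: comparing two Series whose indexes differ (row labels vs. Local values) raises a ValueError instead of lining them up.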

2 Answers


Use groupby().transform():

df[df.groupby('local')['local'].transform('size') > 100]
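A runnable sketch of the transform approach on the sample rows, assuming the column is spelled `Local` and using a threshold of 1 instead of 100. `transform('size')` returns one value per row rather than one per group, which is why its result can be used directly as a row mask.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2015-01-02"] * 5,
    "Type": ["B"] * 5,
    "Local": ["A12", "A12", "B23", "B23", "C4"],
})

# Each row receives the size of the group it belongs to,
# aligned with df's own index: [2, 2, 2, 2, 1] for this sample.
group_sizes = df.groupby("Local")["Local"].transform("size")

# Boolean mask: keep rows whose Local value occurs more than once.
result = df[group_sizes > 1]
print(result)
```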

Or use the index of value_counts() to get the local values that pass the threshold:

counts = df["local"].value_counts() > 100
df[df['local'].isin(counts[counts].index)]
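The same idea on the sample rows, printing the intermediate index so it is clear what `isin` receives. The column name `Local` and the threshold of 1 are assumptions made for the small example.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2015-01-02"] * 5,
    "Type": ["B"] * 5,
    "Local": ["A12", "A12", "B23", "B23", "C4"],
})

counts = df["Local"].value_counts() > 1
# Indexing a boolean Series with itself keeps only the True entries;
# the surviving index labels are the frequent Local values.
frequent = counts[counts].index
print(sorted(frequent))  # ['A12', 'B23']

result = df[df["Local"].isin(frequent)]
print(result)
```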

3 Comments

Worked! I just don't understand why your second answer works, yet df = df[df["local"].isin((df["local"].value_counts() > 100).index)] doesn't!
df["local"].value_counts() > 100 is a boolean Series whose index still contains all the local values, each mapped to True or False. You only want the True entries, so counts[counts] in my code drops the False ones before the index is taken.
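The difference described above can be checked directly: the index of the unfiltered boolean Series contains every distinct value, so `isin` on it keeps every row. This sketch reuses the sample rows, with the column spelled `Local` and a threshold of 1 as assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2015-01-02"] * 5,
    "Type": ["B"] * 5,
    "Local": ["A12", "A12", "B23", "B23", "C4"],
})

counts = df["Local"].value_counts() > 1

# Index of the full boolean Series: every distinct value, True or False.
all_values = counts.index
# Index after dropping the False entries: only the frequent values.
true_only = counts[counts].index

kept_all = df[df["Local"].isin(all_values)]
kept_frequent = df[df["Local"].isin(true_only)]
print(len(kept_all), len(kept_frequent))  # 5 4
```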
Got it! Thank you

Try:

df[df['Local'].map(df['Local'].value_counts()).gt(100)] 

As an example, see:

res = df[df['Local'].map(df['Local'].value_counts()).gt(1)]
print(res)

Output

         Date Type Local
0  2015-01-02    B   A12
1  2015-01-02    B   A12
2  2015-01-02    B   B23
3  2015-01-02    B   B23

In this example, only the rows whose Local value occurs more than once are kept.
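Why `map` works: `value_counts()` returns a Series indexed by value, and `Series.map` given a Series looks up each row's value in that index, producing a per-row frequency. The sketch below prints that intermediate Series and checks that it selects the same rows as the `groupby().transform('size')` approach from the other answer. The sample data and threshold of 1 are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2015-01-02"] * 5,
    "Type": ["B"] * 5,
    "Local": ["A12", "A12", "B23", "B23", "C4"],
})

# Per-row frequency of each Local value: [2, 2, 2, 2, 1].
freq = df["Local"].map(df["Local"].value_counts())
print(freq.tolist())

via_map = df[freq.gt(1)]
via_transform = df[df.groupby("Local")["Local"].transform("size") > 1]

# Both masks select the same rows in the same order.
print(via_map.equals(via_transform))  # True
```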

1 Comment

Worked! Thank you
