
I am trying to use df['column_name'].str.count("+") in Python pandas, but I receive the error "error: nothing to repeat". With regular characters the method works, e.g. df['column_name'].str.count("a") works fine.

Also, there is a problem with the "^"-sign. If I use df['column_name'].str.contains("^") the result is incorrect - it looks like "^" gets interpreted as " " (empty space).

Surprisingly, if I use .count("+") and .contains("^") on a regular, non-pandas string they work perfectly fine.

A simple working example:

import pandas as pd
df = pd.DataFrame({'column1': ['Nighthawks+', 'Dragoons'], 'column2': ['1st', '2nd']}, columns=['column1', 'column2'])

When applying df["column1"].str.contains("^") one gets "True, True", but it should be "False, False".

And when applying df["column1"].str.count("+") one gets

"error: nothing to repeat"

But outside of pandas, "bla++".count("+") correctly gives the result 2.

Any solutions? Thanks


2 Answers

7

You need to escape the plus sign:

In[10]: df = pd.DataFrame({'a':['dsa^', '^++', '+++','asdasads']})
        df
Out[10]:
          a
0      dsa^
1       ^++
2       +++
3  asdasads

In[11]: df['a'].str.count("\+")
Out[11]:
0    0
1    2
2    3
3    0
Name: a, dtype: int64

Also, df['a'].str.count('^') just returns 1 for every row, because an unescaped ^ matches the start of each string exactly once:

In[12]: df['a'].str.count('^')
Out[12]:
0    1
1    1
2    1
3    1
Name: a, dtype: int64

Again you need to escape the pattern:

In[16]: df['a'].str.count('\^')
Out[16]:
0    1
1    1
2    0
3    0
Name: a, dtype: int64

EDIT

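Regarding the semantic difference between count on a normal string and on a Series: count on a Python str counts occurrences of a literal substring, but Series.str.count takes a regex pattern. In a regex, + is a quantifier (with nothing before it there is "nothing to repeat") and ^ anchors the start of the string, so it matches once in every string, which is also why str.contains("^") returns True for every row. Both are special characters that need to be escaped with a backslash if you are searching for the characters themselves.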
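As a minimal sketch of that difference, the snippet below contrasts the two count methods on the sample data from this answer. Using re.escape from the standard library to build the pattern is a suggestion of mine rather than part of the original answer, and the variable names are only illustrative.

```python
import re
import pandas as pd

s = pd.Series(['dsa^', '^++', '+++', 'asdasads'])

# Plain Python: str.count treats its argument as a literal substring.
plain_count = "bla++".count("+")

# pandas: Series.str.count treats its argument as a regular expression.
# re.escape adds the backslash for us, so "+" and "^" match literally.
plus_counts = s.str.count(re.escape("+")).tolist()
caret_counts = s.str.count(re.escape("^")).tolist()

print(plain_count)
print(plus_counts)
print(caret_counts)
```

Writing the pattern by hand as a raw string, such as r"\+", should behave the same way and avoids the invalid-escape warning that newer Python versions emit for "\+".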


6

In str.count(), special characters must be escaped with a backslash because the argument is a regex pattern (this is explained in detail by @EdChum above).

In str.contains(), on the other hand, the backslash is not needed if you turn off regex matching. Pass regex=False, as in df['a'].str.contains("+", regex=False), and the pattern is treated as a literal string, so special characters are found as-is.
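A short sketch of this on the DataFrame from the question, assuming a pandas version in which str.contains accepts the regex keyword. The variable names are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'column1': ['Nighthawks+', 'Dragoons'],
                   'column2': ['1st', '2nd']})

# regex=False: the pattern is a literal substring, so no escaping is needed.
has_plus = df['column1'].str.contains('+', regex=False).tolist()
has_caret = df['column1'].str.contains('^', regex=False).tolist()

# Default (regex=True): '^' is the start-of-string anchor and matches every row.
caret_as_regex = df['column1'].str.contains('^').tolist()

print(has_plus)
print(has_caret)
print(caret_as_regex)
```

With the literal search, only 'Nighthawks+' contains a plus sign and neither row contains a caret, which is the "False, False" result the question expected.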
