Python pandas doesn't recognize special characters

Question

I am trying to use df['column_name'].str.count("+") in Python pandas, but I receive "error: nothing to repeat". With regular characters the method works, e.g. df['column_name'].str.count("a") works fine.

Also, there is a problem with the "^" sign. If I use df['column_name'].str.contains("^"), the result is incorrect: it looks as if "^" gets interpreted as " " (an empty space).

Surprisingly, if I use .count("+") and .contains("^") on a regular, non-pandas string they work perfectly fine.

Minimal working example:

import pandas as pd

df = pd.DataFrame({'column1': ['Nighthawks+', 'Dragoons'], 'column2': ['1st', '2nd']}, columns=['column1', 'column2'])

When applying df["column1"].str.contains("^") one gets "True, True", but it should be "False, False".

And applying df["column1"].str.count("+") raises "error: nothing to repeat".

But outside of pandas, "bla++".count("+") correctly returns 2.

Any solutions? Thanks

EdChum · Accepted Answer · 2017-09-01 15:49:31Z

You need to escape the plus sign:

In[10]: df = pd.DataFrame({'a': ['dsa^', '^++', '+++', 'asdasads']})
        df
Out[10]:
          a
0      dsa^
1       ^++
2       +++
3  asdasads

In[11]: df['a'].str.count(r"\+")
Out[11]:
0    0
1    2
2    3
3    0
Name: a, dtype: int64

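Also, df['a'].str.count('^') just returns 1 for every row, because an unescaped ^ is the regex start-of-string anchor, which matches exactly once (as an empty match) in every string: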

In[12]: df['a'].str.count('^')
Out[12]:
0    1
1    1
2    1
3    1
Name: a, dtype: int64
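To see why every row comes out as 1, the same patterns can be tried with Python's re module directly. This is a minimal sketch, assuming that Series.str.count on an object-dtype column amounts to counting the matches returned by re.findall:

```python
import re

values = ['dsa^', '^++', '+++', 'asdasads']

# Unescaped ^ is the start-of-string anchor: exactly one empty match per string
anchor_matches = [re.findall('^', s) for s in values]
print(anchor_matches)                    # [[''], [''], [''], ['']]
print([len(m) for m in anchor_matches])  # [1, 1, 1, 1]

# Escaped \^ matches the literal caret character instead
caret_counts = [len(re.findall(r'\^', s)) for s in values]
print(caret_counts)                      # [1, 1, 0, 0]
```

The empty match is what makes the result look as if "^" were matching "nothing": the anchor consumes no characters, it only asserts a position.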

Again you need to escape the pattern:

In[16]: df['a'].str.count(r'\^')
Out[16]:
0    1
1    1
2    0
3    0
Name: a, dtype: int64
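If the character to search for is not known in advance, escaping it by hand gets error-prone. One option, not part of the original answer, is to build the pattern with the standard library's re.escape, which backslash-escapes every regex metacharacter. The helper name count_literal is hypothetical, and dtype=object is an assumption used to keep pandas on its Python-re code path:

```python
import re

import pandas as pd


def count_literal(series: pd.Series, text: str) -> pd.Series:
    """Hypothetical helper: count literal, non-overlapping occurrences of text."""
    # re.escape turns '+' into r'\+' and '^' into r'\^', so they match literally
    return series.str.count(re.escape(text))


df = pd.DataFrame({'a': ['dsa^', '^++', '+++', 'asdasads']}, dtype=object)
print(count_literal(df['a'], '+').tolist())   # [0, 2, 3, 0]
print(count_literal(df['a'], '^').tolist())   # [1, 1, 0, 0]
print(count_literal(df['a'], '++').tolist())  # [0, 1, 1, 0]
```

The last call shows that matches do not overlap: '+++' contains '++' only once as far as the regex engine is concerned.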

EDIT

Regarding the semantic difference between count on a normal string and on a Series: str.count on a Python string counts literal, non-overlapping occurrences of a substring, whereas Series.str.count takes a regex pattern. ^ and + are regex special characters, so they need to be escaped with a backslash when you are searching for those characters literally.
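The difference can be shown side by side on the string from the question. This is a sketch, assuming an object-dtype Series (so pandas hands the pattern to Python's re module, which is where the "nothing to repeat" message comes from):

```python
import re

import pandas as pd

text = "bla++"

# Plain Python str.count: counts a literal substring, no regex involved
print(text.count("+"))  # 2

# Series.str.count: the pattern is a regex, so "+" has to be escaped
s = pd.Series([text], dtype=object)
print(s.str.count(r"\+").tolist())  # [2]

# Unescaped, "+" is a quantifier with nothing before it to repeat
try:
    s.str.count("+")
    error_message = None
except re.error as exc:
    error_message = str(exc)
print(error_message)  # message starts with "nothing to repeat"
```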

msklc · Accepted Answer · 2020-08-25 13:59:15Z

In str.count(), special characters in the pattern have to be escaped with a backslash, because the pattern is a regex (this is explained in detail by @EdChum above).

With str.contains(), on the other hand, you don't need backslashes: just pass regex=False, e.g. df['a'].str.contains("+", regex=False), to find strings that contain special characters literally.
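Applied to the DataFrame from the question, regex=False gives the expected "False, False" for "^". A short sketch; dtype=object is an added assumption that pins pandas to its classic Python-string code path. Note that Series.str.count has no regex parameter (it takes pat and flags), so for counting the pattern still has to be escaped.

```python
import pandas as pd

# The DataFrame from the question, with object dtype as an assumption
df = pd.DataFrame(
    {'column1': ['Nighthawks+', 'Dragoons'], 'column2': ['1st', '2nd']},
    dtype=object,
)

# regex=False: "^" and "+" are plain characters, not regex syntax
print(df['column1'].str.contains('^', regex=False).tolist())  # [False, False]
print(df['column1'].str.contains('+', regex=False).tolist())  # [True, False]

# Default regex=True: "^" is an anchor, so every string "contains" it
print(df['column1'].str.contains('^').tolist())  # [True, True]
```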
