
I'm trying to use a regex to clean some data before I insert the items into the database. I haven't been able to solve the issue of removing trailing special characters at the end of my strings.

How do I write this regex to only remove trailing special characters?

    import re

    strings = ['string01_', 'str_ing02_^', 'string03_@_', 'string04_1', 'string05_a_']

    for item in strings:
        clean_this = re.sub(r'([_+!@#$?^])', '', item)
        print(clean_this)

outputs this:

    string01   # correct
    string02   # incorrect because it removes the _ inside the string
    string03   # correct
    string041  # incorrect because it removes the _ inside the string
    string05a  # incorrect because it removes the _ inside the string and not just the trailing _
    Try [_+!@#$?^]+$ Commented Dec 18, 2018 at 16:19

3 Answers


You could also use the purpose-built rstrip method of strings, which removes any trailing characters that appear in its argument:

    [s.rstrip('_+!@#$?^') for s in strings]
    # ['string01', 'str_ing02', 'string03', 'string04_1', 'string05_a']
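
Note that rstrip treats its argument as a set of characters rather than a literal suffix, so the trailing characters can appear in any order and any number. Here is a minimal sketch using the question's sample list; the helper name `strip_trailing` and the constant `SPECIALS` are hypothetical and only there for illustration:

```python
# Characters to drop from the end of each string (same set as in the question).
SPECIALS = '_+!@#$?^'


def strip_trailing(values, chars=SPECIALS):
    """Return a new list with any trailing characters from `chars` removed."""
    return [value.rstrip(chars) for value in values]


strings = ['string01_', 'str_ing02_^', 'string03_@_', 'string04_1', 'string05_a_']
cleaned = strip_trailing(strings)
print(cleaned)
```

Underscores inside the string, as in str_ing02 and string04_1, are left alone because rstrip stops at the first character from the right that is not in the set.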

1 Comment

Thanks for pointing out this method, because I haven't seen it before.

Add a + quantifier so the character class matches one or more times; otherwise only a single special character would be replaced. Then anchor the match at the end of the string with $. Note that you don't need the capturing group around the character class:

[_+!@#$?^]+$ 

For example:

    import re

    strings = ['string01_', 'str_ing02_^', 'string03_@_', 'string04_1', 'string05_a_']

    for item in strings:
        clean_this = re.sub(r'[_+!@#$?^]+$', '', item)
        print(clean_this)


If you also want to remove whitespace characters at the end you could add \s to the character class:

[_+!@#$?^\s]+$ 
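
As a rough illustration of the whitespace variant, the sketch below applies it to a few made-up inputs with trailing spaces, tabs, and newlines. The sample values and the name `TRAILING` are assumptions for the example, not data from the question:

```python
import re

# Trailing run of special characters and/or whitespace, anchored at the end.
TRAILING = re.compile(r'[_+!@#$?^\s]+$')

# Hypothetical inputs with whitespace mixed into the trailing run.
samples = ['string01_ ', 'str_ing02_^\n', 'string03_@_\t ', 'string04_1']
cleaned_ws = [TRAILING.sub('', s) for s in samples]
print(cleaned_ws)
```

Without \s in the class, a trailing space after the last underscore would stop the pattern from matching at all, and the string would come back unchanged.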


4 Comments

For some reason, certain strings aren't being removed, such as 05_+_
my current regex is this: (re.sub(r'[!@#$%^&*<>?_\s]+$', '', item))
It seems to work ideone.com/qfhM1I Can you share the code where you pull the values from the list?
It seems that the key was changing str(item_name).strip().lower() to repr(item_name).strip().lower() in the section of code that created the list containing the data.

You need the end-of-string anchor $, together with a + quantifier so the whole trailing run is matched:

 clean_this = (re.sub(r'[_+!@#$?^]+$', '', item)) 
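
To see what the anchor changes, this small sketch compares the same character class with and without $ on one of the question's strings. The variable names are hypothetical:

```python
import re

item = 'str_ing02_^'

# Without the anchor, every run of special characters is removed.
everywhere = re.sub(r'[_+!@#$?^]+', '', item)

# With $, only the run that reaches the end of the string is removed.
only_end = re.sub(r'[_+!@#$?^]+$', '', item)

print(everywhere)
print(only_end)
```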


5 Comments

This works, but for some odd reason, items in my production data that end with _ are not always cleaned.
Could you show examples of data that cause those issues?
Some of the production data strings that are not being cleaned at the end: on_+_, 05____, 50__j_
Are you sure there is no trailing whitespace?
It seems that the key was changing str(item_name).strip().lower() to repr(item_name).strip().lower() in the section of code that created the list containing the data.
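
The thread does not say what the problematic production values actually contained. One way to investigate a case like this, assuming the cause is an invisible trailing character such as a space or newline, is to print the repr of each value before and after cleaning. The sample values and names below are hypothetical:

```python
import re

# Same pattern as above, with \s added so trailing whitespace is also removed.
TRAILING = re.compile(r'[_+!@#$?^\s]+$')

# Hypothetical "dirty" values: they look like on_+_ and 05____ when printed,
# but carry an invisible trailing space or newline.
raw_items = ['on_+_ ', '05____\n']

results = []
for raw in raw_items:
    fixed = TRAILING.sub('', raw)
    results.append(fixed)
    # repr() makes hidden characters such as ' ' and '\n' visible.
    print(repr(raw), '->', repr(fixed))
```

If repr shows a character after the last underscore, either strip it before applying the regex or add it to the character class.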