In a list I need to match specific instances, except for a specific combination of strings:

Let's say I have a list of strings like the following:

l = [ 'PSSTFRPPLYO', 'BNTETNTT', 'DE52 5055 0020 0005 9287 29', '210-0601001-41', 'BSABESBBXXX', 'COMMERZBANK' ] 

I need to match all the words that look like a SWIFT / BIC code. Such a code has the following form: 6 letters, followed by 2 letters or digits, followed by 3 optional letters or digits.

Hence I have written the following regex to match that pattern:

    import re

    regex = re.compile(r'(?<!\w)[a-zA-Z]{6}[a-zA-Z0-9]{2}([a-zA-Z0-9]{3})?(?!\w)')

    for item in l:
        match = regex.search(item)
        if match:
            print('found a match, the matched string {} the match {}'.format(
                item, item[match.start():match.end()]))
        else:
            print('found no match in {}'.format(item))

I need the following cases to be matched:

result = ['PSSTFRPPLYO', 'BNTETNTT', 'BSABESBBXXX' ] 

Instead, I get:

result = ['PSSTFRPPLYO', 'BNTETNTT', 'BSABESBBXXX', 'COMMERZBANK' ] 

So what I need is to match only the strings that don't contain the word 'bank'.

To do so, I have refined my regex to:

regex = re.compile((?<!bank/i)(?<!\w)[a-zA-Z]{6}[a-zA-Z0-9]{2}([a-zA-Z0-9]{3})?(?!\w)(?!bank/i)) 

I simply used a negative lookbehind and a negative lookahead; for more information about these two concepts, refer to link.

My regex doesn't do the filtering it was intended to do. What did I miss?

Comments:

  • (?!.*bank.*)^[a-z]{6}(?:[a-z0-9]{2})(?:[a-z0-9]{3})?$ with i modifier? Commented Oct 24, 2017 at 13:39
  • @ctwheels that's a nice trick thanks a lot. Commented Oct 24, 2017 at 13:47
  • My previous regex can actually be shortened to (?!.*bank.*)^[a-z]{6}(?:[a-z0-9]{2}|[a-z0-9]{5})$ (4 less characters and 2 less steps) or (?![a-z0-9]*bank[a-z0-9]*)^[a-z]{6}(?:[a-z0-9]{2}|[a-z0-9]{5})$ (more characters, but almost 400 less steps) Commented Oct 24, 2017 at 13:57
  • @ctwheels What did you use to analyze the number of steps? Commented Oct 24, 2017 at 13:59
  • regex101 Commented Oct 24, 2017 at 14:02
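
For readers who want to try the pattern from the comments in Python, here is a minimal sketch. It assumes the "i modifier" mentioned there corresponds to `re.IGNORECASE`, and the names `swift` and `result` are only illustrative.

```python
import re

l = ['PSSTFRPPLYO', 'BNTETNTT', 'DE52 5055 0020 0005 9287 29',
     '210-0601001-41', 'BSABESBBXXX', 'COMMERZBANK']

# Shortened pattern from the comments: the leading negative lookahead
# rejects any string containing "bank", then the anchored part requires
# 6 letters followed by either 2 or 5 letters/digits.
swift = re.compile(
    r'(?!.*bank.*)^[a-z]{6}(?:[a-z0-9]{2}|[a-z0-9]{5})$',
    re.IGNORECASE,  # assumed equivalent of the "i modifier"
)

result = [s for s in l if swift.search(s)]
print(result)
```

Because the lookahead sits at the start of the pattern, the "bank" check and the shape check happen in a single call per string.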

1 Answer

You can try this:

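    import re

    # Keep strings that start with 6 letters plus 2 word characters and do
    # not contain "bank" in any letter case. Note that the first pattern is
    # not anchored at the end, so longer strings also pass that check.
    final_vals = [
        i for i in l
        if re.findall(r'^[a-zA-Z]{6}\w{2}|(^[a-zA-Z]{6}\w{2}\w{3})', i)
        and not re.findall('BANK', i, re.IGNORECASE)
    ]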

Output:

['PSSTFRPPLYO', 'BNTETNTT', 'BSABESBBXXX'] 

1 Comment

Is there no way to include the condition in the regex, instead of traversing the string twice? Don't get me wrong, your solution does the job. Thanks a lot.
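
On folding the condition into the regex: a rough sketch of why the attempt in the question does not filter, and one possible single-pass variant. Two assumptions are made here. First, the question's pattern is wrapped in a raw string so that it compiles. Second, the intent of `/i` was case-insensitive matching, which Python expresses with `re.IGNORECASE` rather than a suffix. The names `attempt`, `single_pass` and `found` are hypothetical.

```python
import re

l = ['PSSTFRPPLYO', 'BNTETNTT', 'DE52 5055 0020 0005 9287 29',
     '210-0601001-41', 'BSABESBBXXX', 'COMMERZBANK']

# The question's second attempt. Python reads "bank/i" as the six literal
# characters b-a-n-k-/-i, and the two lookarounds only inspect the text
# just before and just after the match, never the matched text itself.
attempt = re.compile(
    r'(?<!bank/i)(?<!\w)[a-zA-Z]{6}[a-zA-Z0-9]{2}([a-zA-Z0-9]{3})?(?!\w)(?!bank/i)'
)
still_matches = attempt.search('COMMERZBANK') is not None  # True

# One possible fix: a negative lookahead placed at the start of the word,
# scanning the word characters ahead for "bank" before the code is matched.
single_pass = re.compile(
    r'(?<!\w)(?!\w*bank)[a-z]{6}[a-z0-9]{2}(?:[a-z0-9]{3})?(?!\w)',
    re.IGNORECASE,
)
found = [s for s in l if single_pass.search(s)]
print(found)
```

This variant keeps the word-boundary lookarounds from the question instead of `^` and `$`, so it should also work when the code is embedded in a longer line of text.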
