I need to match specific items in a list, except for those that contain a particular substring.
Let's say I have a list of strings like the following:
    l = ['PSSTFRPPLYO', 'BNTETNTT', 'DE52 5055 0020 0005 9287 29',
         '210-0601001-41', 'BSABESBBXXX', 'COMMERZBANK']

I need to match all the words that point to a SWIFT/BIC code. Such a code has the following form: 6 letters, followed by 2 letters/digits, followed by 3 optional letters/digits.
Hence I have written the following regex to match this pattern:

    import re

    regex = re.compile(r'(?<!\w)[a-zA-Z]{6}[a-zA-Z0-9]{2}([a-zA-Z0-9]{3})?(?!\w)')
    for item in l:
        match = regex.search(item)
        if match:
            print('found a match, the matched string {} the match {}'.format(
                item, item[match.start():match.end()]))
        else:
            print('found no match in {}'.format(item))

I need the following cases to be matched:
    result = ['PSSTFRPPLYO', 'BNTETNTT', 'BSABESBBXXX']

Instead I get:

    result = ['PSSTFRPPLYO', 'BNTETNTT', 'BSABESBBXXX', 'COMMERZBANK']

So what I need is to match only the strings that don't contain the word 'bank'.
To do so, I refined my regex to:

    regex = re.compile(r'(?<!bank/i)(?<!\w)[a-zA-Z]{6}[a-zA-Z0-9]{2}([a-zA-Z0-9]{3})?(?!\w)(?!bank/i)')

I simply used a negative lookbehind and a negative lookahead (for more information about these two concepts, refer to link).
My regex doesn't do the filtering it was intended to do. What did I miss?
How about this, with the i modifier?

    (?!.*bank.*)^[a-z]{6}(?:[a-z0-9]{2})(?:[a-z0-9]{3})?$

Alternatively (4 fewer characters and 2 fewer steps):

    (?!.*bank.*)^[a-z]{6}(?:[a-z0-9]{2}|[a-z0-9]{5})$

or (more characters, but almost 400 fewer steps):

    (?![a-z0-9]*bank[a-z0-9]*)^[a-z]{6}(?:[a-z0-9]{2}|[a-z0-9]{5})$
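For reference, here is a minimal sketch of how the first suggestion could be applied in Python. Two things are worth noting about the attempt in the question: Python's `re` module does not understand a trailing `/i`, so `bank/i` is matched as the literal text "bank/i", and a lookbehind or lookahead placed at the edges of the pattern only inspects the text before or after the match, while "BANK" sits inside the matched string. The sketch below uses `re.IGNORECASE` in place of the `i` modifier; the names `candidates` and `bic_without_bank` are made up for illustration.

```python
import re

# Candidate strings taken from the question.
candidates = [
    'PSSTFRPPLYO',
    'BNTETNTT',
    'DE52 5055 0020 0005 9287 29',
    '210-0601001-41',
    'BSABESBBXXX',
    'COMMERZBANK',
]

# The negative lookahead rejects any string that contains "bank" anywhere.
# The rest is the BIC shape: 6 letters, 2 letters/digits, then 3 optional
# letters/digits. re.IGNORECASE plays the role of the "i" modifier.
bic_without_bank = re.compile(
    r'(?!.*bank)^[a-z]{6}[a-z0-9]{2}(?:[a-z0-9]{3})?$',
    re.IGNORECASE,
)

result = [item for item in candidates if bic_without_bank.search(item)]
print(result)
```

Because the pattern is anchored with `^` and `$`, this assumes each list item is a single candidate code rather than a longer text with a code embedded in it.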