40

I'm learning about regular expressions. I don't know how to combine several regular expressions into a single generic one.

I want to write a single regular expression that works for multiple cases. I know this can be done with the naive approach of using the or operator, "|".

I don't like this approach. Can anybody suggest a better one?

0

5 Answers

41

Combine all your patterns into a single alternation and compile it. Check this example:

    import re

    re1 = r'\d+\.\d*[L][-]\d*\s[A-Z]*[/]\d*'
    re2 = r'\d*[/]\d*[A-Z]*\d*\s[A-Z]*\d*[A-Z]*'
    re3 = r'[A-Z]*\d+[/]\d+[A-Z]\d+'
    re4 = r'\d+[/]\d+[A-Z]*\d+\s\d+[A-Z]\s[A-Z]*'

    # Compile the combined pattern once, outside the loop.
    generic_re = re.compile("(%s|%s|%s|%s)" % (re1, re2, re3, re4))

    sentences = [string1, string2, string3, string4]  # your input strings
    for sentence in sentences:
        matches = generic_re.findall(sentence)
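As a runnable illustration of the same idea, here is a minimal sketch. The two patterns and the sample sentences are hypothetical stand-ins, not the patterns from the question, and the alternation is wrapped in a non-capturing group `(?:...)` so that `findall` returns the whole match.

```python
import re

# Hypothetical patterns and inputs, chosen only for illustration.
date_re = r'\d{4}-\d{2}-\d{2}'
time_re = r'\d{2}:\d{2}'

# Compile the alternation once and reuse it for every input string.
combined = re.compile("(?:%s|%s)" % (date_re, time_re))

sentences = ["released 2019-01-02 at 10:30", "no match here"]
results = [combined.findall(s) for s in sentences]
print(results)
```

Each inner list holds the matches for one sentence, in the order they occur in that sentence.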

5 Comments

@Amit I've fixed it. I used the variable name that you wrote, "generic-re", and it caused the error.
A character class with only one element is nonsense and makes the regex harder to read.
Thanks, it's working! I got the answer for this question: stackoverflow.com/questions/53947401/…
The variable sentence is not defined, and findall takes a string, not a list. Maybe you meant to write for sentence in sentences?
See also stackoverflow.com/a/36870447/288875 to confirm that the | operator in fact has the lowest precedence (i.e. it has a 'weaker binding force' than any other operator which may be used in any of re1...re4).
15

To run findall with an arbitrary series of REs, all you have to do is concatenate the lists of matches that each one returns:

    import re

    re_list = [
        r'\d+\.\d*[L][-]\d*\s[A-Z]*[/]\d*',       # re1 in question
        # ...
        r'\d+[/]\d+[A-Z]*\d+\s\d+[A-Z]\s[A-Z]*',  # re4 in question
    ]

    matches = []
    for r in re_list:
        matches += re.findall(r, string)  # string is the text to search

For efficiency it would be better to use a list of compiled REs.
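A minimal sketch of the compiled-list variant follows. The patterns and the sample text are hypothetical and exist only to make the snippet runnable.

```python
import re

# Hypothetical patterns and input text, for illustration only.
pattern_strings = [r'\d+', r'[A-Z]{2,}']

# Compile each pattern once, up front.
compiled = [re.compile(p) for p in pattern_strings]

text = "order 66 from NASA and 7 from ESA"
matches = []
for rx in compiled:
    matches += rx.findall(text)
print(matches)
```

Note that the results are grouped by pattern, not by position in the text: all matches of the first pattern come before any match of the second.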

Alternatively you could join the element RE strings using

generic_re = re.compile( '|'.join( re_list) ) 
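When the patterns are joined this way, it is often useful to know which alternative produced each match. One possible approach, sketched here with hypothetical pattern names and input, is to wrap each pattern in a named group and read `lastgroup` from each match object.

```python
import re

# Hypothetical names and patterns, for illustration only.
named = {'number': r'\d+', 'word': r'[A-Za-z]+'}

# Wrap every pattern in a named group, then join the groups with '|'.
generic_re = re.compile(
    '|'.join('(?P<%s>%s)' % (name, pat) for name, pat in named.items())
)

text = "abc 123 de"
# lastgroup is the name of the group that matched.
tagged = [(m.lastgroup, m.group()) for m in generic_re.finditer(text)]
print(tagged)
```

This assumes the individual patterns contain no groups of their own. Nested groups would change what `lastgroup` reports.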

2 Comments

Are you sure the above works? I get 'str' object has no attribute 'findall' just by copying and pasting.
@gented Silly error on my part, Any variable name except re, which is what you imported! I'll edit my answer.
4

I see lots of people using pipes, but that seems to match only the first instance. If you want every pattern to match, try using lookaheads.

Example:

    >>> import re
    >>> fruit_string = "10a11p"
    >>> fruit_regex = r'(?=.*?(?P<pears>\d+)p)(?=.*?(?P<apples>\d+)a)'
    >>> re.match(fruit_regex, fruit_string).groupdict()
    {'pears': '11', 'apples': '10'}
    >>> re.match(fruit_regex, fruit_string).group(0)
    ''
    >>> re.match(fruit_regex, fruit_string).group(1)
    '11'

(?= ...) is a look ahead:

Matches if ... matches next, but doesn’t consume any of the string. This is called a lookahead assertion. For example, Isaac (?=Asimov) will match 'Isaac ' only if it’s followed by 'Asimov'.

.*?(?P<pears>\d+)p finds a number followed by a p anywhere in the string and names the number "pears".
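Because each lookahead scans from the same starting position, the order of the items in the string should not matter, but every lookahead has to succeed. A small sketch under that reading, reusing the pattern above with two extra hypothetical inputs:

```python
import re

fruit_regex = re.compile(r'(?=.*?(?P<pears>\d+)p)(?=.*?(?P<apples>\d+)a)')

# "11p10a" and "10a" are hypothetical inputs added for illustration.
results = {}
for s in ["10a11p", "11p10a", "10a"]:
    m = fruit_regex.match(s)
    # groupdict() when both lookaheads succeed, otherwise None.
    results[s] = m.groupdict() if m else None
print(results)
```

The last input has no pears, so the first lookahead fails and there is no match at all.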

1 Comment

At first I didn't understand what you meant by 'that only seems to match the first instance', but then I realized you were looking to find the first match of multiple regexes in one pass, instead of every match of multiple regexes (which the other answers have shown can be done with something like findall). That's an interesting problem to solve. Neat solution. Though I'm not sure how you got '10a,11p' from group(0). When I ran it, it just gave me ''. Did you mean groups()?
0

You might not need to compile the patterns at all. If one pattern should be followed directly by the other, you can concatenate the pattern strings. Here is a way; let's see if it works for you.

    >>> import re
    >>> text = 'aaabaaaabbb'
    >>> A = 'aaa'
    >>> B = 'bbb'
    >>> re.findall(A+B, text)
    ['aaabbb']
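Note that concatenating two pattern strings means "A followed by B", which is different from joining them with "|" ("A or B"). A short sketch contrasting the two on the same text:

```python
import re

text = 'aaabaaaabbb'
A = 'aaa'
B = 'bbb'

# Concatenation: A immediately followed by B.
sequence_matches = re.findall(A + B, text)

# Alternation: either A or B, wherever each one occurs.
either_matches = re.findall(A + '|' + B, text)

print(sequence_matches)
print(either_matches)
```

Which form is appropriate depends on whether the patterns describe consecutive parts of one string or independent alternatives.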

Further reading: read_doc


0

If you need to squash multiple regex patterns together, the result can be annoying to parse, unless you use named groups, (?P<name>...), and .groupdict(), but doing that can be pretty verbose and hacky. If you only need a couple of matches, then doing something like the following could be mostly safe:

bucket_name, blob_path = tuple(item for item in matches.groups() if item is not None) 
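To show where a line like this might sit, here is a self-contained sketch. The two URL shapes, the host name, and the helper `split_location` are hypothetical, invented only for illustration. Each alternative contributes two groups, and the groups of the alternative that did not match come back as `None`.

```python
import re

# Hypothetical URL shapes; real formats may differ.
gs_re = r'gs://([^/]+)/(.+)'
https_re = r'https://storage\.example\.com/([^/]+)/(.+)'
combined = re.compile('^(?:%s|%s)$' % (gs_re, https_re))

def split_location(url):
    """Return (bucket_name, blob_path) for either URL shape."""
    matches = combined.match(url)
    if matches is None:
        raise ValueError("unrecognized location: %r" % url)
    # Drop the groups that belong to the alternative that did not match.
    bucket_name, blob_path = tuple(
        item for item in matches.groups() if item is not None
    )
    return bucket_name, blob_path

print(split_location('gs://my-bucket/a/b.txt'))
print(split_location('https://storage.example.com/other/x.csv'))
```

The filtering is only safe when every alternative yields the same number of non-optional groups. An optional group that legitimately fails to match would break the unpacking.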
