x = r"abc"
y = r"def"
z = join([x,y], "|")
z # => r"r\"abc\"|r\"def\""

Is there a way to join (and, in general, manipulate) Regex objects that deals only with the regex content, i.e. does not treat the r modifier as if it were part of the content? The desired output for z is:

z # => r"abc|def" 
  • What is the output you're getting? Commented Dec 9, 2013 at 19:27
  • @UriMikhli It is the last line in the first code block. Commented Dec 9, 2013 at 19:28
  • Well, there's Regex(join([x.pattern,y.pattern], "|")), but that's not very pretty, and I don't know how it would behave in more complex cases. Commented Dec 9, 2013 at 19:30
  • @DSM Not pretty but better than I had, I didn't know about the pattern attribute! Commented Dec 9, 2013 at 19:32
  • I think you should open this as an issue or maybe a pull request on github.com/julialang/julia. I think this behaviour is an oversight. Commented Dec 9, 2013 at 19:51

2 Answers

macro p_str(s) s end
x = p"abc"
y = p"def"
z = Regex(join([x,y], "|"))

The r"quote" operator actually compiles a regular expression for you, which takes time. If you only have parts of a regular expression that you want to use to build a bigger one, then you should store the parts using "regular quotes".

But what about the sketchy escaping rules that you get with r"quote" versus "regular quotes", you ask? If you want the sketchy r"quote" rules but don't want to compile a regular expression immediately, then you can use a macro like:

macro p_str(s) s end 

Now you have a p"quote" that escapes like an r"quote" but just returns a string.
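A quick way to check this is the sketch below. It assumes a recent Julia (1.x), where occursin and the pattern field are available; the variable names frag and re are only illustrative. It shows that the backslash in a p"quote" is kept literally and survives join() before the fragments are compiled with Regex().

```julia
macro p_str(s) s end                # same macro as above

frag = p"\d+"                       # backslash is kept, as in r"\d+"
println(frag == "\\d+")             # true: same as the escaped ordinary string

# Join the uncompiled fragments, then compile once.
re = Regex(join([p"\d+", p"[a-z]+"], "|"))
println(re.pattern)                 # \d+|[a-z]+
println(occursin(re, "42"))         # true
```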

Not to go off topic, but you might define a bunch of quotes for getting around tricky alphabets. Here are some convenient ones:

                                     # "baked\nescape" -> baked\nescape
macro p_mstr(s) s end                # p"""raw\nescape""" -> raw\\nescape
macro dq_str(s) "\"" * s * "\"" end  # dq"with quotes" -> "with quotes"
macro sq_str(s) "'" * s * "'" end    # sq"with quotes" -> 'with quotes'
macro s_mstr(s) strip(lstrip(s)) end # s""" "stripme" """-> "stripme"

When you're done making fragments you can do your join and make a regex like:

myre = Regex(join([x, y], "|")) 

Just like you thought.

If you want to learn more about what members an object has (such as Regex.pattern) try:

julia> dump(r"pat")
Regex
  pattern: ASCIIString "pat"
  options: Uint32 33564672
  regex: Array(Uint8,(61,)) [0x45,0x52,0x43,0x50,0x3d,0x00,0x00,0x00,0x00,0x28 … 0x1d,0x70,0x1d,0x61,0x1d,0x74,0x72,0x00,0x09,0x00]
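If the parts are already Regex objects rather than strings, the pattern field shown in the dump can be used to build the combined pattern. The following is only a sketch, assuming a recent Julia (1.x): join_patterns is a hypothetical helper name, not something in Base, and wrapping each part in a non-capturing group (?:...) is an added assumption so that the separator cannot split a more complex part. Any flags on the input regexes are not carried over.

```julia
# Hypothetical helper (not part of Base): combine regexes via their source patterns.
# Each part is wrapped in a non-capturing group so "|" cannot split a part.
# Note: flags on the inputs (e.g. r"..."i) are ignored here.
join_patterns(res, sep="|") = Regex(join(["(?:" * re.pattern * ")" for re in res], sep))

x = r"abc"
y = r"def"
z = join_patterns([x, y])

println(z.pattern)             # (?:abc)|(?:def)
println(occursin(z, "xdefx"))  # true
```

For simple literals like these, the groups are not strictly needed; they only matter once a part contains its own alternation or quantifiers.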

2 Comments

Thanks Michael. It seems the answer to my question is no. Your answer contains some cool stuff (I didn't even know about dump()), but I already understand that I can construct regexes by manipulating string parts and then calling Regex(). The specific scenario I have, though, is when one has regexes and not strings. I guess in that case you have to use pattern.
It seems that once you use join() to combine the p-strings the escaping reverts to what it would normally be in a string. So the combined pattern does not have the correct escaping after all. I could be missing something, of course, since I am new to Julia.

Instead of joining regexes, I think it is better to join strings and then convert the result to a regex. That way, you can solve your problem as follows:

x = "abc"
y = "def"
z = Regex(join([x,y], "|"))
println(z)

You should get r"abc|def" as the output.


Note: This builds on Michael Fox's answer, with the macro removed.
