
I'm trying to match a special kind of string literal with some funky escaping rules.

The general form looks like this:

"some string" 

These are simple to match using a pattern such as "(.*?)"

However, you can escape quotes by doubling them, for example:

"hello "" there" becomes hello " there
"hello """" there" becomes hello "" there

And this is where my regex skills fail me. How can I match strings like this?

Oh, and I'm using Python 3.1.

  • I think that this way of formatting is better. Feel free to rollback if you don't think so. Commented Jun 3, 2013 at 18:36
  • It definitely looks better. Commented Jun 3, 2013 at 18:38
  • Although, I'm still not sure what you are trying to do. You have input strings "hello "" there" and "hello """" there" and want the output to be hello " there and hello "" there respectively, is that right? Commented Jun 3, 2013 at 18:41
  • The regex doesn't have to unescape the string, I can do that later, however I need to match the entire string. My current pattern stops matching when hitting the double " while it should continue until the first " which isn't doubled. Commented Jun 3, 2013 at 18:45
  • What would you expect in the case of """? Can you match the string first and then replace the quotes? Commented Jun 3, 2013 at 18:46

2 Answers

regex = re.compile(r'"(?:[^"]|"")*"') 

This just finds the literals; it doesn't decode them by replacing the doubled quotes.
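As a minimal sketch of how the matching and the later decoding step could fit together: the pattern below is the one above with a capturing group added around the body, and the names LITERAL, find_literals and sample are hypothetical, chosen only for illustration.

```python
import re

# Same pattern as above, plus a capturing group around the literal's body.
# LITERAL and find_literals are illustrative names, not part of any library.
LITERAL = re.compile(r'"((?:[^"]|"")*)"')

def find_literals(text):
    """Return the decoded body of every quoted literal found in text."""
    # Decoding is a separate step: collapse each doubled quote to a single one.
    return [m.group(1).replace('""', '"') for m in LITERAL.finditer(text)]

sample = 'a "some string" b "hello "" there" c "hello """" there"'
print(find_literals(sample))
# ['some string', 'hello " there', 'hello "" there']
```

Each alternative inside the group consumes either one non-quote character or one doubled quote, so the match can only end at a quote that is not doubled.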



Not using a regular expression, but you've specified Python, so here's a way to get your expected output:

>>> import csv
>>> strings = ['"some string"', '"hello "" there"', '"hello """" there"']
>>> for s in strings:
...     print(next(csv.reader([s])))
...
['some string']
['hello " there']
['hello "" there']
