I have a string:

mystr = "&marker1\nThe String that I want /\n&marker1\nAnother string that I want /\n" 

What I want is a list of the substrings between the markers start="&marker1" and end="/\n". Thus, the expected result is:

whatIwant = ["The String that I want", "Another string that I want"] 

I've read the answers here:

  1. Find string between two substrings [duplicate]
  2. How to extract the substring between two markers?

I tried this, but without success:

>>> import re
>>> mystr = "&marker1\nThe String that I want /\n&marker1\nAnother string that I want /\n"
>>> whatIwant = re.search("&marker1(.*)/\n", mystr)
>>> whatIwant.group(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'

What could I do to resolve this? Also, my actual string is very long:

>>> len(myactualstring)
7792818

2 Answers

"What could I do to resolve this?"

I would do:

import re
mystr = "&marker1\nThe String that I want /\n&marker1\nAnother string that I want /\n"
found = re.findall(r"\&marker1\n(.*?)/\n", mystr)
print(found)

Output:

['The String that I want ', 'Another string that I want '] 

Note that:

  • & has no special meaning in Python re patterns, so the escape (\&) is optional; it is harmless here and still matches a literal &
  • . matches any character except a newline (unless the re.DOTALL flag is set), which is why the original pattern, with no \n after the marker, finds nothing
  • findall is a better choice than search if you just want a list of the matched substrings
  • *? is non-greedy; in this case .* would work too, because . does not match a newline, but in other cases you might end up matching more than you wish
  • I used a so-called raw string (r-prefixed) to make escaping easier

See the re module documentation for a discussion of raw-string usage and the list of characters with special meaning.
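To make the points about . and greediness concrete, here is a minimal sketch using the string from the question. The variable names failed, greedy and lazy are only illustrative; the re.DOTALL flag is used here purely to show what changes once . is allowed to match a newline.

```python
import re

mystr = "&marker1\nThe String that I want /\n&marker1\nAnother string that I want /\n"

# The attempt from the question: "." cannot cross the "\n" that follows the
# marker, so the pattern never reaches "/\n" and search() returns None.
failed = re.search(r"&marker1(.*)/\n", mystr)
print(failed)

# With re.DOTALL, "." also matches "\n". A greedy ".*" then runs on to the
# LAST "/\n", swallowing the second marker as well.
greedy = re.search(r"&marker1\n(.*)/\n", mystr, re.DOTALL).group(1)
print(repr(greedy))

# The non-greedy ".*?" stops at the FIRST "/\n" after each marker, even with
# re.DOTALL, so every block is captured separately.
lazy = re.findall(r"&marker1\n(.*?)/\n", mystr, re.DOTALL)
print(lazy)
```

If a wanted substring could itself span several lines, the non-greedy form with re.DOTALL is probably the one to reach for.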

1 Comment

Is there a limit on the length of a string that can be processed by the regular expressions module?

Consider this option using re.findall:

import re
mystr = "&marker1\nThe String that I want /\n&marker1\nAnother string that I want /\n"
matches = re.findall(r'&marker1\n(.*?)\s*/\n', mystr)
print(matches)

This prints:

['The String that I want', 'Another string that I want'] 

Here is an explanation of the regex pattern:

&marker1    match a marker
\n          newline
(.*?)       match AND capture all content until reaching the first
\s*         optional whitespace, followed by
/\n         / and newline

Note that re.findall will only capture what appears in the (...) capture group, which is what you are trying to extract.
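Since the question mentions a string of several million characters, it may help to iterate over the matches instead of building the whole list at once. The following is only a sketch under that assumption: iter_between is a hypothetical helper name, not something from the re module, and its default start and end arguments are simply the markers from the question.

```python
import re


def iter_between(text, start="&marker1\n", end="/\n"):
    """Yield each substring found between start and end, one at a time.

    Hypothetical helper for illustration: the markers are passed through
    re.escape so they are matched literally, and the optional whitespace
    before the end marker is left out of the captured group.
    """
    pattern = re.compile(re.escape(start) + r"(.*?)\s*" + re.escape(end))
    for match in pattern.finditer(text):
        yield match.group(1)


mystr = "&marker1\nThe String that I want /\n&marker1\nAnother string that I want /\n"
result = list(iter_between(mystr))
print(result)

# The same helper on a much longer input built by repeating the sample string.
long_count = sum(1 for _ in iter_between(mystr * 1000))
print(long_count)
```

re.finditer yields match objects lazily, so each substring can be processed as it is found; whether that matters in practice depends on how many matches the real string contains.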
