This has been driving me crazy; I can't find a solution that works! I'm trying to run a regex between a couple of tags. I've heard that's a bad idea, but it's necessary this time :P At the start I have a <body class="foo"> tag, where foo can vary between files. Searching for <body.*?> works fine to locate the only copy in each file.

At the end I have a <div id="bar">, bar doesn't change between files.

For example:

<body class="foo"> sometext some more text <maybe even some tags> <div id="bar"> 

What I need to do is select everything between the two tags, not including the tags themselves: everything from the closing > of body to the opening < of div, i.e. from sometext through <maybe even some tags>.

I've tried a bunch of things, mostly variations on (?<=<body.*>)(.*?)(?=<div id="bar">), but at worst I get an invalid-expression error (in Notepad++ and on http://regexpal.com/) and at best I get no matches.

Any help appreciated!


2 Answers

2

You are attempting to use a variable-length lookbehind, which most regular expression flavors, including the one in Notepad++, do not support. I assume you are using Notepad++, so you can use the \K escape sequence instead.

<body[^>]*>\K.*?(?=<div id="bar">) 

The \K escape sequence resets the starting point of the reported match, so any characters consumed before it are no longer included in the result. Make sure the ". matches newline" checkbox is checked as well.
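
As a rough illustration of the lookbehind limitation described above, here is a minimal sketch in Python. Python is an assumption here: neither the question nor this answer uses it, and Notepad++ has its own regex engine with its own error messages. Python's built-in re module is one of the flavors that rejects variable-length lookbehind, and it does not support \K either, so this only shows why the pattern from the question is reported as invalid.

```python
import re

# The pattern from the question. Inside the lookbehind, ".*" can match any
# number of characters, so the lookbehind has no fixed width.
pattern = r'(?<=<body.*>)(.*?)(?=<div id="bar">)'

try:
    re.compile(pattern)
    compiled = True
    message = ""
except re.error as exc:
    # Python's re refuses to compile the pattern at all.
    compiled = False
    message = str(exc)

print(compiled)
print(message)
```

Other engines word the error differently, but the cause is the same: the engine cannot tell how far back to look.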

Alternatively, you can use a capturing group and avoid using lookaround assertions.

<body[^>]*>(.*?)<div id="bar"> 

Note: With a capturing group, refer to group 1 to get your result. The overall match still includes both tags; only group 1 holds the text between them.
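
For illustration only, the capturing-group approach might look like this in Python. This is an assumption, since the question is about Notepad++; the sample text and the variable names are made up for the sketch. Here re.DOTALL plays the role of the ". matches newline" checkbox.

```python
import re

# Hypothetical sample input modelled on the example in the question,
# with line breaks added to show why "dot matches newline" matters.
html = '<body class="foo">\nsometext some more text\n<maybe even some tags>\n<div id="bar">'

# re.DOTALL lets "." match newline characters as well.
pattern = re.compile(r'<body[^>]*>(.*?)<div id="bar">', re.DOTALL)

match = pattern.search(html)
whole = match.group(0)  # the entire match, including both tags
inner = match.group(1)  # only the text captured between the tags

print(repr(whole))
print(repr(inner))
```

The difference between group 0 and group 1 is the point: a tool that highlights the whole match will show the tags, while group 1 contains only the content between them.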

2 Comments

Thanks for your input :) Unfortunately this is also grabbing the <body> and <div> tags. I might be a little out of my depth getting it to ignore those, I don't suppose you'd know a solution?
At the moment I'm using Notepad++ on Windows and regexpal.com to test. Both grabbed the tags :/
1

Use the following pattern:

/<body[^>]*>(.*?)<div id="bar">/ 

3 Comments

Thanks for taking the time to reply! Unfortunately this grabs nothing in Notepad++. On regexpal.com it grabs nothing unless I remove the '/' marks, and then it grabs the outer tags I was trying to avoid.
@user891141 I assumed that you were working in JavaScript, where /pattern/ is the literal syntax for a regex. As for grabbing the body and div tags, I think that you are confusing grabbing with grouping: the overall match includes the tags, but the capturing group does not.
Quite possibly was confusing my terms :P Thanks again for your input :)
