0

I need help with this regexp.

using

/\{block:(Posts|Photos|Videos)(\s\[.*?\])?\}(\s?[^\"]+\s?)\{\/block\}/U 

I get {block:Posts}abcdef{/block} from this:

<div> {block:Posts [a=1, b=2]} abcdef {/block} </div> 

But if my text is like this:

<div> {block:Posts [a=1, b=2]} {block:Text} abcdef {/block} {/block} </div> 

I get {block:Posts}{block:Text}abcdef{/block} because the match ends at the first {/block} found in the text.

A simple way to avoid this is to close the block with {/block:Posts}, but how can I do that when the name in the opening tag can be any of (Posts|Photos|Videos)? If I open the block with Photos, it must be closed with {/block:Photos}.

Using /\{block:(Posts|Photos|Videos)(\s\[.*?\])?\}(\s?[^\"]+\s?)\{\/block\:(Posts|Photos|Videos)\}/U of course doesn't help, because the closing name is matched independently of the opening one.

Can anyone please help me?

Thanks!!

PS
Is it possible, by modifying the regex above, to get the optional parameters a and b as an array?

2 Answers

1

There might be an overall better solution for your problem, but you can use a backreference in this case, as (Posts|Photos|Videos) is already a capture group:

\{\/block:\1\} 
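
As a rough illustration of the backreference, here is a minimal Python sketch. It is an assumption-laden translation, not the asker's PHP code: Python's `re` has no equivalent of the PCRE `/U` modifier, so the lazy quantifiers are written out explicitly, and the sample string is made up for this example.

```python
import re

# Group 1 captures the block name; \1 in the closing tag must repeat it exactly.
# Laziness is spelled out with ? because Python has no /U (ungreedy) modifier.
BLOCK_RE = re.compile(
    r'\{block:(Posts|Photos|Videos)(\s\[.*?\])?\}(\s?[^"]+?\s?)\{/block:\1\}'
)

# Hypothetical input: the outer block is closed with its own name.
text = "<div> {block:Posts [a=1, b=2]} {block:Text} abcdef {/block} {/block:Posts} </div>"

match = BLOCK_RE.search(text)
name = match.group(1)            # block name from the opening tag
params = match.group(2)          # raw parameter text, including the leading space
body = match.group(3).strip()    # everything up to the matching named close

# A closing tag with a different name does not match at all.
mismatch = BLOCK_RE.search("{block:Photos} abc {/block:Posts}")
```

Here `body` keeps the inner `{block:Text} abcdef {/block}` intact, and `mismatch` is `None` because `\1` only accepts `Photos` as the closing name.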

7 Comments

Is it possible to get the optional parameters [a=1, b=2] as an array? array('a'=>1, 'b'=>2)
So your language is PHP I assume? I think you have to extract that text and process it later on.
Yes it is. So you recommend to process it later? Assuming I get the params in plain text like this [a=1, b=2], would you use an explode (or stuff like that) or another regex?
Ah, you wrote "There might be an overall better solution". Can you suggest one?
No, otherwise I would have ;) I was just thinking about whether your expression could be simplified or whether there is a library which lets you define your own template language (and parse it).
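
One possible way to do the "process it later" step from the comments above, sketched in Python. The helper name `parse_params` is hypothetical, and the sketch assumes parameters are simple `key=value` pairs separated by commas. In PHP, the same idea could be expressed with `preg_match_all`.

```python
import re

def parse_params(raw):
    """Turn a captured string like ' [a=1, b=2]' into {'a': '1', 'b': '2'}.

    Values stay strings; convert them afterwards if numbers are needed.
    """
    inner = raw.strip().strip("[]")
    params = {}
    # Each pair is a word, an equals sign, then everything up to the next comma.
    for key, value in re.findall(r"(\w+)\s*=\s*([^,\]]+)", inner):
        params[key] = value.strip()
    return params

result = parse_params(" [a=1, b=2]")
empty = parse_params("")
```

Splitting the work this way keeps the block regex simple: it only captures the bracketed text, and a second small step turns that text into a key/value structure.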
1

You can do this using a backreference:

\{block:(Posts|Photos|Videos)(\s\[.*?\])?\}(\s?[^\"]+\s?)\{\/block:\1\} 

Note the added backreference \1 at the end. The backreference will match whatever was matched by the first group, i.e. the first pair of parentheses, in our case (Posts|Photos|Videos).

Note, however, that in general regular expressions are too limited to parse languages like HTML, as explained by this post. Languages which require counting opening entities (like brackets or tags) and then matching the exact number of closing entities can't be expressed using regular expressions. Other examples of languages that aren't regular for this reason are the language of arithmetic expressions with parentheses and the language of strings of the form aa...abb...b with the same number of a's and b's. The general proof of this fact uses the Pumping Lemma.
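
To make the counting limitation concrete, here is a small Python sketch with made-up inputs. It uses a simplified pattern (no parameters, `.*` for the body), so it is an illustration rather than the pattern from the question. When blocks with the same name are nested, the lazy version stops too early; when they are siblings, the greedy version runs too far.

```python
import re

LAZY_RE = re.compile(r"\{block:(Posts|Photos|Videos)\}(.*?)\{/block:\1\}", re.DOTALL)
GREEDY_RE = re.compile(r"\{block:(Posts|Photos|Videos)\}(.*)\{/block:\1\}", re.DOTALL)

# Same-named blocks nested inside each other (hypothetical input).
nested = "{block:Posts} outer {block:Posts} inner {/block:Posts} tail {/block:Posts}"
# Two same-named blocks next to each other (hypothetical input).
siblings = "{block:Posts} a {/block:Posts} {block:Posts} b {/block:Posts}"

# Lazy matching pairs the OUTER open tag with the INNER close tag.
lazy_nested = LAZY_RE.search(nested).group(0)

# Greedy matching swallows both sibling blocks as if they were one.
greedy_siblings = GREEDY_RE.search(siblings).group(0)
```

Neither quantifier choice handles both inputs, because the pattern has no way to track how many blocks are currently open.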

Note also that regular expressions as used in software tools are usually a bit more powerful than bare mathematical regular expressions, because these tools add a number of features beyond the basic operations of union, concatenation and Kleene star. Backreferences themselves constitute a major enhancement of regular expressions and allow one to express languages that are not considered regular in the mathematical sense. This is why your problem has a solution at all. Counting of opening and closing entities is still impossible, though.

5 Comments

What I'm trying to build is a Tumblr-like theme parser that lets users use {block} to render certain content (such as news, photos, videos, ...) the way they like. Reading the post and the link you suggested really taught me a lot, and of course I know that writing a fast and reliable parser is not easy at all. At the moment regex is the best I can manage, but I'll work on improving that! Can you suggest some useful documentation? Thanks for your help!
If you want to parse a simple language with opening and closing entities (here: {block:...} and {/block}), you may find that a simple recursive descent parser is a good solution. These are more powerful than regular expressions, but still very easy to program (example code is in this article: en.wikipedia.org/wiki/Recursive_descent_parser). If your needs go beyond that, you may want to use a compiler-compiler like yacc or bison to generate a parser for you, but most of them will require you to come up with a formal grammar for the language you need to parse.
Also, you can implement a recursive descent parser in any language of your choice, while compiler-compilers impose some limitations. Before you decide, however, you should ensure that an RDP is powerful enough to parse what you need.
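
A minimal sketch of what such a recursive descent parser might look like for the block syntax in the question, written in Python. Everything here is an assumption made for illustration: the function names, the tree shape (plain strings for text, dicts for blocks), and the choice to accept both {/block} and {/block:Name} as closing tags.

```python
import re

# Assumed tag syntax: {block:Name [params]} ... {/block} or {/block:Name}
TAG_RE = re.compile(r"\{block:(\w+)(?:\s\[(.*?)\])?\}|\{/block(?::(\w+))?\}")

def tokenize(text):
    """Split text into ('open', name, params), ('close', name) and ('text', s) tokens."""
    tokens, pos = [], 0
    for m in TAG_RE.finditer(text):
        if m.start() > pos:
            tokens.append(("text", text[pos:m.start()]))
        if m.group(0).startswith("{/"):
            tokens.append(("close", m.group(3)))
        else:
            tokens.append(("open", m.group(1), m.group(2)))
        pos = m.end()
    if pos < len(text):
        tokens.append(("text", text[pos:]))
    return tokens

def parse_content(tokens, i, open_name=None):
    """Parse from index i until the close tag of the enclosing block.

    Returns (nodes, next_index). Nested blocks are handled by recursion.
    """
    nodes = []
    while i < len(tokens):
        tok = tokens[i]
        if tok[0] == "text":
            nodes.append(tok[1])
            i += 1
        elif tok[0] == "open":
            children, i = parse_content(tokens, i + 1, open_name=tok[1])
            nodes.append({"name": tok[1], "params": tok[2], "children": children})
        else:  # close tag
            if open_name is None:
                raise ValueError("closing tag without an open block")
            if tok[1] is not None and tok[1] != open_name:
                raise ValueError("block %r closed as %r" % (open_name, tok[1]))
            return nodes, i + 1
    if open_name is not None:
        raise ValueError("block %r was never closed" % open_name)
    return nodes, i

def parse(text):
    nodes, _ = parse_content(tokenize(text), 0)
    return nodes

tree = parse("<div> {block:Posts [a=1, b=2]} {block:Text} abcdef {/block} {/block} </div>")
```

The regex is only used to find individual tags. Pairing each open tag with its close is done by the recursion, which is what lets nesting of any depth work.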
I've just realized that if my code is {block:Posts}PLAINTEXT{/block:Posts} the regex works fine, but if there's HTML inside the block {block:Posts}<div id="xyz">some text inside HTML tag</div>{/block:Posts} the regex doesn't work anymore...! How can I fix that?
Your regexp disallows double quotes (") between {block...} and {/block...}: the body is matched by (\s?[^\"]+\s?), and the character class [^\"] excludes the quote character.
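
One way to lift that restriction, sketched in Python under the assumption that the body may contain any character, is to replace the negated class with a lazy `.*?` and let the dot match newlines too (`re.DOTALL` here; in PCRE the corresponding option is the `s` modifier). This is a swapped-in body pattern, not the one from the question.

```python
import re

# Body as in the question, with laziness written out: no double quotes allowed.
OLD_RE = re.compile(
    r'\{block:(Posts|Photos|Videos)(\s\[.*?\])?\}(\s?[^"]+?\s?)\{/block:\1\}'
)
# Relaxed body: any characters, as few as possible, up to the named closing tag.
NEW_RE = re.compile(
    r"\{block:(Posts|Photos|Videos)(\s\[.*?\])?\}(.*?)\{/block:\1\}",
    re.DOTALL,
)

html = '{block:Posts}<div id="xyz">some text inside HTML tag</div>{/block:Posts}'

old_match = OLD_RE.search(html)        # fails because of the quotes in id="xyz"
body = NEW_RE.search(html).group(3)    # the HTML between the two tags
```

With the relaxed body, the named closing tag is the only thing that ends the match, so quotes and other markup inside the block no longer matter.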
