
I need to define a regular expression (for a Java program) that gives me only the last matching group of a path. The reason is that I am converting some text files (an export of an old wiki) to the format of a new wiki.

For example, when I have the following text:

Here another include: [[Include(a/a-1)]] 

The hierarchy of the pages is:

/a
  /a-1

The old wiki referenced pages by their hierarchy name; the new wiki will only have the title of the page. The new format should look like:

{include:a-1} 

Currently I have the following regular expression:

/\[\[Include\(([^\)]+)\)\]\]/ 

which, for the example above, captures a/a-1, but I need a regular expression that captures only a-1.

Is it possible to construct a regular expression for Java that matches only the last group?

So for the following original lines:

[[Include(a)]]
[[Include(a/b)]]
[[Include(a/a-1)]]
[[Include(a/a-1/a-2)]]

I would like to match only

a
b
a-1
a-2
  • FYI, tweaked the answer so it does not just capture the groups, but also shows the search and replace regex (PS: see the demo). :) Commented May 24, 2014 at 18:54

1 Answer

The regex below is the one you're looking for. Group 1 has the text you want; see the captures pane at the bottom right of the demo, as well as the Substitutions pane at the bottom.

EDIT: per your request, replaced [a-z0-9-] with [^/]. (I did not update the regex101 demo because this regex, which I confirmed to work, breaks in regex101, which uses / as a delimiter, even when the / is escaped. However, here is another demo on regexplanet.)

Search:

\[\[Include\((?:[^/]+\/)*([^/]+)\)\]\] 

Replace:

{include:$1} 

How does it work?

After the opening parenthesis of the Include, we match a run of characters other than a forward slash (such as a-1) followed by a forward slash, zero or more times; then we capture the last such run of characters in Group 1.
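For the Java side, here is a minimal sketch of how the search and replace above might be applied with java.util.regex. The class name IncludeConverter and the method convert are made up for illustration. One deliberate difference from the pattern above: the sketch uses [^/)] instead of [^/], because [^/] also matches ) and line breaks, so with several includes in one input a single match can run from the first include into a later one. Treat that adjustment as an assumption about your data (page titles without a closing parenthesis), not as part of the original answer.

```java
import java.util.regex.Pattern;

class IncludeConverter {
    // Same shape as the Search pattern above, except that [^/)] replaces [^/]
    // (an assumption of this sketch) so a match cannot cross a ")".
    // In a Java string literal every regex backslash is doubled.
    private static final Pattern INCLUDE =
            Pattern.compile("\\[\\[Include\\((?:[^/)]+/)*([^/)]+)\\)\\]\\]");

    // Rewrites every old-style include to the new format,
    // keeping only the last path segment (Group 1).
    static String convert(String text) {
        return INCLUDE.matcher(text).replaceAll("{include:$1}");
    }

    public static void main(String[] args) {
        String oldText = "[[Include(a)]] [[Include(a/b)]]\n"
                + "[[Include(a/a-1)]]\n"
                + "[[Include(a/a-1/a-2)]]";
        System.out.println(convert(oldText));
    }
}
```

Running main on the four sample lines from the question should print {include:a} {include:b}, {include:a-1} and {include:a-2}, with the first two on one line.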

In the few languages that support infinite-width lookbehinds, we could match what you want without relying on Group 1 captures.


7 Comments

This answer is nearly perfect, thank you a lot! Could you adapt it by changing ...(?:[-a-z0-9]+/) (which matched my examples) to something like ...(?:[^/]+/) which matches everything that could be in a title of a page (but no forward slash)?
And one extra point (who could give zx81 that?) for providing a demo; that is a handy tool!!
@mliebelt Thanks for your feedback. :) Changed the regex per your request. It works but breaks the regex101 demo which has a bug related to their / delimiter.
@zx81 You could easily use \[\[Include\(.*?([a-z0-9-]+)\)\]\]
@hwnd Yes, that's true. But both my earlier solution (?:[-a-z0-9]+/) and the one @mliebelt requested [^/]+ are faster, because as you know the .*? causes backtracking at every step.