
I'm trying to parse a string for any occurrences of markdown-style links, i.e. [text](link). I'm able to get the first link in a string, but if the string has multiple links I can't access the rest. Here is what I've tried (you can run it on ideone):

```java
Pattern p;
try {
    p = Pattern.compile("[^\\[]*\\[(?<text>[^\\]]*)\\]\\((?<link>[^\\)]*)\\)(?:.*)");
} catch (PatternSyntaxException ex) {
    System.out.println(ex);
    throw(ex);
}
Matcher m1 = p.matcher("Hello");
Matcher m2 = p.matcher("Hello [world](ladies)");
Matcher m3 = p.matcher("Well, [this](that) has [two](too many) keys.");
System.out.println("m1 matches: " + m1.matches()); // false
System.out.println("m2 matches: " + m2.matches()); // true
System.out.println("m3 matches: " + m3.matches()); // true
System.out.println("m2 text: " + m2.group("text")); // world
System.out.println("m2 link: " + m2.group("link")); // ladies
System.out.println("m3 text: " + m3.group("text")); // this
System.out.println("m3 link: " + m3.group("link")); // that
System.out.println("m3 end: " + m3.end());          // 44 - I want 18
System.out.println("m3 count: " + m3.groupCount()); // 2 - I want 4
System.out.println("m3 find: " + m3.find());        // false - I want true
```

I know I can't have repeating groups, but I figured find() would work; however, it doesn't behave as I expected. How can I modify my approach so that I can parse each link?

2 Answers


Can't you go through the matches one by one, starting each search at the index just after the previous match? You can use this regex:

\[(?<text>[^\]]*)\]\((?<link>[^\)]*)\) 
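In Java source, that regex has to be written with every backslash doubled. Below is a minimal sketch of collecting every (text, link) pair with a find() loop; the class name `MarkdownLinks` and the helper `findLinks` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MarkdownLinks {
    // The answer's regex written as a Java string literal: every backslash is doubled.
    static final Pattern LINK =
            Pattern.compile("\\[(?<text>[^\\]]*)\\]\\((?<link>[^\\)]*)\\)");

    // Hypothetical helper: returns one {text, link} pair per markdown-style link.
    static List<String[]> findLinks(String input) {
        List<String[]> pairs = new ArrayList<>();
        Matcher m = LINK.matcher(input);
        // Each successful find() resumes the search right after the previous match.
        while (m.find()) {
            pairs.add(new String[] {m.group("text"), m.group("link")});
        }
        return pairs;
    }

    public static void main(String[] args) {
        for (String[] pair : findLinks("Well, [this](that) has [two](too many) keys.")) {
            System.out.println(pair[0] + " -> " + pair[1]);
        }
        // prints "this -> that", then "two -> too many"
    }
}
```

Because the pattern no longer has the leading `[^\[]*` or the trailing `(?:.*)`, each match covers exactly one link, which leaves the rest of the string available for the next find() call.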

The find() method looks for the next subsequence of the input that matches the pattern, so a match does not have to span the entire string. Each call to find() moves on to the next match. matches(), by contrast, tries to match the entire string and fails if it can't. Use something like this:
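To see the difference concretely, here is a small sketch (class and method names are illustrative) that runs the answer's pattern against the question's m3 input with both methods and records where each find() match starts and ends.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchesVsFind {
    static final Pattern LINK =
            Pattern.compile("\\[(?<text>[^\\]]*)\\]\\((?<link>[^\\)]*)\\)");

    // matches() succeeds only if the pattern consumes the entire input.
    static boolean matchesWhole(String input) {
        return LINK.matcher(input).matches();
    }

    // find() succeeds on any matching substring; each call resumes after the previous match.
    static List<String> findAll(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = LINK.matcher(input);
        while (m.find()) {
            found.add(m.start() + "-" + m.end() + " " + m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "Well, [this](that) has [two](too many) keys.";
        System.out.println("matches(): " + matchesWhole(input)); // false: "Well, " and " keys." are left over
        findAll(input).forEach(System.out::println);
        // 6-18 [this](that)
        // 23-38 [two](too many)
    }
}
```

Note that the first match ends at index 18, which is the end position the question was hoping m3.end() would report.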

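```java
while (m.find()) {
    String text = m.group(1); // for the question's m3 input: "this", then "two"
    String link = m.group(2); // for the question's m3 input: "that", then "too many"
}
```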
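The idea of starting each search right after the previous match can also be written out explicitly with `Matcher.find(int)`, which resets the matcher and searches from a given index. This is only a sketch of that bookkeeping (the names `FindFromIndex` and `pairs` are hypothetical); the plain `while (m.find())` loop already does the same thing internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindFromIndex {
    static final Pattern LINK =
            Pattern.compile("\\[(?<text>[^\\]]*)\\]\\((?<link>[^\\)]*)\\)");

    // Hypothetical helper: explicitly restarts each search at the end of the previous match.
    static List<String> pairs(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = LINK.matcher(input);
        int from = 0;
        // find(int) resets the matcher, then searches starting at index `from`.
        while (m.find(from)) {
            out.add(m.group("text") + " -> " + m.group("link"));
            // Every match is at least 4 chars ("[]()"), so end() always moves forward.
            from = m.end();
        }
        return out;
    }

    public static void main(String[] args) {
        pairs("Well, [this](that) has [two](too many) keys.").forEach(System.out::println);
    }
}
```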

4 Comments

I tried a similar syntax, but m3.matches() was false, probably because of the trailing characters which aren't ). Any suggestions on how I can work around that?
I guess the issue with your pattern above is that (?:.*) makes the match continue all the way to the end of the string, so the first match swallows every later link. I would use a pattern like the one I suggested and get all the matches from the string.
matches() is false because it has to match the whole string, but find() does return the results with the pattern used in the answer. Test it here: java-regex-tester.appspot.com
.matches() is false with "\\[(?<text>[^\\]]*)\\]\\((?<link>[^\\)]*)\\)". But .find() is true. I think I understand what I have to do but could you write up what to do in order to get each pair of groups so I can accept your answer?
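The behaviour described in this thread can be checked directly. The sketch below (class and helper names are illustrative) compiles the question's original pattern next to the answer's pattern and counts how many times find() succeeds on the m3 input. It also shows why the question's `m3.find()` printed false: after a successful matches(), the next find() resumes at index 44, the end of the input, until the matcher is reset.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WhyOnlyOneMatch {
    // The question's pattern: the leading [^\[]* and trailing (?:.*) swallow the rest of the line.
    static final Pattern QUESTION = Pattern.compile(
            "[^\\[]*\\[(?<text>[^\\]]*)\\]\\((?<link>[^\\)]*)\\)(?:.*)");
    // The answer's pattern: matches exactly one link and nothing around it.
    static final Pattern ANSWER = Pattern.compile(
            "\\[(?<text>[^\\]]*)\\]\\((?<link>[^\\)]*)\\)");

    // Counts how many successive find() calls succeed on a fresh matcher.
    static int countFinds(Pattern p, String input) {
        Matcher m = p.matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String input = "Well, [this](that) has [two](too many) keys.";
        Matcher m3 = QUESTION.matcher(input);
        System.out.println(m3.matches());               // true: one match covering all 44 chars
        System.out.println(m3.find());                  // false: search resumes at index 44
        m3.reset();                                     // rewind to the start of the input
        System.out.println(m3.find() + " " + m3.end()); // true 44: still one match to the end
        System.out.println(countFinds(QUESTION, input)); // 1
        System.out.println(countFinds(ANSWER, input));   // 2
    }
}
```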

The regular expression I've used to match what you need (without groups) is \[\w+\]\(.+\)

It's deliberately kept simple. Basically, it does the following:

  • Match an opening square bracket: \[
  • Followed by one or more word characters: \w+
  • Then a closing square bracket: \]

This matches text such as [blabla].

Then the same with parentheses:

  • Match an opening parenthesis: \(
  • Followed by one or more characters of any kind: .+
  • Then a closing parenthesis: \)

So it matches (ble...ble...).

If you want to store the matches in groups, you can add capturing parentheses like this:

(\[\w+\])(\(.+\)): this way, group 1 stores the word and group 2 the link, each including its brackets.
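One caveat worth checking: `.+` is greedy, so when a line holds two links it runs on to the last `)` and the two links come back as a single match. The sketch below (class and helper names are illustrative) compares the grouped pattern above with a variant that uses the reluctant `.+?`, which stops at the first `)` after each link. Note also that `\w+` does not match link text containing spaces, such as `[two words]`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctant {
    // The answer's grouped pattern: .+ is greedy, so it runs to the LAST ')' on the line.
    static final Pattern GREEDY = Pattern.compile("(\\[\\w+\\])(\\(.+\\))");
    // Same pattern with a reluctant .+? so each link stops at its own ')'.
    static final Pattern RELUCTANT = Pattern.compile("(\\[\\w+\\])(\\(.+?\\))");

    // Collects group 2 (the parenthesized link) of every find() match.
    static List<String> links(Pattern p, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            out.add(m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "Well, [this](that) has [two](too many) keys.";
        System.out.println(links(GREEDY, input));    // [(that) has [two](too many)]
        System.out.println(links(RELUCTANT, input)); // [(that), (too many)]
    }
}
```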

Hope this helps.

I've tried it on regexplanet.com and it works.

Update: a workaround: .*(\[\w+\])(\(.+\))*.*
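A quick way to see what this workaround captures is to run it with matches() on the question's m3 input. In this sketch (class and helper names are illustrative), a single matches() call can report only one value per group, and because the leading `.*` is greedy it backtracks only as far as the last `[`, so the groups hold the last link in the string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WorkaroundCheck {
    // The workaround pattern from the update, as a Java string literal.
    static final Pattern WORKAROUND = Pattern.compile(".*(\\[\\w+\\])(\\(.+\\))*.*");

    // Returns {group 1, group 2} if the whole input matches, otherwise null.
    static String[] groups(String input) {
        Matcher m = WORKAROUND.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2)};
    }

    public static void main(String[] args) {
        String[] g = groups("Well, [this](that) has [two](too many) keys.");
        System.out.println(g[0] + " " + g[1]); // [two] (too many)
    }
}
```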

3 Comments

This suffers from the same problem that I had with Farhad's answer.
Can you try this workaround? I've updated the end of the post, since I can't post it here.
Right, but my problem is that I can only find the first default and first key groups.
