Java regex to parse any number of Markdown-style links

Question

I'm trying to parse a string for any occurrences of Markdown-style links, i.e. [text](link). I'm able to get the first link in a string, but if the string has multiple links I can't access the rest. Here is what I've tried (you can run it on ideone):

Pattern p;
try {
    p = Pattern.compile("[^\\[]*\\[(?<text>[^\\]]*)\\]\\((?<link>[^\\)]*)\\)(?:.*)");
} catch (PatternSyntaxException ex) {
    System.out.println(ex);
    throw(ex);
}
Matcher m1 = p.matcher("Hello");
Matcher m2 = p.matcher("Hello [world](ladies)");
Matcher m3 = p.matcher("Well, [this](that) has [two](too many) keys.");
System.out.println("m1 matches: " + m1.matches());  // false
System.out.println("m2 matches: " + m2.matches());  // true
System.out.println("m3 matches: " + m3.matches());  // true
System.out.println("m2 text: " + m2.group("text")); // world
System.out.println("m2 link: " + m2.group("link")); // ladies
System.out.println("m3 text: " + m3.group("text")); // this
System.out.println("m3 link: " + m3.group("link")); // that
System.out.println("m3 end: " + m3.end());          // 44 - I want 18
System.out.println("m3 count: " + m3.groupCount()); // 2 - I want 4
System.out.println("m3 find: " + m3.find());        // false - I want true

I know I can't have repeating groups, but I figured find() would work. It does not behave as I expected, though. How can I modify my approach so that I can parse each link?

Farhad Alizadeh Noori · Accepted Answer · 2014-05-14 15:55:30Z


Can't you go through the matches one by one and do the next match from an index after the previous match? You can use this regex:

\[(?<text>[^\]]*)\]\((?<link>[^\)]*)\)

The find() method looks for the next match anywhere in the input, even when the match is only a substring of the whole string. Each call to find() resumes after the previous match and returns the next one. matches(), by contrast, succeeds only if the entire string matches the pattern. Use something like this:

while (m.find()) {
    String text = m.group("text"); // the part inside [ ]
    String link = m.group("link"); // the part inside ( )
}
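For completeness, here is a minimal self-contained sketch of that loop, assuming the regex above and the third test string from the question. The class name MarkdownLinks and the extract method are hypothetical names chosen for illustration; they are not part of any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name and API are illustrative only.
final class MarkdownLinks {
    // The pattern from this answer: no leading or trailing wildcard, so each
    // match covers exactly one [text](link) occurrence.
    private static final Pattern LINK =
            Pattern.compile("\\[(?<text>[^\\]]*)\\]\\((?<link>[^\\)]*)\\)");

    // Returns one {text, link} pair per link, in order of appearance.
    static List<String[]> extract(String input) {
        List<String[]> links = new ArrayList<>();
        Matcher m = LINK.matcher(input);
        while (m.find()) { // each call resumes after the previous match
            links.add(new String[] { m.group("text"), m.group("link") });
        }
        return links;
    }

    public static void main(String[] args) {
        String s = "Well, [this](that) has [two](too many) keys.";
        for (String[] link : extract(s)) {
            System.out.println(link[0] + " -> " + link[1]);
        }
        // prints:
        // this -> that
        // two -> too many
    }
}
```

Note that groupCount() still reports 2 here: it counts the groups in the pattern, not the number of matches. The number of links is simply the number of times find() returns true.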

edited May 14, 2014 at 15:55

answered May 14, 2014 at 14:47


4 Comments

2rs2ts Over a year ago

I tried a similar syntax, but m3.matches() was false, probably because of the trailing characters which aren't ). Any suggestions on how I can work around that?

Farhad Alizadeh Noori Over a year ago

I guess the issue with your pattern above is that (?:.*) makes the match continue all the way to the end of the string, so it swallows every later candidate match along the way. I would use a pattern like the one I suggested and collect all the matches from the string.

Farhad Alizadeh Noori Over a year ago

It's strange that matches() is false. find() does return results with the pattern used in the answer. You can test it here: java-regex-tester.appspot.com

2rs2ts Over a year ago

.matches() is false with "\\[(?<text>[^\\]]*)\\]\\((?<link>[^\\)]*)\\)". But .find() is true. I think I understand what I have to do but could you write up what to do in order to get each pair of groups so I can accept your answer?
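The behavior discussed in these comments can be checked with a small sketch. It assumes the two patterns quoted in this thread and the third test string from the question; the class name MatchSpans and the matchEnds helper are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
final class MatchSpans {
    // Pattern from the question: wildcards on both sides of the link.
    static final Pattern PADDED =
            Pattern.compile("[^\\[]*\\[(?<text>[^\\]]*)\\]\\((?<link>[^\\)]*)\\)(?:.*)");
    // Pattern from the answer: the link and nothing else.
    static final Pattern BARE =
            Pattern.compile("\\[(?<text>[^\\]]*)\\]\\((?<link>[^\\)]*)\\)");

    // Collects the end offset of every match that find() reports.
    static List<Integer> matchEnds(Pattern p, String input) {
        List<Integer> ends = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            ends.add(m.end());
        }
        return ends;
    }

    public static void main(String[] args) {
        String s = "Well, [this](that) has [two](too many) keys.";
        System.out.println(matchEnds(PADDED, s));      // [44]
        System.out.println(matchEnds(BARE, s));        // [18, 38]
        System.out.println(BARE.matcher(s).matches()); // false
    }
}
```

With the padded pattern, the single match ends at offset 44 because the trailing wildcard consumes the rest of the string, leaving nothing for a second find(). With the bare pattern, matches() is false because the whole string is not one link, while find() reports both links.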

Federico Piazza · Accepted Answer · 2014-05-14 15:41:32Z

The regular expression I used to match what you need (without groups) is \[\w+\]\(.+\)

It is deliberately simple, just to show the idea. Step by step:

Match an opening square bracket: \[
Followed by one or more word characters: \w+
Then a closing square bracket: \]

This matches text such as [blabla].

Then the same with parentheses:

Match an opening parenthesis: \(
Followed by one or more of any character: .+
Then a closing parenthesis: \)

So it matches (ble...ble...).

Now, if you want to store the matches in groups, you can add extra parentheses like this:

(\[\w+\])(\(.+\)), so that the text and the link are captured separately (brackets and parentheses included).

Hope this helps.

I tried it on regexplanet.com and it works.

Update: workaround .*(\[\w+\])(\(.+\))*.*
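One caveat may be worth checking before relying on the pattern in this answer: .+ is greedy, so on an input with several links it can run from the first opening parenthesis to the last closing one. The sketch below compares it with a negated-character-class variant. The class name GreedyVsNegated and the allMatches helper are hypothetical, and the input is the third test string from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
final class GreedyVsNegated {
    // Pattern from this answer: \w+ for the text, greedy .+ for the link.
    static final Pattern GREEDY = Pattern.compile("\\[\\w+\\]\\(.+\\)");
    // Alternative using negated character classes, which cannot run past
    // the closing bracket or parenthesis.
    static final Pattern NEGATED = Pattern.compile("\\[[^\\]]*\\]\\([^)]*\\)");

    // Returns the full text of every match that find() reports.
    static List<String> allMatches(Pattern p, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "Well, [this](that) has [two](too many) keys.";
        System.out.println(allMatches(GREEDY, s).size());  // 1
        System.out.println(allMatches(NEGATED, s).size()); // 2
    }
}
```

On this input the greedy pattern yields a single match, "[this](that) has [two](too many)", while the negated variant yields the two links separately. Also note that \w+ does not match link text containing spaces, which the negated class handles.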

3 Comments

This suffers from the same problem that I had with Farhad's answer.

Can you try this workaround? I've updated the post at the end, since I can't post it here.

Right, but my problem is that I can only find the first default and first key groups.
