Question (score 9)

I have an MkDocs instance and am writing a script to print the internal links in a page. I cannot get grep to print only the matches when there are several on one line.

This is what I currently have:

$ grep -Eon '\[([[:alpha:]]|[[:digit:]]|[[:space:]])*\]\((\/|\.).*\)' /path/to/file.md
10:[foo](../../relative_path/foobar.md) is the path to another file, also see [bar](/absolute/path/foobar.md)

I would like the output to look like this:

10:[foo](../../relative_path/foobar.md)
10:[bar](/absolute/path/foobar.md)

Is there a way to do this in grep or even another command like awk or sed?

4 Answers

Answer (score 6)
grep -Pno "\[[[:alnum:]]*\]\(.*?\)" /path/to/file.md

Or, even better (this also matches link text containing spaces, such as [foo anotherword]):

grep -Pno "\[[[:alnum:][:space:]]*\]\(.*?\)" /path/to/file.md

-P enables Perl-compatible regular expressions (PCRE), which support the non-greedy quantifier *? (here .*?).

Or, if you want to allow any characters inside the brackets, not only alphanumerics and spaces:

grep -Pno "\[.*?\]\(.*?\)" /path/to/file.md
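A minimal way to see the difference the non-greedy ? makes. This is a sketch that assumes a grep built with PCRE support (GNU grep on most Linux distributions; BSD/macOS grep may lack -P) and writes the question's sample line to a temporary file, so the line number is 1 rather than 10. The function names greedy and nongreedy are only illustrative.

```shell
#!/bin/sh
# Assumes grep supports -P (PCRE), e.g. GNU grep on most Linux distributions.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
# The question's example line, stored as line 1 of a temporary file.
printf '%s\n' '[foo](../../relative_path/foobar.md) is the path to another file, also see [bar](/absolute/path/foobar.md)' > "$sample"

# Greedy: .* runs to the last "](" and the last ")" on the line -> one long match.
greedy()    { grep -Pno '\[.*\]\(.*\)' "$1"; }
# Non-greedy: .*? stops at the first "](" and the first ")" after it -> one match per link.
nongreedy() { grep -Pno '\[.*?\]\(.*?\)' "$1"; }

echo "greedy:";     greedy "$sample"
echo "non-greedy:"; nongreedy "$sample"
```

With .*? each link is reported on its own line. Note that .*? can still run past a ] that is not directly followed by (: on a line such as [a] and [b](./c), the non-greedy pattern returns [a] and [b](./c) as a single match. A negated class such as [^]]* for the link text avoids that.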
  • Stalin, this seems to be what user binarysta has answered, but you dropped some constraints that OP used, such as [[:space:]] and \/|\. Commented Jun 8, 2020 at 17:47
  • @Quasímodo: I answered this with the -P option long before binarysta did; I only edited it later for something else. Or maybe it was still a draft, I don't know, but I was typing even before your answer... It seems the OP is looking for an absolute or relative path; I think this would work... Commented Jun 8, 2020 at 17:51
  • What is the reason for omitting [[:space:]] from the OP's pattern? Commented Jun 8, 2020 at 18:44
  • @binarysta Because the OP is using | inside ( and ), which matches either only spaces or digits or alphas, which makes no sense... Commented Jun 8, 2020 at 18:47
  • And so you downvoted? :( Commented Jun 8, 2020 at 18:49
Answer (score 4)
\[([[:alpha:]]|[[:digit:]]|[[:space:]])*\] 

would match [foo], and that part is fine. The mistake is in what comes after it:

\((\/|\.).*\) 

You need to be careful when you include .* in your regexes, because it is very, very greedy! Here it matches (../../relative_path/foobar.md) is the path to another file, also see [bar](/absolute/path/foobar.md), running all the way to the last ) on the line. Combined with the [foo] part before it, the whole line is matched.

You should go for

grep -Eon '\[([[:alnum:]]|[[:space:]])*\]\((\.|\/)[^)]*\)' 

The key was to replace .* with [^)]*, which makes that part of the regex stop at the first closing parenthesis it meets. Also, I've applied this change:

  • [[:alpha:]]|[[:digit:]] can be collapsed into [[:alnum:]]
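To see the effect of that one change in isolation, the sketch below runs this answer's pattern once with .* and once with [^)]* on the question's sample line. It assumes a grep with -E and -o (GNU and BSD grep both have them), writes the sample to a temporary file (so the line number is 1), and spells \/ as a plain /, which needs no escaping in an ERE (recent GNU grep releases may warn about the stray backslash). The function names are illustrative only.

```shell
#!/bin/sh
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
# The question's example line, stored as line 1 of a temporary file.
printf '%s\n' '[foo](../../relative_path/foobar.md) is the path to another file, also see [bar](/absolute/path/foobar.md)' > "$sample"

# With .*: the match runs on to the last ")" on the line.
with_dotstar()  { grep -Eon '\[([[:alnum:]]|[[:space:]])*\]\((\.|/).*\)' "$1"; }
# With [^)]*: the match cannot cross a ")", so each link ends at its own ")".
with_negclass() { grep -Eon '\[([[:alnum:]]|[[:space:]])*\]\((\.|/)[^)]*\)' "$1"; }

echo "using .*:";    with_dotstar "$sample"
echo "using [^)]*:"; with_negclass "$sample"
```

The first function prints the whole line as one match; the second prints the two links on separate lines, which is the output the question asks for.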

Output:

1:[foo](../../relative_path/foobar.md)
1:[bar](/absolute/path/foobar.md)

(I have 1: instead of 10: because it is the first line in my file.)

Answer (score 3)
grep -on '\[[^]]*\]([^)]*)' 

May just be enough in your case. Do you really need to restrict what characters may occur within [...] and (...)?

If you want to require the part inside [...] to only be made of alnums or whitespace and the part inside (...) to start with either a / or a ., that would simply be:

grep -on '\[[[:alnum:][:space:]]*\]([./][^)]*)' 

In any case, note the [^)]* instead of .*) as .* would swallow the closing ) and everything up to the right-most ) on the line.

No need for -E's | alternation operator here. To match a single character, you can use the [set] bracket expression, where the set can include several characters or character classes (here [:alnum:], short for [:alpha:][:digit:] and [:space:]).
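As a sketch of the trade-off between the loose and the strict pattern, the script below adds a hypothetical second line containing an external link (https://example.com/page is only a placeholder). The loose pattern also reports that link; the strict one does not, because its target starts with neither . nor /. It also checks that the bracket-expression form gives the same output as the -E alternation form from the question. The function names are illustrative.

```shell
#!/bin/sh
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
# Line 1: the question's example. Line 2: a hypothetical external link.
printf '%s\n' \
  '[foo](../../relative_path/foobar.md) is the path to another file, also see [bar](/absolute/path/foobar.md)' \
  'external: [baz qux](https://example.com/page)' > "$sample"

# Loose (BRE): any [...] directly followed by any (...).
loose()      { grep -on '\[[^]]*\]([^)]*)' "$1"; }
# Strict (BRE): alnum/space link text, target starting with . or /.
strict()     { grep -on '\[[[:alnum:][:space:]]*\]([./][^)]*)' "$1"; }
# The same restriction spelled with -E alternation, as in the question.
strict_ere() { grep -Eon '\[([[:alnum:]]|[[:space:]])*\]\((\.|/)[^)]*\)' "$1"; }

echo "loose:";  loose "$sample"
echo "strict:"; strict "$sample"
if [ "$(strict "$sample")" = "$(strict_ere "$sample")" ]; then
  echo "bracket-expression and alternation forms agree"
fi
```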

Answer (score 2)

You need a non-greedy match.

I added ? after the .* in \((\/|\.).*\), turning it into \((\/|\.).*?\):

grep -Pon '\[([[:alpha:]]|[[:digit:]]|[[:space:]])*\]\((\/|\.).*?\)' /path/to/file.md
10:[foo](../../relative_path/foobar.md)
10:[bar](/absolute/path/foobar.md)
  • -P enables non-greedy quantifiers such as .*?; the regex is then interpreted with Perl-compatible (PCRE) syntax.
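The question also asks about awk. None of the answers uses it, but as a sketch, a POSIX awk loop over match() can print every link with its line number in the same LINE:match format as grep -on. It borrows the [^]]* and [^)]* negated classes from the grep answers and requires the target to start with . or /; print_links is a hypothetical helper name, and the sketch assumes any POSIX awk (match(), RSTART and RLENGTH are standard).

```shell
#!/bin/sh
# Hypothetical helper: print every [text](./... or /...) link as LINE:match.
print_links() {
  awk '
    BEGIN { re = "\\[[^]]*]\\([./][^)]*\\)" }   # regex: \[[^]]*]\([./][^)]*\)
    {
      rest = $0
      # match() sets RSTART/RLENGTH; after each hit, keep scanning what follows it.
      while (match(rest, re)) {
        print NR ":" substr(rest, RSTART, RLENGTH)
        rest = substr(rest, RSTART + RLENGTH)
      }
    }
  ' "$1"
}

sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
# The question's example line, stored as line 1 of a temporary file.
printf '%s\n' '[foo](../../relative_path/foobar.md) is the path to another file, also see [bar](/absolute/path/foobar.md)' > "$sample"
print_links "$sample"
```

Each iteration cuts the line just after the current match and searches the remainder, so several links on one line each produce their own output line.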
