extract only the substring after double quotes - grep

Question

I have a file with the following contents:

<a href="http://firstlink.com" title="title1">
<a href="http://secondlink.com" title="title2">
<a href="http://thirdlink.com" title="title3">
<a href="http://fourthlink.com" title="title4">

I am trying to extract only the URLs from this file, using the following command:

grep -o '\".*\"' new.txt

However, this command gives me the following output:

"http://firstlink.com" title="title1">
"http://secondlink.com" title="title2">
"http://thirdlink.com" title="title3">
"http://fourthlink.com" title="title4">

I want only the URLs, without the surrounding double quotes. So my expected output is:

http://firstlink.com
http://secondlink.com
http://thirdlink.com
http://fourthlink.com

How should I change the grep command? Or is it possible to do this with perl, awk, or sed?

devnull · Accepted Answer · 2014-02-11 18:47:58Z

16

You could use awk.

awk -F\" '{print $2}' filename

would produce the desired output.
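To see why field 2 is the URL, it helps to look at how awk splits a line when the double quote is the field separator. The following is a minimal sketch, assuming one tag per line as in the question; the temporary file and the variable name awk_urls are only for illustration.

```shell
# Hypothetical sample file, one tag per line as in the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<a href="http://firstlink.com" title="title1">
<a href="http://secondlink.com" title="title2">
EOF

# With the double quote as separator, the first line splits into:
#   $1 = <a href=     $2 = http://firstlink.com
#   $3 =  title=      $4 = title1      $5 = >
awk_urls=$(awk -F'"' '{print $2}' "$sample")
printf '%s\n' "$awk_urls"
rm -f "$sample"
```

Note that this relies on href being the first quoted attribute on every line.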

Using sed:

sed 's/[^"]*"\([^"]*\).*/\1/' filename

Using grep:

grep -oP '[^"]*"\K[^"]*' filename

edited Feb 11, 2014 at 18:47

answered Feb 11, 2014 at 18:41


2

The \K is explained here: stackoverflow.com/a/33573989/318765

– mgutt, Jul 28, 2019 at 23:01

The grep command returns too much, e.g. it also prints title=.
– marlar, Sep 14, 2019 at 8:44
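
One possible way to address this comment is to anchor the pattern on href=" so that only that attribute's value is matched. This is a sketch, not part of the original answer, and it assumes a GNU grep built with PCRE support (the -P option); the variable names loose and strict are illustrative.

```shell
sample=$(mktemp)
printf '%s\n' '<a href="http://firstlink.com" title="title1">' > "$sample"

# Unanchored pattern from the answer: -o prints every match on the line,
# so the text following each later quote is reported as well.
loose=$(grep -oP '[^"]*"\K[^"]*' "$sample")

# Anchored on href=": \K discards the prefix, leaving only the attribute value.
strict=$(grep -oP 'href="\K[^"]*' "$sample")

printf 'loose:\n%s\nstrict:\n%s\n' "$loose" "$strict"
rm -f "$sample"
```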


Emmanuel · Accepted Answer · 2014-02-11 21:50:29Z

12

Regular expressions, stream editors, and interpreters are overkill here.
Use good old cut:

cut -d \" -f 2 < filename

answered Feb 11, 2014 at 21:50


I agree, this is exactly what cut was designed for.

– Ryan Foley, Feb 16, 2014 at 15:06


dingrui · Accepted Answer · 2014-02-16 11:23:32Z

2

sed 's/.*"\(http.*\)" .*/\1/' filename

answered Feb 16, 2014 at 11:23


zindigo · Accepted Answer · 2014-11-26 22:43:36Z

This is more robust, since some of the other answers depend on href being the first attribute in the tag:

grep -o 'href.*"' file.txt | cut -d '"' -f 2
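
Because .* is greedy, the pipeline above yields at most one URL per line. If several tags can share a line, a variant that stops at the closing quote may be useful. This is a sketch under that assumption; the sample input and the variable name all_urls are hypothetical.

```shell
sample=$(mktemp)
# Hypothetical input: attributes in varying order, two tags on one line.
printf '%s\n' '<a title="t1" href="http://firstlink.com"> <a href="http://secondlink.com" title="t2">' > "$sample"

# 'href="[^"]*"' ends at the closing quote, so -o emits one match per link;
# cut then keeps only the text between the quotes.
all_urls=$(grep -o 'href="[^"]*"' "$sample" | cut -d '"' -f 2)
printf '%s\n' "$all_urls"
rm -f "$sample"
```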
