Bash Scripting: Find strings on one line of code and insert on own line

Question

I'm trying to write a small bash script that:

- wgets an HTML file from the web every [x] minutes
- uses some Linux utility to find the differences in the file between the last two downloads
- uses sed to modify the lines on which new text was detected

The problem I am running into is that the HTML file uses inline CSS to format a table, and the actual code for the page is stored on one long line.

Effectively, I need a Linux utility that can scan through a single line of code, find every piece of text between tags, and put each one on its own line. That should make scanning the text easier. Every tool I've tried works on a per-line basis, which can't do what I need, since the entire page is stored on a single line.
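The steps listed above could be wired together roughly as follows. This is a minimal sketch, not a tested tool: the URL, the interval, the file names, and the helper names `normalize`, `new_lines` and `poll` are all hypothetical. It simply prints the new lines at the point where the question's sed step would modify them.

```shell
#!/usr/bin/env bash
# Minimal sketch of the question's workflow. URL, INTERVAL, the file names
# and all function names are hypothetical placeholders.
URL="https://example.com/table.html"
INTERVAL=300   # seconds between downloads ("every [x] minutes")

# Break the single long line before every "<" and after every ">" so each
# tag and each run of text between tags ends up on its own line, then drop
# the empty lines this leaves behind.
normalize() {
    sed -e 's/</\
&/g' -e 's/>/&\
/g' "$1" | sed '/^$/d'
}

# Print the lines that diff reports as added or changed in the new
# snapshot ($2) compared with the old one ($1); diff prefixes them with "> ".
new_lines() {
    diff "$1" "$2" | sed -n 's/^> //p'
}

# Download, normalize and compare forever; nothing runs until poll is called.
poll() {
    local prev=previous.txt cur=current.txt
    : > "$prev"                     # start from an empty snapshot
    while true; do
        if wget -q -O page.html "$URL"; then
            normalize page.html > "$cur"
            # This is where the question's sed step would modify the lines.
            new_lines "$prev" "$cur"
            mv "$cur" "$prev"
        fi
        sleep "$INTERVAL"
    done
}
```

Source the file and call `poll` to start the loop. On the first pass every line counts as new, because the previous snapshot starts out empty.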

John Zwinck · Accepted Answer · 2013-02-10 00:45:40Z


You could first split the content into lines by substituting (say) > with >\n. That will break the document up after the end of each HTML tag.
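A minimal sketch of that substitution, assuming POSIX sed. `&` in the replacement stands for the matched `>`, and a backslash followed by a real newline inserts a line break; that form works in both GNU and BSD sed, whereas `\n` in the replacement is a GNU extension. The helper name `split_after_tags` is hypothetical.

```shell
# Hypothetical helper: put a line break after every ">" so each tag ends
# its own line. Reads stdin (or the files given) and writes to stdout.
split_after_tags() {
    # "&" is the matched ">"; the backslash-newline is a literal newline in
    # the replacement. The second sed drops the empty line left behind when
    # a line already ended in ">".
    sed 's/>/&\
/g' "$@" | sed '/^$/d'
}
```

For example, `split_after_tags page.html > page.split.html`. Text stays on the same line as the closing tag that follows it, so `<tr><td>1</td></tr>` becomes the four lines `<tr>`, `<td>`, `1</td>` and `</tr>`.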

Maybe you don't even need to do that: you could use awk's RS variable to set the record separator to ">" instead of newline. See this page for an example of using RS: http://www.thegeekstuff.com/2010/01/8-powerful-awk-built-in-variables-fs-ofs-rs-ors-nr-nf-filename-fnr/
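With awk, the splitting can be left to the record separator. A minimal sketch (the helper name `split_records` is hypothetical): awk removes the separator from each record, so it is appended back with `$0 RS`, and the `NF` pattern skips records that are only whitespace, such as the trailing newline after the last `>`. A single-character RS like this is standard awk; multi-character or regular-expression RS values are an extension (supported by gawk, for instance).

```shell
# Hypothetical helper: treat ">" as awk's record separator instead of newline.
split_records() {
    awk 'BEGIN { RS = ">" }
         # awk strips the separator, so print it back after each record;
         # NF is 0 for whitespace-only records, which are skipped.
         NF { print $0 RS }' "$@"
}
```

Because each tag-terminated chunk is its own record, any further matching or editing can happen inside the same awk program.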


5 Comments

user2057895 Over a year ago

I'm looking at the RS variable now. As for your first suggestion, should I use sed to replace each "</td>" tag with "</td>\n"?

Bill Woodger Over a year ago

Take <a>some text</a>: if you set RS to ">", awk splits that one line into separate records (<a, then some text</a; the ">" separator itself is dropped). However, if your text can contain ">", it'll pickle things a little.
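A quick way to see the records awk actually produces, including the `>` problem mentioned in this comment, is to print each one in brackets. A minimal sketch with a hypothetical helper `show_records`:

```shell
# Hypothetical helper: print every record awk produces with RS=">" in brackets.
show_records() {
    awk 'BEGIN { RS = ">" } { printf "[%s]\n", $0 }'
}

printf '<a>some text</a>' | show_records
# [<a]
# [some text</a]

# A ">" inside the text splits that text across two records:
printf '<a>x > y</a>' | show_records
# [<a]
# [x ]
# [ y</a]
```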

user2057895 Over a year ago

Taking John's advice, I tried sed -i 's/<\/tr>/<\/tr>\n/g' file.html, and this did the trick! Regular expressions are confusing.

John Zwinck Over a year ago

Yes, for example you could use sed to add newlines after each closing tag you're interested in. Note that not every version of sed accepts \n in the replacement (GNU sed does; BSD/macOS sed does not), so see this other answer for portable ways to do that: stackoverflow.com/questions/6111679/insert-linefeed-in-sed
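A minimal sketch of a portable version of the command from the comments above, assuming POSIX sed and `mktemp`: it inserts the newline with backslash-newline instead of `\n`, and avoids `sed -i`, which GNU sed and BSD/macOS sed spell differently (`-i` versus `-i ''`). The helper name `break_after_tr` is hypothetical.

```shell
# Hypothetical helper: put a line break after every </tr> in the given file.
break_after_tr() {
    local file=$1 tmp status
    tmp=$(mktemp) || return 1
    # "&" is the matched "</tr>"; the backslash-newline adds a real newline.
    # Copying back with cat keeps the original file's permissions.
    sed 's/<\/tr>/&\
/g' "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

Usage: `break_after_tr file.html` rewrites the file in place, as the `sed -i` command did.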

John Zwinck Over a year ago

Regarding that sed expression: you can use characters other than slash as the delimiter of a sed s command (the character right after the s sets the delimiter for the rest of the command, so you can use almost any character!). So you may find this more readable: s@</tr>@</tr>\n@g.
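The two delimiter styles can be compared side by side. A minimal sketch with hypothetical helper names; both functions perform the same substitution, and a backslash-newline is used in the replacement instead of `\n` so the example also works outside GNU sed.

```shell
# With "/" as the delimiter, each "/" in the pattern and replacement is escaped.
split_tr_slash() {
    sed 's/<\/tr>/<\/tr>\
/g'
}

# With "@" as the delimiter, the "/" in </tr> is an ordinary character.
split_tr_at() {
    sed 's@</tr>@</tr>\
@g'
}
```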
