2

I have a simple problem: I need to remove every newline (\n) that occurs between two patterns. For example:

<INFOSTART
A=1
B=2
C=3
D=4
<INFOEND
<INFOSTART
G=1
Z=3
<INFOEND

I would like the output to be something like the following:

A=1 B=2 C=3 D=4
G=1 Z=3

Any idea how I can do it? Thanks in advance.

4 Answers

5

You can use a simple state machine in awk. The following input file is slightly modified from yours to include text outside the markers; the code still works as desired when there is no such text, this just covers the extra case:

xyzzy plugh
<INFOSTART
A=1
B=2
C=3
D=4
<INFOEND
twisty passages
<INFOSTART
G=1
Z=3
<INFOEND
after last

With a data file like that (or your original), the following awk command gives you what you need, combining lines between the start and end markers into a single line:

awk '
    /^<INFOSTART$/ {inside=1; sep=""; next}
    /^<INFOEND$/   {inside=0; print ""; next}
    inside         {printf sep""$0; sep=" "; next}
                   {print}' input_file

xyzzy plugh
A=1 B=2 C=3 D=4
twisty passages
G=1 Z=3
after last

Examining the awk code in more detail, the following sections expand on each line.


The following segment runs whenever you find a line consisting of only the start marker. It sets the inside state to true (non-zero) to indicate that you should start combining lines, and sets the initial separator to the empty string to ensure no leading space on the combined line. The next simply goes and grabs the next input line immediately, starting a new cycle:

/^<INFOSTART$/ {inside=1; sep=""; next} 

Assuming you didn't find a start marker, this segment runs for an end marker. If found, the inside state is set back to false (zero) to start printing out lines exactly as they appear in the input file. It also outputs a newline to properly finish the combined line, then restarts the cycle with the next input line:

/^<INFOEND$/ {inside=0; print ""; next} 

If you've established that the line is neither a start nor end marker, your behaviour depends on the inside state. For true, you need to combine the input lines into a single output line, so you simply print, without a trailing newline, the separator followed by the line itself. Then you set the separator to a space so the next input line will be properly separated from the previous one. It then cycles back for the next input line:

inside {printf sep""$0; sep=" "; next} 

Finally, if you get here, you know you're outside of a start/end section so you just echo the line exactly as it exists in the input file:

 {print}' 

If you don't want the nicely formatted version, you can use the following minified version, assuming you're certain the only <INFO... lines are the start and end markers:

awk '/^<INFOS/{a=1;b="";next}/^<INFOE/{a=0;print"";next}a{printf b$0;b=" ";next}1' 

However, since this will probably be in a script rather than a one-liner command, I'd tend to stick with the readable version myself.
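One caveat on the printf call used above: awk treats the first argument of printf as a format string, so passing the data itself (sep""$0) means a line containing a percent sign may be misinterpreted. A minimal sketch of a safer variant follows, assuming you want the data copied through verbatim. The function name join_blocks and the sample lines are hypothetical, chosen only for illustration:

```shell
# join_blocks is a hypothetical wrapper name, not part of the original answer.
join_blocks() {
    awk '
        /^<INFOSTART$/ {inside=1; sep=""; next}
        /^<INFOEND$/   {inside=0; print ""; next}
        # The format is the fixed string "%s%s"; sep and $0 are passed as
        # data, so a literal percent sign in the input is printed unchanged.
        inside         {printf "%s%s", sep, $0; sep=" "; next}
                       {print}'
}

# Made-up sample input with a percent sign inside a block.
result=$(printf '%s\n' 'before' '<INFOSTART' 'A=100%' 'B=2' '<INFOEND' 'after' | join_blocks)
printf '%s\n' "$result"
```

With that input, the block collapses to the single line "A=100% B=2", and the lines outside the markers are echoed unchanged.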


4

With tr and sed:

AMD$ tr '\n' ' ' < File | sed 's/<INFOSTART //g; s/<INFOEND /\n/g'
A=1 B=2 C=3 D=4
G=1 Z=3

Replace all newlines with spaces first. Then use sed to remove every <INFOSTART and replace every <INFOEND with a newline.
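To make the two stages visible, here is a small sketch that captures the intermediate result of tr before sed runs. It rests on two assumptions that are not in the answer above. First, \n in a sed replacement is not specified by POSIX and is commonly treated as a GNU extension, so this variant writes the newline as a backslash followed by a literal line break. Second, it also swallows the space before <INFOEND so the joined lines carry no trailing space. The helper name sample is hypothetical:

```shell
# sample is a hypothetical helper that emits the question's input.
sample() {
    printf '%s\n' '<INFOSTART' 'A=1' 'B=2' 'C=3' 'D=4' '<INFOEND' \
                  '<INFOSTART' 'G=1' 'Z=3' '<INFOEND'
}

# Stage 1: every newline becomes a space, giving one long line.
flat=$(sample | tr '\n' ' ')
printf '[%s]\n' "$flat"

# Stage 2: drop each start marker, then turn each end marker (and the
# space before it) into a newline. The backslash followed by a real line
# break is the portable way to put a newline in a sed replacement.
joined=$(printf '%s' "$flat" | sed 's/<INFOSTART //g; s/ *<INFOEND /\
/g')
printf '%s\n' "$joined"
```

The first printf shows why both patterns end in a space: after tr, each marker is followed by the space that used to be its newline.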


1

Perl to the rescue:

< input perl -ne 's/\n/ /, print if $s = /<INFOSTART/ .. ($e = /<INFOEND/) and $s > 1 and !$e; print "\n" if $e' 

$s is true while we're between the tags (thanks to the .. operator) and equals 1 on the line that matches the start tag. $e is true on the line that matches the end tag.


1

This might work for you (GNU sed):

sed '/^<INFOSTART/d;:a;N;/^<INFOEND/M!s/\n/ /;ta;P;d' file 

This deletes lines beginning with <INFOSTART or <INFOEND and replaces the newlines between all other lines with spaces.

The solution can be pared down further (providing the file is well formed) to:

sed '/^</d;:a;N;/^</M!s/\n/ /;ta;P;d' file 
