
edited Jun 30, 2015 at 19:35

59.4k
10
122
242

    sed ':n
    s|;N/A;|;|g;$!N
    s|^\(\([^;]*;\)\{3\}\)\(.*\)\n\1|\1\3;|;tn
    P;D
    ' <<\IN
    D04005;4;279;0;0;SSM-4-1
    D04005;5;40;0;0;SSM-5-1
    LE040A;1;363;(26.3);N/A;SM-1-1
    LE040A;1;363;(27.4);N/A;SM-1-2
    LE040A;1;363;(28.5);N/A;SM-1-3
    LE040A;1;363;(29.6);N/A;SM-1-4
    IN

That will keep branching back to test each sequential line of input against the one before it, merging only the tails whenever the first three fields match.
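To make the loop easier to follow, here is a sketch of the same portable script with each command annotated. The function name `merge_tails` and the variable `merged` are made up for illustration; the sed commands themselves are unchanged from the script above.

```shell
#!/bin/sh
# merge_tails is a hypothetical wrapper name; the sed commands are the
# portable ones from above, with sed comments added.
merge_tails() {
    sed '
# Loop label; the t command below branches back to it.
:n
# Drop every N/A field, then append the next input line unless this is the last.
s|;N/A;|;|g;$!N
# When both lines open with the same three fields, keep that prefix once,
# keep the first tail, and join the second tail to it with a semicolon.
# If a substitution succeeded, t branches back to :n to test the next line.
s|^\(\([^;]*;\)\{3\}\)\(.*\)\n\1|\1\3;|;tn
# Print up to the first newline, delete that much, and restart on the
# remainder without reading a new line.
P;D
'
}

# Feed the sample data through the function and keep the result.
merged=$(merge_tails <<\IN
D04005;4;279;0;0;SSM-4-1
D04005;5;40;0;0;SSM-5-1
LE040A;1;363;(26.3);N/A;SM-1-1
LE040A;1;363;(27.4);N/A;SM-1-2
LE040A;1;363;(28.5);N/A;SM-1-3
LE040A;1;363;(29.6);N/A;SM-1-4
IN
)
printf '%s\n' "$merged"
```

Run against the sample input, this should print the same three lines listed under OUTPUT below.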

That's portably written, but it is a little easier to write if you can use -E xtended regular expressions (as you might w/ BSD or GNU versions)...

    sed -E ':n
    s|;N/A;|;|g;$!N
    s|^(([^;]*;){3})(.*)\n\1|\1\3;|;tn
    P;D'

If you wanted it all on one line:

    sed -Ee:n -e's|;N/A;|;|g;$!N;s|^(([^;]*;){3})(.*)\n\1|\1\3;|;tn' -eP\;D

...would work, but I've never been very fond of one-liners like that...

Anyway, the output from the first command there is:

### OUTPUT

    D04005;4;279;0;0;SSM-4-1
    D04005;5;40;0;0;SSM-5-1
    LE040A;1;363;(26.3);SM-1-1;(27.4);SM-1-2;(28.5);SM-1-3;(29.6);SM-1-4

To also move any trailing field which starts w/ SM- to the tail of the line, and to separate each of those with /, I believe the following should work:

    sed -E ':n
    s|;N/A;|;|g
    s|;(SM-[^;]*)$|/\1|;$!N
    s|^(([^;]*;){3})(.*)\n\1|\1\3;|;tn
    P;D'
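As a quick check, here is a sketch that runs that variant against the sample input from the top of the answer. It assumes a sed that accepts `-E` (GNU or BSD); `sample_input` and `slashed` are hypothetical names used only for this example.

```shell
#!/bin/sh
# sample_input is a hypothetical helper that just replays the sample data.
sample_input() {
    cat <<\IN
D04005;4;279;0;0;SSM-4-1
D04005;5;40;0;0;SSM-5-1
LE040A;1;363;(26.3);N/A;SM-1-1
LE040A;1;363;(27.4);N/A;SM-1-2
LE040A;1;363;(28.5);N/A;SM-1-3
LE040A;1;363;(29.6);N/A;SM-1-4
IN
}

# Same -E script as above: strip N/A, turn the ; before a trailing SM- field
# into /, then merge lines that share their first three fields.
slashed=$(sample_input | sed -E ':n
s|;N/A;|;|g
s|;(SM-[^;]*)$|/\1|;$!N
s|^(([^;]*;){3})(.*)\n\1|\1\3;|;tn
P;D')
printf '%s\n' "$slashed"
```

The two `D04005` lines should pass through unchanged (their last field starts with `SSM-`, not `SM-`), and the merged line should come out as `LE040A;1;363;(26.3)/SM-1-1;(27.4)/SM-1-2;(28.5)/SM-1-3;(29.6)/SM-1-4`, with each `SM-` field joined to the float before it by `/`.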

You know, by the way, this can get a lot easier - and a lot faster - if you can be more clear and more specific about what it is you need. To me it doesn't look like you want to merge only the first three identical fields on any two sequential lines, and to remove a field which matches N/A from any line, and to afterward move SM- fields to the tail of any line. Rather, to me it looks like all of those individual jobs you name are actually one and the same, and that you really want something like:

If an input line is found with three alphanumeric, semicolon-delimited fields, then a parenthesized float field followed by another semicolon delimiter, then an N/A field, we should do the following:
1. Check if the next line also matches this description and, if so, compare the first three fields of the current line and the next.
2. If a match is found, retain only the last field from the next line, and then recurse to try again.
3. Regardless, always remove the field which matches N/A, and replace the last ; with /.

In every case, print all that remains to stdout.
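To see which lines would meet that initial condition, a small sketch with grep can help. The regular expression is only one reading of the description above (an assumption, not something taken from the question): it treats a "float" as digits, a dot, and more digits, and `count` is a name made up for this example.

```shell
#!/bin/sh
# Count the sample lines that satisfy the initial condition.
# Assumed pattern: three alphanumeric ;-terminated fields, a parenthesized
# digits.digits float, then ;N/A; as the next field.
count=$(grep -cE '^([[:alnum:]]+;){3}\([0-9]+\.[0-9]+\);N/A;' <<\IN
D04005;4;279;0;0;SSM-4-1
D04005;5;40;0;0;SSM-5-1
LE040A;1;363;(26.3);N/A;SM-1-1
LE040A;1;363;(27.4);N/A;SM-1-2
LE040A;1;363;(28.5);N/A;SM-1-3
LE040A;1;363;(29.6);N/A;SM-1-4
IN
)
printf '%s\n' "$count"
```

With the sample data only the four `LE040A` lines qualify, so this should print `4`; the two `D04005` lines fall outside the condition and would simply be printed as they are.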

Do you see how that differs? It is a series of tasks which all depend on a single initial condition. If you can be that clear, then your matches won't have to make up for your generality with extra processing time. If I am correct, then the following could work:

    sed -e:n -e'\|;N/A;|!b
    s||/|;$!N
    s|\([^(]*\)\(.*\)\n\1|\1|;tn
    P;D'

...or, on a single line...

    sed -e:n -e'\|;N/A;|!b' -e's||/|;$!N;s|\([^(]*\)\(.*\)\n\1|\1|;tn' -eP\;D

answered Jun 30, 2015 at 18:11

mikeserv
