added 24 characters in body

edited Aug 18, 2014 at 20:47

    {   nl -bpH -w1 |
        sed 's/^\([0-9]*\)[ \t]*\([^H]*.\)/\2\1/'
    } <<\DATA
    ...
    1562 first part
    1563 H  col3 H col4
    1564 H  col3 H col4
    ...
    3241 H  col3 H col4
    3242 third part
    DATA

###OUTPUT

    ...
    1562 first part
    1563 H1  col3 H col4
    1564 H2  col3 H col4
    ...
    3241 H3  col3 H col4
    3242 third part

That's the fastest way I can imagine doing it, especially on a very large file. nl -bpH numbers only the lines that match the pattern H, and -w1 sets the minimum number width to one column. Each number is inserted at the head of its line, followed by a <tab> character. Every other line is left unnumbered and is indented with a few spaces instead.
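To see what sed actually receives, the nl stage can be run on its own. This is only a sketch: show_nl_stage is a hypothetical helper name, the two sample lines are illustrative, and sed's l command is used purely to make the tab separator visible.

```shell
# Show the intermediate stream that nl hands to sed.
# show_nl_stage is a hypothetical name, not part of the original answer.
show_nl_stage() {
    # -bpH: number only lines matching the pattern H
    # -w1 : use a minimum number width of one column
    # sed -n l prints each line unambiguously (tab shown as \t, end of line as $)
    nl -bpH -w1 | sed -n l
}

printf '%s\n' '1562 first part' '1563 H  col3 H col4' | show_nl_stage
```

The line containing H should show up as 1\t1563 H  col3 H col4$, while the other line carries only leading blanks. The exact amount of indentation can differ between nl implementations.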

nl's output is piped to sed, which replaces the following sequence:

1. 0 or more digits at the beginning of the line (captured as \1)
2. 0 or more <tab> or <space> characters
3. 0 or more characters that are not H, followed by any one character (together captured as \2)

...with \2\1, so the digits end up immediately after that one character.

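So lines not containing an H get this treatment:

    ^'' .*.$ = ^.*.''$

And those that do get this one:

    ^(digit)*<tab>(not H)*H.*$ = ^(not H)*H(digit)*.*$

...where '' is an empty string. In the first case there are no digits to move, so only the indentation added by nl is stripped. In the second case the counter lands directly after the first H.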

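For maximum portability you should replace the \t in [ \t] with a literal <tab> character. POSIX gives \t no special meaning inside a bracket expression, so while GNU sed accepts it, other implementations may match a backslash and a t instead.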
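One way to get the literal <tab> without typing it into the script is to build it with printf and splice it into the sed expression. This is a minimal sketch: number_h is a hypothetical wrapper name, and it assumes nl indents unnumbered lines with blanks as described above.

```shell
# Portable variant: a literal tab replaces \t inside the bracket expression.
# number_h is a hypothetical name, not part of the original answer.
number_h() {
    tab=$(printf '\t')   # command substitution keeps the tab itself
    nl -bpH -w1 |
    # Double quotes let $tab expand; the \( \) groups and \2\1 pass through unchanged.
    sed "s/^\([0-9]*\)[ $tab]*\([^H]*.\)/\2\1/"
}

number_h <<\DATA
1562 first part
1563 H  col3 H col4
1564 H  col3 H col4
3242 third part
DATA
```

The function reads standard input, so it can also be fed a file with number_h <file. Note that any leading blanks already present in the input are stripped along with the indentation added by nl.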

answered Aug 18, 2014 at 20:25

mikeserv
