    {   nl -bpH -w1 |
        sed 's/^\([0-9]*\)[ \t]*\([^H]*.\)/\2\1/'
    } <<\DATA
    ...
    1562 first part
    1563 H col3 H col4
    1564 H col3 H col4
    ...
    3241 H col3 H col4
    3242 third part
    DATA

### OUTPUT

    ...
    1562 first part
    1563 H1 col3 H col4
    1564 H2 col3 H col4
    ...
    3241 H3 col3 H col4
    3242 third part

That's the fastest way I can imagine doing it, especially with a very large file. With -bpH, nl numbers only the lines that match the pattern H, and -w1 asks for a number field 1 character wide, so the number isn't padded with leading spaces. Each number is inserted at the head of its line, followed by a <tab> character. All other lines get no number and are just indented with a little whitespace.
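To see that intermediate stage for yourself, here is a small sketch that runs nl alone on three made-up lines and makes the whitespace visible with sed -n l. The exact padding on the unnumbered lines differs between nl implementations, so treat the display as illustrative only.

```shell
# Three sample lines (invented for illustration); only the middle one contains H.
sample=$(printf '%s\n' 'first part' 'H col3 H col4' 'third part')

# Same nl options as above: number only lines matching H, field width 1.
nl_out=$(printf '%s\n' "$sample" | nl -bpH -w1)

# sed's l command prints tabs as \t and marks each line end with $,
# which makes the number/<tab> prefix and the indentation visible.
printf '%s\n' "$nl_out" | sed -n l
```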
nl's output is passed to sed over the | pipe. sed then replaces the following sequence:
- 0 or more digits occurring at the beginning of the line (referenced as \1)
- 0 or more <tab> or <space> characters
- 0 or more characters that are not H, then one character (referenced as \2)

...with \2\1.
So lines not containing an H get this treatment:

    ^'' .*.$ = ^.*.''$

And those that do get this one:

    ^(digit)*<tab>(not H)*H.*$ = ^(not H)*H(digit)*.*$

...where '' is an empty string.
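Both cases can be checked without nl by feeding sed hand-made lines shaped like nl's output. This is only a sketch: move_number is a hypothetical helper name, the sample lines are invented, and the tab is built with printf so that the bracket expression holds a real <tab> rather than relying on \t.

```shell
# A real <tab> character, produced by printf.
tab=$(printf '\t')

# Hypothetical helper: apply the substitution described above to stdin.
move_number() {
    sed "s/^\([0-9]*\)[ $tab]*\([^H]*.\)/\2\1/"
}

# A line as nl would number it: digits, <tab>, then the text containing H.
numbered=$(printf '7\t1563 H col3 H col4\n' | move_number)

# A line nl left unnumbered: only leading spaces, no H anywhere.
plain=$(printf '  1562 first part\n' | move_number)

printf '%s\n' "$numbered" "$plain"
```

In the first case \1 is 7 and \2 is everything up to and including the first H, so the 7 lands right after that H. In the second case \1 is empty and \2 swallows the whole line, so only the leading indentation disappears.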
For maximum portability, you should replace the \t in [ \t] with a literal <tab> character, because not every sed treats \t inside a bracket expression as a tab.
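One way to get that literal <tab> without typing it is to let printf produce it and splice it into the sed script. A sketch under that assumption, with number_h as a made-up function name and a shortened copy of the sample data:

```shell
# Hypothetical wrapper around the pipeline above; reads stdin, writes stdout.
number_h() {
    tab=$(printf '\t')    # a real <tab>, so sed never has to interpret \t
    nl -bpH -w1 |
    sed "s/^\([0-9]*\)[ $tab]*\([^H]*.\)/\2\1/"
}

result=$(number_h <<\DATA
1562 first part
1563 H col3 H col4
1564 H col3 H col4
3242 third part
DATA
)
printf '%s\n' "$result"
```

The sed script is double-quoted here so that $tab expands. The backslashes in \( \) and \2\1 survive double quotes unchanged, so the expression is otherwise the same as the single-quoted one.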