mikeserv
    {   nl -bpH -w1 |
        sed 's/^\([0-9]*\)[ \t]*\([^H]*.\)/\2\1/'
    } <<\DATA
    ...
    1562 first part
    1563 H  col3 H col4
    1564 H  col3 H col4
    ...
    3241 H  col3 H col4
    3242 third part
    DATA

###OUTPUT

    ...
    1562 first part
    1563 H1  col3 H col4
    1564 H2  col3 H col4
    ...
    3241 H3  col3 H col4
    3242 third part

That's the fastest way I can imagine it being done - especially with a very large file. With -bpH, nl numbers only the lines that match the pattern H and inserts that number at the head of the line, followed by a <tab> character; -w1 sets the number width to 1, so the number is not padded out to nl's default width. All other lines are indented with a few spaces instead.
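To see what nl hands to sed, it can help to run the first half of the pipeline on its own. This is only a small sketch: the four sample lines and the variable names `sample`, `numbered` and `tab` are made up for illustration and are not from the question.

```shell
# Sample input (hypothetical): two lines contain H, two do not.
sample=$(printf '%s\n' 'first part' 'H col3' 'second part' 'H col4')

# -bpH: number only lines matching the basic regular expression H.
# -w1:  number width of 1, so single-digit numbers get no padding.
numbered=$(printf '%s\n' "$sample" | nl -bpH -w1)

# Show the <tab> separator as visible text so the layout is easy to read.
tab=$(printf '\t')
printf '%s\n' "$numbered" | sed "s/${tab}/<tab>/"
```

The two H lines should come out as `1<tab>H col3` and `2<tab>H col4`, while the other two lines carry only leading blanks; the exact amount of that indentation may differ between nl implementations, which is why the sed expression strips any run of blanks.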

nl's output is passed to sed over the | pipe. sed then replaces the following sequence:

  • 0 or more digits occurring at the beginning of the line (referenced as \1)
  • 0 or more <tab> or <space> characters
  • 0 or more characters that are not H then one character (referenced as \2)

...with \2\1.

So lines not containing an H get this treatment:

^'' .*.$ = ^.*.''$ 

And those that do get this one:

^(digit)*<tab>(not H)*H.*$ = ^(not H)*H(digit)*.*$ 

...where '' is an empty string.
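Both cases can be checked by feeding sed hand-written stand-ins for nl's output. This is a sketch under assumptions: `swap` is a hypothetical helper name, the two input strings imitate what nl would print, and the tab is built with printf rather than written as \t.

```shell
tab=$(printf '\t')

# Hypothetical helper: the same substitution, with a literal tab in the bracket.
swap() {
    sed "s/^\([0-9]*\)[ ${tab}]*\([^H]*.\)/\2\1/"
}

# A numbered line: \1 is "1", \2 is "1563 H", so the number lands after the first H.
with_h=$(printf '%s\n' "1${tab}1563 H  col3 H col4" | swap)

# An unnumbered line: \1 is empty, \2 is the whole text, so only the indent is dropped.
without_h=$(printf '%s\n' "  1562 first part" | swap)

printf '%s\n' "$with_h" "$without_h"
# prints:
# 1563 H1  col3 H col4
# 1562 first part
```

Only the first H on a line gets the number, because [^H]* stops at the first H and the single . then consumes it.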

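For maximum portability you should replace the \t in [ \t] with a literal <tab> character: POSIX does not define \t as a tab inside a bracket expression, so a sed other than GNU sed may match a backslash or the letter t there instead.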
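One way to get a literal <tab> into the expression without typing it is to let printf produce it. The following is a minimal sketch of the whole pipeline written that way; the function name `number_h` and the shortened sample data are assumptions made for the example.

```shell
# Build a literal tab once; printf understands \t even where sed may not.
tab=$(printf '\t')

# Hypothetical wrapper around the pipeline: reads stdin, writes stdout.
number_h() {
    nl -bpH -w1 |
    sed "s/^\([0-9]*\)[ ${tab}]*\([^H]*.\)/\2\1/"
}

# Shortened sample data in the same shape as the example above.
result=$(printf '%s\n' \
    '1562 first part' \
    '1563 H  col3 H col4' \
    '1564 H  col3 H col4' \
    '3242 third part' | number_h)

printf '%s\n' "$result"
```

The sed script is in double quotes here so that ${tab} expands; the backslashes in \( \) and \2\1 pass through unchanged because the shell only treats a backslash specially before $, `, ", \ or a newline inside double quotes.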
