more straightforward
Philippos

I suggest using sed to collect lines in the hold space and check whether they have appeared before:

 sed -n 'H;G;/^\(C([^)]*)\).*\1 *\n/!P' 
  • H appends the current line to the hold space
  • G appends the hold space with all lines we ever saw to the pattern space
  • C([^)]*) is one of those C(…) patterns. The ^ anchors it to the beginning of the line, and it's surrounded by \(…\) so it can be backreferenced as \1 later. We need \1 *\n as the pattern, with the newline (after possible whitespace), to avoid matching the freshly appended line at the end. So the whole pattern /^\(C([^)]*)\).*\1 *\n/ matches a line with a duplicate C(…), and only if it does not match (that's the !) do we
  • Print everything before the first newline (= without the appended hold space), while default output is suppressed by the -n option
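
To see the steps above in action, here is a minimal sketch. The sample input is made up for illustration (one C(…) token per line, one of them with a trailing space to exercise the " *" part of the pattern), and it assumes a sed whose "." also matches the embedded newlines in the pattern space, as GNU sed does:

```shell
# Hypothetical sample input: C(a) and C(b) each appear twice,
# the second C(a) carries a trailing space.
result=$(printf 'C(a)\nC(b)\nC(a) \nC(c)\nC(b)\n' |
  sed -n 'H;G;/^\(C([^)]*)\).*\1 *\n/!P')

# Only the first occurrence of each C(…) should remain.
printf '%s\n' "$result"
```

With this input, the third and fifth lines are dropped because their C(…) already sits in the hold space followed by a newline, while the first occurrence of each token is printed.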

Note that, depending on your sed version and the file size, this may fail, because over time all lines end up in memory.

added 340 characters in body
Philippos

I suggest using sed to collect lines in the hold space and check whether they have appeared before:

 sed -n 'H;G;/^\(C([^)]*)\).*\1 *\n/d;P' 
  • H appends the current line to the hold space
  • G appends the hold space with all lines we ever saw to the pattern space
  • C([^)]*) is one of those C(…) patterns. The ^ anchors it to the beginning of the line, and it's surrounded by \(…\) so it can be backreferenced as \1 later. We need \1 *\n as the pattern, with the newline (after possible whitespace), to avoid matching the freshly appended line at the end. So the whole pattern /^\(C([^)]*)\).*\1 *\n/ matches a line with a duplicate C(…), so we delete that one,
  • otherwise we Print everything before the first newline (= without the appended hold space), while default output is suppressed by the -n option
Philippos

I suggest using sed to collect lines in the hold space and check whether they have appeared before:

 sed -n 'H;G;/^\(C([^)]*) *\).*\1 *\n/d;P' 
  • H appends the current line to the hold space
  • G appends the hold space with all lines we ever saw to the pattern space
  • /^\(C([^)]*) *\).*\1 *\n/d deletes lines with a duplicate C(…)
  • otherwise we Print everything before the first newline (= without the appended hold space), while default output is suppressed by the -n option