I have a .tsv file in which some lines do not end with a '"'. I would like to remove every line break that does not come directly after a '"'. How could I accomplish that with sed? Or with any other bash shell program...
Kind regards, Snafu
To elaborate on @Lev's answer, the BSD (OSX) version of sed is less forgiving about the command syntax within the curly braces -- the semicolon command separator is required for both commands:
    sed '/"$/!{N;s/\n//;}' file.txt

per the documentation here -- an excerpt:
Following an address or address range, sed accepts curly braces '{...}' so several commands may be applied to that line or to the lines matched by the address range. On the command line, semicolons ';' separate each instruction and must precede the closing brace.
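As a rough illustration of the same point, the braces can also be spread over several `-e` fragments. Fragments are joined with newlines, and a newline ends a command just as a semicolon does, so this form should be accepted by both GNU and BSD sed. The function names `sample` and `join_portable` and the sample data are made up for this sketch.

```shell
# Hypothetical sample: the second record is broken across two lines.
sample() {
  printf '%s\n' '"test"' '"qwe' 'rty"' '"foo"'
}

# Each -e fragment is its own script line, so no semicolon is
# needed before the closing brace.
join_portable() {
  sed -e '/"$/!{' -e 'N' -e 's/\n//' -e '}'
}

sample | join_portable
```

With this sample the output should be the three lines `"test"`, `"qwerty"` and `"foo"`.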
This sed command should do it:
    sed '/"$/!{N;s/\n//}' file

It says: on every line not matching "$, append the next line to the pattern space (N) and remove the line break between the two (s/\n//).
Example:
    $ cat file.txt
    "test"
    "qwe
    rty"
    foo
    $ sed '/"$/!{N;s/\n//}' file.txt
    "test"
    "qwerty"
    foo

An address enclosed in /slashes/ matches by regex. The regex is "$, i.e. a quote followed by the end of the line. This is the reverse of the OP's requirement of "a line break which is not directly after an '"'", which is why the address is negated with !.

This might work for you (GNU sed):
    sed ':a;/"$/!{N;s/\n//;ta}' file

This checks if the last character of the pattern space is a " and, if not, appends another line, removes the newline and repeats until the condition is met or the end of the file is encountered.
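To see why the loop matters, here is a small sketch that compares the non-looping command with the looping one on a record that is broken across three lines. It assumes GNU sed. The function names and the sample data are hypothetical.

```shell
# Hypothetical sample: the second record is broken across three lines.
sample() {
  printf '%s\n' '"test"' '"qwe' 'rt' 'y"' '"foo"'
}

# Single pass: joins only one following line per unterminated line.
join_once() {
  sed '/"$/!{N;s/\n//}'
}

# Loop: after each successful join, "ta" branches back to :a, so lines
# keep being appended until the pattern space ends with a quote.
join_loop() {
  sed ':a;/"$/!{N;s/\n//;ta}'
}

printf '%s\n' 'single pass:'
sample | join_once
printf '%s\n' 'loop:'
sample | join_loop
```

On this input the single pass should leave the record split in two (`"qwert` and `y"`), while the loop should produce `"qwerty"` on one line.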
An alternative is:
    sed -r ':a;N;s/([^"])\n/\1/;ta;P;D' file

The mechanism is left for the reader to ponder.
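For readers who would rather check their answer, here is one possible reading of that command, written as comments around a small test run. It assumes GNU sed (for `-r` and for printing the pattern space when `N` runs out of input). The function names and the sample data are hypothetical.

```shell
# Hypothetical sample: the second record is broken across three lines.
sample() {
  printf '%s\n' '"test"' '"qwe' 'rt' 'y"' '"foo"'
}

join_pd() {
  # :a               label to loop back to
  # N                append the next input line to the pattern space
  # s/([^"])\n/\1/   drop the newline if the character before it is
  #                  not a quote
  # ta               if that substitution succeeded, go back to :a
  # P                otherwise print up to the first newline ...
  # D                ... delete that part and restart the cycle with
  #                  the remainder, without reading a new line
  sed -r ':a;N;s/([^"])\n/\1/;ta;P;D'
}

sample | join_pd
```

On this sample the result should be `"test"`, `"qwerty"` and `"foo"`, the same as with the looping command above.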