awk: deal with newline separation in wrong place

Question

I have some data that looks like this:

abc


123
456
789
def


111
222
333
ghi


999
888
777
666

i.e. the records are separated by multiple newlines but in the wrong place. What I want is to get it like this:

abc
123
456
789

def
111
222
333

ghi
999
888
777
666

I have tried setting RS to \n\n\n in awk but that ends up with the records cut up wrong; the abc term ends up as the final field of the previous record rather than the first field of the current record.
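A small reproduction of that failed attempt may make the misgrouping easier to see. This is only a sketch: it assumes the data has exactly two empty lines after each header (which is what an RS of `\n\n\n` implies), and it assumes an awk that treats a multi-character RS as a regular expression, as gawk, mawk and busybox awk do (POSIX leaves that case unspecified). The file name `sample.txt` and the function name `show_records` are made up for the demo.

```shell
# Build a sample file; the layout (two empty lines after each header) is an assumption.
printf 'abc\n\n\n123\n456\n789\ndef\n\n\n111\n222\n333\nghi\n\n\n999\n888\n777\n666\n' > sample.txt

# Print the first and last field of every record when RS is three newlines.
show_records() {
    awk -v RS='\n\n\n' '{ print NR ": first=" $1 " last=" $NF }' "$1"
}

show_records sample.txt
```

With such an awk, each header after the first should show up as the last field of the preceding record, which is the behaviour described above.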

I'm not sure how to use sed for this either since that works on a line-by-line basis.

RudiC · Accepted Answer · 2022-02-05 18:57:02Z

Try

awk '!NF {next} /[^0-9]/ {printf XRS; XRS = ORS} 1' file2
abc
123
456
789

def
111
222
333

ghi
999
888
777

It deletes empty lines (I read from your spec that those are really empty, no spaces etc.), then checks if there is any non-digit, indicating record headers, for which it prints a newline except for the first one which gets an empty string.

Don't use all-uppercase names for user-defined variables: they can clash with builtin names, and they obfuscate your code by making it look like you're using builtin variables when you aren't.
– Ed Morton, Commented Feb 5, 2022 at 19:48
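For readers who want to step through that logic, here is a commented sketch of the same idea, with the separator variable renamed to lowercase `sep` as the comment above suggests. The sample layout (two empty lines after each header), the file name `sample.txt` and the function name `regroup` are assumptions made for the demo, not part of the answer.

```shell
# Sample input; the two-empty-lines layout is an assumption about the original data.
printf 'abc\n\n\n123\n456\n789\ndef\n\n\n111\n222\n333\nghi\n\n\n999\n888\n777\n666\n' > sample.txt

regroup() {
    awk '
        !NF      { next }                  # skip empty (or blank) lines
        /[^0-9]/ { printf "%s", sep        # header line: emit the separator first
                   sep = ORS }             # sep is empty only before the first header
        { print }                          # print every line that was not skipped
    ' "$1"
}

regroup sample.txt
```

Using `printf "%s", sep` rather than `printf sep` keeps the separator from being interpreted as a format string, which would matter if ORS ever contained a `%`.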

Ed Morton · Accepted Answer · 2022-02-05 19:43:40Z

Using any awk in any shell on every Unix box:

$ awk '/[^0-9]/ && NR>1{print ""} NF' file
abc
123
456
789

def
111
222
333

ghi
999
888
777
666

nezabudka · Accepted Answer · 2022-02-05 19:04:05Z

GNU sed:

sed '1b;/^$/d;/[a-z]/s/^/\n/' file

The first line is printed as is (1b) and empty lines are deleted. Any other line that contains a lowercase letter gets a newline inserted before it.
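The same three commands can be spelled out one per line with comments, which may be easier to follow. This is a sketch that assumes GNU sed (the `\n` in the replacement is a GNU extension) and a sample file with two empty lines after each header. The names `sample.txt` and `regroup_sed` are hypothetical.

```shell
# Sample input; the two-empty-lines layout is an assumption about the original data.
printf 'abc\n\n\n123\n456\n789\ndef\n\n\n111\n222\n333\nghi\n\n\n999\n888\n777\n666\n' > sample.txt

regroup_sed() {
    sed '
        # line 1 is the first header: print it unchanged
        1b
        # delete empty lines
        /^$/d
        # any other line containing a lowercase letter is a header: prepend a newline
        /[a-z]/s/^/\n/
    ' "$1"
}

regroup_sed sample.txt
```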

JJoao · Accepted Answer · 2022-02-05 20:16:46Z

Assuming that we want to change: [line][emptylines][lines] --> [emptylines][line][lines], you could run something along the lines of

perl -00pe's/(\S.*\n)((\h*\n)+)/$2$1/' ex1

(this is independent of line contents (integer vs noninteger))

DanieleGrassini · Accepted Answer · 2022-02-05 22:55:16Z

Using sed:

sed -n '/^$/d;/^[0-9]*$/{h;n;//!ba;x;G;;p;d};p;d;:a H;g;s/\n/\n\n/;p;' sample.txt

Using awk:

awk ' NF && /^[0-9]*$/{f = 1;print} NF && f && /^[^0-9]*$/{print "\n" $0; f = 0} NR == 1 ' sample.txt

Using perl:

perl -alne 'if(/^\S/){$_ = (/^\d/ || $. == 1) ? $_ : "\n$_";print}' sample.txt

guest_7 · Accepted Answer · 2022-02-07 00:06:20Z

Using perl in paragraph mode (-00) where all consecutive newlines are squashed into one.

$ perl -lp -00e 's/(?=\n[a-z])/\n/' file

Using GNU sed:

$ sed -e '/[a-z]/{H;1h;z;x;}' -e '/./!d' file

awk 'BEGIN{a[1]=ORS} /[a-z]/ && sub(/^/,a[!!k++]) || NF ' file

Using GNU awk with regexified input record separator

gawk -v RS='[a-z]+\\n+' '
  NR > 1 {
    printf "%s%s%s%s", sep, a[1], m[1], $0
    sep = ORS
  }
  { split(RT,a,ORS,m) }
' file

Praveen Kumar BS · Accepted Answer · 2022-02-07 19:25:29Z

sed -e '2,$s/[a-z].*/===========================\n&/g' -e '/^$/d' filename

output

abc
123
456
789
===========================
def
111
222
333
===========================
ghi
999
888
777
666
