I have some data that looks like this:

abc


123
456
789
def


111
222
333
ghi


999
888
777
666

That is, the records are separated by multiple newlines, but in the wrong place: the empty lines come after the first line of each record instead of before it. What I want is this:

abc
123
456
789

def
111
222
333

ghi
999
888
777
666

I have tried setting RS to \n\n\n in awk but that ends up with the records cut up wrong; the abc term ends up as the final field of the previous record rather than the first field of the current record.

I'm not sure how to use sed for this either since that works on a line-by-line basis.

7 Answers

5

Try

awk '!NF {next} /[^0-9]/ {printf XRS; XRS = ORS} 1' file2
abc
123
456
789

def
111
222
333

ghi
999
888
777

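It skips empty lines (I read from your spec that those are really empty, with no spaces etc.; the !NF test would skip whitespace-only lines as well), then checks whether the line contains any non-digit, which marks a record header. Before each header it prints a newline, except before the first one, which gets an empty string because XRS is still unset at that point.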
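A minimal sketch of the same logic, spelled out with comments and with the separator kept in a lower-case variable. The sample input is an assumption reconstructed from the question's description (each header line followed by two empty lines), and `regroup`, `sep` and the temporary file are illustrative names, not part of the answer.

```shell
# Assumed sample input: each header line (abc, def, ghi) is followed by
# two empty lines, as the question describes.
sample=$(mktemp)
printf '%s\n' abc '' '' 123 456 789 def '' '' 111 222 333 ghi '' '' 999 888 777 666 > "$sample"

# regroup is a hypothetical helper name used only for this illustration.
regroup() {
    awk '
        # Skip empty (or whitespace-only) lines.
        !NF { next }
        # A line with any non-digit is a header: emit the separator first.
        # sep is unset (empty) for the first header and ORS afterwards.
        /[^0-9]/ { printf "%s", sep; sep = ORS }
        # Print the current line.
        1
    ' "$1"
}

regroup "$sample"
```

Run on the assumed sample, this prints the three records with a single empty line between them.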

1
  • 1
    Don't use all upper case for user-defined variable names to avoid clashing with builtin names and so it doesn't obfuscate your code by making it look like you're using builtin names when you aren't. Commented Feb 5, 2022 at 19:48
5

Using any awk in any shell on every Unix box:

$ awk '/[^0-9]/ && NR>1{print ""} NF' file
abc
123
456
789

def
111
222
333

ghi
999
888
777
666
3

GNU sed:

sed '1b;/^$/d;/[a-z]/s/^/\n/' file 

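The first line is printed unchanged and empty lines are deleted. If a remaining line contains a lower-case letter, a newline is inserted before it. The \n in the replacement is what makes this GNU-specific.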
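The command can be tried on a small file as sketched below. The sample input is an assumption reconstructed from the question's description, and `split_records` is a hypothetical wrapper name. It assumes GNU sed, since `\n` in the replacement and `b` followed directly by `;` are not portable.

```shell
# Assumed sample input: each header line is followed by two empty lines.
sample=$(mktemp)
printf '%s\n' abc '' '' 123 456 789 def '' '' 111 222 333 ghi '' '' 999 888 777 666 > "$sample"

# split_records is a hypothetical wrapper around the one-liner (GNU sed assumed).
#   1b              line 1: branch to the end of the script, print it as-is
#   /^$/d           delete empty lines
#   /[a-z]/s/^/\n/  line with a lower-case letter: prepend a newline
split_records() {
    sed '1b;/^$/d;/[a-z]/s/^/\n/' "$1"
}

split_records "$sample"
```

The `1b` is what keeps the output from starting with an empty line: the first header is printed before the substitution can apply to it.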

2

Assuming that we want to change: [line][emptylines][lines] --> [emptylines][line][lines], you could run something along the lines of

perl -00pe's/(\S.*\n)((\h*\n)+)/$2$1/' ex1 

(this is independent of line contents (integer vs noninteger))
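For comparison, the same content-independent idea can be sketched in awk: hold each non-empty line back until the next one arrives, and treat the held line as the start of a new record if empty lines were seen in between. This is a swapped-in illustration, not the perl one-liner itself. `move_gaps`, `held`, `gap`, `have` and `printed` are hypothetical names, the sample input is an assumption reconstructed from the question, and the sketch always emits exactly one empty line between records, whatever the size of the original gap.

```shell
# Assumed sample input: each header line is followed by two empty lines.
sample=$(mktemp)
printf '%s\n' abc '' '' 123 456 789 def '' '' 111 222 333 ghi '' '' 999 888 777 666 > "$sample"

# move_gaps is a hypothetical helper name used only for this illustration.
move_gaps() {
    awk '
        # Empty line: remember that a gap followed the held line.
        !NF { gap = 1; next }
        # A non-empty line arrived, so the held line can now be printed.
        have {
            # A gap after the held line means it opens a new record.
            if (gap && printed) print ""
            print held
            printed = 1
        }
        # Hold the current line until we see what follows it.
        { held = $0; have = 1; gap = 0 }
        # Flush the last held line.
        END { if (have) print held }
    ' "$1"
}

move_gaps "$sample"
```

Nothing here inspects whether a line is numeric, so it also works when the header lines contain digits.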

1

Using sed :

sed -n '/^$/d;/^[0-9]*$/{h;n;//!ba;x;G;;p;d};p;d;:a H;g;s/\n/\n\n/;p;' sample.txt 

Using awk :

awk '
NF && /^[0-9]*$/ {f = 1; print}
NF && f && /^[^0-9]*$/ {print "\n" $0; f = 0}
NR == 1
' sample.txt

Using perl :

perl -alne 'if(/^\S/){$_ = (/^\d/ || $. == 1) ? $_ : "\n$_";print}' sample.txt 
1

Using perl in paragraph mode (-00) where all consecutive newlines are squashed into one.

$ perl -lp -00e 's/(?=\n[a-z])/\n/' file 

Using GNU sed:

$ sed -e '/[a-z]/{H;1h;z;x;}' -e '/./!d' file 

awk 'BEGIN{a[1]=ORS} /[a-z]/ && sub(/^/,a[!!k++]) || NF ' file 

Using GNU awk with regexified input record separator

gawk -v RS='[a-z]+\\n+' '
NR > 1 {
  printf "%s%s%s%s", sep, a[1], m[1], $0
  sep = ORS
}
{ split(RT,a,ORS,m) }
' file
0
sed -e '2,$s/[a-z].*/===========================\n&/g' -e '/^$/d' filename

output

abc
123
456
789
===========================
def
111
222
333
===========================
ghi
999
888
777
666
