Join lines into one line using awk

Question

I have a file with the following records

ABC
BCD
CDE
EFG

I would like to convert this into

'ABC','BCD','CDE','EFG'

I attempted to attack this problem using Awk in the following way:

awk '/START/{if (x)print x;x="";next}{x=(!x)?$0:x","$0;}END{print x;}'

but the output is not what I expected:

ABC,BCD,CDE,EFG

Are there any suggestions on how we can achieve this?

Welcome to SO. I edited your all-caps subject line to use proper capitalization.
– RavinderSingh13, Commented Sep 6, 2018 at 12:46

RavinderSingh13 · Accepted Answer · 2018-09-06 12:37:02Z


Could you please try the following.

awk -v s1="'" 'BEGIN{OFS=","} {val=val?val OFS s1 $0 s1:s1 $0 s1} END{print val}' Input_file

Output will be as follows.

'ABC','BCD','CDE','EFG'

answered Sep 6, 2018 at 12:37

RavinderSingh13


2 Comments

Ed Morton Over a year ago

val=val?val OFS s1 $0 s1:s1 $0 s1 can be written as val=(val ? val OFS : "") s1 $0 s1. This is clearer, less redundant, and more robust across all awks, since it parenthesizes the ternary expression.

RavinderSingh13 Over a year ago

@Vikakmis, you could also select any of the answers as the correct answer, to close the loop properly.

Ed Morton · Accepted Answer · 2018-09-06 13:31:35Z

With GNU awk for multi-char RS:

$ awk -v RS='\n$' -F'\n' -v OFS="','" -v q="'" '{$1=$1; print q $0 q}' file
'ABC','BCD','CDE','EFG'

justaguy · Accepted Answer · 2018-09-06 13:10:58Z


awk may be better

awk '{printf fmt,$1}' fmt="'%s'\n" file | paste -sd, -
'ABC','BCD','CDE','EFG'

edited Sep 6, 2018 at 13:10

answered Sep 6, 2018 at 12:46

justaguy


2 Comments

Ed Morton Over a year ago

Instead of specifying the format in a variable assigned outside of the script you can just use awk '{printf "\047%s\047\n", $1}'

justaguy Over a year ago

Thank you very much for the helpful tip @Ed Morton :)

kvantour · Accepted Answer · 2018-09-06 15:28:34Z

There are many ways of achieving this:

with pipes:

sed "s/.*/'&'/" <file> | paste -sd, -
awk '{print "\047" $0 "\047"}' <file> | paste -sd, -

remark: we do not make use of tr here as this would lead to an extra , at the end.
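
To illustrate the remark above, here is a minimal sketch comparing tr with paste. The sample input is generated with printf rather than read from a file, and the names sample, with_tr and with_paste are made up for this illustration only.

```shell
# Hypothetical sample input: one value per line, as in the question.
sample() { printf '%s\n' ABC BCD CDE EFG; }

# tr turns every newline into a comma, including the final one.
with_tr=$(sample | sed "s/.*/'&'/" | tr '\n' ',')

# paste -s only puts the delimiter between lines.
with_paste=$(sample | sed "s/.*/'&'/" | paste -sd, -)

echo "$with_tr"     # 'ABC','BCD','CDE','EFG',
echo "$with_paste"  # 'ABC','BCD','CDE','EFG'
```

The trailing comma from tr would need an extra cleanup step, which is why paste is the simpler choice here.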

reading the full file into memory:

sed ':a;N;$!ba;s/\n/'"','"'/g;s/.*/'"'&'"'/g' <file> #POSIX
sed -z 's/^\|\n$/'"'"'/g;s/\n/'"','"'/g;' <file> #GNU

and the solution of @EdMorton

without reading the full file into memory:

awk '{printf "%s\047%s\047", (NR>1?",":""), $0} END{print ""}' <file>

and some random other attempts:

awk '(NR-1){s=s","}{s=s"\047"$0"\047"}END{print s}' <file>
awk 'BEGIN{printf s="\047";ORS=s","s}(NR>1){print t}{t=$0}END{ORS=s;print t}' <file>

So what is going on with the OP's attempts?

Writing down the OP's awk line, we have

/START/{if (x)print x;x="";next}
{x=(!x)?$0:x","$0;}
END{print x;}

What does this do? Let us analyze step by step:

/START/{if (x)print x;x="";next}:: This reads: if the current record/line contains the string START, then do the following.
- if (x) print x:: if x is not an empty string, print the value of x
- x="":: set x to an empty string
- next:: skip to the next record/line
In this code block, the OP probably assumed that /START/ means "do this before anything else". In awk, however, that is written as BEGIN. /START/ is a regular expression that is tested against every record, and none of the input lines shown contains START, so the block never runs. Even as a BEGIN block, the if statement would not be executed, because at the beginning all variables are empty strings or zero. This block could be replaced by:
```
BEGIN{x=""} 
```
But again, this is not needed, so the block can be removed.
{x=(!x)?$0:x","$0;}:: concatenates the string with the delimiter. This is good, especially the use of the ternary operator. Sadly, the delimiter is set to , and not ',' which in awk is best written as \047,\047. So the line could read:
```
{x=(!x)?$0:x"\047,\047"$0;} 
```
This line can be shortened once you realize that x could be an empty string. For an empty x, x=$0 is equivalent to x=x $0, so all you need to do is insert a separator that may or may not be an empty string. So you can write this as
```
{x= x ((!x)?"":"\047,\047") $0} 
```
or inverting the logic to get rid of some more characters:
```
{x=x(x?"\047,\047":"")$0} 
```
one could even write
```
{x=x(x?"\047,\047":x)$0} 
```
but this is not optimal, as it has to read the value of x from memory once more. However, this form can finally be optimized to (per @EdMorton's comment)
```
{x=(x?x"\047,\047":"")$0} 
```
This is better as it removes an extra concatenation operator.
END{print x}:: Here the OP prints the result. This, however, will miss the single quotes at the beginning and end of the string, so they could be added:
```
END{print "\047" x "\047"} 
```
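
Returning to the /START/ block discussed at the start of this analysis, it may help to run the OP's original script on data that does contain such marker lines. The sketch below assumes a hypothetical input in which START lines separate groups of records. That layout is not from the question; it is only a guess at the kind of data the script was originally written for.

```shell
# Hypothetical input: START lines separate two groups of records.
grouped=$(printf '%s\n' START ABC BCD START CDE EFG |
  awk '/START/{if (x) print x; x=""; next} {x=(!x)?$0:x","$0} END{print x}')

# /START/ matches the marker lines themselves, so each group is
# flushed when the next marker is seen, and END prints the last group.
echo "$grouped"
# ABC,BCD
# CDE,EFG
```

On the OP's input no line matches START, so only the second and third blocks ever run.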

So the corrected version of the OP's code would read:

awk '{x=(x?x"\047,\047":"")$0}END{print "\047" x "\047"}'
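
As a quick check, the corrected command can be run against sample input. This is only a sketch: the input is generated with printf instead of being read from a file, and the variable names fixed and empty are made up for illustration. It also shows one edge case worth knowing about, namely that the END block still prints a pair of quotes when there is no input at all.

```shell
# Hypothetical sample input: one value per line, as in the question.
fixed=$(printf '%s\n' ABC BCD CDE EFG |
  awk '{x=(x?x"\047,\047":"")$0}END{print "\047" x "\047"}')
echo "$fixed"   # 'ABC','BCD','CDE','EFG'

# With no input lines, x stays empty and only the two quotes are printed.
empty=$(printf '' | awk '{x=(x?x"\047,\047":"")$0}END{print "\047" x "\047"}')
echo "$empty"   # ''
```

If the empty case matters, guarding the END block with if (NR) would be one way to suppress the stray quotes.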
