
I have a file with the following records

ABC
BCD
CDE
EFG

I would like to convert this into

'ABC','BCD','CDE','EFG' 

I attempted to attack this problem using Awk in the following way:

awk '/START/{if (x)print x;x="";next}{x=(!x)?$0:x","$0;}END{print x;}' 

but the output is not what I expected:

ABC,BCD,CDE,EFG 

Are there any suggestions on how we can achieve this?

1
  • Welcome to SO, edited your capital letters subject to proper one. Commented Sep 6, 2018 at 12:46

4 Answers 4

2

Could you please try the following.

awk -v s1="'" 'BEGIN{OFS=","} {val=val?val OFS s1 $0 s1:s1 $0 s1} END{print val}' Input_file 

Output will be as follows.

'ABC','BCD','CDE','EFG' 
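To try this without an existing Input_file, here is a minimal sketch that pipes the sample records from the question into the same command. It assumes one record per line, uses the parenthesized form of the ternary, and the variable name result is only for illustration.

```shell
# Sample input from the question, one record per line (assumed layout).
result=$(printf '%s\n' ABC BCD CDE EFG |
  awk -v s1="'" 'BEGIN { OFS = "," }
    # Prepend the separator only when val already holds something.
    { val = (val ? val OFS : "") s1 $0 s1 }
    END { print val }')
echo "$result"
```

Passing the quote in through -v s1="'" avoids having to escape a single quote inside the single-quoted awk program.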

2 Comments

val=val?val OFS s1 $0 s1:s1 $0 s1 can be written as val=(val ? val OFS : "") s1 $0 s1. This improves clarity, reduces redundancy, and is more robust across all awks because the ternary expression is parenthesized.
@Vikakmis, you could also select one of the answers as the correct answer, to close the loop properly.
2

With GNU awk for multi-char RS:

$ awk -v RS='\n$' -F'\n' -v OFS="','" -v q="'" '{$1=$1; print q $0 q}' file
'ABC','BCD','CDE','EFG'


1

awk may be better

awk '{printf fmt,$1}' fmt="'%s'\n" file | paste -sd, -
'ABC','BCD','CDE','EFG'

2 Comments

Instead of specifying the format in a variable assigned outside of the script you can just use awk '{printf "\047%s\047\n", $1}'
Thank you very much for the helpful tip @Ed Morton :)
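Following that tip, the pipeline can be sketched as below, with the sample records from the question supplied inline instead of read from a file. The variable name joined is only for illustration.

```shell
# Quote each line with awk (\047 is the octal escape for a single quote),
# then let paste join the lines with commas.
joined=$(printf '%s\n' ABC BCD CDE EFG |
  awk '{ printf "\047%s\047\n", $1 }' |
  paste -sd, -)
echo "$joined"
```

Note that $1 is only the first whitespace-separated field; if records may contain spaces, $0 would keep the whole line.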
1

There are many ways of achieving this:

with pipes:

sed "s/.*/'&'/" <file> | paste -sd, -
awk '{print "\047" $0 "\047"}' <file> | paste -sd, -

remark: we do not make use of tr here as this would lead to an extra , at the end.
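The difference between tr and paste can be seen in a small sketch on the sample records from the question (the variable names are only for illustration):

```shell
quoted=$(printf '%s\n' ABC BCD CDE EFG | sed "s/.*/'&'/")

# tr turns every newline into a comma, including the last one.
with_tr=$(printf '%s\n' "$quoted" | tr '\n' ',')

# paste -s only puts the delimiter between lines.
with_paste=$(printf '%s\n' "$quoted" | paste -sd, -)

echo "$with_tr"
echo "$with_paste"
```

The tr result ends in a stray comma and has no final newline, while paste produces exactly one comma between records.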

reading the full file into memory:

sed ':a;N;$!ba;s/\n/'"','"'/g;s/.*/'"'&'"'/g' <file> #POSIX
sed -z 's/^\|\n$/'"'"'/g;s/\n/'"','"'/g;' <file> #GNU

and the solution of @EdMorton

without reading the full file into memory:

awk '{printf "%s\047%s\047", (NR>1?",":""), $0} END{print ""}' <file>

and some random other attempts:

awk '(NR-1){s=s","}{s=s"\047"$0"\047"}END{print s}' <file>
awk 'BEGIN{printf s="\047";ORS=s","s}(NR>1){print t}{t=$0}END{ORS=s;print t}' <file>

So what is going on with the OP's attempt?

Writing down the OP's awk line, we have

/START/{if (x)print x;x="";next}
{x=(!x)?$0:x","$0;}
END{print x;}

What does this do? Let us analyze step by step:

  • /START/{if (x)print x;x="";next}: This reads: if the current record/line contains the string START, then do the following:

    • if (x) print x: if x is not an empty string, print the value of x
    • x="": set x to an empty string
    • next: skip to the next record/line

    In this code block, the OP probably assumed that /START/ means "do this before any input is read". In awk, however, that is written as BEGIN. Since all variables start out as empty strings or zero, the print inside the if statement would not run at that point anyway. And because no line of the sample input contains START, this block never runs at all. It could be replaced by:

    BEGIN{x=""} 

    But again, this is not needed, so the block can simply be removed.

  • {x=(!x)?$0:x","$0;}: Concatenate the string with the correct delimiter. This is good, especially the use of the ternary operator. Sadly, the delimiter is set to , and not ',' which in awk is best written as \047,\047. So the line could read:

    {x=(!x)?$0:x"\047,\047"$0;} 

    This line can be shortened once you realize that x could be an empty string. For an empty x, x=$0 is equivalent to x=x $0, and all you want to do is add a separator, which may or may not be an empty string. So you can write this as

    {x= x ((!x)?"":"\047,\047") $0} 

    or inverting the logic to get rid of some more characters:

    {x=x(x?"\047,\047":"")$0} 

    one could even write

    {x=x(x?"\047,\047":x)$0} 

    but this is not optimal, as the value of x has to be read once more. However, this form can finally be optimized to (per @EdMorton's comment)

    {x=(x?x"\047,\047":"")$0} 

    This is better as it removes an extra concatenation operator.

  • END{print x}: Here the OP prints the result. This, however, misses the single quotes at the beginning and end of the string, so they should be added:

    END{print "\047" x "\047"} 
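As a quick sanity check, the accumulation rules discussed above can be run side by side on the sample records. This is only a sketch: run is a hypothetical helper, and each rule is paired with a plain END { print x } so that only the accumulated string is compared.

```shell
input=$(printf '%s\n' ABC BCD CDE EFG)

# Run one accumulation rule over the sample and print the final x.
run() { printf '%s\n' "$input" | awk "$1"' END { print x }'; }

a=$(run '{ x = (!x) ? $0 : x "\047,\047" $0 }')
b=$(run '{ x = x ((!x) ? "" : "\047,\047") $0 }')
c=$(run '{ x = x (x ? "\047,\047" : "") $0 }')
d=$(run '{ x = (x ? x "\047,\047" : "") $0 }')
printf '%s\n' "$a" "$b" "$c" "$d"
```

All four rules build the same string; the outer quotes are still missing here because END only prints x.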

So the corrected version of the OP's code would read:

awk '{x=(x?x"\047,\047":"")$0}END{print "\047" x "\047"}' 
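For repeated use, the corrected command can be wrapped in a small shell function. This is a sketch only; quote_join is a hypothetical name, and it reads from standard input rather than from a file argument.

```shell
# Hypothetical helper: join the lines of stdin as 'a','b','c'.
quote_join() {
  awk '{ x = (x ? x "\047,\047" : "") $0 }
       END { print "\047" x "\047" }'
}

printf '%s\n' ABC BCD CDE EFG | quote_join
```

With empty input the END block still runs, so the function prints a bare pair of quotes.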
