0

I have the following CSV file:

ID,COUNTRY,DESCRIPTION,PRICE 1,USA,Short description,11 2,UK,"Description with comma , inside",2.3 

So if field contains comma, it is put in double quotes.

How to replace delimiter from comma to tabulator but ignore commas in quotes? So I will have:

ID COUNTRY DESCRIPTION PRICE 1 USA Short description 11 2 UK "Description with comma , inside" 2.3 

or without double quotes:

ID COUNTRY DESCRIPTION PRICE 1 USA Short description 11 2 UK Description with comma , inside 2.3 

I use this code before I noticed that some fields contain commas:

$VAR='\t' sed -i $"s/,/$VAR/" $FILE_NAME 
4
  • 1
    On So its highly encouraged to add efforts in questions. So kindly do add your efforts in form of code in your question and let us know then. Commented Aug 31, 2020 at 13:54
  • 1
    it was mistake, edited Commented Aug 31, 2020 at 13:59
  • Look into csvkit tools. Commented Aug 31, 2020 at 19:42
  • 1
    Don't you need to convert tabs in your data to blanks or something else so you don't end up with more fields than you started with? See also whats-the-most-robust-way-to-efficiently-parse-csv-using-awk. Commented Sep 1, 2020 at 12:02

3 Answers 3

5

With GNU awk, could you please try following written with shown samples(In case you want to make edit within Input_file itself then append > temp && mv temp Input_file in following solutions too).

awk -v FPAT='[^,]*|"[^"]+"' -v OFS="\t" '{$1=$1}1' Input_file 

OR to remove " from starting and ending of the fields of line then run following.

awk -v FPAT='[^,]*|"[^"]+"' -v OFS="\t" ' { for(i=1;i<=NF;i++){ gsub(/^"|"$/,"",$i) } $1=$1 } 1 ' Input_file 
Sign up to request clarification or add additional context in comments.

3 Comments

thanks for this answer, that FPAT looks to be exactly what I needed...unfortunately it does not work in our system, because we have lower awk installed (< 4.0)
@user1632454, sure no issues cheers and happy learning.
@user1632454 you should update your gawk version as you're about 10 years out of date and missing a few bug fixes and a ton of extremely useful new functionality.
3

This might work for you (GNU sed):

sed -E ':a;s/^([^"]*("[^",]*"[^"]*)*"[^"]*),/\1\n/;ta;y/,/\t/;y/\n/,/' file 

Convert all commas inside double quotes to newlines, convert remaining commas to tabs (or whatever) and then convert newlines back to commas.

N.B. This will not work if the new delimiter is a newline or if there are quoted double quotes inside double quotes.

Comments

3

Ruby has a nice csv module:

ruby -rcsv -e ' out = CSV.new($stdout, col_sep: "\t") CSV.foreach(ARGV.shift) {|row| out << row} ' file.csv 

Will automatically quote output fields with tabs:

$ cat -A file.csv ID,COUNTRY,DESCRIPTION,PRICE$ 1,USA,Short description,11$ 2,UK,"Description with comma , inside",2.3$ 3,CA,"Description with^Itab inside",2.3$ $ ruby -rcsv -e ' out = CSV.new($stdout, col_sep: "\t") CSV.foreach(ARGV.shift) {|row| out << row} ' file.csv ID COUNTRY DESCRIPTION PRICE 1 USA Short description 11 2 UK Description with comma , inside 2.3 3 CA "Description with tab inside" 2.3 

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.