
I've got a CSV file that looks like:

1,3,"3,5",4,"5,5" 

I want to use sed to change every "," that is not inside quotes to ";", so that it looks like this:

1;3;"3,5";4;"5,5"

But I can't find a pattern that works.

This was just covered recently here. Search for tag=gawk/awk and CSV. It's very hard to do, especially with sed, given the data you have shown. Good luck. (Commented Jan 22, 2012 at 18:34)

4 Answers


If you expect only numbers inside the quotes, the following expression will work:

sed -e 's/,/;/g' -e 's/\("[0-9][0-9]*\);\([0-9][0-9]*"\)/\1,\2/g' 

For example:

$ echo '1,3,"3,5",4,"5,5"' | sed -e 's/,/;/g' -e 's/\("[0-9][0-9]*\);\([0-9][0-9]*"\)/\1,\2/g'
1;3;"3,5";4;"5,5"

You can't simply replace [0-9][0-9]* with .* to keep every , that is delimited by quotes: .* is greedy and matches too much. So you have to use a narrower bracket expression such as [a-z0-9]*.
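A quick sketch of the greediness problem, assuming GNU or another POSIX sed: the pipeline below runs the same two passes but with .* in the restore pass. The function name `greedy_restore` is hypothetical. Because the group \(".*\) stretches from the first quote to the last ; before the final quote, a single match covers both quoted fields and only one semicolon gets restored.

```shell
#!/bin/sh
# Hypothetical helper: the two-pass idea with a greedy .* in the restore pass.
# The group \(".*\) runs from the first quote as far right as it can,
# so the single match spans "3;5";4;"5;5" and only its last ; is restored.
greedy_restore() {
  sed -e 's/,/;/g' -e 's/\(".*\);\(.*"\)/\1,\2/g'
}

echo '1,3,"3,5",4,"5,5"' | greedy_restore
# prints 1;3;"3;5";4;"5,5" (the first quoted field keeps its ;)
```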

For example, with letters and with commas at either end of a quoted field:

$ echo '1,3,"3,5",4,"5,5",",6","4,",7,"a,b",c' | sed -e 's/,/;/g' -e 's/\("[a-z0-9]*\);\([a-z0-9]*"\)/\1,\2/g'
1;3;"3,5";4;"5,5";",6";"4,";7;"a,b";c

It also has the advantage over the first solution of being simple to understand: we just replace every comma with a semicolon and then change each semicolon inside quotes back to a comma.
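The two passes can be wrapped in a small shell function, sketched below. The name `semicolons_two_pass` is hypothetical, and running sed under `LC_ALL=C` is an added assumption (not in the answer) that keeps the [a-z0-9] range to plain ASCII lowercase letters and digits. A quoted field containing any other character, such as a space, keeps its ; as the second usage line shows.

```shell
#!/bin/sh
# Hypothetical helper wrapping the two-pass approach from this answer.
semicolons_two_pass() {
  # Pass 1: turn every comma into a semicolon.
  # Pass 2: where a quote, a run of [a-z0-9], one ;, another run of [a-z0-9]
  #         and a quote appear in sequence, put the comma back.
  # LC_ALL=C is an assumption added here to pin [a-z0-9] to ASCII.
  LC_ALL=C sed -e 's/,/;/g' \
               -e 's/\("[a-z0-9]*\);\([a-z0-9]*"\)/\1,\2/g'
}

echo '1,3,"3,5",4,"5,5"' | semicolons_two_pass   # 1;3;"3,5";4;"5,5"
echo 'x,"a b,c"' | semicolons_two_pass           # x;"a b;c" (space is outside [a-z0-9])
```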


You could try something like this:

echo '1,3,"3,5",4,"5,5"' | sed -r 's|("[^"]*),([^"]*")|\1\x1\2|g;s|,|;|g;s|\x1|,|g' 

This replaces the comma inside each quoted field with a \x1 character, then replaces every remaining comma with a semicolon, and finally turns the \x1 characters back into commas. This might work, provided the file is well formed, contains no \x1 characters to begin with, and has no double quotes inside double quotes, like "a\"b". Note that the first substitution protects only one comma per quoted field, so a field such as "a,b,c" still loses one of its commas.
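A sketch of the same placeholder idea for a sed that lacks GNU's \x escape: the placeholder byte (octal 001, the same character as \x1) is produced with printf and spliced into the script. The function name is hypothetical, it assumes a sed that accepts -E (GNU and BSD sed both do), and it inherits the limitations described above.

```shell
#!/bin/sh
# Hypothetical helper: protect quoted commas, convert the rest, restore.
semicolons_via_placeholder() {
  ph=$(printf '\001')   # SOH byte; assumed never to appear in the input
  # 1) swap a comma inside a "quoted" field for the placeholder
  # 2) turn every remaining comma into a semicolon
  # 3) turn the placeholders back into commas
  sed -E "s/(\"[^\"]*),([^\"]*\")/\\1${ph}\\2/g; s/,/;/g; s/${ph}/,/g"
}

echo '1,3,"3,5",4,"5,5"' | semicolons_via_placeholder   # 1;3;"3,5";4;"5,5"
```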

3 Comments

Heck, write a script to catch the bad cases (false positive on \\" would probably be better than missing \").
I could suggest the same, but that's not what the author wanted :)
In CSV, double quotes inside quotes are supposed to be written as "foo""bar". I know because I looked it up in RFC 4180. (Not that this helps if the person generating the data had never looked at the spec…)

Using gawk

gawk '{$1=$1}1' FPAT="([^,]+)|(\"[^\"]+\")" OFS=';' filename 

Test:

[jaypal:~/Temp] cat filename
1,3,"3,5",4,"5,5"
[jaypal:~/Temp] gawk '{$1=$1}1' FPAT='([^,]+)|(\"[^\"]+\")' OFS=';' filename
1;3;"3,5";4;"5,5"
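FPAT is a gawk extension (gawk 4.0 and later). Where only a POSIX awk is available, a different technique is sketched below: walk the line one character at a time and track whether the current position is inside quotes. The function name is hypothetical. An RFC 4180 doubled quote such as "foo""bar" simply flips the flag twice, so it comes through unchanged.

```shell
#!/bin/sh
# Hypothetical helper: character scan with an inside-quotes flag (POSIX awk).
semicolons_awk() {
  awk '{
    out = ""; inq = 0
    for (i = 1; i <= length($0); i++) {
      c = substr($0, i, 1)
      if (c == "\"") {
        inq = !inq          # a quote flips the flag; "" flips it twice
      } else if (c == "," && !inq) {
        c = ";"             # a comma outside quotes is a separator
      }
      out = out c
    }
    print out
  }'
}

echo '1,3,"3,5",4,"5,5"' | semicolons_awk   # 1;3;"3,5";4;"5,5"
```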


This might work for you:

echo '1,3,"3,5",4,"5,5"' | sed 's/\("[^",]*\),\([^"]*"\)/\1\n\2/g;y/,/;/;s/\n/,/g'
1;3;"3,5";4;"5,5"

Here's an alternative solution, which is longer but more flexible:

echo '1,3,"3,5",4,"5,5"' | sed 's/^/\n/;:a;s/\n\([^,"]\|"[^"]*"\)/\1\n/;ta;s/\n,/;\n/;ta;s/\n//'
1;3;"3,5";4;"5,5"
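The longer script works by walking a newline marker through the line. Below is the same script split over several lines with comments, as a sketch; the function name is hypothetical, and it assumes GNU sed for \n in the replacement and the \| alternation. Because a quoted chunk is skipped as a whole, fields with several commas and doubled quotes such as "foo""bar" pass through unchanged.

```shell
#!/bin/sh
# Hypothetical helper: annotated marker-walking sed script (GNU sed assumed).
semicolons_marker_walk() {
  sed '
    # Start with a newline marker at the front of the pattern space.
    s/^/\n/
    :a
    # Move the marker past one unquoted non-comma character,
    # or past a whole quoted chunk, then loop.
    s/\n\([^,"]\|"[^"]*"\)/\1\n/
    ta
    # The marker sits before a separator comma: write ; and loop.
    s/\n,/;\n/
    ta
    # Nothing left to step over: drop the marker.
    s/\n//
  '
}

echo '1,3,"3,5",4,"5,5"' | semicolons_marker_walk      # 1;3;"3,5";4;"5,5"
echo '"a,b,c","foo""bar",x' | semicolons_marker_walk   # "a,b,c";"foo""bar";x
```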

2 Comments

I have been waiting for your answer over here
@Jaypal Thanks, I've submitted a solution. Also added an alternative to this one :)
