Change all but first ',' to "<COMMA>" for each row in file (bash) [duplicate]

Question

I am using bash and have a CSV file (dat.csv) that should have only two columns (App, Blurb), but because each row contains many ',' characters it ends up with MANY columns.

EXAMPLE OF PROBLEM dat.csv:

 App , Blurb
 diff, this is the diff program, bla bla bla, yadda yadda
 word, this is ms product, it is not very good, I dont like it
 dd, this is a Linux disk application , its awesome!, bla bla, ttly
 ...

The problem I am having is that because the 'Blurb' col has additional ',' the data is getting piped into subsequent columns (c, d, etc) of the dat.csv file.

THE GOAL is to have all but the first ',' on every row changed to "<COMMA>" so all the 'Blurb' data remains in col B.

E.g. DESIRED OUTPUT:

 App, Blurb
 diff, this is the diff program<COMMA> bla bla bla<COMMA> yadda yadda
 word, this is ms product<COMMA> it is not very good<COMMA> I dont like it
 dd, this is a Linux disk application <COMMA> its awesome!<COMMA> bla bla<COMMA> ttly
 ...

Thanks!

Bash isn't a text editor; you just need something you can run from the command-line, yes?
– Jeff Schaller ♦, Commented Aug 30, 2018 at 16:33

αғsнιη · Accepted Answer · 2018-08-30 16:39:44Z


Using GNU sed:

sed 's/,/<COMMA>/2g' infile

Or, for portability:

sed 's/,/<COMMA>/g; s/<COMMA>/,/' infile
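To compare the two commands on one row from the question, here is a minimal sketch. The helper names `all_but_first_gnu` and `all_but_first_portable` are hypothetical wrappers added for illustration; the sed expressions are the ones shown above, and the first one assumes GNU sed.

```shell
#!/bin/sh
# Hypothetical helper names; the sed expressions are the ones from the answer.

# Assumes GNU sed: replace the 2nd comma and every later one.
all_but_first_gnu() {
    sed 's/,/<COMMA>/2g'
}

# Portable variant: replace every comma, then turn the first <COMMA> back into a comma.
all_but_first_portable() {
    sed 's/,/<COMMA>/g; s/<COMMA>/,/'
}

row='diff, this is the diff program, bla bla bla, yadda yadda'
printf '%s\n' "$row" | all_but_first_portable
# prints: diff, this is the diff program<COMMA> bla bla bla<COMMA> yadda yadda
```

Both helpers read standard input, so a whole file can be processed with something like `all_but_first_portable < dat.csv`.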

edited Aug 30, 2018 at 16:39

answered Aug 30, 2018 at 16:34


GNU sed docs: "Only replace the numberth match of the regexp. Note: the POSIX standard does not specify what should happen when you mix the g and number modifiers, and currently there is no widely agreed upon meaning across sed implementations. For GNU sed, the interaction is defined to be: ignore matches before the numberth, and then match and replace all matches from the numberth on."
– Jeff Schaller ♦, Commented Aug 30, 2018 at 16:37

@afshin If the string "<COMMA>" already exists in the first field, that is, before the first ",", then the portable sed code would alter the first field.
– Rakesh Sharma, Commented Aug 30, 2018 at 18:49

@RakeshSharma That's an edge case, which can be handled with sed 's/,/something-Uniq/; s/,/<COMMA>/g; s/something-Uniq/,/' infile, with the further caveat that something-Uniq must never occur in the input file, as your own answer on this already covers.
– αғsнιη, Commented Aug 30, 2018 at 18:57

@afshin Now the problem has simply moved to something-Uniq: what if it, too, already appears before the first ","? The only safe something-Uniq is a newline.
– Rakesh Sharma, Commented Aug 30, 2018 at 19:03
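The collision discussed in these comments can be reproduced with a small sketch. The input row is made up for illustration, and `all_but_first_portable` is a hypothetical wrapper around the portable command from the answer.

```shell
#!/bin/sh
# The portable command from the answer, wrapped in a hypothetical helper.
all_but_first_portable() {
    sed 's/,/<COMMA>/g; s/<COMMA>/,/'
}

# Made-up row whose first field already contains the literal text <COMMA>.
printf '%s\n' 'a<COMMA>b, x, y' | all_but_first_portable
# prints: a,b<COMMA> x<COMMA> y
# The pre-existing <COMMA> was turned into a comma, and the real first
# comma became <COMMA>, so the first field is no longer intact.
```

Any placeholder that can legitimately appear in the data has the same weakness, which is why the comments settle on a newline: sed reads input one line at a time, so a newline cannot already be inside the line being edited.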


Rakesh Sharma · Accepted Answer · 2018-08-30 18:41:17Z

You could also do it POSIX-ly as follows:

sed -e '
   y/,/\n/        ;# change all commas to newlines, which are guaranteed to not be there
   s/\n/,/        ;# then change the first of those newlines to a comma, i.e., restore
   s//<COMMA>/g   ;# and all the remaining newline(s) change to <COMMA>
' dat.csv
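As a quick check of the newline-placeholder script above, the sketch below runs it on a made-up row whose first field already contains the text `<COMMA>`. The function name `all_but_first_posix` is hypothetical; the three sed commands are the ones from this answer, reading standard input instead of dat.csv.

```shell
#!/bin/sh
# Hypothetical wrapper around the three sed commands from the answer.
all_but_first_posix() {
    sed -e '
        y/,/\n/
        s/\n/,/
        s//<COMMA>/g
    '
}

# Made-up row: the first field contains a literal <COMMA> that must survive.
printf '%s\n' 'a<COMMA>b, x, y' | all_but_first_posix
# prints: a<COMMA>b, x<COMMA> y
```

The empty pattern in `s//<COMMA>/g` reuses the most recently used regular expression, here `\n`, so only the temporary newlines are rewritten and text already present in the row is left alone.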

Stéphane Chazelas · Accepted Answer · 2018-08-30 19:37:43Z

Maybe you can put quotes around the fields, which should tell csv parsers that the commas inside are not field separators:

sed 's/"/""/g; # escape existing " as ""
     s/[[:space:]]*,[[:space:]]*/","/; # replace the first , and the
                                       # whitespace around it with ","
     s/^[[:space:]]*/"/; # add a " at the start (and
                         # get rid of whitespace there)
     s/[[:space:]]*$/"/; # same at the end'
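To see what the quoting approach produces, here is a minimal sketch applied to one row from the question and to a made-up row containing double quotes. The function name `quote_two_fields` is hypothetical; the sed commands are the ones above with the comments removed.

```shell
#!/bin/sh
# Hypothetical wrapper around the quoting commands from the answer.
quote_two_fields() {
    sed 's/"/""/g
         s/[[:space:]]*,[[:space:]]*/","/
         s/^[[:space:]]*/"/
         s/[[:space:]]*$/"/'
}

# Row taken from the question.
printf '%s\n' ' dd, this is a Linux disk application , its awesome!, bla bla, ttly' | quote_two_fields
# prints: "dd","this is a Linux disk application , its awesome!, bla bla, ttly"

# Made-up row with embedded double quotes, which get doubled.
printf '%s\n' 'a, say "hi", ok' | quote_two_fields
# prints: "a","say ""hi"", ok"
```

Unlike the `<COMMA>` substitution, this keeps the original commas in the Blurb text and relies on the consuming program honouring quoted CSV fields.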

@Stephane_Chazelas Some commenting is in order here to indicate the flow of events going on.
– Rakesh Sharma, Commented Aug 30, 2018 at 18:51
