-2

I am using bash and have a CSV file (dat.csv) that should contain only two columns of data (App, Blurb), but because each row contains many ',' characters it is being split into MANY columns.

EXAMPLE OF PROBLEM dat.csv:

 App , Blurb
 diff, this is the diff program, bla bla bla, yadda yadda
 word, this is ms product, it is not very good, I dont like it
 dd, this is a Linux disk application , its awesome!, bla bla, ttly ...

The problem I am having is that because the 'Blurb' column contains additional ',' characters, the data spills into subsequent columns (C, D, etc.) of dat.csv.

THE GOAL is to have all but the first ',' on every row changed to "<COMMA>" so that all the 'Blurb' data remains in column B.

E.g. DESIRED OUTPUT:

 App, Blurb
 diff, this is the diff program<COMMA> bla bla bla<COMMA> yadda yadda
 word, this is ms product<COMMA> it is not very good<COMMA> I dont like it
 dd, this is a Linux disk application <COMMA> its awesome!<COMMA> bla bla<COMMA> ttly ...

Thanks!

  • Bash isn't a text editor; you just need something you can run from the command-line, yes? Commented Aug 30, 2018 at 16:33
  • Sorry, I should have said "bash script". thx Commented Aug 30, 2018 at 18:28

3 Answers

4

Using GNU sed:

sed 's/,/<COMMA>/2g' infile 

Or, for portability:

sed 's/,/<COMMA>/g; s/<COMMA>/,/' infile 
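As a quick check of the portable variant, the sketch below runs a few of the question's sample rows through it. The wrapper name keep_first_comma is hypothetical and exists only for this illustration; with no file argument it reads standard input.

```shell
# Hypothetical wrapper around the portable two-step substitution:
# 1) turn every comma into <COMMA>, 2) turn the first <COMMA> back into a comma.
keep_first_comma() {
    sed 's/,/<COMMA>/g; s/<COMMA>/,/' "$@"
}

# Sample rows taken from the question.
printf '%s\n' \
    'App , Blurb' \
    'diff, this is the diff program, bla bla bla, yadda yadda' \
    'dd, this is a Linux disk application , its awesome!, bla bla, ttly ...' |
    keep_first_comma
```

Each output line should keep its first comma as the only field separator, with every later comma replaced by <COMMA>. Lines without any comma pass through unchanged.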
  • GNU sed docs: "Only replace the numberth match of the regexp. Note: the POSIX standard does not specify what should happen when you mix the g and number modifiers, and currently there is no widely agreed upon meaning across sed implementations. For GNU sed, the interaction is defined to be: ignore matches before the numberth, and then match and replace all matches from the numberth on." Commented Aug 30, 2018 at 16:37
  • @afshin If the string "<COMMA>" already exists in the first field, that is, before the first ",", then the portable sed code would disturb the first field. Commented Aug 30, 2018 at 18:49
  • @RakeshSharma That's an exception, which can be handled with sed 's/,/something-Uniq/; s/,/<COMMA>/g; s/something-Uniq/,/' infile, with the further caveat that something-Uniq must never occur in the input file; you gave the same answer about this. Commented Aug 30, 2018 at 18:57
  • @afshin Now the problem has been transferred to something-Uniq. What if it too is already sitting there before the first ","? This something-Uniq can only be a newline. Commented Aug 30, 2018 at 19:03
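The placeholder caveat discussed in these comments can be reproduced with a small experiment. The input strings below are made up for illustration and are not from the asker's file; the variable names are likewise hypothetical.

```shell
# Case 1: the text <COMMA> already appears before the first real comma.
two_step=$(printf '%s\n' 'a<COMMA>b,c,d' |
    sed 's/,/<COMMA>/g; s/<COMMA>/,/')
echo "$two_step"    # a,b<COMMA>c<COMMA>d  (first field was altered)

# Case 2: the three-step variant with a marker handles that input...
marker_ok=$(printf '%s\n' 'a<COMMA>b,c,d' |
    sed 's/,/something-Uniq/; s/,/<COMMA>/g; s/something-Uniq/,/')
echo "$marker_ok"   # a<COMMA>b,c<COMMA>d

# Case 3: ...but breaks in the same way once the marker itself is in the data.
marker_bad=$(printf '%s\n' 'something-Uniq x,c,d' |
    sed 's/,/something-Uniq/; s/,/<COMMA>/g; s/something-Uniq/,/')
echo "$marker_bad"  # , xsomething-Uniqc<COMMA>d
```

Any marker made of ordinary characters can collide with the data. A newline cannot, because sed never has one in the pattern space when it reads a single line.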
2

You could also do it POSIX-ly as follows:

sed -e '
    y/,/\n/        ;# change all commas to newlines, which are guaranteed to not be there
    s/\n/,/        ;# then change the first of those newlines to a comma, i.e., restore
    s//<COMMA>/g   ;# and all the remaining newline(s) change to <COMMA>
' dat.csv
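A minimal sketch of the same newline-marker idea, split into separate -e expressions so it can be tried on a single test line. The function name first_comma_only and the test input are assumptions made for this example; the third expression spells out \n instead of reusing the previous regex, which should behave the same way.

```shell
# Hypothetical wrapper: newline is used as the temporary marker because a line
# read by sed cannot already contain one.
first_comma_only() {
    sed -e 'y/,/\n/' \
        -e 's/\n/,/' \
        -e 's/\n/<COMMA>/g' "$@"
}

# A literal <COMMA> in the first field is left untouched.
printf '%s\n' 'a<COMMA>b,c,d' | first_comma_only
# a<COMMA>b,c<COMMA>d
```

Run against the real file, this would be first_comma_only dat.csv, with the result redirected to a new file.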
2

Maybe you can put quotes around the fields, which should tell csv parsers that the commas inside are not field separators:

sed 's/"/""/g;                            # escape existing " as ""
     s/[[:space:]]*,[[:space:]]*/","/;    # replace the first , and the
                                          # whitespace around it with ","
     s/^[[:space:]]*/"/;                  # add a " at the start (and
                                          # get rid of whitespace there)
     s/[[:space:]]*$/"/;                  # same at the end'
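To see what the quoting approach produces, here is a small sketch that applies the same four substitutions to two test lines. The function name quote_two_fields and the second input line are made up for illustration.

```shell
# Hypothetical wrapper: emit each line as exactly two double-quoted CSV fields.
quote_two_fields() {
    sed -e 's/"/""/g' \
        -e 's/[[:space:]]*,[[:space:]]*/","/' \
        -e 's/^[[:space:]]*/"/' \
        -e 's/[[:space:]]*$/"/' "$@"
}

printf '%s\n' ' App , Blurb ' | quote_two_fields
# "App","Blurb"

printf '%s\n' 'dd, a "Linux" disk application , its awesome!' | quote_two_fields
# "dd","a ""Linux"" disk application , its awesome!"
```

The later commas stay in the data as real commas, so this only helps if whatever reads dat.csv honours double-quoted fields.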
  • @Stephane_Chazelas Some commenting is in order here to indicate the flow of events going on. Commented Aug 30, 2018 at 18:51
