0

I have a text file that I need to convert to a CSV file:

Saint Petersburg    0    10    0.1    -     N
Moscow              -    9     0      -     N
Novgorod            0    7     1      30    Y

In bash, how can I insert a comma after the last letter of the city name and after every number or "-"?

For example

Saint Petersburg, 0, 10, 0.1, -, N
Moscow, -, 9, 0, -, N
Novgorod, 0, 7, 1, 30, Y

Best

3
  • Must this be done with bash alone? Are the fields separated by tabs? Commented Nov 10, 2014 at 17:54
  • Using another language is not a problem, although I would prefer not to mix languages. The number fields are separated by tabs, I think. The city field and the second field are separated by several tabs, I think. Commented Nov 10, 2014 at 18:01
  • @EnricAgudPique when asking such questions, the nature of the field delimiter is essential. Most solutions will depend on it. You can find out by passing your file through od : od -c file.csv. Commented Nov 10, 2014 at 18:42
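As a small illustration of the od suggestion above, the following sketch pipes a made-up sample line through od -c. The sample line is an assumption (a tab after the city name, two spaces before the last field), not the asker's real data.

```shell
# Hypothetical sample line: a tab after "Moscow", two spaces before "9".
# od -c prints each byte; a tab is displayed as \t, a space as a blank column.
printf 'Moscow\t-  9\n' | od -c
```

If \t shows up between the fields, the file is tab-separated and the space-based sed commands will not match it.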

3 Answers

3

This will replace each sequence of 2 or more spaces with a comma followed by the same spaces less one:

sed 's/ \( \+\)/,\1/g' file.txt 

This uses the \+ extension of GNU regexps; the standard equivalent is \{1,\}. Alternatively, use the -E option to switch to extended regular expressions, where + is standard:

sed -E 's/ ( +)/,\1/g' file.txt 

It will fail if there is only one space between columns.


If you only want ", " as the field separator:

sed 's/ \{2,\}/, /g' file 

same as

sed -E 's/ {2,}/, /g' file 

Or use:

sed -E 's/ {2,}/,/g' file 

to get no spaces around the commas, as in most CSV formats.
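Several substitutions can be chained in one sed call with repeated -e options. The sketch below wraps that in a shell function; the name to_csv is hypothetical, and it assumes the fields are separated by two or more spaces (not tabs) and that lines may carry trailing whitespace.

```shell
# Hypothetical helper: turn runs of 2+ spaces into commas.
# Assumes space-separated columns; tabs are left untouched.
to_csv() {
  # 1st expression: strip trailing whitespace so no empty last field appears.
  # 2nd expression: replace each run of two or more spaces with one comma.
  sed -e 's/[[:space:]]*$//' -e 's/ \{2,\}/,/g' "$@"
}
```

It can be used as to_csv file.txt > file.csv, or at the end of a pipeline. Single spaces, such as the one inside "Saint Petersburg", are preserved.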

1
  • It is great!!! Now I want to delete spaces between fields, and I use sed -r 's/\s+/ /g' ...it runs...is it possible to use both commands in a simple line? Commented Nov 10, 2014 at 18:15
1

Use Miller (mlr) to read the input, treating any run of two or more spaces as a field delimiter. The output format is header-less CSV; Miller's cat operation passes the data through unchanged.

$ mlr --nidx --ifs-regex ' {2,}' --ocsv --headerless-csv-output cat file
Saint Petersburg,0,10,0.1,-,N
Moscow,-,9,0,-,N
Novgorod,0,7,1,30,Y
0

Instead of relying on there being at least two spaces between fields, you could rely on the fact that there are always 6 fields, the last 5 of which do not contain whitespace, and do:

$ perl -lpe 's/^\s*(.*?)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s*$/$1,$2,$3,$4,$5,$6/' file
Saint Petersburg,0,10,0.1,-,N
Moscow,-,9,0,-,N
Novgorod,0,7,1,30,Y

Or to make sure that the output is valid CSV, use the Text::CSV module which will ensure fields are quoted if needed:

$ perl -MText::CSV -lne '
    BEGIN{$c = Text::CSV->new({binary=>1})}
    $c->print(*STDOUT, \@{^CAPTURE}) if
      /^\s*(.*?)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s*$/' file2
"Saint Petersburg",0,10,0.1,-,N
"Moscow, Russia Capital",-,9,0,-,N
Novgorod,0,7,1,30,Y
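The same "last five fields contain no whitespace" idea can also be sketched with awk from a shell function, for setups without Perl. The function name fields_to_csv is hypothetical. Unlike Text::CSV, this sketch does not quote fields that contain commas, and it collapses any whitespace inside the city name to single spaces.

```shell
# Hypothetical helper: everything before the last 5 whitespace-separated
# fields is treated as the city name; the last 5 fields are appended with commas.
fields_to_csv() {
  awk '
    NF < 6 { print; next }            # too few fields: pass the line through as is
    {
      n = NF - 5                      # number of words that make up the name
      out = $1
      for (i = 2; i <= n; i++)        # rejoin the name words with single spaces
        out = out " " $i
      for (i = n + 1; i <= NF; i++)   # append the remaining five fields
        out = out "," $i
      print out
    }
  ' "$@"
}
```

Because awk splits on any run of spaces or tabs by default, this works even when only a single space or a tab separates the columns.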
