Remove spaces in between fields in a CSV file in UNIX

Question

CSV input file:

"18","Agent","To identify^M
","b5b553d2-81ab-4ec3-83e0-71ae3cf4afab","1"^M
"1078","Repeat","Identify
it has","0164f3eb-beeb-47dd-b9b9-9b762f430e14","1"^M
"621","Com Dot Com","Identify

","7fc9e73e-3470-4b31-8524-fcb97a4dadee","1"^M

In the above input file, I have three different types of records.

1) Record 18 (the first two lines) should be a single line, but it arrives as two lines. The ^M is misplaced at the end of the first line.

Expected output (^M removed from the first line, and the two lines joined into one):

"18","Agent","To identify","b5b553d2-81ab-4ec3-83e0-71ae3cf4afab","1"^M

2) Record 1078 (lines 3 and 4): here there is no ^M at the end of line 3. I want to combine lines 3 and 4 into one line.

Expected Output

"1078","Repeat","Identify it has","0164f3eb-beeb-47dd-b9b9-9b762f430e14","1"^M

3) Record 621 (lines 5, 6 and 7): this has ^M only at the end of the record, but there is a blank line in between. I want to remove the blank line and make it one line.

Expected Output

"621","Com Dot Com","Identify","7fc9e73e-3470-4b31-8524-fcb97a4dadee","1"^M

Please use formatting tools to format your question clearly.
– jaypal singh, Commented Jul 11, 2014 at 3:20
I tidied up the formatting for you, but don't you think you could demonstrate your issues with shorter lines, fewer fields, and less text in the fields? It's off-putting for anyone considering helping to have to read through all of that to figure out where the issues are. It's stopped me from thinking about it, at least.
– Ed Morton, Commented Jul 11, 2014 at 3:38
Deleted the full input and made it shorter for easier readability.
– user3072054, Commented Jul 11, 2014 at 3:48

konsolebox · Accepted Answer · 2014-07-11 04:46:06Z

Using Ruby:

ruby -e 'require "csv"; CSV.parse(File.read(ARGV.shift)).each{ |e| e.map!{ |f| f.strip.gsub(/[[:space:]]+/, " ") }; puts CSV.generate_line(e, force_quotes: true); }' csv_file

Output:

"18","Agent","To identify","b5b553d2-81ab-4ec3-83e0-71ae3cf4afab","1"
"1078","Repeat","Identify it has","0164f3eb-beeb-47dd-b9b9-9b762f430e14","1"
"621","Com Dot Com","Identify","7fc9e73e-3470-4b31-8524-fcb97a4dadee","1"

A little more readable form:

ruby -e 'require "csv"
CSV.parse(File.read(ARGV.shift)).each{ |e|
    e.map!{ |f| f.strip.gsub(/[[:space:]]+/, " ") }
    puts CSV.generate_line(e, force_quotes: true)
}' csv_file

Bash's history expansion may affect the command, so you can disable it if you want: shopt -u -o histexpand

Script version:

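#!/usr/bin/env ruby
require 'csv'
# Read the file named by the first argument and parse it as CSV;
# line breaks inside quoted fields stay part of the field.
CSV.parse(File.read(ARGV.shift)).each{ |e|
    # Trim each field and collapse any run of whitespace into one space.
    e.map!{ |f| f.strip.gsub(/[[:space:]]+/, " ") }
    # Keyword form of the option; Ruby 3 rejects a positional options hash.
    puts CSV.generate_line(e, force_quotes: true)
}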

Run with

ruby script.rb csv_file

See Ruby-Doc.org for everything.
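
To try the same cleanup without a file, here is a minimal sketch that wraps it in a method taking a string. The method name normalize_csv and the sample are illustrative assumptions, not part of the answer. The sample is the question's three records, assuming the ^M marks are real carriage returns and that the line break in record 1078 falls after "Identify". The to_s call is an added guard for empty unquoted fields, which CSV returns as nil.

```ruby
require "csv"

# Hypothetical helper: same per-field cleanup as above, applied to a string.
def normalize_csv(text)
  CSV.parse(text).map do |row|
    # to_s guards against nil (empty unquoted field); then trim and
    # collapse every run of whitespace, including CR and LF, to one space.
    cleaned = row.map { |f| f.to_s.strip.gsub(/[[:space:]]+/, " ") }
    # generate_line appends the line terminator itself.
    CSV.generate_line(cleaned, force_quotes: true)
  end.join
end

# Assumed reconstruction of the question's input with real CR/LF characters.
sample = [
  %("18","Agent","To identify\r\n","b5b553d2-81ab-4ec3-83e0-71ae3cf4afab","1"\r\n),
  %("1078","Repeat","Identify\nit has","0164f3eb-beeb-47dd-b9b9-9b762f430e14","1"\r\n),
  %("621","Com Dot Com","Identify \r\n\r\n","7fc9e73e-3470-4b31-8524-fcb97a4dadee","1"\r\n),
].join

puts normalize_csv(sample)
```

With this sample, each record should come out on a single line with every field quoted, as in the output shown earlier in this answer.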

ooga · Accepted Answer · 2014-07-11 04:49:22Z

This might work:

awk -F \",\" '
/^[[:space:]]*$/ { next }
{
    line = line $0
    if (split(line, a) == 10) {
        print line
        line = ""
    }
}
' file

I have a feeling there will still be some problems (like missing spaces).

edited Jul 11, 2014 at 4:49

answered Jul 11, 2014 at 3:37


5 Comments

user3072054 Over a year ago

Thanks ooga. By accident, all three records had ", at the start of the second line, but it might have other characters too. I modified record 1078. Sorry for the confusion.

user3072054 Over a year ago

This worked on a test file (5 records). When I ran it on the real file, I didn't get the expected output.

ooga Over a year ago

@user3072054 As I have no idea what the "real file" looks like, there's not much I can do! :-) If the file is too big to post here, you can post it to wetransfer.com and post the link here in a comment.

user3072054 Over a year ago

Here is the link: we.tl/D7Gy6mIa8V. Faulty records: 18, 32, 51, 56, 90, 232, 252, etc.

ooga Over a year ago

@user3072054 Firstly, the lines do not end in the characters ^M as you said they did, so I'm not sure what you meant by that. Secondly, why don't you just fix the lines manually? You'd have had it done an hour ago.

Ed Morton · Accepted Answer · 2014-07-11 13:49:28Z

Using GNU awk for multi-char RS:

$ awk -v RS='^$' -v ORS= 'BEGIN{FS=OFS="\""} {for (i=2;i<=NF;i+=2) gsub(/\n/,"",$i) }1' file
"18","Agent","To identify^M","b5b553d2-81ab-4ec3-83e0-71ae3cf4afab","1"^M
"1078","Repeat","Identifyit has","0164f3eb-beeb-47dd-b9b9-9b762f430e14","1"^M
"621","Com Dot Com","Identify","7fc9e73e-3470-4b31-8524-fcb97a4dadee","1"^M

Since it's not clear whether you really have control-Ms or not, I left them as the characters "^M" for now. If you do have them, just gsub() them out.
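
For comparison, here is a rough sketch of the same quote-splitting idea in Ruby (the language of the first answer) for the case where the control-Ms are real carriage returns. The method name clean_quoted_fields is hypothetical. It also collapses whitespace runs to one space rather than deleting newlines outright, which is a deliberate difference from the awk above. It assumes fields never contain escaped quotes ("").

```ruby
# Hypothetical helper: split on the double quote so that odd-numbered pieces
# (0-based) are the insides of quoted fields, like the even-numbered awk fields.
def clean_quoted_fields(text)
  pieces = text.split('"', -1) # -1 keeps trailing empty pieces
  cleaned = pieces.each_with_index.map do |piece, i|
    if i.odd?
      # Inside a quoted field: collapse CR, LF and spaces to one space, then trim.
      piece.gsub(/[[:space:]]+/, " ").strip
    else
      # Outside quotes (commas and record terminators): drop only the CRs.
      piece.delete("\r")
    end
  end
  cleaned.join('"')
end

# Assumed input: two of the question's records with real CR/LF characters.
input = %("18","Agent","To identify\r\n","b5b553d2","1"\r\n) +
        %("1078","Repeat","Identify\nit has","0164f3eb","1"\r\n)
print clean_quoted_fields(input)
# "18","Agent","To identify","b5b553d2","1"
# "1078","Repeat","Identify it has","0164f3eb","1"
```

Because whitespace runs become a single space, record 1078 keeps the space between "Identify" and "it has" instead of running the words together.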
