1

CSV input file:

"18","Agent","To identify^M ","b5b553d2-81ab-4ec3-83e0-71ae3cf4afab","1"^M "1078","Repeat","Identify it has","0164f3eb-beeb-47dd-b9b9-9b762f430e14","1"^M "621","Com Dot Com","Identify ","7fc9e73e-3470-4b31-8524-fcb97a4dadee","1"^M 

In the above input file, I have 3 different type of records.

1) Record No 18 (first 2 lines), even though it should be one line it comes in as 2 lines. The ^M is placed incorrectly at the end of first line.

Expected Output (^M removed from first line and make it one line)

"18","Agent","To identify","b5b553d2-81ab-4ec3-83e0-71ae3cf4afab","1"^M 

2) Record No 1078 (Line no 3 & 4) - Here i dont have the ^M at the end of line 3. I want to combine Line 3 & 4 and make it one line.

Expected Output

"1078","Repeat","Identify it has ","0164f3eb-beeb-47dd-b9b99b762f430e14","1"^M 

3) Record No 621 (Line 4, 5 & 6) - This has ^M only at the end of the line, but it has a blank line inbetween. i want to remove the blankline and make it one line.

Expected Output

"621","Com Dot Com","Identify","7fc9e73e-3470-4b31 8524fcb97a4dadee","1"^M 
4
  • Please use formatting tools to format your question clearly. Commented Jul 11, 2014 at 3:20
  • Sure..Thanks..Will do for future posts.. Commented Jul 11, 2014 at 3:28
  • I tidied up the formatting for you but don't you think you could demonstrate your issues with shorter lines with fewer fields and less text in the fields. It's all very off-putting for anyone considering helping to have to try to read through all of that to figure out where the issues are. It's stopped me from thinking about it at least. Commented Jul 11, 2014 at 3:38
  • Deleted the full input and made it shorter for easier readability.. Commented Jul 11, 2014 at 3:48

3 Answers 3

1

Using Ruby:

ruby -e 'require "csv"; CSV.parse(File.read(ARGV.shift)).each{ |e| e.map!{ |f| f.strip.gsub(/[[:space:]]+/, " ") }; puts CSV.generate_line(e, {:force_quotes => true}); }' csv_file 

Output:

"18","Agent","To identify","b5b553d2-81ab-4ec3-83e0-71ae3cf4afab","1" "1078","Repeat","Identify it has","0164f3eb-beeb-47dd-b9b9-9b762f430e14","1" "621","Com Dot Com","Identify","7fc9e73e-3470-4b31-8524-fcb97a4dadee","1" 

A little more readable form:

ruby -e 'require "csv" CSV.parse(File.read(ARGV.shift)).each{ |e| e.map!{ |f| f.strip.gsub(/[[:space:]]+/, " ") } puts CSV.generate_line(e, {:force_quotes => true}) }' csv_file 
  • Bash's history expansion may affect the command, so just you can disable it if you want: shopt -u -o histexpand

Script version:

#!/usr/bin/env ruby require 'csv' CSV.parse(File.read(ARGV.shift)).each{ |e| e.map!{ |f| f.strip.gsub(/[[:space:]]+/, " ") } puts CSV.generate_line(e, {:force_quotes => true}) } 

Run with

ruby script.rb csv_file 

See Ruby-Doc.org for everything.

Sign up to request clarification or add additional context in comments.

Comments

0

This might work:

awk -F \",\" ' /^[[:space:]]*$/ { next } { line = line $0 if (split(line, a) == 10) { print line line = "" } } ' file 

I have a feeling there will still be some problems (like missing spaces).

5 Comments

thanks ooga.. It accidentally came up, all the 3 records had ", as start of second line. But it might have other characters too. I modified the record no 1078. Sorry for the confusion.
this worked on a test file (5 records). when i executed this on the real file didnt get the expected output..
@user3072054 As I have no idea what the "real file" looks like, there's not much I can do! :-) If the file is to big to post here, you can post it to wetransfer.com and post the link here in a comment.
here is the link we.tl/D7Gy6mIa8V . faulty records 18,32,51,56,90,232, 252 etc
@user3072054 Firstly, the lines do not end in the characters ^M as you said they did, so I'm not sure what you meant by that. Secondly, why don't you just fix the lines manually? You'd have had it done an hour ago.
0

Using GNU awk for multi-char RS:

$ awk -v RS='^$' -v ORS= 'BEGIN{FS=OFS="\""} {for (i=2;i<=NF;i+=2) gsub(/\n/,"",$i) }1' file "18","Agent","To identify^M","b5b553d2-81ab-4ec3-83e0-71ae3cf4afab","1"^M "1078","Repeat","Identifyit has","0164f3eb-beeb-47dd-b9b9-9b762f430e14","1"^M "621","Com Dot Com","Identify","7fc9e73e-3470-4b31-8524-fcb97a4dadee","1"^M 

Since it's not clear if you really have control-Ms or not, I left them as the characters "^M" for now. If you have them just gsub() them out.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.