My file contains the following comma-separated values:

dev.visualwebsiteoptimizer.com 80,versioncheck-bg.addons.mozilla.org 80, ,frontweb-stg.shoprunner.com 443,p.typekit.net 443,sra.s-9.us 443,www.shoprunner.com 443,cdn.optimizely.com 443,logx.optimizely.com 443,sra.s-9.us 443,ocsp.digicert.com 443,code.jquery.com 443,ocsp2.globalsign.com 443,dev.visualwebsiteoptimizer.com 443,versioncheck-bg.addons.mozilla.org 443, , 

In a few places there is an empty field: whitespace followed by a comma.

I would like the following output:

dev.visualwebsiteoptimizer.com,versioncheck-bg.addons.mozilla.org,,frontweb-stg.shoprunner.com,p.typekit.net,sra.s-9.us,www.shoprunner.com,cdn.optimizely.com,logx.optimizely.com,sra.s-9.us,ocsp.digicert.com,code.jquery.com,ocsp2.globalsign.com,dev.visualwebsiteoptimizer.com,versioncheck-bg.addons.mozilla.org,, 

Ideally, I want to remove everything from the whitespace up to the next comma.

I tried:

sed -i 's/^[[:space:]]*,/,/g' sample.file 

but it had no effect.

Any help would be appreciated.

  • Do you want to remove the numbers such as 80 as well? Commented Dec 6, 2016 at 22:21
  • Yes, I want to see only the URLs and no port numbers. Commented Dec 6, 2016 at 22:24
  • sed -i 's/[[:space:]][^,]*,/,/g' works for me, but it is too generic: it removes everything from a space up to the next comma. If my file has a line like A B c,dev.visualwebsiteoptimizer.com 80,versioncheck-bg.addons.mozilla.org 80, I want to remove only the numbers. I tried 's/[[:space:]][^[[0-9]*],]*,/,/g' but I am not sure what is wrong with it. Commented Dec 7, 2016 at 17:32

3 Answers

First of all, ^ means beginning of line. Remove it.

Secondly, you appear to want to remove all non-commas between each space and the following comma, but you didn't include that in the pattern.

sed -i 's/[[:space:]][^,]*,/,/g' sample.file 
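To see what this substitution does, and how a digits-only variant would differ on a line like the one mentioned in the comments on the question, here is a small sketch. It pipes a sample line through sed rather than editing a file in place; the variable names and the second pattern are illustrative assumptions, not part of the original answer.

```shell
# Sample line taken from the comment on the question.
line='A B c,dev.visualwebsiteoptimizer.com 80,versioncheck-bg.addons.mozilla.org 80,'

# Pattern from the answer: one whitespace character, then everything up to the next comma.
generic=$(printf '%s\n' "$line" | sed 's/[[:space:]][^,]*,/,/g')

# Narrower variant (an assumption): whitespace followed only by optional digits, then a comma.
digits_only=$(printf '%s\n' "$line" | sed 's/[[:space:]][[:space:]]*[0-9]*,/,/g')

printf '%s\n' "$generic" "$digits_only"
```

On this sample the generic pattern also removes " B c" from the first field, while the narrower variant only removes the port numbers and leaves "A B c" intact.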

6 Comments

Thanks for your time, but this did not help either. I am looking to remove whitespace, and whitespace followed by numbers.
Re "Neither this helped me", Fixed. The * got left out accidentally. // Re "I am looking to remove whitespaces and whitespaces followed by numbers", If the Question is wrong, please fix it.
Thanks a lot. Could you please explain it? That would help my understanding.
huh? I already explained each change! I removed ^ because you don't want to match the beginning of the line, and I changed [[:space:]]* to [[:space:]][^,]* because you want to match the junk between the spaces and the comma.
There's no way that worked in vim, since [^0-9,] means "a char other than a digit or comma". /// If your question needs updating, do so. Don't post edits in the comments. Or if you're asking a new question, similarly don't post it as a comment.
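awk '{gsub(/[ ]+/,""); gsub(/[03-8]/,"")}1' file

The first gsub removes every space; the second removes the digits 0 and 3-8, which covers the port numbers 80 and 443 in this data while leaving the 9 in sra.s-9.us and the 2 in ocsp2 alone. The comma stays out of the bracket expression, since a comma there would be matched literally and the separators would be deleted too. The trailing 1 is an always-true pattern, so awk prints each modified line.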
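If hostnames may contain other digits or fields may contain spaces, a single gsub anchored on the comma is less fragile. The following is only a sketch under that assumption; the sample line and the variable names are made up for illustration.

```shell
# Hypothetical sample: a field with spaces, two ports, and an empty field.
line='A B c,dev.visualwebsiteoptimizer.com 80,sra.s-9.us 443, ,ocsp2.globalsign.com 443,'

# Replace "one or more spaces, optional digits, comma" with just the comma.
# gsub edits $0 in place; print then emits the modified line.
awk_out=$(printf '%s\n' "$line" | awk '{ gsub(/[ ]+[0-9]*,/, ","); print }')

printf '%s\n' "$awk_out"
# -> A B c,dev.visualwebsiteoptimizer.com,sra.s-9.us,,ocsp2.globalsign.com,
```

Because the match has to end at a comma, digits inside hostnames and spaces inside other fields are left untouched.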

1 Comment

Please explain in more detail for non-awk-masters.

A perl solution:

perl -i -pe 's/\s+\d*(?=,)//g' file 

Perl's startup cost is higher than, say, Sed's or Awk's, but Perl's more powerful regular expression support often makes things easier:

  • \s is a convenient shortcut for matching whitespace (tab, space, newline); similarly, \d is a shortcut for [0-9].

  • + as the one-or-more-instances duplication symbol is always available, whereas to use it portably in sed you'd have to use the awkward \{1,\} construct.

  • (?=...) is a look-ahead assertion that allows looking for a subexpression without including it in the match.
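For comparison, here is a sketch of the same substitution in portable sed, using the \{1,\} construct mentioned above. sed has no look-ahead, so the comma is matched and then written back in the replacement. The sample line and variable name are illustrative assumptions.

```shell
# Hypothetical sample: single and multiple spaces, an empty field,
# and a last field that is not followed by a comma.
line='p.typekit.net 443, ,sra.s-9.us   443,code.jquery.com 443'

# \{1,\} is the portable basic-regex spelling of "one or more".
# The comma is consumed by the match and restored in the replacement.
portable=$(printf '%s\n' "$line" | sed 's/[[:space:]]\{1,\}[0-9]*,/,/g')

printf '%s\n' "$portable"
# -> p.typekit.net,,sra.s-9.us,code.jquery.com 443
```

As with the look-ahead version, a trailing field that is not followed by a comma is left as it is.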
