
I'm new to regex and hope someone can help me. I'm using the regex below to grep a CSV file for lines that contain exactly one pipe character (|):

grep "^([^\\|]+\\|){1}[^\\|]+$" myfile.csv 

Unfortunately, the above yields no result when used with grep. Any ideas?

A sample of the CSV file content is below. I expect the 2nd line to be found.

"foo"|"foo"|"foo"
"bar"|"bar"

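A likely explanation, assuming GNU grep: without -E, grep reads the pattern as a basic regular expression. In that syntax (, ), {, } and + are literal characters, and \| is a GNU extension meaning alternation. Inside the double quotes each \\ becomes \, so grep receives ^([^\|]+\|){1}[^\|]+$. In basic syntax that pattern can only match lines that begin with a literal ( or contain the literal text ){1}. The sketch below recreates the sample in a temporary file and compares both modes:

```shell
# Recreate the sample data from the question in a temporary file.
f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf '%s\n' '"foo"|"foo"|"foo"' '"bar"|"bar"' > "$f"

# The pattern as grep receives it once the shell has turned each \\ into \.
pat='^([^\|]+\|){1}[^\|]+$'

# Basic regex (GNU grep's default): nothing in the sample matches.
echo "basic:    $(grep -c "$pat" "$f") matching line(s)"

# Extended regex: grouping, {1} and + work, and \| is a literal pipe.
echo "extended: $(grep -cE "$pat" "$f") matching line(s)"
grep -E "$pat" "$f"
```

With GNU grep this should report 0 basic matches and 1 extended match, "bar"|"bar". Inside a bracket expression the backslash is literal, so [^\|] also excludes \ itself. The posted solutions use [^|] instead.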
Solutions to this question:

grep -E "^([^|]+\|){1}[^|]+$" myfile.csv 

and

egrep "^[^|]+\|[^|]+$" myfile.csv 
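Both solutions use extended syntax. egrep behaves like grep -E, and recent GNU grep releases warn that egrep is obsolescent, so this sketch runs both patterns through grep -E. Because {1} repeats the group exactly once, the two patterns are equivalent. Against the sample data, each should print only the second line:

```shell
# Recreate the sample data from the question.
f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf '%s\n' '"foo"|"foo"|"foo"' '"bar"|"bar"' > "$f"

# Solution 1: a group repeated exactly once ({1} adds nothing).
grep -E '^([^|]+\|){1}[^|]+$' "$f"

# Solution 2: the same pattern without the group and quantifier.
grep -E '^[^|]+\|[^|]+$' "$f"
```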
  • This is exactly the kind of thing that regexes really shouldn't be used for: they're not good at counting. Your language/framework of choice may very well have a str.count() method or function; it certainly has a str.find() that would be much more appropriate. Commented Sep 20, 2013 at 2:57
  • @JoshCaswell I agree that this might be easier in a language that has something like this, but it's also perfectly fine for regex (and there are certainly applications for regex where no host language is available, as you suggest). As the OP shows, she's using grep. Commented Sep 20, 2013 at 3:01
  • You may want to specify the -E flag to grep in order to get full "extended" regex support. Commented Sep 20, 2013 at 3:04
  • @Phrogz: It's easy enough to substitute grep for another more appropriate tool. Commented Sep 20, 2013 at 3:12
  • Thanks for your responses! But this is purely for an ad hoc task (i.e. grep only) to find problematic entries in a CSV file. I would have done it differently if I were using it in my code. :) Btw, thanks a lot @Phrogz for the -E tip, and @arshajii for the tip about escaping |. This one now works perfectly: `grep -E "^([^|]+\|){1}[^|]+$" myfile.csv` Commented Sep 20, 2013 at 4:05

5 Answers

5

You can try:

^[^|]*\|[^|]*$ 

You don't need to escape | in a character class. Also you presumably want * instead of + here to allow for strings like |abc, xyz|, and just | on its own.
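To see what the choice between * and + changes, here is a small sketch. It assumes grep -E, since the pattern is written in extended syntax. The test lines are the answer's edge cases plus two ordinary lines for contrast:

```shell
f=$(mktemp)
trap 'rm -f "$f"' EXIT
# Edge cases from the answer, plus one- and two-pipe lines for contrast.
printf '%s\n' '|abc' 'xyz|' '|' 'a|b' 'a|b|c' > "$f"

echo 'With +:'
grep -E '^[^|]+\|[^|]+$' "$f"

echo 'With *:'
grep -E '^[^|]*\|[^|]*$' "$f"
```

Only a|b survives the + version. The * version also keeps |abc, xyz| and the lone |. Neither selects a|b|c.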


1 Comment

Thanks for the info about escaping '|'. I had tried this pattern earlier too, but it returns all lines (both those with one '|' and those with two).
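A likely cause, assuming the pattern was run with GNU grep and without -E: in basic syntax \| is alternation. The pattern therefore splits into ^[^|]* or [^|]*$, and ^[^|]* matches the empty string at the start of every line. With -E the same pattern treats \| as a literal pipe:

```shell
f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf '%s\n' '"foo"|"foo"|"foo"' '"bar"|"bar"' > "$f"

# Basic regex: \| is alternation, so ^[^|]* alone matches every line.
echo "basic:    $(grep -c '^[^|]*\|[^|]*$' "$f") matching line(s)"

# Extended regex: \| is a literal pipe, so only the one-pipe line matches.
echo "extended: $(grep -cE '^[^|]*\|[^|]*$' "$f") matching line(s)"
```

On the sample data this should report 2 basic matches and 1 extended match.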
1

Try the following:

^[^|]+\|[^|]+$

5 Comments

Thanks, but I forgot to mention that I had also tried that pattern before the grouped one in my question. It does not return any results either.
No, * instead of + because there is no requirement that it not start or end with a |
@pguardiario, Well, that was never specified, and since the OP's original regex used the + repetition operator, I assumed that's what he wanted. You should also look at the "Solutions to this question:" part of his post. He is using +, and you will see that my answer is in the solution list while the selected answer isn't ;)
So because his wrong solution uses +, you're arguing that the correct solution must use +? I'm sorry this is a weak and weird argument.
@pguardiario, Who are you to decide that it's a wrong solution when the OP himself edited his question to add working solutions to his problem, and note that these use the + operator? I initially assumed that the data format was rigid and couldn't contain empty values, based on the fact that the OP's original regex used the + operator. You assumed the opposite based on what? I'm sorry, but it seems you are the one making invalid assumptions here. Will you now downvote the other answers using *? I hope not.
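The empty result reported in the first comment on this answer fits the same basic-versus-extended issue, assuming GNU grep without -E. In basic syntax an unescaped + is a literal plus sign and \| is alternation. Both halves of the pattern therefore require a literal +, which the sample lines never contain. A quick comparison:

```shell
f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf '%s\n' '"foo"|"foo"|"foo"' '"bar"|"bar"' > "$f"

# Basic regex: + is literal and \| is alternation, so no line matches.
echo "basic:    $(grep -c '^[^|]+\|[^|]+$' "$f") matching line(s)"

# Extended regex: + means "one or more", so the one-pipe line matches.
echo "extended: $(grep -cE '^[^|]+\|[^|]+$' "$f") matching line(s)"
```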
1

Solution using awk

awk 'gsub(/\|/,"|")==1' file 

gsub(/\|/,"|") replaces each | with itself and returns the number of replacements, i.e. the number of pipes on the line. If that equals 1, awk performs its default action, which is to print $0.
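To see the count that gsub() returns, this sketch prints it for each line instead of using it as a filter. The input lines are made up for illustration:

```shell
# gsub() returns the number of substitutions it made; replacing each "|"
# with itself leaves the line unchanged but yields the pipe count.
printf '%s\n' 'a|b|c' 'a|b' 'abc' |
  awk '{ n = gsub(/\|/, "|"); print n ": " $0 }'
```

This prints 2, 1 and 0 for the three lines. Only a|b has a count of 1, so it is the only line the answer's one-liner prints.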

Edit: another awk solution:

awk 'split($0,a,"|")==2' file 

split() returns how many parts the line is divided into by |; if that is 2, the line is printed.
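A similar sketch for split() prints the number of pieces per line. A single-character separator such as "|" is taken literally rather than as a regex. An empty piece after a trailing pipe still counts, so x| also yields 2:

```shell
# split() returns the number of pieces; "parts" is a throwaway array name.
printf '%s\n' 'a|b|c' 'a|b' 'x|' 'abc' |
  awk '{ print split($0, parts, "|") ": " $0 }'
```

This prints 3, 2, 2 and 1, so the answer's one-liner would keep both a|b and x|.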

Comments

0

Here are the solutions to my question, thanks to the comments that led me to them.

grep -E "^([^|]+\|){1}[^|]+$" myfile.csv 

and

egrep "^[^|]+\|[^|]+$" myfile.csv 

Comments

0

Grep and regexes are the wrong tool for this task. Use something that is intended for counting:

# Use a split function with the pipe as delimiter
awk 'split($0, _, "|") == 2 {print}' the_file

# Set awk's field separator to the pipe character
# and check the number of fields on each line
awk -F'|' 'NF == 2 {print}' the_file
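As a cross-check, the sketch below runs the NF approach alongside the grep -E '^[^|]*\|[^|]*$' pattern from another answer. It uses a few made-up lines, including empty fields and an empty line, and the two commands should select the same lines:

```shell
f=$(mktemp)
trap 'rm -f "$f"' EXIT
# Sample lines, including empty fields, an empty line and a line with no pipe.
printf '%s\n' '"foo"|"foo"|"foo"' '"bar"|"bar"' '|abc' 'xyz|' '' 'plain' > "$f"

# Field counting: exactly one pipe means exactly two fields.
awk -F'|' 'NF == 2' "$f"

# The regex form that allows empty fields.
grep -E '^[^|]*\|[^|]*$' "$f"
```

Both should print "bar"|"bar", |abc and xyz|. Field counting treats an empty field as a field, so it agrees with the * regex rather than the + versions.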

1 Comment

Thanks! That worked. I'm not familiar with awk, so I did not use it.
