Regex to find string that exactly has one pipe character

Question

I'm new to regex and hope someone can help me. I'm using the regex below to grep a CSV file for lines that contain exactly one pipe character (i.e. |).

grep "^([^\\|]+\\|){1}[^\\|]+$" myfile.csv

Unfortunately, the above yields no result when used with grep. Any ideas?

Sample CSV file content is below; I expect the 2nd line to be found.

"foo"|"foo"|"foo"
"bar"|"bar"

Solutions to this question:

grep -E "^([^|]+\|){1}[^|]+$" myfile.csv

and

egrep "^[^|]+\|[^|]+$" myfile.csv

This is exactly the kind of thing that regexes really shouldn't be used for: they're not good at counting. Your language/framework of choice may very well have a str.count() method or function; it certainly has a str.find() that would be much more appropriate.
– jscs, Commented Sep 20, 2013 at 2:57
@JoshCaswell I agree that this might be easier if using a language that has something like this, but it's also perfectly fine for regex (and there are certainly applications for regex where there is no host language available as you suggest). As the OP shows, she's using grep.
– Phrogz, Commented Sep 20, 2013 at 3:01
You may want to specify the -E flag to grep in order to get full "extended" regex support.
– Phrogz, Commented Sep 20, 2013 at 3:04
@Phrogz: It's easy enough to substitute grep for another more appropriate tool.
– jscs, Commented Sep 20, 2013 at 3:12
Thanks for your responses! But this is purely for an ad hoc thingy (i.e. grep only) to find problematic entries in a CSV file. I would have done it differently if I were using it in my code. :) Btw, thanks a lot @Phrogz for the -E tip plus that of @arshajii about escaping |. This one works perfectly now: grep -E "^([^|]+\|){1}[^|]+$" myfile.csv
– Rebecca Abriam, Commented Sep 20, 2013 at 4:05

arshajii · Accepted Answer · 2013-09-20 02:51:34Z

5

You can try:

^[^|]*\|[^|]*$

You don't need to escape | in a character class. Also you presumably want * instead of + here to allow for strings like |abc, xyz|, and just | on its own.
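
The difference between * and + can be seen with a small sketch. The sample lines here are made up for illustration (they are not from the original file), and the commands assume a grep that supports -E:

```shell
# Hypothetical sample lines: the first four contain exactly one pipe,
# the last two contain two pipes and no pipe respectively.
sample=$(printf '%s\n' 'abc|xyz' '|abc' 'xyz|' '|' 'a|b|c' 'nopipe')

# '*' allows empty text on either side of the single pipe.
star_matches=$(printf '%s\n' "$sample" | grep -E '^[^|]*\|[^|]*$')

# '+' requires at least one non-pipe character on both sides.
plus_matches=$(printf '%s\n' "$sample" | grep -E '^[^|]+\|[^|]+$')

printf 'with *:\n%s\n\nwith +:\n%s\n' "$star_matches" "$plus_matches"
```

With *, the lines abc|xyz, |abc, xyz| and | are all selected; with +, only abc|xyz is. Which one is right depends on whether empty values are allowed in the data.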

edited Sep 20, 2013 at 2:51

answered Sep 20, 2013 at 2:45


1 Comment

Rebecca Abriam Over a year ago

Thanks for the info about escaping '|'. I also tried this pattern earlier, but it returns all lines (both with 1 and 2 '|').

plalx · Accepted Answer · 2013-09-20 02:47:08Z

1

Try the following:

^[^|]+\|[^|]+$

answered Sep 20, 2013 at 2:47


5 Comments

Rebecca Abriam Over a year ago

Thanks, but I forgot to mention that I tried that regex pattern too before the grouped one I used in my question. But this pattern does not return any result.

pguardiario Over a year ago

No, * instead of + because there is no requirement that it not start or end with a |

plalx Over a year ago

@pguardiario, Well that was never specified and since the OP's initial regex was using the + repetition operator, I assumed that's what he wanted. You should also look at the "Solutions to this question:" part of his post. He is using + and you will see that my answer is in the solution list while the selected answer isn't ;)

pguardiario Over a year ago

So because his wrong solution uses +, you're arguing that the correct solution must use +? I'm sorry, but this is a weak and weird argument.

plalx Over a year ago

@pguardiario, Who are you to decide that it's a wrong solution when the OP himself edited his question by adding working solutions to his problem? Note that these use the + operator. I initially assumed that the data format was rigid and couldn't contain empty values, based on the fact that the OP's initial regex was using the + operator, and you assumed the opposite based on what? I'm sorry, but it seems you are the one making invalid assumptions here. Will you now downvote the other answers using *? I hope not.

Jotne · Accepted Answer · 2013-09-20 08:24:46Z

Solution using awk

awk 'gsub(/\|/,"|")==1' file

gsub(/\|/,"|") replaces every | with itself and returns the number of replacements, i.e. the number of | characters on the line. If this equals 1, awk performs the default action, print $0.

Edit: Another awk:

awk 'split($0,a,"|")==2' file

This counts how many parts the line is divided into by |; if the count is 2, the line is printed.
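
As a quick check, both awk variants can be run against the two sample lines from the question. This is only a sketch: the temporary file and the variable names are illustrative.

```shell
# Recreate the question's two sample lines in a temporary file.
file=$(mktemp)
printf '%s\n' '"foo"|"foo"|"foo"' '"bar"|"bar"' > "$file"

# gsub returns the number of substitutions made, i.e. the number of pipes on the line.
by_gsub=$(awk 'gsub(/\|/, "|") == 1' "$file")

# split returns the number of pieces; exactly one pipe gives two pieces.
by_split=$(awk 'split($0, a, "|") == 2' "$file")

printf 'gsub:  %s\nsplit: %s\n' "$by_gsub" "$by_split"
rm -f "$file"
```

Both commands select only the "bar"|"bar" line.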

Rebecca Abriam · Accepted Answer · 2013-09-20 13:21:16Z

Here are the solutions to my question. Thanks to the comments that led me to solving this.

grep -E "^([^|]+\|){1}[^|]+$" myfile.csv

and

egrep "^[^|]+\|[^|]+$" myfile.csv
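
The effect of -E can be checked directly. The sketch below assumes GNU grep and recreates the question's two sample lines in a temporary file (the file and variable names are illustrative). Without -E, grep reads the pattern as a basic regular expression, in which + is not a repetition operator, so nothing matches.

```shell
# Recreate the question's two sample lines in a temporary file.
file=$(mktemp)
printf '%s\n' '"foo"|"foo"|"foo"' '"bar"|"bar"' > "$file"

# Basic regular expression (no -E): "+" is not treated as "one or more",
# so no sample line matches. "|| true" keeps grep's non-zero exit status
# from aborting a script that runs under "set -e".
basic=$(grep '^[^|]+\|[^|]+$' "$file" || true)

# Extended regular expression (-E): "+" means "one or more" and "\|" is a literal pipe.
extended=$(grep -E '^[^|]+\|[^|]+$' "$file")

printf 'without -E: [%s]\nwith -E:    [%s]\n' "$basic" "$extended"
rm -f "$file"
```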

jscs · Accepted Answer · 2013-09-20 20:07:20Z

Grep and regexes are the wrong tool for this task. Use something that is intended for counting:

# Use a split function with the pipe as delimiter
awk 'split($0, _, "|") == 2 {print}' the_file

# Set awk's field separator to the pipe character
# and check the number of fields on each line
awk -F'|' 'NF == 2 {print}' the_file
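
One advantage of the field-counting approach is that it extends to any number of pipes. A minimal sketch, assuming a hypothetical helper named lines_with_pipes (not part of the original answer) and the question's sample lines:

```shell
# Hypothetical helper: print the lines of file $2 that contain exactly $1 pipes.
# A line with N pipes has N + 1 pipe-separated fields.
# Caveat: for N = 0 this skips empty lines, because awk reports NF as 0 for them.
lines_with_pipes() {
    awk -F'|' -v n="$1" 'NF == n + 1' "$2"
}

# Recreate the question's two sample lines in a temporary file.
file=$(mktemp)
printf '%s\n' '"foo"|"foo"|"foo"' '"bar"|"bar"' > "$file"

one_pipe=$(lines_with_pipes 1 "$file")
two_pipes=$(lines_with_pipes 2 "$file")

printf 'one pipe:  %s\ntwo pipes: %s\n' "$one_pipe" "$two_pipes"
rm -f "$file"
```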

Thanks! That worked. I'm not familiar with awk so I did not use it.
