slm
  • 380k
  • 127
  • 793
  • 897

A couple of other ways to look at this.

Method #1

Since you're only interested in lines that have two or more characters separated by commas, you could just grep for commas:

$ grep "," sample.txt
chr2 3323 C T,A
chr3 5251 C T,G
chr3 9990 G C,T
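
To try Method #1 end to end, here is a minimal sketch. The full contents of sample.txt are not shown above, so the file below is an assumption: the three comma lines are taken from the output, while the two single-allele lines (chr1 and chr4) are invented purely for illustration.

```shell
# Hypothetical sample file: the chr1 and chr4 lines are invented,
# the three comma lines are copied from the output shown above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
chr1 1234 A G
chr2 3323 C T,A
chr3 5251 C T,G
chr3 9990 G C,T
chr4 7777 T C
EOF

# Method #1: keep every line that contains a comma anywhere.
matches=$(grep "," "$sample")
printf '%s\n' "$matches"

rm -f "$sample"
```

Only the three lines with a comma are printed. Note that this matches a comma in any column, which is why the later methods tighten the pattern.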

Method #2

You could use grep's PCRE facility. With the -P switch, grep does its matching with Perl-compatible regular expressions (PCRE). It's quite powerful and lets you use much of Perl's regex syntax from grep.

loosely defined

$ grep -P "(\w,)+" sample.txt 

strictly defined

$ grep -P '\w+\d\s+\d+\s+\w\s+(\w,)+' sample.txt 
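
The difference between the loose and the strict pattern only shows up when a comma appears somewhere other than the last column. The sketch below assumes a grep built with -P support (GNU grep usually is) and uses a hypothetical file whose header line and chr1 line are invented for illustration.

```shell
# Hypothetical input: the header line and the chr1 line are assumptions.
sample=$(mktemp)
cat > "$sample" <<'EOF'
#chrom pos ref alt (ref,alt are alleles)
chr1 1234 A G
chr2 3323 C T,A
chr3 5251 C T,G
chr3 9990 G C,T
EOF

# Loose: any word character followed by a comma, anywhere on the line.
# This also counts the header line because of "ref,alt".
loose=$(grep -cP '(\w,)+' "$sample")

# Strict: name ending in a digit, a numeric position, a single letter,
# then at least one "letter," group.
strict=$(grep -cP '\w+\d\s+\d+\s+\w\s+(\w,)+' "$sample")

echo "loose: $loose, strict: $strict"   # prints: loose: 4, strict: 3
rm -f "$sample"
```

The strict pattern is still unanchored, so it describes what a matching line must contain rather than what the whole line must look like.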

Method #3

Using awk. This again takes advantage of the fact that only the lines with a comma (,) are of interest, so it just finds them and prints them:

loosely defined

$ awk '/,/{print}' sample.txt 

more strictly defined

$ awk '/([[:alpha:]])+,[[:alpha:]]/{print}' sample.txt 

even more strictly defined

$ awk '$4 ~ /([[:alpha:]])+,[[:alpha:]]/{print}' sample.txt 

This one looks at the contents of the 4th column and checks that it contains one or more letters followed by a comma, followed by another letter.

most strictly defined

$ awk '$4 ~ /([GATC])+,[GATC]/{print}' sample.txt 

This matches only when the 4th column contains one or more of G, A, T, or C followed by a comma, followed by another G, A, T, or C.
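
Because the awk patterns above are unanchored, they test whether the 4th column contains a match, not whether the whole column is valid. If that matters, one possible tightening (my assumption, not part of the methods above) is to anchor the pattern with ^ and $. The sketch below compares the two on a hypothetical file; the chr1 line and the malformed chr5 line are invented for illustration.

```shell
# Hypothetical input: chr1 and the malformed chr5 line are assumptions.
sample=$(mktemp)
cat > "$sample" <<'EOF'
chr1 1234 A G
chr2 3323 C T,A
chr3 5251 C T,G
chr3 9990 G C,T
chr5 8888 A T,AN
EOF

# Unanchored: column 4 only has to contain base(s), a comma, and a base,
# so "T,AN" is accepted because it contains "T,A".
unanchored=$(awk '$4 ~ /([GATC])+,[GATC]/ {n++} END {print n+0}' "$sample")

# Anchored (assumed variant): column 4 must consist entirely of two or
# more single bases separated by commas.
anchored=$(awk '$4 ~ /^[GATC](,[GATC])+$/ {n++} END {print n+0}' "$sample")

echo "unanchored: $unanchored, anchored: $anchored"   # prints: unanchored: 4, anchored: 3
rm -f "$sample"
```

Whether the anchored form is appropriate depends on the data: it would reject multi-base alleles such as "TA,G", which the unanchored pattern allows.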
