I am very new to Unix!
I am trying to figure out, from a FASTQ file, how many reads have 3 or MORE As in a row.
I used egrep 'A{3}' to tell me how many reads contain AAA. But now I want to know >= 3 AAA in a row, and putting >= in the pattern doesn't work. Can I use awk to help me determine this?
Also, how can I use a regular expression to determine how many reads have a run of 4 or more As followed by something other than a T (that is, G, C, or A)? So there have to be >= 4 As in a row, followed by G, C, or A.
EDIT: When I say 3 As in a row, I mean something like this: GGCTAAAAAACGGAT
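To make the requirement testable, here is a minimal sketch of the two counts. It assumes every FASTQ record is exactly 4 lines (header, sequence, +, quality) with no wrapped sequences. The sample reads and the temporary file are made up for illustration. It uses grep -E, the current spelling of egrep, and the interval A{3,}, which means "3 or more As". Only the sequence lines are searched, because quality strings can also contain the letter A.

```shell
# Hypothetical sample file, written to a temp path for illustration only.
fq=$(mktemp)
cat > "$fq" <<'EOF'
@read1
GGCTAAAAAACGGAT
+
IIIIIIIIIIIIIII
@read2
GGCTAACGGATTACG
+
AAAAAAAAAAAAAAA
@read3
CCAAAATGGAAAGTC
+
IIIIIIIIIIIIIII
EOF

# Keep only the sequence line of each 4-line record (lines 2, 6, 10, ...),
# so that 'A' characters in quality strings are not counted.
# A{3,} matches three or more consecutive As.
count3=$(awk 'NR % 4 == 2' "$fq" | grep -cE 'A{3,}')

# Four or more As followed by G, C or A.
count4=$(awk 'NR % 4 == 2' "$fq" | grep -cE 'A{4,}[GCA]')

echo "reads with >=3 As in a row: $count3"
echo "reads with >=4 As then G, C or A: $count4"

rm -f "$fq"
```

On this sample, the first count is 2 (read1 and read3) and the second is 1 (read1 only). Two caveats. Because the pattern is unanchored, 'A{3}' already matches any read containing a run of 3 or more As, so it gives the same line count as 'A{3,}'. Also, 'A{4,}[GCA]' matches AAAAAT, since the fifth A satisfies [GCA]. If the whole run of As must be followed by a non-T base, 'A{4,}[GC]' may be closer to the intent.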
"I want to know >= 3 AAA in a row"
I thought you were trying to get a count of lines where AAA appears 3 or more times on a line, e.g. fooAAAbarAAAetcAAAetc, but all the answers so far are interpreting your question differently from me and, in some cases, from each other. Please edit your question to clarify your requirements and provide concise, testable sample input and expected output that adequately demonstrates those requirements.