Trying to get into Julia after learning python, and I'm stumbling over some seemingly easy things. I'd like to have a function that takes strings as arguments, but uses one of those arguments as a regular expression to go searching for something. So:

function patterncount(string::ASCIIString, kmer::ASCIIString)
    numpatterns = eachmatch(kmer, string, true)
    count(numpatterns)
end

There are a couple of problems with this. First, eachmatch expects a Regex object as the first argument and I can't seem to figure out how to convert a string. In python I'd do r"{0}".format(kmer) - is there something similar?

Second, I clearly don't understand how the count function works (from the docs):

count(p, itr) → Integer

Count the number of elements in itr for which predicate p returns true.

But I can't seem to figure out what the predicate is for just counting how many things are in an iterator. I can make a simple counter loop, but I figure that has to be built in. I just can't find it (tried the docs, tried searching SO... no luck).

Edit: I also tried numpatterns = eachmatch(r"$kmer", string, true) - no go.

  • meta question: should this be broken into two separate questions? Commented Jun 23, 2015 at 10:44

1 Answer

To convert a string to a regex, call the Regex function on the string.
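As a small illustration of why the r"$kmer" attempt from the question's edit fails: the r"..." literal does not interpolate variables, so the `$` is read as the regex end-of-string anchor. This is only a sketch, assuming a Julia 1.x session (where `occursin` is available); the variable names are made up for the example.

```julia
kmer = "GCG"

# Regex(...) builds the pattern from the contents of the string variable.
from_string = Regex(kmer)

# r"..." does no interpolation: this pattern is the anchor `$` followed by
# the literal text "kmer", which can never match anything.
from_literal = r"$kmer"

found_with_constructor = occursin(from_string, "AGCGT")
found_with_literal = occursin(from_literal, "AGCGT")
```

Here `found_with_constructor` is true and `found_with_literal` is false. Note that Regex(kmer) treats any regex metacharacters in the string as pattern syntax, which is fine for plain letter sequences like the one above.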

Typically, to get the length of an iterator you can use the length function. However, in this case that won't work: the eachmatch function returns an object of type Base.RegexMatchIterator, which doesn't have a length method. So you can use count, as you thought. The first argument (the predicate) should be a one-argument function that returns true or false depending on whether you want to count a particular item in the iterator. Here that function can simply be the anonymous function x->true, because we want to count every x in the RegexMatchIterator.

So, given that info, I would write your function like this:

patterncount(s::ASCIIString, kmer::ASCIIString) = count(x->true, eachmatch(Regex(kmer), s, true)) 
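For anyone reading this on a newer Julia, here is a rough equivalent. It is a sketch, assuming Julia 1.x, where ASCIIString no longer exists (plain String covers it) and eachmatch takes overlap as a keyword argument instead of a third positional argument. The AbstractString annotations are my choice for the example, not something the original code requires.

```julia
# Sketch for Julia 1.x; the one-liner above targets the 2015-era API.
# Counts matches of `kmer` in `s`, including overlapping ones.
function patterncount(s::AbstractString, kmer::AbstractString)
    pattern = Regex(kmer)  # build a Regex from the plain string
    # `x -> true` accepts every match, so `count` tallies all of them.
    return count(x -> true, eachmatch(pattern, s; overlap=true))
end

overlapping_hits = patterncount("GCGCG", "GCG")
```

With overlap=true the string "GCGCG" contains "GCG" at positions 1 and 3, so `overlapping_hits` is 2. With overlap=false only the first of those would be counted.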

EDIT: I also changed the name of the first argument to s instead of string, because string is a Julia function. Nothing terrible would have happened if we had left that argument name as it was in this example, but it is usually good practice not to give a variable the same name as a built-in function.


2 Comments

Derp - length is the same as python, not sure why I didn't think of it. Both of these solutions work great. One question for my own education. If s is really long, is there any performance difference between the iterator and loading all of the matches into an array? edit: you already answered my second question - reading comprehension fail
If there are many matches, it should be more memory efficient to use the count(x->true, eachmatch(...)) solution (iterator version) -- rather than loading all matches into an array (like the matchall function would). I'd probably do some benchmarking to verify that though.
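To make the two approaches in that comparison concrete, here is a sketch, assuming Julia 1.x, where matchall is no longer available and collect over eachmatch plays the same role. The test string is invented for the example, and this shows only that both give the same count, not how they differ in memory use; that would still need benchmarking.

```julia
s = repeat("GC", 1000) * "G"   # made-up input with many overlapping hits
re = Regex("GCG")

# Streaming: matches are visited one at a time and never stored together.
n_streamed = count(x -> true, eachmatch(re, s; overlap=true))

# Materialized: every RegexMatch is kept in an array before counting.
all_matches = collect(eachmatch(re, s; overlap=true))
n_collected = length(all_matches)
```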
