Why is MatchQ faster than StringMatchQ in testing string matches?

Question

Suppose I have myStrs, a list of one million alphabets (i.e., 26 million single-character strings):

charRange = CharacterRange["A", "Z"]; myStrs = Flatten[ConstantArray[charRange, 10^6]]; // AbsoluteTiming {0.258808, Null} ScientificForm[Length[myStrs], DigitBlock -> 3] (* OUTPUT: *) (* {0.258808, Null} *) (* 26,000,000 *)

I wish to create a function that will return True if a string is "A" or "B", and False. I have come up with several methods for doing this:

using equality testing (Equal, ==)
StringMatchQ;
MatchQ; and
MemberQ

Some of these methods have different forms: operator and/or listable forms.

(* METHOD 1: equality testing (Equal, ==) with Or (||) *) (* METHOD 1a, mapped *) Prepend[AbsoluteTiming[output1a = Map[# == "A" || # == "B" &, myStrs];], "METHOD 1a (==, mapped)"] (* METHOD 2: StringMatchQ with Alternatives (|) *) (* METHOD 2a: mapped *) Prepend[AbsoluteTiming[output2a = Map[StringMatchQ[#, "A" | "B"] &, myStrs];], "METHOD 2a (StringMatchQ, mapped)"] (* METHOD 2b: operator form *) Prepend[AbsoluteTiming[output2b = StringMatchQ["A" | "B"][myStrs];], "METHOD 2b (StringMatchQ, operator form)"] (* METHOD 2b: listable form *) Prepend[AbsoluteTiming[output2c = StringMatchQ[myStrs, "A" | "B"];], "METHOD 2c (StringMatchQ, listable form)"] (* METHOD 3: MatchQ with Alternatives (|) *) (* METHOD 3a: mapped *) Prepend[AbsoluteTiming[output3a = Map[MatchQ[#, "A" | "B"] &, myStrs];], "METHOD 3a (MatchQ, mapped)"] (* METHOD 4: MemberQ *) (* METHOD 4a: MemberQ, mapped *) Prepend[AbsoluteTiming[output4a = Map[MemberQ[{"A", "B"}, #] &, myStrs];], "METHOD 4a (MemberQ, mapped)"]

Before presenting the output, I will show that the output lists are all identical:

(* Check that the resulting lists are all identical. *) (output1a == output2a) && (output2a == output2b) && (output2b == output2c) && (output2c == output3a) && (output3a == output4a) (* Tally the output. *) Tally[output1a] (* OUTPUT: *) (* True *) (* {26 000 000, {True, True, False, False, False, False, False, False, False, False}} *) (* {{True, 2 000 000}, {False, 24 000 000}} *)

Here are the timings, as measured by AbsoluteTiming, that I obtained on different systems and Mathematica versions:

Mathematica 14.0.0 on Windows 11

Processor: 13th Gen Intel(R) Core(TM) i7-13700, 2.10 GHz (x86-64; released Q1 2023)
Total cores: 16; performance-core base frequency: 2.10 GHz
$ProcessorCount: 16

(* Mathematica 14.0.0 on 64-bit Windows 11 Enterprise *) (* OUTPUT: *) (* {"METHOD 1a (==, mapped)", 10.65, Null} *) (* {"METHOD 2a (StringMatchQ, mapped)", 17.2123, Null} *) (* {"METHOD 2b (StringMatchQ, operator form)", 1.20552, Null} *) (* {"METHOD 2c (StringMatchQ, listable form)", 1.20461, Null} *) (* {"METHOD 3a (MatchQ, mapped)", 5.91739, Null} *) (* {"METHOD 4a (MemberQ, mapped)", 8.67486, Null} *)

Mathematica 13.3.1 on Windows 11

Processor: 13th Gen Intel(R) Core(TM) i7-13700, 2.10 GHz (x86-64; released Q1 2023)
Total cores: 16; performance-core base frequency: 2.10 GHz
$ProcessorCount: 16

(* Mathematica 13.3.1 on 64-bit Windows 11 Enterprise *) (* Processor: 13th Gen Intel(R) Core(TM) i7-13700, 2.10 GHz *) (* OUTPUT: *) (* {"METHOD 1a (==, mapped)", 12.2645, Null} *) (* {"METHOD 2a (StringMatchQ, mapped)", 19.5481, Null} *) (* {"METHOD 2b (StringMatchQ, operator form)", 1.23778, Null} *) (* {"METHOD 2c (StringMatchQ, listable form)", 1.2151, Null} *) (* {"METHOD 3a (MatchQ, mapped)", 7.78893, Null} *) (* {"METHOD 4a (MemberQ, mapped)", 9.70204, Null} *)

Mathematica 9.0 on Windows 7

Processor: 4th Gen Intel(R) Core(TM) i7-4790, 4.00 GHz (x86-64; released Q2 2014)
Total cores: 4; processor base frequency: 4.00 GHz
$ProcessorCount: 4

(* Mathematica 9.0 on 64-bit Windows 7 Pro *) (* (Mathematica 9.0 does not include the operator form for StringMatchQ, so I cannot use Method 2b on Mathematica 9.) *) (* Processor: 4th Gen Intel(R) Core(TM) i7-4790, 4.00 GHz *) (* OUTPUT: *) (* {"METHOD 1a (==, mapped)", 19.562671, Null} *) (* {"METHOD 2a (StringMatchQ, mapped)", 33.438184, Null} *) (* {"METHOD 2c (StringMatchQ, listable form)", 3.348192, Null} *) (* {"METHOD 3a (MatchQ, mapped)", 23.965319, Null} *) (* {"METHOD 4a (MemberQ, mapped)", 19.509194, Null} *)

Summary of timings

Mathematica 14.0.0 & 13.3.1: Method 2c ~ 2b < 3a < 4a < 1a < 2a
- In other words, 2c(StringMatchQ,listable) ~ 2b(StringMatchQ,operator) << 3a(MatchQ,mapped) < 4a(MemberQ,mapped) < 1a(==,mapped) < 2a(StringMatchQ,mapped)
Mathematica 9.0: Method 2c < 4a ~ 1a < 3a < 2a
- In other words, 2c(StringMatchQ,listable) < 4a(MemberQ,mapped) ~ 1a(==,mapped) < 3a(MatchQ,mapped) < 2a(StringMatchQ,mapped)

Overall summary

Method 2c(StringMatchQ,listable) had the best performance in all Mathematica versions tested: 1.20461, 1.2151, and 3.348192 seconds for versions 14.0.0, 13.3.1, and 9.0, respectively.
Method 2a(StringMatchQ,mapped) had the poorest performance in all Mathematica versions tested: seconds for versions 14.0.0, 13.3.1, and 9.0, respectively.

My question

Why is Method 3a(MatchQ,mapped) significantly faster than Method 2a(StringMatchQ,mapped)?

v14: 5.91739 vs. 17.2123 sec; v13.3.1: 7.78893 vs. 19.5481; and v9: 23.965319 vs. 33.438184. In other words, 2a(MatchQ,mapped) is ~2.9x (v14), ~2.5x (v13.3.1), and ~1.4x (v9) faster than 3a(StringMatchQ,mapped), I think.
This surprises me because I would've thought that StringMatchQ would be more optimized for strings than MatchQ in the case that StringMatchQ is NOT asked to do any "preprocessing" on the strings (e.g., implicitly, IgnoreCase -> False and SpellingCorrection -> False in StringMatchQ in the present case).
Without being able to look "under the hood," so to speak, of the Wolfram language, it's difficult to say with certainty why MatchQ is so much faster than StringMatchQ. But does anyone have any insight or potential explanations?

Take-home message

It seems to me that a take-home message from this exercise is that "listability" in StringMatchQ is highly optimized. (Interestingly, however, StringMatchQ does not appear to actually have attribute Listable: Attributes[StringMatchQ] returns only {Protected} in the Mathematica versions that I tested. Hence my quotation marks around "listability.") I suspect that "listability" in other Wolfram language built-in functions are similarly well optimized.
Are there any other glaring take-home messages in the results above?

There's a lot to unpack to tackle your question. It may help if you could boil it down to just one comparison, since ultimately that's what you focus on in your take-home messages. For instance, do you think the version-dependent results are relevant? — MarcoB
– MarcoB, Commented Nov 14, 2024 at 19:56
It's likely that Mathematica converts the string pattern to a regular expression internally. There is a function for that at least. When you use the form of StringMatchQ where the first element is a list, you only have to do that once. When you map the function, you have to do it over and over. The operator form might be similarly optimized. — C. E.
– C. E. ♦, Commented Nov 15, 2024 at 13:22
Regarding listability vs. mapping of built-in functions, you may also find this answer from Leonid interesting. — MarcoB
– MarcoB, Commented Nov 15, 2024 at 14:22

Stack Exchange Network

Why is MatchQ faster than StringMatchQ in testing string matches?

Mathematica 14.0.0 on Windows 11

Mathematica 13.3.1 on Windows 11

Mathematica 9.0 on Windows 7

Summary of timings

Overall summary

My question

Take-home message

0

Linked

Hot Network Questions

Why is MatchQ faster than StringMatchQ in testing string matches?

Mathematica 14.0.0 on Windows 11

Mathematica 13.3.1 on Windows 11

Mathematica 9.0 on Windows 7

Summary of timings

Overall summary

My question

Take-home message

0

Know someone who can answer? Share a link to this question via email, Twitter, or Facebook.

Linked

Related

Hot Network Questions