The following is a distilled-down version of a previous sprawling question of mine.
Suppose I place one million copies of the uppercase English alphabet (i.e., the uppercase ISO basic Latin alphabet) into a list `alphabets`:

    charRange = CharacterRange["A", "Z"];
    alphabets = Flatten[ConstantArray[charRange, 10^6]]; // AbsoluteTiming
    ScientificForm[Length[alphabets], DigitBlock -> 3]

    (* OUTPUT (Mathematica 14.0.0 for Microsoft Windows 64-bit): *)
    (* {0.245727, Null} *)
    (* 26,000,000 *)

Now suppose I wish to generate a "pick list" that has the element `True` for each corresponding element in the list `alphabets` that is either `"A"` or `"B"`, and that has the element `False` otherwise. I can accomplish this by mapping either `StringMatchQ` ("Method 1") or `MatchQ` ("Method 2") across the list `alphabets`:

    (* METHOD 1: StringMatchQ, mapped *)
    boolList1 = Map[StringMatchQ[#, "A" | "B"] &, alphabets]; // AbsoluteTiming

    (* METHOD 2: MatchQ, mapped *)
    boolList2 = Map[MatchQ[#, "A" | "B"] &, alphabets]; // AbsoluteTiming

    boolList1 == boolList2

    (* OUTPUT: *)
    (* {18.4239, Null} *)
    (* {6.59128, Null} *)
    (* True *)

The `MatchQ` method ("Method 2") is approximately 2.8 times faster than the `StringMatchQ` method ("Method 1"). This is somewhat surprising to me.

- Naively, I would've expected the `StringMatchQ` method to be faster, since I assume that the pattern supplied to `StringMatchQ` must be a string pattern (a more restrictive condition), rather than any type of pattern (a less restrictive condition).
- But in fact, my assumption is incorrect; according to the first two entries ("overloads") in the documentation for `StringMatchQ`, the pattern supplied to `StringMatchQ` can be a string pattern, a regular expression, or a list of patterns.

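
To make the comparison concrete, here is a minimal sketch of the same single-element test written with two of the pattern forms that `StringMatchQ` accepts, alongside the `MatchQ` form. The regular expression `[AB]` is my own illustrative choice and does not appear in the timings in this question.

```wolfram
(* String pattern: alternatives of literal strings *)
StringMatchQ["A", "A" | "B"]                  (* True *)

(* Regular expression: the character class [AB] is an illustrative choice *)
StringMatchQ["A", RegularExpression["[AB]"]]  (* True *)

(* General expression pattern: the string is compared as an expression *)
MatchQ["A", "A" | "B"]                        (* True *)
```

All three give the same answer for a one-character string, so any timing difference comes from how each function processes its pattern and argument, not from the result.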
I realize that without being able to "look under the hood" of `StringMatchQ` and `MatchQ` and see how they are implemented in the Wolfram Language, it's difficult to understand why a built-in function performs the way it does. But can anyone suggest possible (and reasonable) explanations for the performance difference between `StringMatchQ` and `MatchQ` demonstrated above?

(Note: A much faster method to generate the "pick list" is to use the listable form of `StringMatchQ`, rather than using `Map`, as shown below. But my question does not pertain to this faster method.)

    boolList1a = StringMatchQ[alphabets, "A" | "B"]; // AbsoluteTiming

    (* OUTPUT: *)
    (* {1.20825, Null} *)
