Performance of exhaustive Haskell pattern-matching

Question

I am wondering how Haskell pattern matching is resolved internally and how this impacts performance. Say we have a function that is expensive to compute so we precompute the first and / or more frequently used values and pattern match on the corresponding input before making the actual computation itself:

expensiveComputation :: Int -> Int expensiveComputation 1 = 101 expensiveComputation 2 = 3333333 expensiveComputation 3 = 42 ... expensiveComputation n = the actual function

We could precompute a lot of these cases, thousands if we wanted, but do we? I'm assuming that it takes time for Haskell to actually find the pattern that matches the input, so maybe it is actually quicker to compute the, say, 1000th value instead of making a 1000 pattern matches?

Pattern matches are concetually done by iterating over the list, but a compiler is free to optimize it (and given there is a certain structure in the numbers, most will). — willeM_ Van Onsem
– willeM_ Van Onsem, Commented Feb 1, 2018 at 22:03
Another option is to use memoization to "precompute" (rather, compute on the first use only). data-memocombinators has a combinator arrayRange which will memoize a certain range of inputs into a flat array with constant time lookup, which is basically what you are doing manually here. — luqui
– luqui, Commented Feb 1, 2018 at 23:26
Related: Haskell GHC: what is the time complexity of a pattern match with N constructors? — duplode
– duplode, Commented Feb 2, 2018 at 0:25

Thomas M. DuBuisson · Accepted Answer · 2018-02-01 23:05:06Z

I made a simple test in the form:

f 0 = 4 f 1 = 5 ...

And compiled with ghc -O2 -ddump-asm -c. I observe key ASM of:

What appears to be implementations for each top level equation:

_c2aD: movl $28,%ebx ; N.B. the constant 28 matches my largest value jmp *(%rbp) _c2aC: movl $27,%ebx jmp *(%rbp) _c2aB: movl $26,%ebx jmp *(%rbp) ....

What appears to be a jump table referencing the above implementations:

_n2aN: .quad _c2af-(_n2aN)+0 .quad _c2ag-(_n2aN)+0 .quad _c2ah-(_n2aN)+0 .quad _c2ai-(_n2aN)+0 .quad _c2aj-(_n2aN)+0 .quad _c2ak-(_n2aN)+0 ...

What appears to be a pair of range guards followed by a dispatch:

_c2aE: cmpq $25,%r14 ; N.B. the last defined input is 24 jge _c2ae _u2aH: testq %r14,%r14 jl _c2ae _u2aI: leaq _n2aN(%rip),%rax addq (%rax,%r14,8),%rax jmp *%rax

So do I think this will be slow for 1000 entries? No, if they are contiguous I bet GHC will generate a jump table just like I'm seeing here. Another interesting test would be if they aren't contiguous.

The code createSwitchPlan starting at github.com/ghc/ghc/blob/master/compiler/cmm/CmmSwitch.hs#L324 makes all these decisions, using jump tables for the continguous sections of a sparse pattern match.
Why are there two conditional jumps? Would it be better to have (at most) one subtraction and then one conditional jump?
@dfeuer Can you explain how you would check x < 0 and x > 24 with a single compare?
(fromIntegral x :: Word) < 24, however you say that in assembly, should do the trick, I believe.

Collectives™ on Stack Overflow

Performance of exhaustive Haskell pattern-matching

1 Answer 1

6 Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

6 Comments

Linked

Related