Analyzing slow performance of a Haskell program

Question

I was trying to solve ITA Software's "Word Numbers" puzzle using a brute force approach. It looks like my Haskell version is more than 10 times slower than a C#/C++ version.

The answer

Thanks to Bryan O'Sullivan's answer, I was able to "correct" my program to acceptable performance. You can read his code which is much cleaner than mine. I am going to outline the key points here.

Int is 64 bits wide on x64 Linux GHC. Unless you unsafeCoerce, you should just use Int. This saves you the fromIntegral conversions. Int64 on 32-bit Windows GHC is just darn slow; avoid it. (This is in fact not GHC's fault. As mentioned in my blog post below, 64-bit integers are slow in 32-bit programs in general, at least on Windows.)
Compile with -fllvm or -fvia-C for performance.
Prefer quotRem to divMod. For non-negative numbers the two give the same result, so quotRem already suffices. That gave me a 20% speed-up.
In general, prefer Data.Vector to Data.Array as an "array".
Use the worker-wrapper pattern liberally.
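
To illustrate the quotRem point: the two functions agree whenever both operands are non-negative, and differ only in how they round when the signs differ, which a digit-splitting loop over positive numbers never sees. A minimal sketch follows; the helper name splitThousands is made up for illustration and is not taken from either program.

```haskell
module Main (main) where

-- Hypothetical helper: split a non-negative number into
-- (everything above the last three digits, the last three digits).
splitThousands :: Int -> (Int, Int)
splitThousands n = n `quotRem` 1000

main :: IO ()
main = do
  -- Same result for non-negative operands.
  print (123456 `quotRem` 1000 :: (Int, Int))   -- (123,456)
  print (123456 `divMod` 1000 :: (Int, Int))    -- (123,456)
  -- They differ only when the signs differ:
  -- quotRem truncates toward zero, divMod rounds toward negative infinity.
  print ((-7) `quotRem` 2 :: (Int, Int))        -- (-3,-1)
  print ((-7) `divMod` 2 :: (Int, Int))         -- (-4,1)
  print (splitThousands 123456789)              -- (123456,789)
```

Whether the swap gives a measurable speed-up in a particular program depends on the GHC version and flags, so it is worth timing both.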

The above points were enough to give me about 100% boost over my original version.

In my blog post, I have detailed a step-by-step illustrated example of how I turned the original program to match Bryan's program. There are other points mentioned there as well.

The original question

(This may sound like a "could you do the work for me" post, but I argue that such a concrete example would be very instructive since profiling Haskell performance is often seen as a myth)

(As noted in the comments, I think I have misinterpreted the problem. But who cares, we can focus on performance in a different problem)

Here's my quick recap of the problem:

A wordNumber is defined as

    wordNumber 1 = "one"
    wordNumber 2 = "onetwo"
    wordNumber 3 = "onetwothree"
    wordNumber 15 = "onetwothreefourfivesixseveneightnineteneleventwelvethirteenfourteenfifteen"
    ...

Problem: Find the 51-billion-th letter of (wordNumber Infinity); assume that letter is found at 'wordNumber x', also find 'sum [1..x]'

From an imperative perspective, a naive algorithm would be to have 2 counters, one for sum of numbers and one for sum of lengths. Keep counting the length of each wordNumber and "break" to return the result.

The imperative brute-force approach is implemented in C# here: http://ideone.com/JjCb3. It takes about 1.5 minutes to find the answer on my computer. There is also a C++ implementation that runs in 45 seconds on my computer.

Then I implemented a brute-force Haskell version: http://ideone.com/ngfFq. It cannot finish the calculation in 5 minutes on my machine. (Irony: it has more lines than the C# version)

Here is the -p profile of the Haskell program: http://hpaste.org/49934

Question: How can I make it perform comparably to the C# version? Are there obvious mistakes I am making?

(Note: I am fully aware that brute-forcing it is not the correct solution to this problem. I am mainly interested in making the Haskell version perform comparably to the C# version. Right now it is at least 5x slower, so I am clearly missing something obvious)

(Note 2: It does not seem to be space leaking. The program runs with constant memory (about 2MB) on my computer)

(Note 3: I am compiling with `ghc -O2 WordNumber.hs`)

To make the question more reader friendly, I include the "gist" of the two versions.

    // C#
    long sumNum = 0;
    long sumLen = 0;
    long target = 51000000000;
    long i = 1;

    for (; i < 999999999; i++)
    {
        // WordiLength(1)   = 3  "one"
        // WordiLength(101) = 13 "onehundredone"
        long newLength = sumLen + WordiLength(i);
        if (newLength >= target)
            break;

        sumNum += i;
        sumLen = newLength;
    }

    Console.WriteLine(Wordify(i)[Convert.ToInt32(target - sumLen - 1)]);


    -- Haskell
    -- This has become totally ugly during my squeeze for
    -- performance

    -- Tail recursive
    -- n-th number (51000000000 in our problem) -> accumulated result -> list of 'zipped' left to try
    -- accumulated has the format (sum of numbers, current lengths of the whole chain, the current number)
    solve :: Int64 -> (Int64, Int64, Int64) -> [(Int64, Int64)] -> (Int64, Int64, Int64)
    solve !n !acc@(!sumNum, !sumLen, !curr) ((!num, !len):xs)
        | sumLen' >= n = (sumNum', sumLen, num)
        | otherwise = solve n (sumNum', sumLen', num) xs
        where
            sumNum' = sumNum + num
            sumLen' = sumLen + len

    -- wordLength 1 = 3 "one"
    -- wordLength 101 = 13 "onehundredone"
    wordLength :: Int64 -> Int64
    -- wordLength = ...

    solution :: Int64 -> (Int64, Char)
    solution !x =
        let (sumNum, sumLen, n) = solve x (0,0,1) (map (\n -> (n, wordLength n)) [1..])
        in (sumNum, (wordify n) !! (fromIntegral $ x - sumLen - 1))

The original puzzle states that the words are sorted alphabetically. You seem to be solving a different puzzle, where words are sorted numerically.
– n. m. could be an AI, Commented Aug 7, 2011 at 5:52
@n.m. I think you are right, I think I misread the problem. But I guess it doesn't matter to the heart of my particular question.
– kizzx2, Commented Aug 7, 2011 at 5:54
For those interested, a blogger solved this in Haskell and has written a series of blog posts explaining how he solved it: conway.rutgers.edu/~ccshan/wiki/blog/posts/WordNumbers1
– Tarrasch, Commented Aug 7, 2011 at 6:55
If you wrote in the worker-wrapper style you shouldn't need so many bangs (!) littering your code. The strictness analyser should pick up useful strictness for you, though you might want to define a strict pair datatype.
– stephen tetley, Commented Aug 7, 2011 at 10:10
IMO the culprit is wordLength'. It allocates memory for no apparent reason. I have no idea why, or how to rewrite it so it doesn't. I have tried (not too hard) but with no success so far.
– n. m. could be an AI, Commented Aug 7, 2011 at 10:22

Bryan O'Sullivan · Accepted Answer · 2011-08-09 06:52:22Z

I've written a gist that contains both a C++ version (a copy of yours from a Haskell-cafe message, with a bug fixed) and a Haskell translation.

Notice that the two are structurally almost identical. When compiled with -fllvm, the Haskell code runs at about half the speed of the C++ code, which is pretty good.

Now let's compare my Haskell wordLength code to yours. You're passing around an extra parameter, which is unnecessary (you apparently figured that out when writing the C++ code that I translated). Also, the large number of bang patterns suggests panic; they're almost all useless.

Your solve function is also very confused.

You're passing parameters in three different ways: a regular Int, a 3-tuple, and a list! Whoa.
This function is necessarily not very regular in its behaviour, so while you gain nothing stylistically by using a list to supply your counter, you probably force GHC to allocate memory. In other words, this both obfuscates the code and makes it slower.
By using a tuple for three parameters (for no obvious reason), you're again working hard to force GHC to allocate memory for every step through the loop, when it could avoid doing so if you passed the parameters directly.
Only your n parameter is dealt with in a sensible way, but you don't need a bang pattern on it.
The only parameter that needs a bang pattern is sumNum, because you never inspect its value until after the loop has finished. GHC's strictness analyser will deal with the others. All of your other bang patterns are unnecessary at best, misdirections at worst.
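
The points above can be sketched as a worker-wrapper loop that passes its counters as plain arguments and puts a bang only on sumNum. This is a minimal illustration that mirrors the C# gist from the question; it is not the code from the gist mentioned in this answer. The wordLength here (with its helpers below20, tensLen and below1000) is a stand-in written for this sketch, since the original is elided, and it assumes 1 <= n <= 999999999. Reaching the real target of 51000000000 also assumes a 64-bit Int.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main (main) where

-- Letters in the English names of 1..19 (0 contributes nothing).
-- Stand-in table: the question's own wordLength is not shown.
below20 :: Int -> Int
below20 n = case n of
  0 -> 0; 1 -> 3; 2 -> 3; 3 -> 5; 4 -> 4; 5 -> 4; 6 -> 3; 7 -> 5
  8 -> 5; 9 -> 4; 10 -> 3; 11 -> 6; 12 -> 6; 13 -> 8; 14 -> 8
  15 -> 7; 16 -> 7; 17 -> 9; 18 -> 8; _ -> 8

-- Letters in "twenty" .. "ninety"; the argument is the tens digit 2..9.
tensLen :: Int -> Int
tensLen t = case t of
  2 -> 6; 3 -> 6; 4 -> 5; 5 -> 5; 6 -> 5; 7 -> 7; 8 -> 6; _ -> 6

-- Letters in the name of 0..999 (0 contributes nothing).
below1000 :: Int -> Int
below1000 n = hundreds + rest
  where
    (h, r)   = n `quotRem` 100
    hundreds = if h == 0 then 0 else below20 h + 7   -- "hundred"
    rest | r < 20    = below20 r
         | otherwise = let (t, o) = r `quotRem` 10 in tensLen t + below20 o

-- wordLength 1 = 3 "one", wordLength 101 = 13 "onehundredone".
wordLength :: Int -> Int
wordLength n = chunk millions 7 + chunk thousands 8 + below1000 units
  where
    (millions, r)      = n `quotRem` 1000000
    (thousands, units) = r `quotRem` 1000
    -- A three-digit group plus the length of "million" or "thousand".
    chunk :: Int -> Int -> Int
    chunk 0 _      = 0
    chunk g suffix = below1000 g + suffix

-- Wrapper: fixes the target and starts the worker.
-- Result: (sum of the numbers before the final one,
--          letters before the final number, the final number).
solve :: Int -> (Int, Int, Int)
solve target = go 0 0 1
  where
    -- Worker: plain arguments, no tuple or list built per iteration.
    -- Only sumNum gets a bang, because the loop never inspects it.
    go :: Int -> Int -> Int -> (Int, Int, Int)
    go !sumNum sumLen i
      | newLen >= target = (sumNum, sumLen, i)
      | otherwise        = go (sumNum + i) newLen (i + 1)
      where newLen = sumLen + wordLength i

main :: IO ()
main = print (solve 20)   -- (15,19,6): the 20th letter is the 's' of "six"
```

The index of the wanted letter inside the final word is target - sumLen - 1, exactly as in the C# version.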

Community · Answer · 2017-05-23 12:34:54Z

Here are two pointers I could come up with in a quick investigation:

Note that using Int64 is really slow when you are using a 32 bit build of GHC, as is the default for Haskell Platform, currently. This also turned out to be the main villain in a previous performance problem (there I give a few more details).
For reasons I don't quite understand the divMod function does not seem to get inlined. As a result, the numbers are returned on the heap. When using div and mod separately, wordLength' executes purely on the stack as it should be.
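
A minimal sketch for checking which case a given GHC build falls into, assuming base 4.7 or later so that finiteBitSize is available from Data.Bits:

```haskell
module Main (main) where

import Data.Bits (finiteBitSize)
import Data.Int (Int64)

main :: IO ()
main = do
  -- 64 on a 64-bit GHC build, 32 on a 32-bit one.
  putStrLn ("Int is " ++ show (finiteBitSize (0 :: Int)) ++ " bits wide")
  -- Int64 is always 64 bits, whatever the platform.
  putStrLn ("Int64 is " ++ show (finiteBitSize (0 :: Int64)) ++ " bits wide")
  -- The puzzle's target only fits in Int if Int has more than 32 bits.
  print (toInteger (maxBound :: Int) >= 51000000000)
```

If the last line prints False, the counters have to be Int64 (or Integer), and the slowdown described above applies.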

Sadly I currently have no 64-bit GHC around to test whether this is enough to solve the problem.

Alas! That linked question received so many answers, zomg! Anyway, very fresh insights. I'll try what you mentioned and report back.
No luck. I tried it on x64 Arch Linux with GHC 7.0.3 and got similar results (taking over 3 minutes), whereas a C++ implementation (g++ -O3) takes 45 seconds. Isn't it mysterious that such an innocent-sounding program is so difficult to get right in Haskell?
