
I have some high-performance Haskell code whose inner loop compiles down to 6 assembly instructions. Modifying the inner loop to be less efficient has no noticeable impact on performance, suggesting the inner loop is not the bottleneck. However, when I turn on profiling, the assembly generated for the inner loop becomes dramatically less efficient, and the profiler reports that the inner loop takes 85% of the time.

I suspect something is unnecessarily slow, but when I turn on profiling to find out what, the profiling overhead makes the inner loop slow enough that it dominates the profile. What techniques can I use to see where the time is actually going? A sampling profiler would be great, if one existed for Haskell.
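For reference, the generated assembly for the two builds can be compared by dumping it at compile time (a sketch; Loop.hs is a placeholder for whichever module contains the loop):

    ghc -O2 -fforce-recomp -ddump-asm Loop.hs                      # normal build
    ghc -O2 -prof -fprof-auto -fforce-recomp -ddump-asm Loop.hs    # profiled build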

  • Maybe if you add cost centres manually, you could avoid putting a cost centre on that loop so it doesn't get slower? Not sure whether that works, though (see the SCC sketch after these comments). Commented Jan 2, 2014 at 11:46
  • Good idea, but it doesn't seem to work. Even with no cost centres at all, just adding -prof puts a call to enterFunCCS in the inner loop, plus about 5 other additional instructions (but the call will be the one that kills it). Commented Jan 2, 2014 at 11:57
  • I have some hacks for doing sampling. Basically you just link with the profiling startup code. The problem is that the generated .prof file makes gprof die, probably because gprof makes some assumptions about the symbols. A hacked-up gprof that knows about GHC symbols should be able to work, though. Commented Jan 2, 2014 at 13:29
  • Would it be possible to comment out the loop? That way you could profile the rest of the code without the loop messing up the profile. Commented Jan 2, 2014 at 14:57
  • Maybe you could try using "ticky-ticky" profiling: ghc.haskell.org/trac/ghc/wiki/Debugging/TickyTicky (I have no experience doing this; I just stumbled over it on the GHC wiki.) Commented Jan 4, 2014 at 15:00
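Regarding the first comment's suggestion: manual cost centres are added with SCC pragmas, and without -fprof-auto only the annotated expressions get them. A minimal sketch, with made-up module and function names:

    {-# LANGUAGE BangPatterns #-}
    module Loop where

    -- Annotate only the wrapper; leave the hot loop itself without a cost centre.
    driver :: Int -> Int
    driver n = {-# SCC "driver" #-} go 0 n
      where
        -- The tight inner loop: no SCC here, so only the explicit cost centre
        -- above shows up in the profile (though, as the follow-up comment notes,
        -- -prof alone still changes the code generated for the loop).
        go :: Int -> Int -> Int
        go !acc 0 = acc
        go !acc k = go (acc + k) (k - 1)

Compiling this with ghc -O2 -prof Loop.hs (and no -fprof-auto) leaves go without a cost centre of its own.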

1 Answer


You can use Linux perf events: https://ghc.haskell.org/trac/ghc/wiki/Debugging/LowLevelProfiling/Perf
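Roughly, the workflow from that page is to sample the ordinary (non-profiled) binary with perf and then inspect the report (a sketch; queens matches the example output below, but any GHC-compiled executable works):

    perf record ./queens    # sample the running program
    perf report             # break the samples down by symbol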

This will give you an output that looks like:

    # Samples: 9161149923
    #
    # Overhead  Command  Shared Object      Symbol
    # ........  .......  .................  ......
    #
      30.65%    queens   queens             [.] s1ql_info
      18.67%    queens   queens             [.] s1qj_info
      12.17%    queens   queens             [.] s1qi_info
       9.94%    queens   queens             [.] s1o9_info
       5.85%    queens   queens             [.] r1nI_info
       5.33%    queens   queens             [.] s1sF_info
       5.18%    queens   queens             [.] s1sG_info
       3.69%    queens   queens             [.] s1oP_info
       1.68%    queens   queens             [.] stg_upd_frame_info
       0.88%    queens   queens             [.] stg_ap_2_upd_info
       0.62%    queens   queens             [.] s1sE_info
       0.56%    queens   [kernel]           [k] read_hpet
       0.39%    queens   queens             [.] stg_ap_p_info
       0.35%    :2030    f76beb             [.] 0x00000000f76beb
       0.31%    queens   queens             [.] s1oD_info
       0.28%    swapper  [kernel]           [k] mwait_idle_with_hints
       0.25%    queens   queens             [.] __stg_gc_enter_1
       0.23%    queens   queens             [.] evacuate
       0.18%    swapper  [kernel]           [k] read_hpet
       0.12%    queens   queens             [.] scavenge_block

If you save the Core as you compile, you can map these symbols back to the functions in Core.
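One way to save it alongside the build (a sketch; flag spellings assume a moderately recent GHC, and Main.hs is a placeholder):

    ghc -O2 -ddump-simpl -ddump-to-file -dsuppress-all Main.hs
    # writes the simplified Core to Main.dump-simpl; the local binder names
    # there (or in the STG dump from -ddump-stg) line up with the s1ql_info-style
    # symbols in the perf report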

A bit painful, but gives you more trustworthy results.

There is some work afoot to do this automatically.
