0

I am optimizing a program I've been working on, and have hit a wall. The function julia-subrect maps over for-each-pixel a large number of times. I've optimized for-each-pixel to have a ~16x speedup. However, my optimized version of julia-subrect shows no evidence of this. Here are my benchmarks and relevant code:

; ======== Old `for-each-pixel` ======== ;(bench (julia/for-each-pixel (->Complex rc ic) max-itrs radius r-min x-step y-step [xt yt]))) ;Evaluation count : 3825300 in 60 samples of 63755 calls. ;Execution time mean : 16.018466 µs ; ======== New `for-each-pixel`. optimized 16x. ======== ;(bench (julia/for-each-pixel-opt [rc ic] [max-itrs radius r-min] [x-step y-step] [xt yt]))) ;Evaluation count : 59542860 in 60 samples of 992381 calls. ;Execution time mean : 1.038955 µs (defn julia-subrect [^Long start-x ^Long start-y ^Long end-x ^Long end-y ^Long total-width ^Long total-height ^Complex constant ^Long max-itrs] (let [grid (for [y (range start-y end-y)] (vec (for [x (range start-x end-x)] [x y]))) radius (calculate-r constant) r-min (- radius) r-max radius x-step (/ (Math/abs (- r-max r-min)) total-width) y-step (/ (Math/abs (- r-max r-min)) total-height) ; Uses old implementation of `for-each-pixel` calculate-pixel (partial for-each-pixel constant max-itrs radius r-min x-step y-step) for-each-row (fn [r] (map calculate-pixel r))] (map for-each-row grid))) ; ======== Old `julia-subrect` ======== ;(bench (doall (julia/julia-subrect start-x start-y end-x end-y total-width total-height c max-itrs)))) ;Evaluation count : 22080 in 60 samples of 368 calls. ;Execution time mean : 2.746852 ms (defn julia-subrect-opt [[^long start-x ^long start-y ^long end-x ^long end-y] [^double rc ^double ic] total-width total-height max-itrs ] (let [grid (for [y (range start-y end-y)] (vec (for [x (range start-x end-x)] [x y]))) radius (calculate-r-opt rc ic) r-min (- radius) r-max radius x-step (/ (Math/abs (- r-max r-min)) total-width) y-step (/ (Math/abs (- r-max r-min)) total-height) ;Uses new implementation of `for-each-pixel` calculate-pixel (fn [px] (for-each-pixel-opt [rc ic] [max-itrs radius r-min] [x-step y-step] px)) for-each-row (fn [r] (map calculate-pixel r))] (map for-each-row grid))) ; ======== New `julia-subrect`, but no speedup ======== ;(bench (doall (julia/julia-subrect-opt [start-x start-y end-x end-y] [rc ic] total-width total-height max-itrs)))) ;Evaluation count : 21720 in 60 samples of 362 calls. ;Execution time mean : 2.831553 ms 

Here is a gist containing source code for all the functions I've specified: https://gist.github.com/johnmarinelli/adc5533c19fb0b6d74cf4ef04ae55ee6

So, can anyone tell me why julia-subrect is showing no signs of speedup? Also, I'm still new to clojure so bear with me if the code is unidiomatic/ugly. Right now, I'm focusing on making the program run quicker.

1
  • 1
    did you profile the code? if that function originally only ever took 0.1ms to run over the life of the program, a 16x speedup is pointless, it's a microscopically small fraction of the total. you optimize where bottlenecks are - where the cpu spends a majority of its time executing things. Commented Jul 18, 2016 at 19:37

1 Answer 1

2

As a general guideline:

  1. profile!
  2. actually get around to profiling, like for real ;-)
  3. remove reflection (looks like you did this)
  4. split the operations into easy to think about functions
  5. remove laziness (transducers should be the last step in this part)
  6. combine steps using loop/recur to make your code impossible to figure out and slightly faster (this is the last step for a reason)

Specifically thinking about the code you posted:

At a glance, it looks like this function will spend much of it's time generating a lazy list of value in the for loop which are then immediately realized (evaluated to no longer be lazy) so the time spent generating that structure is wasted. You may consider changing this to produce vectors directly, mapv is useful for this.

The second part is the call to map in for-each-row which will produce a lot of intermediate data structures. For that one you may consider using a non-lazy expression like mapv or loop/recur.

It looks like you have done steps 2-4 already, and there is no obvious reason for you to skip to step seven. I'd spend the next couple hours on limiting laziness and if you have to, learning about transducers.

Sign up to request clarification or add additional context in comments.

1 Comment

Thank you for the guidelines! I will have to take a day to get familiar with transducers. I profiled the code and from what I can tell, most of the work is being in the clojure.lang namespace. Thanks @marc-b

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.