
In Clojure I want to find the result of multiple reductions while only consuming the sequence once. In Java I would do something like the following:

double min = Double.POSITIVE_INFINITY;
double max = Double.NEGATIVE_INFINITY;
for (Item item : items) {
    double price = item.getPrice();
    if (price < min) { min = price; }
    if (price > max) { max = price; }
}

In Clojure I could do much the same thing by using loop and recur, but it's not very composable - I'd like to do something that lets you add in other aggregation functions as needed.
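For reference, the loop/recur version might look something like this (an untested sketch; items is assumed to be a seq of maps with a :price key):

(loop [xs (seq items)
       mn Double/POSITIVE_INFINITY
       mx Double/NEGATIVE_INFINITY]
  (if xs
    (let [price (:price (first xs))]
      (recur (next xs) (min mn price) (max mx price)))
    [mn mx]))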

I've written the following function to do this:

(defn reduce-multi
  "Given a sequence of fns and a coll, returns a vector of the result
   of each fn when reduced over the coll."
  [fns coll]
  (let [n (count fns)
        r (rest coll)
        initial-v (transient (into [] (repeat n (first coll))))
        fns (into [] fns)
        reduction-fn
        (fn [v x]
          (loop [v-current v, i 0]
            (let [y (nth v-current i)
                  f (nth fns i)
                  v-new (assoc! v-current i (f y x))]
              (if (= i (- n 1))
                v-new
                (recur v-new (inc i))))))]
    (persistent! (reduce reduction-fn initial-v r))))

This can be used in the following way:

(reduce-multi [max min] [4 3 6 7 0 1 8 2 5 9]) => [9 0] 

I appreciate that it's not implemented in the most idiomatic way, but the main problem is that it's about 10x as slow as doing the reductions one at a time. This might still be useful when performing lots of reductions over a seq that is doing heavy IO, but surely it could be better.

Is there something in an existing Clojure library that would do what I want? If not, where am I going wrong in my function?

5 Comments

  • Have you read into transducers? If so, are they confusing? If not, I'd suggest reading up on them. Commented Mar 28, 2016 at 20:18
  • See juxt at github.com/cgrand/xforms Commented Mar 28, 2016 at 20:21
  • @AWebb juxt seems perfect ... but not required, right? My understanding is that transducers allow a series of separate reduce calls to be made without intermediate sequences being created, which is what I take the goal to be here. Commented Mar 28, 2016 at 21:28
  • @Mars In series, as in composition, yes. I'm not immediately seeing side-by-side, as in juxtaposition, cleanly, but sadly haven't played much with writing my own transducers yet. Commented Mar 28, 2016 at 22:25
  • @Frank C. No, just starting to look into them now. I had heard of them before but assumed that they were more for chaining these operations together efficiently rather than allowing you to do something completely different. As noted in my comments to leetwinski's answer, maybe that performance boost is necessary to achieve the elegance while still being on par in speed with a more naive approach. Commented Mar 29, 2016 at 22:16
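As a rough illustration of the transducer angle the comments hint at (a sketch, not from any library; rf-juxt is a made-up helper that juxtaposes plain reducing functions so a single transduce pass feeds every accumulator):

(defn rf-juxt [& rfs]
  (fn
    ([] (mapv (fn [rf] (rf)) rfs))                         ; init (needs 0-arity rfs)
    ([accs] (mapv (fn [rf acc] (rf acc)) rfs accs))        ; completion
    ([accs x] (mapv (fn [rf acc] (rf acc x)) rfs accs))))  ; step

;; max and min have no zero-arity, so seed the accumulators explicitly:
(transduce identity
           (rf-juxt max min)
           [Double/NEGATIVE_INFINITY Double/POSITIVE_INFINITY]
           [4 3 6 7 0 1 8 2 5 9])
;; => [9 0]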

3 Answers


That's what I would do: simply delegate this task to the core reduce function, like this:

(defn multi-reduce
  ([fs accs xs]
   (reduce (fn [accs x] (doall (map #(%1 %2 x) fs accs)))
           accs xs))
  ([fs xs]
   (when (seq xs)
     (multi-reduce fs (repeat (count fs) (first xs)) (rest xs)))))

In the REPL:

user> (multi-reduce [+ * min max] (range 1 10))
(45 362880 1 9)
user> (multi-reduce [+ * min max] [10])
(10 10 10 10)
user> (multi-reduce [+ * min max] [])
nil
user> (multi-reduce [+ * min max] [1 1 1000 0] [])
[1 1 1000 0]
user> (multi-reduce [+ * min max] [1 1 1000 0] [1])
(2 1 1 1)
user> (multi-reduce [+ * min max] [1 1 1000 0] (range 1 10))
(46 362880 1 9)
user> (multi-reduce [max min] (range 1000000))
(999999 0)

7 Comments

This gets a stack overflow error for large enough sequences, e.g. (multi-reduce [max min] (range 10000)) throws StackOverflowError clojure.lang.LazySeq.sval (LazySeq.java:40). However, this is definitely on the right track - it's the version of map that accepts multiple collections that helps a lot. I had completely forgotten about that, which causes much of the complexity in my own solution.
Oh, this stack overflow is easy to solve. My fault. That's because the map should be fully realized.
Could also just replace map -> mapv instead of doall
Ah, thanks. Trouble is, although it is much more elegant than my solution, it is about twice as slow: (time (multi-reduce [max min] (range 1000000))) is about 700ms on my machine, my reduce-multi is about 350ms, both of which trail behind simply reducing twice separately, which takes about 20ms. For most cases it probably wouldn't matter, but it feels a shame if such elegance has to come at this cost. Maybe this is where transducers come in.
@RobertJohnson This could probably be sped up some by using the reducers eager versions, e.g. of map, and transients as in the OP. But, ranges reduce themselves very efficiently -- compiles to a tight little loop -- so there is very little overhead in your benchmark for reducing twice versus operations on container classes each iteration of a single pass.
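Putting the last two comments together, a sketch of the eager variant (multi-reduce-v is a hypothetical name): mapv realizes each step, so no doall is needed and no lazy seqs pile up:

(defn multi-reduce-v [fs accs xs]
  ;; mapv is eager: each step yields a fully realized vector of accumulators
  (reduce (fn [accs x] (mapv #(%1 %2 x) fs accs)) accs xs))

(multi-reduce-v [max min] [0 999999] (range 1000000))
;; => [999999 0]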

The code for reduce is fast for reducible collections. So it's worth trying to base multi-reduce on core reduce. To do so, we have to be able to construct reducing functions of the right shape. An ancillary function to do so is ...

(defn juxt-reducer [f g]
  (fn [[fa ga] x]
    [(f fa x) (g ga x)]))

Now we can define the function you want, which combines juxt with reduce as ...

(defn juxt-reduce
  ([[f g] coll]
   (if-let [[x & xs] (seq coll)]
     (juxt-reduce (list f g) [x x] xs)
     [(f) (g)]))
  ([[f g] init coll]
   (reduce (juxt-reducer f g) init coll)))

For example,

(juxt-reduce [max min] [4 3 6 7 0 1 8 2 5 9]) ;=> [9 0] 

The above follows the shape of core reduce. It can clearly be extended to cope with more than two functions. And I'd expect it to be faster than yours for reducible collections.
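For instance, a possible n-ary generalization (a sketch; the starred names are made up):

(defn juxt-reducer* [fs]
  (fn [accs x]
    (mapv (fn [f acc] (f acc x)) fs accs)))

(defn juxt-reduce* [fs init coll]
  (reduce (juxt-reducer* fs) (vec init) coll))

(juxt-reduce* [max min +] [4 4 4] [3 6 7 0 1 8 2 5 9])
;=> [9 0 45]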

3 Comments

What is a reducible collection? Or perhaps it would be better to ask: what does a non-reducible collection look like?
A reducible collection is one that knows how to reduce itself - an instance of clojure.lang.IReduce, as seen at the source code for reduce. I think this is what @AWebb is referring to in his comment on @Leetwinski's answer. The repl tells me that vectors and ranges are reducible, but lists, sets, and maps are not.
Great, thanks. So where does the loss of performance come in? In the example I am working with I am using a range, which should be fast. It seems to me that the issue is that the reducing functions that I and @Leetwinski have used are somehow slow.

Here is how I would do it:

(ns clj.core
  (:require [clojure.string :as str])
  (:use tupelo.core))

(def data (flatten [(range 5 10) (range 5)]))
(spyx data)

(def result
  (reduce (fn [cum-result curr-val] ; reducing (accumulator) fn
            (it-> cum-result
              (update it :min-val min curr-val)
              (update it :max-val max curr-val)))
          {:min-val (first data) :max-val (first data)} ; initial value
          data)) ; seq to reduce
(spyx result)

(defn -main [])

;=> data => (5 6 7 8 9 0 1 2 3 4)
;=> result => {:min-val 0, :max-val 9}

So the reducing function (fn ...) carries along a map like {:min-val xxx :max-val yyy} through each element of the sequence, updating the min & max values as required at each step.

While this does make only one pass through the data, it is doing a lot of extra work calling update twice per element. Unless your sequence is very unusual, it is probably more efficient to make two (very efficient) passes through the data like:

(def min-val (apply min data))
(def max-val (apply max data))
(spyx min-val)
(spyx max-val)

;=> min-val => 0
;=> max-val => 9
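A small aside, not from the original answer: min and max also work directly as reducing functions, so the same two passes can be written with reduce instead of spreading a long seq through apply:

(def min-val (reduce min data))
(def max-val (reduce max data))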

