0

I currently have 2 lists List('a','b','a') and List(45,65,12) with many more elements and elements in 2nd list linked to elements in first list by having a key value relationship. I want combine elements with same keys by adding their corresponding values and create a map which should look like Map('a'-> 57,'b'->65) as 57 = 45 + 12.

I have currently implemented it as

val keys = List('a','b','a') val values = List(45,65,12) val finalMap:Map(char:Int) = scala.collection.mutable.Map().withDefaultValue(0) 0 until keys.length map (w => finalMap(keys(w)) += values(w)) 

I feel that there should be a better way(functional way) of creating the desired map than how I am doing it. How could I improve my code and do the same thing in more functional way?

1
  • 2
    The answer to this question are appropriate if the collection is very large stackoverflow.com/questions/15529726/…. Scala collections should really provide a combineByKey like Apache Spark does. Commented Feb 9, 2014 at 1:00

3 Answers 3

6
val m = keys.zip(values).groupBy(_._1).mapValues(l => l.map(_._2).sum) 

EDIT: To explain how the code works, zip pairs corresponding elements of two input sequences, so

keys.zip(values) = List((a, 45), (b, 65), (a, 12)) 

Now you want to group together all the pairs with the same first element. This can be done with groupBy:

keys.zip(values).groupBy(_._1) = Map((a, List((a, 45), (a, 12))), (b, List((b, 65)))) 

groupBy returns a map whose keys are the type being grouped on, and whose values are a list of the elements in the input sequence with the same key.

The keys of this map are the characters in keys, and the values are a list of associated pair from keys and values. Since the keys are the ones you want in the output map, you only need to transform the values from List[Char, Int] to List[Int].

You can do this by summing the values from the second element of each pair in the list.

You can extract the values from each pair using map e.g.

List((a, 45), (a, 12)).map(_._2) = List(45,12) 

Now you can sum these values using sum:

List(45, 12).sum = 57 

You can apply this transform to all the values in the map using mapValues to get the result you want.

Sign up to request clarification or add additional context in comments.

3 Comments

Hi @Lee - it's always a good idea to explain your solution so that total n00bs can learn a thing or two from your great idea... :) ie, don't just post the code - give a brief explanation of why this is better and how it works etc :)
@TarynEast - Sorry, I was being a bit lazy there. I've added an explanation.
A couple of minor tweaks: (keys,values).zipped.groupBy(_._1).mapValues(_.map(_._2).sum)
4

I was going to +1 Lee's first version, but mapValues is a view, and ell always looks like one to me. Just not to seem petty.

scala> (keys zip values) groupBy (_._1) map { case (k,v) => (k, (v map (_._2)).sum) } res0: scala.collection.immutable.Map[Char,Int] = Map(b -> 65, a -> 57) 

Hey, the answer with fold disappeared. You can't blink on SO, the action is so fast.

I'm going to +1 Lee's typing speed anyway.

Edit: to explain how mapValues is a view:

scala> keys.zip(values).groupBy(_._1).mapValues(l => l.map { v => | println("OK mapping") | v._2 | }.sum) OK mapping OK mapping OK mapping res2: scala.collection.immutable.Map[Char,Int] = Map(b -> 65, a -> 57) scala> res2('a') // recomputes OK mapping OK mapping res4: Int = 57 

Sometimes that is what you want, but often it is surprising. I think there is a puzzler for it.

1 Comment

@AmigoNico You're asking about "mapValues is a view"? As the scaladoc says, the function isn't applied until you ask for it, so the final map and sum would be incurred on every lookup.
0

You were actually on the right track to a reasonably efficient functional solution. If we just switch to an immutable collection and use a fold on a key-value zip, we get:

( Map[Char,Int]() /: (keys,values).zipped ) ( (m,kv) => m + ( kv._1 -> ( m.getOrElse( kv._1, 0 ) + kv._2 ) ) ) 

Or you could use withDefaultValue 0, as you did, if you want the final map to have that default. Note that .zipped is faster than zip because it doesn't create an intermediate collection. And a groupBy would create a number of other intermediate collections. Of course it may not be worth optimizing, and if it is you could do even better than this, but I wanted to show you that your line of thinking wasn't far off the mark.

5 Comments

How does it work and this syntax is different from what I have seen till now.
instead of case (m,kv) you can write case (m,(k,v)) and have the ugly underscores magically disappear: m + ( k -> ( m.getOrElse(k,0) + v ) )
How would the complete statemant look like with your change because ( Map[Char,Int]() /: (keys,values).zipped ) ( (m,(k,v)) => m + ( k -> ( m.getOrElse( k, 0 ) + v ) )) gave me an error.
Kevin, I'd love for you to be right, but foldLeft takes a function with two parameters, so I'm thinking that we can't use a PartialFunction here. No?
"this syntax is different" -- actually, it's not really syntax. /: is just another method, also known as foldLeft. This article may help you understand what it does. Basically whenever you initialize a mutable variable and update it for each element of a collection, you could do the same thing immutably with a fold.