3

I have a hash with approximately 150k elements, and an array with 25k elements. I need to create a new hash, or modify the existing one, to remove all elements whose key is not on the array. Here is what I have now:

hash.select {|k,v| array.include?(k)} new_hash = hash.delete_if {|k,v| !array.include?(k)} 

The two methods are extremely slow because of the comparison complexity. Is there is a way it can be sped up?

2 Answers 2

4
(hash.keys - array).each{|k| hash.delete(k)} 

Or, this might be even faster:

keys_to_be_removed = {} hash.each{|k, _| keys_to_be_removed[k] = true} array.each{|k| keys_to_be_removed[k] = false} keys_to_be_removed.each{|k, v| hash.delete(k) if v} 

The point is to avoid array operations and do everything in hash as much as possible.

Sign up to request clarification or add additional context in comments.

2 Comments

I added another one. Please try it too.
@sawa: I added your second version to my microbenchmark. It's certainly not an improvement :(
2

@sawa's answer is fine at getting good speed, but it does make you express your intent a little more awkwardly than Hash#select. Your initial approach would work just fine if the array was a Set with O(1) lookup instead of an array with O(N) lookup.

require 'set' set = array.to_set hash.select { |k,v| set.include?(k) } 

This microbenchmark demonstrates sets are fast and this approach is marginally faster than the key-subtraction-delete method @sawa recommended when the set is pre-built and just marginally slower if the set must be made on the fly.

 user system total real noop 0.600000 0.020000 0.620000 ( 0.614905) keys_minus_arr_delete 1.190000 0.020000 1.210000 ( 1.213376) select_set_include 1.050000 0.010000 1.060000 ( 1.084079) select_set_include_fly 1.350000 0.020000 1.370000 ( 1.361623) sawa2 1.860000 0.020000 1.880000 ( 1.870162) 

1 Comment

Really nice answer. I didn't know about the Set performance.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.