Performance of `Hash#has_key?` and `Array#index`

Question

I frequently check if a specific value is in a large array. I can do this by Array#index. To make this more efficient, I created a hash of the array values and called Hash#has_key?:

Method 1
```
arr = ["a","b","c","d"] arr.index("c") 
```

Method 2

h = {"a"=> true, "b"=> true, "c"=> true, "d"=> true} h.has_key?("c")

But I noticed ruby throws an exception if a key isn't in a given hash. I'm wondering what the relative performance of the two methods is.

Why not find out yourself what works best for your situation? ruby-doc.org//stdlib-1.9.3/libdoc/benchmark/rdoc/Benchmark.html — Brennan
– Brennan, Commented Feb 19, 2015 at 21:45
Not sure why you think ruby would throw an exception if a key isn't in a hash (unless you deliberately use h.fetch) — Frederick Cheung
– Frederick Cheung, Commented Feb 19, 2015 at 21:52
h["a"] would not throw an exception in your example code; it would simply return nil — gregates
– gregates, Commented Feb 19, 2015 at 21:53
Weird...for some reason when I first tried it, I got an exemption. Turns out I forgot the "?" in has_key and wasn't paying attention to the error very well. Sorry! — user1561753
– user1561753, Commented Feb 19, 2015 at 21:55
Your point is not clear. Is the problem that your second method throws an error? That part is not reproduced. If your problem is only about performance, then you should remove the first point. — sawa
– sawa, Commented Feb 19, 2015 at 22:17

mjgpy3 · Accepted Answer · 2015-02-19 22:08:53Z

To answer your question, "Method 2" should be faster. Now, that is a very loaded statement which partially depends on the very nature of hashes (e.g. collisions when inserting).

However, for your specific use case, I think arrays and hashes are both the "wrong tool for the job". In general, if you're using a hash to check unique set existence (hint hint), use a set.

One final thought, which may or may not be valuable depending on how contrived your example is. If you're storing some finite set of ordered values ('a'-'d' in your example) an array is definitely the way to go. Why? Because you can easily map the values of your alphabet to an array index (e.g. a maps to 0, b maps to 1 and so forth) by, in your case, converting the letters to ascii and subtracting to get their desired location. This would give you an O(1) lookup time.

Thinking about it, a set would definitely be the ideal data structure. All the answers pretty much stated as much, but you'll get the check for being first! Thanks!
"[C]onverting the letters to ascii and subtracting to get their desired location" is an ugly hack. It is not Ruby-ish.
@sawa What does that even mean? "Ugly hacks" are standard procedure when you're worried about performance.

gregates · Accepted Answer · 2015-02-19 22:02:28Z

Ruby has a construct in the standard library that gives you what you want: O(1) lookups using #include?.

Set class documentation

require 'set' arr = ["a","b","c","d"] set = Set.new(arr) set.include?("c")

Note however that this only works if you don't care about duplicate elements (but I am assuming that's the case based on your 2nd method, which also depends on that assumption).

Collectives™ on Stack Overflow

Performance of `Hash#has_key?` and `Array#index`

2 Answers 2

3 Comments

Comments

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

3 Comments

Comments

Related