4

I'm working with some large datasets, and trying to improve performance. I need to determine whether an object is contained in an array. I was considering using either index or include?, so I benchmarked both.

require 'benchmark' a = (1..1_000_000).to_a num = 100_000 reps = 100 Benchmark.bmbm do |bm| bm.report('include?') do reps.times { a.include? num } end bm.report('index') do reps.times { a.index num } end end 

Surprisingly (to me), index was considerably faster.

 user system total real include? 0.330000 0.000000 0.330000 ( 0.334328) index 0.040000 0.000000 0.040000 ( 0.039812) 

Since index provides more information than include?, I would have expected it to be slightly slower if anything, although this was not the case. Why is it faster?

(I know that index comes directly from the array class, and include? is inherited from Enumerable. Might that explain it?)

4
  • Nice catch. I think you should file an issue to MRI. Commented Sep 11, 2014 at 6:02
  • Related: stackoverflow.com/questions/23729175/… Commented Sep 11, 2014 at 6:15
  • @undur_gongor Thanks. And done. Commented Sep 11, 2014 at 6:35
  • 2
    @Sparhawk: The issue has been fixed since. Commented Sep 22, 2014 at 8:57

3 Answers 3

5

Looking at the Ruby MRI source, it seems that index uses the optimized rb_equal_opt while include? uses rb_equal. This can be seen in rb_ary_includes and rb_ary_index. Here is the commit that made the change. Its not immediately clear to me why its used in index and not include?

You might also find it interesting to read the discussion of this feature

Sign up to request clarification or add additional context in comments.

Comments

1

I ran the same test for benchmarking. Seems like include? is faster than index, though it is not very consistent. Here are my results for two different scenarios.

  1. ruby code is same as yours
 user system total real index 0.065803 0.000652 0.066455 ( 0.067181) include? 0.065551 0.000590 0.066141 ( 0.066894) 
  1. Another benchmark
 user system total real index 0.000034 0.000005 0.000039 ( 0.000037) include? 0.000017 0.000001 0.000018 ( 0.000017) 

code:

require 'benchmark' # parse ranks and return number of reports to using index def solution_using_index(ranks) return 0 if ranks.nil? || ranks.empty? || ranks.length <= 1 return ((ranks[0] - ranks[1] == 1) || (ranks[1] - ranks[0] == 1) ? 1 : 0) if ranks.length == 2 return 0 if ranks.max > 1000000000 || ranks.min < 0 grouped_ranks = ranks.group_by(&:itself) report_to, rank_keys= 0, grouped_ranks.keys rank_keys.each {|rank| report_to += grouped_ranks[rank].length if rank_keys.index(rank+1) } report_to end # parse ranks and return number of reports to using include def solution_using_include(ranks) return 0 if ranks.nil? || ranks.empty? || ranks.length <= 1 return ((ranks[0] - ranks[1] == 1) || (ranks[1] - ranks[0] == 1) ? 1 : 0) if ranks.length == 2 return 0 if ranks.max > 1000000000 || ranks.min < 0 grouped_ranks = ranks.group_by(&:itself) report_to, rank_keys= 0, grouped_ranks.keys rank_keys.each {|rank| report_to += grouped_ranks[rank].length if rank_keys.include?(rank+1) } report_to end test_data = [[3, 4, 3, 0, 2, 2, 3, 0, 0], [4, 4, 3, 3, 1, 0], [4, 2, 0] ] Benchmark.bmbm do |bm| bm.report('index') do test_data.each do |ranks| reports_to = solution_using_index(ranks) end end bm.report('include?') do test_data.each do |ranks| reports_to = solution_using_include(ranks) end end end 

2 Comments

Your results seem internally consistent to me? include? is always faster. However, did you read the comments? The bug has been fixed, hence your results.
yes, it was misleading for me, so wanted to justify it. I tested it with 2.6.0, 2.6.0preview1, 2.6.0preview2, 2.6.0preview3 and its consistent.
0

If performance is your goal, you should be using Array#bsearch which traverses the array using binary search.

https://ruby-doc.org/core-2.7.0/Array.html#method-i-bsearch

a.bsearch {|a| num <=> a } 

It smokes bothindex and include

Rehearsal -------------------------------------------- include? 0.108172 0.000805 0.108977 ( 0.112928) index 0.122730 0.000502 0.123232 ( 0.126323) bsearch 0.000254 0.000027 0.000281 ( 0.000354) ----------------------------------- total: 0.232490sec user system total real include? 0.106727 0.000036 0.106763 ( 0.108495) index 0.107732 0.000330 0.108062 ( 0.110272) bsearch 0.000201 0.000008 0.000209 ( 0.000206) 

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.